- July 26th, 2011
The benefits of using a “version control system” are many. It can improve software quality, facilitate collaboration and even help you become a better developer or designer. In this three-part series, I will introduce you to the increasingly popular Git [1] version control system. I’ll discuss the main benefits and features of Git and then demonstrate how to integrate it into your workflow.
In this first part, we will cover the basic background information for understanding how — and more importantly, why — to use Git. In the second and third parts, we will take a closer look at Git’s features, including branching and merging, and discuss how to use it in your own design and development projects.
Git: Born Of Necessity
Linus Torvalds was unsatisfied: none of the version control systems (VCS) available in 2005 met his requirements. Once the proprietary version control system BitKeeper changed its license agreement, it couldn’t be used to manage the Linux kernel project anymore. An alternative had to be found, one that was distributed, scalable and — above all — fast.
The Linux community took action by starting two new projects: Git [3] and Mercurial [4]. Both have their origins in this emergency, and they are among today’s leading distributed version control systems. Git is now used in countless well-known open-source projects: the Linux kernel, jQuery, Ruby on Rails, Symfony, CakePHP, Debian, Fedora, Perl and many more. The large number of tutorials and tools, including desktop clients, shows how important Git has become.
Centralized Vs. Distributed
Git (like Mercurial) is a “distributed” version control system (DVCS). The classic systems like Subversion and CVS, in contrast, function as centralized systems (CVCS).
In centralized systems, there is only one “master” repository, which every developer feeds their changes into. Every action must be synchronized with this central repository. And because it usually resides on a central server, each action has to pass through the network — leaving a developer unable to work if they happen to have no network connection.
In distributed systems, each developer has their own full-fledged repository on their computer. In most set-ups there’s an additional central repository on a server that’s used for sharing. However, this is not a requirement; every developer can perform all important actions in their local repository: committing changes, viewing differences between revisions, switching branches, etc.
One of Git’s main advantages is its distributed nature. It doesn’t matter whether you’re using a complex set-up with multiple remote repositories or you have just one central server to share code (working “Subversion style”). A DVCS can be used independently of any one person’s workflow. Being able to work offline is an important advantage of DVCS for many developers. You can work without constraints, even if you’re not connected to the network.
Figure: Git saves quite some time in your daily workflow.
Speed is another important factor. In many everyday operations, Git is faster than other modern systems, such as Mercurial and Bazaar. One reason for Git’s remarkable speed is that it was written in C. Another is that it was designed to manage the Linux kernel and therefore had to perform well even with huge amounts of data.
Another convenience: every local Git repository can serve as a full-fledged back-up, because it contains the project’s complete history. And considering that almost every action in Git only adds data, losing data is pretty hard to do.
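To illustrate the back-up point, here is a minimal sketch. It assumes a scratch directory and a placeholder identity (the paths, file name and user details are hypothetical), clones a local repository and checks that the copy holds the same number of commits.

```shell
# Hypothetical scratch locations; nothing here touches an existing project.
work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Create two commits so that there is some history to copy.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "second draft" >> notes.txt
git commit -q -am "Revise notes"

# A clone copies the complete history, not just the latest files.
git clone -q "$work/project" "$work/backup"

original_count=$(git rev-list --count HEAD)
backup_count=$(git -C "$work/backup" rev-list --count HEAD)
echo "original: $original_count commits, backup: $backup_count commits"
```

If the original directory were lost, the clone in `backup` would still contain both commits.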
The biggest advantages, however, lie in Git’s feature set: in how it deals with code and in its tools and workflows. We’ll take a closer look at things like the staging area, the stash and the concept of branching later on.
The Local Git Repository
In SVN, every directory that’s under version control is assigned a hidden .svn folder, which stores all relevant metadata for that directory. Have you ever (inadvertently, of course) deleted or moved this magical folder? If so, then you’ll appreciate that Git has only one of these folders. This means you can move, delete or rename files and directories in your favorite editor or file browser as you wish, without any headaches the next morning.
This folder (.git) resides in the root directory of your project and makes up your local repository. The actual files you work with comprise your so-called “working directory” (or “working tree”).
In addition to the .git repository and the working directory, the third crucial part is the so-called “staging area.” This enables you to precisely define which changes you want to have in your next commit — even down to individual lines in a file.
Let’s consider this more carefully, because it’s one of Git’s best features. Say you have modified 10 files, and you realize that splitting these changes into two separate commits would be best (because every commit should contain only related changes and not be a hodgepodge). Using the staging area, you can define exactly which changes to commit and which to leave for a later commit.
Technically, the staging area is nothing more than a file named index that lies in your local .git repository. That’s why it’s sometimes referred to as the index.
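As a rough sketch of how the staging area lets you split unrelated changes into separate commits, consider the following. The scratch directory, file names, commit messages and identity are all hypothetical, and the example stages whole files only.

```shell
# Hypothetical scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "body { color: black; }" > style.css
echo "Project notes" > README.txt
git add style.css README.txt
git commit -q -m "Initial commit"

# Two unrelated modifications in the working directory.
echo "a { color: blue; }" >> style.css
echo "See style.css for the link colors." >> README.txt

# Stage and commit only the stylesheet change.
git add style.css
git commit -q -m "Style links in blue"

# The README change is still in the working directory, ready for its own commit.
git add README.txt
git commit -q -m "Mention link colors in README"

git log --oneline
```

To stage individual parts of a file rather than the whole file, `git add -p` walks through the changes interactively, which is why it is not part of this non-interactive sketch.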
States of Files
In Git, a file in your working directory can be in one of several states. The most basic distinction is between “tracked” and “untracked.” A file is tracked if it’s already under version control; in this case, Git observes all changes in that file. If it’s not (yet) saved in your Git repository, then it’s treated as untracked.
Tracked files can be in one of the following three states:
- Unmodified or committed
The file is in its last committed state and therefore has no local modifications.
- Modified
The file has been changed since it was last committed.
- Staged
The file was not only changed but also added to the staging area, meaning that the changes will be included in the next commit. Because a file can be staged partially, it’s entirely possible for it to be both staged and modified.
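The states described above can be observed in a throwaway repository. This is only a sketch: the directory, file names and identity are hypothetical. It produces one file that is both staged and modified, plus one untracked file, and then asks Git for a compact status listing.

```shell
# Hypothetical scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line 1\n' > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"   # tracked.txt is now unmodified

printf 'line 2\n' >> tracked.txt     # modified
git add tracked.txt                  # staged
printf 'line 3\n' >> tracked.txt     # changed again: staged and modified
echo "new" > untracked.txt           # untracked

# Short format: first column is the staging area, second is the working directory.
git status --short
```

In the short format, `MM` in front of `tracked.txt` means the file has staged changes as well as further unstaged modifications, while `??` marks `untracked.txt` as untracked.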
A Basic Workflow
A typical workflow in Git usually consists of the following steps:
- You modify, create or delete files in your working directory. Your favorite editor or file browser is perfectly suited to this job.
- You execute the git status command in the command line to see an overview of what has been changed. People use this command rather frequently in their workflow to stay on top of things.
- You add all changes for the next commit to the staging area using the git add command.
- Finally, you execute git commit to save the staged changes in your local repository.
Figure: The git status command gives you an overview of the state of your working directory.
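The steps above can be sketched as a short command-line session. The scratch directory, the file name `index.html`, the commit message and the identity are assumptions made for illustration only.

```shell
# Hypothetical scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# 1. Modify, create or delete files in the working directory.
echo "<h1>Hello</h1>" > index.html

# 2. Get an overview of what has changed.
git status

# 3. Add the changes for the next commit to the staging area.
git add index.html

# 4. Save the staged changes in the local repository.
git commit -q -m "Add start page"

# Prints nothing when the working directory is clean.
git status --short
```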
Hashes Instead of Revision Numbers
I’ll try to break this to you gently: Git doesn’t understand revision numbers.
In a CVCS such as Subversion, every commit is assigned a consecutive revision number. This doesn’t work in DVCS anymore. Because commits are created locally, the system can’t assign consecutive numbers. Imagine the developers on a team working on their own, producing commits locally, and then publishing their work on a shared remote repository as they go along. When an individual commit is created locally, there’s no way for Git to foresee the eventual order.
Figure: A short summary of a commit in Git.
So, Git uses SHA-1 hashes to identify commits (and all other objects) internally. SHA-1 hashes are 40-character hexadecimal checksums that are, for all practical purposes, unique, like conventional revision numbers, but with the benefit of being compatible with DVCS.
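To see these identifiers in practice, here is a small sketch in a scratch repository (the directory, file name and identity are hypothetical, and it assumes Git's default SHA-1 object format). It prints the full and abbreviated hash of a commit and shows that a file's contents are identified by a hash as well.

```shell
# Hypothetical scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"

# The commit is identified by a hash instead of a revision number.
full=$(git rev-parse HEAD)
short=$(git rev-parse --short HEAD)
echo "full:  $full"
echo "short: $short"

# File contents are objects too, so they have a hash of their own.
blob=$(git hash-object hello.txt)
stored=$(git rev-parse HEAD:hello.txt)
echo "blob:  $blob"
```

The commit hash differs from run to run because it also covers the author, the timestamp and the message. The abbreviated form is simply a prefix of the full hash that is long enough to be unambiguous within the repository.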
Installation and Tools
In the next article in this three-part series, we’ll look at how to use Git in practice, including branching and merging.
- 1 http://git-scm.com/
- 2 http://git-scm.com/
- 3 http://git-scm.com/
- 4 http://mercurial.selenic.com
- 5 http://progit.org/book/ch1-4.html
- 6 http://www.git-tower.com
- 7 http://code.google.com/p/tortoisegit/