- August 3rd, 2011
In the first part of this series [1], I introduced you to the Git version control system. We looked at the history of the project, highlighted well-known open-source projects that use it (Ruby on Rails, jQuery and the Linux kernel), discussed its key features and went over a very basic workflow scenario. In this second part, we’ll go into more detail and get our hands dirty with a look at a real-world workflow.
Improve Development Quality
When you invite people to learn a “complex” new technology, you’ll hardly get volunteers. But what if the technology could improve software quality and maybe even your own way of developing software? Git is such a technology, and the time you invest in learning it pays off. Moreover, desktop clients such as Tower [3] for Mac OS (disclaimer: this is the author’s product) and Tortoise Git [4] for Windows make a lot of the tasks easier.
Installation and Help
If a recent version of Git is not installed on your machine, you can quickly catch up, for example by following the installation instructions in Scott Chacon’s free eBook Pro Git [5]. I’ll mention commands only briefly in this article. To get more detailed descriptions and parameter listings, you can use git help <command> on the command line or browse GitRef [6] online.
Creating And Cloning Repositories
Having a repository is the most basic requirement for working in Git. If there is not one for your current project or if you’re starting afresh, then create a new repository in the current location by executing git init on the command line. If a remote repository already exists on a server, then you can copy it to your machine with git clone. You can learn more about git clone on the GitRef website [7].
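As a minimal sketch of both commands, the script below creates a repository from scratch and then clones it. The directory names (project, project-copy), the file name and the author identity are made up for illustration, and a local directory stands in for the server URL you would normally pass to git clone.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch location so the example does not touch real projects.
workdir=$(mktemp -d)
cd "$workdir"

# Create a brand-new repository inside a fresh directory.
mkdir project
cd project
git init -q

# A placeholder identity, stored only in this repository's configuration.
git config user.name "Example User"
git config user.email "user@example.com"

# Record one commit so that there is something to clone.
echo "Hello, Git" > README
git add README
git commit -q -m "Initial commit"

# Clone the repository. Here the source is the local directory created
# above; in practice it would be the URL of a repository on a server.
cd "$workdir"
git clone -q project project-copy

# The clone contains the committed file and its own .git directory.
ls -a project-copy
```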
After working for some time, you’ll have a couple of new, deleted or modified files that you’ll want to save to your local repository as commits. As a first step, we’ll use git status to show us which changes we currently have in our working directory.
To add certain changes to the next commit, you have to explicitly add them to Git’s “staging area.” The command git add is used for this purpose. Let’s look at a concrete scenario:
In the output of git status, the paragraph that begins “Changes to be committed” lists all of the files that will be included in the next commit. The changes had to be added to the staging area via git add.
The paragraph that begins “Changes not staged for commit” lists all of the files that have been changed but that have not been added to the staging area. Therefore, they won’t be included in our next commit but will remain simply as changes in our working directory.
The last paragraph lists “Untracked files,” files that aren’t under version control yet. In other words, they are unknown to Git.
You might have spotted a little peculiarity: error.html is listed twice! That’s because we staged some of the changes in this file, while leaving other changes in the same file unstaged. This feature — staging individual files or even parts of a file — enables us to create extremely granular commits that really only contain related changes.
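The situation described above can be reproduced with a short script. This is only a sketch: error.html comes from the scenario, while index.html, new.css, the file contents and the author identity are invented for the example. The second edit to error.html is made after staging, which leaves the file with both staged and unstaged changes.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Baseline commit with two tracked files.
echo "<h1>Home</h1>" > index.html
echo "<h1>Error</h1>" > error.html
git add index.html error.html
git commit -q -m "Initial commit"

# Change error.html and add that change to the staging area...
echo "<p>Something went wrong.</p>" >> error.html
git add error.html

# ...then change the file again without staging the second change.
echo "<p>Please try again.</p>" >> error.html

# Modify another tracked file without staging it,
# and create a file that Git does not know about yet.
echo "<p>Welcome</p>" >> index.html
echo "body { margin: 0; }" > new.css

# error.html is now listed under "Changes to be committed" and under
# "Changes not staged for commit"; new.css is listed as untracked.
git status
```

To stage only some of the changes inside a single file interactively, git add -p walks through the file chunk by chunk and asks which ones to stage.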
After having composed the commit exactly as we want it to be, we can save it to our local repository via git commit.
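A small sketch of this step follows. The file names, the commit message and the author identity are assumptions made for the example. Only the staged file ends up in the commit; the other file stays behind as a change in the working directory.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Two new files, but only one of them is staged.
echo "<h1>Error</h1>" > error.html
echo "remember to add styles" > notes.txt
git add error.html

# Save the staged changes to the local repository.
# The -m option supplies the commit message on the command line.
git commit -q -m "Add error page"

# notes.txt was not staged, so it is still reported as untracked.
git status --short
```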
Once a repository contains a couple of commits, we should look into its history (or “log”). The git log command gives us an overview of the last few commits. Git log’s default output shows each commit with its SHA-1 hash (which is the equivalent of a revision number in a classic centralized VCS, or CVCS), its author, the date and the message used.
Customizing git log
You can bend the presentation of git log almost completely to your will. Using the parameter --oneline, for example, reduces the output to a single line per commit. Using -p adds the “diff” (or patch) to every commit, thereby showing you what detailed changes were introduced. Customization can be taken to the point where you define your very own output format with the --pretty option.
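The following sketch runs the variants mentioned above against a throwaway repository with two commits. The repository contents and the author identity are invented; the format string is just one possible layout, built from the placeholders %h (abbreviated hash), %an (author name), %ad (author date) and %s (subject line).

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository with two commits to look at.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "Add home page"

echo "<h1>Error</h1>" > error.html
git add error.html
git commit -q -m "Add error page"

# --no-pager keeps the output in the terminal instead of opening a pager.

# One line per commit: abbreviated hash and subject.
git --no-pager log --oneline

# Full log entries, each followed by the patch it introduced.
git --no-pager log -p

# A custom format: hash, author, date (YYYY-MM-DD) and subject.
git --no-pager log --date=short --pretty=format:"%h | %an | %ad | %s"
echo
```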
Branching And Merging
Branching is often referred to as Git’s killer feature, and rightly so. If you already know the term from another VCS like Subversion, then take an unbiased, fresh look at the topic. Using branches in Git is extremely easy and fast because they have been a core feature since the very beginning. They’re certainly one of the most important tools in a developer’s daily work. But what are they actually used for?
Branches allow you to cleanly separate variants or features in your code base. “Separation of concerns” is a good term for this.
When to Use Branching
Take the following scenario. At a certain point, you decide to develop a new feature. To separate it from your other development work, you create a new branch, based on the current status. The git branch command creates the new branch, while git checkout makes it your current working branch.
After having implemented the first part of the new feature, you commit the current status to this branch (commit C2). But you haven’t yet finished developing the new feature when an urgent bug report forces you to continue working on your code’s main line (your production or “live” code).
After having switched back (via git checkout) to your master branch, you commit the fix for the problem (C3). Again, you switch to your feature branch and commit some more changes (C4) to complete the development for this feature. Finally, you use git merge to unite the two branches again (C5).
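The whole scenario, from C1 to C5, can be replayed in a scratch repository. Treat this as a sketch under a few assumptions: the branch name new-feature, the file names and the commit messages are made up, the merge is performed into master, and the first branch is explicitly named master so that the script behaves the same regardless of the local default.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Make sure the initial branch is called master, whatever the local default is.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# C1: the starting point on master.
echo "version 1" > app.txt
git add app.txt
git commit -q -m "C1: initial state"

# Create the feature branch and make it the current working branch.
git branch new-feature
git checkout -q new-feature

# C2: first part of the feature.
echo "feature, part 1" > feature.txt
git add feature.txt
git commit -q -m "C2: start new feature"

# C3: the urgent bug fix goes onto master, untouched by the feature work.
git checkout -q master
echo "version 1 with bug fix" > app.txt
git add app.txt
git commit -q -m "C3: fix urgent bug"

# C4: back on the feature branch to finish the feature.
git checkout -q new-feature
echo "feature, part 2" >> feature.txt
git add feature.txt
git commit -q -m "C4: finish new feature"

# C5: unite the two lines of development with a merge commit on master.
git checkout -q master
git merge -q -m "C5: merge new-feature into master" new-feature

# Show the resulting history as a graph.
git --no-pager log --oneline --graph
```

Because the bug fix and the feature touched different files, the merge completes without conflicts.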
Imagine this scenario without branches. Because you already had changes that belonged to an unfinished, not yet releasable feature, you would have been mixing two different concerns (the experimental feature and the bug fix in your production code).
With only one feature, the problem might not have been major. But in larger projects, with numerous parallel features and development stages, such processes quickly become confusing and error-prone without branches.
Of course, Git is not the only VCS that uses the concept of branches, but it does make branching easier, faster and more effortless than any other system. As soon as you discover branching in Git, switching your active branch a dozen times a day or creating a new branch for every bug fix, however small, becomes totally run of the mill. Greater clarity and a clear separation of code are worthwhile in the long run.
Terms Related to Branching
The currently active branch is called “HEAD” in Git (more precisely, HEAD is a pointer to the branch you currently have checked out). Switching the HEAD is done with the git checkout command (not related to the checkout command in SVN). Your working directory will then contain the files that belong to this branch (or, more precisely, that belong to the last commit in this branch).
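To see where HEAD currently points, you can ask Git directly. The sketch below uses a throwaway repository; the branch name new-feature and the author identity are assumptions, and the first branch is explicitly named master to keep the output predictable.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Make sure the initial branch is called master, whatever the local default is.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# HEAD points to the branch that is currently checked out.
git symbolic-ref --short HEAD    # prints "master"

# Create a branch and switch to it in one step; HEAD moves along.
git checkout -q -b new-feature
git symbolic-ref --short HEAD    # prints "new-feature"

# Under the hood, HEAD is a small file that names the current branch.
cat .git/HEAD                    # prints "ref: refs/heads/new-feature"
```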
Sharing Code: Working With Remote Repositories
Remote repositories are used to make your code available to teammates and to integrate code from other developers into your local repository. With git remote add, you can connect any number of remotes to your local repository.
To publish local commits from your current branch to a remote repository, use the git push command.
Integrating remote changes into your local repository, on the other hand, is done via git fetch. This downloads all of the changes from the remote server to your local computer, but it does not modify the files in your working directory in any way!
You can decide for yourself when to integrate the downloaded changes into your current HEAD via git merge. Alternatively, you can use git pull to combine downloading and merging into one step.
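The round trip between two developers can be sketched on a single machine. Everything named here is hypothetical: a bare repository called server.git plays the role of the server, and alice and bob are two local working copies. The remote name origin and the branch name master are assumptions as well.

```shell
#!/bin/sh
set -eu

base=$(mktemp -d)
cd "$base"

# A bare repository stands in for the remote server.
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# Alice starts the project locally and connects it to the remote.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git remote add origin "$base/server.git"
git push -q origin master

# Bob clones the remote, commits a change and publishes it.
cd "$base"
git clone -q server.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "from bob" >> README
git add README
git commit -q -m "Extend README"
git push -q origin master

# Alice downloads Bob's commit. Her working directory is not modified yet.
cd "$base/alice"
git fetch -q origin
cat README

# She integrates the downloaded changes when she is ready.
git merge -q origin/master
cat README

# Bob publishes another commit...
cd "$base/bob"
echo "more from bob" >> README
git add README
git commit -q -m "Extend README again"
git push -q origin master

# ...and this time Alice downloads and merges in one step.
cd "$base/alice"
git pull -q --ff-only origin master
cat README
```

The --ff-only option is a cautious choice for this sketch: it makes git pull stop instead of creating a merge commit if the two histories have diverged.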
In the third and final part of this series, we’ll look at some more advanced concepts of Git, and highlight some useful tips and tricks.
- 1 http://coding.smashingmagazine.com/2011/07/26/modern-version-control-with-git-series/
- 2 http://git-scm.com/
- 3 http://www.git-tower.com
- 4 http://code.google.com/p/tortoisegit/
- 5 http://progit.org/book/ch1-4.html
- 6 http://gitref.org
- 7 http://gitref.org/creating/#clone
- 8 http://coding.smashingmagazine.com/wp-content/uploads/2011/06/git2_3_git-log-format.gif