This document tells you how to follow Linux kernel development (and examine its history) with git. It does not assume you've ever used a source control system before, nor does it assume that you're familiar with "distributed" vs "centralized" source control systems.
This document describes a read-only approach, suitable for trying out recent versions quickly, using "git bisect" to track down bugs, and applying patches temporarily to see if they work for you. If you want to learn how to save changes into your copy of the git history and submit them back to the kernel developers through git, you'll need a much larger tutorial that explains concepts like "branches". This one shouldn't get in the way of doing that sort of thing, but it doesn't go there.
First, install a recent version of git. (Note that the user interface changed drastically in git-1.5.0, and this page only describes the new interface.)
If your distro doesn't have a recent enough version, you can grab a source tarball and build it yourself. (There's no ./configure; as root, go "make install prefix=/usr". It needs zlib, libssl, libcurl, and libexpat.)
When building from source, the easy way to get the man pages is to download the appropriate git-manpages tarball (at the same URL as the source code) and extract it into /usr/share/man. You want the man pages because "git help" displays them.
The following command will download the current linux-kernel repository into a local directory called "linux-git":
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-git
This downloads a local copy of the entire revision history (back to 2.6.12-rc2), which takes a couple hundred megabytes. It extracts the most recent version of all the files into your linux-git directory, but that's just a snapshot (generally referred to by git people as your "working copy"). The history is actually stored in the subdirectory "linux-git/.git", and the snapshot can be recreated from that (or changed to match any historical version) via various git commands explained below.
You start with an up-to-the-minute copy of the linux kernel source, which you can use just like an extracted tarball (ignoring the extra files in the ".git" directory). If you're interested in history from the bitkeeper days (before 2.6.12-rc2), that's stored in a separate repository, "git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git". (Here is a list of all git repositories hosted on kernel.org.)
(If you forget the URL a git repository came from, it's in the file ".git/FETCH_HEAD". Normally you shouldn't need to care, since git remembers it.)
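As a rough sketch (not from the original page), here is another way to ask git which URL it remembered: the clone records it in the repository's configuration as "remote.origin.url". The two throwaway directories "upstream" and "copy" are made up for the illustration, with a local directory standing in for the server.

```shell
set -e
# Hypothetical scratch area; "upstream" plays the role of the remote server.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/upstream"
(cd "$work/upstream" && echo hello > README && git add README && git commit -q -m "first commit")

# Clone it, then ask the clone where it came from.
git clone -q "$work/upstream" "$work/copy"
cd "$work/copy"
url=$(git config --get remote.origin.url)
echo "cloned from: $url"
```

In a real kernel checkout the same "git config --get remote.origin.url" command should print the git:// URL you cloned from.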
The command "git pull" downloads all the changes committed to Linus's git repository since the last time you updated your copy, and appends those commits to your copy of the repository (in the .git subdirectory). In addition, this will automatically update the files in your working copy as appropriate. (If your working copy was set to a historical version, it won't be changed, but returning your working copy to the present after a pull will get you the newest version.)
Note that this copies the revision history into your local .git directory. Other git commands (log, checkout, tag, blame, etc.) don't need to talk to the server, so you can work on your laptop without an internet connection (or with a very slow one) and still have access to the complete revision history you've already downloaded.
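To see the pull behavior without touching the network, here is a small sketch that uses a local directory as a stand-in for Linus's repository. The directory names and file contents are invented for the example.

```shell
set -e
# Hypothetical scratch area; "upstream" stands in for the server.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/upstream"
(cd "$work/upstream" && echo hello > README && git add README && git commit -q -m "first commit")
git clone -q "$work/upstream" "$work/copy"

# Somebody commits a new change upstream after we cloned.
(cd "$work/upstream" && echo news > NEWS && git add NEWS && git commit -q -m "second commit")

cd "$work/copy"
before=$(git rev-list HEAD | wc -l | tr -d ' ')
git pull -q                      # fetch the new commit and update the working copy
after=$(git rev-list HEAD | wc -l | tr -d ' ')
echo "commits before pull: $before, after pull: $after"

# From here on, browsing history needs no connection to upstream.
git log --pretty=oneline
```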
The git log command lists the changes recorded in your repository, starting with the most recent and working back. The big hexadecimal numbers are unique identifiers (sha1sum) for each commit. If you want to specify a commit, you only need the first few digits, enough to form a unique prefix. (Six digits should be plenty.)
You can limit the log to a specific file or directory, which lists only the commits changing that file/directory. Just add the file(s) you're interested in to the end of the git log command line.
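A minimal sketch of both ideas, using a tiny made-up repository rather than the kernel tree: limiting the log to one file, and referring to a commit by an abbreviated identifier. The file names are hypothetical.

```shell
set -e
# Hypothetical throwaway repository.
repo=$(mktemp -d); cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
echo one > a.txt;    git add a.txt; git commit -q -m "add a.txt"
echo two > b.txt;    git add b.txt; git commit -q -m "add b.txt"
echo three >> a.txt; git commit -q -a -m "change a.txt"

# Every commit, newest first, one per line.
git log --pretty=oneline

# Only the commits that touched a.txt.
touching_a=$(git log --pretty=oneline -- a.txt | wc -l | tr -d ' ')
echo "commits touching a.txt: $touching_a"

# A unique prefix of the identifier names the same commit as the full one.
full=$(git rev-parse HEAD)
short=$(git rev-parse --short HEAD)
resolved=$(git rev-parse "$short")
echo "$short expands to $resolved"
```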
The git tag -l command shows all the tagged releases. These human-readable names can be used as synonyms for the appropriate commit identifier, which is useful when doing things like checkout and diff. The special tag "master" points to the most recent commit.
The git blame $FILE command displays all the changes that resulted in the current state of a file. It shows each line, prefixed with the commit identifier which last changed that line. (If the default version of git blame is difficult to read on an 80 character terminal, try git blame $FILE | sed 's/(.*)//' to see more of the file itself.)
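Here is a small sketch of what blame output looks like, on an invented two-line file where each line was last touched by a different commit. The file name and contents are made up for the example.

```shell
set -e
# Hypothetical throwaway repository.
repo=$(mktemp -d); cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
printf 'first line\nsecond line\n' > notes.txt
git add notes.txt; git commit -q -m "add notes"
printf 'first line\nsecond line, edited\n' > notes.txt
git commit -q -a -m "edit second line"

# Full output: commit id, author, date and line number in front of each line.
git blame notes.txt

# Narrower output: strip the parenthesized author/date column.
git blame notes.txt | sed 's/(.*)//'

# Each of the two lines is attributed to a different commit.
distinct=$(git blame notes.txt | cut -d' ' -f1 | sort -u | wc -l | tr -d ' ')
echo "distinct commits blamed: $distinct"
```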
The git checkout command changes your working copy of the source to a specific version. The -f option to checkout backs out any local changes you've made to the files. The git clean command deletes any extra files in your working directory (ones which aren't checked into the repository). The -d option to clean deletes untracked directories as well as files.
So to reset your working copy of the source to a historical version, go git checkout -f $VERSION; git clean -d where $VERSION is the tag or sha1sum identifier of the version you want. If you don't specify a $VERSION, git will default to "master" which is the most recent commit in Linus's tree (what mercurial calls "tip" and Subversion calls HEAD), returning you to the present and removing any uncommitted local changes.
Another way to undo all changes to your copy is to do "rm -rf *" in the linux-git directory (which doesn't delete hidden files like ".git"), followed by "git checkout -f" to grab fresh copies from the repository in the .git subdirectory. This generally isn't necessary. Most of the time, git checkout -f is sufficient to reset your working copy to the most recent version in the repository.
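The reset sequence can be tried out on a scratch repository, sketched below with invented file names and a made-up "v1" tag. One assumption to flag: recent git releases refuse to run "git clean" unless you also pass -f (or turn off the clean.requireForce setting), so the sketch adds it.

```shell
set -e
# Hypothetical throwaway repository with two versions of one file.
repo=$(mktemp -d); cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called "master"
echo v1 > file.txt; git add file.txt; git commit -q -m "version 1"; git tag v1
echo v2 > file.txt; git commit -q -a -m "version 2"

# Make a mess: a local edit plus an untracked directory.
echo scribble > file.txt
mkdir junk; echo tmp > junk/build.o

# Back out local edits and jump to the historical version...
git checkout -q -f v1
# ...then delete untracked files and directories.
git clean -q -d -f
at_v1=$(cat file.txt)

# Return to the present.
git checkout -q -f master
at_master=$(cat file.txt)
echo "at v1: $at_v1, at master: $at_master"
```

If you lose track of where you are afterwards, "git log -1" shows the commit currently checked out and "git log -1 master" shows the newest one.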
If you lose track of which version is currently checked out as your working copy, use git log to see the most recent commits to the version you're looking at, and git log master to compare against the most recent commits in the repository.
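The command "git diff" shows differences between git versions. You can ask it to show differences between two versions ("git diff v2.6.20 v2.6.21"), between one version and your working directory ("git diff v2.6.20"), or between the version you checked out and your working directory (plain "git diff"). In each case you can add filenames to the end to limit the output to those files ("git diff v2.6.20 v2.6.21 README init/main.c").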
What git is doing is checking each argument to see if it recognizes it as a historical version sha1sum or tag, and if it isn't it checks to see if it's a file. If this is likely to cause confusion, you can use the magic argument "--" to indicate that all the arguments before that are versions and all the arguments after that are filenames.
The argument --find-copies-harder tells git diff to detect renamed or copied files. Notice that git diff has a special syntax to indicate renamed or copied files, which is much more concise and portable than the traditional behavior of removing all lines from one file and adding them to another. (This behavior may become the default in a future version.)
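The sketch below walks through those diff forms on a scratch repository, and shows a case where "--" matters: a file that happens to have the same name as a tag. The tags "v1" and "v2" and the file names are invented for the example.

```shell
set -e
# Hypothetical throwaway repository; note the file deliberately named "v1".
repo=$(mktemp -d); cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
echo one > README; git add README; git commit -q -m "first"; git tag v1
echo two >> README; echo data > v1; git add README v1; git commit -q -m "second"; git tag v2
echo three >> README                      # an uncommitted local change

# Two versions, limited to one file.
changed=$(git diff --name-only v1 v2 -- README)

# One version: compare it against the working directory.
git diff v1 --

# No versions: show uncommitted changes only.
local_changes=$(git diff --name-only)

# "v1" is both a tag and a file, so git refuses to guess which one is meant.
if git diff v1 >/dev/null 2>&1; then ambiguous=no; else ambiguous=yes; fi

# With "--" the meaning is explicit: this one treats v1 as a filename.
git diff -- v1
echo "changed: $changed, local: $local_changes, ambiguous: $ambiguous"
```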
The git archive $VERSION command creates a tarball (written to stdout) of the given version. Note that "master" isn't the default here; you have to specify it if you want the most up-to-date version. You can pipe it through bzip2 and write it to a file (git archive master | bzip2 > master.tar.bz2) or you can use git archive to grab a clean copy out of your local git repository and extract it into another directory, ala:
mkdir $COPY
git archive master | tar xCv $COPY
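Here is the same export trick as a self-contained sketch on a scratch repository, mainly to show that the extracted copy contains the files but no ".git" directory. The file names are invented for the example.

```shell
set -e
# Hypothetical throwaway repository.
repo=$(mktemp -d); cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called "master"
echo hello > README
mkdir src; echo 'int main(void) { return 0; }' > src/main.c
git add .; git commit -q -m "initial import"

# Export a clean snapshot of master into a separate directory.
copy=$(mktemp -d)
git archive master | tar -x -C "$copy"

# The copy has the tracked files, but none of the git history.
ls -A "$copy"
```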
You can also use the standard Linux kernel out-of-tree building infrastructure on the git working directory, ala:
cd $GITDIR
make allnoconfig O=$OTHERDIR
cd $OTHERDIR
make menuconfig
make
Finally, you can build in your git directory, and then clean it up afterwards with git checkout -f; git clean -d. (Better than "make distclean".)
Possibly the most useful thing git does for non-kernel developers is git bisect, which can track down a bug to a specific revision. This is a multi-step process which binary searches through the revision history to find a specific commit responsible for a testable change in behavior.
(You don't need to know this, but bisect turns out to be nontrivial to implement in a distributed source control system, because the revision history isn't linear. When the history branches and comes back together again, binary searching through it requires remembering more than just a single starting and ending point. That's why bisect works the way it does.)
The git bisect commands used here are start, good, bad, log, replay, and reset.
To track down the commit that introduced a bug via git bisect, start with git bisect reset master (just to be safe), then git bisect start. Next identify the last version known to work (ala git bisect good v2.6.20), and identify the first bad version you're aware of (if it's still broken, use "master".)
After you identify one good and one bad version, git will grind for a bit and reset the working directory state to some version in between, displaying the version identifier it selected. Test this version (build and run your test), then identify it as good or bad with the appropriate git bisect command. (Just "git bisect good" or "git bisect bad"; there's no need to identify the version here because it's the current version.) After each such identification, git will grind for a bit and find another version to test, resetting the working directory state to the new version until it narrows it down to one specific commit.
The biggest problem with git bisect is hitting a revision that doesn't compile properly. When the build breaks, you can't determine whether or not the current version is good or bad. This is where git bisect log comes into play.
When in doubt, save the git bisect log output to a file (git bisect log > ../bisect.log). Then make a guess whether the commit you can't build would have shown the problem if you could build it. If you guess wrong (hint: every revision bisect wants to test after that comes out the opposite of your guess, all the way to the end) do a git bisect replay ../bisect.log to restart from your saved position, and guess the other way. If you realize after the fact you need to back up, the bisect log is an easily editable text file you can always chop a few lines off the end of.
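To get a feel for the save, guess, and replay dance before trying it on a real tree, here is a sketch on a scratch repository with eight made-up commits. The "bug" is invented: a file called "value" holds the commit number, and any version with 6 or more counts as broken. The "known-good" tag is likewise just for the example.

```shell
set -e
# Hypothetical throwaway repository: commits 1..8, "broken" from commit 6 on.
repo=$(mktemp -d); cd "$repo"
log=$(mktemp)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called "master"
i=1
while [ "$i" -le 8 ]; do
  echo "$i" > value; git add value; git commit -q -m "commit $i"
  if [ "$i" -eq 1 ]; then git tag known-good; fi
  i=$((i + 1))
done

git bisect start >/dev/null 2>&1
git bisect good known-good >/dev/null 2>&1
git bisect bad master >/dev/null 2>&1

# Save our position, make a blind guess, then rewind to the saved position.
before=$(git rev-parse HEAD)
git bisect log > "$log"
git bisect bad >/dev/null 2>&1            # pretend this version wouldn't build
git bisect replay "$log" >/dev/null 2>&1
after=$(git rev-parse HEAD)

# Now answer honestly until git names the first bad commit.
out=""
while ! echo "$out" | grep -q "first bad commit"; do
  if [ "$(cat value)" -ge 6 ]; then verdict=bad; else verdict=good; fi
  out=$(git bisect "$verdict" 2>&1)
done
culprit=$(git log -1 --pretty=format:%s refs/bisect/bad)
echo "first bad commit: $culprit"
git bisect reset >/dev/null 2>&1
```

The replay puts you back on exactly the version you were testing when you saved the log, which is what makes guessing cheap.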
Here is a real git bisect run I did on the qemu git repository (git://git.kernel.dk/data/git/qemu) to figure out why the PowerPC Linux kernel I'd built was hanging during IDE controller initialization under the current development version of qemu-system-ppc (but not under older versions).
$ git bisect reset master
Already on branch "master"
$ git bisect good release_0_8_1
You need to start by "git bisect start"
Do you want me to do it for you [Y/n]? y
$ git bisect bad master
Bisecting: 753 revisions left to test after this
[7c8ad370662b706b4f46497f532016cc7a49b83e] Embedded PowerPC Device Control Registers infrastructure.
$ ./configure && make -j 2 && ~/mytest
...
Unhappy :(
$ git bisect bad # The test failed
Bisecting: 376 revisions left to test after this
[255d4f6dd496d2d529bce38a85cc02199833f080] Simplify error handling again.
$ ./configure && make -j 2 && ~/mytest
WARNING: "gcc" looks like gcc 4.x
Looking for gcc 3.x
./configure: 357: Syntax error: Bad fd number
$ git bisect log > ../bisect.log # Darn it, build break. Save state and...
$ git bisect good # Wild guess because I couldn't run the test.
Bisecting: 188 revisions left to test after this
[16bcc6b31799ca01cd389db7cb90a345e9b68dd9] Fix wrong interrupt number for the second serial interface.
$ ./configure && make -j 2 && ~/mytest
...
Happy :)
$ git bisect good # Hey, maybe my guess was right
Bisecting: 94 revisions left to test after this
[37781cc88f69e45624c1cb15321ddd2055cf74b6] Fix usb hid and mass-storage protocol revision, by Juergen Keil.
$ ./configure && make -j 2 && ~/mytest
...
Happy :)
$ git bisect good
Bisecting: 47 revisions left to test after this
[30347b54b7212eba09db05317217dbc65a149e25] Documentation update
$ ./configure && make -j 2 && ~/mytest
...
Happy :)
$ git bisect good
Bisecting: 23 revisions left to test after this
[06a21b23c22ac18d04c9f676b9b70bb6ef72d7f1] Set proper BadVAddress value for unaligned instruction fetch.
$ ./configure && make -j 2 && ~/mytest
...
Happy :)
$ git bisect good
Bisecting: 11 revisions left to test after this
[da77e9d7918cabed5b0725f87496a1dc28da8b8c] Fix exception handling cornercase for rdhwr.
$ ./configure && make -j 2 && ~/mytest
...
Happy :)
$ git bisect good
Bisecting: 5 revisions left to test after this
[36f447f730f61ac413c5b1c4a512781f5dea0c94] Implement embedded IRQ controller for PowerPC 6xx/740 & 750.
$ ./configure && make -j 2 && ~/mytest
...
Unhappy :(
$ git bisect bad # Oh good, I was getting worried I'd guessed wrong above...
Bisecting: 2 revisions left to test after this
[d4838c6aa7442fae62b08afbf4c358200f10ec74] Proper handling of reserved bits in the context register.
$ ./configure && make -j 2 && ~/mytest
...
Happy :)
Bisecting: 1 revisions left to test after this
[a8b64e6f4c7f3c4850be5fd303bf590564264294] Fix monitor disasm output for Sparc64 target
$ ./configure && make -j 2 && ~/mytest
...
Happy :)
$ git bisect good
36f447f730f61ac413c5b1c4a512781f5dea0c94 is first bad commit
commit 36f447f730f61ac413c5b1c4a512781f5dea0c94
Author: j_mayer
Date: Mon Apr 9 22:45:36 2007 +0000
Implement embedded IRQ controller for PowerPC 6xx/740 & 750.
Fix PowerPC external interrupt input handling and lowering.
Fix OpenPIC output pins management.
Fix multiples bugs in OpenPIC IRQ management.
Fix OpenPIC CPU(s) reset function.
Fix Mac99 machine to properly route OpenPIC outputs to the PowerPC input pins.
Fix PREP machine to properly route i8259 output to the PowerPC external interrupt pin.
:100644 100644 0eabacd6434b8e40876581605c619513bf9ac512 284cb92ae83a2a36e05137d3532106ff85167364 M cpu-exec.c
:040000 040000 68740f5b1330c7859abfea3ce31062cb92adaa7f 5c48b0d20f1c4d3115881b5e9e5b6c1d681f4880 M hw
:040000 040000 3ad1f0d09c60d8190d98b28318519ebaaccbb569 69efc274cec1801848de9238ae71e97681978433 M target-ppc
:100644 100644 2f87946e874e8f6cbf9afd47c65e0baff236dc45 b40ff3747530d275181ff071c9cc9cff1d5ba02d M vl.h
$ git bisect reset
git help - List available commands. You can also go git help COMMANDNAME to see help on a specific command. Note that this displays the man page for the appropriate command, so you need to have the git man pages installed for it to work.
git clone git://blah/blah/blah localdir - Download a repository from the web into "localdir". Linus's current repository is at "git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git", the old history from the bitkeeper days (before 2.6.12-rc2) is at "git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git", and there are lots of other trees hosted on kernel.org and elsewhere.
git pull - Freshen up your local copy of the repository, downloading and merging all of Linus's changes since last time you did this. In addition to appending lots of commits to your repository in the .git directory, this also updates the snapshot of the files (if it isn't already pointing into the past).
git log - List the changes recorded in your repository, starting with the most recent and working back. Note: the big hex numbers are unique identifiers (sha1sum) for each commit. If you want to specify a commit, you only need a unique prefix (generally the first six digits are enough).
git tag -l - Show all the tagged releases. These human-readable names can be used as synonyms for the appropriate commit identifier when doing things like checkout and diff. (Note, the special tag "master" points to the most recent commit.)
git checkout -f; git clean -d - Reset your snapshot to the most recent commit. The "checkout" command updates your snapshot to a specific version (defaulting to the tip of the current branch). The -f argument says to back out any local changes you've made to the files, and "clean -d" says to delete any extra files in the snapshot that aren't checked into the repository.
git diff - Show differences between two commits, such as "git diff v2.6.20 v2.6.21". You can also specify specific files you're interested in, ala "git diff v2.6.20 v2.6.21 README init/main.c". If you specify one version it'll compare your working directory against that version, and if you specify no versions it'll compare the version you checked out against your working directory. Anything that isn't recognized as the start of a commit-identifying sha1sum, or a tagged release, is assumed to be a filename. If this causes problems, you can add "--" to the command line to explicitly specify that arguments before that (if any) are version identifiers and all the arguments after that are filenames. Add "--find-copies-harder" to detect renames.
In this Google Tech Talk