view pending/git-quick.html @ 77:27dcbe1b4669

The git quickstart guide I've been working on for ages.
author Rob Landley <rob@landley.net>
date Thu, 18 Oct 2007 18:40:17 -0500

<title>Following Linux kernel development with git</title>

<h2>A "git bisect HOWTO" with a few extras.</h2>

<p>This document tells you how to follow Linux kernel development (and
examine its history) with git.  It does not assume you've ever used a source
control system before, nor does it assume that you're familiar with
"distributed" vs "centralized" source control systems.</p>

<p>This document describes a read-only approach, suitable for trying out
recent versions quickly, using "git bisect" to track down bugs, and
applying patches temporarily to see if they work for you.
If you want to learn how to save changes into your copy of the git history and
submit them back to the kernel developers through git, you'll need
<a href=http://www.kernel.org/pub/software/scm/git/docs/tutorial.html>a much
larger tutorial</a> that explains concepts like "branches".  This one
shouldn't get in the way of doing that sort of thing, but it doesn't go there.</p>

<h2>Installing git</h2>

<p>First, install a recent version of git.  (Note that the user interface
changed drastically in git-1.5.0, and this page only describes the new
interface.)</p>
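<p>If you're not sure whether your git is new enough, here's a rough sketch of
a check.  The <b>version_at_least</b> helper is made up for this example (it
isn't part of git), and it assumes "git --version" prints something like
"git version 1.5.3".</p>

<blockquote>
<pre>
```shell
# Hypothetical helper: succeed if dotted version $1 is at least $2.
version_at_least() {
    have=$1; want=$2
    for i in 1 2 3; do
        # Pick out the i-th dotted field and drop any non-numeric suffix.
        h=$(printf '%s.\n' "$have" | cut -d. -f"$i"); h=${h%%[!0-9]*}
        w=$(printf '%s.\n' "$want" | cut -d. -f"$i"); w=${w%%[!0-9]*}
        h=${h:-0}; w=${w:-0}
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
    done
    return 0
}

# Strip the leading "git version " to get just the number.
installed=$(git --version 2>/dev/null | sed 's/^git version //')
if version_at_least "${installed:-0}" 1.5.0; then
    echo "git $installed is recent enough"
else
    echo "git is missing or older than 1.5.0; upgrade or build from source"
fi
```
</pre>
</blockquote>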

<p>If your distro doesn't have a recent enough version, you can grab a
<a href=http://www.kernel.org/pub/software/scm/git/>source tarball</a> and
build it yourself.  (There's no ./configure step: as root, go
"make install prefix=/usr".  It needs zlib, libssl, libcurl, and libexpat.)</p>

<p>When building from source, the easy way to get the man pages is to download
the appropriate git-manpages tarball (at the same URL as the source code)
and extract it into /usr/share/man.  You want the man pages because "git help"
displays them.</p>

<h2>Downloading the kernel with git</h2>

<p>The following command will download the current linux-kernel repository into
a local directory called "linux-git":</p>

<blockquote>
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-git
</blockquote>

<p>This downloads a local copy of the entire revision history (back to
2.6.12-rc2), which takes a couple hundred megabytes.  It extracts the most
recent version of all the files into your linux-git directory, but that's just
a snapshot (generally referred to by git people as your
"<a href=http://www.kernel.org/pub/software/scm/git/docs/glossary.html#def_working_tree>working copy</a>").
The history is actually stored in the subdirectory "linux-git/.git", and the
snapshot can be recreated from that (or changed to match any historical
version) via various git commands explained below.</p>

<p>You start with an up-to-the-minute copy of the linux kernel source, which
you can use just like an extracted tarball (ignoring the extra files in the
".git" directory).  If you're interested in history from the bitkeeper days
(before 2.6.12-rc2), that's stored in a separate repository,
"<b>git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git</b>".
(<a href=http://git.kernel.org>Here is a list of all git repositories hosted
on kernel.org</a>.)</p>

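<p>(If you forget the URL a git repository came from, it's recorded in the file
".git/config", as the "url" of the remote named "origin".  After a pull it also
shows up in ".git/FETCH_HEAD".  Normally you shouldn't need to care, since git
remembers it.)</p>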
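<p>You can also ask git for the URL instead of digging through files.  A
minimal sketch, using a throwaway repository (so nothing gets downloaded) with
the URL added by hand the way "git clone" would record it under the name
"origin":</p>

<blockquote>
<pre>
```shell
# Scratch repository standing in for linux-git.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# "git clone" stores the source URL under the remote name "origin";
# adding it by hand here imitates that without touching the network.
git remote add origin git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

# Ask git where this repository came from.
url=$(git config --get remote.origin.url)
echo "$url"
```
</pre>
</blockquote>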

<h2>Updating your local copy</h2>

<p>The command "<b>git pull</b>" downloads all the changes committed to Linus's
git repository since the last time you updated your copy, and appends those
commits to your copy of the repository (in the .git subdirectory).  In addition,
this will automatically update the files in your working copy as appropriate.
(If your working copy was set to a historical version, it won't be changed,
but returning your working copy to the present after a pull will get you the
newest version.)</p>

<p>Note that this copies the revision history into your local .git directory.
Other git commands (log, checkout, tag, blame, etc.) don't need to talk to
the server, so you can work on your laptop without an internet connection (or
with a very slow one) and still have access to the complete revision history
you've already downloaded.</p>
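<p>Here's a small offline sketch of that.  It assumes a throwaway "upstream"
repository on local disk standing in for Linus's tree (the directory names,
file name, and commit messages are invented for the example).  After the pull,
<b>git log</b> answers entirely from the local .git directory:</p>

<blockquote>
<pre>
```shell
# Identity for the example commits (any values will do).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

top=$(mktemp -d)

# "upstream" plays the part of the remote tree.
git init -q "$top/upstream"
cd "$top/upstream" || exit 1
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called "master"
echo one > file; git add file; git commit -q -m "first"

# Your local copy, cloned while upstream had one commit.
git clone -q "$top/upstream" "$top/linux-git"

# Upstream moves on...
echo two >> file; git commit -q -a -m "second"

# ...and a pull brings the new commit into the local copy.
cd "$top/linux-git" || exit 1
git pull -q

# From here on nothing talks to "upstream" any more.
commits=$(git log --pretty=oneline | wc -l | tr -d ' ')
newest=$(git log -1 --pretty=format:%s)
echo "$commits commits locally, newest is: $newest"
```
</pre>
</blockquote>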

<h2>Looking at historical versions</h2>

<p>The <b>git log</b> command lists the changes recorded in your repository,
starting with the most recent and working back.  The big hexadecimal numbers
are unique identifiers (sha1sum) for each commit.  If you want to specify a
commit, you only need the first few digits, enough to form a unique prefix.
(Six digits should be plenty.)</p>

<p>You can limit the log to a specific file or directory, which lists
only the commits changing that file/directory.  Just add the file(s)
you're interested in to the end of the <b>git log</b> command line.</p>
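<p>For example, in a scratch repository (the files and commit messages below
are invented, only the directory layout imitates the kernel), limiting the log
to one file looks something like this:</p>

<blockquote>
<pre>
```shell
# Identity for the example commits (any values will do).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Three commits: the first adds both files, then one touches each file.
mkdir init
echo a > init/main.c; echo a > README
git add .; git commit -q -m "initial import"
echo b >> README; git commit -q -a -m "update README"
echo b >> init/main.c; git commit -q -a -m "touch main.c"

# Whole history, one line per commit.
all=$(git log --pretty=oneline | wc -l | tr -d ' ')

# Only the commits that changed init/main.c.
only_main=$(git log --pretty=oneline -- init/main.c | wc -l | tr -d ' ')

echo "all commits: $all, commits touching init/main.c: $only_main"
```
</pre>
</blockquote>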

<p>The <b>git tag -l</b> command shows all the tagged releases.  These
human-readable names can be used as synonyms for the appropriate commit
identifier, which is useful when doing things like checkout and diff.
The special name "<b>master</b>" (technically a branch rather than a tag) points to the most recent commit.</p>

<p>The <b>git blame $FILE</b> command displays all the changes that resulted in
the current state of a file.  It shows each line, prefixed with the commit
identifier which last changed that line.  (If the default version of <b>git
blame</b> is difficult to read on an 80 character terminal, try <b>git blame
$FILE | sed 's/(.*)//'</b> to see more of the file itself.)</p>

<h2>Working with historical versions</h2>

<p>The <b>git checkout</b> command changes your working copy of the source to a
specific version.  The -f option to checkout backs out any local changes
you've made to the files.  The <b>git clean</b> command deletes any extra files
in your working directory (ones which aren't checked into the repository).
The -d option to clean deletes untracked directories as well as files.</p>

<p>So to reset your working copy of the source to a historical version, go
<b>git checkout -f $VERSION; git clean -d</b> where $VERSION is the tag or
sha1sum identifier of the version you want.  (Newer versions of git refuse to
clean unless you also give <b>-f</b>, as in <b>git clean -d -f</b>.)  To return
to the present, use "master" as the $VERSION: that's the most recent commit in
Linus's tree (what mercurial calls "tip" and Subversion calls HEAD).  If you
don't specify a $VERSION, git stays on the version that's currently checked
out and just removes any uncommitted local changes.</p>
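<p>A sketch of the round trip, in a scratch repository with two invented tags
(v1 and v2) standing in for kernel releases:</p>

<blockquote>
<pre>
```shell
# Identity for the example commits (any values will do).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called "master"

echo one > file; git add file; git commit -q -m "first";  git tag v1
echo two > file; git commit -q -a -m "second"; git tag v2

echo "local hack" > file            # uncommitted change to a tracked file
mkdir junk; echo tmp > junk/tmp.o   # untracked directory

git checkout -q -f v1    # back out the local change and switch to v1
git clean -q -d -f       # newer git refuses to clean without -f
at_v1=$(cat file)

git checkout -q -f master   # return to the present
at_master=$(cat file)

echo "at v1: $at_v1, at master: $at_master"
```
</pre>
</blockquote>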

<p>Another way to undo all changes to your copy is to do "rm -rf *" in
the linux-git directory (which doesn't delete hidden files like ".git"),
followed by "git checkout -f" to grab fresh copies from the repository in
the .git subdirectory.  This generally isn't necessary.  Most of the time,
<b>git checkout -f</b> is sufficient to reset your working copy to the most
recent version in the repository.</p>

<p>If you lose track of which version is currently checked out as your working
copy, use <b>git log</b> to see the most recent commits to the version you're
looking at, and <b>git log master</b> to compare against the most recent
commits in the repository.</p>

<h2>Using git diff</h2>

<p>The command "git diff" shows differences between git versions.  You can
ask it to show differences between:</p>
<ul>
<li><b>git diff</b> - the current version checked out from the repository and all files in the working directory</li>
<li><b>git diff v2.6.21</b> - a specific historical version and all files in the working directory</li>
<li><b>git diff v2.6.20 v2.6.21</b> - all files in two different historical
versions</li>
<li><b>git diff init/main.c</b> - specific locally modified files in the
working directory that don't match what was checked out from the repository</li>
<li><b>git diff v2.6.21 init/main.c</b> - specific file(s) in a specific historical version of the repository vs those same files in the working directory.</li>
<li><b>git diff v2.6.20 v2.6.21 init/main.c</b> - specific files in two
different historical versions of the repository</li>
</ul>

<p>What git is doing is checking each argument to see if it recognizes it
as a historical version sha1sum or tag, and if it isn't it checks to see if
it's a file.  If this is likely to cause confusion, you can use the magic
argument "--" to indicate that all the arguments before that are versions
and all the arguments after that are filenames.</p>
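<p>To see why "--" matters, here's a contrived sketch: a scratch repository
where a file and a tag are both called "v1" (both names invented for the
example).  Recent git versions, as far as I know, complain that a bare
"git diff v1" is ambiguous rather than guessing, so the sketch always spells
out which one it means:</p>

<blockquote>
<pre>
```shell
# Identity for the example commits (any values will do).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo one > v1; git add v1; git commit -q -m "first"
git tag v1                              # tag with the same name as the file
echo two > v1; git commit -q -a -m "second"
echo three > v1                         # uncommitted edit

# "v1" before the "--" is a version: tagged release vs working directory.
from_tag=$(git diff v1 -- | grep '^-[a-z]')

# "v1" after the "--" is a file: checked out version vs working directory.
from_file=$(git diff -- v1 | grep '^-[a-z]')

echo "against the tag: $from_tag, against the checkout: $from_file"
```
</pre>
</blockquote>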

<p>The argument <b>--find-copies-harder</b> tells git diff to detect renamed or
copied files.  Notice that git diff has a special syntax to indicate renamed
or copied files, which is much more concise and portable than the traditional
behavior of removing all lines from one file and adding them to another.
(This behavior may become the default in a future version.)</p>
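<p>A sketch of the difference, in a scratch repository (file names and tags
invented for the example).  Newer git releases already detect renames in plain
"git diff", so the sketch asks for the traditional output explicitly with
<b>--no-renames</b>:</p>

<blockquote>
<pre>
```shell
# Identity for the example commits (any values will do).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# An eight line file, committed and then renamed without changes.
printf 'line %s\n' 1 2 3 4 5 6 7 8 > old.c
git add old.c; git commit -q -m "add old.c"; git tag before
git mv old.c new.c; git commit -q -m "rename to new.c"; git tag after

# Traditional output: every line removed from old.c and added to new.c.
plain=$(git diff --no-renames before after | grep -c '^[-+]line')

# With rename detection: no content lines at all, just a rename header.
smart=$(git diff --find-copies-harder before after | grep -c '^[-+]line' || true)

echo "changed lines without detection: $plain, with detection: $smart"
git diff --find-copies-harder before after
```
</pre>
</blockquote>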

<h2>Creating tarballs</h2>

<p>The <b>git archive $VERSION</b> command creates a tarball (written to
stdout) of the given version.  Note that "master" isn't the default here:
you have to specify it if you want the most up-to-date version.
You can pipe it through bzip2 and write it to a file (<b>git archive master |
bzip2 > master.tar.bz2</b>) or you can use git archive to grab a clean copy
out of your local git repository and extract it into another directory, ala:</p>

<blockquote>
<pre>
mkdir $COPY
git archive master | tar xCv $COPY
</pre>
</blockquote>

<p>You can also use the standard Linux kernel out-of-tree building
infrastructure on the git working directory, ala:</p>

<blockquote>
<pre>
cd $GITDIR
make allnoconfig O=$OTHERDIR
cd $OTHERDIR
make menuconfig
make
</pre>
</blockquote>

<p>Finally, you can build in your git directory, and then clean it up
afterwards with <b>git checkout -f; git clean -d</b>.  (Better than
"make distclean".)</p>

<h2>Bisect</h2>

<p>Possibly the most useful thing git does for non-kernel developers is
<b>git bisect</b>, which can track down a bug to a specific revision.  This
is a multi-step process which binary searches through the revision history
to find a specific commit responsible for a testable change in behavior.</p>

<p>(You don't need to know this, but bisect turns out to be nontrivial to
implement in a distributed source control system, because the revision history
isn't linear.  When the history branches and comes back together again, binary
searching through it requires remembering more than just a single starting and
ending point.  That's why bisect works the way it does.)</p>

<p>The git bisect commands are:</p>
<ul>
<li><b>git bisect start</b> - start a new bisect.  This opens a new (empty)
log file tracking all the known good and bad versions.</li>
<li><b>git bisect bad $VERSION</b> - Identify a known broken version.  (Leaving
$VERSION blank indicates the version currently checked out.)</li>
<li><b>git bisect good $VERSION</b> - Identify a version that was known to
work.</li>
<li><b>git bisect log</b> - Show bisection history so far this run.</li>
<li><b>git bisect replay $LOGFILE</b> - Reset to an earlier state using the output of git bisect log.</li>
<li><b>git bisect reset</b> - Finished bisecting, clean up and return to
head.  (If git bisect start says "won't bisect on seeked tree", you forgot
to do this last time and should do it now.)</li>
</ul>

<p>To track down the commit that introduced a bug via git bisect, start with
<b>git bisect reset master</b> (just to be safe), then <b>git bisect start</b>.
Next identify the last version known to work (ala <b>git bisect good
v2.6.20</b>), and identify the first bad version you're aware of (if it's
still broken, use "master".)</p>

<p>After you identify one good and one bad version, git will grind for a bit
and reset the working directory state to some version in between, displaying
the version identifier it selected.  Test this version (build and run your
test), then identify it as good or bad with the appropriate git bisect
command.  (Just "git bisect good" or "git bisect bad", there's no need to
identify the version here because it's the current version.)  After each such
identification, git will grind for a bit and find another version to test,
resetting the working directory state to the new version until it narrows
it down to one specific commit.</p>

<p>The biggest problem with <b>git bisect</b> is hitting a revision that
doesn't compile properly.  When the build breaks, you can't determine
whether or not the current version is good or bad.  This is where
<b>git bisect log</b> comes into play.</p>

<p>When in doubt, save the git bisect log output to a file
(<b>git bisect log > ../bisect.log</b>).  Then make a guess
whether the commit you can't build would have shown the problem if you
could build it.  If you guess wrong (hint: every revision bisect wants
to test after that comes out the opposite of your guess, all the way to the
end) do a <b>git bisect replay ../bisect.log</b> to restart from your
saved position, and guess the other way.  If you realize after the fact you
need to back up, the bisect log is an easily editable text file you can
always chop a few lines off the end of.</p>
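<p>Here's the save/guess/replay dance as a sketch you can run in a scratch
repository.  Everything in it is invented for the example: eight tiny commits,
a "bug" (the word BROKEN showing up in a file) introduced by commit 6, and a
grep standing in for your real build-and-test step.</p>

<blockquote>
<pre>
```shell
# Identity for the example commits (any values will do).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d); log=$(mktemp)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called "master"

# Eight commits; commit 6 introduces the "bug".  The first one is tagged v1.
for i in 1 2 3 4 5 6 7 8; do
    echo "change $i" >> file
    if [ "$i" -eq 6 ]; then echo BROKEN >> file; fi
    git add file
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then git tag v1; fi
done

git bisect start > /dev/null 2>&1
git bisect bad master > /dev/null 2>&1
git bisect good v1 > /dev/null 2>&1         # git checks out a commit in between

git bisect log > "$log"                     # save state before guessing
git bisect good > /dev/null 2>&1            # a guess, as if this revision didn't build
git bisect replay "$log" > /dev/null 2>&1   # back up to the saved state

# Now test for real: grep stands in for "build it and run your test".
for step in 1 2 3 4 5 6 7 8; do
    if grep -q BROKEN file; then
        out=$(git bisect bad)
    else
        out=$(git bisect good)
    fi
    case $out in *"first bad commit"*) break ;; esac
done

# While bisecting, refs/bisect/bad names the newest commit marked bad.
culprit=$(git log -1 --pretty=format:%s refs/bisect/bad)
echo "first bad commit: $culprit"
git bisect reset > /dev/null 2>&1
```
</pre>
</blockquote>

<p>If it works the way I expect, this should report "commit 6" as the
culprit.  In real life the guess and the replay would be separated by several
rounds of testing, which is exactly why saving the log first is worth it.</p>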

<h2>Example git bisect run</h2>

<p>Here is a real git bisect run I did on the <a href=http://qemu.org>qemu</a>
git repository (git://git.kernel.dk/data/git/qemu) to figure out why
the PowerPC Linux kernel I'd built was hanging during IDE controller
initialization under the current development version of qemu-system-ppc
(but not under older versions).</p>

<blockquote>
<pre><b>$ git bisect reset master</b>
Already on branch "master"
<b>$ git bisect good release_0_8_1</b>
You need to start by "git bisect start"
Do you want me to do it for you [Y/n]? y
<b>$ git bisect bad master</b>
Bisecting: 753 revisions left to test after this
[7c8ad370662b706b4f46497f532016cc7a49b83e] Embedded PowerPC Device Control Registers infrastructure.
<b>$ ./configure && make -j 2 && ~/mytest</b>
...
Unhappy :(
<b>$ git bisect bad # The test failed</b>
Bisecting: 376 revisions left to test after this
[255d4f6dd496d2d529bce38a85cc02199833f080] Simplify error handling again.
<b>$ ./configure && make -j 2 && ~/mytest</b>
WARNING: "gcc" looks like gcc 4.x
Looking for gcc 3.x
./configure: 357: Syntax error: Bad fd number
<b>$ git bisect log > ../bisect.log  # Darn it, build break.  Save state and...</b>
<b>$ git bisect good # Wild guess because I couldn't run the test.</b>
Bisecting: 188 revisions left to test after this
[16bcc6b31799ca01cd389db7cb90a345e9b68dd9] Fix wrong interrupt number for the second serial interface.
<b>$ ./configure && make -j 2 && ~/mytest</b>
...
Happy :)
<b>$ git bisect good # Hey, maybe my guess was right</b>
Bisecting: 94 revisions left to test after this
[37781cc88f69e45624c1cb15321ddd2055cf74b6] Fix usb hid and mass-storage protocol revision, by Juergen Keil.
<b>$ ./configure && make -j 2 && ~/mytest</b>
...
Happy :)
<b>$ git bisect good</b>
Bisecting: 47 revisions left to test after this
[30347b54b7212eba09db05317217dbc65a149e25] Documentation update
<b>$ ./configure && make -j 2 && ~/mytest</b>
...
Happy :)
<b>$ git bisect good</b>
Bisecting: 23 revisions left to test after this
[06a21b23c22ac18d04c9f676b9b70bb6ef72d7f1] Set proper BadVAddress value for unaligned instruction fetch.
<b>$ ./configure && make -j 2 && ~/mytest</b>
...
Happy :)
<b>$ git bisect good</b>
Bisecting: 11 revisions left to test after this
[da77e9d7918cabed5b0725f87496a1dc28da8b8c] Fix exception handling cornercase for rdhwr.
<b>$ ./configure && make -j 2 && ~/mytest</b>
...
Happy :)
<b>$ git bisect good</b>
Bisecting: 5 revisions left to test after this
[36f447f730f61ac413c5b1c4a512781f5dea0c94] Implement embedded IRQ controller for PowerPC 6xx/740 & 750.
<b>$ ./configure && make -j 2 && ~/mytest</b>
...
Unhappy :(
<b>$ git bisect bad # Oh good, I was getting worried I'd guessed wrong above...</b>
Bisecting: 2 revisions left to test after this
[d4838c6aa7442fae62b08afbf4c358200f10ec74] Proper handling of reserved bits in the context register.
<b>$ ./configure && make -j 2 && ~/mytest</b>
...
Happy :)
<b>$ git bisect good</b>
Bisecting: 1 revisions left to test after this
[a8b64e6f4c7f3c4850be5fd303bf590564264294] Fix monitor disasm output for Sparc64 target
<b>$ ./configure && make -j 2 && ~/mytest</b>
...
Happy :)
<b>$ git bisect good</b>
36f447f730f61ac413c5b1c4a512781f5dea0c94 is first bad commit
commit 36f447f730f61ac413c5b1c4a512781f5dea0c94
Author: j_mayer <j_mayer>
Date:   Mon Apr 9 22:45:36 2007 +0000

    Implement embedded IRQ controller for PowerPC 6xx/740 & 750.
    Fix PowerPC external interrupt input handling and lowering.
    Fix OpenPIC output pins management.
    Fix multiples bugs in OpenPIC IRQ management.
    Fix OpenPIC CPU(s) reset function.
    Fix Mac99 machine to properly route OpenPIC outputs to the PowerPC input pins.
    Fix PREP machine to properly route i8259 output to the PowerPC external
      interrupt pin.

:100644 100644 0eabacd6434b8e40876581605c619513bf9ac512 284cb92ae83a2a36e05137d3532106ff85167364 M      cpu-exec.c
:040000 040000 68740f5b1330c7859abfea3ce31062cb92adaa7f 5c48b0d20f1c4d3115881b5e9e5b6c1d681f4880 M      hw
:040000 040000 3ad1f0d09c60d8190d98b28318519ebaaccbb569 69efc274cec1801848de9238ae71e97681978433 M      target-ppc
:100644 100644 2f87946e874e8f6cbf9afd47c65e0baff236dc45 b40ff3747530d275181ff071c9cc9cff1d5ba02d M      vl.h
<b>$ git bisect reset</b>
</pre>
</blockquote>


<h2>Command summary</h2>

<p><b>git help</b> - List available commands.  You can also go
<b>git help COMMANDNAME</b> to see help on a specific command.  Note,
this displays the man page for the appropriate command, so you need to have
the git man pages installed for it to work.</p>

<p><b>git clone git://blah/blah/blah localdir</b> - Download a repository
from the web into "localdir".  Linus's current repository is at
"git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git",
the old history from the bitkeeper days (before 2.6.12-rc2) is at
"git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git", and
there are lots of <a href=http://git.kernel.org>other trees hosted on
kernel.org</a> and elsewhere.</p>

<p><b>git pull</b> - Freshen up your local copy of the repository, downloading
and merging all of Linus's changes since last time you did this.  In addition
to appending lots of commits to your repository in the .git directory, this
also updates the snapshot of the files (if it isn't already pointing into
the past).</p>

<p><b>git log</b> - List the changes recorded in your repository, starting with
the most recent and working back.  Note: the big hex numbers are unique
identifiers (sha1sum) for each commit.  If you want to specify a commit, you
only need a unique prefix (generally the first six digits are enough).</p>

<p><b>git tag -l</b> - Show all the tagged releases.  These human-readable
names can be used as synonyms for the appropriate commit identifier when
doing things like checkout and diff.  (Note, the special name "master",
technically a branch rather than a tag, points to the most recent commit.)</p>

<p><b>git checkout -f; git clean -d</b> - reset your snapshot to the most recent
commit.  The "checkout" command updates your snapshot to a specific version
(defaulting to the tip of the current branch).  The -f argument says to back
out any local changes you've made to the files, and "clean -d" says to
delete any extra files in the snapshot that aren't checked into the
repository.</p>

<p><b>git diff</b> - Show differences between two commits, such as
"git diff v2.6.20 v2.6.21".  You can also specify specific files you're
interested in, ala "git diff v2.6.20 v2.6.21 README init/main.c".  If you
specify one version it'll compare your working directory against that version,
and if you specify no versions it'll compare the version you checked out
against your working directory.  Anything that isn't recognized as the start of
a commit-identifying sha1sum, or a tagged release, is assumed to be a filename.
If this causes problems, you can add "--" to the command line to explicitly
specify that arguments before that (if any) are version identifiers and all the
arguments after that are filenames.  Add "--find-copies-harder" to detect
renames.</p>

<h2>Linus Torvalds talks about git</h2>

<p>Linus discusses git in <a href=http://youtube.com/watch?v=4XpnKHJAok8>this Google Tech Talk</a>.</p>

<!--
 "git show @{163}"... one character less...

http://www.kernel.org/pub/software/scm/git/docs/glossary.html#def_working_tree
-->