Changed an example to suggest a Shakespeare quote.

alexgarel · May 5, 2009 · 3bc3fbf
1 parent d63b428
commit 3bc3fbf
Showing 1 changed file with 47 additions and 36 deletions.
diff --git a/secrets.txt b/secrets.txt
@@ -32,7 +32,8 @@ Git heuristically ferrets out renames and copies between successive versions. In
 
 For every tracked file, Git records information such as its size, creation time and last modification time in a file known as the 'index'. To determine whether a file has changed, Git compares its current stats with those held in the index. If they match, then Git can skip reading the file again.
 
-Since stat calls are vastly cheaper than reading file contents, if you only edit a few files, Git can update its state in almost no time.
+Since stat calls are considerably faster than file reads, if you only edit a
+few files, Git can update its state in almost no time.
 
 === Bare Repositories ===
 
@@ -54,23 +55,23 @@ system from scratch in a few hours.
 
 First, a magic trick. Pick a filename, any filename. In an empty directory:
 
- $ echo foo > YOUR_FILENAME
+ $ echo sweet > YOUR_FILENAME
  $ git init
  $ git add .
  $ find .git/objects -type f
 
-You'll see +.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99+.
+You'll see +.git/objects/aa/823728ea7d592acc69b36875a482cdf3fd5c8d+.
 
 How do I know this despite not knowing the filename you chose? It's because the
 SHA1 hash of:
 
- "blob" SP "4" NUL "foo" LF
+ "blob" SP "6" NUL "sweet" LF
 
-is 257cc5642cb1a054f08cc83f2d943e56fd3ebe99,
+is aa823728ea7d592acc69b36875a482cdf3fd5c8d,
 where SP is a space, NUL is a zero byte and LF is a linefeed. You can verify
 this by typing:
 
-  $ echo "blob 4"$'\001'"foo" | tr '\001' '\000' | sha1sum
+  $ echo "blob 6"$'\001'"sweet" | tr '\001' '\000' | sha1sum
 
 This is written with the bash shell in mind; other shells may be able to handle
 NUL on the command line, obviating the need for the *tr* workaround.
@@ -80,18 +81,20 @@ but rather by the hash of the data they contain, in a file we call a 'blob
 object'. We can think of the hash as a unique ID for a file's contents, so
 in a sense we are addressing files by their content.
 
-The initial "blob 4" is a just a header denoting the type of the
+The initial "blob 6" is just a header denoting the type of the
 object and the length of its contents in bytes, to simplify internal
-bookkeeping. This is how I knew what you would see. The filename is irrelevant:
-only the data inside is used to construct the blob object.
+bookkeeping. This is how I knew what you would see. The file's name is
+irrelevant: only the data inside is used to construct the blob object.
 
-Thus for identical files, Git only stores the data once as the same blob. Indeed, try adding copies of your file, with any filenames whatsoever. The contents of +.git/objects+ stay the same no matter how many copies you add.
+You may be wondering: what happens with identical files? Try adding copies of
+your file, with any filenames whatsoever. The contents of +.git/objects+ stay
+the same no matter how many copies you add. Git only stores the data once.
 
 By the way, the files within +.git/objects+ are compressed with zlib so you
 should not stare at them directly. Filter them through
 http://www.zlib.net/zpipe.c[zpipe -d], or type:
 
- $ git cat-file -p 257cc5642cb1a054f08cc83f2d943e56fd3ebe99
+ $ git cat-file -p aa823728ea7d592acc69b36875a482cdf3fd5c8d
 
 which pretty-prints the given object.
 
@@ -103,48 +106,55 @@ Git gets around to the filenames during a commit:
  $ git commit
  $ find .git/objects -type f
 
-You should now see 3 objects. This time I cannot tell you what the 2 new files are, as it partly depends on the filename you picked. We'll proceed assuming you chose "bar". If you didn't, you can rewrite history to make it look like you did:
+You should now see 3 objects. This time I cannot tell you what the 2 new files are, as it partly depends on the filename you picked. We'll proceed assuming you chose "rose". If you didn't, you can rewrite history to make it look like you did:
 
- $ git filter-branch --tree-filter 'mv YOUR_FILENAME bar'
+ $ git filter-branch --tree-filter 'mv YOUR_FILENAME rose'
  $ find .git/objects -type f
 
-Now you should see +.git/objects/ef/bc17e61e746dad5c834bcb94869ba66b6264f9+, because this is the SHA1 hash of:
+Now you should see +.git/objects/05/b217bb859794d08bb9e4f7f04cbda4b207fbe9+,
+because this is the SHA1 hash of:
 
- "tree" SP "31" NUL "100644 bar" NUL 0x257cc5642cb1a054f08cc83f2d943e56fd3ebe99
+ "tree" SP "32" NUL "100644 rose" NUL 0xaa823728ea7d592acc69b36875a482cdf3fd5c8d
 
 Check this file does indeed contain this by typing:
 
- $ echo efbc17e61e746dad5c834bcb94869ba66b6264f9 | git cat-file --batch
+ $ echo 05b217bb859794d08bb9e4f7f04cbda4b207fbe9 | git cat-file --batch
 
 With zpipe, it's easy to verify the hash:
 
- $ zpipe -d < .git/objects/ef/bc17e61e746dad5c834bcb94869ba66b6264f9 | sha1sum
+ $ zpipe -d < .git/objects/05/b217bb859794d08bb9e4f7f04cbda4b207fbe9 | sha1sum
 
 Hash verification is trickier via cat-file because its output contains more
 than the raw uncompressed object file.
 
-This file is a 'tree' object. All filenames are kept in tree objects, where
-they are mapped to SHA1 hashes describing their contents. The string "100644"
-specifies the file type: normal file, executable, or symlink. The hash can be a
-blob object, or another tree object, allowing directory hierarchies to be
-represented.
+This file is a 'tree' object: a list of tuples consisting of a file
+type, a filename, and a hash. In our example, the file type is "100644", which
+means "rose" is a normal file, and the hash is the blob object that contains
+the contents of "rose". Other possible file types are executables, symlinks or
+directories. In the last case, the hash points to a tree object.
 
-If you ran filter-branch, you'll now have old objects you no longer need. Although they will be jettisoned automatically once the grace period expires, we'll
+If you ran filter-branch, you'll have old objects you no longer need. Although
+they will be jettisoned automatically once the grace period expires, we'll
 delete them now to make our toy example easier to follow:
 
  $ rm -r .git/refs/original
  $ git reflog expire --expire=now --all
  $ git prune
 
-For real projects you should typically avoid commands like this as you are destroying backups. If you want a clean repository, it is usually best to make a fresh clone. Also, take care if you directly manipulate +.git+: what if a Git command is running at the same time? For a serious project, ideally delete the original refs with *git update-ref -d*.
+For real projects you should typically avoid commands like this, as you are
+destroying backups. If you want a clean repository, it is usually best to make
+a fresh clone. Also, take care if you directly manipulate +.git+: what if a Git
+command is running at the same time, or a sudden power outage occurs?
+Ideally, refs should be deleted with *git update-ref -d*,
+though usually it's safe to remove +refs/original+ by hand.
 
 ==== Commits ====
 
 We've explained 2 of the 3 objects. The third is a 'commit' object. Its
 contents depend on the commit message as well as the date and time it was
 created. To match what we have here, we'll have to tweak it a little:
 
- $ git commit --amend -m baz  # Change the commit message.
+ $ git commit --amend -m Shakespeare  # Change the commit message.
  $ git filter-branch --env-filter 'export
      GIT_AUTHOR_DATE="Fri 13 Feb 2009 15:31:30 -0800"
      GIT_AUTHOR_NAME="Alice"
@@ -155,15 +165,15 @@ created. To match what we have here, we'll have to tweak it a little:
  $ find .git/objects -type f
 
 You should now see
-+.git/objects/f0/92611fe90e213cd76b35ce165fc00b6e311b4f+ which is the SHA1 hash
-of its contents:
++.git/objects/49/993fe130c4b3bf24857a15d7969c396b7bc187+
+which is the SHA1 hash of its contents:
 
- "commit 160" NUL
- "tree efbc17e61e746dad5c834bcb94869ba66b6264f9" LF
+ "commit 158" NUL
+ "tree 05b217bb859794d08bb9e4f7f04cbda4b207fbe9" LF
  "author Alice <alice@example.com> 1234567890 -0800" LF
  "committer Bob <bob@example.com> 1234567890 -0800" LF
  LF
- "baz" LF
+ "Shakespeare" LF
 
 As before, you can run zpipe or cat-file to see for yourself.
 
@@ -181,11 +191,12 @@ tricks to save time, we now know how Git deftly changes a filesystem into a
 database perfect for version control.
 
 For example, if any file within the object database is corrupted by a disk
-error, then its hash will no longer match. Commits are atomic, that is, a
-commit can never only partially record changes: we cannot compute the hash of a
-commit and store it in the database until we already have stored all relevant
-trees, blobs and parent commits. The object database is immune to
-unexpected interruptions such as power outages.
+error, then its hash will no longer match, alerting us to the problem. By
+hashing hashes of other objects, we maintain integrity at all levels. Commits
+are atomic, that is, a commit can never only partially record changes: we can
+only compute the hash of a commit and store it in the database after we already
+have stored all relevant trees, blobs and parent commits. The object
+database is immune to unexpected interruptions such as power outages.
 
 We defeat even the most devious adversaries. Suppose somebody attempts to
 stealthily modify the contents of a file in an ancient version of a project. To
@@ -194,7 +205,7 @@ corresponding blob object since it's now a different string of bytes. This
 means they'll have to change the hash of any tree object referencing the file,
 and in turn change the hash of all commit objects involving such a tree, in
 addition to the hashes of all the descendants of these commits. This implies the
-hash of the official current head differs to that of the bad repository. By
+hash of the official head differs from that of the bad repository. By
 following the trail of mismatching hashes we can pinpoint the mutilated file,
 as well as the commit where it was first corrupted.
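
The object layouts quoted in this diff can be rebuilt outside Git as a cross-check. What follows is a minimal Python sketch, not Git's own implementation: the helper names `git_object_hash`, `tree_entry` and `commit_body` are hypothetical, and it assumes the layouts are exactly as the text describes them (a type, a space, the body length in bytes, a NUL, then the body). If those assumptions hold, the three printed hashes should match the blob, tree and commit hashes on the added lines above.

```python
import hashlib


def git_object_hash(kind: str, body: bytes) -> str:
    """SHA1 of: kind SP length NUL body (the layout described in the text)."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode: str, name: str, hex_hash: str) -> bytes:
    """One tree entry: mode SP name NUL, then the hash as 20 raw bytes."""
    return f"{mode} {name}".encode("ascii") + b"\x00" + bytes.fromhex(hex_hash)


def commit_body(tree_hash: str, author: str, committer: str, message: str) -> bytes:
    """Commit body for a root commit (no parent lines), as shown in the text."""
    lines = [
        f"tree {tree_hash}",
        f"author {author}",
        f"committer {committer}",
        "",
        message,
    ]
    return ("\n".join(lines) + "\n").encode("ascii")


# The blob depends only on the file's contents, never on its name.
blob_hash = git_object_hash("blob", b"sweet\n")

# The tree maps the name "rose" to that blob.
tree_body = tree_entry("100644", "rose", blob_hash)
tree_hash = git_object_hash("tree", tree_body)

# The commit names the tree, so it indirectly covers the file contents too.
commit = commit_body(
    tree_hash,
    "Alice <alice@example.com> 1234567890 -0800",
    "Bob <bob@example.com> 1234567890 -0800",
    "Shakespeare",
)
commit_hash = git_object_hash("commit", commit)

print("blob  ", blob_hash)
print("tree  ", len(tree_body), tree_hash)
print("commit", len(commit), commit_hash)

# Tampering with the file changes every hash up the chain.
bad_blob = git_object_hash("blob", b"sour\n")
bad_tree = git_object_hash("tree", tree_entry("100644", "rose", bad_blob))
print("tampered tree differs:", bad_tree != tree_hash)
```

The body lengths printed next to the tree and commit hashes are the numbers that appear in the "tree 32" and "commit 158" headers. The final line illustrates the tamper-detection argument: altering a file's contents changes its blob hash, which changes the tree that references it, and so on up to the head.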