Split Git Shortcomings into new chapter, added to it
Ben Lynn committed May 2, 2008
1 parent e17590a commit aa80f6f
Showing 3 changed files with 75 additions and 45 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -2,7 +2,7 @@

target: book book/default.css

TXTFILES=preface.txt intro.txt basic.txt clone.txt branch.txt grandmaster.txt secrets.txt
TXTFILES=preface.txt intro.txt basic.txt clone.txt branch.txt grandmaster.txt secrets.txt drawbacks.txt

book.xml: $(TXTFILES)
( for FILE in $^ ; do cat $$FILE ; echo ; done ) | asciidoc -d book -b docbook - > $@
73 changes: 73 additions & 0 deletions drawbacks.txt
@@ -0,0 +1,73 @@
== Git Shortcomings ==

There are some Git issues I've swept under the carpet until now. Some can be handled easily with scripts, others require reorganizing or redefining the project, and as for Windows annoyances, one will just have to wait. Or better yet, pitch in and help!

=== Microsoft Windows ===

Git on Microsoft Windows can be cumbersome. It works with Cygwin installed, though it is slower. There is also the less invasive http://repo.or.cz/w/git/mingw.git[mingw port], which can be run from the Windows command line.

=== Unrelated Files ===

If your project is very large and contains many unrelated files that are constantly being changed, Git may be disadvantaged more than other systems because single files are not tracked. Git tracks changes to the whole project, which is usually beneficial.

A solution is to break up your project into pieces, each consisting of related files. Use *git submodule* if you still want to keep everything in a single repository.

=== Diffs ===

Some version control systems force you to explicitly tag a file before editing. While this is especially annoying when this tagging involves talking to a central server, it does have two benefits:

1. Diffs are quick because only the tagged files need be examined.

2. When a central server stores the tags, one can discover who else is working on the file.

However, with appropriate scripting, you can achieve the same with Git, though it does require cooperation from the programmer, who should execute particular scripts when editing a file.

=== File History ===

Since Git records project-wide changes, reconstructing the history of a single file requires more work than in version control systems that track individual files. However, the penalty is slight.
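
In practice the extra work is hidden behind a path argument to *git log*. A minimal sketch, in which the function name `file_history` is an assumption made for illustration:

```shell
# file_history FILE: list the commits that touched FILE, one per line.
# --follow asks Git to keep tracing the file across renames.
# The function name is illustrative, not part of Git.
file_history() {
    git log --oneline --follow -- "$1"
}

# Example, run inside a repository:
#   file_history Makefile
```

Git still walks the project-wide history and filters it by path, which is where the slight penalty comes from.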

=== Initial Clone ===

Creating a clone is more expensive than checking out code in other version control systems when there is a lot of history.

The initial payment is worth it in the long run, as most operations will then be fast and offline. However, if the down payment is prohibitive, create a shallow clone with the \--depth option. This is much faster, but the resulting clone has fewer capabilities.
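
As a rough sketch, a shallow clone could be wrapped as follows. The function name, the default depth of 1 and the example URL are assumptions chosen for illustration.

```shell
# shallow_clone URL DIR [DEPTH]: fetch only the most recent DEPTH commits.
# DEPTH defaults to 1 here; that default is an illustrative choice.
shallow_clone() {
    git clone --quiet --depth "${3:-1}" "$1" "$2"
}

# Example (hypothetical URL):
#   shallow_clone git://example.com/project.git project
```

Note that Git ignores \--depth when cloning from a plain local path, so a local experiment needs a file:// URL.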

=== Volatile Projects ===

Git was written to be fast with respect to the size of the changes. Humans make small edits from version to version. A one-liner bugfix here, a new feature there, emended comments, that sort of thing. But if you're mostly keeping files that are radically different in successive revisions, on each commit, your history necessarily grows by the size of your whole project.

There is nothing any version control system can do about this, but standard Git users will suffer more since normally histories are cloned.

The reasons why the changes are so great should be examined. Perhaps file formats should be changed. Minor edits should only cause minor changes to at most a few files.

Or perhaps a database or backup/archival solution is what is actually being sought, not a version control system. For example, version control may not be suitable for managing photos periodically taken from a webcam. Again, version control is meant for keeping track of alterations made by humans.

If the files really must be constantly morphing and they really must be versioned, a possibility is to use Git in a centralized fashion. One can create shallow clones, which check out little or no history of the project. Of course, many Git tools will be unavailable, and fixes must be submitted as patches. This is probably fine as it's unclear why anyone would want the history of wildly unstable files.

Another example is a project depending on firmware, which takes the form of a huge binary file. The history of the firmware is uninteresting to users, and updates compress poorly, so firmware revisions would unnecessarily blow up the size of the repository.

In this case, the source code should be stored in a Git repository, and the binary file should be kept separately. To make life easier, one could distribute a script that uses Git to check out the code, and rsync for the firmware.

=== Global Counter ===

Some centralized version control systems maintain a positive integer that increases when a new commit is accepted. Git refers to changes by their hash, which is better in many circumstances.

But some people like having this integer around. Luckily, it's easy to write scripts so that with every update, the central Git repository increments an integer, perhaps in a tag, and associates it with the hash of the latest commit.

Every clone could maintain such a counter, but this would probably be useless, since only the central repository and its counter matter to everyone.
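
One possible shape for such a script, as a sketch only: it stores the counter in tags named r1, r2, and so on. The `r` prefix and the function name `next_revision` are assumptions; in the central repository the function might be called from a hook such as post-receive.

```shell
# next_revision: tag the current HEAD with the next integer (r1, r2, ...)
# and print the new tag name. The "r<N>" naming scheme is hypothetical.
next_revision() {
    # Find the largest existing counter among tags of the exact form r<N>.
    last=$(git tag --list | sed -n 's/^r\([0-9][0-9]*\)$/\1/p' | sort -n | tail -n 1)
    next=$(( ${last:-0} + 1 ))
    # A lightweight tag ties the integer to the hash of the latest commit.
    git tag "r$next" HEAD
    echo "r$next"
}
```

Afterwards `git rev-parse r2` prints the hash associated with counter value 2. The sketch does not guard against two updates arriving at once, and it tags the same commit again if called twice without a new commit.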

=== Empty Subdirectories ===

Empty subdirectories cannot be tracked. Create dummy files to work around this problem.
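
The dummy-file workaround can be scripted. The sketch below assumes a placeholder named `.gitkeep`, which is only a naming convention and has no special meaning to Git, and it relies on the `-empty` test found in GNU and BSD find.

```shell
# add_placeholders DIR: create an empty .gitkeep file in every empty
# directory under DIR, leaving Git's own .git directory untouched.
# The file name .gitkeep is a convention, not a Git feature.
add_placeholders() {
    find "$1" -name .git -prune -o -type d -empty \
        -exec sh -c 'touch "$1/.gitkeep"' sh {} \;
}

# Example, run at the top of a working directory:
#   add_placeholders .
```

After running it, *git add* will pick up the placeholder files and the directories that contain them.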

The current implementation of Git, rather than its design, is to blame for this drawback. With luck, once Git gains more traction, more users will clamour for this feature and it will be implemented.

=== Initial Commit ===

A stereotypical computer scientist counts from 0, rather than 1. Unfortunately, Git does not, and many commands are unfriendly before the initial commit. Aside from decreased usability, some corner cases must be handled specially, such as rebasing a branch with a different initial commit.

Git would benefit from defining the zero commit: as soon as a repository is constructed, HEAD would be set to the string consisting of 20 zero bytes. This special commit represents an empty tree, with no parent, at a time that predates all other Git repositories.

Then running git log, for example, would inform the user that no commits have been made yet, instead of exiting with a fatal error. Similarly for other tools.

Every initial commit is implicitly a descendant of this zero commit, so for example, rebasing an unrelated branch would cause the whole branch to be grafted on to the target. Currently, all but the initial commit are applied, resulting in a merge conflict.
45 changes: 1 addition & 44 deletions secrets.txt
@@ -38,54 +38,11 @@ I've ignored details such as file permissions and signatures. See the link:.[ref

How does Git know you renamed a file, even though you never mentioned the fact explicitly? Sure, you may have run *git mv*, but that is exactly the same as a *git rm* followed by a *git add*.

Git heuristically ferrets out renames and copies between successive versions. In fact, it can detect chunks of code being moved or copied around between files! Though it cannot cover all cases, it does a decent job, and this feature is always improving. If it does not seem to be working for you, consider upgrading.
Git heuristically ferrets out renames and copies between successive versions. In fact, it can detect chunks of code being moved or copied around between files! Though it cannot cover all cases, it does a decent job, and this feature is always improving. If it fails to work for you, try options enabling more expensive copy detection, and consider upgrading.

=== Bare Repositories ===

You may have been wondering what format those online Git repositories use.
They're plain Git repositories, just like your ".git" directory except they've got names like "proj.git", and they have no working directory associated with them.

Most Git commands expect the Git index to live in ".git", and will fail on these bare repositories. Fix this by setting the "GIT_DIR" environment variable to the path of the bare repository.

=== Git Shortcomings ===

There are some Git issues I've swept under the carpet until now. Some can be handled easily with scripts, others require reorganizing or redefining the project, and as for Windows annoyances, one will just have to wait. Or better yet, pitch in and help!

==== Microsoft Windows ====

Git on Windows can be cumbersome. It works with Cygwin installed, though it is slower. There is also the less invasive http://repo.or.cz/w/git/mingw.git[mingw port], which can be run from the Windows command line.

==== Unrelated Files ====

If your project is very large and contains many unrelated files that are constantly being changed, Git may be disadvantaged more than other systems because single files are not tracked. Git tracks changes to the whole project, which is usually beneficial.

A solution is to break up your project into pieces, each consisting of related files. Use *git submodule* if you still want to keep everything in a single repository.

==== Volatile Projects ====

Git was written to be fast with respect to the size of the changes. Humans make small edits from version to version. A one-liner bugfix here, a new feature there, emended comments, that sort of thing. But if you're mostly keeping files that are radically different in successive revisions, on each commit, your history necessarily grows by the size of your whole project.

There is nothing any version control system can do about this, but standard Git users will suffer more since normally histories are cloned.

The reasons why the changes are so great should be examined. Perhaps file formats should be changed. Minor edits should only cause minor changes to at most a few files.

Or perhaps a database or backup/archival solution is what is actually being sought, not a version control system. For example, version control may not be suitable for managing photos periodically taken from a webcam. Again, version control is meant for keeping track of alterations made by humans.

If the files really must be constantly morphing and they really must be versioned, a possibility is to use Git in a centralized fashion. One can create shallow clones, which check out little or no history of the project. Of course, many Git tools will be unavailable, and fixes must be submitted as patches. This is probably fine as it's unclear why anyone would want the history of wildly unstable files.

Another example is a project depending on firmware, which takes the form of a huge binary file. The history of the firmware is uninteresting to users, and updates compress poorly, so firmware revisions would unnecessarily blow up the size of the repository.

In this case, the source code should be stored in a Git repository, and the binary file should be kept separately. To make life easier, one could distribute a script that uses Git to check out the code, and rsync for the firmware.

==== Global Counter ====

Some centralized version control systems maintain a positive integer that increases when a new commit is accepted. Git refers to changes by their hash, which is better in many circumstances.

But some people like having this integer around. Luckily, it's easy to write scripts so that with every update, the central Git repository increments an integer, perhaps in a tag, and associates it with the hash of the latest commit.

Every clone could maintain such a counter, but this would probably not be useful, since everyone only really cares about the central repository and its counter.

==== Automatic Compression ====

To save space, *git gc* should be run once in a while. Git will automatically run it for you when it considers it appropriate (which depends on how frequently you commit, and is usually less than once per month).
