This is the material for the URFIST course on Git & GitHub.
Bordeaux, March 27 & 28, 2018.
Table of Content
- Building our own versioning system
- Git to the rescue
- Interactive Git
- Collaborative book on GitHub
- External Resources
- Goal: To build our own versioning system using system tools
- Duration: 3h
- Public: Novice
- Difficulty: ★☆☆
Let us imagine we've decided to write a book. This book will be composed of an introduction, three chapters and a conclusion. We could decide to use a word processor (e.g. Open Office) to write it, but for the sake of this course, we'll use instead a light markup format (markdown) such that we won't have to deal with style nor layout (you'll see that we can do it later using pandoc). At this point, we only have to decide how we'll organize ourselves. Either we create a single file that will ultimately contains the whole book, or we can also use a logical split into the five different parts (introduction, 3 chapters and conclusion). Let's start with the second solution using the touch command to create empty files. Open a terminal and type:
$ mkdir book
$ cd book
$ touch introduction.txt
$ touch chapter-1.txt chapter-2.txt chapter-3.txt
$ touch conclusion.txt
You should obtain a book
directory containing five files. You can now use
your preferred text editor (for example Notepad++ (Windows only),
Atom, Emacs or Vim) to start writing the next best-seller...
Let's start with the introduction. Open the introduction.txt
file and type a
few words.
You've written a few lines already and you start to worry that you may have
erased some text that you think is not so good right now but might be worth to
be reconsidered some time later. If you do not have any backup, this erased
text is lost once and for all. Consequently, before erasing something, it might
be a good idea to save a snapshot of your current work. The question is how do
we do that? One obvious and simple solution is to make a copy of all your files
and to store them at some other place (e.g. in a snapshots
directory). This
can be done very easily using the cp command (after having created the
snapshots
directory):
$ mkdir snapshots
$ mkdir snapshots/snapshot-1
$ cp *.txt snapshots/snapshot-1/
We're safe! We now have a full copy of our book. We can continue modifying it and save as many snapshots as needed:
$ mkdir snapshots/snapshot-2
$ cp *.txt snapshots/snapshot-2/
Start writing the introduction and save a snapshot then start writing the first chapter and save another snapshot.
- How do you decide of the name of the next snapshot?
- What happens if you re-use an already existing snapshot name?
- How can you prevent to accidentaly overwrite a previous snapshot?
After a few hours (or a few days), and depending on the frequency of your
snapshots, you'll end up with a snapshots
directory holding several copies of
your work at different time, the oldest being snapshot-1
. At that point, we
can see we have a problem. For example, we don't know if snapshots/snapshot-5
refers to a version from a few hours ago or from a few days ago. The
incremental naming scheme we've been using so far is not really informative and
this is not the best one we could have chosen. We can use instead the current
date and time for the snapshot filename. Not only we won't have to keep track
of the number of snapshots anymore but the name of the snapshots will be
indicative of when have they have been created. To do that, we can use the
date function.
$ name=snapshot-`date '+%Y-%M-%d-%Hh%M'`
$ mkdir snapshots/$name
$ cp *.txt snapshots/$name
We can further simplify things by creating a simple bash script that will
take care of everything. We'll name it commit.sh
:
#!/bin/bash
# commit.sh, version 1
# Name of the snapshot
name=snapshot-`date '+%Y-%M-%d-%Hh%M'`
# Creation of the snapshot directory
mkdir snapshots/$name
# Actual saving of the file from the stage directory
cp *.txt snapshots/$name
# Message
echo "Saving snapshot $name"
You may have to change the rights on the file such that it becomes executable:
$ chmod 755 commit.sh
Now, each time we need to create a snapshot, we just have to call this script.
Try to change the name of the snapshot such as to remove the time. What does it imply for snapshots? How many different snapshots can you take a day now?
We now have a script to make a new snapshot whenever we want and we also keep
track of the time when a snapshot has been created (stored in the filename).
However, we do not know why a specific snapshot has been taken. Imagine you
just finished the introduction in your book, and before going to sleep, you
want to save a snapshot and to have a way to know this specific snapshot
corresponds to end of the introduction. Adding a message to the filename is not
really possible anymore because it is now given an automatic name and in any
case, we could not write a long message. To keep track of what a given snapshot
corresponds to, we'll use of a log file, i.e. a file where messages will be
recorded alongside the name of the snapshot. This means that when we'll save a
snapshot, we'll have to give a message to be registered. Our new commit.sh
script reads:
#!/bin/bash
# commit.sh, version 2
# Name of the snapshot
name=snapshot-`date '+%Y-%M-%d-%Hh%M'`
# Creation of the snapshot directory
mkdir snapshots/$name
# Actual saving of the file from the stage directory
cp *.txt snapshots/$name
# Message
echo "Saving snapshot $name"
# Recording log message
echo "$name : $1" >> snapshots/log.txt
And we can use it with a command such as:
$ ./snapshot.sh "Introduction finished!"
Here is the view on our snaphots
directory
$ cd snapshots
$ ls -al
total 8
drwxr-xr-x 4 rougier staff 128 Mar 10 18:42 ./
drwxr-xr-x 9 rougier staff 288 Mar 10 18:42 ../
-rw-r--r-- 1 rougier staff 50 Mar 10 18:42 log.txt
drwxr-xr-x 7 rougier staff 224 Mar 10 18:42 snapshot-2018-03-10-18h42/
And you look at the log.txt
file, you should have:
$ cat snapshots/log.txt
snapshot-2018-42-10-18h42 : Introduction finished
What happen if no message is given? How do you fix that?
What we have created so far is a very basic versioning system, but this system
is far from being efficient. One problem is that each time we create a
snapshot, we copy all the files unconditionally, modified or not. This means
that each snapshot may contain a copy of the same unmodified file. For example,
in our case, each and every snapshot contains a full copy of the conclusion.txt
file even though we never modified it. To avoid unnecessary copy, we thus need
to manually select what needs to be actually saved in a snapshot. To do that,
we'll create a temporary directory named stage
where we will copy the files we
want to save and we'll take a snapshot of this directory instead of copying all
the .txt
files:
$ mkdir snapshots/stage
And we need to modify our commit.sh
script accordingly
#!/bin/bash
# commit.sh, version 3
# Name of the snapshot
name=snapshot-`date '+%Y-%m-%d-%Hh%M'`
# Creation of the snapshot directory
mkdir snapshots/$name
# Actual saving of the file from the stage directory
cp snapshots/stage/* snapshots/$name
# Message
echo "Saving snapshot $name: $1"
echo "$name : $1" >> snapshots/log.txt
# Cleanup the stage directory
rm snapshots/stage/*
We can now select what we want to backup:
$ cp introduction.txt stage/
$ cp chapter-1.txt stage/
$ ./commit.sh "Introduction extended"
And if we want to backup everything:
How would you cancel the addition of a file to the stage area?
$ cp *.txt stage/
$ ./commit.sh "New idea incorporated"
Our last solution, that make use of a temporary stage
directory, is nice but
less than ideal because we may still backup a file that has not been modified
since last backup. Ideally, we want to have some finer control. This is where
the diff command comes handy. This unix command allows to compare two
files to know if they're different (and to know what and where are the
differences). Instead of manually copying the file to the stage
directory,
we'll thus use an add.sh
script that will check if a file really needs some
backup. If not, we just make a symbolic link to the last saved version.
But how do we know where is the last saved version? Of course, we can compare
all the dates from the different snapshots in order to find the most recent
one. But if we were to write a script for doing the same, it would a bit
painful (but it can be done). What we can do instead is to have a symbolic
snapshot that will point to the last snapshot taken. This way, we'll know
immediately where is the last snapshot. Let's name this symolic snapshot HEAD
and let's modify our commit.sh
script using the ln command.
#!/bin/bash
# commit.sh, version 4
# Name of the snapshot
name=snapshot-`date '+%Y-%m-%d-%Hh%M'`
# Creation of the snapshot directory
mkdir snapshots/$name
# Actual saving of the file from the stage directory
cp stage/* snapshots/$name
# Message
echo "Saving snapshot $name: $1"
echo "$name : $1" > snapshots/log.txt
# Cleanup the stage directory
rm stage/*
# Creation of the symbolic latest snapshot
ln -Fs $snapshots/$name snapshots/HEAD
Let's test it:
$ cp introduction.txt stage/
$ commit.sh "Testing the HEAD directory"
$ cd snaphosts
$ ls
total 8
drwxr-xr-x 10 rougier staff 320 Mar 12 10:50 ./
drwxr-xr-x 12 rougier staff 384 Mar 12 10:50 ../
lrwxr-xr-x 1 rougier staff 35 Mar 12 10:50 HEAD@ -> snapshot-2018-03-12-10h50
-rw-r--r-- 1 rougier staff 164 Mar 12 10:50 log.txt
drwxr-xr-x 3 rougier staff 96 Mar 12 10:50 snapshot-2018-03-12-10h50/
drwxr-xr-x 7 rougier staff 224 Mar 12 10:15 snapshot-2018-03-12-10h15/
drwxr-xr-x 7 rougier staff 224 Mar 12 10:16 snapshot-2018-03-12-10h16/
drwxr-xr-x 7 rougier staff 224 Mar 12 10:17 snapshot-2018-03-12-10h17/
drwxr-xr-x 7 rougier staff 224 Mar 12 10:18 snapshot-2018-03-12-10h18/
drwxr-xr-x 2 rougier staff 64 Mar 12 10:50 stage/
Now we can write a script add.sh
that will add a file only if it differs from
the one in HEAD
:
#!/bin/bash
# add.sh, version 1
for file in "$@"
do
if ! `diff -q $file snapshots/HEAD/$file`; then
cp $file snapshots/stage/
fi
done
We can use it with:
$ ./add.sh *.txt
$ ./commit.sh "Adding modified files"
What happens if you make two snapshots? Why an unmodified file is present in the second snapshot? How to fix it?
Let's imagine now you have two ideas for the conclusion of your book. Ideally,
you would like to work on both versions but you have only one
conclusion.txt
. How could we work on the same file in two different ways and
switch to one or the other at will? This is now a bit too complex for our
rudimentary versioning system and we need to use a real versioning tool such as
git.
- Goal: To discover git basic commands
- Duration: 3h
- Public: Novice
- Difficulty: ★☆☆
We'll use the introduction to version control for novices using Git by
Software Carpentry.
Please go to https://swcarpentry.github.io/git-novice/
- Goal: To discover git advanced commands
- Duration: 3h
- Public: Beginner
- Difficulty: ★★☆
We'll use the very nice Learn Git Branching by Peter Cottle.
- Goal: To use GitHub for writing a collaborative book
- Duration: 3h
- Public: Novice
- Difficulty: ★☆☆
See also GitHub Guides
First you need to create an account on GitHub. Choose your login wisely since you'll see it a lot while interacting on GitHub. Once created, you can browse around and search for interesting repositories or people. For each of them, you can star them to bookmark them and receive notifications when something is happening. You can also configure how you'll receive notifications (online or by email). If you're using GitHub a lot, it might be a good thing to disable email notifications.
You have to fork the repository https://github.com/rougier/URFIST-Book in your GitHub account and then to clone your fork (i.e. not the original repository) on your computer. What are is the current remote?
$ git remote -v
origin git@github.com:rougier/URFIST-Book.git (fetch)
origin git@github.com:rougier/URFIST-Book.git (push)
After you cloned the repository, you should have an origin
remote that points
toward your GitHub repository. You need now to add a new remote named upstream
that points back to the original repository.
$ git remote add upstream git@github.com:rougier/URFIST-Book.git
$ git remote -v
origin git@github.com:rougier/URFIST-Book.git (fetch)
origin git@github.com:rougier/URFIST-Book.git (push)
upstream git@github.com:rougier/URFIST-Book.git (fetch)
upstream git@github.com:rougier/URFIST-Book.git (push)
Each group will need to write a chapter, however, to know which chapter you
have to write, you'll have to first open an issue on the original repository on
GitHub and ask what chapter you have to write. One you know the answer, you can
start creating the corrresponding file and commit locally your change. Then you
can push your change on GitHub. But where exactly do you need to push? origin
or upstream
?
After you've made your push, you can go to your repository on GitHub and click at the newly proposed Pull Request button (relatively to your last commit).
There exist a huge set of git resources online. You'll find below a selected subset.
-
Learn Git Branching by Peter Cottle
Interested in learning Git? Well you've come to the right place! "Learn Git Branching" is the most visual and interactive way to learn Git on the web; you'll be challenged with exciting levels, given step-by-step demonstrations of powerful features, and maybe even have a bit of fun along the way.
-
Version Control with Git by Software Carpentry
Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.
-
Pro Git Book by Scott Chacon & Ben Straub
What is "version control", and why should you care? Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. For the examples in this book, you will use software source code as the files being version controlled, though in reality you can do this with nearly any type of file on a computer.
-
Git Cheat Sheet by Tobias Günther
Even with a GUI application at hand there are times when you resort to the command line. We admit we can’t memorize all important Git commands – that’s why we created a nice cheat sheet for Git that we would like to share with you. On the front you can find all important commands. On the back you can find our Version Control Best Practices that help you get the most out of version control with Git.
-
Oh shit, git! by Katie Sylor-Miller
Git is hard: screwing up is easy, and figuring out how to fix your mistakes is fucking impossible. Git documentation has this chicken and egg problem where you can't search for how to get yourself out of a mess, unless you already know the name of the thing you need to know about in order to fix your problem.
-
Your mission is to learn the ropes of Git by completing the tutorial and tracking down all your team's space stations.