Repository initiates all requests. It is responsible for managing Commits and Objects.
The following operations, which make up most of the commands in gitlet, are repository-based:
- initializing repo: init;
- tracking files: add, remove;
- branching: checkout, branch, merge;
- status checking: log, global-log, status;
- and reset.
- String currentBranch: tracks the current branch.
- String currentHead: tracks the hash of current head commit.
- Commit currentCommit: a pointer to the current head commit instance. Marked transient to exclude it from serialization.
- HashMap<String, String> branches: maps branch names to their head.
- HashMap<String, String> stagedAdd: tracks staged addition.
- HashSet stagedDelete: tracks staged deletion.
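The field list above could map onto a class along these lines. This is a minimal sketch rather than the project's actual source: the names RepositorySketch, CommitStub, and roundTrip, the HashSet element type, and the meaning of the stagedAdd values are assumptions made for illustration. It shows that a field marked transient is skipped by Java serialization and comes back as null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.HashSet;

// Hypothetical stand-in for the real Commit class.
class CommitStub {
    final String message;
    CommitStub(String message) { this.message = message; }
}

class RepositorySketch implements Serializable {
    private static final long serialVersionUID = 1L;

    String currentBranch;
    String currentHead;                                   // hash of the head commit
    transient CommitStub currentCommit;                   // not written to disk
    HashMap<String, String> branches = new HashMap<>();   // branch name -> head hash
    HashMap<String, String> stagedAdd = new HashMap<>();  // filename -> blob hash (assumed)
    HashSet<String> stagedDelete = new HashSet<>();       // filenames staged for removal

    // Serializes the repo to memory and reads it back, for demonstration only.
    static RepositorySketch roundTrip(RepositorySketch repo) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(repo);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (RepositorySketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

After a round trip, currentHead and branches survive, while currentCommit has to be reloaded from its hash.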
Commit keeps track of all the metadata, including a message and a timestamp, as well as a mapping from filenames to their contents tracked in the particular commit.
All fields are final to avoid any mutation.
Commit encapsulates all data and provides the following methods:
- two constructors for normal commits and merge commits respectively: Commit();
- one public method for resetting the directory to its committed state: restore();
- getter methods: getParentCommit(), getFile(), getFileHash(), getAllFiles(), getHash(), getMessage(), contains();
- and a private helper method for merging branches: getMergeBase.
equals() and hashCode() are overridden to compare Commits by their contents.
- String message: a descriptive commit message.
- ZonedDateTime time: a timestamp of the commit.
- String parent: the hash of its parent.
- String mergeParent: the hash of the other parent if it's a merge commit; null otherwise.
- TreeMap<String, String> files: maps filenames to blob references.
  - Note that a HashMap does not produce a consistent hash value, which is required to identify the commit, and is therefore replaced by a TreeMap.
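To illustrate why an ordered map helps here, the sketch below hashes a commit's message, parent, and file entries in the map's iteration order. A TreeMap always iterates in sorted key order, so two maps with the same entries hash identically no matter how they were filled. The class name CommitHashSketch, the choice of SHA-1, and the exact bytes fed into the digest are assumptions, not the project's actual scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

class CommitHashSketch {
    // Hashes the message, the parent hash, and every file entry in iteration order.
    static String hash(String message, String parent, Map<String, String> files) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(message.getBytes(StandardCharsets.UTF_8));
            // A null parent (initial commit) is hashed as the text "null".
            sha1.update(String.valueOf(parent).getBytes(StandardCharsets.UTF_8));
            for (Map.Entry<String, String> entry : files.entrySet()) {
                sha1.update(entry.getKey().getBytes(StandardCharsets.UTF_8));
                sha1.update((byte) 0); // separator between name and blob reference
                sha1.update(entry.getValue().getBytes(StandardCharsets.UTF_8));
                sha1.update((byte) 0); // separator between entries
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

Passing a TreeMap as the files argument gives the stable ordering; a map type without a defined iteration order would make the digest depend on internal layout.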
Blob is a library class with no instance variables. It implements persistence in this project and manages blobs and staging areas.
It provides the following I/O functionalities for file manipulation:
- staging and un-staging a file: stageBlob(), unstageBlob();
- committing staged files and clearing the stage: commitStagedFiles(), clearStage();
- hashing a file: hash();
- fetching a file by its hash: readBlob();
- ensuring every file is tracked: containsAllFilesInCWD();
- clearing and resetting the current working directory: safeClearCWD(), restoreFiles().
Changes in the directory are tracked in stagedAdd and stagedDelete and finalized only when the user calls gitlet commit.
Adding a file behaves differently depending on context:
- If the file doesn't exist: return.
- If the same version of the file is found in the current commit: return.
- If the file is staged for removal: "un-remove" the file.
- Default behaviour: stage the file for addition and copy its contents into the staging area.
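The cases above can be sketched as a single decision method. This is an illustration only: the class AddSketch, the Outcome enum, and the use of pre-computed hashes as arguments are assumptions, the checks are assumed to run in the order listed, and copying the file into the staging area is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AddSketch {
    enum Outcome { MISSING, UNCHANGED, UNREMOVED, STAGED }

    final Map<String, String> stagedAdd = new HashMap<>(); // filename -> hash (assumed)
    final Set<String> stagedDelete = new HashSet<>();

    // workingHash: hash of the file in the working directory, or null if it is absent.
    // committedHash: hash of the file in the current commit, or null if untracked.
    Outcome add(String name, String workingHash, String committedHash) {
        if (workingHash == null) {
            return Outcome.MISSING;              // file doesn't exist
        }
        if (workingHash.equals(committedHash)) {
            return Outcome.UNCHANGED;            // same version already committed
        }
        if (stagedDelete.remove(name)) {
            return Outcome.UNREMOVED;            // was staged for removal: undo that
        }
        stagedAdd.put(name, workingHash);        // default: stage for addition
        return Outcome.STAGED;
    }
}
```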
remove is the inverse of add:
- If the file doesn't exist: notify the user that there is no reason to remove the file.
- If it's staged for addition: "un-stage" the file.
- Default behaviour: stage the file for removal and remove the file from the working directory if the user hasn't already done so.
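A matching sketch for remove follows. It assumes that "doesn't exist" means the file is neither staged for addition nor tracked by the current commit, and that the default case stages the file for removal. The class RemoveSketch and its Outcome enum are hypothetical, and deleting the file from the working directory is omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RemoveSketch {
    enum Outcome { NO_REASON, UNSTAGED, STAGED_FOR_REMOVAL }

    final Map<String, String> stagedAdd = new HashMap<>(); // filename -> hash (assumed)
    final Set<String> stagedDelete = new HashSet<>();

    // trackedByHead: whether the current commit tracks the file.
    Outcome remove(String name, boolean trackedByHead) {
        boolean staged = stagedAdd.containsKey(name);
        if (!staged && !trackedByHead) {
            return Outcome.NO_REASON;            // nothing to remove: notify the user
        }
        if (staged) {
            stagedAdd.remove(name);              // staged for addition: un-stage it
            return Outcome.UNSTAGED;
        }
        stagedDelete.add(name);                  // default: stage for removal
        return Outcome.STAGED_FOR_REMOVAL;
    }
}
```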
To create a Commit, a commit message from the user is required.
When gitlet commit is called, the repo passes a few parameters to the Commit constructor: a commit message, the hash of the current commit, and the staged changes.
After the instantiation, the repo writes the commit to file, updates its variables tracking the current head commit, moves staged files into the blob folder, and clears the stage.
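One plausible way for the constructor to combine the staged changes with the parent is shown below. It assumes the new commit starts from the parent's file map, which the text does not state explicitly; the class CommitFilesSketch and the method apply are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class CommitFilesSketch {
    // Builds the file map of a new commit from the parent's files and the staged changes.
    static TreeMap<String, String> apply(Map<String, String> parentFiles,
                                         Map<String, String> stagedAdd,
                                         Set<String> stagedDelete) {
        TreeMap<String, String> files = new TreeMap<>(parentFiles); // start from the parent
        files.putAll(stagedAdd);                 // staged additions add or replace entries
        files.keySet().removeAll(stagedDelete);  // staged deletions drop entries
        return files;
    }
}
```

The inputs are left untouched, which fits the note that Commit fields are final and never mutated.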
The user can check out a file in the current commit, a file in a past commit, or a branch. checkout modifies the current working directory and might overwrite unsaved changes.
Checking out a file restores it to the state tracked in a particular commit. Checking out a branch, on the other hand, tells the repo to point its current head to the latest commit on that branch and reset the files in the directory accordingly.
A HashMap is stored in the Repository instance to map branches to their latest commits. Commits’ hashes are stored in place of the actual objects to avoid serializing the commits as well.
When the user creates a new branch, a new record is put into branches mapping the branch name to the current head commit.
In this implementation, merge is broken down into two sub-tasks: finding the merge base and comparing files.
Finding the merge base is implemented in the Commit class as a static method, which takes two commits and returns their latest common ancestor.
The program does this by listing all parents of the two commits and their parents recursively in reverse chronological order, and then finding the latest common parent.
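The search can be sketched as follows. This is not the project's getMergeBase: it works on an in-memory map from hashes to a hypothetical Node record instead of reading commits from disk, uses a plain long for the timestamp, and counts a commit as its own ancestor so that one branch being directly ahead of the other is handled. It collects the ancestors of both commits, intersects the two sets, and picks the most recent member.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MergeBaseSketch {
    // Hypothetical minimal commit record; either parent may be null.
    static final class Node {
        final String parent;
        final String mergeParent;
        final long time;
        Node(String parent, String mergeParent, long time) {
            this.parent = parent;
            this.mergeParent = mergeParent;
            this.time = time;
        }
    }

    // Collects the given commit and all of its ancestors.
    static Set<String> ancestors(Map<String, Node> commits, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            String hash = todo.pop();
            if (!seen.add(hash)) {
                continue; // already visited through another path
            }
            Node node = commits.get(hash);
            if (node.parent != null) todo.push(node.parent);
            if (node.mergeParent != null) todo.push(node.mergeParent);
        }
        return seen;
    }

    // Returns the most recent commit reachable from both a and b, or null if none.
    static String mergeBase(Map<String, Node> commits, String a, String b) {
        Set<String> common = ancestors(commits, a);
        common.retainAll(ancestors(commits, b));
        String best = null;
        for (String hash : common) {
            if (best == null || commits.get(hash).time > commits.get(best).time) {
                best = hash;
            }
        }
        return best;
    }
}
```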
Given base, current, and merge commits, the program then compares every file tracked in current and merge individually to fetch the latest version of the file.
Comparing a file can have one of the following outcomes:
- File is the same in current and merge: it is considered latest.
- File is the same in base and one of current or merge: the other version is considered latest.
- File is different in all three commits: a merge conflict occurred.
The program runs these checks for every file and, for each of them, fetches the latest version into the current working directory. When the merge is finished, the repo automatically adds all files and commits the changes.
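The per-file comparison described above amounts to a three-way check on blob hashes. The sketch below is an illustration under assumptions: the class MergeFileSketch and the Pick enum are hypothetical, and a null hash stands for "file not tracked in that commit".

```java
import java.util.Objects;

class MergeFileSketch {
    enum Pick { CURRENT, MERGE, CONFLICT }

    // Each argument is the file's blob hash in that commit, or null if untracked there.
    static Pick resolve(String base, String current, String merge) {
        if (Objects.equals(current, merge)) {
            return Pick.CURRENT;   // same on both sides: nothing to fetch
        }
        if (Objects.equals(base, current)) {
            return Pick.MERGE;     // only the merged branch changed the file
        }
        if (Objects.equals(base, merge)) {
            return Pick.CURRENT;   // only the current branch changed the file
        }
        return Pick.CONFLICT;      // different in all three commits
    }
}
```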
Persistence for internal objects like Repository and Commit is realized by serialization. For those objects, a static read method and a write method are provided to standardize the serialization.
The Repository instance is saved in ./gitlet/meta. Since there is only one repo per directory, Repository.read() doesn’t need any additional arguments.
write() simply serializes itself and saves the result to the defined path.
Commits, however, are only addressable by their hashes. Commit.read() therefore takes a String hash as an argument and searches for a commit with the given hash in ./gitlet/commit.
Because hashes are long, Commit.read() also accepts an abbreviated hash value, in which case it returns the first commit it encounters whose hash starts with the given value. Note that the result is not deterministic when the abbreviation matches the hashes of two or more commits.
When write is called on a commit, the commit serializes itself and stores the result in a file named after its hash under ./gitlet/commit. Upon success, it returns the hash as a String.
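One way to make abbreviated lookups predictable is to collect every match and reject a prefix that matches more than one commit. This is an alternative to the first-match behaviour described above, not what Commit.read() does. The class PrefixLookupSketch is hypothetical and takes the candidate hashes as a list, for example the filenames found in the commit folder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class PrefixLookupSketch {
    // Returns the only hash starting with prefix, or empty if none match.
    // Throws IllegalArgumentException when the prefix is ambiguous.
    static Optional<String> resolve(List<String> hashes, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String hash : hashes) {
            if (hash.startsWith(prefix)) {
                matches.add(hash);
            }
        }
        if (matches.size() > 1) {
            throw new IllegalArgumentException(
                    "Ambiguous prefix " + prefix + ": " + matches.size() + " matches");
        }
        return matches.isEmpty() ? Optional.empty() : Optional.of(matches.get(0));
    }
}
```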
When creating objects, the contents are simply copied and saved in a file. The files also use their hashes as the filenames.
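A content-addressed store of this kind can be sketched in a few lines. The class BlobStoreSketch, the method names, the use of SHA-1, and the directory argument are assumptions for illustration; the project's Blob class may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class BlobStoreSketch {
    // Returns the SHA-1 digest of the data as a 40-character hex string.
    static String sha1Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-1").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Copies the contents into dir under a filename equal to their hash; returns the hash.
    static String store(Path dir, byte[] contents) {
        try {
            Files.createDirectories(dir);
            String hash = sha1Hex(contents);
            Path target = dir.resolve(hash);
            if (!Files.exists(target)) {
                Files.write(target, contents); // identical contents are stored only once
            }
            return hash;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back the contents stored under the given hash.
    static byte[] read(Path dir, String hash) {
        try {
            return Files.readAllBytes(dir.resolve(hash));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the filename is derived from the contents, storing the same file twice yields the same hash and a single file on disk.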