August 30, 2015

Git: Terms, Concepts, and Commands

Once upon a time, I had to introduce Git to a subversion community. So I read the Pro Git book, mined some cool sites, and took notes.

Overview

This note provides very basic terms and concepts to help orient the new VCS working environment. It is intended as a jumping off point for further research.

Git is a command line first product; all functionality is available on the command line and all GUIs are built on this functionality. Understanding the command line tools allows you to easily map those concepts onto any IDE or GUI client. No IDE or GUI client implements all git commands and possible workflows; knowing the command line tools can augment IDE or GUI client functionality or fix things should they break.

Git is a local first VCS. You generally make all modifications to your local repository and periodically push those changes to a remote repository when you need to share your work.

Resources

Git advantages over Subversion

Speed: all operations are faster, most exponentially so because they are local
Secure: each file is SHA-1’d so git can detect a corrupt file and refuse to work on it. The rare corruption affects one version of a file, and is easily repaired
Off-intranet development: a project carries the entire git repo with it - you can use VCS commands anywhere. Also, GitLab is publicly accessible
Greater control over history: a variety of tools allow you rewrite your local history to only contain what you think will interest posterity. In contrast, svn history is generally littered with trivial commit comments which makes finding interesting things more difficult.

Git in contrast to Subversion

SVN stores every project in a single repository. In GIT, every repository is stored in its project.
SVN tracks change sets to individual files and builds the current file from a history of revisions. Git stores each complete file as part of a project filesystem snapshot.
SVN branches are conceptually a different folder and worked on in a separate local folder. Git has all branches in the same folder; switching is an IDE click or single command.
SVN branching is typically rare, git branching is trivial and part of everyday work.
SVN branch merging is typically rare. In git, it is done daily. NOTE: Merge conflicts exist in both.

Note on branching: from my readings and experience, most svn users work on the tip of trunk, use process and peer-pressure to avoid collisions, and don’t branch. Many articles say that svn branching/merging is hard, but since v1.6, that is not true, it is just a couple of IDE clicks. Git branching is absolutely trivial and I find that I create a branch for every new idea. If that idea works out, I merge it. If not, I delete it. Git branching has dramatically changed the way I work and eliminated tedious and time consuming rollbacks of bad ideas that happened in svn.

Concepts

Git keeps the entire history of the project in every clone of the repo. All git metadata, files, branch history, hooks, etc. are stored in the .git directory in your project root.

Git’s focus is the project; where most other VCSs focus on files. Git’s commit saves a snapshot of the project file system; where other VCSs usually focus on a file or subset of project files.

The working directory is a single checkout of one version of the project. It may contain git tracked and untracked files.

Tracked files exist in three states:

modified – you have changed a file but not committed it
staged (or indexed) – a modified file has been marked to go into the next commit
committed – securely added to the local git database

The staging area is a file in the .git directory that stores information about what will go into your next commit. The staging area (or index) contains a snapshot of a file when you “git add” that file. If you modify that file after the “git add”, those modifications will not be committed. This provides for some sophisticated workflows, but in general is more power than most workflows need, and is bypassed.

NOTE: most Git IDE and GUI clients manage the notion of staged or indexed transparently for you. In practice, you will normally deal with project untracked, modified, and committed files.

Since git stores all branches and their history in a single repo, when you clone that repo, you get all branches and history as well. You can easily switch between branches locally, update them, commit to them, or merge between them. These changes can be shared with others on your next push.

When you change branches in your local repo, the contents of the working directory change to reflect the state of that branch on last commit. If you have modified files in one branch that you do not want to commit before switching to another branch, you “stash” your changes and then when you return to that branch, you “unstash” them.

Terms

.gitignore or .git/info/exclude: a file to record project files that should never be tracked. like build files or IDE configuration files. The .git/info/exclude file is part of the repo and applies to everyone. Using .gitignore allows you to choose whether you want to include that file in the project.
author: the person writing the code – may not be the same as the person committing the code
committer: the person that committed the code
fetch vs pull: fetch pulls all information from a remote repo without merging it into your current working directory. It is available for inspection or merging at your discretion. Pull generally fetches data from the remote repository and tries to merge it into your current work.
HEAD: a pointer to the local branch you are currently on. In svn, trunk or branch is a separate working directory and HEAD refers to the latest version of that line of development. In git, HEAD is a pointer to what ever branch you are currently working on; since all branches could potentially exist in the same working directory.
master: the default main branch of a project origin: by convention, the name of the primary, centralized repository for the project
remote: a named remote repository. The name is an alias pointing to URLs for the repo and used in many commands.
snapshot: the collection of all tracked project files at a point in time – roughly equivalent to a commit.
tags: work as with any other VCS; naming a specific commit version for future easy identification. Note that git push does not automatically share tag names; you must do that manually with a separate “git push origin ”.
tracking branch: a remote branch that has a direct relationship to a local branch – you created your local branch from a remote branch. E.g. every time you clone a repository, git creates a master branch that tracks origin/master. For clarity and to simplify GUI operations, local and remote branch names should be the same.

Commands

You can get help on any git command through the man page system with “git help ”. Briefly, this is what the most common commands do.

add: have git stage this file(s)/directory for the next commit
checkout: set the current working directory branch. All files in your working directory will be replaced with the last commit of the branch you are checking out.
clone: bring a copy of a remote repo to your computer – all versions of all files and their history as opposed to checking out a single version in other VCS programs.
commit: commit changes to the local repo. Use “git commit –a” to bypass the staging part of the workflow.
fetch: get all information from a remote repo. This does not merge any files with your working directory, but that information is now available to inspect or merge into your code.
init: create a git repo in a project directory
log: list commits made in the repo
merge: merge changes from another branch into your currently checked out branch.
mv: rename a file. This handles deleting the old file adding the new file to the index.
pull: update your local repository from a remote repository (fetch and merge)
push: update a remote repository from your local repository. This pushes all commits since the last push (or initial clone)
remote: show and operate on remote repositories.
rm: delete a file from the project. On commit, this will delete the file from the working directory and the file will not be included in future project versions. Simply deleting the file does not remove it from the project.
status: show which files are in which state (untracked, modified).
tag: list and manipulate tags. You should always use annotated tags to capture full details: tagger name, email, date, and message.

Common command use

git add : Start tracking a single file or directory.
git checkout – : revert the file to its committed state – overwrites the file with the latest committed version of the file.
git checkout –b SS-903 : Create a new branch for Jira story SS-903, check it out to the current project working directory, and start working on it.
git init : Create a new directory and initialize a git repo there.
git log –p -2 : show the last two commits with diffs
git log –pretty=format:”%h %s” –graph : show revision and short summary in graph format. Very useful if you are using a lot of branches and merges.
git log –since=1.weeks : show what was committed in the last week.
git log –no-merges ..origin/master : Shows differences between you branch and master to see what you have to merge in before commit.
git remote add : Set a remote repository for the local git repo. This allows you to push, pull, and fetch to/from that repository. The conventional name for the main centralized repository is origin.
git remote show origin : show details of the remote repository named origin: URLs, branches, branch hierarchies for push and pull.
git rm –cached : remove a file from the list of tracked files. This is handy to undo an accidental add
git tag –a my-service-1.1.0 –m “1.1.0 release version” : Create an annotated tag for a release version.