YAGT

Yet Another Git Talk

by Harald Schilly

# What's Git?

From the website:

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

## Purpose

* Git tracks line-oriented text files (e.g. source code, TXT, HTML, SVG, LaTeX, ...) in a specific directory and all its sub-directories. (This is called a "repository".)
* Git records modifications in a history.
* Git lets you conveniently share modifications of a project with any other member, in a p2p fashion.

## History

Git was created by [Linus Torvalds](http://en.wikipedia.org/wiki/Linus_Torvalds) for the development of Linux in 2005. There was no tool available that would meet all his needs and fit into the existing kernel development model. His major accomplishment was to break down the complexity of a full-fledged RCS/SCM into three very basic building blocks. All Git operations manipulate and operate only on those!

## Design Goals

* Speed
* Simple design
* Strong support for non-linear development (thousands of parallel branches)
* Fully distributed
* Able to handle large projects like the Linux kernel efficiently (speed and data size)

## Distributed

Git is also a *distributed* version control system. This means everyone has all the history stored locally. This allows you to run any kind of query or modification without connecting to a central server. It also fits much better with our mental model of *collaboration* in general.

## Git is not ...

... a friend of binary files (e.g. jpeg, mp3, avi, doc, docx, odt, zip (unzip first), png, ...)
... a way to just upload data to a (web-) server
→ use rsync -av --delete $SRC $DEST
... a mere incremental backup
→ use rdiff-backup

TED Talk: Git/Internet transforms Government

## Internal Stuff

Git models the state and history via three very simple data structures: blob, tree and commit. It sounds scary, but the best way to understand Git is to learn about those basic ideas.

Hey, we are hackers after all :-)

## Hash function

A hash function is a cryptographic primitive that transforms arbitrary data (a file, a byte-string) into a byte-string of *fixed length*! Such a hash is practically **unique** and **deterministic**.¹ Git uses this technique extensively via [sha1](http://en.wikipedia.org/wiki/SHA-1). Usually, the first few characters are enough to specify it, e.g. d22f6ac
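To make the hashing less magic, here is a minimal sketch of how Git derives the ID of a file's content: it hashes a small header ("blob", the size in bytes, a NUL byte) followed by the content itself. The sample text is made up for illustration, and the sketch assumes `sha1sum` (GNU coreutils) and a Git that uses SHA-1 object names, which is the default.

```shell
set -eu

content='hello git'   # hypothetical sample content
size=$(printf '%s' "$content" | wc -c | tr -d ' ')

# By hand: SHA-1 over "blob <size>\0" followed by the content.
manual=$( { printf 'blob %s\0' "$size"; printf '%s' "$content"; } | sha1sum | cut -d' ' -f1 )

# The same thing, asking Git directly (nothing is stored without -w).
viagit=$(printf '%s' "$content" | git hash-object --stdin)

echo "by hand: $manual"
echo "by git:  $viagit"

# Deterministic: an empty file gets the same ID in every repository.
empty=$(git hash-object --stdin < /dev/null)
echo "empty:   $empty"   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Both lines print the same 40 hex characters. The ID depends only on the content, not on the file name or the time.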

## Hash Example

A commit from the Git history:

$ git log -1
commit d22f6ac868fd62dba328b555d93ec63b0fb00194
Author: Harald Schilly <harald.schilly@gmail.com>
Date:   Sat Oct 6 21:07:48 2012 +0200

    fragment: zoom and relative offset animation

## Linked List & Trees

All "objects" behind such hashes are uniquely identified by the hashsum¹. References to other objects from "*inside*" such an object are used to build trees (or to be precise, a DAG). This data-structure will be the "*history*" of the project.

## Tree = File-System

All files tracked by Git are hierarchically organized by "*trees*". Each entry of a tree points to one specific "hashed" content of a file (a "*blob*") or another tree (a sub-directory). Besides that, the name of the file is recorded, too.

## Commit

A **commit** is one specific snapshot in time of a tree with all referenced files in their current state. This is the fundamental building block of everything inside the Git repository. Learning Git means mastering how to create commits and how to juggle them around. Such a commit also records the author, the time/date, and usually references one or more previous commits.
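The three building blocks can be inspected directly. The following is a small sketch in a throwaway repository; the directory comes from `mktemp`, the file name `README` is hypothetical, and the identity is the one used later in this talk. It walks from a commit to its tree to a blob with `git cat-file`.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Gitti Gitian"
git config user.email "gitti@gitiscool.com"

echo "hello" > README
git add README
git commit -q -m "first commit"

# -t prints the type of the object behind a name
commit_type=$(git cat-file -t HEAD)           # the snapshot itself
tree_type=$(git cat-file -t 'HEAD^{tree}')    # the directory it points to
blob_type=$(git cat-file -t HEAD:README)      # the content of one file
echo "$commit_type -> $tree_type -> $blob_type"

# -p pretty-prints the object: a commit names its tree, a tree names its blobs
git cat-file -p HEAD
git cat-file -p 'HEAD^{tree}'

# The tree entry is exactly the hash of the file content
blob_hash=$(git rev-parse HEAD:README)
file_hash=$(git hash-object README)
```

The commit output contains a `tree <hash>` line, and the tree output lists mode, type, hash and name for each entry.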

## Working Dir & Staging

To create a commit, you have to go "*shopping*".

Modify or create at least one file in your "working directory".
Go around and explicitly put desired changes into your shopping cart ("index").
Finally, a commit appends the content of this shopping cart to your "history".

## History

The collection of all those commits, each referencing its immediate ancestor(s), builds the history of the entire project.

* A **root** is a commit with no ancestors¹.
* A **merge** is a commit with more than one ancestor.

¹ Yes, there could be more than one: git checkout --orphan root2

## Branch

It is hard to remember all those constantly changing hashsums. A *named reference* to a specific commit is the start of a **branch**. The branch is made up of all commits reachable from that reference, back to the root. Also, when we *commit*, the branch reference automatically changes and points to the new commit!¹

History is a [DAG](http://en.wikipedia.org/wiki/Directed_acyclic_graph)

Same history with names.

## HEAD

The **HEAD** is a special companion, which points to the *current* branch or a specific revision. It is the base for all files in your current working space.

$ cat .git/HEAD
ref: refs/heads/master

Committing: HEAD moves and takes the branch reference with it.

## Detached HEAD

Unfortunately, your **HEAD** could fall off. In this case, it points directly to a revision and not a branch. Then, no branch reference moves along with the next commit! Git does not care about revisions which aren't reachable from any branch (or tag). This means a commit made in a detached state will eventually get lost, unless you attach a branch or tag to it!

$ cat .git/HEAD
b1b7e86e011aebad8df6d9af33214abedb113983

committing w/ a detached head
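A hedged sketch of that situation in a throwaway repository (the commit messages and the branch name `rescued` are made up): check out a revision instead of a branch, commit, and see that no branch knows about the new commit until one is attached to it.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called "master"
git config user.name "Gitti Gitian"
git config user.email "gitti@gitiscool.com"
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"

git checkout -q HEAD~1                    # a revision, not a branch: HEAD detaches
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "HEAD is $state"

git commit -q --allow-empty -m "work without a branch"
stray=$(git rev-parse HEAD)

git checkout -q master                    # walk away ...
if git merge-base --is-ancestor "$stray" master; then reachable=yes; else reachable=no; fi
echo "reachable from master: $reachable"

git branch rescued "$stray"               # ... and rescue it with a named reference
rescued=$(git rev-parse rescued)
```

As long as some branch or tag points at the commit, it is part of the history again.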

## Rewriting History

Git's feature¹ of discarding all commits which are not reachable is used in various ways. This is often called "**rewriting history**".

Cons: You can shoot yourself in the foot².
Pros: It allows you to fix errors (e.g. the last commit was incomplete), or to extract and combine parts of a larger history.

¹Yes, feature not bug. This is the key to repair mistakes!
²But don't worry, Git takes great care not to lose anything unless you "*force*" it with -f. There is also a safety net called "*reflog*".

## Selecting Revisions

HEAD is where you are
beginning of the hashsum: d22f6ac
ancestors: master~ or master~2 ...
2nd parent of a merge: c411c4d^2
combo move: master~3^2~~
ranges: master..topic1
branch differences: master...topic2
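These selectors are easiest to grasp on a tiny history. The sketch below builds one in a throwaway repository; the commit names c1..c3, t1, t2 and the helper `commit` are made up for illustration. It then resolves a few of the expressions from the list.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Gitti Gitian"
git config user.email "gitti@gitiscool.com"
commit() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper

#   c1---c2---c3      master
#          \
#           t1---t2   topic
commit c1; commit c2
git branch topic
commit c3
git checkout -q topic
commit t1; commit t2
git checkout -q master

only_topic=$(git rev-list --count master..topic)    # on topic but not on master: 2
only_master=$(git rev-list --count topic..master)   # on master but not on topic: 1
either=$(git rev-list --count master...topic)       # on exactly one of them: 3

git merge -q -m "merge topic" topic                 # master now has two parents
first=$(git log -1 --format=%s 'master^1')          # 1st parent: c3
second=$(git log -1 --format=%s 'master^2')         # 2nd parent: t2
combo=$(git log -1 --format=%s 'master^2~1')        # 2nd parent, then one back: t1
back=$(git log -1 --format=%s 'master~2')           # two steps along first parents: c2
echo "$only_topic $only_master $either $first $second $combo $back"
```

`~` always follows the first parent, while `^n` picks the n-th parent of a merge, which is why the two can be chained.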

## Installation

* download from [Git homepage](http://www.git-scm.com/)¹
* Ubuntu: sudo apt-get install git
* MacPorts: sudo port install git-core

### Install Tools

sudo apt-get install gitk git-cola

# Basics

Git is a command-line tool, called via git.

## HELP

I NEED HHHEEEELP!!!

git help [<command>]
or
git <command> --help

GUIs: gitk and git-gui, Git Guis, git-cola, SmartGit, TortoiseGit, Eclipse, ... but this talk stays "pure".

## Configuration

Git has a global ~/.gitconfig and a project-specific local .git/config configuration file. Edit them either directly (leet!) or via git config [--global] [--edit] ...

## Config: Identity

All commits are associated with you. That's necessary for [licensing copyrighted works](http://www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html).

$ git config --global user.name "Gitti Gitian"
$ git config --global user.email "gitti@gitiscool.com"
# and check it
$ git config --global user.name
Gitti Gitian

Remember: --global sets this for your entire account!

## Config: Aliases

Git lets you create custom commands. They can be shortcuts or even small scripts. Set them this way:

$ git config --global alias.[SHORTCUT] [COMMAND ...]

## My Aliases

$ git aliases
st = status
ci = commit
ll = log --oneline --graph --decorate -25
lla = log --oneline --graph --decorate --all -25
br = branch
co = checkout
df = diff
who = shortlog -s --
aliases = !git config --get-regexp 'alias.*' | colrm 1 6 | sed 's/[ ]/ = /'

## Config: Editor

Git sometimes pops up a text editor. Choose your poison:

$ git config --global core.editor "vim"

## Config: Merge Tool

This is what Git gives you via git mergetool, when it gives up resolving conflicts.

$ git config --global merge.tool "meld"

## Config: Color

Makes reading output much easier.

$ git config --global color.ui true

## Config: Ignore Files

Usually, your working directory contains a mix of tracked files and generated files which should not be tracked. You can tell Git to ignore them globally:

$ git config --global core.excludesfile ~/.gitignore

and project specific:
edit $PROJ_ROOT/.gitignore

## Config: Ignore Files /2

Typical entries for the global ~/.gitignore:

*.o
*.so
*.class
*.pyc
# temporary editor files
*~
# vim temp file
*.swp
*.min.js
*.log
# Mac users
*.DS_Store
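To check whether a pattern really catches a file, `git check-ignore` can be asked directly. This is a small sketch in a throwaway repository with made-up file names. Note that in an ignore file a `#` only starts a comment at the beginning of a line.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q

# A project-specific ignore file; the comment sits on its own line.
printf '%s\n' '# build artifacts' '*.o' '*.log' > .gitignore
touch main.c main.o debug.log

# check-ignore exits with 0 if the path is ignored, 1 if it is not
if git check-ignore -q main.o; then obj=ignored; else obj=visible; fi
if git check-ignore -q main.c; then src=ignored; else src=visible; fi
echo "main.o: $obj, main.c: $src"

# Untracked files that Git still cares about
visible=$(git ls-files --others --exclude-standard)
echo "$visible"
```

Only `.gitignore` and `main.c` show up as untracked; the object file and the log are silently skipped.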

## Start a repository

A repository in Git is decentralized. You do not need to know in advance with whom you want to work together.

$ mkdir myshinynewrepo
$ cd myshinynewrepo
$ git init

## Clone a Repo

... but if you know the project, it is easier to just "clone" it.

$ git clone [git|http]://server/.../project.git

Then, this remote server is already set up for sending your commits to it ("pushing").

## Status

To see all modified and untracked files (which are not ignored), Git provides the status command. Its output also lists useful commands, which will be explained in the next slides.

## Staging

Git has a so-called "staging area". This is a selection of modified files, or parts of files, that you wish to be part of the next commit.

$ git add fileA fileB ...
# or everything: new, modified and deleted files
$ git add -A
# or just a part
$ git add -p fileX
# to get rid of files
$ git rm fileY

## Edits after staging are ignored!

Only the changes that were present at the time of staging will be committed, not the additional modifications. Reason: staging means that the modifications are copied into something called the *"index"*; after that, the index has nothing to do with the actual files. To see what's going on: git status.

git diff --cached tells what will be committed
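A minimal sketch of this trap, in a throwaway repository with a made-up file `notes.txt`: the file is staged, edited once more, and then committed. The commit contains the staged version, not the later edit.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Gitti Gitian"
git config user.email "gitti@gitiscool.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "first draft"

echo "second draft" > notes.txt
git add notes.txt                              # the index now holds "second draft"
echo "third draft, not staged" > notes.txt     # edited again after staging

staged=$(git diff --cached --name-only)        # what the next commit will contain
git commit -q -m "second draft"

committed=$(git show HEAD:notes.txt)           # content recorded in the commit
on_disk=$(cat notes.txt)                       # content in the working directory
left_over=$(git diff --name-only)              # still modified after committing
echo "committed: $committed"
echo "on disk:   $on_disk"
```

The commit records "second draft", while the working directory still carries the third version as an unstaged modification.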

## Committing

Creating a new commit is probably the most famous feature. In its basic use case, the modifications from the *index* are taken and stored as a new commit in the history.

git commit

Additionally, your author information, the time and a custom "*commit message*" are stored.

It is also easy to see commit **diff**erences.

$ git diff HEAD~3..HEAD~2 index.html
diff --git a/index.html b/index.html
index 43063fb..06d9a2e 100644
--- a/index.html
+++ b/index.html
@@ -140,7 +144,9 @@
     Gitti Gitian
-    <code>--global</code> sets this for your entire account!
+    <aside class="footnote">
+      Remember: <code>--global</code> sets this for your entire account!
+    </aside>
     </section>

- = removed + = added

## Too many commands?

It is possible to combine **add -u** and **commit**, even including the commit message (new, untracked files are not picked up this way):

$ git commit -am "commit message"

## Log

To see the history, use git log.

* -[number] limits to the last [number] entries
* --oneline self explanatory
* --graph tries to be artsy
* --decorate shows HEAD, branches, and tags

$ git log -5 --oneline --graph --decorate HEAD~59
*   c411c4d Merge pull request #131 from Sinetheta/master
|\
| * 495cb98 changed theme file swap to be relative to theme file instead of slide-deck
|/
* aa97d80 correction to code style in sky theme, adjusted transition demo page
* 0c06469 prevent same theme from loading repeatedly
* dc9b93a tweaks to the simple theme
* 90f343e add theme config option, add sky theme, fix line-height of <small>
* e3f3e9d Update README.md
* 141d694 support for named links (closes #55)

## Branch, Checkout, Reset

Now we start to get really serious! There are three very powerful commands that let you

* move around in the history, pick files from the past
* change all files in the working directory to another state
* change a branch reference without committing
* create a new branch anywhere
* get rid of erroneous work
* etc.

## Branch

git branch branch_name [commit] [-f]

This creates a new branch right where the HEAD is, or at the given commit. -f can be used to move an existing branch *reference* around. -d deletes and -m renames.

## Checkout

The checkout command has several basic functions. The most basic one is for switching branches, which moves the HEAD and populates the index and working directory:

git checkout branch_name

Copy a file from the repository:

git checkout [commit] -- path/filename

Apart from that, it also has the shortcut -b for creating a new branch and immediately switching to it (-B does the same, but also resets the branch if it already exists).

checking out a branch (HEAD moves)

checking out files from the history (HEAD doesn't move)

## Reset

reset sounds a bit more scary - and yes it is. Its job is to move around the branch reference underneath the HEAD.

--soft only moves HEAD (like all modes do) and doesn't touch index
--mixed (default), resets index
--hard resets index and working directory (local changes lost forever!)

HEAD+master moves, three commits are lost
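The three modes differ only in how much they clean up after moving the branch. This is a sketch in a throwaway repository; the file `f.txt` and its contents are made up. It applies them one after another and looks at the index and the working directory each time.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Gitti Gitian"
git config user.email "gitti@gitiscool.com"

echo "a"   > f.txt; git add f.txt; git commit -q -m "one"
echo "bb"  > f.txt; git commit -q -am "two"
echo "ccc" > f.txt; git commit -q -am "three"

# --soft: the branch moves back, index and files keep the state of "three"
git reset -q --soft HEAD~1
soft_head=$(git log -1 --format=%s)            # two
soft_staged=$(git diff --cached --name-only)   # f.txt is still staged

# --mixed (the default): the index is reset as well, the files are untouched
git reset -q --mixed HEAD
mixed_staged=$(git diff --cached --name-only)  # nothing staged any more
mixed_dirty=$(git diff --name-only)            # f.txt shows up as modified
mixed_file=$(cat f.txt)                        # ccc

# --hard: index AND working directory are reset, local changes are gone
git reset -q --hard HEAD~1
hard_head=$(git log -1 --format=%s)            # one
hard_file=$(cat f.txt)                         # a
echo "$soft_head / $mixed_file / $hard_head $hard_file"
```

After `--soft` a plain `git commit` would recreate the undone commit, after `--mixed` a `git add` is needed first, and after `--hard` there is nothing left to add.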

## Reset -- path/filename

Like checkout, reset also has a mode where it works on paths to files. This resets the entries for the specified files in the index.

git reset [commit] -- path/filename

Hence, this is the opposite of add.

## Summary /1

### Summary /2

It is possible to *skip* the staging area. *Both are modified!*

## Merge

Finally, once we have several alternative realities in different branches, we want to "*merge*" them into one. The merge command has the job of creating a new commit with more than one parent.¹

Assume the following history exists and the 
current branch is "master":

      A---B---C topic
     /
D---E---F---G master

Then "git merge topic" will replay the changes made 
on the topic branch [...] and record the result 
in a new commit [...]

      A---B---C topic
     /         \
D---E---F---G---H master
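A scaled-down version of the diagram above can be played through in a throwaway repository. The file names are made up, and only the commits E, A, G and H are recreated. The merge commit ends up with two parents and with the files of both branches.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Gitti Gitian"
git config user.email "gitti@gitiscool.com"

echo base > base.txt; git add base.txt; git commit -q -m "E"

git checkout -q -b topic
echo a > a.txt; git add a.txt; git commit -q -m "A"

git checkout -q master
echo g > g.txt; git add g.txt; git commit -q -m "G"

git merge -q -m "H: merge topic" topic

# %P lists the parent hashes of a commit, separated by spaces
nparents=$(git log -1 --format=%P HEAD | wc -w | tr -d ' ')
second_parent=$(git log -1 --format=%s 'HEAD^2')
echo "H has $nparents parents, the second one is $second_parent"
ls
```

Since both sides touched different files, Git merges them without asking; a conflict only arises when both sides change the same lines.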

## P2P

Ok wait. Wasn't Git about *collaboration*?

Well, 99% happens locally.

The missing piece is a way to send and receive commits as part of branches, based on given references.

## Hosting

Lonely Git repositories are sad repositories. Share your project with collaborators via hosting services!

Any server with an sshd server and Git installed
GitHub - very popular, issue tracker, hosting, wiki
Gitorious - open-source project hosting w/ collaboration tools.
Bitbucket - similar to GitHub, also offers free non-public hosting
Google Code - issue tracker and wiki
CodePlex - even Microsoft talks Git
... or a shared directory initialized with git init --bare usually ending in *.git.
... or a bundle for the paranoid on an isolated machine :-)

## Remote

git remote manages all remote servers. git remote add [name] [URL] adds a remote server. And you can list remotes:

$ git remote -v show
origin  git@github.com:.../...git (fetch)
origin  git@github.com:.../...git (push)
uni ssh://name@login/.../repo (fetch)
uni ssh://name@login/.../repo (push)

## origin/master

When you do a git clone git://.../, Git has already set you up with a "*remote*" called **origin** pointing to the given URL. Also, your main branch is usually called **master**. Hence, the reference to the master branch on the server is origin/master. (But you can call them as you like.)

## pull/push and fetch

* push uploads commits that aren't on the server to it
* fetch downloads new commits which you don't have
* pull does a fetch and also merges with your respective branch

## Tracking

Finally, you can tell Git which of your local branches correspond to which ones of the remote repository. This is called "*tracking*" and can be done via

git branch --set-upstream-to=origin/master master

Also, when pushing to a remote, you can do this via the -u switch: git push -u joe featureX.

In case of total confusion, this is also in .git/config.

## Check tracking

$ git branch -vv
* master cf36560 [origin/master] readme
  other  d39c6f1 [origin/other] init "other" branch, ...
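The whole round trip can be rehearsed without any network. The sketch below uses a bare repository in a temporary directory as a stand-in for the server, plus two clones; the names `alice` and `bob` and all paths are hypothetical. It shows push with -u, fetch, and a pull that relies on the tracking information.

```shell
set -eu
base=$(mktemp -d)

# The "server": a bare repository whose default branch is master
git init -q --bare "$base/server.git"
git --git-dir="$base/server.git" symbolic-ref HEAD refs/heads/master

# Alice clones the (still empty) server and pushes the first commit
git clone -q "$base/server.git" "$base/alice" 2>/dev/null   # hides the "empty repository" warning
cd "$base/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"; git config user.email "alice@example.com"
echo "hello" > README
git add README
git commit -q -m "readme"
git push -q -u origin master              # upload and let master track origin/master
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')

# Bob clones, adds a commit and pushes it
git clone -q "$base/server.git" "$base/bob"
cd "$base/bob"
git config user.name "Bob"; git config user.email "bob@example.com"
echo "more" >> README
git commit -q -am "more"
git push -q origin master

# Alice only learns about it after fetching
cd "$base/alice"
git fetch -q origin
behind=$(git rev-list --count master..origin/master)
git pull -q --ff-only                     # knows what to merge thanks to tracking
git branch -vv
```

Between the fetch and the pull, Alice's master is one commit behind origin/master; afterwards both point to the same commit.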

## Outdated references

Git could be unaware of new changes on the server. That usually causes headaches! In that case, do a git remote update to get the references straight. Then, e.g. git log --decorate shows the remote references at the correct positions!

## Tags

git tag name [commit] creates a named reference to a specific commit. Tags are very similar to a branch reference, but stay fixed. Usually, a tag is used to name a specific release version, e.g. v2.7.1. Checking out a tag brings you into the detached HEAD state. It's also possible to cryptographically sign a tag.

## Quick Undo

One way to do a quick undo: git reset [--hard] HEAD^

## Reflog

As a safeguard, Git stores the entire journey of the HEAD. This "*reflog*" prevents Git from forgetting unreachable commits for at least a few weeks. This means the quick undo can be undone! git reflog lists the relevant hashes. Use reset --hard to reset everything back to how it was. Note: This reflog is only local to your repository!
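A small sketch of undoing an undo, in a throwaway repository with made-up file and commit names: the last commit is thrown away with a hard reset and then brought back via the reflog entry `HEAD@{1}`, which names the position HEAD had one step earlier.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Gitti Gitian"
git config user.email "gitti@gitiscool.com"

echo "a" > f.txt; git add f.txt; git commit -q -m "good"
echo "bb" >> f.txt; git commit -q -am "also good"
before=$(git rev-parse HEAD)

git reset -q --hard HEAD^              # quick undo: "also good" is gone from the branch
undone=$(git log -1 --format=%s)       # good

git reflog -2                          # the journey of HEAD, newest entry first

git reset -q --hard 'HEAD@{1}'         # back to where HEAD was before the reset
after=$(git rev-parse HEAD)
lines=$(wc -l < f.txt | tr -d ' ')
echo "restored: $(git log -1 --format=%s)"
```

The hash printed by git reflog could be used instead of `HEAD@{1}`; both name the same commit here.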

## Tree

$ git ls-tree HEAD
100644 blob aa5fa7141766f8cf11d1d25d39cd04a2ffee7a95  .gitignore
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391  .nojekyll
100644 blob 6fed1c32d012538ea783b0eb8d13a643cc981666  LICENSE
100644 blob 27ed242ad6ab8bbb19c9b6182ee47b3939a15db0  README.md
040000 tree 96d0c04c1d308c31faacfa26eeee2b4b256be116  css
100644 blob a9aa16404dd4d78493208929095137097f133975  index.html
040000 tree dd1ae6043bf368f504b2aff6e3b936607bf0be07  js
040000 tree 6e46172d37ce36c570fd65458eefc02cee58c8b2  lib
100644 blob 86578fe468e926513e2ca87d4c5792e333b4b58d  package.json
040000 tree 04016b53decd21110649ef3cfdc936ca7849ddce  plugin
040000 tree 3f2ac677462a553dbcfe81088d8b276b52c50f29  res

## Blob by Hash

$ git show 27ed242ad6ab8bbb19c9b6182ee47b3939a15db0
# YAGT - Yet another Git talk

This is a fork of [reveal.js](https://github.com/hakimel/reveal.js)
with the goal to build a talk about [Git](http://www.git-scm.org/)
and hence explaining it.

[...]

## Commit

$ git show eec35ee
commit eec35ee92b9e3d1f97aecd49b6d884db82604c47
Author: Harald Schilly <harald.schilly@gmail.com>
Date:   Sun Oct 7 01:19:58 2012 +0200

    more links

diff --git a/index.html b/index.html
index 97f4792..1744916 100644
--- a/index.html
+++ b/index.html
@@ -502,21 +502,33 @@ It is possible to *skip* the staging area. *Both are modified!*

[...]

## Cherry Picking

**Copy** one changeset from one branch into another one. This is useful to extract changes for one particular aspect into a separate branch.

$ git checkout -b fix666 [start_point]
$ git cherry-pick 5d3e1b6
Finished one cherry-pick.
[fix666 e458a9b] Bug fix for Issue #666
 3 files changed, 36 insertions(+), 3 deletions(-)

Note: Later, merging branches that contain the same commits is no problem, since they produce the same content!

## Rebasing

Basically, this resets to an earlier revision and replays the history "in a different way" by consecutive cherry picks. Imagine you are on a side-branch of master. Now, you want to move this side-branch on top of the last commit in master:

$ git checkout side-branch
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: ...
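The effect is easiest to see by comparing the branch before and after. This is a sketch in a throwaway repository; the file names and commit messages are made up. It rebases side-branch onto master and checks that the commit got a new hash and a new parent.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Gitti Gitian"
git config user.email "gitti@gitiscool.com"

echo base > base.txt; git add base.txt; git commit -q -m "base"

git checkout -q -b side-branch
echo s > s.txt; git add s.txt; git commit -q -m "side work"
old=$(git rev-parse HEAD)

git checkout -q master
echo m > m.txt; git add m.txt; git commit -q -m "master work"

git checkout -q side-branch
git rebase -q master                       # replay "side work" on top of master

new=$(git rev-parse HEAD)
parent=$(git log -1 --format=%s 'HEAD^')   # now "master work" instead of "base"
ahead=$(git rev-list --count master..side-branch)
echo "old: $old"
echo "new: $new (on top of: $parent)"
```

The content of "side work" is the same, but it is a new commit with a new hash. That is why rebasing counts as rewriting history.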

## Interactive Rebase

git rebase -i HEAD~[n]

It lets you:

reorder commits
edit an earlier one (e.g. stash your current work before rebasing and amend it earlier!)
get rid of them
squash some together

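## Squashing

This is the art of combining several consecutive commits into just one. This can be done via git rebase -i HEAD~[n] or

# detach HEAD at the last commit that should be squashed
git checkout END_SHA1
# move back to the first one, keeping index and files as of END_SHA1
git reset --soft START_SHA1
# fold everything into that first commit
git commit --amend -m "squashed history"
TARGET=$(git rev-parse HEAD)
git checkout master
# replay the commits after END_SHA1 on top of the squashed commit
git rebase --onto "$TARGET" END_SHA1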

[credits for this madness](http://stackoverflow.com/questions/598672/git-how-to-squash-the-first-two-commits)

Talk

Git

Git
Git Book

Resources

Fork Me: yagt@github

License: CC BY-SA

Based on work by Hakim El Hattab and Mark Lodato.
Details: git blame ;-)