About internal Git objects
For the purposes of this training, you don’t need to know anything about the internals of Git!
Git objects
Inside the .git/ directory, the actual data is stored in the objects/
directory, indexed by object hash. The first two characters of each hash make
the name of the directory, and the rest of the hash makes a filename inside the
directory.
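As a minimal sketch of that layout, the path of an object file can be derived from its hash with ordinary shell tools. The hash is the example commit used later in this section; the variable names are only illustrative. This applies to "loose" objects only, since Git may also store objects inside packfiles, where no such file exists.

```shell
# Full hash of the example commit used in this training.
hash=dff557a5cdaa8b7a826d74ce82fd65c878d2b9bd

# The first two characters name the directory; the rest names the file.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)

object_path=".git/objects/$dir/$file"
echo "$object_path"
```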
Git objects are used to represent the changing state of a repository:
- commit objects represent a unit of change. Each commit contains a tree and a
  parent, which is a reference to another commit object. The full history of a
  Git repository is built of these parent relationships.
- tree objects are snapshots of directory structure. Each tree contains at
  least one other tree or blob.
- blob objects are snapshots of file contents. Each blob contains the contents
  of a file at a given point in time.
Exploring some objects
In this repository, the dff557a5cdaa8b7a826d74ce82fd65c878d2b9bd commit is
represented by the file below. The first 7 characters, dff557a, can be used
with git as a short representation of the full commit hash.
$ file .git/objects/df/f557a5cdaa8b7a826d74ce82fd65c878d2b9bd
.git/objects/df/f557a5cdaa8b7a826d74ce82fd65c878d2b9bd: zlib compressed data
As shown by the file utility above, this data is compressed, so we won’t be
able to read it normally:
$ cat !:1
cat .git/objects/df/f557a5cdaa8b7a826d74ce82fd65c878d2b9bd
xAn @Ѭ9\`E+q
!*]Ԇ)o(#z=hOr6G!Qܶ?I~m[pC֊ޓ"
,$uk;k[Oo
We can decompress it with generic tools:
$ zlib-flate -uncompress < .git/objects/df/f557a5cdaa8b7a826d74ce82fd65c878d2b9bd
commit 248tree be8302d77917a381cd91bb04901eace1d542ae47
parent cb775ec187cc757e55f4c9255a12ebb2b97b8846
author Matt Fisher <matt.fisher@nsidc.org> 1647034766 -0700
committer Matt Fisher <matt.fisher@nsidc.org> 1647034766 -0700
Add SSH key instructions link
But git also provides a utility we can use to display the object
corresponding to a hash: git cat-file. We will use this utility going
forward. Just know that under the hood, git cat-file is only doing the above:
finding the object corresponding to a hash and decompressing it for us to
read.
A commit object
A commit object has two pieces of data: a tree, which is its own type of
object that we’ll explore later, and a parent. (Keep in mind that “objects”
are data and are referenced by hash; everything else is metadata.) Finally,
the commit contains metadata like the author (a name, email, and a UNIX
timestamp), committer, and commit message.
The commit object dff557a can be inspected with git cat-file, using -p to
“pretty-print” the contents of the object and -t to show the type of the
object. Additionally, -s can show the size in bytes. Further, we can use the
date utility to convert the UNIX timestamp to a human-readable one:
$ git cat-file -t dff557a
commit
$ git cat-file -s dff557a
248
$ git cat-file -p dff557a
tree be8302d77917a381cd91bb04901eace1d542ae47
parent cb775ec187cc757e55f4c9255a12ebb2b97b8846
author Matt Fisher <matt.fisher@nsidc.org> 1647034766 -0700
committer Matt Fisher <matt.fisher@nsidc.org> 1647034766 -0700
Add SSH key instructions link
$ date -d @1647034766
Fri 11 Mar 2022 02:39:26 PM MST
The parent is just another commit object. The parent relationship is what
establishes the structure of the git history. Just like with humans, parents
are ancestors, or past versions. Children are descendants, or future versions.
Each commit can have zero or more parents: zero parents for the initial (root)
commit, one parent for a normal commit, and multiple parents for a commit that
results from a merge of two or more branches. A commit can also have zero or
more children.
By looking at the parent object, we can see that Matt Fisher first performed
a change called “Extract planning to its own doc”, then another operation
called “Add SSH key instructions link”. We can continue to follow this parent
relationship all the way to the beginning of the repository if we want!
$ git cat-file -t cb775ec187cc757e55f4c9255a12ebb2b97b8846
commit
$ git cat-file -p cb775ec187cc757e55f4c9255a12ebb2b97b8846
tree 81e6377718f49156f99e60fcac725fd45bc64519
parent d913a4784fd68d13e6bc04d61f374f336ab090ea
author Matt Fisher <matt.fisher@nsidc.org> 1647034564 -0700
committer Matt Fisher <matt.fisher@nsidc.org> 1647034564 -0700
Extract planning to its own doc
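Following the parent relationship by hand can also be scripted. The sketch below is illustrative only: it builds a throwaway repository (the file name, commit messages, and identity are made up for the example) and then walks from the newest commit back to the root by reading the first parent line of each commit object.

```shell
# Build a throwaway repository with three commits (hypothetical example data).
repo=$(mktemp -d)
git -C "$repo" init -q
for n in 1 2 3; do
    echo "change $n" > "$repo/file.txt"
    git -C "$repo" add file.txt
    git -C "$repo" -c user.name="Example" -c user.email="example@example.com" \
        commit -q -m "Commit $n"
done

# Start at the newest commit and follow first parents back to the root.
count=0
current=$(git -C "$repo" rev-parse HEAD)
while [ -n "$current" ]; do
    subject=$(git -C "$repo" log -1 --format=%s "$current")
    echo "$current $subject"
    count=$((count + 1))
    last_subject=$subject
    # Read only the header lines (stop at the first blank line) so that a
    # commit message starting with "parent " is never mistaken for a parent.
    current=$(git -C "$repo" cat-file -p "$current" \
        | sed -n -e '/^$/q' -e 's/^parent //p' | head -n 1)
done
echo "visited $count commits"
```

In everyday use, git log or git rev-list --first-parent HEAD performs this walk for us; the loop only shows that the history is nothing more than a chain of parent references.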
A tree object
The tree object from the previous section can be abbreviated as 81e6377. A
tree object represents the structure of the files in the repo at a given
point in time. The “root” tree we’re looking at contains a tree for each
top-level directory in the repository, as well as a blob for each top-level
file.
$ git cat-file -t 81e6377
tree
$ git cat-file -p 81e6377
100644 blob 29791eda0b90a331e41d4d3992d089ae6d764748 README.md
100644 blob 32303047543dfb4f77a804070f82a699aa442581 _planning.md
040000 tree 9aa2a1611c62dd36db2ec6329021e75a5e586f2d doc
This tells us that at the time of commit cb775ec (which we learned above was
March 11th), the repository contained files (blobs) named README.md and
_planning.md, and a directory (tree) called doc. We can see what was inside
the doc tree and thus explore the entire contents of the repository at this
point in time:
$ git cat-file -p 9aa2a16
100644 blob bda1a223d98790c8710ebbd89b2571f70e8b28f5 .gitconfig-example
100644 blob 5b549963deab12a43c42d33e416b19a19557f684 git-cheatsheet.md
100644 blob e9753d38cdbfdceff24d97da93748ae8171cae32 glossary.md
100644 blob 67fa85e89ced7979424e2cf94755a7c92c93032a learn-more.md
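Descending into each tree by hand can be automated with a small recursive function. This is a rough sketch under stated assumptions: the repository, its two files, and the walk_tree helper are invented for illustration, and the function simply calls git cat-file -p on every tree it encounters.

```shell
# Build a throwaway repository with one top-level file and one directory
# (hypothetical example data).
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir "$repo/doc"
echo "# Example" > "$repo/README.md"
echo "terms" > "$repo/doc/glossary.md"
git -C "$repo" add README.md doc/glossary.md
git -C "$repo" -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Initial commit"

# walk_tree TREE_HASH PREFIX: print the path of every blob under a tree.
walk_tree() {
    git -C "$repo" cat-file -p "$1" | while read -r mode type hash name; do
        if [ "$type" = "tree" ]; then
            # A directory: descend into it, extending the path prefix.
            walk_tree "$hash" "$2$name/"
        else
            # A file (blob): print its full path.
            echo "$2$name"
        fi
    done
}

# HEAD^{tree} resolves to the root tree of the current commit.
root_tree=$(git -C "$repo" rev-parse 'HEAD^{tree}')
paths=$(walk_tree "$root_tree" "")
echo "$paths"
```

Git ships this as git ls-tree -r --name-only HEAD, which should print the same list of paths.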
A blob object
Finally, a blob is just a snapshot of a file. The full contents of the file
at a given time are contained in the blob.
$ git cat-file -p 29791ed
# Git Training
![git logo](https://git-scm.com/images/logos/2color-lightbg@2x.png)
Learn to use `git`!
<...output truncated...>
Hashes
Git’s hashes are (usually) SHA-1 hashes calculated on the type, size, and
contents of an object. The hashes are deterministic, meaning the hash will
always be the same for the same content. They are not random as they appear!
We can manually re-generate the hash of commit dff557a like so:
$ content=$(git cat-file -p dff557a)
$ size=$(git cat-file -s dff557a)
$ type=$(git cat-file -t dff557a)
$ echo $content
tree be8302d77917a381cd91bb04901eace1d542ae47 parent cb775ec187cc757e55f4c9255a12ebb2b97b8846 author Matt Fisher <matt.fisher@nsidc.org> 1647034766 -0700 committer Matt Fisher <matt.fisher@nsidc.org> 1647034766 -0700 Add SSH key instructions link
$ echo $size
248
$ echo $type
commit
$ echo -e "$type $size\x00$content" | sha1sum
dff557a5cdaa8b7a826d74ce82fd65c878d2b9bd -
In the end, we have an exact match!
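The same recipe works for other object types. As a small sketch that does not depend on this repository, the example below hashes a blob whose contents are the made-up text "hello" followed by a newline, then compares the result with git hash-object. It assumes the default SHA-1 object format and uses printf rather than echo -e so that backslashes in the content are never interpreted.

```shell
# Example file contents (hypothetical); printf adds the trailing newline.
content='hello'

# Object size in bytes, including the trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash the header "blob <size>" plus a NUL byte, followed by the contents.
manual=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } \
    | sha1sum | cut -d' ' -f1)

# Ask git to hash the same contents as a blob.
git_hash=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual"
echo "git:    $git_hash"
```

Both lines should show the same hash. Note that this shortcut of reusing printed output only works when the printed form is the raw object content, as it is for blobs and commits; git cat-file -p reformats tree objects, so their hash cannot be reproduced from that output.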