Git Under the Hood: How It Actually Works
Git’s Core Philosophy
Git isn’t about “versions of files”; it’s about snapshots of content.
Many version control systems (like SVN) store differences: the deltas between file versions.
Git stores snapshots. Each commit is a complete snapshot of your project’s state, stored cleverly: a file that hasn’t changed isn’t stored again, because identical content always maps to the same object.
The .git Directory
When you run git init, Git creates a hidden .git folder, and that folder is the entire repository. Your working directory is just a convenient, editable layer on top of it.
Inside .git, you’ll find key components:
.git/
├── config
├── description
├── HEAD
├── index
├── info
│   ├── exclude
│   └── refs
├── logs
│   ├── HEAD
│   └── refs
├── objects
│   ├── info
│   └── pack
├── packed-refs
└── refs
    ├── heads
    └── tags
Let’s decode the most important ones.
The Object Database: Git’s Heart
All of Git’s content (file data, directories, commits, annotated tags) is stored as objects inside .git/objects/.
There are four types of objects:
- Blob: represents file contents
- Tree: represents directories (contains references to blobs and subtrees)
- Commit: represents a snapshot with metadata (author, parent, message, etc.)
- Tag: an annotated tag, i.e. a named pointer to another object (usually a commit) with its own metadata
Git stores all of these as content-addressable objects, meaning:
An object’s name is the SHA-1 hash of its contents, prefixed with a small header stating the object’s type and size.
Blob: The File’s DNA
When you run git add file.txt, Git compresses the file contents with zlib and stores them as a blob object.
You can compute a file’s blob hash with:
git hash-object file.txt
This command outputs something like:
9562e8b2802fe9b2ea5764741c19f37847cb8acf
This is a SHA-1 hash. Inside .git/objects, Git uses the first two hex characters as a directory name and the remaining 38 as the filename:
.git/objects/95/62e8b2802fe9b2ea5764741c19f37847cb8acf
So a blob holds file contents, not filenames. Where is the link between names and blobs? That’s what trees are for.
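To make the hashing rule concrete, here is a minimal Python sketch (standard library only) that reproduces what git hash-object computes in a default SHA-1 repository. The hash covers a short header (the word blob, the content length in bytes, a NUL byte) followed by the content itself. The function names are illustrative, not part of Git.

```python
import hashlib
import zlib


def blob_object(data: bytes) -> bytes:
    """Raw blob object: header 'blob <size>' plus a NUL byte, then the content."""
    return b"blob " + str(len(data)).encode() + b"\x00" + data


def hash_blob(data: bytes) -> str:
    """SHA-1 of header + content: the id git hash-object prints in a SHA-1 repo."""
    return hashlib.sha1(blob_object(data)).hexdigest()


def loose_object_path(sha: str) -> str:
    """First two hex digits name the directory, the other 38 name the file."""
    return f".git/objects/{sha[:2]}/{sha[2:]}"


def loose_object_bytes(data: bytes) -> bytes:
    """What sits on disk for a loose object: the zlib-compressed raw object."""
    return zlib.compress(blob_object(data))


if __name__ == "__main__":
    sha = hash_blob(b"hello\n")
    print(sha)                     # ce013625030ba8dba906f756967f9e9ca394464a
    print(loose_object_path(sha))  # .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

You can cross-check the first value with echo hello | git hash-object --stdin. Adding -w to git hash-object writes the object into .git/objects instead of only printing its hash.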
Tree: The Directory Blueprint
A tree object represents the structure of a directory. Each entry in a tree object maps a filename to:
- A blob/tree hash
- File mode
- Type (blob or tree)
You can see a tree’s contents with:
git cat-file -p <tree-hash>
100644 blob b2d8968efa87a1845f3a94ca73618a194c2304a1 README.md
040000 tree dddd84a1983a6438c2f4831b012c214bb17df43e src
So a tree is essentially a directory listing that recursively points to blobs and subtrees.
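A tree’s raw form is more compact than what git cat-file -p prints. The sketch below assumes the classic SHA-1 object format and builds a raw tree body in Python; tree_body and hash_object are hypothetical helper names. It follows two details of Git’s format: directory entries use mode 40000 in the raw object, and entries are sorted byte-wise as though directory names ended in a slash.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """Git object id: SHA-1 over '<type> <size>' + NUL + body."""
    return hashlib.sha1(f"{obj_type} {len(body)}".encode() + b"\x00" + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_sha) entries into a raw tree body.

    Each entry is '<mode> <name>' + NUL + the 20-byte binary SHA-1.
    Directories use mode '40000' in the raw object (git cat-file -p pads it
    to 040000), and entries are sorted as if directory names ended in '/'.
    """
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, sha in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(sha)
    return body


if __name__ == "__main__":
    # The two entries from the listing above.
    body = tree_body([
        ("100644", "README.md", "b2d8968efa87a1845f3a94ca73618a194c2304a1"),
        ("40000", "src", "dddd84a1983a6438c2f4831b012c214bb17df43e"),
    ])
    print(hash_object("tree", body))
```

Because a tree stores each child’s hash, editing one file changes its blob hash, which in turn changes the hash of every tree above it.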
Commit: The Snapshot Metadata
A commit object points to:
- One tree (the root directory snapshot)
- Zero or more parent commits
- Author and committer metadata
- Commit message
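Commits are stored as plain text. The following Python sketch builds the raw body of a commit and hashes it the same way as blobs and trees. commit_body, hash_commit, and the Jane Doe identity are made-up names for illustration, and optional headers such as gpgsig (signatures) are left out.

```python
import hashlib


def commit_body(tree, parents, author, committer, message):
    """Raw commit text: header lines, a blank line, then the message.

    author/committer use Git's '<name> <email> <unix-seconds> <utc-offset>' form.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit, 2+ for a merge
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode()


def hash_commit(body: bytes) -> str:
    """Object id: SHA-1 over 'commit <size>' + NUL + body."""
    return hashlib.sha1(b"commit " + str(len(body)).encode() + b"\x00" + body).hexdigest()


if __name__ == "__main__":
    who = "Jane Doe <jane@example.com> 1700000000 +0000"  # hypothetical identity
    empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
    root = commit_body(empty_tree, [], who, who, "initial commit\n")
    print(root.decode())
    print(hash_commit(root))
```

Because the parent’s hash is part of the body, changing any ancestor changes this commit’s hash too.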
You can view a commit object like this:
git cat-file -p <commit-hash>
tree 421da6484e3079368699a902463f9ac09898a71b
parent 33f9d5b4d4998b2e12b77bac1f6dad96d901030b
author admincodes7 <arjunbanur27@gmail.com> 1761184353 +0530
committer admincodes7 <arjunbanur27@gmail.com> 1761184353 +0530

Just to see parent of this commit
The Index: Git’s Staging Area
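When you run git add, Git doesn’t commit yet: it writes a blob and updates the index (a binary file stored at .git/index). The index maps each path to a blob hash, along with the file mode and cached file metadata:
<path> → <blob-hash>
So when you run git commit, Git: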
- Writes tree objects from the index (one per directory, including the root).
- Writes a commit object pointing to that tree (and to its parent).
That’s it, no magic.
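The index is flat (one entry per file path), while trees are nested. Here is a minimal sketch of how a flat path map can be turned into one tree per directory, in the spirit of git write-tree. write_tree and store_object are hypothetical names, the object store is just a Python dict, and the stat data that real index entries carry is omitted.

```python
import hashlib


def store_object(obj_type, body, store):
    """Hash '<type> <size>' + NUL + body, keep it in a dict store, return the id."""
    raw = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    sha = hashlib.sha1(raw).hexdigest()
    store[sha] = raw
    return sha


def write_tree(index, store, prefix=""):
    """Turn a flat index {'dir/file': (mode, blob_sha)} into nested tree objects.

    Subdirectory trees are written first (recursively), then the tree for
    `prefix`; the return value is that tree's id (the root tree when prefix='').
    """
    files, subdirs = {}, set()
    for path, (mode, sha) in index.items():
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if "/" in rest:
            subdirs.add(rest.split("/", 1)[0])  # belongs to a subtree
        else:
            files[rest] = (mode, sha)           # lives directly in this directory
    entries = [(mode, name, sha) for name, (mode, sha) in files.items()]
    for name in subdirs:
        entries.append(("40000", name, write_tree(index, store, prefix + name + "/")))
    # Git's order: byte-wise by name, with directory names compared as 'name/'.
    entries.sort(key=lambda e: (e[1] + "/" if e[0] == "40000" else e[1]).encode())
    body = b"".join(f"{m} {n}".encode() + b"\x00" + bytes.fromhex(s) for m, n, s in entries)
    return store_object("tree", body, store)


if __name__ == "__main__":
    objects = {}
    blob = "ce013625030ba8dba906f756967f9e9ca394464a"  # blob id of "hello\n"
    index = {"README.md": ("100644", blob), "src/main.c": ("100644", blob)}
    root = write_tree(index, objects)
    print(root, len(objects))  # root tree id, then the number of trees written
```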
Branches and Refs = Pointers
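A branch in Git is simply a file that stores a commit hash (or, once refs have been packed by git pack-refs or git gc, a line in .git/packed-refs).
.git/refs/heads/main → a1b2c3d4...
HEAD is another file, usually holding a symbolic reference to the current branch:
.git/HEAD → ref: refs/heads/main
So when you run git commit, Git: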
- Creates a new commit object.
- Moves the branch’s ref to point to it.
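- Leaves HEAD itself unchanged: it still contains ref: refs/heads/main, so it follows the branch automatically. (In detached HEAD state, HEAD holds a commit hash directly, so Git updates HEAD itself.)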
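Because refs are just small text files, resolving HEAD to a commit takes only a few lines. This Python sketch assumes a standard .git layout and handles only full ref names such as refs/heads/main; real Git also expands short names like main and checks more locations. resolve_ref is a hypothetical helper, and git rev-parse HEAD is the supported way to do this.

```python
from pathlib import Path
from typing import Optional


def resolve_ref(git_dir: Path, ref: str = "HEAD") -> Optional[str]:
    """Follow 'ref: ...' indirections until a commit hash is found.

    Looks for a loose ref file first (e.g. .git/refs/heads/main), then falls
    back to .git/packed-refs. Only full ref names are handled here.
    """
    for _ in range(10):  # guard against symbolic-ref cycles
        path = git_dir / ref
        if path.is_file():
            content = path.read_text().strip()
            if content.startswith("ref: "):
                ref = content[len("ref: "):]   # symbolic ref: keep following
                continue
            return content                     # direct ref: a commit hash
        packed = git_dir / "packed-refs"
        if packed.is_file():
            for line in packed.read_text().splitlines():
                if line.startswith(("#", "^")):  # header line / peeled tag line
                    continue
                sha, _, name = line.partition(" ")
                if name == ref:
                    return sha
        return None
    raise ValueError("too many levels of symbolic refs")


if __name__ == "__main__":
    print(resolve_ref(Path(".git")))  # run from a repository's top-level directory
```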
Tags (Named Commits)
A tag is a named reference to a commit. A lightweight tag is just a ref under refs/tags/ holding a commit hash. An annotated tag is a full tag object with its own metadata; git cat-file -p v1.0 shows something like:
object 33d906ce59d2e29f647bb182cda61eaa22f12c72
type commit
tag v1.0
tagger admincodes7 <arjunbanur27@gmail.com> 1761185097 +0530

Release version 1.0
Git Add, Commit, Checkout
Let’s trace one complete flow:
git add
- Hash each file → store as blob.
- Update index with <path → blob-hash>.
git commit
- Create a tree from index.
- Create a commit object with:
  - tree = <tree-hash>
  - parent = <previous-commit>
- Move HEAD’s ref (branch) to new commit.
git checkout
- Read the commit → tree → blobs.
- Replace working directory files with those blobs, update the index to match, and point HEAD at the checked-out branch (or commit).
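The whole add → commit → checkout loop fits in a small in-memory model. This is a simplified sketch, not how Git is implemented: MiniRepo is hypothetical, trees are flat (no subdirectories), and commits omit the author and committer lines. It does show the deduplication mentioned at the start, since a file that didn’t change between commits is stored as a single blob.

```python
import hashlib


class MiniRepo:
    """Hypothetical in-memory model of the add -> commit -> checkout flow."""

    def __init__(self):
        self.objects = {}  # sha -> (type, body)
        self.index = {}    # path -> blob sha (the staging area)
        self.head = None   # current commit sha (stands in for refs/heads/main)

    def _store(self, obj_type: str, body: bytes) -> str:
        sha = hashlib.sha1(f"{obj_type} {len(body)}".encode() + b"\x00" + body).hexdigest()
        self.objects[sha] = (obj_type, body)  # same content -> same key, stored once
        return sha

    def add(self, path: str, content: bytes) -> None:
        self.index[path] = self._store("blob", content)  # hash, store, update index

    def commit(self, message: str) -> str:
        tree_body = b"".join(
            f"100644 {p}".encode() + b"\x00" + bytes.fromhex(s)
            for p, s in sorted(self.index.items()))
        tree = self._store("tree", tree_body)               # tree from the index
        header = f"tree {tree}\n" + (f"parent {self.head}\n" if self.head else "")
        self.head = self._store("commit", (header + "\n" + message + "\n").encode())
        return self.head                                    # branch moves to the new commit

    def checkout(self, commit_sha: str) -> dict:
        """commit -> tree -> blobs; returns the files a checkout would write."""
        _, body = self.objects[commit_sha]
        tree_sha = body.decode().split("\n", 1)[0].split(" ")[1]
        _, tree = self.objects[tree_sha]
        files, i = {}, 0
        while i < len(tree):
            nul = tree.index(b"\x00", i)
            name = tree[i:nul].split(b" ", 1)[1].decode()
            sha = tree[nul + 1:nul + 21].hex()
            files[name] = self.objects[sha][1]
            i = nul + 21
        return files


if __name__ == "__main__":
    repo = MiniRepo()
    repo.add("a.txt", b"one\n")
    repo.add("b.txt", b"same\n")
    first = repo.commit("first")
    repo.add("a.txt", b"two\n")
    second = repo.commit("second")
    print(repo.checkout(first))   # {'a.txt': b'one\n', 'b.txt': b'same\n'}
    blobs = [sha for sha, (kind, _) in repo.objects.items() if kind == "blob"]
    print(len(blobs))             # 3: b.txt's blob is shared by both commits
```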
Git Merge: Combining Histories
A merge commit has two (or more) parents:
parent 1a410ef...
parent b7d34e2...
To build it, Git finds the common ancestor (the merge base), performs a three-way merge of base, ours, and theirs, and creates a new commit that has both branch tips as parents.
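If conflicts arise, Git stops before creating the merge commit: it writes conflict markers into the affected working directory files and records the base, ours, and theirs versions of each conflicted file as stages 1, 2, and 3 in the index. Existing blobs are never edited, because objects are immutable; your resolved version becomes a new blob when you git add it.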
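Both steps can be sketched over a toy history. The Python below is illustrative only: commits are strings, parents maps each commit to its parent list, and trees are flat {path: blob_hash} dicts. merge_bases follows the usual definition of a best common ancestor (one that no other common ancestor descends from); three_way applies the file-level rule that lets Git settle most paths just by comparing hashes, leaving genuine conflicts for a line-level merge.

```python
def ancestors(commit, parents):
    """Every commit reachable from `commit` (inclusive) via parent links."""
    seen, stack = {commit}, [commit]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def merge_bases(a, b, parents):
    """Common ancestors that no other common ancestor descends from."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(c in ancestors(o, parents) for o in common if o != c)}


def three_way(base, ours, theirs):
    """File-level three-way rule over flat {path: blob_sha} trees (missing = deleted)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o              # identical on both sides (or deleted on both)
        elif o == b:
            result = t              # only "theirs" changed it
        elif t == b:
            result = o              # only "ours" changed it
        else:
            conflicts.append(path)  # both changed: needs a line-level merge
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


if __name__ == "__main__":
    # A <- B <- C (main) and B <- D <- E (feature)
    parents = {"B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
    print(merge_bases("C", "E", parents))  # {'B'}
    base = {"README.md": "b1", "app.py": "b2"}
    ours = {"README.md": "b1", "app.py": "b3"}    # we edited app.py
    theirs = {"README.md": "b4", "app.py": "b2"}  # they edited README.md
    print(three_way(base, ours, theirs))  # ({'README.md': 'b4', 'app.py': 'b3'}, [])
```

In criss-cross histories there can be more than one merge base; Git’s default strategy then merges those bases into a virtual base before merging the tips.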
Git Stash
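When you run git stash, Git actually creates (at least) two commits:
- One commit for your index (staged) changes.
- Another for your working directory changes, whose parents are the current HEAD commit and the index commit.
With git stash -u, a third commit records untracked files.
Then it stores a reference to the working-directory commit under:
refs/stash
You can view it using:
git cat-file -p refs/stash
So stash isn’t some temporary buffer; it’s ordinary commits hidden behind a ref. Older stashes are kept in the reflog of refs/stash, which is why they show up as stash@{1}, stash@{2}, and so on.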
Git Garbage Collection
When you delete a branch or rewrite history, some commits become unreachable.
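Git doesn’t delete them immediately: the objects stay in .git/objects, and reflog entries (such as HEAD’s) often keep them reachable for a while.
Running git gc (garbage collection) packs reachable objects into .git/objects/pack/ files and prunes unreachable loose objects once they are older than a grace period (two weeks by default, controlled by gc.pruneExpire).
So even after “deleting,” your data survives until cleanup, which is why git reflog can often rescue “lost” commits.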
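Deciding what counts as garbage is a reachability question. Here is a minimal mark-and-sweep sketch in Python, assuming a toy object graph where each id maps to the ids it points to (commit → tree and parents, tree → entries). reachable and prune_candidates are hypothetical names; real git gc also treats reflogs, the index, and other refs as roots, and honors the prune grace period.

```python
def reachable(roots, links):
    """Mark phase: every object id reachable from the given roots."""
    seen, stack = set(), list(roots)
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(links.get(sha, []))
    return seen


def prune_candidates(refs, links):
    """Sweep phase: objects that no ref can reach."""
    return set(links) - reachable(refs.values(), links)


if __name__ == "__main__":
    links = {
        "c1": ["t1"], "t1": ["b1"],
        "c2": ["t2", "c1"], "t2": ["b1", "b2"],
        "c3": ["t3", "c1"], "t3": ["b3"],  # commit from a deleted branch
        "b1": [], "b2": [], "b3": [],
    }
    print(sorted(prune_candidates({"refs/heads/main": "c2"}, links)))  # ['b3', 'c3', 't3']
```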
Git Packfiles: Efficiency Mode
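Storing every object as its own zlib-compressed file (a loose object in .git/objects/) is inefficient for large repos.
Git’s packfiles combine many objects into one file and store similar objects as deltas against each other.
That’s what happens when you run:
git gc
.git/objects/pack/
├── pack-xxx.pack
└── pack-xxx.idx
The .pack file holds the objects, and the .idx file lets Git locate any object in it by hash. Because a version of a file can be stored as a small delta against a similar version, Git can store thousands of versions of a file efficiently. (Fetch and push also transfer objects as packfiles.)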
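The delta idea can be illustrated with copy and insert instructions. This Python sketch uses difflib.SequenceMatcher to describe a new version as copies from a base plus a few literal bytes. It mirrors the concept behind pack deltas only; Git’s actual delta algorithm and binary encoding are different, and make_delta / apply_delta are hypothetical names.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Describe `target` as instructions against `base`.

    ("copy", offset, length) reuses bytes from base; ("insert", data) adds new bytes.
    """
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # "replace" or "insert": new bytes taken from target
            ops.append(("insert", target[j1:j2]))
        # "delete": nothing to emit, those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target by replaying copy/insert instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    v1 = b"def greet():\n    print('hello')\n" * 50
    v2 = v1.replace(b"hello", b"hi", 1)
    ops = make_delta(v1, v2)
    literal = sum(len(op[1]) for op in ops if op[0] == "insert")
    print(len(v2), "bytes in v2,", literal, "stored literally")
```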
Git Refspecs and Remote Sync
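When you run git push or git fetch, the two sides first compare which commits (by hash) each already has, and only the missing objects are transferred, bundled into a packfile.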
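A refspec has the form <src>:<dst> and defines which refs to fetch or push, and where they land. For example, this fetch refspec stores the remote’s main as your remote-tracking branch origin/main:
refs/heads/main:refs/remotes/origin/main
For pushing, the direction flips: git push origin main is shorthand for the refspec refs/heads/main:refs/heads/main, meaning:
Send the commits reachable from my local refs/heads/main that origin lacks, then update origin’s refs/heads/main to point at the same commit.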
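Refspec globs are simple prefix/suffix substitutions. Here is a minimal Python sketch of how a source ref maps to a destination, assuming at most one * per side. map_ref is a hypothetical helper, and the leading + (force) only affects whether non-fast-forward updates are allowed, not the matching.

```python
from typing import Optional


def map_ref(refspec: str, ref: str) -> Optional[str]:
    """Map a source ref through a refspec such as '+refs/heads/*:refs/remotes/origin/*'.

    Returns the destination ref, or None if `ref` doesn't match the source side.
    """
    spec = refspec.lstrip("+")        # '+' = force; irrelevant for mapping
    src, dst = spec.split(":", 1)
    if "*" not in src:
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    if ref.startswith(prefix) and ref.endswith(suffix) and len(ref) >= len(prefix) + len(suffix):
        matched = ref[len(prefix):len(ref) - len(suffix)]  # the part '*' stood for
        return dst.replace("*", matched, 1)
    return None


if __name__ == "__main__":
    spec = "+refs/heads/*:refs/remotes/origin/*"  # the usual default fetch refspec for origin
    print(map_ref(spec, "refs/heads/main"))        # refs/remotes/origin/main
```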
Git’s Fundamental Data Model
Everything is hash-driven. If the remote already has an object (same hash), Git skips it, so nothing is sent twice.
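Put differently, the set of objects to send is a set difference. This Python sketch assumes a toy graph (id → ids it points to) and assumes the remote has everything reachable from the commits it advertises. Real Git learns what the other side has through its transfer protocol (advertised refs, plus want/have negotiation when fetching), so treat objects_to_send, a hypothetical name, as an illustration only.

```python
def reachable(roots, links):
    """All ids reachable from `roots` (commit -> tree/parents, tree -> entries)."""
    seen, stack = set(), list(roots)
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(links.get(sha, []))
    return seen


def objects_to_send(local_tip, remote_has, links):
    """Objects reachable from our tip minus everything the remote already has."""
    return reachable([local_tip], links) - reachable(remote_has, links)


if __name__ == "__main__":
    links = {
        "c1": ["t1"], "t1": ["b1"],              # first commit: one file
        "c2": ["t2", "c1"], "t2": ["b1", "b2"],  # second commit adds a file
        "b1": [], "b2": [],
    }
    print(sorted(objects_to_send("c2", ["c1"], links)))  # ['b2', 'c2', 't2']
```

The unchanged file’s blob (b1) is not resent, because the remote already has an object with that hash.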
To summarize:
Commit → Tree → (Subtrees + Blobs)
↑
Branches & Tags (refs)
Everything is content-addressable by hash. That’s why Git history is immutable and tamper-evident: you can’t change a past commit without changing its hash, and with it the hash of every commit built on top of it.
Why Git Feels So Fast
- Most operations (commit, log, diff, branch, merge) are local; only syncing with a remote needs a server.
- Everything is hash-based lookup.
- Delta compression keeps storage compact.
- Immutable data ensures consistency.
Git is a Database, Not Just a Tool
Git is a key–value store, where:
key = SHA-1(header + content)
value = object (blob/tree/commit/tag)
The rest (branches, merges, remotes) is clever machinery built on top of that.
Wrapping up
Next time you type git commit, remember:
You’re not “saving changes”. You’re adding a new immutable object into a beautifully designed universe.