Git Under the Hood: How It Actually Works

Oct 22, 2025

Git’s Core Philosophy

Git isn’t about “versions of files”; it’s about snapshots of content.

Most version control systems (like SVN) store differences, the delta between file versions.

Git stores snapshots. Each commit is a complete snapshot of your project’s state, stored in a clever way: unchanged files are deduplicated, so they are not stored again.

The .git Directory

Whenever you run git init, Git creates a hidden .git folder. This folder is the entire repository; your working directory is just a checked-out copy of one snapshot layered on top of it.

Inside .git, you’ll find key components:

├── config
├── description
├── HEAD
├── index
├── info
│   ├── exclude
│   └── refs
├── logs
│   ├── HEAD
│   └── refs
├── objects
│   ├── info
│   └── pack
├── packed-refs
└── refs
    ├── heads
    └── tags

Let’s decode these one by one.

The Object Database: Git’s Heart

Everything in Git is stored as an object inside .git/objects/.

There are four types of objects:

  • Blob — represents file contents
  • Tree — represents directories (contains references to blobs and subtrees)
  • Commit — represents a snapshot with metadata (author, parent, message, etc.)
  • Tag — a named pointer to a commit (often annotated)

Git stores all these as content-addressable objects, meaning:

The object’s filename = SHA-1 hash of its contents (prefixed with a short header giving the object’s type and size).

Blob: The File’s DNA

When you git add file.txt, Git compresses the file contents and stores them as a blob object.

You can compute the blob’s ID yourself (this only prints the hash; add -w to also write the object) with:

git hash-object file.txt

This command outputs something like:

9562e8b2802fe9b2ea5764741c19f37847cb8acf

This is a SHA-1 hash. Once the blob has been written, Git stores it inside .git/objects, using the first two hex characters as a directory name:

.git/objects/95/62e8b2802fe9b2ea5764741c19f37847cb8acf
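As a rough sketch of what git hash-object computes for a blob: Git hashes a short header (the word blob, the byte length, a NUL byte) followed by the raw content, and the first two hex digits of the result become the directory name. The Python below assumes a SHA-1 repository; blob_id and loose_object_path are illustrative names, not Git APIs.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes "blob <size>\0" + content, not the bare file content.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Map an object ID to its loose-object path inside the repository."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = blob_id(b"hello\n")
    print(oid)                     # ce013625030ba8dba906f756967f9e9ca394464a
    print(loose_object_path(oid))
```

Because the ID depends only on the content, two files with identical bytes share one blob, no matter what they are called.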

So a blob holds file contents, not the filename. The link between names and blobs? That’s where trees come in.

Tree: The Directory Blueprint

A tree object represents the structure of a directory. Each entry in a tree object maps a filename to:

  • A blob/tree hash
  • File mode
  • Type (blob or tree)

You can see tree content with:

git cat-file -p <tree-hash>
100644  blob  b2d8968efa87a1845f3a94ca73618a194c2304a1	README.md
040000  tree  dddd84a1983a6438c2f4831b012c214bb17df43e	src

So a tree is basically the directory listing, recursively pointing to blobs and subtrees.
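To make the listing above concrete, here is a minimal sketch of how a tree object can be serialized and hashed, assuming a SHA-1 repository. Each entry is stored as the mode, a space, the name, a NUL byte, and the 20 raw bytes of the referenced hash. The helper names (object_id, tree_body) are made up for illustration.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries into a raw tree body."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts entries by name, treating directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return body


if __name__ == "__main__":
    entries = [
        ("40000", "src", "dddd84a1983a6438c2f4831b012c214bb17df43e"),
        ("100644", "README.md", "b2d8968efa87a1845f3a94ca73618a194c2304a1"),
    ]
    print(object_id("tree", tree_body(entries)))
```

Note that the raw form stores the directory mode as 40000; the leading zero in 040000 is added by git cat-file -p for display.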

Commit: The Snapshot Metadata

A commit object points to:

  • One tree (the root directory snapshot)
  • Zero or more parent commits
  • Author and committer metadata
  • Commit message

You can view a commit object like this:

git cat-file -p <commit-hash>
tree  421da6484e3079368699a902463f9ac09898a71b
parent  33f9d5b4d4998b2e12b77bac1f6dad96d901030b
author  admincodes7  <arjunbanur27@gmail.com>  1761184353  +0530
committer  admincodes7  <arjunbanur27@gmail.com>  1761184353  +0530

Just to see parent of this commit
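The commit layout shown above is plain text, so it is easy to rebuild by hand. The sketch below assembles a commit body from its parts and hashes it with the same header scheme used for blobs and trees. It assumes a SHA-1 repository; the identity string and the function names are hypothetical placeholders.

```python
import hashlib


def commit_body(tree, parents, author, committer, message) -> bytes:
    """Build the text of a commit object: header lines, blank line, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # zero, one, or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def commit_id(body: bytes) -> str:
    """Hash the commit the way Git does: 'commit <size>\\0' + body."""
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    # Hypothetical identity: name, email, Unix timestamp, timezone offset.
    ident = "Example Author <author@example.com> 1700000000 +0000"
    body = commit_body(
        tree="421da6484e3079368699a902463f9ac09898a71b",
        parents=["33f9d5b4d4998b2e12b77bac1f6dad96d901030b"],
        author=ident,
        committer=ident,
        message="Example message",
    )
    print(body.decode())
    print(commit_id(body))
```

Since the parent hash is part of the hashed text, changing any ancestor changes the ID of every commit after it.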

The Index: Git’s Staging Area

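When you git add, Git doesn’t commit yet; it updates the index (a binary file stored at .git/index). For each tracked path, the index records the blob hash (along with the file mode and some cached file metadata):

<path> → <blob-hash>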

So when you git commit, Git:

  • Writes a new tree object from the index.
  • Writes a commit object pointing to that tree (and to its parent).

That’s it, no magic.
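The first step, turning a flat index into nested tree objects, can be sketched as follows. This is a simplified model, not Git's implementation: the index is a plain dict of path to blob ID, every file is assumed to have mode 100644, and the object store is an in-memory dict. All names are illustrative.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def write_tree(index: dict, store: dict) -> str:
    """Build tree objects from a flat {path: blob_id} index; return the root tree ID."""
    files = {}    # name -> blob ID for entries directly in this directory
    subdirs = {}  # name -> sub-index holding everything below that directory
    for path, blob in index.items():
        head, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(head, {})[rest] = blob
        else:
            files[head] = blob

    entries = [("100644", name, blob) for name, blob in files.items()]
    # Subdirectories become their own tree objects, built recursively.
    entries += [("40000", name, write_tree(sub, store)) for name, sub in subdirs.items()]
    entries.sort(key=lambda e: (e[1] + "/" if e[0] == "40000" else e[1]).encode())

    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in entries
    )
    tree_id = object_id("tree", body)
    store[tree_id] = body
    return tree_id


if __name__ == "__main__":
    empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
    store = {}
    root = write_tree({"README.md": empty_blob, "src/main.py": empty_blob}, store)
    print(root, len(store))
```

A directory whose files did not change produces the same tree ID as before, so the new commit reuses the existing tree object instead of storing a new one.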

Branches and Refs = Pointers

A branch in Git is simply a ref: typically a small file that stores a commit hash (or, once refs have been packed, a line in .git/packed-refs).

.git/refs/heads/main → a1b2c3d4...

HEAD is another file:

.git/HEAD → ref: refs/heads/main

So when you git commit, Git:

  • Creates a new commit object.
  • Moves the branch’s ref to point to it.
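  • Leaves HEAD itself untouched: it still says ref: refs/heads/main, so it follows the branch to the new commit. (Only in detached-HEAD state is the hash written into HEAD directly.)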
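Following these pointers by hand takes only a few lines. The sketch below resolves HEAD to a commit hash by reading the files described above, falling back to packed-refs when a loose ref file is missing. It assumes the classic files-based ref layout; resolve_ref and read_packed_ref are illustrative names.

```python
import tempfile
from pathlib import Path


def read_packed_ref(git_dir: Path, name: str) -> str:
    """Look a ref up in packed-refs, whose lines read '<hash> <refname>'."""
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip the header comment, peeled-tag lines ("^..."), and blanks.
            if line.startswith(("#", "^")) or not line.strip():
                continue
            object_id, _, ref_name = line.partition(" ")
            if ref_name == name:
                return object_id
    raise KeyError(name)


def resolve_ref(git_dir: Path, name: str = "HEAD") -> str:
    """Follow 'ref: ...' indirections until a plain hash is found."""
    for _ in range(10):  # guard against symbolic-ref loops
        loose = git_dir / name
        if loose.is_file():
            value = loose.read_text().strip()
        else:
            value = read_packed_ref(git_dir, name)
        if not value.startswith("ref: "):
            return value
        name = value[len("ref: "):]
    raise ValueError("too many levels of symbolic refs")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)
        (git_dir / "refs" / "heads").mkdir(parents=True)
        (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
        (git_dir / "refs" / "heads" / "main").write_text("a1b2c3d4" * 5 + "\n")
        print(resolve_ref(git_dir))
```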

Tags (Named Commits)

A tag is a named reference, usually to a commit. Lightweight tags are just refs that hold a commit hash. Annotated tags are full objects containing metadata, for example:

object 33d906ce59d2e29f647bb182cda61eaa22f12c72
type commit
tag v1.0
tagger admincodes7 <arjunbanur27@gmail.com> 1761185097 +0530

Release version 1.0

Git Add, Commit, Checkout

Let’s trace one complete flow:

git add

  • Hash each file → store as blob.
  • Update index with <path → blob-hash>.

git commit

  • Create a tree from index.
  • Create a commit object with:
      - tree = <tree-hash>
      - parent = <previous-commit>
  • Move HEAD’s ref (branch) to new commit.

git checkout

  • Read the commit → tree → blobs.
  • Replace working directory files with those blobs, and update the index and HEAD to match.
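The checkout direction (commit → tree → blobs) can be sketched against a toy in-memory object store. This is an illustration under simplifying assumptions: objects live in a dict keyed by SHA-1 ID, and the result is a dict of path to content rather than real files. The names put, parse_tree, and checkout are made up for this example.

```python
import hashlib


def put(store: dict, obj_type: str, body: bytes) -> str:
    """Store an object under its content hash and return the ID."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    oid = hashlib.sha1(header + body).hexdigest()
    store[oid] = body
    return oid


def parse_tree(body: bytes):
    """Yield (mode, name, hex_id) entries from a raw tree body."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        oid = body[nul + 1:nul + 21].hex()  # 20 raw hash bytes follow the NUL
        pos = nul + 21
        yield mode, name, oid


def checkout(store: dict, commit_id: str) -> dict:
    """Return {path: content} for the snapshot a commit points to."""
    first_line = store[commit_id].decode().split("\n", 1)[0]  # "tree <id>"
    root_tree = first_line.split(" ")[1]
    files = {}

    def walk(tree_id: str, prefix: str) -> None:
        for mode, name, oid in parse_tree(store[tree_id]):
            if mode == "40000":          # subtree: recurse into the directory
                walk(oid, prefix + name + "/")
            else:                        # blob: this is a file's content
                files[prefix + name] = store[oid]

    walk(root_tree, "")
    return files


if __name__ == "__main__":
    store = {}
    blob = put(store, "blob", b"print('hi')\n")
    src = put(store, "tree", b"100644 main.py\0" + bytes.fromhex(blob))
    root = put(store, "tree", b"40000 src\0" + bytes.fromhex(src))
    commit = put(store, "commit", f"tree {root}\n\ninitial\n".encode())
    print(checkout(store, commit))
```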

Git Merge: Combining Histories

A merge commit has two (or, rarely, more) parents:

parent 1a410ef...
parent b7d34e2...

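Git finds the common ancestor (the merge base), performs a three-way merge, and creates a new commit that has both branch tips as parents.

If conflicts arise, Git stops before committing, writes conflict markers into the affected files in the working directory, and records the conflicting versions in the index. Existing blobs are never modified; you resolve the conflict and commit the result.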
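The three-way rule itself is small. The sketch below applies it at the level of whole files, comparing blob IDs from the merge base, our side, and their side. It is a simplification: when both sides changed the same file, real Git goes on to attempt a line-level merge inside the file, whereas this sketch simply reports a conflict. The function name and the flat path to blob ID snapshots are assumptions made for illustration.

```python
def merge_snapshots(base: dict, ours: dict, theirs: dict):
    """Three-way merge of flat {path: blob_id} snapshots.

    Returns (merged, conflicts). A missing path means the file does not exist
    on that side.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o            # both sides agree (including both deleted)
        elif b == o:
            result = t            # only their side changed this path
        elif b == t:
            result = o            # only our side changed this path
        else:
            conflicts.append(path)  # both sides changed it differently
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
    ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
    theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "4", "d.txt": "5"}
    print(merge_snapshots(base, ours, theirs))
```

The merge base matters because it tells Git which side actually changed a file; without it, any difference between the two branches would look like a conflict.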

Git Stash

When you git stash, Git actually creates two commits:

  • One commit for your working directory changes, whose parents are the current HEAD commit and the index commit below.
  • Another for your index (staged) changes.

Then it stores a reference to the working directory commit under:

refs/stash

You can view them using:

git cat-file -p refs/stash

So stash isn’t some temporary buffer; it’s a set of ordinary commits hidden behind a ref, with older stashes kept in that ref’s reflog.

Git Garbage Collection

When you delete a branch or rewrite history, some commits become unreachable.

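Git doesn’t delete them immediately; the objects stay in the object database, and the reflog usually still references them for a while.

Running git gc (garbage collection) packs reachable objects into .git/objects/pack/ files and prunes unreachable ones once they are older than a grace period (two weeks by default).

So even after “deleting,” your data sticks around until cleanup, which is why lost commits can often be recovered.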
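Reachability is an ordinary graph walk starting from the refs. The sketch below models the object database as a dict from object ID to the IDs it references (a commit references its tree and parents, a tree references blobs and subtrees) and reports what no ref can reach. It is a toy model with made-up IDs, and it ignores the reflog and the grace period.

```python
def reachable(objects: dict, roots) -> set:
    """Return every object ID reachable from the given root IDs."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen or oid not in objects:
            continue
        seen.add(oid)
        stack.extend(objects[oid])  # follow references to parents, trees, blobs
    return seen


def unreachable(objects: dict, refs: dict) -> set:
    """Objects that no ref leads to: the candidates for pruning."""
    return set(objects) - reachable(objects, refs.values())


if __name__ == "__main__":
    objects = {
        "commit2": ["commit1", "tree2"],
        "commit1": ["tree1"],
        "tree1": ["blobA"],
        "tree2": ["blobA", "blobB"],
        "blobA": [],
        "blobB": [],
        # Left behind by a deleted branch:
        "commit3": ["commit2", "tree3"],
        "tree3": ["blobC"],
        "blobC": [],
    }
    refs = {"refs/heads/main": "commit2"}
    print(sorted(unreachable(objects, refs)))
```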

Git Packfiles: Efficiency Mode

Loose objects (one file each in .git/objects/) are inefficient for large repos.

Git’s packfiles combine many objects into a single file and delta-compress similar ones.

That’s what happens during:

git gc

The result looks like this:

.git/objects/pack/
├── pack-xxx.pack
└── pack-xxx.idx

These pack files are highly optimized. Git can store thousands of versions of a file efficiently.
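The idea behind delta compression is to describe one object as "copy these ranges from a similar base object, insert these new bytes". The toy below shows that idea using Python's difflib to find the matching ranges. It is not Git's actual delta algorithm or binary encoding; make_delta and apply_delta are illustrative names.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copy-from-base and insert-literal instructions."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))      # reuse bytes from base
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[j1:j2]))  # new bytes stored literally
        # "delete" needs no instruction: those base bytes are simply not copied
    return delta


def apply_delta(base: bytes, delta: list) -> bytes:
    """Rebuild the target from the base object and the delta instructions."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    base = b"".join(b"line %d\n" % i for i in range(50))
    target = base + b"one more line\n"
    delta = make_delta(base, target)
    literal = sum(len(op[1]) for op in delta if op[0] == "insert")
    print(len(target), "bytes in target,", literal, "bytes stored literally")
    assert apply_delta(base, delta) == target
```

This is why many versions of a slowly changing file cost little space: each version only needs the bytes that differ from a neighbour.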

Git Refspecs and Remote Sync

When you git push or git fetch, Git transfers objects by comparing hashes.

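A refspec has the form <src>:<dst> and defines which refs to push or fetch, and where they land on the other side. For example, this fetch refspec maps the remote’s main branch to your remote-tracking ref: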

refs/heads/main:refs/remotes/origin/main

So git push origin main means:

Send my local refs/heads/main commit chain to origin’s refs/heads/main.
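A refspec is just a source-to-destination mapping, optionally with a leading + (allow non-fast-forward updates) and a single * wildcard. The sketch below parses that form and maps a ref name through it. It is a simplified model of the syntax, not Git's full refspec handling; parse_refspec and map_ref are hypothetical helper names.

```python
def parse_refspec(spec: str):
    """Split '[+]<src>:<dst>' into (force, src, dst)."""
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, _, dst = spec.partition(":")
    return force, src, dst


def map_ref(spec: str, ref: str):
    """Return the destination ref that `ref` maps to, or None if it does not match."""
    _, src, dst = parse_refspec(spec)
    if "*" not in src:
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    if (ref.startswith(prefix) and ref.endswith(suffix)
            and len(ref) >= len(prefix) + len(suffix)):
        matched = ref[len(prefix):len(ref) - len(suffix)]
        return dst.replace("*", matched, 1)  # substitute the wildcard part
    return None


if __name__ == "__main__":
    # The usual default fetch refspec for a remote named origin.
    default_fetch = "+refs/heads/*:refs/remotes/origin/*"
    print(map_ref(default_fetch, "refs/heads/main"))
    # The explicit form of pushing main to the same branch name on the remote.
    print(map_ref("refs/heads/main:refs/heads/main", "refs/heads/main"))
```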

Git’s Fundamental Data Model

Everything is hash-driven. If the remote already has an object (by hash), Git skips it, so nothing is transferred twice.

To summarize:

Branches & Tags (refs) → Commit → Tree → (Subtrees + Blobs)

Everything is content-addressable by hash. That’s why Git objects are immutable and history is trustworthy: you can’t alter history without changing hashes.

Why Git Feels So Fast

  • Most operations are local; no server needed.
  • Everything is hash-based lookup.
  • Delta compression keeps storage compact.
  • Immutable data ensures consistency.

Git is a Database, Not Just a Tool

Git is a key–value store, where:

key = SHA-1(content)
value = object (blob/tree/commit/tag)
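The key–value view can be made concrete with a small on-disk store that mimics the loose-object layout: the value is the zlib-compressed header plus content, and the key doubles as the file path. This is a minimal sketch assuming SHA-1 and loose objects only (no packfiles); the ObjectStore class is an illustrative name, not part of Git.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


class ObjectStore:
    """A tiny content-addressable store laid out like .git/objects."""

    def __init__(self, objects_dir):
        self.root = Path(objects_dir)

    def put(self, obj_type: str, body: bytes) -> str:
        data = f"{obj_type} {len(body)}".encode() + b"\0" + body
        oid = hashlib.sha1(data).hexdigest()      # key = hash of header + content
        path = self.root / oid[:2] / oid[2:]
        if not path.exists():                     # same content is stored only once
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(zlib.compress(data))
        return oid

    def get(self, oid: str):
        data = zlib.decompress((self.root / oid[:2] / oid[2:]).read_bytes())
        header, _, body = data.partition(b"\0")
        obj_type, size = header.decode().split(" ")
        if int(size) != len(body):
            raise ValueError("corrupt object: size mismatch")
        return obj_type, body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ObjectStore(tmp)
        key = store.put("blob", b"hello\n")
        print(key, store.get(key))
```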

The rest (branches, merges, remotes) is just clever machinery built on top of that.

Wrapping up

Next time you type git commit, remember:

You’re not “saving changes”. You’re adding a new immutable object into a beautifully designed universe.