Git: Under the Hood — How Git Actually Works

Oct 22, 2025
Git, VCS

Git’s Core Philosophy

At its core, Git isn’t about “versions of files” — it’s about snapshots of content.

Many traditional version control systems (such as Subversion) store differences: the deltas between file versions.

Git stores snapshots — each commit is a complete snapshot of your project’s state (with deduplication for unchanged files).

The .git Directory

Whenever you run git init, Git creates a hidden .git folder — this is the entire repository. Your working directory is just a convenience layer.

Inside .git of a repository that already has some history, you'll find key components like these:

├── config
├── description
├── HEAD
├── index
├── info
│   ├── exclude
│   └── refs
├── logs
│   ├── HEAD
│   └── refs
├── objects
│   ├── info
│   └── pack
├── packed-refs
└── refs
    ├── heads
    └── tags

Let's decode the most important of these.

The Object Database — Git’s Heart

All of Git's content (file data, directory listings, commits, and annotated tags) is stored as objects inside .git/objects/.

There are four types of objects:

  • Blob: represents file contents
  • Tree: represents directories (contains references to blobs and subtrees)
  • Commit: represents a snapshot with metadata (author, parent, message, etc.)
  • Tag: an annotated tag, i.e. a named pointer to another object (usually a commit) plus tagger metadata and a message

Git stores all these as content-addressable objects, meaning:

An object's name is the SHA-1 hash of a short header (the object type and size) followed by its contents.
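Here is a minimal Python sketch of that rule for a SHA-1 repository (Git's default object format), using only hashlib. Git hashes a header of the form `<type> <size>` and a NUL byte, followed by the raw bytes. The helper name `object_id` is hypothetical. For the same bytes, it should print the same ID that `git hash-object` reports.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to `content` of the given type."""
    # Git hashes "<type> <size-in-bytes>", then a NUL byte, then the raw content.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match: printf 'test content\n' | git hash-object --stdin
    print(object_id("blob", b"test content\n"))
    # The empty tree has a well-known ID as well.
    print(object_id("tree", b""))
```

The type and size are part of the hashed bytes. That is why a blob's ID differs from a plain sha1sum of the same file.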

Blob — The File’s DNA

When you git add file.txt, Git compresses and stores the file contents as a blob object.

You can compute the ID Git gives that blob with:

git hash-object file.txt

This command outputs something like:

9562e8b2802fe9b2ea5764741c19f37847cb8acf

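This is a SHA-1 hash. Git stores the compressed object as a loose file. The first two hex characters become the directory name and the remaining 38 become the file name: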

.git/objects/95/62e8b2802fe9b2ea5764741c19f37847cb8acf

So a blob stores no filename, only file contents. The link between names and blobs comes later, in trees.
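The loose-object layout can also be sketched in Python. This is a simplification with a few assumptions:

- `objects_dir` stands in for .git/objects. The demo uses a throwaway temporary directory, never a real repository.
- Real Git additionally writes objects atomically and read-only.

Note that nothing in the stored file records a filename.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, obj_type: str, content: bytes) -> str:
    """Store content as a zlib-compressed loose object; return its hex ID."""
    raw = f"{obj_type} {len(content)}".encode() + b"\0" + content
    oid = hashlib.sha1(raw).hexdigest()
    # First two hex characters name the directory, the other 38 the file.
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))  # header and content are compressed together
    return oid


def read_loose_object(objects_dir: str, oid: str) -> tuple[str, bytes]:
    """Inflate a loose object and split it back into (type, content)."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size in header does not match content")
    return obj_type, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as objects_dir:
        oid = write_loose_object(objects_dir, "blob", b"test content\n")
        print(os.path.relpath(os.path.join(objects_dir, oid[:2], oid[2:]), objects_dir))
        print(read_loose_object(objects_dir, oid))
```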

Tree — The Directory Blueprint

A tree object represents the structure of a directory. Each entry in a tree object maps a filename to:

  • A blob/tree hash
  • File mode
  • Type (blob or tree)

You can see a tree's contents with:

git cat-file -p <tree-hash>

This prints something like:

100644 blob b2d8968efa87a1845f3a94ca73618a194c2304a1	README.md
040000 tree dddd84a1983a6438c2f4831b012c214bb17df43e	src

So a tree is basically the directory listing, recursively pointing to blobs and subtrees.
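Here is a Python sketch of how a tree's raw bytes are laid out, assuming a SHA-1 repository. Each entry is `<mode> <name>`, then a NUL byte, then the 20-byte binary (not hex) hash. Entries are sorted by name. The helper names are hypothetical.

```python
from __future__ import annotations

import hashlib


def hash_object(obj_type: str, content: bytes) -> str:
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def tree_bytes(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_oid) entries into raw tree-object content.

    Subtrees use mode "40000" in the raw format; `git cat-file -p` pads it
    to "040000" when printing.
    """

    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Git orders subtrees as if their name ended with "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    out = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return out


if __name__ == "__main__":
    # Entries taken from the example listing; the real tree may hold more,
    # so this hash need not match it.
    root = [
        ("40000", "src", "dddd84a1983a6438c2f4831b012c214bb17df43e"),
        ("100644", "README.md", "b2d8968efa87a1845f3a94ca73618a194c2304a1"),
    ]
    print(hash_object("tree", tree_bytes(root)))
```

A tree stores only the hashes of its children. Two directories with identical contents therefore share one tree object, just as identical files share one blob.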

Commit — The Snapshot Metadata

A commit object points to:

  • One tree (the root directory snapshot)
  • Zero or more parent commits
  • Author and committer metadata
  • Commit message

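You can view a commit object like this:

git cat-file -p <commit-hash>

This prints something like:

tree 421da6484e3079368699a902463f9ac09898a71b
parent 33f9d5b4d4998b2e12b77bac1f6dad96d901030b
author admincodes7 <arjunbanur27@gmail.com> 1761184353 +0530
committer admincodes7 <arjunbanur27@gmail.com> 1761184353 +0530

Each identity line ends with a Unix timestamp and a UTC offset. The commit message follows the headers after a blank line.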

To see the parent commit itself, run git cat-file -p on the hash from the parent line.
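Here is a matching Python sketch of the commit format, in the layout `git cat-file -p` prints: header lines, a blank line, then the message. Some parts are assumptions:

- The identity, timestamp and message are placeholders.
- Optional headers such as gpgsig are omitted.

The tree and parent hashes are reused from the example above.

```python
from __future__ import annotations

import hashlib


def commit_bytes(tree: str, parents: list[str], author: str, committer: str, message: str) -> bytes:
    """Build raw commit content; author/committer look like 'Name <email> <unix-time> <tz>'."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit, two for a merge
    lines += [f"author {author}", f"committer {committer}"]
    # A blank line separates the headers from the free-form message.
    return ("\n".join(lines) + "\n\n" + message).encode()


def commit_id(content: bytes) -> str:
    """Hash the content behind a 'commit <size>' header and NUL byte, as for any object."""
    return hashlib.sha1(b"commit %d\0" % len(content) + content).hexdigest()


if __name__ == "__main__":
    ident = "Example Dev <dev@example.com> 1761184353 +0530"  # placeholder identity
    body = commit_bytes(
        tree="421da6484e3079368699a902463f9ac09898a71b",
        parents=["33f9d5b4d4998b2e12b77bac1f6dad96d901030b"],
        author=ident,
        committer=ident,
        message="Example message\n",
    )
    print(body.decode())
    print(commit_id(body))
```

Change a single byte of the message, the tree, or a parent and the ID changes. That is why a commit hash pins down the whole history behind it.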

The Index — Git’s Staging Area

When you git add, Git doesn't commit yet; it updates the index (a binary file stored at .git/index). Alongside file modes and stat data such as timestamps, the index maps:

<path> → <blob-hash>

So when you git commit, Git:

  • Writes tree objects from the index (one per directory).
  • Writes a commit object pointing to that tree (and to its parent).

That’s it — no magic.
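Here is a simplified Python sketch of that first step. It turns a flat path-to-blob map into nested tree objects, in the spirit of `git write-tree`. Assumptions:

- Every file gets mode 100644. The real index also records executable bits, symlinks and stat data.
- A plain dict stands in for .git/objects.
- All names are hypothetical.

```python
from __future__ import annotations

import hashlib


def store(objects: dict[str, bytes], obj_type: str, content: bytes) -> str:
    """Hash an object and keep it in an in-memory stand-in for .git/objects."""
    raw = f"{obj_type} {len(content)}".encode() + b"\0" + content
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = raw
    return oid


def write_tree(index: dict[str, str], objects: dict[str, bytes], prefix: str = "") -> str:
    """Build trees for every directory under `prefix`; return this level's tree ID."""
    files: dict[str, str] = {}
    subdirs: set[str] = set()
    for path, oid in index.items():
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if "/" in rest:
            subdirs.add(rest.split("/", 1)[0])  # belongs to a subdirectory
        else:
            files[rest] = oid

    entries = [("100644", name, oid) for name, oid in files.items()]
    for name in subdirs:
        # Children are written first: a parent tree stores its subtrees' IDs.
        entries.append(("40000", name, write_tree(index, objects, prefix + name + "/")))

    # Same ordering rule as Git: subtree names compare as if suffixed with "/".
    entries.sort(key=lambda e: e[1].encode() + (b"/" if e[0] == "40000" else b""))
    content = b"".join(f"{m} {n}".encode() + b"\0" + bytes.fromhex(o) for m, n, o in entries)
    return store(objects, "tree", content)


if __name__ == "__main__":
    objects: dict[str, bytes] = {}
    index = {
        "README.md": store(objects, "blob", b"# demo\n"),
        "src/main.py": store(objects, "blob", b"print('hi')\n"),
    }
    print(write_tree(index, objects))  # root tree; "src" got its own tree object
```

The second step would then wrap the returned root tree ID in a commit object, together with the parent hash.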

Branches and Refs — Just Pointers

A branch in Git is simply a ref: a file that stores a commit hash. Refs can also be consolidated into .git/packed-refs, for example by git gc.

.git/refs/heads/main → a1b2c3d4...

HEAD is another file:

.git/HEAD → ref: refs/heads/main

So when you git commit, Git:

  • Creates a new commit object.
  • Moves the branch’s ref to point to it.
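  • Leaves HEAD itself unchanged: it still says ref: refs/heads/main, so it now resolves to the new commit (in detached HEAD state, Git writes the new hash into HEAD directly).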
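Here is a small Python sketch of how HEAD resolves to a commit. It assumes the classic files-based ref storage: loose files plus .git/packed-refs. The function names are hypothetical. Real Git also validates ref names, handles per-worktree refs and supports other ref backends.

```python
from __future__ import annotations

import os


def resolve_ref(git_dir: str, name: str = "HEAD") -> str | None:
    """Follow symbolic refs (e.g. HEAD) down to a commit hash; None if unborn."""
    for _ in range(10):  # guard against symbolic-ref loops
        path = os.path.join(git_dir, name)
        if os.path.isfile(path):
            with open(path) as f:
                value = f.read().strip()
            if value.startswith("ref: "):
                name = value[len("ref: "):]  # symbolic ref: follow it
                continue
            return value  # a detached HEAD or a branch file holds a hash directly
        return _packed_ref(git_dir, name)  # no loose file: try packed-refs
    raise RuntimeError("symbolic ref chain too deep")


def _packed_ref(git_dir: str, name: str) -> str | None:
    packed = os.path.join(git_dir, "packed-refs")
    if not os.path.isfile(packed):
        return None
    with open(packed) as f:
        for line in f:
            # Lines look like "<hash> <refname>"; skip comments and "^" peel lines.
            if line.startswith(("#", "^")):
                continue
            oid, _, ref = line.strip().partition(" ")
            if ref == name:
                return oid
    return None


if __name__ == "__main__":
    # Prints None outside a repository root or on a branch with no commits yet.
    print(resolve_ref(".git"))
```

A loose ref file wins over a packed-refs entry. That matches how an updated branch shadows its older packed value.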

Tags — Named Commits

A tag is a named reference, usually to a commit. Lightweight tags are just refs under refs/tags/ that store a commit hash. Annotated tags are full objects containing metadata; running git cat-file -p v1.0 shows something like:

object 33d906ce59d2e29f647bb182cda61eaa22f12c72
type commit
tag v1.0
tagger admincodes7 <arjunbanur27@gmail.com> 1761185097 +0530

Release version 1.0
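The annotated tag above can be rebuilt with a short Python sketch. The tagger identity is a placeholder. Signed tags, which append a signature, are not covered. The ref refs/tags/v1.0 would store this tag object's ID, whereas a lightweight tag's ref stores the commit hash directly.

```python
import hashlib


def tag_bytes(target: str, target_type: str, name: str, tagger: str, message: str) -> bytes:
    """Raw content of an annotated tag object, in `git cat-file -p` layout."""
    return (
        f"object {target}\n"
        f"type {target_type}\n"
        f"tag {name}\n"
        f"tagger {tagger}\n"
        f"\n{message}"  # blank line, then the tag message
    ).encode()


def tag_id(content: bytes) -> str:
    """Hash the content behind a 'tag <size>' header and NUL byte."""
    return hashlib.sha1(b"tag %d\0" % len(content) + content).hexdigest()


if __name__ == "__main__":
    body = tag_bytes(
        target="33d906ce59d2e29f647bb182cda61eaa22f12c72",
        target_type="commit",
        name="v1.0",
        tagger="Example Dev <dev@example.com> 1761185097 +0530",  # placeholder tagger
        message="Release version 1.0\n",
    )
    print(body.decode())
    print(tag_id(body))  # the ID that refs/tags/v1.0 would store
```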

Git Add, Commit, Checkout — Internals

Let’s trace one complete flow:

git add

  • Hash each file → store as blob.
  • Update index with <path → blob-hash>.

git commit

  • Create a tree from index.
  • Create a commit object with:
      - tree = <tree-hash>
      - parent = <previous-commit>
  • Move HEAD’s ref (branch) to new commit.

git checkout

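  • Read the commit → tree → blobs.
  • Update the working directory files and the index to match those blobs.
  • Point HEAD at the checked-out branch (or directly at the commit, in detached HEAD state).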
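To make the checkout step concrete, here is a Python sketch that walks commit → tree → blobs from an in-memory object store and writes the files into a directory. Assumptions:

- A dict stands in for .git/objects.
- Names decode as UTF-8.
- Every non-tree entry is a regular file.

The sketch neither updates the index or HEAD nor removes files that the target commit lacks.

```python
from __future__ import annotations

import hashlib
import os
import tempfile


def put(objects: dict[str, bytes], obj_type: str, content: bytes) -> str:
    raw = f"{obj_type} {len(content)}".encode() + b"\0" + content
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = raw
    return oid


def get(objects: dict[str, bytes], oid: str) -> tuple[str, bytes]:
    header, _, content = objects[oid].partition(b"\0")
    return header.split(b" ")[0].decode(), content


def parse_tree(content: bytes) -> list[tuple[str, str, str]]:
    """Decode raw tree content into (mode, name, hex_oid) entries."""
    entries, i = [], 0
    while i < len(content):
        nul = content.index(b"\0", i)
        mode, name = content[i:nul].decode().split(" ", 1)
        entries.append((mode, name, content[nul + 1:nul + 21].hex()))
        i = nul + 21  # skip the NUL and the 20 raw hash bytes
    return entries


def checkout(objects: dict[str, bytes], commit_oid: str, workdir: str) -> None:
    """Materialize the snapshot recorded by `commit_oid` under `workdir`."""
    _, commit = get(objects, commit_oid)
    tree_oid = commit.split(b"\n", 1)[0].split(b" ", 1)[1].decode()  # "tree <oid>"

    def write(tree: str, path: str) -> None:
        os.makedirs(path, exist_ok=True)
        for mode, name, oid in parse_tree(get(objects, tree)[1]):
            target = os.path.join(path, name)
            if mode == "40000":
                write(oid, target)  # subtree: recurse into the subdirectory
            else:
                with open(target, "wb") as f:
                    f.write(get(objects, oid)[1])  # blob bytes become file contents

    write(tree_oid, workdir)


if __name__ == "__main__":
    objs: dict[str, bytes] = {}
    blob = put(objs, "blob", b"hello\n")
    tree = put(objs, "tree", b"100644 hello.txt\0" + bytes.fromhex(blob))
    ident = "Example Dev <dev@example.com> 0 +0000"  # placeholder identity
    commit = put(objs, "commit", f"tree {tree}\nauthor {ident}\ncommitter {ident}\n\nInitial\n".encode())
    with tempfile.TemporaryDirectory() as workdir:
        checkout(objs, commit, workdir)
        print(os.listdir(workdir))
```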

Git Merge — Combining Histories

A merge commit has two parents:

parent 1a410ef...
parent b7d34e2...

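Git finds the common ancestor (the merge base) and performs a three-way merge of the base, your side and their side. It then creates a new commit that has both branch tips as parents. If your branch is already an ancestor of the one being merged, Git can simply fast-forward the branch ref instead of creating a merge commit.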

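If conflicts arise, Git stops before committing. It writes conflict markers into the affected working-directory files and records the base, ours and theirs versions in the index (stages 1, 2 and 3). No existing blob is modified; the merge commit is created only after you resolve the conflicts and commit.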
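Two pieces of that process fit in a short Python sketch: finding the best common ancestors over a parent map, and the path-level three-way rule. Assumptions:

- Commits are plain string IDs in a hypothetical `parents` dict.
- File versions are blob IDs, with None meaning the path is absent.

When both sides changed a path, real Git goes on to merge the contents line by line and reports a conflict only where hunks overlap. This sketch just stops at that point.

```python
from __future__ import annotations


def ancestors(parents: dict[str, list[str]], start: str) -> set[str]:
    """Every commit reachable from `start` (inclusive) by following parent links."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def merge_bases(parents: dict[str, list[str]], a: str, b: str) -> set[str]:
    """Best common ancestors: common ancestors that are not ancestors of another one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(d != c and c in ancestors(parents, d) for d in common)}


def three_way(base: str | None, ours: str | None, theirs: str | None) -> str | None:
    """Pick the merged blob for one path; raise if both sides changed it differently."""
    if ours == theirs:
        return ours  # identical on both sides (including both unchanged)
    if ours == base:
        return theirs  # only their side changed (or added/deleted) the path
    if theirs == base:
        return ours  # only our side changed it
    raise ValueError("both sides changed this path: needs a content-level merge")


if __name__ == "__main__":
    # History: A <- B <- C (main), and B <- D (feature)
    history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
    print(merge_bases(history, "C", "D"))  # {'B'}
    print(three_way("blob1", "blob1", "blob2"))  # blob2: only "theirs" changed
```

Criss-cross histories can have more than one best common ancestor. That is why `merge_bases` returns a set.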

Git Stash — A Commit in Disguise

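When you git stash, Git actually creates two commits (a third if you pass --include-untracked):

  • One commit for your index (staged) changes, with HEAD as its parent.
  • One commit for your working directory changes, with HEAD and the index commit as its parents.

Then it stores a reference to the working-directory commit under the ref below. The index commit stays reachable through that commit's parents.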

refs/stash

You can view them using:

git cat-file -p refs/stash

So a stash isn't some temporary buffer; it's ordinary commits. Older stashes (stash@{1}, stash@{2}, and so on) are kept in the reflog of refs/stash.
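The shape of those commits can be sketched in Python. This is an illustration, not Git's implementation, and it rests on a few assumptions:

- One dict stands in for the object database and another for refs.
- Tree IDs are taken as given.
- The identity and commit messages are placeholders.

```python
from __future__ import annotations

import hashlib


def put_commit(objects: dict[str, bytes], tree: str, parents: list[str], message: str) -> str:
    ident = "Example Dev <dev@example.com> 0 +0000"  # placeholder identity
    headers = [f"tree {tree}", *(f"parent {p}" for p in parents),
               f"author {ident}", f"committer {ident}"]
    body = ("\n".join(headers) + f"\n\n{message}\n").encode()
    raw = b"commit %d\0" % len(body) + body
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = raw
    return oid


def parents_of(objects: dict[str, bytes], oid: str) -> list[str]:
    """Read the parent lines back out of a stored commit."""
    body = objects[oid].partition(b"\0")[2].decode()
    headers = body.split("\n\n", 1)[0].splitlines()
    return [line[len("parent "):] for line in headers if line.startswith("parent ")]


def stash(objects, refs, head: str, index_tree: str, worktree_tree: str) -> str:
    """Record the two stash commits (no --include-untracked) and update refs/stash."""
    i_commit = put_commit(objects, index_tree, [head], "index state")  # staged changes
    w_commit = put_commit(objects, worktree_tree, [head, i_commit], "WIP")  # working tree
    refs["refs/stash"] = w_commit
    return w_commit


if __name__ == "__main__":
    objects: dict[str, bytes] = {}
    refs: dict[str, str] = {}
    head = "33f9d5b4d4998b2e12b77bac1f6dad96d901030b"  # pretend HEAD commit
    w = stash(objects, refs, head, "11" * 20, "22" * 20)
    print(parents_of(objects, w))  # [HEAD, index commit]
```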

Git Garbage Collection

When you delete a branch or rewrite history, some commits become unreachable.

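Git doesn't delete them immediately. The objects stay in the object database, and reflog entries keep them recoverable for a while.

Running git gc (garbage collection) packs reachable objects into files under .git/objects/pack/. It prunes unreachable objects once they are older than a grace period (two weeks by default, set by gc.pruneExpire).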

So even after “deleting,” your data lingers until cleanup.
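The reachability check behind that cleanup is a mark phase over the object graph. Here is a minimal Python sketch. It assumes a hypothetical `links` map from each object ID to the IDs it references: commit to tree and parents, tree to entries. Real git gc also counts the index and reflogs as roots and applies the grace period. This sketch only models extra roots, through an optional list.

```python
from __future__ import annotations


def reachable(links: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Mark phase: collect every object reachable from the roots."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(links.get(oid, []))
    return seen


def unreachable(links: dict[str, list[str]], refs: dict[str, str], extra_roots: list[str]) -> set[str]:
    """Objects no ref (or extra root, e.g. a reflog entry) can reach: prune candidates."""
    roots = list(refs.values()) + list(extra_roots)
    return set(links) - reachable(links, roots)


if __name__ == "__main__":
    links = {
        "c1": ["t1"], "t1": ["b1"], "b1": [],
        "c2": ["t2", "c1"], "t2": ["b1", "b2"], "b2": [],
        "c3": ["t3", "c1"], "t3": ["b3"], "b3": [],  # c3 was on a deleted branch
    }
    print(sorted(unreachable(links, {"refs/heads/main": "c2"}, [])))  # ['b3', 'c3', 't3']
```

Passing `["c3"]` as an extra root, as a reflog entry would, makes nothing prunable.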

Git Packfiles — Efficiency Mode

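Storing every object as its own zlib-compressed file under .git/objects/ is inefficient for large repos. It means many small files and no compression across similar versions.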

Git’s packfiles combine objects and store them delta-compressed.

That's what git gc produces:

.git/objects/pack/
├── pack-xxx.pack
└── pack-xxx.idx

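The .pack file holds the objects, many of them stored as deltas against similar objects. The .idx file lets Git find any object in the pack by hash without scanning the whole pack. That's how Git can keep thousands of versions of a file compactly.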
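Pack deltas are built from two kinds of instructions: copy a byte range from a base object, or insert new literal bytes. The Python sketch below is a toy in that spirit. It is not Git's binary delta encoding, nor its heuristics for picking delta bases; it only reuses a shared prefix and suffix.

```python
from __future__ import annotations


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Toy delta: copy the shared prefix/suffix from base, insert the middle literally."""
    limit = min(len(base), len(target))
    p = 0
    while p < limit and base[p] == target[p]:
        p += 1
    s = 0
    # Stop before the suffix would overlap the prefix in either buffer.
    while s < limit - p and base[-1 - s] == target[-1 - s]:
        s += 1
    ops: list[tuple] = []
    if p:
        ops.append(("copy", 0, p))
    if len(target) - p - s:
        ops.append(("insert", target[p:len(target) - s]))
    if s:
        ops.append(("copy", len(base) - s, s))
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the target by replaying copy/insert instructions against base."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    v1 = b"line1\nline2\nline3\n"
    v2 = b"line1\nLINE2\nline3\n"
    ops = make_delta(v1, v2)
    print(ops)  # [('copy', 0, 6), ('insert', b'LINE'), ('copy', 10, 8)]
    assert apply_delta(v1, ops) == v2
```

Only four literal bytes are stored for the new version. Everything else is a reference into the base.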

Git Refspecs and Remote Sync

When you git push or git fetch, Git transfers objects by comparing hashes.

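A refspec has the form <src>:<dst> and says which refs to transfer and where to store them. For example, this fetch refspec stores the remote's main branch as your remote-tracking branch origin/main: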

refs/heads/main:refs/remotes/origin/main

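For pushing, git push origin main treats main as the refspec refs/heads/main:refs/heads/main, which means:

Send the commits reachable from my local refs/heads/main (plus the trees and blobs they need) to origin, then update origin's refs/heads/main.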

Everything is hash-driven. If the remote already has an object (by hash), Git skips it — no redundancy.
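Both ideas fit in a small Python sketch:

- Mapping a ref name through a `<src>:<dst>` refspec. One `*` wildcard is allowed per side. The optional leading `+`, which allows non-fast-forward updates, is merely stripped.
- Choosing which objects to send: everything reachable from what is wanted, minus everything reachable from what the other side has.

The `links` map is a hypothetical stand-in for the object graph, and the branch name `feature/login` is illustrative. The real protocol negotiates through "want" and "have" lines rather than full object lists.

```python
from __future__ import annotations


def map_ref(refspec: str, ref: str) -> str | None:
    """Return the destination name for `ref` under `refspec`, or None if it doesn't match."""
    src, _, dst = refspec.lstrip("+").partition(":")
    if "*" not in src:
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    if ref.startswith(prefix) and ref.endswith(suffix) and len(ref) >= len(prefix) + len(suffix):
        return dst.replace("*", ref[len(prefix):len(ref) - len(suffix)], 1)
    return None


def objects_to_send(links: dict[str, list[str]], wants: list[str], haves: list[str]) -> set[str]:
    """Objects reachable from `wants` that the receiver cannot already reach from `haves`."""

    def closure(roots: list[str]) -> set[str]:
        seen: set[str] = set()
        stack = list(roots)
        while stack:
            oid = stack.pop()
            if oid not in seen:
                seen.add(oid)
                stack.extend(links.get(oid, []))
        return seen

    return closure(wants) - closure(haves)


if __name__ == "__main__":
    print(map_ref("refs/heads/main:refs/remotes/origin/main", "refs/heads/main"))
    print(map_ref("+refs/heads/*:refs/remotes/origin/*", "refs/heads/feature/login"))
    links = {"c2": ["t2", "c1"], "t2": ["b1", "b2"], "c1": ["t1"], "t1": ["b1"]}
    print(sorted(objects_to_send(links, wants=["c2"], haves=["c1"])))  # ['b2', 'c2', 't2']
```

The shared blob b1 is never sent, because the receiver already reaches it through c1.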

Git’s Fundamental Data Model


To summarize:

Branches & Tags (refs) → Commit → Tree → (Subtrees + Blobs)

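Every object is addressed by the hash of its content, so objects are immutable. Changing one byte of a file changes its blob hash, and with it every tree above it, the commit that records it, and every later commit. That's why Git's history is trustworthy: you can't fake history without changing hashes.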

Why Git Feels So Fast

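  • Most operations (commit, log, diff, branch, merge) are local; only fetch, pull, and push talk to a server.
  • Objects are looked up by hash, and identical subtrees can be skipped just by comparing their hashes.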
  • Delta compression keeps storage compact.
  • Immutable data ensures consistency.

Git is a Database, Not Just a Tool

Git is a key–value store, where:

key = SHA-1(header + content)
value = object (blob/tree/commit/tag)
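That key-value view can be captured in a toy Python class (hypothetical, in memory, SHA-1 keys). The key is computed from the value itself. Storing the same content twice therefore yields the same key and a single stored copy.

```python
from __future__ import annotations

import hashlib


class ObjectStore:
    """A toy content-addressable key-value store in the spirit of .git/objects."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, obj_type: str, content: bytes) -> str:
        raw = f"{obj_type} {len(content)}".encode() + b"\0" + content
        key = hashlib.sha1(raw).hexdigest()  # the key is derived from the value
        self._data.setdefault(key, raw)  # same content -> same key, stored once
        return key

    def get(self, key: str) -> tuple[str, bytes]:
        header, _, content = self._data[key].partition(b"\0")
        return header.split(b" ")[0].decode(), content

    def __len__(self) -> int:
        return len(self._data)


if __name__ == "__main__":
    store = ObjectStore()
    k1 = store.put("blob", b"test content\n")
    k2 = store.put("blob", b"test content\n")
    print(k1 == k2, len(store))  # True 1
```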

The rest — branches, merges, remotes — are just clever abstractions built on top of that.

Wrapping up

Next time you type git commit, remember:

You’re not “saving changes”. You’re minting a new immutable object into a beautifully designed content-addressable universe.