Ghost Rows and Time Travel: A Deep Dive into PostgreSQL MVCC

by dev

If you’ve ever wondered how a database can let you run a massive analytics query while simultaneously processing thousands of user updates, without the whole system grinding to a halt, the answer lies in a concept called MVCC (Multi-Version Concurrency Control).

In this deep dive, we’re going to step away from the abstract theory and look at the actual bytes on the disk. We will explore how PostgreSQL implements this "time travel" feature using hidden columns, how updates are actually inserts in disguise, and why your database needs a garbage collector.

The Concept: Snapshots over Locks

In older database systems, ensuring data consistency often meant locking. If User A is editing a row, User B has to wait to read it. PostgreSQL takes a different approach.

Instead of updating data in place, Postgres treats data as immutable versions. When you "update" a row, you aren't overwriting the old data; you are creating a specific version of that row valid for a specific timeframe.

Transaction A sees the database state as it existed at the start of its transaction.
Transaction B sees the state as it exists after updates.

Both exist simultaneously. This is MVCC.

The Hidden Architecture

To make this work, PostgreSQL quietly appends several system columns to every single row you create. You usually don't see them, but they are the engine behind the magic.

Let's uncover them.

Setting up the Laboratory

We'll use a fresh scenario: an Airline Reservation System. We need to track seats on a flight.

First, create the environment (assuming you have a Postgres instance running):

CREATE DATABASE flight_systems;
\\c flight_systems
CREATE EXTENSION pageinspect; -- We need this to see the raw bits later

Now, create a simple table for seat assignments.

CREATE TABLE seats (
    seat_no TEXT PRIMARY KEY,
    class TEXT,
    price INT
);

INSERT INTO seats (seat_no, class, price) VALUES
('1A', 'First', 1000),
('1B', 'First', 1000),
('10C', 'Economy', 200);