The NameNode is the bookkeeper of an HDFS cluster — the node that maintains the metadata describing what files exist and which blocks they’re made of. The actual file contents live on the DataNodes; the NameNode tracks which blocks live on which DataNodes and how the blocks fit together to form complete files.

Conceptually, the NameNode’s metadata looks like a Python dictionary mapping file paths to lists of block IDs:

{
    "/user/hadoop/input.txt":  [1, 2],
    "/books/moby_dick.pdf":    [3, 6],
    "/images/IMG_001.png":     [4, 5]
}
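
The dictionary above is only half of the picture: the NameNode also needs to know where each block's replicas are stored. Here is a minimal sketch of both mappings as a toy model, not the NameNode's real data structures. The names `namespace`, `block_locations`, and `locate` are hypothetical, and the node placements are made up for illustration.

```python
# Toy model of NameNode metadata; all names and placements are illustrative.

# Namespace: file path -> ordered list of block IDs.
namespace = {
    "/user/hadoop/input.txt": [1, 2],
    "/books/moby_dick.pdf": [3, 6],
    "/images/IMG_001.png": [4, 5],
}

# Block map: block ID -> DataNodes holding a replica (made-up placements).
block_locations = {
    1: ["node1", "node3", "node7"],
    2: ["node1", "node2", "node5"],
    3: ["node2", "node4", "node6"],
    4: ["node3", "node5", "node7"],
    5: ["node1", "node4", "node6"],
    6: ["node2", "node3", "node5"],
}


def locate(path):
    """Return [(block_id, [datanode, ...]), ...] in file order."""
    return [(block_id, block_locations[block_id]) for block_id in namespace[path]]


print(locate("/books/moby_dick.pdf"))
```

Keeping the two mappings separate mirrors the split in the text: one says how blocks fit together into files, the other says where each block lives.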

When a client wants to read /user/hadoop/input.txt, the conversation goes:

  1. Client asks the NameNode: which blocks make up /user/hadoop/input.txt, and where do they live?
  2. NameNode replies: the file is blocks 1 and 2; block 1 is on Node 1 (replicas also on Node 3 and Node 7); block 2 is on Node 1, Node 2, and Node 5.
  3. Client talks directly to the DataNodes to read the actual bytes.

The NameNode handles only metadata operations. The data path, the actual transfer of bytes, bypasses it. This separation is what lets HDFS scale: because file contents never flow through the NameNode, neither its CPU nor its network link limits read/write throughput.
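
The read conversation can be sketched as a small in-memory simulation. This is a toy under stated assumptions, not the HDFS client API: the classes `NameNode` and `DataNode`, the method names, and the block payloads are all hypothetical. The point is that the NameNode call returns only locations, while the bytes come from the DataNodes.

```python
# Toy simulation of the HDFS read path; class and method names are hypothetical.

class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block ID -> bytes stored on this node

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    def __init__(self):
        self.namespace = {}        # path -> ordered block IDs
        self.block_locations = {}  # block ID -> names of DataNodes with a replica

    def get_block_locations(self, path):
        # Metadata only: block IDs plus replica locations, never file bytes.
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


def read_file(path, namenode, datanodes):
    """Steps 1-2: ask the NameNode for locations; step 3: read from DataNodes."""
    data = b""
    for block_id, replicas in namenode.get_block_locations(path):
        # This sketch just takes the first listed replica.
        data += datanodes[replicas[0]].read_block(block_id)
    return data


# Wire up the example from the text with made-up block contents.
datanodes = {f"node{i}": DataNode(f"node{i}") for i in (1, 2, 3, 5, 7)}
nn = NameNode()
nn.namespace["/user/hadoop/input.txt"] = [1, 2]
nn.block_locations = {1: ["node1", "node3", "node7"], 2: ["node1", "node2", "node5"]}
for block_id, payload in ((1, b"hello "), (2, b"world")):
    for name in nn.block_locations[block_id]:
        datanodes[name].blocks[block_id] = payload

content = read_file("/user/hadoop/input.txt", nn, datanodes)
print(content)
```

If the first replica were unavailable, a client could fall back to the next entry in the list, which is why the NameNode returns every replica location rather than a single one.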

In Hadoop 1.x the NameNode was a hard single point of failure: if the metadata server went down, the cluster’s files became inaccessible even though the bytes still sat safely on the DataNodes. A Secondary NameNode runs alongside the primary, but despite the name it isn’t a hot standby. Its job is to periodically merge the edit log into a fresh checkpoint of the metadata (the fsimage), which keeps the log short so that a restart after a crash has fewer edits to replay. The actual high-availability story arrived in Hadoop 2.x: two NameNodes operate in active/standby mode with a shared edit log (on JournalNodes) and coordinated failover via ZooKeeper, so failure of the active one promotes the standby without taking the cluster offline.
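
To make the checkpoint idea concrete, here is a rough sketch of folding an edit log into a metadata snapshot. The record format (`op`, `path`, `blocks`) and the functions `apply_edit` and `checkpoint` are assumptions made for illustration; they are not HDFS's on-disk format or its API.

```python
# Toy checkpoint: replay an edit log onto a metadata snapshot.
# The record format and function names are hypothetical.

def apply_edit(image, edit):
    """Apply one logged operation to the image in place."""
    op, path, blocks = edit
    if op == "create":
        image[path] = list(blocks)
    elif op == "append":
        image[path] = image[path] + list(blocks)
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(image, edit_log):
    """Return (new_image, empty_log) after replaying every edit in order."""
    new_image = {path: list(blocks) for path, blocks in image.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


image = {"/user/hadoop/input.txt": [1, 2]}
edit_log = [
    ("create", "/books/moby_dick.pdf", [3]),
    ("append", "/books/moby_dick.pdf", [6]),
    ("create", "/tmp/scratch", [9]),
    ("delete", "/tmp/scratch", []),
]

image, edit_log = checkpoint(image, edit_log)
print(image)
```

After the checkpoint the log is empty, so a restart in this model only has to load the snapshot and replay whatever was logged since.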

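When the NameNode is first set up, we format it to initialize the metadata storage (on Hadoop 2.x and later the command is spelled hdfs namenode -format; the hadoop namenode form shown here is deprecated there):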

hadoop namenode -format

This is a one-time operation per cluster. Running it on an existing cluster wipes the metadata and effectively destroys the cluster’s view of its files — the DataNodes still have the block bytes on disk, but no one knows which file they belong to. This is why the format command lives behind a warning and a confirmation.