What the Linux File System Actually Is
When most engineers say "file system" they mean the format on disk — ext4, XFS, Btrfs. But Linux uses the term in a broader sense, and if you conflate the two meanings you'll confuse yourself badly the first time you encounter something like proc or tmpfs. A Linux file system is any hierarchical namespace that exposes data through the standard file operations: open, read, write, stat, readdir. The storage backend is irrelevant. Disk, RAM, kernel data structures, network — it doesn't matter. If you can mount it and navigate it with a path, it's a file system as far as Linux is concerned.
The piece that makes this possible is the Virtual File System, or VFS. VFS is a kernel abstraction layer that sits between system calls and the concrete file system drivers. When your process calls open("/var/log/syslog", O_RDONLY), the kernel doesn't know or care whether /var/log sits on an ext4 partition, an NFS share, or an overlay mount. VFS translates the call into driver-specific operations and returns a file descriptor. This is why "everything is a file" isn't just a philosophy — it's a kernel engineering decision with real consequences for how you build and manage systems.
Under VFS, four key data structures do the heavy lifting. The superblock represents a mounted file system instance and stores global metadata: block size, inode count, flags, and a pointer to the file system operations table. The inode stores per-file metadata — permissions, ownership, timestamps, and block pointers — but notably not the file name. The dentry (directory entry) maps a name to an inode and is cached aggressively in the dentry cache for performance. Finally, the file object represents an open file descriptor in a process and tracks the current position within the file. Understanding that names live in dentries and not inodes is the key to understanding hard links, which I'll come back to later.
How Mount Points Work
A mount point is simply a directory in the existing tree where a new file system is grafted. When you run mount, the kernel calls the mount(2) syscall, which allocates a new superblock for the target file system, creates a mount structure, and attaches it to the mount tree at the specified path. From that moment on, any path resolution that reaches that directory gets handed off to the new file system's driver rather than continuing through the parent. The directory itself — the mount point — isn't deleted or modified. It's hidden behind the newly attached tree. Unmount it and the original directory reappears, contents intact.
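A minimal sketch of that hide-and-reappear behavior, runnable without root on kernels that allow unprivileged user namespaces (the /tmp/mnt-demo path is a hypothetical example):

```shell
# Mounting over a directory hides its contents; unmounting brings them back.
mkdir -p /tmp/mnt-demo
echo hidden > /tmp/mnt-demo/before.txt
unshare --mount --map-root-user sh -c '
  mount -t tmpfs tmpfs /tmp/mnt-demo   # graft a tmpfs over the directory
  ls /tmp/mnt-demo                     # empty: before.txt is hidden, not gone
  umount /tmp/mnt-demo
  cat /tmp/mnt-demo/before.txt         # original contents reappear
'
```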
Linux maintains a per-namespace mount tree. In the early days this was a single global tree, but since Linux 2.4.19 every process can have its own mount namespace, which is fundamental to how containers work. When you run unshare --mount or create a new mount namespace via clone(2) with CLONE_NEWNS, the child gets a copy of the parent's mount tree. Changes made inside the namespace — new mounts, unmounts — don't affect the parent unless you're using shared propagation, which I'll get to in a moment.
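You can inspect namespace identity directly through /proc. This sketch assumes util-linux's unshare is available; the fallback message is only for kernels that disable unprivileged user namespaces:

```shell
# Each process's mount namespace is exposed as a symlink in /proc; processes
# that print the same mnt:[inode] share one mount table.
readlink /proc/self/ns/mnt
# A child created with unshare gets a different mount namespace:
unshare --mount --map-root-user readlink /proc/self/ns/mnt 2>/dev/null \
  || echo "new namespace unavailable without privileges"
```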
Mount propagation is one of those topics that reads like simple documentation until you break production with it. There are four propagation types. Shared mounts propagate events bidirectionally between peer groups — mount something inside a shared mount, and it appears in all peers. Slave mounts receive events from a master but don't send them back. Private mounts have no propagation at all. Unbindable mounts are private and additionally can't be bind-mounted. In my experience, the default on most modern distributions is shared propagation for the root file system, which means that if you mount something inside a container's namespace without thinking about this, it can leak back to the host. Always check with findmnt -o TARGET,PROPAGATION before assuming isolation.
# Show mount tree with propagation flags
findmnt --tree -o TARGET,SOURCE,FSTYPE,OPTIONS,PROPAGATION
# Sample output excerpt
TARGET SOURCE FSTYPE OPTIONS PROPAGATION
/ /dev/sda1 ext4 rw,relatime shared
├─/sys sysfs sysfs rw,nosuid,nodev,noexec shared
├─/proc proc proc rw,nosuid,nodev,noexec shared
├─/dev devtmpfs devtmpfs rw,nosuid shared
│ ├─/dev/pts devpts devpts rw,nosuid,noexec shared
│ └─/dev/shm tmpfs tmpfs rw,nosuid,nodev shared
└─/data /dev/sdb1 xfs rw,relatime shared
The Filesystem Hierarchy Standard and Why It's Laid Out That Way
The FHS isn't arbitrary. Every major directory split exists because of a real operational concern — either separability for independent mounting, performance characteristics, or administrative convenience. /usr was historically a separate partition because it held read-only user programs that could be shared over NFS across workstations. /var holds variable data — logs, spool files, package databases — and belongs on its own partition so that a runaway log file can't fill the root and take down the system. /tmp is volatile by design. /boot often needs to live on a partition that the BIOS or UEFI firmware can access before the kernel is running, which can constrain the file system type and encryption options.
In production I've seen the root file system fill to 100% more times than I can count, and it's almost always /var/log that's responsible. A properly partitioned server carves /var off the root so that log growth never threatens system stability. Similarly, on database servers I'll put /var/lib/postgresql or /var/lib/mysql on a dedicated XFS partition with mount options tuned for write throughput. Keeping the database data path on its own mount point means you can snapshot it cleanly, remount it read-only for a consistent backup, or replace the underlying block device without touching anything else.
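For concreteness, a dedicated database data mount might look like the fstab line below — the UUID, mount point, and option values are hypothetical illustrations and a starting point, not a universal tuning recipe:

```
# Hypothetical dedicated PostgreSQL data partition (tune options per workload)
UUID=1234abcd-5678-90ef-1234-567890abcdef /var/lib/postgresql xfs noatime,logbufs=8,logbsize=256k 0 0
```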
/etc/fstab — The Configuration File You Must Not Ignore
Every persistent mount is defined in /etc/fstab. Each line has six fields: device, mount point, file system type, mount options, dump frequency, and fsck pass order. The device field accepts block device paths, but you should almost never use bare /dev/sdX paths in production. Device names are not stable across reboots — a disk that enumerates as /dev/sdb today might come up as /dev/sdc tomorrow if another disk is added or the enumeration order changes. Use UUIDs or labels instead.
# Get UUID and label info for all block devices
blkid
# /dev/sda1: UUID="a1b2c3d4-e5f6-7890-abcd-ef1234567890" TYPE="ext4" LABEL="root"
# /dev/sdb1: UUID="f9e8d7c6-b5a4-3210-fedc-ba9876543210" TYPE="xfs" LABEL="data"
# Correct fstab entries using UUID
UUID=a1b2c3d4-e5f6-7890-abcd-ef1234567890 / ext4 defaults,relatime 0 1
UUID=f9e8d7c6-b5a4-3210-fedc-ba9876543210 /data xfs defaults,relatime,noatime 0 2
tmpfs /tmp tmpfs mode=1777,nosuid,nodev 0 0
The last two fields trip up junior engineers constantly. The dump field (fifth column) should be 0 for virtually everything — it controls the legacy dump utility, which almost nobody uses anymore. The fsck pass field (sixth column) controls whether and when fsck checks the file system at boot: 0 means skip, 1 means check first (root only), 2 means check after root. If you have XFS or Btrfs file systems, set this to 0 — those file systems use their own recovery mechanisms, and running fsck on them is either a no-op or actively harmful.
Mount Options That Actually Matter in Production
The options field is where you harden your mounts and tune performance. Don't leave everything as defaults and call it done. Let me walk through the options I actually configure on production systems.
noexec prevents execution of binaries directly from the mount point. I put this on /tmp, /var/tmp, and any partition that doesn't need to run executables. It won't stop a determined attacker, but it meaningfully raises the cost of exploiting a write vulnerability to execute a payload. nosuid ignores the setuid and setgid bits on files in that mount, which prevents privilege escalation through setuid binaries dropped onto world-writable mounts. nodev prevents device file interpretation, which matters on any mount that untrusted users can write to.
relatime versus noatime is a performance decision. With strict atime behavior (the historical default), every file read updates the access time (atime) on the inode, which turns every read into a write and can thrash your storage on read-heavy workloads. noatime disables atime updates entirely. relatime — the kernel default since 2.6.30 — only updates atime when it's older than mtime or ctime, or at most once a day, which satisfies most applications that check atime while eliminating the worst-case write amplification. On busy log servers and database hosts, switching from strict atime to relatime or noatime can make a measurable difference in I/O utilization.
# Hardened /tmp in fstab
tmpfs /tmp tmpfs rw,nosuid,nodev,noexec,relatime,size=2G,mode=1777 0 0
tmpfs /var/tmp tmpfs rw,nosuid,nodev,noexec,relatime,size=1G,mode=1777 0 0
# Verify options are applied after mount
grep '/tmp' /proc/mounts
Bind Mounts and Their Practical Uses
A bind mount takes an existing directory (or file) in the tree and makes it appear at a second location simultaneously. Both paths point to the same underlying data — there's no copy. The kernel simply attaches the source's dentry tree at the target location. Bind mounts are powerful because they let you reshape the namespace without moving data on disk.
The most common production use case I reach for is exposing a specific subdirectory into a chroot or container. If a service runs in a chroot at /srv/chroot/dns and it needs access to /etc/resolv.conf, you don't copy the file — you bind-mount it in. When the host updates /etc/resolv.conf, the chroot sees the change immediately because there's only one inode (as long as the file is edited in place rather than replaced with a new one).
# Bind-mount a single file into a chroot
mount --bind /etc/resolv.conf /srv/chroot/dns/etc/resolv.conf
# Make it read-only inside the chroot (remount the existing bind read-only)
mount -o remount,ro,bind /srv/chroot/dns/etc/resolv.conf
# Bind-mount in /etc/fstab
/etc/resolv.conf /srv/chroot/dns/etc/resolv.conf none bind,ro 0 0
Another case where bind mounts shine is testing. When I need to test a new configuration directory structure without modifying the live path, I'll bind-mount the test directory over the live one for the duration of the test and unmount when done. No file moves, no risk of leaving behind a symlink.
Special File Systems: proc, sysfs, devtmpfs, and tmpfs
These aren't storage file systems — they're kernel interfaces that happen to speak the VFS protocol. proc exposes process information and kernel state. /proc/mounts is what the kernel actually has mounted, which makes it the authoritative source — more reliable than parsing /etc/fstab, which is just a wishlist. /proc/meminfo, /proc/cpuinfo, /proc/net/ — all dynamically generated from kernel data structures on every read. No disk I/O happens.
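A quick way to convince yourself of this: every read below comes straight from kernel state, with no block device involved.

```shell
# Generated at read time from kernel data structures -- no disk I/O:
grep MemAvailable /proc/meminfo
# The kernel's live mount table, versus the /etc/fstab wishlist:
head -n 3 /proc/mounts
```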
sysfs, mounted at /sys, exports kernel object hierarchies — devices, drivers, buses, power management. It's the kernel's preferred modern interface for device configuration. When you write to /sys/block/sda/queue/scheduler to change the I/O scheduler for a disk, you're triggering a kernel function through what looks like a file write.
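For instance, reading the scheduler attribute is an ordinary file read, and selecting a scheduler is an ordinary write (shown commented out since it requires root; sda is a placeholder device name):

```shell
# List each block device's available I/O schedulers; the active one is
# shown in brackets. Device names vary by machine.
cat /sys/block/*/queue/scheduler 2>/dev/null
# Changing it is just a file write (root required; sda is a placeholder):
# echo mq-deadline > /sys/block/sda/queue/scheduler
```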
devtmpfs manages the device nodes under /dev dynamically, creating and removing nodes as devices appear and disappear. tmpfs is a real file system backed by virtual memory — RAM and swap. It performs extremely well because there's no disk, but its contents disappear on unmount. I use it for /tmp, /run, and, on systems with enough RAM, for scratch space in data pipelines where intermediate results don't need to survive a reboot. Always set a size limit on tmpfs mounts. The default limit is half of physical RAM, and I've seen servers run out of memory because an application hammered /tmp with temporary files and nothing enforced a ceiling.
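You can usually see the half-of-RAM default by comparing /dev/shm — a tmpfs mount that typically carries no explicit size= — against MemTotal:

```shell
# tmpfs without an explicit size= is capped at half of physical RAM;
# /dev/shm usually demonstrates that default.
df -h /dev/shm
grep MemTotal /proc/meminfo
```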
How Inodes Connect to All of This
On ext4, the inode table is sized when the file system is created and fixed thereafter; XFS, by contrast, allocates inodes dynamically. Running out of inodes is a different failure mode from running out of disk space, and it can be just as fatal. You'll see "No space left on device" even with gigabytes free on disk. This happens most commonly when an application creates enormous numbers of small files — mail spools, PHP session directories, package manager caches.
# Check inode usage per mount point
df -i
# Filesystem Inodes IUsed IFree IUse% Mounted on
# /dev/sda1 3932160 312000 3620160 8% /
# /dev/sdb1 6553600 6553590 10 100% /data
# Find directories with massive inode consumption
find /data -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -20
Hard links are a direct consequence of inode architecture. A hard link is a dentry that points to an existing inode — a second name for the same underlying data. The inode has a link count field that tracks how many dentries reference it. The data is only freed when the link count drops to zero and no process has the file open. Symbolic links, by contrast, are their own inodes containing a path string. They can cross file system boundaries because they're just path pointers; hard links cannot, because dentries are only meaningful within a single superblock's namespace.
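The link-count mechanics above can be watched directly with stat; the paths under /tmp are hypothetical examples:

```shell
# Hard link vs symlink in action.
rm -f /tmp/link-orig /tmp/link-hard /tmp/link-soft
echo data > /tmp/link-orig
ln /tmp/link-orig /tmp/link-hard      # new dentry, same inode -> link count 2
ln -s /tmp/link-orig /tmp/link-soft   # new inode containing a path string
stat -c '%n inode=%i links=%h' /tmp/link-orig /tmp/link-hard
rm /tmp/link-orig                     # count drops to 1; data stays reachable
cat /tmp/link-hard                    # prints "data"
cat /tmp/link-soft 2>/dev/null || echo "dangling symlink"
```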
Common Misconceptions
The biggest one I hear: "mounting replaces the directory." It doesn't. The directory still exists on the underlying file system — it's just hidden while the mount is active. Unmount the file system and the directory, including any files you might have accidentally left there before mounting, comes back. I've seen engineers create files in what they thought was a mounted directory, only to realize later they were writing to the hidden layer underneath. Always verify with findmnt or mount | grep target before writing to a path you expect to be mounted.
Second misconception: "NFS mounts work like local mounts." They do at the VFS interface level, but semantics differ in ways that will ruin your day. NFS has weaker consistency guarantees, atime behavior can differ based on server settings, file locking on NFSv3 depends on separate daemons (rpcbind/statd), and a network partition will cause your mount to hang indefinitely by default unless you use the soft and timeo options — which introduce their own tradeoffs around silent data corruption on write failures. NFS deserves its own article, but know that mounting it and treating it as ext4 is a mistake.
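To make the tradeoff concrete, a hedged illustration of an NFS fstab entry using those options (server name, export, and mount point are made up; timeo is in tenths of a second, so this gives up after roughly 15-second timeouts rather than hanging forever):

```
# Hypothetical NFS mount trading hangs for possible write errors
nfs01:/export/data /mnt/data nfs4 rw,soft,timeo=150,retrans=2,_netdev 0 0
```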
Third: "/etc/mtab is authoritative." On modern systems, /etc/mtab is a symlink to /proc/self/mounts, which is the kernel's own view of your mount table — and that's the correct behavior. On older systems where /etc/mtab was a real file, it could drift out of sync with actual mounts if a mount operation failed to update it cleanly — a particularly nasty failure mode after a crash. If you're troubleshooting mounts, always read /proc/mounts directly and treat it as ground truth.
# The authoritative mount table
cat /proc/mounts
# Or with more human-readable output
findmnt
# Find what's mounted on a specific path
findmnt /data
# TARGET SOURCE FSTYPE OPTIONS
# /data /dev/sdb1 xfs rw,relatime,attr2,inode64,logbufs=8,noquota
In my experience, engineers who invest the time to understand VFS, mount propagation, and inode semantics stop treating storage as a black box and start making better decisions — about partition layout, about mount options, about how containers interact with the host file system. It's foundational knowledge that pays dividends every time you're debugging a full disk, a permission problem, or an unexpected mount behavior in a containerized environment. Get comfortable with findmnt, understand what's in /proc/mounts, and never use bare device paths in fstab again.
