Everything Is a File — But What Does That Actually Mean?
You've heard the phrase a hundred times: in Linux, everything is a file. Processes, sockets, hardware devices, kernel tunables — the operating system presents all of it through a unified file interface. That abstraction only works because of the Virtual File System layer, and the mount system that hangs concrete storage onto it. If you've ever wondered what actually happens when you run mount, or why your container sees a completely different root than the host, this is the article for you.
I'm going to walk through this the way I'd explain it to a colleague who's comfortable with Linux administration but hasn't dug into the internals. We'll go from the VFS layer down to fstab, bind mounts, and mount namespaces — and I'll flag the misconceptions I see most often in the field.
The Virtual File System: One Interface, Many Backends
The kernel's VFS is a software layer that sits between user-space applications and actual file system implementations. When a process calls
open(),
read(), or
stat(), it talks to the VFS. The VFS then dispatches that call to the appropriate concrete file system driver — whether that's ext4, XFS, tmpfs, or even a network file system like NFS.
The VFS defines four core objects: superblocks, inodes, dentries, and file objects. The superblock holds metadata about the mounted file system as a whole — its type, block size, and state. An inode represents a single file or directory, tracking permissions, timestamps, and pointers to the actual data blocks. A dentry (directory entry) maps a filename to an inode — it's the kernel's cache of the namespace tree. And a file object represents an open file descriptor in a running process, tying together the inode, current position, and flags.
This design is why you can do things like stat /proc/1/status and get sensible output even though there are no actual disk blocks backing that file. The procfs driver implements the VFS interface and generates content on the fly when the kernel services the read call. The application doesn't need to know or care.
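You can watch that dispatch happen with nothing but coreutils. A procfs file reports a size of zero to stat() because there is no stored data; the content only comes into existence when something actually reads it:

```shell
# procfs files have no backing blocks: stat reports size 0,
# but a read produces content the kernel generates on the fly.
stat -c 'size=%s' /proc/self/status   # prints: size=0
wc -c < /proc/self/status             # a nonzero byte count
```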
What a Mount Point Actually Is
A mount point is just a directory. That's it. When you mount a file system onto a directory, the kernel attaches the root of that file system to that directory entry in the namespace tree. Anything that was previously in that directory becomes temporarily invisible — hidden behind the mounted file system's own root. This is not a copy, not a symlink, not a bind: the directory's dentry is marked as a mount point, and path lookup crosses over into the mounted file system's root instead of descending into the original contents.
The kernel tracks all active mounts in an internal structure accessible via /proc/mounts and the more detailed /proc/self/mountinfo. The findmnt tool reads these and formats them usably:
[infrarunbook-admin@sw-infrarunbook-01 ~]$ findmnt
TARGET SOURCE FSTYPE OPTIONS
/ /dev/sda2 ext4 rw,relatime
├─/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/security securityfs securityfs rw,nosuid,nodev,noexec,relatime
│ └─/sys/fs/cgroup cgroup2 cgroup2 rw,nosuid,nodev,noexec,relatime
├─/proc proc proc rw,nosuid,nodev,noexec,relatime
├─/dev devtmpfs devtmpfs rw,nosuid,size=8192k,nr_inodes=4096
│ ├─/dev/pts devpts devpts rw,nosuid,noexec,relatime
│ └─/dev/shm tmpfs tmpfs rw,nosuid,nodev
├─/run tmpfs tmpfs rw,nosuid,nodev,mode=755
└─/data /dev/sdb1 xfs rw,relatime,attr2,inode64
Notice the tree structure. Every indented entry is a child mount — a mount whose mount point lives inside the parent mount's directory tree. This hierarchy is called the mount tree, and it's how the kernel resolves paths. When you stat /sys/fs/cgroup, the kernel walks the path components, checks the dentry cache, and follows the mount attachment to reach cgroup2's own inode space.
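You can ask findmnt to do this resolution for you: given any path, its --target option walks up to the mount that contains it, which is handy when you're not sure which file system a file actually lives on:

```shell
# Resolve which mount a path belongs to, and its file system type
findmnt --target /proc/self/status -o TARGET,FSTYPE
```

For a procfs path like this one, the resolved target is /proc with type proc.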
File System Types You'll Actually Encounter
Not all file systems store data on disk. Linux mounts a mix of storage-backed, memory-backed, and kernel-virtual file systems during a normal boot. Understanding what each one is for saves you a lot of confusion when you're staring at /proc/mounts wondering why there are fifteen entries before you even get to your disk.
ext4 is still the most common root file system type on general-purpose Linux. It's journaled, mature, and has excellent fsck tooling. XFS is a better choice for large files and high-throughput workloads — it scales better under heavy parallelism, which is why RHEL defaults to it. Both support extended attributes and ACLs out of the box.
tmpfs is memory-backed storage. It behaves exactly like a disk-backed file system from a user-space perspective, but its data lives in RAM (and optionally swap). The kernel uses it for /dev/shm, /run, and often /tmp. Its size ceiling defaults to half of physical RAM unless you set one with the size= option, and within that ceiling it grows and shrinks dynamically as files come and go. I've seen junior engineers try to diagnose "disk full" errors in /run and spend twenty minutes looking for a disk device that doesn't exist.
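When a path's backing isn't obvious, ask the VFS instead of guessing: stat -f reports the file system type of whatever mount the path lives on. Assuming a normal system with /dev/shm mounted:

```shell
# -f queries the file system containing the path, not the file itself
stat -f -c 'type=%T' /dev/shm   # prints: type=tmpfs
df -h /dev/shm                  # shows the tmpfs ceiling and current usage
```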
proc and sysfs are pseudo-file systems that expose kernel data structures. /proc is process and system information. /sys is the kernel object model — device trees, driver parameters, and hardware state. Neither has any on-disk presence. If you're ever tuning kernel parameters at runtime with echo into /sys/block/sdb/queue/scheduler, you're talking directly to the kernel through the sysfs interface — the write is handled by a kernel callback, not stored anywhere.
devtmpfs provides the /dev tree. The kernel auto-populates it with device nodes as drivers register hardware. Without it mounted, you'd have no /dev/sda, no /dev/null, nothing. udev then manages the naming and permissions on top of this.
overlay (overlayfs) is what Docker and most container runtimes use for image layers. It stacks a read-only lower directory and a writable upper directory, presenting a merged view. Writes go to the upper layer. The lower layers are never modified. This is how you can spin up fifty containers from the same base image without duplicating gigabytes of data.
How fstab Works — and Where People Get It Wrong
The file /etc/fstab is a static table that tells the system what to mount at boot, and how. Each line has six fields: device, mount point, file system type, options, dump frequency, and fsck pass order. Here's a representative example from a server I work on:
# /etc/fstab on sw-infrarunbook-01
# <device> <mount> <type> <options> <dump> <pass>
UUID=3f2e1a4b-8c7d-4e56-9f01-2b3c4d5e6f7a / ext4 defaults,errors=remount-ro 0 1
UUID=a1b2c3d4-e5f6-7890-abcd-ef1234567890 /data xfs defaults,noatime 0 2
UUID=dead1234-beef-cafe-0000-111122223333 /boot ext4 defaults 0 2
tmpfs /tmp tmpfs defaults,size=2G,mode=1777 0 0
The device field should almost always use a UUID, not /dev/sda1. Device names aren't stable — if you add a disk or the kernel enumerates storage in a different order, /dev/sdb might become /dev/sda after a reboot. UUIDs don't change. Get them with blkid or lsblk -o NAME,UUID.
The options field is where the real control lives. noexec prevents execution of binaries from that mount — useful on /tmp and /home to limit attacker surface. nosuid ignores setuid bits, which matters if users can put files on a mount. nodev prevents interpretation of device files — you generally want this on any non-root partition. relatime is the modern default for atime handling: it only updates access time if the current atime is older than the modification time (or more than a day old), reducing write overhead without completely disabling atime tracking like noatime does.
The dump and fsck fields trip people up. Dump is almost universally 0 these days — the dump utility is rarely used. The fsck pass field tells fsck when to check the file system at boot. The root file system should be 1. All other file systems that need checking should be 2 (they run after root, in parallel if possible). A value of 0 means skip fsck entirely, which is correct for tmpfs, proc, sysfs, and any network file system.
Bind Mounts: The Same Data, Different Path
A bind mount takes an existing directory (or file) and mounts it at a second location. Both paths show the same inode tree. Changes through either path are immediately visible through the other because they're backed by the same in-memory structures.
[infrarunbook-admin@sw-infrarunbook-01 ~]$ mkdir /mnt/bindtest
[infrarunbook-admin@sw-infrarunbook-01 ~]$ mount --bind /data/shared /mnt/bindtest
[infrarunbook-admin@sw-infrarunbook-01 ~]$ findmnt /mnt/bindtest
TARGET SOURCE FSTYPE OPTIONS
/mnt/bindtest /dev/sdb1 xfs rw,relatime,attr2,inode64
Bind mounts are how container runtimes expose host paths inside a container's mount namespace. When you run a container with -v /data/configs:/etc/app, the runtime bind-mounts /data/configs into the container's private namespace at /etc/app. The container process sees it as a regular directory. The host directory is unchanged.
You can also bind-mount individual files, which is useful for injecting a single config file into a container without exposing an entire directory. And you can make a bind mount read-only even if the source is writable:
[infrarunbook-admin@sw-infrarunbook-01 ~]$ mount --bind /data/shared /mnt/readonly-view
[infrarunbook-admin@sw-infrarunbook-01 ~]$ mount -o remount,ro,bind /mnt/readonly-view
To make bind mounts persistent across reboots, add them to fstab with the bind option:
/data/shared /mnt/bindtest none bind 0 0
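One thing /proc/mounts won't tell you is whether a mount is a bind. /proc/self/mountinfo will: its fourth field is the mount's root within its file system, and any value other than / means you're looking at a bind of a subtree (or a single file):

```shell
# Field 4 = root of the mount inside its file system, field 5 = mount point.
# Binds of a subtree show a non-"/" root; prints nothing if there are none.
awk '$4 != "/" {print $4, "->", $5}' /proc/self/mountinfo
```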
Mount Namespaces: Isolation at the Kernel Level
Mount namespaces are the mechanism that lets containers have an entirely different file system view from the host. When a process creates a new mount namespace (via unshare(2) or clone(2) with CLONE_NEWNS), it gets its own private copy of the mount tree. Changes inside that namespace — mounting, unmounting, bind-mounting — are invisible to the parent namespace and to other namespaces.
In my experience, this is where a lot of engineers get confused when debugging containers. They'll shell into a host and try to find a mount they're certain they made inside a container, only to find it's not there. Of course it isn't — the container is running in its own mount namespace. The host's /proc/mounts shows the host namespace. The container's namespace is separate.
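You can check which mount namespace a process is in by reading its ns symlink; two processes share a namespace exactly when the inode numbers in the link match (the specific number varies by system):

```shell
# The link target encodes the namespace's inode number
readlink /proc/self/ns/mnt   # e.g. mnt:[4026531841]
readlink /proc/$$/ns/mnt     # same shell, same namespace, same value
```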
You can inspect another process's mount namespace by reading /proc/<pid>/mounts or by using nsenter:
[infrarunbook-admin@sw-infrarunbook-01 ~]$ nsenter --mount --target 4821 findmnt
TARGET SOURCE FSTYPE OPTIONS
/ overlay overlay rw,relatime,lowerdir=...,upperdir=...,workdir=...
├─/proc proc proc rw,nosuid,nodev,noexec,relatime
├─/dev tmpfs tmpfs rw,nosuid,size=65536k,mode=755
└─/etc/resolv.conf /dev/sda2 ext4 rw,relatime
That last line is particularly revealing — a single file bind-mounted from the host's ext4 root into the container's overlay namespace. That's exactly how container runtimes inject DNS configuration without affecting anything else in the container's file system.
Shared subtrees complicate this further. A mount can be marked as shared, slave, private, or unbindable. A shared mount propagates mount events to its peers. A slave mount receives propagation from its master but doesn't send it back. Private mounts don't propagate at all. This controls whether new mounts inside a namespace become visible outside it, and it's what the --mount=type=bind,propagation=rprivate syntax in container tooling is controlling.
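You don't have to decode the optional fields in mountinfo by hand to see this; findmnt exposes the propagation state as a column:

```shell
# PROPAGATION shows shared/private/slave for each mount
findmnt -o TARGET,PROPAGATION
```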
systemd and Mount Units
On any systemd-based distribution, fstab entries are automatically converted into mount units at boot. You can also write mount units directly. A .mount unit's name must match the escaped mount point path — so a mount at /data/shared becomes data-shared.mount.
# /etc/systemd/system/data-shared.mount
[Unit]
Description=Shared Data Volume
After=network.target
[Mount]
What=/dev/disk/by-uuid/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Where=/data/shared
Type=xfs
Options=defaults,noatime
[Install]
WantedBy=multi-user.target
Paired with an .automount unit, systemd can mount file systems on demand — only when something actually accesses the mount point. This is useful for NFS shares that shouldn't block boot when the network isn't ready, or for removable storage.
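A matching automount unit is short. Here's a sketch for the /data/shared mount above; the TimeoutIdleSec value is an arbitrary choice, not a recommendation:

```ini
# /etc/systemd/system/data-shared.automount (sketch)
[Unit]
Description=Automount for Shared Data Volume

[Automount]
Where=/data/shared
TimeoutIdleSec=600

[Install]
WantedBy=multi-user.target
```

With the automount enabled instead of the mount unit, the file system isn't mounted until the first access to /data/shared, and it's unmounted again after ten idle minutes.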
Common Misconceptions I Keep Seeing
The first one: mounting doesn't copy data. I've had people tell me they're worried that mounting a directory over another directory will destroy the contents of the lower directory. Nothing is destroyed. The contents of the mount point directory are hidden while something is mounted on top of it, but they come right back when you unmount. Run umount /mountpoint and the original directory contents are exactly where you left them.
The second: proc and sys are not optional. I've seen people strip down containers to the point where /proc isn't mounted, then wonder why ps shows nothing, why top refuses to start, and why half the tooling is broken. Many system utilities read process and system state directly from procfs. If it's not mounted, they fail silently or with cryptic errors.
The third: /etc/mtab is not the source of truth. On modern systems, /etc/mtab is a symlink to /proc/self/mounts. The kernel's mount table is the authoritative record. Don't hand-edit mtab — it's auto-generated. If your mount isn't in /proc/mounts, it didn't work, regardless of what's in fstab.
The fourth: lazy unmount is not a safe default. umount -l performs a lazy unmount — it detaches the file system from the namespace tree immediately but doesn't actually release the resources until all open file descriptors against it are closed. This sounds convenient, but I've seen it cause real problems: processes continue reading and writing to what they think is the file system long after you thought you unmounted it, and then you try to run fsck on what you believe is an idle device. Use lazy unmount intentionally, not as a workaround for a busy device. The right approach is to find what has files open with lsof +f -- /mountpoint and deal with those processes first.
The fifth: tmpfs data doesn't survive reboots — obviously — but it also doesn't survive unmounts. This trips people up occasionally when they're testing something in /dev/shm and manually unmount and remount. The data is gone. tmpfs is volatile by design. Don't confuse it with a ramdisk image that you can serialize and reload.
Practical Checks for Day-to-Day Work
When you're investigating a storage problem, findmnt is almost always the right starting point. It reads /proc/self/mountinfo and gives you the full picture, including propagation flags and bind sources. lsblk -f maps block devices to file system types and UUIDs. df -Th shows usage with file system types included — the -T flag is something I forget to add half the time and then wonder why I'm looking at unlabeled columns.
[infrarunbook-admin@sw-infrarunbook-01 ~]$ df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda2 ext4 50G 18G 30G 38% /
tmpfs tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/sdb1 xfs 500G 220G 280G 44% /data
tmpfs tmpfs 2.0G 1.2M 2.0G 1% /tmp
If you suspect a mount point is hiding something (i.e., something mounted over a non-empty directory), you can temporarily move the mount aside with mount --move to inspect what's underneath, or bind-mount the parent directory somewhere else — a plain bind doesn't recurse into child mounts, so the bind shows the underlying directory contents without the mounts on top.
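For the quick "is anything mounted here?" question, util-linux ships a dedicated mountpoint tool whose exit status makes it script-friendly:

```shell
# Exit 0 if the path is a mount point, 1 otherwise
mountpoint /proc                               # prints: /proc is a mountpoint
mountpoint -q /etc || echo "just a directory"  # -q suppresses the message
```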
For anything involving mount namespaces and containers, lsns -t mnt lists all mount namespaces on the system with their owning process and PID. Combined with nsenter, you can walk into any process's mount namespace and inspect its exact view of the file system tree without affecting it.
[infrarunbook-admin@sw-infrarunbook-01 ~]$ lsns -t mnt
NS TYPE NPROCS PID USER COMMAND
4026531841 mnt 142 1 root /sbin/init
4026532198 mnt 4 4821 infrarunbook-admin /usr/bin/containerd-shim-runc-v2
4026532301 mnt 1 4835 100000 nginx: master process nginx
The Linux file system layer is one of those areas where understanding the internals pays compounding dividends. Once you understand what a mount point actually does at the kernel level, the behavior of containers, bind mounts, chroots, and namespace isolation stops being magic and starts being obvious. And when something breaks at 2 AM, obvious beats magic every single time.
