InfraRunBook

    Linux File System and Mount Points Explained

    Linux
    Published: Apr 7, 2026
    Updated: Apr 7, 2026

    A deep-dive into how Linux organizes storage through its Virtual File System layer, mount points, and fstab — written for engineers who want to understand what's actually happening under the hood.

    Everything Is a File — But What Does That Actually Mean?

    You've heard the phrase a hundred times: in Linux, everything is a file. Processes, sockets, hardware devices, kernel tunables — the operating system presents all of it through a unified file interface. That abstraction only works because of the Virtual File System layer, and the mount system that hangs concrete storage onto it. If you've ever wondered what actually happens when you run mount, or why your container sees a completely different root than the host, this is the article for you.

    I'm going to walk through this the way I'd explain it to a colleague who's comfortable with Linux administration but hasn't dug into the internals. We'll go from the VFS layer down to fstab, bind mounts, and mount namespaces — and I'll flag the misconceptions I see most often in the field.

    The Virtual File System: One Interface, Many Backends

    The kernel's VFS is a software layer that sits between user-space applications and actual file system implementations. When a process calls open(), read(), or stat(), it talks to the VFS. The VFS then dispatches that call to the appropriate concrete file system driver — whether that's ext4, XFS, tmpfs, or even a network file system like NFS.

    The VFS defines four core objects: superblocks, inodes, dentries, and file objects. The superblock holds metadata about the mounted file system as a whole — its type, block size, and state. An inode represents a single file or directory, tracking permissions, timestamps, and pointers to the actual data blocks. A dentry (directory entry) maps a filename to an inode — it's the kernel's cache of the namespace tree. And a file object represents an open file descriptor in a running process, tying together the inode, current position, and flags.
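    You can see this split from user space with stat (a minimal check, assuming GNU coreutils):

```shell
# The name-to-inode mapping is the dentry; the metadata printed here
# (inode number, link count, mode) lives in the inode.
stat --format 'inode=%i links=%h mode=%A' /etc

# The superblock-level view of the same path: file system type and
# block size belong to the mounted file system, not to the file.
stat --file-system --format 'type=%T block-size=%s' /etc
```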

    This design is why you can do things like stat /proc/1/status and get sensible output even though there are no actual disk blocks backing that file. The procfs driver implements the VFS interface and generates content on the fly when the kernel services the read call. The application doesn't need to know or care.
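    You can verify there is nothing on disk behind procfs: these files report zero size and zero allocated blocks, yet reading them returns content generated at call time.

```shell
# A procfs file claims size 0 and 0 blocks...
stat --format 'size=%s blocks=%b' /proc/self/status

# ...but reading it still produces data, generated by the kernel at
# the moment of the read. The first line is the Name: field of the
# reading process itself.
head -n 1 /proc/self/status
```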

    What a Mount Point Actually Is

    A mount point is just a directory. That's it. When you mount a file system onto a directory, the kernel attaches the root of that file system at that point in the namespace tree. Anything that was previously in that directory becomes temporarily invisible — hidden behind the mounted file system's own root. This is not a copy and not a symlink: the kernel marks the directory's dentry as a mount point, and every path lookup that crosses it continues into the mounted file system's root instead of the original directory.
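    You can watch the hiding-and-restoring behavior without touching any real mounts by doing it in a throwaway namespace. A sketch, assuming util-linux unshare and a kernel that permits unprivileged user namespaces:

```shell
# -r maps us to root inside a new user namespace; -m gives us a
# private mount tree, so nothing here leaks out or needs privileges.
unshare --map-root-user --mount sh -c '
  d=$(mktemp -d)
  echo original > "$d/file"
  mount -t tmpfs tmpfs "$d"   # shadows the directory contents
  ls "$d"                     # empty: "file" is hidden, not deleted
  umount "$d"
  cat "$d/file"               # prints: original
'
```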

    The kernel tracks all active mounts in an internal structure exposed via /proc/mounts and the more detailed /proc/self/mountinfo. The findmnt tool reads these and formats them usably:

    [infrarunbook-admin@sw-infrarunbook-01 ~]$ findmnt
    TARGET                                SOURCE      FSTYPE      OPTIONS
    /                                     /dev/sda2   ext4        rw,relatime
    ├─/sys                                sysfs       sysfs       rw,nosuid,nodev,noexec,relatime
    │ ├─/sys/kernel/security              securityfs  securityfs  rw,nosuid,nodev,noexec,relatime
    │ └─/sys/fs/cgroup                    cgroup2     cgroup2     rw,nosuid,nodev,noexec,relatime
    ├─/proc                               proc        proc        rw,nosuid,nodev,noexec,relatime
    ├─/dev                                devtmpfs    devtmpfs    rw,nosuid,size=8192k,nr_inodes=4096
    │ ├─/dev/pts                          devpts      devpts      rw,nosuid,noexec,relatime
    │ └─/dev/shm                          tmpfs       tmpfs       rw,nosuid,nodev
    ├─/run                                tmpfs       tmpfs       rw,nosuid,nodev,mode=755
    └─/data                               /dev/sdb1   xfs         rw,relatime,attr2,inode64
    

    Notice the tree structure. Every indented entry is a child mount — a mount whose mount point lives inside the parent mount's directory tree. This hierarchy is called the mount tree, and it's how the kernel resolves paths. When you stat /sys/fs/cgroup, the kernel walks the path components, checks the dentry cache, and follows the mount attachment to reach cgroup2's own inode space.
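    You can reproduce the core of that lookup yourself: which mount owns a given path is simply the longest mount-point prefix in /proc/self/mounts. This is a simplified sketch (findmnt --target does it properly, including corner cases this ignores):

```shell
# For a path p, keep the mount entry whose mount point (field 2) is
# the longest prefix of p. For /proc/self/status that is /proc.
path=/proc/self/status
awk -v p="$path" '
  index(p, $2) == 1 && length($2) > best { best = length($2); hit = $0 }
  END { print hit }
' /proc/self/mounts
```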

    File System Types You'll Actually Encounter

    Not all file systems store data on disk. Linux mounts a mix of storage-backed, memory-backed, and kernel-virtual file systems during a normal boot. Understanding what each one is for saves you a lot of confusion when you're staring at /proc/mounts wondering why there are fifteen entries before you even get to your disk.

    ext4 is still the most common root file system type on general-purpose Linux. It's journaled, mature, and has excellent fsck tooling. XFS is a better choice for large files and high-throughput workloads — it scales better under heavy parallelism, which is why RHEL defaults to it. Both support extended attributes and ACLs out of the box.

    tmpfs is memory-backed storage. It behaves exactly like a disk-backed file system from a user-space perspective, but its data lives in RAM (and optionally swap). The kernel uses it for /dev/shm, /run, and often /tmp. It allocates memory on demand, growing and shrinking dynamically; the ceiling defaults to half of physical RAM unless you set one explicitly with the size option. I've seen junior engineers try to diagnose "disk full" errors in /run and spend twenty minutes looking for a disk device that doesn't exist.
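    The RAM-backed behavior is easy to see directly: writing into /dev/shm moves the tmpfs usage counter, and deleting the file gives the memory straight back (assuming /dev/shm is the usual tmpfs mount):

```shell
df --output=used -BK /dev/shm | tail -n 1            # usage before
dd if=/dev/zero of=/dev/shm/tmpfs-demo bs=1M count=4 status=none
df --output=used -BK /dev/shm | tail -n 1            # ~4096K higher
rm /dev/shm/tmpfs-demo                               # memory released
```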

    proc and sysfs are pseudo-file systems that expose kernel data structures. /proc is process and system information. /sys is the kernel object model — device trees, driver parameters, and hardware state. Neither has any on-disk presence. If you're ever tuning kernel parameters at runtime by echoing into /sys/block/sdb/queue/scheduler, you're writing directly into kernel memory through the sysfs interface.

    devtmpfs provides the /dev tree. The kernel auto-populates it with device nodes as drivers register hardware. Without it mounted, you'd have no /dev/sda, no /dev/null, nothing. udev then manages the naming and permissions on top of this.

    overlay (overlayfs) is what Docker and most container runtimes use for image layers. It stacks a read-only lower directory and a writable upper directory, presenting a merged view. Writes go to the upper layer. The lower layers are never modified. This is how you can spin up fifty containers from the same base image without duplicating gigabytes of data.
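    The mechanics are easy to watch in a scratch namespace. A sketch, assuming unshare and a kernel new enough (roughly 5.11+) to allow unprivileged overlay mounts:

```shell
unshare --map-root-user --mount sh -c '
  cd "$(mktemp -d)"
  mkdir lower upper work merged
  echo base > lower/a
  mount -t overlay overlay \
    -o lowerdir=lower,upperdir=upper,workdir=work merged
  echo changed > merged/a   # the write triggers a copy-up into upper/
  cat lower/a               # prints: base   (lower is never modified)
  cat upper/a               # prints: changed
'
```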

    How fstab Works — and Where People Get It Wrong

    The file /etc/fstab is a static table that tells the system what to mount at boot, and how. Each line has six fields: device, mount point, file system type, options, dump frequency, and fsck pass order. Here's a representative example from a server I work on:

    # /etc/fstab on sw-infrarunbook-01
    # <device>                                  <mount>   <type>  <options>                  <dump> <pass>
    UUID=3f2e1a4b-8c7d-4e56-9f01-2b3c4d5e6f7a  /         ext4    defaults,errors=remount-ro  0      1
    UUID=a1b2c3d4-e5f6-7890-abcd-ef1234567890  /data     xfs     defaults,noatime            0      2
    UUID=dead1234-beef-cafe-0000-111122223333  /boot     ext4    defaults                    0      2
    tmpfs                                      /tmp      tmpfs   defaults,size=2G,mode=1777  0      0
    
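    The six fields are plain whitespace-separated columns, which makes fstab easy to audit with standard tools. A quick sketch, run against a here-doc copy of one entry so it works anywhere:

```shell
# Skip comments and blanks, then label each of the six columns.
awk '!/^#/ && NF >= 6 {
  printf "device=%s mount=%s type=%s opts=%s dump=%s pass=%s\n",
         $1, $2, $3, $4, $5, $6
}' <<'EOF'
UUID=a1b2c3d4-e5f6-7890-abcd-ef1234567890  /data  xfs  defaults,noatime  0  2
EOF
```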

    The device field should almost always use a UUID, not /dev/sda1. Device names aren't stable — if you add a disk or the kernel enumerates storage in a different order, /dev/sdb might become /dev/sda after a reboot. UUIDs don't change. Get them with blkid or lsblk -o NAME,UUID.

    The options field is where the real control lives. noexec prevents execution of binaries from that mount — useful on /tmp and /home to limit attack surface. nosuid ignores setuid bits, which matters if users can put files on a mount. nodev prevents interpretation of device files — you generally want this on any non-root partition. relatime is the modern default for atime handling: it only updates access time if the current atime is older than the modification time, reducing write overhead without completely disabling atime tracking like noatime does.

    The dump and fsck fields trip people up. Dump is almost universally 0 these days — the dump backup utility is rarely used. The fsck pass field tells fsck when to check the file system at boot. The root file system should be 1. All other file systems that need checking should be 2 (they run after root, in parallel if possible). A value of 0 means skip fsck entirely, which is correct for tmpfs, proc, sysfs, and any network file system.

    Bind Mounts: The Same Data, Different Path

    A bind mount takes an existing directory (or file) and mounts it at a second location. Both paths show the same inode tree. Changes through either path are immediately visible through the other because they're backed by the same in-memory structures.

    [infrarunbook-admin@sw-infrarunbook-01 ~]$ mkdir /mnt/bindtest
    [infrarunbook-admin@sw-infrarunbook-01 ~]$ mount --bind /data/shared /mnt/bindtest
    [infrarunbook-admin@sw-infrarunbook-01 ~]$ findmnt /mnt/bindtest
    TARGET          SOURCE      FSTYPE  OPTIONS
    /mnt/bindtest   /dev/sdb1   xfs     rw,relatime,attr2,inode64
    
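    The "same inode" claim is easy to verify: in a scratch mount namespace, bind one directory onto another and stat the same file through both paths (a sketch assuming unshare is available):

```shell
unshare --map-root-user --mount sh -c '
  src=$(mktemp -d); dst=$(mktemp -d)
  echo hello > "$src/f"
  mount --bind "$src" "$dst"
  stat --format %i "$src/f" "$dst/f"   # identical inode numbers
  echo world >> "$dst/f"
  cat "$src/f"                         # both lines: one backing file
'
```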

    Bind mounts are how container runtimes expose host paths inside a container's mount namespace. When you run a container with -v /data/configs:/etc/app, the runtime bind-mounts /data/configs into the container's private namespace at /etc/app. The container process sees it as a regular directory. The host directory is unchanged.

    You can also bind-mount individual files, which is useful for injecting a single config file into a container without exposing an entire directory. And you can make a bind mount read-only even if the source is writable:

    [infrarunbook-admin@sw-infrarunbook-01 ~]$ mount --bind /data/shared /mnt/readonly-view
    [infrarunbook-admin@sw-infrarunbook-01 ~]$ mount -o remount,ro,bind /mnt/readonly-view
    

    To make bind mounts persistent across reboots, add them to fstab with the bind option:

    /data/shared  /mnt/bindtest  none  bind  0  0
    

    Mount Namespaces: Isolation at the Kernel Level

    Mount namespaces are the mechanism that lets containers have an entirely different file system view from the host. When a process creates a new mount namespace (via unshare(2) or clone(2) with CLONE_NEWNS), it gets its own private copy of the mount tree. Changes inside that namespace — mounting, unmounting, bind-mounting — are invisible to the parent namespace and to other namespaces.
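    Each namespace has a kernel identity you can compare from user space: /proc/<pid>/ns/mnt is a symlink whose target encodes the namespace's inode number. A quick check, assuming unshare is available:

```shell
# The current shell's mount namespace, e.g. mnt:[4026531841]
readlink /proc/self/ns/mnt

# The same readlink run inside a fresh mount namespace reports a
# different number: a genuinely separate mount tree.
unshare --map-root-user --mount readlink /proc/self/ns/mnt
```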

    In my experience, this is where a lot of engineers get confused when debugging containers. They'll shell into a host and try to find a mount they're certain they made inside a container, only to find it's not there. Of course it isn't — the container is running in its own mount namespace. The host's /proc/mounts shows the host namespace. The container's namespace is separate.

    You can inspect another process's mount namespace by reading /proc/<pid>/mounts or by using nsenter:

    [infrarunbook-admin@sw-infrarunbook-01 ~]$ nsenter --mount --target 4821 findmnt
    TARGET                   SOURCE       FSTYPE   OPTIONS
    /                        overlay      overlay  rw,relatime,lowerdir=...,upperdir=...,workdir=...
    ├─/proc                  proc         proc     rw,nosuid,nodev,noexec,relatime
    ├─/dev                   tmpfs        tmpfs    rw,nosuid,size=65536k,mode=755
    └─/etc/resolv.conf       /dev/sda2    ext4     rw,relatime
    

    That last line is particularly revealing — a single file bind-mounted from the host's ext4 root onto a path inside the container's mount namespace. That's exactly how container runtimes inject DNS configuration without affecting anything else in the container's file system.

    Shared subtrees complicate this further. A mount can be marked as shared, slave, private, or unbindable. A shared mount propagates mount events to its peers. A slave mount receives propagation from its master but doesn't send it back. Private mounts don't propagate at all. This controls whether new mounts inside a namespace become visible outside it, and it's what the --mount=type=bind,propagation=rprivate syntax in container tooling is controlling.
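    You can read a mount's propagation state from the optional-fields column of /proc/self/mountinfo: "shared:N" marks a shared peer group, "master:N" a slave, and no tag at all means private. (findmnt -o TARGET,PROPAGATION shows the same information.) A sketch:

```shell
# Field 5 is the mount point; optional fields run from field 7 up to
# the "-" separator. Print each mount point with its tags.
awk '{
  tags = ""
  for (i = 7; i <= NF && $i != "-"; i++) tags = tags $i " "
  printf "%-30s %s\n", $5, (tags == "" ? "private" : tags)
}' /proc/self/mountinfo | head -n 5
```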

    systemd and Mount Units

    On any systemd-based distribution, fstab entries are automatically converted into mount units at boot. You can also write mount units directly. A .mount unit's name must match the escaped mount point path — so a mount at /data/shared becomes data-shared.mount.

    # /etc/systemd/system/data-shared.mount
    [Unit]
    Description=Shared Data Volume
    After=network.target
    
    [Mount]
    What=/dev/disk/by-uuid/a1b2c3d4-e5f6-7890-abcd-ef1234567890
    Where=/data/shared
    Type=xfs
    Options=defaults,noatime
    
    [Install]
    WantedBy=multi-user.target
    
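    The naming rule can be approximated in a couple of lines. This is a deliberately simplified sketch with a hypothetical helper name — it only handles the slash-to-dash case, while the real tool also hex-escapes dashes and special characters, so use systemd-escape --path --suffix=mount for anything non-trivial:

```shell
# Hypothetical helper: strip the leading slash, turn the remaining
# slashes into dashes, and append the unit suffix.
path_to_mount_unit() {
  p=${1#/}
  printf '%s.mount\n' "$(printf '%s' "$p" | tr / -)"
}

path_to_mount_unit /data/shared   # prints: data-shared.mount
```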

    Paired with an .automount unit, systemd can mount file systems on demand — only when something actually accesses the mount point. This is useful for NFS shares that shouldn't block boot if the network isn't ready, or for removable storage.

    Common Misconceptions I Keep Seeing

    The first one: mounting doesn't copy data. I've had people tell me they're worried that mounting a directory over another directory will destroy the contents of the lower directory. Nothing is destroyed. The contents of the mount point directory are hidden while something is mounted on top of it, but they come right back when you unmount. Run umount /mountpoint and the original directory contents are exactly where you left them.

    The second: proc and sys are not optional. I've seen people strip down containers to the point where /proc isn't mounted, then wonder why ps shows nothing, why top refuses to start, and why half the tooling is broken. Many system utilities read process and system state directly from procfs. If it's not mounted, they fail silently or with cryptic errors.

    The third: /etc/mtab is not the source of truth. On modern systems, /etc/mtab is a symlink to /proc/self/mounts. The kernel's mount table is the authoritative record. Don't hand-edit mtab — it's auto-generated. If your mount isn't in /proc/mounts, it didn't work, regardless of what's in fstab.

    The fourth: lazy unmount is not a safe default. umount -l performs a lazy unmount — it detaches the file system from the namespace tree immediately but doesn't actually release the resources until all open file descriptors against it are closed. This sounds convenient, but I've seen it cause real problems: processes continue reading and writing to what they think is the file system long after you thought you unmounted it, and then you try to run fsck on what you believe is an idle device. Use lazy unmount intentionally, not as a workaround for a busy device. The right approach is to find what has files open with lsof +f -- /mountpoint and deal with those processes first.

    The fifth: tmpfs data doesn't survive reboots — obviously — but it also doesn't survive unmounts. This trips people up occasionally when they're testing something in /dev/shm and manually unmount and remount. The data is gone. tmpfs is volatile by design. Don't confuse it with a ramdisk image that you can serialize and reload.

    Practical Checks for Day-to-Day Work

    When you're investigating a storage problem, findmnt is almost always the right starting point. It reads /proc/self/mountinfo and gives you the full picture, including propagation flags and bind sources. lsblk -f maps block devices to file system types and UUIDs. df -Th shows usage with file system types included — the -T flag is something I forget to add half the time and then wonder why I'm looking at unlabeled columns.

    [infrarunbook-admin@sw-infrarunbook-01 ~]$ df -Th
    Filesystem     Type      Size  Used Avail Use% Mounted on
    /dev/sda2      ext4       50G   18G   30G  38% /
    tmpfs          tmpfs     7.8G     0  7.8G   0% /dev/shm
    /dev/sdb1      xfs       500G  220G  280G  44% /data
    tmpfs          tmpfs     2.0G  1.2M  2.0G   1% /tmp
    

    If you suspect a mount point is hiding something (i.e., something was mounted over a non-empty directory), you can temporarily move the mount aside with mount --move to inspect what's underneath. Another option is to bind-mount the parent directory at a second location — a non-recursive bind doesn't carry child mounts with it, so the shadowed contents become visible through the new path.

    For anything involving mount namespaces and containers, lsns -t mnt lists all mount namespaces on the system with their owning process and PID. Combined with nsenter, you can walk into any process's mount namespace and inspect its exact view of the file system tree without affecting it.

    [infrarunbook-admin@sw-infrarunbook-01 ~]$ lsns -t mnt
            NS TYPE NPROCS   PID USER               COMMAND
    4026531841 mnt     142     1 root               /sbin/init
    4026532198 mnt       4  4821 infrarunbook-admin /usr/bin/containerd-shim-runc-v2
    4026532301 mnt       1  4835 100000             nginx: master process nginx
    

    The Linux file system layer is one of those areas where understanding the internals pays compounding dividends. Once you understand what a mount point actually does at the kernel level, the behavior of containers, bind mounts, chroots, and namespace isolation stops being magic and starts being obvious. And when something breaks at 2 AM, obvious beats magic every single time.

    Frequently Asked Questions

    What's the difference between a bind mount and a symlink?

    A symlink is a file that stores a path string — the kernel resolves it at lookup time. A bind mount is a kernel-level attachment that makes one directory tree appear at two locations simultaneously, sharing the same inodes and inode state. Bind mounts work across chroot boundaries and mount namespaces where symlinks would break; they're also unaffected by path resolution rules that can cause symlinks to behave differently depending on where they're dereferenced.

    Why does my container not see mounts I made on the host?

    Containers typically run in a private mount namespace created with CLONE_NEWNS. Any mounts made in the host's namespace after the container started are not visible inside the container unless the mount was made with shared propagation and the container's namespace was configured to receive it as a slave or shared subtree. Use 'nsenter --mount --target <pid>' to inspect the container's namespace directly.

    Is it safe to use /dev/sdX device names in fstab?

    No. Device names like /dev/sda1 are assigned by the kernel at boot based on device enumeration order, which can change if you add, remove, or rearrange storage. Always use stable identifiers: UUID (from blkid), PARTUUID, or device labels. UUIDs are generated at file system creation time and remain stable across reboots and hardware changes.

    What happens to data in tmpfs when the system reboots?

    It's gone. tmpfs is backed by RAM and swap — there is no persistent storage behind it. This is by design. Common tmpfs mount points like /run, /dev/shm, and /tmp are recreated empty at every boot. If you store something critical in tmpfs and reboot or unmount it, that data is not recoverable.

    How does overlayfs work and why do containers use it?

    Overlayfs presents a merged view of a read-only lower directory stack and a writable upper directory. Reads are served from whichever layer the file exists in, starting from the upper. Writes always go to the upper layer — the lower layers are never modified. This allows many container instances to share a single read-only base image without copying gigabytes of data, while each container gets its own isolated writable layer for runtime changes.
