From 218e3aa7eb15c62a0f37dd07f2b3e2170636b5c5 Mon Sep 17 00:00:00 2001 From: Rob Landley Date: Fri, 31 Dec 2021 18:10:21 -0600 Subject: [PATCH] Move ext2.html into www/doc and convert mount.txt to mount.html --- www/{ => doc}/ext2.html | 0 www/doc/mount.html | 196 ++++++++++++++++++++++++++++++++++++++++ www/doc/mount.txt | 163 --------------------------------- 3 files changed, 196 insertions(+), 163 deletions(-) rename www/{ => doc}/ext2.html (100%) create mode 100644 www/doc/mount.html delete mode 100644 www/doc/mount.txt diff --git a/www/ext2.html b/www/doc/ext2.html similarity index 100% rename from www/ext2.html rename to www/doc/ext2.html diff --git a/www/doc/mount.html b/www/doc/mount.html new file mode 100644 index 00000000..89362652 --- /dev/null +++ b/www/doc/mount.html @@ -0,0 +1,196 @@ +

How mount actually works

+ +

The mount command +calls the mount +system call, which has five arguments:

+ +
+int mount(const char *source, const char *target, const char *filesystemtype, + unsigned long mountflags, const void *data); +
+ +

The command "mount -t ext2 /dev/sda1 /path/to/mntpoint -o ro,noatime" +parses its command line arguments to feed them into those five system call +arguments. In this example, the source is "/dev/sda1", the target +is "/path/to/mntpoint", and the filesystemtype is "ext2". + +

The other two syscall arguments (mountflags and data) +come from the "-o option,option,option" argument. The mountflags argument goes +to the VFS (explained below), and the data argument is passed to the filesystem +driver.
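
As a rough sketch of that mapping, the example command line could end up as the five arguments below. The struct mount_args type and the example_args() and do_mount() helpers are hypothetical names made up for illustration (only mount() and the MS_* constants come from sys/mount.h), and this ignores any default flags a real mount command adds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical container for the five mount(2) arguments. */
struct mount_args {
    const char *source;
    const char *target;
    const char *filesystemtype;
    unsigned long mountflags;
    const void *data;
};

/* Arguments for: mount -t ext2 /dev/sda1 /path/to/mntpoint -o ro,noatime */
struct mount_args example_args(void)
{
    struct mount_args a;

    a.source = "/dev/sda1";
    a.target = "/path/to/mntpoint";
    a.filesystemtype = "ext2";
    a.mountflags = MS_RDONLY | MS_NOATIME; /* both options are VFS flags */
    a.data = NULL;                         /* nothing left over for the driver */
    return a;
}

/* Issue the system call: 0 on success, -1 with errno set on failure. */
int do_mount(const struct mount_args *a)
{
    assert(a != NULL);
    return mount(a->source, a->target, a->filesystemtype,
                 a->mountflags, a->data);
}
```

Calling do_mount() needs the privileges mount normally requires, so the interesting part here is just which option lands in which argument.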

+ +

The mount command's options string is a list of comma separated values. If +there's more than one -o argument on the mount command line, they get glued +together (in order) with a comma. The mount command also checks the file +/etc/fstab for default options, and the options you specify on the command +line get appended to those defaults (if any). Most other command line mount +flags are just synonyms for adding option flags (for example +"mount -o remount -w" is equivalent to "mount -o remount,rw"). Behind the +scenes they all get appended to the -o string and fed to a common parser.
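
The gluing step might look something like the sketch below. The join_options() name and its calling convention are assumptions for illustration, not what any particular mount implementation does: the caller is assumed to pass the /etc/fstab defaults first, followed by each -o argument in command line order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Glue option strings together with commas, in order. Entries that are NULL
 * or empty are skipped. Returns a malloc()ed string the caller must free(),
 * or NULL if allocation fails.
 */
char *join_options(const char *const *parts, size_t count)
{
    size_t len = 1; /* room for the terminating NUL */
    size_t i;
    char *out;

    assert(parts != NULL || count == 0);

    for (i = 0; i < count; i++)
        if (parts[i] && *parts[i])
            len += strlen(parts[i]) + 1; /* +1 for a possible comma */

    out = malloc(len);
    if (!out)
        return NULL;
    *out = '\0';

    for (i = 0; i < count; i++) {
        if (!parts[i] || !*parts[i])
            continue;
        if (*out)
            strcat(out, ",");
        strcat(out, parts[i]);
    }
    return out;
}
```

With fstab defaults of "noatime" and a command line of "-o remount -o rw", this would produce "noatime,remount,rw" for the common parser to chew on.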

+ +

VFS stands for "Virtual File System" and is the common infrastructure shared +by different filesystems. It handles common things like making the filesystem +read only. The mount command assembles an option string to supply to the "data" +argument of the mount syscall, but first it parses it for VFS options +(ro,noexec,nodev,nosuid,noatime...) each of which corresponds to a flag +from #include <sys/mount.h>. The mount command removes those options +from the string and sets the corresponding bit in mountflags, then the +remaining options (if any) form the data argument for the filesystem driver.
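
A minimal sketch of that split, assuming a small hand-written table: split_options() and vfs_options[] are hypothetical names, the table covers only the five options named above (a real mount command knows many more), and "errors=remount-ro" in the usage note is just an example of an ext2 driver option.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* A small, deliberately incomplete table of VFS option names. */
static const struct {
    const char *name;
    unsigned long flag;
} vfs_options[] = {
    { "ro",      MS_RDONLY  },
    { "nosuid",  MS_NOSUID  },
    { "nodev",   MS_NODEV   },
    { "noexec",  MS_NOEXEC  },
    { "noatime", MS_NOATIME },
};

/*
 * Walk the comma separated list in opts. Recognized VFS options set a bit
 * in the return value; everything else is copied, comma separated, into
 * data (capacity datalen bytes, always NUL terminated). Options that do
 * not fit in data are silently dropped in this sketch.
 */
unsigned long split_options(const char *opts, char *data, size_t datalen)
{
    unsigned long flags = 0;
    size_t used = 0;

    assert(opts != NULL && data != NULL && datalen > 0);
    data[0] = '\0';

    while (*opts) {
        size_t len = strcspn(opts, ","); /* length of this option */
        size_t n = sizeof(vfs_options) / sizeof(vfs_options[0]);
        size_t i;

        for (i = 0; i < n; i++)
            if (strlen(vfs_options[i].name) == len
                && !strncmp(opts, vfs_options[i].name, len))
                break;

        if (i < n) {
            flags |= vfs_options[i].flag;
        } else if (len && used + len + (used ? 1 : 0) < datalen) {
            if (used)
                data[used++] = ',';
            memcpy(data + used, opts, len);
            used += len;
            data[used] = '\0';
        }

        opts += len;
        if (*opts == ',')
            opts++;
    }
    return flags;
}
```

Feeding it "ro,noatime,errors=remount-ro,nosuid" should return MS_RDONLY | MS_NOATIME | MS_NOSUID and leave "errors=remount-ro" in data for the filesystem driver.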

+ +
+

Implementation details: the mountflag MS_SILENT gets set by +default even if there's nothing in /etc/fstab. Some actions (such as --bind +and --move mounts, I.E. -o bind and -o move) are just VFS actions and don't +require any specific filesystem at all. The "-o remount" flag requires looking +up the filesystem in /proc/mounts and reassembling the full option string +because you don't _just_ pass in the changed flags but have to reassemble +the complete new filesystem state to give the system call. Some of the options +in /etc/fstab are for the mount command (such as "user" which only does +anything if the mount command has the suid bit set) and don't get passed +through to the system call.

+
+ +

When mounting a new filesystem, the "filesystemtype" argument to the mount system +call specifies which filesystem driver to use. All the loaded drivers are +listed in /proc/filesystems, but calling mount can also trigger a module load +request to add another. A filesystem driver is responsible for putting files +and subdirectories under the mount point: any time you open, close, read, +write, truncate, list the contents of a directory, move, or delete a file, +you're talking to a filesystem driver to do it. (Or when you call +ioctl(), stat(), statvfs(), utime()...)
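
Checking whether a driver is already loaded could be sketched as below. The fs_listed() and fs_loaded() helpers are hypothetical, and they assume each line of /proc/filesystems is an optional "nodev" marker, then a tab, then the driver name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if a driver called fstype appears in the listing read from fp
 * (assumed to be in /proc/filesystems format), 0 otherwise.
 */
int fs_listed(FILE *fp, const char *fstype)
{
    char line[256];

    assert(fp != NULL && fstype != NULL);

    while (fgets(line, sizeof(line), fp)) {
        char *name = strchr(line, '\t');

        if (!name)
            continue;
        name++;                           /* skip the tab */
        name[strcspn(name, "\n")] = '\0'; /* strip the newline */
        if (!strcmp(name, fstype))
            return 1;
    }
    return 0;
}

/* Convenience wrapper that checks the running kernel's list. */
int fs_loaded(const char *fstype)
{
    FILE *fp = fopen("/proc/filesystems", "r");
    int found;

    if (!fp)
        return 0;
    found = fs_listed(fp, fstype);
    fclose(fp);
    return found;
}
```

A 0 from fs_loaded() doesn't mean the mount will fail, since the mount call itself can trigger a module load for the missing driver.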

+ +

Four filesystem types (block backed, server backed, ramfs, synthetic).

+ +

Different drivers implement different filesystems, which come in four +different types: the filesystem's backing store can be a fixed length +block of storage, the backing store can be some server the driver connects to, +the files can remain in memory with no backing store, +or the filesystem driver can algorithmically create the filesystem's contents +on the fly.

+ +
    +
  1. Block device backed filesystems, such as ext2 and vfat.

    + +

    This kind of filesystem driver acts as a lens to look at a block device +through. The source argument for block backed filesystems is a path to a +block device (such as "/dev/hda1") which stores the contents of the +filesystem in a fixed length block of sequential storage, with a separate +driver providing that block device.

    + +

    Block backed filesystems are the "conventional" filesystem type most people +think of when they mount things. The name means that the "backing store" +(where the data lives when the system is switched off) is on a block device.

    +
  2. Server backed filesystems, such as cifs/samba or fuse.

    + +

    These drivers convert filesystem operations into a sequential stream of +bytes, which they can send through a pipe to talk to a program. The filesystem +server could be a local Filesystem in Userspace daemon (connected to a local +process through a pipe filehandle), behind a network socket (CIFS and v9fs), +behind a char device (/dev/ttyS0), and so on. The common attribute is there's +some program on the other end sending and receiving a sequential bytestream. +The backing store is a server somewhere, and the filesystem driver is talking +to a process that reads and writes data in some known protocol.

    + +

    The source argument for these filesystems indicates where the filesystem +lives. It's often in a URL-like format for network filesystems, but it's +really just a blob of data that the filesystem driver understands.

    + +

    A lot of server backed filesystems want to open their own connection so they +don't have to pass their data through a persistent local userspace process, +not really for performance reasons but because in low memory situations a +chicken-and-egg situation can develop where all the process's pages have +been swapped out but the filesystem needs to write data to its backing +store in order to free up memory so it can swap the process's pages back in. +If this mechanism is providing the root filesystem, this can deadlock and +freeze the system solid. So while you _can_ pass some of them a filehandle, +more often than not you don't.

    + +

    These are also known as "pipe backed" filesystems (or "network filesystems" +because that's a common case, although a network doesn't need to be involved). +Conceptually they're char device backed filesystems analogous to the block +backed filesystems (block devices provide seekable storage, char devices +provide serial I/O), but you don't commonly specify a character device in +/dev when mounting them because you're talking to a specific server process, +not a whole machine.

    +
  3. Ram backed filesystems (ramfs and tmpfs).

    + +

    These are very simple filesystems that don't implement a backing store, +but just keep the data in memory. Data +written to these gets stored in the disk cache, and the driver ignores requests +to flush it to backing store (reporting all the pages as pinned and +unfreeable).

    + +

    These filesystem drivers essentially mount the VFS's page/dentry cache as if it was a +filesystem. (Page cache stores file contents, dentry cache stores directory +entries.) They grow and shrink dynamically as needed: when you write files +into them they allocate more memory to store it, and when you delete files +the memory is freed.

    + +

    The "ramfs" driver provides the simplest possible ram filesystem, +which is too simple for most real use cases. The "tmpfs" driver adds +a size limitation (by default 50% of system RAM, but it's adjustable as a mount +option) so the system doesn't run out of memory and lock up if you +"cat /dev/zero > file", can report how much space is remaining +when asked (ramfs always says 0 bytes free), and can write its data +out to swap space (like processes do) when the system is under memory pressure.

    + +
    +

    Note that "ramdisk" is not the same as "ramfs". The ramdisk driver uses a +chunk of memory to implement a block device, and then you can format that +block device and mount it with a block device backed filesystem driver. +(This is the same "two device drivers" approach you always have with block +backed filesystems: one driver provides /dev/ram0 and the second driver mounts +it as vfat.) Ram disks are significantly less efficient than ramfs, +allocating a fixed amount of memory up front for the block device instead of +dynamically resizing itself as files are written into and deleted from the +page and dentry caches the way ramfs does.

    +
    + +

    Initramfs (I.E. rootfs) is a ram backed filesystem mounted on / which +can't be unmounted for the same reason PID 1 can't exit. The boot +process can extract a cpio.gz archive into it (either statically linked +into the kernel or loaded as a second file by the bootloader), and +if it contains an executable "init" binary at the top level that +will be run as PID 1. If you specify "root=" on the kernel command line, +initramfs will be ramfs and will get overmounted with the specified +filesystem if no "/init" binary can be run out of the initramfs. +If you don't specify root= then initramfs will be tmpfs, which is probably +what you want when the system is running from initramfs.

    +
  4. Synthetic filesystems (proc, sysfs, devtmpfs, devpts...)

    + +

    These filesystems don't have any backing store because they don't +store arbitrary data the way the first three types of filesystems do.

    + +

    Instead they present artificial contents, which can represent processes or +hardware or anything the driver writer wants them to show. Listing or reading +from these files calls a driver function that produces whatever output it's +programmed to, and writing to these files submits data to the driver which +can do anything it wants with it.

    + +

    Synthetic filesystems are often implemented to provide monitoring and control +knobs for parts of the operating system, as an alternative to adding more +system calls (or ioctl, sysctl, etc). They provide a more human friendly user +interface which programs can use but which users can also interact with +directly from the command line via "cat" and redirecting the output of +"echo" into special files.

    + +
    +

    The first synthetic filesystem in Linux was "proc", which was initially +intended to provide a directory for each process in the system to provide +information to tools like "ps" and "top" (the /proc/[0-9]* entries) +but became a dumping ground for any information the kernel wanted to export. +Eventually the kernel developers genericized +the synthetic filesystem infrastructure so the system could have multiple +different synthetic filesystems, but /proc remains full of +unrelated historic legacy exports kept for backwards compatibility.

    +
    +
+ +

TODO: explain overmounts, mount --move, mount namespaces.

diff --git a/www/doc/mount.txt b/www/doc/mount.txt deleted file mode 100644 index f538c467..00000000 --- a/www/doc/mount.txt +++ /dev/null @@ -1,163 +0,0 @@ -Here's how mount actually works: - -The mount comand calls the mount system call, which has five arguments you -can see on the "man 2 mount" page: - - int mount(const char *source, const char *target, const char *filesystemtype, - unsigned long mountflags, const void *data); - -The command "mount -t ext2 /dev/sda1 /path/to/mntpoint -o ro,noatime", -parses its command line arguments to feed them into those five system call -arguments. In this example, the source is "/dev/sda1", the target is -"/path/to/mountpoint", and the filesystemtype is "ext2". - -The other two syscall arguments (mountflags and data) come from the -"-o option,option,option" argument. The mountflags argument goes to the VFS -(explained below), and the data argument is passed to the filesystem driver. - -The mount command's options string is a list of comma separated values. If -there's more than one -o argument on the mount command line, they get glued -together (in order) with a comma. The mount command also checks the file -/etc/fstab for default options, and the options you specify on the command -line get appended to those defaults (if any). Most other command line mount -flags are just synonyms for adding option flags (for example -"mount -o remount -w" is equivalent to "mount -o remount,rw"). Behind the -scenes they all get appended to the -o string and fed to a common parser. - -VFS stands for "Virtual File System" and is the common infrastructure shared -by different filesystems. It handles common things like making the filesystem -read only. The mount command assembles an option string to supply to the "data" -argument of the option syscall, but first it parses it for VFS options -(ro,noexec,nodev,nosuid,noatime...) each of which corresponds to a flag -from #include . 
The mount command removes those options from the -sting and sets the corresponding bit in mountflags, then the remaining options -(if any) form the data argument for the filesystem driver. - -A few quick implementation details: the mountflag MS_SILENCE gets set by -default even if there's nothing in /etc/fstab. Some actions (such as --bind -and --move mounts, I.E. -o bind and -o move) are just VFS actions and don't -require any specific filesystem at all. The "-o remount" flag requires looking -up the filesystem in /proc/mounts and reassembling the full option string -because you don't _just_ pass in the changed flags but have to reassemble -the complete new filesystem state to give the system call. Some of the options -in /etc/fstab are for the mount command (such as "user" which only does -anything if the mount command has the suid bit set) and don't get passed -through to the system call. - -When mounting a new filesystem, the "filesystem" argument to the mount system -call specifies which filesystem driver to use. All the loaded drivers are -listed in /proc/filesystems, but calling mount can also trigger a module load -request to add another. A filesystem driver is responsible for putting files -and subdirectories under the mount point: any time you open, close, read, -write, truncate, list the contents of a directory, move, or delete a file, -you're talking to a filesystem driver to do it. (Or when you call -ioctl(), stat(), statvfs(), utime()...) - -Different drivers implement different filesystems, which have four categories: - -1) Block device backed filesystems, such as ext2 and vfat. - -This kind of filesystem driver acts as a lens to look at a block device -through. The source argument for block backed filesystems is a path to a -block device, such as "/dev/hda1", which stores the contents of the -filesystem in a fixed block of sequential storage, and there's a seperate -driver providing that block device. 
- -Block backed filesystems are the "conventional" filesystem type most people -think of when they mount things. The name means that the "backing store" -(where the data lives when the system is switched off) is on a block device. - -2) Server backed filesystems, such as cifs/samba or fuse. - -These drivers convert filesystem operations into a sequential stream of -bytes, which it can send through a pipe to talk to a program. The filesystem -server could be a local Filesystem in Userspace daemon (connected to a local -process through a pipe filehandle), behind a network socket (CIFS and v9fs), -behind a char device (/dev/ttyS0), and so on. The common attribute is there's -some program on the other end sending and receiving a sequential bytestream. -The backing store is a server somewhere, and the filesystem driver is talking -to a process that reads and writes data in some known protocol. - -The source argument for these filesystems indicates where the filesystem lives. It's often in a URL-like format for network filesystems, but it's really just a blob of data that the filesystem driver understands. - -A lot of server backed filesystems want to open their own connection so they -don't have to pass their data through a persistent local userspace process, -not really for performance reasons but because in low memory situations a -chicken-and-egg situation can develop where all the process's pages have -been swapped out but the filesystem needs to write data to its backing -store in order to free up memory so it can swap the process's pages back in. -If this mechanism is providing the root filesystem, this can deadlock and -freeze the system solid. So while you _can_ pass some of them a filehandle, -more often than not you don't. - -These are also known as "pipe backed" filesystems (or "network filesystems" -because that's a common case, although a network doesn't need to be inolved). 
-Conceptually they're char device backed filesystems (analogus to the block -device backed ones), but you don't commonly specify a character device in -/dev when mounting them because you're talking to a specific server process, -not a whole machine. - -3) Ram backed filesystems, such as ramfs and tmpfs. - -These are very simple filesystems that don't implement a backing store. Data -written to these gets stored in the disk cache, and the driver ignores requests -to flush it to backing store (reporting all the pages as pinned and -unfreeable). - -These drivers essentially mount the VFS's page/dentry cache as if it was a -filesystem. (Page cache stores file contents, dentry cache stores directory -entries.) They grow and shrink dynamically, as needed: when you write files -into them they allocate more memory to store it, and when you delete files -the memory is freed. - -There's a simple one (ramfs) that does only that, and a more complex one (tmpfs) -which adds a size limitation (by default 50%, but it's adjustable as a mount -option) so the system doesn't run out of memory and lock up if you -"cat /dev/zero > file", and can also report how much space is remaining -when asked (ramfs always says 0 bytes free). The other thing tmpfs does -is write its data out to swap space (like processes do) when the system -is under memory proessure. - -Note that "ramdisk" is not the same as "ramfs". The ramdisk driver uses a -chunk of memory to implement a block device, and then you can format that -block device and mount it with a block device backed filesystem driver. -(This is the same "two device drivers" approach you always have with block -backed filesystems: one driver provides /dev/ram0 and the second driver mounts -it as vfat.) 
Ram disks are significantly less efficient than ramfs, -allocating a fixed amount of memory up front for the block device instead of -dynamically resizing itself as files are written into an deleted from the -page and dentry caches the way ramfs does. - -Note: initramfs cpio, tmpfs as rootfs. - -4) Synthetic filesystems, such as proc, sysfs, devpts... - -These filesystems don't have any backing store either, because they don't -store arbitrary data the way the first three types of filesystems do. - -Instead they present artificial contents, which can represent processes or -hardware or anything the driver writer wants them to show. Listing or reading -from these files calls a driver function that produces whatever output it's -programmed to, and writing to these files submits data to the driver which -can do anything it wants with it. - -Synthetic ilesystems are often implemented to provide monitoring and control -knobs for parts of the operating system. It's an alternative to adding more -system calls (or ioctl, sysctl, etc), and provides a more human friendly user -interface which programs can use but which users can also interact with -directly from the command line via "cat" and redirecting the output of -"echo" into special files. - - -Those are the four types of filesystems: backing store can be a fixed length -block of storage, backing store can be some server the driver connects to, -backing store can not exist and the files merely reside in the disk cache, -or the filesystem driver can just make up its contents programmatically. - -And that's how filesystems get mounted, using the mount system call which has -five arguments. The "filesystem" argument specifies the driver implementing -one of those filesystems, and the "source" and "data" arguments get fed to -that driver. The "target" and "mountflags" arguments get parsed (and handled) -by the generic VFS infrastructure. (The filesystem driver can peek at the -VFS data, but generally doesn't need to care. 
The VFS tells the filesystem -what to do, in response to what userspace said to do.) -- 2.39.2