By: Allen Chen, Aly Ibrahim, Adam Wright
Modified by: Evan Zhen
Professor's Definition: A file system is an on-disk data structure, providing a virtual memory-like abstraction.
RT-11 was a single-user real-time (RT) operating system used by the PDP-11 family of 16-bit computers. Implemented in 1970, it was built to support multitasking. Its file system stores each file in one contiguous run of blocks, which leads to external fragmentation. More information about the RT-11 can be found here.
The File Allocation Table (FAT) file system, implemented in 1977, stores data in fixed-size blocks (usually 4KB). It is more complex than the RT-11 file system because it can break a file up and store the pieces in different blocks rather than in one long contiguous run. The advantage of this system is that it prevents external fragmentation, although it can still suffer from internal fragmentation.
For more information, read the introduction of file systems in our lab 3 project.
Modern disks perform reads and writes in units of sectors, which are usually 512 bytes each. File systems, however, allocate and use disk storage in units of blocks. A block has to be at least the size of a sector, and it is usually 4KB. Note the distinction between sector size and block size:
sector size: property of the disk hardware
block size: aspect of the OS that is using the disk
The block bitmap answers the question of which blocks are free. Each block on the disk has a corresponding bit in the block bitmap that indicates whether it is free or in use. If bit n is '1', then block n is free; if it is '0', then block n is in use. A FAT file system tracks free blocks with a linked list instead of a bitmap. The advantage of a bitmap over a linked list is that it is more storage efficient and easier to keep consistent on the disk. Although it takes more CPU time to search a bitmap than to remove the head of a linked list, the difference hardly matters: the cost of the disk I/O performed once a free block is found dominates the cost of the search by a wide margin.
It is important that the blocks holding the boot sector, superblock, and block bitmap are marked '0' (in use), because these blocks are crucial for a file system to operate properly. If an inode block is marked free, the consequences are less severe: it just means that some of the files on the disk may be lost.
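As a minimal sketch of the bitmap convention described above (bit n is 1 when block n is free), the following Python keeps the bitmap in a bytearray. The function names and the least-significant-bit-first ordering within each byte are assumptions made for illustration, not the layout of any particular file system.

```python
def is_free(bitmap: bytearray, n: int) -> bool:
    """Return True if block n is free (its bit is 1)."""
    return (bitmap[n // 8] >> (n % 8)) & 1 == 1


def mark_used(bitmap: bytearray, n: int) -> None:
    """Clear bit n to record that block n is in use."""
    bitmap[n // 8] &= ~(1 << (n % 8)) & 0xFF


def mark_free(bitmap: bytearray, n: int) -> None:
    """Set bit n to record that block n is free."""
    bitmap[n // 8] |= 1 << (n % 8)


def allocate_block(bitmap: bytearray, nblocks: int) -> int:
    """Scan linearly for the first free block, mark it used, and return
    its number. Returns -1 when no block is free."""
    for n in range(nblocks):
        if is_free(bitmap, n):
            mark_used(bitmap, n)
            return n
    return -1
```

With a 16-block disk whose blocks 0, 1, and 2 have been marked used (standing in for the boot sector, superblock, and bitmap block), `allocate_block` returns 3. The scan is linear in the number of blocks, which is the extra CPU cost mentioned above.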
Each file and directory in a file system corresponds to one inode. The inode stores information about the file such as its size, its type (file or directory), the number of links to it, and which blocks it occupies. An inode, however, will not tell you which directory the file is located in. Because directory entries refer to files by inode number, a file can be moved safely from one directory to another. Inodes also make hard links possible: entries in different directories, or under different names, can refer to the same inode and therefore to the exact same blocks and the exact same data.
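The idea that directories name inodes, rather than holding the file itself, can be sketched as follows. This is an illustrative model only: the `Inode` class, the dictionary-based directories, and the `link` and `unlink` helpers are hypothetical names, and freeing the data blocks is reduced to deleting the inode from a table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inode:
    nlink: int = 0                                    # number of directory entries naming this inode
    blocks: List[int] = field(default_factory=list)   # disk blocks holding the data


def link(directory: Dict[str, int], name: str, inum: int,
         inodes: Dict[int, Inode]) -> None:
    """Add a directory entry that refers to inode number inum."""
    directory[name] = inum
    inodes[inum].nlink += 1


def unlink(directory: Dict[str, int], name: str,
           inodes: Dict[int, Inode]) -> None:
    """Remove a directory entry; drop the inode once no entry refers to it."""
    inum = directory.pop(name)
    inodes[inum].nlink -= 1
    if inodes[inum].nlink == 0:
        del inodes[inum]   # stands in for freeing the inode and its blocks
```

Two entries created with `link` for the same inode number share one set of blocks, and the data survives until the last entry is removed with `unlink`.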
Because there are more blocks than inodes, it is possible for a file system to run out of inodes before it runs out of free blocks, leaving it unable to create new files or directories. However, when this occurs, most file systems are able to generate more inodes, or a file system can generate plenty of inodes up front so that this is unlikely to happen.
Each inode points to the blocks that a file or directory contains. An inode contains 10 direct pointers and an indirect block pointer. The direct pointers point to the first blocks that a file occupies. If a file occupies more than 10 blocks, the indirect block pointer comes in: it points to a block that itself holds pointers to the remaining blocks that the file occupies.
Each inode resides in a block called an inode block. An inode is a very small structure, so an inode block can hold many inodes. Most file systems contain many inode blocks to reduce the chance of running out of inodes.
It is important to note that not all file systems use inodes.
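The lookup through direct and indirect pointers can be sketched as below. This is a simplified model, not a real on-disk format: the `Inode` class and `block_for_index` are hypothetical names, a dictionary stands in for reading the indirect block from disk, block number 0 is assumed to mean "no indirect block", and 1024 pointers per indirect block assumes 4KB blocks and 4B pointers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NDIRECT = 10            # direct pointers per inode, as in the text
PTRS_PER_BLOCK = 1024   # assumes 4KB blocks holding 4B pointers


@dataclass
class Inode:
    direct: List[int] = field(default_factory=lambda: [0] * NDIRECT)
    indirect: int = 0   # block number of the indirect block (0 = none, an assumption)


def block_for_index(inode: Inode, index: int,
                    indirect_blocks: Dict[int, List[int]]) -> int:
    """Translate the index-th block of a file into a disk block number.

    indirect_blocks stands in for the disk: it maps the block number of an
    indirect block to the list of pointers stored in that block.
    """
    if index < 0:
        raise IndexError("negative block index")
    if index < NDIRECT:
        return inode.direct[index]            # first 10 blocks: direct pointers
    index -= NDIRECT
    if index < PTRS_PER_BLOCK and inode.indirect != 0:
        return indirect_blocks[inode.indirect][index]   # one extra lookup
    raise IndexError("file block index beyond what this inode can address")
```

Blocks 0 through 9 of a file cost no extra lookup, while block 10 and later require reading the indirect block first.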
A directory block stores the information about a directory and its files. Each file in the directory has its own directory entry. A directory entry contains:
Name
Size
Linked list (if a file system doesn't use inodes)
In keeping with our idea of a tree based file system rather than a flat one, our directory entries now hold the following information: Directory entry:
File name
File size
10 direct block pointers to the first 10 blocks of the file
Indirect block pointer, pointing to a block containing 1024 block pointers
(If the file system uses inodes, then a directory entry will contain an inode number instead of block pointers)
diagram for a directory entry (no inodes)
Note: a directory entry in a file system that uses inodes is pretty similar, except that the entry contains an inode number, and the inode structure in turn points to the blocks that are used.
Using only direct block pointers we can store a 40KB file (10 pointers pointing to one 4KB block each).
For a 1-byte file, the wasted space under each design is:
FAT: 4KB - 1B = 4095B of internal fragmentation, plus 4B (block pointer in the FAT) = 4099B
New hierarchical block pointer design: 4139B (9 unused direct block pointers, 1 unused indirect block pointer, and 1 unused doubly indirect block pointer, for 11 x 4B = 44B, plus the 4095B of fragmentation internal to the one block the file uses)
What about a larger file, such as 40KB + 1B (40961B)? Since we're 1B beyond full blocks, that gives 4095B of wasted space in the last block of file data. Then we have another 4092B wasted in the indirect block (1023 unused pointers). Finally we count the 4B of the unused doubly indirect block pointer, so the total is 8191B, just under 8KB of wasted space.
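The overhead figures above can be reproduced with a short calculation. The sketch below follows the counting convention used in these notes (internal fragmentation in the last data block, plus unused pointer slots); the function names are hypothetical, and the hierarchical version only handles files small enough that the doubly indirect pointer stays unused.

```python
BLOCK = 4096                    # block size in bytes
PTR = 4                         # size of one block pointer in bytes
NDIRECT = 10                    # direct pointers per directory entry
PTRS_PER_BLOCK = BLOCK // PTR   # 1024 pointers fit in one indirect block


def blocks_needed(size: int) -> int:
    """Number of data blocks needed for a file of `size` bytes."""
    return -(-size // BLOCK)    # ceiling division


def fat_overhead(size: int) -> int:
    """Internal fragmentation plus one FAT pointer per data block."""
    blocks = blocks_needed(size)
    return blocks * BLOCK - size + blocks * PTR


def hierarchical_overhead(size: int) -> int:
    """Wasted bytes with 10 direct, 1 indirect, and 1 doubly indirect pointer."""
    blocks = blocks_needed(size)
    if blocks > NDIRECT + PTRS_PER_BLOCK:
        raise ValueError("sketch does not cover doubly indirect files")
    fragmentation = blocks * BLOCK - size
    unused_direct = max(0, NDIRECT - blocks) * PTR
    if blocks <= NDIRECT:
        unused_indirect = PTR   # the indirect pointer itself goes unused
    else:
        # an indirect block is allocated; count its unused pointer slots
        unused_indirect = (PTRS_PER_BLOCK - (blocks - NDIRECT)) * PTR
    unused_doubly_indirect = PTR
    return fragmentation + unused_direct + unused_indirect + unused_doubly_indirect
```

Under these assumptions `fat_overhead(1)` is 4099, `hierarchical_overhead(1)` is 4139, and `hierarchical_overhead(40 * 1024 + 1)` is 8191, matching the numbers in the text.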
What if we want to support files that are larger than 4GB + 4MB + 40KB? Then we have to include a triply indirect block (which gives us support for 4TB files).
Is there any need for quadruply or quintuply indirect blocks? In the end you'd probably want to consider increasing the 4KB block size instead, which would help avoid the growing overhead of adding more levels of indirect blocks.
Another issue is that we are using 4B pointers, which means that the maximum disk size we can support is 16TB (2^32 blocks of 2^12 bytes each).
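The size limits quoted above all follow from the 4KB block size and the 4B pointer size. A small sketch of the arithmetic, with hypothetical names:

```python
BLOCK = 4096                    # 2^12 bytes per block
PTR = 4                         # bytes per block pointer
NDIRECT = 10                    # direct pointers
PTRS_PER_BLOCK = BLOCK // PTR   # 1024 pointers per indirect block


def capacities() -> dict:
    """Bytes of file data reachable through each kind of pointer,
    plus the largest disk that 4B block pointers can address."""
    return {
        "direct": NDIRECT * BLOCK,                        # 40KB
        "indirect": PTRS_PER_BLOCK * BLOCK,               # 4MB
        "doubly_indirect": PTRS_PER_BLOCK ** 2 * BLOCK,   # 4GB
        "triply_indirect": PTRS_PER_BLOCK ** 3 * BLOCK,   # 4TB
        "max_disk": 2 ** (8 * PTR) * BLOCK,               # 2^32 blocks * 2^12 bytes = 16TB
    }
```

Each extra level of indirection multiplies the reachable data by 1024, while the disk limit depends only on how many distinct block numbers a pointer can hold.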
Let's say we want a simple set of system calls that work no matter what file system lies underneath, e.g. open, read, write, readdir, etc. The kernel does this with a VFS (virtual file system), which is just a fancy name for object-oriented programming: a universal interface that hides the different underlying implementations.
The function pointers in the inode_operations and file_operations structs are called the VFS layer.
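The "universal interface" idea can be illustrated with a small object-oriented sketch. This is not the kernel's code: in C the kernel uses structs of function pointers, whereas the Python classes below (`FileSystem`, `MemoryFS`, `ReadOnlyFS`) and the helper `copy_file` are hypothetical names chosen only to show callers working against one interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class FileSystem(ABC):
    """Operations every file system must provide (simplified)."""

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> int: ...

    @abstractmethod
    def readdir(self) -> List[str]: ...


class MemoryFS(FileSystem):
    """Toy file system that keeps file contents in a dictionary."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        return self.files[name]

    def write(self, name: str, data: bytes) -> int:
        self.files[name] = bytes(data)
        return len(data)

    def readdir(self) -> List[str]:
        return sorted(self.files)


class ReadOnlyFS(FileSystem):
    """Toy file system with fixed contents; writes are refused."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = dict(files)

    def read(self, name: str) -> bytes:
        return self._files[name]

    def write(self, name: str, data: bytes) -> int:
        raise PermissionError("read-only file system")

    def readdir(self) -> List[str]:
        return sorted(self._files)


def copy_file(src_fs: FileSystem, src: str, dst_fs: FileSystem, dst: str) -> int:
    """Works for any pair of file systems because it only uses the interface."""
    return dst_fs.write(dst, src_fs.read(src))
```

`copy_file` never needs to know which implementation it is talking to, which is the property the VFS gives system calls such as read and write.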
Fit multiple file systems into one namespace
DOS – uses drive letters A:\ C:\ etc.
UNIX – mount: attaches one file system to another
Say we have file system A with a root directory, a /c directory, and the file /hello.txt.
Then we have file system C with a root directory and the file /world.txt.
Then we can mount file system C at /c in file system A.
Going to /c in A, we see the contents of C, and C can be a different type of file system.
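The mount example above can be sketched as a table that maps mount points to file systems, with each path handed to the file system whose mount point is the longest matching prefix. The `resolve` function and the dictionary-based mount table are hypothetical simplifications, and the sketch assumes the table always contains an entry for "/".

```python
from typing import Dict, Tuple


def resolve(mounts: Dict[str, str], path: str) -> Tuple[str, str]:
    """Return (file system name, path inside that file system).

    mounts maps an absolute mount point such as "/" or "/c" to the name
    of the file system attached there; "/" must be present.
    """
    best = "/"
    for point in mounts:
        # A mount point covers the path if it equals the path or is a
        # whole-component prefix of it ("/c" covers "/c/x" but not "/cat").
        covers = path == point or path.startswith(point.rstrip("/") + "/")
        if covers and len(point) > len(best):
            best = point
    rest = path[len(best.rstrip("/")):] or "/"
    return mounts[best], rest
```

With `mounts = {"/": "A", "/c": "C"}`, the path /hello.txt resolves to file system A, while /c/world.txt resolves to /world.txt inside file system C.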
POTENTIAL PROBLEM! Let's say we do step 1, then the computer crashes, and then we start back up. As a result, we see 2 copies of hello.txt, so we delete one. Now the indirect block of hello.txt is marked as free, so we can create a file that overwrites that indirect block and thus changes where data written to the end of hello.txt actually goes.
A given file system is said to be robust if it obeys the following 4 invariants:
If 1, 2, or 3 are violated we can have some serious problems, but if 4 is violated then we get performance degradation but it’s not as serious since it just means that we have free space that we cannot use. Thus, if we have to break one of these invariants, it's best to break the 4th one.
Let's look at the previous operation and alter the order by doing the following:
Doing so, we only violate the 4th invariant, and it’s not quite so serious if we crash after step 1 but before step 2. HOWEVER, in this case, we lose the file.
If we incorporate inodes into this file system implementation, we have an inode block after the block bitmap
Now to create a file hello.txt, we need to do:
So in order to violate only invariant 4:
What about renaming a file without the possibility of losing it?
Thus when a file is renamed, the file will not disappear and at least one of the names will remain.
Unfortunately, this approach results in many reads.