You are expected to understand this. CS 111 Operating Systems Principles, Fall 2006
 
 
 

Lecture 14 notes

By: Allen Chen, Aly Ibrahim, Adam Wright

Modified by: Evan Zhen

File Systems and Robustness

What is a file system?

Professor's Definition: A file system is an on-disk data structure, providing a virtual memory-like abstraction.

RT-11 File System

This was a single-user real-time (RT) operating system used by the PDP-11 family of 16-bit computers. Implemented in 1970, it was built to support multitasking. It stores each file in a single contiguous run of blocks, which leads to external fragmentation. More information about the RT-11 can be found here.

FAT File System

The File Allocation Table (FAT) file system, implemented in 1977, stores data in fixed-size blocks (usually 4KB). It is more complex than the RT-11 file system because it can break a file up and store it in different blocks rather than in one long contiguous run. The advantage of this design is that it prevents external fragmentation, although it can still suffer from internal fragmentation.

File System Characteristics

For more information, read the introduction of file systems in our lab 3 project.

Sectors vs. Blocks

Modern disks perform reads and writes in units of sectors, which are usually 512 bytes each. File systems, however, allocate and use disk storage in units of blocks. A block has to be at least the size of a sector, and it is usually 4KB. Note the distinction between sector size and block size:

sector size: property of disk hardware
block size: aspect of the OS that is using the disk

Block System

  • Boot Sector: Contains boot loader information. Usually resides in Block#0 for quick access.
  • Superblock: Stores the metadata of the file system, such as the version number, disk size, and number of blocks and inodes. Most file systems contain more than one superblock, usually for backup purposes, and the first superblock is usually located in Block#1.
  • Block Bitmap: Blocks that record which blocks are used and which blocks are free. Most file systems contain more than one block bitmap (FAT doesn't use a block bitmap; it uses linked lists instead to keep track of block status).
  • Inode Blocks: Blocks containing information about which inode points to which blocks (more on inodes later). FAT doesn't have inode blocks because it uses linked lists.
  • Data Blocks: These blocks contain the data of the directories and files on the disk. Data blocks used to hold information about directories are called Directory Entry Blocks.

Block Bitmap

The block bitmap answers the question of which blocks are free. Each block on the disk has a corresponding bit in the block bitmap that indicates whether it is free or in use. If bit n is '1', then block n is free; if it is '0', then block n is in use. In a FAT file system, this information is kept in a linked list instead of a bitmap. The advantage of a bitmap over a linked list is that it is more storage efficient and easier to keep consistent on disk. Although it takes more CPU time to search a bitmap than to remove the head of a linked list, the I/O cost of then using the free block dominates the CPU cost of the search by a wide margin.
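As a rough illustration of the bitmap described above, here is a minimal Python sketch. It follows the convention in these notes (bit 1 = free, bit 0 = in use). The function names and the choice to store block n in bit n % 8 of byte n // 8 are assumptions made for the example, not details from the lecture.

```python
BLOCK_FREE = 1  # convention from these notes: 1 = free, 0 = in use


def find_free_block(bitmap, num_blocks):
    """Return the lowest-numbered free block, or -1 if every block is in use."""
    for n in range(num_blocks):
        byte, bit = divmod(n, 8)  # assumed layout: block n lives in byte n // 8, bit n % 8
        if (bitmap[byte] >> bit) & 1 == BLOCK_FREE:
            return n
    return -1


def mark_used(bitmap, n):
    """Clear bit n so that block n is recorded as in use."""
    byte, bit = divmod(n, 8)
    bitmap[byte] &= ~(1 << bit) & 0xFF


def mark_free(bitmap, n):
    """Set bit n so that block n is recorded as free."""
    byte, bit = divmod(n, 8)
    bitmap[byte] |= 1 << bit
```

The linear scan is the extra CPU work mentioned above; a linked free list would instead pop its head in constant time.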

It is important that the boot sector, superblock, and block bitmap are marked '0' (in use), because these blocks are crucial for the file system to operate properly. If an inode block is mistakenly marked free, the damage is less severe: it just means that some of the files on the disk will be lost.

Usually, the boot sector, the superblock, and the block bitmap are located in the first 3 blocks. Here is an equation that helps us find the inode blocks and data blocks:
n-equation.jpg

Inodes

Each file and directory in a file system corresponds to one inode. The inode stores information about the file such as its size, type (file or directory), number of links to the file, and which blocks the file occupies. An inode, however, will not tell you which directory a file is located in. Having inodes allows files to be moved safely from one directory to another, because directory entries refer to files by inode number. Inodes also make it possible to create hard links to a file: entries located in different directories, or under different names, that refer to the exact same blocks and hence the exact same data.

Because there are usually more blocks than inodes, it is possible for a file system to run out of inodes before it runs out of free blocks, leaving it unable to create new files or directories. To deal with this, some file systems can generate more inodes when this occurs, and a file system can also be set up with more inodes than blocks in the first place.

Each inode points to the blocks that a file or directory contains. An inode contains 10 direct pointers and an indirect block pointer. The direct pointers point to the first blocks that the file occupies. If a file occupies more than 10 blocks, the indirect block pointer comes in: it points to a block that in turn holds pointers to the remaining blocks that the file occupies.

Each inode resides in a block called an inode block. An inode is a very small structure, so an inode block can hold many inodes. Most file systems contain many inode blocks to reduce the possibility of running out of inodes.
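The lookup this implies can be sketched as follows. This is only an illustration: the function name is hypothetical, and the figure of 1024 pointers per indirect block assumes the 4KB blocks and 4-byte pointers used later in these notes.

```python
NUM_DIRECT = 10        # direct pointers stored in the inode
PTRS_PER_BLOCK = 1024  # assumed: 4KB block / 4-byte pointer


def locate_block(file_block_index):
    """Say which pointer holds the address of the given block of a file.

    Returns ("direct", slot) for the first 10 blocks, or ("indirect", slot)
    for a slot inside the indirect block.
    """
    if file_block_index < 0:
        raise ValueError("block index must be non-negative")
    if file_block_index < NUM_DIRECT:
        return ("direct", file_block_index)
    slot = file_block_index - NUM_DIRECT
    if slot < PTRS_PER_BLOCK:
        return ("indirect", slot)
    raise ValueError("file too large for a single indirect block")
```

For example, block 10 of a file (its eleventh block) is found through slot 0 of the indirect block, which costs one extra disk read compared with a direct pointer.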

Here is a visual diagram of an inode:
inode.jpg

It is important to note that not all file systems use inodes.

Directory Entry Blocks

A directory block stores the information about a directory and its files. Each file in the directory has its own directory entry. A directory entry contains:

Name
Size
Linked list (if a file system doesn't use inodes)
  • # blocks in a file = ceil(size/block size) = ceil(size/4KB)

In keeping with our idea of a tree-based file system rather than a flat one, our directory entries now hold the following information:

File Name
File Size
10 Direct block pointers to the first 10 blocks of the file
Indirect block containing 1024 block pointers
(If the file system uses inodes, then a directory entry will contain an inode number instead of block pointers)

diagram for a directory entry (no inodes)
direntry2.jpg
Note: a directory entry in a file system that uses inodes is similar, except that it contains the inode number, and the inode structure in turn points to the blocks that are used.

Using only direct block pointers we can store a 40KB file (10 pointers pointing to one 4KB block each).

  • The indirect block allows us to point to another 4MB, so the total file size we can support with the indirect block is 4MB+40KB. The indirect block pointer points to a block of size 4KB (like all blocks on the system) containing 1024 pointers to other blocks (4KB * 1024 = 4MB).
  • To support even larger files we have a doubly indirect block as well. The doubly indirect block has 1024 pointers to indirect blocks, meaning we can then support 4GB (actually 4GB+4MB+40KB).
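The arithmetic above can be checked with a short Python sketch. The constants come from these notes (4KB blocks, 4-byte pointers, 10 direct pointers); the function name and its `indirection_levels` parameter are made up for the example.

```python
BLOCK_SIZE = 4096    # bytes per block
POINTER_SIZE = 4     # bytes per block pointer
NUM_DIRECT = 10      # direct pointers per file
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024 pointers fit in one block


def max_file_size(indirection_levels):
    """Largest file in bytes with 10 direct pointers plus one pointer of each
    indirection level up to the given depth (0 = direct pointers only)."""
    blocks = NUM_DIRECT
    for level in range(1, indirection_levels + 1):
        blocks += PTRS_PER_BLOCK ** level
    return blocks * BLOCK_SIZE
```

Each extra level multiplies the reachable data by 1024, which is why the supported size jumps from 40KB to roughly 4MB and then to roughly 4GB.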

Internal Fragmentation & Indirect Blocks

Consider the space wasted by a 1B file.

FAT internal fragmentation: 4KB – 1B = 4095B of unused space in the block, plus 4B (block pointer in FAT) = 4099B
New hierarchical block pointer design: 4139B (9 unused direct block pointers, 1 indirect block pointer, and 1 doubly indirect block pointer (11 x 4B = 44B), plus the 4095B of fragmentation internal to the one block the file uses)

What about a larger file, such as 40KB + 1B (40961B)? Since we are 1B beyond ten full blocks, the last block of file data has 4095B of wasted space. The indirect block wastes another 4092B (1023 unused pointers). Counting the unused doubly indirect block pointer (4B) as well, the total is 8191B, just under 8KB of wasted space.
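Both figures for the hierarchical design can be reproduced with a small Python sketch. It counts the same things the notes count: unused bytes in the last data block, unused pointers in the directory entry, and unused pointer slots in the indirect block. The function name is hypothetical, and the sketch only handles files small enough not to need the doubly indirect block.

```python
BLOCK_SIZE = 4096
POINTER_SIZE = 4
NUM_DIRECT = 10
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024


def wasted_bytes(size):
    """Wasted space for a file of `size` bytes that fits in the direct
    pointers plus one indirect block."""
    if not 1 <= size <= (NUM_DIRECT + PTRS_PER_BLOCK) * BLOCK_SIZE:
        raise ValueError("size outside the range this sketch models")
    data_blocks = -(-size // BLOCK_SIZE)                # ceil(size / block size)
    fragmentation = data_blocks * BLOCK_SIZE - size     # unused tail of last block
    unused_ptrs = max(0, NUM_DIRECT - data_blocks)      # unused direct pointers
    unused_ptrs += 1                                    # doubly indirect pointer, never used here
    if data_blocks > NUM_DIRECT:
        # the indirect block exists; count its empty slots
        unused_ptrs += PTRS_PER_BLOCK - (data_blocks - NUM_DIRECT)
    else:
        unused_ptrs += 1                                # indirect pointer itself is unused
    return fragmentation + unused_ptrs * POINTER_SIZE
```

With these assumptions, `wasted_bytes(1)` gives 4139 and `wasted_bytes(40961)` gives 8191, matching the totals worked out above.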

What if we want to support files that are larger than 4GB + 4MB + 40KB? Then we have to include a triply indirect block (which gives us support for 4TB files).

Is there any need for quadruply or quintuply indirect blocks? In the end you would probably want to consider increasing the 4KB block size instead, which would help avoid the growing overhead of adding more levels of indirect blocks.

Another issue is that we are using 4B pointers, which means that the maximum disk size we can support is 16TB (2^32 blocks * 2^12 bytes per block).
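Written out, the limit follows from the number of distinct block numbers a 4-byte (32-bit) pointer can hold and the 4KB block size used throughout these notes, taking 1TB to mean 2^40 bytes:

```latex
\[
\underbrace{2^{32}}_{\text{addressable blocks}} \times
\underbrace{2^{12}\ \text{bytes}}_{\text{block size (4KB)}}
= 2^{44}\ \text{bytes}
= 2^{4} \times 2^{40}\ \text{bytes}
= 16\ \text{TB}
\]
```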

Multiple File System Support

Let's say we want a simple set of system calls that work no matter what file system lies underneath, e.g. open, read, write, readdir, etc. To do this, the kernel uses a VFS (virtual file system), which is just a fancy name for object-oriented programming: a universal interface that hides different underlying implementations.

vfs.jpg

  • Top level is a struct task (the structure used to represent a process)
  • This points to a struct file_struct (the structure used to represent the file descriptor table)
  • Each entry in file_struct points to a struct file (the structure used to represent an open file)
  • Each file has a struct inode (file or directory, open or not; where file is on disk)
  • The file has a struct file_operations with function pointers to the read and write functions that will do whatever is necessary to do that job on this file system
  • The inode has a struct inode_operations with open, rename, and unlink function pointers in the same way

These function pointers in the inode_operations and file_operations structs are called the VFS layer.
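The dispatch idea can be sketched in Python, with callables standing in for C function pointers. Everything named here (`ramfs_*`, `nullfs_*`, `sys_read`, `sys_write`, the `f_op` field) is invented for illustration and is not taken from any particular kernel.

```python
class FileOperations:
    """Stand-in for struct file_operations: a table of function pointers."""
    def __init__(self, read, write):
        self.read = read
        self.write = write


class File:
    """Stand-in for struct file: an open file plus its operations table."""
    def __init__(self, f_op, data=b""):
        self.f_op = f_op
        self.data = bytearray(data)
        self.pos = 0


def ramfs_read(file, count):
    chunk = bytes(file.data[file.pos:file.pos + count])
    file.pos += len(chunk)
    return chunk


def ramfs_write(file, buf):
    file.data[file.pos:file.pos + len(buf)] = buf
    file.pos += len(buf)
    return len(buf)


def nullfs_read(file, count):
    return b""            # always at end of file


def nullfs_write(file, buf):
    return len(buf)       # accepts and discards everything


RAMFS_OPS = FileOperations(ramfs_read, ramfs_write)
NULLFS_OPS = FileOperations(nullfs_read, nullfs_write)


def sys_read(fd_table, fd, count):
    """Generic read: look up the open file, then call through its table."""
    file = fd_table[fd]
    return file.f_op.read(file, count)


def sys_write(fd_table, fd, buf):
    file = fd_table[fd]
    return file.f_op.write(file, buf)
```

Note that `sys_read` and `sys_write` never check which file system they are talking to; adding a new file system only means supplying another operations table.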

Multiple File System User View

Fit multiple file systems into one namespace

DOS – uses drive letters A:\ C:\ etc.

UNIX – mount: attaches one file system to another

mounting.jpg

Say we have file system A with a root directory, a /c directory, and the file /hello.txt.
Then we have file system C with a root directory and the file /world.txt.
Then we can mount file system C at /c in file system A.
Going to /c in A, we see the contents of C, and C can be a different type of file system.
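One way to picture the single namespace is a table of mount points. The sketch below is a deliberate simplification: it picks the file system by longest matching path prefix, whereas a real kernel switches file systems while walking the path one component at a time. The function name and the mount table format are assumptions for the example.

```python
def resolve(path, mounts):
    """Map an absolute path to (file system, path inside that file system).

    `mounts` maps mount points to file system names and must contain "/".
    """
    best = "/"
    for point in mounts:
        if point == "/":
            continue
        # match whole path components only, so "/c" does not capture "/cat.txt"
        if path == point or path.startswith(point + "/"):
            if len(point) > len(best):
                best = point
    rest = path if best == "/" else path[len(best):]
    return mounts[best], rest or "/"
```

With `mounts = {"/": "A", "/c": "C"}`, the path /hello.txt stays in A, while /c/world.txt resolves to /world.txt inside C.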

fs.jpg
Diagram of file system – add hello.txt to root directory using a file system that uses linked list blocks

  1. Add a direntry of hello.txt to the root directory
  2. Add a direntry for the subdirectory “sub” to the root directory
  3. Then say we want to execute mv hello.txt sub
  4. Say the kernel does this by performing the following set of operations:
    1. copy directory entry
    2. remove old entry

POTENTIAL PROBLEM! Suppose we copy the directory entry (step 1 of the move), then the computer crashes, and then we start back up. We now see 2 copies of hello.txt, so we delete one. Deleting it marks the file's blocks, including the indirect block of hello.txt, as free, even though the other copy still refers to them. We can then create a file that overwrites that indirect block, and thus change where data stored to the end of hello.txt actually goes.

File System Correctness Invariants

A given file system is said to be robust if it obeys the following 4 invariants:

  1. Every block is used for exactly one purpose. This means a block should not be used by two different files unless they are hard-linked. Also, the boot sector, superblock, and block bitmap should not be used by other files.
  2. All referenced blocks are initialized. If a block is being used by a file, or is pointed to by an inode, it should contain data and not garbage.
  3. All referenced blocks are marked used. A block pointed to by an inode must be marked used in the block bitmap.
  4. All unreferenced blocks are marked free. Any free block (not pointed to by an inode or used by any file) must be marked free in the block bitmap.

If 1, 2, or 3 are violated we can have some serious problems, but if 4 is violated then we get performance degradation but it’s not as serious since it just means that we have free space that we cannot use. Thus, if we have to break one of these invariants, it's best to break the 4th one.
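A checker for these invariants might look like the following Python sketch. The data model is invented for the example: `refs` maps each owner (an inode, or a directory entry in a file system without inodes) to the blocks it points to, `bitmap` maps block numbers to 1 (free) or 0 (used) as in these notes, and `initialized` is the set of blocks holding valid data. Hard links are assumed to share one owner, so they do not count as double use.

```python
def check_invariants(refs, bitmap, initialized):
    """Return the set of invariant numbers (1-4) that the given state violates."""
    violations = set()
    referenced = set()
    for blocks in refs.values():
        for b in blocks:
            if b in referenced:
                violations.add(1)   # block used for more than one purpose
            referenced.add(b)
            if b not in initialized:
                violations.add(2)   # referenced block holds garbage
            if bitmap[b] == 1:
                violations.add(3)   # referenced block is marked free
    for b, bit in bitmap.items():
        if b not in referenced and bit == 0:
            violations.add(4)       # unreferenced block is marked used (leak)
    return violations
```

A state that returns only `{4}` has leaked space but no corrupted or shared data, which is why invariant 4 is the one to sacrifice.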

Let's look at the previous operation and alter the order by doing the following:

  1. remove old entry
  2. copy directory entry

Doing so, we only violate the 4th invariant, and it’s not quite so serious if we crash after step 1 but before step 2. HOWEVER, in this case, we lose the file.
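The two orderings can be compared with a toy crash simulation in Python. The model is an assumption made for illustration: each directory is a dict from file name to the file's first block, block number 7 is arbitrary, and a crash is modeled by simply stopping after a given number of steps.

```python
def entries_after_crash(order, crash_after):
    """Run the first `crash_after` steps of `order` and return how many
    directory entries still point at hello.txt's blocks."""
    FIRST_BLOCK = 7                      # hypothetical first block of hello.txt
    root = {"hello.txt": FIRST_BLOCK}
    sub = {}
    for step in order[:crash_after]:
        if step == "copy":
            sub["hello.txt"] = FIRST_BLOCK   # write the new entry in sub
        elif step == "remove":
            del root["hello.txt"]            # erase the old entry in root
        else:
            raise ValueError("unknown step: " + step)
    return sum(FIRST_BLOCK in d.values() for d in (root, sub))
```

A result of 2 means two entries share the same blocks (the dangerous case described earlier), 0 means the file is lost and its blocks are leaked, and 1 means the directory tree is consistent.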

Inodes (revisited)

If we incorporate inodes into this file system implementation, we have an inode block after the block bitmap.

fs2.jpg

Now to create a file hello.txt, we need to do:

  1. Allocate inode
  2. Allocate data block
  3. Write directory entry
  4. Increment number of links on inode
  5. Store data block in inode
  6. Write data

So, in order to risk violating only invariant 4 if we crash partway through, we reorder the steps:

  1. allocate data block (1 block write)
  2. write the data block
  3. allocate inode, store data block pointer, increment number of links
  4. write dir entry
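The difference between the two orderings can be checked with a small Python model. It tracks a single data block and only invariants 2 to 4; the step names are made up, and the combined step 3 of the reordered list is split into three writes in the order given (allocate inode, store pointer, increment links), which is an assumption about how that step is carried out.

```python
NAIVE = ["alloc_inode", "alloc_block", "write_dirent",
         "inc_links", "store_ptr", "write_data"]
SAFE = ["alloc_block", "write_data", "alloc_inode",
        "store_ptr", "inc_links", "write_dirent"]


def violations_by_crash_point(order):
    """For a crash right after each step, list which invariants the one
    data block violates at that moment."""
    done = set()
    result = []
    for step in order:
        done.add(step)
        referenced = "store_ptr" in done    # inode points at the block
        marked_used = "alloc_block" in done # bitmap says the block is in use
        written = "write_data" in done      # block holds real data
        v = set()
        if referenced and not written:
            v.add(2)
        if referenced and not marked_used:
            v.add(3)
        if marked_used and not referenced:
            v.add(4)
        result.append(v)
    return result
```

In this model the naive order passes through a state that violates invariant 2 (the inode points at a block that has not been written yet), while the reordered steps never violate anything other than invariant 4.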

What about renaming a file without the possibility of losing it?

  1. increment link count
  2. write new directory entry
  3. remove old directory entry
  4. decrement link count

Thus, when a file is renamed, the file will not disappear, and at least one of the names will remain.

Unfortunately, this approach results in many reads.
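The four rename steps can be walked through with a crash at every possible point. In this Python sketch the file names old.txt and new.txt are placeholders, and a crash is modeled by stopping after `crash_after` steps.

```python
def rename_with_crash(crash_after):
    """Run the first `crash_after` of the four rename steps and return
    (set of names in the directory, link count stored in the inode)."""
    links = 1
    names = {"old.txt"}
    for step in range(crash_after):
        if step == 0:
            links += 1                 # 1. increment link count
        elif step == 1:
            names.add("new.txt")       # 2. write new directory entry
        elif step == 2:
            names.remove("old.txt")    # 3. remove old directory entry
        elif step == 3:
            links -= 1                 # 4. decrement link count
    return names, links
```

At every crash point at least one name exists and the link count is never smaller than the number of names, so the inode cannot be freed while a directory entry still refers to it. The worst outcome is a link count that is one too high, which only leaks space.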

 
2006fall/notes/lec14.txt · Last modified: 2007/09/28 00:25 (external edit)
 