Phoenix File System

Documentation Version: 1.32

Table Of Contents

Physical Media Abstraction Layer
Media Partitioning
Volume Abstraction Layer
Volume Sector Allocation Layer
- Free Space Descriptors
- Free Space Descriptor Location Table
File Allocation Layer
Directory Structure
SuperBlock
Boot Record

Physical Media Abstraction Layer - This layer abstracts the physical format of the media into equal-sized logical allocation units called Logical Sectors using the following guidelines:

Logical Sectors are numbered consecutively beginning at 0.
No Logical Sector number can be greater than 2⁶⁴-1.
Logical Sector 0 always refers to the lowest accessible physical sector.
Lower Logical Sector numbers are considered to be toward the 'beginning' of the device.
Logical Sector numbers are consistent as follows:
- For SCSI and other LBA devices, LBA sector numbers are used.
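- For devices which require Cylinder/Head/Sector addressing, the formula: LogicalSector = (CylinderNumber * HeadCount + HeadNumber) * SectorCount + SectorNumber - 1
  must be used, where HeadCount is the number of heads, SectorCount is the number of sectors per track, and SectorNumber is the one-based sector number within the track.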
All Logical Sectors represent 512 bytes of physical storage each.
The seek latency between any two consecutive Logical Sectors is minimal.
Logical Sectors cannot refer to overlapping regions of the physical media.
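
The Cylinder/Head/Sector formula above can be sketched as follows. This is a minimal illustration, not part of the specification: the function and parameter names are hypothetical, and it assumes zero-based cylinder and head numbers with a one-based sector number, which is what the "- 1" in the formula implies.

```python
def chs_to_logical_sector(cylinder, head, sector, head_count, sectors_per_track):
    """Convert a Cylinder/Head/Sector address to a Logical Sector number.

    Assumes cylinder and head are zero-based and sector is one-based.
    """
    if not 0 <= head < head_count:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * head_count + head) * sectors_per_track + sector - 1
```

With 16 heads and 63 sectors per track, the first sector of the device (cylinder 0, head 0, sector 1) maps to Logical Sector 0, as the guidelines require.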

By definition, a Logical Sector is a single allocatable unit for the physical media addressable via an unsigned 64-bit sector number. The number of Logical Sectors is determined by the capacity of the media. The Physical Media Abstraction Layer determines Logical Sector size, SectorSize, which is the number of 8-bit bytes each Logical Sector represents of the physical media; currently only a SectorSize of 512 bytes is supported. Logical Sector sizes less than 512 bytes will never be supported. In addition, the Physical Media Abstraction layer determines the number of Logical Sectors used to represent the physical media, NumSectors; it also determines the function which maps Logical Sectors to physical regions of the media given the above criteria.

It is the sole responsibility of this layer to translate Logical Sectors to physical regions of the media. This is the only layer that should have to concern itself with the details of accessing the physical media. After boot, no higher layer has any knowledge of the workings of the hardware or of the physical layout of the media; higher layers perform all operations in terms of Logical Sectors, assuming the above guidelines for Logical Sector properties.

Additional requirements have to be made to accommodate the boot process. When booting, the BIOS int13h functions have to be called to load sectors into memory. This requires that physical sectors be 512 bytes each; it also requires that sectors be accessible using Cylinder/Head/Sector addressing unless LBA mode is enabled and LBA BIOS extensions are present. For devices which support both LBA and Cylinder/Head/Sector addressing (such as many modern IDE drives and SCSI drives with IDE compatibility ROM), LBA sector numbers must correspond exactly to the Cylinder/Head/Sector sector numbers. This is only truly a concern with boot devices, and it should not affect the performance of any device.

Media Partitioning - The device is scanned to see if it contains a supported partition table format; if no supported partition table is found, the entire device is considered a single partition. Currently, only a PC-compatible master boot record and partition table is supported; it is located in the first accessible sector, Logical Sector 0. This allows the peaceful coexistence of the Phoenix File System on one partition along with any other PC-compatible file system, including FAT, VFAT, NTFS, and HPFS, on another partition of the same physical device. The format of the PC-compatible master boot record and partition table is:

  PC-compatible Partition Record Entry:
  Offset   Size           Field Name     Description
    00h    BYTE           BootIndicator  (80h) Partition is the one the system is booted from
                                         (00h) Partition is not the boot partition
    01h    BYTE           StartHead      Partition Start Head
    02h    BYTE
             bits   0-5 : StartSector    Partition Start Sector
             bits   6-7 : StartCylinderH Partition Start Cylinder bits 8-9
    03h    BYTE           StartCylinderL Partition Start Cylinder bits 0-7
    04h    BYTE           SystemID       FileSystem ID
                                           
    05h    BYTE           EndHead        Partition End Head
    06h    BYTE
             bits   0-5 : EndSector      Partition End Sector
             bits   6-7 : EndCylinderH   Partition End Cylinder bits 8-9
    07h    BYTE           EndCylinderL   Partition End Cylinder bits 0-7
    08h    DWORD          SectorsBefore  Sectors preceding partition
    0Ch    DWORD          Length         Length of partition in sectors
  ----------------
  Total: 16 bytes

  PC-compatible Master Boot Record format:
  Offset   Size           Field Name     Description
     0h    446 BYTES      Bootstrap      Master Bootstrap loader program
   1BEh    4 PREs         PartitionTable Array of 4 partition records
   1FEh    2 BYTES        Signature      Boot Record Signature (AA55h)
  ----------------
  Total: 512 bytes
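
As an illustration of the two layouts above, the following sketch decodes a master boot record with Python's struct module. The type and function names are hypothetical, and little-endian byte order is assumed (the usual PC convention; the tables do not state it).

```python
import struct
from collections import namedtuple

PartitionRecord = namedtuple(
    "PartitionRecord",
    "boot_indicator start_head start_sector start_cylinder system_id "
    "end_head end_sector end_cylinder sectors_before length",
)

def parse_partition_record(raw):
    """Decode one 16-byte partition record (assumed little-endian)."""
    (boot, s_head, s_sec_cyl, s_cyl_lo, system_id,
     e_head, e_sec_cyl, e_cyl_lo, before, length) = struct.unpack("<8B2I", raw)
    return PartitionRecord(
        boot, s_head,
        s_sec_cyl & 0x3F,                    # bits 0-5: sector
        ((s_sec_cyl >> 6) << 8) | s_cyl_lo,  # bits 6-7 are cylinder bits 8-9
        system_id, e_head,
        e_sec_cyl & 0x3F,
        ((e_sec_cyl >> 6) << 8) | e_cyl_lo,
        before, length,
    )

def parse_master_boot_record(sector):
    """Return the four partition records, or None if the signature is absent."""
    if len(sector) < 512:
        return None
    (signature,) = struct.unpack_from("<H", sector, 0x1FE)
    if signature != 0xAA55:
        return None
    return [parse_partition_record(sector[0x1BE + 16 * i:0x1BE + 16 * (i + 1)])
            for i in range(4)]
```

A caller that receives None would treat the entire device as a single partition, as described above.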

This document only discusses the structure of a partition dedicated to the Phoenix File System. The Phoenix File System subdivides each partition into one or more Segments; Segments can be grouped into Volumes. A single Volume may contain Segments from several different partitions on several different storage devices. Volumes are used to store files and associated data in the Phoenix File System. Segments can also be used as raw storage space for virtual memory.

Each logical sector in a given Segment can be referred to as a Segment Sector; Segment Sector number 0 refers to the first logical sector in the given Segment, and the number of Segment Sectors is equal to the number of logical sectors in the Segment. Once Segments are grouped into Volumes, a Volume Sector is used to make the entire Volume look like a contiguous series of equal-sized storage blocks. Each Volume Sector comprises 1 or more Segment Sectors; the number must be a power of 2, and all Volume Sectors must be the same size. Volume Sector numbering begins with the first Segment Sector on the first Segment in the Volume. The number of Volume Sectors is the sum of the number of Segment Sectors on each Segment that constitutes the Volume, divided by the number of Segment Sectors per Volume Sector. This is discussed in greater detail as part of the Volume Abstraction Layer.

The first Logical Sector of a Phoenix File System partition holds a short bootstrap loader and a PFS Segment Table. The purpose of the PFS Segment Table is to allow a Phoenix File System partition to be further subdivided into logical Segments. Every Phoenix File System partition always contains at least one Segment. Furthermore, one or more Segments compose a Volume. The format of this initial sector is:

  Phoenix File System Segment Table entry:
  Offset   Size           Field Name     Description
     0h    1 QWORD        StartSector    Starting logical sector of this Segment
     8h    1 QWORD        Length         Number of logical sectors in this Segment
    10h    20 BYTES       NextDrive      DriveID of next Segment in Volume (or 0 if last)
    24h    1 BYTE         NextSegment    Segment number of next Segment in Volume
    25h    1 BYTE         NumSegments    Number of Segments belonging to this same Volume
    26h    1 BYTE         Sequence       Index number of Segment within Volume
    27h    1 BYTE         Flags
             bit      0 : BootVolume     Whether the Volume this Segment belongs to is bootable
             bit      1 : Interleaved    Whether the Segments in this same Volume are interleaved
                                         (also called data striping)
             bit      2 : InUse          Indicates the System Segment Table entry contains valid information
             bit      3 : FinalSegment   Indicates the given Segment is the last in its Volume
             bit      4 : Swap           Indicates the given Segment is dedicated as a swap partition
             bits   5-7 : (reserved)     must be 0
   ------------
   Total: 40 bytes

  Phoenix File System Segment Table and Bootstrap Loader format:
  Offset   Size           Field Name     Description
     0h    138 BYTES      BootstrapInit  Bootstrap loader initialization
    8Ah    1 BYTE         Flags          File system and device flags
             bits   0-7 : (reserved)     must be 0
    8Bh    1 BYTE         BootSectors    Number of logical sectors in the bootstrap loader
    8Ch    1 DWORD        MagicNumber    Special number to help discern a Segment Table from other
                                         data should the file system become corrupt, equal to
                                         31676573h ("seg1")
    90h    8 STEs         Segments       Segment Table
   1E0h    1 QWORD        CylinderTable  Logical sector number where Cylinder Table for device resides
                                         or 0 if no Cylinder Table is present
   1E8h    1 WORD         SectorSize     Logical Sector Size for this device
   1EAh    20 BYTES       DiskID         Serial number used to identify drive
   1FEh    1 WORD         Signature      Boot Record Signature (AA55h)
   ------------
   Total: 512 bytes*

* The Segment Table and Bootstrap Loader sector is always 512 bytes, even if the device's logical sectors are larger than 512 bytes. Data stored in the remainder of the Segment Table and Bootstrap Loader block is undefined by the Phoenix File System and may be used however the operating system wishes. The function of the Bootstrap loader initialization is to locate the bootable PhoenixFS Segment, load the first sector of the bootstrap from the Segment into memory, and then transfer control of the processor to the bootstrap program. The BootSectors field should always be at least 1; if it is greater than 1, then the Bootstrap loader initialization should load each sector from the storage device consecutively into memory. This allows a device with 512-byte sectors and 2 BootSectors to be practically indistinguishable during the boot process from a device with 1024-byte sectors, if one were supported, and a value of 1 in the BootSectors field. Again, the format of the data in these extended boot sectors is undefined and implementation-dependent.
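
The Segment Table layout above could be read along these lines. This is a sketch under stated assumptions: the names are hypothetical, little-endian byte order is assumed, and the offsets are taken directly from the two tables (the eight 40-byte entries starting at 90h).

```python
import struct
from collections import namedtuple

SegmentEntry = namedtuple(
    "SegmentEntry",
    "start_sector length next_drive next_segment num_segments sequence flags",
)

SEGMENT_ENTRY_SIZE = 40
SEGMENT_TABLE_OFFSET = 0x90
SEGMENT_COUNT = 8
SEGMENT_MAGIC = 0x31676573  # reads as "seg1" when stored little-endian

# Bit positions from the Flags byte of a Segment Table entry.
FLAG_BOOT_VOLUME = 1 << 0
FLAG_INTERLEAVED = 1 << 1
FLAG_IN_USE = 1 << 2
FLAG_FINAL_SEGMENT = 1 << 3
FLAG_SWAP = 1 << 4

def parse_segment_entry(raw):
    """Decode one 40-byte Segment Table entry."""
    return SegmentEntry(*struct.unpack("<QQ20s4B", raw))

def parse_segment_table(sector):
    """Return the 8 Segment Table entries, or None if the markers do not match."""
    if len(sector) < 512:
        return None
    (magic,) = struct.unpack_from("<I", sector, 0x8C)
    (signature,) = struct.unpack_from("<H", sector, 0x1FE)
    if magic != SEGMENT_MAGIC or signature != 0xAA55:
        return None
    entries = []
    for i in range(SEGMENT_COUNT):
        start = SEGMENT_TABLE_OFFSET + i * SEGMENT_ENTRY_SIZE
        entries.append(parse_segment_entry(sector[start:start + SEGMENT_ENTRY_SIZE]))
    return entries
```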

The Swap flag is used to indicate that the given Segment is reserved to be used for virtual memory. This can be much more efficient than maintaining a swap file on a PFS volume. Segments reserved for virtual memory are not formatted as PFS volumes; the data in the Segment is considered garbage and may be overwritten in an operating-system-specific manner. This accounts for the fact that different operating systems may have different methods of providing virtual memory and that data stored in virtual memory is never consistent between system reboots. Segments reserved for virtual memory are always local to the machine and may not be shared in any fashion. Whenever the Swap flag is set, the NextDrive, NextSegment, Sequence, BootVolume, and Interleaved fields should all be set to 0, and the NumSegments field and the InUse and FinalSegment flags should always be 1.

If a Segment Table Entry is unused, then all fields should be set to 0. A clear InUse flag then indicates that the Segment Table Entry is empty, and this condition can be verified by examining the other fields and confirming that they are all 0 as well.
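
The consistency rules for swap Segments and unused entries can be expressed as a small check. The function below is an illustrative sketch; its name, argument order, and messages are hypothetical, and next_drive is assumed to be the raw 20-byte NextDrive field.

```python
FLAG_BOOT_VOLUME = 0x01
FLAG_INTERLEAVED = 0x02
FLAG_IN_USE = 0x04
FLAG_FINAL_SEGMENT = 0x08
FLAG_SWAP = 0x10
FLAG_RESERVED = 0xE0  # bits 5-7 must be 0

def entry_problems(start_sector, length, next_drive, next_segment,
                   num_segments, sequence, flags):
    """Return a list of rule violations for one Segment Table entry."""
    problems = []
    if flags & FLAG_RESERVED:
        problems.append("reserved flag bits set")
    if not flags & FLAG_IN_USE:
        # An unused entry must have every field, including Flags, equal to 0.
        others = (start_sector, length, next_segment, num_segments, sequence, flags)
        if any(others) or any(next_drive):
            problems.append("unused entry is not all zeros")
        return problems
    if flags & FLAG_SWAP:
        if any(next_drive) or next_segment or sequence:
            problems.append("swap Segment must not chain to another Segment")
        if flags & (FLAG_BOOT_VOLUME | FLAG_INTERLEAVED):
            problems.append("swap Segment cannot be bootable or interleaved")
        if num_segments != 1 or not flags & FLAG_FINAL_SEGMENT:
            problems.append("swap Segment must be the only Segment of its Volume")
    return problems
```

An empty list means the entry satisfies the rules stated above; it does not validate anything else about the entry.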

A bootstrap loader is always present even if the partition is not flagged as bootable in the partition table. If the partition is not bootable, however, it is suggested that the bootstrap loader simply display some sort of error message should it ever be executed.

It is also important to point out that if a device contains no system partition table supported by PhoenixFS, then the entire device is treated as a single partition. As such, if the initial sector(s) correspond to a Phoenix File System Segment Table and Bootstrap Loader, then the single partition is considered to be PhoenixFS. This can be used to format removable diskettes with PhoenixFS or otherwise to dedicate entire mass storage devices to PhoenixFS.

  Diagram of system partitions and PhoenixFS Segments:

                         PhoenixFS Segment Table
                         (First sector in partition)
  Master Boot Record     .-----------.
  (Logical sector 0)    /| bootstrap |      Segment
  .-------------.      / | loader    |     .------------.
  |    boot     |     /  |-----------|    /|            |
  |    code     |    /   | Segment 0 |   / |            |
  |     .       |   /    |-----------|  /  |            |
  |     .       |  /     | Segment 1 | /   |            |
  |     .       | /      |-----------|/    |            |
  |-------------|/       | Segment 2 |     .            .
  | partition 0 |        |-----------|\    .            .
  |             |        | Segment 3 | \   .            .
  |-------------|\       |-----------|  \  |            |
  | partition 1 | \      | Segment 4 |   \ |            |
  |             |  \     |-----------|    \|            |
  |-------------|   \    | Segment 5 |     `------------'
  | partition 2 |    \   |-----------|
  |             |     \  | Segment 6 |
  |-------------|      \ |-----------|
  | partition 3 |       \| Segment 7 |
  |             |        `-----------'
  `-------------'

The CylinderTable field holds the location of an optional Cylinder Table, which can be used to increase file system performance by noting which consecutive logical sectors have greater seek latency between them. No specific knowledge of the storage device is required; rather, such a table is constructed by empirically observing seek latencies between adjacent sectors and is stored for use between file system sessions. The name originates from the fact that most storage devices store information logically in cylinders broken into sectors, and that seeking from a sector in one cylinder to a sector in another cylinder takes longer than seeking between two sectors in the same cylinder. The Cylinder Table would then end up storing the first sector in each cylinder, since the seek to it from the last sector of the previous cylinder takes longer than a seek between any two sectors in the same cylinder. However, the Cylinder Table is not restricted to describing the beginnings of cylinders; it can be used to indicate any high-latency seek between logically consecutive sectors. For example, many SCSI drives can logically remap good sectors over sectors that go bad during normal operation; any seek to or from such a sector would have much higher latency than would be expected if the device were not performing the translation.
The file system can then use this Cylinder Table to determine how best to allocate logical sectors to a single file. For example, the file system may be discouraged from spreading a file across cylinder groups in order to minimize seek latency when accessing the file.

The Cylinder Table itself is simply a series of Logical Sector Numbers indicating the destination sector of a sector-to-sector seek between two consecutive logical sectors. The series is terminated by a Logical Sector Number of 0. The table may span as many sectors as is required to store the entire table, but the sectors must be contiguous.
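
Reading a Cylinder Table might look like the sketch below. It assumes each entry is a 64-bit little-endian Logical Sector Number (the entry width follows from the 64-bit sector numbers used throughout, the byte order is an assumption), and the function names are hypothetical.

```python
import struct

def parse_cylinder_table(data):
    """Return the Logical Sector Numbers stored before the terminating 0."""
    usable = len(data) - len(data) % 8  # ignore any trailing partial entry
    sectors = []
    for (value,) in struct.iter_unpack("<Q", data[:usable]):
        if value == 0:
            break
        sectors.append(value)
    return sectors

def is_slow_seek(slow_destinations, sector):
    """True if seeking from sector - 1 to sector was recorded as high latency."""
    return sector in slow_destinations
```

An allocator could convert the parsed list to a set once and consult it when deciding whether extending a file into the next logical sector would cross a recorded boundary.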

Volume Abstraction Layer - This layer manages the grouping of one or more Segments into logical Volumes. Furthermore, this layer makes a collection of Segment Sectors in distinct Segments appear as a single array of Volume Sectors; each Volume Sector represents a portion of the underlying physical media. This is the final layer of sector-based abstraction present in the Phoenix File System.

As mentioned before, each Volume Sector comprises 1 or more Segment Sectors; the number must be a power of 2, and all Volume Sectors must be the same size; this is the VolumeSectorSize. Since the only supported size of a Segment Sector is 512 bytes, the minimum size of a Volume Sector is also 512 bytes. Volume Sector numbering begins at 0, which corresponds to the first Segment Sector on the first Segment in the Volume. The number of Volume Sectors, NumVolumeSectors, is the sum of the number of Segment Sectors on each Segment that constitutes the Volume, divided by the number of Segment Sectors per Volume Sector. The boot Volume, the Volume from which the operating system is loaded, may not span multiple Segments.

The grouping of Segments into Volumes is indicated by the Segment Table Entry for each Segment. Each Segment Table Entry specifies the next drive and next segment, NextDrive and NextSegment respectively, belonging to the same volume. The NextDrive field holds the serial number of the device on which resides the next Segment of this Volume or 0 if the given Segment is the last Segment in the Volume (this can also be verified by examining the FinalSegment flag). Serial numbers should be unique among all storage devices on the same system. The NextSegment field holds the Segment number of the Segment which is next in this Volume; the 5 high bits of a Segment number determine the system partition on the device and low 3 bits determine the Segment within the Phoenix File System partition. An error occurs when a partition which does not exist is specified, the partition or Segment specified is unused, or the partition specified has a SystemID other than PhoenixFS (B9h). Two Segments belonging to the same Volume should not reside on the same physical device, although if two Segments belonging to a single Volume are found to be on the same physical device an error does not occur.
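
The split of a NextSegment value into its two parts can be shown in a few lines. The function name is hypothetical; the bit split itself (5 high bits for the system partition, 3 low bits for the Segment) is as stated above.

```python
def split_segment_number(next_segment):
    """Split a NextSegment byte into (partition, segment_within_partition)."""
    if not 0 <= next_segment <= 0xFF:
        raise ValueError("NextSegment is a single byte")
    return next_segment >> 3, next_segment & 0x07
```

For example, the value 0Ah (binary 00001 010) refers to Segment 2 of system partition 1.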

In addition, each Segment Table Entry also indicates the number of Segments which comprise the Volume that it belongs to, NumSegments. An error occurs if any two Segments supposedly part of the same Volume have different values for NumSegments. Each Segment Table Entry has a Sequence value which increases from Segment to Segment as the chain of Segments in a Volume is traced. A gap in Sequence values between two Segments of the same Volume indicates that a Segment is missing and is considered an error. The Sequence value also determines which Volume Sectors reside on the given Segment. The way in which Volume Sectors are arranged across Segments is determined by the Interleaved flag:

If the Interleaved flag is clear then each Segment is responsible for a consecutive number of Volume Sectors corresponding to the size of the Segment; the order of the Segments is determined by the Sequence value. For example, if two Segments belong to a single Volume, one Segment with 50,000 logical sectors and a Sequence value of 0, the other with 30,000 logical sectors and a Sequence value of 1, and each volume sector was composed of 2 Segment Sectors, then the Volume would consist of 40,000 Volume Sectors. The first 25,000 Volume Sectors, numbered 0 through 24,999, would be located on the first Segment and the last 15,000 Volume Sectors, numbered 25,000 through 39,999, would be located on the second Segment.
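
The non-interleaved arrangement can be sketched as a lookup from a Volume Sector number to the Segment that holds it. The function name and the returned tuple are hypothetical; segment_lengths is assumed to list the Segment lengths in logical sectors, ordered by Sequence value.

```python
def locate_linear(volume_sector, segment_lengths, sectors_per_volume_sector):
    """Return (sequence, first_segment_sector) for a non-interleaved Volume."""
    remaining = volume_sector
    for sequence, length in enumerate(segment_lengths):
        capacity = length // sectors_per_volume_sector  # Volume Sectors on this Segment
        if remaining < capacity:
            return sequence, remaining * sectors_per_volume_sector
        remaining -= capacity
    raise ValueError("volume sector beyond end of volume")
```

Using the example above (Segments of 50,000 and 30,000 logical sectors, 2 Segment Sectors per Volume Sector), Volume Sector 24,999 is the last one on the first Segment and Volume Sector 25,000 is the first one on the second.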

If the Interleaved flag is set then the Phoenix File System scatters data across all the Segments in the Volume in a uniform pattern such that individual read and write operations can be fulfilled cooperatively by all the underlying physical devices providing storage for that Volume. This requires that each Segment in the Volume represent exactly the same number of logical sectors. A Volume Sector resides on the Segment whose Sequence value equals the Volume Sector number modulo NumSegments. For example, if three Segments belong to a single Volume, and each Volume Sector is composed of 2 Segment Sectors, a write to Volume Sector 3401 would be written to the Segment with Sequence value 2, while a write to Volume Sector 3402 would be written to the Segment with Sequence value 0. This is because the Volume Sector number 3401 modulo the number of Segments in the Volume, 3, yields 2, and the Volume Sector number 3402 modulo 3 yields 0. Note that the entirety of the Volume Sector is written to the same Segment, even if the number of Segment Sectors per Volume Sector is greater than 1. It would be possible to further interleave the Segment Sectors which compose a Volume Sector if and only if the number of Segments per Volume is a power of 2 and all Segments in the Volume have the same logical sector size. This additional interleaving is not currently supported by the Phoenix File System but may be implemented in a future version.
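
The interleaved case reduces to a modulo operation. In the sketch below, the Segment selection follows the rule stated above; the position within the chosen Segment (the quotient of the same division) is an assumption, since the text does not spell it out. The function name is hypothetical.

```python
def locate_interleaved(volume_sector, num_segments, sectors_per_volume_sector):
    """Return (sequence, first_segment_sector) for an interleaved Volume."""
    sequence = volume_sector % num_segments
    # Assumption: Volume Sectors are laid out round-robin, so the n-th Volume
    # Sector sent to a Segment occupies the n-th slot on that Segment.
    slot = volume_sector // num_segments
    return sequence, slot * sectors_per_volume_sector
```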

Volumes are the basic unit of all high-level file operations. This layer of abstraction allows Volumes to be independent of the underlying storage devices.

Volume sector 0, the first sector in the Volume, holds the Volume Descriptor for the Volume as well as a boot stub program. The boot stub is the last phase of the boot process before turning control over to the kernel loader; its function is to locate and load the kernel loader into memory and then turn control over to the kernel loader to actually start the operating system. The Volume Descriptor holds information about the Volume and has the format:

  Format of a Volume Descriptor:
  Offset   Size           Field Name     Description
     0h    4 BYTES        BootStubStart  Short intrasegment jump to real start of boot stub
     4h    1 BYTE         StubSectors    Number of logical sectors in the boot stub program
     5h    1 BYTE         Flags
             bit      0 : Dirty          Set to 1 when the file system is initialized; set to 0 when
                                         file system is properly shut down.
             bits   1-7 : (reserved)     must be 0
     6h    1 BYTE
             bits   0-3 : ClusterSize    Log base 2 of size of a Volume Sector in 512-byte segments,
                                         for example 0 indicates Volume Sectors are each 512 bytes
                                         while 15 indicates Volume Sectors are each 16,777,216 bytes.
             bits   4-7 : (reserved)     must be 0
     7h    1 BYTE         (reserved)     must be 0
     8h    1 QWORD        VolumeSize     Number of Volume Sectors in this Volume
    10h    1 QWORD        SuperBlock     Volume sector number where SuperBlock is located
    18h    1 DWORD        DateCreated    Date/Time Volume was created
    1Ch    1 DWORD        (reserved)     must be 0
    20h    64 BYTES       VolumeLabel    Short name associated with the volume (null terminated string)
    60h    x BYTES        BootStub       Minimum of 410 bytes of boot stub program
  ----------------
  Total: 512 bytes minimum

At the point in the boot process when the Volume Descriptor is loaded into memory, it is read as a single Segment Sector, without knowledge of the number of Segment Sectors per volume sector. The StubSectors field determines the number of Segment Sectors the boot stub program occupies; this field must always be at least 1 and must be a multiple of the number of Segment Sectors per volume sector. The first task of the boot stub should be to load any additional boot stub sectors into memory consecutively after the first boot stub sector; this should allow for linear program execution.
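
A reader for the fixed part of the Volume Descriptor could be sketched as follows. The names are hypothetical and little-endian byte order is assumed. The VolumeLabel is returned as raw bytes, and the DateCreated value is passed through undecoded because its encoding is not described here.

```python
import struct
from collections import namedtuple

VolumeDescriptor = namedtuple(
    "VolumeDescriptor",
    "stub_sectors dirty volume_sector_bytes volume_size superblock "
    "date_created label_raw",
)

def parse_volume_descriptor(sector):
    """Decode the first 60h bytes of Volume Sector 0."""
    (_jump, stub_sectors, flags, cluster, _reserved0, volume_size,
     superblock, date_created, _reserved1, label_raw) = struct.unpack_from(
        "<4s4BQQII64s", sector, 0)
    return VolumeDescriptor(
        stub_sectors,
        bool(flags & 0x01),        # Dirty flag, bit 0
        512 << (cluster & 0x0F),   # ClusterSize is log2 of the size in 512-byte units
        volume_size,
        superblock,
        date_created,
        label_raw,
    )
```

A ClusterSize of 0 yields 512-byte Volume Sectors and a ClusterSize of 15 yields 16,777,216-byte Volume Sectors, matching the table.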

A volume sector is composed of one or more Segment Sectors. The number of Segment Sectors per volume sector is determined by dividing the size of a volume sector in bytes (derived from ClusterSize) by the size of a logical sector. The size of a logical sector can vary among the Segments that belong to the same Volume; the only requirement is that the size of a volume sector be at least as large as a logical sector on any Segment that composes the Volume. In that case, the number of Segment Sectors per volume sector can also vary from Segment to Segment.

Each Volume in a system should have a distinct VolumeLabel; if not, the operating system, once loaded, is responsible for alerting the user and correcting the conflict. The VolumeLabel is a null-terminated Unicode string of up to 31 characters. When determining a label conflict, case is not significant.
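
Label handling might be sketched as below. Two assumptions are made and are not confirmed by the text: that the 64-byte field stores 16-bit little-endian code units (UTF-16LE), which is consistent with 31 characters plus a terminator, and that Unicode case folding is an acceptable way to ignore case. The function names are hypothetical.

```python
def decode_volume_label(raw):
    """Decode the 64-byte VolumeLabel field up to its null terminator."""
    text = raw.decode("utf-16-le")  # assumed encoding
    return text.split("\x00", 1)[0]

def labels_conflict(label_a, label_b):
    """True if two labels are equal when case is ignored."""
    return label_a.casefold() == label_b.casefold()
```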

Volume Sector Allocation Layer - This layer exists to keep track of which volume sectors have been allocated to hold data, which are available, and which are unusable (bad).

Single volume sectors called Free Space Descriptors are used to determine whether a number of sectors are in use or available for use. This is done by using each bit in the Free Space Descriptor to represent whether a single successive volume sector is in use or not (0 indicates the respective volume sector is in use, 1 indicates it is not). The number of volume sectors accounted for per Free Space Descriptor depends on the size of a volume sector (since a Free Space Descriptor is exactly one volume sector in size), but can be determined using the formula

SectorsPerFSD = VolumeSectorSize * 8

Traditionally, PCs have used 512-byte sectors; using a VolumeSectorSize equal to a single 512-byte sector implies that 512*8, or 4096, volume sectors can be accounted for per Free Space Descriptor. In addition, it needs to be noted that the volume sectors which are used to hold Free Space Descriptors themselves have to be accounted for like any other volume sector. The number of Free Space Descriptors varies based on the number of volume sectors, NumVolumeSectors. Every volume sector MUST be accounted for using a Free Space Descriptor, and it is not possible for the sectors accounted for by Free Space Descriptors to overlap. Volume sectors used by the Free Space Descriptor Location Table and by the Free Space Descriptors are considered in use and therefore are not available for allocation.
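
The bitmap convention (0 means in use, 1 means available) can be illustrated with a few helpers. The names are hypothetical, and the order of bits within each byte is an assumption: the sketch treats bit 0 of byte 0 as the first volume sector described, which the text does not specify.

```python
def sectors_per_fsd(volume_sector_size):
    """Number of volume sectors one Free Space Descriptor accounts for."""
    return volume_sector_size * 8

def is_free(fsd, index):
    """True if the index-th sector described by this FSD is available (bit is 1)."""
    return bool((fsd[index // 8] >> (index % 8)) & 1)

def mark_used(fsd, index):
    """Clear the bit: 0 means the sector is in use."""
    fsd[index // 8] &= ~(1 << (index % 8)) & 0xFF

def mark_free(fsd, index):
    """Set the bit: 1 means the sector is available."""
    fsd[index // 8] |= 1 << (index % 8)
```

Here fsd is expected to be a mutable bytearray holding exactly one volume sector.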

The location of each of the Free Space Descriptors is stored as a series of 64-bit Volume Sector numbers in a consecutive series of sectors ideally stored near the beginning of the storage media. This series of sectors is called the Free Space Descriptor Location Table, and each 64-bit entry is the Volume Sector number of the Free Space Descriptor which accounts for a successive series of volume sectors. For example, if VolumeSectorSize is equal to 512 bytes, SectorsPerFSD would then be 4096; as such, the first entry in the Free Space Descriptor Location Table (FSDLT) would hold the Volume Sector number of the Free Space Descriptor accountable for the first 4096 sectors (numbered 0 through 4095) and the next entry would hold the Volume Sector number of the Free Space Descriptor accountable for the next 4096 sectors (numbered 4096 through 8191), etc.

      FSDLT         Free Space Descriptors (FSDs)
      .----.
    1 |  o--------->|100010010...100101|  Allocation map for sectors 0 - 4095
      |----|
    2 |  o--------->|000010101...110001|  Allocation map for sectors 4096 - 8191
      |----|
    3 |  o--------->|100100101...001000|  Allocation map for sectors 8192 - 12287
      |----|
      .    .
      .    .
      |----|
    N |  o--------->|001001000...001001|  Allocation map for sectors 4096*(N-1) - 4096*N-1
      `----'

Where N is the number of entries in the Free Space Descriptor Location Table (equal to NumVolumeSectors / SectorsPerFSD rounded up to a whole number, or in this example, NumVolumeSectors / 4096 rounded up).
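
Finding the allocation bit for a given volume sector takes one division. The sketch below uses hypothetical names and a zero-based Python list for the FSDLT, so list index 0 corresponds to entry 1 in the diagram.

```python
def fsd_count(num_volume_sectors, volume_sector_size):
    """Number of FSDLT entries needed to cover every volume sector."""
    per_fsd = volume_sector_size * 8
    return -(-num_volume_sectors // per_fsd)  # ceiling division

def locate_allocation_bit(volume_sector, fsdlt, volume_sector_size):
    """Return (fsd_volume_sector, bit_index) describing volume_sector."""
    per_fsd = volume_sector_size * 8
    entry, bit_index = divmod(volume_sector, per_fsd)
    return fsdlt[entry], bit_index
```

The first element of the result is the Volume Sector number where the relevant Free Space Descriptor is stored; the second is the position of the bit inside it.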

The Free Space Descriptor Location Table may span as many sectors as it needs in order to hold the location of all of the Free Space Descriptors, but the sectors must be contiguous. As shown in the table below, the Free Space Descriptor Location Table is very efficient at describing the media and therefore the FSDLT will be relatively small (for example, only 4 kilobytes per gigabyte of capacity given a VolumeSectorSize of 512 bytes). For this reason, it is suggested that any OS implementing the Phoenix File System load and store a copy of the entire Free Space Descriptor Location table in memory for quick reference.

  Capacity described per Free Space Descriptor (FSD) and per sector of the Free Space Descriptor Location Table (FSDLT):
  VolumeSectorSize  SectorsPerFSD  Bytes accounted  FSD location entries  Sectors accountable  Bytes accountable
                    (sectors)      for per FSD      per FSDLT sector      per FSDLT sector     per FSDLT sector
  256 bytes         2,048          524,288          32                    65,536               16,777,216
  512 bytes         4,096          2,097,152        64                    262,144              134,217,728
  1024 bytes        8,192          8,388,608        128                   1,048,576            1,073,741,824
  2048 bytes        16,384         33,554,432       256                   4,194,304            8,589,934,592
  4096 bytes        32,768         134,217,728      512                   16,777,216           68,719,476,736
  8192 bytes        65,536         536,870,912      1024                  67,108,864           549,755,813,888
  16384 bytes       131,072        2,147,483,648    2048                  268,435,456          4,398,046,511,104
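
The figures in the table follow from two facts stated earlier: each bit of a Free Space Descriptor covers one volume sector, and each FSDLT entry is 64 bits (8 bytes). A short sketch, with hypothetical function names, that reproduces one row and the "4 kilobytes per gigabyte" estimate:

```python
def fsdlt_row(volume_sector_size):
    """Return the five derived table columns for a given VolumeSectorSize."""
    sectors_per_fsd = volume_sector_size * 8
    bytes_per_fsd = sectors_per_fsd * volume_sector_size
    entries_per_table_sector = volume_sector_size // 8
    sectors_per_table_sector = entries_per_table_sector * sectors_per_fsd
    bytes_per_table_sector = sectors_per_table_sector * volume_sector_size
    return (sectors_per_fsd, bytes_per_fsd, entries_per_table_sector,
            sectors_per_table_sector, bytes_per_table_sector)

def fsdlt_bytes(capacity_bytes, volume_sector_size):
    """Bytes of FSDLT entries needed to describe capacity_bytes of storage."""
    bytes_per_fsd = volume_sector_size * 8 * volume_sector_size
    descriptors = -(-capacity_bytes // bytes_per_fsd)  # ceiling division
    return descriptors * 8
```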

The suggested arrangement for the Free Space Descriptors in relation to the volume sectors they describe is as follows: an even-indexed Free Space Descriptor is located in the first sector it describes, and an odd-indexed Free Space Descriptor is located in the last sector it describes. This maximizes the number of contiguous sectors for allocation and minimizes the distance, and thus the seek time, between Free Space Descriptors and the data they describe. This is by no means the only way to arrange the Free Space Descriptors; they need not even reside in the region they describe. Whenever possible, however, it is desirable to place Free Space Descriptors near the sectors they account for in order to minimize access times.
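
The suggested placement can be sketched as a function of the descriptor's index. The indexing base is an assumption: the sketch numbers descriptors from 1, as the FSDLT diagram does, which places descriptors 1 and 2 in adjacent sectors and leaves volume sector 0 available for the Volume Descriptor. The function name is hypothetical.

```python
def suggested_fsd_location(index, sectors_per_fsd, num_volume_sectors):
    """Suggested volume sector for the FSD with the given one-based index."""
    if index < 1:
        raise ValueError("index is one-based")
    first = (index - 1) * sectors_per_fsd
    if first >= num_volume_sectors:
        raise ValueError("index beyond end of volume")
    # The final descriptor may cover fewer sectors than sectors_per_fsd.
    last = min(index * sectors_per_fsd, num_volume_sectors) - 1
    return first if index % 2 == 0 else last
```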

Bad Sector Recovery: when a given medium is first formatted, the Free Space Descriptors are located only in "good" sectors (i.e. not bad) and bad sectors are marked as used and recorded in a special

File Allocation Layer - This layer exists to group volume sectors into units called files; a file is a collection of information that is logically related. In the Phoenix File System, files are described using a tree structure in which the tree's leaves hold the actual file information. Each node and leaf of the data tree is described using a Tree Node Descriptor:

  Structure of a Tree Node Descriptor:
  Offset   Size           Field Name     Description
    0h     QWORD
            bit       0 : Type           (1) Descriptor refers to SubTree block (internal node)
                                         (0) Descriptor refers to data block (leaf)
            bits   1-63 : Location       Volume Sector number for block location
    8h     QWORD          Length         If Type is 1, is total number of bytes described by SubTree
                                         If Type is 0, is number of bytes in data block
  ----------------
  Total: 16 bytes

Each block of information is exactly 1 volume sector in size and can hold information either about the file's contents (a tree leaf) or about the location of additional blocks (a tree node). If a sector is a data block, Length bytes of the sector are considered to be part of the file's contents. If a sector is a SubTree block, it is simply a consecutive list of Tree Node Descriptors, and the Length field represents the total number of bytes of the file's information that is described by all data blocks in or under the SubTree.
SubTrees can be nested any number of levels deep, but whenever possible should be balanced to minimize the amount of recursion.
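
Decoding a Tree Node Descriptor could look like the following sketch. The names are hypothetical, little-endian byte order is assumed, and "bit 0" is taken to mean the least significant bit of the first QWORD.

```python
import struct
from collections import namedtuple

TreeNodeDescriptor = namedtuple("TreeNodeDescriptor", "is_subtree location length")

def parse_tree_node_descriptor(raw):
    """Decode one 16-byte Tree Node Descriptor."""
    word, length = struct.unpack("<QQ", raw)
    return TreeNodeDescriptor(bool(word & 1), word >> 1, length)

def parse_subtree_block(block):
    """Decode every complete 16-byte descriptor stored in a SubTree block."""
    return [parse_tree_node_descriptor(block[offset:offset + 16])
            for offset in range(0, len(block) - 15, 16)]
```

A 512-byte SubTree block therefore holds up to 32 descriptors.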

Sparse files: a region of a file is considered sparse if it contains no data. This can occur, for example, if a new file is created, then a seek is performed to 1000 bytes into the file, and then a single byte is written and the file is closed. The total length of the file would be 1001 bytes, even though only 1 byte of information is actually stored in the file; the first 1000 bytes are considered sparse. The Phoenix File System stores files with sparse data efficiently by noting the condition using a Tree Node Descriptor. The Type indicates the region of the file is a data block, since it actually refers to the file's contents. The Length is the number of bytes in the sparse region, and the Location has the special reserved value of 0. (Location value 0 is considered reserved in the File Allocation Layer because volume sector 0 will always hold the Volume Descriptor for the volume.) A sparse subtree cannot exist, as it would be illogical. Any information read from a sparse file region will always be 0. A sparse region 0 bytes in length is used to indicate a Tree Node Descriptor is not in use (the entire Tree Node Descriptor is all zeros).
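
Reading a file whose leaves include a sparse region can be sketched as follows. The function and its arguments are hypothetical: leaves is assumed to be the file's data-block descriptors in file order, already flattened from the tree, and read_volume_sector is a caller-supplied function returning the bytes of one volume sector.

```python
def read_leaves(leaves, read_volume_sector):
    """Assemble file contents from (location, length) data-block descriptors."""
    contents = bytearray()
    for location, length in leaves:
        if location == 0:
            # Sparse region: nothing is stored, reads return zeros.
            contents += bytes(length)
        else:
            contents += read_volume_sector(location)[:length]
    return bytes(contents)
```

For the 1001-byte file in the example above, the leaves would be a sparse descriptor of Length 1000 followed by one data block of Length 1.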

The basic properties of a file are described using a File Node with the following structure:

  Structure of a File Node:
  Offset   Size           Field Name     Description
   00h     DWORD          MagicNumber    Special number to help discern a File Node from other
                                         data should the file system become corrupt, equal to
                                         31534650h ("PFS1")
   04h     DWORD          HardLinks      Number of references made from Directories to this file
   08h     DWORD          Flags          Basic file attributes
             bit      0 : ArchiveFlag    Set whenever the LastModified field is updated
             bit      1 : SystemFlag     Indicates file is an operating-system related file
             bit      2 : HiddenFlag     Indicates file should not be listed in default file listings
             bit      3 : ReadOnlyFlag   Indicates file cannot be written to or deleted
             bits   4-7 : (reserved)     must be 0
             bit      8 : ImmediatePurge Indicates whether file should be immediately purged on delete
             bits  9-15 : (reserved)     must be 0
                                     --- end general flags, begin internal flags ---
             bits 16-18 : FileType       Type definition for file data
                                           000 = generic file
                                           001 = directory
                                           010 = symbolic link
             bit     19 : (reserved)     must be 0
             bit     20 : Compression    reserved for file data compression flag; must be 0
             bit     21 : Encryption     reserved for file data encryption flag; must be 0
             bit     22 : DeletedFlag    Indicates whether or not this file has been deleted
             bit     23 : PurgeFlag      Indicates whether or not this file is to be purged
             bits 24,25 : InternalFlag   Indicates what file information, if any, is stored
                                         internally in the File Node
                                           00 = no internal data
                                           01 = internal rights information
                                           10 = internal extended attributes
                                           11 = internal file data
             bits 26,27 : (reserved)     must be 0
             bit     28 : NodeSize       Indicates whether or not the File Node occupies the full
                                         volume sector
                                           0 = File Node is half the size of the volume sector
                                           1 = File Node occupies the entire volume sector
             bits 29-31 : (reserved)     must be 0
   0Ch     DWORD          Owner          Object ID of owner of this File Node
   10h     DWORD          Creator        Object ID which created this File Node
   14h     DWORD          Modifier       Object ID of user who last modified File Node
   18h     DWORD          Created        Date/Time File Node was created
   1Ch     DWORD          LastModified   Date/Time File Node was last modified
   20h     DWORD          DataAccessed   Date/Time file data last accessed
   24h     DWORD          DataModified   Date/Time file data last modified
   28h     QWORD          FileSize       Total length of file data
   30h     1 TND          Rights         Rights list data tree
   40h     2 TNDs         EAs            Extended Attributes data tree
   60h     7 TNDs         FileData       File contents data tree
   D0h     x BYTES        InternalData   minimum of 48 bytes of space specifically set aside for
                                         storing small amounts of data inside the File Node
                                         without using data trees. The InternalFlag determines
                                         which information, if any, is stored internally.
  ----------------
  Total: 256 bytes minimum
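
The fixed fields at offsets 00h through 2Fh can be decoded along the lines of the Python sketch below. It assumes little-endian byte order (which is what makes the MagicNumber bytes read "PFS1" on disk, but is otherwise an assumption), and the function name parse_file_node_header and the dictionary keys are hypothetical.

```python
import struct

FILE_NODE_MAGIC = 0x31534650  # "PFS1"

# Ten DWORDs (MagicNumber .. DataModified) followed by the QWORD FileSize.
# Little-endian byte order is an assumption of this sketch.
HEADER = struct.Struct("<10IQ")

FILE_TYPES = {0: "generic file", 1: "directory", 2: "symbolic link"}
INTERNAL = {0: "none", 1: "rights", 2: "extended attributes", 3: "file data"}


def parse_file_node_header(raw):
    """Decode the fixed fields of a File Node and a few of its flags."""
    (magic, hard_links, flags, owner, creator, modifier,
     created, last_modified, data_accessed, data_modified,
     file_size) = HEADER.unpack_from(raw, 0)
    if magic != FILE_NODE_MAGIC:
        raise ValueError("MagicNumber mismatch: not a File Node")
    return {
        "hard_links": hard_links,
        "owner": owner,
        "read_only": bool(flags & (1 << 3)),                   # bit 3
        "file_type": FILE_TYPES.get((flags >> 16) & 0x7, "unknown"),  # bits 16-18
        "deleted": bool(flags & (1 << 22)),                    # bit 22
        "internal": INTERNAL[(flags >> 24) & 0x3],             # bits 24,25
        "full_sector": bool(flags & (1 << 28)),                # bit 28
        "file_size": file_size,
    }
```

The header occupies 48 bytes (30h), which is where the Rights Tree Node Descriptor begins.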

A File Node is always at most 1 volume sector in size and at least 256 bytes in size. As such, the amount of space reserved for internal data within a File Node can vary from 48 bytes to VolumeSectorSize-208 bytes. A File Node may occupy an entire sector or only half of a sector, the latter only being valid for VolumeSectorSizes of 512 bytes or more (since one half of 512 bytes is 256 bytes, the minimum size of a File Node). Furthermore, each File Node is identified using a File Node Number, of which bits 1-63 indicate the volume sector number the File Node resides in, and bit 0 is 0 if the File Node is in the first half of the sector and 1 if it is in the second half of the sector. The following table summarizes the minimum and maximum sizes of File Nodes and the amount of space reserved in each File Node for internal data, based on the size of a volume sector.

`VolumeSectorSize`	Minimum File Node Size	Maximum File Node Size	Minimum Internal Data Reserve	Maximum Internal Data Reserve
512 bytes	256 bytes	512 bytes	48 bytes	304 bytes
1024 bytes	512 bytes	1024 bytes	304 bytes	816 bytes
2048 bytes	1024 bytes	2048 bytes	816 bytes	1840 bytes
4096 bytes	2048 bytes	4096 bytes	1840 bytes	3888 bytes
8192 bytes	4096 bytes	8192 bytes	3888 bytes	7984 bytes
16384 bytes	8192 bytes	16384 bytes	7984 bytes	16176 bytes
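
The File Node Number encoding and the reserve sizes in the table follow from simple arithmetic, sketched here in Python. The function names are hypothetical; the constant 208 (D0h) is the offset of InternalData in the File Node structure.

```python
FIXED_FIELDS = 0xD0  # bytes in a File Node before InternalData (208)


def split_file_node_number(number):
    """Return (volume sector number, half), where half is 0 or 1."""
    return number >> 1, number & 1


def make_file_node_number(sector, second_half=False):
    """Build a File Node Number from bits 1-63 (sector) and bit 0 (half)."""
    return (sector << 1) | int(second_half)


def internal_data_reserve(volume_sector_size, full_sector):
    """Bytes set aside for internal data in a half- or full-sector File Node."""
    node_size = volume_sector_size if full_sector else volume_sector_size // 2
    return node_size - FIXED_FIELDS
```

For example, File Node Number 21 refers to the second half of volume sector 10, and a half-sector File Node on a 512-byte volume sector reserves 256 - 208 = 48 bytes.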

When data is stored internally, the field(s) that would ordinarily be used to store Tree Node Descriptor(s) for the given data are instead overwritten with a portion of the internal data. This can be safely done since the InternalFlag identifies which data is stored internally, and if the data is being stored internally, then the external data Tree Node Descriptor field(s) are not used, thereby leaving them available to store additional internal data. The space normally reserved for the Tree Node Descriptors is first used in the storage of internal data before utilizing the space specifically reserved for internal data. If the Rights List or Extended Attributes are stored internally, the first two bytes of the internal data are the length of the remaining internal data in bytes; the remainder of the internal data is the actual information to be stored internally. In the case of internal file data, the FileSize field can be consulted to determine the amount of internal data, and as such two bytes are not prepended to the internal data. The following table shows the maximum amount of internal data that can be stored in a File Node utilizing this optimization; it should be noted that this efficient use of File Node space is not a feature that can be optionally implemented, but is a standard component of the Phoenix File System.

`VolumeSectorSize`	Maximum Internal Data Reserve	Maximum Internal Rights Info	Maximum Internal Extended Attributes	Maximum Internal File Data
512 bytes	304 bytes	318 bytes (320 total)	334 bytes (336 total)	416 bytes
1024 bytes	816 bytes	830 bytes (832 total)	846 bytes (848 total)	928 bytes
2048 bytes	1840 bytes	1854 bytes (1856 total)	1870 bytes (1872 total)	1952 bytes
4096 bytes	3888 bytes	3902 bytes (3904 total)	3918 bytes (3920 total)	4000 bytes
8192 bytes	7984 bytes	7998 bytes (8000 total)	8014 bytes (8016 total)	8096 bytes
16384 bytes	16176 bytes	16190 bytes (16192 total)	16206 bytes (16208 total)	16288 bytes
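
The capacities in this table can be derived from the reserve size, the 16-byte Tree Node Descriptor size, and the two-byte length prefix. The following Python sketch shows the arithmetic; max_internal and its dictionary keys are hypothetical names.

```python
TND_SIZE = 16        # bytes per Tree Node Descriptor
LENGTH_PREFIX = 2    # length bytes prepended to internal Rights and EAs


def max_internal(reserve):
    """Largest payload of each kind that fits inside a File Node.

    `reserve` is the internal data reserve; the descriptor fields that the
    internal data replaces (1, 2 or 7 descriptors) are added to it.
    """
    return {
        "rights": reserve + 1 * TND_SIZE - LENGTH_PREFIX,
        "extended_attributes": reserve + 2 * TND_SIZE - LENGTH_PREFIX,
        "file_data": reserve + 7 * TND_SIZE,  # FileSize gives the length
    }
```

With a reserve of 304 bytes (a full-sector File Node on a 512-byte volume sector) this gives 318, 334 and 416 bytes respectively.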

A Rights list is maintained to determine which users or groups of users have access to the information stored in the file data, and Extended Attributes are maintained to give system add-ons a storage area for additional file properties.

Directory files: Directories are special files which hold a list of files. This is used to provide a hierarchical organization to the file system. Every file must have an entry in at least 1 directory (otherwise the file cannot be reached through the file system). A file's HardLinks value is the number of directories in which the file is listed. A directory can hold any type of file including symbolic links and other directories. Every entry in the directory must have an associated file name, unique in the directory, for the user to discern which files are which. File names are case-sensitive, null-terminated Unicode strings. File names may contain any defined Unicode character except:

a forward slash (/)
a backward slash (\)
a control character (first 32 characters of Unicode set)
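
A check for these restrictions could look like the Python sketch below; is_valid_file_name is a hypothetical helper and does not test uniqueness within a directory.

```python
def is_valid_file_name(name):
    """Reject names containing '/', '\\' or one of the first 32 code points."""
    return not any(ch in "/\\" or ord(ch) < 32 for ch in name)
```

Since the null character is among the first 32 code points, a valid name can never contain the terminator itself.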

All directories must contain a "." and a ".." entry corresponding to the directory file itself and the directory file's parent directory respectively. The exact format of each directory entry is:

  Structure of a Directory entry:
  Offset   Size           Field Name     Description
   00h     QWORD          FileNode       File node of file associated with entry
   08h     WORD           NameLength     Length of file name in Unicode characters
   0Ah     x bytes        Name           Name associated with entry
  ----------------
  Total: 10+x bytes minimum

These entries are packed back-to-back in the directory file.
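
The Python sketch below packs and walks such a list of entries. Several details are assumptions made only for illustration, since they are not fixed by the structure above: little-endian byte order, names encoded as 16-bit Unicode (UTF-16LE), and a NameLength that counts characters excluding a 16-bit null terminator which follows the name. The function names are hypothetical.

```python
import struct

ENTRY_HEADER = struct.Struct("<QH")  # FileNode (QWORD), NameLength (WORD)


def pack_entry(file_node, name):
    """Serialize one directory entry (assumed encoding: UTF-16LE + null)."""
    encoded = name.encode("utf-16-le")
    return ENTRY_HEADER.pack(file_node, len(encoded) // 2) + encoded + b"\x00\x00"


def iter_entries(data):
    """Yield (File Node Number, name) for entries packed back-to-back."""
    offset = 0
    while offset < len(data):
        file_node, name_length = ENTRY_HEADER.unpack_from(data, offset)
        start = offset + ENTRY_HEADER.size
        end = start + 2 * name_length          # assumed 2 bytes per character
        yield file_node, data[start:end].decode("utf-16-le")
        offset = end + 2                       # skip the assumed terminator


directory = pack_entry(20, ".") + pack_entry(4, "..") + pack_entry(42, "notes.txt")
listing = list(iter_entries(directory))
```

Because entries are variable length, a reader has to walk the file from the start; there is no way to index directly to the n-th entry.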

Symbolic links: Symbolic links are special files which simply hold the path to another file and the method to be used to retrieve it. Symbolic links can, in this manner, point to other files including generic files, directories, other symbolic links, or network resources. Symbolic links, unlike hard links, can refer to files located on volumes other than the one on which they reside, as well as on remote systems. The file data of a symbolic link holds the path to the other file or resource in URI format. URI format includes the method (protocol) used to retrieve the data, and the location of the data to retrieve. The only protocol required to be implemented is the file method. The file protocol simply indicates that the referenced file is located on the local system; the location is simply the full path to the file. Other protocols such as http and ftp may optionally be implemented by the operating system. If no protocol is specified, file is assumed and paths relative to the current working directory can be used rather than full paths.
A symbolic link has its own set of flags and extended attributes, its own owner, its own creator, etc. The Rights for a symbolic link determine access to the link itself; a separate rights access check is performed on the file the symbolic link refers to.
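
The rule that a missing protocol defaults to file can be sketched in Python using the standard urllib.parse module. parse_link_target is a hypothetical helper; it only splits the target and does not retrieve anything, and paths containing "?" or "#" would need extra care that is omitted here.

```python
from urllib.parse import urlsplit


def parse_link_target(target):
    """Return (method, location) for the file data of a symbolic link."""
    parts = urlsplit(target)
    if parts.scheme in ("", "file"):
        # No protocol given means "file"; the path may then be relative.
        return "file", parts.path
    # Any other protocol is optional and left to the operating system.
    return parts.scheme, target
```

For instance, both "file:///usr/share/doc" and "docs/readme.txt" resolve to the file method, the second as a path relative to the current working directory.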

Rights list format: The rights list consists of an array of Rights List Entries, where each entry specifies the rights for the file node for one user or group Object ID. The number of entries in the rights list is determined by the length of the rights list as stored in the file node; simply divide the length of the Rights by the size of a single RightsListEntry to get the number of entries. Each Rights List Entry has the format:

  Structure of a Rights List Entry:
  Offset   Size           Field Name     Description
   00h     DWORD          User           Object ID of user or group to whom the rights pertain
   04h     DWORD          Rights         Determines what rights the User has to this file node
             bit      0 : Find           user may read file node information
             bit      1 : Read           user may read file data
                                         For generic files  : may read the "contents" of the file
                                         For directories    : may scan the directory contents
                                         For symbolic links : may see where the link points to
             bit      2 : ReadRights     user may read the rights information of the file node
             bit      3 : ChangeRights   user may modify the rights information in the file node
             bit      4 : ChangeEAs      user may modify the extended attributes of the file node
             bit      5 : ChangeOwner    user may change the ownership of a file node
             bit      6 : ChangeFlags    user may modify the file node's general flags
             bit      7 : Unlink         user may remove a link to this file node
             bit      8 : Write          user may modify the file data of a generic file
             bit      9 : Redirect       user may change the file data of a symbolic link
             bit     10 : Create         user may add an entry to a directory (modify directory file data)
             bit     11 : Rename         user may rename an entry in a directory (modify directory file data)
             bit     12 : Remove         user may remove an entry from a directory (modify directory file data)
             bits 13-23 : (reserved)     must be 0
             bit     24 : Supervisor     user has all rights to the file node
             bits 25-30 : (reserved)     must be 0
             bit     31 : InheritMask    (1) rights list entry is an inherited rights mask
                                         (0) rights list entry contains access rights
  ----------------
  Total: 8 bytes
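
A rights list can be split into entries by dividing its length by the 8-byte entry size, as in the Python sketch below. Little-endian byte order is an assumption, only a few of the rights bits are named, and parse_rights_list and may are hypothetical helpers.

```python
import struct

ENTRY = struct.Struct("<II")  # User (DWORD), Rights (DWORD); byte order assumed

FIND = 1 << 0
READ = 1 << 1
WRITE = 1 << 8
SUPERVISOR = 1 << 24
INHERIT_MASK = 1 << 31


def parse_rights_list(raw):
    """Return a list of (user, rights) tuples from raw rights list bytes."""
    if len(raw) % ENTRY.size:
        raise ValueError("rights list length is not a multiple of 8 bytes")
    return [ENTRY.unpack_from(raw, off) for off in range(0, len(raw), ENTRY.size)]


def may(rights, bit):
    """True if an access rights DWORD grants `bit` (Supervisor grants all)."""
    return bool(rights & (SUPERVISOR | bit))


raw = ENTRY.pack(7, FIND | READ) + ENTRY.pack(9, SUPERVISOR)
entries = parse_rights_list(raw)
```

Entries with bit 31 set are inherited rights masks rather than access rights and would need to be filtered out before calling may.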

Rights for a given user are resolved using the fully qualified path of the file in question. The rights list of the file is scanned for an entry explicitly defining the rights of the user in question; if an entry is found then it completely determines the user's rights. If not, the parent directory's rights list is scanned for an entry explicitly defining the rights of the user; if an entry is found then it determines the user's rights, modifiable by an inherited rights mask. If an entry is not found, the process is continued up the directory tree until the root is reached. If the root is reached and no rights were ever defined, then the entire process is repeated per group the user belongs to. If no rights entries are found to be applicable to the user, then the user is presumed to have no rights for the given file node. The fact that directories which constitute the fully qualified path to the file can determine a file's access rights illustrates that rights for a file may be inherited. In addition, this also illustrates how, through the use of inheritance, two directory entries which are linked to the same file node may actually have different access rights. Even though the rights list for the file node is the same for both directory entries (since both entries actually refer to the same file), the inherited rights can differ.

In addition to defining access rights for user and group objects, Rights List Entries can be used to store Inherited Rights Masks limiting the amount of access that can be inherited from directories composing the fully qualified path to a file node. Usually Inherited Rights Masks are defined for groups of users; however, it is also possible to have Inherited Rights Masks per user which override masks per group. The full Inherited Rights Mask for a file node is determined in a similar manner as a user's access rights for the file node. The effective Inherited Rights Mask is initialized to all bits set, then the rights list of each directory up the directory tree from the file node is scanned for Inherited Rights Mask entries pertaining to the user and any group the user belongs to. For each entry found, the rights mask is logically AND'ed with the current value of the effective Inherited Rights Mask to form the new value for the effective Inherited Rights Mask. Note that only directories higher up the directory tree than the file in question are scanned. When the root of the directory tree is reached, or when the effective Inherited Rights Mask becomes 0, the effective Inherited Rights Mask is complete. This mask is then logically AND'ed with a user's access rights whenever there is not a rights list entry in the file node explicitly defining the user's access to that file node. For example, consider the following set of access rights and Inherited Rights Masks for a given user (for simplicity, this example does not include use of groups):

  File or Directory             Access Rights    Inherited Rights Mask   effective Inherited Rights Mask
  /example/of/rights                                                     11111111...11
  /example/of                                    11110111...11           11110111...11
  /example                      11011111...01    10011111...11           10010111...11
  /                             00001100...00                            10010111...11

  So the given user's access to the file node linked to /example/of/rights would be
                      Access Rights : 11011111...01
    effective Inherited Rights Mask : 10010111...11 AND
                                     --------------
     user's effective access rights : 10010111...01

One thing worth pointing out is that a user's access rights are determined by a single rights list entry located either in the file node's rights list or in the rights list of a directory which forms part of the fully qualified path to the given file. On the other hand, a user's Inherited Rights Mask is determined by AND'ing the appropriate Inherited Rights Mask entries from each directory that forms the fully qualified path to the given file. Furthermore, an effective Inherited Rights Mask is only applied when the user inherited their rights to a file from a directory above the file (i.e. the given file did not explicitly list access rights for the given user or a group that the user belongs to).
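
The resolution procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a definitive implementation: groups are left out, rights lists are modelled as a hypothetical dictionary from path to (user, rights) tuples, the mask is taken from the low 31 bits of an entry whose InheritMask bit is set, and the names effective_rights and parent_dirs are invented for the example.

```python
INHERIT_MASK = 1 << 31
ALL_RIGHTS = 0x7FFFFFFF          # every bit except the InheritMask marker
FIND, READ, WRITE = 1 << 0, 1 << 1, 1 << 8


def parent_dirs(path):
    """Yield each directory above `path`, nearest first, ending at '/'."""
    parts = [p for p in path.split("/") if p]
    for depth in range(len(parts) - 1, -1, -1):
        yield "/" + "/".join(parts[:depth])


def effective_rights(path, user, rights_lists):
    # 1. An explicit entry on the file node itself wins outright.
    for who, dword in rights_lists.get(path, []):
        if who == user and not dword & INHERIT_MASK:
            return dword
    # 2. Otherwise take the nearest directory entry for the user, and
    #    AND together every mask found on the way up to the root.
    mask, inherited = ALL_RIGHTS, None
    for directory in parent_dirs(path):
        for who, dword in rights_lists.get(directory, []):
            if who != user:
                continue
            if dword & INHERIT_MASK:
                mask &= dword & ALL_RIGHTS
            elif inherited is None:
                inherited = dword
    # 3. No applicable entry anywhere means no rights at all.
    return 0 if inherited is None else inherited & mask


rights_lists = {
    "/example/of": [(7, INHERIT_MASK | (ALL_RIGHTS & ~WRITE))],
    "/example": [(7, FIND | READ | WRITE)],
    "/": [(7, FIND)],
}
result = effective_rights("/example/of/rights", 7, rights_lists)
```

Here user 7 inherits Find, Read and Write from /example, but the mask on /example/of removes Write, leaving Find and Read.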

Extended Attributes format: Extended Attributes are stored individually in packets describing the information stored by the attribute. These packets are stored linearly and are aligned on DWORD boundaries. The format of each Extended Attribute packet is:

  Structure of an Extended Attribute:
  Offset   Size           Field Name     Description
    00h    BYTE           NameLength     Length of attribute name (x)
    01h    BYTE           Type           Base data type stored in EA
                                           0 = Character (8 bits)
                                           1 = Short Integer Value (8 bits)
                                           2 = Integer Value (32 bits)
                                           3 = Integer Value (64 bits)
                                           8 = Floating Point Value (64 bits)
    02h    WORD           ValueLength    Length of attribute value data
    04h    x BYTES        Name           The name of the Extended Attribute
  x+04h    y BYTES        Value          The EA value; format is determined by the data type
  ----------------
  Total: x+y+4 bytes

Where x is equal to the length of the attribute name rounded up to a multiple of 4 bytes and
y is equal to the length of the actual attribute value data.

While a value of 0 in the ValueLength field is acceptable, the NameLength may never be 0. Floating point numbers are stored as a double-precision ANSI/IEEE Standard 754-1985 binary floating point value ("64-bit real"). Any number of values of the base data type may be stored in a single extended attribute's value data (allowing "arrays" of data stored in a single extended attribute). Each entry in the attribute's value can be referenced given an index. The number of entries of the base data type that are stored in the extended attribute's value can be determined by examining the ValueLength field; the ValueLength must always be a multiple of the size of the base type stored. For example, a 9 character null-terminated string can be stored in an extended attribute by specifying a base Type of character and a ValueLength of 10 (9 characters and a null character). All strings should be stored null-terminated using the character data type. The file system should provide API functions to read and write null-terminated strings as extended attribute values. The file system can also perform bounds checking on requests to read from an invalid index into the value data by comparing the index with the number of entries actually present.
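
The entry count and bounds check described above might be implemented as in this Python sketch. The per-type sizes follow the Type table; little-endian byte order and signed integers are assumptions, and entry_count and read_entry are hypothetical names.

```python
import struct

# Type code -> struct format for one entry. Sizes follow the Type table;
# byte order and signedness are assumptions of this sketch.
BASE_TYPES = {0: "<c", 1: "<b", 2: "<i", 3: "<q", 8: "<d"}


def entry_count(type_code, value):
    """Number of base-type entries held in an attribute's value data."""
    size = struct.calcsize(BASE_TYPES[type_code])
    if len(value) % size:
        raise ValueError("ValueLength is not a multiple of the base type size")
    return len(value) // size


def read_entry(type_code, value, index):
    """Return entry `index`, refusing indexes outside the value data."""
    if not 0 <= index < entry_count(type_code, value):
        raise IndexError("index outside the attribute value")
    fmt = BASE_TYPES[type_code]
    return struct.unpack_from(fmt, value, index * struct.calcsize(fmt))[0]


name_value = b"phoenixfs\x00"            # 9 characters plus the null
numbers = struct.pack("<3i", 10, -20, 30)  # an "array" of three 32-bit values
```

With these values, the string attribute has a ValueLength of 10 and the integer attribute a ValueLength of 12, holding three entries.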

Bad Sector List: One File Node is used to maintain a file which holds a list of all known bad sectors on the storage medium. The File Node indicates a generic file with Hidden and System attributes set; no encryption or compression will ever be allowed on the Bad Sector List. At present, Extended Attributes and Rights information are undefined for this file. The format of the file is simply a series of QWORD values specifying the Volume Sector Numbers of any bad sectors. Since the Bad Sector List is stored as a file, the proper procedure to update the list is to first mark the bad sector as being in-use by setting the appropriate bit in a Free Space Descriptor before writing to the Bad Sector List. This ensures that, should the Bad Sector List require an additional sector of disk space in order to hold the new value, it does not allocate the sector already found bad. There is one exception to this policy: should the sector found bad be a sector normally reserved for a Free Space Descriptor, the Free Space Descriptor may simply be relocated to the next available sector. This leaves a "hole" where the bad sector is not accounted for by any Free Space Descriptor, which is acceptable: a sector not accounted for by a Free Space Descriptor cannot be allocated, so updating the Bad Sector List cannot possibly cause the list to expand into the bad sector.
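
The ordering requirement (mark the sector in-use first, then extend the list) is sketched below in Python. The flat bytearray bitmap is a hypothetical simplification standing in for Free Space Descriptors, the bit numbering within it and the little-endian QWORD encoding are assumptions, and the function names are invented for the example.

```python
import struct


def mark_in_use(bitmap, sector):
    """Set the in-use bit for `sector` in a simplified free-space bitmap."""
    bitmap[sector // 8] |= 1 << (sector % 8)


def is_in_use(bitmap, sector):
    return bool(bitmap[sector // 8] & (1 << (sector % 8)))


def record_bad_sector(bitmap, bad_sector_list, sector):
    # Step 1: take the sector out of circulation so that growing the list
    # file can never allocate the sector that was just found bad.
    mark_in_use(bitmap, sector)
    # Step 2: only now append the QWORD Volume Sector Number to the list.
    bad_sector_list.extend(struct.pack("<Q", sector))


bitmap = bytearray(16)          # covers 128 sectors, all free
bad_sector_list = bytearray()   # file data of the Bad Sector List
record_bad_sector(bitmap, bad_sector_list, 42)
```

Reversing the two steps would open a window in which the allocator could hand the bad sector to the list file itself.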

Ideally, and this is not required in the implementation of the Phoenix File System, the Bad Sector List File Node should not be referenced from any directory (the hard link count should always be 0) and special system calls should be provided to add bad sectors to the bad sector list rather than writing to the file using standard system calls. The location of the Bad Sector List is always specified in the SuperBlock for the volume.

Should a sector which holds part of the Bad Sector List data become bad itself, the sector should be marked as in-use in the appropriate Free Space Descriptor, the remaining sectors (if any) should be reassembled as best as possible into an incomplete Bad Sector List, and then a full scan of the Volume should be performed to determine the remaining Bad Sector List entries and to possibly find new ones.

Directory Structure: path naming conventions, location of kernel, etc. go here, i.e. how paths are formed.

The SuperBlock: for now these are just notes on the fields that will need to be in the SuperBlock as we think of them; we can go back and define the format later.

  Location of Free Space Descriptor Table
  Bad Sector List File Node number
  Root Directory File Node number

  File System Version Number
  Number of bytes of reserved space in File Nodes for internal data
  A Features DWORD, now is reserved(0), but later on bits may indicate advanced features
  Date/Time created
  Volume sectors around which to locate directories (center of directory band(s)), up to 16
  Minimum acceptable percentage of volume sectors that may be free.
? Volume creator (who partitioned it and/or formatted it) ?

Backup copies of the SuperBlock are placed immediately after every 16th Free Space Descriptor on the volume. Any writes to the SuperBlock do not complete until all backup copies are also updated.

Phoenix File System Specification written and maintained by John Baldwin and Kelly Yancey