Proposed Phoenix File System

  The file system will be composed of a number of layers of abstraction as follows:
 
    Logical Sector to Physical Media mapping - This lowest layer would be implemented by the
      storage device hardware driver. It would convert the device-dependant physical
      representation of the storage device to logical 512-byte sectors each numbered starting
      from 0 and continuing based on the storage capacity of the device. Numbering is such that
      the seek time between any two consecutively numbered sectors is minimum. Logical sector
      numbers are stored as 64-bit values. The "beginning" of the storage device is considered
      where the logical sector numbers are low; the "end" of the storage device is considered
      where the logical sector numbers are high.

    Logical Sector Allocation - This layer exists to keep track of which logical sectors have
      been allocated to hold data, which are available, and which are unusable (bad).

      Single logical sectors called Free Space Descriptors will be used to indicate whether
      up to 4096 sectors each are in use or available for use. This is done by using each bit
      in the 512-byte Free Space Descriptor to represent whether a single sector is in use or
      not (0 indicates the respective logical sector is in use, 1 indicates it is not). The
      number of Free Space Descriptors varies based on the storage capacity of the device. The
      location of each of the Free Space Descriptors is stored as a series of Logical Sector
      Numbers in a series of sectors toward the beginning of the storage device. The first
      logical sector number would hold the location of the first free space descriptor (which
      would describe the first 4096 sectors), the second logical sector number would hold the
      location of the second free space descriptor (which would describe logical sectors 4096
      through 8191), etc. This table would be called the Free Space Descriptor Location Table
      and would consist of enough consectutive 512-byte sectors to hold as many 64-bit logical
      sector numbers as are required to locate enough Free Space Descriptors to describe the
      entire meida. Each sector of the Free Space Descriptor Location Table could hold the
      location of up to 64 Free Space Descriptors, with each Free Space Descriptor indicating
      whether or not each of up to 4096 sectors is available for use, for a total of 262144
      sectors accounted for per sector of the Free Space Descriptor Location Table, or 128
      megabytes of storage capacity. The Free Space Descriptor Location Table may span as
      many sectors at it needs, but the sectors MUST be contiguous. Logical sectors used by
      the Free Space Descriptor Location Table and by the Free Space Descriptors are
      considered in use and therefore are not available for allocation. It is not possible
      for portions of the logical sector mapping to not be accounted for by Free Space
      Descriptors nor for them to overlap.


      FSDLT         FSDs
      .----.
    1 |  o--------->|100010010...100101|  Allocation map for sectors 0 - 4095
      |----|
    2 |  o--------->|000010101...110001|  Allocation map for sectors 4096 - 8091
      |----|
    3 |  o--------->|100100101...001000|  Allocation map for sectors 8092 - 12287
      |----|
      .    .
      .    .
      |----|
    N |  o--------->|001001000...001001|  Allocation map for sectors 4096*N - 4096*(N-1)-1
      `----'  

      It is suggested the OS load and store the entire Free Space Descriptor Location Table in
      memory for quick reference; only eight sectors, or 4 kilobytes, of File Space Descriptor
      Location Table are required per gigabyte of capacity thereby making it quite economic
      to store the entire table in memory.
      The suggested arangement for the Free Space Descriptors in relation to the logical
      sectors they describe is that for even indexed Free Space Descriptors the Free Space
      Descriptor is located in the first sector it describes and for odd indexed Free Space
      Descriptor the Free Space Descriptor is located in the last sector it describes; this
      maximizes the number of contiguous sectors for allocation (8190) and minimizes the
      distance, and thus the seek time, between Free Space descriptors and the data they
      describe. 

      Bad Sector Recovery: when a given media is first formatted, the Free Space Descriptors
      are located only in "good" sectors (ie. not bad) and bad sectors are marked as used and
      recorded in a special Bad Sector List described below. Should a sector be found to be bad
      after format the given sector should be marked as in use in the appropriate Free Space
      Descriptor and added to the Bad Sector List. However, if the bad sector is detected at
      the location of a Free Space Descriptor then major error recovery methods need to be
      employed (as described below) resulting in the sector being added to the Bad Sector List
      and the Free Space Descriptor being moved to the a free good sector (or possibly moving
      data from a used sector to a free sector and then locating the Free Space Descriptor
      there so as to minimize the distance between the Free Space Descriptor and the data it
      describes).
      
    Data Chain Layer - This layer exists to group a number of logical sectors into an ordered
      list which can completely describe a series of bytes of any length up to 2^64-1 bytes
      long. This can be thought of as a simplistic file mechanism. This layer is intended to
      bring order to the data stored on the media. Data Chains are described using Data Chain
      Descriptor Blocks which each 1 logical sector in size. Each Data Chain Descriptor Block
      has the following format:
 
           42 x Data Chain Descriptor Entry (12 bytes each) ----------.
                 | 8 bytes - Start sector of data chain segment       |
                 | 4 bytes - Number of sectors in data chain segment  |
                 `----------------------------------------------------'         
      8 bytes - Sector number of next Data Chain Descriptor in chain (or 0 if is last
                descriptor in chain)

      * first Data Chain Descriptor Block in chain has the following special definitions for
        the first fields of the first Data Chain Descriptor Entry:
           
           8 bytes - Total size of chain in bytes.
           4 bytes - Initial Data Chain Descriptor Block flag (value = 0)

      * empty Data Chain Descriptor Entries have both fields equal to 0.

      if all Data Chain Descriptor Entries have their length field (the 4 byte field) equal to
      0, then the chain is considered empty.
          
    File Layer - This layer actually creates the user model of the file system. It describes
      files and their attributes from the user perspective and lays the foundation for the
      file system security. Any higher layers, including directory structure, are implemented
      in terms of files. Files and their attributes are described using File Nodes which have
      the following format:

      Offset    Size  Description
        0h        4   Magic Number (used to help locate File Nodes in a corrupt file system)
        4h        4   Date/Time File Node Created
        8h        4   Date/Time File Node data last accessed
        Ch        4   Date/Time File Node data last modified
       10h        4   Date/Time File Node last modified
       14h    	  4   Link count (hard links)
       18h	  8   Location/Internal Information of data chain
       20h	  8   Location/Internal Information of extended attributes chain
       28h	  8   Location/Internal Information of rights list chain
       30h	  8   Extended Node sector location (or 0 if none)
       38h	 16   (reserved - must be 0)
       48h	  4   Flags
                        System				boolean
			Archive				boolean
			Hidden				boolean
			Read-Only			boolean
			Purge on delete			boolean
			Deleted				boolean
                        Internal Data			enumerated (none, direct, indirect)
			Internal Extended Attributes	enumerated (none, direct, indirect)
			Internal Rights List		enumerated (none, direct, indirect)
       4Ch	  1   Type
       4Dh 	  3   (reserved - must be 0)
       50h  432-944   Reserved for File Node internal data
	
    File Nodes are usually 1 logical sector in size (512 bytes) although may be as large as
    1024 bytes through the use of the Extended Node sector.
    Field descriptions:
      Magic Number     - some arbitrary 32-bit number which is used in file system error
                         recovery to help indicate the presence of a file node. The repair
			 utility can scan for this value in the first 4 bytes of a sector, and
			 if matched can then begin analyzing the rest of the sector. Currently
			 the value is 0x424A594B, however future revisions may use different
			 magic numbers to help indentify themselves.
      Date/Time fields - 32-bit encoded date/time identifiers.
      Link count       - Number of references to this File Node from the Directory Structure.
			 If 0, file has no references from the Directory Structure and is
                         therefore in accessible.
      Type             - Indicates what type of file it is; this determines the file system's
			 interpretation of the contents of the data field. Currently valid
			 values include:
                           0  (reserved)
                           1  File          - file system makes no assumption about the format
					      nor contents of the file data
			   2  Symbolic Link - file system interprets file data as fully
					      qualified path name of a File Node that the file
					      system should be redirected to.
			  10  Directory     - file system interprets file data as directory
					      information (see section on Directory Structure)

 
      System Flag      - Only can be deleted by the super-user and they are prompted if they
			 are sure they want to delete a system file.
      Archive Flag     - Set whenever the Date/Time File Node last modified field changes
      Hidden Flag      - Simply indicates whether or not file should be displayed in basic
			 file listings (does not indicate the ability to see the file, but
			 only the default behavior of file listings...see rights list
                         information on making files invisible to certain users)
      Read-Only Flag   - Simply provides an extra safeguard against accidently overwrite or
			 deletion of a file. See rights list for denying write ability for
			 certain users.
      Purge Flag       - Indicates whether or not the file should be immediately purged when
			 deleted.
      Deleted Flag     - Indicates whether or not the file has been deleted and is waiting for
                         purge. When a file is deleted its link count is decremented; when the
			 link count reaches 0 the Purge on Delete flag is tested, if set then
			 the file is immediately purged from the system, otherwise the
			 Deleted flag is set and the operating system may purge the file at
			 its discretion. It is recommended, but not required, that the OS
			 maintain a file with the fully qualified pathnames of every file
			 tagged as deleted so that the user may more easily locate and recover
			 files that were accidentally deleted (otherwise, an "undelete"
			 facility would have to scan the entire file system).
      Internal flags   - Each of the three flags: Internal Data, Interal Extended Attributes,
			 and Internal Rights List is used to specify where the respective
			 information for a file is stored. Each can have one of three values
			 represented by 2 bits each in the flags field. The summary of the
			 values is as follows: If a given internal flag has the "none" value,
                         the respective Location field holds the Logical Sector Number of
			 the first Data Chain Descriptor Block of the respective information.
			 If a given internal flag has the "direct" value it means the
			 respective information is stored directly in the file node; the
			 appropriate Location field is used to store the offset within the
			 file node of the information as well as the length of the information
			 each as 32-bit values. If a given internal flag has the "indirect"
			 value it means the respective information is again stored in a
			 Data Chain but the Data Chain Descriptor Entries describing the chain
			 are stored internally in the file node. The appropriate Location field
		         is used to store the offset within the file node of the first Data
			 Chain Descriptor Entry as well as the number of Data Chain Descriptor
			 Entries that are used each as 32-bit values.
			 There can be any combination of internal Data, Extended Attributes,
			 and Rights List. It is recommended that internal data offsets be
			 8-byte aligned to ensure that data structures are properly aligned in
			 memory for faster access and to allow for subtle changes in the
			 internal information without having to rearrange all of the file
			 node's internal data structure.
      Location fields  - Can either store the Logical Sector Number of the first Data Chain
			 Descriptor Block of the appropriate information or can be used to
			 store a 32-bit offset/length pair for internal information as defined
			 by the respective Internal flag bit. Note that if an internal data
			 offset or the full length of internal information beginning at a given
			 offset does not fall within the first 512 bytes of the file node
			 there MUST exist an Extended Node sector; if there is no Extended
			 Node sector, a file system error has occurred. The same error can be
			 caused also if an internal data offset or the full length of internal
			 information beginning at a given offset does not fall within 1024
			 bytes.

      The Extended Attributes and Rights List information are used by the file system to store
      additional information about a file and security information about the file respectively.
      Each of these two fields of information are stored as hashed index databases as
      described:

        (John...what do you suggest for a good hashing function for this? I figured that would
         be better than trying to scan a bunch of name=value pairs or linear records. I would
         think that we would want to use the same function for both to minimize development time
         and memory overhead. Thanks!)

        Extended Attributes - 


        Rights List         -
    
    Directory Structure - this layer organizes files into heirarchies and associates unique
      fully qualified pathnames with each one. A special file type called Directory exists to
      describe the logical organization of files within the file system (see the Type field of
      File Nodes). In Directory File Nodes, the operating system is concerned with the
      information stored in the node's data field, this is because the data is the list of
      Files Nodes in the directory. Each entry in a directory listing consists of its filename
      as well as the Logical Sector Number of the File Node. Filenames are stored in Unicode
      and should be null-padded to 8-byte increments (to allow for minor renaming without
      having to rewrite the entire directory list. The directory data is preceded by a header
      containing the following information maintained for hashed indexing into to the directory
      list:

        Offset	  Size  Description
          0h	    1	Number of entries in hash table (should be prime) = N
          1h        1   (reserved - must be 0)
          2h	    1   Flags
			  Entry Size			enumerated (dword, quadword)
	  3h	    1   (reserved - must be 0)
          4h      4*N   Hash table entries (if Entry Size = dword)
               OR 8*N   Hash table entries (if Entry Size = quadword)


      The hashing function is defined as taking the sum of all the ordinal values of the 
      characters of the filename modulo the number of hash table entries; this produces the
      index into the hash table to lookup the offset within the directory data of the first
      entry with the same hashed index, thereby reducing the number of directory entries that
      have to be scanned to find any particular one. As mentioned, each hash table entry is an
      offset within the directory data. If the number of entries in the hash table is 0, then
      assume there is no hash table and DO NOT perform the modulo to determine hashed index (as
      this will result in a divide by 0 exception). Suggested values for the number of hash
      table entries are as follows:
					          Average
        Hash Table Size	  Directory Entries   Entries per Index   
              7               0 - 63	         0.00 - 9.00
             17              64 - 127		 3.76 - 7.47
             23		    128 - 255		 5.57 - 11.13
             31             256 - 511		 8.26 - 16.46
  	     47		    512 - 1023		10.89 - 21.77
             63		   1024 - 2047		16.25 - 32.49
             97		   2048 - 4095		21.11 - 42.22
	    127		   4096 - 8191	        32.25 - 64.50
            193            8192 - 16383		42.45 - 84.89
            255		  16384 - xxx		64.25 - xxx

      Each directory entry is variable length as defined below:
  
        Offset    Size  Description
          0h	    8	Logical Sector Number of file node
	  8h        2   Length of filename -1 (in characters) = N
	  Ah	  N+2   Filename (in Unicode)
        N+Ah	  0-7   null padding to keep all entries aligned to 8-byte boundaries

      Currently, filenames are limited to a maximum of 256 characters. It is recommended that
      any directory listings displayed to the user be sorted alphabetically in their selected 
      language.