Revisiting Disk Formats in 2016

It's always kind of cracked me up how, despite Moore's law, everyone always seems to be struggling to keep up.  We've had hard drive cases which couldn't handle larger drives, card readers which couldn't handle larger cards, desktop BIOSes which couldn't handle larger hard drives, motherboards which couldn't handle more RAM, etc.

It all seems a bit silly.  If you are building a slot for upgrading, make it handle at least 10 times what is common at the time - then you should be good to go for upgrades for a few years at least.  In general, this shouldn't be terribly difficult or expensive.

I don't expect old hardware to keep up with new speeds.  For example, I don't expect that a computer built in 1995 could have USB 3, DDR4, Thunderbolt, SATA 2, or Wireless AC.  Those things just weren't around - but there is very little valid reason it shouldn't be able to access a much larger hard drive than the one it shipped with.  It's easy enough to add more addressing bits than you actually need, so that they can be used later when actually required.  (And the overhead is small).
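To put a rough number on how cheap extra addressing bits are, here is a small sketch of the arithmetic.  The 512-byte sector size and the particular bit widths are my own illustrative assumptions (28 and 48 bits happen to match the two ATA LBA widths, and 32 bits matches the sector fields in an MBR partition table), so treat it as a back-of-the-envelope calculation rather than a description of any specific machine.

```python
def max_capacity_bytes(address_bits: int, sector_size: int = 512) -> int:
    """Largest disk addressable with the given number of sector-address bits."""
    return (2 ** address_bits) * sector_size


def human(n: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1024


# Assumed 512-byte sectors; the bit widths are illustrative.
for bits in (28, 32, 48):
    print(f"{bits}-bit sector addresses -> {human(max_capacity_bytes(bits))}")
```

Going from 28 to 48 bits costs 20 extra bits per address, and buys roughly a million times more capacity (128 GiB versus 128 PiB under these assumptions).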

Still, though, it is to be expected that there are some limitations.  For example, you wouldn't expect an old 8 bit computer from the 80s to have 64 bit memory addressing.  That would have been quite a lot of overhead for something that wouldn't be needed for a very long time.

Likewise, there are technical shifts (like IDE to SATA) which make using newer hardware impossible in many cases.

What I do expect, however, is for software to plan ahead, evolve, and improve.  Looking at the state of disk formats in 2016, the situation is actually quite sad.

First of all, let me say this: If you are a geek who is into this kind of thing, you may say "But ZFS is awesome!  XFS is cool!", etc.

Yes, yes, there are fantastic systems for keeping track of what is on disk - but they are only as useful as their installed base.  That ZFS silently detects and corrects corruption is a killer feature, but it isn't going to stop Sally the 6th grader's report from getting corrupted, because her report probably wasn't written on a huge Unix machine, but on a Mac or Windows PC, or maybe on a Linux machine.  None of those normally run ZFS by default.
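For anyone wondering what "detects and corrects corruption" means in practice, here is a toy sketch of the general idea: store a checksum alongside each block, keep more than one copy, and on read replace any copy that doesn't match.  This is not ZFS's actual on-disk design; the function names, the use of SHA-256, and the in-memory "copies" are all assumptions I made up for illustration.

```python
import hashlib
from typing import List


def checksum(data: bytes) -> str:
    """Checksum of a block (SHA-256 chosen here purely for illustration)."""
    return hashlib.sha256(data).hexdigest()


def read_with_repair(copies: List[bytearray], expected: str) -> bytes:
    """Return a copy that matches the stored checksum and heal the ones that don't."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("all copies are corrupt")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # overwrite the damaged copy in place
    return good


# Hypothetical example: two mirrored copies, one of which gets a flipped byte.
block = b"Sally's 6th grade report"
stored_checksum = checksum(block)
mirrors = [bytearray(block), bytearray(block)]
mirrors[0][0] ^= 0xFF  # simulate silent corruption on the first disk

data = read_with_repair(mirrors, stored_checksum)
print(data == block, bytes(mirrors[0]) == block)
```

The point is that the reader never sees the bad copy, and the damaged one gets fixed as a side effect of the read.  A file system without checksums has no way to tell which copy is the good one.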

Even ZFS is quite a number of years old at this point, but it isn't supported by most systems (mainly for political reasons).  Microsoft and Apple would potentially have to pay for it even though it is open source.  Apple actually announced it as a feature for a beta version of OS X, but it was silently dropped from the feature list by the time the official release was made.  ZFS's license was specifically designed to be incompatible with Linux, so there goes that.  Basically, it is supported on Solaris and now BSD.

The situation on Windows is like this:
1. You can use NTFS - which is the "native" file system for all versions of Windows since Windows 2000 (and Windows NT before that).  This system supports a fair number of semi-modern features natively, including compression, permissions, and encryption.  NTFS is only partially supported on Linux and OS X (usually read-only), and NTFS encryption is typically not supported on other platforms.  NTFS supports MAC file labeling, uses B-Trees to manage large directories, and supports files larger than any hard drive in existence today.  NTFS also supports a lot of other features (like hard links), which basically go unused in Windows.

2. You can use FAT - This has many variations (FAT32 vs. FAT16, VFAT, etc.).  FAT has no security attributes and doesn't support files larger than 4 GB.  Microsoft's native encryption (BitLocker) is supported on FAT disks, but only on recent versions of Windows.  About the only good thing about the FAT filesystem is that it is supported on most consumer devices that accept memory cards.

3. You can use ExFAT - This is basically a system that is similar to FAT, but it has been changed in incompatible ways.  Apparently one of Microsoft's goals was to include patents so that they could enforce licensing fees.  They also did away with the backup copy of the file allocation table, meaning that data recovery is now harder.  ExFAT includes superficial changes for streaming (with the main target being video files), and supports larger file sizes.  Unfortunately, since it's incompatible with FAT, it won't work on many older devices (and makers of newer devices have to pay a fee to implement it).

4. Apple offers a read-only HFS driver for Boot Camp.

5. Other third-party vendors (Paragon comes to mind) offer solutions to mount HFS or ext3 in read-write mode in Windows; however, these cost money and require installation (and thus admin rights).

On OS X the situation is as follows:
1. Apple supports read/write access to FAT and ExFAT systems.  They don't, however, support Microsoft's BitLocker Encryption.
2. Apple supports the legacy HFS format (from Mac OS 6.x - 9.x).
3. Apple's current format is HFS+.  HFS+ supports normal POSIX (Unix) security, Access Control Lists, full disk encryption, compression, metadata, journaling, sparse files, and more.  HFS+ supports symbolic links, tags, and sort-of supports hard links.
4. Apple ships read-only drivers for NTFS on at least some versions of OS X, but they cannot read BitLocker drives.
5. Third parties sell read-write drivers for NTFS and ext3 for OS X.
6. There is an experimental ZFS driver.

On Linux, it looks like this:
Linux is actually in the best position in terms of mature support for robust file systems.  There are a few reasons for this: Firstly, it is a system that was originally written by geeks, for geeks - and geeks care about things like file systems.  Secondly, it is used for mission critical tasks, and thus has attracted a lot of interest from those companies looking to use Linux for such applications.
1. Ext3 and now Ext4 are the default file systems for most distributions.  The good thing about these systems is that most of the new features are backwards and forwards compatible with old versions of the file system.  These filesystems support Unix security, very large files, MAC labeling and other metadata, B-Tree directories, transparent encryption, etc.
Although third-party drivers are available for other operating systems, they typically only support ext2 or ext3 level features.
2. JFS - A robust file system capable of handling very large files, etc. - contributed by IBM.
3. XFS - A robust file system capable of handling very large files, etc. - contributed by SGI.
4. BTRFS - An experimental new file system for Linux with more scalability, pooling, balancing, snapshots, and checksumming built in (originally designed at Oracle).  When complete, this system will offer many of the advantages of ZFS.
5. Support for FAT - Read/write support for all normal FAT/VFAT formats.
6. Read-only support for NTFS.
7. Numerous other systems like ReiserFS, etc.
(Android basically follows the Linux standards, since it is based on Linux - and typically uses ext4 in recent versions).

BSD Unix:
1. UFS/BFS/FFS - This is the standard filesystem for BSD variants.  This format stems from historical Unix formats, but has evolved over time to add new features.
2. HAMMER - System similar to BTRFS or ZFS developed natively for BSD.
3. ZFS - ZFS was an awesome system developed for Solaris supporting redundancy, checksums, pooling, etc. -  but with largely wasted potential since it isn't implemented on any popular system, and basically not available for Linux natively due to licensing restrictions.  BSD supports it, though.


So why care about all of this?  Well a few reasons:
1. Systems like BTRFS and ZFS can theoretically improve the lives of not just data center managers but ordinary people as well.  You can (theoretically) throw together a random collection of disks and have the system manage what files go where automatically.  You can set the system up to be redundant and highly resistant to data loss.  You can move partitions around on the disk, shrink and grow them, add/remove storage capacity, etc.  This is honestly the kind of stuff that regular people should be able to do in 2016 without needing an IT degree.
2. You should be able to have an external hard drive or flash memory stick that: a. Supports compression, b. Supports encryption, c. Supports large files, and d. Works on Windows, Linux, and OS X.

Neither 1 nor 2 is really possible at this time.  #1 is perhaps possible with ZFS and Btrfs, but the end user tools and market penetration are not there yet.  #2 is simply not possible without purchasing a lot of third-party tools, and difficult even then.

It's really sad to see politics be the limiting factor with both ExFAT and ZFS implementations.  Each vendor seems content to develop their own standard and even prevent other people from using it.  That's just sad.

What we end up with is people using FAT32 for external hard drives because it's the only thing that currently works between systems.  Then they have no compression, no security, no encryption, no metadata, and they can't even hold big files like movies or virtual machines.  Suddenly non-geeks have to start caring about something like file systems, too.  It's really pathetic to see a Windows error message show up when you try to copy a 5 GB file onto an empty 1 TB drive, telling you that you can't copy the file because it's too large.
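The reason for that error is simple arithmetic: FAT32 records each file's size in a 32-bit field, so no single file can be larger than 2^32 - 1 bytes, no matter how much free space the drive has.  Here is a minimal sketch of the check; the function name and the example sizes are my own, not anything Windows actually exposes.

```python
# FAT32 stores the file size as a 32-bit unsigned integer.
FAT32_MAX_FILE_SIZE = 2**32 - 1  # 4 GiB minus one byte


def fits_on_fat32(file_size: int, free_space: int) -> bool:
    """True only if the file fits both the free space and the per-file limit."""
    return file_size <= free_space and file_size <= FAT32_MAX_FILE_SIZE


# Hypothetical example: a 5 GB file and an empty 1 TB drive.
five_gb = 5 * 10**9
one_tb = 10**12

print(fits_on_fat32(five_gb, one_tb))  # False
print(five_gb <= one_tb)               # True: plenty of room, still rejected
```

So the drive could hold the file 200 times over, and it still gets refused.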

There is an interesting paper on how the Btrfs converter works, in that it can convert from Ext2/3/4 to BTRFS in a non-destructive way.  I would like to see a universal file system designed with this kind of idea in mind: some enhanced version of FAT that can handle large files, encryption, and compression, has no patents or licensing restrictions, and is backwards compatible with FAT32 as much as possible.  It doesn't have to be the best at everything, but it should allow any size of drive or file.
