Industrial Grade Flash Reliability with RAID-like XNAND Driver

Published as a Whitepaper on Dec 6, 2009 (Updated Feb 23, 2010)

Introduction

Technologic Systems' XNAND technology is a user-space device driver that combines a simple RAID algorithm, Reed-Solomon codes, and extra checksums so that any Linux filesystem can be used with confidence on NAND flash. The result is a rugged non-volatile storage device with industrial-grade flash reliability. Our TS-BOOTROM can also boot from XNAND, which makes bootup extremely reliable.

Motivation

Many embedded systems require a non-volatile storage medium that is rugged, affordable, large enough for a Linux file system, and extremely reliable. The only technology that meets the first three criteria is NAND flash. Spinning hard drives and removable media such as SD cards or USB thumb drives do not always meet our durability requirements for harsh environments. The best solution is flash storage that is soldered directly onto the board. NOR flash is reliable, but it is 10 to 20 times more expensive than NAND flash, so NAND flash soldered to the board is the preferred hardware option.

NAND flash, unlike NOR flash, is prone to bad blocks. A 512MB NAND flash chip often ships from the factory with some bad blocks, and more blocks become unreliable over time. The traditional way of dealing with bad blocks is a flash-specific file system such as YAFFS2 or JFFS2. Flash file systems often perform well in embedded systems, and we will continue to support them on the products that currently use them, but we feel they have room for improvement. Here are some of our concerns about flash file systems:

  • Flash file systems are tightly coupled to the Linux kernel. This can create extra risk and expense for applications that require a kernel version upgrade.
  • Flash file systems that are not cleanly unmounted can take an unacceptably long time to mount.
  • Flash file systems with a lot of files can have a large memory footprint.
  • Flash file systems do not handle unexpected power losses or unclean shutdowns as robustly as we would like.

No file system is perfect. What we wanted was a way to run our preferred file systems, such as ext3 and JFS, on NAND flash with absolute confidence in the underlying hardware. The solution is a software layer that presents NAND flash as a block device, so that any file system can be created on it.

Our engineers at Technologic Systems realized that bad flash blocks would not be a problem if enough redundancy were built into the system. Unlike other flash controllers and flash file systems, XNAND does not use elaborate algorithms to work around bad blocks. With XNAND, a block that is marked bad is simply not used. This works smoothly because the data that would have been on that block is also stored in two other places.

How It Works

With TS XNAND, every data block is written in three places on the underlying flash using a form of RAID. Each copy is on a separate NAND eraseblock, so if an entire eraseblock is not functional, two copies of the data are still available. Each underlying flash sector also carries a Reed-Solomon ECC syndrome and an extra checksum in its out-of-band data. When an XNAND sector is read, it must pass both of those checks. In the extremely rare case that it does not, the second and third copies are available, and each of them is also protected by Reed-Solomon and a checksum. For an XNAND read to fail, there would have to be three failures, each too severe for the Reed-Solomon code to correct and each on a different eraseblock. Further, all three failures would have to fall within a set of eight eraseblocks that are systematically scattered around the NAND chip.
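The read path described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Technologic Systems' driver. The following are all hypothetical:

- the `Copy` record;
- the use of CRC-32 (`zlib.crc32`) as the "extra checksum";
- the `ecc_correct` placeholder, which stands in for Reed-Solomon decoding and here returns the data unchanged.

What the sketch does show is the decision logic. A copy on an eraseblock marked bad is skipped, and so is a copy that fails ECC or the checksum. The read reports an error only when every copy fails.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass
class Copy:
    """One of the three stored copies of a logical sector (hypothetical layout)."""
    eraseblock: int  # physical eraseblock that holds this copy
    data: bytes      # sector payload as read from flash
    checksum: int    # extra checksum from the OOB area (CRC-32 assumed here)


class XnandReadError(OSError):
    """Raised only when no copy survives ECC and checksum verification."""


def ecc_correct(data: bytes) -> Optional[bytes]:
    """Placeholder for Reed-Solomon decoding.

    A real decoder would return corrected data, or None when the errors exceed
    the code's correction capacity. This stub passes the data through unchanged.
    """
    return data


def read_sector(copies: Sequence[Copy], bad_blocks: Set[int]) -> bytes:
    """Return the first copy that passes both checks, trying copies in order."""
    for copy in copies:
        if copy.eraseblock in bad_blocks:
            continue  # a block marked bad is simply not used
        data = ecc_correct(copy.data)
        if data is None:
            continue  # uncorrectable by Reed-Solomon: fall back to the next copy
        if zlib.crc32(data) != copy.checksum:
            continue  # checksum mismatch: never return unverified data
        return data
    # Every copy failed: report an error instead of returning bad data.
    raise XnandReadError("all copies of the sector failed verification")
```

`read_sector` either returns data that passed both checks or raises. A caller therefore never silently receives unverified data.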

The XNAND driver also addresses the concern that NAND flash data will become corrupted over time or over thousands of read cycles. Blocks are periodically checked for data integrity and refreshed if necessary.
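A background scrub of this kind could look like the sketch below. It is an illustration built on assumptions, not XNAND's actual code:

- Flash is modeled as a Python dict keyed by a hypothetical `(block, copy_index)` pair holding `(data, crc32)` tuples.
- A copy counts as needing a refresh only when its CRC-32 fails. A real driver would presumably also refresh copies that needed Reed-Solomon correction.
- Rewriting a copy on NAND means erasing and reprogramming an eraseblock, not assigning a dict entry.

Elsewhere this paper says the driver checks random blocks, which is what `scrub_pass` imitates.

```python
import random
import zlib
from typing import Dict, Optional, Tuple

COPIES = 3  # each logical block is stored three times

# Hypothetical model of the flash: (block, copy_index) -> (data, stored_crc32)
Flash = Dict[Tuple[int, int], Tuple[bytes, int]]


def copy_ok(entry: Optional[Tuple[bytes, int]]) -> bool:
    """True if a stored copy exists and its checksum matches its data."""
    return entry is not None and zlib.crc32(entry[0]) == entry[1]


def scrub_block(flash: Flash, block: int) -> int:
    """Verify every copy of `block` and rewrite failing copies from a good one.

    Returns the number of copies refreshed; raises OSError if no copy is good.
    """
    entries = [flash.get((block, i)) for i in range(COPIES)]
    good = next((e[0] for e in entries if copy_ok(e)), None)
    if good is None:
        raise OSError(f"block {block}: no valid copy to refresh from")
    refreshed = 0
    for i, entry in enumerate(entries):
        if not copy_ok(entry):
            # Stands in for erasing and reprogramming this copy's eraseblock.
            flash[(block, i)] = (good, zlib.crc32(good))
            refreshed += 1
    return refreshed


def scrub_pass(flash: Flash, num_blocks: int, samples: int,
               rng: Optional[random.Random] = None) -> int:
    """Check a random sample of blocks, as a periodic background task might."""
    rng = rng if rng is not None else random.Random()
    picked = rng.sample(range(num_blocks), min(samples, num_blocks))
    return sum(scrub_block(flash, block) for block in picked)
```

The source does not say how often XNAND scrubs or how many blocks it samples per pass. Here those values are left to the caller.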

The XNAND driver is implemented entirely in user space. Like our latest SD card drivers, it uses the Linux NBD (network block device) system to provide a block device. User-space drivers are easier to write and debug than kernel drivers, and they provide kernel independence. This is a substantial benefit to customers who want to upgrade the kernel version on their TS board without losing driver support.
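For readers unfamiliar with NBD, the sketch below shows the core of what a user-space NBD server does on each request:

1. Decode the fixed-size request header.
2. Call into the storage layer.
3. Send a simple reply that echoes the request's handle.

It is not XNAND's code. The magic numbers, the 28-byte request layout and the error values follow the NBD protocol specification. Older kernel headers describe the 16-bit flags and 16-bit type as one 32-bit field, which is wire-compatible. The `read_fn`/`write_fn` callbacks are hypothetical hooks where a driver such as XNAND would plug in its redundant read and write paths. The handshake, socket I/O and the disconnect command (which expects no reply) are left out.

```python
import struct
from typing import Callable, NamedTuple

NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ, NBD_CMD_WRITE = 0, 1
NBD_EIO, NBD_EINVAL = 5, 22

# Network byte order: magic, command flags, command type, handle, offset, length
REQUEST = struct.Struct(">IHHQQI")  # 28 bytes
# Simple reply: magic, error, handle (read data follows on success)
REPLY = struct.Struct(">IIQ")       # 16 bytes


class Request(NamedTuple):
    flags: int
    command: int
    handle: int
    offset: int
    length: int


def parse_request(header: bytes) -> Request:
    """Decode one 28-byte request header sent by the kernel's NBD client."""
    magic, flags, command, handle, offset, length = REQUEST.unpack(header)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError(f"bad NBD request magic {magic:#x}")
    return Request(flags, command, handle, offset, length)


def handle_request(req: Request, payload: bytes,
                   read_fn: Callable[[int, int], bytes],
                   write_fn: Callable[[int, bytes], None]) -> bytes:
    """Build the reply bytes for one read or write request.

    For NBD_CMD_WRITE, `payload` holds the `req.length` data bytes that follow
    the header on the socket; for other commands it is ignored.
    """
    def reply(error: int) -> bytes:
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, error, req.handle)

    try:
        if req.command == NBD_CMD_READ:
            return reply(0) + read_fn(req.offset, req.length)
        if req.command == NBD_CMD_WRITE:
            write_fn(req.offset, payload)
            return reply(0)
    except OSError:
        return reply(NBD_EIO)  # the storage layer reported a failure
    return reply(NBD_EINVAL)   # command not supported by this sketch
```

Because the kernel only sees a generic block device, any file system can sit on top. The redundancy logic lives entirely behind the two callbacks.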

Pros and Cons of XNAND


Pros

  • Redundancy:  Data stored on XNAND has built-in double redundancy. In the unlikely event of a bad flash block, the data on that particular block still has single redundancy.
  • Ultra-reliable bootup:  XNAND isn't just used for filesystems. We also use it to store a kernel image and an initial ramdisk image. This allows a reliable boot from NAND flash, even on flash chips that don't have a guaranteed good first block.
  • File system flexibility:  Using XNAND and NBD, any file system supported by the kernel can be used with NAND flash.
  • Self-healing:  XNAND is always repairing itself. Data on NAND flash can be corrupted by too many read cycles or by the passage of years, so the XNAND driver periodically checks random blocks for data integrity and refreshes them if necessary. Also, if an XNAND write is interrupted by a power failure and a block is left without its usual three copies, the block is automatically repaired the next time it is accessed.
  • Atomicity:  Write operations to XNAND are atomic. Barring unlikely scenarios involving a bad block, a write to XNAND either succeeds or fails, and when it fails, the previous data is still available.
  • Kernel independence:  Since XNAND functions entirely in user space (except for the NBD client process), it can easily be ported to the latest version of the Linux kernel.
  • Reed-Solomon codes:  The XNAND driver uses a Reed-Solomon algorithm that is tailored for NAND flash and does not sacrifice performance.
  • Read speed performance:  The many layers of protection (RAID redundancy, Reed-Solomon decoding, and extra checksums) have a negligible effect on read speed.
  • Impossibility of silent failure:  With SD cards and other flash controllers, it is possible to read corrupt data without receiving an error, despite the presence of ECC algorithms. With XNAND, data must be verified by a Reed-Solomon syndrome and an extra checksum. If an XNAND read does not report a failure, the integrity of the data is guaranteed.
  • No block retirement:  Flash file systems will retire a block based on a single bit flip, reducing the available storage space. With XNAND, the available space is constant, and bad blocks are handled by the RAID algorithm.

Cons

  • Reduced write speed performance:  Because three different eraseblocks must be modified when data is written, writes to XNAND are relatively slow. For example, using the ext3 filesystem on a TS-7552, file creation operations are about four times faster on the SD card than on the XNAND drive.
  • Storage capacity:  Due to the RAID redundancy, the amount of storage is reduced by 50%. For example, a 512MB NAND flash chip with XNAND supplies 256MB of storage space.
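To make the Reed-Solomon check concrete, the sketch below encodes a sector with a generic, textbook Reed-Solomon code over GF(2^8). It then verifies the stored bytes by computing syndromes: a codeword whose syndromes are all zero shows no detectable error.

This is not XNAND's tailored code. The following are common choices assumed purely for illustration:

- the field polynomial 0x11D;
- the generator roots alpha^0 through alpha^(nsym-1);
- the number of parity bytes, `nsym`.

Error correction (for example, Berlekamp-Massey plus Forney) is omitted; only detection is shown.

```python
from typing import List

PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a common primitive polynomial


def _build_tables():
    """Exponent/log tables for GF(2^8) with generator alpha = 2."""
    exp, log = [0] * 512, [0] * 256
    value = 1
    for i in range(255):
        exp[i] = value
        log[value] = i
        value <<= 1
        if value & 0x100:
            value ^= PRIM_POLY
    for i in range(255, 512):
        exp[i] = exp[i - 255]  # duplicate so exp[a + b] needs no modulo
    return exp, log


GF_EXP, GF_LOG = _build_tables()


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def poly_mul(p: List[int], q: List[int]) -> List[int]:
    """Multiply polynomials (coefficients listed highest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for j, qj in enumerate(q):
        for i, pi in enumerate(p):
            out[i + j] ^= gf_mul(pi, qj)
    return out


def generator_poly(nsym: int) -> List[int]:
    """g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)); in GF(2^8), minus is XOR."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, GF_EXP[i]])
    return g


def rs_encode(msg: bytes, nsym: int) -> bytes:
    """Systematic encoding: append the remainder of msg(x) * x^nsym divided by g(x)."""
    gen = generator_poly(nsym)
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = buf[i]
        if coef:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return bytes(msg) + bytes(buf[len(msg):])


def syndromes(codeword: bytes, nsym: int) -> List[int]:
    """Evaluate the received polynomial at each root of g(x) (Horner's rule)."""
    result = []
    for i in range(nsym):
        acc = 0
        for byte in codeword:
            acc = gf_mul(acc, GF_EXP[i]) ^ byte
        result.append(acc)
    return result


def passes_rs_check(codeword: bytes, nsym: int) -> bool:
    """True when every syndrome is zero, i.e. no error was detected."""
    return not any(syndromes(codeword, nsym))
```

In a layout like the one described in this paper, the parity bytes would presumably sit in the sector's out-of-band area next to the extra checksum. A nonzero syndrome would then prompt correction, or a fallback to another copy.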

Platform Support

Technologic Systems is introducing XNAND on the TS-7550, TS-7552, TS-7553 and TS-4500. We expect to use it on more new products in 2010. Please contact us if you are interested in using XNAND on the TS-7800, TS-7390, or TS-7395, or if you have any questions about XNAND.


Document History

Date of Issue/Revision   Revision Number   Comments
12/08/2009               1.0               Document created
02/24/2010               1.1               Revised platform support