Error Handling and Data Protection
Every vendor addresses NAND management in a slightly different way, with unique software and firmware in the controller. The primary objective is to improve SSD endurance through Flash management algorithms. Proactive cell management provides improved reliability and reduced bit error rates. State-of-the-art controllers also employ advanced signal processing techniques to dynamically manage how NAND wears. The goal is to return error-free data on the first read and so avoid read-retries, even at vendor-specified endurance limits. In addition, techniques such as predictive read-optimization are intended to prevent loss of performance during the useful life of the drive.
Some technologies also incorporate controller-based media access management, which dynamically adjusts over the lifetime of the media to reduce the Unrecoverable Bit Error Rate (UBER). Advanced Error Correction Code (ECC) techniques enable a higher degree of protection against media errors, leading to improved endurance while maintaining or delivering higher performance.
From a data protection standpoint, certain SSDs can prevent data loss associated with Flash media. These products provide the ability to recover from NAND Flash page, block, die and chip failures by creating multiple instances of data striped across multiple NAND Flash dies.
Fundamentally, each NAND Flash die consists of multiple blocks, and each block contains multiple pages. Data stored by the controller is managed at the NAND block level. Software in the controller is used to arrange data in stripes. When the host writes data to the SSD, the controller generates redundancy information over a stripe of data. The controller then writes the host data and the redundant data to the Flash stripes. Data in the stripe is spread across NAND Flash blocks over multiple Flash channels, so that no two blocks of data within a stripe reside in the same NAND block or die. The result is RAID-like protection of NAND that yields very high reliability.
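The striping scheme described above can be sketched with a single XOR parity chunk per stripe. This is only an illustration under stated assumptions: the text does not say which redundancy code any particular controller uses, and the function names (build_stripe, recover) are hypothetical. Each element of the stripe stands in for a chunk that would be written to a different die.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def build_stripe(chunks: List[bytes]) -> List[bytes]:
    """Return the data chunks plus one XOR parity chunk.

    Assumption: every element of the result lands on a different die,
    so a single die failure removes at most one element.
    """
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing chunk (marked None) from the survivors."""
    missing = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost chunk")
    repaired = list(stripe)
    if missing:
        survivors = [chunk for chunk in stripe if chunk is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = build_stripe(data)

damaged = list(stripe)
damaged[1] = None            # simulate losing the die that held chunk 1
rebuilt = recover(damaged)
```

Because XOR is its own inverse, combining all surviving chunks (data and parity) reproduces the lost one. Tolerating more than one simultaneous failure per stripe would need a stronger code than this sketch uses.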
PCIe SSDs use more power than their SATA and SAS counterparts. High-end NVMe-compliant PCIe SSDs generally specify maximum power ratings for Gen-3 x4 around 25 watts. While there are “low-power” PCIe SSDs, they typically have lower performance characteristics than the high-end devices.
A few products on the market also offer field programmable power options that allow users to set power thresholds. As a lower power threshold can throttle performance, users should check with the manufacturer for proper power/performance tuning.
SSD performance typically is measured by three distinct metrics: Input/Output Operations per Second (IOPS), throughput and latency. A fourth metric that is often overlooked but important to note is Quality of Service (QoS). Each is described below:
1. IOPS is the number of IO operations (transactions) the device can complete per second. Depending on the type of benchmarking tool, a related measure could also be shown as Transactions per Minute (TPM).
2. Throughput is the amount of data that can be transferred to or from the SSD. Throughput is measured in MB/s or GB/s.
3. Latency is the amount of time it takes for a command generated by the host to go to the SSD and return (the round-trip time for an IO request). Latency is measured in milliseconds or microseconds, depending on the type of SSD.
4. QoS measures the consistency of performance over a specific time interval with a fixed confidence level or threshold. QoS measurements can include both Macro (consistency of average IOPS latency) and Micro (measured command completion time latencies at various queue depths).
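The four metrics can be derived from one set of per-command measurements. The sketch below is a minimal illustration, not a benchmarking tool: the function name summarize, its parameters, and the sample numbers are all assumptions made for this example, and QoS is approximated as a nearest-rank latency percentile.

```python
import math


def summarize(latencies_us, block_size_bytes, wall_time_s, percentile=99):
    """Summarize a run of fixed-size IOs.

    latencies_us     -- per-command completion times in microseconds
    block_size_bytes -- size of every IO in the run
    wall_time_s      -- elapsed time for the whole run in seconds
    percentile       -- confidence level used for the QoS figure
    """
    n = len(latencies_us)
    ordered = sorted(latencies_us)
    # Nearest-rank percentile: smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(percentile * n / 100))
    return {
        "iops": n / wall_time_s,
        "throughput_mb_s": n * block_size_bytes / wall_time_s / 1e6,
        "avg_latency_us": sum(ordered) / n,
        "qos_latency_us": ordered[rank - 1],
    }


# Hypothetical run: 100 commands of 4 KiB in 0.5 s, two of them slow.
sample = [100] * 98 + [900, 900]
report = summarize(sample, block_size_bytes=4096, wall_time_s=0.5, percentile=99)
```

In this made-up sample the average latency is 116 microseconds while the 99th-percentile latency is 900 microseconds, which is why a QoS figure is worth reporting alongside the average.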
Performance measurements must be tied to the workload or use case for the SSD. In some cases block sizes are small; in others they are large. Workloads also differ by access pattern (random or sequential) and by read/write mix. A read or write operation is sequential when its starting storage location, or Logical Block Address (LBA), follows directly after the previous operation. Each new IO begins where the last one ended. Random operations are just the opposite: the starting LBA is not contiguous with the ending LBA of the previous operation. SSD controllers maintain a mapping table that translates LBAs to Flash Physical Block Addresses (PBAs). The algorithms employed by different vendors vary and have a big impact on both performance and endurance.
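The sequential/random definition above translates directly into a check on an IO trace. This is a minimal sketch: the trace format (start LBA and length in blocks) and the function name classify are assumptions for illustration, and the first IO is labeled separately because it has no predecessor.

```python
def classify(trace):
    """Label each IO in a trace of (start_lba, length_in_blocks) tuples.

    An IO is "sequential" when it starts exactly where the previous IO
    ended, and "random" otherwise.
    """
    labels = []
    next_lba = None
    for start, length in trace:
        if next_lba is None:
            labels.append("first")
        elif start == next_lba:
            labels.append("sequential")
        else:
            labels.append("random")
        next_lba = start + length
    return labels


trace = [(0, 8), (8, 8), (16, 8), (500, 8), (508, 8)]
labels = classify(trace)
# labels == ["first", "sequential", "sequential", "random", "sequential"]
```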
The mix of read and write operations also impacts SSD performance. SSDs are very good at reads, since the controller must take very few steps. Writes, on the other hand, are slower. This is because a single NAND memory location cannot be overwritten in a single IO operation (unlike HDDs, which can overwrite a single LBA). The number of write steps depends on how full the device is and whether the controller must first erase the target location (or potentially relocate data with a read/modify/write operation). Overall, SSDs can deliver very high IOPS in small random read access patterns and high throughput with large block sequential patterns.
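One way to picture why an overwrite costs more than a read is a toy page-level mapping table that never updates a location in place. The class below is a hypothetical sketch, not any vendor's algorithm: it assumes out-of-place writes, tracks stale pages, and leaves out erase and garbage collection entirely.

```python
class ToyFTL:
    """Toy LBA-to-PBA mapping with out-of-place writes (no garbage collection)."""

    def __init__(self, pages_per_block, num_blocks):
        # Physical page addresses still available for programming.
        self.free_pages = list(range(pages_per_block * num_blocks))
        self.l2p = {}         # logical block address -> physical page address
        self.invalid = set()  # stale physical pages waiting for a block erase
        self.flash = {}       # physical page address -> stored data

    def write(self, lba, data):
        if not self.free_pages:
            raise RuntimeError("no free pages; a real controller would reclaim space here")
        pba = self.free_pages.pop(0)
        old = self.l2p.get(lba)
        if old is not None:
            self.invalid.add(old)   # the old copy cannot be overwritten, only retired
        self.flash[pba] = data
        self.l2p[lba] = pba
        return pba

    def read(self, lba):
        # A read is a single table lookup followed by one page access.
        return self.flash[self.l2p[lba]]


ftl = ToyFTL(pages_per_block=4, num_blocks=2)
first_pba = ftl.write(7, b"v1")
second_pba = ftl.write(7, b"v2")   # same LBA, new physical page
```

After the second write, LBA 7 points at a new physical page and the first page is marked stale. Reclaiming stale pages requires erasing whole blocks, which is where the extra write steps on a fuller drive come from.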