Abstract

The Solid-State Drive (SSD) landscape is in constant evolution. For years, this evolution was hidden behind the unchanging abstractions of block devices and POSIX I/O. However, these abstractions have become problematic. They hinder performance and no longer reduce software complexity. Such a state of affairs impacts the database community in at least two ways. 

First, using SSDs through legacy interfaces that hide internal mechanisms invariably results in erratic performance. The blame often goes to SSDs’ notoriously expensive garbage collection. In truth, several other complex internal processes introduce non-linear effects on latency and bandwidth. In this tutorial, we describe these processes and how they are implemented in modern devices. This knowledge will help system designers better choose SSDs and shape database workloads to match their performance characteristics.

Second, the inadequacy of the traditional I/O abstractions opens up an entire research field focused on the co-design of SSDs and database management systems (DBMS). Such research aims at devising mechanisms and policies that couple the storage manager of a DBMS with SSD internals: e.g., placing an SSD’s FTL (its “brains”) under the control of an application, changing SSD subsystems in response to the workload, or executing logic within an SSD on a database’s behalf. In this tutorial, we describe the research opportunities and challenges across this continuum of DBMS/SSD co-design techniques, and present platforms supporting their simulation and prototyping. We believe that those two areas—a more seamless integration of Database and Storage, and the study of SSD variations adapted to Database computations—are central to the development of the next generation of Database Systems. This (opinionated) survey will equip researchers and practitioners alike to enter the field.

Authors

Alberto Lerner is a Senior Researcher at the eXascale Infolab at the University of Fribourg, Switzerland. In the past, he was a postdoctoral researcher at IBM Research (both at T.J. Watson and Almaden), and participated in the design and development of several commercial database systems at companies including IBM, Google, and MongoDB. His recent work focuses on using the computational capacity of the network and storage stacks to offload database systems logic.


Philippe Bonnet is a professor at the IT University of Copenhagen. Philippe is an experimental computer scientist with a background in database management. For twenty years, he has explored the design, implementation, and evaluation of database systems in the context of successive generations of computer classes.