Patrick Valduriez (Inria, France & LeanXcale, Spain)
Principles of Distributed Database Systems: spotlight on NewSQL
The first edition of the book Principles of Distributed Database Systems, co-authored with Prof. Tamer Özsu (University of Waterloo), appeared in 1991, when the technology was new and there were few products. In the Preface to the first edition, we quoted Michael Stonebraker, who claimed in 1988 that within the following 10 years centralized DBMSs would be an “antique curiosity” and most organizations would move towards distributed DBMSs. That prediction has certainly proved correct: most systems in use today are either distributed or parallel.
The fourth edition of this classic textbook [Özsu & Valduriez 2020] provides major updates, in particular, new chapters on big data platforms, NoSQL, NewSQL and polystores. In this tutorial, we introduce these major updates, with a focus on NewSQL.
NewSQL is the latest technology in the big data management landscape and is growing fast in the DBMS and BI markets. NewSQL combines the scalability and availability of NoSQL with the consistency and usability of SQL. By providing online analytics over operational data, NewSQL opens up new opportunities in many application domains where real-time decision making is critical. Important use cases include eAdvertisement (such as Google AdWords), IoT, performance monitoring, proximity marketing, risk monitoring, real-time pricing and real-time fraud detection. NewSQL may also simplify data management by removing the traditional separation between NoSQL and SQL (ingest data fast, query it with SQL), as well as between the operational database and the data warehouse or data lake (no more ETL!). However, a hard problem is scaling out transactions in mixed operational and analytical (HTAP) workloads over big data, possibly coming from different data stores (HDFS, SQL, NoSQL). Today, only a few NewSQL systems have solved this problem.
A first in-depth presentation of NewSQL was given in a tutorial at IEEE Big Data 2019 with Prof. Ricardo Jimenez-Peris (CEO and founder of LeanXcale) [Valduriez & Jimenez-Peris 2019]. In this tutorial, we provide a taxonomy of NewSQL systems based on major dimensions, including targeted workloads, capabilities and implementation techniques. We illustrate it with popular NewSQL systems such as Google Spanner, LeanXcale, CockroachDB, SAP HANA, MemSQL and Splice Machine, and put a spotlight on some of the more advanced ones. We also compare NewSQL with major NoSQL and SQL systems, and discuss its integration within big data ecosystems and corporate information systems using polystores. Finally, we discuss current trends and research directions.
About the Speaker
Patrick Valduriez is a senior scientist at Inria, France, and the scientific advisor of the LeanXcale company. He has also been a professor of computer science at University Pierre et Marie Curie (UPMC), now Sorbonne University, in Paris (2000-2002) and a researcher at Microelectronics and Computer Technology Corp. in Austin, Texas (1985-1989). He received his Ph.D. degree and Doctorat d’Etat in CS from UPMC in 1981 and 1985, respectively. From 1995 to 2000, he managed Dyade, the Bull-Inria joint venture that fostered technology transfer in IT and security. Dyade spun off five successful start-ups, including Kelkoo, based on the Disco software that he built at Inria with his team. He has also consulted for major companies in the USA, Europe, Brazil and France.
He is currently the head of the Zenith team (a joint team of Inria and the University of Montpellier, LIRMM), which focuses on data science, in particular data management in large-scale distributed and parallel systems and scientific data management. He has authored and co-authored many technical papers and several textbooks, including “Principles of Distributed Database Systems” (with Professor Tamer Özsu, University of Waterloo). He currently serves as associate editor of several journals, including the VLDB Journal, Distributed and Parallel Databases, and Internet and Databases. He has served as PC chair of major conferences such as SIGMOD and VLDB, and was the general chair of SIGMOD 2004, EDBT 2008 and VLDB 2009.
He has received several prestigious awards and prizes, including several best paper awards (among them VLDB 2000), the 1993 IBM scientific prize in Computer Science in France and the 2014 Innovation Award from Inria – French Academy of Sciences – Dassault Systèmes. He is an ACM Fellow.
[Özsu & Valduriez 2020] Tamer Özsu, Patrick Valduriez. Principles of Distributed Database Systems, 4th Edition, Springer, 2020.
[Valduriez & Jimenez-Peris 2019] Patrick Valduriez, Ricardo Jimenez-Peris. NewSQL: Principles, Systems and Current Trends. IEEE Big Data Conference, Los Angeles, December 2019.