SBBD – What For, and For Whom?
This year, 2018, the 33rd SBBD takes place jointly with the 44th VLDB. The former is the largest national academic event in the field, while the latter is one of the largest and most important international ones. During SBBD, the Brazilian database community has the opportunity to follow closely the main scientific and technological advances in the area, and also to reflect on the importance of the event and on the role of professors and researchers in training people and contributing to society. In this talk, I intend to revisit some of the many R&D topics to which I have dedicated myself over the years, from deductive databases to data analysis, passing through applications in bioinformatics and self-managing (autonomic) DBMSs. I will also take this invitation as an opportunity to provoke a reflection on our broad research area, focusing on the relevance and role of events such as SBBD for science and technology in the country.

Sérgio Lifschitz is an Associate Professor at the Informatics Department of the Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Brazil. Sérgio obtained his doctoral degree at the École Nationale Supérieure des Télécommunications (now Télécom ParisTech), Paris, France (1994). He also holds an M.Sc. (1990) and a B.Sc. (1986), both in Electrical Engineering, from PUC-Rio. His primary research area is database systems, including applications in bioinformatics, social network data, and autonomic tuning and management. Sérgio is a frequent SBBD participant: he has presented three tutorials, published eleven full papers and eighteen further contributions in SBBD co-located events, received one best paper and two best demo awards, served more than thirty times on evaluation committees, and served once as SBBD program chair.

RAW – Fast Analysis on All Kinds of Data
Keynote presentation

Today’s scientific and business processes heavily depend on fast and accurate data analysis. Data scientists are routinely overwhelmed by the effort needed to manage the volumes of data produced. As general-purpose data management software is often inefficient, hard to manage, or too generic to serve today’s applications, businesses increasingly turn to specialised data management software that handles only one data format each, and then resort to data integration solutions. With the exponential growth of dataset size and complexity, however, format-specific solutions no longer scale to efficient analysis, thereby slowing down the cycle of analysing the data, understanding it, and making decisions. I will illustrate the different nature of the problems we face when managing heterogeneous datasets, and how these translate to fundamental challenges for the data management community. Then I will introduce RAW, a new solution inspired by these challenges. RAW overturns long-standing assumptions, enables meaningful results, and promotes timely discovery.

Anastasia Ailamaki is a Professor of Computer and Communication Sciences at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland and the co-founder of RAW Labs SA, a Swiss company developing real-time analytics infrastructures for heterogeneous big data. Her research interests are in data-intensive systems and applications, and in particular (a) in strengthening the interaction between database software and emerging hardware and I/O devices, and (b) in automating data management to support computationally demanding, data-intensive scientific applications. She has received an ERC Consolidator Award (2013), a Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), an NSF CAREER award (2002), and nine best-paper awards in database, storage, and computer architecture conferences. She holds a Ph.D. in Computer Science (2000) from the University of Wisconsin-Madison. She is an ACM Fellow, an IEEE Fellow, the Laureate of the 2018 Nemitsas Prize in Computer Science, and an elected member of the Swiss National Research Council. She has served as a CRA-W mentor, and is a member of the Expert Network of the World Economic Forum.

Querying Graph Databases with the GSQL Query Language
This talk presents GSQL, a recent addition to the spectrum of query languages for expressing graph analytics. GSQL is a high-level yet still Turing-complete language whose syntax is inspired by SQL in order to reduce the learning curve for SQL programmers, while simultaneously supporting a Map-Reduce interpretation that is preferred by NoSQL developers and that is conducive to massively parallel evaluation. The talk will also provide some context on the graph query language landscape represented in modern systems.
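To give a flavour of the SQL-inspired, accumulator-based syntax described above, here is a hypothetical GSQL-style query over an assumed social graph schema (Person vertices connected by Friend edges); the schema and names are illustrative assumptions, not code from the talk:

```
// Hypothetical GSQL sketch: count each person's friends in parallel.
CREATE QUERY friendCount() FOR GRAPH Social {
  SumAccum<INT> @cnt;            // per-vertex accumulator (Map-Reduce style)
  Start = {Person.*};            // seed set: all Person vertices
  Result = SELECT t
           FROM Start:s -(Friend:e)- Person:t
           ACCUM t.@cnt += 1;    // each matched edge contributes one count
  PRINT Result;
}
```

The ACCUM clause is what gives the language its Map-Reduce reading: edge matches are the "map" phase, and accumulator updates such as `+=` are the "reduce" phase, which is what makes massively parallel evaluation natural.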

Alin Deutsch is a professor of Computer Science and Engineering at UC San Diego. His research is motivated by the data management challenges raised by database-powered applications. Alin’s interests include query language design and optimization for various data models ranging from text to the relational and post-relational models (with particular emphasis on graph data). He also works on cross-model data integration and on automatic verification of business processes. Alin earned his PhD in Computer Science from the University of Pennsylvania, an MSc degree from the Technical University of Darmstadt (Germany), and a BSc degree from the Polytechnic University of Bucharest (Romania). He is the recipient of the 2018 ACM PODS Test of Time Award, a Jean D’Alembert Fellowship from the University Paris-Saclay, an Alfred P. Sloan Fellowship, the ACM SIGMOD 2006 Top-3 Best Paper Award, and an NSF CAREER award.

SafePredict: a Meta-algorithm for Machine Learning to Guarantee Correctness by Refusing Occasionally
Keynote presentation

We propose a meta-algorithm to reduce the error rate of state-of-the-art machine learning algorithms by refusing to make predictions in certain cases, even when the underlying algorithms suggest predictions. Intuitively, our SafePredict approach estimates the likelihood that a prediction will be in error and, when that likelihood is high, refuses to go along with that prediction. Unlike other approaches, we can probabilistically guarantee an error rate on the predictions we do make (denoted the decisive predictions). Empirically, on seven diverse data sets from genomics, ecology, image recognition, and gaming, our method can probabilistically guarantee a reduction of the error rate to 1/4 of that of the underlying state-of-the-art machine learning algorithm, at a cost of between 11% and 58% refusals. Competing state-of-the-art methods refuse at roughly twice our rate (sometimes refusing all suggested predictions).
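The refusal mechanism can be illustrated with a minimal, self-contained Python sketch. This is not the actual SafePredict algorithm (which adaptively reweights its decisions online to guarantee the target error rate); it merely refuses whenever the base classifier's top class probability, a crude proxy for error likelihood, falls below a hypothetical confidence threshold:

```python
def refuse_or_predict(probs, confidence_threshold=0.9):
    """Return the predicted class index, or None (a refusal) when the
    estimated error likelihood (1 - top probability) is too high."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < confidence_threshold:
        return None  # refuse: the prediction is too likely to be wrong
    return best

# Toy class-probability vectors, as produced by some base classifier:
preds = [refuse_or_predict(p)
         for p in ([0.95, 0.05], [0.55, 0.45], [0.10, 0.90])]
decisive = [p for p in preds if p is not None]  # predictions we commit to
```

In SafePredict proper, the decision to refuse is driven by weights updated online, so that the error rate over the decisive predictions can be probabilistically bounded regardless of how the data evolve.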

Dennis Shasha is the Julius Silver Professor of Computer Science at the Courant Institute of New York University (NYU) and an Associate Director of NYU Wireless. He works on meta-algorithms for machine learning that achieve guaranteed correctness rates; with biologists on pattern discovery for network inference; with computational chemists on algorithms for protein design; with physicists and financial people on algorithms for time series; on clocked computation for DNA computing; and on computational reproducibility. Other areas of interest include database tuning as well as tree and graph matching. Because he likes to type, he has written six books of puzzles about a mathematical detective named Dr. Ecco, a biography of great computer scientists, and a book about the future of computing. He has also written five technical books about database tuning, biological pattern recognition, time series, DNA computing, resampling statistics, and causal inference in molecular networks. He has co-authored over eighty journal papers, seventy conference papers, and twenty-five patents. He has written puzzle columns for various publications, including Scientific American, Dr. Dobb’s Journal, and the Communications of the ACM. He is a fellow of the ACM and an INRIA International Chair.