Dataset Showcase Workshop

SBBD DSW - 8th Edition

The publication and availability of datasets (whether open or not) have become highly relevant due to the increasing attention given by various sectors, such as the media, industry, academia, and government. The task of making data available is important for numerous reasons, ranging from its reuse in digital applications developed by society to enabling the reproduction of experiments and developments by the scientific community. Therefore, within the context of the Brazilian Database community, making datasets available is intrinsically important, as it fosters new research and development questions that can be explored through such efforts.

In this sense, the purpose of the SBBD DSW is to provide a forum for sharing and discussing approaches to the construction and organization of datasets that serve as the basis for research work developed by the Brazilian scientific community. The contribution of a paper to be published at the SBBD DSW is the final product in the form of a dataset, usually extracted from a database or a Web platform, cleaned and curated, often enriched with external data, and suitable for reuse in other scenarios or for experiment reproducibility. While the main contribution is the dataset itself, the paper must present all the necessary information to understand and properly use it.

DSW website: https://sites.google.com/view/sbbd-dsw/edição-2026/chamada

SUBMISSION

Papers must describe the data as curated by the author team and made publicly available. As a guiding principle, the dataset must be useful and readily reusable by third parties, for instance by: adding value to the data for the community through preprocessing or filtering; providing an easy-to-understand organization through a schema, data dictionary, taxonomy, ontology, or other formalism; offering facilitated access through appropriate mechanisms; or presenting distinguishable quality achieved through complex curation and cleaning processes.

Papers submitted to SBBD DSW must be written in Portuguese, English, or Spanish; include an abstract in English; follow the SBC formatting guidelines; have 6 to 10 pages (with up to two additional pages exclusively for acknowledgments and references); and be submitted via JEMS. Each submission must include, as appropriate and in the order preferred by the author team:

A description of the data source(s) and the complete methodology for data collection or generation (including public availability of the tool used to create or generate the data, when applicable);
A description of the storage mechanism, including, when available, a schema or data dictionary, taxonomy, ontology, or other formalism that facilitates reuse by third parties;
A quantitative description of the dataset, as well as an initial data analysis characterizing the dataset and reporting the amount of missing data, number of tables, and other relevant information that can be used to assess dataset quality;
A description of how the data have been used (if applicable, referencing published or submitted papers that use the data and explaining how they do so) and of the dataset's novelty: even if the data have been used in other publications or submissions, the complete dataset description provided in the DSW submission must be original;
A discussion of existing challenges and potential limitations in data usage;
Ideas for different uses of the data, application scenarios, research questions that could be formulated or addressed based on the dataset, and possible improvements to the data; and
A public download location, as the dataset must be publicly available at the time of paper submission for review. Preferably, authors should use specialized online platforms suitable for public data maintenance, such as GitHub, Zenodo, Figshare, or OSF (i.e., avoiding personal file-sharing services such as Google Drive folders, Dropbox, OneDrive, or similar).
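
As an illustration of the quantitative description requested above (number of records, columns, and amount of missing data), the following is a minimal sketch in Python using only the standard library. It is not an official DSW tool or a required format: the function name summarize_csv, the set of tokens treated as missing, and the in-memory sample table are all assumptions made for this example and should be adapted to each dataset.

```python
import csv
import io

# Assumption: tokens that this sketch treats as "missing"; adjust per dataset.
MISSING_TOKENS = {"", "NA", "N/A", "null", "NULL"}


def summarize_csv(text_stream):
    """Return row count, column names, and missing-value counts per column."""
    reader = csv.DictReader(text_stream)
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Short rows yield None; blank or sentinel strings also count.
            if value is None or value.strip() in MISSING_TOKENS:
                missing[name] += 1
    return {"rows": rows, "columns": columns, "missing": missing}


# Hypothetical three-row table standing in for one file of a dataset.
SAMPLE = "id,label,score\n1,a,0.5\n2,,NA\n3,b,\n"

summary = summarize_csv(io.StringIO(SAMPLE))
for name in summary["columns"]:
    count = summary["missing"][name]
    share = count / summary["rows"] if summary["rows"] else 0.0
    print(f"{name}: {count} missing ({share:.0%})")
```

For a real dataset, the same function could be applied to each file by passing an open file object (for example, one opened with newline="" and an explicit encoding), and the resulting counts reported in a table in the paper.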

It is important to note that submissions to SBBD DSW must not overlap with submissions to the main conference or its satellite events, as the contribution is entirely different from that of papers typically published at SBBD. In particular, data showcase papers are not:

Surveys, systematic reviews, or empirical or experimental evaluation studies;
Papers proposing tools for data generation or processing;
Based on weak or questionable data collection heuristics; or
Simple applications of generic tools to generate data that can be quickly and easily reproduced by anyone.

Authors are encouraged to review papers published in previous editions of SBBD DSW and to include references as appropriate.

Finally, all authors must commit to complying with the SBC Code of Conduct for Authors and Publications, available on the SBC website and on SOL. Moreover, a full paper may be submitted only if the work has not been previously published.

Regarding the use of Generative AI, the SBC Code of Conduct for Authors, Part II, Art. 2, states:

III – Use of Generative Artificial Intelligence (AI): The use of Generative AI tools and technologies for content generation, writing, and/or content revision in papers must be explicitly declared in the work. The declaration may appear in the Acknowledgments section, in the methodology, or in a section specifically defined for this purpose, according to the adopted template, and must list the tools used and describe where they were applied (e.g., text, tables, figures, citations, etc.). Such tools must not be listed as authors of a paper. The use of these tools does not exempt the authors from full responsibility for the content, including in cases where plagiarism is identified.

For DSW 2026, an unnumbered subsection entitled “Use of Artificial Intelligence”, placed immediately before the References section, is mandatory. This subsection must declare whether Generative AI technologies were used and, if so, which tools were employed and for what purposes (e.g., translation, grammatical revision, etc.). This subsection does not count toward the maximum page limit of the paper.

EVALUATION, PUBLICATION, AND PRESENTATION

Each submission will be reviewed by at least three members of the Program Committee, in order to provide more comprehensive feedback to the author teams. The evaluation criteria include:

Technical quality: The contribution is clearly a dataset; the paper presents all the necessary information to understand and use the dataset; the dataset is clearly defined, including acquisition methodologies, curation processes, limitations, and challenges;
Originality: The work adds value to the data through preprocessing or filtering, or presents distinguishable quality through complex curation and cleaning processes (i.e., it is not merely the collection of an online HTML source, for example);
Quality of presentation: The paper is easy to read and understand, with correct language usage, well-structured and coherent text, clear figures and tables, proper formatting according to the template, and compliance with the page limits specified in the call (6 to 10 pages, plus up to 2 additional pages for acknowledgments and references);
Potential for use and impact: The dataset is easy to use and understand through a schema, data dictionary, taxonomy, ontology, or other formalism; and/or provides facilitated access through appropriate mechanisms; the potential uses of the data are discussed, including ideas for different applications, scenarios, and research questions that could be formulated or addressed based on the dataset; and
Public availability: The dataset and any tools required to access or replicate it are publicly available.

Publication of accepted papers is conditional upon registration for SBBD and online presentation during the event by at least one member of the author team.

Accepted papers will be published in SBC Open Lib, the SBC digital library, in the Proceedings of the VII Dataset Showcase Workshop (DSW) series, ISBN 978-85-7669-399-4. All papers will be assigned a DOI.

TOPICS OF INTEREST

The topics of interest of SBBD DSW are the same as those of the main conference, expanded to include current research topics from multiple areas of Computer Science and other scientific disciplines, as well as diverse contexts related to government, education, culture, economy, transportation, healthcare, climate, and urban studies.

In particular, papers are expected to present datasets that can be used in research related to (non-exhaustive list):

Scientific Applications, Data Science, and Interdisciplinarity with other Sciences, including e-Science;
Applications and Areas Related to Databases, such as data analysis and visualization, machine learning, digital libraries, data mining, information retrieval, social networks, recommender systems, information systems, Web technologies, workflows, and related topics;
Other areas related to Computing (including, but not limited to, all areas that have an Interest Group or Special Committee within SBC), and their applications such as benchmarks, baselines, ground truths, and related artifacts;
Different Types of Databases, including active databases, Web databases, data streams, string databases, document databases, cloud databases, linked data, Semantic Web and RDF, heterogeneous databases, semi-structured data, XML, mobile databases, sensor data, multidimensional databases, temporal databases, spatial and GPS data, multimedia databases, NoSQL, NewSQL, statistical databases, and related systems;
Data Engineering, including data warehouses and OLAP; authorization, privacy, anonymization, and security in databases; information integration and interoperability; data processing on novel hardware; and data provenance.

IMPORTANT DATES

Submission deadline: May 4 (Monday)
Notification of accepted papers: June 15
Final version for proceedings publication: June 30

COORDINATION

Michele A. Brandão, UFMG

Carina F. Dorneles, UFSC

Mirella M. Moro, UFMG

PROGRAM COMMITTEE

To be announced.

EMAIL

sbbd.sbc@gmail.com

ICMC Address

Av. Trab. São Carlense, 400 - Centro, São Carlos - SP, 13566-590
