The Symposium in Information and Human Language Technology (STIL) is the most important event in Brazil for researchers interested in publishing significant and novel results related to Natural Language Processing (NLP) in general (not only applied to Portuguese). Since 2023, STIL has been held as an annual event supported by the Brazilian Computer Society (SBC) and the Brazilian Special Interest Group on Natural Language Processing (CE-PLN).
In 2025, STIL will be held in Fortaleza, Ceará, Brazil, co-located with BRACIS 2025 (The 35th Brazilian Conference on Intelligent Systems), ENIAC 2025 (The 21st National Meeting on Artificial and Computational Intelligence), SBBD 2025 (The 40th Brazilian Symposium on Databases) and KDMILE 2025 (The 13th Symposium on Knowledge Discovery, Mining and Learning).
STIL 2025 is composed of a main event and two sub-events, namely the X Portuguese Description Conference (JDP 2025) and the X Scientific Initiation Workshop in Information Technology and Human Language (TILic 2025).
The multidisciplinary conference covers a broad spectrum of disciplines related to Human Language Technology, such as Linguistics, Computer Science, Psycholinguistics, and Information Science. It aims to bring together academic and industrial participants working in those areas.
ICMC, University of São Paulo
Abstract
From time to time, the field of Natural Language Processing (NLP) undergoes scientific movements that significantly affect a variety of tasks, some in striking ways, others more subtly. Large language models, for instance, represent one of these recent (and resounding) developments. In this presentation, in addition to offering a somewhat broader view of recent developments in NLP, I will discuss one of these movements: the Universal Dependencies initiative and the efforts related to Brazilian Portuguese. Although relatively low profile, this initiative already covers more than 150 languages around the world.
The Program Committee of the 16th Symposium in Information and Human Language Technology (STIL) invites submissions of original research papers for the STIL 2025 conference.
Relevant topics for STIL 2025 include, but are not limited to:
STIL 2025 accepts submissions of long and short papers. Long papers should describe finished, original, unpublished work with significant results, and will be presented orally. Short papers may report work in progress, negative results, opinion papers, or applications/demos, and will be presented as posters.
All papers submitted to STIL must be written in Portuguese, English or Spanish.
Long papers may have up to ten (10) pages of content (including tables and figures), and unlimited pages of references. Short papers should have up to six (6) pages of content and unlimited pages of references. Authors will be asked whether they agree to have their long paper relocated as a poster if reviewers recommend it.
Paper formatting must follow the SBC guidelines, available on the SBC website and on Overleaf.
All papers submitted to STIL will be reviewed by three experts in the field. The reviewing process will be double-blind; therefore, papers should not contain any information regarding their authorship in the header or body of the text. Self-references that reveal the authors’ identities must be avoided. For example, instead of “As we previously showed (Silva, 2005) …” authors should use “Silva (2005) previously showed …”.
By submitting papers to STIL 2025, all authors agree that at least one of them will register for the conference and present the paper in case of acceptance. This registration must take place before the deadline for the camera-ready version of the paper and must be made in the category established by the organization.
The following rules and guidelines are intended to protect the integrity of the double-blind review and ensure that submissions are reviewed fairly. The rules refer to the period of anonymity, which goes from 1 month before the submission deadline until the date your work is accepted or rejected. Works withdrawn during this period will no longer be subject to these rules.
Please note that while you are not prohibited from making a non-anonymous version available online prior to the start of the anonymity period, this makes the double-blind review more difficult to maintain and therefore we encourage you to wait until the end of the anonymity period.
Long and short papers should be submitted as PDF files via the JEMS system (https://jems3.sbc.org.br/events/346) by the deadline indicated above.
For inquiries about the conference, please send an email to: msouza1@ufba.br or rkofreitag@academico.ufs.br
A Moving Target: Detecting Concept Drift in Brazilian Portuguese Fake News – Manuela Guedes Wanderley (USP/ICMC, Brazil), Lucca Ferraz (USP, Brazil), Tiago A. Almeida (UFSCAR, Brazil), Renato Moraes Silva (ICMC / University of São Paulo, Brazil)
A música brasileira na ditadura militar: uma análise de tópicos com BERTopic e GSDMM – Henry Piceni (UFRGS, Brazil), Pedro Vitor (UFRGS, Brazil), Dennis Balreira (UFRGS, Brazil)
A sintaxe no tribunal: apresentando e explorando um corpus jurídico em português anotado sintaticamente segundo o modelo Universal Dependencies – Lucelene Lopes (USP/ICMC, Brazil), Maria das Graças Volpe Nunes (USP/ICMC, Brazil), Magali Duran (USP/ICMC, Brazil), Thiago Pardo (USP/ICMC, Brazil)
A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification – Ana Begnini (Instituto de Pesquisas Eldorado, Brazil), Matheus Vicente (Eldorado Research Institute, Brazil), Leonardo Souza (Instituto de Pesquisas Eldorado, Brazil)
Adapting ASR Models to Technical Scenarios: A Case Study in the Brazilian Automotive Repair Domain – Daniel Ribeiro da Silva (UFG, Brazil), Maria Eduarda Silva Borba (UFG, Brazil), Gustavo Oliveira (UFG, Brazil), Pedro Reis Pimenta (UFG, Brazil), Guilherme Correia Dutra (CEIA, Brazil), Állan Christoffer Pereira Silva (UFG, Brazil), Sávio Teles (UFG, Brazil)
AI-PAVE-Br: Leveraging Large Language Models for Enhanced Product Attribute Value Extraction through a Golden Set Approach – Murilo Gleyson Gazzola (Industry, Brazil), Hugo Gobato Souto (Luizalabs, Brazil)
Aprendizado Profundo para Detecção de Movimentos Retóricos – Bruno Vinicius Veronez de Jesus (UNESP, Brazil), Arnaldo Candido Junior (UTFPR, Brazil)
Automated Fact-Checking in Brazilian Portuguese: Resources and Baselines – Marcelo Mussi Delucis (PUCRS, Brazil), Lucas Fraga (PUCRS, Brazil), Otávio Parraga (PUCRS, Brazil), Christian Mattjie (PUCRS, Brazil), Rafaela Ravazio (PUCRS, Brazil), Rodrigo C. Barros (PUCRS, Brazil), Lucas Silveira Kupssinskü (PUCRS, Brazil)
Avaliação de eficiência na leitura: uma abordagem baseada em PLN – Túlio Gois (UFS, Brazil), Raquel Freitag (UFS, Brazil)
Benchmarking Large Language Models for Text-to-SQL in Brazilian Portuguese and English – Luís Felippe Coutinho de Carvalho (IFES, Brazil), Paulo Sérgio Santos Júnior (IFES, Brazil), Hilário Tomaz Alves de Oliveira (IFES, Brazil)
Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models – Gustavo Bonil (UNICAMP, Brazil), João Gondim (UNICAMP, Brazil), Marina dos Santos (UNICAMP, Brazil), Simone Hashiguti, Helena Maia (UNICAMP, Brazil), Nádia Félix Felipe da Silva (UFG, Brazil), Helio Pedrini (UNICAMP, Brazil), Sandra Avila (UNICAMP, Brazil)
Corpus Memórias Paroquiais: Avanços em Reconhecimento de Entidades – Renata Vieira (Universidade de Évora, Portugal), Helena Cameron (CIDEHUS-UE, Portugal), Joaquim Santos (PUCRS, Brazil), Fernanda Olival (CIDEHUS, Portugal)
CROSSAGE: A cross-attentional graph and Transformer architecture for skill and knowledge recognition in job descriptions – Antônio Ramos (Universidade de Pernambuco, Brazil), Byron Leite Dantas Bezerra (Universidade de Pernambuco, Brazil), João Paulo Felix, Cleyton Mário de Oliveira Rodrigues (Universidade de Pernambuco, Brazil), Wylliams Santos (UPE, Brazil)
DEBISS-Arg – An In Depth Data Annotation Protocol and Corpus for Argument Mining in Semi Structured Debates – David Pereira (UFCG, Brazil), Daniela Thuaslar (UFCG, Brazil), Cláudio Elízio Calazans Campelo (UFCG, Brazil)
Diplomatrix-BR: Um Corpus Paralelo de Redações de Autoria Humana e de LLMs no Concurso de Diplomacia Brasileira – Rodrigo Cavalcanti João (UFF, Brazil), Gabriela Casini (UFF, Brazil), Gabriel Assis (UFF, Brazil), Livy Real (IComp/UFAM, Brazil), Daniela Vianna, Paulo Mann (UFRJ, Brazil), Aline Paes (UFF, Brazil)
Empirical Evaluation of Preprocessing and Balancing Techniques Impact Across Algorithm-Vectorizer Combinations in Sentiment Classification – Nathanael Motta (Universidade de Pernambuco, Brazil), Ana Souza (Universidade de Pernambuco, Brazil), Carlo Marcelo Revoredo da Silva (UPE, Brazil), Cleyton Mário de Oliveira Rodrigues (Universidade de Pernambuco, Brazil)
Enhancing a Nheengatu Morphosyntactic Analyzer for Word Formation and Non-standard Language – Leonel Figueiredo de Alencar (UFC, Brazil)
Evaluating Domain-Specialized LLMs in Multi-Agent RAG for Enterprise Retrieval – Vinicius Aguiar (UFG, Brazil), Leonardo Afonso Amorim (Federal University of Goias, Brazil), Artur Matos (UFG, Brazil), Gustavo Bueno (UFG, Brazil), Sávio Teles (UFG, Brazil), Arlindo Rodrigues Galvão Filho (UFG, Brazil), Anderson Soares (UFG, Brazil), Tales Figueiredo (CEMIG, Brazil), Carlos Sousa
Evaluating Large Language Models through Multidimensional Item Response Theory: A Comprehensive Case Study on ENEM – Leonardo Taschettto (UFSC, Brazil), Renato Fileto (UFSC, Brazil)
Evaluating RAG-based QA Systems: A Comparative Analysis of LLM as a Judge, Traditional Metrics, and Human Alignment – Renato Miyaji (Visagio, Brazil), Renato Moulin, Leonardo Machado, Samuel Monção
Evaluation of an NLP-Based Chatbot for Informational Support in Bronchopulmonary Dysplasia (BPD) in neonates – Anna Beatriz Silva (Universidade de Pernambuco, Brazil), Cleyton Mário de Oliveira Rodrigues (Universidade de Pernambuco, Brazil), Patricia Takako Endo (Universidade de Pernambuco, Brazil)
Fine-tuned model evaluation on Transformer Fragments for Identifying Idiomatic Expressions in Portuguese – Ricardo Gomes de Oliveira (UFBA, Brazil), Laila Pereira Mota (UFBA, Brazil), Lilian Sousa (UFBA, Brazil), Daniela Barreiro Claro (Federal University of Bahia, Brazil), Marcos Adriano P. Santos
From Zero-shot to Self-generated References: Leveraging LLMs for Scoring ENEM Essays – Matheus Yasuo Ribeiro Utino (USP/ICMC, Brazil), Paulo Mann (UFRJ, Brazil)
Gender Bias in Portuguese Literary Texts: A Masked Language Model Approach – Mariana O. Silva (UFMG, Brazil), Michele Brandão (DCC/UFMG, Brazil), Mirella M. Moro (UFMG, Brazil)
GolpeBR: Construction and Validation of an Annotated Dataset on Banking Scams and Fraud – Tamyres Vial de Souza (UFMT, Brazil), Jhonata Tirloni (UFMT, Brazil), Felipe Belo (UFMT, Brazil), Nelcileno Araújo (UFMT, Brazil), Thiago M Ventura (UFMT, Brazil), Allan Gonçalves de Oliveira (UFMT, Brazil)
How Faithful Are Your Summaries? A Study of NLI-Based Verification in Portuguese – Felipe Paula (UFRGS, Brazil), Matheus Westhelle (UFRGS, Brazil), Maria Cecília Corrêa (UFRGS, Brazil), Luciana Bencke (UFRGS, Brazil), Viviane Moreira (UFRGS, Brazil)
Impacto do Idioma no Desempenho de Algoritmos de Classificação de Texto: Um Estudo entre Português e Inglês – Jorge Pavão (CEFET/RJ, Brazil), Kele Teixeira Belloze (CEFET/RJ, Brazil), Gustavo Guedes (CEFET/RJ, Brazil)
Improving Pun Detection with an Ensemble of Traditional Machine Learning Methods – Jhúlia Leal, Márcio Inácio (CISUC, Portugal), Hugo Gonçalo Oliveira (Universidade de Coimbra, Portugal), Rafael Anchiêta (IFPI, Brazil)
Knowledge Distillation in Compact Models: An Approach Applied to Text Processing for Public Security – Ricardo Barcelar (UFMT, Brazil), Leonardo Arruda Vilela Garcia (UFMT, Brazil), Alan Papafanurakis Heleno (UFMT, Brazil), Thiago M Ventura (UFMT, Brazil), Allan Gonçalves de Oliveira (UFMT, Brazil)
LattesRex: Building ChatBots for Semi-Structured Documents – Lucas Darcio (IComp/UFAM, Brazil), Livy Real (IComp/UFAM, Brazil), Amanda Nicole Silveira Spellen (IComp/UFAM, Brazil), Karina Soares dos Santos (Serasa Experian, Brazil), Esther Soares, Altigran Soares da Silva (UFAM, Brazil)
Learning with Few: A Comparative Study of Multilingual Text Anomaly Detection – Fabio Masaracchia Maia (USP, Brazil), Anna Helena Reali Costa (USP, Brazil)
Machine Learning Classifiers with Acoustic Features for Prosodic Segmentation in Brazilian Portuguese: A Comprehensive Evaluation – Giovana Meloni Craveiro (USP/ICMC, Brazil), Caroline Adriane Alves, Flaviane Svartman (USP, Brazil), Sandra M. Aluísio (USP/ICMC, Brazil)
Meta4BR: Avaliando a Fidelidade Metafórica em Traduções de Metáforas para o Português por LLMs – Luisa Stellet (UFF, Brazil), Isabella Leite Pereira da Silva, Gabriel Assis (UFF, Brazil), Aline Paes (UFF, Brazil)
MOPrompt: Multi-objective Semantic Evolution for Prompt Optimization – Sara Pinheiro Camara (UFOP, Brazil), Valéria de Carvalho Santos (UFOP, Brazil), Ivan Meneghini (IFMG, Brazil), Eduardo Jose da Silva Luz (UFOP, Brazil), Gladston Moreira (UFOP, Brazil)
NounBank.DS: a Lexical Repository of Nominal Frames from Stock Market Tweets in Brazilian Portuguese – Bryan Khelven Barbosa (UFSCAR, Brazil), Ariani Di Felippo (UFSCAR, Brazil)
PetroGeoNER: A refined and unified dataset for NER in the Oil & Gas domain – Higor Moreira (UFRGS, Brazil), Patricia Ferreira da Silva (Petrobras, Brazil), Renata Vieira (Universidade de Évora, Portugal), Viviane Moreira (UFRGS, Brazil)
RAISE: Reasoning Agent for Interactive SQL Exploration – Fernando Fortes Granado (UNICAMP, Brazil), Jayr Alencar Pereira (UNICAMP, Brazil), Roberto Lotufo (UNICAMP, Brazil)
Restauração de Pontuação em Textos Traduzidos no Idioma pt-BR a partir de Transcrição de Áudios – Angel Sales (IFAM, Brazil), Brenda Moura (IFAM, Brazil), José Elislande Breno de Souza Linhares (IFAM, Brazil), Fabiann Matthaus Barbosa (IFAM, Brazil)
Syntactic Analysis in Transformers with Attention Heads – Ricardo Gomes de Oliveira (UFBA, Brazil), Daniela Barreiro Claro (UFBA, Brazil), Rerisson Cavalcante
Techniques for Dealing with Imbalanced Data: A Systematic Literature Review – Leandro Oliveira da Silva (UNIFESP, Brazil), Daniela Freire, Márcio Porto Basgalupp (UNIFESP, Brazil), André Ponce de Leon F de Carvalho (ICMC, Brazil)
Towards a Corpus Methodology for LLMs in the Legal Domain – Lucas Mota (UFBA, Brazil), Aline Athaydes (UFBA, Brazil), Fernando Humberto, Daniela Barreiro Claro (UFBA, Brazil), Samuel Rios (UFBA, Brazil), Babacar Mane (UFBA, Brazil), Andressa Lisboa, Marlo Souza (UFBA, Brazil)
When Annotators Disagree: A Controlled Evaluation of Gender Bias in Sentiment Analysis Using Synthetic Datasets – Erica Carneiro (CEFET/RJ, Brazil), Alexander Feitosa, Gustavo Guedes (CEFET/RJ, Brazil)
Anotação de Narrativas Clínicas aos moldes das Dependências Universais – Carlos Antônio de Souza Perini (UFMG, Brazil), Cristiano da Silveira Colombo (IFES, Brazil), Claudia Benevenute (IFES, Brazil), Adriana Pagano (UFMG, Brazil)
Análise de tópicos e sentimentos em cartas indígenas brasileiras – Caio Sacramento de Britto Almeida (UFBA, Brazil), Renata Vieira (Universidade de Évora, Portugal), Débora Abdalla (UFBA, Brazil)
Classificação de Notícias Falsas na Língua Portuguesa Utilizando Modelos Baseados na Arquitetura Transformer – Lucas Pellegrini (UFU, Brazil), Fernanda Maria da Cunha Santos (UFU, Brazil), Felipe Harrison (UFU, Brazil)
DEBISS: a Corpus of Individual, Semi-structured and Spoken Debates – Klaywert de Souza (UFCG, Brazil), David Pereira (UFCG, Brazil), Larissa Lucena Vasconcelos (IFPB, Brazil), Cláudio Elízio Calazans Campelo (UFCG, Brazil)
Estratégias de modelização de dicionários latim-português como Linked Open Data – Lucas Dezotti (Universidade Federal da Paraíba UFPB, Brazil)
Frame-Based Semantic Representation and Similarity Analysis in Audio Description Scripts – Maucha Andrade Gamonal (UFJF, Brazil), Adriana Pagano (UFMG, Brazil), Tiago Timponi Torrent (UFJF, Brazil), Ely Edison Matos (UFJF, Brazil)
Modelo de Classificação Automática de Frases Faladas com Abordagem em Redes Neurais Convolucionais – Cid Ivan Costa Carvalho (UFERSA, Brazil), Francisca Ticiany Barbosa Lopes de Oliveira (UERN, Brazil), Vitória Maria Albuquerque Silva (UFERSA, Brazil)
The Zé Lensky Dataset: A Brazilian Portuguese Twitter Corpus for Russo-Ukraine War Stance and Sentiment Analysis – Andreis Purim (UNICAMP, Brazil), Karlis Kuskevics (United States of America)
Tokens, Embeddings, and Context: Integrating Natural Language Processing and Discourse Analysis – Mayara Sousa Miguel (USP, Brazil)
Towards Prompt Engineering and Large Language Models for Post-OCR correction in handwritten texts – Sávio Santos (Universidade de Pernambuco, Brazil), Byron Leite Dantas Bezerra (Universidade de Pernambuco, Brazil), Arthur Neto (Universidade de Pernambuco, Brazil)
Use of Embodied Conversational Agents to Engage Visitors in Art Exhibits – José Fernando Rodrigues Ferreira Neto (University of Fortaleza, Brazil), Yuri Nekan Soares Fontes (Noex, Brazil), Daniel Ribeiro, Daniel Colares, Vládia Pinheiro (Unifor, Brazil)
