Big Text: from Language to Knowledge
News, social media, web sites, and enterprise sources produce huge amounts of valuable content in the form of text and speech. To tap this wealth of unstructured Big Data and obtain insights, a decisive step is to identify the entities that are referred to and the relationships between them. This allows linking unstructured content with structured data. However, this step faces the fundamental problem that names and phrases are often highly ambiguous; mapping them to entities and relations is a challenging task. The talk will discuss the state of the art, applications, and open problems in disambiguating named entities in text and heterogeneous tables. It will also place this line of research within the bigger picture of Big Data analytics.
Gerhard Weikum is a Scientific Director at the Max Planck Institute for Informatics in Saarbruecken, Germany, and also an Adjunct Professor at Saarland University. He graduated from the University of Darmstadt, Germany. Weikum’s research spans transactional and distributed systems, self-tuning database systems, DB&IR integration, and the automatic construction of knowledge bases from Web and text sources. He co-authored a comprehensive textbook on transactional systems, received the VLDB 10-Year Award for his work on automatic database tuning, and is one of the creators of the YAGO knowledge base. Gerhard Weikum is an ACM Fellow, a Fellow of the German Computer Society, and a member of several European and German academies. He has served on various editorial boards, including Communications of the ACM, and as program committee chair of conferences such as ACM SIGMOD, IEEE Data Engineering, and CIDR. From 2003 through 2009 he was president of the VLDB Endowment. He received the ACM SIGMOD Contributions Award in 2011 and won an ERC Synergy Grant in 2013.