Homo sapiens (Latin: "wise human") is a born knowledge seeker. While modern AI-powered search engines have revolutionized knowledge-seeking by responding to our queries in a manner that resembles natural conversation, systems that understand our knowledge-seeking needs and "take us by the hand" in navigating online knowledge are yet to be realized.
The work presented in this thesis takes a first step toward realizing the next generation of information systems. Specifically, we devise a framework for modeling and enhancing human knowledge navigation in online platforms and make two major contributions. First, we develop methods for understanding and modeling human navigation on Wikipedia, the largest platform for open knowledge. Second, we devise methods and tools for mitigating content and structural knowledge gaps, thereby facilitating improvements in human knowledge navigation behavior. The methodological contributions of this thesis are organized into three parts: Parts II, III, and IV.
In Part II, we describe an information-theoretic measure for understanding the underlying dynamics of human knowledge navigation on Wikipedia. Surprisingly, we find that the majority of human navigation on Wikipedia is Markovian and, leveraging this insight, devise the first large-scale privacy-preserving model for synthesizing human-like navigation traces, one that relies solely on aggregate data.
In Part III, we present methods for mitigating content gaps and gaps in knowledge stores. These methods improve the knowledge organization of Web corpora and, as a by-product, facilitate improvements in knowledge navigation. Focusing on enriching textual content by grounding concepts to knowledge bases, we devise EIGENTHEMES, the first truly unsupervised entity linker that relies solely on the availability of entity names and a referent knowledge base. To improve knowledge bases themselves, we devise PARIS+, a probabilistic model capable of performing entity alignment for Web-scale knowledge stores on commodity hardware.
Finally, in Part IV, we present methods for mitigating structural gaps, which explicitly impact the link structure of knowledge sources and therefore provide direct enhancements to knowledge navigation. We first describe a framework to assess the causal impact of structural gaps and present methods for mitigating them. Next, to support human editors in effectively integrating new entities in linked textual corpora on the Web, we devise LOCEI, a framework to perform localized entity insertions.
We conclude by discussing the implications of our findings and presenting future research opportunities enabled by our contributions.