Consolidating Heterogeneous Enterprise Data for Named Entity Linking and Web Intelligence

Citation

[weichselbraun2015, self.bib] Weichselbraun, Albert, Streiff, Daniel and Scharl, Arno (2015). "Consolidating Heterogeneous Enterprise Data for Named Entity Linking and Web Intelligence", International Journal on Artificial Intelligence Tools, 24(2).

Abstract

Linking named entities to structured knowledge sources paves the way for state-of-the-art Web intelligence applications which assign sentiment to the correct entities, identify trends, and reveal relations between organizations, persons and products. For this purpose this paper introduces Recognyze, a named entity linking component that uses background knowledge obtained from linked data repositories, and outlines the process of transforming heterogeneous data silos within an organization into a linked enterprise data repository which draws upon popular linked open data vocabularies to foster interoperability with public data sets. The presented examples use comprehensive real-world data sets from Orell Füssli Business Information, Switzerland's largest business information provider. The linked data repository created from these data sets comprises more than nine million triples on companies, the companies' contact information, key people, products and brands. We identify the major challenges of tapping into such sources for named entity linking, and describe required data pre-processing techniques to use and integrate such data sets, with a special focus on disambiguation and ranking algorithms. Finally, we conduct a comprehensive evaluation based on business news from the New Journal of Zurich and AWP Financial News to illustrate how these techniques improve the performance of the Recognyze named entity linking component.

Keywords: linked open data, linked enterprise data, named entity linking, named entity resolution, business news, Web intelligence, data pre-processing, data consolidation

Downloads and Resources

  1. Reference (BibTeX)
  2. Full Article