Identifying all names that refer to a particular set of named entities is a challenging task, because
names vary in many ways: abbreviations, aliases, hypocorisms, multilingual forms and partial
matches. Each entity type can also have specific rules for name variances: personal names can
include titles, country and branch names are sometimes omitted from organization names, and
locations are often affected by nested entities. The lack of a clear strategy for collecting,
processing and computing name variants significantly lowers the recall of tasks such as Named
Entity Linking and Knowledge Base Population, since name variances occur frequently in all kinds
of textual content.
This paper proposes several strategies to address these issues. Recall can be improved by
combining knowledge repositories and by computing additional variances algorithmically.
Heuristics and machine learning methods then analyze the generated name variances
and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects
of integrating these methods into a new Named Entity Linking framework and confirms that
systematically considering name variances yields significant performance improvements.
Keywords: Named Entity Linking, Name Variance, Machine Learning, Linked Data
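
To make the idea concrete, the following is a minimal sketch of how additional name variances might be computed algorithmically and how names shared by several entities might be marked as ambiguous. It is not the framework evaluated in this paper: the title and suffix lists, the function names and the entity identifiers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical, illustrative lists; they are not taken from the paper.
PERSON_TITLES = {"dr", "prof", "mr", "mrs", "ms", "sir"}
ORG_SUFFIXES = {"inc", "ltd", "ag", "gmbh", "corp", "plc"}


def person_variants(name: str) -> set[str]:
    """Generate simple variances for a personal name (titles removed)."""
    tokens = [t for t in name.split()
              if t.rstrip(".").lower() not in PERSON_TITLES]
    variants = {" ".join(tokens)}
    if len(tokens) >= 2:
        first, last = tokens[0], tokens[-1]
        variants.add(last)                   # surname only (partial match)
        variants.add(f"{first} {last}")      # middle names dropped
        variants.add(f"{first[0]}. {last}")  # abbreviated first name
    return variants


def organization_variants(name: str) -> set[str]:
    """Generate variances for an organization by dropping legal suffixes."""
    tokens = name.split()
    variants = {name}
    while tokens and tokens[-1].rstrip(".,").lower() in ORG_SUFFIXES:
        tokens = tokens[:-1]
    if tokens:
        variants.add(" ".join(tokens).rstrip(","))
    return variants


def ambiguous_names(variants_by_entity: dict[str, set[str]]) -> set[str]:
    """Return lower-cased variances that are shared by more than one entity."""
    owners = defaultdict(set)
    for entity, variants in variants_by_entity.items():
        for variant in variants:
            owners[variant.lower()].add(entity)
    return {variant for variant, ents in owners.items() if len(ents) > 1}


# Usage with hypothetical entity identifiers.
candidates = {
    "entity:1": person_variants("Dr. Angela Merkel"),
    "entity:2": person_variants("Max Merkel"),
}
flagged = ambiguous_names(candidates)
```

In this sketch the generated variances raise recall, while the shared surname is flagged so that a later step can treat it more cautiously, which mirrors the recall and precision trade-off described above. A simple overlap check stands in here for the heuristics and machine learning methods mentioned in the abstract.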