Orbis Annotator: An Open Source Toolkit for the Efficient Annotation and Refinement of Text Corpora

Citation

Süsstrunk, Norman, Fraefel, Andreas, Weichselbraun, Albert and Brasoveanu, Adrian M.P. (2023). Orbis Annotator: An Open Source Toolkit for the Efficient Annotation and Refinement of Text Corpora. Proceedings of the 4th Conference on Language, Data and Knowledge (LDK 2023), Vienna, Austria.

Abstract

Annotated language data plays an important role in training, fine-tuning and evaluating natural language processing components. Nevertheless, manually annotating language data is still a cumbersome task. This paper presents the Orbis Annotator framework, a user-friendly, easy-to-install, web-based software that supports users in efficiently annotating language data. Orbis Annotator supports standard and collaborative workflows, reuse of language resources through corpus versioning, and provides built-in tools for assessing corpus quality. In addition, it offers an API which enables the use of different clients (e.g., web-based or command-line) and the use of third-party tools that accelerate the annotation process by pre-annotating corpora. The paper concludes with an evaluation that compares its features to other open-source annotation frameworks and a description of two use cases that outline its use in more sophisticated settings.

Downloads and Resources

  1. Reference (BibTeX)
  2. Full Article