An Efficient Workflow Towards Improving Classifiers in Low-Resource Settings with Synthetic Data

Citation

Brasoveanu, Adrian M. P., Weichselbraun, Albert, Nixon, Lyndon and Scharl, Arno. (2024). An Efficient Workflow Towards Improving Classifiers in Low-Resource Settings with Synthetic Data. Proceedings of the 9th SwissText Conference, Shared Task on the Automatic Classification of the United Nations’ Sustainable Development Goals (SDGs) and Their Targets in English Scientific Abstracts, Chur, Switzerland.

Abstract

Correctly classifying scientific abstracts according to the 17 Sustainable Development Goals (SDGs) proposed by the United Nations (UN) remains challenging because the Shared Task’s dataset is imbalanced. This paper presents a method for creating a baseline using RoBERTa and data augmentation that offers good overall performance on this imbalanced dataset. Notably, even though the alignment between synthetic gold and real gold was only marginally better than what would be expected by chance alone, the final scores remained acceptable.
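
The abstract does not say how the alignment between synthetic gold and real gold was measured. As an illustration only, the sketch below assumes a chance-corrected agreement statistic, Cohen's kappa, where a value near 0 means agreement no better than chance and 1 means perfect agreement. The function name `cohen_kappa` and the label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which both sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each source's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both sources always use the same single label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical SDG ids, invented for illustration (not data from the paper).
synthetic_gold = [1, 2, 3, 1, 2, 3, 1, 2]
real_gold = [1, 3, 2, 1, 1, 3, 2, 2]

kappa = cohen_kappa(synthetic_gold, real_gold)
print(f"Cohen's kappa: {kappa:.3f}")
```

A value only slightly above 0 on such a comparison would correspond to the "marginally better than chance" alignment the abstract describes.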

Downloads and Resources

  1. Reference (BibTeX)
  2. Full Article