Scouting out the Border: Leveraging Explainable AI to Generate Synthetic Training Data for SDG Classification

Citation

Süsstrunk, Norman, Weichselbraun, Albert, Murk, Andreas, Waldvogel, Roger and Glatzl, André. (2024). Scouting out the Border: Leveraging Explainable AI to Generate Synthetic Training Data for SDG Classification. Proceedings of the 9th SwissText Conference, Shared Task on the Automatic Classification of the United Nations’ Sustainable Development Goals (SDGs) and Their Targets in English Scientific Abstracts, Chur, Switzerland

Abstract

This paper discusses the use of synthetic training data for training and optimizing a DistilBERT-based classifier for the SwissText 2024 Shared Task, which focused on classifying the United Nations' Sustainable Development Goals (SDGs) in scientific abstracts. The proposed approach uses Large Language Models (LLMs) to generate synthetic training data based on the test data provided by the shared task organizers. We then train a classifier on the synthetic dataset, evaluate the system on gold standard data, and use explainable AI to extract problematic features that caused incorrect classifications. Generating synthetic data that demonstrates the use of these problematic features within the correct class helps the system learn from its past mistakes. An evaluation demonstrates that the suggested approach significantly improves classification performance, yielding the best result for Shared Task 1 according to the accuracy metric.

Downloads and Resources

  1. Reference (BibTeX)
  2. Full Article