Investigating evidence-oriented generation of synthetic text data with a generative large language model in science education

Author(s): Judith Stanja; Sarah Dannemann; Johannes Krugel; Anett Hoppe
Subject Terms:
Genetics; Biotechnology; Science Policy; Plant Biology; Biological Sciences not elsewhere classified; Information Systems not elsewhere classified; Synthetic text data; artificial intelligence; interdisciplinary/transdisciplinary/convergent; alternative conception
Document Type:
article in journal/newspaper
Language:
unknown

Additional Information
- Publication Date:
  2025
- Collection:
  University College London (UCL): Figshare
- Abstract:
  The scarcity and privacy of student data present significant challenges in research on technical support of the diagnosis of students’ conceptions. Whilst synthetic data generation offers promising solutions, its application in science education requires careful consideration of domain-specific aspects. This study investigates an evidence-oriented prompting approach to generate synthetic text data with a generative large language model without direct injection of student data. As evidence, it builds on previous work in biology education on students’ conceptions of the evolutionary adaptation of whales. Drawing from educational research, we propose requirements for data generation and the evaluation of its faithfulness and diversity. Through a mixed-methods approach, we (1) compare characteristics of generated text samples and students’ texts, (2) evaluate how well the evidence-oriented prompting approach generates text samples that align with the categories of intentional and non-intentional adaptation, and (3) examine features between these groups. Our findings demonstrate the potential and limitations of evidence-oriented prompting and highlight the need for data curation procedures. This work contributes to generative AI-assisted research in education by investigating an educationally informed, privacy-preserving prompting approach for synthetic data generation.
- DOI:
  10.6084/m9.figshare.29978415.v1
- Online Access:
  https://doi.org/10.6084/m9.figshare.29978415.v1
  https://figshare.com/articles/journal_contribution/Investigating_evidence-oriented_generation_of_synthetic_text_data_with_a_generative_large_language_model_in_science_education/29978415
- Rights:
  CC BY-NC-ND 4.0
- Accession Number:
  edsbas.473AAE73
