Investigating evidence-oriented generation of synthetic text data with a generative large language model in science education

  • Additional Information
    • Publication Date:
      2025
    • Collection:
      University College London (UCL): Figshare
    • Abstract:
      The scarcity and privacy of student data present significant challenges in research on technical support for the diagnosis of students’ conceptions. Whilst synthetic data generation offers promising solutions, its application in science education requires careful consideration of domain-specific aspects. This study investigates an evidence-oriented prompting approach to generate synthetic text data with a generative large language model without direct injection of student data. As evidence, it builds on previous work in biology education on students’ conceptions of the evolutionary adaptation of whales. Drawing from educational research, we propose requirements for data generation and the evaluation of its faithfulness and diversity. Through a mixed-methods approach, we (1) compare characteristics of generated text samples and students’ texts, (2) evaluate how well the evidence-oriented prompting approach generates text samples that align with the categories of intentional and non-intentional adaptation, and (3) examine features between these groups. Our findings demonstrate the potential and limitations of evidence-oriented prompting and highlight the need for data curation procedures. This work contributes to generative AI-assisted research in education by investigating an educationally informed, privacy-preserving prompting approach for synthetic data generation.
    • Accession Number:
      10.6084/m9.figshare.29978415.v1
    • Online Access:
      https://doi.org/10.6084/m9.figshare.29978415.v1
      https://figshare.com/articles/journal_contribution/Investigating_evidence-oriented_generation_of_synthetic_text_data_with_a_generative_large_language_model_in_science_education/29978415
    • Rights:
      CC BY-NC-ND 4.0
    • Accession Number:
      edsbas.473AAE73