Abstract: The scarcity and privacy of student data present significant challenges for research on technology-supported diagnosis of students’ conceptions. Whilst synthetic data generation offers promising solutions, its application in science education requires careful consideration of domain-specific aspects. This study investigates an evidence-oriented prompting approach that uses a generative large language model to produce synthetic text data without directly injecting student data. As its evidence base, the approach draws on previous biology education research on students’ conceptions of the evolutionary adaptation of whales. Drawing on educational research, we propose requirements for data generation and for evaluating the faithfulness and diversity of the generated data. Using a mixed-methods approach, we (1) compare the characteristics of generated text samples with those of students’ texts, (2) evaluate how well the evidence-oriented prompting approach generates text samples that align with the categories of intentional and non-intentional adaptation, and (3) compare features across these groups. Our findings demonstrate both the potential and the limitations of evidence-oriented prompting and highlight the need for data curation procedures. This work contributes to generative AI-assisted research in education by investigating an education-informed, privacy-preserving prompting approach for synthetic data generation.