
Fine-tuning small and open LLMs to automate geoscience data analysis workflows: A scalable approach

  • Additional Information
    • Publication Information:
      Elsevier, 2025.
    • Publication Date:
      2025
    • Collection:
      LCC:Geography. Anthropology. Recreation
      LCC:Geology
      LCC:Electronic computers. Computer science
    • Abstract:
      With the recent integration of Large Language Models (LLMs) into geoscience applications, agentic LLM-driven workflows have emerged as an innovative approach to streamline automated data analysis processes. Advanced proprietary LLMs like ChatGPT demonstrate strong performance in customized workflows due to their substantial computational resources and extensive pretraining on diverse datasets. However, deploying such workflows with commercial LLMs can incur significant costs, especially in terms of token consumption, necessitating a shift toward open-source models. In this study, we fine-tuned an open-source LLM (Llama 3.1) to handle geoscience data analysis tasks, leveraging the self-instruct method to generate synthetic training datasets. The proposed pipeline for designing LLM-driven workflows and fine-tuning open-source models using synthetic datasets enables scalability, allowing the integration of additional LLM agents to accommodate more complex tasks. Furthermore, this workflow serves as a template for researchers in other domains to develop similar solutions tailored to their specific needs. Our experimental evaluation compares the performance of ChatGPT-4o with the fine-tuned Llama 3.1 in the context of the proposed geoscience data analysis workflow. Results demonstrate that the fine-tuned open-source model achieves performance comparable to proprietary models, extending the applicability of open LLMs to domain-specific agentic workflows in data analysis.
    • File Description:
      electronic resource
    • ISSN:
      2590-1974
    • Relation:
      http://www.sciencedirect.com/science/article/pii/S259019742500093X; https://doaj.org/toc/2590-1974
    • DOI:
      10.1016/j.acags.2025.100311
    • Accession Number:
      edsdoj.645d2dfd8df048bba28d4697ee666bea