Fine-tuning small and open LLMs to automate geoscience data analysis workflows: A scalable approach

Author(s): Jiyin Zhang; Wenjia Li; Xiang Que; Weilin Chen; Chenhao Li; Xiaogang Ma
Source:
Applied Computing and Geosciences, Vol 28, Article 100311 (2025)
Subject Terms:
Open LLM; Fine-tuning; Mindat; Data analytics; Data science; Geography. Anthropology. Recreation; Geology; QE1-996.5; Electronic computers. Computer science; QA75.5-76.95
Document Type:
article
Language:
English
Online Access:
https://doaj.org/article/645d2dfd8df048bba28d4697ee666bea

Additional Information
- Publication Information:
  Elsevier, 2025.
- Publication Date:
  2025
- Collection:
  LCC:Geography. Anthropology. Recreation
  LCC:Geology
  LCC:Electronic computers. Computer science
- Abstract:
  With the recent integration of Large Language Models (LLMs) into geoscience applications, agentic LLM-driven workflows have emerged as an innovative approach to streamline automated data analysis processes. Advanced proprietary LLMs like ChatGPT demonstrate strong performance in customized workflows due to their substantial computational resources and extensive pretraining on diverse datasets. However, deploying such workflows with commercial LLMs can incur significant costs, especially in terms of token consumption, necessitating a shift toward open-source models. In this study, we fine-tuned an open-source LLM (Llama 3.1) to handle geoscience data analysis tasks, leveraging the self-instruct method to generate synthetic training datasets. The proposed pipeline for designing LLM-driven workflows and fine-tuning open-source models using synthetic datasets enables scalability, allowing the integration of additional LLM agents to accommodate more complex tasks. Furthermore, this workflow serves as a template for researchers in other domains to develop similar solutions tailored to their specific needs. Our experimental evaluation compares the performance of ChatGPT-4o with the fine-tuned Llama 3.1 in the context of the proposed geoscience data analysis workflow. Results demonstrate that the fine-tuned open-source model achieves performance comparable to proprietary models, extending the applicability of open LLMs to domain-specific agentic workflows in data analysis.
- File Description:
  electronic resource
- ISSN:
  2590-1974
- Relation:
  http://www.sciencedirect.com/science/article/pii/S259019742500093X; https://doaj.org/toc/2590-1974
- DOI:
  10.1016/j.acags.2025.100311
- Accession Number:
  edsdoj.645d2dfd8df048bba28d4697ee666bea
