2026-06-24

A study published Tuesday in Discover Applied Sciences found that machine learning tools used to support geographical indications, the origin labels that cover wine-growing regions and other place-linked food and drink products, often perform well in internal tests but weaken when checked against independent spatial data.
The paper examines what the authors call “digital terroir,” a digital layer meant to connect a product’s claimed qualities to measurable environmental conditions such as soil, climate, biodiversity, land management and local practices. The researchers argue that the credibility of geographical indications increasingly depends on systems that can audit those claims with verifiable evidence.
The study was led by researchers affiliated with institutions in Brazil, including the Federal University of Sergipe, the State University of Feira de Santana and the Federal Rural University of Pernambuco. It reviewed scientific literature published between 2010 and 2025 on machine learning approaches tied to geographical indications and ecosystem-service auditing.
Using PRISMA-ScR review guidelines, the team started with 272 records and screened them with an automated weighted-score system that, according to the paper, reached 94.2% thematic accuracy. That process produced a final thematic corpus of 148 studies for descriptive, multivariate, network and meta-analysis. Of those, 25 met the full methodological-quality threshold for deeper qualitative review, set at an adapted Mixed Methods Appraisal Tool (MMAT) score of at least 20. The paper reports inter-rater consistency, measured by the intraclass correlation coefficient (ICC), of 0.87.
The researchers found a clear gap between strong reported accuracy in internal validation and weaker results in more demanding external checks. Across the studies reviewed, classifiers often posted internal-validation accuracy of 80% to 100%. But models without spatially independent validation showed an average performance drop of 11.8% in external robustness tests, compared with a 5.6% decline for spatially validated models. The paper reports an effect size of d = 0.95.
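The difference between internal and spatially independent validation can be illustrated with a short Python sketch. It is not the method of the paper or of any study it reviewed. It assumes each sample carries hypothetical "lat" and "lon" fields, and it uses one simple approach, holding out whole grid blocks, so that test samples never share a block with training samples. The function names, the 0.5-degree block size and the 25% test fraction are all illustrative assumptions.

```python
import random
from collections import defaultdict


def block_key(sample, block_size_deg):
    """Map a sample to the grid block containing its coordinates.

    The "lat" and "lon" keys are hypothetical field names for this sketch.
    """
    return (
        int(sample["lat"] // block_size_deg),
        int(sample["lon"] // block_size_deg),
    )


def spatial_block_split(samples, block_size_deg=0.5, test_fraction=0.25, seed=0):
    """Split sample indices so each grid block is entirely train or entirely test."""
    blocks = defaultdict(list)
    for idx, sample in enumerate(samples):
        blocks[block_key(sample, block_size_deg)].append(idx)

    # Shuffle the blocks, not the individual samples, so that neighbouring
    # points in the same block cannot end up on both sides of the split.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)

    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])

    train = [i for k in keys if k not in test_keys for i in blocks[k]]
    test = [i for k in keys if k in test_keys for i in blocks[k]]
    return train, test
```

A random split of the same samples would typically place close neighbours in both sets, which is one way internal accuracy can look stronger than performance on genuinely new territory.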
The authors also found broad methodological fragmentation across the field. They reported modularity of Q = 0.62 and heterogeneity of I² = 58%, figures they say point to inconsistent methods that can limit comparability and regulatory use. Compliance with FAIR data principles, which focus on making data findable, accessible, interoperable and reusable, averaged 34.2 out of 100.
According to the paper, those weaknesses create what it describes as verification asymmetries that may restrict the use of these systems by regulators or third-party auditors. The authors say machine learning models for digital terroir should move away from static classification systems and toward adaptive models that are spatially validated and explainable.
They propose integrity benchmarks for future work, including external degradation of no more than 8%, explainable AI tools that can identify territorial markers behind model decisions, and FAIR compliance of at least 60 out of 100.
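As a rough illustration, the three proposed benchmarks could be checked against a model report as in the Python sketch below. The thresholds come from the paper as described above; everything else is an assumption made for this example, including the ModelReport structure, the field names and the reading of "degradation" as the percentage-point difference between internal and external accuracy.

```python
from dataclasses import dataclass

# Thresholds as reported from the paper's proposed integrity benchmarks.
MAX_EXTERNAL_DEGRADATION = 8.0  # maximum drop, assumed here to be percentage points
MIN_FAIR_SCORE = 60.0           # minimum FAIR compliance, on a 0-100 scale


@dataclass
class ModelReport:
    """Hypothetical summary of one model; not a structure defined in the paper."""
    internal_accuracy: float   # percent, from internal validation
    external_accuracy: float   # percent, from spatially independent data
    fair_score: float          # 0-100
    has_explanations: bool     # whether explainable AI outputs are provided


def external_degradation(report):
    """Drop from internal to external accuracy (assumed definition)."""
    return report.internal_accuracy - report.external_accuracy


def check_benchmarks(report):
    """Return a pass/fail flag for each of the three proposed benchmarks."""
    return {
        "degradation_ok": external_degradation(report) <= MAX_EXTERNAL_DEGRADATION,
        "explainability_ok": report.has_explanations,
        "fair_ok": report.fair_score >= MIN_FAIR_SCORE,
    }


# Illustrative values only; these are not results from any reviewed study.
example = ModelReport(
    internal_accuracy=92.0,
    external_accuracy=80.0,
    fair_score=35.0,
    has_explanations=False,
)
print(check_benchmarks(example))
```

With these made-up numbers, the model would miss all three benchmarks: its accuracy falls by 12 points, it offers no explanations, and its FAIR score sits below 60.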
The findings could matter for beverage producers that rely on geographical indications, especially wine regions but also other origin-based categories, because stronger digital auditing tools may help support sustainability and terroir claims with traceable evidence. That could reduce the risk that environmental marketing tied to place names outpaces what can actually be verified.
The article was published as open access on June 23. The authors reported no external funding and declared no competing interests.