[cdif-community] Linking variable description to machine-ready datasets
Asha CODATA
asha at codata.org
Wed Oct 22 08:36:08 EDT 2025
There is currently a lot of interest in how we maximise the
effectiveness of datasets for AI training and fine-tuning through metadata.
This led to the development of the ML Croissant
<https://mlcommons.org/working-groups/data/croissant/> metadata
specification, “an open community-built standardized metadata vocabulary
for ML datasets, including key attributes and properties of datasets, as
well as information required to load these datasets in ML tools. Croissant
enables data interoperability between ML frameworks and beyond, which makes
ML work easier to reproduce and replicate.”
In turn, there is also great potential for AI tools to enhance metadata and
to assist in establishing semantic mappings. A session at IDW recently
explored these issues: ‘AI for Metadata Enhancement, Metadata for AI
Readiness: how do we ensure a virtuous rather than a vicious circle?’
<https://scidatacon.org/event/9/contributions/36/> In this session, and in
a plenary session on AI for Science
<https://scidatacon.org/event/9/contributions/233/>, Slava Tykhonov, Head
of Interoperability and AI at CODATA, presented his work on Semantic
Croissant, an extension of ML Croissant that is powered by the CODATA
Cross-Domain Interoperability Framework (CDIF) <https://cdif.codata.org/>.
This knowledge graph, maintained at the variable level, is designed to
guide and navigate AI through structured expert knowledge. The crucially
important step is to incorporate CDIF’s use of the DDI-CDI
<https://ddialliance.org/ddi-cdi> variable description, which provides a rich
semantic description of the core feature of the dataset: the observable property
that was measured or described. This has benefits for interoperability and
reuse of data and for its effectiveness in the training of AI models. It
also helps situate the variable description as a first-class semantic
object, which is one of the key purposes of CDIF.
There is considerable interest in this work. Slava has been invited to give
a keynote to the NFDI4DataScience Conference
<https://www.nfdi4datascience.de/news/2025/202504_conference2025/>, taking
place on 25–26 November 2025 at Fraunhofer FOKUS
<https://www.fokus.fraunhofer.de/en.html> in Berlin. The NFDI4DS initiative
<https://www.nfdi4datascience.de/> aims to build and sustain a national
research data infrastructure for the Data Science and Artificial
Intelligence community in Germany – an exciting step toward more
interoperable, transparent, responsible and FAIR AI.
Slava has also been invited to speak at a CESSDA AI Workshop, as part of
the 4-day “CESSDA at 50” conference in Bergen, 15–18 June 2026. This
50th-anniversary event will bring together CESSDA Service Providers,
researchers, policy actors, partner organisations, and international
networks to share knowledge and address the evolving landscape of research
and innovation.
Thanks,
Asha
___________________________
*CDIF and AI Sessions at IDW2025*
<https://codata.org/initiatives/making-data-work/cdif/cdif-at-idw2025/>
*CODATA and the Australian Research Data Commons (ARDC) announce the
updated 2025 CODATA Research Data Management Terminology (RDMT)*
<https://codata.org/codata-and-the-australian-research-data-commons-ardc-announce-the-updated-2025-codata-research-data-management-terminology-rdmt/>
*From launch to action: operationalising UNESCO’s open science data
policies guidance for crises*
<https://codata.org/from-launch-to-action-operationalising-unescos-open-science-data-policies-guidance-for-crises/>
*First Climate-Adapt for EOSC Deliverable D1.1: Requirement Analysis and
CLIMATE-ADAPT4EOSC potentialities* <https://doi.org/10.5281/zenodo.17244500>
September 2025 publications
<https://codata.org/read-now-september-2025-publications-in-the-data-science-journal/>
in the CODATA Data Science Journal <https://datascience.codata.org/>
*Stay in touch with CODATA:*
Stay up to date with CODATA activities: join the CODATA International News
list
<http://lists.codata.org/mailman/listinfo/codata-international_lists.codata.org>
Looking for training and career opportunities in data science and data
stewardship? Sign up to the CODATA early career community-run data
science training and careers list
<http://lists.codata.org/mailman/listinfo/data_science_training_lists.codata.org>
Follow us on social media! Bluesky
<https://bsky.app/profile/codata-isc.bsky.social> - LinkedIn
<https://www.linkedin.com/in/simon-hodson-b3711a11/>
___________________________
Asha Law | Program Assistant, CODATA | http://www.codata.org
E-Mail: asha at codata.org
Tel (Office): +33 1 45 25 04 96
CODATA (Committee on Data of the International Science Council), 5 rue
Auguste Vacquerie, 75016 Paris, FRANCE