[cdif-community] Planning CDIF use for plant phenotyping data

Mon Nov 10 22:36:10 EST 2025

Thanks so much, Arofan.

I'll get my head around this and then come back to you. This UML seems to capture the structure. I was a little surprised to see that the "has" property is so generic and collects up functionally different parts of the DataStructure (components and key) and SegmentLayout (mappings and positions), although the semantics can clearly be inferred from the associated object. Should the Positions include an index, or is everything inferred from the sequence in the list of "has" objects?

Looking forward to next week.

Donald

Donald Hobern
Data Management Director, Australian Plant Phenomics Network
University of Adelaide - working from Canberra, ACT

P (04) 20511471   |   plantphenomics.org.au<http://www.plantphenomics.org.au/>   |   subscribe to our news<https://www.plantphenomics.org.au/news/#news-from-our-blog>

APPN acknowledges the Traditional Custodians of Country throughout Australia and their connections to land, sea and community. We pay our respect to their Elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.
The Australian Plant Phenomics Network (APPN) is supported by the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS<https://www.education.gov.au/national-collaborative-research-infrastructure-strategy-ncris>)
APPN National Head Office at the University of Adelaide<https://www.thewaite.org/> (UoA - CRICOS provider number 00123M). This email (and any attachment) is confidential and may also be privileged or otherwise exempt from disclosure. It is intended only for the addressee. If you are not the intended recipient, please delete it and do not send it on, copy it or disclose its contents. No assurance is given about the security of information sent electronically. Think green and read on the screen.
________________________________
From: Arofan Gregory <arofan at codata.org>
Sent: Monday, 10 November 2025 9:58 PM
To: Donald Hobern <donald.hobern at adelaide.edu.au>
Cc: cdif-community at lists.codata.org <cdif-community at lists.codata.org>
Subject: Re: [cdif-community] Planning CDIF use for plant phenotyping data

Donald:

I promised to send along a simple example of DDI-CDI, using the CDIF profile. Please find one in the attached ZIP.  Apologies for taking so long, but you know how this goes(!)

(1) This file does not contain any coded variables - only string, numeric, and date
(2) The structure of the JSON-LD is currently under discussion, so although the RDF being conveyed is correct, the nesting in a CDIF Schema.org context might change how the JSON-LD is structured. This will be discussed at the Dagstuhl workshop.

Cheers,

Arofan

On Fri, Oct 24, 2025 at 8:54 AM Arofan Gregory <arofan at codata.org> wrote:

Donald:

First, to answer your specific questions, and second to make some general observations.

Question 1 (Variable cascade in RDF):

There is no reason you should not instantiate the Represented and Instance Variables as separate objects with their own URIs. For a one-off publication of a data set, where the data structure is not reused, you might want to collapse these into a simple set of Instance Variables (which simultaneously act as Represented Variables for ad hoc reuse, and can be doubly typed). In your case, though, the best approach is to have the reusable variables be clearly identified, stand-alone Represented Variables, to which Instance Variables have a "uses" relationship. The two are maintained separately and each has its own identity (and URI).
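
A minimal sketch of this separation, written as plain triples in Python so it runs without an RDF library. The example.org IRIs and the `EX_USES` property are hypothetical placeholders, and the namespace IRI is an assumption; the actual DDI-CDI property for the "uses" relationship should be taken from the specification.

```python
# Sketch only: example.org IRIs and EX_USES are hypothetical placeholders.
CDI = "http://ddialliance.org/Specification/DDI-CDI/1.0/RDF/"  # assumed namespace IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX_USES = "https://example.org/vocab/uses"  # stands in for the "uses" relationship

REPRESENTED = CDI + "RepresentedVariable"
INSTANCE = CDI + "InstanceVariable"


def build_graph():
    """Return a set of (subject, predicate, object) triples."""
    # Reusable definition, maintained on its own IRI.
    rv = "https://example.org/variables/dry-biomass"
    # Dataset-specific use of that definition, with a different IRI.
    iv = "https://example.org/datasets/study-001/columns/dry-biomass"
    return {
        (rv, RDF_TYPE, REPRESENTED),
        (iv, RDF_TYPE, INSTANCE),
        (iv, EX_USES, rv),
    }


def types_of(graph, subject):
    """Collect the rdf:type values asserted for one subject."""
    return {o for s, p, o in graph if s == subject and p == RDF_TYPE}


def doubly_typed(graph):
    """Subjects typed as both RepresentedVariable and InstanceVariable."""
    subjects = {s for s, _, _ in graph}
    return {s for s in subjects if {REPRESENTED, INSTANCE} <= types_of(graph, s)}


if __name__ == "__main__":
    # With separate IRIs, no node carries both types.
    print(sorted(doubly_typed(build_graph())))
```

The `doubly_typed` check is one way to tell the two styles apart: it returns an empty set for the separated pattern and lists any node that has been collapsed into both roles.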

Question 2:

I will prepare a simple example for you. I will send this along presently.

Question 3:

I have not yet seen a dual use of DCAT and DDI-CDI, but examples may exist. There is one person - Pascal Heus - who has been doing some development of Python libraries for both DDI-CDI and DCAT, and he may have examples. I can check for you and tell you at Dagstuhl (he will be attending the workshop the week before). The "physical data set" in DDI-CDI is really the equivalent of a DCAT distribution, and what we describe in DDI-CDI is the distribution, because this is what we assign access rights to. The DDI-CDI "data set" is the equivalent of a DCAT data set: it has the same logical contents, but may have differences in structure and format across different distributions. The DDI-CDI "data store" is a repository of logical records, which may produce many different data sets. There is no directly corresponding object in DCAT.
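
The correspondences described above can be written down as a small lookup table. This is only a sketch of the mapping as stated in this message, not an official crosswalk; the function name and the lower-case keys are assumptions made for illustration.

```python
# DDI-CDI concept -> DCAT class, as described in the message above.
# None records that no directly corresponding DCAT object exists.
CDI_TO_DCAT = {
    "physical data set": "dcat:Distribution",
    "data set": "dcat:Dataset",
    "data store": None,
}


def dcat_equivalent(cdi_concept):
    """Look up the DCAT class for a DDI-CDI concept name (case-insensitive)."""
    key = cdi_concept.strip().lower()
    if key not in CDI_TO_DCAT:
        raise KeyError(f"unknown DDI-CDI concept: {cdi_concept!r}")
    return CDI_TO_DCAT[key]


if __name__ == "__main__":
    for concept in CDI_TO_DCAT:
        print(concept, "->", dcat_equivalent(concept))
```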

Note that in the week prior to the Provenance workshop at Dagstuhl we will be working with some folks from W3C (including the people from the group which produced DCAT) to look at how the variable cascade might be publishable as a W3C recommendation, similar to SOSA/SSN. The model for this would be DDI-CDI, so the approach you are taking is in line with future developments.

General Observations:

The value of CDIF is that it does not require the use of domain-specific standards. What you have described is a rich set of domain-level agreements for data sharing - CDIF is in no way intended to replace that. The thinking is that if a domain has such a standard, that domain standard can be mapped into an equivalent CDIF form, and that domain-external form is used to provide the resources to other adjacent domains or infrastructures. The CDIF4XAS OSCARS Project is an excellent example of that. Even though they use Schema.org rather than DCAT, the general mapping approach from their community standards to CDIF is in line with this vision. We just published the first (in-progress) stage of that mapping:

Zenodo: https://zenodo.org/records/17421917

GitHub: https://github.com/CDIF-4-XAS/XAS-CDIF

A lot of what I see in your diagram falls into the category of "Context" for CDIF. I would ask that you come prepared to use your use case as an example for the work at Dagstuhl - we have been playing with PROV and some other standards (I-ADOPT) for describing these sorts of information, and there were also some interesting explorations done during WorldFAIR in the Clinical Trials space using Schema.org (https://zenodo.org/records/7887385). The XAS project is also running into some of the same requirements for expressing some of the information about data sources, experiments, etc. This would be an excellent use case for the coming workshop.

I look forward to seeing you there and talking more about this.

Cheers,

Arofan

On Fri, Oct 24, 2025 at 1:53 AM Donald Hobern <donald.hobern at adelaide.edu.au> wrote:
I'm going to follow up here with some more details, which may or may not clarify my issues. Here is the pattern we've been planning to use:

  1. Assume one of the APPN nodes has a variable it standardly uses in the Study datasets it publishes, something like dry biomass as a miappe:Trait measured in g/m2 as a miappe:Scale.
  2. The node publishes a miappe:ObservedVariable on a resolvable IRI that provides the definition, including properties documenting the Trait and Scale - we're developing a pipeline so each node can manage and extend such a list over time.
  3. Each RO-Crate dataset reuses the ObservedVariable instance to document the corresponding column of dry biomass values in the tabular data.
  4. The RO-Crate metadata also identifies that these values were produced by a sosa:Observation which references the associated miappe:Method (== sosa:Procedure).
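
The four steps above can be sketched as a JSON-LD style node list built in Python. Everything under example.org, the local `#...` identifiers, and the `ex:` terms (`ex:trait`, `ex:scale`, `ex:Column`, `ex:variable`, `ex:producedBy`) are hypothetical placeholders rather than MIAPPE or RO-Crate terms; `sosa:usedProcedure` is the SOSA property that links an Observation to its Procedure.

```python
# Sketch only: example.org IRIs and all "ex:" terms are hypothetical.
BASE = "https://example.org/appn/"


def build_crate_graph():
    """Return a JSON-LD style list of nodes for the four-step pattern."""
    # Step 1: the trait and scale behind the variable.
    trait = {"@id": BASE + "traits/dry-biomass", "@type": "miappe:Trait"}
    scale = {"@id": BASE + "scales/g-per-m2", "@type": "miappe:Scale"}
    # Step 2: the reusable variable, published on its own IRI.
    variable = {
        "@id": BASE + "variables/dry-biomass",
        "@type": "miappe:ObservedVariable",
        "ex:trait": {"@id": trait["@id"]},
        "ex:scale": {"@id": scale["@id"]},
    }
    # Step 4: the observation and the method it used.
    method = {
        "@id": BASE + "methods/oven-drying",
        "@type": ["miappe:Method", "sosa:Procedure"],
    }
    observation = {
        "@id": "#observation-dry-biomass",
        "@type": "sosa:Observation",
        "sosa:usedProcedure": {"@id": method["@id"]},
    }
    # Step 3: the column in the tabular data, documented by the variable.
    column = {
        "@id": "#column-dry-biomass",
        "@type": "ex:Column",
        "ex:variable": {"@id": variable["@id"]},
        "ex:producedBy": {"@id": observation["@id"]},
    }
    return [trait, scale, variable, method, observation, column]


def referenced_ids(node):
    """Yield every @id that a node points to through its properties."""
    for key, value in node.items():
        if key == "@id":
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "@id" in item:
                yield item["@id"]


def dangling_references(graph):
    """Return referenced @ids that have no node of their own in the graph."""
    defined = {node["@id"] for node in graph}
    return {ref for node in graph for ref in referenced_ids(node)} - defined


if __name__ == "__main__":
    print(dangling_references(build_crate_graph()))
```

In this sketch the column and the published variable already have different identifiers, which is the point where a separate instance-level variable could be introduced.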

I'd like to understand how much we would need to modify this to benefit from DDI-CDI. I get the impression, at the very least, that the ObservedVariable instance in step 2 would need to be a cdi:RepresentedVariable but that the one in step 3 would be a cdi:InstanceVariable, and I feel that means they should have different IRIs - otherwise the combined graph would end up defeating the point of having InstanceVariable at all.

Aside from that aspect, and assuming the Trait and Scale are modeled appropriately and the properties linking them to my ObservedVariable are subproperties of CDI ones, would there be more I need to do to benefit from DDI-CDI for cross-dataset and cross-domain variable interoperability?

Thanks so much.

Donald
