[cdif-community] Planning CDIF use for plant phenotyping data
Donald Hobern
donald.hobern at adelaide.edu.au
Mon Oct 27 02:43:15 EDT 2025
Thanks so much, Arofan.
The diagram I shared was an old sketch rather than anything formal, but you are correct - almost all of it was context for the study. Nevertheless, to make the RO-Crates we generate as reusable as possible, I'm trying to anchor all context classes in schemas that help us expose common patterns, including SOSA/SSN, DCAT and DDI-CDI. I'm also trying to ensure that every class is a subclass of the most appropriate schema.org class or of a class that already has an explicit relationship to SOSA.
Here is a draft turtle schema for the APPN domain: https://github.com/aus-plant-phenomics-network/appn-schema/blob/main/appn-schema.ttl (mainly really classes and inheritance for now, but most properties will be inherited from elsewhere).
There are UML diagrams in this folder: https://github.com/aus-plant-phenomics-network/appn-schema/tree/main/ttl_uml - these were all generated from the turtle:
* A very wide diagram showing all currently defined subclass relationships: https://github.com/aus-plant-phenomics-network/appn-schema/blob/main/ttl_uml/ttl_appn_full.png
* The same but excluding the packages for PPEO, CDI, SOSA and SSN so it's easier to read: https://github.com/aus-plant-phenomics-network/appn-schema/blob/main/ttl_uml/ttl_appn-ppeo-sosa-ssn-cdi.png
* A diagram for each APPN class showing the relationships from the turtle
I'm working up some example data using the schema concepts, so I should be able to discuss it at Dagstuhl.
Donald
Donald Hobern
Data Management Director, Australian Plant Phenomics Network
University of Adelaide - working from Canberra, ACT
P (04) 20511471 | plantphenomics.org.au<http://www.plantphenomics.org.au/> | subscribe to our news<https://www.plantphenomics.org.au/news/#news-from-our-blog>
APPN acknowledges the Traditional Custodians of Country throughout Australia and their connections to land, sea and community. We pay our respect to their Elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.
The Australian Plant Phenomics Network (APPN) is supported by the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS<https://www.education.gov.au/national-collaborative-research-infrastructure-strategy-ncris>)
APPN National Head Office at the University of Adelaide<https://www.thewaite.org/> (UoA - CRICOS provider number 00123M). This email (and any attachment) is confidential and may also be privileged or otherwise exempt from disclosure. It is intended only for the addressee. If you are not the intended recipient, please delete it and do not send it on, copy it or disclose its contents. No assurance is given about the security of information sent electronically. Think green and read on the screen.
________________________________
From: Arofan Gregory <arofan at codata.org>
Sent: Friday, 24 October 2025 11:54 PM
To: Donald Hobern <donald.hobern at adelaide.edu.au>
Cc: cdif-community at lists.codata.org <cdif-community at lists.codata.org>
Subject: Re: [cdif-community] Planning CDIF use for plant phenotyping data
Donald:
First, to answer your specific questions, and second to make some general observations.
Question 1 (Variable cascade in RDF):
There is no reason you should not instantiate the Represented and Instance Variables as separate objects with their own URIs. For a one-off publication of a data set whose structure is not reused, you might collapse these into a simple set of Instance Variables (which can be doubly typed and so also act as Represented Variables for ad hoc reuse). In your case, though, the best approach is to make the reusable variables clearly identified, stand-alone Represented Variables, to which the Instance Variables have a "uses" relationship. The two are maintained separately, and each has its own identity (and URI).
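A minimal sketch of that separation in Python, using plain tuples as triples so that nothing beyond the standard library is needed. The `CDI` and `EX` namespace strings, the `uses` property name and the variable IRIs are placeholders chosen for illustration; they are assumptions, not terms confirmed from the DDI-CDI RDF encoding.

```python
# Placeholder namespaces: illustrative assumptions, not confirmed DDI-CDI IRIs.
CDI = "https://example.org/cdi#"
EX = "https://example.org/appn/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def variable_cascade(represented_iri, instance_iris):
    """Return triples for one reusable RepresentedVariable and the
    dataset-specific InstanceVariables that point to it."""
    triples = {(represented_iri, RDF_TYPE, CDI + "RepresentedVariable")}
    for iri in instance_iris:
        triples.add((iri, RDF_TYPE, CDI + "InstanceVariable"))
        # "uses" is a hypothetical property name standing in for the real link.
        triples.add((iri, CDI + "uses", represented_iri))
    return triples


def to_ntriples(triples):
    """Serialise IRI-only triples as sorted N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


# One shared definition, reused by a column in each of two hypothetical studies.
dry_biomass = EX + "variables/dry-biomass"
columns = [
    EX + "study-1/columns/dry-biomass",
    EX + "study-2/columns/dry-biomass",
]
graph = variable_cascade(dry_biomass, columns)
print(to_ntriples(graph))
```

Each study column gets its own IRI and type, while the shared definition is stated once and only referenced, so it can be maintained independently of any single data set.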
Question 2:
I will prepare a simple example for you. I will send this along presently.
Question 3:
I have not yet seen a dual use of DCAT and DDI-CDI, but examples may exist. There is one person - Pascal Heus - who has been developing Python libraries for both DDI-CDI and DCAT, and he may have examples. I can check for you and tell you at Dagstuhl (he will be attending the workshop the week before). The "physical data set" in DDI-CDI is really the equivalent of a DCAT distribution, and what we describe in DDI-CDI is the distribution, because this is what we assign access rights to. The DDI-CDI "data set" is the equivalent of a DCAT one: it has the same logical contents, but may have differences in structure and format across different distributions. The DDI-CDI "data store" is a repository of logical records, which may produce many different data sets. There is no directly corresponding object in DCAT.
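The correspondences just described can be written down as a small lookup table. This is only a sketch of the mapping as stated in this message; the dictionary keys are informal concept names rather than DDI-CDI class identifiers, and the helper name `dcat_equivalent` is hypothetical.

```python
# Correspondences as described above; None marks "no direct DCAT equivalent".
CDI_TO_DCAT = {
    "physical data set": "dcat:Distribution",
    "data set": "dcat:Dataset",
    "data store": None,
}


def dcat_equivalent(cdi_term):
    """Look up the DCAT class corresponding to an informal DDI-CDI concept name."""
    key = cdi_term.strip().lower()
    if key not in CDI_TO_DCAT:
        raise KeyError(f"unknown DDI-CDI concept: {cdi_term!r}")
    return CDI_TO_DCAT[key]


for term in CDI_TO_DCAT:
    print(term, "->", dcat_equivalent(term))
```

Returning `None` for the data store, rather than raising, keeps "known concept with no DCAT counterpart" distinct from "unknown concept".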
Note that in the week prior to the Provenance workshop at Dagstuhl we will be working with some folks from W3C (including the people from the group which produced DCAT) to look at how the variable cascade might be publishable as a W3C recommendation, similar to SOSA/SSN. The model for this would be DDI-CDI, so the approach you are taking is in line with future developments.
General Observations:
The value of CDIF is that it does not require the use of domain-specific standards. What you have described is a rich set of domain-level agreements for data sharing - CDIF is in no way intended to replace that. The thinking is that if a domain has such a standard, that domain standard can be mapped into an equivalent CDIF form, and that domain-external form is used to provide the resources to other adjacent domains or infrastructures. The CDIF4XAS OSCARS Project is an excellent example of that. Even though they use Schema.org rather than DCAT, the general mapping approach from their community standards to CDIF is in line with this vision. We just published the first (in-progress) stage of that mapping:
Zenodo: https://zenodo.org/records/17421917
GitHub: https://github.com/CDIF-4-XAS/XAS-CDIF
A lot of what I see in your diagram falls into the category of "Context" for CDIF. I would ask that you come prepared to use your use case as an example for the work at Dagstuhl - we have been playing with PROV and some other standards (I-ADOPT) for describing these sorts of information, and there were also some interesting explorations done during WorldFAIR in the Clinical Trials space using Schema.org (https://zenodo.org/records/7887385). The XAS project is also running into some of the same requirements for expressing some of the information about data sources, experiments, etc. This would be an excellent use case for the coming workshop.
I look forward to seeing you there and talking more about this.
Cheers,
Arofan
On Fri, Oct 24, 2025 at 1:53 AM Donald Hobern <donald.hobern at adelaide.edu.au> wrote:
I'm going to follow up here with some more details, which may or may not clarify my issues. Here is the pattern we've been planning to use:
1. Assume one of the APPN nodes has a variable it standardly uses in the Study datasets it publishes, something like dry biomass as a miappe:Trait measured in g/m2 as a miappe:Scale.
2. The node publishes a miappe:ObservedVariable on a resolvable IRI that provides the definition, including properties documenting the Trait and Scale - we're developing a pipeline so each node can manage and extend such a list over time.
3. Each RO-Crate dataset reuses the ObservedVariable instance to document the corresponding column of dry biomass values in the tabular data.
4. The RO-Crate metadata also identifies that these values were produced by a sosa:Observation which references the associated miappe:Method (== sosa:Procedure).
I'd like to understand how much we would need to modify this to benefit from DDI-CDI. I get the impression, at the very least, that the ObservedVariable instance in step 2 would need to be a cdi:RepresentedVariable but that the one in step 3 would be a cdi:InstanceVariable, and I feel that means they should have different IRIs - otherwise the combined graph would end up defeating the point of having InstanceVariable at all.
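The concern about the combined graph can be illustrated with a small Python sketch that merges a node's variable list with one RO-Crate's metadata, once with a shared IRI and once with separate IRIs. All identifiers here (`ex:dry-biomass`, `ex:study-1/dry-biomass`, `ex:uses`) are hypothetical compact names used only for illustration.

```python
RDF_TYPE = "rdf:type"


def doubly_typed(triples, type_a, type_b):
    """Return subjects that carry both types in the given set of triples."""
    typed_a = {s for s, p, o in triples if p == RDF_TYPE and o == type_a}
    typed_b = {s for s, p, o in triples if p == RDF_TYPE and o == type_b}
    return typed_a & typed_b


# Hypothetical statements published by the node (step 2).
node_list = {("ex:dry-biomass", RDF_TYPE, "cdi:RepresentedVariable")}

# Option A: the RO-Crate (step 3) reuses the very same IRI for its column.
crate_same_iri = {("ex:dry-biomass", RDF_TYPE, "cdi:InstanceVariable")}

# Option B: the RO-Crate mints its own IRI and links back to the definition.
crate_own_iri = {
    ("ex:study-1/dry-biomass", RDF_TYPE, "cdi:InstanceVariable"),
    ("ex:study-1/dry-biomass", "ex:uses", "ex:dry-biomass"),
}

merged_shared = node_list | crate_same_iri
merged_separate = node_list | crate_own_iri

print(doubly_typed(merged_shared, "cdi:RepresentedVariable", "cdi:InstanceVariable"))
# → {'ex:dry-biomass'}
print(doubly_typed(merged_separate, "cdi:RepresentedVariable", "cdi:InstanceVariable"))
# → set()
```

With a shared IRI, every dataset's column collapses onto one node in the merged graph, so the dataset-specific variable can no longer be told apart from the reusable definition; with separate IRIs the two stay distinct.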
Aside from that aspect, and assuming the Trait and Scale are modeled appropriately and the properties linking them to my ObservedVariable are subproperties of CDI ones, would there be more I need to do to benefit from DDI-CDI for cross-dataset and cross-domain variable interoperability?
Thanks so much.
Donald
--
cdif-community mailing list
cdif-community at lists.codata.org
http://lists.codata.org/mailman/listinfo/cdif-community_lists.codata.org