[cdif-community] Planning CDIF use for plant phenotyping data
Arofan Gregory
arofan at codata.org
Tue Nov 11 06:54:26 EST 2025
Donald:
This one has been fixed.
Cheers,
Arofan
On Tue, Nov 11, 2025 at 6:43 AM Arofan Gregory <arofan at codata.org> wrote:
> Donald:
>
> OK - you have immediately spotted my oversight! I should have added an
> explicit "index" property to reflect the sequence. I will correct this and
> re-send.
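
Why an explicit index helps: RDF triples carry no order, and a JSON-LD array is treated as unordered unless it is declared as an @list, so the position of an entry in the serialized "has" list may not survive a round trip. The sketch below is only an illustration of that point. The dictionary keys and column names are hypothetical stand-ins and are not taken from the DDI-CDI model or from the attached example.

```python
import random

# Hypothetical, simplified stand-ins for the position entries of a layout.
# Key names ("id", "column", "index") are illustrative assumptions.
positions = [
    {"id": "pos-date", "column": "date", "index": 1},
    {"id": "pos-plot", "column": "plot_id", "index": 2},
    {"id": "pos-biomass", "column": "dry_biomass", "index": 3},
]


def column_order(entries):
    """Return column names ordered by the explicit index value."""
    return [e["column"] for e in sorted(entries, key=lambda e: e["index"])]


# Simulate a serializer or triple store that does not preserve list order.
shuffled = positions[:]
random.shuffle(shuffled)

# The column sequence is still recoverable because each entry states its index.
print(column_order(shuffled))
```

With the index stated on each entry, a consumer does not need to rely on the order in which the entries happen to be listed.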
>
> Cheers,
>
> Arofan
>
> On Mon, Nov 10, 2025 at 10:36 PM Donald Hobern <
> donald.hobern at adelaide.edu.au> wrote:
>
>> Thanks so much, Arofan.
>>
>> I'll get my head around this and then come back to you. This UML seems to
>> capture the structure. I was a little surprised to see that the *has* property
>> is so generic and collects up functionally different parts of the
>> DataStructure (components and key) and SegmentLayout (mappings and
>> positions), although the semantics can clearly be inferred from the
>> associated object. Should the Positions include an index, or is everything
>> inferred from the sequence in the list of *has* objects?
>>
>> Looking forward to next week.
>>
>> Donald
>>
>>
>>
>> *Donald Hobern*
>> Data Management Director, Australian Plant Phenomics Network
>> University of Adelaide - working from Canberra, ACT
>>
>> *P* (04) 20511471 | plantphenomics.org.au
>> <http://www.plantphenomics.org.au/> | subscribe to our news
>> <https://www.plantphenomics.org.au/news/#news-from-our-blog>
>> APPN acknowledges the Traditional Custodians of Country throughout
>> Australia and their connections to land, sea and community. We pay our
>> respect to their Elders past and present and extend that respect to all
>> Aboriginal and Torres Strait Islander peoples today.
>> The Australian Plant Phenomics Network (APPN) is supported by the
>> Australian Government’s National Collaborative Research Infrastructure
>> Strategy (NCRIS
>> <https://www.education.gov.au/national-collaborative-research-infrastructure-strategy-ncris>
>> )
>> APPN National Head Office at the University of Adelaide
>> <https://www.thewaite.org/> (UoA - CRICOS provider number 00123M). This
>> email (and any attachment) is confidential and may also be privileged or
>> otherwise exempt from disclosure. It is intended only for the addressee. If
>> you are not the intended recipient, please delete it and do not send it on,
>> copy it or disclose its contents. No assurance is given about the security
>> of information sent electronically. Think green and read on the screen.
>> ------------------------------
>> *From:* Arofan Gregory <arofan at codata.org>
>> *Sent:* Monday, 10 November 2025 9:58 PM
>> *To:* Donald Hobern <donald.hobern at adelaide.edu.au>
>> *Cc:* cdif-community at lists.codata.org <cdif-community at lists.codata.org>
>> *Subject:* Re: [cdif-community] Planning CDIF use for plant phenotyping
>> data
>>
>> ------------------------------
>> Donald:
>>
>> I promised to send along a simple example of DDI-CDI, using the CDIF
>> profile. Please find one in the attached ZIP. Apologies for taking so
>> long, but you know how this goes(!)
>>
>> (1) This file does not contain any coded variables - only string,
>> numeric, and date
>> (2) The structure of the JSON-LD is currently under discussion, so the
>> RDF being conveyed is correct, but the nesting in a CDIF Schema.org context
>> might change how the JSON-LD is structured. This will be discussed at the
>> Dagstuhl workshop.
>>
>> Cheers,
>>
>> Arofan
>>
>> On Fri, Oct 24, 2025 at 8:54 AM Arofan Gregory <arofan at codata.org> wrote:
>>
>> Donald:
>>
>> First I will answer your specific questions, and then I will make some
>> general observations.
>>
>> *Question 1 (Variable cascade in RDF):*
>>
>> There is no reason you should not instantiate the Represented and
>> Instance Variables as separate objects with their own URIs. For a one-off
>> publication of a data set whose data structure is not reused, you might
>> want to collapse these into a simple set of Instance Variables (which
>> simultaneously act as Represented Variables for ad hoc reuse, and can be
>> doubly typed). In your case, however, the best approach is to have the
>> reusable variables be clearly identified stand-alone Represented Variables,
>> to which Instance Variables have a "uses" relationship. They are maintained
>> separately and have their own identity (and URI).
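
A minimal sketch of the separation described above, written as plain Python tuples rather than with an RDF library. All IRIs are hypothetical placeholders, including the "cdi" namespace, and the property name "uses" simply follows the wording in this message; the exact term in the published DDI-CDI RDF vocabulary may differ.

```python
# Placeholder namespaces: every IRI below is a hypothetical example.
CDI = "https://example.org/cdi#"
VARS = "https://example.org/appn/variables/"
DATA = "https://example.org/appn/datasets/study-001/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The reusable variable and the dataset column are two distinct resources.
represented = VARS + "dry-biomass"
instance = DATA + "col-dry-biomass"

triples = {
    (represented, RDF_TYPE, CDI + "RepresentedVariable"),
    (instance, RDF_TYPE, CDI + "InstanceVariable"),
    # The column points at the shared definition instead of being the same node.
    (instance, CDI + "uses", represented),
}


def represented_for(instance_iri, graph):
    """Return the Represented Variables that an Instance Variable uses."""
    return [o for s, p, o in graph if s == instance_iri and p == CDI + "uses"]


print(represented_for(instance, triples))
```

The Represented Variable can then be maintained at its own IRI, while each data set mints its own Instance Variable IRIs that link back to it.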
>>
>> *Question 2:*
>>
>> I will prepare a simple example for you. I will send this along presently.
>>
>> *Question 3:*
>>
>> I have not yet seen a dual use of DCAT and DDI-CDI, but examples may exist.
>> There is one person - Pascal Heus - who has been developing Python
>> libraries for both DDI-CDI and DCAT, and he may have examples. I can
>> check for you and tell you at Dagstuhl (he will be attending the workshop
>> the week before). The "physical data set" in DDI-CDI is really the
>> equivalent of a DCAT distribution, and what we describe in DDI-CDI is the
>> distribution, because this is what we assign access rights to. The DDI-CDI
>> "data set" is the equivalent of a DCAT one: it has the same logical
>> contents, but may have differences in structure and format across different
>> distributions. The DDI-CDI "data store" is a repository of logical records,
>> which may produce many different data sets. There is no directly
>> corresponding object in DCAT.
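
The correspondences described in this answer can be written down as a small lookup table. This is only a restatement of the paragraph above in code form, not an official crosswalk; the DDI-CDI concept labels are the informal names used in this message, and the helper function is hypothetical.

```python
from typing import Dict, Optional

# Informal DDI-CDI concept name -> closest DCAT class, per the text above.
# None records that no directly corresponding DCAT object exists.
DCAT_EQUIVALENT: Dict[str, Optional[str]] = {
    "physical data set": "dcat:Distribution",
    "data set": "dcat:Dataset",
    "data store": None,
}


def dcat_term(cdi_concept: str) -> str:
    """Describe the DCAT counterpart of a DDI-CDI concept, if there is one."""
    term = DCAT_EQUIVALENT[cdi_concept]
    return term if term is not None else "no direct DCAT equivalent"


for concept in DCAT_EQUIVALENT:
    print(f"{concept}: {dcat_term(concept)}")
```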
>>
>> Note that in the week prior to the Provenance workshop at Dagstuhl we
>> will be working with some folks from W3C (including the people from the
>> group which produced DCAT) to look at how the variable cascade might be
>> publishable as a W3C recommendation, similar to SOSA/SSN. The model for
>> this would be DDI-CDI, so the approach you are taking is in line with
>> future developments.
>>
>> *General Observations:*
>>
>> The value of CDIF is that it does not require the use of domain-specific
>> standards. What you have described is a rich set of domain-level agreements
>> for data sharing - CDIF in no way is intended to replace that. The thinking
>> is that if a domain has such a standard, that domain standard can be mapped
>> into an equivalent CDIF form, and that domain-external form is used to
>> provide the resources to other adjacent domains or infrastructures. The
>> CDIF4XAS OSCARS Project is an excellent example of that. Even though they
>> use Schema.org rather than DCAT, the general mapping approach from their
>> community standards to CDIF is in line with this vision. We just published
>> the first (in-progress) stage of that mapping:
>>
>>
>>
>> Zenodo: https://zenodo.org/records/17421917
>>
>> GitHub: https://github.com/CDIF-4-XAS/XAS-CDIF
>>
>>
>>
>> A lot of what I see in your diagram falls into the category of "Context"
>> for CDIF. I would ask that you come prepared to use your use case as an
>> example for the work at Dagstuhl - we have been playing with PROV and some
>> other standards (I-ADOPT) for describing these sorts of information, and
>> there were also some interesting explorations done during WorldFAIR in the
>> Clinical Trials space using Schema.org (
>> https://zenodo.org/records/7887385). The XAS project is also running
>> into some of the same requirements for expressing some of the information
>> about data sources, experiments, etc. This would be an excellent use case
>> for the coming workshop.
>>
>>
>> I look forward to seeing you there and talking more about this.
>>
>> Cheers,
>>
>> Arofan
>>
>> On Fri, Oct 24, 2025 at 1:53 AM Donald Hobern <
>> donald.hobern at adelaide.edu.au> wrote:
>>
>> I'm going to follow up here with some more details, which may or may not
>> clarify my issues. Here is the pattern we've been planning to use:
>>
>>
>> 1. Assume one of the APPN nodes has a variable it standardly uses in
>> the Study datasets it publishes, something like dry biomass as a
>> miappe:Trait measured in g/m2 as a miappe:Scale.
>> 2. The node publishes a miappe:ObservedVariable on a resolvable IRI
>> that provides the definition, including properties documenting the Trait
>> and Scale - we're developing a pipeline so each node can manage and extend
>> such a list over time.
>> 3. Each RO-Crate dataset reuses the ObservedVariable instance to
>> document the corresponding column of dry biomass values in the tabular data.
>> 4. The RO-Crate metadata also identifies that these values were
>> produced by a sosa:Observation which references the associated
>> miappe:Method (== sosa:Procedure).
>>
>>
>> I'd like to understand how much we would need to modify this to benefit
>> from DDI-CDI. I get the impression at the very least that the ObservedVariable
>> instance in 2 would need to be a cdi:RepresentedVariable but that the one
>> in 3 would be a cdi:InstanceVariable, and I feel that means they should
>> have different IRIs - otherwise the combined graph would end up defeating
>> the point of having InstanceVariable at all.
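
The concern about shared IRIs can be illustrated with a toy merge of two data set descriptions. This is a sketch under stated assumptions: the IRIs, the "ex:" property names and the column names are all invented for illustration and come from neither MIAPPE nor DDI-CDI.

```python
# Hypothetical IRI of the published, reusable variable definition.
SHARED = "https://example.org/appn/variables/dry-biomass"


def describe(subject, graph):
    """Collect property -> set of values for one subject in a triple set."""
    out = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, set()).add(o)
    return out


# Pattern A: each data set attaches its column details to the shared IRI.
study1 = {(SHARED, "ex:columnName", "biomass_g_m2")}
study2 = {(SHARED, "ex:columnName", "DryBiomass")}
merged_a = study1 | study2
# After merging, one node carries both column names and the link between
# a column name and its data set is lost.
print(describe(SHARED, merged_a))

# Pattern B: each data set mints its own column IRI and points to the shared one.
col1 = "https://example.org/appn/datasets/study-001/col-dry-biomass"
col2 = "https://example.org/appn/datasets/study-002/col-dry-biomass"
merged_b = {
    (col1, "ex:columnName", "biomass_g_m2"),
    (col1, "ex:uses", SHARED),
    (col2, "ex:columnName", "DryBiomass"),
    (col2, "ex:uses", SHARED),
}
# Column-level details stay separate, while both columns remain comparable
# through the shared definition.
print(describe(col1, merged_b))
print(describe(col2, merged_b))
```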
>>
>> Aside from that aspect, and assuming the Trait and Scale are modeled
>> appropriately and the properties linking them to my ObservedVariable are
>> subproperties of CDI ones, would there be more I need to do to benefit from
>> DDI-CDI for cross-dataset and cross-domain variable interoperability?
>>
>> Thanks so much.
>>
>> Donald
>>
>> --
>> cdif-community mailing list
>> cdif-community at lists.codata.org
>> http://lists.codata.org/mailman/listinfo/cdif-community_lists.codata.org
>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 229710 bytes
Desc: not available
URL: <http://lists.codata.org/pipermail/cdif-community_lists.codata.org/attachments/20251111/a4936661/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SimpleSample.zip
Type: application/x-zip-compressed
Size: 1927 bytes
Desc: not available
URL: <http://lists.codata.org/pipermail/cdif-community_lists.codata.org/attachments/20251111/a4936661/attachment-0001.bin>