[cdif-community] Planning CDIF use for plant phenotyping data

Arofan Gregory arofan at codata.org
Tue Nov 11 06:43:42 EST 2025


Donald:

OK - you have immediately spotted my oversight! I should have added an
explicit "index" property to reflect the sequence. I will correct this and
re-send.
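
A minimal sketch of why an explicit index is more robust than list order. The keys `name` and `index` and the sample column names are illustrative assumptions, not the actual DDI-CDI or CDIF property names. In JSON-LD, a plain array is treated as an unordered set unless the term is declared as a `@list`, so a sequence inferred only from array position may not survive a round trip through RDF.

```python
import random

# Hypothetical position objects; "name" and "index" are placeholder keys,
# not the real DDI-CDI property names.
positions = [
    {"name": "plot_id", "index": 1},
    {"name": "harvest_date", "index": 2},
    {"name": "dry_biomass", "index": 3},
]


def column_order(position_objects):
    """Recover the column sequence from the explicit index alone."""
    ordered = sorted(position_objects, key=lambda p: p["index"])
    return [p["name"] for p in ordered]


# Simulate a serializer or triple store that does not preserve array order.
shuffled = positions[:]
random.Random(0).shuffle(shuffled)

print(column_order(shuffled))
```

With the index present, the consumer gets the same column order whatever order the objects arrive in.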

Cheers,

Arofan

On Mon, Nov 10, 2025 at 10:36 PM Donald Hobern <
donald.hobern at adelaide.edu.au> wrote:

> Thanks so much, Arofan.
>
> I'll get my head around this and then come back to you. This UML seems to
> capture the structure. I was a little surprised to see that the *has* property
> is so generic and collects up functionally different parts of the
> DataStructure (components and key) and SegmentLayout (mappings and
> positions), although the semantics can clearly be inferred from the
> associated object. Should the Positions include an index, or is everything
> inferred from the sequence in the list of *has* objects?
>
> Looking forward to next week.
>
> Donald
>
>
>
> *Donald Hobern*
> Data Management Director, Australian Plant Phenomics Network
> University of Adelaide - working from Canberra, ACT
>
> *P* (04) 20511471   |   plantphenomics.org.au
> <http://www.plantphenomics.org.au/>   |   subscribe to our news
> <https://www.plantphenomics.org.au/news/#news-from-our-blog>
> APPN acknowledges the Traditional Custodians of Country throughout
> Australia and their connections to land, sea and community. We pay our
> respect to their Elders past and present and extend that respect to all
> Aboriginal and Torres Strait Islander peoples today.
> The Australian Plant Phenomics Network (APPN) is supported by the
> Australian Government’s National Collaborative Research Infrastructure
> Strategy (NCRIS
> <https://www.education.gov.au/national-collaborative-research-infrastructure-strategy-ncris>
> )
> APPN National Head Office at the University of Adelaide
> <https://www.thewaite.org/> (UoA - CRICOS provider number 00123M). This
> email (and any attachment) is confidential and may also be privileged or
> otherwise exempt from disclosure. It is intended only for the addressee. If
> you are not the intended recipient, please delete it and do not send it on,
> copy it or disclose its contents. No assurance is given about the security
> of information sent electronically. Think green and read on the screen.
> ------------------------------
> *From:* Arofan Gregory <arofan at codata.org>
> *Sent:* Monday, 10 November 2025 9:58 PM
> *To:* Donald Hobern <donald.hobern at adelaide.edu.au>
> *Cc:* cdif-community at lists.codata.org <cdif-community at lists.codata.org>
> *Subject:* Re: [cdif-community] Planning CDIF use for plant phenotyping
> data
>
> ------------------------------
> Donald:
>
> I promised to send along a simple example of DDI-CDI, using the CDIF
> profile. Please find one in the attached ZIP.  Apologies for taking so
> long, but you know how this goes(!)
>
> (1) This file does not contain any coded variables - only string, numeric,
> and date variables.
> (2) The structure of the JSON-LD is currently under discussion, so the RDF
> being conveyed is correct, but the nesting in a CDIF Schema.org context
> might change how the JSON-LD is structured. This will be discussed at the
> Dagstuhl workshop.
>
> Cheers,
>
> Arofan
>
> On Fri, Oct 24, 2025 at 8:54 AM Arofan Gregory <arofan at codata.org> wrote:
>
> Donald:
>
> First, to answer your specific questions, and second to make some general
> observations.
>
> *Question 1 (Variable cascade in RDF):*
>
> There is no reason you should not instantiate the Represented and Instance
> Variables as separate objects with their own URIs. For a one-off
> publication of a data set whose data structure is not reused, you might
> collapse these into a simple set of Instance Variables (which
> simultaneously act as Represented Variables for ad hoc reuse, and can be
> doubly typed). In your case, though, the best approach is to have the
> reusable variables be clearly identified stand-alone Represented Variables,
> to which Instance Variables have a "uses" relationship. They are maintained
> separately and have their own identity (and URI).
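
The separation described above can be sketched as plain subject-predicate-object triples. This is only an illustration: the `CDI` namespace, the `uses` property IRI, and every example.org IRI below are placeholders chosen for the sketch, not the published DDI-CDI terms.

```python
# Placeholder namespaces and term names: assumptions for illustration only.
CDI = "https://example.org/cdi#"  # stand-in for the DDI-CDI vocabulary
NODE = "https://example.org/appn-node/variables/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One reusable Represented Variable, maintained by the node under its own IRI.
represented = NODE + "dry-biomass"


def instance_variable_triples(dataset_base, column_name, represented_iri):
    """Triples for one dataset column that reuses a shared Represented Variable."""
    instance = dataset_base + "column/" + column_name
    return instance, [
        (instance, RDF_TYPE, CDI + "InstanceVariable"),
        (instance, CDI + "uses", represented_iri),
    ]


graph = {(represented, RDF_TYPE, CDI + "RepresentedVariable")}

# Two hypothetical datasets, each minting its own Instance Variable IRI.
iv_a, triples_a = instance_variable_triples(
    "https://example.org/datasets/study-a/", "dry_biomass", represented)
iv_b, triples_b = instance_variable_triples(
    "https://example.org/datasets/study-b/", "biomass_dry", represented)
graph.update(triples_a)
graph.update(triples_b)

# Every dataset column that reuses the shared variable can be found again.
users = sorted(s for (s, p, o) in graph if p == CDI + "uses" and o == represented)
print(users)
```

Because each Instance Variable has its own IRI, dataset-specific statements stay attached to the dataset, while the shared definition is stated once.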
>
> *Question 2:*
>
> I will prepare a simple example for you. I will send this along presently.
>
> *Question 3:*
>
> I have not yet seen a dual use of DCAT and DDI-CDI, but examples may exist.
> There is one person - Pascal Heus - who has been doing some development of
> Python libraries for both DDI-CDI and DCAT, and he may have examples. I can
> check for you and tell you at Dagstuhl (he will be attending the workshop
> the week before). The "physical data set" in DDI-CDI is really the
> equivalent of a DCAT distribution, and what we describe in DDI-CDI is the
> distribution, because this is what we assign access rights to. The DDI-CDI
> "data set" is the equivalent of a DCAT one: it has the same logical
> contents, but may have differences in structure and format across different
> distributions. The DDI-CDI "data store" is a repository of logical records,
> which may produce many different data sets. There is no directly
> corresponding object in DCAT.
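
The correspondences described in this answer can be written down as a small lookup table. The dictionary keys are informal labels taken from the wording above, not class IRIs from the DDI-CDI specification, and the helper name is hypothetical; `dcat:Dataset` and `dcat:Distribution` are the DCAT class names.

```python
# Correspondences as described in the message above; None records that the
# message says DCAT has no directly corresponding object.
CDI_TO_DCAT = {
    "physical data set": "dcat:Distribution",
    "data set": "dcat:Dataset",
    "data store": None,
}


def dcat_equivalent(cdi_concept):
    """Return the DCAT class for a DDI-CDI concept, or None if there is none."""
    key = cdi_concept.strip().lower()
    if key not in CDI_TO_DCAT:
        raise KeyError(f"no correspondence recorded for {cdi_concept!r}")
    return CDI_TO_DCAT[key]


for concept in CDI_TO_DCAT:
    print(f"{concept} -> {dcat_equivalent(concept) or '(no DCAT equivalent)'}")
```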
>
> Note that in the week prior to the Provenance workshop at Dagstuhl we will
> be working with some folks from W3C (including the people from the group
> which produced DCAT) to look at how the variable cascade might be
> publishable as a W3C recommendation, similar to SOSA/SSN. The model for
> this would be DDI-CDI, so the approach you are taking is in line with
> future developments.
>
> *General Observations:*
>
> The value of CDIF is that it does not require the use of domain-specific
> standards. What you have described is a rich set of domain-level agreements
> for data sharing - CDIF in no way is intended to replace that. The thinking
> is that if a domain has such a standard, that domain standard can be mapped
> into an equivalent CDIF form, and that domain-external form is used to
> provide the resources to other adjacent domains or infrastructures. The
> CDIF4XAS OSCARS Project is an excellent example of that. Even though they
> use Schema.org rather than DCAT, the general mapping approach from their
> community standards to CDIF is in line with this vision. We just published
> the first (in-progress) stage of that mapping:
>
>
>
> Zenodo: https://zenodo.org/records/17421917
>
> GitHub: https://github.com/CDIF-4-XAS/XAS-CDIF
>
>
>
> A lot of what I see in your diagram falls into the category of "Context"
> for CDIF. I would ask that you come prepared to use your use case as an
> example for the work at Dagstuhl - we have been playing with PROV and some
> other standards (I-ADOPT) for describing this sort of information, and
> there were also some interesting explorations done during WorldFAIR in the
> Clinical Trials space using Schema.org (https://zenodo.org/records/7887385).
> The XAS project is also running into some of the same requirements for
> expressing some of the information about data sources, experiments, etc.
> This would be an excellent use case for the coming workshop.
>
>
> I look forward to seeing you there and talking more about this.
>
> Cheers,
>
> Arofan
>
> On Fri, Oct 24, 2025 at 1:53 AM Donald Hobern <
> donald.hobern at adelaide.edu.au> wrote:
>
> I'm going to follow up here with some more details, which may or may not
> clarify my issues. Here is the pattern we've been planning to use:
>
>
>    1. Assume one of the APPN nodes has a variable it standardly uses in
>    the Study datasets it publishes, something like dry biomass as a
>    miappe:Trait measured in g/m2 as a miappe:Scale.
>    2. The node publishes a miappe:ObservedVariable on a resolvable IRI
>    that provides the definition, including properties documenting the Trait
>    and Scale - we're developing a pipeline so each node can manage and extend
>    such a list over time.
>    3. Each RO-Crate dataset reuses the ObservedVariable instance to
>    document the corresponding column of dry biomass values in the tabular data.
>    4. The RO-Crate metadata also identifies that these values were
>    produced by a sosa:Observation which references the associated
>    miappe:Method (== sosa:Procedure).
>
>
> I'd like to understand how much we would need to modify this to benefit
> from DDI-CDI. I get the impression that, at the very least, the
> ObservedVariable instance in step 2 would need to be a
> cdi:RepresentedVariable but that the one in step 3 would be a
> cdi:InstanceVariable, and I feel that means they should have different
> IRIs - otherwise the combined graph would end up defeating the point of
> having InstanceVariable at all.
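
The concern about shared IRIs can be illustrated with a small sketch. The `EX` vocabulary, the `columnName` and `inDataset` properties, and all IRIs are hypothetical placeholders, not MIAPPE or DDI-CDI terms. The sketch merges column-level statements from two datasets and checks whether any subject ends up with more than one column name.

```python
EX = "https://example.org/terms#"  # placeholder vocabulary
SHARED = "https://example.org/appn-node/variables/dry-biomass"


def column_facts(subject, column_name, dataset):
    """Column-level statements that one dataset makes about a variable."""
    return {
        (subject, EX + "columnName", column_name),
        (subject, EX + "inDataset", dataset),
    }


def ambiguous_subjects(graph):
    """Subjects that ended up with more than one column name after merging."""
    names = {}
    for s, p, o in graph:
        if p == EX + "columnName":
            names.setdefault(s, set()).add(o)
    return sorted(s for s, found in names.items() if len(found) > 1)


# Case 1: both datasets describe their column on the shared variable IRI.
merged_shared = (column_facts(SHARED, "dry_biomass", "study-a")
                 | column_facts(SHARED, "biomass_g_m2", "study-b"))

# Case 2: each dataset mints its own IRI for its column-level variable.
merged_distinct = (
    column_facts("https://example.org/datasets/study-a/dry-biomass",
                 "dry_biomass", "study-a")
    | column_facts("https://example.org/datasets/study-b/dry-biomass",
                   "biomass_g_m2", "study-b"))

print(ambiguous_subjects(merged_shared))
print(ambiguous_subjects(merged_distinct))
```

In the first case the merged graph can no longer say which column name belongs to which dataset; in the second, each statement stays with its own subject.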
>
> Aside from that aspect, and assuming the Trait and Scale are modeled
> appropriately and the properties linking them to my ObservedVariable are
> subproperties of CDI ones, would there be more I need to do to benefit from
> DDI-CDI for cross-dataset and cross-domain variable interoperability?
>
> Thanks so much.
>
> Donald
>