[cdif-community] Planning CDIF use for plant phenotyping data

Arofan Gregory arofan at codata.org
Fri Oct 24 11:25:08 EDT 2025


Jay:

Sorry - I provided the wrong link for the XAS - CDIF work. It should be:

https://codata.org/mapping-x-ray-absorption-spectroscopy-standards-to-cdif/

Cheers,

Arofan

On Fri, Oct 24, 2025 at 10:34 AM Arofan Gregory <arofan at codata.org> wrote:

> Jay:
>
> That is something we need to better establish. I have been discussing
> these correlations between DDI-CDI, PROV, ODRL, Schema.org and DCAT with
> various people in the CDIF group (notably Darren Bell and Steve Richard) at
> different points, and I think we (CDIF) probably need to publish something
> providing a detailed comparison.
>
> The work you are doing should help to inform that. Is there any kind of
> documentation we could use explaining how you have combined Schema.org and
> DDI-CDI? We are doing this for the CDIF4XAS project, and we just published
> a mapping document which partly describes how we connect Schema.org
> variables and DDI-CDI (
> https://codata.org/cdif-4-xas-project-first-deliverable-overview-of-x-ray-absorption-spectroscopy-standards/).
> Working with DCAT will require some additional thought, as there is no
> variable construct in DCAT.
>
> Steve Richard has also been doing some work with DDI-CDI and ASA data (he
> provided a sample data set here on the community list a few days ago.) This
> combines the Schema.org variableMeasured with DDI-CDI variable properties.
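A minimal sketch of what such a combination might look like in JSON-LD, built here as a Python dict: a Schema.org Dataset whose variableMeasured entry is doubly typed as schema:PropertyValue and cdi:InstanceVariable. This is not taken from Steve's sample. The dataset and variable IRIs are hypothetical, and the DDI-CDI namespace IRI is an assumption to verify against the DDI-CDI RDF specification.

```python
import json

# Namespace IRIs are assumptions; check them against the published
# Schema.org and DDI-CDI RDF specifications before relying on them.
CONTEXT = {
    "schema": "https://schema.org/",
    "cdi": "http://ddialliance.org/Specification/DDI-CDI/1.0/RDF/",
}


def variable_measured(iri: str, name: str, unit: str, description: str) -> dict:
    """One schema:variableMeasured entry that is also typed as a DDI-CDI variable."""
    return {
        "@id": iri,
        # Double typing: Schema.org consumers see a PropertyValue, while
        # DDI-CDI-aware consumers see an InstanceVariable on the same node.
        "@type": ["schema:PropertyValue", "cdi:InstanceVariable"],
        "schema:name": name,
        "schema:unitText": unit,
        "schema:description": description,
    }


# Hypothetical dataset and variable IRIs, for illustration only.
dataset = {
    "@context": CONTEXT,
    "@id": "https://example.org/dataset/1",
    "@type": "schema:Dataset",
    "schema:variableMeasured": [
        variable_measured(
            "https://example.org/dataset/1/variable/dry_biomass",
            "dry_biomass",
            "g/m2",
            "Dry biomass per unit area",
        )
    ],
}

print(json.dumps(dataset, indent=2))
```

The double typing keeps a single node per variable, so Schema.org-only tools can ignore the CDI type while CDI-aware tools can attach further variable properties to the same IRI.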
>
> My fear, of course, is that without some effort we will not witness all of
> these great minds thinking alike! ;-)
>
> Cheers,
>
> Arofan
>
>
> On Fri, Oct 24, 2025 at 9:22 AM Jay Greenfield <jay at codata.org> wrote:
>
>> Arofan:
>>
>> As you know, with the OHDSI environmental epidemiology platform (OHDSI
>> GIS) we have been using schema.org alongside DDI-CDI. Using its Action
>> schema, we have been able to create placeholders for RO-Crate and Croissant
>> as extensions of the JSON-LD at multiple levels of granularity.
>>
>> In schema.org, the DataCatalog may be comparable to the DDI-CDI
>> “datastore”.
>>
>>
>> Jay
>>
>> On Oct 24, 2025, at 8:54 AM, Arofan Gregory <arofan at codata.org> wrote:
>>
>> Donald:
>>
>> First, to answer your specific questions, and second to make some general
>> observations.
>>
>> *Question 1 (Variable cascade in RDF):*
>>
>> There is no reason you should not instantiate the Represented and
>> Instance Variables as separate objects with their own URIs. For a one-off
>> publication of a data set whose structure is not reused, you might
>> collapse these into a simple set of Instance Variables (which
>> simultaneously act as Represented Variables for ad hoc reuse, and can be
>> doubly typed). In your case, however, the best approach is to make the
>> reusable variables clearly identified, stand-alone Represented Variables,
>> to which Instance Variables have a "uses" relationship. The two are
>> maintained separately and each has its own identity (and URI).
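A minimal sketch of that pattern as plain (subject, predicate, object) triples in Python: one Represented Variable with its own IRI, and two data-set-specific Instance Variables, each with a distinct IRI, linked to it. The example IRIs, the label and column properties, and the name of the "uses" property (ex:usesRepresentedVariable) are hypothetical, and the DDI-CDI namespace is an assumption; the actual association name should be checked in the DDI-CDI model.

```python
# Assumed DDI-CDI namespace; verify against the DDI-CDI RDF release.
CDI = "http://ddialliance.org/Specification/DDI-CDI/1.0/RDF/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace for example IRIs and helper properties.
EX = "https://example.org/"
# Hypothetical stand-in for the DDI-CDI "uses" relationship.
USES = EX + "usesRepresentedVariable"

# Toy graph: IRIs and literals are both plain strings here.
Triple = tuple[str, str, str]


def represented_variable(iri: str, label: str) -> set[Triple]:
    """A reusable Represented Variable, maintained once under its own IRI."""
    return {
        (iri, RDF_TYPE, CDI + "RepresentedVariable"),
        (iri, EX + "label", label),
    }


def instance_variable(iri: str, represented_iri: str, column: str) -> set[Triple]:
    """A data-set-specific Instance Variable that uses a Represented Variable."""
    return {
        (iri, RDF_TYPE, CDI + "InstanceVariable"),
        (iri, USES, represented_iri),
        (iri, EX + "column", column),
    }


shared = EX + "variables/dry_biomass"
graph = represented_variable(shared, "Dry biomass (g/m2)")
graph |= instance_variable(EX + "ds1/variables/dry_biomass", shared, "dry_biomass")
graph |= instance_variable(EX + "ds2/variables/DW", shared, "DW")

# Every Instance Variable that reuses the shared definition.
users = sorted(s for (s, p, o) in graph if p == USES and o == shared)
print(users)
```

Because the definition lives only on the shared IRI, a query for its users finds every data set that reuses the variable, while column-level details stay on each Instance Variable.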
>>
>> *Question 2:*
>>
>> I will prepare a simple example for you. I will send this along presently.
>>
>> *Question 3:*
>>
>> I have not yet seen a dual use of DCAT and DDI-CDI, but examples may exist.
>> One person, Pascal Heus, has been developing Python libraries for both
>> DDI-CDI and DCAT, and he may have examples. I can check for you and tell
>> you at Dagstuhl (he will be attending the workshop the week before). The
>> "physical data set" in DDI-CDI is really the equivalent of a DCAT
>> distribution, and what we describe in DDI-CDI is the distribution, because
>> this is what we assign access rights to. The DDI-CDI "data set" is the
>> equivalent of a DCAT dataset: it has the same logical contents, but may
>> differ in structure and format across distributions. The DDI-CDI "data
>> store" is a repository of logical records, which may produce many
>> different data sets. There is no directly corresponding object in DCAT.
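The correspondences above can be summarized as a small lookup table. This is a sketch, not an official crosswalk: the prefixed class names are assumed to match the DDI-CDI, DCAT and Schema.org vocabularies, and the Schema.org column is added for comparison only (schema:DataCatalog reflects Jay's note in this thread that it "may be comparable" to the DDI-CDI data store).

```python
from typing import Optional

# Correspondences stated in the message; None means "no direct equivalent".
# The "schema" entries are an added assumption for comparison: DataDownload is
# the range of schema:distribution, and DataCatalog follows Jay's suggestion.
CORRESPONDENCE = {
    # Access rights are assigned at this (distribution) level.
    "cdi:PhysicalDataSet": {"dcat": "dcat:Distribution", "schema": "schema:DataDownload"},
    # Same logical contents, possibly several structures/formats.
    "cdi:DataSet": {"dcat": "dcat:Dataset", "schema": "schema:Dataset"},
    # Repository of logical records that may produce many data sets.
    "cdi:DataStore": {"dcat": None, "schema": "schema:DataCatalog"},
}


def counterpart(cdi_class: str, vocabulary: str) -> Optional[str]:
    """Nearest class in 'dcat' or 'schema' for a DDI-CDI class, or None."""
    return CORRESPONDENCE[cdi_class][vocabulary]


for cdi_class in CORRESPONDENCE:
    target = counterpart(cdi_class, "dcat")
    print(f"{cdi_class:20} -> {target or '(no direct DCAT equivalent)'}")
```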
>>
>> Note that in the week prior to the Provenance workshop at Dagstuhl we
>> will be working with some folks from W3C (including the people from the
>> group which produced DCAT) to look at how the variable cascade might be
>> publishable as a W3C recommendation, similar to SOSA/SSN. The model for
>> this would be DDI-CDI, so the approach you are taking is in line with
>> future developments.
>>
>> *General Observations:*
>>
>> The value of CDIF is that it does not require the use of domain-specific
>> standards. What you have described is a rich set of domain-level agreements
>> for data sharing; CDIF is in no way intended to replace that. The thinking
>> is that if a domain has such a standard, that domain standard can be mapped
>> into an equivalent CDIF form, and that domain-external form is used to
>> provide the resources to other adjacent domains or infrastructures. The
>> CDIF4XAS OSCARS Project is an excellent example of that. Even though they
>> use Schema.org rather than DCAT, the general mapping approach from their
>> community standards to CDIF is in line with this vision. We just published
>> the first (in-progress) stage of that mapping:
>>
>>
>> Zenodo: https://zenodo.org/records/17421917
>>
>> GitHub: https://github.com/CDIF-4-XAS/XAS-CDIF
>>
>>
>> A lot of what I see in your diagram falls into the category of "Context"
>> for CDIF. I would ask that you come prepared to use your use case as an
>> example for the work at Dagstuhl - we have been playing with PROV and some
>> other standards (I-ADOPT) for describing this kind of information, and
>> there were also some interesting explorations done during WorldFAIR in the
>> Clinical Trials space using Schema.org (
>> https://zenodo.org/records/7887385). The XAS project is also running
>> into some of the same requirements for expressing some of the information
>> about data sources, experiments, etc. This would be an excellent use case
>> for the coming workshop.
>>
>>
>> I look forward to seeing you there and talking more about this.
>>
>> Cheers,
>>
>> Arofan
>>
>> On Fri, Oct 24, 2025 at 1:53 AM Donald Hobern <
>> donald.hobern at adelaide.edu.au> wrote:
>>
>>> I'm going to follow up here with some more details, which may or may not
>>> clarify my issues. Here is the pattern we've been planning to use:
>>>
>>>
>>>    1. Assume one of the APPN nodes has a variable it standardly uses in
>>>    the Study datasets it publishes, something like dry biomass as a
>>>    miappe:Trait measured in g/m2 as a miappe:Scale.
>>>    2. The node publishes a miappe:ObservedVariable on a resolvable IRI
>>>    that provides the definition, including properties documenting the Trait
>>>    and Scale - we're developing a pipeline so each node can manage and extend
>>>    such a list over time.
>>>    3. Each RO-Crate dataset reuses the ObservedVariable instance to
>>>    document the corresponding column of dry biomass values in the tabular data.
>>>    4. The RO-Crate metadata also identifies that these values were
>>>    produced by a sosa:Observation which references the associated
>>>    miappe:Method (== sosa:Procedure).
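A trimmed sketch of the RO-Crate metadata this pattern might produce, built as a Python dict. It is not a validated RO-Crate (required root-dataset properties such as name and license are omitted). The miappe and ex namespace IRIs are placeholders, ex:Column, ex:hasColumn and ex:observedVariable are hypothetical helper terms, and the use of prov:wasGeneratedBy to link the column to the sosa:Observation is an assumption, since the steps above do not name that property. sosa:usedProcedure is the SOSA property linking an observation to its procedure.

```python
import json

# Placeholder namespaces: "miappe" and "ex" are NOT the real IRIs, and the
# ex: terms are hypothetical helpers for illustration only.
CONTEXT = [
    "https://w3id.org/ro/crate/1.1/context",
    {
        "miappe": "https://example.org/miappe#",
        "sosa": "http://www.w3.org/ns/sosa/",
        "prov": "http://www.w3.org/ns/prov#",
        "ex": "https://example.org/terms#",
    },
]


def crate_metadata(file_name: str, column: str, variable_iri: str, method_iri: str) -> dict:
    """Trimmed ro-crate-metadata.json content for one tabular file (not validated)."""
    column_id = f"{file_name}#column={column}"
    return {
        "@context": CONTEXT,
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": file_name}]},
            {"@id": file_name, "@type": "File", "ex:hasColumn": [{"@id": column_id}]},
            # Step 2: the node's published ObservedVariable, referenced by IRI.
            {"@id": variable_iri, "@type": "miappe:ObservedVariable"},
            # Step 3: the dry biomass column reuses that ObservedVariable.
            {
                "@id": column_id,
                "@type": "ex:Column",
                "name": column,
                "ex:observedVariable": {"@id": variable_iri},
                # Step 4: the values were produced by a SOSA observation.
                "prov:wasGeneratedBy": {"@id": "#observation-1"},
            },
            {
                "@id": "#observation-1",
                "@type": "sosa:Observation",
                # The node's miappe:Method, playing the sosa:Procedure role.
                "sosa:usedProcedure": {"@id": method_iri},
            },
        ],
    }


crate = crate_metadata(
    "plots.csv",
    "dry_biomass",
    "https://node.example.org/variables/dry_biomass",  # hypothetical node IRI
    "https://node.example.org/methods/harvest_drying",  # hypothetical method IRI
)
print(json.dumps(crate, indent=2))
```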
>>>
>>>
>>> I'd like to understand how much we would need to modify this to benefit
>>> from DDI-CDI. I get the impression at very least that the ObservedVariable
>>> instance in 2 would need to be a cdi:RepresentedVariable but that the one
>>> in 3 would be a cdi:InstanceVariable, and I feel that means they should
>>> have different IRIs - otherwise the combined graph would end up defeating
>>> the point of having InstanceVariable at all.
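The concern about the combined graph can be made concrete with a small check. In this sketch (plain Python triples; all IRIs and the column property are hypothetical), two data sets attach their own column name to a variable. When both reuse the node's IRI directly, the merged graph leaves one node with two column names; when each data set mints its own Instance Variable IRI, nothing collides.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

# Hypothetical per-data-set property (the column an Instance Variable
# describes) and a hypothetical node-published variable IRI.
COLUMN = "https://example.org/terms#column"
NODE_VARIABLE = "https://node.example.org/variables/dry_biomass"


def conflicts(graph: set[Triple], prop: str) -> dict[str, set[str]]:
    """Subjects carrying more than one value for a property meant to be per data set."""
    values: dict[str, set[str]] = defaultdict(set)
    for s, p, o in graph:
        if p == prop:
            values[s].add(o)
    return {s: v for s, v in values.items() if len(v) > 1}


# Collapsed: both data sets attach their column name to the node's IRI.
ds1_collapsed = {(NODE_VARIABLE, COLUMN, "dry_biomass")}
ds2_collapsed = {(NODE_VARIABLE, COLUMN, "DW")}
merged_collapsed = ds1_collapsed | ds2_collapsed

# Separate: each data set mints its own Instance Variable IRI.
ds1_separate = {("https://example.org/ds1/variables/dry_biomass", COLUMN, "dry_biomass")}
ds2_separate = {("https://example.org/ds2/variables/DW", COLUMN, "DW")}
merged_separate = ds1_separate | ds2_separate

print(conflicts(merged_collapsed, COLUMN))  # the node IRI ends up with two column names
print(conflicts(merged_separate, COLUMN))   # {}
```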
>>>
>>> Aside from that aspect, and assuming the Trait and Scale are modeled
>>> appropriately and the properties linking them to my ObservedVariable are
>>> subproperties of CDI ones, would there be more I need to do to benefit from
>>> DDI-CDI for cross-dataset and cross-domain variable interoperability?
>>>
>>> Thanks so much.
>>>
>>> Donald
>>>
>>> *Donald Hobern*
>>> Data Management Director, Australian Plant Phenomics Network
>>> University of Adelaide - working from Canberra, ACT
>>> *P* (04) 20511471   |   plantphenomics.org.au
>>> <http://www.plantphenomics.org.au/>   |   subscribe to our news
>>> <https://www.plantphenomics.org.au/news/#news-from-our-blog>
>>> APPN acknowledges the Traditional Custodians of Country throughout
>>> Australia and their connections to land, sea and community. We pay our
>>> respect to their Elders past and present and extend that respect to all
>>> Aboriginal and Torres Strait Islander peoples today.
>>> The Australian Plant Phenomics Network (APPN) is supported by the
>>> Australian Government’s National Collaborative Research Infrastructure
>>> Strategy (NCRIS
>>> <https://www.education.gov.au/national-collaborative-research-infrastructure-strategy-ncris>
>>> )
>>> APPN National Head Office at the University of Adelaide
>>> <https://www.thewaite.org/> (UoA - CRICOS provider number 00123M).
>>> --
>>> cdif-community mailing list
>>> cdif-community at lists.codata.org
>>> http://lists.codata.org/mailman/listinfo/cdif-community_lists.codata.org
>>>