[cdif-community] Planning CDIF use for plant phenotyping data
Arofan Gregory
arofan at codata.org
Fri Oct 24 10:34:17 EDT 2025
Jay:
That is something we need to better establish. I have been discussing these
correlations between DDI-CDI, PROV, ODRL, Schema.org and DCAT with various
people in the CDIF group (notably Darren Bell and Steve Richard) at
different points, and I think we (CDIF) probably need to publish something
providing a detailed comparison.
The work you are doing should help to inform that. Is there any kind of
documentation we could use explaining how you have combined Schema.org and
DDI-CDI? We are doing this for the CDIF4XAS project, and we just published
a mapping document which lays out a little bit how we connect Schema.org
variables and DDI-CDI (
https://codata.org/cdif-4-xas-project-first-deliverable-overview-of-x-ray-absorption-spectroscopy-standards/).
Working with DCAT will require some additional thought, as there is no
variable construct in DCAT.
Steve Richard has also been doing some work with DDI-CDI and ASA data (he
provided a sample data set here on the community list a few days ago.) This
combines the Schema.org variableMeasured with DDI-CDI variable properties.
My fear, of course, is that without some effort we will not witness all of
these great minds thinking alike! ;-)
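
To make the variableMeasured combination mentioned above concrete, here is a minimal sketch of what a doubly typed variable might look like in JSON-LD, built with plain Python. It is not taken from Steve's sample or from the CDIF4XAS mapping: the DDI-CDI namespace IRI, the example.org identifiers and the dry-biomass variable (borrowed from Donald's message) are all assumptions for illustration.

```python
import json

# Assumed namespace IRI for DDI-CDI; check it against the published RDF encoding.
CDI = "http://ddialliance.org/Specification/DDI-CDI/1.0/RDF/"

dataset = {
    "@context": {"@vocab": "https://schema.org/", "cdi": CDI},
    "@id": "https://example.org/dataset/1",  # hypothetical IRI
    "@type": "Dataset",
    "name": "Example data set",
    "variableMeasured": [
        {
            # Doubly typed: a Schema.org PropertyValue that is also
            # declared to be a DDI-CDI InstanceVariable.
            "@id": "https://example.org/dataset/1#dry_biomass",
            "@type": ["PropertyValue", "cdi:InstanceVariable"],
            "name": "dry_biomass",
            "unitText": "g/m2",
        }
    ],
}


def cdi_variables(doc):
    """Return the names of variableMeasured entries that carry a cdi: type."""
    return [
        v["name"]
        for v in doc.get("variableMeasured", [])
        if any(t.startswith("cdi:") for t in v["@type"])
    ]


print(json.dumps(dataset, indent=2))
print(cdi_variables(dataset))
```

A Schema.org-only consumer can read the PropertyValue and ignore the extra type, while a DDI-CDI-aware consumer can pick out the typed variables.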
Cheers,
Arofan
On Fri, Oct 24, 2025 at 9:22 AM Jay Greenfield <jay at codata.org> wrote:
> Arofan:
>
> As you know, with the OHDSI environmental epidemiology platform (OHDSI
> GIS) we have been using schema.org next to DDI-CDI. Using its Action
> schema, we have been able to create placeholders for RO-Crate and Croissant
> as extensions of the JSON-LD at multiple levels of granularity.
>
> In schema.org the DataCatalog may be comparable to the DDI-CDI
> “datastore”.
>
>
> Jay
>
> On Oct 24, 2025, at 8:54 AM, Arofan Gregory <arofan at codata.org> wrote:
>
> Donald:
>
> I will first answer your specific questions and then make some general
> observations.
>
> *Question 1 (Variable cascade in RDF):*
>
> There is no reason you should not instantiate the Represented and Instance
> Variables as separate objects with their own URIs. For a one-off
> publication of a data set whose data structure is not reused, you might
> collapse these into a simple set of Instance Variables (which can be
> doubly typed and simultaneously act as Represented Variables for ad hoc
> reuse). In your case, however, the best approach is to have the reusable
> variables be clearly identified stand-alone Represented Variables, to which
> Instance Variables have a "uses" relationship. They are maintained
> separately and have their own identity (and URI).
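
A minimal sketch of the arrangement described in the answer above, using plain Python tuples as triples. The DDI-CDI namespace IRI is an assumption, the `USES` property is a hypothetical stand-in for the "uses" relationship (the real property name should be taken from the DDI-CDI RDF encoding), and all example.org IRIs are invented for illustration.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CDI = "http://ddialliance.org/Specification/DDI-CDI/1.0/RDF/"  # assumed namespace
USES = "https://example.org/vocab#uses"  # hypothetical stand-in for "uses"


@dataclass(frozen=True)
class RepresentedVariable:
    iri: str   # maintained on its own, reusable across data sets
    name: str


@dataclass(frozen=True)
class InstanceVariable:
    iri: str                   # distinct IRI, scoped to one data set
    uses: RepresentedVariable  # link back to the reusable definition


def triples(iv: InstanceVariable) -> set[tuple[str, str, str]]:
    """Emit the type statements and the "uses" link for one instance variable."""
    rv = iv.uses
    return {
        (rv.iri, RDF_TYPE, CDI + "RepresentedVariable"),
        (iv.iri, RDF_TYPE, CDI + "InstanceVariable"),
        (iv.iri, USES, rv.iri),
    }


biomass = RepresentedVariable("https://example.org/variables/dry-biomass", "dry biomass")
col_a = InstanceVariable("https://example.org/study-a#dry_biomass", biomass)
col_b = InstanceVariable("https://example.org/study-b#dry_biomass", biomass)

# Merging the two data sets keeps one shared definition and two instances.
graph = triples(col_a) | triples(col_b)
for t in sorted(graph):
    print(t)
```

The merged graph holds a single Represented Variable that both Instance Variables point to, which is what makes the variable findable across data sets.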
>
> *Question 2:*
>
> I will prepare a simple example for you. I will send this along presently.
>
> *Question 3:*
>
> I have not yet seen a dual use of DCAT and DDI-CDI, but examples may exist.
> There is one person - Pascal Heus - who has been doing some development of
> Python libraries for both DDI-CDI and DCAT, and he may have examples. I can
> check for you and tell you at Dagstuhl (he will be attending the workshop
> the week before.) The "physical data set" in DDI-CDI is really the
> equivalent of a DCAT distribution, and what we describe in DDI-CDI is the
> distribution, because this is what we assign access rights to. The DDI-CDI
> "data set" is the equivalent of a DCAT dataset: it has the same logical
> contents, but may have differences in structure and format across different
> distributions. The DDI-CDI "data store" is a repository of logical records,
> which may produce many different data sets. There is no directly
> corresponding object in DCAT.
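
The correspondences listed in the answer above can be written down as a small lookup. This is only a restatement of that paragraph in code, not an agreed CDIF mapping; the `cdi:` class spellings are assumptions, and `dual_types` is a hypothetical helper.

```python
from typing import Optional

# Correspondences as stated in the message above. None marks "no directly
# corresponding object in DCAT". The cdi: class spellings are assumptions.
CDI_TO_DCAT: dict[str, Optional[str]] = {
    "cdi:PhysicalDataSet": "dcat:Distribution",
    "cdi:DataSet": "dcat:Dataset",
    "cdi:DataStore": None,
}


def dual_types(cdi_class: str) -> list[str]:
    """Return an @type list for a node described with both vocabularies."""
    dcat_class = CDI_TO_DCAT[cdi_class]
    return [cdi_class] if dcat_class is None else [cdi_class, dcat_class]


print(dual_types("cdi:PhysicalDataSet"))
print(dual_types("cdi:DataStore"))
```

A data store stays typed with DDI-CDI only, since DCAT has nothing to pair it with.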
>
> Note that in the week prior to the Provenance workshop at Dagstuhl we will
> be working with some folks from W3C (including the people from the group
> which produced DCAT) to look at how the variable cascade might be
> publishable as a W3C recommendation, similar to SOSA/SSN. The model for
> this would be DDI-CDI, so the approach you are taking is in line with
> future developments.
>
> *General Observations:*
>
> The value of CDIF is that it does not require the use of domain-specific
> standards. What you have described is a rich set of domain-level agreements
> for data sharing - CDIF in no way is intended to replace that. The thinking
> is that if a domain has such a standard, that domain standard can be mapped
> into an equivalent CDIF form, and that domain-external form is used to
> provide the resources to other adjacent domains or infrastructures. The
> CDIF4XAS OSCARS Project is an excellent example of that. Even though they
> use Schema.org rather than DCAT, the general mapping approach from their
> community standards to CDIF is in line with this vision. We just published
> the first (in-progress) stage of that mapping:
>
>
> Zenodo: https://zenodo.org/records/17421917
>
> GitHub: https://github.com/CDIF-4-XAS/XAS-CDIF
>
>
> A lot of what I see in your diagram falls into the category of "Context"
> for CDIF. I would ask that you come prepared to use your use case as an
> example for the work at Dagstuhl - we have been playing with PROV and some
> other standards (I-ADOPT) for describing these sorts of information, and
> there were also some interesting explorations done during WorldFAIR in the
> Clinical Trials space using Schema.org (https://zenodo.org/records/7887385).
> The XAS project is also running into some of the same requirements for
> expressing some of the information about data sources, experiments, etc.
> This would be an excellent use case for the coming workshop.
>
>
> I look forward to seeing you there and talking more about this.
>
> Cheers,
>
> Arofan
>
> On Fri, Oct 24, 2025 at 1:53 AM Donald Hobern <
> donald.hobern at adelaide.edu.au> wrote:
>
>> I'm going to follow up here with some more details, which may or may not
>> clarify my issues. Here is the pattern we've been planning to use:
>>
>>
>> 1. Assume one of the APPN nodes has a variable it standardly uses in
>>    the Study datasets it publishes, something like dry biomass as a
>>    miappe:Trait measured in g/m2 as a miappe:Scale.
>> 2. The node publishes a miappe:ObservedVariable on a resolvable IRI
>>    that provides the definition, including properties documenting the
>>    Trait and Scale - we're developing a pipeline so each node can manage
>>    and extend such a list over time.
>> 3. Each RO-Crate dataset reuses the ObservedVariable instance to
>>    document the corresponding column of dry biomass values in the
>>    tabular data.
>> 4. The RO-Crate metadata also identifies that these values were
>>    produced by a sosa:Observation which references the associated
>>    miappe:Method (== sosa:Procedure).
>>
>>
>> I'd like to understand how much we would need to modify this to benefit
>> from DDI-CDI. I get the impression at the very least that the ObservedVariable
>> instance in 2 would need to be a cdi:RepresentedVariable but that the one
>> in 3 would be a cdi:InstanceVariable, and I feel that means they should
>> have different IRIs - otherwise the combined graph would end up defeating
>> the point of having InstanceVariable at all.
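
The concern about a shared IRI can be illustrated with a small merged graph. This is a sketch under stated assumptions: `COLUMN` and `USES` are hypothetical properties, the column headers and example.org IRIs are invented, and triples are plain Python tuples rather than a real RDF store.

```python
from collections import defaultdict

COLUMN = "https://example.org/vocab#column"  # hypothetical: column header in the table
USES = "https://example.org/vocab#uses"      # hypothetical "uses" link
SHARED = "https://example.org/variables/dry-biomass"  # the node's published variable


def columns_per_subject(graph):
    """Group the column headers asserted for each subject IRI."""
    found = defaultdict(set)
    for s, p, o in graph:
        if p == COLUMN:
            found[s].add(o)
    return dict(found)


# Option 1: two crates describe their own column directly on the shared IRI.
same_iri = {
    (SHARED, COLUMN, "biomass_g_m2"),  # asserted by crate A
    (SHARED, COLUMN, "DryBiomass"),    # asserted by crate B
}

# Option 2: each crate mints its own instance IRI and points at the shared one.
a = "https://example.org/crate-a#dry_biomass"
b = "https://example.org/crate-b#dry_biomass"
distinct = {
    (a, COLUMN, "biomass_g_m2"), (a, USES, SHARED),
    (b, COLUMN, "DryBiomass"), (b, USES, SHARED),
}

print(columns_per_subject(same_iri))
print(columns_per_subject(distinct))
```

With one IRI, the merged graph can no longer say which column header belongs to which crate; with distinct instance IRIs each dataset-specific statement stays attached to its own variable.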
>>
>> Aside from that aspect, and assuming the Trait and Scale are modeled
>> appropriately and the properties linking them to my ObservedVariable are
>> subproperties of CDI ones, would there be more I need to do to benefit from
>> DDI-CDI for cross-dataset and cross-domain variable interoperability?
>>
>> Thanks so much.
>>
>> Donald
>>
>> *Donald Hobern*
>> Data Management Director, Australian Plant Phenomics Network
>> University of Adelaide - working from Canberra, ACT
>> --
>> cdif-community mailing list
>> cdif-community at lists.codata.org
>> http://lists.codata.org/mailman/listinfo/cdif-community_lists.codata.org
>>