[cdif-community] Planning CDIF use for plant phenotyping data
Donald Hobern
donald.hobern at adelaide.edu.au
Thu Oct 23 21:49:33 EDT 2025
TL;DR - APPN plans to use RO-Crates for standard packaging of plant phenotyping data from all its nodes. We are well advanced in planning how this will work and want to exploit as much of CDIF as possible. I'm looking for example datasets that help us to understand how to use DCAT and DDI-CDI for the relevant parts of our model - see the questions at the end.
APPN is part of a community that broadly accepts a helpful but underspecified Minimum Information model, MIAPPE: https://www.miappe.org/. MIAPPE itself (and the wider plant phenotyping community) adopts/accepts the Investigation-Study-Assay model (ISA, https://isa-tools.org/index.html) as a general framework for thinking about life science research datasets, although again the mapping of MIAPPE to ISA is underspecified.
I will be attending a meeting in the Netherlands on 13/14 November (immediately before the Dagstuhl provenance workshop) to explore how we can harden MIAPPE concepts as a much FAIRer model. I want to frame this as an effort to develop a WorldFAIR+ petal for plant phenotyping. I have already developed a UML/Turtle representation of MIAPPE that exploits SSN/SOSA for the sensing/actuating parts and am happy it's a good evolution of MIAPPE that should fit well with the conceptual framework understood by members of the community. The following schematic indicates some of the main concepts in a MIAPPE Study (based on the pre-SOSA underspecified concepts).
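[Schematic: main concepts in a MIAPPE Study (attachment image.png, scrubbed from the archive: http://lists.codata.org/pipermail/cdif-community_lists.codata.org/attachments/20251024/5dcc091b/attachment-0002.png)]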
A typical MIAPPE dataset describes a Study that uses cameras, sensors and human measurements to record primary images and measurements for a set of ObservationUnits (i.e. features of interest: plants, plots, plant organs, soil characteristics, environmental conditions, etc.) and then processes these to estimate plant Trait values (height, leaf area, biomass, organ counts, greenness indices, etc.) for the plants at timepoints through their development. In many cases, some of the environmental variables are controlled (i.e. actuated). The dataset therefore includes a provenance chain with primary data from various sosa:Observation executions (with sosa:Observation treated as a subclass of isa:Assay), then processed by a chain of software components to deliver tabular data. The RO-Crate metadata graph should describe, in a standard way, the study context and the primary and derived assets. An example would be a drone flight collecting RGB, hyperspectral and LiDAR imagery, followed by processing to create point clouds and orthomosaics and then to segment plot areas in the field, then a battery of computer vision, machine learning and other algorithms to estimate the traits of interest, all delivered as a CSV file with plots as rows and traits as columns.
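To make the drone example concrete, here is a minimal sketch of its provenance chain as an RO-Crate-style @graph, assuming the RO-Crate convention of recording each processing step as a schema.org CreateAction with "object" (inputs), "result" (outputs) and "instrument" (software). Typing the capture step as sosa:Observation too is an assumption following the model above, and every @id and file name is hypothetical. The helper walks back from the delivered CSV to the actions and primary data it depends on.

```python
# Hypothetical RO-Crate @graph fragment for the drone example. Each step is a
# schema.org CreateAction: "object" lists inputs, "result" lists outputs and
# "instrument" names the software. Typing the capture step as sosa:Observation
# as well is an assumption following the model above. All @id values are made up.
graph = [
    {"@id": "#flight-1", "@type": ["CreateAction", "sosa:Observation"],
     "object": [],
     "result": [{"@id": "rgb/"}, {"@id": "hyperspectral/"}, {"@id": "lidar/"}]},
    {"@id": "#photogrammetry", "@type": "CreateAction",
     "instrument": {"@id": "#sw-photogrammetry"},
     "object": [{"@id": "rgb/"}, {"@id": "lidar/"}],
     "result": [{"@id": "pointcloud.laz"}, {"@id": "orthomosaic.tif"}]},
    {"@id": "#plot-segmentation", "@type": "CreateAction",
     "instrument": {"@id": "#sw-segmentation"},
     "object": [{"@id": "orthomosaic.tif"}],
     "result": [{"@id": "plots.geojson"}]},
    {"@id": "#trait-extraction", "@type": "CreateAction",
     "instrument": {"@id": "#sw-traits"},
     "object": [{"@id": "pointcloud.laz"}, {"@id": "hyperspectral/"},
                {"@id": "plots.geojson"}],
     "result": [{"@id": "traits.csv"}]},
]


def types_of(entity):
    """Normalise @type to a list (JSON-LD allows a string or a list)."""
    t = entity.get("@type", [])
    return [t] if isinstance(t, str) else list(t)


def lineage(target, graph):
    """Walk back from target; return (action ids, primary data ids), both sorted."""
    made_by = {out["@id"]: action
               for action in graph for out in action.get("result", [])}
    actions, primary = set(), set()
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        action = made_by.get(node)
        if action is None:
            continue  # not produced by any recorded action
        actions.add(action["@id"])
        if "sosa:Observation" in types_of(action):
            primary.add(node)  # output of a sensing step counts as primary data
        stack.extend(inp["@id"] for inp in action.get("object", []))
    return sorted(actions), sorted(primary)


actions, primary = lineage("traits.csv", graph)
print(actions)  # ['#flight-1', '#photogrammetry', '#plot-segmentation', '#trait-extraction']
print(primary)  # ['hyperspectral/', 'lidar/', 'rgb/']
```

Keeping inputs and outputs as explicit links on each action means any derived file can be traced back to the sensing step without a separate provenance index.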
For now, I'm mostly interested in how to organise and describe the tabular data. MIAPPE models the trait data as values for a set of miappe:ObservedVariables, each of which references a miappe:Trait, a miappe:Scale and a miappe:Method. In the SOSA reworking, the Method becomes a sosa:Procedure associated with the sosa:Observation, sosa:Actuation or sosa:Sampling (all subclasses of isa:Assay). The miappe:ObservedVariable is then a sosa:Property that somehow links to a Trait and a Scale.
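A minimal sketch of that SOSA reworking as plain subject-predicate-object triples (Python tuples, so no RDF library is needed). sosa:usedProcedure, sosa:observedProperty and sosa:hasFeatureOfInterest are SOSA properties; appn:trait and appn:scale are hypothetical placeholders for the still-open link from the ObservedVariable to its Trait and Scale, and all appn: identifiers are made up.

```python
# Hypothetical instance data as (subject, predicate, object) triples.
# sosa:* predicates are SOSA terms; appn:trait and appn:scale are placeholders
# for the not-yet-settled link from an ObservedVariable to its Trait and Scale.
TRIPLES = {
    ("appn:obs-42", "rdf:type", "sosa:Observation"),
    ("appn:obs-42", "sosa:usedProcedure", "appn:method-lidar-height"),
    ("appn:obs-42", "sosa:observedProperty", "appn:var-plant-height-cm"),
    ("appn:obs-42", "sosa:hasFeatureOfInterest", "appn:plot-017"),
    ("appn:method-lidar-height", "rdf:type", "sosa:Procedure"),
    ("appn:var-plant-height-cm", "rdf:type", "sosa:Property"),
    ("appn:var-plant-height-cm", "appn:trait", "appn:trait-plant-height"),
    ("appn:var-plant-height-cm", "appn:scale", "appn:scale-centimetre"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def describe_observation(obs):
    """Resolve the Method, ObservedVariable, Trait and Scale behind one observation."""
    (variable,) = objects(obs, "sosa:observedProperty")  # expect exactly one
    return {
        "feature_of_interest": objects(obs, "sosa:hasFeatureOfInterest"),
        "method": objects(obs, "sosa:usedProcedure"),
        "variable": variable,
        "trait": objects(variable, "appn:trait"),
        "scale": objects(variable, "appn:scale"),
    }


print(describe_observation("appn:obs-42"))
```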
Looking at DDI-CDI, the MIAPPE Trait (normally an ontology term without a representation or scale) is a cdi:Concept, and the miappe:ObservedVariable is (I think) a cdi:InstanceVariable (in the context of describing my final CSV file). We are expecting our nodes to publish instance lists of the variables they reuse across multiple Studies, so those would presumably be cdi:RepresentedVariable instances.
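To make the "one InstanceVariable per CSV column" reading concrete, here is a minimal sketch of a plots-by-traits CSV in which every column must carry a variable description before the file is loaded. The column names, values, appn:/appn-node: identifiers and the "represented"/"concept" keys are all hypothetical placeholders, not DDI-CDI property names; how an InstanceVariable should actually point at a reused RepresentedVariable is still an open question.

```python
import csv
import io

# Hypothetical CSV with plots as rows and traits as columns (values are made up).
CSV_TEXT = """plot_id,plant_height_cm,leaf_area_cm2
P001,81.2,
P002,79.4,1250.0
"""

# Hypothetical column descriptions: each column is an InstanceVariable of this
# file. "represented" points to a node-published RepresentedVariable reused
# across Studies and "concept" to the Trait term. Keys are placeholders only.
COLUMNS = {
    "plot_id": {"role": "identifier"},
    "plant_height_cm": {"role": "measure",
                        "represented": "appn-node:plant-height-cm",
                        "concept": "appn:trait-plant-height"},
    "leaf_area_cm2": {"role": "measure",
                      "represented": "appn-node:leaf-area-cm2",
                      "concept": "appn:trait-leaf-area"},
}


def load_table(text, columns):
    """Parse the CSV, require a description for every column, cast measures to float."""
    reader = csv.DictReader(io.StringIO(text))
    undescribed = [c for c in reader.fieldnames if c not in columns]
    if undescribed:
        raise ValueError(f"columns without a variable description: {undescribed}")
    rows = []
    for record in reader:
        row = {}
        for name, raw in record.items():
            if columns[name]["role"] == "measure":
                row[name] = float(raw) if raw else None  # empty cell = missing value
            else:
                row[name] = raw
        rows.append(row)
    return rows


for row in load_table(CSV_TEXT, COLUMNS):
    print(row)
```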
Inside the APPN MIAPPE-based class schema, I would therefore like to model Trait as a subclass of cdi:Concept and ObservedVariable as a subclass of cdi:InstanceVariable with the Scale as a cdi:UnitType.
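The proposed subclassing can be checked with a minimal rdfs:subClassOf closure. The cdi: axioms follow the DDI-CDI variable cascade (each level a subclass of the level above); the appn: class names are hypothetical stand-ins for the APPN schema classes, and the Scale mapping is left out of this sketch.

```python
# Class axioms as (subclass, superclass) pairs. The cdi: chain is the variable
# cascade; the appn: classes are hypothetical names for the proposed schema.
SUBCLASS_OF = [
    ("cdi:InstanceVariable", "cdi:RepresentedVariable"),
    ("cdi:RepresentedVariable", "cdi:ConceptualVariable"),
    ("cdi:ConceptualVariable", "cdi:Concept"),
    ("appn:Trait", "cdi:Concept"),
    ("appn:ObservedVariable", "cdi:InstanceVariable"),
]


def superclasses(cls, axioms=SUBCLASS_OF):
    """All classes an instance of cls also belongs to (transitive rdfs:subClassOf)."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return sorted(found)


print(superclasses("appn:ObservedVariable"))
# ['cdi:Concept', 'cdi:ConceptualVariable', 'cdi:InstanceVariable', 'cdi:RepresentedVariable']
```

Under this entailment every appn:ObservedVariable instance is also typed as all four CDI classes, which is what prompts the first question.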
QUESTIONS:
* I am confused by how the variable cascade would be modeled in linked data. Each level in the cascade is shown as a subclass of the level above, so each instance of an InstanceVariable is at the same time also a RepresentedVariable, a ConceptualVariable and a Concept. Does that mean that I can have a single object with a single IRI and use it wherever I need to reference an instance of any of these four CDI classes? How are we intended to link an InstanceVariable with the associated RepresentedVariable or Concept if either of these has previously been defined for reuse? Is there a reasonably simple example of CDI metadata doing this?
* I would really value an example that shows use of CDI for tabular data containing mainly scalar values (rather than enumerated concepts). An example for a small CSV file would be perfect.
* In addition, are there any example datasets that model combined use of DCAT and DDI-CDI? Is a dcat:Dataset the same as a cdi:DataSet? Is there any standardised relationship between these two?
Thanks for any insights.
Donald
Donald Hobern
Data Management Director, Australian Plant Phenomics Network
University of Adelaide - working from Canberra, ACT
P (04) 20511471 | plantphenomics.org.au<http://www.plantphenomics.org.au/> | subscribe to our news<https://www.plantphenomics.org.au/news/#news-from-our-blog>
APPN acknowledges the Traditional Custodians of Country throughout Australia and their connections to land, sea and community. We pay our respect to their Elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.
The Australian Plant Phenomics Network (APPN) is supported by the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS<https://www.education.gov.au/national-collaborative-research-infrastructure-strategy-ncris>)
APPN National Head Office at the University of Adelaide<https://www.thewaite.org/> (UoA - CRICOS provider number 00123M).