[CODATA-international] Workshop Report: “Interoperability of Metadata Standards in Cross-Domain Science, Health, and Social Science Applications”

Schloss Dagstuhl <https://www.dagstuhl.de/18403> – Leibniz Center for
Informatics, 1 – 5 October 2018, Wadern, Germany

A workshop on the practical application of computer science to enable data
sharing and data interoperability across disciplinary boundaries
was hosted at the internationally renowned computer science institute at
Schloss Dagstuhl in Germany. The event was sponsored by CODATA (the Committee
on Data <http://www.codata.org/> of the International Science Council
<https://council.science/>) and the Data Documentation Initiative (DDI) Alliance,
and subsidized by Dagstuhl; it was organized by Simon Cox
<http://people.csiro.au/C/S/simon-cox> (CSIRO Australia and W3C Dataset
Exchange Working Group), Simon Hodson
<http://www.codata.org/about-codata/secretariat> (CODATA), Steven McEachern
<http://csrm.cass.anu.edu.au/people/dr-steven-mceachern> (Australian
National University and DDI Alliance), and Joachim Wackerow
(GESIS - Leibniz Institute for the Social Sciences and DDI Alliance). The
workshop brought together 24 participants from many different domains.
These included representatives of a number of metadata specifications, as
well as researchers involved in pilot projects currently being pursued as
part of the ISC and CODATA Data Integration Initiative
<http://dataintegration.codata.org/>. The five-day duration, together with the
relative isolation and unique dynamics of Dagstuhl, encourages intense
involvement on the part of all participants (as described on the DDI site).

The workshop examined how modern web-friendly computer science techniques
and standards could better enable data-sharing in the context of the Data
Integration Initiative <http://dataintegration.codata.org/> pilots. These
are major cross-disciplinary data integration projects to advance solutions
for three important global challenges: infectious disease outbreaks,
resilient cities, and disaster risk reduction. The infectious disease pilot
builds on work by the Infectious Diseases Data Observatory (IDDO)
<https://www.iddo.org/> to support both research and humanitarian efforts,
with Ebola used as the primary example for discussion. The resilient cities
pilot focuses on the work in Medellín, Colombia, in partnership with Resilience
Brokers <https://resiliencebrokers.org/>.  Examples involved air quality
measurement, location of hospitals, and geo-spatial data. The disaster risk
reduction pilot, led by Public Health England in partnership with
Integrated Research on Disaster Risk (IRDR) <http://www.irdrinternational.org/>, is looking at
how data could support the Sendai Framework, especially in cases where the
SDG indicators would not be sufficient. Different approaches for obtaining
data both from within and from outside the realm of official statistics
were explored, with an emphasis on research data.  In each case, data
integration presented significant challenges.

Metadata standards are a part of the computer science landscape which can
facilitate the discovery of existing datasets, and their integration and
use within a particular scenario. Representatives of many of these
standards were present, helping to understand the data integration
challenges faced by each of the projects. These standards included many of
the W3C Linked Data vocabularies (DCAT, SSN, Data Cube, PROV-O, etc.), DDI,
HL7 FHIR, CDISC, DATS, ISO 19115, EML and several others.[i] Some
of these standards are focused on the data within a particular discipline
or domain. Others are more general in scope. The workshop examined the
relationships between these standards in the context of their real-world
application (the pilot projects). This required an understanding of the
granularity of the metadata being expressed by each standard (at the level
of a study or dataset, at the variable and observation level, etc.).
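
The difference between dataset-level and variable-level metadata can be
illustrated with a minimal Python sketch. It is not taken from the workshop
or from any of the standards named above; the class names, field names, and
sample records are hypothetical, chosen only to show why granularity matters:
a dataset-level record alone cannot say whether a dataset contains a given
variable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariableMetadata:
    """Variable-level description (hypothetical fields, not from any standard)."""
    name: str
    label: str
    unit: Optional[str] = None


@dataclass
class DatasetMetadata:
    """Dataset-level description (hypothetical fields, not from any standard)."""
    title: str
    publisher: str
    # Finer-grained descriptions are optional; many catalogues stop at the dataset level.
    variables: List[VariableMetadata] = field(default_factory=list)


def find_datasets_with_variable(catalogue, variable_name):
    """Return the titles of datasets that describe a variable with the given name."""
    return [
        ds.title
        for ds in catalogue
        if any(v.name == variable_name for v in ds.variables)
    ]


# Illustrative records only, loosely modelled on the resilient cities examples.
catalogue = [
    DatasetMetadata(
        title="City air quality readings",
        publisher="Example city agency",
        variables=[VariableMetadata("pm25", "PM2.5 concentration", "ug/m3")],
    ),
    # Described at the dataset level only, so variable-level search cannot find it.
    DatasetMetadata(title="Hospital locations", publisher="Example health agency"),
]

matches = find_datasets_with_variable(catalogue, "pm25")
```

In this sketch the second record is invisible to a variable-level search
because it carries no variable descriptions, which is the kind of gap a more
granular metadata description is meant to close.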

Much of the activity in the workshop was in small working groups composed
of both business experts involved in the pilot projects and experts in the
relevant technology and domain standards. Some additional technical topics
which arose during the exploration of the pilot projects were also
addressed separately by small teams of the appropriate experts.

The workshop was extremely productive, immediately producing outlines
of working papers relating to each of the pilot projects. An article will
also be produced describing the overall goals of the effort and the
relationship of various standards and technology approaches to the
cross-disciplinary data integration projects. The intention is that these
will be published in peer-reviewed journals appropriate to their content.
In addition, at least one specific technical output was initiated:
a DCAT profile to support granular description of data in
online catalogues. The outputs of the workshop will be presented at the
upcoming SciDataCon <https://www.scidatacon.org/IDW2018/sessions/232/>
conference (at the 2nd International Data Week
<http://www.internationaldataweek.org/> organized by CODATA together with
the Research Data Alliance and the World Data System) in Gaborone, Botswana
in early November of 2018. Further collaborative work between CODATA, the
DDI Alliance, and other interested organizations is anticipated in the
future, including more intense, focused workshops of this kind.[ii]


[i] A list of metadata specifications for the workshop
is available at the workshop site.

[ii] A list of recent CODATA and DDI Workshops
on relevant topics is available at the workshop site.

