[CODATA-international] Cost of Data Wrangling

Greg Janée gjanee at ucsb.edu
Fri Dec 11 12:56:54 EST 2020

The paper cited below in turn cites survey work by Michael Brodie delivered at DAMDID 2015.  The conference proceedings are at http://ceur-ws.org/Vol-1536/ and Brodie's paper is at http://ceur-ws.org/Vol-1536/paper32.pdf.

Brodie writes, "I examined over 30 DIA use cases ... This paper summarizes some key results of my research."  The key sentence that is the apparent source of the quote is, "Currently, ~80% of the effort and resources required for the entire DIA activity are due to the two data management processes...".  These "data management processes" are Raw Data Acquisition and Curation, and Analytical Data Acquisition.

No detailed methodology is given in the paper and there is no associated raw data.  How effort and resources were measured (CPU time? Person-hours? Calendar time? Dollars?) and exactly how activities were categorized are not known, so make of it what you will.


> On Dec 11, 2020, at 9:17 AM, Ulrich Schwardmann <uschwar1 at gwdg.de> wrote:
> Dear Ernie,
> you can find the figure that 80% of the effort in data-intensive research is spent on data wrangling in:
> Peter Wittenburg, Costs of FAIR Compliance and not being FAIR compliant, 2018
> DOI: 10.23728/b2share.e184bd1ff12d45269de80c3f3e443eb7
> where it is explained on page 7f.

Greg Janée
Interim Associate University Librarian for Digital Strategies
Director, Research Data Services
UCSB Library
University of California
Santa Barbara, CA 93106-9010
