[CODATA-international] Cost of Data Wrangling

Johnson, Jon jon.johnson at ucl.ac.uk
Fri Dec 11 04:00:26 EST 2020

Hi Ernie

I think it’s a bit of an urban myth (see https://blog.ldodds.com/2020/01/31/do-data-scientists-spend-80-of-their-time-cleaning-data-turns-out-no/), but it aligns with the Pareto Principle, so we are all willing to go with it!

I suppose it is not that important whether it is 80% or 60%. It is still a massive problem, and the takeaway is that it highlights where most of the effort is being expended and strongly suggests that this arises from poor data quality and a lack of metadata to manage it.

Jon Johnson
CLOSER, UCL Institute of Social Research

From: CODATA-international <codata-international-bounces at lists.codata.org> on behalf of Ernie Boyko <boykern at yahoo.com>
Reply to: Ernie Boyko <boykern at yahoo.com>
Date: Friday, 11 December 2020 at 07:24
To: CODATA International <codata-international at lists.codata.org>
Subject: [CODATA-international] Cost of Data Wrangling

Hi all
A study, possibly conducted for the EU, is often quoted as the source of a statement along the lines of:

     *   80% of effort in data intensive research is used on data wrangling; conservative estimate of 10.2 Bn Euro.
Can anyone on this list point me to this study?
Many thanks in advance. I am trying to make the case for the benefits of developing a career stream for data wranglers/data stewards.
Cheers, Ernie

  “Data is the new oil.” — Clive Humby
“Data really powers everything that we do.” – Jeff Weiner
