Tuesday, May 19, 2015

Introduction: Mark Twain on Reproducibility, more or less

Having now figured out Blogspot's somewhat painful user interface, I will take up Doug's challenge to introduce myself and talk a little about reproducibility in the context of computational science:

I'm Daniel S. Katz (or less formally, Dan), and I have two relevant roles in the context of this blog. First, I'm a researcher, specifically a Senior Fellow in the Computation Institute at the University of Chicago and Argonne National Laboratory, where I work on the development and use of advanced cyberinfrastructure to solve challenging problems at multiple scales. My technical research interests are in applications, algorithms, fault tolerance, and programming in parallel and distributed computing, including HPC, Grid, Cloud, etc. I am also interested in policy issues, including citation and credit mechanisms and practices associated with software and data, organization and community practices for collaboration, and career paths for computing researchers. (I also have been blogging a bit at another site, and I will cross-post part of this post there.)

Second, I'm currently a program officer at the National Science Foundation in the Division of Advanced Cyberinfrastructure (ACI). At NSF, I am managing $25m-$35m of software programs, including leading NSF's Software Infrastructure for Sustained Innovation (SI2) program, and I currently lead ACI's participation in Computational and Data-enabled Science & Engineering (CDS&E), Designing Materials to Revolutionize and Engineer our Future (DMREF), and Computational and Data-Enabled Science & Engineering in Mathematical and Statistical Sciences (CDS&E-MSS). I previously led ACI's participation in Faculty Early Career Development (CAREER) and Exploiting Parallelism and Scalability (XPS). I also co-led writing of "A Vision and Strategy for Software for Science, Engineering, and Education: Cyberinfrastructure Framework for the 21st Century," and led writing of "Implementation of NSF Software Vision." [This leads me to add: "Some work by the author was supported by the National Science Foundation (NSF) while working at the Foundation; any opinion, finding, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NSF."]

Given these two roles, a researcher and a funder, it's clear to me that reproducibility in science is increasingly seen as a concern, at least at a high level. And thus, making science more reproducible is a challenge that many people want to solve. But it's quite hard to do this in general. In my opinion, there are a variety of factors responsible, including:

  1. Our scientific culture thinks reproducibility is important at a high level, but not in specific cases. This reminds me of Mark Twain's definition of classic books: those that people praise but don't read. We don't have incentives or practices in place that translate the high-level concept of reproducibility into actions that support actual reproducibility.
  2. In many cases, reproducibility is difficult in practice, due to some unique situation. For example, data can be taken with a unique instrument, such as the LHC or a telescope, or the data may be transient, such as the signal from a seismometer that measured a specific event. On the other hand, in many cases, data taken in one period should be statistically the same as data taken in another period.
  3. Given limited resources, reproducibility is less important than new research. As an example, consider a computation that took months to complete. It is unlikely to be repeated, because generating a new result is seen as a better use of the computing resources than reproducing the old result.

We can't easily change culture, but we can try to change practice, with the idea that a change in practice will eventually turn into a change in culture. And we can start by working on the easier parts of the problem, not the difficult ones. One way we can do this is by formalizing the need for reproducibility. This could be done at multiple levels, such as by publishers, funders, and faculty.

Publishers could make actually trying to reproduce submitted work a review criterion for reviewers. Funders could require that final project reports contain a reproducibility statement: a demonstration that an unrelated group has reproduced specific portions of the reported work, with the funders paying these independent groups to do so. And faculty could require students to reproduce the work of other students, benefiting the reproducer with training and the reproducee with the knowledge that their work has been shown to be reproducible.

What do we do about work that cannot be reproduced due to a unique situation? Perhaps try to isolate that situation and reproduce the parts of the work that can be reproduced. Or reproduce the work as a thought experiment rather than in practice. In either case, if we can't reproduce something, then we have to accept that, decide how close we can come, and judge whether that is good enough.

In all of these cases, there's an implied cost-benefit tradeoff. Do we think the benefit of reproducibility is worth the cost, in reviewers' time, funders' funds, or students' time? This gets to the third factor I mentioned previously, the comparative value of reproducibility versus new research. We can try to reduce the cost using automation, tools, etc., but it will always be there and we will have to choose if it is sufficiently important to pursue.

Let me close by going back to Twain's definition, and asking, will reproducibility become one of the classic books of the 21st Century, praised but not carried out? Or will we choose to make the effort to read it?



  1. In talking with colleagues in a variety of scientific fields using computation, I find that there is almost no interest in reproducing another PI's published work simply for the purpose of validating or disproving it, in a competitive sense. If validated, nothing new is learned. If not validated, then one merely enters into a complex argument about whether the reproduction was correct.

    (The value judgement is probably different in non-computational science: multiple trials will improve the confidence interval on the question of whether a particular antibiotic works or not, and so reproducibility has an easily quantified value.)

    But there is considerable interest in reproducibility for the sake of efficient collaboration! That is, if I have a student who sets up some software and system to run a particular workload efficiently, then I want to be able to pass that setup off to other people and hope they have a good chance of getting it running in the next day, rather than the next month. And if my algorithm is easy to reproduce, then someone is likely to use it as a comparison point for new work, resulting in more interest and citations. I see a lot of people who would like that.

    If the cost of reproducibility between collaborators becomes low, perhaps reproducibility between competitors would become of more interest?

  2. One way that reproducibility is sometimes pushed is that I need to make sure my work is reproducible by my future self. This seems appealing to me.

  3. Replicating someone else's computational work is indeed of no interest. Reproducing, i.e. reimplementing the ideas from scratch, is mainly of interest for gaining better understanding, and is therefore as much a part of education and training as of doing science.

    The big advantage that computational scientists have over experimentalists is that replication of computations is a purely technical issue (see https://khinsen.wordpress.com/2014/08/27/reproducibility-replicability-and-the-two-layers-of-computational-science/ and http://dx.doi.org/10.12688/f1000research.5773.3 for an explanation). In the long run, it can and should be delegated to technology. For reasonably short computations, a replication server could accept submissions (complete records of computational work), replicate them, and issue some certificate. Once certain tools for ensuring replicability are trusted by the community, explicit replication could be reduced or even given up. A trusted tool chain would also solve the problem of huge computations: they would be considered replicable because of the use of trusted tools.
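    The replication-server idea in the comment above can be made concrete with a minimal sketch. Everything here is an assumption for illustration, not part of any real system: `replicate_and_certify` is a hypothetical name, the "complete record" is reduced to a deterministic command line, and the "certificate" is just a dictionary recording whether a re-run produced output matching the hash the submitter claimed.

    ```python
    import hashlib
    import subprocess

    def replicate_and_certify(command, claimed_output_hash):
        """Re-run a submitted computation and issue a 'certificate'
        stating whether its output matched the submitter's claim."""
        # Re-run the computation exactly as submitted.
        result = subprocess.run(command, capture_output=True, check=True)
        # Hash the output so the comparison is independent of output size.
        actual_hash = hashlib.sha256(result.stdout).hexdigest()
        return {
            "command": command,
            "output_sha256": actual_hash,
            "replicated": actual_hash == claimed_output_hash,
        }

    # A trivially deterministic "computation": echo a known string.
    claimed = hashlib.sha256(b"42\n").hexdigest()
    cert = replicate_and_certify(["echo", "42"], claimed)
    print(cert["replicated"])
    ```

    A real service would of course need to capture the full environment (toolchain, libraries, inputs) rather than a bare command, which is exactly the "purely technical issue" the comment argues can be delegated to trusted tooling.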