The Burden of Reproducibility
Hi, I am Tanu Malik, a research scientist and Fellow at the Computation Institute, University of Chicago and Argonne National Laboratory. I am also a Lecturer in the Department of Computer Science, University of Chicago. I work in scientific data management, with an emphasis on distributed data management and metadata management, and I regularly collaborate with astronomers, chemists, geoscientists, and cosmologists.
For the past few years, I have had an active interest in data provenance, which has direct implications for computational reproducibility. My student Quan Pham (co-advised with Ian Foster) recently graduated from the Department of Computer Science, University of Chicago, with a thesis on computational reproducibility. In it, Quan investigated three orthogonal issues: the efficiency, usability, and cost of computational reproducibility. He proposed a novel system for computational reproducibility and showed how it can be optimized for several use cases.
Quan’s novel work led to our NSF-sponsored project ‘GeoDataspace’, in which we are exploring the ideas of sharing and reproducibility in the geoscience community, introducing geoscientists to efficient and usable tools so that they are ultimately in charge of their ‘reproducible’ pipelines, publications, and projects.
So what challenges have we faced so far, particularly in answering Doug’s question, “where in computational science is reproducibility not happening or not working”?
As I perceive it, reproducibility is indeed happening at the individual/collaboration level but not at the community level. Similarly, I see reproducibility happening at micro time scales but not at macro time scales. Let me explain.
Publications, which serve as the primary medium of dissemination for computational science, are not reproducible. Typically, to produce a publication, the authors conceive of a research idea, experiment with it, and produce some results. Those results are reproducible in the sense that the authors are able to confirm for themselves that their research idea and experiments are plausible. However, reproducibility stops soon thereafter. The authors invest some effort in describing their idea to the community in the form of a publication, and since there is no further payoff in going beyond that, reproducibility stops.
In this process, reproducibility was happening on a short time scale, i.e., while the authors were investigating their ideas. They were reconfirming them in many ways. But at larger time scales, such as with a publication from five years ago, it is incredibly hard even for the authors to reproduce their own results.
So why is reproducibility not happening at that scale? There are several reasons, and Dan described some of them in his post. Let me also give my take:
When the end result of the scientific method was safely proclaimed to be a text publication (in the late 16th century, after the printing press was in considerable use), we made some assumptions. In particular, that (i) we shall always remember all the details of the chosen scientific method, including the know-how of the tools that were used, and that (ii) research is a singular activity done with a focused mind, possibly in the confines of an attic or a dungeon.
Four centuries on, those assumptions no longer hold true. Computers, the Internet, search engines, Internet-based services, and social networks have changed much of that: we can access, discover, and connect knowledge and people much faster. But can we use these new inventions to verify faster?
GitHub + TravisCI is an excellent example of improving the speed of verification by combining open-source, re-executable source code with a social network. Given the variety of scientific methods, systems, tools, and communities, this example is just a start. There is still a significant burden of verification that science has to bear. Humans alone cannot tackle or reduce this burden. We need highly efficient computational methods and tools that verify fast, encourage good behavior, and provide instantaneous feedback (including the feedback of being left out of the community), so that scientists ensure reproducibility at every step of the scientific method.
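To make the GitHub + TravisCI pattern concrete, here is a minimal, hypothetical `.travis.yml` sketch for a repository containing a computational analysis. The script name, requirements file, and comparison step are assumptions for illustration, not a prescription; the point is that every push triggers an automatic re-execution, turning verification into a continuous, social activity rather than a one-time, private one.

```yaml
# Hypothetical .travis.yml: re-run the analysis on every push,
# so collaborators and reviewers get automatic verification.
language: python
python:
  - "3.6"
install:
  # Pin and install the declared dependencies.
  - pip install -r requirements.txt
script:
  # Re-execute the pipeline end to end...
  - python run_analysis.py --output results.csv
  # ...and fail the build if the output drifts from the published results.
  - diff results.csv expected/results.csv
```

With a setup like this, the badge on the repository's README gives exactly the kind of instantaneous, public feedback described above: a red build is visible to the whole community.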