Chris Vernon is a senior data scientist at Pacific Northwest National Laboratory. Chris specializes in all things geospatial and
is a vocal proponent of open-source software and reproducibility within integrated science. Chris is embedded in MSD research
as the lead software engineer for the Integrated Multisector Multiscale Modeling (IM3) project and the Enabling and
Foundational Capabilities task lead for the Global Change Intersectoral Modeling System (GCIMS) project.
We will likely agree that the scientific method asks us to interrogate our experimental results to grow our collective knowledge of a topic, which in turn informs further observation, and so on. This is one reason we conduct literature reviews and write publications with introductions that summarize the contributions of those who have come before us. A proficient scientist will often need to reproduce the work of others, or to investigate the process by which it was created, to gain insight into their own research. This method that keeps us interconnected is an inherent part of who we are as scientists and a fundamental reason why we disseminate our work.
Many publications in high-impact forums examine the meaning and deficiencies of reproducibility in science in the context of scientific integrity, generalizability, reliability, and so on. I think we can simplify these a bit by stating our basic agreement: reproducibility is part of the scientific method, our ethos. It is easy to confuse reproducibility with the results, products, or conclusions that we share, because these are the rewards that rise to the top, whereas reproducibility supports the underlying cyclical nature of the scientific method that allows us to progress. In the words of Richard Feynman, “it isn’t the stuff, but the power to make the stuff, that is important.”
Good news: reproducible science as a part of our ethos is not abstract! In fact, I propose it can be summarized in an applied sense as follows: disseminate your research in a way that someone with no understanding of your subject matter could walk step-by-step through a document describing your process, using the tools and data you give them, and reproduce your results. I would be remiss to pose an application without a proposed solution, so I present the meta-repository (https://github.com/IMMM-SFA/metarepo). This document takes the form of a GitHub repository that simply describes your research, the references that support it, citations, accessibility guidelines for source data products and any contributing software, and finally a section that describes step-by-step how to reproduce your experiment. Those who have built a diagram showing the interactions between models for integrated experiments know that a large portion of our time is often spent on the arrows, or how systems communicate. The meta-repository itself is used to version and store any code that facilitates this communication and is not imported from another source.
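To make this concrete, here is a minimal sketch of what scaffolding such a meta-repository might look like. The section headings and the `workflow/` directory name are assumptions drawn from the description above, not the actual structure of the IMMM-SFA/metarepo template, which should be consulted directly:

```python
from pathlib import Path

# Hypothetical section headings based on the description above; the real
# meta-repository template may name and order its sections differently.
SECTIONS = [
    "## Purpose",                         # what the experiment is and why
    "## Journal reference",               # citation for the publication
    "## Data",                            # access to source and output data
    "## Contributing modeling software",  # models and versions used
    "## Reproduce my experiment",         # step-by-step instructions
]


def scaffold_metarepo(root: str) -> Path:
    """Create a minimal meta-repository skeleton with a README stub."""
    repo = Path(root)
    repo.mkdir(parents=True, exist_ok=True)
    # Glue code lives here: the "arrows" that connect models, versioned
    # alongside the documentation rather than imported from elsewhere.
    (repo / "workflow").mkdir(exist_ok=True)
    readme = repo / "README.md"
    readme.write_text(
        "# My experiment meta-repository\n\n" + "\n\n".join(SECTIONS) + "\n"
    )
    return readme


readme_path = scaffold_metarepo("example_metarepo")
print(readme_path.read_text().splitlines()[0])
```

The point of the sketch is that the repository is cheap to create: the hard, valuable work is filling in the "Reproduce my experiment" section so that a stranger can follow it end to end.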
Creating reproducible research in this manner also has side effects. When used in combination with well-documented, open-source software and data, reproducibility can yield 1) an unusual amount of time freed up for conducting new research, 2) quick onboarding of new staff to a project, 3) a deeper understanding of the science we are communicating, 4) a decrease in spending on duplicative efforts, and 5) a general feeling of value from knowing that you have supported those who will continue where you left off.