Researcher Highlight: Cypress Frankenfeld

Cypress is a senior software engineer in the MIT Joint Program on the Science and Policy of Global Change. He helps build tools that communicate the risks of climate change to stakeholders and policymakers.

I am a software engineer working with the MIT Joint Program, helping them build tools to communicate risks of climate change to state and local policymakers. One of these tools is a website that shows an interactive county-level map of the USA. It allows stakeholders to understand how multiple risks interact—for example, combining the percentage of people living below the poverty line with water quality reveals where people are particularly at risk of drinking low quality water, and less likely to be able to afford treatments or alternatives.

But I’m not here to write about my work making this map tool. I’m writing to the MSD newsletter to convince you that research groups can benefit from using a software engineering mindset. Software engineering has always been my specialty; I’m not a research scientist. Many things that come naturally to me may not be common knowledge in the broader research community.

The main takeaway I want you to get from this article is that researchers could benefit by reducing code rot. What’s code rot? Have you ever picked up another researcher’s code and spent hours trying, unsuccessfully, to get it to run? Have you ever tried to replicate a result from someone else’s simulation using the same input parameters, only to get an unexpected output? This is code rot: code that doesn’t run after being passed between people; code that doesn’t work after being left alone for a while; code that produces different results with the same inputs when run on different computers.

Reducing code rot could help researchers in many ways, including:

1. Easy replication or extension of previous studies

2. Much quicker onboarding of new researchers

3. Less time spent fixing bugs when revisiting old code

4. Easily working with larger groups of researchers on the same model

Software engineers have had decades to come up with preventative measures for code rot, and I want to share some of the tactics that I’ve found most useful.

Step 1: Use version control

Version control systems—like git—are a practical necessity at any company with a software engineering team. They’re so common that we don’t even include knowledge of them in interviews—it’s expected that software engineers know how to use them. Many engineers would scoff at a software company that doesn’t use version control systems. In the research community there’s some adoption of version control, but you can also find people writing code that’s stored on someone’s laptop and shared via email. Why is there such a discrepancy? I imagine part of it is the up-front investment in learning how to use git. I can say that I wholeheartedly think it’s worth it for three reasons:

1. You can easily roll back to working code if it breaks down the line. When I was working on the website for viewing climate change data, I tried making the map faster at one point and broke the whole thing. I felt safe to try out big changes, however, because I made them in a new branch using git, and could easily switch back to my working code with a single command.

2. It helps teammates collaborate. When I was working with an undergraduate researcher, we needed to clean up some data. With the help of git and GitHub, she wrote all the scripts for cleaning the data, while I verified that they worked and provided visualization scripts she could use to test her data-sanitizing scripts, all without our copies getting out of sync.

3. You can more easily share your code with the outside world. When I created a website to show off the results of running our EPPA model, I was able to quickly adapt a platform that the Joint Global Change Research Institute had added to version control and published online.
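To make the first reason concrete, here is a minimal sketch of the branch-and-roll-back workflow, driven from Python so it can be run end to end in a throwaway folder. It assumes git is installed and on your PATH. The file name draw_map.py, the branch name faster-map, and the commit messages are made up for illustration; in day-to-day work you would simply type the same git commands in a terminal.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def demo_branch_rollback() -> str:
    """Break a script on an experimental branch, then return to the working copy."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        script = repo / "draw_map.py"  # hypothetical file name

        git(repo, "init")
        # Settings are stored in this throwaway repository only, so the
        # example does not depend on (or change) your global git config.
        git(repo, "config", "user.name", "Example Researcher")
        git(repo, "config", "user.email", "researcher@example.org")
        git(repo, "config", "commit.gpgsign", "false")

        # Save a known-good version of the code.
        script.write_text("print('map renders')\n")
        git(repo, "add", "draw_map.py")
        git(repo, "commit", "-m", "Working map")
        git(repo, "branch", "-M", "main")  # make sure the branch is named main

        # Try a risky change on its own branch.
        git(repo, "checkout", "-b", "faster-map")
        script.write_text("this line is broken\n")
        git(repo, "commit", "-am", "Attempt to speed up the map")

        # A single command brings the working code back.
        git(repo, "checkout", "main")
        return script.read_text()
```

Calling demo_branch_rollback() returns the contents of the original, working file. The broken attempt is not lost; it stays on the faster-map branch in case you want to come back to it.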

Step 2: Freeze the versions of libraries your code uses

The results you produce rely on a combination of the code you write and the code in any third-party libraries you use. The only way to guarantee repeatability is to run the exact same code, including the same versions of the same libraries. Some tools let you declare exactly which library versions you use, making it easier for future researchers to run the exact same code you did. You declare these versions in a text file called a lock file, which specifies the exact version number of each dependency to be installed on any future run. Dependency managers that support lock files include yarn for JavaScript, poetry for Python, cargo for Rust, gradle for Java, and more recently renv for R. I’m sure there are more examples out there, but those are some good places to start.
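As a rough illustration of the idea behind a lock file, the sketch below reads a list of pinned versions and reports any library whose installed version differs. It is not how poetry or any other tool works internally. The "name==version" line format, the function names, and the version numbers in the usage example are assumptions chosen for illustration; real lock files are generated and enforced by the dependency manager itself.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple


def parse_pins(lock_text: str) -> Dict[str, str]:
    """Read 'name==version' lines into a dict, skipping blanks and # comments."""
    pins: Dict[str, str] = {}
    for raw_line in lock_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (pinned, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, pinned in pins.items():
        try:
            installed: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # Illustrative pins only; replace with the versions your project uses.
    example_pins = parse_pins("numpy==1.24.2\npandas==1.5.3  # data cleaning\n")
    for package, (pinned, installed) in find_mismatches(example_pins).items():
        print(f"{package}: pinned {pinned}, installed {installed}")
```

An empty result from find_mismatches means the environment matches the pins. Anything else is a hint that results may not be comparable with the original run.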

Step 3: Always keep the latest working code in the same place

Now that you have a record of code revisions, via version control, and a way to ensure everyone is using the same versions of all the dependencies, you have a much better chance of your code continuing to work if it’s working already! The last step is to make sure a working version of your code always lives in the same place, so anyone visiting the code for the first time knows where to go to get off to a running start.

Git provides a mechanism to save different versions of your code in separate branches. You can use git to create one branch that is always working and always runnable; let’s call this branch main. Humans are fallible though, and can and will accidentally break the main branch. This is why software engineers often enforce this rule through automated testing and git hooks.

There are many methods I’ve seen software engineers use to enforce that the main branch is always runnable, but I’d like to explain my favorite at the moment. If you use GitHub, you can set up a GitHub Action to run and test your code when someone creates a pull request, and alter the settings of your repository to require status checks before merging to main. This will at least prevent people from merging unrunnable code.
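The check that such an action runs can be very small. Below is a minimal sketch of a smoke test, assuming a test runner such as pytest, which collects functions whose names start with test_ from files named like test_*.py. The run_model function, its parameters, and the reference value are hypothetical stand-ins for your own model and a result you recorded from a run you trust.

```python
def run_model(warming_c: float, population: int) -> float:
    """Hypothetical stand-in for a real model: returns a toy risk score."""
    return round(warming_c * 0.1 * population / 1000, 6)


# Reference result recorded once from a run known to be good (toy value).
EXPECTED_SCORE = 15.0


def test_model_is_repeatable() -> None:
    """The same inputs should give the recorded output, run after run."""
    first = run_model(warming_c=1.5, population=100_000)
    second = run_model(warming_c=1.5, population=100_000)
    assert first == second
    assert abs(first - EXPECTED_SCORE) < 1e-9
```

If a change makes the model crash or shifts its output, the test fails, the status check turns red, and the pull request cannot be merged into main until someone has looked at it.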

Final thoughts

Using version control, freezing your dependencies with lock files, and only allowing working code in the main branch all carry an upfront cost, but once you’ve set them up for one project, you can set them up again quickly using the same tools. The upfront time investment pays for itself when it saves you, or anyone else who tries to run your code years from now, from the frustrations of code rot.

Avoiding code rot is particularly valuable in the field of MSD, where a main focus is on connections between different systems and sectors, often explored by linking different models. The above three suggestions all reduce the friction of making code interoperable. More interoperable and accessible code can lead to a virtuous cycle for the MSD community by making it easier for additional collaborators to join, which could open up even more opportunities to collaborate.

While I’ve focused solely on avoiding code rot, there are many other aspects of the software engineering mindset that could benefit the MSD community. For example, software engineers have strategies for writing APIs that can easily talk to each other via open standards, know how to create web services that can scale when demand grows for a specific dataset or model, and are good at organizing code to be readable and simple even as the complexity of the problem it’s solving grows. As the field of MSD grows, stronger connections to software engineers can help its researchers achieve their goals more efficiently and effectively.

Originally published in our February 2023 newsletter (Issue 19)
