One of the ways we promote FAIR data is to highlight unique MSD-relevant datasets that have embraced the FAIR principles. If you created or work with such a dataset and would like to share your work with the community, please reach out to Casey or Adam.


February 2021:

  • Title: Applying FAIR Principles in the Capacity Expansion Regional Feasibility Model
  • Authors: Kristian Nelson, Chris Vernon, and Jennie Rice – Pacific Northwest National Laboratory
  • Time: Tuesday, February 16, 2021, at 9:30 am Pacific
  • Abstract: The FAIR guidelines require data to be findable, accessible, interoperable, and reusable and are key to making scientific datasets available to the community in a way that facilitates efficient data-intensive research. It is important for scientists to follow these guidelines and solidify them as standard practice in the community, so that data products can be properly organized and accessible for use in other research. As an exercise in implementing these principles in the Capacity Expansion Regional Feasibility Model (CERF), we amalgamate, archive, and distribute energy expansion data into a commonly used format, to provide to the scientific community as a reusable service. We describe the entire process of planning, discovery, harvesting, documenting, archiving, publishing, and serving data for the CERF model. The process of using these guidelines in this model can be an example for other scientists to make their data FAIR. Like many data-intensive projects, this work requires a multitude of datasets that encompass a variety of sectors, scales, and formats. Abiding by the FAIR principles ensures that this data service can be accessible and reusable. These guidelines are a useful tool to help scientists organize and store future work, and it is important to make them a standard practice in the data-science world.


June 2020:

  • Title: The FEWSION Food-Energy-Water Dataset
  • Author: Benjamin Ruddell, Northern Arizona University
  • Abstract: FEWSION is a significant NSF-funded research effort focused on mesoscale food-energy-water systems data synthesis in the United States. This effort involves engineering a coupled natural human systems data model for commodity (and other) inputs and outputs, including adoption of database structures and standards, controlled vocabularies and crosswalks, metadata, documentation, scientific workflows, high performance computing optimization, and data fusion/linking strategies to integrate heterogeneous spatiotemporal data sources. Visualization is also a challenge, and patent-pending visual analytics strategies have been developed and prototyped in the FEW-View system available since April 2019 on the website https://fewsion.us. The first version of the dataset is already available to collaborators, although unpublished, and the second version is scheduled for a more complete publication and release in 2020-2021.

September 2020:

  • Title: Open Source Historical Emissions Data Generation With The Community Emissions Data System (CEDS)
  • Author: Steve Smith, Pacific Northwest National Laboratory
  • Abstract: Anthropogenic emissions of reactive gases, aerosols, and aerosol precursor compounds have substantially changed atmospheric composition. Increased particulate and tropospheric ozone concentrations since pre-industrial times have altered the radiative balance of the atmosphere, increased human mortality and morbidity, and impacted terrestrial and aquatic ecosystems. Central to studying these effects are historical trends of emissions. Historical emissions data and consistent emissions time series are particularly important for Earth System Models (ESMs) and atmospheric chemistry and transport models, which use emissions time series as key model inputs. Emissions data are also used by multi-sector models of the human-Earth system as a historical starting point for analysis and projection of future emissions scenarios. For example, in the Coupled Model Intercomparison Project (CMIP), a key international benchmarking exercise for ESMs, a common “forcing” dataset is required for historical emissions and other inputs. In previous phases of CMIP, historical emissions data were compiled from different sources using inconsistent methodologies between emission species and over time that also lacked uncertainty estimates and reproducibility.