Welcome to the MSD Working Group on Open Science and FAIR Data

One component of an effective community of practice is leveraging shared tools and resources. The purpose of this working group is to facilitate the reuse of models and datasets across the MSD community. Open science is one strategy for enabling that reuse. It aims to accelerate scientific progress by making the data, code, and methods underpinning scientific research freely and easily available to downstream users. Open science is an umbrella term covering foci that range from open access journals to reproducible research to open science tools such as data repositories and open-source models (Pontika et al. 2015). The movement has grown to the point of being recognized by the U.S. National Academies of Sciences, Engineering, and Medicine, which published a seminal report on the subject in 2018: “Open Science by Design: Realizing a Vision for 21st Century Research”.

Our 10-year vision is to foster a culture of openness and to facilitate a collaborative, resource-rich, community-driven way of doing MSD research. In this open science world, MSD datasets would be developed following the FAIR (Findable, Accessible, Interoperable, and Reusable; Wilkinson et al. 2016) data principles, and open-source models would be the norm rather than the exception. If successful, we will accelerate progress, facilitate and incentivize collaboration, and enhance the scientific impact and visibility of the MSD community.

A graphical taxonomy of open science from Pontika et al. 2015.

Data plays a critical role in the MSD community. It is used for a variety of purposes, including model formulation, forcing, and evaluation, as well as empirical analyses. Due in part to the multisector nature of our community's work, the datasets we need are often not available “off the shelf”. As a result, members of our community spend significant time and resources constructing the unique datasets they need. Because teams across the community focus on problems in common sectors (e.g., energy, water, urban), datasets generated in one project may be immensely valuable to another. The FAIR data principles aim to facilitate the reuse of datasets by improving the ways in which data is processed, documented, and shared. Our working group aims to provide mechanisms to inventory currently available datasets, advertise new datasets, and improve the way datasets are documented and archived by promoting, for example, common metadata standards. Common standards would also create opportunities to apply machine- and deep-learning methods that explore, discover, and dissect co-evolving behaviors across multi-sectoral landscapes.

The FAIR data principles as defined by Wilkinson et al. 2016.

References

  1. Pontika, Nancy, Petr Knoth, Matteo Cancellieri, and Samuel Pearce, 2015. Fostering open science to research using a taxonomy and an eLearning portal. iKnow: 15th International Conference on Knowledge Technologies and Data Driven Business, 21-22 Oct 2015, Graz, Austria. https://doi.org/10.1145/2809563.2809571
  2. National Academies of Sciences, Engineering, and Medicine. 2018. Open Science by Design: Realizing a Vision for 21st Century Research. Washington, DC: The National Academies Press. https://doi.org/10.17226/25116
  3. Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg et al., 2016. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3(1), 1-9. https://doi.org/10.1038/sdata.2016.18