Data plays a critical role in the MSD community. It is used for a variety of purposes, including model formulation, forcing, and evaluation, as well as empirical analyses. Due in part to the multisector nature of our community's work, the datasets we need are often not available “off the shelf”. As a result, members of our community spend a significant amount of time and resources constructing the unique datasets they need to do their work. Because teams from across the community are focused on problems in common sectors (e.g., energy, water, and urban), datasets generated in one project may be immensely valuable to another.
A key factor limiting the effective reuse of datasets in the MSD community is a lack of information about which datasets each team has available. In the economist’s parlance, this is an imperfect information problem. While publications and conferences provide limited opportunities to advertise datasets and to generate new collaborations based on data reuse, these mechanisms rely on serendipitous interactions and often do not reach the entire community. Even when datasets are advertised, a lack of effective, standardized documentation and data-storage protocols limits the extent to which new users can adapt them.
One component of an effective community of practice is the use of shared tools and resources. The purpose of this working group is to facilitate the reuse of datasets across the MSD community by providing mechanisms to inventory currently available datasets, advertise new datasets, and improve how datasets are documented and archived, for example by promoting common metadata standards. Common standards would also create opportunities to apply machine-learning and deep-learning methods that explore, discover, and dissect co-evolving behaviors across multisector landscapes. The working group will help the MSD community adopt the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles.