Ally Fitts




Data Science
Data Tools
Sponsor - USGS
Open Source Software

Sweating the Little Things

The Importance of Consistency and Documentation in Data Science

Earth Lab's undergraduate interns learn career-focused earth data science skills while contributing to Earth Lab's science, analytics, and education projects.

Many students and scientists in earth data science struggle to find credible, reliable, and consistent data. The U.S. Geological Survey (USGS) and Earth Lab recognize this issue and have joined forces to improve the accessibility of hydrologic data. The USGS offers a vast amount of hydrologic data, and data scientists have produced multiple open-source packages to retrieve, plot, and analyze these data locally. Many of these packages overlap in purpose and functionality, and package developers are unknowingly reproducing each other's work, which is inefficient and confusing for users. The Science Analytics and Synthesis (SAS) organization at the USGS and Earth Lab hope to reduce this duplication by improving package documentation and raising awareness that these packages exist.


This summer, I am studying various open-source Python tools used to access and process USGS hydrologic data. The goal of my project is to make USGS data more accessible and easier to use for scientific and academic purposes. By analyzing multiple hydrologic datasets and tools, I aim to identify gaps in documentation and help developers refine their tools. I will also be collaborating with the SAS team at the USGS to build a catalog of tools that use USGS datasets. My focus has been improving the documentation of a Python package that interfaces with the USGS StreamStats API. In addition to updating the README, the text file that explains the project, I have created two tutorials highlighting the package's functionality.


This position has allowed me to develop my Python coding skills. A key resource for my project has been Earth Lab's extensive library of learning materials for data science tools, including workshops and tutorials. Over the past few months, I have learned the ins and outs of Jupyter Notebook, GitHub, and Atom. I am extremely grateful and proud to be working with such an amazing group of earth scientists, and I look forward to continuing my work this summer with Earth Lab and the USGS.