Teaching Earth Data Science Skills Backed by Market Research

As the availability and volume of earth data continue to grow and data processing workflows become more complex, data science skills are becoming increasingly fundamental to science. At the same time, science and industry have become more collaborative and interdisciplinary. In this context, there is a growing need for professionals with skills at the intersection of science and data science who can also work effectively in interdisciplinary teams and communicate their work.

To identify program learning goals that reflect market demand, we surveyed hiring managers, academics, and professionals in the earth and environmental sciences about the core skills they seek in new hires for data science positions. Using these results, we built our curriculum around five learning areas: technical data science, domain science, the ability to use different data types and structures, scientific communication, and interdisciplinary collaboration. These in-demand skills prepare students and professionals for careers in data-intensive science that address a variety of large-scale environmental challenges and that remain responsive to rapid changes in technology.


In a survey of hiring managers, scientific programming skills were rated as in demand. The Earth Lab survey results show the relative importance of scientific programming tools to industry and academic organizations. Source: Earth Lab 2017 Survey.

Training the Next Generation of Earth Data Scientists

A First of Its Kind Earth Data Analytics Professional Graduate Certificate

In support of our mission to train a data-capable earth and environmental science workforce, we’ve created one of the first professional earth data science programs in the country. The three-course, 10-month Earth Data Analytics - Foundations professional graduate certificate program trains students who are new to earth and environmental data science in the skills required to integrate data-intensive approaches into their careers. In as little as 10 months of online or in-person instruction, the program gives students a powerful combination of skills at the intersection of earth and data science.

Increasing Diversity in STEM By Building Earth and Environmental Data Science Teaching Capacity at Partner Institutions

Through a National Science Foundation Harnessing the Data Revolution grant, we designed and are leading a three-year Earth Data Science Corps program that builds sustainable capacity for faculty to teach earth data science at institutions serving students historically underrepresented in STEM. Our focus has been on students at Tribal and Hispanic-Serving Institutions, including:

  • Oglala Lakota College in Kyle, South Dakota
  • United Tribes Technical College in Bismarck, North Dakota
  • Metropolitan State University of Denver

The program includes online and in-person training for faculty and students to learn technical data skills, focused training to help faculty embed data-intensive content into their courses, an applied internship in which students apply the skills they have learned to real-world projects, and a full-semester course.

Spectrum of data science teaching capability at EDSC partner institutions.

Evaluation is core to the EDSC project to assess the program’s effect on student skill attainment, self-confidence, career interest, and career persistence in STEM as well as understand how students best learn in online environments. In our first year of data, students reported that the program had a positive impact on their sense of science identity and belonging and that they built skills that prepared them for their future careers. Students also enjoyed the flexibility and convenience of online learning, but generally indicated that they preferred in-person instruction.

Earth and Environmental Data Science Workshops For Students of All Levels

For students and professionals who want a basic introduction to earth and environmental data science or who want to learn targeted skills, we offer workshops that teach the core programming and open reproducible science skills needed to work with environmental and earth systems data in collaborative team environments. Workshops are aimed at participants of all skill levels and backgrounds and are generally offered fully online.

The Earth Data Analytics professional graduate certificate consists of three sequential courses, which provide the fundamental skills required to work in the growing field of earth data science.
Earth Lab education hosts technical workshops on topics like working with spatial data, using Git/GitHub for version control and collaboration, and writing clean code.
The Earth Data Science Corps, funded by the National Science Foundation, is a $1.2 million three-year project that builds capacity to teach and learn earth data science at schools serving communities that are historically underrepresented in STEM.

Earth and Environmental Data Science Core Skills

Technical Data Science Skills 

Our earth analytics education programs teach the scientific programming, version control, and command line skills required to create efficient, open, and reproducible workflows for processing earth data. We currently focus on Python, Git and GitHub, and Bash because these tools were consistently listed as in demand in our industry surveys. Knowledge of these tools is meant to serve as a foundation for learning other programming languages and software in today’s evolving data science landscape.

Understand Scientific Applications of Data Science

While data skills can be applied in almost every field, science domain knowledge differentiates students when they enter the job market. We integrate earth and environmental science into our data science lessons by teaching students how to frame a scientific question, identify appropriate data, and produce a useful final product. 

Find & Work With Different Types of Data

Students finish our programs able to efficiently find and work with different data types and structures. Our lessons cover how to work with spatial, remote sensing, and time series data stored in raster, vector, and hierarchical formats. We also teach students how to combine these different types of data, a skill that is critical for harnessing the rapidly growing number of data sources available today.


Learn how to work with and plot spatial vector data using the geopandas package for Python.
Learn fundamental concepts related to working with raster data in Python, including understanding the spatial attributes of raster data, how to open raster data and access its metadata, and how to explore the distribution of values in a raster dataset.
Learn how to work with datetime objects in Python, which you need for plotting and working with time series data.
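As a small illustration of why parsing dates matters for time series work, here is a sketch that uses only Python's standard-library `datetime` module. It is not taken from the lessons themselves, and the dates and precipitation values are hypothetical, invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical daily precipitation readings (inches) stored as text,
# as they might arrive from a CSV file before any parsing.
raw_records = [
    ("2013-09-12", 2.0),
    ("2013-09-10", 0.5),
    ("2013-09-11", 1.5),
]

# Parse each date string into a datetime object so dates can be
# compared, sorted, and spaced correctly along a plot axis.
records = [(datetime.strptime(day, "%Y-%m-%d"), value) for day, value in raw_records]

# Sort chronologically by the parsed datetime rather than the raw text.
records.sort(key=lambda record: record[0])

# Subtracting two datetime objects returns a timedelta.
span = records[-1][0] - records[0][0]

# Select observations in a one-day window starting on September 11.
start = datetime(2013, 9, 11)
end = start + timedelta(days=1)
window = [(day, value) for day, value in records if start <= day <= end]
```

Once dates are real `datetime` objects, sorting, subsetting by date range, and computing elapsed time all work without string manipulation, and plotting libraries can space the points according to actual time.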

Communicate & Collaborate in Synchronous & Asynchronous Interdisciplinary Environments

As science and industry become more interdisciplinary and work environments increasingly flexible, strong communication and collaboration skills are critical to professional success. We teach these skills to students through group work, communication-focused activities, project-based learning, and tools like Git/GitHub, Slack, and Discourse.



Promoting Open, Reproducible Workflows that Accelerate Science

Open reproducible science occurs when a scientist makes their workflow available for others to view, use, and run from beginning to end. It involves connecting data inputs, processing methods, and outputs with supporting documentation that allows peers to replicate the process. Reproducible scientific workflows allow scientists to build on one another's methods rather than start from scratch, boost the visibility of scientific work, let peers check for errors or provide feedback, and increase the efficiency of science as a whole. Our program teaches the process of developing open, reproducible workflows using real-world data and common open science tools such as the Python programming language, Git and GitHub, Bash, and Jupyter Notebooks.
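The idea of connecting data inputs, processing methods, and outputs can be sketched as a small Python script. This is a minimal illustration under stated assumptions, not Earth Lab's own workflow: the directory layout (`data/observations.csv`, `outputs/summary.json`), the `value` column, and the `summarize` step are all hypothetical. The sketch resolves every path relative to a project folder, records a SHA-256 checksum of the input so others can confirm they are using the same data, and writes the result alongside that provenance information.

```python
from __future__ import annotations

import csv
import hashlib
import json
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hash of a file so others can verify they have the same input."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def summarize(rows: list[dict]) -> dict:
    """Hypothetical processing step: count and mean of the 'value' column."""
    values = [float(row["value"]) for row in rows]
    return {"count": len(values), "mean": sum(values) / len(values)}


def run_workflow(project_dir: Path) -> dict:
    """Run the workflow end to end: read the input, process it, write documented output."""
    # All paths are relative to the project folder, not to one person's machine.
    input_path = project_dir / "data" / "observations.csv"
    output_dir = project_dir / "outputs"
    output_dir.mkdir(parents=True, exist_ok=True)

    with input_path.open(newline="") as f:
        rows = list(csv.DictReader(f))

    result = summarize(rows)
    # Store provenance next to the result so the output can be traced to its input.
    result["input_file"] = str(input_path.relative_to(project_dir))
    result["input_sha256"] = file_checksum(input_path)

    (output_dir / "summary.json").write_text(json.dumps(result, indent=2))
    return result
```

A collaborator who clones the project can call `run_workflow(Path("."))` from the project root and compare the recorded checksum with their own copy of the data before comparing results.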



An open science workflow highlighting the roles of data, code, and workflows. Source: Max Joseph, Earth Lab at University of Colorado, Boulder.

Our curriculum exposes students to each step required to develop, implement, and communicate the components of a science project. This includes articulating a challenge to a broad audience, developing a reproducible data processing workflow to address the challenge, and communicating the results in both written and verbal formats. Through our courses, students develop the skills and confidence needed to independently define, find data for and complete data-intensive projects to address scientific questions and challenges. 


This lesson teaches you how to use Jupyter Notebooks, an interactive environment where you can write and run code such as Python and add text that describes your workflow using Markdown.
Learn why open reproducible science is important and discover tools that support open science, including Shell (Bash), Git and GitHub, and Jupyter, in this lesson.
Review this presentation on benefits and best practices for working reproducibly.


Project Team

Project Lead

Elsa Culler