Overview

Data on Earth observations is growing at an unprecedented rate. While many scientists may test their workflows using a local computer, the volume of data is growing well beyond the capacity for processing with personal resources (e.g., a laptop/desktop). Fortunately, there are many options for scaling workflows, however using these resources is not always trivial. Some of these resources include high-performance computing through supercomputers at local institutions, cloud computing, and distributed compute leveraging resources from many locations. 

 

Some of the challenges to using these resources include: pre-allocated compute time and waits to access resources (e.g., supercomputers) and complex cost models (e.g., cloud computing).

 

To use any of these resources, users must know how to:

  • modularize their data and code to take advantage of parallelization
  • leverage the strengths of different hardware (e.g., GPU vs CPU), and
  • port their workflows including operating system and library dependencies (e.g., containerization)

Earth Lab works with cyberinfrastructure partners who specialize in building scalable scientific computing infrastructure to inform use cases and test new developments. Earth Lab also works with our network to help build capacity to take advantage of these resources.

Featured Work

    Blog

    ImgSPEC

    ImgSPEC is a prototype on-demand processing system on the cloud that enables use of many different kinds of datasets but...

Project Team

Project Lead

Ty Tuff

Dr. Ty Tuff is our friendly neighborhood data scientist. He is based in Earth lab’s analytics hub where he helps members and affiliates of the lab process, analyze, and publish their hard-won data.