The volume of Earth observation data is growing at an unprecedented rate. Many scientists test their workflows on a local computer, but the volume of data is growing well beyond what personal resources (e.g., a laptop or desktop) can process. Fortunately, there are many options for scaling workflows, including high-performance computing on supercomputers at local institutions, cloud computing, and distributed computing that leverages resources from many locations. However, using these resources is not always trivial.
Challenges to using these resources include pre-allocated compute time and wait times for access (e.g., on supercomputers), as well as complex cost models (e.g., in cloud computing).
To use any of these resources, users must know how to:
- modularize their data and code to take advantage of parallelization
- leverage the strengths of different hardware (e.g., GPU vs. CPU)
- port their workflows, including operating system and library dependencies (e.g., through containerization)
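As a rough illustration of the first point, the sketch below splits a dataset into independent chunks and maps one function over them with a pool of workers. It is a minimal example under stated assumptions, not a recommended pipeline: the function names (`split_into_chunks`, `summarize_chunk`, `summarize_in_parallel`) and the toy pixel values are hypothetical, and it uses Python's standard-library thread pool only so that it runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def split_into_chunks(values, chunk_size):
    """Split a flat list into independent chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def summarize_chunk(chunk):
    """Per-chunk work. It depends only on its own input, so chunks can run in any order."""
    return {"n": len(chunk), "mean": mean(chunk)}


def summarize_in_parallel(values, chunk_size, max_workers=4):
    """Map the per-chunk function over all chunks using a pool of workers."""
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        return list(pool.map(summarize_chunk, chunks))


if __name__ == "__main__":
    # Hypothetical stand-in for real observations (e.g., pixel values from one image).
    pixel_values = list(range(10))
    for summary in summarize_in_parallel(pixel_values, chunk_size=4):
        print(summary)
```

Because each chunk is processed independently, the same structure carries over to other executors. For CPU-bound work, `ProcessPoolExecutor` from the same module, or a scheduler on a cluster or cloud service, would typically replace the thread pool.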
Earth Lab works with cyberinfrastructure partners who specialize in building scalable scientific computing infrastructure to inform use cases and test new developments. Earth Lab also works with our network to build the capacity needed to take advantage of these resources.
Dr. E. Natasha Stavros is the Director of the Earth Lab Analytics Hub. She specializes in complex systems science, data science, image processing, and information technologies. She developed these skills as a fire ecologist and has since applied them to other complex systems, including NASA Flight Projects, biodiversity science, and urban ecology.