What is Scientific Programming (And Why It Rocks)

In this post, we will introduce scientific programming and talk about why scientists should learn to code and the benefits of clean and reproducible coding practices for open science. 

A man in jeans and a t-shirt types into a console. He types a few commands that run a script. No errors, no red text. Everything goes smoothly. 

This is probably not most people’s image of a geologist. Yet, these days many geologists spend as much time looking at a computer monitor as they do looking at rock formations, just as many modern hydrologists spend far more time writing code and scraping databases than wading in rivers. 

This is the reality of science in the 21st century. Research is becoming more efficient, faster, and more reproducible. It is a world where scientists in nearly every discipline have a tool in common: scientific programming. 

What is Scientific Programming? 

Scientific programming has a simple definition that nonetheless covers an immense number of applications and industries: scientific programming is any use of a computer program for scientific research.

Caption: Scientific programming is any time programming is used for science research.

Source: Christiaan Colen


Scientists in most fields, from geologists to zoologists, can benefit from scientific programming. With it, researchers can dramatically increase the speed and reproducibility of their work. While humans are certainly better than computers at some things, performing massive calculations, storing data, and analyzing results are exactly what computers were designed to do. Scientists can use computers to automate processes that would otherwise be time-consuming, tedious, error-prone, and difficult for human researchers.

Importantly, computers don’t make careless arithmetic slips. Mistakes can still happen, but they are usually rooted in human error: computers just follow directions, so if the instructions contain a mistake, the computer will faithfully repeat it rather than catch it. That said, a computer can do in a few minutes a series of calculations that would take a human researcher months or even years. Even better, it will perform the calculations exactly the same way each time the code is run.

What can I do with Scientific Programming?

Scientific programming can benefit scientists and researchers in a number of different ways. Without being field specific (we’ll give you a few Earth data science examples below), the most important scientific programming superpowers are to:

  1. Automate time-intensive tasks: You can use scientific programming to automate tasks that would take you weeks, months, or years by hand, or that would simply be impossible to do by hand. For example, if you wanted to track the number of tweets about a recent natural disaster, it would be miserable to go through tens of thousands of feeds one by one. Using code, this task might take only a few minutes. 
  2. Modify and update research: If you write clean code, it can be modified and rerun over and over. Let’s say you’re studying how socioeconomic data relate to air pollution in the Denver area. If you write a clean, well-commented script, it will be easy to incorporate the next year’s census data into your results. 
  3. Share methods with the public and other researchers: Code is easy to share, making science more open and reproducible. As a researcher, you can use code to share your exact methods with both other scientists and the public.
  4. Document workflows: Code allows you to easily document your workflow. You can use comments to explain every step of the process (to your future self or to others), so if you need to update or change something later, it is fast and simple. 
  5. Enable collaboration: Code makes collaborating easier. Going back to the previous example, if you’re studying air pollution in Denver, and a colleague is studying air pollution in San Francisco, you can compare models, exchange scripts, and work together.
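To make the “modify and update” idea concrete, here is a minimal Python sketch of the Denver example. It assumes hypothetical per-tract records with `median_income` and `pm25` fields; the field names and all of the numbers are invented for illustration and are not real census or air-quality data. Because the analysis lives in a function, adding the next year’s data means adding one more entry to the input rather than redoing the work by hand. It uses only the standard library so it runs anywhere.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def income_pollution_r(tracts: list[dict]) -> float:
    """Correlate median income with PM2.5 across tracts (hypothetical field names)."""
    return pearson_r([t["median_income"] for t in tracts],
                     [t["pm25"] for t in tracts])


# Hypothetical, made-up records; real inputs would be loaded from census and
# air-quality files. Updating the study means adding another year here.
data_by_year = {
    "year_1": [
        {"median_income": 38000, "pm25": 9.4},
        {"median_income": 52000, "pm25": 8.1},
        {"median_income": 71000, "pm25": 7.2},
        {"median_income": 95000, "pm25": 6.9},
    ],
    "year_2": [
        {"median_income": 39500, "pm25": 9.0},
        {"median_income": 54000, "pm25": 8.3},
        {"median_income": 73000, "pm25": 7.0},
        {"median_income": 97000, "pm25": 6.5},
    ],
}

for year, tracts in sorted(data_by_year.items()):
    print(year, round(income_pollution_r(tracts), 3))
```

A correlation is only a starting point, but the pattern (load, analyze in a function, rerun) is what makes an analysis easy to update and share.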

 

This is a pretty incredible list. It pushes science forward in powerful ways not only because of the speed of computation and calculation, but also the ease of collaboration and modification. From helping biologists to sequence the human genome, to allowing social scientists to make better economic predictions, to the Earth science examples below, scientific programming has been and continues to be revolutionary.  

The Cold Springs Fire: A Programmatic Workflow Example

Here’s an example of a workflow that is drastically improved with scientific programming.

You’re studying the Cold Springs Fire that occurred in Colorado in July 2016. You want to understand how the fire affected vegetation, which involves using satellite images (from Landsat, MODIS, and NAIP) from before and after the fire, with the fire boundary (a “shapefile”) as an overlay. To do this, you may first need to reproject the Landsat images so that their coordinate reference system (CRS), which defines the coordinates and the projection used to represent the Earth’s curved, 3D surface on a flat, 2D map, matches that of the fire boundary. You would then crop the images so that you’re only analyzing vegetation data within the fire boundary. Next, you would remove clouds from the images, as cloudy pixels would throw off your results. Since each Landsat image contains multiple bands representing different wavelengths of light, you need to select the bands of interest. Finally, you would calculate the Normalized Difference Vegetation Index (NDVI), a measure of vegetation health, and the Normalized Burn Ratio (NBR), which highlights burned areas and, when compared before and after a fire, indicates burn severity. Both indices are calculated from combinations of specific bands. Then you would plot the results.

 

Caption: A NAIP image of the location where the Cold Springs Fire occurred, taken after the event.
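The index calculations at the end of that workflow are simple per-pixel formulas: NDVI = (NIR - Red) / (NIR + Red) and NBR = (NIR - SWIR2) / (NIR + SWIR2), where NIR, Red, and SWIR2 are reflectance values from the near-infrared, red, and shortwave-infrared (about 2.2 µm) bands. Burn severity is commonly estimated as the difference between pre-fire and post-fire NBR (dNBR). The sketch below applies these formulas pixel by pixel to small grids stored as plain Python lists. The function names and sample values are illustrative assumptions; a real workflow would read the bands from files with a raster library (for Landsat 8, red, NIR, and SWIR2 are bands 4, 5, and 7).

```python
import math

Grid = list[list[float]]


def normalized_difference(a: Grid, b: Grid) -> Grid:
    """Pixel-by-pixel (a - b) / (a + b); NaN where the denominator is zero."""
    result = []
    for row_a, row_b in zip(a, b):
        row = []
        for va, vb in zip(row_a, row_b):
            total = va + vb
            row.append((va - vb) / total if total != 0 else math.nan)
        result.append(row)
    return result


def ndvi(nir: Grid, red: Grid) -> Grid:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def nbr(nir: Grid, swir2: Grid) -> Grid:
    """Normalized Burn Ratio: (NIR - SWIR2) / (NIR + SWIR2)."""
    return normalized_difference(nir, swir2)


def dnbr(pre_nbr: Grid, post_nbr: Grid) -> Grid:
    """Pre-fire NBR minus post-fire NBR; larger values suggest more severe burning."""
    return [[pre - post for pre, post in zip(r_pre, r_post)]
            for r_pre, r_post in zip(pre_nbr, post_nbr)]


# Made-up 2 x 2 reflectance grids for illustration only.
nir_pre, swir_pre = [[0.45, 0.40], [0.50, 0.42]], [[0.15, 0.18], [0.12, 0.17]]
nir_post, swir_post = [[0.20, 0.38], [0.18, 0.41]], [[0.30, 0.19], [0.28, 0.18]]
red_post = [[0.12, 0.06], [0.14, 0.05]]

print(ndvi(nir_post, red_post))
print(dnbr(nbr(nir_pre, swir_pre), nbr(nir_post, swir_post)))
```

Because both indices share the same “normalized difference” form, one helper covers them; swapping in a different band pair is a one-line change.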

Done manually, this process would be very tedious, taking days or even weeks if you have a lot of images. You would likely also need to purchase proprietary software to process the images and perform raster calculations. You’d have to open each individual set of Landsat bands and perform the steps above: reprojecting, cropping, stacking bands, cloud masking, and doing raster calculations. Cloud masking in particular would be a headache, as you’d need to visually inspect each image for clouds to cut out. The potential for error would also grow as you slog through each image, since it is easy to forget a step. 
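Programmatically, cloud masking usually relies on the quality-assessment (QA) band distributed with Landsat products, in which individual bits flag clouds, cloud shadows, and other conditions, rather than on visual inspection. The sketch below assumes the Landsat Collection 2 `QA_PIXEL` layout, where bit 3 flags cloud and bit 4 flags cloud shadow; check the product guide for the data you actually use, because bit assignments differ between product collections. The function names are illustrative, and masked pixels are set to NaN so they drop out of later calculations.

```python
import math

# Assumed Landsat Collection 2 QA_PIXEL bit positions; verify for your product.
CLOUD_BIT = 3
CLOUD_SHADOW_BIT = 4


def is_flagged(qa_value: int, bits: tuple[int, ...] = (CLOUD_BIT, CLOUD_SHADOW_BIT)) -> bool:
    """True if any of the given bits is set in a QA pixel value."""
    return any(qa_value & (1 << bit) for bit in bits)


def mask_clouds(band: list[list[float]], qa: list[list[int]]) -> list[list[float]]:
    """Replace cloudy or shadowed pixels with NaN; keep all other values unchanged."""
    return [
        [math.nan if is_flagged(q) else value for value, q in zip(band_row, qa_row)]
        for band_row, qa_row in zip(band, qa)
    ]


# Made-up example: pixel (0, 1) has the cloud bit set, pixel (1, 0) the shadow bit.
band = [[0.21, 0.35], [0.18, 0.24]]
qa = [[0, 1 << CLOUD_BIT], [1 << CLOUD_SHADOW_BIT, 0]]
print(mask_clouds(band, qa))
```

The same two functions apply unchanged to every image, which is exactly the step that is most painful to repeat by eye.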

Your process would be far less reproducible, too, as another scientist wishing to repeat your study would need to own the same software and follow your exact steps. If you instead did it programmatically, you could easily share your code and workflow with other scientists, allowing them to repeat your process in minutes. Once the code is written, you could use it on as many images as you’d like with minimal additional time investment.
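The payoff of reuse can be sketched with a short driver that applies the same steps to every scene. In this sketch, each scene is a hypothetical dictionary of already-reprojected `nir`, `red`, and `qa` grids; the crop is a simple row and column window standing in for clipping to the fire boundary (real clipping would use the boundary polygon); and the cloud and shadow bits (3 and 4) are assumed to follow the Landsat Collection 2 `QA_PIXEL` layout. Scene names and values are made up.

```python
import math

Grid = list[list[float]]


def crop(grid: Grid, rows: tuple[int, int], cols: tuple[int, int]) -> Grid:
    """Keep a rectangular window; a simple stand-in for clipping to the fire boundary."""
    return [row[cols[0]:cols[1]] for row in grid[rows[0]:rows[1]]]


def mean_ndvi(scene: dict, rows: tuple[int, int], cols: tuple[int, int],
              cloud_bits: tuple[int, ...] = (3, 4)) -> float:
    """Crop, drop flagged pixels, compute NDVI, and return its mean (NaN if none are clear)."""
    nir, red, qa = (crop(scene[key], rows, cols) for key in ("nir", "red", "qa"))
    values = []
    for nir_row, red_row, qa_row in zip(nir, red, qa):
        for n, r, q in zip(nir_row, red_row, qa_row):
            if any(q & (1 << bit) for bit in cloud_bits) or n + r == 0:
                continue  # skip cloudy/shadowed pixels and zero denominators
            values.append((n - r) / (n + r))
    return sum(values) / len(values) if values else math.nan


# Hypothetical scenes; in practice these would be loaded from files in a loop.
scenes = {
    "scene_before": {"nir": [[0.45, 0.40], [0.50, 0.42]],
                     "red": [[0.08, 0.07], [0.06, 0.09]],
                     "qa": [[0, 0], [0, 8]]},
    "scene_after": {"nir": [[0.20, 0.38], [0.18, 0.41]],
                    "red": [[0.12, 0.06], [0.14, 0.05]],
                    "qa": [[0, 0], [16, 0]]},
}

for name, scene in scenes.items():
    print(name, round(mean_ndvi(scene, rows=(0, 2), cols=(0, 2)), 3))
```

Adding a hundred more scenes changes only the input, not the code.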

Caption: An example of NBR calculated programmatically on a Landsat image.

 

Caption: An example of NDVI calculated programmatically on a MODIS image.

More Real-World Examples of Scientific Programming

Using Scientific Programming to Study Fire Pollution and Respiratory Health

Caption: An air tanker drops red flame retardant over a community in Washington to slow the progress of a wildfire. Source: Wikipedia


We have several scientists at Earth Lab who specialize in studying wildfires across the United States. Our research team investigates many aspects of fire, including the plants that burn most commonly (such as invasive cheatgrass), ways to defend communities from the increasing number of wildfires, how social media can be used to aid wildfire response, and how forest disturbances like insect invasions, wind events, and logging can influence the frequency of fires. 

A major and often overlooked danger of wildfires is the toll they take on human respiratory health. Wildfire smoke is linked to a number of breathing and lung problems because it drastically increases the amount of fine particulate matter in the air. Fine particulate matter is one of the most dangerous pollutants from wildfires and has been linked to lung cancer as well as heart and lung disease.

Because these fine particles are airborne, their effect on local populations can be difficult to predict and track. Respiratory illnesses have many causes and are influenced by economic, lifestyle, and demographic factors as well as by air pollution. Separating the effects of wildfire-related pollution on a population from those of vehicle exhaust or industrial pollution requires a lot of data and complex modeling. 

Scientific programming offers a solution to this challenge. Researchers can combine data sources and account for “confounding variables”: factors, such as vehicle exhaust or industrial pollution, that can produce the same health response as wildfire smoke and that would reduce the accuracy of the model if left unaccounted for. 

Earth Lab researchers use a combination of hospitalization and emergency visit data, U.S. Census data, meteorological data, and imagery from geostationary satellites to model how these particles affect respiratory health. By finding the mathematical rules that best fit the data, these models can help predict the effect of smoke from future wildfires on vulnerable populations.  
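As a much-simplified illustration of “finding the rules that best fit the data” while accounting for a confounder, the sketch below fits an ordinary least squares model with two predictors, y = b0 + b1*x1 + b2*x2: for example, respiratory visits (y) against smoke PM2.5 (x1) and a second variable, such as daily temperature (x2), that could also affect visits. This is not Earth Lab’s model, which combines far more data sources and uses more sophisticated methods; the function name, the choice of temperature as the confounder, and all numbers are assumptions for illustration. Including x2 lets the fitted b1 describe the association with x1 while holding x2 fixed.

```python
def fit_two_predictors(x1: list[float], x2: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    n = len(y)
    if not (len(x1) == len(x2) == n) or n < 3:
        raise ValueError("need three equal-length lists with at least 3 observations")
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Center each variable so the intercept drops out of the normal equations.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    # Solve the 2 x 2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear; coefficients are not identifiable")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up daily values for illustration only (not real health or air-quality data).
pm25 = [8.0, 35.0, 12.0, 60.0, 20.0, 9.0]
temperature = [22.0, 31.0, 25.0, 33.0, 28.0, 21.0]
visits = [14.0, 30.0, 17.0, 41.0, 22.0, 13.0]

b0, b1, b2 = fit_two_predictors(pm25, temperature, visits)
print(f"visits ~ {b0:.2f} + {b1:.3f} * PM2.5 + {b2:.3f} * temperature")
```

Real models add many more predictors, handle nonlinearity, and quantify uncertainty, which is where numerical libraries and more computing power come in.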

Scientific programming is vital for creating these complex models: a human cannot combine and make sense of all of these data types without machine assistance. And while it may be possible in principle for a researcher to fit these models by hand, it would take an impractically long time. Thus, the models relating wildfire smoke pollution to respiratory health impacts could not exist without scientific programming. 

Using Scientific Programming and Social Media to Respond to Disasters

One of the greatest strengths of scientific programming is that it can do an amount of work that would be impossible for a single individual or even a team of people. That superpower is clearly on display in the research of Earth Lab scientist Lise St. Denis.

Lise uses Twitter to help first responders identify developing and in-progress emergencies that they may not be aware of.   

When disaster strikes, people call the police and other officials, but they also reach out to their online communities, sometimes with vital information. In addition, during disasters, hotlines can easily be overwhelmed by callers, and social media may be the only available line of communication. 

This means that sites like Twitter can contain vital information that disaster response teams need, but there are far too many tweets for a person (or even a team) to sort through and still get information to responders fast enough. Lise St. Denis knows this from firsthand experience. During the 2014 Carlton Complex Fire, she was part of a team tasked with sorting tweets and producing a nightly report. While the report was useful, Lise’s team couldn’t keep up with the flow of tweets, and responders needed the information faster than the team could provide it. The solution was clear to Lise: they needed to enlist the superpowers of scientific programming.

For the last few years, Lise has been working on a filtering algorithm that scrapes data from Twitter and sorts it by importance. The code can do the work of an entire team of humans by “looking at” and categorizing every tweet as it comes in. The algorithm sorts tweets into messages that the human team needs to review and those it has safely identified as unimportant for first responders. “Looking” at a massive amount of fast-moving data and putting it into categories is one of the great strengths of scientific programming. 
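The post does not describe how Lise’s algorithm works internally, so the sketch below is only a toy illustration of the general idea of automated triage: it scores each tweet by how many hypothetical urgent phrases it contains and routes tweets at or above a threshold to the human review queue. The phrase list, threshold, and function names are assumptions; production systems often use machine-learning classifiers rather than a fixed keyword list.

```python
import re

# Hypothetical phrases; a real system would be tuned with responders' input.
URGENT_PHRASES = ("trapped", "evacuate", "injured", "need help", "road closed", "fire")


def urgency_score(text: str, phrases: tuple[str, ...] = URGENT_PHRASES) -> int:
    """Count how many urgent phrases appear as whole words in the tweet."""
    lowered = text.lower()
    return sum(
        1 for phrase in phrases
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    )


def triage(tweets: list[str], threshold: int = 2) -> tuple[list[str], list[str]]:
    """Split tweets into (needs human review, low priority) by urgency score."""
    review, low_priority = [], []
    for tweet in tweets:
        (review if urgency_score(tweet) >= threshold else low_priority).append(tweet)
    return review, low_priority


# Made-up tweets for illustration.
incoming = [
    "Family trapped near the ridge, fire is close, we need help",
    "Great coffee this morning",
    "Smoke visible from downtown",
]
review, low = triage(incoming)
print("Review:", review)
print("Low priority:", low)
```

Even this crude rule shows the division of labor: the code screens every message instantly, and people spend their attention only on the short list it flags.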

The Future of Scientific Programming

It is hard to select one area of scientific programming that is particularly promising because there are so many exciting possibilities. Nearly every field of science has some programming tool that is the “way of the future,” and writing about all of them would be impossible. 

One promising area that benefits nearly all fields is modeling. Models have been the basis of science for centuries. They are used in astronomy, Earth science, and medicine for discovering star systems, predicting wildfires, and understanding diseases, respectively. The more accurate and complete the algorithms that power such models become, the better the models will be (but no model is perfect!). 

In the modern data-driven world, science is nearly synonymous with scientific programming. By harnessing the power of large computers, researchers can now solve some problems that have frustrated the scientific world for decades far more quickly than ever before. This increase in efficiency and speed has transformed most fields of modern science. Scientific programming is, without a doubt, our path to the future. 

If you are ready to start learning how to program for Earth data science, you can learn more about our professional certificate program here. If you have any questions (or have an amazing example of scientific programming that we should include), please let us know in the comment section below!