Assessing the Extent of a Flood Event using Python: A Change Detection Approach with Landsat Imagery
Note from author:
I am very passionate about climate scinece and resilency of human societies towards climate change. My areas of interest include vulnerbaility analysis and risk/mitigation planning in urban areas.
Flood events have always been a a topic i choose to complete projects on becasue Of a personale connection to its destruction. You know... with Hurricane Katrina.. it still hurts me to this day over 1000 people died in New Orleans alone de to infastructure failure. And my own home flooded in GA back in 2009.
So really this project was chosen out of curiosity of pythons capabiltiies with geospatial data. The topic is somethign that I can relate to
What does this topic mean?
If you’ve ever wondered what “identifying the extent of a flood event by classifying Normalized Difference Water Index (NDWI) images that are derived from Landsat 5 Thematic Mapper bands 2 and Band 4 imagery” is… be prepared to have your mind blown!
If you have NO idea what I’m talking about but you’ve been affected by floods, or you’d like to understand how to implement preventative measures in your area, keep reading!
I’m going to break this down Barney style, so you’ll easily be able to introduce this topic in conversation and be the facilitator of change you’ve always wanted to be
Why should I care about this topic?
English author Terry Pratchett said “If you do not know where you come from, then you don't know where you are, and if you don't know where you are, then you don't know where you're going. And if you don't know where you're going, you're probably going wrong.”
We can answer questions like what is the approximate area of land coverage that the flood impacted? Who has been impacted within the flood extent? What areas are in need of emergency response teams deployment? How does this affect the local economy? Answers to these questions can provide communities with clarity, and a concise plan for moving forward. Currently, people affected by floods cling on to hope as they blindly navigate the frustrating path of rebuilding their lives
# Import earth-analytics-python packages and libraries
import seaborn as sns
import os
import folium
from folium import plugins
import pandas as pd
import geopandas as gpd
import numpy as np
from glob import glob
import rasterio as rio
from rasterio.warp import calculate_default_transform, reproject, Resampling
from rasterio.plot import plotting_extent
from rasterio.mask import mask
import earthpy as et
import earthpy.spatial as es
import clip_data as cl
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
from shapely.geometry import mapping
from matplotlib.patches import Patch
from matplotlib.colors import ListedColormap, BoundaryNorm
%matplotlib inline
# Set directory
os.environ["PROJ_LIB"] = r"C:\Users\tonif_000\Anaconda3\envs\earth-analytics-python\Library\share"
#os.chdir(os.path.join(et.io.HOME, 'flood_analysis'))
# Set standard plot parameters for uniform plotting
mpl.rcParams['figure.figsize'] = (10, 10)
mpl.rcParams['axes.titlesize'] = 20
plt.rcParams['figure.figsize'] = (10, 10)
# Plot with seaborn
sns.set(font_scale=1.5)
sns.set_style("whitegrid")
About the 2008 Midwest Floods
Taking the advice from the quaote above, we are going to go back to the summer of 2008.
During the summer of 2008, areas in the Midwestern parts of the United States experienced flooding as a result of heavy rainfall and overflowing rivers. States effected by severe flooding, rivers that overflowed, and/or levee failures included Illinois and Indiana.
On June 10, 2008 Lawrence County, IL experienced 4 levee failures along the Embarras River (central-western side of county) and the Wabash River (eastern side of county) which creates the Illinois-Indiana state line[5] On June 11, 2008 Lawrence County was declared a disaster area [5].
Overview of the Study Area: Lawrence County, Illinois
Zoom in the image to explore the area. Pay close attention to the location of the Embarras River and Wabash River
# Create a map using Stamen Terrain as the basemap
m = folium.Map(location=[38.7158, -87.7763])
# Display map
m
The following is to support the data for flood event background and study area (not used in the analysis).
The Flood event occured on June 10th through the 11th. Notice how much rain the study area already received from May 2008.
# Define path for preciptation data within study area
precip_may_jun_2008_path = "data/precipitation/LAWRENCEVILLE_2_WSW_IL_US _GHCND_USC00114957_precipitation.csv"
# Read the data
precip = pd.read_csv(precip_may_jun_2008_path,
parse_dates=['DATE'],
index_col=['DATE'])
# Plot the precipitation data
# Define the format for date labels on the x axis
myFmt = DateFormatter("%m/%d")
fig, ax = plt.subplots()
ax.bar(precip.index.values, precip['PRCP'].values, color='grey')
# Add lables and title
ax.set(xlabel='', ylabel='Precipitation (in)')
ax.set(title='Daily Precipitation -Lawrence, IL May 1 -June 31, 2008 \n Flood Event: June 10-11, 2008 \n')
# Change orientation of x axis tick marks - easier to read labels
plt.xticks(rotation=90)
ax.xaxis.set_major_locator(mdates.WeekdayLocator())
ax.xaxis.set_major_formatter(myFmt)
plt.show()
Project Purpose
Now here is where I delve into the the technicalities of my research. In order to measure the extent of flooding disasters, I used Python to derive the NDWI from Landsat 5 ™ imagery and change detectection analysis.(give a brief explanation of what Landsat 5 is and change detection analysis) If your eyes are starting to glaze over this simply means (express in layman's terms)
How do you use python to extract the extent of a flood using python?
Python has many packages available for geospatial analysis. I highly recommend utilizing earthpy published by the EarthLab at the University of Colorado Boulder. This python packages is perfect for people just starting to learn python and who are interested in manipulating geospatial data in python. You can find more information about the earthpy package at this website, https://github.com/earthlab/earthpy
The analysis I will guide you through in this blog primarily utilizes pandas, geopandas, rasterio. Check out their documentation to learn more!
Let’s Begin!
About the data:
Landsat 5 Thematic Mapper (TM) Images
Landsat 5 TM has a total of 8 bands and has a spatial resolution of 30 meters. If you would like to learn more about these bands and how they are best used in a study, check out this post from the U.S. Geological Survey found at https://landsat.usgs.gov/what-are-best-spectral-bands-use-my-study.
It is important to note that my analysis will only use Landsat 5 TM bands 2 and 4
Two Landsat 5 TM images used in the flood extent analysis were taken on June 9, 2007 and June 11, 2008. The figure below shows the June 2007 band 5 image on the left where there is no flooding and the June 2008 image shows the flooding a day after the flood waters began to rise. It is clear in the two images that the majority of the land-use is used for agricultural purposes with low to medium density urban areas within Lawrence County, IL. This will become more relevant when I discuss my approach to identify water in the imagery.
The Landsat 5 TM imagery was obtained from Earth Explorer found at https://earthexplorer.usgs.gov/.
Illinois Shapefile
Illinois shape file was used to identify and set the extent of the analysis. It contains information from the 2010 United States Census. This data was obtained from United States Census Bureau| TIGER/Line Shapefiles and can be downloaded at https://www.census.gov/geo/maps-data/data/tiger-line.html
Now, its time to get to the fun part of this analysis!
Extracting the Flood Event from Landsat 5 TM Imagery.
How did I extract the flood event?
Before we dive into the details of the methodology of my analysis, I want to highlight the main takeaways of my analysis as shown in the Figure below.
The flood extent was extracted from the Normalized Difference Water Index derived from Band 2 and Band 4 Landsat 5 TM imagery. My analysis presents a method to measure the extent of flooding disasters by using python to derive the Normalized Difference Water Index (NDWI) from Landsat 5 TM imagery and change detection analysis. Band 2 and Band 4 were used calculate NDWI and identify flood water extent. The flood area was then extracted in order to identify the extent of flood damage in Lawrence County, IL. It was found that approximately 12.4 % of the total area in Lawrence County received flooding on June 11, 2008.
Import the data for analysis
Data for flood extent analysis
# Country and State Boundaries for the study area
illinois_path = "data/study_area/Illinois/vector/tl_2008_17_tract00/tl_2008_17_tract00.shp"
# Lansat 5 TM Pre and Post Flood
# Grab bands 2 and 4
ndwi_bands_pre_path = glob(
"data/study_area/Illinois/landsat/pre_flood/LT5022033007160GNC01-2015-11-19/LT5022033007160GNC01/*B[2,4]*.tif")
ndwi_bands_post_path = glob(
"data/study_area/Illinois/landsat/post_flood/LT50220332008163GNC02/*B[2,4]*.tif")
# Sort the landsat bands
ndwi_bands_pre_path.sort()
ndwi_bands_post_path.sort()
Identify the Study Area: Lawrence County, IL
- Set the Extent to the Study Area:
The study area of this analysis will focus on Lawrence County, Illinois. The Lawrence County shapefile was obtained from the original Illinois shapefile, by subseting the geodataframe data to Lawrence County
# Read the county .shp to crop Landsat 5 TM imagery
county = gpd.read_file("data/study_area/lawrence_county_IL/vector/county.shp")
county
STATEFP00 | TRACTCE00 | CTIDFP00 | NAME00 | NAMELSAD00 | MTFCC00 | FUNCSTAT00 | geometry | |
---|---|---|---|---|---|---|---|---|
0 | 17 | 981000 | 17101981000 | 9810 | Census Tract 9810 | G5020 | S | POLYGON ((-87.53516399999999 38.681437, -87.54... |
Crop Landsat Imagery to Study Area (Lawrence County, IL)
Since I only care about the flood extent within Lawrence County, I was able to crop the Landsat 5 TM imagery to the extent of the county shapefile. Think about this as a way applying a cookie cutter to the Landsat 5 TM imagery.
Pre-Flood
The images below are greyscale Landsat 5 TM Band 2 and Band 4 images that were used to calculate the NDWI. These bands are known to do a better job at distinguishing flood water pixels
Post-Flood
# Open and crop bands to study area (Lawrence County)
# Band 2 - Green Band
with rio.open(ndwi_bands_post_path[0]) as green_band_post:
county_prj = county.to_crs(green_band_post.crs)
extent_geojson = mapping(county_prj['geometry'][0])
green_band_post_im, green_band_post_meta = es.crop_image(green_band_post,
[extent_geojson])
# Band 4 - Near Infrared Band (NIR)
with rio.open(ndwi_bands_post_path[1]) as nir_band_post:
county_prj = county.to_crs(nir_band_post.crs)
extent_geojson = mapping(county_prj['geometry'][0])
nir_band_post_im, nir_band_post_meta = es.crop_image(nir_band_post,
[extent_geojson])
# Plot the cropped Landsat Data
es.plot_bands(green_band_post_im,
title="Landsat 5 TM Band 2 (Green)\n\n Post-Flood: Lawrence County, IL \n Jun. 11, 2008 ",
cmap="Greys_r")
# Save the image
plt.savefig('green_band_post_im.png')
es.plot_bands(nir_band_post_im,
title="Landsat 5 TM Band 4 (NIR)\n\n Post-Flood: Lawrence County, IL \n Jun. 11, 2008 ",
cmap="Greys_r")
# Save the image
plt.savefig('nir_band_post_im.png')
The images below are greyscale Landsat 5 TM Band 2 and Band 4 images that were used to calculate the NDWI. These bands are known to do a better job at distinguishing flood water pixels
Calculate Normalized Difference Water Index (NDWI) Using Earthpy
2. Normlaized Difference Water Index (NDWI)Analysis - Identify pixels containing flood water:
The Normalized Difference Water Index (NDWI) was really powerful for this analysis. Since I am very familiar with the study area, I was very excited to see how water was clearly distinguished in both images as seen in figures below.
The formula used in this analysis can be found in the resources below
Pre-Flood
# Calculate NDWI
pre_ndwi_band = es.normalized_diff(b2=green_band_pre_im[0],
b1=nir_band_pre_im[0])
# Plot NDWI for Pre-Flood Event
fig, ax = plt.subplots(figsize=(10, 10))
pre_ndwi = ax.imshow(pre_ndwi_band,
cmap='PiYG',
vmin=-1,
vmax=1)
fig.colorbar(pre_ndwi, fraction=.05)
ax.set(title="Landsat 5 TM Derived NDWI \n Pre-Flood-Lawrence County, Illinois Jun. 9, 2007\n")
ax.set_axis_off()
# Save the image
plt.savefig('ndwi_pre_flood.png')
plt.show()
NDWI Derived from Landsat 5 TM pre-flood conditions. Water is depicted in the 0 – 0.25 range, while 0.25 to 1 is some sort of vegetation/ land-use.
# Calculate NDWI
post_ndwi_band = es.normalized_diff(b2=green_band_post_im[0],
b1=nir_band_post_im[0])
# Plot Post-Flood Event
fig, ax = plt.subplots(figsize=(10, 10))
post_ndwi = ax.imshow(post_ndwi_band,
cmap='PiYG',
vmin=-1,
vmax=1)
fig.colorbar(post_ndwi, fraction=.05)
ax.set(title="Landsat 5 TM Derived NDWI \n Post-Flood-Lawrence County, Illinois Jun. 11, 2008 \n")
ax.set_axis_off()
# Save the image
plt.savefig('ndwi_post_flood.png')
plt.show()
NDWI Derived from Landsat 5 TM post-flood conditions. Water is depicted in the 0 – 0.25 range, while 0.25 to 1 is some sort of vegetation/ land-use.
Calculate the difference between pre and post flood ndwi
Change Detection Analysis using NDWI Derived Imagery.
Even though I was super excited about the results shown above, I began to wonder how do I know that this is actually standing water as a result of flooding. Should I remove the original water bodies (such as rivers and lakes) so that I can only see flood associated water?
After reading some material related to change detection [1,3], I realized that form y analysis I wanted to remove all original water sources. Change detection analysis allows to clearly identify areas that have changes over time by taking the difference between before and after images. In this case, think about think about this as a way to remove the original water sources from the analysis and see identify areas submerged by water.
# NDWI flood_extent = post_ndwi_band - pre_ndwi_band
ndwi_flood_extent = es.normalized_diff(b2=post_ndwi_band,
b1=pre_ndwi_band)
# Plot Flood Extent
fig, ax = plt.subplots(figsize=(10, 10))
ndwi_flood = ax.imshow(ndwi_flood_extent,
cmap='PiYG',
vmin=-1,
vmax=1)
fig.colorbar(ndwi_flood, fraction=.05)
ax.set(title="Landsat 5 TM Derived NDWI (Difference)\nFlood Water in Lawrence County, Illinois on June 11, 2008\n")
ax.set_axis_off()
# Save the image
plt.savefig('ndwi_difference_flood.png')
plt.show()
Reclassify NDWI Image
3. Extract Pixels contacting Flood water:
In order to distinguish flood related pixels from other variations of lands, I reclassified the NDWI difference image into four classes.
Many may wonder why I did not reclassify into two classes to clearly depict land and water. My response is that this step took a lot of trial and error due to the histogram outputs and range of pixel values. I came to the solution to use four classes because it clearly narrowed down flood water related pixels to bin 1 (shown in the image below). I believe that the resulting NDWI difference image still contained some level of noise since the purpose of NDWI is to show water content in an area - hence, picking up the water content in vegetation ( especially since this area is primarily agricultural lands- use).
# Plot a histogram of the NDWI difference image
fig, ax = plt.subplots(figsize=(10, 10))
ax.hist(ndwi_flood_extent.ravel(),
color='purple',
edgecolor='white')
ax.set_title("NDWI Histogram \n (Removed Original Water Sources)",
fontsize=16)
ax.set(xlabel="Pixel Value",
ylabel="Number of Pixels")
# Save the figure
plt.savefig('NDWI_histogram.png')
plt.show()
counts, bins, patches = ax.hist(ndwi_flood_extent.ravel(),
color='springgreen',
bins=4)
class_bins = bins
ndwi_class = np.digitize(ndwi_flood_extent, class_bins)
np.unique(ndwi_class)
#bins
array([1, 2, 3, 4, 5], dtype=int64)
fig, ax = plt.subplots(figsize=(10, 10))
ax.hist(ndwi_class.ravel(),
color='magenta',
edgecolor='white')
ax.set_title("Histogram: Reclassified NDWI (Difference)\n 4 Classes",
fontsize=16)
ax.set(xlabel="Bin Number",
ylabel="Number of Pixels")
plt.savefig('reclassified_histogram.png')
plt.show()
Bin 1 contains pixels that contain flood water values.
# Plot newly classified and masked raster
fig, ax = plt.subplots(figsize=(10, 10))
colors = ['blue', 'linen', 'black']
class_labels1 = ["Flood Water", "Land", "Other"]
legend_patches1 = [Patch(color=icolor, label=label)
for icolor, label in zip(colors, class_labels1)]
cmap = ListedColormap(colors)
ax.imshow(ndwi_class,
cmap=cmap)
ax.legend(handles=legend_patches1,
facecolor='white',
edgecolor='white',
bbox_to_anchor=(1.35, 1))
ax.set_title("Reclassified NDWI (Difference)\n 4 Classes ")
ax.set_axis_off
# Turn off tick labels
ax.set_yticklabels([])
ax.set_xticklabels([])
plt.savefig('reclassified_ndwi.png')
plt.show()
Python enables me to specific with bin number I would like to show and mask all others as “no-data”. Now, the extent of the flood exist in its own image.
All data not in the floodwater class (in this case bin 1) were masked to produce the final product shown on the right
NDWI Change derived from Landsat 5 TM. This image shows the flooded area in Lawrence County, Illinoise
4. Quantify the extent of the flood:
As a result, I found that the flood extended over approximately 12.4% of the county’s land area! It is important to note that the accuracy of this calculation is as good as the spatial resolution of the satellite imagery the analysis is derived from.
# Count of pixels times 900 square meters
flood_pixels = (ndwi_class_ma.count())*(30*30)
# Lawrence, IL area
lawrence_area = county_prj.geometry.area
# Calculate the percentage
area_within_extent = (flood_pixels/lawrence_area)*100
Approximate area in the study area is within the flood extent:
area_within_extent
0 12.291506
dtype: float64
What is the impact of this analysis?
The impact of this analysis highlights the potential for providing near-real-time flooding information to relief agencies (assuming the data is accessible) and allowing for a fast and timely response for the implementation of evacuation plans and damage assessment. This image product can also be combined with vector data (such as census block groups and roads shapefile) to determine impact and vulnerability to flood events. Furthermore, this analysis can be utilized as a data service resource to government (federal and local) decision makers as well as first responders providing emergency services to the community.
Implementing python in this analysis allowed for an easy to use application that is open source (if you are indeed familiar with python). I believe that the resulting flood extent raster from my analysis can be utilized with both remote sensing and GIS to complete an assessment the impact of a flood event and allow people to be more informed and make better decisions in disaster management. This information to implement mitigation strategies, such as mapping flood-prone areas, delineating flood-plains, and creating land-use maps.
Other work done in this area:
I found the following publications to be very helpful to developing my own methodology ( based on my entry level experience and curiosity of the field). If you find this work to be interesting and want to learn more about a more accurate and in-depth analysis, I highly recommend reading the following publications:
https://www.mdpi.com/2072-4292/6/5/4173/htm
Kussul, N., Shelestov, A. & Skakun, S. Earth Sci Inform (2008) 1: 105. https://doi.org/10.1007/s12145-008-0014-3
References:
[1] Andreoli, Remi & Hervé, Yesou. (2008). CHANGE DETECTION ANALYSIS DEDICATED TO FLOOD MONITORING USING ENVISAT WIDE SWATH MODE DATA Website Accessed on December 11,2018: https://www.researchgate.net/publication/229018921_CHANGE_DETECTION_ANALYSIS_DEDICATED_TO_FLOOD_MONITORING_USING_ENVISAT_WIDE_SWATH_MODE_DATA
[2] Gleason, Karin; “2008 Midwestern U.S. Floods”; July 2008; NOAA’s National Climatic Data Center, Asheville, NC looding of June 2008. [3] https://www.mdpi.com/2072-4292/6/5/4173/htm
[4] Morlock, S.E., Menke, C.D., Arvin, D.V., and Kim, M.H., 2008, Flood of June 7–9, 2008, in central and southern Indiana: U.S. Geological Survey Open File Report 2008–1322, 15 p., 3 app.
[5] National Weather Service(NWS), December 2009,“Service Assessment: Central United States Flooding of June 2008”: U.S. Department of Commerce, National Weather Service, Silver Springs, Maryland
[6] Xu, Hanqiu. (2006). MODIFICATION OF NORMALIZED WATER INDEX (NDWI) TO ENHANCE OPEN WATER FEATURES IN REMOTELY SENSED IMAGERY. Website Accessed on December 11, 2018 http://www.aari.ru/docs/pub/060804/xuh06.pdf
Data Sources and Websites:¶
- Daily Precipitation: https://www.ncdc.noaa.gov/cdo-web/search Downloaded from: https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:USC00114957/detail
- Landsat 5 TM: Earth Explorer: https://earthexplorer.usgs.gov/