Tags

Open Reproducible Science
Urban Ecology
Active Sensors
Passive Sensors
Remote Sensing
Python
Scientific Programming
Machine Learning
Analytics

Project

Harnessing the power of multi-source, open data to map buildings using machine learning

Freely available optical and SAR data, together with open-source languages and packages, allow the continuous assessment of settlement development in urban areas and wildland-urban interfaces, which is extremely important for climate adaptation policies.

Contributors:  Natasha Stavros, Clayton Brengman, Ryan Cassotto, and Kristy Tiampo

Up-to-date building maps are essential to assess the vulnerability and exposure of a community to extreme events (e.g., wildfires, floods, and landslides), which have become more frequent due to climate change. These maps can serve as the basis for rapid response to hazards and for implementing policies and planning associated with climate mitigation and adaptation.

However, building footprint mapping usually relies on very high-resolution images, which are expensive to acquire, and on deep-learning models, which are expensive to run. Here, we present a workflow for mapping buildings in urban landscapes that uses freely available data, advanced image processing and fusion of Landsat-8 multispectral and Sentinel-1 SAR imagery, and machine learning. The workflow is designed to scale using open-source tools and cloud computing, and it can be integrated into a Continuous Change Detection and Classification (CCDC; Zhu & Woodcock, 2014) workflow to track changes over time (Figure 1).

Figure 1. Visualization of the Continuous Change Detection and Classification steps (1-3), according to Zhu and Woodcock (2014) and USGS (2020).

 

A good training dataset is one of the keys to accurate machine learning prediction and mapping. Here, we gathered open data from the Microsoft Building Footprints for the regions where they are available (https://www.microsoft.com/en-us/maps/building-footprints). These footprints were produced by applying deep learning methods to Bing’s very high-resolution imagery. We incorporated the 10-m ESA WorldCover data (https://esa-worldcover.org/en) to compose the “other classes” class in a binary (buildings vs. other classes) Random Forest classification.
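
The binary labelling described above can be sketched as follows, assuming the Microsoft footprints have already been rasterized to a boolean mask on the same 10-m grid as ESA WorldCover. The function name, the label values (1, 0, -1) and the rule that footprints override ESA classes are illustrative assumptions, not the authors' code; the class codes are the ESA WorldCover 2020 legend.

```python
import numpy as np

# ESA WorldCover 2020 legend codes (10-m product).
ESA_WORLDCOVER_CODES = {
    10: "Tree cover", 20: "Shrubland", 30: "Grassland", 40: "Cropland",
    50: "Built-up", 60: "Bare / sparse vegetation", 70: "Snow and ice",
    80: "Permanent water bodies", 90: "Herbaceous wetland",
    95: "Mangroves", 100: "Moss and lichen",
}

BUILDING, OTHER, UNLABELED = 1, 0, -1  # hypothetical label values


def binary_labels(building_mask: np.ndarray, esa_classes: np.ndarray) -> np.ndarray:
    """Return 1 for building pixels, 0 for other ESA classes, -1 where unlabeled.

    building_mask: boolean raster, True where a rasterized footprint covers the pixel.
    esa_classes:   integer raster of ESA WorldCover codes on the same grid.
    """
    if building_mask.shape != esa_classes.shape:
        raise ValueError("rasters must share the same grid")
    labels = np.full(esa_classes.shape, UNLABELED, dtype=np.int8)
    # Any legend class outside a footprint (including Built-up roads or
    # parking lots) becomes "other"; codes outside the legend stay unlabeled.
    labels[np.isin(esa_classes, list(ESA_WORLDCOVER_CODES))] = OTHER
    # Assumption: footprints take precedence over the ESA class.
    labels[building_mask] = BUILDING
    return labels
```

Keeping ESA "Built-up" pixels outside footprints in the "other" class is what lets the classifier see roads and parking lots as negatives.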

It is particularly important to separate buildings from other impervious surfaces, such as roads and parking lots, that are spectrally very similar. Several steps were incorporated to curate the training and testing data: (i) filtering the building footprints by date (2019-2021 only) to match the 2020 ESA data; (ii) masking the ESA data using its QA bands on the number of Sentinel-1 observations and the percentage of high-quality Sentinel-2 observations used; (iii) vectorizing the ESA data; (iv) assigning classification levels; (v) splitting the data into training (70%) and testing (30%) subsets; and (vi) merging the pre-processed Microsoft and ESA data (Figure 2).
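
Steps (i), (ii), (v) and (vi) can be sketched in plain Python as below; vectorization (iii) and level assignment (iv) are omitted. The `Sample` record, its attribute names and the QA thresholds (10 Sentinel-1 observations, 50% good Sentinel-2 observations) are hypothetical placeholders, and splitting each class separately before merging is one reasonable reading of the workflow, not necessarily the authors' exact procedure.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Sample:
    label: int                            # 1 = building (Microsoft), 0 = other (ESA)
    capture_date: Optional[date] = None   # hypothetical footprint date attribute
    n_s1_obs: Optional[int] = None        # hypothetical QA value: Sentinel-1 obs count
    pct_s2_good: Optional[float] = None   # hypothetical QA value: % good Sentinel-2 obs


def keep_footprint(s: Sample, start: date = date(2019, 1, 1),
                   end: date = date(2021, 12, 31)) -> bool:
    # Step (i): keep footprints dated 2019-2021 to match the 2020 ESA map.
    return s.capture_date is not None and start <= s.capture_date <= end


def keep_esa(s: Sample, min_s1: int = 10, min_s2_pct: float = 50.0) -> bool:
    # Step (ii): drop ESA samples whose QA values fall below the thresholds.
    return (s.n_s1_obs or 0) >= min_s1 and (s.pct_s2_good or 0.0) >= min_s2_pct


def curate(footprints: List[Sample], esa: List[Sample], train_frac: float = 0.7,
           seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    buildings = [s for s in footprints if keep_footprint(s)]
    others = [s for s in esa if keep_esa(s)]
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    # Step (v): split each class 70/30; step (vi): merge both classes.
    for group in (buildings, others):
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(train_frac * len(shuffled))
        train += shuffled[:cut]
        test += shuffled[cut:]
    return train, test


# Usage with synthetic records.
footprints = [Sample(1, capture_date=date(2020, 6, 1)) for _ in range(10)]
footprints += [Sample(1, capture_date=date(2017, 6, 1)) for _ in range(5)]  # too old
esa = [Sample(0, n_s1_obs=30, pct_s2_good=80.0) for _ in range(20)]
esa += [Sample(0, n_s1_obs=2, pct_s2_good=80.0) for _ in range(5)]          # fails QA
train, test = curate(footprints, esa)
```

Splitting per class keeps the building/other proportions the same in the training and testing subsets.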

Figure 2. Workflow used for training and testing data curation

 

The Random Forest model (Breiman, 2001) is computationally efficient and often performs well compared with other machine learning algorithms. It relies on bootstrap aggregation (bagging): each decision tree is trained on a bootstrap sample of the training data, a random subset of features is considered at each split, and the trees’ votes are combined into the final prediction. Here, we used the algorithm following Beyer et al. (2019). Using the scikit-learn Python package (Pedregosa et al., 2011), we ran it with 500 trees and balanced class weights. Once trained, the model predicts the classes across the entire scene, and the held-out testing samples are used for accuracy assessment. Moreover, this classification can be extended to an entire time series by using the class-associated time-series regression coefficients, which can be generated with the open-source USGS implementation of the CCD algorithm (https://github.com/repository-preservation/lcmap-pyccd) (Figure 3).
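
A minimal scikit-learn sketch of the configuration described above (500 trees, balanced class weights, 70/30 split, then prediction over a whole scene). The features are synthetic stand-ins for per-pixel band values; the class sizes, feature count, random seeds and the stratified split are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_other, n_building, n_features = 450, 150, 6

# Synthetic, imbalanced training pixels (not real Landsat/Sentinel-1 values).
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_other, n_features)),
    rng.normal(1.5, 1.0, size=(n_building, n_features)),
])
y = np.concatenate([np.zeros(n_other, dtype=int), np.ones(n_building, dtype=int)])

# 70% training / 30% testing, keeping class proportions in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(
    n_estimators=500,         # 500 trees, as in the text
    class_weight="balanced",  # weights inversely proportional to class frequency
    random_state=0,
)
model.fit(X_train, y_train)

# Accuracy on the held-out testing samples.
overall_accuracy = accuracy_score(y_test, model.predict(X_test))

# Predict the whole "scene": flatten (bands, rows, cols) to (pixels, bands).
scene = rng.normal(0.75, 1.0, size=(n_features, 20, 30))
building_map = model.predict(scene.reshape(n_features, -1).T).reshape(20, 30)
```

With `class_weight="balanced"`, scikit-learn weights each class by `n_samples / (n_classes * count)`, so the less frequent building class is not drowned out by the "other" pixels.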

Figure 3. CCDC focused on building detection and mapping, integrating USGS (2020) and Earth Lab’s workflows. 
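
The time-series regression coefficients mentioned above come from a harmonic model of each band's surface reflectance. Following the form given by Zhu & Woodcock (2014), a band value at time t is modelled as a0 + c1·t + Σk [ak·cos(2πkt/T) + bk·sin(2πkt/T)] with T = 365 days and up to three harmonics. The sketch below fits that model with ordinary least squares for simplicity; pyccd may use a different estimator, and the function names and synthetic series are assumptions.

```python
import numpy as np


def harmonic_design(t_days: np.ndarray, n_harmonics: int = 3,
                    period: float = 365.0) -> np.ndarray:
    """Columns: intercept, linear trend, then cos/sin pairs for k = 1..n_harmonics."""
    cols = [np.ones_like(t_days), t_days]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k * t_days / period
        cols += [np.cos(w), np.sin(w)]
    return np.column_stack(cols)


def fit_coefficients(t_days: np.ndarray, values: np.ndarray, n_harmonics: int = 3,
                     period: float = 365.0) -> np.ndarray:
    """Least-squares coefficients [a0, c1, a1, b1, a2, b2, a3, b3] for one band."""
    A = harmonic_design(t_days, n_harmonics, period)
    coef, *_ = np.linalg.lstsq(A, values, rcond=None)
    return coef


# Synthetic two-year series sampled every 8 days (not real Landsat data).
t = np.arange(0.0, 730.0, 8.0)
values = (0.2 + 1e-4 * t
          + 0.05 * np.cos(2 * np.pi * t / 365.0)
          + 0.03 * np.sin(2 * np.pi * t / 365.0))
coef = fit_coefficients(t, values)
```

Stacking such coefficient vectors for every band gives one feature row per pixel and time segment, which can then be passed to the Random Forest classifier.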

 

Our first assessment of building classification in Rio de Janeiro, Brazil, using Landsat data resampled to 10 m (Figure 4) and Landsat plus Sentinel-1 data, both resampled to 10 m (Figure 5), indicates the power of the workflow to discriminate buildings from other classes (which include other common urban impervious surfaces). Both reached an overall accuracy of 87%. However, adding the Sentinel-1 data (preprocessed VV and VH backscatter, classification, and amplitude) improved the classification by decreasing the false positive rate and increasing the area under the ROC curve (AUROC).
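
Overall accuracy, false positive rate and AUROC measure different things, which is how two models can tie on the first yet differ on the others. The plain-Python sketch below shows how each is computed for a binary building (1) vs. other (0) test set; the function names and toy data are illustrative. The AUROC here is the rank (Mann-Whitney) form, the probability that a random building pixel scores higher than a random "other" pixel; it is quadratic in the number of samples, and scikit-learn's `roc_auc_score` computes the same quantity efficiently.

```python
from typing import Sequence


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of test pixels whose predicted class matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of reference 'other' pixels wrongly predicted as buildings."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random building > score of a random other); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: predicted classes and building-class probabilities.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
```

For the toy data, the overall accuracy is 3/5, the false positive rate is 1/3, and the AUROC is 5/6.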

Figure 4. Building map using Landsat 8 (four dates across the year) optical dataset through Random Forest classification


Figure 5. Building map using Landsat 8 + Sentinel 1 (4 dates across the year) data through Random Forest classification

 

The CCDC workflow using optical and SAR data, together with the first classification assessment presented here, shows that a classified change detection (focused on building detection and mapping) can be built entirely on open data and open-source software through a mix of unsupervised and supervised algorithms.

The combined use of Landsat-8 and Sentinel-1 data provides higher temporal resolution, structural information (e.g., feature height), observations unaffected by clouds, dilated clouds, and cirrus, and a way to reduce misclassification caused by natural spectral similarity between classes.
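
A minimal NumPy sketch of stacking the two sensors into one per-pixel feature table. It assumes the Landsat-8 and Sentinel-1 rasters are already co-registered, that 30-m Landsat pixels are upsampled to 10 m by nearest neighbour (the resampling method is an assumption; real workflows would typically use GDAL or rasterio), and that Sentinel-1 VV/VH arrive as linear backscatter to be converted to decibels. Function names are hypothetical.

```python
import numpy as np


def upsample_nearest(arr: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (bands, rows, cols) array."""
    return arr.repeat(factor, axis=1).repeat(factor, axis=2)


def to_db(linear: np.ndarray, floor: float = 1e-6) -> np.ndarray:
    """Linear backscatter to decibels, clipping tiny values to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(linear, floor))


def fuse(landsat_30m: np.ndarray, s1_vv_vh_10m: np.ndarray, factor: int = 3) -> np.ndarray:
    """Stack optical and SAR bands and flatten to a (pixels, features) table."""
    optical = upsample_nearest(landsat_30m, factor)
    if optical.shape[1:] != s1_vv_vh_10m.shape[1:]:
        raise ValueError("grids do not align after resampling")
    stack = np.concatenate([optical, to_db(s1_vv_vh_10m)], axis=0)
    return stack.reshape(stack.shape[0], -1).T


# Synthetic example: 6 Landsat bands on a 4 x 5 grid, VV/VH on a 12 x 15 grid.
landsat = np.arange(6 * 4 * 5, dtype=float).reshape(6, 4, 5)
s1 = np.full((2, 12, 15), 0.1)
features = fuse(landsat, s1)
```

Multiple acquisition dates (e.g., the four dates per year used in Figures 4 and 5) could be handled the same way by concatenating each date's bands along the first axis before flattening.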

Including SAR data in the model improved subtle change detection and the discrimination of buildings across urban landscapes, i.e., within an impervious-surface matrix where spectral contrast is low (Figures 6 and 7).
Figure 6. Visual evaluation of building discrimination from other land cover classes.
Figure 7. Visual evaluation of building discrimination, highlighting the decrease in misclassification (e.g., other impervious surfaces assigned to the building class) when using the fused Landsat-8/Sentinel-1 data.

 

References
Beyer, F., Jurasinski, G., Couwenberg, J., & Grenzdörffer, G. (2019). Multisensor data to derive peatland vegetation communities using a fixed-wing unmanned aerial vehicle. International Journal of Remote Sensing, 40(24), 9103-9125.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.

USGS (2020). Land Change Monitoring, Assessment, and Projection (LCMAP) Continuous Change Detection and Classification (CCDC): Algorithm Description Document (ADD). Release 1.0. Accessed March 2, 2022. Available at https://www.usgs.gov/media/files/lcmap-ccdc-add

Zhu, Z., & Woodcock, C. E. (2014). Continuous change detection and classification of land cover using all available Landsat data. Remote Sensing of Environment, 144, 152-171.