NO2 Prediction Using Machine Learning Analysis in Google Earth Engine
By Anna Pavlenko
Nitrogen Dioxide (NO2) air pollution.
The World Health Organization estimates that air pollution kills 4.2 million people every year.
The main effect of breathing in raised levels of NO2 is the increased likelihood of respiratory problems. NO2 inflames the lining of the lungs, and it can reduce immunity to lung infections.
There is also a connection between the level of NO2 pollution in the atmosphere and respiratory diseases, as well as more deadly cases after exposure to viruses.
Sources of NO2:
Driven by rapid population growth and fast urbanization:
- Industrial facilities
- Fossil fuels (coal, oil and gas)
- Increased transportation (80 %)
Air pollution (NO2) affects population health and global warming.
Fig_1. Pollution in Industrial cities
Workflow:
Fig_2. Workflow of this project
Study Area / Input Data:
The study area for this research project is Los Angeles, CA. Data: Landsat 8 image collection for 2014–2019 and Sentinel-5P (TROPOMI) for 2018–2019.
Fig_3. Los Angeles, CA / Landsat 8 / Sentinel 5-P (TROPOMI)
ML Regression Analysis Used in This Project
The machine learning toolbox includes several linear and non-linear supervised learners, predicting either numeric outputs (regressors) or nominal outputs (classifiers).
Classification Workflow
- Build
- Train
- Apply
- Assessment
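Classification workflow in Earth Engine (JavaScript):
// Build: sample the image to get training data
var training = image.sample(region, scale);
// Train: classProperty names the label property in the training data
// (newer Earth Engine releases use ee.Classifier.smileRandomForest(numberOfTrees))
var classifier = ee.Classifier.randomForest().train(training, classProperty);
// Apply
var result = image.classify(classifier);
// Regression: switch the output mode to get continuous values
var predictor = classifier.setOutputMode('REGRESSION');
// Assessment
var confusionMatrix = classifier.confusionMatrix();
var accuracy = confusionMatrix.accuracy();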
Classifiers
CART: classification and regression trees.
Regression: Random Forest (random decision forest), SVM (support vector machine).
Classifier Output Mode
classifier.setOutputMode(mode):
Classification - Discrete input/output classes
Regression - Continuous valued output
Probability - binary classifiers only
Classifiers that support Regression: Random Forest, SVM
Classifiers that support Probability: CART, NaiveBayes, IKPamir*, Pegasos, SVM, Perceptron
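The difference between the three output modes can be illustrated with a small plain-JavaScript sketch. It is not Earth Engine code and does not reproduce how Earth Engine combines trees internally; the function name `combineTreeOutputs` and the sample values are hypothetical. It only shows what kind of value each mode produces from the outputs of several trees.

```javascript
// Illustrative only: combine the outputs of several trees for one pixel.
// combineTreeOutputs is a hypothetical helper, not an Earth Engine function.
function combineTreeOutputs(outputs, mode) {
  if (outputs.length === 0) throw new Error('outputs must not be empty');
  const mean = outputs.reduce((sum, v) => sum + v, 0) / outputs.length;
  switch (mode) {
    case 'CLASSIFICATION': {
      // Discrete class: majority vote over the class labels.
      const counts = new Map();
      for (const label of outputs) counts.set(label, (counts.get(label) || 0) + 1);
      let best = outputs[0];
      for (const [label, count] of counts) {
        if (count > counts.get(best)) best = label;
      }
      return best;
    }
    case 'REGRESSION':
      // Continuous value: average of the per-tree predictions.
      return mean;
    case 'PROBABILITY':
      // Binary classifiers only: share of trees voting for class 1.
      if (!outputs.every((v) => v === 0 || v === 1)) {
        throw new Error('PROBABILITY expects binary (0/1) votes');
      }
      return mean;
    default:
      throw new Error('Unknown mode: ' + mode);
  }
}

const classLabel = combineTreeOutputs([0, 1, 1], 'CLASSIFICATION');
const probability = combineTreeOutputs([0, 1, 1], 'PROBABILITY');
const no2Estimate = combineTreeOutputs([12.5, 14.0, 13.0], 'REGRESSION');
console.log(classLabel, probability, no2Estimate);
```

For NO2 prediction the regression mode is the relevant one, because the target is a continuous concentration rather than a class label.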
Methods in this project
Random Sampling
The training dataset was created from random points. The points were collected within areas representing 9 different levels of NO2 in the 2019 TROPOMI satellite imagery.
Fig_4. TROPOMI imagery 2019
Fig_5. Random points collection from TROPOMI 2019
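The random point collection described above can be sketched in plain JavaScript as a stratified sample: pixels are grouped into 9 NO2 levels and the same number of random points is drawn from each level. This is a minimal sketch under stated assumptions, not the project's actual procedure: the levels are assumed to be equal-width classes, the "raster" is a hypothetical one-dimensional list of pixels, and all function names are made up for illustration.

```javascript
// Small seeded pseudo-random generator (mulberry32-style) for reproducibility.
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Assign a value to one of `levels` equal-width classes (assumed binning).
function levelOf(value, min, max, levels) {
  const index = Math.floor(((value - min) / (max - min)) * levels);
  return Math.min(Math.max(index, 0), levels - 1);
}

// Draw up to `perLevel` random pixels from every NO2 level.
function stratifiedSample(pixels, levels, perLevel, seed) {
  const values = pixels.map((p) => p.no2);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const rand = seededRandom(seed);
  const groups = Array.from({ length: levels }, () => []);
  for (const p of pixels) groups[levelOf(p.no2, min, max, levels)].push(p);

  const sample = [];
  groups.forEach((group, level) => {
    // Fisher-Yates shuffle on a copy, then keep the first `perLevel` pixels.
    const pool = group.slice();
    for (let i = pool.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      const tmp = pool[i];
      pool[i] = pool[j];
      pool[j] = tmp;
    }
    for (const p of pool.slice(0, perLevel)) sample.push({ ...p, level });
  });
  return sample;
}

// Hypothetical data: 90 pixels with NO2 values 0..89 (arbitrary units).
const pixels = Array.from({ length: 90 }, (_, i) => ({ id: i, no2: i }));
const points = stratifiedSample(pixels, 9, 5, 42);
console.log(points.length);
```

Sampling the same number of points per level keeps rare, highly polluted areas from being underrepresented in the training data.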
Supervised Classification
The Random Forest method was used in this project.
Fig_6. Random Forest Classification 2019
We can assess the accuracy of the trained classifier using a confusionMatrix.
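As a rough illustration of what the accuracy figure means, the sketch below computes overall accuracy from a confusion matrix in plain JavaScript. The matrix counts are hypothetical and are not results from this project; rows are assumed to hold actual classes and columns predicted classes.

```javascript
// Hypothetical confusion matrix: rows = actual class, columns = predicted class.
const confusionMatrix = [
  [50, 3, 2],
  [4, 45, 6],
  [1, 5, 44],
];

// Overall accuracy = correctly classified samples (diagonal) / all samples.
function overallAccuracy(matrix) {
  let correct = 0;
  let total = 0;
  matrix.forEach((row, i) => {
    row.forEach((count, j) => {
      total += count;
      if (i === j) correct += count;
    });
  });
  return correct / total;
}

const accuracy = overallAccuracy(confusionMatrix);
console.log(accuracy.toFixed(3));
```

A confusion matrix built from the training samples themselves tends to give an optimistic accuracy, so a held-out validation sample is usually a fairer test.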
Two ways to predict continuous values: across space and over time:
Regression: predicting continuous output values across space
Contexts of linear regression in GEE:
Fig_7. Regression Across Space 2018
Fig_8. NDVI and Regression 2018
Linear Regression 2018
Fig_9. Regression 2018
Fig_10. NDVI and Regression 2018
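The idea behind the figures above can be sketched in plain JavaScript: compute NDVI from the near-infrared and red bands (bands 5 and 4 on Landsat 8), then fit a straight line by ordinary least squares. Using NDVI as the single predictor of NO2 is an assumption made for illustration only; the reflectance and NO2 values are hypothetical, and this is not the Earth Engine implementation.

```javascript
// NDVI = (NIR - Red) / (NIR + Red)
function ndvi(nir, red) {
  return (nir - red) / (nir + red);
}

// Ordinary least squares fit of y = intercept + slope * x.
function linearFit(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) * (xs[i] - meanX);
  }
  const slope = sxy / sxx;
  return { slope: slope, intercept: meanY - slope * meanX };
}

// Hypothetical sample points: surface reflectance and an NO2 value (arbitrary units).
const samples = [
  { nir: 0.45, red: 0.08, no2: 60 },
  { nir: 0.35, red: 0.12, no2: 85 },
  { nir: 0.28, red: 0.16, no2: 110 },
  { nir: 0.22, red: 0.18, no2: 130 },
];

const ndviValues = samples.map((s) => ndvi(s.nir, s.red));
const model = linearFit(ndviValues, samples.map((s) => s.no2));

// Apply the fitted line to a new pixel.
const predictedNo2 = model.intercept + model.slope * ndvi(0.30, 0.15);
console.log(model, predictedNo2);
```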
Model Accuracy:
For 2018, actual TROPOMI data exist alongside the predicted NO2 values calculated with the predictor trained on 2019 data. This makes it possible to evaluate the quality of the predictor.
Fig_11. Difference Raster 2018 - Difference between actual value NO2 and Predicted value NO2
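A difference raster like the one in Fig_11 can be summarized with simple error metrics. The plain-JavaScript sketch below subtracts predicted from actual values cell by cell and computes the root mean square error (RMSE) and mean absolute error (MAE). The 2 x 2 rasters are hypothetical values, not project data, and the choice of RMSE and MAE as metrics is an assumption.

```javascript
// Hypothetical 2 x 2 rasters of actual and predicted NO2 (arbitrary units).
const actual = [
  [10, 12],
  [14, 16],
];
const predicted = [
  [9, 13],
  [14, 18],
];

// Cell-by-cell difference: actual minus predicted.
function differenceRaster(a, b) {
  return a.map((row, i) => row.map((value, j) => value - b[i][j]));
}

// Root mean square error over all cells.
function rmse(diff) {
  const cells = diff.flat();
  return Math.sqrt(cells.reduce((sum, d) => sum + d * d, 0) / cells.length);
}

// Mean absolute error over all cells.
function mae(diff) {
  const cells = diff.flat();
  return cells.reduce((sum, d) => sum + Math.abs(d), 0) / cells.length;
}

const diff = differenceRaster(actual, predicted);
const rmseValue = rmse(diff);
const maeValue = mae(diff);
console.log(diff, rmseValue, maeValue);
```

RMSE penalizes large local errors more strongly than MAE, so reporting both gives a better picture of where the predictor fails.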
Predictions for 2018–2015:
Fig_12. NO2 predictions for 2018–2015
Predicting continuous output values across time (Random Forest regression):
Fig_13. Context of Linear Regression in GEE - over Time
Fig_14. Predicted NO2 level Over Time (2018 - 2014)
Fig_15. Power Plants in GEE
Fig_16. Power Plants Predictions in GEE
Result
The goal was reached: NO2 data for past years (2018, 2017, 2016, 2015) were created using data for 2019 (TROPOMI and Landsat 8).
Summary / Conclusions:
To improve the accuracy of the regression / prediction model, predictions from multiple machine learning algorithms, or from several runs of the same algorithm, can be combined into an ensemble model that makes more accurate predictions.
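One simple form of such an ensemble is a (weighted) average of the per-pixel predictions of several models. The plain-JavaScript sketch below shows this idea only; the model names, the prediction values, and the weights are hypothetical, and averaging is just one of several possible ways to combine models.

```javascript
// Average per-pixel predictions from several models.
// `weights` is optional; equal weights are used when it is omitted.
function ensembleMean(predictionSets, weights) {
  const w = weights || predictionSets.map(() => 1);
  const weightSum = w.reduce((a, b) => a + b, 0);
  const pixelCount = predictionSets[0].length;
  if (!predictionSets.every((set) => set.length === pixelCount)) {
    throw new Error('all prediction sets must cover the same pixels');
  }
  const result = [];
  for (let i = 0; i < pixelCount; i++) {
    let sum = 0;
    for (let m = 0; m < predictionSets.length; m++) {
      sum += w[m] * predictionSets[m][i];
    }
    result.push(sum / weightSum);
  }
  return result;
}

// Hypothetical NO2 predictions for three pixels from three models.
const randomForest = [10, 20, 30];
const svm = [12, 18, 33];
const linear = [14, 22, 27];

const equalWeights = ensembleMean([randomForest, svm, linear]);
const weighted = ensembleMean([randomForest, svm, linear], [2, 1, 1]);
console.log(equalWeights, weighted);
```

Weights could, for example, be chosen from each model's accuracy on a year where actual TROPOMI data are available.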