Developing a Tree Cover Score Using Python

The Importance of Tree Cover Over Roofs

When a natural hazard event hits – whether it is wind, hail, or heavy rain – your home's roof bears the brunt of it. If your roof fails, the rest of your home can suffer severe damage. These hazard events have a costly effect on roofs across the nation. Because of this risk the roof is viewed by insurance carriers as the most important part of your house according to Insurance.com.

Insurance companies go through great lengths to assess the roof condition of the homes they cover. Carriers are looking for more automated ways and data-driven analytics to quickly and accurately assess roof condition and ultimately reduce the need for expensive onsite roof inspections. According to Claims Journal, "the market is seeing innovation in the inspection process ─ not just more inspections, but smarter, more strategic inspections." Having detailed property information, such as roof age, presence of skylights, and roof material potentially reduces the need for inspections and could help carriers more quickly determine policy premiums during the underwriting process.

Several data analytics companies, like Cape Analytics and ISO, already offer roof condition ratings for insurers, but these solutions often just rely on past claims history or a limited use of imagery. One characteristic none of the current roof condition solutions seem to consider is tree cover. Tree limbs hanging over a home are a major threat to the roof. As branches scrape against roof shingles on windy days, they can strip off layers of asphalt, reducing the effectiveness of shingles. Their leaves fall directly onto the roof or into the gutter, which can lead to mold, deterioration, or leaks. If the tree is damaged or diseased, a storm can cause limbs to fall onto your home. Tree cover should be part of any roof condition analysis.

Use of imagery, Python, and remote sensing techniques can provide a fast, accurate way to answer the tree cover question. This blog will illustrate a simple way to develop a "tree cover score" at a building level using high-resolution, multi-spectral imagery. The score could be as basic as providing percent of each roof area that is covered by the tree canopy. The score could then be appended to a parcel or address as part of a larger roof condition rating, which is how the insurance carriers would consume it.

Acquiring Data Sources

Several different types of data, both raster and vector, are needed for this type of analysis. High resolution, multi-spectral aerial imagery available through the United States Department of Agriculture (USDA) Farm Service Agency National Agriculture Imagery Program (NAIP) program is a good option because it is free, usually leaf-on, and offers the necessary resolution (1m) for this type of analysis. For this proof-of-concept, NAIP imagery was downloaded from Earth Explorer website.

Building footprints from CoreLogic were chosen because they came pre-attributed with land use and they were available for the study areas. Other datasets used for the reference maps include TIGER street centerlines and State Boundaries from the United States Census Bureau.

Choosing the Study Areas

Two study areas were chosen for their contrasting building styles and vegetation cover. The Sherwood Forest neighborhood in the southeastern section of Charlotte, North Carolina is primarily a suburban, single-family residential area with different construction time periods. Due to its densely-forested nature, one might expect a high degree of tree cover over the buildings in this neighborhood. Alternately the Park Hill neighborhood of Denver, Colorado is a well-established, urban neighborhood with older but also fewer trees than Sherwood Forest.

# Plot 1:  Study Area Boundaries
fig, ax = plt.subplots(figsize=(12, 8))

state.plot(alpha=1, color="lightgrey", edgecolor="black", ax=ax)
state_subset.plot(alpha=1, color="blue", edgecolor="black", ax=ax)
place_subset.plot(alpha=1, ax=ax, marker='*', color="red", markersize=400)

ax.set_title('Project Study Areas \n Charlotte, NC & Denver, CO', fontsize=16)
ax.set_axis_off()
plt.axis('equal')

plt.savefig("data/roof-cover/outputs/study_area.png")
plt.show()

Study Area Characteristics

The Sherwood Forest study area consists of 219 buildings, 86% (189) of which are single family residences (in light green.) A small number of other land uses are clustered along the main arterial through the neighborhood.

 

# Plot 2: Detailed Study Area Map of Sherwood Forest
fig, ax = plt.subplots(figsize=(10, 10))

LandUsePalette = {'Civic/Institutional': 'blue',
                  'Single Family - Detached': 'lightgreen',
                  'Commercial': 'red',
                  'Multi Family - Detached': 'maroon'}

for ctype, data in sf_building.groupby('ExistingLU'):
    color = LandUsePalette[ctype]
    data.plot(color=color,
              ax=ax,
              label=ctype)

sf_roads.plot(ax=ax, color='grey')
sf_boundary.boundary.plot(ax=ax, color='black', linewidth=3)

ax.set(title='Project Study Area 1 \n Sherwood Forest - Charlotte, NC')
ax.set_axis_off()

fig.text(.5, .05, "Building Footprint Source: CoreLogic", ha='center')

plt.savefig("data/roof-cover/outputs/sherwood_forest_landuse.png")
plt.show()

The Park Hill study area consists of 142 buildings, 90% (128) of which are single family residences. Similar to Sherwood Forest, the other land uses are clustered along the main arterial (Colfax Ave.) along the south boundary of the neighborhood.

# Plot 3: Detailed Study Area Map of Park Hill
fig, ax = plt.subplots(figsize=(12, 8))

for ctype, data in ph_building.groupby('ExistingLU'):
    color = LandUsePalette[ctype]
    data.plot(color=color,
              ax=ax,
              label=ctype)

ph_roads.plot(ax=ax, color='grey')
ph_boundary.boundary.plot(ax=ax,
                          color='black')

ax.set(title='Project Study Area 2 \n Park Hill - Denver, CO')
ax.set_axis_off()

fig.text(.5, .05, "Building Footprint Source: CoreLogic", ha='center')

plt.savefig("data/roof-cover/outputs/park_hill_landuse.png")
plt.show()

Defining the Workflow

Once you have the source datasets downloaded and your study area(s) defined, you can begin processing the data using Python. After importing the necessary packages and setting the working directory, the first step would be to load and reproject vector datasets, which would include the building footprints, study area boundaries, and census data. Next you would open and crop the NAIP images using your study area extents.

In order to calculate tree cover you will need to calculate the Normalized Difference Vegetation Index (NDVI), which uses the near infrared (NIR) and red channels from your NAIP images in its formula. Below are the resulting NDVI images for both study areas.

# Plot 4: NDVI Plot for Sherwood Forest
fig, ax = plt.subplots(figsize=(14, 8))

ndvi = ax.imshow(sf_ndvi,
                cmap='PiYG',
                extent=sf_crop_extent,
                vmin=-1, vmax=1)

sf_boundary_utm.boundary.plot(ax=ax,
                              color='black')
sf_building_utm.boundary.plot(ax=ax,
                              color='black')

# Code to add color bar, title, and caption
fig.colorbar(ndvi, fraction=.05)
fig.text(.5, .05, "NAIP Imagery Source: Farm Service Agency, US Department of Agriculture \n Building Footprint Source: CoreLogic", ha='center')
ax.set(title="NAIP 2016 NDVI  \n Sherwood Forest Neighborhood, Charlotte, North Carolina")
ax.set_axis_off()

plt.savefig("data/roof-cover/outputs/sherwood_forest_ndvi.png")
plt.show()

# Plot 5: NDVI Plot for Park Hill
fig, ax = plt.subplots(figsize=(14, 8))

ndvi = ax.imshow(ph_ndvi,
                cmap='PiYG',
                extent=ph_crop_extent,
                vmin=-1, vmax=1)

ph_boundary_utm.boundary.plot(ax=ax,
                              color='black')
ph_building_utm.boundary.plot(ax=ax,
                              color='black')

# Code to add color bar, title, and caption
fig.colorbar(ndvi, fraction=.05)
fig.text(.5, .05, "NAIP Imagery Source: Farm Service Agency, US Department of Agriculture \n Building Footprint Source: CoreLogic", ha='center')
ax.set(title="NAIP 2017 NDVI  \n Park Hill Neighborhood, Denver, Colorado")
ax.set_axis_off()

plt.savefig("data/roof-cover/outputs/park_hill_ndvi.png")
plt.show()

In order to simplify the calculation of the tree cover score, the NDVI rasters can be reclassified to fewer bins. NDVI thresholds (is this healthy vegetation versus something else?) can vary depending on season, locale, atmospheric conditions, soil types, and humidity. To determine the accurate range of NDVI for these study areas, different threshold values were tested and visually checked. A threshold of 0.2 was chosen as the best indicator for this analysis but that number may vary depending on study area and image source. Below is an example of the resulting reclassified and cleaned image for the Park Hill study area.

# Plot 6: Park Hill Classified
colors = ['linen', 'darkgreen']

ph_class_labels = ["No Vegetation", "Vegetation"]
legend_patches = [Patch(color=icolor, label=label)
                  for icolor, label in zip(colors, ph_class_labels)]

cmap = ListedColormap(colors)

fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(ph_ndvi_class_ma,
          cmap=cmap,
          extent=ph_crop_extent)

ph_boundary_utm.boundary.plot(ax=ax,
                              color='black')
ph_building_utm.boundary.plot(ax=ax,
                              color='maroon')

ax.legend(handles=legend_patches,
          facecolor="white",
          edgecolor="white",
          bbox_to_anchor=(1.35, 1))

ax.set(title="NAIP Classified  \n Park Hill Neighborhood, Denver, Colorado")
fig.text(.5, .12, "NAIP Imagery Source: Farm Service Agency, US Department of Agriculture \n Building Footprint Source: CoreLogic", ha='center')

ax.set_axis_off()
plt.savefig("data/roof-cover/outputs/park_hill_classified.png")
plt.show()

Once the input raster and vector datasets have been processed, you can finally run your calculations to assess tree cover at the building level. To do this you will use zonal statistics from the rasterstats package to calculate a percent tree cover for each building in each of the study areas. Finally, you can aggregate and output the results to a table which could then be joined back up to a shapefile or appended to a database containing other roof condition characteristics.

Analyzing the Results

Is Location Important?

Once might expect the Sherwood Forest study area to have a higher tree cover score based on its histogram of NDVI pixel distribution below. The neighborhood does indeed have a larger number of values in the higher spectrum than Park Hill, which exhibits a more even distribution of NDVI values.

# Plot 7: NDVI raster histograms for both study areas
fig, (ax1, ax2) = plt.subplots(
    2, 1, figsize=(11, 10), sharex=True, sharey=True)
ax1.hist(ph_ndvi.ravel(),
        color='lightgreen',
        edgecolor='white')
ax1.set_title("NDVI Value Distribution for Park Hill", fontsize=16)
ax1.set(xlabel="Greenness Value",
        ylabel="Number of Pixels")

ax2.hist(sf_ndvi.ravel(),
        color='lightgreen',
        edgecolor='white')
ax2.set_title("NDVI Value Distribution for Sherwood Forest", fontsize=16)
ax2.set(xlabel="Greenness Value",
        ylabel="Number of Pixels")

plt.savefig("data/roof-cover/outputs/ndvi_comparison.png")
plt.tight_layout()
plt.show()

However after aggregating the building level results for each study area, Park Hill actually has a slightly higher mean tree cover score than Sherwood Forest. This could be a result of the fact that Park Hill is an older neighborhood with an older, larger deciduous tree canopy. Sherwood Forest is a newer development and has slightly more coniferous vegetation which does not spread out as much as deciduous vegetation. More study areas with varying building types and vegetation should be selected to further understand how location affects tree cover.

 

 

mean

min

max

count

StudyArea

     

 

Park Hill

17.233982

0.198020

98.290598

115

Sherwood Forest

15.585421

0.155763

84.615385

180

How Does Land Use Affect Tree Cover Score?

Looking more closely at how land use might affect tree cover score, the results for both study areas are in line with one might expect. The mean tree cover score of 17.06 for single family residential buildings indicates that this land use category is at higher risk for tree-related roof issues than other land use categories. This is an important fact that illustrates that we should focus on single family residences when we move forward with calculation of a nationwide tree cover score.

 

mean

min

max

count

LandUse

     

 

Civic/Institutional

0.320993

0.155763

0.486224

2

Commercial

1.615210

0.198020

4.291845

5

Multi Family - Detached

8.095527

0.420168

45.000000

15

Single Family - Detached

17.059094

0.297619

98.290598

273

Does Size Really Matter?

One might expect that a larger building size would be less susceptible to high tree cover. For example a mansion or large commercial building is likely to have a large enough footprint that it would not have a high percent tree cover. Similar to land use, filtering on buildings over x size would improve efficiency and cost on a nationwide calculation. However at least based on these two study areas, which are primarily single family, it's difficult to discern whether there a high correlation between large size and tree cover score. One would need to pick additional study areas with larger buildings before being able to make that determination.

# Plot 8: scatter plot building size compared to tree cover
fig, ax = plt.subplots(figsize=(12, 10))

plt.setp(ax1.get_xticklabels(), rotation=45)

ax.set(title="Tree Cover Score and Building Size \n Combined Study Areas")

sns.scatterplot(x='TotalCount', y='PercentVeg', data=tree_cover_df,
                hue='StudyArea')

plt.savefig("data/roof-cover/outputs/percent_veg_and_building_size.png")
plt.show()

 

Conclusions and Areas for Future Research

NAIP imagery will certainly work for this type of analysis, but a tree cover score based on percent tree cover and NAIP imagery will only be a crude indicator of risk. The vintage of the NAIP imagery could become an issue, particularly in fast-growing areas with new developments. There are also potential issues with detecting trees in shadows and leaf on versus leaf off conditions, which remains a major drawback of this approach.

Source, vintage, and spatial accuracy of the footprints should be carefully considered as this also will affect accuracy of the tree cover score. There are commercially available imagery providers and hyperspectral imagery that may be more current and may better align to your building footprints.

Location was not as much of a factor in tree cover scores as expected with these two study areas. However one would want to pick additional study areas with different vegetation and construction patterns to confirm exactly how geography might affect tree cover score.

Single family buildings are most affected by tree cover so any nationwide analysis could just filter on those building types which would save processing time and money.

Building size does not seem to affect percent tree cover either but further analysis using study areas with a higher distribution of large buildings is recommended.

Finally, we did not validate the results of our tree cover analysis against other sources. One could create a hand-labeled dataset, perform field research, or use of an existing tree cover dataset to confirm the accuracy of the score. Additionally use of insurance claims related to tree damage would also validate the applicability of this score.