Spatial Statistics · Urban Analytics · Vancouver, BC

Green Equity in Vancouver

Geographically Weighted Regression · PostGIS · Census 2021 · Landsat 8 · NDVI
Joselyne MPAYIMANA  |  University of British Columbia  |  2026
PostGIS SQL QGIS ArcGIS Pro Geographically Weighted Regression Ordinary Least Squares Landsat 8 NDVI Census 2021 Zonal Statistics Exploratory Regression

Overview

Green space is not distributed equally across Vancouver. Parks, street trees, and urban vegetation tend to cluster in more affluent neighborhoods while lower-income, higher-density, and more marginalized communities are often left with considerably less. This raises fundamental questions about environmental justice and the quality of life experienced by Vancouver's most vulnerable residents.

This project investigates green equity across Vancouver's 1,016 Dissemination Areas (DAs) by integrating 2021 Canadian Census socio-economic data with Landsat 8 satellite imagery. Normalized Difference Vegetation Index (NDVI) derived from late-summer imagery serves as a proxy for vegetation greenness. I modelled its relationship with census characteristics using both Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR), a local modelling approach that fits a separate regression for each DA using its nearest neighbors. The shift from a global to a local model revealed that the relationships between socio-economic factors and green space are not uniform across Vancouver, they vary significantly by neighborhood in ways that a single global model cannot capture.

1,016
Vancouver Dissemination Areas analysed
0.64
GWR Adjusted R² for mean NDVI vs 0.40 for OLS
116
Optimal GWR neighborhood size in neighbors
24
Census variables tested across two analyses

Data and Preparation

The socio-economic data come from the 2021 Canadian Census by Statistics Canada, aggregated at the Dissemination Area level, Canada's smallest standard geographic unit averaging 400 to 700 residents. The raw census data were stored in long format on a PostgreSQL server with 2,631 characteristics per DA. I wrote a multi-part SQL query using the WITH clause to pivot this long-format table into wide format, extracting socio-economic variables and spatially joining them to DA boundary polygons using a PostGIS ST_Within spatial intersection. The entire operation was completed in a single SQL statement.

Census pivot and spatial join — core SQL query
WITH 
characteristics AS (
  SELECT 
    alt_geo_code,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 1)    AS population,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 6)    AS popdensity,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 50)   AS households,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 57)   AS hhsize,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 345)  AS lowincome,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 2008) AS education,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 39)   AS age,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 35)   AS children,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 37)   AS seniors,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 2230) AS unemployment,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 113)  AS medianincome,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 392)  AS neitherenglishorfrench,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 398)  AS nonofficiallanguage,
    MAX(c1_count_total) FILTER (WHERE characteristic_id = 1536) AS immigrants
  FROM census2021
  WHERE geo_level = 'Dissemination area'
  GROUP BY alt_geo_code
),
vancouver AS (
  SELECT wkb_geometry FROM lcsd000b21a_e WHERE CSDNAME = 'Vancouver'
),
vancouver_da AS (
  SELECT lda_000b21a_e.DAUID, lda_000b21a_e.wkb_geometry
  FROM vancouver, lda_000b21a_e
  WHERE ST_Within(lda_000b21a_e.wkb_geometry, vancouver.wkb_geometry)
)
SELECT characteristics.*, vancouver_da.wkb_geometry
FROM characteristics
JOIN vancouver_da ON characteristics.alt_geo_code::integer = vancouver_da.DAUID::integer;

Landsat 8 imagery was acquired on August 14, 2020 at 30 m spatial resolution and projected to UTM Zone 10N (EPSG:32610). Bands 4 and 5 were used to calculate NDVI, producing values ranging from -0.999 to +0.999. NDVI was then summarized over each DA using the Zonal Statistics as Table tool, generating mean, maximum, and minimum values per polygon for use as dependent variables in the regression models.


Modelling Green Equity

Variable Selection

Exploratory Regression was run across all candidate variables to identify the best-performing subset. The Adjusted R² plateaued at 0.41 with seven variables, meaning additional variables beyond that threshold provided negligible improvement. Variables with a Variance Inflation Factor above 7.5 were excluded due to multicollinearity. The Jarque-Bera and spatial autocorrelation tests failed consistently across all model combinations, confirming that the relationships between census characteristics and NDVI are spatially non-stationary. This was the key signal that a global OLS model alone would be insufficient and that GWR was the appropriate approach.

The final model used seven variables: population density, total population, average household size, level of education, proportion of children, proportion of seniors, and proportion of residents whose first official language is neither English nor French.

OLS vs GWR: Why Local Modelling Matters

Running both OLS and GWR on the same dependent variable and the same explanatory variables demonstrated how much is lost when spatial variation in relationships is ignored.

StatisticOLSGWR
Dependent VariableMean NDVIMean NDVI
Explanatory Variablespopulation, popdensity, hhsize, education, children, seniors, neitherengSame
Number of Observations1,0161,016
0.4050.713
Adjusted R²0.4010.640
AICc-3449.59-3831.03
Jarque-Bera p-value0.000 (Failed)N/A
Koenker (BP) p-value0.000247 (Failed)N/A
Number of NeighborsN/A116

GWR explained 71.3% of the variance in mean NDVI compared to 40.5% for OLS. The failing OLS diagnostics are not errors but findings: they confirm that the relationships between socio-economic factors and vegetation are spatially non-stationary, precisely the condition that justifies using GWR. A single global coefficient applied across all of Vancouver would obscure meaningful differences between neighborhoods.


Extended Analysis: Housing, Diversity, and Mobility

A second analysis was conducted using 10 additional census characteristics to examine a broader set of dimensions linked to green space access, with particular attention to housing tenure, visible minority status, residential mobility, and educational attainment.

Housing

Renter and owner households, high-rise and low-rise apartments, dwellings needing major repairs

Socio-economic

Population without a certificate, diploma, or degree as a proxy for educational and economic vulnerability

Population diversity

Visible minority population examined in relation to unequal access to urban vegetation

Mobility and urban form

Residential movers and commuting duration as indicators of neighborhood stability and density

Mean NDVI consistently produced the strongest models across all three NDVI statistics tested (mean, maximum, minimum), achieving a GWR Adjusted R² of 0.578 with an optimal neighborhood size of 96. Maximum NDVI performed the weakest (GWR Adjusted R² = 0.256) because peak vegetation values are driven by isolated green patches rather than systematic socio-economic patterns. Minimum NDVI showed moderate performance but required 227 neighbors, suggesting spatially uniform relationships that are difficult to model locally. The final model used five variables: visible minority population, dwellings needing major repairs, population without a certificate, residential movers, and low-rise apartments.


Spatial Patterns: Where the Model Fits and Where It Does Not

Standardized Residuals and Local R2 Maps
Figure 1. Standardized residuals and local R² values for GWR models using mean, maximum, and minimum NDVI as dependent variables across Vancouver Dissemination Areas (March 2026). Top row: standardized residuals where values beyond ±2.5 indicate areas of poor model fit. Bottom row: local R² values where darker tones indicate neighborhoods where the model explains more of the variation in vegetation greenness. Data: 2021 Canadian Census, Landsat 8 (August 14, 2020). Projection: NAD83 / UTM Zone 10N (EPSG:32610).

The local R² maps reveal that the GWR model for mean NDVI performs best in southwestern Vancouver, where values reach 0.70, and performs more poorly in the northeastern areas where local R² values drop below 0.21. This spatial pattern in model fit suggests that the socio-economic drivers of vegetation greenness are more consistent and predictable in some parts of the city than others, likely reflecting differences in urban form, land use history, and neighborhood composition.


How Educational Disadvantage Shapes Access to Green Space

One of the most informative outputs of GWR is the set of local coefficients, which show not just whether a variable is related to NDVI but how strongly and in which direction that relationship holds in each specific neighborhood. The coefficient and standard error maps for population without a certificate illustrate this spatial variation clearly.

GWR Coefficient Map
Figure 2. Spatial variation in the effect of educational attainment on vegetation greenness across Vancouver DAs. Dark red areas show the strongest negative relationship: higher proportions of residents without educational certificates are associated with significantly lower NDVI. Blue areas in the northwest show a positive relationship. Data: GWR output, Mean NDVI model.
GWR Standard Error Map
Figure 3. Reliability of local GWR coefficient estimates for educational attainment across Vancouver DAs. Lighter pink areas indicate more reliable estimates; darker red areas indicate greater uncertainty. Central and eastern Vancouver show the highest uncertainty, coinciding with areas of complex socio-economic transition. Data: GWR output, Mean NDVI model.

The coefficient map shows clear spatial variation. Central and eastern neighborhoods display the strongest negative relationships, with higher proportions of residents without educational certificates strongly associated with lower vegetation greenness. These are the same areas where high-density housing, lower incomes, and greater socio-economic vulnerability tend to concentrate. The northwest of the city, including areas near the University Endowment Lands, shows a positive or neutral relationship, likely reflecting the presence of mature forest cover that is independent of the surrounding socio-economic conditions.

The standard error map adds an important caveat. While the negative relationships in central and eastern Vancouver are the strongest in the city, they are also the least certain. Higher uncertainty in these areas likely reflects the complexity of socio-economic and land use transitions occurring in neighborhoods undergoing rapid change. In contrast, peripheral and western areas show more stable and reliable coefficient estimates. Together, the two maps make the case that interventions targeting educational disadvantage and green space access should focus on central and eastern Vancouver, while acknowledging that those estimates carry more uncertainty and would benefit from additional ground-level evidence.


Conclusions and Recommendations

Green equity is unevenly distributed across Vancouver's dissemination areas. Neighborhoods with higher socio-economic disadvantage, including those with greater proportions of renters, lower education levels, poorer housing conditions, and higher residential mobility, consistently show lower NDVI values. The dramatic improvement from OLS (Adjusted R² = 0.401) to GWR (Adjusted R² = 0.640) confirms that local spatial context is essential to understanding how socio-economic factors shape vegetation distribution in a complex urban environment. These findings are consistent with the green equity hypothesis: marginalized communities in Vancouver have less access to green space, and the magnitude of that inequity varies significantly by neighborhood.

Targeted greening initiatives should be implemented in neighborhoods identified by the GWR maps as having low NDVI values and high socio-economic disadvantage, including community parks, street tree planting programs, and green infrastructure such as green roofs and bioswales. Urban planning policies should incorporate mandatory vegetation requirements into new high-density residential developments, particularly in areas dominated by apartments. Because the GWR results demonstrate significant spatial variation in the drivers of vegetation inequality, interventions should be tailored to local neighborhood contexts rather than applying a uniform city-wide approach. Data-driven, place-based strategies informed by spatial regression analysis offer the most effective path toward reducing environmental inequalities and improving quality of life for all Vancouver residents.


Joselyne MPAYIMANA  |  Master of Geomatics for Environmental Management, UBC  |  2026