Green space is not distributed equally across Vancouver. Parks, street trees, and urban vegetation tend to cluster in more affluent neighborhoods while lower-income, higher-density, and more marginalized communities are often left with considerably less. This raises fundamental questions about environmental justice and the quality of life experienced by Vancouver's most vulnerable residents.
This project investigates green equity across Vancouver's 1,016 Dissemination Areas (DAs) by integrating 2021 Canadian Census socio-economic data with Landsat 8 satellite imagery. Normalized Difference Vegetation Index (NDVI) derived from late-summer imagery serves as a proxy for vegetation greenness. I modelled its relationship with census characteristics using both Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR), a local modelling approach that fits a separate regression for each DA using its nearest neighbors. The shift from a global to a local model revealed that the relationships between socio-economic factors and green space are not uniform across Vancouver, they vary significantly by neighborhood in ways that a single global model cannot capture.
The socio-economic data come from the 2021 Canadian Census by Statistics Canada, aggregated at the Dissemination Area level, Canada's smallest standard geographic unit averaging 400 to 700 residents. The raw census data were stored in long format on a PostgreSQL server with 2,631 characteristics per DA. I wrote a multi-part SQL query using the WITH clause to pivot this long-format table into wide format, extracting socio-economic variables and spatially joining them to DA boundary polygons using a PostGIS ST_Within spatial intersection. The entire operation was completed in a single SQL statement.
WITH
characteristics AS (
SELECT
alt_geo_code,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 1) AS population,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 6) AS popdensity,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 50) AS households,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 57) AS hhsize,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 345) AS lowincome,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 2008) AS education,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 39) AS age,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 35) AS children,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 37) AS seniors,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 2230) AS unemployment,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 113) AS medianincome,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 392) AS neitherenglishorfrench,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 398) AS nonofficiallanguage,
MAX(c1_count_total) FILTER (WHERE characteristic_id = 1536) AS immigrants
FROM census2021
WHERE geo_level = 'Dissemination area'
GROUP BY alt_geo_code
),
vancouver AS (
SELECT wkb_geometry FROM lcsd000b21a_e WHERE CSDNAME = 'Vancouver'
),
vancouver_da AS (
SELECT lda_000b21a_e.DAUID, lda_000b21a_e.wkb_geometry
FROM vancouver, lda_000b21a_e
WHERE ST_Within(lda_000b21a_e.wkb_geometry, vancouver.wkb_geometry)
)
SELECT characteristics.*, vancouver_da.wkb_geometry
FROM characteristics
JOIN vancouver_da ON characteristics.alt_geo_code::integer = vancouver_da.DAUID::integer;
Landsat 8 imagery was acquired on August 14, 2020 at 30 m spatial resolution and projected to UTM Zone 10N (EPSG:32610). Bands 4 and 5 were used to calculate NDVI, producing values ranging from -0.999 to +0.999. NDVI was then summarized over each DA using the Zonal Statistics as Table tool, generating mean, maximum, and minimum values per polygon for use as dependent variables in the regression models.
Exploratory Regression was run across all candidate variables to identify the best-performing subset. The Adjusted R² plateaued at 0.41 with seven variables, meaning additional variables beyond that threshold provided negligible improvement. Variables with a Variance Inflation Factor above 7.5 were excluded due to multicollinearity. The Jarque-Bera and spatial autocorrelation tests failed consistently across all model combinations, confirming that the relationships between census characteristics and NDVI are spatially non-stationary. This was the key signal that a global OLS model alone would be insufficient and that GWR was the appropriate approach.
The final model used seven variables: population density, total population, average household size, level of education, proportion of children, proportion of seniors, and proportion of residents whose first official language is neither English nor French.
Running both OLS and GWR on the same dependent variable and the same explanatory variables demonstrated how much is lost when spatial variation in relationships is ignored.
| Statistic | OLS | GWR |
|---|---|---|
| Dependent Variable | Mean NDVI | Mean NDVI |
| Explanatory Variables | population, popdensity, hhsize, education, children, seniors, neithereng | Same |
| Number of Observations | 1,016 | 1,016 |
| R² | 0.405 | 0.713 |
| Adjusted R² | 0.401 | 0.640 |
| AICc | -3449.59 | -3831.03 |
| Jarque-Bera p-value | 0.000 (Failed) | N/A |
| Koenker (BP) p-value | 0.000247 (Failed) | N/A |
| Number of Neighbors | N/A | 116 |
GWR explained 71.3% of the variance in mean NDVI compared to 40.5% for OLS. The failing OLS diagnostics are not errors but findings: they confirm that the relationships between socio-economic factors and vegetation are spatially non-stationary, precisely the condition that justifies using GWR. A single global coefficient applied across all of Vancouver would obscure meaningful differences between neighborhoods.
A second analysis was conducted using 10 additional census characteristics to examine a broader set of dimensions linked to green space access, with particular attention to housing tenure, visible minority status, residential mobility, and educational attainment.
Renter and owner households, high-rise and low-rise apartments, dwellings needing major repairs
Population without a certificate, diploma, or degree as a proxy for educational and economic vulnerability
Visible minority population examined in relation to unequal access to urban vegetation
Residential movers and commuting duration as indicators of neighborhood stability and density
Mean NDVI consistently produced the strongest models across all three NDVI statistics tested (mean, maximum, minimum), achieving a GWR Adjusted R² of 0.578 with an optimal neighborhood size of 96. Maximum NDVI performed the weakest (GWR Adjusted R² = 0.256) because peak vegetation values are driven by isolated green patches rather than systematic socio-economic patterns. Minimum NDVI showed moderate performance but required 227 neighbors, suggesting spatially uniform relationships that are difficult to model locally. The final model used five variables: visible minority population, dwellings needing major repairs, population without a certificate, residential movers, and low-rise apartments.
The local R² maps reveal that the GWR model for mean NDVI performs best in southwestern Vancouver, where values reach 0.70, and performs more poorly in the northeastern areas where local R² values drop below 0.21. This spatial pattern in model fit suggests that the socio-economic drivers of vegetation greenness are more consistent and predictable in some parts of the city than others, likely reflecting differences in urban form, land use history, and neighborhood composition.
One of the most informative outputs of GWR is the set of local coefficients, which show not just whether a variable is related to NDVI but how strongly and in which direction that relationship holds in each specific neighborhood. The coefficient and standard error maps for population without a certificate illustrate this spatial variation clearly.
The coefficient map shows clear spatial variation. Central and eastern neighborhoods display the strongest negative relationships, with higher proportions of residents without educational certificates strongly associated with lower vegetation greenness. These are the same areas where high-density housing, lower incomes, and greater socio-economic vulnerability tend to concentrate. The northwest of the city, including areas near the University Endowment Lands, shows a positive or neutral relationship, likely reflecting the presence of mature forest cover that is independent of the surrounding socio-economic conditions.
The standard error map adds an important caveat. While the negative relationships in central and eastern Vancouver are the strongest in the city, they are also the least certain. Higher uncertainty in these areas likely reflects the complexity of socio-economic and land use transitions occurring in neighborhoods undergoing rapid change. In contrast, peripheral and western areas show more stable and reliable coefficient estimates. Together, the two maps make the case that interventions targeting educational disadvantage and green space access should focus on central and eastern Vancouver, while acknowledging that those estimates carry more uncertainty and would benefit from additional ground-level evidence.
Green equity is unevenly distributed across Vancouver's dissemination areas. Neighborhoods with higher socio-economic disadvantage, including those with greater proportions of renters, lower education levels, poorer housing conditions, and higher residential mobility, consistently show lower NDVI values. The dramatic improvement from OLS (Adjusted R² = 0.401) to GWR (Adjusted R² = 0.640) confirms that local spatial context is essential to understanding how socio-economic factors shape vegetation distribution in a complex urban environment. These findings are consistent with the green equity hypothesis: marginalized communities in Vancouver have less access to green space, and the magnitude of that inequity varies significantly by neighborhood.
Targeted greening initiatives should be implemented in neighborhoods identified by the GWR maps as having low NDVI values and high socio-economic disadvantage, including community parks, street tree planting programs, and green infrastructure such as green roofs and bioswales. Urban planning policies should incorporate mandatory vegetation requirements into new high-density residential developments, particularly in areas dominated by apartments. Because the GWR results demonstrate significant spatial variation in the drivers of vegetation inequality, interventions should be tailored to local neighborhood contexts rather than applying a uniform city-wide approach. Data-driven, place-based strategies informed by spatial regression analysis offer the most effective path toward reducing environmental inequalities and improving quality of life for all Vancouver residents.