Exploring Vancouver Street Trees

Data Cleaning, Species Analysis, and Neighbourhood Tree Density Using tidyverse

Author

Joselyne MPAYIMANA


Project Overview

Vancouver’s urban forest is one of the city’s most visible and ecologically significant assets. This project uses the City of Vancouver’s open street tree dataset to explore patterns in species composition, tree size, and the spatial distribution of trees across neighbourhoods.

The analysis works through a full tidyverse workflow: reshaping messy data into tidy format, cleaning unrealistic values, computing summary statistics by species and neighbourhood, and building a series of visualizations that reveal how Vancouver’s street trees vary across the city. The final part of the analysis brings in street length and neighbourhood area data to calculate tree density per kilometre of street, giving a more meaningful picture of how well-treed each neighbourhood actually is relative to its size.


Dataset

The data comes from the Vancouver City Open Data Portal and covers trees on public streets across the city. Each record includes a tree ID, species (Latin and common name), diameter at breast height (in inches), height rank (0 to 10), the street name, and the neighbourhood. Four supporting files are also used: trees_count.csv and trees_height_diam.csv for the reshaping steps, and public_street_length.csv and local_area_areakm2.csv for street length and neighbourhood area.


Packages

library(tidyverse)
library(ggrepel)

Data Preparation

Reshaping to Tidy Format

The raw trees_count.csv file stores neighbourhood names as column headers rather than as values in a column. This is wide format and not tidy. I used pivot_longer() to collapse all neighbourhood columns into two columns: one for neighbourhood name and one for tree count.

# data_path: path to the folder containing the CSV files (not defined in this document)
trees_count <- read.csv(file.path(data_path, "trees_count.csv"))

trees_count_longer <- pivot_longer(trees_count,
                                   cols        = 2:23,
                                   names_to    = "neighborhood",
                                   values_to   = "trees_count")
trees_count_longer
# A tibble: 6,226 × 3
   species neighborhood             trees_count
   <chr>   <chr>                          <int>
 1 ABIES   arbutus.ridge                      2
 2 ABIES   downtown                           1
 3 ABIES   dunbar.southlands                 11
 4 ABIES   fairview                           7
 5 ABIES   grandview.woodland                 4
 6 ABIES   hastings.sunrise                   4
 7 ABIES   kensington.cedar.cottage          15
 8 ABIES   kerrisdale                         5
 9 ABIES   killarney                          4
10 ABIES   kitsilano                         17
# ℹ 6,216 more rows
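
The call above selects the neighbourhood columns by position (2:23), which depends on the column order of the file. As a minimal sketch on made-up data (the toy_counts table and its values are illustrative, not taken from the dataset), the same reshape can select every column except species by name:

```r
library(tidyverse)

# Toy stand-in for trees_count.csv: one row per species, one column per neighbourhood
toy_counts <- tibble(
  species   = c("ABIES", "ACER"),
  downtown  = c(1L, 30L),
  fairview  = c(7L, 12L),
  kitsilano = c(17L, 25L)
)

# Select every column except `species` instead of hard-coding positions
toy_longer <- pivot_longer(toy_counts,
                           cols      = -species,
                           names_to  = "neighborhood",
                           values_to = "trees_count")

# 2 species x 3 neighbourhoods gives 6 rows
nrow(toy_longer)
```

Selecting by name keeps the reshape working if a neighbourhood column is added or the column order changes, at the cost of assuming that species is the only non-neighbourhood column.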

Similarly, trees_height_diam.csv stores height rank and diameter as rows in a single attribute column instead of as separate variables. I used pivot_wider() to give each attribute its own column.

trees_h_d <- read.csv(file.path(data_path, "trees_height_diam.csv"))

trees_h_d_wider <- pivot_wider(trees_h_d,
                                names_from  = "attribute",
                                values_from = "value")
trees_h_d_wider
# A tibble: 146,730 × 4
   tree_id species               diameter height_rank
     <int> <chr>                    <dbl>       <dbl>
 1   21422 COLUMNAR NORWAY MAPLE     22             4
 2   21425 COLUMNAR NORWAY MAPLE     19             4
 3   21427 SYCAMORE MAPLE            14.5           3
 4   21428 SYCAMORE MAPLE            16.5           4
 5   21432 ENGLISH OAK               28             5
 6   21434 SYCAMORE MAPLE            11             3
 7   21447 NORWAY MAPLE              17.2           4
 8   21452 SYCAMORE MAPLE            17             4
 9   21453 THREAD-LEAF CYPRESS        5             2
10   21455 SYCAMORE MAPLE            20             4
# ℹ 146,720 more rows
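
pivot_wider() only yields one value per cell when each tree has at most one row per attribute. A small sketch of that check on made-up data (toy_h_d is a hypothetical stand-in for the real file):

```r
library(tidyverse)

# Toy long-format table: one row per tree and attribute
toy_h_d <- tibble(
  tree_id   = c(1L, 1L, 2L, 2L),
  species   = c("NORWAY MAPLE", "NORWAY MAPLE", "ENGLISH OAK", "ENGLISH OAK"),
  attribute = c("diameter", "height_rank", "diameter", "height_rank"),
  value     = c(22, 4, 28, 5)
)

# Any (tree, attribute) pair appearing more than once would turn
# the widened columns into list-columns
dupes <- toy_h_d %>%
  count(tree_id, attribute) %>%
  filter(n > 1)

# With no duplicates, each attribute becomes an ordinary numeric column
toy_wider <- pivot_wider(toy_h_d,
                         names_from  = attribute,
                         values_from = value)
```

An empty dupes table means the widened result has exactly one row per tree.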

Loading and Cleaning the Main Dataset

street_trees <- read.csv(file.path(data_path, "street_trees.csv"))

# Renaming columns to clean, consistent names
street_trees <- rename(street_trees,
  tree_id      = Tree.ID,
  street       = Street.Name,
  neighbourhood = Neighbourh,
  species      = SpeciesName,
  common_name  = CommonName,
  height_rank  = hrank,
  diameter_in  = Diameter,
  year_planted = YearPlanted
)

The dataset contains 146,730 trees recorded across Vancouver’s public streets.

# Converting diameter to cm
street_trees <- street_trees %>%
  mutate(diameter_cm = diameter_in * 2.54)

# Converting height rank to an ordered factor
height_rank_ordered <- sort(unique(street_trees$height_rank))
street_trees <- street_trees %>%
  mutate(height_rank = factor(height_rank,
                               levels  = height_rank_ordered,
                               ordered = TRUE))

# Removing trees with diameter of 0 or more than 300 cm (3 m)
trees_clean <- street_trees %>%
  filter(diameter_cm > 0, diameter_cm <= 300)

n_trees <- nrow(trees_clean)

Before filtering, 91 trees had a diameter of 0 and 9 trees had a diameter above 3 metres. After removing these unrealistic records, 146,630 trees remain in the cleaned dataset.
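
The counts of removed records can be tallied before filtering. The sketch below shows one way to do this on made-up diameters (toy_trees and its values are illustrative only); it is not necessarily how the figures above were produced:

```r
library(tidyverse)

# Toy diameters in inches, including a zero and an implausibly large value
toy_trees <- tibble(
  tree_id     = 1:5,
  diameter_in = c(0, 10, 22, 150, 6.5)
) %>%
  mutate(diameter_cm = diameter_in * 2.54)

# Tally the records that each cleaning rule would remove
removed <- toy_trees %>%
  summarise(
    n_zero      = sum(diameter_cm == 0),
    n_too_large = sum(diameter_cm > 300)
  )

toy_clean <- toy_trees %>%
  filter(diameter_cm > 0, diameter_cm <= 300)

# The removed rows should account for the whole difference in row counts
nrow(toy_trees) - nrow(toy_clean) == removed$n_zero + removed$n_too_large
```

If the real data contained missing diameters, filter() would drop those rows as well, so the two tallies would no longer add up to the full difference.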


Species Analysis

Diameter Summary by Species

trees_summary_sp <- trees_clean %>%
  group_by(common_name) %>%
  summarise(
    n_trees   = n(),
    mean_diam = mean(diameter_cm),
    min_diam  = min(diameter_cm),
    max_diam  = max(diameter_cm),
    sd_diam   = sd(diameter_cm)
  )

Five Most Common Species

top_5_sp <- trees_summary_sp %>%
  arrange(desc(n_trees)) %>%
  slice_head(n = 5) %>%
  # Share of all cleaned trees, in percent
  mutate(ntree_per = n_trees / nrow(trees_clean) * 100)

top_5_sp %>%
  select(common_name, n_trees, ntree_per) %>%
  mutate(ntree_per = round(ntree_per, 2)) %>%
  rename(`Species` = common_name,
         `Count`   = n_trees,
         `% of Total` = ntree_per) %>%
  knitr::kable(caption = "Five most common street tree species in Vancouver.")

Five most common street tree species in Vancouver.

Species                       Count  % of Total
KWANZAN FLOWERING CHERRY      10486        7.15
PISSARD PLUM                   8636        5.89
NORWAY MAPLE                   5660        3.86
CRIMEAN LINDEN                 4423        3.02
PYRAMIDAL EUROPEAN HORNBEAM    3418        2.33

The most common street tree is KWANZAN FLOWERING CHERRY, making up 7.2% of all trees in the dataset. The top five species together account for 22.2% of the total.
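
As a quick arithmetic check, the shares can be recomputed from the reported counts and the cleaned total of 146630 trees. The column names share_pct and cum_share_pct are introduced here for illustration:

```r
library(tidyverse)

# Counts copied from the table of the five most common species
top_5 <- tibble(
  common_name = c("KWANZAN FLOWERING CHERRY", "PISSARD PLUM", "NORWAY MAPLE",
                  "CRIMEAN LINDEN", "PYRAMIDAL EUROPEAN HORNBEAM"),
  n_trees     = c(10486, 8636, 5660, 4423, 3418)
)
total_trees <- 146630  # size of the cleaned dataset

top_5 <- top_5 %>%
  mutate(share_pct     = n_trees / total_trees * 100,
         cum_share_pct = cumsum(share_pct))

round(top_5$share_pct, 2)       # 7.15 5.89 3.86 3.02 2.33
round(sum(top_5$share_pct), 1)  # 22.2
```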


Diameter by Species and Height Rank

I looked at how diameter varies across height ranks for the five most common species, focusing on height ranks 1 through 6.

trees_summary_sp_h <- trees_clean %>%
  group_by(common_name, height_rank) %>%
  summarise(
    n_trees   = n(),
    mean_diam = mean(diameter_cm),
    min_diam  = min(diameter_cm),
    max_diam  = max(diameter_cm),
    sd_diam   = sd(diameter_cm),
    .groups   = "drop"
  )

trees_summary_trg <- trees_summary_sp_h %>%
  filter(common_name %in% top_5_sp$common_name,
         height_rank >= 1,
         height_rank <= 6)

ggplot(trees_summary_trg,
       aes(x = height_rank, y = mean_diam, color = common_name, group = common_name)) +
  geom_point(size = 3) +
  geom_line(linewidth = 0.9) +
  labs(
    title  = "Mean Diameter by Height Rank for the Top 5 Street Tree Species",
    x      = "Height Rank",
    y      = "Mean Diameter (cm)",
    color  = "Species"
  ) +
  theme_bw(base_size = 12) +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, face = "bold"))

Mean trunk diameter by height rank for the five most common street tree species in Vancouver. Across all species, mean diameter increases with height rank, but the rate of increase differs by species.

ggplot(trees_summary_trg,
       aes(x = height_rank, y = n_trees, fill = common_name)) +
  geom_col() +
  labs(
    title = "Number of Trees per Height Rank by Species",
    x     = "Height Rank",
    y     = "Number of Trees",
    fill  = "Species"
  ) +
  theme_bw(base_size = 12) +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, face = "bold"))

Number of trees per height rank for the five most common species. Most trees fall in the lower height ranks, and Kwanzan Flowering Cherry tends to be concentrated in shorter height classes than Norway Maple.
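
The restriction to height ranks 1 through 6 compares an ordered factor against plain numbers. A minimal sketch on made-up ranks (toy_ranks is illustrative) suggests this gives the same rows as converting the factor labels back to numbers first:

```r
library(tidyverse)

toy_ranks <- tibble(height_rank = c(0, 1, 3, 6, 7)) %>%
  mutate(height_rank_f = factor(height_rank,
                                levels  = sort(unique(height_rank)),
                                ordered = TRUE))

# Range filter written directly against the ordered factor
by_factor <- toy_ranks %>%
  filter(height_rank_f >= 1, height_rank_f <= 6)

# Same range after converting the factor labels back to numbers
by_number <- toy_ranks %>%
  mutate(rank_num = as.numeric(as.character(height_rank_f))) %>%
  filter(between(rank_num, 1, 6))

# Both approaches keep ranks 1, 3 and 6
identical(by_factor$height_rank, by_number$height_rank)
```

As far as I understand, the factor comparison works by matching the bound against the factor's levels, so a bound that is not an existing level (for example 2 in this toy data) would give NA and drop every row. The numeric conversion avoids that dependency.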

Diameter Distribution by Species

street_trees_trg <- trees_clean %>%
  filter(common_name %in% top_5_sp$common_name)

ggplot(street_trees_trg,
       aes(x = diameter_cm, fill = common_name)) +
  geom_density(alpha = 0.4) +
  xlim(c(0, 100)) +
  labs(
    title = "Diameter Distribution of the Top 5 Street Tree Species",
    x     = "Diameter (cm)",
    y     = "Density",
    fill  = "Species"
  ) +
  theme_bw(base_size = 12) +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, face = "bold"))

Diameter distribution for the five most common street tree species. Kwanzan Flowering Cherry is strongly skewed towards smaller diameters, while Norway Maple shows a wider spread, indicating a broader range of tree sizes across the city.

The skew towards small diameters suggests that most Kwanzan Flowering Cherry trees are relatively young or small. The much wider spread for Norway Maple suggests a greater mix of tree ages and sizes across the city.


Neighbourhood Analysis

Tree Density per Kilometre of Street

To compare how tree-rich each neighbourhood is relative to its street network, I joined the tree count summary with total street length data and calculated the number of trees per kilometre of street.

# Street length by neighbourhood (converted to km)
street_length <- read.csv(file.path(data_path, "public_street_length.csv"))
street_length_tot <- street_length %>%
  group_by(neighbourhood) %>%
  summarise(total_length = sum(street_length) / 1000, .groups = "drop")

# Tree summary by neighbourhood
trees_summary_nei <- trees_clean %>%
  group_by(neighbourhood) %>%
  summarise(
    n_trees   = n(),
    mean_diam = mean(diameter_cm),
    min_diam  = min(diameter_cm),
    max_diam  = max(diameter_cm),
    sd_diam   = sd(diameter_cm),
    .groups   = "drop"
  )

# Joining and computing tree density
trees_dens_nei <- trees_summary_nei %>%
  inner_join(street_length_tot, by = "neighbourhood") %>%
  mutate(tree_dens = n_trees / total_length)

# Neighbourhood area data
nei_area <- read.csv(file.path(data_path, "local_area_areakm2.csv")) %>%
  rename(neighbourhood = area_name)

nei_area_df <- trees_dens_nei %>%
  inner_join(nei_area, by = "neighbourhood") %>%
  select(neighbourhood, mean_diam, tree_dens, area_km2)

ggplot(nei_area_df,
       aes(x = tree_dens, y = area_km2,
           size = mean_diam, color = neighbourhood)) +
  geom_point(alpha = 0.5) +
  geom_text_repel(aes(label = neighbourhood, color = neighbourhood),
                  size = 3, show.legend = FALSE) +
  scale_size_continuous(name = "Mean Tree Diameter (cm)") +
  labs(
    title = "Street Tree Density by Neighbourhood",
    x     = "Trees per km of Street",
    y     = "Neighbourhood Area (km²)"
  ) +
  theme_bw(base_size = 12) +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, face = "bold")) +
  guides(color = "none")

Tree density (trees per km of street) versus neighbourhood area, with point size showing mean tree diameter. South Cambie has the highest tree density but is one of the smallest neighbourhoods. Dunbar-Southlands and Renfrew-Collingwood stand out as large neighbourhoods with high tree density.

South Cambie has the highest street tree density in the city, though its small area means it contributes relatively little to the overall count. Dunbar-Southlands and Renfrew-Collingwood combine high tree density with large areas, making them the most significant contributors to Vancouver’s urban forest. Downtown has notably smaller mean tree diameters, making it the least appealing neighbourhood for someone looking to live among mature, large-diameter trees.
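
Because inner_join() silently drops neighbourhoods whose names do not match across files, it can be worth checking for unmatched names before computing density. The sketch below uses made-up tables (toy_trees_nei, toy_length and all numbers are illustrative) and assumes, purely for illustration, that the mismatch is one of letter case:

```r
library(tidyverse)

# Toy neighbourhood summaries; names and numbers are illustrative only
toy_trees_nei <- tibble(
  neighbourhood = c("DOWNTOWN", "KITSILANO", "SOUTH CAMBIE"),
  n_trees       = c(500, 800, 300)
)
toy_length <- tibble(
  neighbourhood = c("DOWNTOWN", "KITSILANO", "South Cambie"),
  total_length  = c(50, 40, 10)  # km of street
)

# Rows of the tree summary that have no partner in the street length table
unmatched <- anti_join(toy_trees_nei, toy_length, by = "neighbourhood")

# Normalising case on both sides before joining recovers the match
toy_dens <- toy_trees_nei %>%
  mutate(neighbourhood = str_to_upper(neighbourhood)) %>%
  inner_join(mutate(toy_length, neighbourhood = str_to_upper(neighbourhood)),
             by = "neighbourhood") %>%
  mutate(tree_dens = n_trees / total_length)
```

An empty unmatched table means every neighbourhood survives the join, so the density plot covers the whole city.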


Summary

Working through this dataset gave me a clear picture of how Vancouver’s street tree inventory is distributed across species and space. The Kwanzan Flowering Cherry dominates in count but tends to be small in diameter, while Norway Maple shows more variability in size, which may reflect an older planting history in some neighbourhoods. The neighbourhood-level analysis revealed that tree density per kilometre of street varies considerably across the city, and that area alone is not a reliable predictor of how well-treed a neighbourhood is. Combining tree counts, street length, and neighbourhood area gave a much more informative view of where Vancouver’s urban forest is most concentrated.