Coloring an Intricate Map with Graph Coloring Algorithms

Jose M Sallan 2024-07-10 8 min read

In a previous post I have introduced the graph coloring problem and used a linear programming formulation to apply this problem to color a map of districts of Barcelona. In this post, I will try to color the map of neighbourhoods (barris) of Barcelona solving a graph coloring problem. In addition to the tidyverse for data handling and BAdatasetsSpatial to get the map, I will be using sf for spatial data handling and tidygraph and ggraph to work with graphs.

library(tidyverse)
library(BAdatasetsSpatial)
library(sf)

library(tidygraph)
library(ggraph)

Firstly, I will get the neighbourhoods, stored in the BCNNeigh sf object.

data("BCNNeigh")

BCNNeigh <- BCNNeigh |>
  arrange(c_barri)

int <- st_touches(BCNNeigh)

edges_bcn <- map_dfr(1:length(int), ~ tibble(i = ., j = int[[.]])) |>
  filter(i < j) # 195 edges

edges_bcn <- edges_bcn |>
  mutate(id = 1:n())

The map has up to 73 zones, stacked in complicated ways specially at the north.

BCNNeigh |>
  ggplot() +
  geom_sf(fill = "white") +
  geom_text(aes(coord_x, coord_y, label = c_barri), size = 3) +
  theme_void() +
  theme(legend.position = "none")

To solve the graph coloring problem, we need to define a graph with a node for each spatial region connected by edges if they touch each other. If we represent the graph, we observe that it is non-planar, as some edges cross each other. This is because some regions are connected by a single point, and it is correct to use the same color to label them. The edges representing one-point connections are plotted in red.

bcn_neigh_graph <- tbl_graph(edges = edges_bcn, directed = TRUE)

bcn_neigh_graph <- bcn_neigh_graph |>
  activate(nodes) |>
  mutate(label = 1:n())

bcn_neigh_graph <- bcn_neigh_graph |>
  activate(edges) |>
  mutate(label = 1:n())

non_planar_edges <- c(140, 144, 151, 147, 153, 158, 54, 112, 116)

bcn_neigh_graph <- bcn_neigh_graph |>
  activate(edges) |>
  mutate(planar = ifelse(label %in% non_planar_edges, "no", "yes"))

m_layout <- matrix(c(BCNNeigh$coord_x, BCNNeigh$coord_y), ncol = 2)

ggraph(bcn_neigh_graph, layout = m_layout) +
  geom_edge_link(aes(color = planar),
                 start_cap = circle(2, "mm"), 
                 end_cap = circle(2, "mm")) +
  geom_node_text(aes(label = label), size = 3) +
  scale_edge_color_manual(values = c("#FF0000", "#808080")) +
  theme_graph() +
  theme(legend.position = "none")

We remove the edges in red to get a planar graph, edges_non_planar.

edges_bcn_planar <- edges_bcn |>
  filter(!id %in% non_planar_edges)

A Greedy Algorithm for the Graph Coloring Problem

The instance is too large and intricate to solve it using linear programming, so we can try a constructive heuristic. It is defined as follows:

  • Label the first node with the first color.
  • For the remaining nodes:
  • Consider each node, and color it with the color of lowest number not used in adjacent nodes colored previously. If all colors used so far appear in the adjacent, yet colored nodes, assing the node a new color.

This heuristic is implemented in the gc_heuristic() function.

gc_heuristic <- function(edges_col){
  
  nodes <- sort(unique(c(edges_col$i, edges_col$j)))
  n_nodes <- length(nodes)
  color <- c(1, rep(0, n_nodes - 1))
  
  for(i in 2:(n_nodes)){
    
    node_to_color <- nodes[i]
    
    # nodes adjacent to node_to_color
    adj_table <- edges_col |>
      filter(i == node_to_color | j == node_to_color)
    adj_nodes <- unique(c(adj_table$i, adj_table$j))
    
    # nodes adjacent to node_to_color colored previously
    adj_nodes <- adj_nodes[which(adj_nodes %in% nodes[1:(i-1)])]
    
    #colores usados en los vertices adyacentes coloreados previamente
    prev_colors <- color[which(nodes %in% adj_nodes)] |> unique()
    new_color <- which(!1:n_nodes %in% prev_colors) |> min()
    color[i] <- new_color
  }
  
  solution <- tibble(node = nodes,  color = color)
  return(solution)
}

Let’s obtain the solution of our problem with this heuristic:

sol <- gc_heuristic(edges_bcn_planar)

We observe that we have needed five colors to solve the problem:

unique(sol$color)
## [1] 1 2 3 4 5

A Tabu Search Algorithm for the Graph Coloring Problem

With the constructive heuristic we have obtained a way to color the map using five colors. But we know that the chromatic number (minimum number of colors) of a planary graph is four. To improve the five color solution, I have defined a tabu search heuristic for the graph coloring problem.

In this heuristic, we fix the number of colors n_c and we aim to minimize the number of edges with nodes with the same color. This value is obtained with the edges_invalid() function. We get the coloring with n_c colors when this objective function is equal to zero.

The neighborhood of a solution has all nodes with the same color except one node. So in each iteration we test the objective function for each node and each color different from the one in the current solution. The tabu search is implemented in the tabu_gc() function.

edges_invalid <- function(edges_col, solution){
  
  t1 <- left_join(edges_col, solution, by = c("i" = "node"))
  t1 <- left_join(t1, solution, by = c("j" = "node"))
  ie <- sum(t1$color.x == t1$color.y)
  return(ie)
}

tabu_gc <- function(edges_col, n_c, iter, tabu_size, verbose = FALSE){
  
  # nodes of instance and number of nodes
  nodes <- sort(unique(c(edges_col$i, edges_col$j)))
  n_nodes <- length(nodes)
  
  # starting solution
  inisol <- gc_heuristic(edges_col)
  
  inisol <- inisol |>
    mutate(color = ifelse(color > n_c, n_c, color))
  
  #list of moves
  moves <- map_dfr(1:n_c, ~ bind_cols(tibble(node = nodes, color = .)))
  moves <- moves |>
    mutate(id = 1:n()) |>
    relocate(id)
  
  ##setting starting value of sol, fit, bestsol and bestfit
  sol <- inisol
  fit <- edges_invalid(edges_col, sol)
  bestsol <- inisol
  bestfit <- fit
  evalfit <- fit
  evalbest <- bestfit
  
  #initializing tabu list
  tabu_list <- rep(0, tabu_size)
  
  ##setting counters
  count <- 1
  count_tlist <- 0
  
  while(count <= iter & bestfit > 0){
    
    # nodes to explore not belonging to solution
    exploring_moves <- anti_join(moves, sol, by = c("node", "color"))
    
    # computing fitness for each mov
    exploring_moves <- exploring_moves |>
      mutate(fit = pmap_int(exploring_moves, ~ {
        test_sol <- sol |>
          filter(node != ..2)
        test_sol <- bind_rows(test_sol, tibble(node = ..2, color = ..3))
        ie <- edges_invalid(edges_col, test_sol)
        return(ie)
      }))
    
    exploring_moves <- exploring_moves |>
      arrange(fit)
    
    best_not_tabu <- exploring_moves |>
      filter(!id %in% tabu_list) |>
      slice(1)
    
    best_tabu <- exploring_moves |>
      filter(id %in% tabu_list) |>
      slice(1)
    
    if(nrow(best_tabu) == 0)
      best_tabu <- data.frame(id = 0, node = 0, color = 0, fit = Inf)
    
    if(best_tabu$fit < bestfit){
      sol <- sol |>
        filter(node != best_tabu$node)
      sol <- bind_rows(sol, best_tabu |> select(-c(fit, id)))
      fit <- best_tabu$fit
      asp <- TRUE
    }else{
      sol <- sol |>
        filter(node != best_not_tabu$node)
      sol <- bind_rows(sol, best_not_tabu |> select(-c(fit, id)))
      fit <- best_not_tabu$fit
      asp <- FALSE
    }
    
    if(fit < bestfit){
      bestsol <- sol
      bestfit <- fit
    }
    
    ##update tabu list
    if(!asp){
      tabu_list[count_tlist %% tabu_size + 1] <- best_not_tabu$id
      count_tlist <- count_tlist + 1
    }
    
    if(verbose)
      cat("iteration: ", count, " fit: ", bestfit, "\n")
    
    evalfit <- c(evalfit, fit)
    evalbest <- c(evalbest, bestfit)
    count <- count + 1
  }
  
  fit_track <- data.frame(step = 1:length(evalfit), fit = evalfit, best = evalbest)
  
  return(list(sol = bestsol, fit = bestfit, fit_track = fit_track))
  ## return solution if feasible solution is found  
}

Let’s find the solution with the tabu search heuristic:

tabu_sol <- tabu_gc(edges_col = edges_bcn_planar, n_c = 4, iter = 200, tabu_size = 100)

This new solution has four colors:

unique(tabu_sol$sol$color)
## [1] 1 4 2 3

The following plot presents the evolution of the algorithm. It has explored solutions with up to five edges with the same color until finding the optimum.

tabu_sol$fit_track |>
  pivot_longer(-step) |>
  ggplot(aes(step, value, color = name)) +
  geom_line() +
  theme_minimal(base_size = 12) +
  scale_color_manual(name = "colors", values = c("#CC0000", "#0080FF"))  +
  theme(legend.position = "bottom") +
  labs(title = "Tabu Search Algorithm", x = NULL, y = NULL)

Now it’s time to plot the map. First, we attach the labelling to the BCNNeigh table:

BCNNeigh <- left_join(BCNNeigh,
                      tabu_sol$sol,
                      by = c("c_barri" = "node"))

And finally we obtained the colored map.

BCNNeigh |>
  ggplot() +
  geom_sf(aes(fill = factor(color))) +
  theme_void() +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Reds")

Map Coloring with the Graph Coloring Problem

In two posts, I have presented how to color a map using the graph coloring problem. We know that the chromatic number of a planar graph is no larger than four. To be planar, the edges of the graph must not cross. In this particular case, I have to remove some edges obtained with spatial analysis to make the graph planar and finding a coloring with four numbers.

In the previous post, I used linear programming to solve a small instance with 10 nodes and 19 connections. This instance is much larger, of 73 nodes with 186 observations. The linear programming formulation does not scale up to a problem of this size, so I needed to define a constructive heuristic to obtain a starting solution, and later refine it with a local search algorithm based on the tabu search metaheurisic.

Reference

Session Info

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Linux Mint 21.1
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Madrid
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggraph_2.2.1            tidygraph_1.3.1         sf_1.0-16              
##  [4] BAdatasetsSpatial_0.1.0 lubridate_1.9.3         forcats_1.0.0          
##  [7] stringr_1.5.1           dplyr_1.1.4             purrr_1.0.2            
## [10] readr_2.1.5             tidyr_1.3.1             tibble_3.2.1           
## [13] ggplot2_3.5.1           tidyverse_2.0.0        
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.5       xfun_0.43          bslib_0.7.0        ggrepel_0.9.5     
##  [5] tzdb_0.4.0         vctrs_0.6.5        tools_4.4.1        generics_0.1.3    
##  [9] proxy_0.4-27       fansi_1.0.6        highr_0.10         pkgconfig_2.0.3   
## [13] KernSmooth_2.23-24 RColorBrewer_1.1-3 lifecycle_1.0.4    compiler_4.4.1    
## [17] farver_2.1.1       munsell_0.5.1      ggforce_0.4.2      graphlayouts_1.1.1
## [21] htmltools_0.5.8.1  class_7.3-22       sass_0.4.9         yaml_2.3.8        
## [25] pillar_1.9.0       jquerylib_0.1.4    MASS_7.3-60        classInt_0.4-10   
## [29] cachem_1.0.8       wk_0.9.1           viridis_0.6.5      tidyselect_1.2.1  
## [33] digest_0.6.35      stringi_1.8.3      bookdown_0.39      labeling_0.4.3    
## [37] polyclip_1.10-6    fastmap_1.1.1      grid_4.4.1         colorspace_2.1-0  
## [41] cli_3.6.2          magrittr_2.0.3     utf8_1.2.4         e1071_1.7-14      
## [45] withr_3.0.0        scales_1.3.0       timechange_0.3.0   rmarkdown_2.26    
## [49] igraph_2.0.3       gridExtra_2.3      blogdown_1.19      hms_1.1.3         
## [53] memoise_2.0.1      evaluate_0.23      knitr_1.46         viridisLite_0.4.2 
## [57] s2_1.1.6           rlang_1.1.3        Rcpp_1.0.12        glue_1.7.0        
## [61] DBI_1.2.2          tweenr_2.0.3       rstudioapi_0.16.0  jsonlite_1.8.8    
## [65] R6_2.5.1           units_0.8-5