Plotting US airline airport networks

Jose M Sallan 2021-11-29 10 min read

In this post, I intend to examine some airport networks operating in the United States using the information of the USairports graph delivered in the igraphdata package. It is an opportunity to introduce how to work with graphs using tidygraph, and to use ggraph to plot graphs with ggplot2. I will pick maps with BAdatasetsSpatial, that loads the sf package. For additional data manipulation, I need the dplyr and stringr packages.

library(dplyr)
library(stringr)
library(igraphdata)
library(BAdatasetsSpatial)
library(ggraph)
library(tidygraph)

USairports is the network of passenger flights between airports in the United States, based on flights in 2010 December. To visualize node and edge attributes in a tidy way, we create a tidygraph object with as_tbl_graph

data("USairports") 
US_airports <- as_tbl_graph(USairports)
US_airports
## # A tbl_graph: 755 nodes and 23473 edges
## #
## # A directed multigraph with 6 components
## #
## # Node Data: 755 × 3 (active)
##   name  City          Position        
##   <chr> <chr>         <chr>           
## 1 BGR   Bangor, ME    N444827 W0684941
## 2 BOS   Boston, MA    N422152 W0710019
## 3 ANC   Anchorage, AK N611028 W1495947
## 4 JFK   New York, NY  N403823 W0734644
## 5 LAS   Las Vegas, NV N360449 W1150908
## 6 MIA   Miami, FL     N254736 W0801726
## # … with 749 more rows
## #
## # Edge Data: 23,473 × 8
##    from    to Carrier             Departures Seats Passengers Aircraft Distance
##   <int> <int> <chr>                    <dbl> <dbl>      <dbl>    <int>    <dbl>
## 1     1     4 British Airways Plc          1   226        193      627      382
## 2     1     4 British Airways Plc          1   299        253      819      382
## 3     2     7 British Airways Plc          1   216        141      627      200
## # … with 23,470 more rows

US_airports contains two tables containing node and edge attributes. The dataset contains number of passengers for each airline or carrier, so we could obtain a graph for each Carrier, being an airline airport network. The nodes of an airline airport network are the airports served by the airline, which are connected by a node if there is at least one flight connecting them. Defined in this way, airline airport networks are directed and unweighted, although it is a reasonable simplification to consider them undirected if the time window considered is large enough.

Let’s order carriers by number of flights We can do that because tidygraph allows us to access to node and edge attributes in a tidy way. We access the table of edges with activate(edges). To do grouping and summarizing we need to access the table of edges directly using as_tibble.

US_airports %>%
  activate(edges) %>%
  as_tibble() %>% #we need this to summarise
  group_by(Carrier) %>%
  summarise(flights = n(), .groups = "drop") %>%
  arrange(-flights)
## # A tibble: 118 × 2
##    Carrier                      flights
##    <chr>                          <int>
##  1 Delta Air Lines Inc.            2593
##  2 Southwest Airlines Co.          2253
##  3 SkyWest Airlines Inc.           1181
##  4 Hageland Aviation Service       1046
##  5 United Air Lines Inc.            965
##  6 US Airways Inc.                  906
##  7 Comair Inc.                      884
##  8 Continental Air Lines Inc.       845
##  9 American Eagle Airlines Inc.     765
## 10 American Airlines Inc.           701
## # … with 108 more rows

We find Delta and Southwest in the first place, followed by the unexpected SkyWest and Hageland Aviation Service. These are regional airlines, the later covering routes between Alaskan airports.

Let’s rank airlines by number of passengers.

US_airports %>%
  activate(edges) %>%
  as_tibble() %>% #we need this to summarise
  group_by(Carrier) %>%
  summarise(passengers = sum(Passengers), .groups = "drop") %>%
  arrange(-passengers)
## # A tibble: 118 × 2
##    Carrier                     passengers
##    <chr>                            <dbl>
##  1 Southwest Airlines Co.         9707625
##  2 Delta Air Lines Inc.           7172555
##  3 American Airlines Inc.         5443753
##  4 US Airways Inc.                3811869
##  5 United Air Lines Inc.          3384557
##  6 Continental Air Lines Inc.     2628803
##  7 AirTran Airways Corporation    1965722
##  8 SkyWest Airlines Inc.          1863194
##  9 JetBlue Airways                1796865
## 10 Alaska Airlines Inc.           1324029
## # … with 108 more rows

Now we find the largest US airlines in the first positions. US Airways was acquired by American Airlines in October 2015.

Geographical coordinates

To plot airport networks, we need the latitude and longitude of airports. This information is stored in the Position node attribute. Dataset help tells us that this position is presented as “WGS coordinates”. After some research, we can learn that the position is in the latitude/longitude in the form PDDMMSS PDDDMMSS or PDDMMSS PDDMMSS, where P is the position of coordinates (N and S for latitude, E and W for longitude) and DD, MM and SS are the degrees, minutes and seconds, respectively.

The WGS_latlon function picks this format, and returns latitude and longitude values. Let’s apply it to airport coordinates.

WGS_latlon <- function(wgs){
  
  m <- str_split_fixed(wgs, " ", n = 2)
  pos <- str_sub(m, start = 1, end = 1)
  sign <- ifelse(pos %in% c("N", "E"), 1, -1)
  value <- matrix(str_sub(m, start = 2), ncol = 2)
  value <- ifelse(nchar(value) == 6, paste0("0", value), value)
  
  degree <- as.numeric(str_sub(value, start = 1, end = 3))
  minute <- as.numeric(str_sub(value, start = 4, end = 5))
  second <- as.numeric(str_sub(value, start = 6, end = 7))
  
  dms <- degree + minute/60 + second/3600
  dms <- dms * sign
  dms <- matrix(dms, ncol = 2)
  
  return(dms)
}


coords <- WGS_latlon (US_airports %>%
                        activate(nodes) %>%
                        pull(Position))
## Warning in WGS_latlon(US_airports %>% activate(nodes) %>% pull(Position)): NAs
## introducidos por coerción

## Warning in WGS_latlon(US_airports %>% activate(nodes) %>% pull(Position)): NAs
## introducidos por coerción

I have obtained a NAs introduced by coercion warning, meaning that probably there are some missing data. Let’s load the function outcome as node attributes first:

US_airports <- US_airports %>%
  activate(nodes) %>%
  mutate(lat = coords[ , 1],
         lon = coords[, 2])

Now we can examine the missing data:

US_airports %>%
  activate(nodes) %>%
  filter(is.na(lat), is.na(lon))
## # A tbl_graph: 1 nodes and 0 edges
## #
## # A rooted tree
## #
## # Node Data: 1 × 5 (active)
##   name  City          Position   lat   lon
##   <chr> <chr>         <chr>    <dbl> <dbl>
## 1 KTN   Ketchikan, AK MIAMI       NA    NA
## #
## # Edge Data: 0 × 8
## # … with 8 variables: from <int>, to <int>, Carrier <chr>, Departures <dbl>,
## #   Seats <dbl>, Passengers <dbl>, Aircraft <int>, Distance <dbl>

KTN happens to be the Ketchikan International Airport. I have retrieved its coordinates from Wikipedia and added them to the dataset:

lat_KTN <- 55 + 21/60 + 15/3600
lon_KTN <- - (131 + 42/60 + 40/3600)

US_airports <- US_airports %>%
  activate(nodes) %>%
  mutate(lat = replace(lat, is.na(lat), lat_KTN),
         lon = replace(lon, is.na(lon), lon_KTN))

US_airports %>%
  activate(nodes) %>%
  filter(name == "KTN")
## # A tbl_graph: 1 nodes and 0 edges
## #
## # A rooted tree
## #
## # Node Data: 1 × 5 (active)
##   name  City          Position   lat   lon
##   <chr> <chr>         <chr>    <dbl> <dbl>
## 1 KTN   Ketchikan, AK MIAMI     55.4 -132.
## #
## # Edge Data: 0 × 8
## # … with 8 variables: from <int>, to <int>, Carrier <chr>, Departures <dbl>,
## #   Seats <dbl>, Passengers <dbl>, Aircraft <int>, Distance <dbl>

Obtaining airline airport networks

Now I am interested in obtaining the airport networks of selected carriers. The carrier_network function picks the nodes and edges of a carrier to build an airline airport network in the following way:

  • Selects the edges associated to a carrier filtering by the value of Carrier in the edge table.
  • To obtain airport nodes, it calculates node degree with centrality_degree. Node degree is the sum of edges arriving or departing from the node. So the function retains only nodes with degree different from zero. Finally, the function removes node degree.
carrier_network <- function(carrier){
  g <- US_airports %>%
          activate(edges) %>%
          filter(Carrier == carrier) %>%
          activate(nodes) %>%
          mutate(deg = centrality_degree()) %>%
          filter(deg != 0)
  
  g <- g %>%
    activate(nodes) %>%
    select(-deg)
  
  return(g)
}

Let’s obtain the airport networks of Southwest Airlines, Delta Air Lines and Hageland Aviation Service:

WN_airports <- carrier_network("Southwest Airlines Co.")
DL_airports <- carrier_network("Delta Air Lines Inc.")
H6_airports <- carrier_network("Hageland Aviation Service")

Plotting airline airport networks

The function carrier_graph plots the graph of an airport network obtained from US_airports. It uses ggraph to plot the network, using as layout (position of nodes) the latitude and longitude of each airport. Airline airport networks can be considered spatial networks, so I have overlaid a map with geom_sf. The fill of the map needs a value of alpha transparency smaller than one to see the network over the map. Node size is proportional to betweenness, so hub airports are presented with dots of larger size. The alaska logical variable allows focusing the map on Alaska, and us on mainland of the United States.

carrier_graph <- function(carrier_network, us = TRUE, alaska = TRUE){
  
  carrier_network <- carrier_network %>%
    activate(nodes) %>%
    mutate(btw = centrality_betweenness())
  
  #finding map
  us_map <- WorldMap1_10
  
  #position of airports
  layout <- as.matrix(carrier_network %>%
                        activate(nodes) %>%
                        as_tibble() %>%
                        select(lon, lat), byrow = TRUE)
  
  #finding hubs
  
  plot <- ggraph(carrier_network, layout = layout) +
    geom_edge_link(width = 0.1) +
    geom_node_point(color = "#CC0000", aes(size = btw)) +
    geom_sf(data = us_map, alpha = 0.25, fill = "#994C00") +
    # theme_void() +
    theme(panel.background = element_rect(fill = "#CCE5FF"), 
          legend.position = "none")
  
  if(alaska == TRUE & us == FALSE){
    plot <- plot +
      coord_sf(xlim = c(-180, -125), ylim = c(50, 71))
  }
  
  if(alaska == TRUE & us == TRUE){
    plot <- plot +
      coord_sf(xlim = c(-180, -62), ylim = c(17 , 71))
  }
  
  if(alaska == FALSE & us == TRUE){
    plot <- plot +
      coord_sf(xlim = c(-130, -62), ylim = c(25, 50))
  }
  
  return(plot)
}

Let’s apply the carrier_graph function to each of the airlines.

carrier_graph(WN_airports, alaska = FALSE)

Southwest Airlines network is dense, with many airports with a similar value of betweenness. This is typical of a point-to-point route network. This route network is typical of low cost carriers.

carrier_graph(DL_airports)

The graph of Delta Air Lines is very dense, covering a large range of airports. The difference of size between nodes is large, meaning that the operations of the airline are centered in their hubs, which are the Atlanta ATL and Minneapolis MSP airports. Delta Airlines is adopting a hub-and-spoke route network, with several hubs for domestic US flights. This network is typical of full service carriers.

carrier_graph(H6_airports, us = FALSE)

Hageland Aviation Services was a regional Alaskan airline, covering mainly Alaskan routes, which ceased operations in 2019. The Alaskan airport network is quite dense, as many towns have no roads leading to them, and are only accessible by airplane or by ship, as they have no roads connecting them to the rest of America. This regional airline had no large hub, so it has a point to point route network.

References

Session info

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 10 (buster)
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
## 
## locale:
##  [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tidygraph_1.2.0         ggraph_2.0.5            ggplot2_3.3.5          
## [4] BAdatasetsSpatial_0.1.0 sf_1.0-2                igraphdata_1.0.1       
## [7] stringr_1.4.0           dplyr_1.0.7            
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.1   xfun_0.23          bslib_0.2.5.1      graphlayouts_0.7.2
##  [5] purrr_0.3.4        colorspace_2.0-1   vctrs_0.3.8        generics_0.1.0    
##  [9] viridisLite_0.4.0  htmltools_0.5.1.1  s2_1.0.6           yaml_2.2.1        
## [13] utf8_1.2.1         rlang_0.4.12       e1071_1.7-8        jquerylib_0.1.4   
## [17] pillar_1.6.4       glue_1.4.2         withr_2.4.2        DBI_1.1.1         
## [21] tweenr_1.0.2       wk_0.5.0           lifecycle_1.0.0    munsell_0.5.0     
## [25] blogdown_1.5       gtable_0.3.0       evaluate_0.14      knitr_1.33        
## [29] class_7.3-19       fansi_0.5.0        highr_0.9          Rcpp_1.0.6        
## [33] KernSmooth_2.23-20 scales_1.1.1       classInt_0.4-3     jsonlite_1.7.2    
## [37] farver_2.1.0       gridExtra_2.3      ggforce_0.3.3      digest_0.6.27     
## [41] stringi_1.7.3      ggrepel_0.9.1      bookdown_0.24      polyclip_1.10-0   
## [45] grid_4.1.2         cli_3.0.1          tools_4.1.2        magrittr_2.0.1    
## [49] sass_0.4.0         proxy_0.4-26       tibble_3.1.5       crayon_1.4.1      
## [53] tidyr_1.1.4        pkgconfig_2.0.3    ellipsis_0.3.2     MASS_7.3-54       
## [57] rstudioapi_0.13    viridis_0.6.1      assertthat_0.2.1   rmarkdown_2.9     
## [61] R6_2.5.0           units_0.7-2        igraph_1.2.6       compiler_4.1.2