The power law

Jose M Sallan 2021-07-29 7 min read

A power law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities. That relationship can be expressed as:

\[ y = cx^{-\gamma} \]

where \(\gamma > 0\) is the exponent of the power law.

Power law relationships are pervasive in physics, biology, economics and sociology. Here I will introduce how to represent power laws using R and present two empirical phenomena that can follow a power law: the degree distribution of a Barabási-Albert (BA) graph and the count of occurrences of words in a text as a function of its ranking, know as Zipf’s law.

The R packages I will be using are:

  • data.table for data frame manipulation,
  • ggplot2, viridis and patchwork for visualization,
  • igraph to generate a sample of a BA graph and obtain node degree,
  • and tm library for word count.
library(data.table)
library(ggplot2)
library(viridis)
library(patchwork)
library(igraph)
library(tm)

Let’s start defining a function to generate the values of a power law of exponent gamma. I will scale the outcome with the sum of original values, so they can be understood as relative frequencies or probability of occurrence:

pl_freq <- function(val, gamma){
  
  res <- val^gamma
  res <- res/sum(res)
  
  return(res)
}

Let’s create a power_law data table with 30 points of power laws of different exponents:

x_pl <- 10^c(seq(0, 4, length.out = 30))
power_law <- data.table(x = x_pl, 
                        y1 = pl_freq(x_pl, -1/2), 
                        y2 = pl_freq(x_pl, -1), 
                        y3 = pl_freq(x_pl, -2))

Let’s see how can we represent power laws. The plot on the left presents a straight representation of the power law. We see that all three power laws decay fast, so it is hard to distinguish them. The best way to represent a power law is in a log-log plot. That’s why I have defined x_pl as a set of points evenly spaced in a log plot. We can use scale_x_log10() and scale_y_log10() to transform axis in log scale without transforming data.

In the rigth plot, we observe that power laws appear as straigth lines in a log-log plot. I have used the patchwork package to present both plots in the same panel, and viridis scales to distinguish each value of gamma.

pl <- ggplot(melt(power_law, id.vars = "x"), aes(x, value, color = variable)) +
  geom_point(size=2) +
  geom_line() +
  theme_bw() +
  scale_color_viridis_d(name = "gamma", labels = c("1/2", "1", "2")) +
  labs(title = "Power law", x = "value", y = "frequence")

pl_log <- ggplot(melt(power_law, id.vars = "x"), aes(x, value, color = variable)) +
  geom_point(size=2) +
  geom_line() +
  theme_bw() +
  scale_color_viridis_d(name = "gamma", labels = c("1/2", "1", "2")) +
  labs(title = "Power law (log-log)", x = "value", y = "frequence") +
  scale_x_log10() +
  scale_y_log10()
  

pl + pl_log

Another useful plot for power laws is the plot of cumulative probability. It is defined as:

\[p_{cum}\left(X\right) = p \left(X \geq x \right)\]

I have ordered the probabilities by decreasing order of x and used cumsum to obtain the cumulative probability. As we will see later, this plot is more regular than the frequency plot for empirical power law distributions.

power_law <- power_law[order(-x)]
power_law[, pcum_y3 := cumsum(y3)]

ggplot(power_law, aes(x, pcum_y3)) +
  geom_point(size = 2) +
  geom_line() +
  theme_bw() +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Cumulative probability for gamma = 2", x = "value", y = "cumulative probability")

Let’s now examine two examples of emergence of power laws: the degree distribution of BA graphs and frequency versus rank word count.

BA graphs

Barabási-Albert (BA) graphs are undirected graphs constructed through growth and preferential attachment mechanisms. By growth we mean that the graph is constructed iteratively, adding one node at a time. Through the preferential attachment mechanisms, each new node is linked to existing nodes with a probability proportional to the degree \(k\) (number of incident links) of the later. BA graphs work as a good modelling approximation to friendship or airport networks, among others.

We can construct a BA graph using the barabasi.game function of igraph. We can obtain the degree of each node with an igraph function. Then I build a deg_dist data table with the frequence of occurrence of each degree. This is the degree distribution. Finally, I calculate the cumulative probability of each degree cum_pr.

n <- 10000
g <- barabasi.game(n)
deg_df <- data.table(k = degree(g))
deg_dist <- deg_df[ , .N, k][order(-k)]
deg_dist[, cum_pr := cumsum(N)/n]

Let’s plot the degree distribution and cumulative probability of degree in log-log plots.

sf_dd <- ggplot(deg_dist, aes(k , N)) +
  geom_point() +
  theme_bw() +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Degree distribution of a BA graph", x = "degree", y = "frequence")

sf_cdd <- ggplot(deg_dist, aes(k , cum_pr)) +
  geom_point() +
  geom_line() +
  theme_bw() +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Degree cumulative probability of a BA graph", x = "degree", y = "cumulative probability")

sf_dd + sf_cdd

We learn from the plots that the degree distribution of BA graphs is heterogeneous. Most of the nodes have low degree, while a few nodes are hubs of high degree. In the plot of the left we appreciate that the variance of frequency increases for high values of degree. The cumulative degree plot has less variability, and fits reasonably to a straight line. This analysis allows us to conclude that the degree distribution of BA graphs follows a power law.

Word count

The number of occurrences of a word in a text is said to follow the Zipf’s law, which states that the frequency of a word is inversely proportional to a power of its rank in the frequency table. This means that the frequency versus rank plot for words of a text is expected to follow a power law.

To evaluate this, I am using the acq database, a corpus of 50 articles from the Reuters-21578 data set. It is included in tm, a text mining package for R.

Here I am using tm functions to obtain a words table containing all words present in the text. Then, I build the freq_words table of frequencies (number of apparitions) of each word, and calculate its relative frequence and cumulative probability.

data(acq)
acq <- tm_map(acq, removePunctuation)  
acq <- tm_map(acq, removeNumbers)     
acq <- tm_map(acq, tolower)
acq <- tm_map(acq, stripWhitespace)   
acq <- tm_map(acq, PlainTextDocument) 

wrds <- strsplit(paste(unlist(acq), collapse = " "), ' ')[[1]]

words <- data.table(word = wrds)
freq_words <- words[, .N, words][order(-N)]
freq_words[, `:=`(freq = N/sum(N), rank = 1:nrow(freq_words))]
freq_words <- freq_words[order(-rank)]
freq_words[, p_cum := cumsum(freq)]

Let’s look at the first and last values of the table:

head(freq_words[order(rank)])
##    word   N       freq rank     p_cum
## 1:  the 414 0.05316553    1 1.0000000
## 2:   of 284 0.03647104    2 0.9468345
## 3:   to 199 0.02555541    3 0.9103634
## 4: said 186 0.02388596    4 0.8848080
## 5:    a 175 0.02247335    5 0.8609220
## 6:  and 173 0.02221651    6 0.8384487
tail(freq_words[order(rank)])
##                word N         freq rank        p_cum
## 1:       profitably 1 0.0001284192 1679 0.0007705150
## 2:          someone 1 0.0001284192 1680 0.0006420958
## 3:         catalyst 1 0.0001284192 1681 0.0005136766
## 4:        extending 1 0.0001284192 1682 0.0003852575
## 5:        specialty 1 0.0001284192 1683 0.0002568383
## 6: 38.4382979869843 1 0.0001284192 1684 0.0001284192

And plot the log-log graphs for relative frequences and cumulative probability:

zipf <- ggplot(freq_words, aes(rank , freq)) +
  geom_point() +
  theme_bw() +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Word count vs rank", x = "rank", y = "relative frequence")

zipf_cum <- ggplot(freq_words, aes(rank , p_cum)) +
  geom_point() +
  geom_line() +
  theme_bw() +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Word count vs rank", x = "rank", y = "cumulative probability")

zipf + zipf_cum

We observe that the left plot fits reasonably well to a power law. The plot of cumulative probability shows an exponential decay for high values of rank (non-frequent words), suggesting that the number of unique words of the corpus is too small to fully check if frequency versus rank plot is really a power law in this sample.

References and further reading

Built with R 4.1.0, data.table 1.14.0, ggplot2 3.3.4, viridis 0.6.1, patchwork 1.1.1, igraph 1.2.6 and tm 0.7-8.