Eiko Fried, a researcher at the University of Amsterdam, recently blogged about personal collaborator networks. I came across his post on Twitter, discussing how to conduct such an analysis in R, and got inspired.
Unfortunately, my own publication record is quite boring to analyse, containing only a handful of papers. However, my promotors – Prof. dr. Jaap Paauwe and Prof. dr. Marc van Veldhoven – have more extensive publication lists. Although I did not manage to retrieve those using the scholar package, I was able to scrape Jaap Paauwe’s publication list from his Google Scholar page. Jaap has 141 publications listed with one or more citations on Google Scholar. More than enough for an analysis!
While Eiko uses his colleague Sacha Epskamp’s R package qgraph, I found an alternative in the packages igraph and ggraph.
### PAUL VAN DER LAKEN
### 2017-10-31
### COAUTHORSHIP NETWORK VISUALIZATION
# LOAD IN PACKAGES
library(readxl)
library(dplyr)
library(ggraph)
library(igraph)
# STANDARDIZE VISUALIZATIONS
w = 14
h = 7
dpi = 900
# LOAD IN DATA
pub_history <- read_excel("paauwe_wos.xlsx")
# RETRIEVE AUTHORS
pub_history %>%
filter(condition == 1) %>%
select(name) %>%
.$name %>%
gsub("[A-Z]{2,}|[A-Z][ ]", "", .) %>%
strsplit(",") %>%
lapply(function(x) gsub("\\..*", "", x)) %>%
lapply(function(x) gsub("^[ ]+","",x)) %>%
lapply(function(x) x[x != ""]) %>%
lapply(function(x) tolower(x))->
authors
# ADD JAAP PAAUWE WHERE MISSING
authors <- lapply(authors, function(x){
if(!"paauwe" %in% x){
return(c(x,"paauwe"))
} else{
return(x)
}
})
# EXTRACT UNIQUE AUTHORS
authors_unique <- authors %>% unlist() %>% unique() %>% sort(decreasing = FALSE)
# FORMAT AUTHOR NAMES
# CAPITALIZE
simpleCap <- function(x) {
s <- strsplit(x, " ")[[1]]
names(s) <- NULL
paste(toupper(substring(s, 1,1)), substring(s, 2),
sep="", collapse=" ")
}
authors_unique_names <- sapply(authors_unique, simpleCap)
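To make the cleaning steps concrete, here is a minimal sketch that applies the same regular expressions to a single made-up author string in base R. The string `raw` is hypothetical and not taken from the actual data; the real file may format author fields differently.

```r
# Hypothetical raw author field (made up for illustration, not from the real data)
raw <- "A Author, BC Writer, D Scholar..."

step1 <- gsub("[A-Z]{2,}|[A-Z][ ]", "", raw)  # drop initials such as "A " or "BC"
step2 <- strsplit(step1, ",")[[1]]             # one element per author
step3 <- gsub("\\..*", "", step2)              # cut everything from the first period onward
step4 <- gsub("^[ ]+", "", step3)              # trim leading spaces
cleaned <- tolower(step4[step4 != ""])         # drop empty strings, convert to lower case

print(cleaned)
```

Note that the first pattern removes any run of two or more capitals, so a surname written entirely in upper case would be dropped as well.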
The code above retrieves the name of every unique author from the Excel file I got from Google Scholar. Now we need to examine to what extent the author names co-occur. We do that with the code below, storing all co-occurrence data in a matrix, which we then transform into an adjacency matrix that igraph can deal with. The output graph data looks like this:
# CREATE COAUTHORSHIP MATRIX
coauthorMatrix <- do.call(
cbind,
lapply(authors, function(x){
1*(authors_unique %in% x)
}))
# TRANSFORM TO ADJACENCY MATRIX
adjacencyMatrix <- coauthorMatrix %*% t(coauthorMatrix)
# CREATE NETWORK GRAPH
g <- graph_from_adjacency_matrix(adjacencyMatrix, # graph.adjacency() is the older, deprecated name
mode = "undirected",
diag = FALSE)
V(g)$Degree <- degree(g, mode = 'in') # CALCULATE DEGREE
V(g)$Name <- authors_unique_names # ADD NAMES
g # print network
## IGRAPH f1b50a7 U--- 168 631 --
## + attr: Degree (v/n), Name (v/c)
## + edges from f1b50a7:
## [1] 1-- 21 1--106 2-- 44 2-- 52 2--106 2--110 3-- 73 3--106
## [9] 4-- 43 4-- 61 4-- 78 4-- 84 4--106 5-- 42 5--106 6-- 42
## [17] 6-- 42 6-- 97 6-- 97 6--106 6--106 6--125 6--125 6--127
## [25] 6--127 6--129 6--129 7--106 7--106 7--150 7--150 8-- 24
## [33] 8-- 38 8-- 79 8-- 98 8-- 99 8--106 9-- 88 9--106 9--133
## [41] 10-- 57 10--106 10--128 11-- 76 11-- 85 11--106 12-- 30 12-- 80
## [49] 12--106 12--142 12--163 13-- 16 13-- 16 13-- 22 13-- 36 13-- 36
## [57] 13--106 13--106 13--106 13--166 14-- 70 14-- 94 14--106 14--114
## + ... omitted several edges
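To see why multiplying the author-by-paper matrix with its transpose yields co-authorship counts, here is a small base R sketch on three made-up papers. The names `coauthor_a` and `coauthor_b` are placeholders, not real entries from the data.

```r
# Toy data: three hypothetical papers with their (made-up) author lists
papers <- list(
  c("paauwe", "coauthor_a"),
  c("paauwe", "coauthor_a", "coauthor_b"),
  c("paauwe", "coauthor_b")
)
people <- sort(unique(unlist(papers)))

# Rows = authors, columns = papers; a cell is 1 if the author is on that paper
incidence <- do.call(cbind, lapply(papers, function(x) 1 * (people %in% x)))
rownames(incidence) <- people

# Authors x authors: off-diagonal cells count shared papers,
# the diagonal counts each author's own papers
co_counts <- incidence %*% t(incidence)
print(co_counts)
```

Because the adjacency matrix holds counts rather than 0/1 values and is passed to igraph without a `weighted` argument, a pair of authors with two shared papers ends up with two parallel edges. That would explain repeated entries such as `6-- 42` in the printed edge list.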
We can now feed the graph g into ggraph:
# SET THEME FOR NETWORK VISUALIZATION
theme_networkMap <- theme(
plot.background = element_rect(fill = "beige"),
panel.border = element_blank(),
panel.grid = element_blank(),
panel.background = element_blank(),
legend.background = element_blank(),
legend.position = "none",
legend.title = element_text(colour = "black"),
legend.text = element_text(colour = "black"),
legend.key = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()
)
# VISUALIZE NETWORK
ggraph(g, layout = "auto") +
# geom_edge_density() +
geom_edge_diagonal(alpha = 1, label_colour = "blue") +
geom_node_label(aes(label = Name, size = sqrt(Degree), fill = sqrt(Degree))) +
theme_networkMap +
scale_fill_gradient(high = "blue", low = "lightblue") +
labs(title = "Coauthorship Network of Jaap Paauwe",
subtitle = "Publications with more than one Google Scholar citation included",
caption = "paulvanderlaken.com")
# SAVE THE LAST PLOT (recent ggplot2 versions no longer allow adding ggsave() with +)
ggsave("Paauwe_Coauthorship_Network.png", dpi = dpi, width = w, height = h)
Feel free to use the code to look at your own coauthorship networks or to share this further.
Could you please share the “paauwe_wos.xlsx” file with me?
Sure, send me your email address via paulvanderlaken@gmail.com, and I will forward you the data.
You can also extract these and other data yourself via Google Scholar