The R package sna provides a number of tools for analyzing social network data. This post reviews the function clique.census from the sna package and shows how it can be used to better understand the group structure among a list of network members.
Let’s start by creating a toy network. Say Henry, Sarah, Rick, Joe and Annie are all colleagues in a criminology department. Now imagine they’ve been asked to identify the most important people they collaborate with. The results of this fictitious effort are shown in the following edgelist.
edgelist <- matrix(c("Henry", "Sarah", "Henry", "Joe", "Sarah", "Joe", "Henry", "Sarah", "Henry", "Rick", "Sarah", "Rick", "Sarah", "Henry", "Sarah", "Rick", "Henry", "Rick", "Sarah", "Henry", "Sarah", "Joe", "Henry", "Joe", "Sarah", "Rick", "Sarah", "Annie", "Rick", "Annie"), ncol = 2)
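One thing worth checking before moving on: matrix() fills column by column unless byrow = TRUE is set, so the first 15 names in the call above land in column one and the last 15 in column two. The sketch below rebuilds the edgelist and collapses repeated nominations into distinct ties. It assumes ties are undirected, which the survey description does not state outright, and the names pairs and ties are introduced here for illustration only.

```r
# Rebuild the toy edgelist exactly as defined above.
edgelist <- matrix(c("Henry", "Sarah", "Henry", "Joe", "Sarah", "Joe", "Henry", "Sarah", "Henry", "Rick", "Sarah", "Rick", "Sarah", "Henry", "Sarah", "Rick", "Henry", "Rick", "Sarah", "Henry", "Sarah", "Joe", "Henry", "Joe", "Sarah", "Rick", "Sarah", "Annie", "Rick", "Annie"), ncol = 2)

# matrix() fills column-wise: 30 names become 15 rows of (column 1, column 2).
dim(edgelist)

# Assumption: ties are undirected, so "Sarah, Henry" and "Henry, Sarah"
# count as the same tie. Sort each row alphabetically, then drop duplicates.
pairs <- t(apply(edgelist, 1, sort))
ties <- unique(pairs)
ties
```

Under these assumptions the 15 rows reduce to six distinct ties, and Annie appears in only one of them (with Sarah).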
Recall that a clique is a group in which everyone “likes” everyone else; that is, every member is tied to every other member. To identify cliques among our network of criminology researchers, we first transform the edgelist into a network object and then apply the sna function clique.census.
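library("network")
library("sna")
net <- network(edgelist, directed = FALSE) # assumed: ties are undirected
net_cc <- clique.census(net, mode = "graph", tabulate.by.vertex=FALSE, enumerate=TRUE, clique.comembership="bysize") # Identify cliques
net_cc$clique.count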
After applying the function clique.census, we see two cliques of size three among our respondents.
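As a cross-check that does not rely on sna at all, the 3-cliques can be found by brute force in base R. This is a sketch under two assumptions: ties are undirected, and the distinct ties are the six obtained by reading the edgelist column by column. The helper is_complete and the object names are hypothetical, and testing every triple is only sensible for a network this small; it is not meant to mirror what clique.census does internally.

```r
# Assumed distinct, undirected ties from the toy edgelist.
ties <- rbind(c("Henry", "Rick"), c("Henry", "Sarah"), c("Joe", "Sarah"),
              c("Henry", "Joe"), c("Rick", "Sarah"), c("Annie", "Sarah"))

# Symmetric adjacency matrix with people in alphabetical order.
people <- sort(unique(as.vector(ties)))
adj <- matrix(0, length(people), length(people), dimnames = list(people, people))
adj[ties] <- 1          # a two-column character matrix indexes by dimnames
adj[ties[, 2:1]] <- 1   # mirror each tie so the matrix is symmetric

# A set of people is a clique if every pair among them is tied.
is_complete <- function(members) {
  sub <- adj[members, members]
  all(sub[upper.tri(sub)] == 1)
}

# Test every possible group of three, then confirm no group of four is complete.
cliques3 <- Filter(is_complete, combn(people, 3, simplify = FALSE))
cliques4 <- Filter(is_complete, combn(people, 4, simplify = FALSE))
cliques3
length(cliques4)
```

Only Henry-Joe-Sarah and Henry-Rick-Sarah are fully connected, and no group of four is, which agrees with a count of two 3-cliques.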
To identify the comembers of these cliques, we inspect the contents of the variable net_cc$clique.comemb[3, , ].
To paraphrase the sna documentation, the variable net_cc$clique.comemb is a three-dimensional array of size (maximum clique size) x n x n, where n is the number of network members. In this example the largest cliques we observed involved three network members each. As such, the 3-clique comembership information is stored in the variable net_cc$clique.comemb[3, , ]. (Note: if we had observed one or more cliques with more than three members, say, a 4-clique, we could examine their comembership using the variable net_cc$clique.comemb[4, , ].)
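To make the indexing concrete, here is a stand-in array with the same layout, built in base R. It is filled with zeros and is not the object clique.census returns; the dimension names are an assumption added for readability.

```r
# Stand-in for a (max clique size) x n x n comembership array.
people <- c("Annie", "Henry", "Joe", "Rick", "Sarah")
max_size <- 3
comemb <- array(0,
                dim = c(max_size, length(people), length(people)),
                dimnames = list(paste0("size", 1:max_size), people, people))
dim(comemb)

# Fixing the first index at 3 drops that dimension and leaves an n x n matrix:
# the slice that would hold 3-clique comembership.
slice3 <- comemb[3, , ]
dim(slice3)
```

The first index picks the clique size, and the remaining two indices pick a pair of network members.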
Notably, the way net_cc$clique.comemb[3, , ] organizes clique comembership takes some getting used to. In fact, the main point of this post is to explain this organizational scheme in more everyday terms.
Given our criminologist cliques, here’s how we find out who was in them. Recall that the largest clique we observed contained three individuals and that our network contained only five respondents. As such, the matrix net_cc$clique.comemb[3, , ] is of size 5 x 5. This matrix mimics the network structure itself. That is, network members are listed along the rows and, in the exact same order, listed again across the columns. The values within the matrix then identify researchers who were in cliques together.
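The layout is easier to see when the matrix is built by hand. The sketch below assumes the two 3-cliques this post arrives at (Henry, Sarah, Joe and Henry, Sarah, Rick) and an alphabetical ordering of members; comemb3 and cliques are hypothetical names, and the loop is an illustration of the idea rather than sna's own code.

```r
# Members in alphabetical order, used for both rows and columns.
people <- c("Annie", "Henry", "Joe", "Rick", "Sarah")

# Assumed 3-cliques in the toy network.
cliques <- list(c("Henry", "Sarah", "Joe"), c("Henry", "Sarah", "Rick"))

# Start from zeros; each clique adds 1 to every (member, member) cell,
# including the diagonal.
comemb3 <- matrix(0, length(people), length(people), dimnames = list(people, people))
for (members in cliques) {
  comemb3[members, members] <- comemb3[members, members] + 1
}
comemb3
```

Each off-diagonal cell counts the 3-cliques a pair shares, and each diagonal cell counts the 3-cliques a single person belongs to, so the matrix is symmetric.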
Let’s go through a couple of columns together to better understand exactly what this matrix is telling us.
The values in the column for Joe show who Joe was in a clique with: a zero means he shared no clique with that person, while a value greater than zero means he did. Beyond flagging comembership, the values count how many cliques the pair shared. Reading the column from top to bottom, Joe and Annie were in zero cliques together. Next we see that Joe was in one clique with Henry. Notably, Joe was in one clique with himself (i.e., there was only one clique that involved Joe). The rest of the column shows that Joe was in zero cliques with Rick and one clique with Sarah. In total, this column tells us there was one 3-clique that involved Joe (see Joe’s value for Joe) and that this clique also involved the researchers Henry and Sarah.
Let’s look at a trickier column. The values in the column for Henry range from 0 to 2. As with Joe, the values show who Henry was in a clique with and how many cliques they shared. Going down the column, we see a zero for Annie. That is, Henry and Annie were not part of the same clique. Notably, Henry is in two cliques with himself. That is, there were two cliques, each of which involved Henry. The rest of the values show that Henry was in one clique with Joe, one clique with Rick and two cliques with Sarah.
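The same column reading can be done in code. The matrix below is typed in by hand: the Joe and Henry columns follow the values described above, while the remaining entries are filled in on the assumption that the only 3-cliques are Henry-Sarah-Joe and Henry-Sarah-Rick. The names comemb3, joe_partners and henry_core are hypothetical.

```r
people <- c("Annie", "Henry", "Joe", "Rick", "Sarah")
comemb3 <- matrix(c(0, 0, 0, 0, 0,
                    0, 2, 1, 1, 2,
                    0, 1, 1, 0, 1,
                    0, 1, 0, 1, 1,
                    0, 2, 1, 1, 2),
                  nrow = 5, byrow = TRUE, dimnames = list(people, people))

# The diagonal counts how many 3-cliques each person belongs to.
diag(comemb3)

# Everyone who shares at least one 3-clique with Joe (Joe himself included).
joe_partners <- names(which(comemb3[, "Joe"] > 0))
joe_partners

# People whose count matches Henry's diagonal value sit in every one of his cliques.
henry_core <- names(which(comemb3[, "Henry"] == comemb3["Henry", "Henry"]))
henry_core
```

Because Joe's diagonal value is 1, his nonzero entries spell out his single clique directly. For Henry, whose diagonal value is 2, the entries equal to 2 pick out the members common to both of his cliques (Henry and Sarah), and the entries equal to 1 (Joe and Rick) mark the member that differs between them.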
Combined, these column values tell us there were two 3-cliques: one clique involving Henry, Sarah and Joe and another involving Henry, Sarah and Rick.