Community detection algorithm

About the code

The code is an implementation of the algorithm described in [1]. It provides an efficient way to perform community detection on complex networks using modularity maximization. It also returns the effect size of the partition found as a

If you use this code for your research, I kindly ask you to cite Ref. 1 in your publications.

Download the code

How to use

The code consists of the files ComDet.c, ComDet.h and Comm.h. To use it, just include Comm.h in your code, and compile ComDet.c with your other source files, remembering to use the option -lm as it needs to link to the math library.

The code defines a new data structure, called comstruct. Its members are

void modinitmat(int **adj, const int nodes)

where adj contains the adjacency matrix, and nodes is the number of nodes in the network. If, instead, the network is stored as an adjacency list, such as those produced by the graph sampling algorithm, then the function to invoke is modinitlist. Its prototype is

void modinitlist(int **list, const int *sequence, const int nodes)

where list contains the adjacency list, sequence is an array of integers containing the degree sequence of the network, and nodes is the number of nodes.

Once the method is initialized, to detect the community structure, invoke the function modularity. The prototype of this function is

comstruct modularity(double (*rng)(void), const double tol, const double tolpm, const double taccpt)

The function returns a partition of the network that aims to maximize modularity, using the user-specified random number generator rng. The random number generator must be a function taking no input parameters and returning a double precision floating point number between 0 and 1. The generator must be already seeded. This leaves the user the choice of the generator to use. The variables tol, tolpm and taccpt allow to specify tolerances for the various steps of the method. In particular, tol is the tolerance used for internal comparisons in the refining steps, tolpm is the one used in finding the leading eigenvalue of the modularity matrix, and taccpt is the one used in the main cycle to decide whether to accept a refinement of the partitioning. So, given a network, a set of tolerances and a choice of random number generator, the user creates a partitioning by declaring some variable P of type comstruct, and then assigning it the return value of modularity:

P = modularity(rng,tol,tolpm,taccpt);

After the assignment, P.modularity is the modularity of the partitioning, P.zscore is its

P = gsam(rng);

Q = gsam(rng);

will result in the same community label list stored in P.community and in Q.community.

After finishing working with a given network, the memory used should be cleaned by invoking modclean(). Afterwards, the sampler is ready to be initialized again with another network.

Included are also two simple proofs of concept (POC). Both proofs of concept illustrate how to run the code on a network stored as adjacency matrix or adjacency list, with the second POC including an external call to the freely available Graphviz software (www.graphviz.org), which, if installed on your computer, produces a visualization of the networks with the communities separated by distance and distinguished by colour, for increased ease of understanding, using a tuned force-driven spring model.

To compile the POCs, use the commands

gcc -lm -o poc poc.c ComDet.c

gcc -lm -o poc2 poc2.c ComDet.c

References

[1] Treviño III, Nyberg, del Genio and Bassler, J. Stat. Mech. - Theory E. (2015) P02003

Release information

Current version

Older versions

About the code

The code is an implementation of the algorithm described in [1]. It provides an efficient way to perform community detection on complex networks using modularity maximization. It also returns the effect size of the partition found as a

*z*-score computed using an Erdős-Rényi null model. The algorithm consists of carefully controlled iterations of leading eigenvalue bisection, followed by several tuning steps, including a local Kernighan-Lin-like fine-tuning, a global final-tuning and an agglomeration step.If you use this code for your research, I kindly ask you to cite Ref. 1 in your publications.

Download the code

How to use

The code consists of the files ComDet.c, ComDet.h and Comm.h. To use it, just include Comm.h in your code, and compile ComDet.c with your other source files, remembering to use the option -lm as it needs to link to the math library.

The code defines a new data structure, called comstruct. Its members are

- double modularity
- double zscore
- int *community

void modinitmat(int **adj, const int nodes)

where adj contains the adjacency matrix, and nodes is the number of nodes in the network. If, instead, the network is stored as an adjacency list, such as those produced by the graph sampling algorithm, then the function to invoke is modinitlist. Its prototype is

void modinitlist(int **list, const int *sequence, const int nodes)

where list contains the adjacency list, sequence is an array of integers containing the degree sequence of the network, and nodes is the number of nodes.

Once the method is initialized, to detect the community structure, invoke the function modularity. The prototype of this function is

comstruct modularity(double (*rng)(void), const double tol, const double tolpm, const double taccpt)

The function returns a partition of the network that aims to maximize modularity, using the user-specified random number generator rng. The random number generator must be a function taking no input parameters and returning a double precision floating point number between 0 and 1. The generator must be already seeded. This leaves the user the choice of the generator to use. The variables tol, tolpm and taccpt allow to specify tolerances for the various steps of the method. In particular, tol is the tolerance used for internal comparisons in the refining steps, tolpm is the one used in finding the leading eigenvalue of the modularity matrix, and taccpt is the one used in the main cycle to decide whether to accept a refinement of the partitioning. So, given a network, a set of tolerances and a choice of random number generator, the user creates a partitioning by declaring some variable P of type comstruct, and then assigning it the return value of modularity:

P = modularity(rng,tol,tolpm,taccpt);

After the assignment, P.modularity is the modularity of the partitioning, P.zscore is its

*z*-score, and P.community contains the community labels of each node. A new partitioning of the same network can be obtained invoking modularity again. Please note that the label list of a previous partitioning is destroyed with further calls to the sampler,**even if the sample is assigned to a different variable**. Thus, for instance, the linesP = gsam(rng);

Q = gsam(rng);

will result in the same community label list stored in P.community and in Q.community.

After finishing working with a given network, the memory used should be cleaned by invoking modclean(). Afterwards, the sampler is ready to be initialized again with another network.

Included are also two simple proofs of concept (POC). Both proofs of concept illustrate how to run the code on a network stored as adjacency matrix or adjacency list, with the second POC including an external call to the freely available Graphviz software (www.graphviz.org), which, if installed on your computer, produces a visualization of the networks with the communities separated by distance and distinguished by colour, for increased ease of understanding, using a tuned force-driven spring model.

To compile the POCs, use the commands

gcc -lm -o poc poc.c ComDet.c

gcc -lm -o poc2 poc2.c ComDet.c

References

[1] Treviño III, Nyberg, del Genio and Bassler, J. Stat. Mech. - Theory E. (2015) P02003

Release information

Current version

**6.3**: Bug fixed: in networks with a very sparse largest component and a smaller, very dense component of similar size, the code could underestimate the memory needed.Older versions

**6.2**: Initial release.