Invocation

Main operations

To call CoDACom, the main script is :

sh execute.sh [-pmac] [varpath]

The options correspond to the subparts that are executed, if no option is provided all the subparts will be run sequentially. The optional argument is the path to the vars.sh configuration file, found by default in the execution folder. Please note that, without inputs and a configuration step, the program will do nothing. You need to :

Main files

To give instructions to the program, the three following files are important.

First, one need to provide the graphs to the program by placing them in the appropriate folders. It corresponds to the RAW_INPUT_DIR variable in vars.sh, which is raw/inputs by default. The corresponding ground truth file goes to the RAW_GROUND_TRUTH folder, raw/ground_truth by default.

The numerical results may be found in the RESULT_DIR folder, specified in vars.sh. The default folder is the subfolder results. Clusterings may be found in the OUTPUT_DIR, default being clusterings.

Subparts

Four purposes have been identified for CoDACom, corresponding to four different behaviors.

Preprocessing

In order to normalise and to avoid errors, the inputs graphs are :

This reordering uses a BFS algorithm that improves most of the processing thanks to caching.

A ground-truth may also be provided. It is preprocessed as well in order to keep consistency with the graph ids. A translation file is also outputted, which tracks the translation from old to new id.

Methods

The implementation of various community detection algorithms is executed. Sometimes, some change of the graph format from the internal one is needed to match the expectation of the specific implementation.

Analysis

In this subpart, comparison methods and quality functions are used. Comparison methods are functions that compare the result of two clusterings, without taking the graph topology into account. Quality functions quantify the quality of clusterings.

First, a simple comparison of all pairs of clusterings is processed. Second, the qualities of each clustering are computed. If the graphs feature a ground-truth, the clusterings are compared to this ground-truth, and Spearman's coefficient is computed between the ranking of the clusters from the perspective of the quality functions and the ranking of the ground-truth.

Conversion

You may convert the clusterings in a graphml format, which can be visualised using any modern graph visualisation software, such as Gephi. The clusterings are represented as edges labels (internal/external edges) and node labels (cluster id).

All the outputs of the program are originally available as space separated table files. To make things easier, a tabular LaTeX conversion is provided.

The quotient graph is constituted by the clusters as nodes and an edge exist if there is at least one edge connecting the clusters in the base graph. Useful data about the clusters and their connections is available in the labels.

Available components

When using CoDACom for a scientific publication, it would be kind of you to cite the original author of the used components.

Methods

Comparison

Quality functions

Restrictions

CoDACom does not (yet) support directed or weighted graphs nor overlapping community detection.

Formats

Accepted graph formats (see the related igraph documentation for more information) :

Accepted ground-truth format : list of nodes and membership

node_id1 cluster_id1 cluster_id2 ...
node_id2 cluster_id1 cluster_id3 ...
...

Translation format : list of nodes id

old_id new_id
...

Internal graph format : list of edges

node_id1 node_id2
...

Internal ground-truth format : list of nodes and membership

node_id1 cluster_id1 cluster_id2 ...
node_id2 cluster_id1 cluster_id3 ...
...

File hierarchy breakdown