Efficient Enumeration of Correlation Clustering Optimal Solution Space
- Copyright 2020-21 Nejat Arınık
EnumCC is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information see the file LICENCE
- Lab site: http://lia.univ-avignon.fr/
- GitHub repo: https://github.com/CompNet/EnumCC
- Contact: Nejat Arınık arinik9@gmail.com, Vincent Labatut vincent.labatut@univ-avignon.fr
EnumCC
is an optimal solution space enumeration method for the Correlation Clustering (CC) problem. It relies on two essential tasks: Recurrent neighborhood search (RNS) and jumping onto an undiscovered solution. The former is performed by the component RNSCC
, whereas the latter is done by the commercial solver Cplex
.
In the first step, instead of directly jumping onto undiscovered optimal solutions one by one through Cplex
, as in a traditional sequential approach, its component RNSCC
discovers the recurrent neighborhood of the current optimal solution P with the hope of discovering new optimal solutions. The recurrent neighborhood of an optimal solution P, represents the set of optimal solutions, reached directly or indirectly from P depending on the maximum distance parameter maxNbEdit. Whether a new solution is found or not through RNSCC
, the jumping process into a new solution P is performed. If P is not empty, the workflow of RNS
and jumping is repeated again. Otherwise, the enumeration process stops. See our article [Arınık'23] for more details.
If you use this software, please cite article [Arınık'23]:
@Article{Arinik2023,
author = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent},
title = {Efficient Enumeration of the Optimal Solutions to the Correlation Clustering problem},
journal = {Journal of Global Optimization},
year = {2023},
volume = {86},
pages = {355-391},
doi = {10.1007/s10898-023-01270-3},
}
-
inFile
: Input file path. Seein/exemple.G
for the input graph format. -
outDir
: Output directory path. Default.
(i.e. the current directory). -
initMembershipFilePath
: The membership file path, from which theRNSCC
starts. It must be an optimal solution of the given signed graph. Moreover, It must be named asmembership0.txt
or something different thanmembership<x>.txt
. Seeout/exemple/membership0.txt
for its format. This file can be obtained through ExCC by running the scriptrun-cp-bb.sh
. -
java.library.path
: TheCplex
Java library path. It is usually found in<YOUR_CPLEX_PATH>/cplex/lib/cplex.jar
. -
maxNbEdit
: The maximum value edit distance value to be considered in edit operations. We show in our experiments thatmaxNbEdit=3
is usually more appropriate. -
tilim
: Time limit in seconds for the whole program. Default-1
, which means no time limit. -
solLim
: Maximum number of optimal solutions to be discovered. This can be useful when there is a huge number of optimal solutions, e.g. 50,000. Default-1
. -
JAR_filepath_RNSCC
: The jar file path forRNSCC
. -
LPFilePath
: It allows to import aCplex
LP file, corresponding to a ILP formulation of a signed graph for the CC problem. Remark: Such a file is obtained through Cplex by doingexportModel()
. This file can be obtained through ExCC by running the scriptrun-cp-bb.sh
. InExCC
, the name of this file isstrengthedModelAfterRootRelaxation.lp
.
Install IBM CPlex
. The default installation location is: /opt/ibm/ILOG/CPLEX_Studio<YOUR_VERSION>
. Tested with Cplex
12.8 and 20.1.
Put /opt/ibm/ILOG/CPLEX_Studio<YOUR_VERSION>/cplex/lib/cplex.jar
into the lib
folder in this repository.
Compile and get the jar file for RNSCC: ant -v -buildfile build-rns.xml compile jar
.
Compile and get the jar file for EnumCC ant -v -buildfile build.xml compile jar
.
We need a starting optimal solution and the ILP model of the given signed graph. We can obtain them by running the script run-cp-bb.sh
in the ExCC repository.
Run the script run.sh
.
Put /opt/ibm/ILOG/CPLEX_Studio<YOUR_VERSION>/cplex/lib/cplex.jar
into the lib
folder in this repository.
Compile and get the jar file for RNSCC
: ant -v -buildfile build-rns.xml compile jar
.
Compile and get the jar file for EnumCC
: ant -v -buildfile build.xml compile jar
.
Download the Sosocc repository and put these jar files into the lib
folder. Then, run first ExCC
and then EnumCC(3)
. See the instructions of the Sosocc
repository for more details.
Example command:
ant -v -buildfile build.xml compile jar
ant -v -buildfile build.xml -DinFile="in/example.G" -DoutDir="out/example" -DmaxNbEdit=3 -DinitMembershipFilePath="out/example/membership0.txt" -DLPFilePath="strengthedModelAfterRootRelaxation.lp" -DJAR_filepath_RNSCC="RNSCC.jar" -DnbThread=4 -Dtilim=-1 -DsolLim=5000 run
<x>
: Folder<x>
, wherex is a numerical value starting from 1. Each folder contains the result of a
RNSCC` execution and possesses one or multiple optimal solutions.allResults.txt
: File storing all absolute paths of the discovered optimal solutions.exec-time.txt
: Execution time for the whole enumeration process.jump-exec-time<x>.txt
: Execution time for thex
.th jumping process throughCplex
.jump-log<x>.txt
:Cplex
log file regarding the thex
.th jumping process.jump-status<x>.txt
: TheCplex
status result in the end of the jumoing process. Three values are possible:Optimal
,SolLim
,Infeasible
.membership<x>.txt
: The starting membership file for the (x+1
).thRNSCC
process.
- [Arınık'23] N. Arınık & R. Figueiredo & V. Labatut. Efficient Enumeration of the Optimal Solutions to the Correlation Clustering problem, Journal of Global Optimization, 86:355-391, 2023. DOI: 10.1007/s10898-023-01270-3 ⟨hal-03935831⟩