Authors: Dominik Krupke, Ahmad Moradi, Michael Perk, Phillip Keldenich, Gabriel Gehrke, Sebastian Krieter, Thomas Thüm, and Sándor P. Fekete
SampLNS is an LNS-based optimizer for pairwise configuration sampling that comes with a lower-bound proving technique. On many of the instances we tested, it computes smaller samples than YASA and frequently even proves their optimality. A paper describing the approach is currently in progress.
⚠️ The implementation is for research purposes only and not yet ready for production. Please contact us if you want to use it in production.
The general idea of SampLNS is as follows: Theoretically, the problem of finding a minimal sample can be expressed as a SAT problem. In practice, however, this is not feasible because the formula becomes far too large. Given an initial sample, we can use it to fix a part of the formula and then solve the remaining, much smaller formula. Repeating this process many times gives us a good chance of finding a minimal sample. This is essentially a Large Neighborhood Search (LNS). To break symmetries in the formula and to supply the LNS with a lower bound, an additional optimizer searches for a large set of mutually exclusive pairs of features.
What is a Large Neighborhood Search (LNS)? It is a metaheuristic for optimization problems that works as follows: You start with a feasible solution. Then, you iteratively destroy a part of the solution and try to repair it in a way that improves the solution. We use Mixed Integer Programming and a SAT-based approach for this, which allows us to find optimal solutions for the repair step. Mixed Integer Programming is a powerful technique that can solve many NP-hard problems optimally up to a reasonable size. The size of the destroyed part is scaled automatically to be as large as possible while a repair can still be found. The larger the destroyed part, the more likely it is to escape local minima and find an even better solution, potentially even an optimal one. As the candidate set of improvements is implicitly defined by an optimization problem, we do not have to iterate over all possible improvements as simple local search strategies have to. This allows us to consider a significantly larger number of improvements, which significantly increases the chance of finding a good solution. The downside is that we have to solve an optimization problem for each improvement, which is computationally expensive and limits scalability.
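To make the destroy-and-repair idea concrete, here is a small, self-contained toy sketch. It is *not* the SampLNS implementation: it covers all feature pairs of four unconstrained Boolean features, destroys a few configurations per iteration, repairs by brute force instead of SAT/MIP, and adapts the size of the destroyed part depending on success.

```python
# Toy LNS for pairwise sampling over 4 unconstrained Boolean features.
# Illustration only; SampLNS uses SAT/MIP for the repair step and handles
# real feature models with constraints.
from itertools import combinations, product
import random

FEATURES = 4
ALL_CONFIGS = list(product([False, True], repeat=FEATURES))

def covered_pairs(sample):
    """All feature-value pairs (i, v_i, j, v_j) covered by a set of configurations."""
    pairs = set()
    for cfg in sample:
        for i, j in combinations(range(FEATURES), 2):
            pairs.add((i, cfg[i], j, cfg[j]))
    return pairs

ALL_PAIRS = covered_pairs(ALL_CONFIGS)

def repair(kept, budget):
    """Brute-force 'optimal repair': add at most `budget` configurations so that
    all pairs are covered again, or return None if that is impossible."""
    missing = ALL_PAIRS - covered_pairs(kept)
    candidates = [c for c in ALL_CONFIGS if c not in kept]
    for size in range(budget + 1):
        for extra in combinations(candidates, size):
            if missing <= covered_pairs(list(extra)):
                return kept + list(extra)
    return None

random.seed(0)
solution = list(ALL_CONFIGS)  # trivial (large) initial sample
destroy_size = 2
for _ in range(50):
    removed = random.sample(solution, min(destroy_size, len(solution)))
    kept = [c for c in solution if c not in removed]
    # Only accept repairs that save at least one configuration.
    candidate = repair(kept, budget=len(removed) - 1)
    if candidate is not None:
        solution = candidate
        destroy_size += 1                       # success: destroy more next time
    else:
        destroy_size = max(2, destroy_size - 1)  # failure: back off
print(f"toy sample size after LNS: {len(solution)} (the optimum here is 5)")
```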
What is a lower bound and why is it useful? A lower bound is a value that is guaranteed to be no larger than the value of an optimal solution. If you have a lower bound, you can stop the optimization process as soon as you reach it: the lower bound proves that no better solution exists. If the optimization process cannot match it, the lower bound at least gives you a guaranteed upper bound on the error of your solution. Computing perfect lower bounds for NP-hard problems is itself NP-hard, so we can only approximate them. However, every lower bound we return is still guaranteed to be no larger than the optimal value; it may just not be strong enough to prove optimality.
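As a simple illustration of how a lower bound quantifies solution quality (using the usual definition of an optimality gap, not anything SampLNS-specific):

```python
# If the best sample found has size `ub` and the proven lower bound is `lb`,
# the optimum lies somewhere in [lb, ub].
ub, lb = 17, 14  # e.g., the Violet values from the table below
absolute_gap = ub - lb         # at most 3 configurations away from optimal
relative_gap = (ub - lb) / lb  # guaranteed relative error of at most ~21.4 %
print(absolute_gap, f"{relative_gap:.1%}")
```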
Why does SampLNS not compute its own initial samples? SampLNS requires an initial sample to start the optimization process. We do not compute this initial sample ourselves but require the user to provide it, because there are already many good tools for computing initial samples, such as FeatureIDE. We do not want to reinvent the wheel and instead focus on the optimization process. If you do not have an initial solution, you can use FeatureIDE to compute one.
Sample sizes and lower bounds (mean over five runs) obtained by SampLNS with a time limit of 900 s, compared to the best sample found by any of the other algorithms (each run five times). Bold values are proven optimal, though the lower bound proving optimality may have been obtained by a separate run. In parentheses are the best values computed with extended time limits of up to three hours.
Feature Model | #Features | #Clauses | Best previous alg. | SampLNS | Lower Bound | Savings by SampLNS
---|---|---|---|---|---|---
calculate | 9 | 15 | 9 | 5 (5) | 5 (5) | 44% (44%) |
lcm | 9 | 16 | 8 | 6 (6) | 6 (6) | 25% (25%) |
|  | 10 | 17 | 6 | 6 (6) | 6 (6) | 0% (0%) |
ChatClient | 14 | 20 | 8 | 7 (7) | 7 (7) | 12% (12%) |
toybox_2006-10-3... | 16 | 13 | 10 | 8 (8) | 8 (8) | 20% (20%) |
car | 16 | 33 | 6 | 5 (5) | 5 (5) | 17% (17%) |
FeatureIDE | 19 | 27 | 11 | 8 (8) | 7 (7) | 27% (27%) |
FameDB | 22 | 40 | 9 | 8 (8) | 8 (8) | 11% (11%) |
APL | 23 | 35 | 9 | 7 (7) | 7 (7) | 22% (22%) |
SafeBali | 24 | 45 | 11 | 11 (11) | 10 (11) | 0% (0%) |
TightVNC | 28 | 39 | 12 | 8 (8) | 8 (8) | 33% (33%) |
APL-Model | 28 | 40 | 11 | 8 (8) | 8 (8) | 27% (27%) |
gpl | 38 | 99 | 19 | 16 (16) | 16 (16) | 16% (16%) |
SortingLine | 39 | 77 | 12 | 9 (9) | 9 (9) | 25% (25%) |
dell | 46 | 244 | 35 | 31 (31) | 31 (31) | 11% (11%) |
PPU | 52 | 109 | 12 | 12 (12) | 12 (12) | 0% (0%) |
berkeleyDB1 | 76 | 147 | 20 | 15 (15) | 15 (15) | 25% (25%) |
axTLS | 96 | 183 | 17 | 11 (11) | 10 (10) | 35% (35%) |
Violet | 101 | 203 | 23 | 17 (17) | 14 (16) | 26% (26%) |
berkeleyDB2 | 119 | 346 | 21 | 12 (12) | 11 (12) | 43% (43%) |
soletta_2015-06-... | 129 | 192 | 30 | 24 (24) | 24 (24) | 20% (20%) |
BattleofTanks | 144 | 769 | 453 | 344 (314) | 256 (256) | 24% (31%) |
BankingSoftware | 176 | 280 | 40 | 29 (29) | 28 (29) | 28% (28%) |
fiasco_2017-09-2... | 230 | 1181 | 240 | 226 (225) | 223 (225) | 5.8% (6.2%) |
fiasco_2020-12-0... | 258 | 1542 | 214 | 198 (196) | 196 (196) | 7.5% (8.4%) |
uclibc_2008-06-0... | 263 | 1699 | 506 | 505 (505) | 505 (505) | 0.2% (0.2%) |
uclibc_2020-12-2... | 272 | 1670 | 365 | 365 (365) | 365 (365) | 0% (0%) |
E-Shop | 326 | 499 | 20 | 12 (12) | 8 (9) | 39% (40%) |
toybox_2020-12-0... | 334 | 92 | 18 | 14 (13) | 7 (8) | 23% (28%) |
DMIE | 366 | 627 | 26 | 16 (16) | 16 (16) | 38% (38%) |
busybox_2007-01-... | 540 | 429 | 36 | 22 (21) | 19 (21) | 39% (42%) |
fs_2017-05-22 | 557 | 4992 | 398 | 396 (396) | 396 (396) | 0.5% (0.5%) |
WaterlooGenerated | 580 | 879 | 144 | 82 (82) | 82 (82) | 43% (43%) |
financial_services | 771 | 7238 | 4396 | 4386 (4343) | 4132 (4336) | 0.22% (1.2%) |
busybox-1_18_0 | 854 | 1164 | 29 | 18 (17) | 12 (13) | 37% (41%) |
busybox-1_29_2 | 1018 | 997 | 37 | 25 (23) | 16 (20) | 33% (38%) |
busybox_2020-12-... | 1050 | 996 | 34 | 23 (21) | 13 (19) | 32% (38%) |
am31_sim | 1178 | 2747 | 63 | 41 (37) | 24 (26) | 34% (41%) |
EMBToolkit | 1179 | 5414 | 1886 | 1889 (1872) | 1593 (1872) | -0.16% (0.74%) |
atlas_mips32_4kc | 1229 | 2875 | 66 | 45 (38) | 29 (33) | 32% (42%) |
eCos-3-0_i386pc | 1245 | 3723 | 66 | 55 (43) | 30 (33) | 17% (35%) |
integrator_arm7 | 1272 | 2980 | 66 | 45 (39) | 30 (33) | 32% (41%) |
XSEngine | 1273 | 2942 | 63 | 44 (39) | 27 (32) | 30% (38%) |
aaed2000 | 1298 | 3036 | 89 | 68 (54) | 46 (51) | 23% (39%) |
FreeBSD-8_0_0 | 1397 | 15692 | 76 | 66 (47) | 27 (29) | 13% (38%) |
ea2468 | 1408 | 3319 | 67 | 46 (40) | 29 (32) | 32% (40%) |
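To read the savings column (our reading of the numbers, which is consistent with every row above): it relates the SampLNS sample size to the best sample of the previous algorithms. For example, for `calculate`:

```python
# Best previous algorithm: 9 configurations, SampLNS: 5 configurations.
best_previous, samplns = 9, 5
savings = (best_previous - samplns) / best_previous
print(f"{savings:.0%}")  # ~44 %, matching the table
```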
SampLNS is a Python library that comes with a CLI. It can easily be installed with the Python package manager `pip`, which also handles most of the dependencies (except for a C++-compiler and a Java runtime environment). All C++ and Java dependencies are compiled automatically during installation on most systems. If you already have an activated Gurobi license, the installation should be as simple as `pip install .` (we consider a simple installation of scientific projects important to allow verifying reproducibility). We recommend a Linux system, but it should also work on Windows and Mac. After installation, we recommend checking out the examples in `examples/` to get started.
It is highly recommended to use (ana)conda. It not only makes the installation of Gurobi much easier, but also prevents your system from being polluted with dependencies.
This package requires a valid license of the commercial MIP-solver Gurobi. You can get a free license for academic purposes. These academic licenses are fairly easy to obtain, as explained in this video:
- Register an academic account.
- Install Gurobi (very easy with Conda; you only need the tool `grbgetkey`).
- Use `grbgetkey` to set up a license on your computer. You may have to be within the university network for this to work.
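For orientation, these steps typically boil down to something like the following; the conda channel/package names follow Gurobi's documented setup and the license key is a placeholder, so double-check the official instructions if anything differs.

```bash
# Install Gurobi (including grbgetkey) into the active conda environment.
conda install -c gurobi gurobi

# Register the academic license on this machine; the key is obtained from the
# Gurobi web portal after creating an academic account (placeholder below).
grbgetkey xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```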
After you have obtained your license, move into the folder containing `setup.py` and run
pip install -v .
This command should automatically install all dependencies and build the package. The package contains native C++-code that is compiled during installation. This requires a C++-compiler. On most systems, this should be installed by default. If not, you can install it via
sudo apt install build-essential # Ubuntu
sudo pacman -S base-devel # Arch
If you don't have initial samples at hand, you might want to generate them for SampLNS with FeatJAR. This requires you to install Java version 11 or higher.
sudo apt-get install openjdk-11-jdk
Generally, the installation will take a while as it has to compile the C++ code, but it should work out of the box. If you encounter any problems, please open an issue. Unfortunately, the performance of native code comes at the cost of a more complex installation process, and it is difficult to make it work on all systems. Windows systems are especially difficult to support. We suggest using a Linux system.
If there are problems with the C++-compilation, you can also check out these common problems.
We provide a CLI and a Python interface. The CLI is the easiest way to get started. If you have an initial sample, you can use it as follows:
samplns -f <path/to/model> --initial-sample <path/to/initial/sample>
If you do not have an initial sample, you can use the following command to compute one with YASA:
samplns -f <path/to/model> --initial-sample-algorithm YASA
For further options see help:
samplns --help
usage: samplns [-h] -f FILE [-o OUTPUT] (--initial-sample INITIAL_SAMPLE | --initial-sample-algorithm {YASA,YASA3,YASA5,YASA10}) [--initial-sample-algorithm-timelimit INITIAL_SAMPLE_ALGORITHM_TIMELIMIT] [--samplns-timelimit SAMPLNS_TIMELIMIT] [--samplns-max-iterations SAMPLNS_MAX_ITERATIONS] [--samplns-iteration-timelimit SAMPLNS_ITERATION_TIMELIMIT] [--cds-iteration-timelimit CDS_ITERATION_TIMELIMIT]
Starts samplns either with a given initial sample or runs another sampling algorithm before.
options:
-h, --help show this help message and exit
-f FILE, --file FILE File path to the instance (either FeatJAR xml or DIMACS format.)
-o OUTPUT, --output OUTPUT
File path to the output file. Default: sample.json
--initial-sample INITIAL_SAMPLE
Set this when you already have an initial sample. File path to an initial sample (JSON) that should be used.
--initial-sample-algorithm {YASA,YASA3,YASA5,YASA10}
Set this if you want to run a sampling algorithm to generate an initial sample. YASA has several versions for different values of m.
--initial-sample-algorithm-timelimit INITIAL_SAMPLE_ALGORITHM_TIMELIMIT
Timelimit of the initial sampling algorithm in seconds.
--samplns-timelimit SAMPLNS_TIMELIMIT
Timelimit of samplns in seconds.
--samplns-max-iterations SAMPLNS_MAX_ITERATIONS
Maximum number of iterations for samplns.
--samplns-iteration-timelimit SAMPLNS_ITERATION_TIMELIMIT
Timelimit for each iteration of samplns in seconds.
--cds-iteration-timelimit CDS_ITERATION_TIMELIMIT
Timelimit for each iteration of the lower bound computation in seconds.
If you want to use the Python interface, you can check out the following example from `examples/`:
import json

from samplns.simple import SampLns
from samplns.instances import parse

if __name__ == "__main__":
    # Parse the feature model (FeatJAR XML format).
    feature_model = parse("./toybox_2006-10-31_23-30-06/model.xml")
    # Load an initial sample, e.g., one computed by YASA.
    with open("./yasa_sample.json") as f:
        initial_sample = json.load(f)
    solver = SampLns(
        instance=feature_model,
        initial_solution=initial_sample,
    )
    # At most 10000 iterations, 60 s per iteration, 900 s in total.
    solver.optimize(
        iterations=10000,
        iteration_timelimit=60.0,
        timelimit=900,
    )
    optimized_sample = solver.get_best_solution(verify=True)
    print(
        f"Reduced initial sample of size {len(initial_sample)} to {len(optimized_sample)}"
    )
    print(f"Proved lower bound is {solver.get_lower_bound()}.")
⚠️ The samples have to be lists of fully defined configurations, i.e., dictionaries that assign a value to every feature. Otherwise, the results may be wrong. We will add automatic warnings for that soon.
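For illustration, an initial sample is expected to look roughly like the following (made-up feature names), with every configuration assigning a value to every feature of the model:

```python
# Hypothetical example of a fully defined sample (list of configurations).
initial_sample = [
    {"FeatureA": True, "FeatureB": False, "FeatureC": True},
    {"FeatureA": False, "FeatureB": True, "FeatureC": True},
]
```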
The optimizer uses the Python logging module. You can configure it as you like. The default logger is named "SampLNS" and does not print anything. You can change the logging level by adding the following lines to your code:
import logging

# basicConfig is a module-level function; configure the handler/format once
# and then raise the level of the "SampLNS" logger.
logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)
logging.getLogger("SampLNS").setLevel(logging.INFO)
You can also pass a custom logger to the optimizer. This is useful if you want to capture the log messages for analysis.
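If you just want to capture the messages for analysis, attaching your own handler to the "SampLNS" logger is already enough; this is plain standard-library logging, nothing SampLNS-specific:

```python
import logging

records = []

class ListHandler(logging.Handler):
    """Collects formatted log records in a list for later analysis."""
    def emit(self, record):
        records.append(self.format(record))

logger = logging.getLogger("SampLNS")
logger.setLevel(logging.INFO)
logger.addHandler(ListHandler())

# ... run the optimizer ...
# Afterwards, `records` contains all INFO (and higher) messages.
```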
We parse the instances and preprocess them into instances where the essential features, i.e., features/variables that are not just auxiliary, are continuously indexed starting at zero. The auxiliary features are appended after them, also as indices. Additionally, some basic simplifications are performed, such as substituting simple equality constraints or pulling up unary clauses.
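As a purely illustrative sketch of that indexing scheme (not the actual parser output): concrete features receive the indices 0..k-1, and auxiliary variables are appended after them.

```python
# Hypothetical example of the index mapping produced by preprocessing.
concrete_features = ["FeatureA", "FeatureB", "FeatureC"]  # "essential" features
auxiliary_variables = ["aux_0", "aux_1"]                   # introduced by the encoding

index_of = {name: i for i, name in enumerate(concrete_features + auxiliary_variables)}
print(index_of)
# {'FeatureA': 0, 'FeatureB': 1, 'FeatureC': 2, 'aux_0': 3, 'aux_1': 4}
```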
This module should possibly be extracted and provide simple export/import functionality.
The CDS part tries to find a set of feature literal-pairs of which no two can be covered by the same configuration. This yields a very effective lower bound for the instances we have seen, and it also helps in the next step with symmetry breaking, which speeds up the individual LNS steps fundamentally. For this purpose, it must not only be able to compute a CDS on the whole graph but also on a list of pair-tuples.
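A tiny example of why such a set yields a lower bound (hypothetical features, not tied to any real model): if no configuration can cover two of the listed feature-literal pairs, every pair needs its own configuration.

```python
# Three interactions over the features A and B. A single configuration fixes
# A and B to exactly one value each, so it can cover at most one of them.
mutually_exclusive = [
    {"A": True, "B": True},
    {"A": True, "B": False},
    {"A": False, "B": True},
]
# Any sample covering all three interactions therefore needs at least
# len(mutually_exclusive) = 3 configurations -> a valid lower bound.
print(len(mutually_exclusive))
```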
This is the heart of the optimizer and is surprisingly simple. Starting from a given feasible solution, it removes a few configurations from the solution such that at most a limited number of feature pairs become uncovered (the important observation here is that only few pairs are covered by just one or very few configurations, but these are the pairs that incur the highest cost).
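A rough sketch of the bookkeeping behind that destroy step (illustrative only; the real implementation differs): count how many configurations cover each pair, then determine which pairs would become uncovered if a subset of configurations were removed.

```python
from collections import Counter
from itertools import combinations

def pair_coverage(sample):
    """How many configurations of `sample` cover each feature-value pair."""
    counter = Counter()
    for cfg in sample:
        for (f1, v1), (f2, v2) in combinations(sorted(cfg.items()), 2):
            counter[(f1, v1, f2, v2)] += 1
    return counter

def uncovered_after_removal(sample, removed):
    """Pairs that lose their last covering configuration if `removed` is dropped."""
    total = pair_coverage(sample)
    lost = pair_coverage(removed)
    return {pair for pair, cnt in lost.items() if total[pair] - cnt == 0}

# Toy usage with made-up configurations:
sample = [
    {"A": True, "B": True, "C": False},
    {"A": True, "B": False, "C": True},
    {"A": False, "B": True, "C": True},
]
print(uncovered_after_removal(sample, [sample[0]]))
```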
First, you should install all dependencies and run the package tests. You can do so by simply executing the following commands at the root.
pip install -r requirements.txt
python3 setup.py install
python3 -m pytest -s tests
Now you can start developing. To check if something works early on, you may want to write tests. Do so by simply adding a `test_something.py` directly beside your source. The tests should be in functions called `def test_bla()` and use `assert`. You can now test your code with
python3 setup.py develop
pytest -s pysrc
This will place the compiled binary in your source folder and run only the tests within the source folder.
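For example, a minimal test file could look like this (path and content are just placeholders):

```python
# pysrc/samplns/some_module/test_something.py  (hypothetical location)

def test_bla():
    # pytest discovers functions prefixed with `test_` and treats a failing
    # `assert` as a test failure.
    assert 2 + 2 == 4
```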
Please follow the project structure below. It is carefully designed and follows common guidelines, using the Scikit-HEP developer information as a baseline.
- `cmake`: Should contain all CMake utilities (no CMake package manager, so copy & paste).
- `deps`: Should contain all dependencies.
- `include`: Public interfaces of the C++-libraries you write. Should follow `include/libname/header.h`.
- `python`: Python packages you write. Should be of the shape `python/package_name/module_name/file.py`. Can be recursive.
- `notebooks`: A place for Jupyter notebooks.
- `src`: Your implementation internals. Can also contain header files that are not part of the public interface!
- `tests`: Your tests for C++ and Python. Read also this.
- `.clang_format`: (C&P) C++-formatting rules, so we have a common standard.
  - Needs to be edited if: You want a different C++-coding style.
- `.flake8`: (C&P) Python checking rules.
  - Needs to be edited if: The rules do not fit your project, especially if there are too many false positives of some rule.
- `.gitignore`: (C&P) Automatically ignores system-specific files.
  - Needs to be edited if: You use some uncommon tool that creates some kind of artifacts not covered by the current rules.
- `pyproject.toml`: (C&P) Tells pip the dependencies for running `setup.py`.
  - Needs to be edited if: You use additional/different packages in `setup.py`.
- `.pre-commit-config.yaml`: (C&P) For applying a set of checks locally. Run, e.g., via `pre-commit run --all-files`.
  - Needs to be edited if: Better tools appear that you would like to use, like a better `black`, etc.
- `CMakeLists.txt`: Defines your C++-project. This is a complex topic we won't dive into here. You should know the basics of CMake to continue.
- `conanfile.txt`: Defines the C++-dependencies installed via Conan (use CPM within `CMakeLists.txt` for copy & paste dependencies).
  - Needs to be edited if: You change C++-dependencies.
- `MANIFEST.in`: (C&P) Defines all the files that need to be packaged for pip.
  - Needs to be edited if: You need some files included that do not fit the basic coding files, e.g., images.
- `setup.py`: Scripts for building and installing the package.
  - Needs to be edited if: You add dependencies, rename the project, want to change metadata, change the project structure, etc.
  - If you don't have any C++-components yet, you need to set the target to None!
- `requirements.txt`: The recommended requirements for development on this package.
  - Needs to be edited if: You are using further Python packages.
Please report any further issues you encounter.
The package will build C++-code during installation using skbuild-conan. If you encounter any problems during the installation, please check the skbuild-conan documentation.