Assembly code analysis is a time-consuming process. An effective and efficient assembly code clone search engine can greatly reduce the effort of this process, since it can identify the cloned parts that have been previously analyzed. Kam1n0 is a scalable system that supports assembly code clone search. It allows a user to first index a (large) collection of binaries, and then search for the code clones of a given target function or binary file. We have created a promotional video on YouTube.
Kam1n0 tries to solve the efficient subgraph search problem (i.e., the subgraph isomorphism problem) for assembly functions. Given a target function (the middle one in the figure below), it can identify the cloned subgraphs among other functions in the repository (the ones on the left and the right as shown below). Kam1n0 supports a rich comment format and has an IDA Pro plug-in that exposes its indexing and searching capabilities inside IDA Pro.
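To make the idea of cloned subgraphs concrete, here is a minimal Python sketch. It is not Kam1n0's actual algorithm or data model; it is a toy illustration under stated assumptions: a function is a dictionary of basic blocks, two blocks "match" when the Jaccard similarity of their normalized instructions reaches a threshold, and a cloned subgraph is a connected group of matched block pairs whose control-flow edges agree. All names (`normalize`, `find_cloned_subgraphs`, the block IDs) are hypothetical.

```python
from itertools import product


def normalize(instruction):
    """Keep the mnemonic; replace each operand by a coarse class."""
    parts = instruction.replace(",", " ").split()
    classes = []
    for op in parts[1:]:
        if op.startswith("0x") or op.lstrip("-").isdigit():
            classes.append("IMM")   # immediate value
        elif op.startswith("["):
            classes.append("MEM")   # memory reference
        else:
            classes.append("SYM")   # register or label (not distinguished here)
    return " ".join([parts[0]] + classes)


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_cloned_subgraphs(target, candidate, threshold=0.8):
    """Return connected groups of (target_block, candidate_block) pairs.

    Both arguments map block_id -> (instructions, successor_ids).
    """
    feats = lambda fn, b: {normalize(i) for i in fn[b][0]}
    pairs = [(t, c) for t, c in product(target, candidate)
             if jaccard(feats(target, t), feats(candidate, c)) >= threshold]
    pair_set = set(pairs)

    # Two matched pairs are linked when both functions have the same edge.
    links = {p: set() for p in pairs}
    for t, c in pairs:
        for t2, c2 in product(target[t][1], candidate[c][1]):
            if (t2, c2) in pair_set:
                links[(t, c)].add((t2, c2))
                links[(t2, c2)].add((t, c))

    # Collect connected components of the pair graph.
    seen, clones = set(), []
    for start in pairs:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            current = stack.pop()
            component.append(current)
            for nxt in links[current] - seen:
                seen.add(nxt)
                stack.append(nxt)
        clones.append(sorted(component))
    return clones


# Toy functions: the candidate reuses the target's two blocks with
# different registers and labels, then adds an unrelated block.
target = {
    "t0": (["push ebp", "mov ebp, esp"], ["t1"]),
    "t1": (["cmp eax, 0", "jz loc_1"], []),
}
candidate = {
    "c0": (["push ebp", "mov ebp, esp"], ["c1"]),
    "c1": (["cmp ecx, 0", "jz loc_9"], ["c2"]),
    "c2": (["xor eax, eax", "ret"], []),
}
clones = find_cloned_subgraphs(target, candidate)
```

In this toy input, both target blocks are matched to `c0` and `c1` and grouped into a single cloned subgraph, while `c2` stays unmatched. A real engine has to avoid the all-pairs comparison used here, which is where indexing comes in.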
Kam1n0 was developed by Steven H. H. Ding under the supervision of Benjamin C. M. Fung in the Data Mining and Security Lab at McGill University in Canada. This software won the second prize in the Hex-Rays Plug-In Contest 2015. If you find Kam1n0 useful, please cite our paper:
- S. H. H. Ding, B. C. M. Fung, and P. Charland. Kam1n0: MapReduce-based Assembly Clone Search for Reverse Engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 10 pages, San Francisco, CA: ACM Press, August 2016.
In this repository, we release the initial version of Kam1n0 and its IDA Pro plug-in. It runs on a single workstation or server and provides the clone search service through RESTful web services. Users can connect to the server through IDA Pro. Deployment on a distributed cluster is planned for the next major release.
- [Web UI] Added a web interface for clone search with an assembly function.
- [Web UI] Added a web interface for clone search with a binary file.
- [Kam1n0 Workbench] Added Kam1n0 Workbench for creating and managing multiple repositories on a single workstation.
- [Kam1n0 Core] The binary file clone search result can be shared and browsed on another machine without access to the repository.
- [Kam1n0 Core] Supports indexing and searching large binary files (>40 MB) without being limited by system memory.
- [Kam1n0 Core] Supports ARM, PowerPC, x86, and amd64 binaries.
- [Kam1n0 Core] Supports user-defined processor architectures.
- [Kam1n0 Core] The optimized index structure provides better scalability and clone search quality.
- [Kam1n0 Core] Kam1n0 no longer skips basic blocks that have fewer than three instructions. Thanks to the new index structure, only single-instruction basic blocks are now skipped.
- [IDA Pro plug-in for Kam1n0] [Experimental] Added assembly fragment search functionality.
- [IDA Pro plug-in for Kam1n0] Added a tree view for browsing a large number of clones.
- The assembly code repositories and configuration files used in previous versions (<1.0.0) are no longer supported by the latest version. See the documentation on how to migrate previous repositories.
- You can index millions of functions in each repository on a single machine. The average response time for a query stays around 1 s, and the average indexing time for a function stays around 20 ms.
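As a rough back-of-the-envelope check of what the figures above imply, the following Python sketch converts the reported average of about 20 ms per function into a total indexing time. It assumes functions are indexed strictly one after another at a constant rate, which is a simplification and not a statement about how Kam1n0 schedules its work; the function name is hypothetical.

```python
def estimate_indexing_hours(num_functions, ms_per_function=20.0):
    """Estimate total indexing time in hours.

    Assumes sequential indexing at a constant average cost per function.
    """
    total_seconds = num_functions * ms_per_function / 1000.0
    return total_seconds / 3600.0


# One million functions at ~20 ms each is 20,000 s, or roughly 5.6 hours.
hours_for_one_million = estimate_indexing_hours(1_000_000)
```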
The current release of Kam1n0 consists of two installers: the server core installer and the IDA Pro plug-in installer for Kam1n0.
Installer | Included components | Description |
---|---|---|
Kam1n0-server.msi | Core engine | Main engine providing the indexing and searching service. |
Kam1n0-server.msi | Workbench | A user interface to manage the repositories and the running service. |
Kam1n0-server.msi | Web user interface | Web user interface for searching/indexing binary files and assembly functions. |
Kam1n0-client-idaplugin.msi | Plug-in | Connectors and user interface. |
Kam1n0-client-idaplugin.msi | Cefpython | Rendering engine for the user interface. |
Kam1n0-client-idaplugin.msi | Wxpython | Rendering engine for Cefpython. |
The Kam1n0 core engine is written purely in Java. You need the following dependencies:
- [Required] The latest x64 8.x JRE/JDK distribution from Oracle.
- [Optional] The latest version of IDA Pro with the idapython plug-in installed. The Python plug-in and runtime should have already been installed with IDA Pro. Re-install IDA Pro if necessary.
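Before running the server installer, it can help to confirm that the Java runtime on the machine is a 64-bit 8.x build. The Python sketch below parses the text printed by `java -version`; the helper names are hypothetical, and the sample text is only an illustrative input in the format Oracle's Java 8 runtime prints. Note that `java -version` writes to standard error rather than standard output.

```python
import re
import subprocess


def parse_java_version_output(text):
    """Return (major_version, is_64bit) or None if no version is found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', text)
    if not match:
        return None
    first = int(match.group(1))
    second = int(match.group(2)) if match.group(2) else 0
    # Java 8 and earlier report "1.x"; later releases report "x" directly.
    major = second if first == 1 else first
    return major, "64-Bit" in text


def meets_kam1n0_requirement(text):
    """True if the output describes a 64-bit Java 8 runtime."""
    parsed = parse_java_version_output(text)
    return parsed is not None and parsed == (8, True)


def read_java_version_output():
    """Run `java -version` and return its output (printed on stderr)."""
    result = subprocess.run(["java", "-version"],
                            capture_output=True, text=True)
    return result.stderr


# Illustrative sample input, not captured from a specific machine.
SAMPLE = (
    'java version "1.8.0_101"\n'
    "Java(TM) SE Runtime Environment (build 1.8.0_101-b13)\n"
    "Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)\n"
)
sample_ok = meets_kam1n0_requirement(SAMPLE)
```

On a real machine, `meets_kam1n0_requirement(read_java_version_output())` performs the same check against the installed runtime.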
Download the Kam1n0-server.msi file from our release page. Follow the instructions to install the server. You will be prompted to select an installation path as well as the IDA Pro installation path. The latter is optional if the server does not have to do any disassembling, that is, when disassembly happens on the client side through the Kam1n0 plug-in for IDA Pro. We strongly suggest installing IDA Pro alongside the Kam1n0 server. The current version of Kam1n0 only supports IDA Pro.
The IDA Pro plug-in for Kam1n0 is written in Python for logic and in HTML/JavaScript for rendering. Before installation, it needs the following dependency:
- [Required] The latest version of IDA Pro with the idapython plug-in installed. The Python plug-in and runtime should have already been installed with IDA Pro. Re-install IDA Pro if necessary.
Next, download the Kam1n0-client-idaplugin.msi installer from our release page. Follow the instructions to install the plug-in and runtime. Please note that the plug-in has to be installed in the IDA Pro plugins directory, which is located at $IDA_PRO_PATH$/plugins. For example, on Windows, the path could be C:/Program Files (x86)/IDA 6.8/plugins. The installer will validate the path.
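If you want to double-check the target directory before launching the installer, a small Python sketch like the one below can help. It only derives the plugins directory from an IDA Pro installation path and checks that it exists and is named as expected. How the installer itself validates the path is not described here, so this is an assumption about a reasonable pre-check, and both helper names are hypothetical.

```python
import os


def plugins_dir(ida_pro_path):
    """Build the plugins directory path for a given IDA Pro install path."""
    return os.path.join(ida_pro_path, "plugins")


def looks_like_plugins_dir(path):
    """True if `path` is an existing directory named 'plugins'."""
    name = os.path.basename(os.path.normpath(path))
    return os.path.isdir(path) and name.lower() == "plugins"
```

For example, `looks_like_plugins_dir(plugins_dir("C:/Program Files (x86)/IDA 6.8"))` returns True only if that directory actually exists on the machine.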
In the previous version of Kam1n0, only a single repository was supported on a workstation, and the configuration files for Kam1n0 stayed in the same folder as the engine executable file. Starting with version 1.x.x, Kam1n0 supports multiple repositories on a workstation, and each repository can target a different processor architecture. Each repository is given a data directory where you can find its configuration files. More details can be found in our Kam1n0 Workbench tutorial.
- Manage repositories with Kam1n0 Workbench
- Web interface tutorial
- IDA Pro plug-in tutorial
- Working with a cluster
- Create your own processor definition
- Migrate repository from the previous version
- CLI tutorial
The software was developed by Steven H. H. Ding under the supervision of Benjamin C. M. Fung at the McGill Data Mining and Security Lab. Currently, we adopt a Creative Commons licensing model: Attribution-ShareAlike 4.0 International (CC BY-SA). In brief, you are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material
for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
Please refer to License.txt for details.
Copyright 2015 McGill University. All rights reserved.