Skip to content

Commit

Permalink
Create README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
PGotts authored May 18, 2022
1 parent 7d41572 commit a40558d
Showing 1 changed file with 268 additions and 0 deletions.
268 changes: 268 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,268 @@
# MP-CodeCheckBin-RedHat
MP-CodeCheck Binary RedHat Release
Thank you for visiting Merly's MP-CodeCheck (BETA) GitHub repository! We are happy you are here! If you enjoy using MP-CodeCheck and find it useful, we would greatly appreciate your feedback.

Please email us with any questions, comments, issues, or anything related to MP-CodeCheck to support@merly.ai or debugging@merly.ai. We would love to hear from you!

MP-CodeCheck currently works with C, C++, C#, Go, Java, JavaScript, and Python. We plan to continue to add support for other languages in the coming months.

![merlydog_favicon](https://user-images.githubusercontent.com/92695077/163842195-66aabaa5-9db3-4499-8593-ae40fbef6e97.png)


# What is MP-CodeCheck?
MPCC is an AI-based code anomaly detection system. More specifically, MPCC uses self-supervision, iterative learning, and programmatic-guided evolution to detect anomalous code patterns. MPCC was designed to learn good and bad code syntax, patterns, and semantics from a large corpora of existing code. Once trained, MPCC’s model can be used for a variety of tasks such as: (i) detecting potential anomalies in existing code,
(ii) grading the quality of an existing repository, and (iii) guiding programmers through the important aspects of an unfamiliar or updated code
repository, to name a few.

![System Diagram](https://user-images.githubusercontent.com/92695077/163854383-fadec958-7958-4f27-95be-3f4dcd689f5f.jpg)

At its core, MPCC is a machine programming (MP) system that autonomously identifies anomalous logical expressions directly
in source code. These anomalous expressions, also known as anomalies, are often latent defects in the existing code that
programmers have failed to identify or correct. MPCC helps programmers find these anomalies and correct them, thereby
improving the overall quality of the existing software. For this limited release version of MPCC, we only include MPCC the
ability to perform inference (i.e., detect good or bad patterns) on code. In subsequent releases of MPCC, we may also include
the ability to train new models on other code bases, including users’ own proprietary ones.

## Pre-Setup Instructions
For Linux RedHat, go to the Command Line Interface (CLI) and execute the following commands:

*mkdir MPCC*

*cd MPCC/*

*curl -OL https://github.com/merly-ai/MP-CodeCheckBin-RedHat/raw/main/bin/latest/MerlyInstaller*

*chmod +x MerlyInstaller*

*./MerlyInstaller install*

Your’re now ready to launch MP-CodeCheck!


## Setup Instructions
Prior to running inference and reviewing the results, let’s set up the environment. To run MPCC, you’ll need the following three
things (at a minimum):
1. A model trained on code (provided by Merly).
2. The MPCC executable (provided by Merly).
3. A code base to run inference against (provided by you, the user).

Please ensure both the MPCC model and the executable file were placed in the same folder. (This should have been completed
for you by following the steps in pre-setup.) Then, to simplify inference, we recommend you place the code repository folder in
the same directory as MPCC. Your setup is now complete!


## Launching MP-CodeCheck
Now that setup is complete, let’s launch MPCC to perform inference analysis. From the command line interface (CLI), type the
following (where ”[code base folder]” is a directory that contains the code you want to analyze):

*./MPCC infer -D [code base folder]*

When run successfully, MPCC will display information that looks similar to the following screen. This shows the progress of MPCC extracting the code DNA from the training data.

<img width="769" alt="launch1" src="https://user-images.githubusercontent.com/92695077/169102884-2df8f152-5b54-4546-a194-d3ceb32c12c3.png">

When MPCC has loaded its trained model and processed the code DNA, it begins inference analysis on all source code that
it finds in the files of the directory (or subdirectories) you have supplied when launching it. The image below shows an example of
MPCC’s inference progress in analyzing a code repository, how much work it has completed, and how much work is remaining.

<img width="769" alt="launch2" src="https://user-images.githubusercontent.com/92695077/169102952-dffaab20-4a7c-4c2d-a09b-f4fde0373321.png">

When inference analysis has completed, the Code View screen will appear (see image below), which will allow a user to
analyze the inference results as discussed in the next section.

<img width="769" alt="CodeView" src="https://user-images.githubusercontent.com/92695077/169102988-d6b250f8-d681-4167-8f1e-63919cc7c031.png">


## Exploring MPCC's Inference Results

After inference analysis is performed, MPCC will show a user interface that includes source code, with an expression highlighted.
We call this screen the *Code View*, which will be described in more detail in Views section of this manual. The image below provides an
example of an anomalous code example found by MPCC.

<img width="1395" alt="anomaly" src="https://user-images.githubusercontent.com/92695077/169103117-e039e3f9-61a7-4c82-a883-ab99abd8eb96.png">


Here's a description of what you'll see on this screen:

**Sort Criteria:** This refers to how MPCC is sorting the list of expressions it has found. This can be via score (a numeric value
assigned by anomaly identification and complexity), or location (sequential code order).

**Class Filter:** This refers to which class of complexity is being filtered in the current view. This can be set from a minimum
value of trivial to a high value of Max complexity.

**Cost Filter:** This refers to a “mental cost” of an expression. This filter can be set from a minimum value of 0 to a maximum
value of 2,000.

**Displayed Items:** This refers to which items MPCC is displaying. It can be set to all expressions, or only anomalous
expressions.

**Hide/Show Known Good:** This refers to whether or not MPCC displays expressions that have been marked by the user as
Known Good.

**Anomaly Identification:** This displays whether or not MPCC has identified the current expression as an anomaly. Non-
anomalous expressions will be classified as “known pattern detected” and highlighted in green. Anomalous expressions will be
classified as “unfamiliar pattern(s) detected” and will be highlighted in green.

**Cost:** This displays the “mental cost” of the current expression.

**Complexity:** This displays the class of complexity of the current expression.

**Source Code Location:** This displays the file location of the source code under review.

**Anomaly/Expression Count:** This displays the count of the highlighted expression, as well as the total expressions found in
this file. If the user toggles the filter to show only anomalies, this will display the count of highlighted anomaly, and the total
anomalies found in the current file.

**Walking Through Code:** You can move forwards and backwards through the expressions by using the left and right arrow
keys, and can page up and page down through the code (by location) using the Page Up and Page Down keys. You can also
scroll up and down through the code by hold the Control key while pressing the up or the down arrow, respectively.

## Basic Commands and Views

In MPCC, there are a number of supported keyboard and mouse commands. In this section we describe those keystrokes and
explain mouse behavior. Perhaps the most important initial command to remember is the *help* command which can be launched
by pressing the character ’h’ on your keyboard. The help command lists all of the keyboard commands, so if you ever find
yourself not remembering a keyboard command, just press ’h’ and MPCC will launch the keyboard shortcut commands. A
screenshot of the help dialogue box is shown below.

<img width="666" alt="Help" src="https://user-images.githubusercontent.com/92695077/163893676-f5845122-d222-45db-a756-7d8eca1c63a7.png">

In addition to commands, there are several screens users can utilize to help them gain deeper insights into specific anomalies,
general anomaly information, anomalies by file, anomalies per file, and so forth.

**Code View:** This is the view of all of the code, with the expressions found highlighted. This view is the default view when
MPCC is initially run.

<img width="769" alt="CodeView" src="https://user-images.githubusercontent.com/92695077/169103253-67336b12-1317-4722-bf2e-988bf9ac6f6f.png">

**Anomalies View:** Press ‘a’ to switch to the Anomalies view. This view shows all of the expressions in the code (across all
files) that MPCC has determined to be an anomaly, sorted by score. You can move up and down the list using the up and down
arrows, or the Page Up and Page Down keys. Press Enter with an anomaly highlighted to switch back to the Code View of that
specific anomaly.

<img width="769" alt="AnomaliesView" src="https://user-images.githubusercontent.com/92695077/169103292-b02129d8-0887-4bc8-a2e2-f072dd90307b.png">

**Files View:** Press ‘f’ to switch to the Files view. This view shows all of the source code files, with the total number of
expressions MPCC found in each file. You can move up and down the list using the up and down arrows, or the Page Up and
Page Down keys. Press Enter with a file highlighted to switch back to the Code View of the expressions within that specific file.

<img width="769" alt="FilesView" src="https://user-images.githubusercontent.com/92695077/169103330-b25af179-4665-47fe-9c90-dbcf5c983b49.png">

**Expressions View:** Press ‘e’ to switch to the Expressions view. This view shows all of the expressions in the current file,
sorted by score. You can move up and down the list using the up and down arrows, or the Page Up and Page Down keys. Also
note that you can toggle the sort between code location and score by pressing the ‘s’ key. Press Enter with an expression
highlighted to switch back to the Code View with that specific expression highlighted.

<img width="675" alt="ExpressionsView" src="https://user-images.githubusercontent.com/92695077/169103358-d4b63f25-0e2e-4eeb-90b1-c39c0eab8cdf.png">

**Details View:** Press ‘d’ to switch to the Details view. This view shows the detail of the currently selected expression. The
detail lets you know how many anomalies MPCC identified within the expression, the cost, and the total score. Press ‘d’ to
return to Code View.

<img width="675" alt="DetailsView" src="https://user-images.githubusercontent.com/92695077/169103404-853c07a1-55bb-486f-952f-027eee1a4df8.png">

**Help Pop-up:** In addition to the above views, you can press the ‘h’ key in any view to bring up the help screen which will
show you all of the hot keys and their functions.

<img width="666" alt="Help" src="https://user-images.githubusercontent.com/92695077/163893697-a8cd66b9-898b-4f84-8db4-92633864b4a6.png">

## Sorting/Filtering Inference Results

The following lists the ways MPCC’s inference results on source code data can be sorted and/or filtered.

***Sort Criteria:***

**Options:**
* Score (numeric value assigned by anomaly identification and complexity)
* Location (sequential code order)

**Default:** Score

**Toggle:** ‘s’ key

***Class filter***

**Options:**
* Trivial (minimum)
* Basic
* Complex 1
* Complex 2
* Max

**Default:** Trivial

**Adjust:** '1', '2', '3', '4', '5' keys

***Cost filter:***

**Options:**
* 0 (minimum) to 2,000

**Default:** 0

**Adjust:** ',' to decrease, '.' to increase, 'm' to reset to 0 (minimum)

***Hide/Show Known Good:***

**Options:**
* Hide Known Good
* Show Known Good

**Default:** Hide Known Good

**Toggle:** '9' key


## MPCC Generated Files
In addition to the live (online) user interface, you can also review the inference results offline through four MPCC generated
files. These files are re-generated each time inference is run successfully. These files will be created in the same folder that the
MPCC executable was launched and have the following naming structure.

**[Code Repo].by_file.txt:** This file lists all anomalous expressions (that are not nested if’s) found by MPCC. This human
readable file lists the original anomalous source and its normalized version.

**[Code Repo].by_file nested if.txt:** This file lists all nested if expressions that are found by MPCC to be anomalous. This
human readable file lists the original anomalous source and its normalized version.

**[Code Repo].mpcc.anomaly_list.json:** This file lists all expressions that are found by MPCC to be anomalous, in a machine-
readable format.

**[Code Repo].mpcc.summary.json:** This file contains a summary of all of the files, size, and lines of code reviewed by
MPCC. It also provides a summarized report of the number of expressions, anomalies, and scores found in the source code that
inference was performed on, in a machine-readable format.

## MPCC Configuration
For users who wish to customize their MPCC experience, a JSON file is available to configure MPCC to fit your preferences.

The JSON file is located at the following location:

*%appdata%\..\local\merly.ai\debugging\MP-CodeCheck\config.json*

You can use any text editor to modify the colors, log file locations, and settings. Let’s take a closer look.

**Colors:** These are stored in the json file in hexadecimal (HEX) RGB; simply use your favorite color picker to find the
hex value of the color you’d like, and change the value of the associated item.

For example, you can set *anomaly_background* to RGB *ab852e* to change the highlight color of the anomalous expressions to dark orange.

<img width="675" alt="orange_anomalies" src="https://user-images.githubusercontent.com/92695077/169103463-9daf31dd-fb99-4138-a618-355aa8e5dae8.png">

Or, set *highlight_background* to RGB *4a9de0* to change the highlight color of the non-anomalous expressions to light
blue.

<img width="675" alt="blue_expressions" src="https://user-images.githubusercontent.com/92695077/169103507-c6562915-d2e7-4528-9596-47b38e24e772.png">

**Log Files:**

You can change the model path by setting the directory associated with: local-db_root_path

You can change the log path by setting the directory associated with: log_path

**Settings:**

“run_training” – Determines whether or not training should be run before inference on the source code (defaults to true).

“filter” – Determines if items (such as nested ifs) are extracted from the json anomaly list file (defaults to true).

![banner](https://user-images.githubusercontent.com/92695077/163866689-00f29db2-6176-4e65-bd46-dddf96a6301e.jpg)

0 comments on commit a40558d

Please sign in to comment.