Code Analysis

Mitchell Miller edited this page Sep 14, 2019 · 7 revisions

Code analysis is currently performed with Lizard, a Python-based code analyzer.

This page was last updated on 9/21/2019 to match Lizard version 1.16.6.

List of supported languages:

  • C/C++ (works with C++14)
  • Java
  • C# (C Sharp)
  • JavaScript
  • Objective-C
  • Swift
  • Python
  • Ruby
  • TTCN-3
  • PHP
  • Scala
  • GDScript
  • Golang
  • Lua

List of metrics

Below are all the measures collected by our code analyzer on each function in a codebase.

tokens

Tokens are the tokenized form of the source code: the individual keywords, identifiers, literals, operators and punctuation marks. The analyzer parses a function's tokens to generate all other metrics.

For example, if (abc % 3 != 0) is tokenized to ['if', '(', 'abc', '%', '3', '!=', '0', ')'], which is 8 tokens.
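The tokenization above can be reproduced with a small regular-expression sketch. This is only an illustration: the pattern and the `tokenize` helper are hypothetical, not Lizard's own tokenizer, and they cover just enough syntax for this example.

```python
import re

# Hypothetical pattern: identifiers/keywords, integers, a few two-character
# operators, then any other single non-whitespace character.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|[!=<>]=|&&|\|\||\S")

def tokenize(source):
    """Split a source string into a flat list of tokens."""
    return TOKEN_PATTERN.findall(source)

tokens = tokenize("if (abc % 3 != 0)")
print(tokens)       # ['if', '(', 'abc', '%', '3', '!=', '0', ')']
print(len(tokens))  # 8
```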

params

The number of parameters a function takes in.

length

A direct measure of the number of lines contained in a function, including blank lines and comments. It is essentially a count of the function's newline characters.

nloc - number of lines of code

In contrast to length, nloc is the number of non-comment lines of code. Blank lines and comment lines are ignored. This can help identify functions that are logically long rather than merely padded with comments or whitespace.
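The difference between length and nloc can be seen in a minimal sketch. The `length_and_nloc` helper is hypothetical and assumes C-style `//` line comments only; block comments and trailing comments are not handled, so it is not a substitute for Lizard's own counting.

```python
def length_and_nloc(function_source):
    """Return (length, nloc) for the text of a single function."""
    lines = function_source.splitlines()
    length = len(lines)  # every line counts, including blanks and comments
    nloc = sum(
        1
        for line in lines
        # Assumption: a line is "code" if it is not blank and is not a // comment.
        if line.strip() and not line.strip().startswith("//")
    )
    return length, nloc

example = """int add(int a, int b) {
    // add two numbers

    return a + b;
}"""

print(length_and_nloc(example))  # (5, 3)
```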

ccn - cyclomatic complexity

A quantitative measure of the number of linearly independent paths through a function. In other words, it is a way to quantify how complex a function is, which can relate to its readability, among other things. For more information see McCabe's Cyclomatic Complexity.
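A common approximation of cyclomatic complexity is one plus the number of decision points in the function. The sketch below takes that approach. The set of decision tokens and the `cyclomatic_complexity` helper are assumptions made for illustration, and the sketch does not skip comments or string literals.

```python
import re

# Assumed set of tokens that each open one additional path through the code.
DECISION_TOKENS = {"if", "for", "while", "case", "catch", "&&", "||", "?"}

def cyclomatic_complexity(function_source):
    """Approximate CCN as 1 + number of decision tokens."""
    tokens = re.findall(r"[A-Za-z_]\w*|&&|\|\||\S", function_source)
    return 1 + sum(1 for token in tokens if token in DECISION_TOKENS)

example = """
int sign(int x) {
    if (x > 0 && x < 100) return 1;
    for (int i = 0; i < x; i++) {}
    return 0;
}
"""

# One base path, plus 'if', '&&' and 'for'.
print(cyclomatic_complexity(example))  # 4
```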

fanIn - structural fan-in of procedures (local)

This describes how many calls are coming into a procedure/function. This usually relates directly to the number of parameters a function takes in.

fanOut - structural fan-out of procedures (local)

This describes how many procedures/functions are called from within a function. A high fan-out means the function calls many others; a fan-out of zero means it is a leaf procedure that depends on no other procedures. Note that in most cases, calls to imported library functions are not counted.
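As a rough illustration of fan-in and fan-out, the sketch below computes both from a small hand-written call graph. The graph, the function names in it, and the two helpers are hypothetical. The sketch counts distinct callers and callees, which may differ from how Lizard tallies repeated calls.

```python
# Hypothetical call graph: each function maps to the functions it calls.
call_graph = {
    "main": ["parse", "run"],
    "run": ["parse", "log"],
    "parse": [],
    "log": [],
}

def fan_out(graph, name):
    """Number of distinct functions called from within `name`."""
    return len(set(graph[name]))

def fan_in(graph, name):
    """Number of distinct functions that call `name`."""
    return sum(1 for callees in graph.values() if name in callees)

print(fan_out(call_graph, "main"))   # 2
print(fan_out(call_graph, "parse"))  # 0, a leaf procedure
print(fan_in(call_graph, "parse"))   # 2, called by main and run
```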

generalFanOut - structural fan-out of procedures (global)

This is the same fan-out measure, computed with the global scope in mind rather than only the local scope of the function.

maxNestingDepth - maximum nesting depth of a function

This describes the deepest level of nesting reached by any statement in a function. In properly indented code, it corresponds to the maximum number of indentation levels of the statements inside the function body. For example, a function containing a single if statement has a nesting depth of 1.
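For brace-delimited languages, nesting depth can be approximated by tracking how deeply braces are nested. The `max_nesting_depth` helper below is a hypothetical sketch that assumes every block uses braces and that no braces appear inside strings or comments.

```python
def max_nesting_depth(function_source):
    """Approximate the maximum nesting depth of a brace-delimited function."""
    depth = 0
    deepest = 0
    for char in function_source:
        if char == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif char == "}":
            depth -= 1
    # The function body's own braces are level 0, so discount one level.
    return max(deepest - 1, 0)

single_if = "void f(int x) { if (x) { x--; } }"
nested = """
void g(int x) {
    if (x) {
        while (x) {
            x--;
        }
    }
}
"""

print(max_nesting_depth(single_if))  # 1
print(max_nesting_depth(nested))     # 2
```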

maxNestedStructures - maximum nested structures within a function

Code duplicate detection through Lizard

Duplicate code is detected by searching through sequences of tokens and finding long sequences that match between files. The groups of files (and the start & end lines) with duplicate segments are then reported.
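The idea of matching token sequences between files can be sketched with a sliding window. This is a simplified illustration, not Lizard's implementation: the `find_duplicates` helper, the window size, and the file names are assumptions, and the sketch reports only which files share a sequence, not the start and end lines.

```python
import re
from collections import defaultdict

def tokenize(source):
    """Very rough tokenizer: identifiers, integers, single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)

def find_duplicates(files, window=6):
    """Map each token window seen in more than one file to those files.

    files: dict of file name -> source text.
    """
    seen = defaultdict(set)
    for name, source in files.items():
        tokens = tokenize(source)
        for start in range(len(tokens) - window + 1):
            seen[tuple(tokens[start:start + window])].add(name)
    return {seq: sorted(names) for seq, names in seen.items() if len(names) > 1}

# Hypothetical inputs sharing the statement "int total = a + b;".
files = {
    "a.c": "int total = a + b; return total;",
    "b.c": "x = 1; int total = a + b; y = 2;",
}

for sequence, names in find_duplicates(files).items():
    print(" ".join(sequence), "->", names)
```

A larger window reports fewer, longer matches, while a smaller window reports more matches, many of them trivial.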

Limitations (taken directly from Lizard README)

Lizard requires syntactically correct code. Upon processing input with incorrect or unknown syntax:

  • Lizard guarantees to terminate eventually (i.e., no forever loops, hangs) without hard failures (e.g., exit, crash, exceptions).

  • There is a chance of a combination of the following soft failures:

    • omission
    • misinterpretation
    • improper analysis/tally
    • success (the code under consideration is not relevant, e.g., global macros in C)

This approach makes the Lizard implementation simpler and more focused with partial parsers for various languages. Developers of Lizard attempt to minimize the possibility of soft failures. Hard failures are bugs in Lizard code, while soft failures are trade-offs or potential bugs.

In addition to asserting the correct code, Lizard may choose not to deal with some advanced or complicated language features:

  • C/C++ digraphs and trigraphs are not recognized.
  • C/C++ preprocessing or macro expansion is not performed. For example, using a macro instead of parentheses (or partial statements in macros) can confuse Lizard's bracket stacks.
  • Some C++ complicated templates may cause confusion with matching angle brackets and processing less-than < or more-than > operators inside of template arguments.
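To see why unexpanded macros can upset bracket matching, consider a minimal bracket-stack check. The `brackets_balanced` helper and the BEGIN macro are hypothetical and only illustrate the general problem; they do not reflect Lizard's internals.

```python
def brackets_balanced(tokens):
    """Return True if every bracket in the token list is properly matched."""
    closing_to_opening = {")": "(", "}": "{", "]": "["}
    stack = []
    for token in tokens:
        if token in closing_to_opening.values():
            stack.append(token)
        elif token in closing_to_opening:
            if not stack or stack.pop() != closing_to_opening[token]:
                return False
    return not stack

# Assume the source contains "#define BEGIN {". Without macro expansion,
# the analyzer sees the identifier BEGIN rather than an opening brace.
unexpanded = ["void", "f", "(", ")", "BEGIN", "return", ";", "}"]
expanded = ["void", "f", "(", ")", "{", "return", ";", "}"]

print(brackets_balanced(unexpanded))  # False
print(brackets_balanced(expanded))    # True
```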