iferret-logging-new - Tracing and logging version of QEMU
This contains the modified version of QEMU (branched from iFerret
many moons ago) that performs extensive logging. Thanks to the
integration of Tim & Michael's op-switching patch, it's not too
slow.
There are scripts called startqemu{vnc,linux,haiku,osx}.sh in this
directory that will boot QEMU with the appropriate image and load
the snapshot called "introprog", which I usually have in a state
where the program I want to trace is ready to run. The scripts take
one argument, which is the name to use for the log file.
This directory also includes code to read the log files generated by
iFerret. oiferret (created by Makefile-oiferret) reads in the log
and prints the entries to stdout. Using Makefile-iferretso, you
can also build a shared library that provides access to the logfiles
from Python. The resulting .so file is symlinked into the dynslicer
directory, which also has a ctypes-based Python wrapper for the
library.
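The ctypes wrapper follows the usual pattern: load the shared object,
declare each function's argument and return types, then call it like a
normal Python function. The symbols exported by iferret.so aren't
listed here, so this sketch uses libc's abs as a stand-in just to show
the pattern:

```python
import ctypes

# For the real thing this would be ctypes.CDLL("./iferret.so");
# here the C library (visible via the current process) stands in.
libc = ctypes.CDLL(None)

# Declare the prototype so ctypes converts and checks types for us.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # prints 42
```

The same argtypes/restype declarations are what let a wrapper hand
back structured log entries instead of raw pointers.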
Finally, make_iferret_code.pl has been updated so that it also
outputs a python module that contains the current op enumeration.
This, too, is symlinked into the dynslicer directory.
dynslicer - Slicing and translation code
This is where the bulk of the magic happens. The main code, which
lives in newslice.py, does roughly the following:
1. Load the trace from a binary file (using iferretpy.py)
2. Find the inputs and outputs
3. Filter out interrupts
4. Replace malloc-related functions with summaries
5. Perform some QEMU-specific fixups on the trace (e.g.,
splitting TBs that have internal jumps).
6. Do an initial slice on the trace with respect to the output
buffer.
7. Examine the CFG, and do a slice on any control-flow
instructions that were observed to go more than one
direction.
8. Ensure that if an instruction is marked in *any* instance of
a TB, it is marked in *all* instances of that TB. Do
additional slicing to pull in the dependencies of these newly
added instructions, and iterate until we hit a fixed point.
9. Read the userland memory (by looking at LD instructions), and
store it in a dictionary, so that modules that require static
data (for example, a static string) can still function
correctly.
10. Translate the code, using the translations defined in
translate_uop.py.
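The core of steps 6-8 is a backward slice driven by per-op def/use
sets. A toy version over a straight-line trace (op names and trace
contents invented for illustration; the real algorithm additionally
iterates over TB instances until a fixed point, as step 8 describes):

```python
# Toy trace: each entry is (defs, uses) for one executed instruction.
trace = [
    ({"a"}, {"in"}),      # 0: a = f(in)
    ({"b"}, {"a"}),       # 1: b = g(a)
    ({"c"}, {"in"}),      # 2: c = h(in)   -- dead w.r.t. the output
    ({"out"}, {"b"}),     # 3: out = b
]

def backward_slice(trace, criterion):
    marked = set()
    live = set(criterion)          # locations we still need defined
    for i in range(len(trace) - 1, -1, -1):
        defs, uses = trace[i]
        if defs & live:            # this entry defines something we need
            marked.add(i)
            live -= defs
            live |= uses           # now we need its inputs instead
    return marked

print(sorted(backward_slice(trace, {"out"})))  # prints [0, 1, 3]
```

Entry 2 is dropped because nothing on the path to the output buffer
depends on it -- exactly what slicing with respect to the output buffer
is meant to achieve.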
This main script is supported by several auxiliary modules:
qemu_data.py : data flow models for the slicing algorithm. These
are stored in a dictionary keyed by the name of the op. The
values are a pair of functions that, given an op's arguments,
will return the set of defs and uses, respectively. It also
contains some convenience functions like is_jcc and
is_dynjump.
translate_uop.py : translations used to turn QEMU micro-ops into
Python. These, too, are stored in a dictionary keyed by the
op name, and the values are functions that take the arguments
to the op.
qemu_trace.py : data structures for working with a trace. The
main thing in here is the TraceEntry data structure, which
has the same interface as the log entries stored by iferret.so,
so the two can be mixed in the same trace.
summary_functions.py : replacements for malloc-related
functions. These typically require the current value of ESP
(so that data flow can be implemented for them), so
newslice.py determines it from the trace and passes it to
the summary function generator to get back the actual
instance of the summary.
iferretpy.py : wrapper for iferret.so. The trace returned by
this module's load_trace is a Python object that wraps the
underlying C array. It implements all the methods needed to
make it appear as a mutable sequence to Python, so you can
modify the trace using standard Python list operations.
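The two dictionaries described above (qemu_data.py's data-flow models
and translate_uop.py's translations) are easiest to see side by side.
A toy version, with op names and argument layouts invented for
illustration rather than taken from the real tables:

```python
# Data-flow models: op name -> (defs_fn, uses_fn), each mapping the
# op's argument list to a set of locations (qemu_data.py style).
models = {
    "movl_T0_im": (lambda args: {"T0"}, lambda args: set()),
    "addl_T0_T1": (lambda args: {"T0"}, lambda args: {"T0", "T1"}),
}

# Translations: op name -> function from the same argument list to a
# line of Python source (translate_uop.py style).
translations = {
    "movl_T0_im": lambda args: "T0 = %d" % args[0],
    "addl_T0_T1": lambda args: "T0 = (T0 + T1) & 0xffffffff",
}

ops = [("movl_T0_im", [35]), ("addl_T0_T1", [])]

# The slicer consults the models to track dependencies...
defs_fn, uses_fn = models["addl_T0_T1"]
assert uses_fn([]) == {"T0", "T1"}

# ...and the translator emits runnable Python for the marked ops.
env = {"T1": 7}
for name, args in ops:
    exec(translations[name](args), env)
print(env["T0"])  # prints 42
```

Keying both tables by op name is what keeps slicing and translation in
lockstep: any op the slicer can reason about, the translator can emit.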
newslice.py takes one required argument, the base filename of the
trace, and one optional switch that specifies the OS that the trace
was generated under (-o). This is currently only used to determine
the locations of mallocs to replace; if it is not set via the
command line, xpsp2 is assumed. Valid values are currently:
[xpsp2, haiku, osx, linux]
The output of newslice.py is a .pkl file (serialized Python
objects), which consists of a tuple of two dictionaries:
transdict: the translated code blocks, keyed by starting EIP
userland: static data read from userland
This .pkl file can be fed into the newmicrodo plugin under
Volatility to run the translated code (with the -m option).
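The on-disk layout is a plain pickle of that two-dictionary tuple. A
sketch with made-up contents (the real keys come from the sliced
trace):

```python
import pickle

# Hypothetical contents: code blocks keyed by starting EIP, plus
# static userland bytes keyed by address.
transdict = {0x401000: "T0 = (T0 + 1) & 0xffffffff"}
userland = {0x402000: b"hello\x00"}

with open("slice.pkl", "wb") as f:
    pickle.dump((transdict, userland), f)

# This is the shape newmicrodo expects to load back.
with open("slice.pkl", "rb") as f:
    td, ul = pickle.load(f)
print(hex(min(td)), ul[0x402000])  # prints 0x401000 b'hello\x00'
```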
Volatility-1.3_Beta - A version of Volatility for running generated code
The main feature of this directory is the newmicrodo.py plugin,
symlinked from the dynslicer directory. It allows running translated
uOps created by newslice.py. The plugin is currently set up to use
flat (dd-style) memory images, along with a register dump from QEMU.
Register dumps for the various OSes (Haiku, Linux, Windows, and
OS X) are available in this directory (with names ending in .env).
See the help for newmicrodo for more options.
The only non-obvious feature of newmicrodo is its support for output
decoding. This feature (which corresponds to the -i switch) allows
the user to pass in a Python function (as a string) that will be
applied to the output data to decode it. The string is decoded as a
Python string before being executed, so you can include escape
characters (such as tabs and newlines). An example that decodes the
buffer as a list of DWORDs:
'def f(x):\n\tprint unpack("<%dI" % (len(x)/4), x)'
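(That example is Python 2; in Python 3 print is a function, but the
mechanism -- a function defined in a string, exec'd and applied to the
output bytes -- is the same. Exactly how newmicrodo names things
internally is an assumption in this sketch.)

```python
from struct import unpack

# Python 3 spelling of the DWORD-list decoder string.
src = 'def f(x):\n\tprint(unpack("<%dI" % (len(x) // 4), x))'

ns = {"unpack": unpack}   # assume the exec'd string can see unpack
exec(src, ns)             # defines ns["f"]
ns["f"](b"\x2a\x00\x00\x00\x01\x00\x00\x00")  # prints (42, 1)
```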
This version of Volatility has one modification that was needed to
support newmicrodo.py: write support has been added to the address
space classes. This was necessary to get the appropriate layering
for having copy-on-write at the physical address space layer.
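Copy-on-write at the physical layer amounts to an address space whose
writes land in an overlay while reads fall through to the unmodified
base image. The class and method names below are illustrative, not
Volatility's actual address-space API:

```python
class CowAddressSpace:
    """Reads hit the overlay first, then the read-only base image;
    writes only ever touch the overlay, leaving the base pristine."""
    def __init__(self, base):
        self.base = base          # bytes-like, never modified
        self.overlay = {}         # addr -> byte value

    def read(self, addr, length):
        return bytes(self.overlay.get(a, self.base[a])
                     for a in range(addr, addr + length))

    def write(self, addr, data):
        for off, b in enumerate(data):
            self.overlay[addr + off] = b

phys = CowAddressSpace(b"\x00" * 16)
phys.write(4, b"\xde\xad")
print(phys.read(3, 4), phys.base[4])  # prints b'\x00\xde\xad\x00' 0
```

Layering this at the physical level means every higher address space
(virtual mappings, profiles) sees the written values for free, while
the memory image on disk stays untouched.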
iflogs/traces - Saved traces for various introspections
You'll note a mix of two formats here -- the newer binary log file
format and the older .trc/.trace format (text-based). Most of these
are fairly useless -- the .trc files are no longer supported
(loading and parsing them took too long), and the binary logfiles
become unreadable whenever a change is made to the logfile format.
So really only the most recent traces are likely to be useful. Once
this gets more stable, we can go back and regenerate the older
traces so they're all in a ready-to-analyze format.
scripts - Miscellaneous scripts I find useful
editpkl.sh : allows editing the translated code in .pkl files. Very
useful for adding debugging statements in the generated code.
mkiso.sh : generates an ISO of a directory that can be loaded by the
QEMU guest
introprog - Introspection programs to run in the guest
These are separated into one directory per OS we've tried. The
programs all use vm_mark_buf_{in,out} to signal to QEMU that the
trace is beginning / ending, and to give the location of inputs and
outputs.
These two macros are defined in vmnotify.h, which exists in two
flavors: the Windows version, which uses assembly macros in MSVC
format, and the *nix version, which uses gcc-style inline assembly.
The programs themselves are quite simple, and should be
self-explanatory.
images - Hard drive and memory images used in training
The hard drive images are in qcow2 format. The memory images are in
raw dd format, suitable for use with Volatility.