This repository has been archived by the owner on Feb 26, 2024. It is now read-only.

Enhancement: The visualizer (v1: Function calls and such) #3520

Merged
merged 10 commits into from
Nov 24, 2020


@haltman-at haltman-at commented Nov 12, 2020

OK, here's the initial version of the visualizer! Or rather the new debugger subsystem that generates a JS tree object that will be used as input to some sort of visualizer. (Note: The form of this object will really need to be documented somewhere. I suppose this PR writeup will serve as the initial place, but it should really live somewhere else later!)

This initial version contains decoding for function calls, return values, and related information. It does not include gas information. It also does not include events or storage writes. Those will come in future versions of the visualizer. (Storage writes in particular are going to take a while...)

You can get the transaction tree object by running the debugger to the end (not in light mode!) and viewing the selector txlog.views.transactionLog.

So, what is the form of this object? I'll cover that first, and then describe how the system for generating it works later.

The object is structured as a tree; each node has a type field, which is a string. The root node (the object itself) always has type "transaction". It also has an origin field, and an actions field. The origin is the address of the transaction's origin.

The actions field is something that a number of nodes will have; it is an array representing the sub-actions of a given node, and this is where the tree structure comes in. Right now, these are always function calls. However, in the future, they may include events, storage writes, and transaction fees. Because of that, right now all nodes have an actions field. However, when leaf actions such as events, storage writes, or transaction fees are added, they will not include an actions field.

For the root (transaction) node, right now the actions array always consists of a single action, which will be of type "callexternal". However, in the future, a transaction fee action might be added afterwards.

So right now, aside from "transaction", there are only two types of actions: "callexternal" and "callinternal". Since both represent function calls, they have some structure in common. So let's start with the fields common to both.

  • functionName: The name of the function being called. Will be undefined for constructors, fallback functions, or calls that aren't to functions at all.
  • contractName: The name of the contract being called.
  • arguments: An array of arguments passed to the function. Each argument is given as a { name: ..., value: ...} pair, where name is a string (if the argument is named) or undefined (if it's unnamed); and each value uses the decoder output format.
  • returnKind: This can be one of "return", "revert", "selfdestruct", or "unwind". "return" means the call returned normally, "revert" means it reverted, and "selfdestruct" means it selfdestructed. Note that "revert" and "selfdestruct" are only used on the actual innermost call where the revert or selfdestruct happened. The other calls below it in the same EVM stackframe, which got unwound as a result, will get returnKind "unwind". If the revert or selfdestruct happened in an internal call, the external call action that started the EVM stackframe will be marked "unwind" as well -- not "revert" or "selfdestruct". Also note that "unwind" is only used back to an EVM call boundary. If the function that made the EVM call reverts as a result, it will be marked "revert", not "unwind".
  • returnValues: An array of return values, formatted similarly to arguments. Only included if returnKind === "return". For external calls, only included if kind === "function" (see below). Note: Not guaranteed to be included, as decoding may fail!
  • returnImmutables: If it's an external call with kind === "constructor" (see below), this will be present instead of returnValues, with an array of the returned contract's immutables, again formatted in the same way. Note that any immutables which go unused in the deployed contract will be omitted. (Note: Obviously this field can only go on a "callexternal", not a "callinternal", but I've included it here for ease of exposition.) (Won't be included if decoding fails, although that shouldn't happen...?)
  • error: If the returnKind is equal to "revert", this will be included instead of returnValues. It contains the decoded revert message (or lack of message); it takes the form of a Codec ReturndataDecoding, so it may be of kind "failure" to indicate no message or kind "revert" to indicate a message. Won't be included if decoding fails.
  • beneficiary: If the returnKind is equal to "selfdestruct", this will be included instead of returnValues. It contains the address that the ether was sent to in the self-destruct. If the contract performed an ether-destroying self-destruct, this field will be null.
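
To illustrate the returnKind rules in the list above, here is a minimal sketch (not code from this PR) that walks a transaction log and finds the call where a revert or selfdestruct actually originated, looking past any "unwind" nodes. The helper name findOriginOfFailure and the sample object are hypothetical; only the type, actions, contractName, functionName, and returnKind fields come from the format described here.

```javascript
// Hypothetical helper: depth-first search, children before parent. Returns the
// chain of nodes (outermost first) ending at the first call found that is
// marked "revert" or "selfdestruct", or null if there is none.
function findOriginOfFailure(node, path = []) {
  const here = [...path, node];
  // Future leaf actions may lack an actions array, so default to empty.
  for (const child of node.actions || []) {
    const found = findOriginOfFailure(child, here);
    if (found) return found;
  }
  if (node.returnKind === "revert" || node.returnKind === "selfdestruct") {
    return here;
  }
  return null;
}

// Made-up sample: an internal call reverts, so the enclosing external call
// in the same EVM stackframe is marked "unwind".
const failingLog = {
  type: "transaction",
  origin: "0x0000000000000000000000000000000000000001",
  actions: [
    {
      type: "callexternal",
      kind: "function",
      contractName: "A",
      functionName: "run",
      returnKind: "unwind",
      actions: [
        {
          type: "callinternal",
          contractName: "A",
          functionName: "check",
          returnKind: "revert",
          actions: []
        }
      ]
    }
  ]
};

const failureChain = findOriginOfFailure(failingLog)
  .filter(n => n.type !== "transaction")
  .map(n => `${n.contractName}.${n.functionName} (${n.returnKind})`);
console.log(failureChain.join(" -> "));
// prints "A.run (unwind) -> A.check (revert)"
```

Note that if an earlier call reverted and the revert was caught by its caller, this sketch would report that earlier call; a real consumer would probably want to collect every match instead.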

In addition, the "callexternal" type has several additional fields that do not appear on "callinternal":

  • kind: This can be one of "function", "constructor", or "message". Kind "function" means it's a function call. Kind "constructor" means it's a constructor call. And kind "message" means it's something else. There's also the rare fourth kind "unknowncreate", for an unrecognized constructor call.
  • address: The contract address being called.
  • contextHash: This is kind of an internal thing but I included it anyway; it's a hash for disambiguating contracts with the same name (although a contract's constructor bytecode and deployed bytecode will get different hashes). It's null for unrecognized contracts.
  • value: The number of wei sent, as a BN.
  • isDelegate: Set to true for calls made with DELEGATECALL or CALLCODE; set to false otherwise. I didn't bother distinguishing any further than that.
  • data: The calldata sent. Only included when kind === "message".
  • binary: The constructor bytecode being run. Only included when kind === "unknowncreate".

So, that's the format as it currently stands. Again, this should really be documented somewhere better, but here's a start.
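
As a rough illustration of how a consumer might read this format, here is a sketch that renders one indented line per call. Everything about it is an assumption for illustration (the helper name describeCalls, the output layout, and the sample data); it only relies on the fields listed above, and it prints argument names rather than values since values are in the decoder output format.

```javascript
// Hypothetical helper: returns an array of lines, one per node, indented by depth.
function describeCalls(node, depth = 0, lines = []) {
  if (node.type === "transaction") {
    lines.push(`transaction from ${node.origin}`);
  } else {
    // functionName is undefined for constructors, fallbacks, and non-function calls;
    // in that case fall back to the external call's kind, if there is one.
    const label =
      node.functionName !== undefined
        ? `${node.contractName}.${node.functionName}`
        : `${node.contractName} <${node.kind || "unnamed"}>`;
    // Unnamed arguments have name === undefined, so show a positional stand-in.
    const args = (node.arguments || [])
      .map((arg, i) => arg.name ?? `arg${i}`)
      .join(", ");
    const via =
      node.type === "callexternal"
        ? node.isDelegate ? "delegate" : "external"
        : "internal";
    lines.push(`${"  ".repeat(depth)}${via} ${label}(${args}) => ${node.returnKind}`);
  }
  for (const child of node.actions || []) {
    describeCalls(child, depth + 1, lines);
  }
  return lines;
}

// Made-up sample; value: null is a placeholder for a decoder-format value.
const sampleLog = {
  type: "transaction",
  origin: "0x0000000000000000000000000000000000000001",
  actions: [
    {
      type: "callexternal",
      kind: "function",
      isDelegate: false,
      contractName: "Token",
      functionName: "transfer",
      arguments: [{ name: "to", value: null }, { name: "amount", value: null }],
      returnKind: "return",
      actions: [
        {
          type: "callinternal",
          contractName: "Token",
          functionName: "_move",
          arguments: [{ name: undefined, value: null }],
          returnKind: "return",
          actions: []
        }
      ]
    }
  ]
};

const callLines = describeCalls(sampleLog);
console.log(callLines.join("\n"));
```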

Some additional notes:

  1. Absorption -- When an external function call is made, it will show up as one node in the tree, not two, regardless of Solidity version. An external call to C.f() will show up only as a callexternal node, not both a callexternal node and then again as a callinternal subnode.
  2. Yul function calls are not tracked; only Solidity function calls and EVM calls. Calls to generated sources are definitely not tracked. (As in, perhaps a future version might add in Yul function calls if people want that, but my intent is that calls to generated sources should never be included.)
  3. Oops I just realized this: Currently there's no returnData field for when an external call of kind message and returnKind return returns some raw data. That said, not sure if this is worth adding. Like it wouldn't work for calls to precompiles, which strike me as the main use case.

Some limitations:

  1. I can't guarantee how well this will work with optimized Solidity. That said, the EVM-based stuff and ABI-based stuff should obviously work regardless. The rest... well, we can hope.
  2. The debugger can't handle the case where multiple EVM stackframes return simultaneously, so neither can this. That said, that case requires contrived (and non-Solidity) setup and should never occur in practice.

OK, so that's what it produces. How does it all work?

Well, again we have a new submodule. This had to be hooked up like any new submodule; and since it's a new full-mode submodule, that had to be accounted for too.

I'm not going to go over the workings of the saga in too much detail because in a lot of ways it's pretty similar to the solidity saga or the stacktrace saga, keeping track of external and internal calls and instacalls, external and internal returns, reverts, and selfdestructs. However, what is quite different is the form of the state.

The state has three parts: transactionLog, currentNodePointer, and pointerStack.

The transactionLog state is the main part of the state; it's what contains the actual tree. However, it doesn't contain it as a tree! Rather, it has a single field, byPointer (as per state normalization), and then it has the various nodes of the tree indexed by what JSON pointer they would have if this were an actual tree. Except, the nodes here don't contain actual pointers to the other nodes in their actions arrays; rather, they contain JSON pointers.

This state (which can be accessed via the selector txlog.proc.transactionLog) is then processed into an actual tree in txlog.views.transactionLog, which resolves the JSON pointers and processes it all into an actual tree.

(Note that the JSON pointers are kind of fake -- this submodule doesn't even import json-pointer! But they make for a convenient and manipulable way of referring to the nodes.)
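
To make the normalized form concrete, here is a sketch of how a byPointer map could be turned back into a tree. This is not the actual selector code; the helper name resolveTree, the exact pointer strings, and the use of the empty string as the root pointer (which is what RFC 6901 uses for "the whole document") are all assumptions for illustration.

```javascript
// Hypothetical resolver: replaces each JSON-pointer string in a node's
// actions array with the (recursively resolved) node it refers to.
function resolveTree(byPointer, pointer = "") {
  const node = byPointer[pointer];
  if (node === undefined) {
    throw new Error(`No node at pointer "${pointer}"`);
  }
  const { actions, ...rest } = node;
  if (actions === undefined) {
    return rest; // a leaf with no actions field stays as-is
  }
  return {
    ...rest,
    actions: actions.map(childPointer => resolveTree(byPointer, childPointer))
  };
}

// Made-up normalized state with three nodes.
const byPointer = {
  "": {
    type: "transaction",
    origin: "0x0000000000000000000000000000000000000001",
    actions: ["/actions/0"]
  },
  "/actions/0": {
    type: "callexternal",
    kind: "function",
    functionName: "run",
    actions: ["/actions/0/actions/0"]
  },
  "/actions/0/actions/0": {
    type: "callinternal",
    functionName: "helper",
    actions: []
  }
};

const resolved = resolveTree(byPointer);
console.log(resolved.actions[0].actions[0].functionName); // prints "helper"
```

Keeping the state flat like this means a reducer can update a single node by key without rebuilding every ancestor, which is presumably part of the appeal of the normalized layout.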

The second part of the state is currentNodePointer. This keeps track of which node in the tree is the active one. The third part is pointerStack. This is a stack of pointers that correspond to external calls; it's used for implementing external returns, so that it knows where to return to. (Internal returns are just handled by chopping off the end of the current pointer.)
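
The pointer bookkeeping described above can be sketched roughly as follows. This is an illustration only, not the submodule's reducers: the function names are hypothetical, the child index would really be derived from the parent's actions array, and the choice to store the external call node's own pointer on the stack (and return to its parent) is my assumption about what "where to return to" means.

```javascript
// Chop one "/actions/N" segment off the end of a pointer.
const parentOf = pointer => pointer.replace(/\/actions\/\d+$/, "");

// Internal call: just descend into a new child of the current node.
function internalCall(state, index) {
  return {
    ...state,
    currentNodePointer: `${state.currentNodePointer}/actions/${index}`
  };
}

// Internal return: chop off the end of the current pointer.
function internalReturn(state) {
  return { ...state, currentNodePointer: parentOf(state.currentNodePointer) };
}

// External call: descend, and remember the new node on the stack.
function externalCall(state, index) {
  const newPointer = `${state.currentNodePointer}/actions/${index}`;
  return {
    currentNodePointer: newPointer,
    pointerStack: [...state.pointerStack, newPointer]
  };
}

// External return: pop the stack and go to the parent of the popped node.
// This works no matter how many internal calls deep the frame currently is,
// which is why chopping the current pointer is not enough here.
function externalReturn(state) {
  const returning = state.pointerStack[state.pointerStack.length - 1];
  return {
    currentNodePointer: parentOf(returning),
    pointerStack: state.pointerStack.slice(0, -1)
  };
}

let state = { currentNodePointer: "", pointerStack: [] };
state = externalCall(state, 0); // initial call
state = internalCall(state, 0);
state = externalCall(state, 0); // nested EVM call
state = internalCall(state, 0);
state = externalReturn(state); // e.g. the nested frame reverts two levels deep
console.log(state.currentNodePointer); // prints "/actions/0/actions/0"
```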

As a result of this structure, most actions take a pointer argument, which tells the transactionLog reducer which node to operate on. Actions that change the currently active node also take a newPointer argument, which tells the currentNodePointer reducer what to switch the pointer to. There is one exception to this -- the actions instantExternalCall and instantCreate, which represent a call or create that returns instantly, still take the newPointer argument; however, they do not actually change the active node. Instead, the newPointer argument is used purely to tell transactionLog where to create the new node.

Here are a few other particular things I want to point out:

Firstly: Redundancy. A number of things involving external calls are done in two redundant ways -- one based on the EVM info, the ABI, and Truffle Codec's functionality for decoding function calls and returns; and one based on the sourcemaps, the ASTs, and Truffle Debugger's variable-decoding functionality. The EVM-based way takes priority to keep optimized code from screwing things up too much, but both routes are there. It's not pure redundancy, to be clear; there are cases involving libraries where the EVM-based way can't work. So they're both there to help cover all the cases.

Secondly: The identify system. The AST-based way of handling external calls is that when a FunctionDefinition node is hit after one, there's an identify action, which is what adds the extra info to the callexternal node. This is there to make sure we can handle pre-0.5.1 stuff, in particular pre-0.5.1 library calls, which I don't think can really be handled any other way. But we also use this system for internal calls, even though it's not really necessary there, just for consistency and to prevent extra effort in the internal case. Originally the identify system was also used for handling absorptions, but that ended up not working like I thought, so I had to add a separate thing for handling absorptions. It's a bit inelegant, but, oh well...

Thirdly: Internal flags and the "library" kind. While the tree is being built, some nodes may have internal flags not listed above; these will be deleted by the end though. Specifically, these are the flags absorbNextInternalCall and waitingForFunctionDefinition. Also, a callexternal node can temporarily get the "library" kind when it's a call to a library, because with those we don't always know what's going on right away. However, they should always be resolved to "function" or "message" by the end.

Fourthly: I had to add a new decoding saga to data to decode EVM function calls, because the debugger hasn't had to do that before; that was purely a Truffle Decoder feature previously. Note that normally this saga decodes the call the debugger is about to make, but there's a flag you can pass to instead make it decode the one it's currently in; this is used for decoding the initial call.

Fifthly: There's a new EVM selector for getting the beneficiary of a self-destruct. If the beneficiary is equal to the self-destructing address -- so that the ether will be destroyed -- it returns null rather than the address. I could have done that processing in txlog, but I thought it was easier to do it here.
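
The null-for-destroyed-ether rule amounts to something like the following sketch. The function name and signature are hypothetical (the real logic lives in an EVM selector), and the case-insensitive comparison is my assumption to sidestep checksum-casing differences.

```javascript
// Hypothetical stand-in for the selector's logic: returns null when the
// contract self-destructs to its own address (so the ether is destroyed),
// and the beneficiary address otherwise.
function selfdestructBeneficiary(selfAddress, beneficiaryAddress) {
  const sameAddress =
    selfAddress.toLowerCase() === beneficiaryAddress.toLowerCase();
  return sameAddress ? null : beneficiaryAddress;
}

const contractAddress = "0x13CC557Afc63513a3a4B40b67802BFAd9b3ad660";
const otherAddress = "0xDE96CA0CA62A886848CBdEa95277a3Bac8A918B3";

console.log(selfdestructBeneficiary(contractAddress, otherAddress));
// prints "0xDE96CA0CA62A886848CBdEa95277a3Bac8A918B3"
console.log(selfdestructBeneficiary(contractAddress, contractAddress.toLowerCase()));
// prints null
```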

Sixthly: There are two selectors in txlog, txlog.current.inputParameterAllocations and txlog.current.outputParameterAllocations, for getting the input & output stack allocations for a given function. Note that each is only built to be used at particular times! They won't give correct results if used at other times. Also you'll notice they both make use of a locateParameters function. Of course, it's still basically the same way the data saga does it, but I didn't go that far in factoring it because that seemed unnecessary and inconvenient.

And finally of course I added some tests. Note that the tests aren't too extensive, because, well, we don't want the tests to be too slow, and also writing tests is tedious. But I tested more stuff manually. And of course I can add more tests if you think it's warranted.

OK, I think that's it! Visualizer v1, here it is!

@haltman-at

OK, I made some pretty significant changes to this one. I'm going to go back and rewrite the PR write-up tomorrow.


@gnidan gnidan left a comment


Alright Harry, got through this all. Much improved from before, but still kind of a mess? I'm a bit concerned about the growing complexity of the code. I guess there's not much to do about it right now. Let's get this merged!

packages/codec/lib/core.ts (review thread resolved)
packages/debugger/lib/session/sagas/index.js (review thread resolved)
@haltman-at

OK, one final change here: I renamed type "origin" to "transaction", and its address field to origin. Will edit PR writeup. Will merge after CI passes.

@haltman-at haltman-at merged commit b2fa8dd into develop Nov 24, 2020
@haltman-at haltman-at deleted the viz branch November 24, 2020 01:33
@adam-lokicode

@gnidan Hey Nick! Long time no talk! How would something like this compare to https://marketplace.visualstudio.com/items?itemName=tintinweb.solidity-visual-auditor?

It seems the work was started on that tool before this one so curious why this initial approach did not involve UML State Machine standards?

@adam-lokicode

@haltman-at Hi Harry, what would be the easiest way for me to run one of your tests to generate a sample txlog? I am just playing around with the code now/tests to see how I can get one of these generated. Thanks!

@adam-lokicode

https://github.com/trufflesuite/truffle/blob/4d1796b8a42f052c16cc3ae52f85be9e242735b7/packages/debugger/test/txlog.js

Just trying to run the above as a standalone test and dump the results of root and the other txlog properties mentioned above:

const root = bugger.view(txlog.views.transactionLog);

@adam-lokicode

@haltman-at @gnidan Actually I think I am good; I figured out how to run the tests, please see below. Can you guide me on how I can dump the transaction log into a file for inspection so I can read it into a visualizer?

[Screenshot: Screen Shot 2023-05-21 at 10 00 28 PM]

@adam-lokicode

When I try to do a console.log for the tests, I am getting the following. Is there any way to make these more specific so I can read it into a visualizer? Thanks

[Screenshot: Screen Shot 2023-05-21 at 11 14 01 PM]

@adam-lokicode

{
  "type": "transaction",
  "actions": [
    {
      "type": "callexternal",
      "address": "0x13CC557Afc63513a3a4B40b67802BFAd9b3ad660",
      "contextHash": "0x2e3d3d56860bd945bdde2dccc44a4e9a2f962f13a5a7fe63de4bde4ec63d0474",
      "value": "00",
      "kind": "function",
      "isDelegate": false,
      "functionName": "testEvent",
      "contractName": "VizTest",
      "arguments": [],
      "actions": [
        {
          "type": "event",
          "decoding": {
            "kind": "event",
            "definedIn": {
              "typeClass": "contract",
              "kind": "native",
              "id": "shimmedcompilationNumber(0):200",
              "typeName": "VizTest",
              "contractKind": "contract",
              "payable": false
            },
            "class": {
              "typeClass": "contract",
              "kind": "native",
              "id": "shimmedcompilationNumber(0):200",
              "typeName": "VizTest",
              "contractKind": "contract",
              "payable": false
            },
            "abi": {
              "anonymous": false,
              "inputs": [
                {
                  "indexed": true,
                  "internalType": "uint256",
                  "name": "x",
                  "type": "uint256"
                },
                {
                  "indexed": false,
                  "internalType": "uint256",
                  "name": "y",
                  "type": "uint256"
                }
              ],
              "name": "TakesArgs",
              "type": "event"
            },
            "arguments": [
              {
                "name": "x",
                "indexed": true,
                "value": {
                  "type": {
                    "typeClass": "uint",
                    "bits": 256,
                    "typeHint": "uint256"
                  },
                  "kind": "value",
                  "value": {
                    "asBN": "01",
                    "rawAsBN": "01"
                  },
                  "interpretations": {}
                }
              },
              {
                "name": "y",
                "indexed": false,
                "value": {
                  "type": {
                    "typeClass": "uint",
                    "bits": 256,
                    "typeHint": "uint256"
                  },
                  "kind": "value",
                  "value": {
                    "asBN": "02",
                    "rawAsBN": "02"
                  },
                  "interpretations": {}
                }
              }
            ],
            "selector": "0x9c56520d85a5fd5549276c565b0b2f2a7a75bd00b98f091dcdf45e6d8d8bef44",
            "decodingMode": "full",
            "interpretations": {}
          },
          "raw": {
            "topics": [
              "0x9c56520d85a5fd5549276c565b0b2f2a7a75bd00b98f091dcdf45e6d8d8bef44",
              "0x0000000000000000000000000000000000000000000000000000000000000001"
            ],
            "data": "0x0000000000000000000000000000000000000000000000000000000000000002"
          },
          "step": 146
        },
        {
          "type": "event",
          "decoding": {
            "kind": "event",
            "definedIn": {
              "typeClass": "contract",
              "kind": "native",
              "id": "shimmedcompilationNumber(0):37",
              "typeName": "Nothing",
              "contractKind": "interface",
              "payable": false
            },
            "class": {
              "typeClass": "contract",
              "kind": "native",
              "id": "shimmedcompilationNumber(0):200",
              "typeName": "VizTest",
              "contractKind": "contract",
              "payable": false
            },
            "abi": {
              "anonymous": false,
              "inputs": [],
              "name": "Bloop",
              "type": "event"
            },
            "arguments": [],
            "selector": "0xce926f93d7a1a989f0f66534b4816e1077dc6f687a241e399878e591a346ca4d",
            "decodingMode": "full",
            "interpretations": {}
          },
          "raw": {
            "topics": [
              "0xce926f93d7a1a989f0f66534b4816e1077dc6f687a241e399878e591a346ca4d"
            ],
            "data": "0x"
          },
          "step": 156
        }
      ],
      "beginStep": -1,
      "raw": {
        "calldata": "0x4f9d719e"
      },
      "returnValues": [],
      "endStep": 159,
      "returnKind": "return"
    }
  ],
  "origin": "0xDE96CA0CA62A886848CBdEa95277a3Bac8A918B3"
}

I was able to output the above from a test, what's the easiest way for me to step through this? Thanks
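
One possible way to step through a dump like the one above is to flatten it into a chronological list ordered by debugger step. The sketch below is only an illustration: the helper name timeline is hypothetical, and the beginStep, step, endStep, and decoding.abi.name fields are taken from the dump in this comment rather than from the PR description, so they may differ between Truffle versions. The sample is a trimmed copy of that dump.

```javascript
// Hypothetical helper: emits an "enter" entry for each call, an "emit" entry
// for each event, and an "exit" entry when a call ends, in tree order.
function timeline(node, out = []) {
  const isCall = node.type === "callexternal" || node.type === "callinternal";
  const name = `${node.contractName}.${node.functionName}`;
  if (isCall) {
    out.push({ step: node.beginStep, what: `enter ${name}` });
  } else if (node.type === "event") {
    out.push({ step: node.step, what: `emit ${node.decoding.abi.name}` });
  }
  for (const child of node.actions || []) {
    timeline(child, out);
  }
  if (isCall && node.endStep !== undefined) {
    out.push({ step: node.endStep, what: `exit ${name} (${node.returnKind})` });
  }
  return out;
}

// Trimmed copy of the dump above, keeping only the fields the sketch reads.
const dumped = {
  type: "transaction",
  origin: "0xDE96CA0CA62A886848CBdEa95277a3Bac8A918B3",
  actions: [
    {
      type: "callexternal",
      contractName: "VizTest",
      functionName: "testEvent",
      beginStep: -1,
      endStep: 159,
      returnKind: "return",
      actions: [
        { type: "event", decoding: { abi: { name: "TakesArgs" } }, step: 146 },
        { type: "event", decoding: { abi: { name: "Bloop" } }, step: 156 }
      ]
    }
  ]
};

const entries = timeline(dumped);
for (const { step, what } of entries) {
  console.log(`${step}: ${what}`);
}
// prints:
// -1: enter VizTest.testEvent
// 146: emit TakesArgs
// 156: emit Bloop
// 159: exit VizTest.testEvent (return)
```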
