Document input description and metadata output. by chriseth · Pull Request #645 · argotorg/solidity

chriseth · 2016-06-10T15:27:26Z

This is a proposal, please comment!

VoR0220 · 2016-06-10T17:09:46Z

I'm not so certain here... It seems this adds to the cost of deploying a contract.

zelig · 2016-06-10T17:17:40Z

docs/using-the-compiler.rst

+**************************************************
+
+In order to ease source code verification of complex contracts that are spread across several files,
+there is a standardized for describing the relations between those files.


standardized way?

chriseth · 2016-06-10T20:10:01Z

Thanks for the comments, @zelig!

One specific problem I would like comments on: it should of course be possible to verify the metadata with different builds of the same compiler version (i.e. for different architectures). But this means that neither the binary hash nor the compiler version in its current form should be part of it (because both differ depending on the architecture), only the git commit hash. This means that the actual compiler binary can only be retrieved after a lookup inside the "compiler contract". This lookup is recommended anyway so that you do not download anything that is not even a compiler, as explained in #611, so I do not think this is a drawback.

chriseth · 2016-06-10T20:22:55Z

@VoR0220 The additional costs are less than 2500 gas for the data and there is the option to switch it off. One of the advantages is that everyone will automatically have an auto-generated user interface for a contract in mist.

chriseth · 2016-06-14T18:09:37Z

If there are no further comments, I would like to implement it like that. As there is a lot of flexibility anyway, I don't think it is a big deal if we screw up the first version.

chriseth · 2016-06-16T21:33:01Z

Note: There is more flexibility in the input specification for the solidity compiler. It should be possible to change the "mode" the compiler is operating on, for example:

only parse and type check, but do not compile
list all contracts in all files and whether they are abstract or not (could be done by just doing everything apart from compiling)
in general, do not compile unless bytecode is requested

axic · 2016-07-28T20:54:23Z

docs/using-the-compiler.rst

+        userDocumentation: [ /* user documentation comments */ ],
+        developerDocumentation: [ /* developer documentation comments */ ],
+        natspec: [ /* natspec comments */ ]
+      }


Proposed names for the rest:

abi

methodIdentifiers / functionHashes

evmAssembly

evmBytecode

evmRuntimeBytecode

evmGasEstimates

evmOpcodes

interface or solidityInterface

developerDocumentation

userDocumentation

natspec

why3

I'm not fully sure about source maps?

Ignore the bytecode section as it cannot be included per your comment above.

I am actually not sure how much to include here. As the metadata output is mainly used for source verification, you can anyway re-generate everything here by re-running the compiler.

I would only include the essential information here which are needed even in the absence of a compiler like the ones given above.

The general output format of the compiler should be a superset of the metadata output or at least largely overlapping. This general output can contain exactly the fields you mention.

To be honest I mistook this PR for the compiler input/output and only realised after that it is for verification purposes.

Should we open an issue / PR / gist for the compiler input/output?

axic · 2016-07-28T20:56:37Z

Will the compiler JSON input/output be based on the metadata output? Where will the bytecode fit in?

chriseth · 2016-08-18T14:38:44Z

This requires a bit more discussion which moved here: https://pad.riseup.net/p/7x3G896a3NLA

axic · 2016-08-22T09:26:34Z

@chriseth updated the regular output. I think the input settings.outputSelection should be an array of strings and each element should match the output field names.

axic · 2016-08-22T09:27:45Z

@chriseth is why3 going to be generated on a per-file basis, or is only a single output given? (I would assume the latter.)

chriseth · 2016-08-22T19:16:44Z

@axic compilationTarget and outputSelection should somehow be unified. Do you think this can be done?

Why3 is currently on compiler-invocation basis, so neither contract nor file specific, and I think it will stay like that in the future.

axic · 2016-08-22T21:04:25Z

@chriseth:

Why3 is currently on compiler-invocation basis, so neither contract nor file specific, and I think it will stay like that in the future.

I mean: is the generated output a single blob covering every included source, or is there one output blob for each input?

chriseth · 2016-08-23T12:07:13Z

@axic there is one output blob (just that it is not binary). Everything gets combined into a single file.

chriseth · 2016-08-26T13:53:19Z

I will just provide a copy of the current state of the pad to increase redundancy:

**************************************************
Standardized Input Description and Metadata Output
**************************************************

In order to ease source code verification of complex contracts that are spread across several files,
there is a standardized way for describing the relations between those files.
Furthermore, the compiler can generate a json file while compiling that includes
the (hash of the) source, natspec comments and other metadata whose hash is included in the
actual bytecode. Specifically, the creation data for a contract has to begin with
`push32 <metadata hash> pop`.

The metadata standard is versioned. Future versions are only required to provide the "version" field,
the "language" field and the two keys inside the "compiler" field.
The field compiler.keccak should be the keccak hash of a binary of the compiler with the given version.

The example below is presented in a human-readable way. Properly formatted metadata
should use quotes correctly, reduce whitespace to a minimum and sort the keys of all objects
to arrive at a unique formatting.

Comments are of course not permitted and used here only for explanatory purposes.
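
The formatting rule above (sorted keys, minimal whitespace) can be sketched as follows. This is only an illustration in Python: the function name `canonical_metadata` is hypothetical, and the handling of non-ASCII characters (escaped here, the `json` default) is an assumption the proposal does not settle.

```python
import json


def canonical_metadata(metadata: dict) -> str:
    """Serialize metadata with sorted keys and no insignificant whitespace."""
    # sort_keys also applies to nested objects; the separators drop all padding.
    return json.dumps(metadata, sort_keys=True, separators=(",", ":"))


example = {
    "version": "1",
    "language": "Solidity",
    "compiler": {
        "version": "soljson-2313-2016-12-12",
        "commit": "55db20e32c97098d13230ab7500758e8e3b31d64",
    },
}
print(canonical_metadata(example))
```

Two tools that agree on this serialization produce byte-identical files for the same metadata, which is what makes hashing the file meaningful.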

Input Description
-----------------

QUESTION: How to specify the file-reading callback? - probably not as part of json input

The input description is language-specific and could change with each compiler version, but it
should be backwards compatible if possible.

    {
      sources:
      {
        // the keys here are the "global" names of the source files, imports can use other files via remappings (see below)
        "abc": "contract b{}", // specify source directly
        // I think 'keccak' on its own is not enough. I would go perhaps with swarm: "0x12.." and ipfs: "Qma..." for simplicity
        // Where the content is stored is a second component, but yes, we could give an indication there.
        "def": {keccak: "0x123..."}, // source has to be retrieved by its hash
        "ghi": {file: "/tmp/path/to/file.sol"}, // file on filesystem
        "dir/file.sol": "contract a {}"
      },
      settings:
      {
        remappings: [":g/dir"], // just as it used to be
        optimizer: {enabled: true, runs: 500},
        // if given, only compiles this contract, can also be an array. If only a contract name is given, tries to find it if unique.
        compilationTarget: "myFile.sol:MyContract",
        // addresses of the libraries. If not all libraries are given here, it can result in unlinked objects whose output data is different
        libraries: {
          "def:MyLib": "0x123123..."
        },
        // The following can be used to restrict the fields the compiler will output.
        outputSelection: [
            "abi", "evm.assembly", "evm.bytecode", ..., "why3", "ewasm.wasm"
        ]
        outputSelection: {
          // to be defined; for reference, the current command-line output options are:
          //   abi,asm,ast,bin,bin-runtime,clone-bin,devdoc,interface,opcodes,srcmap,srcmap-runtime,userdoc
          //
          //   --ast                 AST of all source files.
          //   --ast-json            AST of all source files in JSON format.
          //   --asm                 EVM assembly of the contracts.
          //   --asm-json            EVM assembly of the contracts in JSON format.
          //   --opcodes             Opcodes of the contracts.
          //   --bin                 Binary of the contracts in hex.
          //   --bin-runtime         Binary of the runtime part of the contracts in hex.
          //   --clone-bin           Binary of the clone contracts in hex.
          //   --abi                 ABI specification of the contracts.
          //   --interface           Solidity interface of the contracts.
          //   --hashes              Function signature hashes of the contracts.
          //   --userdoc             Natspec user documentation of all contracts.
          //   --devdoc              Natspec developer documentation of all contracts.
          //   --formal              Translated source suitable for formal analysis.
        }
      }
    }
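
As a rough illustration of the array form of `outputSelection`, the sketch below filters a per-contract output object by dotted field names such as `evm.bytecode`. The function name `select_outputs` and the choice to silently skip unavailable fields are assumptions made for this example, not part of the proposal.

```python
from typing import Any, Dict, List


def select_outputs(full_output: Dict[str, Any], selection: List[str]) -> Dict[str, Any]:
    """Keep only the fields named by dotted paths like "evm.bytecode"."""
    selected: Dict[str, Any] = {}
    for path in selection:
        keys = path.split(".")
        value: Any = full_output
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                break  # field not produced by the compiler: skip it (assumption)
            value = value[key]
        else:
            # Rebuild the nesting for the selected field only.
            target = selected
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = value
    return selected


full = {
    "abi": [],
    "evm": {"assembly": "asm", "bytecode": "6060", "opcodes": "PUSH1"},
    "functionHashes": {},
}
print(select_outputs(full, ["abi", "evm.bytecode", "ewasm.wasm"]))
```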


Regular Output
--------------


    {
      errors: ["error1", "error2"], // we might structure them
      errors: [
          {
              file: "sourceFile.sol", // optional?
              contract: "contractName", // optional
              line: 100, // optional - currently, we always have a byte range in the source file
              // Errors/warnings originate in several components, most of them are not
              // backend-specific. Currently, why3 errors are part of the why3 output.
              // I think it is better to put code-generator-specific errors into the code-generator output
              // area, and warnings and errors that are code-generator-agnostic into this general area,
              // so that it is easier to determine whether some source code is invalid or only
              // triggers errors/warnings in some backend that might only implement some part of solidity.
              type: "evm" or "why3" or "ewasm" // maybe a better field name would be needed
              message: "Invalid keyword" // mandatory
          }
      ]
      contracts: {
        "sourceFile.sol:ContractName": {
          abi: 
          evm: {
              assembly:
              bytecode:
              runtimeBytecode:
              opcodes:
              gasEstimates:
              sourceMap:
              runtimeSourceMap:
              // If given, this is an unlinked object (cannot be filtered out explicitly, might be
              // filtered if bytecode, runtimeBytecode, opcodes and others are all filtered out)
              linkReferences: {
                "sourceFile.sol:Library1": [1, 200, 80] // byte offsets into bytecode. Linking replaces the 20 bytes there.
              }
              // the same for runtimeBytecode - I'm not sure it is a good idea to allow to link libraries differently for the runtime bytecode.
              // furthermore, runtime bytecode is always a substring of the bytecode anyway.
              runtimeLinkReferences: {
              }
          },
          functionHashes:
          metadata: // see below
          ewasm: {
              wast: // S-expression format
              wasm: // 
          }
        }
      },
      formal: {
        "why3": "..."
      },
      sourceList: ["source1.sol", "source2.sol"], // this is important for source references both in the ast as well as in the srcmap in the contract
      sources: {
        "source1.sol": {
          "AST": { ... }
        }
      }
    }
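
The `linkReferences` field above lists byte offsets at which a 20-byte library address is to be written. A minimal sketch of that linking step, assuming the bytecode is available as raw bytes and addresses as hex strings (the function name `link` and the error handling are hypothetical):

```python
from typing import Dict, List


def link(bytecode: bytes,
         link_references: Dict[str, List[int]],
         libraries: Dict[str, str]) -> bytes:
    """Replace the 20 bytes at each listed offset with the library address."""
    linked = bytearray(bytecode)
    for name, offsets in link_references.items():
        if name not in libraries:
            raise KeyError("no address given for library " + name)
        hex_address = libraries[name]
        if hex_address.startswith("0x"):
            hex_address = hex_address[2:]
        address = bytes.fromhex(hex_address)
        if len(address) != 20:
            raise ValueError("library address must be 20 bytes: " + name)
        for offset in offsets:
            if offset < 0 or offset + 20 > len(linked):
                raise ValueError("offset outside bytecode: %d" % offset)
            linked[offset:offset + 20] = address
    return bytes(linked)


# One placeholder of 20 zero bytes at offset 1, surrounded by two other bytes.
unlinked = b"\x73" + bytes(20) + b"\x50"
result = link(unlinked, {"sourceFile.sol:Library1": [1]},
              {"sourceFile.sol:Library1": "0x" + "11" * 20})
```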

Metadata Output
---------------

Note that the actual bytecode is not part of the metadata because the hash
of the metadata structure will be included in the bytecode itself.

This requires the compiler to be able to compute the hash of its own binary,
which requires it to be statically linked. The hash of the binary is not
too important. It is much more important to have the commit hash because
that can be used to query a location of the binary (and whether the version is
"official") at a registry contract. 

    {
      version: "1",
      language: "Solidity",
      compiler: {
        commit: "55db20e32c97098d13230ab7500758e8e3b31d64",
        version: "soljson-2313-2016-12-12",
        keccak: "0x123..."
      },
      sources:
      {
        "abc": {keccak: "0x456..."}, // here, sources are always given by hash
        "def": {keccak: "0x123..."},
        "dir/file.sol": {keccak: "0xabc..."}
      },
      settings:
      {
        remappings: [":g/dir"],
        optimizer: {enabled: true, runs: 500},
        compilationTarget: "myFile.sol:MyContract",
        libraries: {
          "def:MyLib": "0x123123..."
        }
      },
      output:
      {
        abi: [ /* abi definition */ ],
        natspec: [ /* user documentation comments */ ]
      }
    }

This is used in the following way: A component that wants to interact
with a contract (e.g. mist) retrieves the creation transaction of the contract
and from that the first 33 bytes. If the first byte decodes into a PUSH32
instruction, the other 32 bytes are interpreted as the keccak-hash of
a file which is retrieved via a content-addressable storage like swarm.
That file is JSON-decoded into a structure like above. Sources are
retrieved in the same way and combined with the structure into a proper
compiler input description, which selects only the bytecode as output.
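
The first step described above, reading the metadata hash out of the creation data, might look like the following sketch. It assumes the creation data is available as raw bytes; `extract_metadata_hash` is a hypothetical name. `0x7f` is the EVM opcode for PUSH32.

```python
from typing import Optional

PUSH32 = 0x7F  # EVM opcode that pushes the following 32 bytes


def extract_metadata_hash(creation_data: bytes) -> Optional[bytes]:
    """Return the 32 bytes after a leading PUSH32, or None if absent."""
    if len(creation_data) < 33 or creation_data[0] != PUSH32:
        return None
    return creation_data[1:33]


# Synthetic creation data: PUSH32 <32 bytes> POP, followed by nothing else.
sample = bytes([PUSH32]) + bytes(range(32)) + bytes([0x50])
metadata_hash = extract_metadata_hash(sample)
```

The returned value would then be used as the lookup key in the content-addressable storage.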

The compiler of the correct version (which is checked to be part of the "official" compilers)
is invoked on that input. The resulting
bytecode is compared (excess bytecode in the creation transaction
is constructor input data) which automatically verifies the metadata since
its hash is part of the bytecode. The constructor input data is decoded
according to the interface and presented to the user.
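
The comparison step can be sketched like this, assuming both the creation transaction data and the recompiled bytecode are available as raw bytes (`split_constructor_input` is a hypothetical helper name):

```python
from typing import Optional


def split_constructor_input(creation_data: bytes,
                            compiled_bytecode: bytes) -> Optional[bytes]:
    """Return the excess bytes after the compiled bytecode, or None on mismatch."""
    if not creation_data.startswith(compiled_bytecode):
        return None  # recompiled bytecode differs: verification fails
    return creation_data[len(compiled_bytecode):]


compiled = bytes.fromhex("7f") + bytes(32) + bytes.fromhex("50")
deployed = compiled + bytes.fromhex("000000000000000000000000000000000000000000000000000000000000002a")
constructor_input = split_constructor_input(deployed, compiled)
```

A `None` result means the sources and settings in the metadata do not reproduce the deployed code; otherwise the remaining bytes are what gets decoded according to the interface.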

chriseth · 2016-11-11T16:56:47Z

suggestion by @axic: extend metadata information with a retrieve-hint. So we can always use keccak for referencing source files and also add a swarm, ipfs or github link to actually get the source.

same goes for the actual metadata "file" - we should include the keccak hash plus a way to retrieve the file.
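
A sketch of how a client might use such retrieve-hints: try each hinted location and accept the content only if its hash matches. Everything here is hypothetical (the function name, the shape of the hints, the fetcher callbacks). The hash function is passed in because Python's standard library does not provide Keccak-256 (`hashlib.sha3_256` uses different padding); the usage lines use SHA-256 purely as a stand-in.

```python
import hashlib
from typing import Callable, Dict, Optional


def retrieve_source(expected_hash: bytes,
                    hints: Dict[str, str],
                    fetchers: Dict[str, Callable[[str], bytes]],
                    hash_fn: Callable[[bytes], bytes]) -> Optional[bytes]:
    """Try each hinted location; return the first content whose hash matches."""
    for scheme, location in hints.items():
        fetch = fetchers.get(scheme)
        if fetch is None:
            continue  # no way to fetch from this kind of location
        content = fetch(location)
        if hash_fn(content) == expected_hash:
            return content
    return None


# Stand-in hash for demonstration only; a real client would use Keccak-256.
def stand_in_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


source = b"contract a {}"
fake_fetchers = {
    "swarm": lambda location: b"tampered",  # simulates a bad copy
    "ipfs": lambda location: source,        # simulates a good copy
}
found = retrieve_source(stand_in_hash(source),
                        {"swarm": "hint-1", "ipfs": "hint-2"},
                        fake_fetchers, stand_in_hash)
```

Because the hash is checked after retrieval, the hint itself does not need to be trusted.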

VoR0220 · 2016-11-11T19:02:29Z

I would like to propose that we add in the keys of chainID (or something like the concept of a chain ID) and block deployed at for verifiability and better package management.

chriseth · 2016-11-11T19:07:46Z

"block deployed at" cannot be part of the metadata, because the metadata has to be a compile-time constant. The current plan is to fire a log at construction time, but the clients do not index events as well as they could, so that might be hard to find. Perhaps we should put the "block deployed at" in the deployed bytecode in addition to firing an event.

VoR0220 · 2016-11-11T19:08:40Z

That works excellently. Let's make that happen.

chriseth · 2016-11-16T14:34:56Z

Continuing as #1387


Document input description and metadata output.

ed5c378

chriseth added the in progress label Jun 10, 2016

zelig reviewed Jun 10, 2016

Add language and some minor corrections and clarifications.

ef20312

chriseth force-pushed the metadata branch from 55db20e to ef20312 Compare June 10, 2016 20:05

axic reviewed Jul 28, 2016

chriseth added the soon label Aug 12, 2016

zelig mentioned this pull request Aug 29, 2016

admin.saveinfo(contract.info) fail ethereum/go-ethereum#2644

Closed

axic mentioned this pull request Oct 7, 2016

Define "json input/output" #1129

Closed

VoR0220 mentioned this pull request Nov 11, 2016

add minifest-spec ethpm/ethpm-spec#5

Merged

axic mentioned this pull request Nov 16, 2016

JSON interface description #1387

Merged

10 tasks

chriseth closed this Nov 16, 2016

chriseth removed in progress labels Nov 16, 2016

axic pushed a commit that referenced this pull request Nov 20, 2018

Merge pull request #645 from pirapira/first_loop_second_loop

606405b

eip-165: when the first loop hits BEGINDATA, the second loop should still run
