Document input description and metadata output. #645

Closed
chriseth wants to merge 2 commits into argotorg:develop from chriseth:metadata

Conversation

@chriseth
Contributor

This is a proposal, please comment!

@VoR0220
Contributor

VoR0220 commented Jun 10, 2016

I'm not so certain here... It seems this adds to the cost of deploying a contract.

**************************************************

In order to ease source code verification of complex contracts that are spread across several files,
there is a standardized for describing the relations between those files.

standardized way?

@chriseth
Contributor Author

Thanks for the comments, @zelig!

One specific problem I would like comments on: it should of course be possible to verify the metadata with different builds of the same compiler version (i.e. builds for different architectures). But this means that neither the binary hash nor the compiler version in its current form should be part of it (because both come out differently depending on the architecture), only the git commit hash. This in turn means that the actual compiler binary can only be retrieved after a lookup inside the "compiler contract". This lookup is recommended anyway so that you do not download anything that is not even a compiler, as explained in #611, so I would not think this is a drawback.

@chriseth
Contributor Author

chriseth commented Jun 10, 2016

@VoR0220 The additional costs are less than 2500 gas for the data and there is the option to switch it off. One of the advantages is that everyone will automatically have an auto-generated user interface for a contract in mist.

@chriseth
Contributor Author

If there are no further comments, I would like to implement it like that. As there is a lot of flexibility anyway, I don't think it is a big deal if we screw up the first version.

@chriseth
Contributor Author

Note: There is more flexibility in the input specification for the Solidity compiler. It should be possible to change the "mode" the compiler operates in, for example:

  • only parse and type check, but do not compile
  • list all contracts in all files and whether they are abstract or not (could be done by just doing everything apart from compiling)
  • in general, do not compile unless bytecode is requested

userDocumentation: [ /* user documentation comments */ ],
developerDocumentation: [ /* developer documentation comments */ ],
natspec: [ /* natspec comments */ ]
}
Contributor

Proposed names for the rest:

  • abi
  • methodIdentifiers / functionHashes
  • evmAssembly
  • evmBytecode
  • evmRuntimeBytecode
  • evmGasEstimates
  • evmOpcodes
  • interface or solidityInterface
  • developerDocumentation
  • userDocumentation
  • natspec
  • why3

I'm not fully sure about source maps?

Contributor

Ignore the bytecode section as it cannot be included per your comment above.

Contributor Author

I am actually not sure how much to include here. As the metadata output is mainly used for source verification, you can anyway re-generate everything here by re-running the compiler.

I would only include the essential information here, which is needed even in the absence of a compiler, like the fields given above.

The general output format of the compiler should be a superset of the metadata output or at least largely overlapping. This general output can contain exactly the fields you mention.

Contributor

To be honest I mistook this PR for the compiler input/output and only realised after that it is for verification purposes.

Should we open an issue / PR / gist for the compiler input/output?

@axic
Contributor

axic commented Jul 28, 2016

Will the compiler JSON input/output be based on the metadata output? Where will the bytecode fit in?

@chriseth chriseth added the soon label Aug 12, 2016
@chriseth
Contributor Author

This requires a bit more discussion which moved here: https://pad.riseup.net/p/7x3G896a3NLA

@axic
Contributor

axic commented Aug 22, 2016

@chriseth updated the regular output. I think the input settings.outputSelection should be an array of strings and each element should match the output field names.

@axic
Contributor

axic commented Aug 22, 2016

@chriseth is why3 going to be generated on a per-file basis, or is only a single output given? (I would assume the latter.)

@chriseth
Contributor Author

chriseth commented Aug 22, 2016

@axic compilationTarget and outputSelection should somehow be unified. Do you think this can be done?

Why3 is currently on a per-compiler-invocation basis, so neither contract- nor file-specific, and I think it will stay like that in the future.

@axic
Contributor

axic commented Aug 22, 2016

@chriseth:

Why3 is currently on a per-compiler-invocation basis, so neither contract- nor file-specific, and I think it will stay like that in the future.

I mean: is the generated output a single blob covering every included source, or is there one output blob for each input?

@chriseth
Contributor Author

@axic there is one output blob (just that it is not binary). Everything gets combined into a single file.

@chriseth
Contributor Author

I will just provide a copy of the current state of the pad to increase redundancy:

**************************************************
Standardized Input Description and Metadata Output
**************************************************

In order to ease source code verification of complex contracts that are spread across several files,
there is a standardized way for describing the relations between those files.
Furthermore, the compiler can generate a JSON file while compiling that includes
the (hash of the) source, natspec comments and other metadata whose hash is included in the
actual bytecode. Specifically, the creation data for a contract has to begin with
`push32 <metadata hash> pop`.

The metadata standard is versioned. Future versions are only required to provide the "version" field,
the "language" field and the two keys inside the "compiler" field.
The field compiler.keccak should be the keccak hash of a binary of the compiler with the given version.

The example below is presented in a human-readable way. Properly formatted metadata
should use quotes correctly, reduce whitespace to a minimum and sort the keys of all objects
to arrive at a unique formatting.

Comments are of course not permitted and used here only for explanatory purposes.
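
The formatting rule above (minimal whitespace, sorted keys) can be sketched in Python as follows. This is only an illustration: the function name `canonical_metadata` is hypothetical, and the draft does not say how "use quotes correctly" should be read, so the sketch simply relies on standard JSON quoting. Hashing is left out because keccak-256 is not in the Python standard library (`hashlib.sha3_256` is a different function).

```python
import json


def canonical_metadata(metadata: dict) -> bytes:
    """Serialize metadata with sorted keys and no insignificant whitespace.

    Hypothetical helper: two structurally equal objects always yield the
    same bytes, so the hash of those bytes is well defined.
    """
    text = json.dumps(
        metadata,
        sort_keys=True,            # sort the keys of all (nested) objects
        separators=(",", ":"),     # reduce whitespace to a minimum
        ensure_ascii=False,
    )
    return text.encode("utf-8")


if __name__ == "__main__":
    example = {
        "version": "1",
        "language": "Solidity",
        "compiler": {
            "version": "soljson-2313-2016-12-12",
            "commit": "55db20e32c97098d13230ab7500758e8e3b31d64",
        },
    }
    print(canonical_metadata(example).decode("utf-8"))
```

Key order in the input does not matter; only the content does.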

Input Description
-----------------

QUESTION: How to specify the file-reading callback? - probably not as part of JSON input

The input description is language-specific and could change with each compiler version, but it
should be backwards compatible if possible.

    {
      sources:
      {
        // the keys here are the "global" names of the source files, imports can use other files via remappings (see below)
        "abc": "contract b{}", // specify source directly
        // I think 'keccak' on its own is not enough. I would go perhaps with swarm: "0x12.." and ipfs: "Qma..." for simplicity
        // Where the content is stored is a second component, but yes, we could give an indication there.
        "def": {keccak: "0x123..."}, // source has to be retrieved by its hash
        "ghi": {file: "/tmp/path/to/file.sol"}, // file on filesystem
        "dir/file.sol": "contract a {}"
      },
      settings:
      {
        remappings: [":g/dir"], // just as it used to be
        optimizer: {enabled: true, runs: 500},
        // if given, only compiles this contract, can also be an array. If only a contract name is given, tries to find it if unique.
        compilationTarget: "myFile.sol:MyContract",
        // addresses of the libraries. If not all libraries are given here, it can result in unlinked objects whose output data is different
        libraries: {
          "def:MyLib": "0x123123..."
        },
        // The following can be used to restrict the fields the compiler will output.
        outputSelection: [
            "abi", "evm.assembly", "evm.bytecode", ..., "why3", "ewasm.wasm"
        ]
        outputSelection: {
          // Existing output names, for reference:
          //   abi,asm,ast,bin,bin-runtime,clone-bin,devdoc,interface,opcodes,srcmap,srcmap-runtime,userdoc
          //
          //   --ast                 AST of all source files.
          //   --ast-json            AST of all source files in JSON format.
          //   --asm                 EVM assembly of the contracts.
          //   --asm-json            EVM assembly of the contracts in JSON format.
          //   --opcodes             Opcodes of the contracts.
          //   --bin                 Binary of the contracts in hex.
          //   --bin-runtime         Binary of the runtime part of the contracts in hex.
          //   --clone-bin           Binary of the clone contracts in hex.
          //   --abi                 ABI specification of the contracts.
          //   --interface           Solidity interface of the contracts.
          //   --hashes              Function signature hashes of the contracts.
          //   --userdoc             Natspec user documentation of all contracts.
          //   --devdoc              Natspec developer documentation of all contracts.
          //   --formal              Translated source suitable for formal analysis.
          //
          // to be defined
        }
      }
    }
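
One possible reading of the array form of `outputSelection` is that each entry is a dotted path into the per-contract output (e.g. `"evm.bytecode"`). A minimal Python sketch under that assumption is below. The function `select_outputs` is hypothetical and nothing in the draft fixes these semantics.

```python
_MISSING = object()  # sentinel for "this path does not exist in the output"


def select_outputs(contract_output: dict, selection: list) -> dict:
    """Return only the fields of contract_output named in selection.

    Each selection entry is assumed to be a dotted path such as
    "evm.bytecode". Paths that the compiler did not produce are ignored.
    """
    selected = {}
    for path in selection:
        keys = path.split(".")
        # Walk down the full output to find the requested value.
        value = contract_output
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                value = _MISSING
                break
            value = value[key]
        if value is _MISSING:
            continue
        # Recreate the nesting in the filtered result.
        target = selected
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return selected


if __name__ == "__main__":
    full = {
        "abi": [],
        "evm": {"assembly": "...", "bytecode": "0x60", "opcodes": "PUSH1"},
        "functionHashes": {},
    }
    print(select_outputs(full, ["abi", "evm.bytecode", "why3"]))
```

With this reading, the selection entries match the output field names one to one, which is what was asked for in the discussion.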


Regular Output
--------------


    {
      errors: ["error1", "error2"], // we might structure them
      errors: [
          {
              file: "sourceFile.sol", // optional?
              contract: "contractName", // optional
              line: 100, // optional - currently, we always have a byte range in the source file
              // Errors/warnings originate in several components, most of them are not
              // backend-specific. Currently, why3 errors are part of the why3 output.
              // I think it is better to put code-generator-specific errors into the code-generator output
              // area, and warnings and errors that are code-generator-agnostic into this general area,
              // so that it is easier to determine whether some source code is invalid or only
              // triggers errors/warnings in some backend that might only implement some part of solidity.
              type: "evm" or "why3" or "ewasm" // maybe a better field name would be needed
              message: "Invalid keyword" // mandatory
          }
      ]
      contracts: {
        "sourceFile.sol:ContractName": {
          abi: 
          evm: {
              assembly:
              bytecode:
              runtimeBytecode:
              opcodes:
              gasEstimates:
              sourceMap:
              runtimeSourceMap:
              // If given, this is an unlinked object (cannot be filtered out explicitly, might be
              // filtered if both bytecode, runtimeBytecode, opcodes and others are filtered out)
              linkReferences: {
                "sourceFile.sol:Library1": [1, 200, 80] // byte offsets into bytecode. Linking replaces the 20 bytes there.
              }
              // the same for runtimeBytecode - I'm not sure it is a good idea to allow to link libraries differently for the runtime bytecode.
              // furthermore, runtime bytecode is always a substring of the bytecode anyway.
              runtimeLinkReferences: {
              }
          },
          functionHashes:
          metadata: // see below
          ewasm: {
              wast: // S-expression format
              wasm: // 
          }
        }
      },
      formal: {
        "why3": "..."
      },
      sourceList: ["source1.sol", "source2.sol"], // this is important for source references both in the ast as well as in the srcmap in the contract
      sources: {
        "source1.sol": {
          "AST": { ... }
        }
      }
    }
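
The `linkReferences` entry above says that linking replaces the 20 bytes at each listed byte offset. A small Python sketch of that step follows. The function name `link` and the shape of the `addresses` argument are assumptions made for illustration.

```python
def link(bytecode: bytes, link_references: dict, addresses: dict) -> bytes:
    """Write library addresses into an unlinked object.

    link_references maps a library name to byte offsets into bytecode;
    addresses maps the same names to 20-byte addresses (assumed format).
    """
    linked = bytearray(bytecode)
    for library, offsets in link_references.items():
        if library not in addresses:
            raise KeyError("no address given for library " + library)
        address = addresses[library]
        if len(address) != 20:
            raise ValueError("address for " + library + " must be 20 bytes")
        for offset in offsets:
            if offset < 0 or offset + 20 > len(linked):
                raise ValueError("offset %d is outside the bytecode" % offset)
            # Linking replaces exactly the 20 bytes at this offset.
            linked[offset:offset + 20] = address
    return bytes(linked)


if __name__ == "__main__":
    unlinked = bytes(60)  # placeholder bytecode made of zero bytes
    refs = {"sourceFile.sol:Library1": [1, 30]}
    addrs = {"sourceFile.sol:Library1": bytes([0xAA] * 20)}
    print(link(unlinked, refs, addrs).hex())
```

The length of the bytecode never changes, which is why fixed byte offsets are enough to describe an unlinked object.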

Metadata Output
---------------

Note that the actual bytecode is not part of the metadata because the hash
of the metadata structure will be included in the bytecode itself.

This requires the compiler to be able to compute the hash of its own binary,
which requires it to be statically linked. The hash of the binary is not
too important. It is much more important to have the commit hash because
that can be used to query a location of the binary (and whether the version is
"official") at a registry contract. 

    {
      version: "1",
      language: "Solidity",
      compiler: {
        commit: "55db20e32c97098d13230ab7500758e8e3b31d64",
        version: "soljson-2313-2016-12-12",
        keccak: "0x123..."
      },
      sources:
      {
        "abc": {keccak: "0x456..."}, // here, sources are always given by hash
        "def": {keccak: "0x123..."},
        "dir/file.sol": {keccak: "0xabc..."}
      },
      settings:
      {
        remappings: [":g/dir"],
        optimizer: {enabled: true, runs: 500},
        compilationTarget: "myFile.sol:MyContract",
        libraries: {
          "def:MyLib": "0x123123..."
        }
      },
      output:
      {
        abi: [ /* abi definition */ ],
        natspec: [ /* user documentation comments */ ]
      }
    }

This is used in the following way: A component that wants to interact
with a contract (e.g. mist) retrieves the creation transaction of the contract
and from that the first 33 bytes. If the first byte decodes into a PUSH32
instruction, the other 32 bytes are interpreted as the keccak-hash of
a file which is retrieved via a content-addressable storage like swarm.
That file is JSON-decoded into a structure like above. Sources are
retrieved in the same way and combined with the structure into a proper
compiler input description, which selects only the bytecode as output.
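
The first step described above (read 33 bytes, check for PUSH32, take the remaining 32 bytes as the hash) could look like this in Python. The function name is hypothetical. The opcode value 0x7f for PUSH32 is the one defined by the EVM.

```python
PUSH32 = 0x7F  # EVM opcode that pushes the following 32 bytes onto the stack


def extract_metadata_hash(creation_data: bytes):
    """Return the 32-byte metadata hash from creation data, or None.

    Follows the scheme described above: the creation data must begin with
    PUSH32 followed by the keccak hash of the metadata file.
    """
    if len(creation_data) < 33 or creation_data[0] != PUSH32:
        return None
    return creation_data[1:33]


if __name__ == "__main__":
    fake_hash = bytes(range(32))                 # stand-in for a real hash
    data = bytes([PUSH32]) + fake_hash + b"\x50"  # 0x50 is POP
    print(extract_metadata_hash(data).hex())
```

A `None` result means the contract was not deployed with this scheme, or metadata embedding was switched off.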

The compiler of the correct version (which is checked to be one of the "official" compilers)
is invoked on that input. The resulting bytecode is compared with the data of the
creation transaction; any excess bytes in the creation transaction
are constructor input data. A match automatically verifies the metadata, since
its hash is part of the bytecode. The constructor input data is decoded
according to the interface and presented to the user.
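
The comparison step can be sketched in Python as a prefix check: the recompiled bytecode must be a prefix of the creation transaction data, and whatever follows is treated as constructor input. The function name is hypothetical, and decoding the constructor input against the ABI is not shown.

```python
def split_constructor_input(creation_data: bytes, compiled_bytecode: bytes):
    """Compare recompiled bytecode against creation transaction data.

    Returns the trailing constructor input bytes if compiled_bytecode is a
    prefix of creation_data, or None if the bytecode does not match.
    """
    if not compiled_bytecode or not creation_data.startswith(compiled_bytecode):
        return None
    return creation_data[len(compiled_bytecode):]


if __name__ == "__main__":
    bytecode = bytes.fromhex("7f") + bytes(32) + bytes.fromhex("50600a")
    constructor_input = (42).to_bytes(32, "big")  # one encoded uint256 argument
    tx_data = bytecode + constructor_input
    print(split_constructor_input(tx_data, bytecode).hex())
```

Because the metadata hash sits inside the bytecode, a successful prefix match also confirms that the metadata file used for recompilation is the one the deployer committed to.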

@chriseth
Contributor Author

chriseth commented Nov 11, 2016

suggestion by @axic: extend metadata information with a retrieve-hint. So we can always use keccak for referencing source files and also add a swarm, ipfs or github link to actually get the source.

same goes for the actual metadata "file" - we should include the keccak hash plus a way to retrieve the file.

@VoR0220
Contributor

VoR0220 commented Nov 11, 2016

I would like to propose that we add in the keys of chainID (or something like the concept of a chain ID) and block deployed at for verifiability and better package management.

@chriseth
Contributor Author

"block deployed at" cannot be part of the metadata, because the metadata has to be a compile-time constant. The current plan is to fire a log at construction time, but the clients do not index events as well as they could, so that might be hard to find. Perhaps we should put the "block deployed at" in the deployed bytecode in addition to firing an event.

@VoR0220
Contributor

VoR0220 commented Nov 11, 2016

That works excellently. Let's make that happen.

@axic axic mentioned this pull request Nov 16, 2016
10 tasks
@chriseth
Contributor Author

Continuing as #1387

@chriseth chriseth closed this Nov 16, 2016
axic pushed a commit that referenced this pull request Nov 20, 2018
eip-165: when the first loop hits BEGINDATA, the second loop should still run
axic pushed a commit to ipsilon/solidity that referenced this pull request Apr 22, 2025
4 participants