Skip to content

Error out when contracts collide on name#1397

Merged
chriseth merged 20 commits intoargotorg:developfrom
roadriverrail:contract_collision
Jan 18, 2017
Merged

Error out when contracts collide on name#1397
chriseth merged 20 commits intoargotorg:developfrom
roadriverrail:contract_collision

Conversation

@roadriverrail
Copy link
Contributor

The previous behavior, courtesy of the [] operator in std::map, would
uncritically store a new ContractDefinition in m_contracts even when a
ContractDefinition already existed. This "resolved" collissions on contract
names by clobbering the original one with the new one, and could lead to
scenarios where the clobber would only be discovered when the original
ContractDefinition could not be found or referred to, which was an unhelpful
InternalCompilerError.

This change checks the m_contracts map for a collision first and will not let
the ContractDefinition be changed to a new one once it's set, throwing a
CompilerError with information about the conflict.

Fixes issue #1120

@roadriverrail
Copy link
Contributor Author

Some things to consider before merging--

  1. This "breaks" four tests. Two of them are broken because there appears to be a presumption that CompilerStack::compile() will casually return a boolean showing if the code will compile or not. To show good error messages, I throw a CompilerError, which means that, for two of these tests, we're "passing" but by throwing an exception and not by returning false. This should be an easy thing to correct.

  2. At least one of the other two tests broken by this change are broken because the test presumes that libraries and contracts don't collide when they have the same name. I'm not sure if this is really desirable behavior. See also conversation on issue Equally named contracts cause internal compiler error #1120

if (m_contracts.find(contract->name()) != m_contracts.end())
{
const ContractDefinition* existingContract = m_contracts.find(contract->name())->second.contract;
if (contract != existingContract)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying to keep TAB-only indentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swear that I know this. I'm going to have to colorize my tabs so I don't keep doing this.

if (!resolver.resolveNamesAndTypes(*contract)) return false;
if (m_contracts.find(contract->name()) != m_contracts.end())
{
const ContractDefinition* existingContract = m_contracts.find(contract->name())->second.contract;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the const.

if (contract != existingContract)
BOOST_THROW_EXCEPTION(CompilerError() <<
errinfo_sourceLocation(contract->location()) <<
errinfo_comment(contract->name() + " conflicts with!!! contract at "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess !!! was for debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, while I was sorting some test case stuff out. I can't believe I didn't see that. No more commits before bed for me.

if (contract != existingContract)
BOOST_THROW_EXCEPTION(CompilerError() <<
errinfo_sourceLocation(contract->location()) <<
errinfo_comment(contract->name() + " conflicts with contract at "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verb "conflicts" is a bit vague. Maybe something like Contract name is already defined at ... is more direct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger that.

@chriseth
Copy link
Contributor

The planned proper fix would be to disambiguate such cases through the file name - this should also help in the case of libraries. Do you think that this would fix all the cases you have seen?

@roadriverrail
Copy link
Contributor Author

Could you explain in more detail what you mean by "disambiguate such cases through the file name"? Maybe offer a small example?

@chriseth
Copy link
Contributor

If you want to refer to a contract, you should be able to do it by FileName:ContractName. Contracts and libraries with the same name should only be disallowed in the same file.

Copy link
Contributor

@chriseth chriseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, thanks!

}


std::string ContractDefinition::fullyQualifiedName() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop the std:: prefixes in cpp files.


std::string ContractDefinition::fullyQualifiedName() const
{
std::string sourceString = *(location().sourceName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declaration, which should be a base class for this one, already has a function sourceUnitName() which does a similar thing, but based on scopes and not on source locations.
Can you add a function fullyQualifiedName to the Declaration class and return something like sourceUnitName() + ":" + name()? I think the : should always be present, even if the source unit name is empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into this. FYI, the reason I dropped the : when the source unit name is not present is because it breaks a bunch of tests that expect to look up contracts by their name. It's really just busy work to fix, but I also opted for the path-of-least-edits for a first go-round.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is actually a different thing. In the tests, the source file is not named. I would say that it is always possible to do a qualified lookup and in addition, you can do an unqualified lookup if the name is unique, which would guarantee backwards-compatibility.

m_contracts[contract->name()].contract = contract;
if (m_contracts.find(contract->fullyQualifiedName()) != m_contracts.end())
{
const ContractDefinition* existingContract = m_contracts.find(contract->fullyQualifiedName())->second.contract;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use m_contracts[contract->fullyQualifiedName()] here? Also please put the const after the type name: ContractDefinition const*

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. It's safe to use [] by that point. Also, I appreciate the style nits; y'all just use a different style than I've seen before (though I'm far more a C guy than a C++ guy)


if (contract != existingContract)
{
auto err = make_shared<Error>(Error::Type::DeclarationError);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to trigger this? I think scoping rules should not allow it. If you cannot trigger it, please change it into a solAssert().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually put that in there because I can trigger it. It came up fairly regularly, I think possibly having to do with different phases of compilation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you tell me a test that triggers it? I would like to investigate.

Copy link
Contributor Author

@roadriverrail roadriverrail Dec 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ContractsCollision1.sol:

pragma solidity ^0.4.0;

contract Deed {
}

contract Registrar {
    function newBid(bytes32 sealedBid) payable {
        Deed newBid = new Deed();
    } 
}

ContractsCollision2.sol:

pragma solidity ^0.4.0;


contract Deed {
}

contract Registrar {
}

contract NoCollision {
}

If you drop something like this right before the if condition we're discussing:

          if (contract == existingContract)
          {
            cerr << "contract: " << contract->fullyQualifiedName() << " existingContract : " << existingContract->fullyQualifiedName() << endl;
          }

Then you should see each contract listed once. It happened so regularly to me that I just assumed it was part of some two-phase compilation process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it, all contracts in your example have different fully qualified names, so we should not get to line 226.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, they do all have different fully-qualified names. My intuition, however, is that somehow or another this code path ends up getting run twice for each contract. The first time causes each contract to get added, but then the hashmap is built up, and so the second time, each contract "collides" with itself once. I didn't know why at the time (even just writing some of this code gave me more comfort with how solc works internally), but like I said, it was almost like the parsing process happened twice. Before I put in this stuff that actually checks for collision between insert, running through this code path twice was a non-issue, since all that happened was an insert, and inserting the same contract a second time wasn't going to cause any issues. But now that collisions are detected before the insert, it becomes more apparent.

It should be pretty easy to figure out each contract is being passed through this function twice, and where. I can take on that investigative effort if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. It looks like what's happening is that there were two places in this function where contracts get added to the hashmap. The first one is back around line 176 or so. I replaced both cases of the insertion into the hashmap with a collision detector, because I naively believed there was a reason for both to exist; both of those inserts exist in the context of slightly different loop control logic. I'm not sure what really differentiates them, though; it might be one isn't actually necessary.


}
else
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No braces needed here.

}
// If no collision, return the contract's name
// String is copied to ensure that the contract's name can't be messed with
return std::string(matchContract.contract->name());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No explicit conversion needed here.

}
}
// If no collision, return the contract's name
// String is copied to ensure that the contract's name can't be messed with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please indent.

{
if (m_args.count("output-dir"))
createFile(_contract + ".bin", m_compiler->object(_contract).toHex());
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use braces.

}
}
else
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another pair of braces to remove.

Copy link
Contributor Author

@roadriverrail roadriverrail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responses included; anything without a response will be corrected as requested.

@roadriverrail
Copy link
Contributor Author

Did a stylistic cleanup and incorporated all the proposed changes, with the exception of the ones from @chriseth regarding my guard code to prevent double-adding a contract in CompilerStack.cpp (see conversation above).

I'm also going to fix up the tests so they use fully qualified names for looking up contracts.

CompilerStack c;
c.addSource("a", "library A {} pragma solidity >=0.0;");
c.addSource("b", "library A {} pragma solidity >=0.0;");
c.addSource("a", "library A {} library A {} pragma solidity >=0.0;");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, the test does not test the intended feature anymore. There is a separate module that checks for library name clashes. This might be something we need to address here, too: Clashes in the linker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, the libraries are actually internally called "a:A" and "b:A" and would be referenced as import A from "a" and import A from "b". The only way I could think to cause clashes would be to introduce a third source with imports the two of them (both as A), but I am presuming, perhaps incorrectly, that the compiler would notice this and give an error before getting to a linker. So, I'm not completely sure how to set up this test correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of what they will be called, this contract will produce an error in the type checker and not in the linker. Can you please also check how the library is called in the unlinked bytecode? There should be something like __A______ or __a:A____.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In unlinked bytecode, it's currently in the __A_______ format, because I left the codegen stuff using name( ) instead of fullyQualifiedName( ). The code that prevents library name collisions is in CompilerStack.cpp and uses fullyQualifiedName( ), hence the reason the test fails. So, one of two things probably needs to happen-- the linker symbols start using fully qualified names or the library clash checker goes back to using name( ).

I'm somewhat more fond of the first of those two ideas, so I've actually already written it up but haven't pushed it (since I don't want to update the PR). Two issues there. The first is people's existing linker map files will need to be updated. The second is that it looks like there's a 36-character limit on link references; if that's the case, I'd love to know how we can truncate starting at the tail rather than the head, so we preserve the :A part and as much path as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I would also prefer to use fully-qualified names for the linker. In the future, libraries will be linked differently and you do not have the 36 character limit. Also, the format of using __A____ is only a semi-backwards-compatible hack. You should really use solc --libraries my/file/name.sol:A=0x123 my/contract.sol for compiling, and there, there is obviously not such a small size limit.

If you want, you can merge this pull request and do the library handling in another one.

if (!resolver.resolveNamesAndTypes(*contract)) return false;
m_contracts[contract->name()].contract = contract;

if (m_contracts.find(contract->fullyQualifiedName()) != m_contracts.end())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this and the other place again: The compiler returns an error, which means it detects the double declaration, but the aim for the compiler is to still continue as much as possible even if there was an error.

If you leave this code in, the error will be reported twice. So please change this (and the other place) to silently drop the second contract and not store it in m_contracts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the double-reporting of the error. Actually, it's worth noting that the only way to cause a collision now is to have the same contract twice in one file, which is actually covered under another error already, anyway. So, I think you're right that it's actually okay to simply silently continue without adding the contract since it's covered by the other error.

I will, however, be leaving comments explaining what's going on in the code, because it feels like the sort of thing that a future developer could trip over.

Copy link
Contributor

@chriseth chriseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only remaining issue is whether this modified the linker or not and whether we have to change the test accordingly or not.

@roadriverrail
Copy link
Contributor Author

roadriverrail commented Dec 22, 2016 via email

Copy link
Contributor

@chriseth chriseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you could change the query functions to be backwards-compatible, so that e.g. the linker also works without having to prefix the library name by path:. If it is backwards-compatible, no changes to the tests should be necessary.

{
if (m_args.count(g_argOutputDir))
createFile(_contract + ".bin", m_compiler->object(_contract).toHex());
if (m_args.count("output-dir"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change here on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It looks like something left over from merging up that shouldn't be that way. Will fix.

@chriseth
Copy link
Contributor

chriseth commented Jan 2, 2017

Please also add an entry into Changelog.md.

@roadriverrail
Copy link
Contributor Author

Apologies on the delay; transition in my day job to blame. Fixed the most critical parts of this. Will look at the query process tomorrow to add a fallback like you asked.

@chriseth
Copy link
Contributor

chriseth commented Jan 9, 2017

No worries! Do you still plan to make it backwards-compatible?

@roadriverrail
Copy link
Contributor Author

roadriverrail commented Jan 9, 2017 via email

The previous behaviour, courtesy of the [] operator in std::map, would
uncritically store a new ContractDefinition in m_contracts even when a
ContractDefinition already existed.  This "resolved" collissions on contract
names by clobbering the original one with the new one, and could lead to
scenarios where the clobber would only be discovered when the original
ContractDefinition could not be found or referred to, which was an unhelpful
InternalCompilerError.

This change checks the m_contracts map for a collision first and will not let
the ContractDefinition be changed to a new one once it's set, throwing a
CompilerError with information about the conflict.
@chriseth had suggested that it would be better if contracts were
referenced in a file:contract notation, and that we output .bin files
that prepend original path names if necessary to avoid a collision.

This commit is mostly a draft; it still needs to be run through the test
suite.
A large number of tests compile contracts while passing in an empty
string for the source name.  This leads to it being keyed by the name
":<contract>", while the tests try to look it up under the name
"<contract>".  This change resolves that issue by dropping the ':' in
cases where there is, effectively, no source file to prepend anyway.
roadriverrail and others added 14 commits January 16, 2017 12:32
Throwing a CompilerError on multiple contract definition violates the
expectations of the test suite, which thinks that compile() will
return false if the code can't compile.  This brings contract
collision reporting in line with most of the other errors.
Since contracts and libraries only collide if they share a common
source file now, this test only works if both libraries are in the
same source.
The fully-qualified name of a contract with no source unit is :<Name>
instead of just <Name>, so the test system needed to be adjusted
accordingly.
Because contracts are uniquely identified by their source unit, there
is no need for a unique error for this; it's actually covered by the
checker for double-declaration of identifiers.
Using libraries leaves behind a library link reference in the binary
which the linker must later resolve.  These link references were still
being generated by name and not by fully-qualified name.  This would
lead to a link-time collision between two libraries having the same
name but in different source units.

This change changes linker symbols over to fully-qualified names,
which resolves that issue.  This does potentially introduce a new
problem, which is that linker symbols appear to be limited to 36
characters and are truncated.  Storing paths extends the average
symbol size, and it would be great if truncation was from the tail
rather than the head.
The library name clash checker throws errors when two libraries of the
same name are spotted.  In a previous commit, this function was
rewritten to use fully-qualified names instead, which makes it redundant
to the checker for multiply-declared identifiers.  Since it no longer
serves a clear purpose, the function is being dropped.
Since libaraies no longer collide on name but on fully-qualified name,
you can only induce collision by colliding them as idenfitiers.
Looks like merging up munged line 188 in CommandLineInterface.cpp,
so that a string literal was being used where a global variable should
be.
Properly, contracts are now looked up via <source>:<contract> identifiers
called "fully qualified names."  As a modicum of backward-compatibility,
failure on a lookup is now backed up by seeing if the ":" exists at all,
and if it doesn't, then the known contracts are scanned for any matching
contract name.
@roadriverrail
Copy link
Contributor Author

Rebased, added, the fallback, and reverted the changes to the tests. I'm not super stoked about the fallback, mostly because somone may have really meant "A:B," but the lookup is for just "B," and the lookup could return "C:B" instead. Again, though, that might live in a world of theory that exists mostly in my head.

@chriseth
Copy link
Contributor

Nice, thanks a lot! There are some tiny things, which I will just do after merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments