Hash linker #5145

Merged
merged 7 commits into from
Oct 12, 2018
1 change: 1 addition & 0 deletions Changelog.md
@@ -17,6 +17,7 @@ Breaking Changes:
* Commandline interface: Remove obsolete ``--formal`` option.
* Commandline interface: Rename the ``--julia`` option to ``--yul``.
* Commandline interface: Require ``-`` if standard input is used as source.
* Commandline interface: Use hash of library name for link placeholder instead of name itself.
* Compiler interface: Disallow remappings with empty prefix.
* Control Flow Analyzer: Consider mappings as well when checking for uninitialized return values.
* Control Flow Analyzer: Turn warning about returning uninitialized storage pointers into an error.
8 changes: 8 additions & 0 deletions docs/050-breaking-changes.rst
@@ -158,6 +158,14 @@ Command Line and JSON Interfaces
node was replaced by a field called ``kind`` which can have the
value ``"constructor"``, ``"fallback"`` or ``"function"``.

* In unlinked binary hex files, library address placeholders are now
the first 34 hex characters of the keccak256 hash of the fully qualified
library name, surrounded by ``$...$``. Previously,
just the fully qualified library name was used.
This reduces the chances of collisions, especially when long paths are used.
Binary files now also contain a list of mappings from these placeholders
to the fully qualified names.
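
The placeholder layout described in this entry can be sketched as follows. This is a minimal illustration, not the compiler's code: the function name ``placeholderFromHash`` is hypothetical, and the keccak256 hash is assumed to be computed elsewhere and passed in as 64 lowercase hex digits.

```cpp
#include <cassert>
#include <string>

// Builds the 40-character placeholder that stands in for a 20-byte library
// address inside unlinked hex bytecode.
// Assumption: `hashHex` is the 64-digit hex encoding of
// keccak256(fully qualified library name), computed by the caller.
std::string placeholderFromHash(std::string const& hashHex)
{
    assert(hashHex.size() == 64);
    // 2 ("__") + 1 ("$") + 34 hash digits + 1 ("$") + 2 ("__") = 40 characters
    return "__$" + hashHex.substr(0, 34) + "$__";
}
```

Keeping the total at 40 characters means the placeholder occupies exactly the space of the hex-encoded address that later replaces it.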

Constructors
------------

18 changes: 15 additions & 3 deletions docs/using-the-compiler.rst
@@ -41,14 +41,26 @@ If there are multiple matches due to remappings, the one with the longest common

For security reasons the compiler has restrictions on which directories it can access. Paths (and their subdirectories) of source files specified on the commandline and paths defined by remappings are allowed for import statements, but everything else is rejected. Additional paths (and their subdirectories) can be allowed via the ``--allow-paths /sample/path,/another/sample/path`` switch.

If your contracts use :ref:`libraries <libraries>`, you will notice that the bytecode contains substrings of the form ``__LibraryName______``. You can use ``solc`` as a linker meaning that it will insert the library addresses for you at those points:
If your contracts use :ref:`libraries <libraries>`, you will notice that the bytecode contains substrings of the form ``__$53aea86b7d70b31448b230b20ae141a537$__``. These are placeholders for the actual library addresses.
The placeholder is a 34 character prefix of the hex encoding of the keccak256 hash of the fully qualified library name.
The bytecode file will also contain lines of the form ``// <placeholder> -> <fq library name>`` at the end to help
identify which libraries the placeholders represent. Note that the fully qualified library name
is the path of its source file and the library name separated by ``:``.
You can use ``solc`` as a linker meaning that it will insert the library addresses for you at those points:

Either add ``--libraries "Math:0x12345678901234567890 Heap:0xabcdef0123456"`` to your command to provide an address for each library or store the string in a file (one library per line) and run ``solc`` using ``--libraries fileName``.
Either add ``--libraries "file.sol:Math:0x1234567890123456789012345678901234567890 file.sol:Heap:0xabCD567890123456789012345678901234567890"`` to your command to provide an address for each library or store the string in a file (one library per line) and run ``solc`` using ``--libraries fileName``.
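
As a rough sketch of how one ``--libraries`` entry in the new format breaks down: the struct and function names below are hypothetical, and splitting at the last colon is an assumption made because the fully qualified name itself contains a colon. Checksum validation is deliberately left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result type for this sketch.
struct LibrarySpec
{
    bool valid = false;
    std::string name;     // fully qualified name, e.g. "file.sol:Math"
    std::string address;  // 40 hex digits, without the "0x" prefix
};

// Parses "<fully qualified name>:<address>"; the address may start with "0x".
LibrarySpec parseLibrarySpec(std::string const& spec)
{
    LibrarySpec result;
    // Assumption: split at the LAST colon, since the name contains one too.
    std::string::size_type colon = spec.rfind(':');
    if (colon == std::string::npos)
        return result;
    std::string name = spec.substr(0, colon);
    std::string address = spec.substr(colon + 1);
    if (address.compare(0, 2, "0x") == 0)
        address = address.substr(2);
    // An address is 20 bytes, i.e. exactly 40 hex digits.
    if (name.empty() || address.size() != 40)
        return result;
    for (char c: address)
        if (!std::isxdigit(static_cast<unsigned char>(c)))
            return result;
    result.valid = true;
    result.name = name;
    result.address = address;
    return result;
}
```

With this sketch, the shortened addresses from the old documentation example (``Math:0x12345678901234567890``) are rejected because they are not 40 hex digits long.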

If ``solc`` is called with the option ``--link``, all input files are interpreted to be unlinked binaries (hex-encoded) in the ``__LibraryName____``-format given above and are linked in-place (if the input is read from stdin, it is written to stdout). All options except ``--libraries`` are ignored (including ``-o``) in this case.
If ``solc`` is called with the option ``--link``, all input files are interpreted to be unlinked binaries (hex-encoded) in the ``__$53aea86b7d70b31448b230b20ae141a537$__``-format given above and are linked in-place (if the input is read from stdin, it is written to stdout). All options except ``--libraries`` are ignored (including ``-o``) in this case.
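
The linking step can be pictured roughly as below. This is a simplified sketch, not the code in ``CommandLineInterface::link()``: ``linkHex`` is a hypothetical name, the map is assumed to go from the full 40-character placeholder to a 40-digit hex address, and unresolved references are simply left in place.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Replaces each 40-character placeholder with its 40-digit hex address and
// drops the "// <identifier> -> <name>" hint lines of resolved libraries.
std::string linkHex(std::string const& input, std::map<std::string, std::string> const& addresses)
{
    for (auto const& entry: addresses)
        assert(entry.first.size() == 40 && entry.second.size() == 40);

    std::istringstream lines(input);
    std::string line;
    std::string output;
    while (std::getline(lines, line))
    {
        // Hint lines carry the identifier without the surrounding "__".
        bool resolvedHint = false;
        for (auto const& entry: addresses)
            if (line.compare(0, 3, "// ") == 0 && line.find(entry.first.substr(2, 36) + " -> ") == 3)
                resolvedHint = true;
        if (resolvedHint)
            continue;

        for (auto const& entry: addresses)
            for (
                std::string::size_type pos = line.find(entry.first);
                pos != std::string::npos;
                pos = line.find(entry.first, pos + 40)
            )
                line.replace(pos, 40, entry.second);
        output += line + "\n";
    }
    // Strip trailing newlines left over after removing hints.
    while (!output.empty() && output.back() == '\n')
        output.pop_back();
    return output;
}
```

Because placeholder and address have the same length, the replacement never shifts any other byte offsets in the hex string.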

If ``solc`` is called with the option ``--standard-json``, it will expect a JSON input (as explained below) on the standard input, and return a JSON output on the standard output. This is the recommended interface for more complex and especially automated uses.

.. note::
The library placeholder used to be the fully qualified name of the library itself
instead of the hash of it. This format is still supported by ``solc --link`` but
the compiler will no longer output it. This change was made to reduce
the likelihood of a collision between libraries, since only the first 36 characters
of the fully qualified library name could be used.
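
To illustrate the collision the note refers to, the old name-based placeholder can be reconstructed as in this sketch (``legacyPlaceholder`` is a hypothetical name; the crop-or-pad rule follows the description in the note). Two libraries whose fully qualified names share their first 36 characters end up with the same placeholder.

```cpp
#include <cassert>
#include <string>

// Old-style placeholder: "__", then the library name cropped to 36 characters
// or padded on the right with '_', then "__".
std::string legacyPlaceholder(std::string const& name)
{
    std::string body = name.substr(0, 36);
    body.resize(36, '_');
    std::string placeholder = "__" + body + "__";
    assert(placeholder.size() == 40);
    return placeholder;
}
```

For example, the made-up names ``contracts/very/long/directory/path/to/A.sol:Math`` and ``contracts/very/long/directory/path/to/B.sol:Math`` differ only after character 36, so both map to the same legacy placeholder.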

.. _evm-version:
.. index:: ! EVM version, compile target

10 changes: 5 additions & 5 deletions libdevcore/CommonData.cpp
@@ -76,18 +76,18 @@ bytes dev::fromHex(std::string const& _s, WhenError _throw)

bool dev::passesAddressChecksum(string const& _str, bool _strict)
{
string s = _str.substr(0, 2) == "0x" ? _str.substr(2) : _str;
string s = _str.substr(0, 2) == "0x" ? _str : "0x" + _str;

if (s.length() != 40)
if (s.length() != 42)
return false;

if (!_strict && (
_str.find_first_of("abcdef") == string::npos ||
_str.find_first_of("ABCDEF") == string::npos
s.find_first_of("abcdef") == string::npos ||
s.find_first_of("ABCDEF") == string::npos
))
return true;

return _str == dev::getChecksummedAddress(_str);
return s == dev::getChecksummedAddress(s);
}

string dev::getChecksummedAddress(string const& _addr)
2 changes: 1 addition & 1 deletion libdevcore/CommonIO.cpp
@@ -94,7 +94,7 @@ void dev::writeFile(std::string const& _file, bytesConstRef _data, bool _writeDe
{
// create directory if not existent
fs::path p(_file);
if (!fs::exists(p.parent_path()))
if (!p.parent_path().empty() && !fs::exists(p.parent_path()))
{
fs::create_directories(p.parent_path());
try
10 changes: 8 additions & 2 deletions libevmasm/LinkerObject.cpp
@@ -21,6 +21,7 @@

#include <libevmasm/LinkerObject.h>
#include <libdevcore/CommonData.h>
#include <libdevcore/SHA3.h>

using namespace dev;
using namespace dev::eth;
@@ -50,14 +51,19 @@ string LinkerObject::toHex() const
for (auto const& ref: linkReferences)
{
size_t pos = ref.first * 2;
string const& name = ref.second;
string hash = libraryPlaceholder(ref.second);
hex[pos] = hex[pos + 1] = hex[pos + 38] = hex[pos + 39] = '_';
for (size_t i = 0; i < 36; ++i)
hex[pos + 2 + i] = i < name.size() ? name[i] : '_';
hex[pos + 2 + i] = hash.at(i);
}
return hex;
}

string LinkerObject::libraryPlaceholder(string const& _libraryName)
{
return "$" + keccak256(_libraryName).hex().substr(0, 34) + "$";
}

h160 const*
LinkerObject::matchLibrary(
string const& _linkRefName,
5 changes: 5 additions & 0 deletions libevmasm/LinkerObject.h
@@ -50,6 +50,11 @@ struct LinkerObject
/// addresses by placeholders.
std::string toHex() const;

/// @returns a 36 character string that is used as a placeholder for the library
/// address (enclosed by `__` on both sides). The placeholder is `$`, followed by the
/// hex representation of the first 17 bytes of the keccak-256 hash of @a _libraryName,
/// followed by `$`.
static std::string libraryPlaceholder(std::string const& _libraryName);

private:
static h160 const* matchLibrary(
std::string const& _linkRefName,
56 changes: 48 additions & 8 deletions solc/CommandLineInterface.cpp
@@ -226,21 +226,21 @@ void CommandLineInterface::handleBinary(string const& _contract)
if (m_args.count(g_argBinary))
{
if (m_args.count(g_argOutputDir))
createFile(m_compiler->filesystemFriendlyName(_contract) + ".bin", m_compiler->object(_contract).toHex());
createFile(m_compiler->filesystemFriendlyName(_contract) + ".bin", objectWithLinkRefsHex(m_compiler->object(_contract)));
else
{
cout << "Binary: " << endl;
cout << m_compiler->object(_contract).toHex() << endl;
cout << objectWithLinkRefsHex(m_compiler->object(_contract)) << endl;
}
}
if (m_args.count(g_argBinaryRuntime))
{
if (m_args.count(g_argOutputDir))
createFile(m_compiler->filesystemFriendlyName(_contract) + ".bin-runtime", m_compiler->runtimeObject(_contract).toHex());
createFile(m_compiler->filesystemFriendlyName(_contract) + ".bin-runtime", objectWithLinkRefsHex(m_compiler->runtimeObject(_contract)));
else
{
cout << "Binary of the runtime part: " << endl;
cout << m_compiler->runtimeObject(_contract).toHex() << endl;
cout << objectWithLinkRefsHex(m_compiler->runtimeObject(_contract)) << endl;
}
}
}
@@ -482,9 +482,23 @@ bool CommandLineInterface::parseLibraryOption(string const& _input)
string addrString(lib.begin() + colon + 1, lib.end());
boost::trim(libName);
boost::trim(addrString);
if (addrString.substr(0, 2) == "0x")
addrString = addrString.substr(2);
if (addrString.empty())
{
cerr << "Empty address provided for library \"" << libName << "\": " << endl;
cerr << "Note that there should not be any whitespace after the colon." << endl;
return false;
}
else if (addrString.length() != 40)
{
cerr << "Invalid length for address for library \"" << libName << "\": " << addrString.length() << " instead of 40 characters." << endl;
return false;
}
if (!passesAddressChecksum(addrString, false))
{
cerr << "Invalid checksum on library address \"" << libName << "\": " << addrString << endl;
cerr << "Invalid checksum on address for library \"" << libName << "\": " << addrString << endl;
cerr << "The correct checksum is " << dev::getChecksummedAddress(addrString) << endl;
return false;
}
bytes binAddr = fromHex(addrString);
@@ -569,7 +583,7 @@ Allowed options)",
g_argLibraries.c_str(),
po::value<vector<string>>()->value_name("libs"),
"Direct string or file containing library addresses. Syntax: "
"<libraryName>: <address> [, or whitespace] ...\n"
"<libraryName>:<address> [, or whitespace] ...\n"
"Address is interpreted as a hex string optionally prefixed by 0x."
)
(
@@ -1056,8 +1070,12 @@ bool CommandLineInterface::link()
{
string const& name = library.first;
// Library placeholders are 40 hex digits (20 bytes) that start and end with '__'.
// This leaves 36 characters for the library name, while too short library names are
// padded on the right with '_' and too long names are truncated.
// This leaves 36 characters for the library identifier. The identifier used to
// be just the cropped or '_'-padded library name, but this changed to
// the cropped hex representation of the hash of the library name.
// We support both ways of linking here.
librariesReplacements["__" + eth::LinkerObject::libraryPlaceholder(name) + "__"] = library.second;

string replacement = "__";
for (size_t i = 0; i < placeholderSize - 4; ++i)
replacement.push_back(i < name.size() ? name[i] : '_');
@@ -1087,6 +1105,11 @@ bool CommandLineInterface::link()
cerr << "Reference \"" << name << "\" in file \"" << src.first << "\" still unresolved." << endl;
it += placeholderSize;
}
// Remove hints for resolved libraries.
for (auto const& library: m_libraries)
boost::algorithm::erase_all(src.second, "\n" + libraryPlaceholderHint(library.first));
while (!src.second.empty() && *prev(src.second.end()) == '\n')
src.second.resize(src.second.size() - 1);
}
return true;
}
@@ -1100,6 +1123,23 @@ void CommandLineInterface::writeLinkedFiles()
writeFile(src.first, src.second);
}

string CommandLineInterface::libraryPlaceholderHint(string const& _libraryName)
{
return "// " + eth::LinkerObject::libraryPlaceholder(_libraryName) + " -> " + _libraryName;
}

string CommandLineInterface::objectWithLinkRefsHex(eth::LinkerObject const& _obj)
{
string out = _obj.toHex();
if (!_obj.linkReferences.empty())
{
out += "\n";
for (auto const& linkRef: _obj.linkReferences)
out += "\n" + libraryPlaceholderHint(linkRef.second);
}
return out;
}

bool CommandLineInterface::assemble(
AssemblyStack::Language _language,
AssemblyStack::Machine _targetMachine
4 changes: 4 additions & 0 deletions solc/CommandLineInterface.h
@@ -54,6 +54,10 @@ class CommandLineInterface
private:
bool link();
void writeLinkedFiles();
/// @returns the ``// <identifier> -> name`` hint for library placeholders.
static std::string libraryPlaceholderHint(std::string const& _libraryName);
/// @returns the full object with library placeholder hints in hex.
static std::string objectWithLinkRefsHex(eth::LinkerObject const& _obj);

bool assemble(AssemblyStack::Language _language, AssemblyStack::Machine _targetMachine);

18 changes: 18 additions & 0 deletions test/cmdlineTests.sh
@@ -233,6 +233,24 @@ echo '' | "$SOLC" - --link --libraries a:0x90f20564390eAe531E810af625A22f51385Cd
printTask "Testing long library names..."
echo '' | "$SOLC" - --link --libraries aveeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeerylonglibraryname:0x90f20564390eAe531E810af625A22f51385Cd222 >/dev/null

printTask "Testing linking itself..."
SOLTMPDIR=$(mktemp -d)
(
cd "$SOLTMPDIR"
set -e
echo 'library L { function f() public pure {} } contract C { function f() public pure { L.f(); } }' > x.sol
"$SOLC" --bin -o . x.sol 2>/dev/null
# Explanation and placeholder should be there
grep -q '//' C.bin && grep -q '__' C.bin
# But not in library file.
grep -q -v '[/_]' L.bin
# Now link
"$SOLC" --link --libraries x.sol:L:0x90f20564390eAe531E810af625A22f51385Cd222 C.bin
# Now the placeholder and explanation should be gone.
grep -q -v '[/_]' C.bin
)
rm -rf "$SOLTMPDIR"

printTask "Testing overwriting files..."
SOLTMPDIR=$(mktemp -d)
(
2 changes: 1 addition & 1 deletion test/libevmasm/Assembler.cpp
@@ -94,7 +94,7 @@ BOOST_AUTO_TEST_CASE(all_assembly_items)

BOOST_CHECK_EQUAL(
_assembly.assemble().toHex(),
"5b6001600220606773__someLibrary___________________________"
"5b6001600220606773__$bf005014d9d0f534b8fcb268bd84c491a2$__"
"6000567f556e75736564206665617475726520666f722070757368696e"
"6720737472696e605f6001605e73000000000000000000000000000000000000000000fe"
"fe010203044266eeaa"