Skip to content

Suggest alternatives when identifier not found. Closes #3058.#3147

Merged
chriseth merged 8 commits intoargotorg:developfrom
Balajiganapathi:alternative_scope
Feb 13, 2018
Merged

Suggest alternatives when identifier not found. Closes #3058.#3147
chriseth merged 8 commits intoargotorg:developfrom
Balajiganapathi:alternative_scope

Conversation

@Balajiganapathi
Copy link
Contributor

@Balajiganapathi Balajiganapathi commented Oct 29, 2017

We can use the following method to compute alternatives:

  1. When an identifier is not found, look through current and enclosing scopes and compare each declaration name with the identifier name
  2. To decide whether two names are similar, calculate the Damerau–Levenshtein distance. This metric considers insertion, deletion, substitution and transposition as single operations
  3. If the DL distance is <= 2 and it is less than the lengths of the name they can be considered similar.

For #3058.

@Balajiganapathi Balajiganapathi changed the title Suggest alternatives when identifier not found. For #3058. Suggest alternatives when identifier not found. Closes #3058. Oct 29, 2017
@axic
Copy link
Contributor

axic commented Oct 29, 2017

@Balajiganapathi thanks these are great! Will do an individual review, but two small mentions (applies to all PRs): no need to use namespace specifier std as it is imported and please add an entry to the changelog in every case a bug is fixed (no need to include bug issue number) or a feature is added.

@axic
Copy link
Contributor

axic commented Oct 29, 2017

Please include a test for inheritance (suggested name in the inheritance tree).

if (m_enclosingContainer)
{
std::vector<ASTString> enclosingSimilar = m_enclosingContainer->similarNames(_name);
similar.insert(similar.end(), enclosingSimilar.begin(), enclosingSimilar.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder which is faster. The current way or doing the parent first and the push_front to extend that vector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is reasonably fast. push_front on vector is quite slow (though we can use list/deque to get around that). Inserting at end like here is efficient.

It can be made slightly faster by avoiding creation of multiple vectors. We can create a single vector and pass it by reference as a parameter to the function. Or we can use a list and use splice to append at end. But I am not sure it will result in any substantial gain.

PS: Thanks for the tips :) will remove std:: and add change log to all my PRs.

CHECK_ERROR(sourceCode, DeclarationError, "Undeclared identifier.");

sourceCode = R"(
contract c {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation on these looks a bit off (one extra level of indentation).

@federicobond
Copy link
Contributor

Looks really good! Great work @Balajiganapathi

@axic
Copy link
Contributor

axic commented Oct 30, 2017

@Balajiganapathi just a heads up: we'll be busy with devcon this week and it is likely that your changes won't be properly addressed this week.

@Balajiganapathi
Copy link
Contributor Author

@axic I have made the suggested changes to this (and my 3 other) PRs. Waiting for these PRs to be reviewed and merged before taking up more tasks as I don't want to keep repeating the same mistakes :)

return similar;
}

bool DeclarationContainer::areSimilarNames(ASTString const& name1, ASTString const& name2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this to libdevcore and name it something like stringWithinDistance(name1, name2, distance). It can be later reused for showing potential names for visibility specifiers, etc.

function g() public { fun(); }
}
)";
CHECK_ERROR(sourceCode, DeclarationError, "Undeclared identifier. Did you mean func?");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you enclose the suggestions in quotes - that is how other messages are? eg "func", "func1", "func2" or "func3"

@axic
Copy link
Contributor

axic commented Nov 17, 2017

@Balajiganapathi will you have some time before next week to check these changes out or should I push changes to this branch?

@Balajiganapathi
Copy link
Contributor Author

@axic had a busy week so could not work on it. Will make these changes over this weekend.
Also, can you tell me where exactly to put the stringWithinDistance in libdevcore? I am not able to decide.

@Balajiganapathi
Copy link
Contributor Author

None of the existing files in libdevcore seems to fit. Shall I make a new StringUtils source file for this?

@axic
Copy link
Contributor

axic commented Nov 17, 2017

Shall I make a new StringUtils source file for this?

I think it sounds fine, could be more specific too if you find a good name (something along the lines of string similarity).

Please add a specific test for the distance module in test/libdevcore (just name the file the same as the implementation file name).

@Balajiganapathi
Copy link
Contributor Author

Done @axic

@axic axic force-pushed the alternative_scope branch 4 times, most recently from 960e853 to 22c7b6c Compare November 21, 2017 11:01
std::map<ASTString, std::vector<Declaration const*>> m_declarations;
std::map<ASTString, std::vector<Declaration const*>> m_invisibleDeclarations;

static size_t const MAXIMUM_DISTANCE = 2;
Copy link
Contributor

@axic axic Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fully sold we should have this constant here, but if we do the name should reflect what it does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it should just be a constant inside the similarNames function.

@axic axic force-pushed the alternative_scope branch from 22c7b6c to a32ad20 Compare November 21, 2017 11:04
@axic
Copy link
Contributor

axic commented Nov 21, 2017

Rebased and made some tiny updates to the branch.

@axic axic force-pushed the alternative_scope branch from a32ad20 to bc799b1 Compare November 21, 2017 11:06
using namespace std;
using namespace dev;

namespace dev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Please don't use namespace regions in cpp files.

size_t n1 = _name1.size(), n2 = _name2.size();
vector<vector<size_t>> dp(n1 + 1, vector<size_t>(n2 + 1));

// In this dp formulation of Damerau–Levenshtein distance we are assuming that the strings are 1-based to make base case storage easier.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems rather complicated compared to other implementations: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C.2B.2B

Especially the two-dimensional vector could be rather heap-heavy, especially given that this routine will be called on every identifier that is in scope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the license on the wikibook examples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chriseth the linked example does not consider transpositions. e.g. for "test", "tset" would be considered as distance 2 (i.e. 2 substitution). I think this is a common typo so we should consider these as 1 distance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Text is available under the Creative Commons Attribution-ShareAlike License.; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy."

but I'm not sure copyright and licensing actually applies to such small snippets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chriseth @axic

About the implementation, the implementation currently uses O(n1.n2) space. I will optimize it to O(n1) (or O(n2)) space. It will be different from the one in the wiki link as we will be accessing previous 2 entries. So will need a vector of size (3 * n).

I can also think of another (but ugly) optimization to avoid creating a vector of 3.n1 everytime. Fix the unidentified identifier as s1, so n1 will be same over all calls to the function. So, we can create a 3 * n1 vector once and pass it to the function.
I think we should do the 1st optimization but avoid the 2nd as it would affect generalization of the string distance function.

PS: I will do this and all the other small changes suggested here in a few hours.

}
}

size_t distance = dp[n1][n2];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you factor out the distance calculation into its own function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is a good idea. I will:

  1. Only keep a distance function in StringUtils and remove this one.
  2. Compare the distance with MIN_EDIT_DISTANCE constant that will be declared in similarNames function.


if (m_enclosingContainer)
{
vector<ASTString> enclosingSimilar = m_enclosingContainer->similarNames(_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an operator overload that allows similar += m_enclosingContainer->similarNames(_name);

return result;
}

string NameAndTypeResolver::similarNameSuggestions(ASTString const& _name) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a function in StringUtils called something like quotedAlternativesList.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


string NameAndTypeResolver::similarNameSuggestions(ASTString const& _name) const
{
return quotedAlternativesList(m_currentScope->similarNames(_name));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this has been split into utils I think similarNameSuggestions should only return a vector and the formatting of the contents should be in ReferencesResolver.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the quotedAlternativeList part to the error message in ReferencesResolver. This method could be also placed into the header since it is a single line (though it is not necessary).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation.

if (declarations.empty())
{
string const& suggestions = m_resolver.similarNameSuggestions(_identifier.name());
string const& suggestions = quotedAlternativesList(m_resolver.similarNameSuggestions(_identifier.name()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot store this as a const&. I actually wonder why the compiler does not complain here...

@chriseth
Copy link
Contributor

Please use rebasing instead of merging, since it is impossible to review a merge commit.

@axic
Copy link
Contributor

axic commented Dec 11, 2017

Rebased for review.

@axic
Copy link
Contributor

axic commented Feb 9, 2018

@chriseth oh I think actually this was waiting for you to review. Can you review the updates? It shouldn't be too hard to finish this one.


size_t dev::stringDistance(string const& _str1, string const& _str2)
{
size_t n1 = _str1.size(), n2 = _str2.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only one declaration per line.

@chriseth
Copy link
Contributor

chriseth commented Feb 9, 2018

Fine part from minor things. I will rebase and fix those.

@chriseth
Copy link
Contributor

chriseth commented Feb 9, 2018

@axic if tests are green, you can merge at will

@chriseth chriseth force-pushed the alternative_scope branch 2 times, most recently from a2d93ce to 8df768d Compare February 13, 2018 11:50
@chriseth
Copy link
Contributor

Circle somehow fails since it cannot fetch the commit. Since travis is green, I would ignore it.

@axic axic force-pushed the alternative_scope branch from 8df768d to 31c3dd1 Compare February 13, 2018 15:01
@axic axic force-pushed the alternative_scope branch from 31c3dd1 to 1dcd7c5 Compare February 13, 2018 15:04
@axic
Copy link
Contributor

axic commented Feb 13, 2018

Circle somehow fails since it cannot fetch the commit. Since travis is green, I would ignore it.

I guess building from external pull requests doesn't properly work with circleci

@chriseth chriseth merged commit f84b2c4 into argotorg:develop Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants