Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation database guessing #232

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Hylen
Copy link
Contributor

@Hylen Hylen commented Oct 3, 2015

No description provided.

Also added error checking when parsing the database file.
Even recent versions of stdlibc++ doesn't mark the move constructor of
std::string as noexcept. Thus CompileCommands move operations cannot be
noexcept and copy operations must be supported for CompileCommands to be
used with standard contrainers.
Added a guessing algorithm based on the basename of the file. If there
is no file with the same basename, a file with similar directory and
filename is sought after.
@Hylen
Copy link
Contributor Author

Hylen commented Oct 3, 2015

Here is the guessing, as you asked for. I added boost filesystem to be able to produce PoC fast. If you don't want to add it as a dependency, I guess we'll have to re-implement whatever is needed for the guessing.

If it were up to me, I'd gladly use dependencies such as boost filesystem. The benefits are many, we would have less code to maintain, and it would be cross platform by default. I'd like to hear your motivation for not wanting to add such dependencies.

@Sarcasm
Copy link
Owner

Sarcasm commented Oct 4, 2015

Here is the guessing, as you asked for. I added boost filesystem to be able to produce PoC fast.

I don't think calcScore() is quite correct. You should probably return the number of directory component in the common prefix of the two file compared. There is no reason for /a/foo/bar/baz.cpp to have a better score against /b/foo/bar/qux.cpp than /a/fu.cpp. I'm may have misunderstood the algo.

For the levenstein distance, as much as I would like to have a use for this algorithm, I don't think it is much relevant the way it is used. Maybe it should work only on the filename and not the subpath before the filename.

The guessing algorithms does not look useful to me if headers and source file are in different sub-trees.

For example:

  • lib/Support/Mutex.cpp
  • include/llvm/Support/Mutex.h

If it were up to me, I'd gladly use dependencies such as boost filesystem. The benefits are many, we would have less code to maintain, and it would be cross platform by default. I'd like to hear your motivation for not wanting to add such dependencies.

I like the simplicity of install of irony-mode. Installing boost can be a pain (e.g: on Windows). For such a little program I don't think it is worth it. Also I think the guessing can be implemented as a simple python script, does not need to be part of irony-server. Python will have everything by default available, JSON, filesystem abstraction. People without python will still be able to use irony-mode. On the other hand, it would be nice to have irony-mode be C++ only I guess.

@Hylen
Copy link
Contributor Author

Hylen commented Oct 4, 2015

I don't think calcScore() is quite correct. You should probably return the number of directory component in the common prefix of the two file compared. There is no reason for /a/foo/bar/baz.cpp to have a better score against /b/foo/bar/qux.cpp than /a/fu.cpp. I'm may have misunderstood the algo.

It is written like that for exactly the reason you mentioned:

The guessing algorithms does not look useful to me if headers and source file are in different sub-trees.

For example:

lib/Support/Mutex.cpp
include/llvm/Support/Mutex.h

If headers and sources are in different sub-trees, this example would still score 1 with the calcScore algorithm. That is why I calculate it backwards. I think it is less common to have headers and sources in the same source tree but not in the same directory. In this example, the guessing would be right from the stem comparison though.

I like the simplicity of install of irony-mode. Installing boost can be a pain (e.g: on Windows). For such a little program I don't think it is worth it. Also I think the guessing can be implemented as a simple python script, does not need to be part of irony-server. Python will have everything by default available, JSON, filesystem abstraction. People without python will still be able to use irony-mode. On the other hand, it would be nice to have irony-mode be C++ only I guess.

I agree that is an advantage. One solution is to make dependencies like this optional, and only use enable features that relies on them if boost (or whatever else is needed) is detected on the system.

@Hylen
Copy link
Contributor Author

Hylen commented Oct 4, 2015

For the levenstein distance, as much as I would like to have a use for this algorithm, I don't think it is much relevant the way it is used. Maybe it should work only on the filename and not the subpath before the filename.

I agree that levenshtein distance maybe isn't perfect here, maybe longest common substring would make more sense. It does work on only the filenames though, and I think it's probably good enough.

@Sarcasm
Copy link
Owner

Sarcasm commented Oct 4, 2015

lib/Support/Mutex.cpp
include/llvm/Support/Mutex.h

If headers and sources are in different sub-trees, this example would still score 1 with the calcScore algorithm. That is why I calculate it backwards. I think it is less common to have headers and sources in the same source tree but not in the same directory. In this example, the guessing would be right from the stem comparison though.

Interesting. I still think that the prefix matter.
For example:

/home/foo/llvm/include/llvm/Support/Mutex.h // srcPath

/home/foo/llvm/projects/libcxx/src/utility.cpp // score == 3 ?
/home/foo/llvm/include/llvm/lib/Support/Mutex.cpp // score == 1?

I agree that levenshtein distance maybe isn't perfect here, maybe longest common substring would make more sense. It does work on only the filenames though, and I think it's probably good enough.

Indeed, filename() I see filename now.
I guess we could also do the comparison without the file extension.

@Hylen
Copy link
Contributor Author

Hylen commented Oct 4, 2015

Interesting. I still think that the prefix matter.
For example:

/home/foo/llvm/include/llvm/Support/Mutex.h // srcPath

/home/foo/llvm/projects/libcxx/src/utility.cpp // score == 3 ?
/home/foo/llvm/include/llvm/lib/Support/Mutex.cpp // score == 1?

I'm counting backwards in calcDirScore, so it would be

/home/foo/llvm/include/llvm/Support/Mutex.h // srcPath

/home/foo/llvm/projects/libcxx/src/utility.cpp // score == 0
/home/foo/llvm/include/llvm/lib/Support/Mutex.cpp // score == 1

Indeed, filename() I see filename now.
I guess we could also do the comparison without the file extension.

Yes, that makes sense.

@Sarcasm
Copy link
Owner

Sarcasm commented Oct 4, 2015

Ok, I understand calcScore better now, it's only about the common suffix path indeed.

@Sarcasm
Copy link
Owner

Sarcasm commented Oct 29, 2015

@Hylen
Copy link
Contributor Author

Hylen commented Nov 1, 2015

Very interesting indeed. I will look into it when I have more time. Thanks for sharing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants