Skip to content

fix: improve textDocument/definition ordering #2792

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

## Unreleased
<!-- Add all new changes here. They will be moved under a version at release -->
* `FIX` Improved ordering of results for `textDocument/definition`.
* `FIX` Error `attempt to index a nil value` when `Lua.hint.semicolon == 'All'` [#2788](https://github.com/LuaLS/lua-language-server/issues/2788)

## 3.10.3
Expand Down
66 changes: 62 additions & 4 deletions script/core/definition.lua
Original file line number Diff line number Diff line change
Expand Up @@ -7,18 +7,76 @@ local rpath = require 'workspace.require-path'
local jumpSource = require 'core.jump-source'
local wssymbol = require 'core.workspace-symbol'

local function sortResults(results)
--- @param s string
--- @return string[]
local function split(s)
local r = {}
s:gsub('[^/]+', function (w)
r[#r+1] = w:gsub("~1", "/"):gsub("~0", "~")
end)
return r
end

--- Returns the Levenshtein distance between the two given string arrays
--- @param a string[]
--- @param b string[]
--- @return number
local function levenshteinDistance(a, b)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, your requirement is to find the length of the common prefix of two strings. Why did you use such a complex algorithm?
Also, considering the application scenario, there's a good chance that two strings might be entirely identical, so a fast handling branch for this case should be implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not perfect, but is a lot of better than u1 < u2.

I'm happy to optimise this more, but keep in mind results <= 2, so there is little benefit in optimizing this too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, your requirement is to find the length of the common prefix of two strings. Why did you use such a complex algorithm?

This is a standard algorithm and implementation for finding the differences between two strings. I adapted it slightly to work on lists.

Copy link
Contributor

@tomlau10 tomlau10 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your requirement is to find the length of the common prefix of two strings

This just wakes me up 😮. Although this is a standard algorithm for finding diff between 2 strings, this is not the same as finding common prefix, and I think this algorithm is not suitable for the current situation. Lets me use some test codes:

local currentFile = "/path/to/some/project/current/file/directroy/test.lua"
local uris = {
    "/path/to/some/project/current/file/directroy/test.lua",
    "/path/to/some/project/current/file/directroy/file.lua",
    "/path/to/some/project/current/file/dir2/test.lua",
    "/path/to/some/project/current/file/other/file.lua",
    "/path2/to/some/project/current/file/directroy/test.lua",
}

for i, uri in ipairs(uris) do
    print(i, ("%.6f"):format(pathSimilarityRatio(currentFile, uri)), uri)
end
  • this prints
1	0.000000	/path/to/some/project/current/file/directroy/test.lua
2	0.250000	/path/to/some/project/current/file/directroy/file.lua
3	0.250000	/path/to/some/project/current/file/dir2/test.lua
4	0.375000	/path/to/some/project/current/file/other/file.lua
5	0.250000	/path2/to/some/project/current/file/directroy/test.lua
  • 1 has a distance of 0.0 => expected ✅
  • but 2, 3, 5 all have the same distance 0.25 => I think this is not we want? ❌
  • the example 5 is the most problematic, it's difference starts with the root path, but still has the same score as 2 / 3 => I don't think this is a desirable behavior either ❌
  • definitely we prefer 2 more, because it has greater length of common prefixes

This is not about whether the algorithm is standard or not, it's just unsuitable for this situation. And if all we want is to find common prefix, we need not to use this algorithm with high complexity (just unnecessary, even though #results is 95% time <= 2).

One more edge case, when the distance score (or the common prefix length) is the same, I guess they should be ordered alphabetically? Otherwise they may be in random orders each time when we check for definitions. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common prefix isn't always suitable.

The aim wasn't to provide something that's perfect. It was to provide something that was an improvement over < which isn't suitable in any way.

Please let's stop trying to find 0.0001% edge cases, it just isn't worth the effort.

Before the ordering for me was always 0% correct as it always favoured the builtin, and now it's 99% correct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you carefully consider whether you want to prioritize files in the nearby directory or files with similar paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 😅 😅

The common prefix isn't always suitable.

Can you provide some examples where common prefix is not suitable?
By sorting with greatest common prefix length, it can solve the examples you provided in your initial description as well.
And if common prefix isn't always suitable, then I think the Levenshtein distance isn't always suitable either. And I think it is worse than common prefix in a few major aspects:

  • the code complexity for future maintenance
  • the time complexity of the algorithm
  • cannot handle the cases that I provided (this can be countered by providing examples that common prefix cannot handle)

As you have said, the aim wasn't to provide something that's perfect. I agree. But then when both common prefix and Levenshtein distance are not perfect, why not use the simpler one instead of a complex one? This is just "using a sledgehammer to crack a nut" (殺雞用牛刀). And even then the nut is not cracked perfectly 😂 .

Please let's stop trying to find 0.0001% edge cases, it just isn't worth the effort.

I don't agree that edge case I mentioned is a 0.0001% edge case. It's very often that, a value's definition is just one folder level above. The is related to the completeness of the logic. For any comparator function that are sorting records, most of the time we want to compare those major columns, and fallback to compare the id field as a last resort. In this case it is the u1 < u2 => this fallback comparison is always deterministic.

simularity_cache[u1] = simularity_cache[u1] or pathSimilarityRatio(uri, u1)
simularity_cache[u2] = simularity_cache[u2] or pathSimilarityRatio(uri, u2)
if simularity_cache[u1] ~= simularity_cache[u2] then
  return simularity_cache[u1] < simularity_cache[u2]
end
return u1 < u2

Of course I'm no maintainer and you may just ignore my suggestion if you disagree with me. 😅

Copy link
Contributor Author

@lewis6991 lewis6991 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the simpler one instead of a complex one

Time. Which I have no more of for this pull request.

Spending more time on this, is unlikely to result in observable better functionality.

I'll just fix this on the client side.

local a_len, b_len = #a, #b
local matrix = {} --- @type integer[][]

-- Initialize the matrix
for i = 1, a_len + 1 do
matrix[i] = { [1] = i }
end

for j = 1, b_len + 1 do
matrix[1][j] = j
end

-- Compute the Levenshtein distance
for i = 1, a_len do
for j = 1, b_len do
local cost = (a[i] == b[j]) and 0 or 1
matrix[i + 1][j + 1] =
math.min(matrix[i][j + 1] + 1, matrix[i + 1][j] + 1, matrix[i][j] + cost)
end
end

-- Return the Levenshtein distance
return matrix[a_len + 1][b_len + 1]
end

--- @param path1 string
--- @param path2 string
--- @return number
local function pathSimilarityRatio(path1, path2)
if path1 == path2 then
return 0
end
local parts1 = split(path1)
local parts2 = split(path2)
local distance = levenshteinDistance(parts1, parts2)
return distance * 2 / (#parts1 + #parts2)
end

local function sortResults(results, uri)
-- 先按照顺序排序
-- Sort in order first
local simularity_cache = {} --- @type table<string,number>
table.sort(results, function (a, b)
local u1 = guide.getUri(a.target)
local u2 = guide.getUri(b.target)
if u1 == u2 then
return a.target.start < b.target.start
else
return u1 < u2
simularity_cache[u1] = simularity_cache[u1] or pathSimilarityRatio(uri, u1)
simularity_cache[u2] = simularity_cache[u2] or pathSimilarityRatio(uri, u2)
return simularity_cache[u1] < simularity_cache[u2]
end
end)
-- 如果2个结果处于嵌套状态,则取范围小的那个
-- If two results are nested, take the one with the smaller range
local lf, lu
for i = #results, 1, -1 do
local res = results[i].target
Expand Down Expand Up @@ -141,7 +199,7 @@ return function (uri, offset)
local results = {}
local uris = checkRequire(source)
if uris then
for i, uri in ipairs(uris) do
for _, uri in ipairs(uris) do
results[#results+1] = {
uri = uri,
source = source,
Expand Down Expand Up @@ -230,7 +288,7 @@ return function (uri, offset)
return nil
end

sortResults(results)
sortResults(results, uri)
jumpSource(results)

return results
Expand Down
Loading