Make Query.query('exist') lookup faster by adding a filename cache
Add a cache for `exist` queries. Currently, `exist` calls `git
ls-tree` and parses the result to check if a file exists. A single
call takes around 20-30 ms.

It only gets used by Makefile filters. Large Makefiles cause a filter to
make hundreds of these calls, causing filter processing to take
seconds.

Statistics on 20 HTTP requests on /linux/v6.11.6/source/MAINTAINERS:

                 without:     with:
    avg            1160        843
    median          951        790
    75th perc      1289        861
    95th perc      2452       1078
    max            2874       1749
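
Read as relative reductions, the figures above can be worked out with a short script. The table does not state a unit, so this sketch only computes ratios; the names `timings` and `relative_reduction` are illustrative and not part of the commit.

```python
# (without cache, with cache) pairs copied from the statistics table above.
timings = {
    "avg": (1160, 843),
    "median": (951, 790),
    "75th perc": (1289, 861),
    "95th perc": (2452, 1078),
    "max": (2874, 1749),
}


def relative_reduction(without, with_cache):
    """Percentage by which the cached figure is lower than the uncached one."""
    return (without - with_cache) / without * 100


reductions = {name: relative_reduction(*pair) for name, pair in timings.items()}

for name, pct in reductions.items():
    print(f"{name:<10} {pct:5.1f}%")
```

The largest relative gain is at the 95th percentile, which fits the description of large Makefiles being the slow case.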

The cache is stored inside Query, of which there is one instance per
request, so the cache never outlives a request and we do not risk
cache invalidation issues.

About memory usage: on Linux v6.9.4, the cache is 12MB.
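
For a rough idea of how a set of path strings adds up in memory, something like the following could be used. This is only a sketch under assumptions: the commit does not say how the 12MB figure was measured, `estimate_cache_bytes` is a hypothetical helper, and `sys.getsizeof` gives a shallow lower bound rather than an exact footprint.

```python
import sys


def estimate_cache_bytes(paths):
    """Shallow estimate: the set's own hash table plus each string object."""
    return sys.getsizeof(paths) + sum(sys.getsizeof(p) for p in paths)


# Tiny illustrative sample; a real version holds far more paths.
sample = {"Makefile", "drivers/net/Makefile", "drivers/net/tun.c"}
print(estimate_cache_bytes(sample), "bytes for", len(sample), "entries")
```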
fstachura authored and tleb committed Nov 7, 2024
1 parent 81a1116 commit 7df3543
Showing 1 changed file with 12 additions and 11 deletions.
23 changes: 12 additions & 11 deletions elixir/query.py
@@ -61,6 +61,7 @@ def __init__(self, data_dir, repo_dir):
         self.data_dir = data_dir
         self.dts_comp_support = int(self.script('dts-comp'))
         self.db = data.DB(data_dir, readonly=True, dtscomp=self.dts_comp_support)
+        self.file_cache = {}

     def script(self, *args):
         return script(*args, env=self.getEnv())
@@ -136,22 +137,22 @@ def query(self, cmd, *args):
             return decode(self.script('get-type', version, path)).strip()

         elif cmd == 'exist':

             # Returns True if the requested file exists, otherwise returns False

             version = args[0]
             path = args[1]

-            dirname, filename = os.path.split(path)
-
-            entries = decode(self.script('get-dir', version, dirname)).split("\n")[:-1]
-            for entry in entries:
-                fname = entry.split(" ")[1]
+            if version not in self.file_cache:
+                version_cache = set()
+                last_dir = None
+                for _, path in self.db.vers.get(version).iter():
+                    dirname, filename = os.path.split(path)
+                    if dirname != last_dir:
+                        last_dir = dirname
+                        version_cache.add(dirname)
+                    version_cache.add(path)

-                if fname == filename:
-                    return True
+                self.file_cache[version] = version_cache

-            return False
+            return path.strip('/') in self.file_cache[version]

         elif cmd == 'dir':

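The caching approach in this hunk can be tried outside of Elixir with a small standalone sketch. Everything here beyond the logic shown in the diff is an assumption: `ExistCache` and `list_paths` are hypothetical stand-ins for `Query` and for iterating `self.db.vers.get(version).iter()`. The sketch also names the loop variable `file_path`, because in the hunk as shown the loop reuses the name `path` and would rebind the requested path while the cache is being built.

```python
import os


class ExistCache:
    """Hypothetical stand-in for Query: one instance per request."""

    def __init__(self, list_paths):
        # list_paths(version) -> iterable of file paths for that version.
        # Assumed helper; the real code iterates a database entry instead.
        self._list_paths = list_paths
        self.file_cache = {}

    def exists(self, version, path):
        if version not in self.file_cache:
            version_cache = set()
            last_dir = None
            # Distinct loop variable so the `path` argument is not rebound.
            for file_path in self._list_paths(version):
                dirname = os.path.dirname(file_path)
                if dirname != last_dir:
                    last_dir = dirname
                    version_cache.add(dirname)
                version_cache.add(file_path)
            self.file_cache[version] = version_cache
        # One set lookup replaces one external command per query.
        return path.strip('/') in self.file_cache[version]


calls = []


def list_paths(version):
    calls.append(version)  # record how often the listing is walked
    return ["Makefile", "drivers/net/Makefile", "drivers/net/tun.c"]


cache = ExistCache(list_paths)
print(cache.exists("v6.11.6", "/drivers/net/tun.c"))
print(cache.exists("v6.11.6", "/drivers/net/missing.c"))
print(len(calls))
```

The listing is walked once per version and per instance; every later `exists` call for that version is a set membership test.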