Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML search: sanitise 'searchindex.js' contents #13099

Conversation

jayaddison
Copy link
Contributor

Feature or Bugfix

  • Feature / Refactoring

Purpose

  • Make the search index datastructure immutable, and remove JavaScript object prototype references from it, to reduce the risk of user-controlled input accessing or modifying unrelated runtime state.

Detail

  • Remove the JavaScript Object prototype from most objects (with the exception of Array objects, where the prototype functions are useful).
  • Freeze the resulting search index entries (this does also prevent modifications to the previously mentioned Array objects).

Relates

  * Remove the JavaScript `Object` prototype from most objects (with
    the exception of `Array` objects, where the prototype functions
    are useful).

  * Freeze the resulting search index entries (this does also
    prevent modifications to the previously mentioned `Array`
    objects).

Note: some JavaScript built-in types such as `Set` and `Map` permit
modifications to their contents even after they are frozen.  However,
they should not appear in the index contents since it is represented
as JSON by the Python code in the `sphinx.search` module.
@jayaddison jayaddison added html search javascript Pull requests that update Javascript code labels Nov 3, 2024
As a side-effect, this introduces a limitation: the argument passed
to the initial `sanitiseRecursive` call must be an `Object` itself.
In particular, relocates the `Array.isArray` conditional clause
to reduce the recursion depth by one level.
@jayaddison jayaddison added the python Pull requests that update Python code label Nov 4, 2024
tests/test_search.py Outdated Show resolved Hide resolved
@jayaddison

This comment was marked as outdated.

@jayaddison
Copy link
Contributor Author

I have to be honest: I'm not sure that I can achieve this realistically within reasonable searchindex.js size, page load performance, and code complexity bounds.

I'd be reluctant to close/leave an attempt to implement some (what I consider) safety improvements unfinished -- so in fact I probably will work more on this during the week -- but I have doubts about whether it can be completed in a satisfying way.

@jayaddison jayaddison removed the python Pull requests that update Python code label Nov 5, 2024
@jayaddison
Copy link
Contributor Author

The most recent two performance traces I've run using Firefox and Chromium respectively put the traced duration for setIndex at ~76ms and ~72ms, respectively. That's for the Sphinx self-built documentation, but with a recent copy of the Python documentation searchindex.js file artifically substituted-in to provide a benchmark workload.

I don't have any further changes planned on this branch currently.

@jayaddison
Copy link
Contributor Author

Despite the fact that the performance overhead introduced by these changes may not affect the perceptible time-to-appearance of search result display for Sphinx HTML project builds, I'm going to close this because that overhead, on aggregate, I think would add a nontrivial resource cost. If the change eliminated a category of security problems, then perhaps that would be worthwhile -- but in this case, we don't have a known problem to resolve.

I will hold the related issue #13098 open, because I think in future there may be methods (most likely Records and Tuples) to achieve search index immutability with minimal runtime performance cost (or perhaps even benefits, if resulting compiler/bytecode optimizations become possible thanks to the immutability).

@jayaddison jayaddison closed this Nov 6, 2024
@jayaddison jayaddison deleted the issue-13098/html-search-index-sanitisation branch November 6, 2024 13:26
@jayaddison
Copy link
Contributor Author

(if for some reason the code from this branch does prove useful in future, feel free to continue on from it. in particular I would also recommend considering removing the prototype from Array objects if doing so)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
html search javascript Pull requests that update Javascript code type:performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTML search: make the search index immutable at load-time
1 participant