
Material 5.0.0 Beta 1 #1465

Closed
6 tasks done
squidfunk opened this issue Feb 17, 2020 · 47 comments

@squidfunk (Owner) commented Feb 17, 2020

This thread is meant for feedback on the first beta release of Material 5.0. Please post any issues or errors encountered during setup and/or migration.

The first beta focuses on the rewrite of the underlying JavaScript code to a new, more modern architecture based on TypeScript and RxJS. However, it also provides some new features.

Probably the biggest feature is the new search, which was completely rewritten and now runs inside a web worker. It also supports prebuilt indexes (although this is not recommended). Previously, the search index was built when the search was focused for the first time, which sometimes led to lag and a freezing UI while the index was constructed. Construction now happens on page load inside a web worker.

Additionally, most of lunr's query syntax is now supported, e.g.:

color -primary +accent
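To make the semantics concrete, here is a toy sketch (not lunr's actual parser) of how the three presence operators are interpreted: a bare term is optional, `-` prohibits a term, and `+` requires it.

```javascript
// Toy illustration of lunr's presence operators: classify each
// whitespace-separated clause of a query by its leading operator.
function parseQuery(query) {
  return query.split(/\s+/).filter(Boolean).map(clause => {
    if (clause.startsWith("+")) return { term: clause.slice(1), presence: "required" };
    if (clause.startsWith("-")) return { term: clause.slice(1), presence: "prohibited" };
    return { term: clause, presence: "optional" };
  });
}

console.log(parseQuery("color -primary +accent"));
// → [ { term: 'color', presence: 'optional' },
//     { term: 'primary', presence: 'prohibited' },
//     { term: 'accent', presence: 'required' } ]
```

So the example query matches documents that must contain "accent", must not contain "primary", and score higher if they also contain "color".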

Fixed issues

Installation

Install the beta via pip:

pip install "mkdocs-material>=5.0.0b1"

Migration

Material 5.0 is mostly backward compatible but includes some breaking changes. The following is a list of changes that need to be made to your mkdocs.yml. Note that you only need to adjust values you have explicitly defined.

Search

Search is now configured as part of the search plugin configuration.

Material 4.x:

extra:
  search:
    language: 'en, de, ru'
    tokenizer: '[\s\-\.]+'

Material 5.x:

plugins:
  - search:
      separator: '[\s\-\.]+'
      lang:
        - en
        - de
        - ru

Social links

Font Awesome was updated to the latest version and is now provided as inline SVGs, which reduces the overall footprint. To reference an icon, use its path relative to the top-level .fontawesome directory distributed with the theme, without the .svg extension.

Material 4.x:

extra:
  social:
    - type: 'github'
      link: 'https://github.com/squidfunk'
    - type: 'twitter'
      link: 'https://twitter.com/squidfunk'
    - type: 'linkedin'
      link: 'https://www.linkedin.com/in/squidfunk'

Material 5.x:

extra:
  social:
    - icon: brands/github-alt
      link: https://github.com/squidfunk
    - icon: brands/twitter
      link: https://twitter.com/squidfunk
    - icon: brands/linkedin
      link: https://www.linkedin.com/in/squidfunk/

Note that mkdocs build will now terminate with an error if an invalid icon is referenced.

Templates

Note that some of the templates changed. If you extended the theme by overriding blocks or partials, you might have to adjust the HTML of your overrides. We're working on a list of changes, but for now you can use the diff functionality of GitHub.

@wilhelmer (Contributor) commented Feb 18, 2020

Shouldn't the search index be written to localStorage? Doesn't work for me, localStorage stays empty after performing a search.

@squidfunk (Owner, Author) commented Feb 18, 2020

I postponed that feature to the next beta due to other priorities in refactoring. Sorry for the inconvenience.

The main motivation behind this beta version is to find out whether things break downstream. It’s an entirely new code base.

@wilhelmer (Contributor)

Ah okay, thanks for the info. Maybe a "Known Issues" list would help for things like that.

My main concern with v4 is search performance, so I tested that first. Currently there isn't much change in performance, but I know it will be dramatically better once search index persistence is implemented.

@squidfunk (Owner, Author)

@wilhelmer actually, search should already be better than in v4. I'm not sure what precisely you mean by performance, as search query performance should not have changed. What did change is the delay upon first focusing the search while the index is built. That's now completely gone, as the index is built after the page has loaded. Persisting the index will only cut down on setup time, which should already be barely noticeable.

Furthermore prebuilt indexes are now supported.

@wilhelmer (Contributor) commented Feb 18, 2020

Yes, there is no freeze/delay anymore when opening search. However, if you immediately open search after page load (which is what I think many users will do), it still takes around 3-4 seconds on my very mediocre computer at work and with a ~8 MB search index before the search results appear.

I thought that with the local storage implementation, this 3-4 second delay would only occur on the first visit; from then on, search results would be displayed immediately, since the index doesn't have to be built anymore. Isn't that the idea?

@squidfunk (Owner, Author)

I didn't know you have such a big search index. Great use case! Persisting the index in local storage will bring exactly as much improvement as reading a prebuilt index from browser cache. You can test it by enabling prebuilt indexes and ensuring that your webserver sends the correct cache headers, so your browser will cache search_index.json.

@squidfunk (Owner, Author) commented Feb 18, 2020

However, also note that local storage only allows for 5 MB of storage. I implemented a worker that uses LZ-based compression to compress the index before writing it to local storage, which will again increase the time to interactive for search, as the index needs to be decompressed. Furthermore, if your raw search_index.json is 8 MB, the serialized index will be even larger. Compression saves around 30-40%, so it will probably not be possible to write an index of this size to local storage.

A probably better approach is what I already prototyped and call instant loading: clicks on internal links are intercepted, the target page is loaded with a single XHR call, and the content is injected into the existing DOM. This allows the index to be kept alive while traversing the docs. It will not work with a local deployment, however, as XHR is unavailable there.

@squidfunk (Owner, Author)

I benchmarked the packer again on Chrome (macOS) with the search index of the official Material docs, which should be pretty average in size.

Compression

Compressing the index is done in a worker thread after it is built (and thus without disrupting the main thread):

| Original | Compressed | Time   |
| -------- | ---------- | ------ |
| 780kb    | 85kb       | 330ms  |
| 1560kb   | 155kb      | 660ms  |
| 3120kb   | 280kb      | 1500ms |

I think it is rather safe to extrapolate these results. If the original serialized index is 8MB in size, it could be compressed to around 800kb in roughly 4 seconds, maybe more in other browsers. The index must then be transferred back to the main thread and persisted in local storage, which should add another second. This, however, is only done when the index could not be retrieved from local storage.

Decompression

Decompression is the actual thing we're interested in.

| Original | Compressed | Time  |
| -------- | ---------- | ----- |
| 780kb    | 85kb       | 75ms  |
| 1560kb   | 155kb      | 120ms |
| 3120kb   | 280kb      | 170ms |

Again, for a serialized index of 8MB we should clock in at around 500ms. While this looks pretty good, we also have to account for loading the index itself, which should be quick but will add some time, plus the transfer to the main thread. Still, I guess it could stay under a second, which would be pretty good.

I'll consider adding the packer back before the RC. I remembered the benchmarks to be worse, but it looks quite promising.

@squidfunk (Owner, Author)

BTW, one problem for which I have no solution is the invalidation of the locally persisted index. I have no idea what we could use to determine whether the index needs to be rebuilt. What we could do is persist it in session storage, not local storage. This would ensure that it is purged between sessions, but even that may not be enough.

@wilhelmer (Contributor)

Maybe hash the index when the site is built and compare the hash on each page load?

@Stanzilla (Contributor)

Could use a ServiceWorker for it, they invalidate after 24h or on demand

@squidfunk (Owner, Author) commented Feb 18, 2020

Depending on the hash function this could be rather expensive, but maybe there's a cheap one for JavaScript, haven't checked. Furthermore we would need to do this on every page load, as the search_index.json is the source of truth. Circling around the subject, these open questions are the reason why I removed it from the first beta, as more testing needs to be done. Maybe somebody could draft up a repository with some random data which results in a big index, so we'd have something for testing. That would be of great help, actually.

@squidfunk (Owner, Author)

Could use a ServiceWorker for it, they invalidate after 24h or on demand

The problem is to know when to invalidate. Caching is easy. To know when to purge the cache is not. I think ServiceWorkers won't be of much help.

@wilhelmer (Contributor)

Depending on the hash function this could be rather expensive, but maybe there's a cheap one for JavaScript, haven't checked.

There’s probably something I‘m missing here, but can’t you just hash it in Python on mkdocs build and store the hash in a file, preferably directly in search_index.json? And then just do a cheap string compare against, e.g., localStorage.searchIndexHash?

@squidfunk (Owner, Author)

There’s probably something I‘m missing here, but can’t you just hash it in Python on mkdocs build and store the hash in a file, preferably directly in search_index.json? And then just do a cheap string compare against, e.g., localStorage.searchIndexHash?

This would mean changes to the search plugin, so it's nothing that Material can solve on its own. If the search plugin provided a hash, it'd be a simple string comparison, sure. However, this actually demands some extensive testing. As said, if somebody could draft up a repo with a very large search index, we could try to gain a better understanding of whether this approach is really feasible and actually improves search for large docs. Currently it's more or less a POC. We need more data.
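As a sketch of the scheme being discussed (all names here are hypothetical, not part of MkDocs or Material): the build step would ship a digest alongside the index, and the client would compare it against the digest stored with the previously persisted index using a cheap string comparison. FNV-1a is one example of an inexpensive, non-cryptographic hash that is fine for change detection:

```javascript
// FNV-1a: a cheap, non-cryptographic 32-bit hash, good enough for
// "did the file change?" checks (not for security).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Decide whether the persisted index is stale. `storage` is anything with
// get/set — localStorage in the browser, a Map-backed stub for testing.
// The key name "searchIndexHash" is made up for this sketch.
function needsRebuild(indexJson, storage) {
  const hash = fnv1a(indexJson);
  if (storage.get("searchIndexHash") === hash) return false;
  storage.set("searchIndexHash", hash);
  return true;
}

const storage = new Map();
console.log(needsRebuild('{"docs": []}', storage)); // true  (first visit)
console.log(needsRebuild('{"docs": []}', storage)); // false (index unchanged)
```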

@wilhelmer (Contributor)

Even if Material can't solve it on its own, it sounds to me like the best way to go. Maybe even develop a separate mkdocs-hash-search-index plugin only for that purpose. If the plugin is not installed, persist the index in session storage. Something like that.

Regarding big data, I can only provide you with our built site dir, not with the sources, since they contain corporate IP. Don't know if that helps.

@squidfunk (Owner, Author)

@wilhelmer no, nothing potentially sensitive. Probably better to search for some collection of Markdown documents that are public domain which we can use.

In theory, though, we could use session storage (not local storage), as the index is invalidated when the window is closed.

@wilhelmer (Contributor)

Maybe @waylan knows of a large, public domain MkDocs project.

@squidfunk (Owner, Author)

Found this - the King James Version of the Bible. 66k words; the search_index.json is 8.2 MB:
https://github.com/arleym/kjv-markdown

@squidfunk (Owner, Author) commented Feb 19, 2020

I benchmarked the search with the Bible Markdown. Note that the index size is larger than the download size, as the download is assumed to be gzipped (which is very important). The original sizes were 8.2MB and 22.5MB (prebuilt). Furthermore, we assume fast 3G (7.2 Mbit/s) for download times:

| Prebuilt | Size (gz) | Size (index) | Download | Indexing | Pack | Unpack |
| -------- | --------- | ------------ | -------- | -------- | ---- | ------ |
| no       | 2.3MB     | 14.3MB       | 2s       | 11s      | 12s  | 2s + ? |
| yes      | 4.3MB     | 14.3MB       | 5s       | 3s       | -    | -      |

We can deduce three strategies:

  • Default: don't prebuild the index during build time; download it (2s) and index it (11s)
  • Prebuilt: prebuild the index during build time; download it (5s) and load it (3s)
  • Cached: don't prebuild the index during build time; download it (2s), index it (11s) and pack it (12s) to save it in session storage. On a subsequent request, retrieve the index from session storage (2s) and load it (3s)

If we assume that cache headers are set, the search_index.json can be retrieved from the browser cache on a subsequent request. Thus, the total timings are:

| Strategy | Initial | Subsequent |
| -------- | ------- | ---------- |
| Default  | 13s     | 11s        |
| Prebuilt | 8s      | 3s         |
| Cached   | 25s     | 5s         |

Honestly, I have to admit that I forgot about gzip when making my initial calculations. Packing the index on the server brings the prebuilt index down by 80% (4.3MB vs 22.5MB), and it can then be retrieved from the browser cache on a subsequent request. The problem is that the browser's gzip APIs are not exposed, so we can only compress the index ourselves in JavaScript, which means handling 14.3MB - which is insane. Furthermore, the user would have to stay on the first page they visit for 25s, or the index might not be persisted to session storage. Then, on a subsequent request, it must first be fetched from session storage (which is why the "?" is added for unpacking).

I think we should abandon the idea of persisting the index to session storage and let gzip and the browser cache do their job. However, keeping the search index alive across the session by hijacking all links and loading content without a full page reload should be very beneficial. IMHO this is the strategy we should follow.
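The totals in the table follow directly from the per-step timings, assuming search_index.json is served from the browser cache on a subsequent request so its download time drops out:

```javascript
// Per-strategy totals in seconds, derived from the benchmark steps above.
const totals = {
  default:  { initial: 2 + 11,      subsequent: 11 },    // download + index; then index only
  prebuilt: { initial: 5 + 3,       subsequent: 3 },     // download + load; then load only
  cached:   { initial: 2 + 11 + 12, subsequent: 2 + 3 }  // + pack; then unpack + load
};

console.log(totals.default);  // { initial: 13, subsequent: 11 }
console.log(totals.prebuilt); // { initial: 8, subsequent: 3 }
console.log(totals.cached);   // { initial: 25, subsequent: 5 }
```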

@wilhelmer (Contributor) commented Feb 19, 2020

IMHO this is the strategy we should follow.

But this will be the death sentence for mkdocs-localsearch, right? Can I spare myself the work on the plugin then? 🤔

@squidfunk (Owner, Author)

@wilhelmer this has nothing to do with local search. It was just meant to speed up on subsequent indexing. The problems that need to be solved for local search are those outlined in #1464:

  • Inlining the search index into the HTML (or keeping your current search-index.js strategy)
  • A web worker polyfill (i.e. pseudo web worker without XHR)

You could also fork the theme and implement a worker-less search. The whole search logic is completely decoupled from the web worker context. However, providing a web worker polyfill is probably more general and easier.
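The web worker polyfill mentioned above could look roughly like this: a toy, synchronous sketch, under the assumption that the search logic talks to the worker only through the postMessage/onmessage surface, so a same-thread shim with the same surface can stand in when workers (or XHR, on the file protocol) are unavailable.

```javascript
// A minimal same-thread stand-in for a Web Worker: exposes the same
// postMessage / onmessage surface, but runs the handler on the main thread.
class PseudoWorker {
  constructor(handler) {
    this.handler = handler;   // the code that would live in the worker script
    this.onmessage = null;    // set by the main thread, like worker.onmessage
  }
  postMessage(data) {
    // Dispatch synchronously for simplicity; a real shim would defer via
    // queueMicrotask to better mimic worker asynchrony.
    const reply = this.handler({ data });
    if (this.onmessage) this.onmessage({ data: reply });
  }
}

// Usage: the "worker" echoes search queries back upper-cased.
const worker = new PseudoWorker(msg => msg.data.toUpperCase());
worker.onmessage = ev => console.log(ev.data); // → "QUERY"
worker.postMessage("query");
```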

@wilhelmer (Contributor)

But you wrote:

A probably better approach is what I already prototyped and called instant loading - clicks on internal links are intercepted, directly loaded in a single XHR call and then injected into the existing DOM. This will allow to keep the index while traversing the docs. This however will not work with a local deployment, as XHR will not work.

@squidfunk (Owner, Author) commented Feb 19, 2020

That is true. However, the gain from persisting the search index in local storage is practically non-existent with gzip. It's probably best to just use prebuild_index, which is supported as of v5.

@squidfunk (Owner, Author) commented Feb 19, 2020

I'm sorry, but the packing approach is really a dead end. The benchmark numbers are very clear - it's worse than just using a prebuilt index, especially on the file protocol, where transfer times are practically zero. Furthermore, this so-called instant loading feature will be optional and experimental. It will at all times be opt-in.

@wilhelmer (Contributor)

Furthermore this so called instant loading feature will be optional and experimental. It will at all times be opt-in.

Ahh that's good to know, thank you.

Burying the packing approach is no problem for me, good riddance 👋

@wilhelmer (Contributor)

Uh, prebuild_index is indeed super fast. Can you elaborate on the gzip part? For an optimized download size, I have to configure my server to gzip JSON, is that correct? Like this, for Apache? https://tecadmin.net/enable-json-gzip-compression-apache/

@squidfunk (Owner, Author) commented Feb 19, 2020

Jep, I would guess that for most web servers it's already enabled for all common plain-text formats (including JSON). You can check whether your web server already delivers your content gzipped by inspecting the network requests in Chrome:

[Screenshot: Chrome network panel showing transferred size vs. actual resource size per request]

That's from Material v4 - the top number is the downloaded (transferred) size, the bottom number the actual size. This indicates that the content was gzipped. There should also be a header present which tells the browser that the content was gzipped: content-encoding: gzip

@squidfunk (Owner, Author) commented Feb 19, 2020

Alternatively, one might also consider brotli, which has very good browser support. I just tried it on the 22.5MB search_index.json - while gzip compresses it down to 4.3MB, brotli even gets it down to 2.1MB. Pretty insane.

@wilhelmer (Contributor)

Nope, my web server doesn't compress it. I'll try to enable that tomorrow.
Thanks for the Chrome instructions. Note for others who want to check: you have to enable "Use large request rows" in the Network settings to display the requests as shown in the screenshot above.

@Nelyah commented Feb 20, 2020

First, thank you for the work, this beta is great news!

However, I have noticed some behaviour that might not be intended. The following configuration works with the new search:

site_name: Community Documentation
dev_addr: localhost:7777

repo_name: 'my/documentation'

theme:
  name: material
  palette:
    primary: 'indigo'
    accent: 'orange'
  feature:
    tabs: true

plugins:
  - search:
      prebuild_index: true
      separator: '[\s\-\.]+'
      lang: en

However adding the lines

repo_url: 'https://git.soundhound.com/terrier/documentation'
edit_uri: edit/master/

adds a link to the repository at the top right. With this icon/link present, the search bar is centred, and typing in the search bar doesn't seem to trigger the search. Everything works fine otherwise. This option also works in the previous release.

@squidfunk (Owner, Author)

@Nelyah thanks for reporting! Unfortunately, I cannot reproduce your problem. Does the console show an error? How many pages do you have (maybe the search hasn't loaded yet)?

@squidfunk (Owner, Author) commented Feb 20, 2020

@Nelyah nevermind, I found the error - it has nothing to do with search but with the repository setup. It's already fixed on the refactoring branch 😊 I'll issue a new beta release today, so maybe it's easiest to wait and then try again. Sorry for the inconvenience!

@Nelyah commented Feb 20, 2020

Awesome! Thank you for your work 😄
Super excited to see this come live!

@wilhelmer (Contributor)

When using prebuild_index with the current version of MkDocs (1.0.4), the index is built with a previous version of lunr. Search works, but lunr's query syntax isn't supported, and there's a console warning:

Version mismatch when loading serialised index. Current version of lunr '2.3.8' does not match serialized index '2.1.6'

This doesn't even work with MkDocs 1.1.dev0 from the master branch. The index version is higher, but still not high enough:

Version mismatch when loading serialised index. Current version of lunr '2.3.8' does not match serialized index '2.3.2'

@squidfunk (Owner, Author)

Jep, also observed that. However, search seems to work, as it's only a console warning.

@waylan can we upgrade lunr as part of MkDocs 1.1 to 2.3.8?

@wilhelmer (Contributor)

Yes, standard search queries work, but not extended queries like +foo +bar.

@squidfunk (Owner, Author)

But that's only happening when prebuild_index is set to true? I would expect 2.3.2 -> 2.3.8 to only contain bugfixes, but maybe there's some other incompatibility issue.

@wilhelmer (Contributor)

Yep, only happening when prebuild_index is set to true.

@squidfunk (Owner, Author)

Great, thanks for investigating. Maybe we could try upgrading lunr.js directly in the MkDocs plugin to 2.3.8 to see if the issue vanishes. If it does, we should consider a PR to MkDocs to upgrade to 2.3.8. It is bundled with the search plugin here.

@wilhelmer (Contributor)

Installed MkDocs 1.1.dev0, replaced lunr.js with version 2.3.8, built the site.

Result with prebuild_index set to true: the console warning disappears, but extended search still doesn't work properly. A search for +test +images returns 0 results, even though there's a "Test images" topic with the term "test images" all over the place.

Result with prebuild_index set to false: no console warning, and extended search works fine. A search for +test +images returns all expected results.

@squidfunk (Owner, Author) commented Feb 20, 2020

Thanks for testing! Then it seems that this behavior doesn't have anything to do with the outdated version. Maybe the prebuilding logic has a bug.

EDIT: it could also be related to a difference in the pipeline. Material disables some parts of the pipeline for some languages; the prebuilding logic doesn't do that.

/* Remove stemmer, as it cripples search experience */
this.pipeline.reset()
if (pipeline.trimmer)
  this.pipeline.add(lunr.trimmer)
if (pipeline.stopwords)
  this.pipeline.add(lunr.stopWordFilter)

@facelessuser (Contributor)

Pymdown-Extensions 7.0b2 has been deployed. This creates a separation between CodeHilite and things like SuperFences and InlineHilite. For a long time now, CodeHilite has not been used for SuperFences; we used the pymdownx.highlight extension instead, but continued to sync CodeHilite options if Highlight was not explicitly configured. This was to help the transition away from CodeHilite. That planned separation is now complete in 7.0b2. If you weren't specifying anything more than guess_lang: false, you'll probably not notice anything. If you did a bit more, you may need to configure pymdownx.highlight instead.

CodeHilite will work alongside Highlight in the Material 5 betas, so there shouldn't be any issues there.

@squidfunk (Owner, Author)

CodeHilite will work along side Highlight in Material 5 betas, so there shouldn't be any issues there.

Thanks for noting! I'll upgrade to 7.0b2 for the next beta (which should see the light of day in a few hours, hopefully).

@facelessuser (Contributor)

Thanks for noting! I'll upgrade to 7.0b2 for the next beta (which should see the light of day in a few hours, hopefully).

Sounded like you were close to doing another release, so I wanted to try and squeeze this release in right before. Looks like I made it 🙂.

@waylan (Contributor) commented Feb 20, 2020

@waylan can we upgrade lunr as part of MkDocs 1.1 to 2.3.8?

Yes, that would be great. A PR is welcome.

By the way, note that the MkDocs 1.1 milestone currently has only one open issue, which is related to search. Once that issue is resolved, we should be ready for a release.

@squidfunk (Owner, Author)

@waylan thanks, we'll definitely wait for MkDocs 1.1 then!

Closing this issue, let's continue with beta 2 in #1469!
