Skip to content

Shortened generated "data-reactid" #934

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 20, 2014
Merged

Shortened generated "data-reactid" #934

merged 1 commit into from
Jan 20, 2014

Conversation

syranide
Copy link
Contributor

Previously generated IDs looked like:

.r[0].[0].{x}[0]{y}.[17]

That has now been shortened to:

.0.0.$x:0:$y.g

The older ID is 60% longer by comparison (above), and this new ID is still surprisingly readable in my opinion. It's also worth mentioning that I use base-36 numerical indices instead to save further space, we could technically use base-62 but I don't it would be worth it. It is technically possible to avoid $ for keys but it would affect some ID functions that would have to check for both for an additional separator over the default . SEPARATOR, and I believe that keys are longer and less common anyway, so it would not be worth trade-off.

PS. The React Chrome extension doesn't even need to be updated for this PR.

Performance

I did a torture test of it, mounting and unmounting (the same) randomly generated tree of divs, I could measure between 3-9% performance increase depending on browser, shortening data-reactid also helped measurably. Interestingly enough, using reactid resulted in significantly poorer performance on at least IE8.

Size

While testing on the React implementation of TodoMVC, with a few items, the entire body innerHTML is 7545bytes and the IDs are a total of 3550bytes of it, a whole 47% of it is IDs and I'm not even including data-reactid. Using the shorter IDs, that same page is 6440bytes and the keys are 2406bytes. That is a difference of 1105bytes, so with the old IDs the HTML was 17% larger and the IDs themselves were 48% larger and that is with TodoMVC's large item keys.

Future improvements

There are even more efficient ways of cutting down the "ID bloat", but this is a very straight-forward implementation that should suffice for now, without being a head-ache. Apart from the refactoring/improvement below, it only modifies traverseAllChildren and no other React internals.

I also experimented quickly with a proof-of-concept implementation that stores the full IDs in memory instead, with only minimal reference ID generated for each node, definitely doable. Server-rendering and the chrome extension would have to be modified accordingly though.

Technical details

I should mention that traverseAllChildren now generate the IDs it returns with prefixed ., technically this is because the order of the keys is otherwise ruined. Practically, this means that the use of the separator is not as spread out as it was before. And this has no practical difference other than for tests on IDs, which are already affected.

   raw     gz Compared to master @ 8ca62bd022b535386da08b8d2ba59b951d3cfc2a
     =      = build/JSXTransformer.js
     =      = build/react-test.js
  +250    +43 build/react-with-addons.js
   +84     +7 build/react-with-addons.min.js
  +250    +44 build/react.js
   +84     +9 build/react.min.js

@@ -253,8 +253,8 @@ var ReactInstanceHandles = {
* @internal
*/
getReactRootIDFromNodeID: function(id) {
var regexResult = /\.r\[[^\]]+\]/.exec(id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this old regex test was more necessary when we were using the id attribute; whether this function returns anything is used by ReactMount.registerContainer in places to test if a React component is mounted into a container. Shouldn't be a big problem, just wanted to call it out explicitly. Maybe best to make sure this function returns null if the ID doesn't start with a dot.

@sophiebits
Copy link
Collaborator

Cool, would love to see benchmarks.

@syranide
Copy link
Contributor Author

@spicyj I've corrected all your points, thanks for the input!

Also, I recently updated the PR with information from my torture benchmark, between 3-9% depending on browser. What that actually is in real scenarios, who knows, but it's definitely something.

@@ -253,8 +253,11 @@ var ReactInstanceHandles = {
* @internal
*/
getReactRootIDFromNodeID: function(id) {
var regexResult = /\.r\[[^\]]+\]/.exec(id);
return regexResult && regexResult[0];
if (id.charAt(0) === SEPARATOR && id.length > 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting passed null on some internal flows (when looking at window). Can you guard for it? if (id && ..)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, ofc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a unit test for extra win!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I submit under the influence of extra win!

But you're completely right (it's not second nature for me yet to add tests :)).

PS: It's added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS2. Technically .. is valid now, but I'm not sure if it's worth guarding against? Or perhaps it actually is.

@vjeux
Copy link
Contributor

vjeux commented Jan 20, 2014

"I did a torture test of it". Can you also put that code somewhere? It would be really useful to track it over time and for the next time we want to modify the ids

@petehunt
Copy link
Contributor

kk, pulling this in

@syranide
Copy link
Contributor Author

@petehunt Cool!

@vjeux Of course! https://gist.github.com/syranide/aa3ebeb7e235d8c5e5fd

I just copy-pasted my crappy code, and it benchmarks it by continuously printing to the console. So it's probably not super usable as-is. If there's something specific you have in-mind, just shout and I can probably have it fixed.

Also PS, I noticed that I had accidentally benchmarked with the unminified React, so the performance improvements may even be slightly higher.

petehunt added a commit that referenced this pull request Jan 20, 2014
Shortened generated "data-reactid"
@petehunt petehunt merged commit e3248ef into facebook:master Jan 20, 2014
@syranide syranide deleted the minid branch January 20, 2014 21:27
@sebmarkbage
Copy link
Collaborator

Sweet. If concatenation and string allocation is a major perf impact. How about we get rid of them completely?

We could stop creating global IDs using massive strings and use incremental numbers (UID) instead. Or find a cool DOM traversal algorithm that doesn't require IDs. There's a proposal for a live reconciler that doesn't need a flat list of children. The event bubbling could be much faster if we stored a parent reference on the virtual nodes (components) instead of walking the ID structure.

Dare to dream big, guys. :)

@syranide
Copy link
Contributor Author

@sebmarkbage I did a proof-of-concept for storing an incremental ID in the DOM and using an internal map for translating it, it's really easy to do, however it breaks your Chrome Dev Tools. It also breaks server-rendering and I'm not sure what the right approach would be there, the straight-forward approach would be to "translate" it on mount. This is a relatively straight-forward solution that has minimal core impact while generating next to minimal output. However, a lot of tests will also be significantly affected by this (they were all touched by this PR too), but most of them could be easily patched.

This PR was simply the result of being an obvious inefficiency that could be improved relative isolation.

From what I've heard, not pre-generating and thus not storing the ID in the DOM at all may make more sense for us soon.

@sebmarkbage
Copy link
Collaborator

We'll just update the devtools to fix it if that's the case.

Currently server-rendering starts with a shared rootID and multiple roots can be rendered individually and sharded. The UID could be on a per rootID basis. E.g. the first component in the root is 0, the first child 1 and so on. That way the initial count will always generate idempotent results. If the hash fails, it just remounts anyway.

I don't think we have a solution for hit testing and event tracking without IDs. We can get rid of IDs for getDOMNode and event bubbling. But identifying the first target of an event is difficult without IDs. Any ideas?

@syranide
Copy link
Contributor Author

@sebmarkbage While it may not be clean, it's possible to attach the ID directly to the node object and avoid the DOM, just prefix it properly and there will be no conflict. You get the O(1) property of an absolute ID, or O(depth) if you want to use shorter relative IDs... which may make sense since this is probably not a hotpath anyway and perhaps we want to traverse it regardless? Alternatively we could make the root node(s) of each component have an absolute ID for a good balance. Btw, this comes with having to do a full traversal of all creates nodes to "inject" it.

Another possibility that has virtually zero initialization cost, is an O(depth * breadth) search by traversing up to the nearest known root (could be the owner component), and then searching all of the child components to find the one that owns the current node, and doing that until you manage to return to source node. You just have to hope that no-one creates a huge-ass table and attach the event handlers in the table (as opposed to the item components).

If it's only events we are interested in this for, we could just skip the IDs for everything but (ascendants of) nodes with event handlers. That should significantly cut the impact of the IDs, while not requiring significant core changes.

@petehunt
Copy link
Contributor

I don't think monotonic IDs will break server rendering as long as separate counters are maintained per root. Since the apps initialize identically on server and client they will map the same IDs.

@plievone
Copy link
Contributor

@syranide There are a few places in codebase where the previous ID structure is documented in comments, might want to grep for those?

@syranide
Copy link
Contributor Author

@plievone I'll take a look.

@syranide
Copy link
Contributor Author

@sebmarkbage @petehunt I talked to someone earlier in the chat who seemed to have been somewhat stricken by React IDs exploding in size as it was quite nested, with performance suffering. It got me thinking a bit that it may be important to take this one step further at least, for now.

I see one of two solutions:

  1. Use incremental IDs. It merely puts outrageously long IDs where you can't see them. Which actually goes a long way. Unless we garbage collect IDs (very simple), they'll increase slightly in size over time, but it's survivable 62 ^ 6 = 56.800.235.584. Still O(1), but memory is roughly O(depth * width).
  2. Use relative IDs. No more explosions because of nesting, React IDs are basically as small as they can be without having an internal translation for them. However, we can't use the node cache then and we'll always traverse up to the root O(depth). The biggest issue with this I think is that DOM-components rely on React ID to find their DOM-node, which would require doing a full traverse immediately after creation, but I think we kind of already do this? Other than that, only events really rely on IDs right? This shouldn't be too costly for that I think.

The first one should be a relatively quick fix, and I wonder if it isn't the preferable solution regardless (for now at least). Replace the internal full concatenated IDs with something relative and you have the benefits of both. With virtually none of the down-sides. Sounds like a plan?

@sophiebits
Copy link
Collaborator

I was thinking about 2 recently. If building the IDs is really a significant cost then I think it should help significantly.

@syranide
Copy link
Contributor Author

@spicyj Building them is not costly, them becoming a significant part of the generated HTML makes insertions slower.

@benjamn
Copy link
Contributor

benjamn commented Jan 28, 2014

subjective 👍 to option 1 over option 2

@sebmarkbage
Copy link
Collaborator

@syranide For option 2, what's your solution for operations that currently go from node reference to instance reference? E.g. event propagation and the second argument to renderComponent.

For option 1, I care about allowing the ability to generate these nodes in parallel. At the very least, different roots should be able to spread to different web servers for server-side rendering.

Ideally I'd like to be able to spread the whole tree reconciliation using parallelization but could probably live with just the root requirement since there are other concerns for parallelization.

@syranide
Copy link
Contributor Author

@sebmarkbage 2 was simply a logical alternative, and given my current understanding of React I'd say it's a high-risk change with no major advantages over 1 (relatively), which would be very isolated changes in comparison.

I haven't actually tested or looked into server-rendering for React, but to me, I intuitively see two solutions. The simplest being (I guess) that the server renders them with relative IDs (or concatenated IDs if you want the half-solution), the client then simply traverses it after insertion, constructing the internal IDs and replacing the DOM ID with an incremental. The other being to simply not generate IDs during server-rendering, I don't see why we need them technically, but there are probably practical constraints that makes this harder than it seems.

Client-side incremental IDs + server-side relative IDs + client-side reconstruction = relatively small isolated changes it seems, and it should be quite simple to make the internal representation relative (at a slight performance penalty), and it would support server-side parallelization.

Or better yet, going by what @petehunt said above, if we just rootIndex-prefix all incremental IDs, server-rendering should take care of itself it seems? You can't do sub-root parallelization then as-is, but that sounds more like a fantasy rather than actually being practical right now? Client- and server-side rootIndex-prefixed incremental IDs = very small isolated changes.

PS. A third (simple) possibility is for the server to generate some serialized representation of all IDs as an attribute to the root-node, which the client can use to reconstruct everything. Could result in less output, but GZIP would probably make it even anyway, performance might improve measurably. But I'm not sure how safe I feel with the idea of having a 10+ kilobyte large attribute. However, if @petehunt's suggestion work, then that seems like a much better solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants