-
Notifications
You must be signed in to change notification settings - Fork 232
Improve performance of serialize/deserialize by caching type information of fields in class #46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Here's my test result. The performance test is executed in WSL using the following command: $ time pytest --repeat 3000 betterproto/tests/test_inputs.py which repeats serializing and deserializing 3000 times in the test case after the data is loaded. Result:
|
This is pretty awesome. I assume you did some profiling and found that I've read through the entire diff but I'm just going to give another look for thoroughness |
if meta.group not in group_map["groups"]: | ||
group_map["groups"][meta.group] = {"current": None, "fields": set()} | ||
group_map["groups"][meta.group]["fields"].add(field) | ||
group_map.setdefault(meta.group) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me grok this? is this the same result before and after, or does this dict have very different structure with this change?
From what I can tell, some keys are now removed. There is no concern for these groups being overwritten? or the possibility is not there because such behavior cannot occur in the protocol buffer that this will process. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, the group_map
is like:
{
"fields": {
"field1": "group1",
"field2": "group1",
},
"groups": {
"group1": {
"current": Field1,
"fields": set(Field1, Field2),
},
},
}
Within this dict, only current
is object scoped, others are all class metadata. So that I kept current
in this dict and moved others into ProtoClassMetadata
, as oneof_group_by_field
and oneof_field_by_group
. Please refer to method __setattr__
, L555 of changed __init__.py
file.
In this way, building the lookup table between fields and groups is executed once per class. And also, all the object s of the class share the same lookup table.
Now in the object, _group_map
indicates the current value of oneof group:
{
"group1": Field1,
}
__setattr__
will keep this updated
betterproto/__init__.py
Outdated
setattr(cls, "_betterproto", ProtoClassMetadata(cls)) | ||
|
||
|
||
def protoclass(*args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right in my understanding, that this causes users to have to use the created protoclass
instead of creating dataclasses?
However:
- It's auto-generated
- It behaves like a dataclass
So this doesn't seem to break anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something this looks like classes that inherit from betterproto.Message
with a regular dataclass decorator will break if the user upgrades due to the _betterproto attribute not being set.
Breaking changes may be OK if this comes as a major version bump, but would still be better to avoid it or at least try and handle it gracefully.
I suggest accessing _betterproto via a property that checks whether it is set, and degrades gracefully for dataclasses lacking the extra annotation, maybe with a warning message. If going this route then there should be a test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protoclass
is a wrapper of dataclasses.dataclass
. When this decorator is applied, the dataclass
is still applied and the data class is still generated. The extra thing protoclass
does, is to call make_protoclass
to scan metadata of fields, and store in attribute _betterproto
of the generated class.
In this way, the scan operation is done once per class, instead of per object, or per method call.
So this change requires to replace @dataclasses.dataclass
with @betterproto.protoclass
in generated python files. Which means users need to rerun protoc
to update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslan Thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the extra decorator is removed. The metadata is populated at the first access, instead of when data class is created.
Can you go through and update the README? I think that's the main source of docs at this time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm mistaken this looks like a breaking change, see other comment for details.
Yes the result of profiling shows that So this PR is to make it not call them on accessing each field since they will never change. And the next step is to remove invocations to |
ef00e72
to
d484405
Compare
I checked README and think that it needn't any change, since now this PR is fully transparent and backward compatible. |
d484405
to
7c29a54
Compare
Rebased to head of master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Don't forget to run black.
def test_message_can_instantiated(test_case_name: str) -> None: | ||
plugin_module = importlib.import_module(f"{plugin_output_package}.{test_case_name}.{test_case_name}") | ||
@pytest.fixture(scope="module", params=test_case_names) | ||
def test_data(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice.
betterproto/tests/test_inputs.py
Outdated
sys.path.remove(reference_module_root) | ||
|
||
|
||
def test_message_can_be_imported(test_data: TestData) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks even more pointless now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
f603e2f
to
ab7687b
Compare
Updated. Thanks! |
Performance is great! I've already prepared an update to #55 to be compatible with the changes done in |
Hey @nat-n ~ your last commit 5e2d9fe created merge conflicts for this Pr and mine. Unfortunately running black on the PR branches doesn't solve the conflicts because the lines can turn out to be different. I was able to work around it by reverting 5e2d9fe temporarily on a local copy of master, merging that copy into my branch, then rerunning black on the same files. you could do the same, @jameslan git branch # check that you are on your PR branch
git checkout master
git pull danielgtaylor master # pull from danielgtaylor repository, use your remote name
git checkout -b master_no_blacken
git revert 5e2d9febea2a2982b7cc372ef5f4b1b7d2233e79
git checkout - # go back to PR branch
git merge master_no_blacken
black betterproto/tests/test_inputs.py # rerun black on affected files (if needed)
git commit -am 'Blacken' |
…luate performance
Cached data include, - lookup table between groups and fields of "oneof" fields - default value creator of each field - type hint of each field
…s generated while deserializing.
… it is backward compatible.
ab7687b
to
917de09
Compare
Thanks! I just $ git rebase master
$ git checkout --theirs betterproto/tests/test_inputs.py
$ git add betterproto/tests/test_inputs.py
$ git rebase --continue To override the conflict. |
@jameslan Sorry about that. I did it to simplify some other merging, but should have been more cautious about touching so many lines. Maybe there should be build time check for black correctness to keep everything in sync after the current round of PRs. |
No description provided.