perf: Replace MessageToDict with optimized custom dict builder#6015
perf: Replace MessageToDict with optimized custom dict builder#6015abhijeet-dhumal wants to merge 1 commit intofeast-dev:masterfrom
Conversation
6a38f3c to
2b81774
Compare
2b81774 to
8fbda9f
Compare
8fbda9f to
f3ea953
Compare
06877fd to
063eeb2
Compare
063eeb2 to
d3c512d
Compare
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
d3c512d to
c06a75d
Compare
| return dt.strftime("%Y-%m-%dT%H:%M:%S") + ( | ||
| ".%06dZ" % (ts.nanos // 1000) if ts.nanos else "Z" | ||
| ) |
There was a problem hiding this comment.
🟡 Timestamp nano-second formatting differs from MessageToDict output
The _timestamp_to_str function always emits 6 fractional-second digits (e.g., .500000Z) when nanos are present, but protobuf's MessageToDict uses variable precision: 3 digits if nanos % 1_000_000 == 0, 6 if nanos % 1_000 == 0, and 9 otherwise (stripping trailing zeros).
Root cause and impact
For Timestamp(seconds=1609459200, nanos=500000000):
- Old (
MessageToDict):"2021-01-01T00:00:00.500Z"(3 fractional digits) - New (
_timestamp_to_str):"2021-01-01T00:00:00.500000Z"(6 fractional digits)
The culprit is line 70 which unconditionally formats with 6 digits:
".%06dZ" % (ts.nanos // 1000)Protobuf's Timestamp JSON serializer (_TimestampMessageToJsonObject) uses adaptive precision:
- 0 fractional digits when
nanos == 0 - 3 digits when
nanos % 1_000_000 == 0 - 6 digits when
nanos % 1_000 == 0 - 9 digits otherwise
Since this function is a drop-in replacement for MessageToDict at the feature server API boundary (sdk/python/feast/feature_server.py:344), any downstream client performing exact string comparison or parsing that's sensitive to trailing zeros in timestamps will observe a behavioral change.
| return dt.strftime("%Y-%m-%dT%H:%M:%S") + ( | |
| ".%06dZ" % (ts.nanos // 1000) if ts.nanos else "Z" | |
| ) | |
| nanos = ts.nanos | |
| if nanos == 0: | |
| return dt.strftime("%Y-%m-%dT%H:%M:%S") + "Z" | |
| elif nanos % 1_000_000 == 0: | |
| return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % (nanos // 1_000_000) | |
| elif nanos % 1_000 == 0: | |
| return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%06dZ" % (nanos // 1_000) | |
| else: | |
| return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%09dZ" % nanos |
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
but most timestamps have 0 nanos or millisecond precision right?
| return dt.strftime("%Y-%m-%dT%H:%M:%S") + ( | ||
| ".%06dZ" % (ts.nanos // 1000) if ts.nanos else "Z" | ||
| ) |
There was a problem hiding this comment.
🟡 _timestamp_to_str truncates nanoseconds to microseconds, losing precision vs. MessageToDict
The _timestamp_to_str function formats the fractional seconds using ".%06dZ" % (ts.nanos // 1000), which integer-divides nanos by 1000 and formats as 6 digits (microseconds). This silently truncates sub-microsecond precision and produces a different format than MessageToDict.
Root Cause and Impact
The format ".%06dZ" % (ts.nanos // 1000) at sdk/python/feast/feature_server_utils.py:70 does integer division by 1000 then formats as 6-digit microseconds. This causes two problems:
-
Sub-microsecond precision is lost: e.g.
nanos=123456789produces".123456Z"instead of the correct".123456789Z"(the789nanoseconds are silently dropped).nanos=100produces".000000Z"— the value is entirely lost. -
Trailing zeros differ: e.g.
nanos=500000000produces".500000Z"whereasMessageToDictproduces".500Z"(protobuf strips trailing zeros and uses 3/6/9-digit groups).
Comparison:
nanos=500000000 → fast: ".500000Z" MessageToDict: ".500Z"
nanos=123456789 → fast: ".123456Z" MessageToDict: ".123456789Z"
nanos=100 → fast: ".000000Z" MessageToDict: ".000000100Z"
This breaks the documented contract that the function "matches the output format of MessageToDict with proto_json.patch() applied" (sdk/python/feast/feature_server_utils.py:3). Downstream clients parsing event_timestamps from the REST API (e.g. sdk/python/feast/infra/online_stores/remote.py:496) would receive timestamps with degraded or incorrect precision.
| return dt.strftime("%Y-%m-%dT%H:%M:%S") + ( | |
| ".%06dZ" % (ts.nanos // 1000) if ts.nanos else "Z" | |
| ) | |
| if ts.nanos % 1000000000 == 0: | |
| frac = "Z" | |
| elif ts.nanos % 1000000 == 0: | |
| frac = ".%03dZ" % (ts.nanos // 1000000) | |
| elif ts.nanos % 1000 == 0: | |
| frac = ".%06dZ" % (ts.nanos // 1000) | |
| else: | |
| frac = ".%09dZ" % ts.nanos | |
| return dt.strftime("%Y-%m-%dT%H:%M:%S") + frac |
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
Fixes #6013
The feature server uses
google.protobuf.json_format.MessageToDictfor convertingGetOnlineFeaturesResponseto JSON, but this generic converter is slow due to reflection-based field introspection.Changes
feature_server_utils.pywithresponse_to_dict_fast()functionGetOnlineFeaturesResponsestructurePerformance Impact
~4x faster serialization, saving 5-15ms per request for typical feature responses with 50-200 features.
Test Plan
test_feature_server_utils.pyMessageToDictformatmake lint-python