Commit 79409ca

feat(features) Use dataclasses for flagpole instead of pydantic (#75859)
These changes switch flagpole from pydantic base classes to `dataclasses` from the stdlib. During the pydantic 2 upgrade, the time to parse features shot up unexpectedly. We have also not been impressed with feature flag match times and suspected that pydantic might be contributing overhead. This pull request re-implements flagpole with plain Python dataclasses. The new implementation reduces the time to build feature flags, which should improve the overall performance of our feature flagging. Improvements were measured both with cProfile and with local mini-benchmarks (scripts provided below).

## cProfile results

The cProfile script builds a feature 1000 times and collects profiling data from those operations.

*current master (pydantic)*

```
9004 function calls in 1.125 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.125    1.125 <string>:1(<module>)
     1000    1.123    0.001    1.124    0.001 __init__.py:117(from_feature_dictionary)
        1    0.001    0.001    1.125    1.125 flagpole-profile:21(main)
     2000    0.001    0.000    0.001    0.000 typing.py:2424(get_origin)
        1    0.000    0.000    1.125    1.125 {built-in method builtins.exec}
     6000    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```

*after (dataclasses)*

```
41004 function calls in 0.009 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.010    0.010 <string>:1(<module>)
     3000    0.000    0.000    0.000    0.000 <string>:2(__init__)
     1000    0.001    0.000    0.009    0.000 __init__.py:128(from_feature_dictionary)
     1000    0.001    0.000    0.008    0.000 __init__.py:134(<listcomp>)
     2000    0.002    0.000    0.004    0.000 conditions.py:193(_condition_from_dict)
     3000    0.002    0.000    0.007    0.000 conditions.py:217(from_dict)
     3000    0.000    0.000    0.004    0.000 conditions.py:219(<listcomp>)
     2000    0.000    0.000    0.000    0.000 enum.py:1093(__new__)
     2000    0.000    0.000    0.000    0.000 enum.py:1255(value)
     2000    0.000    0.000    0.000    0.000 enum.py:193(__get__)
     2000    0.000    0.000    0.001    0.000 enum.py:686(__call__)
        1    0.000    0.000    0.010    0.010 flagpole-profile:21(main)
        1    0.000    0.000    0.010    0.010 {built-in method builtins.exec}
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    19000    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}
```

While significantly more functions are called, the overall runtime is *much* better.

<details>
<summary>flagpole-profile script</summary>

```python
#!/usr/bin/env python
from sentry.runner import configure
from flagpole import Feature as FlagpoleFeature

import cProfile

configure()

feature_names = (
    # Largest feature flag is 2688 bytes
    "organizations:user-feedback-ui",
    # 404 bytes is around median
    "organizations:performance-chart-interpolation",
    # 97 bytes is smallest feature.
    "organizations:new-weekly-report",
)
feature_config = {}


def main():
    feature_name = "organizations:user-feedback-ui"
    # feature_name = "organizations:performance-chart-interpolation"
    # feature_name = "organizations:new-weekly-report"
    for _ in range(1000):
        FlagpoleFeature.from_feature_dictionary(feature_name, feature_config[feature_name])


if __name__ == "__main__":
    from sentry import options

    for feature_name in feature_names:
        feature_config[feature_name] = options.get(f"feature.{feature_name}")
    cProfile.run('main()')
```

</details>

## Micro benchmark

The microbenchmark looks at three feature flags (the largest, an approximately median, and the smallest feature). Each feature is parsed 10000 times, recording the min, mean, and max duration for building a Feature object from dictionary data loaded from options seeded with the current feature flag inventory.

*before (pydantic)*

```
Results for organizations:user-feedback-ui

option load_time 0.0018911361694335938

build_time min 0.0004580021
build_time max 0.0011467934
build_time mean 0.0004902341

RSS memory usage 272400384

Results for organizations:performance-chart-interpolation

option load_time 0.0030400753021240234

build_time min 0.0000336170
build_time max 0.0001301765
build_time mean 0.0000367536

RSS memory usage 272400384

Results for organizations:new-weekly-report

option load_time 0.0022568702697753906

build_time min 0.0000057220
build_time max 0.0000231266
build_time mean 0.0000069423

RSS memory usage 272400384
```

*after (dataclasses)*

```
Results for organizations:user-feedback-ui

option load_time 0.0033750534057617188

build_time min 0.0000026226
build_time max 0.0000209808
build_time mean 0.0000032377

RSS memory usage 276054016

Results for organizations:performance-chart-interpolation

option load_time 0.0033571720123291016

build_time min 0.0000016689
build_time max 0.0000610352
build_time mean 0.0000028541

RSS memory usage 276054016

Results for organizations:new-weekly-report

option load_time 0.003008127212524414

build_time min 0.0000000000
build_time max 0.0000047684
build_time mean 0.0000007447

RSS memory usage 276070400
```

<details>
<summary>flagpole-timing script</summary>

```python
#!/usr/bin/env python
from sentry.runner import configure
from flagpole import Feature as FlagpoleFeature

import gc
import statistics
import time

import psutil

configure()


def main():
    from sentry import options

    gc.disable()
    feature_names = (
        # Largest feature flag is 2688 bytes
        "organizations:user-feedback-ui",
        # 404 bytes is around median
        "organizations:performance-chart-interpolation",
        # 97 bytes is smallest feature.
        "organizations:new-weekly-report",
    )
    for feature_name in feature_names:
        load_start = time.time()
        option_val = options.get(f"feature.{feature_name}")
        load_end = time.time()

        build_durations = []
        for _ in range(0, 10000):
            build_start = time.time()
            FlagpoleFeature.from_feature_dictionary(feature_name, option_val)
            build_end = time.time()
            build_durations.append(build_end - build_start)

        load_time = load_end - load_start
        rss_mem = psutil.Process().memory_info().rss

        print("")
        print(f"Results for {feature_name}")
        print("")
        print(f"option load_time {load_time}")
        print("")
        print("build_time min", "{:.10f}".format(min(build_durations)))
        print("build_time max", "{:.10f}".format(max(build_durations)))
        print("build_time mean", "{:.10f}".format(statistics.mean(build_durations)))
        print("")
        print(f"RSS memory usage {rss_mem}")


if __name__ == "__main__":
    main()
```

</details>

Again we see significant improvements in runtime without a meaningful increase in memory usage.

## What do we lose?

While moving to dataclasses gives us performance gains, it has a few drawbacks:

- The dataclass implementation is not as type-sound as pydantic: individual properties are not validated for type correctness.
- Updating the jsonschema will be manual going forward. Pydantic has a great jsonschema integration that can generate jsonschema documents from objects; dataclasses do not.
- Drift between the schema and the implementation could occur. Because the jsonschema and the code are no longer connected, they could drift apart in the future (the sketch below shows one way to guard against this).
1 parent f5c74ad commit 79409ca

File tree

7 files changed: +620 -256 lines changed

src/flagpole/__init__.py

Lines changed: 54 additions & 30 deletions
```diff
@@ -61,12 +61,14 @@
 """

 from __future__ import annotations

-from datetime import datetime
+import dataclasses
+import functools
+import os
 from typing import Any

+import jsonschema
 import orjson
 import yaml
-from pydantic import BaseModel, Field, ValidationError, constr

 from flagpole.conditions import ConditionBase, Segment
 from flagpole.evaluation_context import ContextBuilder, EvaluationContext
@@ -76,26 +78,29 @@ class InvalidFeatureFlagConfiguration(Exception):
     pass


-class Feature(BaseModel):
-    name: constr(min_length=1, to_lower=True) = Field(  # type:ignore[valid-type]
-        description="The feature name."
-    )
+@functools.cache
+def load_json_schema() -> dict[str, Any]:
+    path = os.path.join(os.path.dirname(__file__), "flagpole-schema.json")
+    with open(path, "rb") as json_file:
+        data = orjson.loads(json_file.read())
+    return data
+
+
+@dataclasses.dataclass(frozen=True)
+class Feature:
+    name: str
     "The feature name."

-    owner: constr(min_length=1) = Field(  # type:ignore[valid-type]
-        description="The owner of this feature. Either an email address or team name, preferably."
-    )
+    owner: str
     "The owner of this feature. Either an email address or team name, preferably."

-    segments: list[Segment] = Field(
-        description="The list of segments to evaluate for the feature. An empty list will always evaluate to False."
-    )
-    "The list of segments to evaluate for the feature. An empty list will always evaluate to False."
-
-    enabled: bool = Field(default=True, description="Whether or not the feature is enabled.")
+    enabled: bool = dataclasses.field(default=True)
     "Whether or not the feature is enabled."

-    created_at: datetime = Field(description="The datetime when this feature was created.")
+    segments: list[Segment] = dataclasses.field(default_factory=list)
+    "The list of segments to evaluate for the feature. An empty list will always evaluate to False."
+
+    created_at: str | None = None
     "The datetime when this feature was created."

     def match(self, context: EvaluationContext) -> bool:
@@ -109,24 +114,40 @@ def match(self, context: EvaluationContext) -> bool:

         return False

-    @classmethod
-    def dump_schema_to_file(cls, file_path: str) -> None:
-        with open(file_path, "w") as file:
-            file.write(cls.schema_json(indent=2))
+    def validate(self) -> bool:
+        """
+        Validate a feature against the JSON schema.
+        Will raise if the current dict form of a feature does not match the schema.
+        """
+        dict_data = dataclasses.asdict(self)
+        spec = load_json_schema()
+        jsonschema.validate(dict_data, spec)
+
+        return True

     @classmethod
     def from_feature_dictionary(cls, name: str, config_dict: dict[str, Any]) -> Feature:
+        segment_data = config_dict.get("segments")
+        if not isinstance(segment_data, list):
+            raise InvalidFeatureFlagConfiguration("Feature has no segments defined")
         try:
-            feature = cls(name=name, **config_dict)
-        except ValidationError as exc:
-            raise InvalidFeatureFlagConfiguration("Provided JSON is not a valid feature") from exc
+            segments = [Segment.from_dict(segment) for segment in segment_data]
+            feature = cls(
+                name=name,
+                owner=str(config_dict.get("owner", "")),
+                enabled=bool(config_dict.get("enabled", True)),
+                created_at=str(config_dict.get("created_at")),
+                segments=segments,
+            )
+        except Exception as exc:
+            raise InvalidFeatureFlagConfiguration(
+                "Provided config_dict is not a valid feature"
+            ) from exc

         return feature

     @classmethod
-    def from_feature_config_json(
-        cls, name: str, config_json: str, context_builder: ContextBuilder | None = None
-    ) -> Feature:
+    def from_feature_config_json(cls, name: str, config_json: str) -> Feature:
         try:
             config_data_dict = orjson.loads(config_json)
         except orjson.JSONDecodeError as decode_error:
@@ -135,6 +156,9 @@ def from_feature_config_json(
         if not isinstance(config_data_dict, dict):
             raise InvalidFeatureFlagConfiguration("Feature JSON is not a valid feature")

+        if not name:
+            raise InvalidFeatureFlagConfiguration("Feature name is required")
+
         return cls.from_feature_dictionary(name=name, config_dict=config_data_dict)

     @classmethod
@@ -148,7 +172,7 @@ def from_bulk_json(cls, json: str) -> list[Feature]:
         return features

     @classmethod
-    def from_bulk_yaml(cls, yaml_str) -> list[Feature]:
+    def from_bulk_yaml(cls, yaml_str: str) -> list[Feature]:
         features: list[Feature] = []
         parsed_yaml = yaml.safe_load(yaml_str)
         for feature, yaml_dict in parsed_yaml.items():
@@ -157,9 +181,9 @@ def from_bulk_yaml(cls, yaml_str) -> list[Feature]:
         return features

     def to_dict(self) -> dict[str, Any]:
-        json_dict = dict(orjson.loads(self.json()))
-        json_dict.pop("name")
-        return {self.name: json_dict}
+        dict_data = dataclasses.asdict(self)
+        dict_data.pop("name")
+        return {self.name: dict_data}

     def to_yaml_str(self) -> str:
         return yaml.dump(self.to_dict())
```
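The parsing path above no longer relies on pydantic validation; construction is explicit. A minimal usage sketch of the new entry points, with behavior as shown in the diff; the flag name and config are hypothetical:

```python
import orjson

from flagpole import Feature

# Hypothetical config. "segments" must be present and be a list, otherwise
# from_feature_dictionary() raises InvalidFeatureFlagConfiguration.
config_json = orjson.dumps(
    {
        "owner": "someone@example.com",
        "enabled": True,
        "created_at": "2024-08-01",
        "segments": [],  # an empty list parses fine but match() always returns False
    }
).decode()

feature = Feature.from_feature_config_json("organizations:example-flag", config_json)

# to_dict() round-trips through dataclasses.asdict() and re-keys by name.
assert feature.to_dict() == {
    "organizations:example-flag": {
        "owner": "someone@example.com",
        "enabled": True,
        "created_at": "2024-08-01",
        "segments": [],
    }
}
```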

src/flagpole/conditions.py

Lines changed: 57 additions & 75 deletions
```diff
@@ -1,8 +1,8 @@
+import dataclasses
 from abc import abstractmethod
+from collections.abc import Mapping
 from enum import Enum
-from typing import Annotated, Any, Literal, TypeVar
-
-from pydantic import BaseModel, Field, StrictBool, StrictFloat, StrictInt, StrictStr, constr
+from typing import Any, Self, TypeVar

 from flagpole.evaluation_context import EvaluationContext

@@ -48,20 +48,20 @@ def create_case_insensitive_set_from_list(values: list[T]) -> set[T]:
     return case_insensitive_set


-class ConditionBase(BaseModel):
-    property: str = Field(description="The evaluation context property to match against.")
+@dataclasses.dataclass(frozen=True)
+class ConditionBase:
+    property: str
     """The evaluation context property to match against."""

-    operator: ConditionOperatorKind = Field(
-        description="The operator to use when comparing the evaluation context property to the condition's value."
-    )
-    """The operator to use when comparing the evaluation context property to the condition's value."""
-
-    value: Any = Field(
-        description="The value to compare against the condition's evaluation context property."
-    )
+    value: Any
     """The value to compare against the condition's evaluation context property."""

+    operator: str = dataclasses.field(default="")
+    """
+    The name of the operator to use when comparing the evaluation context property to the condition's value.
+    Values must be a valid ConditionOperatorKind.
+    """
+
     def match(self, context: EvaluationContext, segment_name: str) -> bool:
         return self._operator_match(
             condition_property=context.get(self.property), segment_name=segment_name
@@ -99,12 +99,8 @@ def _evaluate_contains(self, condition_property: Any, segment_name: str) -> bool

         return value in create_case_insensitive_set_from_list(condition_property)

-    def _evaluate_equals(
-        self, condition_property: Any, segment_name: str, strict_validation: bool = False
-    ) -> bool:
-        # Strict validation enforces that a property exists when used in an
-        # equals condition
-        if condition_property is None and not strict_validation:
+    def _evaluate_equals(self, condition_property: Any, segment_name: str) -> bool:
+        if condition_property is None:
             return False

         if not isinstance(condition_property, type(self.value)):
@@ -121,33 +117,33 @@ def _evaluate_equals(
         return condition_property == self.value


-InOperatorValueTypes = list[StrictInt] | list[StrictFloat] | list[StrictStr]
+InOperatorValueTypes = list[int] | list[float] | list[str]


 class InCondition(ConditionBase):
-    operator: Literal[ConditionOperatorKind.IN] = ConditionOperatorKind.IN
     value: InOperatorValueTypes
+    operator: str = dataclasses.field(default="in")

     def _operator_match(self, condition_property: Any, segment_name: str):
         return self._evaluate_in(condition_property=condition_property, segment_name=segment_name)


 class NotInCondition(ConditionBase):
-    operator: Literal[ConditionOperatorKind.NOT_IN] = ConditionOperatorKind.NOT_IN
     value: InOperatorValueTypes
+    operator: str = dataclasses.field(default="not_in")

     def _operator_match(self, condition_property: Any, segment_name: str):
         return not self._evaluate_in(
             condition_property=condition_property, segment_name=segment_name
         )


-ContainsOperatorValueTypes = StrictInt | StrictStr | StrictFloat
+ContainsOperatorValueTypes = int | str | float


 class ContainsCondition(ConditionBase):
-    operator: Literal[ConditionOperatorKind.CONTAINS] = ConditionOperatorKind.CONTAINS
     value: ContainsOperatorValueTypes
+    operator: str = dataclasses.field(default="contains")

     def _operator_match(self, condition_property: Any, segment_name: str):
         return self._evaluate_contains(
@@ -156,101 +152,87 @@ def _operator_match(self, condition_property: Any, segment_name: str):


 class NotContainsCondition(ConditionBase):
-    operator: Literal[ConditionOperatorKind.NOT_CONTAINS] = ConditionOperatorKind.NOT_CONTAINS
     value: ContainsOperatorValueTypes
+    operator: str = dataclasses.field(default="not_contains")

     def _operator_match(self, condition_property: Any, segment_name: str):
         return not self._evaluate_contains(
             condition_property=condition_property, segment_name=segment_name
         )


-EqualsOperatorValueTypes = (
-    StrictInt
-    | StrictFloat
-    | StrictStr
-    | StrictBool
-    | list[StrictInt]
-    | list[StrictFloat]
-    | list[StrictStr]
-)
+EqualsOperatorValueTypes = int | float | str | bool | list[int] | list[float] | list[str]


 class EqualsCondition(ConditionBase):
-    operator: Literal[ConditionOperatorKind.EQUALS] = ConditionOperatorKind.EQUALS
     value: EqualsOperatorValueTypes
-    strict_validation: bool = Field(
-        description="Whether the condition should enable strict validation, raising an exception if the evaluation context property is missing",
-        default=False,
-    )
-    """Whether the condition should enable strict validation, raising an exception if the evaluation context property is missing"""
+    operator: str = dataclasses.field(default="equals")

     def _operator_match(self, condition_property: Any, segment_name: str):
         return self._evaluate_equals(
             condition_property=condition_property,
             segment_name=segment_name,
-            strict_validation=self.strict_validation,
         )


 class NotEqualsCondition(ConditionBase):
-    operator: Literal[ConditionOperatorKind.NOT_EQUALS] = ConditionOperatorKind.NOT_EQUALS
     value: EqualsOperatorValueTypes
-    strict_validation: bool = Field(
-        description="Whether the condition should enable strict validation, raising an exception if the evaluation context property is missing",
-        default=False,
-    )
-    """Whether the condition should enable strict validation, raising an exception if the evaluation context property is missing"""
+    operator: str = dataclasses.field(default="not_equals")

     def _operator_match(self, condition_property: Any, segment_name: str):
         return not self._evaluate_equals(
             condition_property=condition_property,
             segment_name=segment_name,
-            strict_validation=self.strict_validation,
         )


-# We have to group and annotate all the different subclasses of Operator
-# in order for Pydantic to be able to discern between the different types
-# when parsing a dict or JSON.
-AvailableConditions = Annotated[
-    InCondition
-    | NotInCondition
-    | ContainsCondition
-    | NotContainsCondition
-    | EqualsCondition
-    | NotEqualsCondition,
-    Field(discriminator="operator"),
-]
+OPERATOR_LOOKUP: Mapping[ConditionOperatorKind, type[ConditionBase]] = {
+    ConditionOperatorKind.IN: InCondition,
+    ConditionOperatorKind.NOT_IN: NotInCondition,
+    ConditionOperatorKind.CONTAINS: ContainsCondition,
+    ConditionOperatorKind.NOT_CONTAINS: NotContainsCondition,
+    ConditionOperatorKind.EQUALS: EqualsCondition,
+    ConditionOperatorKind.NOT_EQUALS: NotEqualsCondition,
+}


-class Segment(BaseModel):
-    name: constr(min_length=1) = Field(  # type:ignore[valid-type]
-        description="A brief description or identifier for the segment"
+def condition_from_dict(data: Mapping[str, Any]) -> ConditionBase:
+    operator_kind = ConditionOperatorKind(data.get("operator", "invalid"))
+    if operator_kind not in OPERATOR_LOOKUP:
+        valid = ", ".join(OPERATOR_LOOKUP.keys())
+        raise ValueError(f"The {operator_kind} is not a known operator. Choose from {valid}")
+
+    condition_cls = OPERATOR_LOOKUP[operator_kind]
+    return condition_cls(
+        property=str(data.get("property")), operator=operator_kind.value, value=data.get("value")
     )
+
+
+@dataclasses.dataclass
+class Segment:
+    name: str
     "A brief description or identifier for the segment"

-    conditions: list[AvailableConditions] = Field(
-        description="The list of conditions that the segment must be matched in order for this segment to be active"
-    )
+    conditions: list[ConditionBase] = dataclasses.field(default_factory=list)
     "The list of conditions that the segment must be matched in order for this segment to be active"

-    rollout: int | None = Field(
-        default=0,
-        description="""
-    Rollout rate controls how many buckets will be granted a feature when this segment matches.
-
-    Rollout rates range from 0 (off) to 100 (all users). Rollout rates use `context.id`
-    to determine bucket membership consistently over time.
-    """,
-    )
+    rollout: int | None = dataclasses.field(default=0)
     """
     Rollout rate controls how many buckets will be granted a feature when this segment matches.

     Rollout rates range from 0 (off) to 100 (all users). Rollout rates use `context.id`
     to determine bucket membership consistently over time.
     """

+    @classmethod
+    def from_dict(cls, data: Mapping[str, Any]) -> Self:
+        conditions = [condition_from_dict(condition) for condition in data.get("conditions", [])]
+        return cls(
+            name=str(data.get("name", "")),
+            rollout=int(data.get("rollout", 0)),
+            conditions=conditions,
+        )
+
     def match(self, context: EvaluationContext) -> bool:
         for condition in self.conditions:
             match_condition = condition.match(context, segment_name=self.name)
```
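The discriminated-union parsing that pydantic provided is replaced by the explicit `OPERATOR_LOOKUP` table and `condition_from_dict()`. A short sketch of the new dispatch, per the diff above; the segment payload is hypothetical:

```python
from flagpole.conditions import InCondition, Segment, condition_from_dict

# Hypothetical segment payload in the shape Segment.from_dict() expects.
segment = Segment.from_dict(
    {
        "name": "beta-orgs",
        "rollout": 50,
        "conditions": [
            {"property": "organization_slug", "operator": "in", "value": ["acme"]}
        ],
    }
)
# "in" dispatches through OPERATOR_LOOKUP to the InCondition dataclass.
assert isinstance(segment.conditions[0], InCondition)

# An unknown operator fails fast: the ConditionOperatorKind() call inside
# condition_from_dict() raises ValueError before the lookup table is consulted.
try:
    condition_from_dict({"property": "x", "operator": "bogus", "value": 1})
except ValueError:
    pass
```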
