-
Notifications
You must be signed in to change notification settings - Fork 14
Feature/abstract data collector #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/abstract data collector #156
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #156 +/- ##
==========================================
- Coverage 92.27% 89.81% -2.46%
==========================================
Files 11 12 +1
Lines 1721 1768 +47
==========================================
Hits 1588 1588
- Misses 133 180 +47 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ben for putting this together.
I'm a bit confused about the interplay between collect()
, _collect()
, get_data()
, and flush()
, particularly regarding their intended public/private semantics and expected usage. Below are a few points for discussion:
1. Public vs. private: should collect()
really be exposed?
Currently, collect()
is a public method that checks should_collect()
and then calls the abstract _collect()
.
If the intention is that users or developers should never call _collect()
directly (as it's meant to be implemented by subclasses), then exposing collect()
makes sense. However, if users are not meant to call collect()
themselves either (e.g., because data is collected automatically after every n steps), then perhaps collect()
should be renamed to something like _collect_if_triggered()
or otherwise marked as internal.
Alternatively, it might be clearer to split responsibilities:
- Keep
collect()
as the public method that always collects (i.e., unconditional). - Introduce a new internal method like
_collect_if_triggered()
that checks the trigger before collecting.
This would make the public API more predictable and reduce the risk of misuse.
2. Clarify the difference between collect()
and get_data()
From the current docstrings:
collect()
triggers data collection if the trigger condition is met.get_data()
returns the collected data currently in memory.
But it’s unclear:
- Does
get_data()
trigger a collection if no data has been collected yet? - Does it always return just the in-memory data (
self._frames
)? - Could it return data from an external backend (if flushed)?
If get_data()
is purely an accessor for self._frames
, it might make sense to expose it as a @property
instead, and rename it accordingly (e.g. data
).
If it can also trigger collection or fetch from storage, that should be made explicit in the docstring and usage examples.
3. Docstrings could be more explicit and example-driven
For example, in flush()
, it’s not immediately obvious what reset_memory
does unless one reads the implementation. Could you clarify in the docstring:
- What it means to "flush" data (to storage? to file? to database?).
- What effect
reset_memory
has (i.e., clears in-memory buffers after flushing).
More broadly, it would be very helpful if each public method (collect
, get_data
, flush
, reset
) had:
- A short, high-level summary of when it should be used,
- And, ideally, a minimal usage example in the docstring to illustrate expected patterns.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
So in mesa data collection the model is used like this class MoneyModel(mesa.Model):
"""A model with some number of agents."""
def __init__(self, n, width, height, seed=None):
super().__init__(seed=seed)
self.num_agents = n
self.grid = OrthogonalMooreGrid(
(width, height), torus=True, capacity=10, random=self.random
)
# Instantiate DataCollector
self.datacollector = mesa.DataCollector(
model_reporters={"Gini": compute_gini}, agent_reporters={"Wealth": "wealth"}
)
# Create agents
agents = MoneyAgent.create_agents(
self,
self.num_agents,
self.random.choices(self.grid.all_cells.cells, k=self.num_agents),
)
def step(self):
# Collect data each step
self.datacollector.collect(self)
self.agents.shuffle_do("move")
self.agents.do("give_money") i thought we will do the same and the user has to call collect in
i think I fixed these with current changes I was also thinking of a |
Hey Ben, looks good to me! Just make sure to fix any pre-commit errors before merging. You can run checks locally with No worries about CodeCov since it's just an abstract implementation. |
8b5ab29
to
9cac504
Compare
65f8caa
to
9a05e68
Compare
Hey @adamamer20 I ran pre commit and i got get conflicts which for some reason I get conflicts, I am not able to solve it after pulling from main. |
…ben-geo/mesa-frames into pr/Ben-geo/156-1
…o pr/Ben-geo/156-1
…o pr/Ben-geo/156-1
…ben-geo/mesa-frames into pr/Ben-geo/156-1
@Ben-geo I recently merged #143 into I resolved it by accepting all changes from the incoming branch ( Also, it looks like To fix that, I merged #158 — now running As for the pre-commit issues: (mesa-frames) adam@Supermacchina:~/projects/mesa-frames$ pre-commit run --all
ruff (legacy alias)......................................................Failed
- hook id: ruff
- exit code: 1
mesa_frames/abstract/datacollector.py:79:9: D400 First line should end with a period
|
77 | storage: Literal["memory:", "csv:", "postgresql:"] = "memory:",
78 | ):
79 | / """
80 | | Initialize a Datacollector
81 | |
82 | | Parameters
83 | | ----------
84 | | model : ModelDF
85 | | The model object from which data is collected.
86 | | model_reporters : dict[str, Callable], optional
87 | | Functions to collect data at the model level.
88 | | agent_reporters : dict[str, Union[str, Callable]], optional
89 | | Attributes or functions to collect data at the agent level.
90 | | trigger : Callable, optional
91 | | A function(model) -> bool that determines whether to collect data.
92 | | reset_memory : bool
93 | | Whether to reset in-memory data after flushing. Default is True.
94 | | storage : str
95 | | Storage backend URI (e.g. 'memory:', 'csv:', 'postgresql:').
96 | | """
| |___________^ D400
97 | self._model = model
98 | self._model_reporters = model_reporters or {}
|
= help: Add period
mesa_frames/abstract/datacollector.py:106:9: D400 First line should end with a period
|
105 | def collect(self) -> None:
106 | / """
107 | | Trigger Data collection
108 | |
109 | | This method caslls _collect() to perform actual data collection
110 | |
111 | | Example
112 | | -------
113 | | >>> datacollector.collect()
114 | | """
| |___________^ D400
115 | self._collect()
|
= help: Add period
mesa_frames/abstract/datacollector.py:118:9: D400 First line should end with a period
|
117 | def conditional_collect(self) -> None:
118 | / """
119 | | Trigger data collection if condition is met
120 | |
121 | | This method caslls _collect() to perform actual data collection
122 | |
123 | | Example
124 | | -------
125 | | >>> datacollector.conditional_collect()
126 | | """
| |___________^ D400
127 | if self._should_collect():
128 | self._collect()
|
= help: Add period
mesa_frames/abstract/datacollector.py:131:9: D400 First line should end with a period
|
130 | def _should_collect(self) -> bool:
131 | / """
132 | | Evaluates whether data should be collected at current step
133 | |
134 | | Returns
135 | | -------
136 | | bool
137 | | True if the configured trigger condition is met, False otherwise.
138 | | """
| |___________^ D400
139 | return self._trigger(self._model)
|
= help: Add period
mesa_frames/abstract/datacollector.py:131:9: D401 First line of docstring should be in imperative mood: "Evaluates whether data should be collected at current step"
|
130 | def _should_collect(self) -> bool:
131 | / """
132 | | Evaluates whether data should be collected at current step
133 | |
134 | | Returns
135 | | -------
136 | | bool
137 | | True if the configured trigger condition is met, False otherwise.
138 | | """
| |___________^ D401
139 | return self._trigger(self._model)
|
mesa_frames/abstract/datacollector.py:143:9: D400 First line should end with a period
|
141 | @abstractmethod
142 | def _collect(self):
143 | / """
144 | | Performs the actual data collection logic
145 | |
146 | | This method must be im
147 | | """
| |___________^ D400
148 | pass
|
= help: Add period
mesa_frames/abstract/datacollector.py:143:9: D401 First line of docstring should be in imperative mood: "Performs the actual data collection logic"
|
141 | @abstractmethod
142 | def _collect(self):
143 | / """
144 | | Performs the actual data collection logic
145 | |
146 | | This method must be im
147 | | """
| |___________^ D401
148 | pass
|
mesa_frames/abstract/datacollector.py:153:9: D400 First line should end with a period
|
151 | @abstractmethod
152 | def data(self) -> Any:
153 | / """
154 | | Returns collected data currently in memory as a dataframe
155 | |
156 | | Example:
157 | | -------
158 | | >>> df = datacollector.data
159 | | >>> print(df)
160 | | """
| |___________^ D400
161 | pass
|
= help: Add period
mesa_frames/abstract/datacollector.py:206:9: D400 First line should end with a period
|
204 | @property
205 | def seed(self) -> int:
206 | / """
207 | | Function to get the model seed
208 | |
209 | | Example:
210 | | --------
211 | | >>> seed = datacollector.seed
212 | | """
| |___________^ D400
213 | return self._model._seed
|
= help: Add period
Found 9 errors.
No fixes available (7 hidden fixes can be enabled with the `--unsafe-fixes` option).
ruff format..............................................................Passed
pyupgrade................................................................Passed
trim trailing whitespace.................................................Passed
check toml...............................................................Passed
check yaml...............................................................Passed
codespell................................................................Failed
- hook id: codespell
- exit code: 65
mesa_frames/abstract/datacollector.py:172: Parametrs ==> Parameters
mesa_frames/abstract/datacollector.py:174: intialization ==> initialization
markdownlint-cli2........................................................Passed
pydoclint................................................................Failed
- hook id: pydoclint
- exit code: 1
Skipping files that match this pattern: \.git|\.tox
docs/api/conf.py
examples/boltzmann_wealth/performance_plot.py
examples/sugarscape_ig/__init__.py
examples/sugarscape_ig/performance_comparison.py
examples/sugarscape_ig/ss_mesa/__init__.py
examples/sugarscape_ig/ss_mesa/agents.py
examples/sugarscape_ig/ss_mesa/model.py
examples/sugarscape_ig/ss_polars/__init__.py
examples/sugarscape_ig/ss_polars/agents.py
examples/sugarscape_ig/ss_polars/model.py
mesa_frames/__init__.py
mesa_frames/abstract/__init__.py
mesa_frames/abstract/agents.py
mesa_frames/abstract/datacollector.py
mesa_frames/abstract/mixin.py
mesa_frames/abstract/space.py
mesa_frames/concrete/__init__.py
mesa_frames/concrete/agents.py
mesa_frames/concrete/agentset.py
mesa_frames/concrete/mixin.py
mesa_frames/concrete/model.py
mesa_frames/concrete/space.py
mesa_frames/types_.py
mesa_frames/utils.py
tests/__init__.py
tests/test_agents.py
tests/test_agentset.py
tests/test_grid.py
tests/test_mixin.py
tests/test_modeldf.py
mesa_frames/abstract/datacollector.py
70: DOC105: Method `AbstractDataCollector.__init__`: Argument names match, but type hints in these args do not match: model_reporters, agent_reporters, trigger, storage
(mesa-frames) adam@Supermacchina:~/projects/mesa-frames$ Here's a summary of the failing hooks:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small change for imports. Everything else, LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small change for imports. Everything else, LGTM.
This PR introduces the
AbstractDataCollector
class, an extensible and standardized base for implementing data collection logic in mesa-frames. It provides a structured interface for collecting bothmodel-level
andagent-level
data during simulation runs, with support for custom triggers, in-memory or external storage, and flexible reporter definitions.Key Features :
Model & Agent Reporters: Accepts user-defined
Callable
or attribute names to specify what to collect.Trigger-Based Collection: Supports conditional data collection via a trigger function; defaults to always collect.
In-Memory Storage with Optional External Flush: Stores collected data in memory with support for flushing to external backends (defined in child classes).
Reset Behavior: Automatically resets in-memory data after flush if reset_memory=True.
Abstract Interface: Defines the following abstract methods to be implemented by concrete subclasses:
_collect()
: Actual logic for collecting model and agent data.get_data()
: Returns in-memory collected data._flush()
: Flushes data to persistent storage.