-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update_column() utility not working as intended? #89
Comments
Here's a thought, could this have to do with whether or not the |
Interesting (and troubling). Agree that it seems like I'm not sure the last part (beginning with "So my working hypothesis") is the reason, though -- i don't think i'd expect changes to propagate between copies anyway, even if they'd been recorded by Orca. I'm looking through the Orca documentation to see if i can come up with any leads.. |
It does seem like this is the intended workflow. And here's an example of the underlying data being updated properly, in the Large MNL unit tests: test_large_multinomial_logit.py#L202-L211 One thing to try might be explicitly replacing the orca table with the new one, which might be more reliable. So in your debugging section, something like: dfw = orca.get_table(mc.out_tables)
dfw.update_col(mc.out_column, mc.choices)
orca.add_table(mc.out_tables, dfw.to_frame()) If that works, it would be an easy fix to add it to the template utility. |
Hmm, and the |
I'll test the explicit table replacement and report back. |
@mxndrwgrdnr I think you're exactly right about the cache status. Basically what's going on here is (a) a function evaluates and returns a data frame, (b) you're modifying that data frame and then (c) the function is being evaluated again so you're obtaining what looks to be the original data frame. # define a non-cached table
@orca.table()
def my_table():
return pd.DataFrame({'a': ['uno', 'dos', 'tres']})
# grab the table from orca and update it
my_wrapper = orca.get_table('my_table')
my_wrapper.update_col('b', pd.Series(['cuatro', 'cinco', 'seis']))
# this will bring back the original result
orca.get_table('my_table').local
Instead if we cache, then the function is NOT being re-evaluated and we get the same instance: @orca.table(cache=True, cache_scope='forever')
def my_table():
return pd.DataFrame({'a': ['uno', 'dos', 'tres']})
my_wrapper= orca.get_table('my_table')
my_wrapper.update_col('b', pd.Series(['cuatro', 'cinco', 'seis']))
orca.get_table('my_table').local
I lieu of messing with the cache, I suppose you could retrieve the data frame from the function and then add it back into orca as a data frame rather than a function. @orca.table()
def my_table():
return pd.DataFrame({'a': ['uno', 'dos', 'tres']})
orca.add_table('my_table', orca.get_table('my_table').local)
my_wrapper= orca.get_table('my_table')
my_wrapper.update_col('b', pd.Series(['cuatro', 'cinco', 'seis']))
orca.get_table('my_table').local
|
OK, so then why do your large MNL unit tests work, because you aren't setting the cache here |
That looks to be creating a orca table wrapper from an already created pandas.DataFrame, not a function that returns a pandas.DataFrame. |
Yes but there is a function that returns a pandas.DataFrame that gets called behind the scenes here |
How is the __ persons_CHTS_format __ table being added to orca? Like this? # create a data frame and add it explicitly
df = pandas.DataFrame({a: [1,2,3]})
orca.add_table('__persons_CHTS_format__', df) Or like this? # wrap a function and add it to orca
@orca.table()
def __persons_CHTS_format__():
return pandas.DataFrame({a: [1,2,3]}) If table is added as a wrapped function (2nd case), I would set cache=True_ and cache_scope_='forever' or add the table explicitly. If that doesn't fix it then there's a template or orca problem. |
Thanks @bridwell, this is all extremely helpful. It does seem like it must be the cache settings. Which means that the workflow of modifying a table within a model step only works if caching is enabled for the table, which I hadn't fully realized. Max and i were discussing what to do to prevent the sort of problem he ran into, and here are some possibilities:
So my inclination is to do numbers 1 through 3. |
One more thing to be mindful of, is that in the examples I provided, the table is cached with a scope of 'forever'. If the table was cached to a lesser scope (step, iteration) then you would run into the same problems--and these would probably be even more difficult to debug because the updates would sometimes persist and then be removed. I agree that changing Orca cache defaults would probably cause problems. Raising a warning in orca when update_col is called on a table that is (a) being returned by a function and (b) is not cached forever, makes sense. In terms of the templates I wonder if you should be stricter and raise an error? I also use orca decorators to initially load my tables. I don't really see anything wrong with the workflow, it's definitely helpful and convenient. But the whole caching idea does seem to create confusion at times. Perhaps a new decorator or input argument should be added that would clarify/differentiate this type of workflow? Like maybe @orca.load_table()? This would be equivalent to @orca.table(cache=True, cache_scope='forever'). |
@bridwell I believe the default is But yeah, @smmaurer I think this solution (1-3) makes the most sense. Regarding point 4, though, would the idea be to convert all of the |
I'm suspicious that the new
urbansim_templates.utils.update_column()
method is not functioning properly, or at least only functioning in certain settings, because it definitely is working in some places, but not in others. I have a hypothesis as to why this might be happening, but I wonder what you @smmaurer think.Here I am running the mode choice model in activitySynth:
which uses the small MNL template and as such should automatically append the results to the correct output table as seen here.
But that doesn't happen:
In order to figure out if the problem might be in the model step itself, I manually trigger the
update_column()
command but to no avail:So then I debug these lines of code from the
update_column()
method itself to see if the problem lies there, but it seems to work as intended:But actually they don't because the updated column cannot be found in the parent table:
So my working hypothesis is that for some reason, in some instances, updates that are made to an
orca.get_table()
dataframe wrapper object are not made to the orca table itself. I believe this has to do with the fact that successiveorca.get_table()
calls to the same table return objects that are stored in different locations in memory, as seen here:Whether or not these having different in-memory addresses is the issue, the following exercise shows how changes do not permeate:
Am I missing something here or is my reasoning sound? The bigger mystery to me now is why the
update_column()
method would ever work to update the parent orca table itself.The text was updated successfully, but these errors were encountered: