Very Slow to Load/Import #37
Comments
I agree this is the major weakness of the library. The issue is that the CPI database from the BLS is quite large. It is packaged here as a SQLite file that ships with the code.

One alternative design approach would be a lazy-loading, on-demand system, but due to the structure of the source files from the government that's not an easy option. Some kind of API would likely need to be launched and supported, or a drastically restructured version of the data would need to be shipped, perhaps one small CSV file per series.

Another strategy would be to target users who are looking to make only the most rudimentary CPI calculations using the CPI-U series. I have no evidence to back this up, but I suspect this would represent the lion's share of use cases. Were that to be a driving design principle, a carved-out, slimmed-down version containing only that data series could be imported initially, with access to the larger, more complete database only being made when requested.

In the meantime, if you are working on a data science style research project, I would recommend using our library in a computational notebook, like Project Jupyter, where the import can be done a single time and saved into the environment.
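A rough sketch of the per-series, on-demand idea described above (this is not the package's actual code; the one-CSV-per-series layout, file names, and column names are all assumptions for illustration):

```python
import csv
from functools import lru_cache
from pathlib import Path

# Hypothetical layout: one small CSV per series, e.g. data/CUUR0000SA0.csv
# with year,period,value rows. The real package ships a single SQLite file;
# this only sketches the on-demand alternative discussed above.
DATA_DIR = Path("data")


@lru_cache(maxsize=None)
def load_series(series_id="CUUR0000SA0"):
    """Parse one series file the first time it is requested, then cache it."""
    values = {}
    with open(DATA_DIR / f"{series_id}.csv", newline="") as fh:
        for row in csv.DictReader(fh):
            values[(int(row["year"]), row["period"])] = float(row["value"])
    return values


def get_value(year, period="M01", series_id="CUUR0000SA0"):
    # Only the requested series is ever read from disk.
    return load_series(series_id)[(year, period)]
```

With that shape, importing the module costs almost nothing; the parse happens on the first lookup, and only for the series actually used.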
I am just playing around with my retirement funds, trying to model different indices to figure out which one to pick, and I want to adjust the returns for inflation before I dump them into my model. I am not quite sure I am following you, though. Is this module downloading the data every time it is imported? From a very brief look at the source code, it looked like it would only do that when it was extremely out of date. If it is downloading every time, that seems unnecessary. I think you could get away with lazy loading as you suggest: download maybe just the last few years, and then as the function is called for different dates, deal with it then. I am guessing there isn't a sweet little government API that could be queried on the fly to download just the values needed?
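For what it's worth, the BLS does expose a public timeseries API that can be queried on the fly. A minimal sketch, assuming the v1 endpoint (no registration key, but tight rate limits; the v2 endpoint needs a free key) and the CPI-U all-items series id; double-check the response layout against the BLS documentation before relying on it:

```python
import requests

# BLS Public Data API, v1 (no API key required, but limited to a small number
# of queries per day and roughly ten years per request). CUUR0000SA0 is the
# CPI-U, U.S. city average, all items, not seasonally adjusted series.
resp = requests.post(
    "https://api.bls.gov/publicAPI/v1/timeseries/data/",
    json={"seriesid": ["CUUR0000SA0"], "startyear": "2021", "endyear": "2023"},
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()

# Defensive navigation in case the schema differs from what is assumed here.
for series in payload.get("Results", {}).get("series", []):
    for point in series.get("data", []):
        print(point["year"], point["period"], point["value"])
```

That only fetches recent values on demand, which is roughly the lazy strategy you describe, at the cost of a network call and the BLS rate limits.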
I'm having the same problem. I'm working only with "Housing" and just in New Jersey. Would it be possible to partition the data by "items" and/or "area" so as to reduce the load time?
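Partitioning along those lines could be prototyped today by carving a slimmer file out of the shipped database. A sketch only, since the table and column names below (and the exact BLS area label covering New Jersey) are guesses rather than the package's real schema:

```python
import sqlite3

# Copy just the rows of interest (one item, one area) from the full database
# into a small local file. The table/column names are hypothetical; open the
# shipped SQLite file and inspect its schema to find the real ones.
src = sqlite3.connect("cpi.db")
dst = sqlite3.connect("cpi_housing_nj.db")

rows = src.execute(
    "SELECT series_id, year, period, value FROM series_values "
    "WHERE item = ? AND area = ?",
    ("Housing", "New York-Newark-Jersey City, NY-NJ-PA"),
).fetchall()

dst.execute(
    "CREATE TABLE series_values (series_id TEXT, year INTEGER, period TEXT, value REAL)"
)
dst.executemany("INSERT INTO series_values VALUES (?, ?, ?, ?)", rows)
dst.commit()
src.close()
dst.close()
```

Loading the resulting file should be close to instantaneous compared with parsing the full dataset.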
How can I help implement this? I use cpi in a project that needs to put data from several sources on a consistent $-year. Not having to write and maintain code for this is worth the ~40-50 seconds needed to load cpi, but it would be great to cut down the import time. Here is the
If only the
Same issue here; it seems slow. How could we speed this up? I just need to get very recent CPI values. What if we could specify how many years back to load?
I tossed a couple of ideas out in my first comment in the thread. I'm totally open to a proposal with a pull request. |
I believe I have a solution to this problem prototyped in the lazy loader branch right now. If everything goes according to plan, it should be released in the coming days. I mention this because I'd welcome any input or help road-testing it in advance of the change. I am aiming for full backwards compatibility, but that's easier said than done.
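One common way to get deferred loading without breaking `import cpi` followed by calls like `cpi.inflate(...)` is a module-level `__getattr__` (PEP 562, Python 3.7+). The following is only a sketch of that general pattern under assumed file names, column names, and defaults, not what the lazy loader branch actually does:

```python
# cpi/__init__.py -- sketch of the PEP 562 lazy-loading pattern.
import sqlite3
from pathlib import Path

_DB_PATH = Path(__file__).parent / "cpi.db"  # hypothetical filename
_values = None  # populated on first use, not at import time


def _load():
    """Do the expensive read exactly once, on first use instead of at import."""
    global _values
    if _values is None:
        conn = sqlite3.connect(_DB_PATH)
        # Hypothetical schema: (year INTEGER, value REAL) for the CPI-U series.
        _values = dict(conn.execute("SELECT year, value FROM cpi_u"))
        conn.close()
    return _values


def inflate(value, year, to=2023):
    """Public function keeps its old call shape; the load is triggered lazily."""
    data = _load()
    return value * data[to] / data[year]


def __getattr__(name):
    # Attributes that used to be built at import time can be materialized here,
    # so existing code that touches them keeps working without an eager load.
    if name == "series":
        return _load()
    raise AttributeError(f"module 'cpi' has no attribute {name!r}")
```

The trade-off is that the first call pays the full load cost, but a bare `import cpi` returns immediately and existing call sites do not need to change.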
Python 3.6
Every time I import the library, it is extremely slow (~50 seconds).
I don't have time to review the codebase right now for a solution, so I am going to look for other libraries, but this really is a huge problem when you just want to rattle off some code quickly in the shell.
I tried Python 3.7 from the Anaconda distribution and a standard 3.6 install, but both were exceedingly slow. I also tried calling cpi.update(), then closing and re-opening the shell and importing again, but it was still extremely slow.
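For anyone trying to pin down where the time goes, the slow step can be timed directly from Python (the same code works under Anaconda or a plain install); on Python 3.7+, running `python -X importtime -c "import cpi"` from the shell also breaks the cost down per module:

```python
import time

start = time.perf_counter()
import cpi  # the expensive one-time parse happens here
print(f"import cpi took {time.perf_counter() - start:.1f}s")

start = time.perf_counter()
print(cpi.inflate(100, 2000))  # adjusts $100 in 2000 to the latest available year
print(f"first inflate call took {time.perf_counter() - start:.3f}s")
```

That makes it easy to confirm whether the cost is the import itself (parsing the bundled data) rather than any network download.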