Add code for collecting reginfo.gov data to the data update Python scripts #36
The Reg Stats data for significant & economically significant rules prior to 2021 were obtained from the Regulatory Review database on reginfo.gov. I investigated the possibility of pulling the data from the Regulatory Review XML reports, but there are discrepancies between the data in the XMLs and the web search results. For example, for the presidential year 2017 (Published Date Range = 02/01/2017-01/31/2018), the web search returns 77 significant rules and 22 economically significant rules, whereas the XMLs show only 40 significant rules and 14 economically significant rules published during the same period. This suggests that the XML reports were not updated as publication dates became available, so it is impossible to obtain the same data from the XML reports. The web search requires manual input of criteria, and the URL of the results page does not contain the criteria values, so I have no idea how to automate this process.

I also investigated the Federal Register API as an alternative source for this data. While incomplete and sometimes inaccurate, the rin_priority field from the FR indicates the significance designation from the Unified Agenda for published documents. However, the FR rules include corrections, extensions of comment periods, etc., so the counts returned are much larger than what we got from reginfo.gov.

In sum, I didn't find a good approach to automatically fetching and verifying data for significant & economically significant rules published prior to 2021. Any thoughts @mfebrizio @haysarah ?
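For reference, the "presidential year" date ranges used in the reginfo.gov search criteria above (02/01 of one year through 01/31 of the next) could be generated programmatically if the search is ever automated; a minimal sketch, with a function name of my own choosing:

```python
def presidential_year_range(year: int) -> tuple[str, str]:
    """Return the (start, end) Published Date Range for a presidential year,
    e.g. 2017 -> 02/01/2017 through 01/31/2018, matching the reginfo.gov
    search criteria described above."""
    return (f"02/01/{year}", f"01/31/{year + 1}")

print(presidential_year_range(2017))  # ('02/01/2017', '01/31/2018')
```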
Thanks, Zoey. This is really helpful. It sounds like we should stick with the manual process in the meantime, but I have a few ideas:
One more thought: we should definitely reach out to reginfo.gov and notify them of the inconsistency between the XML reports and the search results.
Thanks Mark! These are all good thoughts. Your idea 1 sounds promising; I'll explore the libraries that auto-fill webforms. Option 2 is also worth trying: if you can share your code for cleaning FR documents, I'll check how closely it matches the reginfo.gov data. Options 3 & 4 are longer-term solutions. Manually going back through all final rules may be too time-consuming, since our data go back to 1981, and we only need annual counts for the current Reg Stats charts, so rule-level details don't matter much. In that sense, option 3 may be the more efficient approach. I'll email the reginfo.gov contact about the data inconsistency and see whether they do anything.
Re: option 2, I just added you to the repo with that code. Should be here. If I remember correctly, it uses the data output from your Unified Agenda compilation script, extracts the FR citations from the "actions" columns, and then, for the remaining documents, links the RIN to the UA RIN.
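The citation-extraction step described above could look something like the following sketch. The column name, function name, and regex are my assumptions, not the repo's actual code; "82 FR 12345" is the standard volume/page format for Federal Register citations:

```python
import re

# Matches Federal Register citations like "82 FR 12345" (volume, page).
FR_CITATION = re.compile(r"\b(\d{1,3})\s+FR\s+(\d{1,6})\b")

def extract_fr_citations(actions_text: str) -> list[str]:
    """Pull FR citations out of a Unified Agenda 'actions' field."""
    return [f"{vol} FR {page}" for vol, page in FR_CITATION.findall(actions_text)]

print(extract_fr_citations("Final Rule; 82 FR 12345; corrected at 83 FR 678"))
# ['82 FR 12345', '83 FR 678']
```

Documents whose actions field yields no citation would then fall back to the RIN-to-UA-RIN link described above.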
And more fun :)
Thanks! Does the rin_priority field in your fr-toolbelt come from the same source? I used it and thought it was obtained from the FR API.
Data for the significant & economically significant rules prior to 2021 were collected from the Regulatory Review database on Reginfo.gov. The Python scripts for updating the data need to be revised to pull data from Reginfo.gov.