-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provider Reports - Static #607
Conversation
Can move example to a folder later.
I like this PR - it'll get this data out of email and into a standard format in MDS, but it's a small enough change from the current behavior that it's reasonable for cities to begin requiring operators to adopt it in the near future. It would be a happy outcome indeed if, through that adoption, we get operator feedback (from folks besides Spin, thank you @dirkdk and @joshuaandrewjohnson1!). |
We should clarify that the k=10 requirement is applied to both trips and unique riders. We don't want data being shared about single rider who takes a daily trip, particularly when that rider is a member of a Special Group. |
I want to raise one concern about this sentence:
In practice, I worry that this could lead to situations where a small and/or sensitive geography inadvertently becomes part of a report about Special Groups. For example, the city might have a no parking zone around a single public park that is the home of political protests. Clarifying the k-value requirement as per the above would help address this, but it still violates the principle of data minimization. Longer term, I believe the fix for this is allowing the Policy API to describe data sharing requirements through a new Policy Rule type. Shorter term, I wonder if we should modify this to say that the Geographies in Reports should be agreed upon by the City and the Operator? |
I wonder if this is a real concern though? For one, we're talking about a monthly report - so even in the case of a tiny park / geography, you're still not getting any better temporal granularity. The protest happened over a weekend? Great, it's drowned by the rest of the month. The protest happened over the entire month? Great, that's a lot of data aggregation! If there were less than 10 riders and/or trips in that geometry over the month, you get no data anyway. We touched on the idea of the "City and Provider should discuss which geographies are returned"; it can definitely work, but also doesn't scale as well on either side as things change. I like your longer-term vision for Policy models of data sharing agreements. As this is beta (and Geography itself only now starting to pick up some momentum) we wanted to keep it simple. |
Yes I will add that clarification thanks! |
@thekaveman Agreed, the concern here is principled (Data Minimization) rather than real world. I gave a pretty poor example, should have been more clear on that point. I'm OK with leaving it as-is as long as we think it will be OK to make a breaking change in the future that limits Reports to only explicitly requested geographies. I believe that as a beta feature that shouldn't be a problem. |
I think keeping it as simple as possible for this version is best, which means not having to communicate which geographies to return ahead of time. This way the entire /reports can be created without any needed communication about the details between the city and provider. I also agree with @thekaveman that in this case the risk is very minimal with counts in geographies redacted when the values are low over a month period. It's more data and possible some geographies that are not needed, but the effort to calculate and return these is less than the effort/time to communicate and agree on the list of IDs. I also like the idea of modeling data sharing agreements. This idea came up in a WG call by @whereissean - a sort of meta data MDS json that communicated all the optional parts of MDS that a city would like providers to support: endpoints, optional fields, IDs (like geography for reports in this case), beta features (like GDEs), optional location data, etc. This would make it clear what is required and what is not. A good proposal for 2.0! |
@schnuerle My thoughts exactly. I went ahead and filed #608 to track this. |
Explain pull request
This is a beta implementation of the #569 special groups aggregate metrics issue. It adds a /reports endpoint within the Provider API that handles returning monthly trip counts of special groups, in this case just low income groups.
The response served up is a pre-generated CSV file of all data.
Is this a breaking change
Impacted Spec
Which spec(s) will this pull request impact?
provider
Additional context
See PR #606 for a version that is dynamic and not static like this one.