Improve Brigade Project Meta-data representation on StatusBoard #30

Open · 2 tasks
nikolajbaer opened this issue Nov 2, 2019 · 2 comments

@nikolajbaer (Collaborator)

The next step for the Status board sub-project is to improve how we report the relative "health" of an indexed project, and to describe the actions a brigade can take to change how its project is indexed.

  • Create a list of actions with links (e.g. how to remove this project from the index, how to add topics)
  • Identify key metrics of project metadata (e.g. beyond topic tags: a functioning project-url, parent-projects, etc.)
@themightychris (Collaborator)

This overlaps a bit with #26, which I created to track figuring out the initial details to capture in the index.

I've gotten a bit stuck fretting over the schema, but the move to having a version in the branch name should help with that.

Maybe the antidote is that for the v1 index we just shove all the details we can into new fields and not worry about redundancy or cohesiveness. Long term, what I'm worried about is that we don't want too much data logic to end up in tool code. For example, between project lists, GitHub metadata, git analysis, civic.json, and publiccode.yml there might be 6 different ways to determine who the main project maintainer is. Do we want an index that contains all 6, with every tool deciding for itself which one to use?

I've been thinking we should aim for a set of "core" fields that shake out over time and that the index handles filling, based on an always-evolving panel of techniques. So we might have a core field for project maintainer that the index makes a best effort to fill for every project, and then we iterate within the index over time on the coverage and quality of that field by continuously adding/tweaking/ranking the various sources we can draw potential values from.
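To make that concrete, here's a minimal sketch (TypeScript, with invented names) of what an "always-evolving panel of techniques" could look like for a single core field: the index tries each known source in a configurable ranking and takes the first usable value. The source labels and the `resolveMaintainer` helper are purely illustrative, not part of any existing schema.

```typescript
// Hypothetical sketch: resolve a "core" maintainer field from a ranked panel of sources.
// Source labels and field names are invented for illustration.
type SourceName = 'publiccode' | 'civicjson' | 'projectList' | 'github';

interface MaintainerCandidate {
  source: SourceName;
  value: string;
}

// Sources ordered by how much we currently trust them; the index can re-rank
// or extend this panel over time without breaking downstream tools.
const SOURCE_RANKING: SourceName[] = ['publiccode', 'civicjson', 'projectList', 'github'];

// Return the highest-ranked non-empty value, or null if no source can fill the field.
function resolveMaintainer(candidates: MaintainerCandidate[]): string | null {
  for (const source of SOURCE_RANKING) {
    const match = candidates.find((c) => c.source === source && c.value.trim() !== '');
    if (match) {
      return match.value;
    }
  }
  return null;
}
```

Re-ranking or adding a source is then an index-side change that no downstream tool has to know about.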

So maybe the play is that for v1 we just keep adding root attributes aggressively with little concern for cohesion (i.e. we add a publiccode key with the entire document if present, and a github key with everything interesting we extract from GitHub), and then v2 is where we take a step back and design. It shouldn't be hard to have the automated infrastructure keep populating both in parallel.
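For illustration only, the two generations might look roughly like this (TypeScript interfaces standing in for the index entries; every key name is a guess, not the actual schema):

```typescript
// Illustrative only: possible shapes for the two index generations.
interface V1IndexEntry {
  name: string;
  code_url: string;
  // Raw per-source blobs, added aggressively with no attempt at cohesion.
  github?: Record<string, unknown>;     // everything interesting extracted from the GitHub API
  publiccode?: Record<string, unknown>; // the publiccode.yml document, if present
  civicjson?: Record<string, unknown>;  // the civic.json document, if present
}

interface V2IndexEntry {
  name: string;
  code_url: string;
  // Consolidated "core" fields the index fills on a best-effort basis.
  maintainer?: string;
  description?: string;
  status?: string; // e.g. "in-development", "stable", "archived"
}
```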

So tools built against the v1 index will be tightly coupled to the various underlying data sources and will be left to sort out the redundant fields on their own. Then we'd have a second generation of tools that can build against the v2 index, which provides consolidated & normalized data.

Does that sound like a good approach? Are there any other paths we might consider?

@nikolajbaer (Collaborator, Author)

I think your approach sounds good. My intent for this issue is more about deciding which set of metadata fields we want to promote on the status board as places to improve. Even if we don't have a "score", we assign merit by inclusion (and dismiss by omission).

This is more a place to ask, for instance: given that v1 is going to be tightly coupled to GitHub (mostly), can we identify some objective metrics (a rough sketch follows the list), e.g.

  1. Do you have topic tags / description?
  2. Does your "project url" return a 200 status code?
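Both of those checks are mechanical enough to sketch. Assuming we already have the repo's GitHub API payload (description, topics, homepage) and a fetch-capable runtime (Node 18+ or a browser), something like this would do it; the type and function names are made up for the example:

```typescript
// Minimal sketch of the two objective checks, assuming we already have the
// repo's GitHub API payload. Type and function names are invented.
interface RepoMetadata {
  description: string | null;
  topics: string[];
  homepage: string | null; // the "project url" field on the GitHub repo
}

// Check 1: topic tags and a description are both present.
function hasTopicsAndDescription(repo: RepoMetadata): boolean {
  return repo.topics.length > 0 && Boolean(repo.description?.trim());
}

// Check 2: the declared project URL answers with a 200.
async function projectUrlReturns200(repo: RepoMetadata): Promise<boolean> {
  if (!repo.homepage) {
    return false;
  }
  try {
    const res = await fetch(repo.homepage, { method: 'HEAD' });
    return res.status === 200;
  } catch {
    return false; // DNS failures, timeouts, etc. all count as a failed check
  }
}
```

A HEAD request keeps the check cheap, though some hosts only answer GET, so a fallback might be needed in practice.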

I guess to your point, the next layer is somewhat opinionated (maybe we just need to push for one or two of these, sketched below the list), e.g.

  1. Do you have a publiccode.yml or civic.json file?
  2. Do you have a CONTRIBUTING.md file?
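These presence checks can lean on the GitHub contents API, which returns 404 when a path doesn't exist. A rough, unauthenticated sketch (helper names are invented; a real deployment would want an API token to avoid rate limits):

```typescript
// Rough sketch of file-presence checks via the GitHub contents API
// (GET /repos/{owner}/{repo}/contents/{path} returns 404 when the path is absent).
async function repoHasFile(owner: string, repo: string, path: string): Promise<boolean> {
  const res = await fetch(`https://api.github.com/repos/${owner}/${repo}/contents/${path}`);
  return res.status === 200;
}

async function hasCivicMetadataFile(owner: string, repo: string): Promise<boolean> {
  return (await repoHasFile(owner, repo, 'publiccode.yml')) ||
         (await repoHasFile(owner, repo, 'civic.json'));
}

async function hasContributingGuide(owner: string, repo: string): Promise<boolean> {
  return repoHasFile(owner, repo, 'CONTRIBUTING.md');
}
```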

And finally, we may have some suggestions that are a bit more controversial (again, sketched after the list):

  1. You say your project is in-development, but your last commit was 2 years ago; perhaps you want "archive" or "stable"?
  2. You only have 3 commits in this repo; maybe add the exclusion topic tag?
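Since these are heuristics rather than hard rules, a sketch might just flag them for review. The 2-year and 3-commit thresholds below come straight from the examples above and would obviously need tuning; the field names are assumptions about what the index could expose:

```typescript
// Sketch of the two heuristics above; field names are assumptions.
interface ActivitySnapshot {
  declaredStatus: string; // e.g. the status field from civic.json / publiccode.yml
  lastCommitDate: Date;   // e.g. pushed_at from the GitHub API
  commitCount: number;
}

// "You say in-development, but the last commit was over 2 years ago."
function staleInDevelopment(project: ActivitySnapshot, now: Date = new Date()): boolean {
  const twoYearsMs = 2 * 365 * 24 * 60 * 60 * 1000;
  return project.declaredStatus === 'in-development' &&
         now.getTime() - project.lastCommitDate.getTime() > twoYearsMs;
}

// "Only 3 commits; maybe this repo should opt out with the exclusion topic tag."
function probablyTooSmallToIndex(project: ActivitySnapshot): boolean {
  return project.commitCount <= 3;
}
```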

So, I am thinking about how to take the information your index is gathering and present it to brigade members in a useful manner (à la Google PageSpeed) so they know how to improve the overall index quality. To your point, I think we need to keep the statusboard from taking on too much responsibility as the place where we decide on subjective assessments, but that is a fine line we are going to have to walk.
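One hedged way to picture the PageSpeed-style presentation: each check above becomes a finding with an optional "how to fix" link (tying back to the action list in this issue), and the status board just counts passes and surfaces the links, without computing any subjective score. All names here are illustrative:

```typescript
// Illustrative shape for a PageSpeed-style report: findings with fix-it links,
// counted up rather than scored.
interface Finding {
  check: string;      // e.g. "topics-and-description"
  passed: boolean;
  howToFix?: string;  // link into the "list of actions" this issue asks for
}

function projectReport(findings: Finding[]): { passed: number; total: number; actions: string[] } {
  return {
    passed: findings.filter((f) => f.passed).length,
    total: findings.length,
    actions: findings
      .filter((f) => !f.passed && f.howToFix !== undefined)
      .map((f) => f.howToFix as string),
  };
}
```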
