[Feature] activity of each user in a repo by month #1186

Closed
tyn1998 opened this issue Feb 12, 2023 · 11 comments · Fixed by #1187 or #1207
Labels
kind/feature Category issues or prs related to feature request. waiting for repliers need other's feedback

Comments

@tyn1998
Member

tyn1998 commented Feb 12, 2023

Description

Hi OpenDigger community,

Hypercrx has a feature called the repo developer network, which consumes a data file such as: https://oss.x-lab.info/open_digger/github/X-lab2017/open-digger/developer_network.json. From the data in that file, we can learn each user's activity in the repo over a 90-day time span.

Hypercrx would like data that records every user's activity in a repo for each month. The data files might be organized like this:

and a possible JSON schema for a file might look like this:

{
  "2020-08": [["frank-zsy",43.85],["xgdyp",22.36],["longyanz",13.09],["birdflyi",9.83]],
  "2020-09": [["frank-zsy",23.85],["xgdyp",22.36],["longyanz",13.09],["birdflyi",5.83]],
  ...
}

With these data files, Hypercrx can implement features like:

This data can be generated while the repo's activity is being computed, right? It shouldn't cost too much?
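
A minimal sketch of how a consumer such as Hypercrx might read a file in the layout proposed above and pull out one user's month-by-month activity. The function name `user_trajectory` and the inline sample are assumptions for illustration; only the `{month: [[login, activity], ...]}` shape comes from the proposal.

```python
import json

# Sample in the proposed layout; values copied from the example in the issue.
SAMPLE = """
{
  "2020-08": [["frank-zsy", 43.85], ["xgdyp", 22.36], ["longyanz", 13.09], ["birdflyi", 9.83]],
  "2020-09": [["frank-zsy", 23.85], ["xgdyp", 22.36], ["longyanz", 13.09], ["birdflyi", 5.83]]
}
"""

def user_trajectory(details, login):
    """Return [(month, activity)] for one login, sorted by month.

    Months in which the login does not appear are skipped.
    """
    result = []
    for month in sorted(details):
        for name, activity in details[month]:
            if name == login:
                result.append((month, activity))
    return result

details = json.loads(SAMPLE)
print(user_trajectory(details, "birdflyi"))
```

Because the months are `YYYY-MM` strings, plain string sorting already yields chronological order.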

@open-digger-bot open-digger-bot bot added the kind/feature Category issues or prs related to feature request. label Feb 12, 2023
@github-actions github-actions bot added the waiting for repliers need other's feedback label Feb 12, 2023
@open-digger-bot
Contributor

This issue has not been replied to for 24 hours; please pay attention to this issue: @gymgym1212 @xiaoya-yaya @xgdyp

@frank-zsy
Contributor

@tyn1998 The network data is generated from the Neo4j database, where relationship data can be extracted more easily. I think the details can be retrieved from ClickHouse when the activity metric is generated.

Right now the repo activity metric function does not return any details about the developers, but we can add an option to the query options so that the query also returns per-developer details. The data can then be generated while the activity metric is generated. It can certainly be done.
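
A rough sketch of the idea of an opt-in details flag, written in Python for illustration rather than as OpenDigger's actual query code. The function `repo_activity`, the option name `with_details`, and the row shape `(month, login, activity)` are all hypothetical, and the sketch assumes the repo's monthly activity is simply the sum of its developers' values.

```python
from collections import defaultdict

def repo_activity(rows, with_details=False):
    """Aggregate (month, login, activity) rows into per-month results.

    Assumption: monthly repo activity is the sum of developer activity.
    `with_details` is a hypothetical option name.
    """
    per_month = defaultdict(list)
    for month, login, activity in rows:
        per_month[month].append((login, activity))

    result = {}
    for month in sorted(per_month):
        devs = per_month[month]
        # Round to two decimals to hide floating-point noise in the sum.
        entry = {"activity": round(sum(a for _, a in devs), 2)}
        if with_details:
            # Highest activity first, matching the ordering in the proposal.
            entry["details"] = sorted(devs, key=lambda d: d[1], reverse=True)
        result[month] = entry
    return result

rows = [
    ("2020-08", "xgdyp", 22.36),
    ("2020-08", "frank-zsy", 43.85),
    ("2020-09", "frank-zsy", 23.85),
]
print(repo_activity(rows, with_details=True))
```

Keeping the flag off by default leaves existing callers unaffected, which matches the suggestion that details are only returned on request.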

@github-actions github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Feb 13, 2023
@frank-zsy
Contributor

/self-assign

@frank-zsy
Contributor

@tyn1998 Could you take a look at this file: https://oss.x-lab.info/open_digger/github/X-lab2017/open-digger/activity_details.json ? Does it fit your requirements?

@tyn1998
Member Author

tyn1998 commented Feb 13, 2023

@frank-zsy

That is exactly what we want, thank you!

@github-actions github-actions bot added waiting for repliers need other's feedback and removed waiting for author need issue author's feedback labels Feb 13, 2023
@frank-zsy
Contributor

@tyn1998 All the data has been uploaded, but it is quite large compared with the former data. For vscode, for example, the activity details contain 3MB of data.

I think we should limit how many developers are included for a single month; this would save a lot of storage.
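
One way to cap the number of developers per month is to keep only the most active ones. This is a sketch under assumptions: the helper name `top_n_per_month` and the sample values are made up, and the input is taken to be in the `{month: [[login, activity], ...]}` layout discussed earlier.

```python
def top_n_per_month(details, n):
    """Keep only the n most active developers in each month."""
    return {
        month: sorted(devs, key=lambda d: d[1], reverse=True)[:n]
        for month, devs in details.items()
    }

details = {
    "2020-08": [["xgdyp", 22.36], ["frank-zsy", 43.85], ["birdflyi", 9.83]],
    "2020-09": [["frank-zsy", 23.85]],
}
print(top_n_per_month(details, 2))
```

Months with fewer than `n` developers are returned unchanged apart from ordering.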

@frank-zsy frank-zsy reopened this Feb 13, 2023
@tyn1998
Member Author

tyn1998 commented Feb 15, 2023

Hi @frank-zsy, I have two ideas:

  1. use [index, activity] instead of [user_name, activity] to decrease the file size
{
  "participants": ["frank-zsy", "xgdyp", "longyanz", "birdflyi", "xxx"],
  "2020-08": [[0, 43.85],[1, 22.36],[2, 13.09],[3, 9.83]],
  "2020-09": [[1, 23.85],[0, 22.36],[2, 13.09],[4, 5.83]],
  ...
}
  2. (similar to yours) If the number of developers (participants) is greater than a certain threshold, remove those whose total activity (i.e. the sum over all months) is relatively small.

Both methods require getting the data for all months first and then doing the processing, rather than working month by month.
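
The first idea above can be sketched as follows. The function name `encode_with_index` is an assumption; the output shape, a `participants` list plus `[index, activity]` pairs per month, follows the example in this comment. Indices are assigned in order of first appearance, which is why all months have to be available before encoding.

```python
def encode_with_index(details):
    """Replace each login with its index in a shared "participants" list."""
    participants = []
    index_of = {}
    encoded = {}
    for month in sorted(details):
        row = []
        for login, activity in details[month]:
            if login not in index_of:
                # First time this login is seen: give it the next free index.
                index_of[login] = len(participants)
                participants.append(login)
            row.append([index_of[login], activity])
        encoded[month] = row
    return {"participants": participants, **encoded}

details = {
    "2020-08": [["frank-zsy", 43.85], ["xgdyp", 22.36], ["longyanz", 13.09], ["birdflyi", 9.83]],
    "2020-09": [["xgdyp", 23.85], ["frank-zsy", 22.36], ["longyanz", 13.09], ["xxx", 5.83]],
}
print(encode_with_index(details))
```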

@frank-zsy
Contributor

@tyn1998 I tried the first solution, but it does not seem to work, since most developers in the vscode community may be active only once. When I replace the login with an index and add a login array, the output grows from 3.1MB to 3.3MB.

And why do you think we should remove developers by total activity rather than per month? How is that different from just returning the top 100 for each month?
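
The size observation above can be reproduced on synthetic data. This is only an illustration with made-up logins, not the vscode measurement: when every login appears once, the index form stores each login in the lookup array and an index on top of it, so the output grows; when the same logins recur every month, it shrinks. `encode_with_index` and `size` are hypothetical helper names.

```python
import json

def encode_with_index(details):
    """Replace each login with its index in a shared "participants" list."""
    index_of, out = {}, {}
    for month, devs in details.items():
        out[month] = [
            # setdefault assigns the next free index on first sight of a login.
            [index_of.setdefault(login, len(index_of)), activity]
            for login, activity in devs
        ]
    # Dicts preserve insertion order, so the keys are already in index order.
    return {"participants": list(index_of), **out}

def size(obj):
    """Length in characters of the compact JSON serialization."""
    return len(json.dumps(obj, separators=(",", ":")))

months = [f"2020-{m:02d}" for m in range(1, 13)]
# Every developer shows up in exactly one month.
one_time = {mo: [[f"dev-{mo}-{i}", 1.0] for i in range(50)] for mo in months}
# The same 50 developers show up in every month.
regulars = {mo: [[f"dev-{i}", 1.0] for i in range(50)] for mo in months}

print("one-time:", size(one_time), "->", size(encode_with_index(one_time)))
print("regulars:", size(regulars), "->", size(encode_with_index(regulars)))
```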

@github-actions github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Feb 15, 2023
@tyn1998
Member Author

tyn1998 commented Feb 15, 2023

When I replace the login with an index and add a login array, the output grows from 3.1MB to 3.3MB.

I had not taken that into consideration... Thank you for running the experiment.

And why do you think we should remove developers by total activity rather than per month? How is that different from just returning the top 100 for each month?

Because I don't want to miss the growth path (i.e. how a developer becomes active or inactive in a repo) of any outstanding contributor (i.e. one whose total activity is high, whether or not they are active now) in a given repo. However, this strategy might exclude currently active developers because their total score is not yet large enough to overtake those who were active in the past.

Now I'm having second thoughts. What do you think?
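
The trade-off described above can be shown on a tiny made-up data set. Both helper names and all values are hypothetical. Filtering by total activity keeps a past contributor's full history but drops a newly active developer, while a per-month cut keeps whoever is most active in each month.

```python
def keep_top_by_total(details, max_developers):
    """Keep the developers with the highest activity summed over all months."""
    totals = {}
    for devs in details.values():
        for login, activity in devs:
            totals[login] = totals.get(login, 0.0) + activity
    keep = set(sorted(totals, key=totals.get, reverse=True)[:max_developers])
    return {m: [d for d in devs if d[0] in keep] for m, devs in details.items()}

def keep_top_per_month(details, n):
    """Keep the n most active developers within each month."""
    return {
        m: sorted(devs, key=lambda d: d[1], reverse=True)[:n]
        for m, devs in details.items()
    }

# Made-up data: "veteran" was very active in 2020, "newcomer" appears in 2021.
details = {
    "2020-11": [["veteran", 40.0]],
    "2020-12": [["veteran", 35.0]],
    "2021-01": [["newcomer", 30.0], ["veteran", 3.0]],
}

by_total = keep_top_by_total(details, 1)
per_month = keep_top_per_month(details, 1)
print(by_total["2021-01"])   # only the veteran's small entry survives
print(per_month["2021-01"])  # the newcomer is kept
```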

@github-actions github-actions bot added waiting for repliers need other's feedback and removed waiting for author need issue author's feedback labels Feb 15, 2023
@frank-zsy
Contributor

I think we can use a threshold to filter out low-activity developers in every month; the threshold could be very low, like 2. I don't think this would affect the goal of finding the trend.
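
A minimal sketch of the per-month threshold filter, assuming the `{month: [[login, activity], ...]}` layout from earlier in the thread. The name `filter_low_activity` is hypothetical, and whether a developer exactly at the threshold is kept (`>=` here) is an assumption; the default of 2 is the value suggested above.

```python
def filter_low_activity(details, threshold=2.0):
    """Drop developers whose activity in a given month is below the threshold."""
    return {
        month: [[login, activity] for login, activity in devs if activity >= threshold]
        for month, devs in details.items()
    }

details = {
    "2020-08": [["frank-zsy", 43.85], ["drive-by", 1.0]],
    "2020-09": [["frank-zsy", 23.85], ["casual", 2.0], ["one-off", 1.41]],
}
print(filter_low_activity(details))
```

Unlike the total-activity filter, this works on one month at a time, so it can run while each month's metric is generated.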

@github-actions github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Feb 15, 2023
@tyn1998
Member Author

tyn1998 commented Feb 15, 2023

Agree +1

@github-actions github-actions bot added waiting for repliers need other's feedback and removed waiting for author need issue author's feedback labels Feb 15, 2023