[Feature] activity of each user in a repo by month #1186
Comments
This issue has not received a reply for 24 hours; please pay attention to this issue: @gymgym1212 @xiaoya-yaya @xgdyp |
@tyn1998 The network data is generated from the Neo4j database, while relationship data can be extracted more easily. I think the details can be retrieved from ClickHouse when the activity metric is generated. Right now the repo activity metric function does not return any details about the developers, but we can add an option to the query options so that the query returns the per-developer details. That way the data can be generated while the activity metric is generated. It can certainly be done. |
/self-assign |
@tyn1998 Could you take a look at this file: https://oss.x-lab.info/open_digger/github/X-lab2017/open-digger/activity_details.json ? Does it fit your requirement? |
That is exactly what we want, thank you! |
@tyn1998 All the data has been uploaded, but the data size is quite large compared with the former data; for vscode, for example, the activity details file contains 3MB of data. I think we should limit how many developers are included in a single month, which would greatly reduce storage usage. |
Hi @frank-zsy, I have two ideas:
{
"participants": ["frank-zsy", "xgdyp", "longyanz", "birdflyi", "xxx"],
"2020-08": [[0, 43.85],[1, 22.36],[2, 13.09],[3, 9.83]],
"2020-09": [[1, 23.85],[0, 22.36],[2, 13.09],[4, 5.83]],
...
}
Both methods require first getting the data for all months and then doing the processing, rather than working month by month. |
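A minimal Python sketch of the index-based encoding shown in the JSON above. The input shape (a mapping from month to `[login, activity]` pairs) and the function name `encode_with_index` are assumptions for illustration only, not OpenDigger's actual format or API.

```python
import json


def encode_with_index(details):
    """Replace each login with an index into a shared participants list.

    `details` maps month -> list of [login, activity] pairs (assumed shape).
    """
    participants = []   # index -> login, in order of first appearance
    index_of = {}       # login -> index
    encoded = {}
    for month, pairs in details.items():
        row = []
        for login, activity in pairs:
            if login not in index_of:
                index_of[login] = len(participants)
                participants.append(login)
            row.append([index_of[login], activity])
        encoded[month] = row
    return {"participants": participants, **encoded}


# Sample values taken from the JSON example in this thread.
details = {
    "2020-08": [["frank-zsy", 43.85], ["xgdyp", 22.36]],
    "2020-09": [["xgdyp", 23.85], ["frank-zsy", 22.36]],
}
encoded = encode_with_index(details)

# Compare the serialized sizes of the two representations.
plain_size = len(json.dumps(details, separators=(",", ":")))
indexed_size = len(json.dumps(encoded, separators=(",", ":")))
```

The participants list is only final once every month has been processed, which is why the whole dataset is needed up front. How much space this saves depends on how often each login repeats: a login that appears in a single month is still stored once, plus its index.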
@tyn1998 I tried the first solution, but it does not seem to work, since most developers in the vscode community may be active only once. When I use an index to replace the login and add a login array, the size of the output grows from 3.1MB to 3.3MB. Also, why do you think we should remove developers by total activity rather than within each month? How is that different from, say, just returning the top 100 for each month? |
That was not taken into consideration... Thank you for your experiment.
Because I don't want to miss the growth path (i.e. how a developer becomes active or inactive in a repo) of any outstanding contributor (i.e. one whose total activity is large, whether or not he/she is active now) in a given repo. However, this strategy might cause currently active developers to be excluded because their total score is not large enough to displace those who were active in the past. Now I'm having second thoughts. What is your idea? |
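To make the trade-off concrete, here is a small Python sketch contrasting the two selection strategies. The function names, the logins `veteran` and `newcomer`, and the month-to-`[login, activity]` input shape are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def top_by_total(details, n):
    """Keep only the n developers with the largest activity summed over all months."""
    totals = defaultdict(float)
    for pairs in details.values():
        for login, activity in pairs:
            totals[login] += activity
    keep = set(sorted(totals, key=totals.get, reverse=True)[:n])
    return {m: [p for p in pairs if p[0] in keep] for m, pairs in details.items()}


def top_per_month(details, n):
    """Keep the n most active developers of each month independently."""
    return {
        m: sorted(pairs, key=lambda p: p[1], reverse=True)[:n]
        for m, pairs in details.items()
    }


# Hypothetical data: "veteran" was active early, "newcomer" is active now.
details = {
    "2020-01": [["veteran", 50.0]],
    "2020-02": [["veteran", 40.0]],
    "2020-03": [["newcomer", 30.0], ["veteran", 1.0]],
}
by_total = top_by_total(details, 1)
per_month = top_per_month(details, 1)
```

With `n = 1`, selecting by total keeps the full history of `veteran` but drops `newcomer` from the latest month, while selecting per month keeps `newcomer` there and drops the tail of `veteran`'s history.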
I think we can use a threshold to filter out low-activity developers for every month; the threshold could be very low, like 2. I think this would not affect the purpose of finding the trend. |
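A minimal Python sketch of such a per-month threshold filter. The function name `filter_by_threshold`, the sample logins, the input shape, and the choice to keep values equal to the threshold are assumptions, not the actual OpenDigger implementation.

```python
def filter_by_threshold(details, threshold=2.0):
    """Drop developers whose activity in a month is below `threshold`.

    `details` maps month -> list of [login, activity] pairs (assumed shape).
    Each month is filtered independently, so no cross-month pass is needed.
    """
    return {
        month: [[login, activity] for login, activity in pairs if activity >= threshold]
        for month, pairs in details.items()
    }


# Hypothetical data for illustration.
details = {
    "2020-08": [["alice", 43.85], ["bob", 1.5]],
    "2020-09": [["bob", 2.0], ["carol", 0.7]],
}
filtered = filter_by_threshold(details, threshold=2.0)
```

Because the filter looks at one month at a time, it can run while each month's activity metric is computed.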
Agree +1 |
Description
Hi OpenDigger community,
In Hypercrx we have a feature called repo's developer network, which consumes a data file like: https://oss.x-lab.info/open_digger/github/X-lab2017/open-digger/developer_network.json. With the data in the file, we can know each user's activity in that repo over a time span of 90 days.
Hypercrx is looking forward to data that has every user's activity in every month for a repo. Maybe the data file is organized in this way:
and a possible JSON schema for a file might look like this:
With these data files, Hypercrx can implement features like:
This data can be generated when the repo's activity is computed, right? It would not cost too much?