[Feature] `activity` of each user in a repo by month #1186

tyn1998 · 2023-02-12T13:25:11Z

Description

Hi OpenDigger community,

In Hypercrx we have a feature called the repo developer network, which consumes data files like https://oss.x-lab.info/open_digger/github/X-lab2017/open-digger/developer_network.json. With the data in that file, we can see each user's activity in the repo over a 90-day time span.

Hypercrx would like data that contains every user's activity in every month for a repo. The data file might be organized like this:

https://oss.x-lab.info/open_digger/github/X-lab2017/open-digger/developer-activity.json

and a possible JSON schema for such a file might look like this:

{
  "2020-08": [["frank-zsy",43.85],["xgdyp",22.36],["longyanz",13.09],["birdflyi",9.83]],
  "2020-09": [["frank-zsy",23.85],["xgdyp",22.36],["longyanz",13.09],["birdflyi",5.83]],
  ...
}
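A minimal Python sketch of how data in this shape could be assembled and then read back per user, assuming the per-user monthly scores are already available as `(month, login, activity)` rows. The helper names `build_activity_details` and `user_series` are hypothetical, not part of OpenDigger or Hypercrx.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed input: one (month, login, activity) row per developer per month.
Row = Tuple[str, str, float]
Details = Dict[str, List[list]]  # {"YYYY-MM": [[login, activity], ...]}


def build_activity_details(rows: Iterable[Row]) -> Details:
    """Group rows by month, rounding scores to two decimals as in the example."""
    by_month: Dict[str, List[list]] = defaultdict(list)
    for month, login, activity in rows:
        by_month[month].append([login, round(activity, 2)])
    # Months in chronological order; developers in each month by activity, highest first.
    return {
        month: sorted(entries, key=lambda e: e[1], reverse=True)
        for month, entries in sorted(by_month.items())
    }


def user_series(details: Details, login: str) -> Dict[str, float]:
    """One developer's month -> activity series; months without activity are absent."""
    return {
        month: activity
        for month, entries in details.items()
        for name, activity in entries
        if name == login
    }


if __name__ == "__main__":
    rows = [
        ("2020-09", "frank-zsy", 23.85),
        ("2020-08", "xgdyp", 22.36),
        ("2020-08", "frank-zsy", 43.85),
    ]
    details = build_activity_details(rows)
    print(details)
    print(user_series(details, "frank-zsy"))
```

Sorting each month by activity keeps the most active developers first, which also makes it cheap to truncate a month later if the file grows too large.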

With these data files, Hypercrx can implement features like:

This data could be generated when the repo's activity is computed, right? And it wouldn't cost too much?


open-digger-bot · 2023-02-13T01:00:02Z

This issue has not been replied to for 24 hours; please pay attention to this issue: @gymgym1212 @xiaoya-yaya @xgdyp

frank-zsy · 2023-02-13T02:06:56Z

@tyn1998 The network data is generated from the Neo4j database, where relationship data can be extracted more easily. I think the per-developer details can be retrieved from ClickHouse when the activity metric is generated.

Right now the repo activity metric function does not return any details about individual developers, but we can add an option to the query options so that the query also returns per-developer details. That way, the data can be generated at the same time as the activity metric. It can certainly be done.
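A sketch of the "extra query option" idea, assuming per-developer activity for each month has already been computed. `repo_activity` and its `with_details` flag are hypothetical names, and the plain sum used for the repo total is only a placeholder, not OpenDigger's actual activity formula. The point it illustrates is that one pass over the same rows can produce both the metric and the details.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed input: (month, login, developer activity), already computed per developer.
Row = Tuple[str, str, float]


def repo_activity(rows: Iterable[Row], with_details: bool = False) -> dict:
    """Hypothetical metric function: repo totals, plus per-developer details on request."""
    totals: Dict[str, float] = defaultdict(float)
    details: Dict[str, List[list]] = defaultdict(list)
    for month, login, activity in rows:
        # Placeholder aggregation (plain sum), NOT OpenDigger's real activity formula.
        totals[month] += activity
        if with_details:
            details[month].append([login, activity])
    result: dict = {"activity": dict(sorted(totals.items()))}
    if with_details:
        # Built in the same pass over the rows, so no second query is needed.
        result["details"] = {
            month: sorted(entries, key=lambda e: e[1], reverse=True)
            for month, entries in sorted(details.items())
        }
    return result


if __name__ == "__main__":
    rows = [("2020-08", "a", 2.0), ("2020-08", "b", 3.0), ("2020-09", "a", 1.5)]
    print(repo_activity(rows))
    print(repo_activity(rows, with_details=True))
```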

frank-zsy · 2023-02-13T02:10:34Z

/self-assign

frank-zsy · 2023-02-13T02:49:32Z

@tyn1998 Could you take a look at this file and check whether it fits your requirements: https://oss.x-lab.info/open_digger/github/X-lab2017/open-digger/activity_details.json

tyn1998 · 2023-02-13T02:54:55Z

@frank-zsy

That is exactly what we want, thank you!

frank-zsy · 2023-02-13T23:00:27Z

@tyn1998 All the data has been uploaded, but it is quite large compared with the previous data; for vscode, for example, the activity details contain 3 MB of data.

I think we should limit how many developers are included in a single month; this would save a lot of storage.
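One simple way to cap the number of developers per month, sketched in Python under the assumption that the details use the `{"YYYY-MM": [[login, activity], ...]}` shape from the issue description. `limit_per_month` and its `top_n` parameter are hypothetical names.

```python
import heapq
from typing import Dict, List

Details = Dict[str, List[list]]  # {"YYYY-MM": [[login, activity], ...]}


def limit_per_month(details: Details, top_n: int) -> Details:
    """Keep only the top_n most active developers in each month (hypothetical helper)."""
    return {
        month: heapq.nlargest(top_n, entries, key=lambda e: e[1])
        for month, entries in details.items()
    }


if __name__ == "__main__":
    details = {
        "2020-08": [["frank-zsy", 43.85], ["xgdyp", 22.36], ["longyanz", 13.09], ["birdflyi", 9.83]],
        "2020-09": [["frank-zsy", 23.85], ["birdflyi", 5.83]],
    }
    print(limit_per_month(details, top_n=2))
```

Each month is handled independently, so this can run as soon as a month's details are computed; the trade-off is that a long-term contributor can disappear from months in which they rank low.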

tyn1998 · 2023-02-15T03:00:02Z

Hi @frank-zsy, I have two ideas:

Use `[index, activity]` instead of `[user_name, activity]` to decrease the file size:

{
  "participants": ["frank-zsy", "xgdyp", "longyanz", "birdflyi", "xxx"],
  "2020-08": [[0, 43.85],[1, 22.36],[2, 13.09],[3, 9.83]],
  "2020-09": [[1, 23.85],[0, 22.36],[2, 13.09],[4, 5.83]],
  ...
}

(Similar to yours) If the number of developers (participants) exceeds a certain threshold, remove those whose total activity (i.e. the sum over all months) is relatively small.

Both methods require first getting the data for all months and then doing the processing, rather than processing month by month.
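Both proposals sketched in Python, assuming the `{"YYYY-MM": [[login, activity], ...]}` shape from the issue description. The function names and the `max_developers` parameter are hypothetical. Each function walks every month before producing its result, which is the all-months requirement noted above.

```python
from collections import defaultdict
from typing import Dict, List

Details = Dict[str, List[list]]  # {"YYYY-MM": [[login, activity], ...]}


def encode_with_index(details: Details) -> dict:
    """Idea 1: store each login once in "participants" and refer to it by index."""
    participants: List[str] = []
    index: Dict[str, int] = {}
    encoded: dict = {"participants": participants}
    for month, entries in details.items():
        rows = []
        for login, activity in entries:
            if login not in index:
                index[login] = len(participants)
                participants.append(login)
            rows.append([index[login], activity])
        encoded[month] = rows
    return encoded


def decode_with_index(encoded: dict) -> Details:
    """Inverse of encode_with_index: turn indices back into logins."""
    participants = encoded["participants"]
    return {
        month: [[participants[i], activity] for i, activity in rows]
        for month, rows in encoded.items()
        if month != "participants"
    }


def filter_by_total(details: Details, max_developers: int) -> Details:
    """Idea 2: if there are too many developers, keep those with the highest total activity."""
    totals: Dict[str, float] = defaultdict(float)
    for entries in details.values():
        for login, activity in entries:
            totals[login] += activity
    if len(totals) <= max_developers:
        return details
    keep = set(sorted(totals, key=totals.get, reverse=True)[:max_developers])
    return {m: [e for e in entries if e[0] in keep] for m, entries in details.items()}
```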

frank-zsy · 2023-02-15T06:03:28Z

@tyn1998 I tried the first solution, but it does not seem to work, since most developers in the vscode community are probably active only once. When I replaced each login with an index and added a login array, the output size grew from 3.1 MB to 3.3 MB.
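A small synthetic comparison (hypothetical data, not the vscode file) that illustrates the effect: when every login appears only once, the index encoding still stores each login in full and adds an index plus the `participants` key, so it comes out larger; when the same developers recur every month, it comes out smaller. `encode_with_index` and `json_size` are hypothetical helpers.

```python
import json
from typing import Dict, List

Details = Dict[str, List[list]]  # {"YYYY-MM": [[login, activity], ...]}


def encode_with_index(details: Details) -> dict:
    """Replace each login with its position in a shared "participants" list."""
    index: Dict[str, int] = {}
    encoded: dict = {"participants": []}
    for month, entries in details.items():
        # len(index) is evaluated before setdefault inserts, so a new login gets the next free index.
        encoded[month] = [[index.setdefault(login, len(index)), a] for login, a in entries]
    encoded["participants"] = list(index)  # dicts keep insertion order, so position == index
    return encoded


def json_size(obj) -> int:
    """Length of the compact JSON text (no spaces after separators)."""
    return len(json.dumps(obj, separators=(",", ":")))


# Hypothetical data: 120 developers, each active in exactly one month.
once = {f"2020-{m:02d}": [[f"user{m}-{i}", 1.5] for i in range(10)] for m in range(1, 13)}
# Hypothetical data: the same 10 developers, active in every month.
recurring = {f"2020-{m:02d}": [[f"developer-{i}", 1.5] for i in range(10)] for m in range(1, 13)}

if __name__ == "__main__":
    for name, data in [("active once", once), ("recurring", recurring)]:
        print(name, json_size(data), "->", json_size(encode_with_index(data)))
```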

And why do you think we should remove developers by total activity rather than per month? How is that different from just returning the top 100 for each month?

tyn1998 · 2023-02-15T06:31:32Z

> So when I use index to replace the login and add an login array, the size of the output grows from 3.1MB to 3.3MB.

I hadn't taken that into consideration... Thank you for running the experiment.

> And why do you think we should remove developers by total activity but not in every month? How is it different from like just return the top 100 for each month?

Because I don't want to miss the growth path (i.e. how a developer becomes active or inactive in a repo) of any outstanding contributor (i.e. one whose total activity is high, whether or not they are active now) in a given repo. However, this strategy might cause currently active developers to be excluded, because their total score is not high enough to outrank developers who were active in the past.

Now I'm having second thoughts. What do you think?

frank-zsy · 2023-02-15T07:50:21Z

I think we can use a threshold to filter out low-activity developers in every month; the threshold could be very low, like 2. I don't think this would affect the goal of finding the trend.
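The per-month threshold, sketched in Python with the `{"YYYY-MM": [[login, activity], ...]}` shape and the example threshold of 2. `filter_by_threshold` is a hypothetical name; keeping developers whose activity equals the threshold, and keeping months that end up empty, are choices made for this sketch rather than decisions from the thread.

```python
from typing import Dict, List

Details = Dict[str, List[list]]  # {"YYYY-MM": [[login, activity], ...]}


def filter_by_threshold(details: Details, threshold: float = 2.0) -> Details:
    """Drop developers whose activity in a month is below threshold (hypothetical helper)."""
    # Each month is filtered on its own; no pass over the full history is needed.
    # Months where nobody reaches the threshold stay as empty lists.
    return {
        month: [entry for entry in entries if entry[1] >= threshold]
        for month, entries in details.items()
    }


if __name__ == "__main__":
    details = {
        "2020-08": [["frank-zsy", 43.85], ["xgdyp", 22.36], ["example-user", 1.2]],
        "2020-09": [["example-user", 1.9]],
    }
    print(filter_by_threshold(details))
```

Unlike the two earlier proposals, this filter looks at one month at a time, so each month can be processed as soon as its activity is computed.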

tyn1998 · 2023-02-15T08:44:01Z

Agree +1

open-digger-bot bot added the kind/feature Category issues or prs related to feature request. label Feb 12, 2023

github-actions bot added the waiting for repliers need other's feedback label Feb 12, 2023

github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Feb 13, 2023

open-digger-bot bot assigned frank-zsy Feb 13, 2023

github-actions bot added waiting for repliers need other's feedback and removed waiting for author need issue author's feedback labels Feb 13, 2023

frank-zsy mentioned this issue Feb 13, 2023

feat: add activity developer details option #1187

Merged

open-digger-bot bot closed this as completed in #1187 Feb 13, 2023

frank-zsy reopened this Feb 13, 2023

github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Feb 15, 2023

github-actions bot added waiting for repliers need other's feedback and removed waiting for author need issue author's feedback labels Feb 15, 2023

github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Feb 15, 2023

github-actions bot added waiting for repliers need other's feedback and removed waiting for author need issue author's feedback labels Feb 15, 2023

frank-zsy mentioned this issue Feb 20, 2023

refactor: slice activity detail in case large size file #1207

Merged

open-digger-bot bot closed this as completed in #1207 Feb 21, 2023
