is data.table abandoned? Should we switch to something else (arrow, tibble, collapse,...)? #5656

Closed
MLopez-Ibanez opened this issue Jun 13, 2023 · 28 comments

@MLopez-Ibanez
Contributor

There have been no commits since February.

There are more than 1k issues and 132 PRs (some of them obvious, like fixing the GitHub Actions).

@mattdowle seems to be the only person able to commit to the main branch and he has not been active on GitHub since February.

Has the project been abandoned? Is there some activity going on behind the scenes that is not visible from the outside?

@tdhock
Member

tdhock commented Jun 14, 2023

data.table is not abandoned, but Matt has limited bandwidth. Lots of issues and PRs is normal. I wrote a NSF POSE grant which will be funded starting later this summer, and lasting for two years, about expanding the ecosystem of contributors in data.table. This project will include creating new de-centralized governance, documentation, and testing tools, and it would be great to have your input!

@jangorecki
Member

jangorecki commented Jun 14, 2023

The large number of open issues is also because we don't use any bots that close or lock issues due to inactivity, regardless of whether they are resolved.

@avimallu
Contributor

@tdhock, are you able to provide more details? My coding skills in C aren't the best, but I'd love to keep documentation up to date and/or create more vignettes (like our missing join vignette 😅 that I already have a draft of), so I want to know if I can contribute more actively.

@tdhock
Member

tdhock commented Jun 20, 2023

hey @avimallu thanks for your interest. You could help by volunteering to code review some files, by adding your name to the CODEOWNERS file in PR #5629.
After the project starts next month, I will start a discussion about what kind of community governance model would be good for data.table, and I will certainly ask for your input at that point.
Going forward we will also be funding translations for documentation, to make data.table usage and contribution more accessible for non-English native speakers, so we may be able to fund you if you want to help with that.
Also we will have funding for travel awards, for people who want to give talks about data.table at R-related conferences, so you may be able to apply for one of those.

@neovom

neovom commented Jul 19, 2023

@tdhock Sorry for the negative comment, but no commits since February is definitely not normal. A lot of people rely on this package; if it is de facto abandoned, people will look (and already are looking) for alternatives. I understand it's open-source so there's no expectation of anything, but it would be a shame for a package as important as data.table to essentially rot. collapse is currently outshining data.table in terms of speed, so there are very few reasons for anyone to use data.table in 2023 (apart from legacy and familiarity).

@arthurgailes

Collapse is wonderful, but it lacks data.table's merge and reshape capabilities, and sometimes its memory efficiency. Perhaps we should simply think of data.table as a mature package.

@MLopez-Ibanez
Contributor Author

> @tdhock Sorry for the negative comment, but no commits since February is definitely not normal.

The main worry is not the lack of commits, it is the lack of a maintainer. There are 134 pull requests, some of them obviously correct even to me (such as fixing the GitHub Actions). There are people willing to contribute, but the only person able to approve changes and upload new versions to CRAN does not seem to be available to do so or to delegate those tasks to someone else. It may take a fork and a new data.table2 to get things moving again.

(thanks for pointing out collapse, it looks interesting)

@davidbudzynski
Contributor

Having only one maintainer who is rarely present is a terrible bus factor for the project, and it gives the feeling that development is hindered by it: the number of open PRs, and a comparison with the development speed of other projects such as Arrow, Collapse, etc., make it look like it's slowly dying. Maybe someone who has contact with @mattdowle would be able to speak to him so that more people are able to contribute to the project in his absence.

If this isn't possible, maybe a project fork is what we need...

@jangorecki
Member

jangorecki commented Jul 23, 2023

@tdhock is in contact AFAIK. His last message clearly addresses our concerns. If his project works out, then there is no need for any forks etc. So, as Matt is less responsive now, you can ask Toby for an update, which I am pretty sure he will provide as soon as he has one. What you or your organization can do to help is to look at the CODEOWNERS file and possibly make a commitment to maintain a piece of the package.

@phisanti

@tdhock any update on the new form of governance? On the 20th of June, you wrote that the project will start next month. Will that materialise?

@davidbudzynski
Contributor

davidbudzynski commented Aug 16, 2023

It's a little concerning that the person in contact with the maintainer is also MIA (edit: missing in action?), even though they were meant to start the new form of governance two months ago. How can we get this project going if the bus factor = 1?

@bluetealatte

data.table stands as an unparalleled tool for many, characterized by its efficient data manipulation capabilities, swift performance, and its concise yet potent syntax. Personally, I consider it the most valuable R package, and it's the primary reason my team and I gravitate towards using R. Its role in our everyday tasks and larger projects is monumental. The recent absence of its main maintainer, coupled with a growing roster of unresolved issues, casts shadows over the package's trajectory. Given its significance, we should leave no stone unturned in safeguarding its future. What if we established a task force of dedicated users to sift through and prioritize these issues, potentially even sketching out developmental roadmaps? This could serve as an interim solution as we seek to engage with the primary maintainer. Furthermore, might there be a platform or method allowing us to financially back the package's evolution? The decline of such a pivotal package would be a significant loss. I earnestly hope someone with the requisite skills and passion can rise to champion its continued development.

@phisanti

I definitely back the proposal of @bluetealatte. Moreover, if I understood it correctly, I think that was the original idea by @MLopez-Ibanez, but it seemed to end up in a dead end (?). The first thing should be to contact @mattdowle to check what could be done. Otherwise, without him, the only viable road is a fork.

@MLopez-Ibanez
Contributor Author

Checks have started failing in CRAN: https://cran.r-project.org/web/checks/check_results_data.table.html
Typically, CRAN maintainers will swiftly remove packages whose maintainers do not reply to email. In this case, given the number of reverse imports, I hope that they will keep the package around despite failed checks as long as it builds.

At this point, it seems a fork (with a new name rather than a git fork) may be the only solution. If someone can create a data.table2 repository within the Rdatatable organisation as a clone of the current repository that would be perhaps the best approach. But data.table2 would need a new maintainer and I am not volunteering.

@MLopez-Ibanez changed the title from "is data.table abandoned? Should we switch to something else (arrow, tibble, ...)?" to "is data.table abandoned? Should we switch to something else (arrow, tibble, collapse,...)?" on Aug 22, 2023

@jangorecki
Member

The CRAN check looks good. The timeout on old Windows is a known issue, reported to CRAN and discussed here in another issue.

@tdhock
Member

tdhock commented Aug 22, 2023

hi all, thanks for your concerns and valuable comments.
Sorry for my late response, as I have been traveling these past few weeks.
I am the leader of a 2-year NSF POSE project, starting now, which is supposed to expand the open-source ecosystem of users and contributors around data.table, so it is extremely encouraging to see so many concerned users/contributors commenting here! 👍

I have created a new issue #5676 to discuss possibilities and proposals to formalize a governance document for data.table, and hopefully that should address some of these concerns.

@msummersgill

msummersgill commented Aug 23, 2023

> I therefore propose that we use this issue to write ... a governance document ... data.table project... (hopefully final) draft in end of May [2024].

(initial proposal in #5676)

I don't have any fundamental problem with the existence of a governance document. However, the existence of such a document doesn't actually solve some of the important questions raised here.

From a practical standpoint, when might users expect to see a CRAN release to address issues like #5538 that are slated to be resolved in 1.14.9? Are you proposing that everything will be put on hold for 6-9 months while a document is drafted before anyone besides Matt approves a PR or puts out a release?

@Fred-Wu

Fred-Wu commented Aug 24, 2023

> > I therefore propose that we use this issue to write ... a governance document ... data.table project... (hopefully final) draft in end of May [2024].
>
> (initial proposal in #5676)
>
> I don't have any fundamental problem with the existence of a governance document. However, the existence of such a document doesn't actually solve some of the important questions raised here.
>
> From a practical standpoint, when might users expect to see a CRAN release to address issues like #5538 that are slated to be resolved in 1.14.9? Are you proposing that everything will be put on hold for 6-9 months while a document is drafted before anyone besides Matt approves a PR or puts out a release?

I agree. At the very least, 1.14.9 should be completed while the governance document is being drafted.

@tdhock
Member

tdhock commented Aug 25, 2023

> From a practical standpoint, when might users expect to see a CRAN release to address issues like #5538 that are slated to be resolved in 1.14.9? Are you proposing that everything will be put on hold for 6-9 months while a document is drafted before anyone besides Matt approves a PR or puts out a release?

Hi @msummersgill thanks for your comment.
Currently my understanding is that Matt is only submitting "patch" releases (not including new features on master branch which may address issues like #5538), whenever there is a failing CRAN check that needs to be fixed.
Therefore I have proposed to discuss a new governance structure which will hopefully create new leadership and roles that will result in releasing new features from master branch which may address issues like #5538.
Yes my proposition was to take 6-9 months to obtain consensus from the community, and if you have an alternative proposition for the timeline, please share it on the governance issue #5676.
I'm not sure when users should expect to see a CRAN release to address issues like #5538, but I would expect that to happen after we agree on new governance and leadership/roles.
There are technical revdep issues (https://github.com/Rdatatable/data.table/issues?q=is%3Aissue+is%3Aopen+revdep+label%3Arevdep) with the current master branch which need to be resolved before submitting the new master branch features to CRAN (which requires compatibility for new submissions; in other words, our new code should not cause other CRAN package checks to fail). #5133 is particularly tricky and could use some help, if you have time to contribute.

@jangorecki
Member

jangorecki commented Aug 25, 2023

Glad you pointed out #5133. If anyone wants to release fast, this is the place to start.
This is what blocked a proper release last time we were approaching one (we shifted many non-critical items to the next milestone).
6-9 months seems long, but it is better than never.

@MLopez-Ibanez
Contributor Author

> Glad you pointed out #5133. If anyone wants to release fast, this is the place to start. This is what blocked a proper release last time we were approaching one (we shifted many non-critical items to the next milestone). 6-9 months seems long, but it is better than never.

Is it necessary to wait 9 months to merge the PR that fixes the GitHub Actions (#5632)?
Or trivial documentation changes (#5673)?
There may even be PRs that already fix some of the revdep regressions...

@MLopez-Ibanez
Contributor Author

Another thing that would be useful in the short term: Pin issues like #5676 so people can find them quickly.

@tdhock
Member

tdhock commented Sep 12, 2023

Hi! Another revdep issue that is tricky but must be resolved prior to releasing new features to CRAN is #5541, so if anyone has time to investigate and fix it, that would be much appreciated (and would make it possible to release new features to CRAN sooner).

@MLopez-Ibanez
Contributor Author

MLopez-Ibanez commented Oct 12, 2023

> Hi! Another revdep issue that is tricky but must be resolved prior to releasing new features to CRAN is #5541, so if anyone has time to investigate and fix it, that would be much appreciated (and would make it possible to release new features to CRAN sooner).

Another alternative is to fork data.table at tag 1.14.8 (https://github.com/Rdatatable/data.table/commits/1.14.8), which is the last one on CRAN, then start merging pull requests one by one, doing a full revdep check for each of them. Any PR that does not pass the full revdep check is not merged. This will ensure that the resulting code is at least as solid as the version currently on CRAN. The first PR to merge should be #5691.

If, by the time this process is finished (or the person doing it has had enough, or the next version of R is about to be released), there is no progress with data.table, submit the fork to CRAN as data.table2. I would switch my R package to use this fork if it were available on CRAN.

The code here: https://github.com/tdhock/data.table-revdeps (more details here: https://github.com/Rdatatable/data.table/wiki/Release-management-and-revdep-checks) may be helpful to implement the above idea.
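
The gated-merge idea described above could be sketched roughly as follows. This is only an illustration in Python, not anything taken from data.table or the data.table-revdeps repository: `run_revdep` is a hypothetical callback standing in for a full reverse-dependency check, and the PR identifiers are placeholders. A PR is kept only if adding it introduces no reverse-dependency failure beyond those already present at the 1.14.8 baseline.

```python
from typing import Callable, Iterable, List, Set, Tuple


def merge_gated_by_revdep(
    baseline_failures: Set[str],
    candidate_prs: Iterable[str],
    run_revdep: Callable[[List[str]], Set[str]],
) -> Tuple[List[str], List[str]]:
    """Try PRs one at a time; keep a PR only if it adds no new revdep failure.

    baseline_failures: packages already failing against the baseline release.
    candidate_prs:     PR identifiers, in the order they should be tried.
    run_revdep:        hypothetical callback; given the list of PRs applied on
                       top of the baseline, returns the set of failing
                       reverse dependencies.
    """
    merged: List[str] = []
    rejected: List[str] = []
    for pr in candidate_prs:
        # Check the baseline plus everything merged so far plus this PR.
        failures = run_revdep(merged + [pr])
        new_failures = failures - baseline_failures
        if new_failures:
            rejected.append(pr)  # would break a package that works today
        else:
            merged.append(pr)
    return merged, rejected
```

Because each PR is checked on top of the ones already accepted, the accepted set as a whole never has more failing reverse dependencies than the baseline, at the cost of one full revdep run per PR.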

@tdhock
Member

tdhock commented Oct 26, 2023

data.table is not abandoned, and Matt has granted Maintainer team membership to Jan, Michael, and myself, so we definitely do not need to fork. Let's continue working together in this repo to make data.table the best it can be!

@tdhock closed this as completed on Oct 26, 2023