-
Notifications
You must be signed in to change notification settings - Fork 34
feat: get rolling comments #206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment-zouryou is MIT licensed, so we should probably give credit in our LICENSE file. Just a small attribution at the end like:
Methods and logic borrowed from the following programs. See their respective licenses for more details:
comment-zouryou (https://github.com/tanbatu/comment-zouryou): Copyright (c) 2022 tanbatu, MIT LicenseIf it's easy enough, feel free to add a flag. I think it should be pretty simple with the utility method you added to get the timestamp. Probably --comments-date="2025-05-05 00:00:00" or whatever reasonable granularity makes sense.
|
Hi @AlexAplin, thanks for reviewing my code. I've made changes to resolves most of the issues, but I need clarification for some. I plan to implement the datetime range flag once we have resolved all of this. By default, datetime range will be from current unix timestamp (start) until 2007-03-03 11:59 pm unix timestamp (end) (as set comment-zouryou) and cannot exceed that range. Additionally, by default nndownload will try to fetch all comments in the datetime range or there are no more comments to fetch, whichever comes first. Some videos have millions of comments and it might take a very long time to fetch them all, so I should include a flag for the comment fetching to stop once it has fetched X number of comments (or perhaps timeout after some time? not sure which is better). This limit will take priority, then datetime range, and finally no more comments. I would love to hear your thoughts. |
March 6th is when γ launched, along with the first video uploads, but maybe test data goes back to then. I think it's probably fine to use that as a limit if you specify it came from comment-zouryou.
I'd add
If any of the new flags are specified without Hope this makes sense, but please let me know if you have any questions. Thanks for your efforts! |
I'll specify.
On some videos, such as Anyways, this sounds good to me, just that we might not get the exact number of comments given that per fetch number is variable, but that shouldn't be an issue. Just add in the flag description it might not be exact.
I'll make it datetime instead of just date for sake of granularity. If only date is provided, assume 11:59:59. We should also want to accomodate the flags Off-topic: do you kmow what easy threads/comments are? What makes them different from main? |
|
That additional combination is a good call, feel free to handle that.
On sm9 I get 980 comments received in the browser request to |
|
Hi @AlexAplin, thanks for the reply. Yes, the owner thread can still be fetched ( I've added the flags you've requested. Note that I've changed You can test the following commands:
In addition, I've changed the comment fetching logic such that it will append to global Since downloading comments can take quite a while, I want to implement an estimated progress output. Something that takes one line per thread, like:
If you have any preferences on the format or tool to use for this, let me know. |
|
We settled on using rich for progress bars, so specifying the total should work, and you can set up a task for each thread if desired. |
|
@AlexAplin added the progress bar + a check on whether comments.json exists so that we don't accidentally overwrite previous progress. |
nndownload/nndownload.py
Outdated
| # Save in case of success and on interrupt) | ||
| with open(filename, "w", encoding="utf-8") as file: | ||
| json.dump(COMMENTS_DATA_JSON, file, indent=4, ensure_ascii=False, sort_keys=True) | ||
| json.dump(COMMENTS_DATA_JSON, file, indent=None, ensure_ascii=False, sort_keys=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
halves the space taken from my very limited testing.
|
This got really spaghetti-fied. I've simplified a lot of the logic and made improvements. I've tested the different flows but may need reverification. I also removed the save on Ctrl+C because of the threading changes, but should be decently easy to add back I think. I'll leave some feedback to explain my changes |
nndownload/nndownload.py
Outdated
| # There are no comments before lastTime to fetch | ||
| break | ||
|
|
||
| # If we got the same comments as last time, we should stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems really unnecessary, especially iterating over every comment. Isn't it really unlikely that we would retrieve the exact same response twice? If this is a concern, it should be sufficient to see if one specific last ID is present in both requests. I've taken this out for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have gotten this issue before from some videos, and I think that is why comment-zouryou stops fetching when it sees comment id 1-5, but that can lead to possibly missing comment id 1-4 if 5 was found. That is why I introduced this check so that we don't get stuck in a loop getting the same ids again.
Your check is way better though, I'll implement that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to work on this still?
|
@AlexAplin sorry, been a bit busy. I'll try to finish this up in around 3-5 days. |
No rush, thanks for your efforts |
|
@shovel-kun Hi again, just checking in. I'd like to get this merged before #205. If it's okay, I can make some of the changes on the branch |
|
Yes, sorry, go ahead @AlexAplin |
Resolves #134
Comment fetching logic is taken from https://github.com/tanbatu/comment-zouryou.
If you want, a flag could be added for specifying a timestamp.