Conversation

@shovel-kun

Resolves #134

Comment fetching logic is taken from https://github.com/tanbatu/comment-zouryou.

If you want, a flag could be added for specifying a timestamp.

Owner

@AlexAplin AlexAplin left a comment


comment-zouryou is MIT licensed, so we should probably give credit in our LICENSE file. Just a small attribution at the end like:

Methods and logic borrowed from the following programs. See their respective licenses for more details:
comment-zouryou (https://github.com/tanbatu/comment-zouryou): Copyright (c) 2022 tanbatu, MIT License

If it's easy enough, feel free to add a flag. I think it should be pretty simple with the utility method you added to get the timestamp. Probably --comments-date="2025-05-05 00:00:00" or whatever reasonable granularity makes sense.

@shovel-kun
Author

shovel-kun commented May 6, 2025

Hi @AlexAplin, thanks for reviewing my code.

I've made changes to resolve most of the issues, but I need clarification on some. I plan to implement the datetime range flag once we have resolved all of this.

By default, the datetime range will run from the current unix timestamp (start) back to the unix timestamp for 2007-03-03 11:59 pm (end) (as set by comment-zouryou), and cannot exceed that range.

Additionally, by default nndownload will try to fetch all comments in the datetime range or stop when there are no more comments to fetch, whichever comes first. Some videos have millions of comments and it might take a very long time to fetch them all, so I should include a flag to stop comment fetching once it has fetched X comments (or perhaps time out after some duration? not sure which is better). This limit will take priority, then the datetime range, and finally running out of comments.
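A rough sketch of that stop-condition priority (the `fetch_batch` callable is a hypothetical stand-in for the real API call, and the 2007-03-03 floor is treated as UTC here purely for illustration):

```python
from datetime import datetime, timezone

# Floor taken from comment-zouryou: 2007-03-03 11:59 pm (treated as UTC here
# for illustration; the real tool may use a different timezone).
FLOOR = int(datetime(2007, 3, 3, 23, 59, tzinfo=timezone.utc).timestamp())

def fetch_comments(fetch_batch, start_when, limit=None):
    """fetch_batch(when) -> (comments, next_when) is a stand-in for the real
    API call. Stop conditions, in priority order: the comment limit, then the
    datetime floor, then running out of comments."""
    fetched = []
    when = start_when
    while True:
        if limit is not None and len(fetched) >= limit:  # 1. hard limit
            return fetched[:limit]
        if when < FLOOR:                                 # 2. datetime range
            return fetched
        batch, when = fetch_batch(when)
        if not batch:                                    # 3. no more comments
            return fetched
        fetched.extend(batch)
```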

I would love to hear your thoughts.

@AlexAplin
Owner

AlexAplin commented May 6, 2025

By default, datetime range will be from current unix timestamp (start) until 2007-03-03 11:59 pm unix timestamp (end) (as set comment-zouryou) and cannot exceed that range.

March 6th is when γ launched, along with the first video uploads, but maybe test data goes back that far. I think it's probably fine to use that as a limit if you specify it came from comment-zouryou.

This limit will take priority, then datetime range, and finally no more comments.

I'd add --comments-limit and set it with a default of 1000, which is what the site provides. Rather than a date range, --comments-from-date or similar should get comments at the timestamp requested, respecting --comments-limit. Additionally, --request-all-comments should do as said, and ignore the limit and date flags.

  • --download-comments: Request 1000 comments from today.
  • --download-comments --comments-limit <n>: Request n comments from today. If not a valid integer, do nothing and output a warning.
  • --download-comments --comments-from-date "%Y-%m-%d": Request 1000 comments looking back from %Y-%m-%d (initial when header) to 2007-03-03. If the date provided is invalid or before 2007-03-03, do nothing and output a warning.
  • --download-comments --comments-limit <n> --comments-from-date "%Y-%m-%d": Request n comments looking back from %Y-%m-%d to 2007-03-03. Same integer and date checks as above.
  • --download-comments --request-all-comments: Ignore all other flags and request comments going back from today to 2007-03-03.

If any of the new flags are specified without --download-comments, a warning should be output saying they need to specify --download-comments additionally.
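A minimal sketch of that dependency check (option names follow the discussion above; the actual parser wiring in nndownload may differ, and argparse's `type=int` already rejects non-integer limits with an error rather than a warning):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-c", "--download-comments", action="store_true")
parser.add_argument("--comments-limit", type=int, default=1000)
parser.add_argument("--comments-from-date")
parser.add_argument("--request-all-comments", action="store_true")

def check_comment_flags(args, argv):
    """Warn when comment qualifiers are given without --download-comments."""
    qualifiers = ("--comments-limit", "--comments-from-date", "--request-all-comments")
    used = [flag for flag in qualifiers if any(arg.startswith(flag) for arg in argv)]
    if used and not args.download_comments:
        print(f"Warning: {', '.join(used)} specified without --download-comments")
```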

Hope this makes sense, but please let me know if you have any questions. Thanks for your efforts!

@shovel-kun
Author

I think it's probably fine to use that as a limit if you specify it came from comment-zouryou.

I'll specify.

I'd add --comments-limit and set it with a default of 1000, which is what the site provides.

On some videos, such as sm9, I get 250 comments instead of 1000 per fetch. Not sure if I'm being rate-limited, or if niconico limits the number of comments fetched based on the total number of comments on the video. Could you check if this is the case for you?

Anyway, this sounds good to me. We might not get the exact number of comments given that the per-fetch count is variable, but that shouldn't be an issue; I'll just note in the flag description that it might not be exact.

--download-comments --comments-from-date "%Y-%m-%d"

I'll make it a datetime instead of just a date for the sake of granularity. If only a date is provided, assume 11:59:59 pm.

We should also accommodate the combination --download-comments --request-all-comments --comments-from-date "%Y-%m-%d", since the user could have stopped fetching comments halfway through and want to resume from where they last stopped. I should also accept unix timestamps so that the user can just copy and paste the last when from their comment JSON data.
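That parsing could be sketched like this (the function name is illustrative, and a naive datetime is treated as UTC here for simplicity; the real implementation's timezone handling may differ):

```python
from datetime import datetime, timezone

def parse_comments_from(value):
    """Accept a unix timestamp, an ISO datetime, or a bare date.

    A bare date defaults to 23:59:59 so the whole day is covered.
    Returns a unix timestamp in seconds.
    """
    if value.isdigit():  # e.g. copied from the "when" field of previous JSON
        return int(value)
    dt = datetime.fromisoformat(value)  # raises ValueError if unparseable
    if len(value) == 10:  # date only: assume end of day
        dt = dt.replace(hour=23, minute=59, second=59)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```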

Off-topic: do you know what easy threads/comments are? What makes them different from main?

@AlexAplin
Owner

AlexAplin commented May 7, 2025

That additional combination is a good call, feel free to handle that.

main comments are normal comments. easy comments should be the preset comments you see below videos, like かわいい or うぽつ, which is a fairly recent addition. Nicopedia article about them (says they were added 2020-07-27): https://dic.nicovideo.jp/a/%E3%81%8B%E3%82%93%E3%81%9F%E3%82%93%E3%82%B3%E3%83%A1%E3%83%B3%E3%83%88
In the past there was also an owner thread, no idea if that's still around.

On some videos, such as sm9, I get 250 comments instead of 1000 per fetch. Not sure if I'm being rate-limited, or niconico limits number of comments fetched based on total number of comments on the video. Could you check if this is the case for you?

On sm9 I get 980 comments received in the browser request to https://public.nvcomment.nicovideo.jp/v1/threads. Not sure why you'd get less, except maybe if your account's language is set differently, but it's likely to not always be 1000 anyway because of deleted and moderated comments.

@shovel-kun
Author

shovel-kun commented May 7, 2025

Hi @AlexAplin, thanks for the reply. Yes, the owner thread can still be fetched (sm9 has that).

I've added the flags you've requested. Note that I've changed --request-all-comments to just --all-comments to make it shorter.

You can test the following commands:

python nndownload.py -s --comments-limit 2000 "https://www.nicovideo.jp/watch/sm9"

  • Result: Comment downloading qualifiers --comments-limit, --request-all-comments, or --comments-from were specified, but --download-comments was not. Did you forget to set --download-comments?

python nndownload.py -sc --comments-limit 2000 "https://www.nicovideo.jp/watch/sm9"

  • Result: Downloads ≈2000 comments

python nndownload.py -sc --comments-limit 2000 --comments-from "2025-05-04T16:30:37+09:00" "https://www.nicovideo.jp/watch/sm9"

  • Result: First comment in JSON is after that date

python nndownload.py -sc --comments-limit 2000 --all-comments --comments-from "2025-05-04T16:30:37" "https://www.nicovideo.jp/watch/sm9"

  • Result: Ignores --comments-limit, saves downloaded comments on CTRL+C.

In addition, I've changed the comment fetching logic so that it appends to the global COMMENTS_DATA_JSON on every fetch. I did this so that if comment processing takes too long and the user wants to stop it, nndownload will save whatever we've already fetched instead of discarding all progress.
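That save-on-interrupt behavior could be sketched like this (the fetch loop is a stand-in; COMMENTS_DATA_JSON matches the global name used in the diff):

```python
import json

COMMENTS_DATA_JSON = []

def download_comments(fetch_batches, filename):
    """Append each fetched batch to the shared list and always write out
    whatever was collected, even if interrupted with Ctrl+C."""
    try:
        for batch in fetch_batches:
            COMMENTS_DATA_JSON.extend(batch)
    finally:
        # Runs on normal completion and on KeyboardInterrupt alike
        with open(filename, "w", encoding="utf-8") as file:
            json.dump(COMMENTS_DATA_JSON, file, ensure_ascii=False, sort_keys=True)
```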

Since downloading comments can take quite a while, I want to implement an estimated progress output. Something that takes one line per thread, like:

Downloaded 123 out of 426 comments from main thread (est. time left: 56 seconds)

If you have any preferences on the format or tool to use for this, let me know.

@AlexAplin
Owner

AlexAplin commented May 7, 2025

We settled on using rich for progress bars, so specifying the total should work, and you can set up a task for each thread if desired.
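Assuming the totals per thread are known up front, one rich task per thread might look like this (thread names and counts here are made up):

```python
from rich.progress import Progress

threads = {"main": 426, "easy": 120, "owner": 3}  # illustrative totals

with Progress() as progress:
    tasks = {
        name: progress.add_task(f"{name} thread", total=total)
        for name, total in threads.items()
    }
    # In the real fetch loop, advance by the batch size after each fetch;
    # here we just complete each task in one step.
    for name, total in threads.items():
        progress.update(tasks[name], advance=total)
```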

@shovel-kun
Author

@AlexAplin added the progress bar + a check on whether comments.json exists so that we don't accidentally overwrite previous progress.

# Save in case of success and on interrupt
with open(filename, "w", encoding="utf-8") as file:
-    json.dump(COMMENTS_DATA_JSON, file, indent=4, ensure_ascii=False, sort_keys=True)
+    json.dump(COMMENTS_DATA_JSON, file, indent=None, ensure_ascii=False, sort_keys=True)
Author


This halves the space taken, from my very limited testing.

@AlexAplin
Owner

This got really spaghetti-fied. I've simplified a lot of the logic and made improvements. I've tested the different flows but may need reverification. I also removed the save on Ctrl+C because of the threading changes, but should be decently easy to add back I think.

I'll leave some feedback to explain my changes.

# There are no comments before lastTime to fetch
break

# If we got the same comments as last time, we should stop
Owner

@AlexAplin AlexAplin May 22, 2025


This seems really unnecessary, especially iterating over every comment. Isn't it really unlikely that we would retrieve the exact same response twice? If this is a concern, it should be sufficient to see if one specific last ID is present in both requests. I've taken this out for now.

Author


I have run into this issue before on some videos, and I think that is why comment-zouryou stops fetching when it sees comment IDs 1-5, but that can lead to possibly missing comment IDs 1-4 if 5 was found. That is why I introduced this check, so that we don't get stuck in a loop getting the same IDs again.

Your check is way better though, I'll implement that.
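The lighter check suggested above might be sketched as follows (the comment shape, a dict with an "id" field, is an assumption):

```python
def is_repeat_batch(prev_batch, next_batch):
    """Stop condition: one specific ID from the previous batch shows up
    again, meaning the API is returning the same comments as last time."""
    if not prev_batch:
        return False
    last_id = prev_batch[-1]["id"]
    return any(comment["id"] == last_id for comment in next_batch)
```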

Owner


Are you planning to work on this still?

@shovel-kun
Author

@AlexAplin sorry, been a bit busy. I'll try to finish this up in around 3-5 days.

@AlexAplin
Owner

AlexAplin commented Jun 7, 2025

@AlexAplin sorry, been a bit busy. I'll try to finish this up in around 3-5 days.

No rush, thanks for your efforts

@AlexAplin
Owner

@shovel-kun Hi again, just checking in. I'd like to get this merged before #205. If it's okay, I can make some of the changes on the branch.

@shovel-kun
Author

Yes, sorry, go ahead @AlexAplin

Successfully merging this pull request may close this issue: Download comment history