feat: get rolling comments #206

shovel-kun · 2025-05-05T15:36:39Z

Resolves #134

Comment fetching logic is taken from https://github.com/tanbatu/comment-zouryou.

If you want, a flag could be added for specifying a timestamp.

AlexAplin

comment-zouryou is MIT licensed, so we should probably give credit in our LICENSE file. Just a small attribution at the end like:

Methods and logic borrowed from the following programs. See their respective licenses for more details:
comment-zouryou (https://github.com/tanbatu/comment-zouryou): Copyright (c) 2022 tanbatu, MIT License

If it's easy enough, feel free to add a flag. I think it should be pretty simple with the utility method you added to get the timestamp. Probably --comments-date="2025-05-05 00:00:00" or whatever reasonable granularity makes sense.

nndownload/nndownload.py

shovel-kun · 2025-05-06T10:50:11Z

Hi @AlexAplin, thanks for reviewing my code.

I've made changes to resolve most of the issues, but I need clarification on some. I plan to implement the datetime range flag once we have resolved all of this.

By default, the datetime range will run from the current Unix timestamp (start) back to the Unix timestamp for 2007-03-03 11:59 pm (end), as set in comment-zouryou, and cannot exceed that range.

Additionally, by default nndownload will try to fetch all comments in the datetime range, or until there are no more comments to fetch, whichever comes first. Some videos have millions of comments and fetching them all might take a very long time, so I should include a flag that stops fetching once X comments have been fetched (or perhaps a timeout after some time? Not sure which is better). This limit will take priority, then datetime range, and finally no more comments.
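The stopping priority described above (limit first, then datetime range, then running out of comments) could be sketched roughly as follows. This is only an illustration, not the code in this PR: `fetch_until_done`, the `fetch_page` callback, and the `posted` field are hypothetical names, and timestamps are plain Unix seconds.

```python
from typing import Callable, Dict, List, Optional

Comment = Dict[str, int]  # hypothetical shape: {"id": ..., "posted": unix_seconds}


def fetch_until_done(
    fetch_page: Callable[[int], List[Comment]],
    start_ts: int,
    end_ts: int,
    limit: Optional[int] = None,
) -> List[Comment]:
    """Walk backwards from start_ts; stop on limit, then range, then exhaustion."""
    collected: List[Comment] = []
    cursor = start_ts
    while True:
        # 1. The comment-count limit takes priority.
        if limit is not None and len(collected) >= limit:
            return collected[:limit]
        # 2. Then the datetime range.
        if cursor < end_ts:
            return collected
        # fetch_page is assumed to return comments posted at or before cursor.
        page = fetch_page(cursor)
        # 3. Finally, stop when there is nothing older to return.
        if not page:
            return collected
        collected.extend(c for c in page if c["posted"] >= end_ts)
        # Move the cursor to just before the oldest comment in this page.
        cursor = min(c["posted"] for c in page) - 1
```

Because a page can overshoot the limit, the result is truncated to `limit` on return; a real implementation might prefer to keep the extra comments.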

I would love to hear your thoughts.

AlexAplin · 2025-05-06T21:35:25Z

> By default, the datetime range will run from the current Unix timestamp (start) back to the Unix timestamp for 2007-03-03 11:59 pm (end), as set in comment-zouryou, and cannot exceed that range.

March 6th is when γ launched, along with the first video uploads, but maybe test data goes back to then. I think it's probably fine to use that as a limit if you specify it came from comment-zouryou.

> This limit will take priority, then datetime range, and finally no more comments.

I'd add --comments-limit and set it with a default of 1000, which is what the site provides. Rather than a date range, --comments-from-date or similar should get comments at the timestamp requested, respecting --comments-limit. Additionally, --request-all-comments should do as said, and ignore the limit and date flags.

--download-comments: Request 1000 comments from today.
--download-comments --comments-limit <n>: Request n comments from today. If not a valid integer, do nothing and output a warning.
--download-comments --comments-from-date "%Y-%m-%d": Request 1000 comments looking back from %Y-%m-%d (initial when header) to 2007-03-03. If the date provided is invalid or before 2007-03-03, do nothing and output a warning.
--download-comments --comments-limit <n> --comments-from-date "%Y-%m-%d": Request n comments looking back from %Y-%m-%d to 2007-03-03. Same integer and date checks as above.
--download-comments --request-all-comments: Ignore all other flags and request going back from today to 2007-03-03

If any of the new flags are specified without --download-comments, a warning should be output saying they need to specify --download-comments additionally.
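As a rough sketch of the flag rules above, using argparse. The flag names are the ones proposed in this comment; `resolve_comment_options`, the returned dictionary, and the warning wording are hypothetical, and the real nndownload parser may be structured differently.

```python
import argparse
from datetime import datetime
from typing import List, Optional

EARLIEST_DATE = datetime(2007, 3, 3)  # lower bound taken from comment-zouryou, per this thread
DEFAULT_LIMIT = 1000


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nndownload-sketch")
    parser.add_argument("--download-comments", action="store_true")
    # Kept as strings so invalid values produce a warning instead of an argparse error.
    parser.add_argument("--comments-limit")
    parser.add_argument("--comments-from-date")
    parser.add_argument("--request-all-comments", action="store_true")
    return parser


def resolve_comment_options(argv: List[str]) -> Optional[dict]:
    """Return the effective options, or None when the rules say to do nothing."""
    args = build_parser().parse_args(argv)
    has_qualifier = (
        args.comments_limit is not None
        or args.comments_from_date is not None
        or args.request_all_comments
    )
    if not args.download_comments:
        if has_qualifier:
            print("warning: comment flags also require --download-comments")
        return None
    if args.request_all_comments:
        # Ignore the limit and date flags entirely.
        return {"limit": None, "from_date": None}
    limit = DEFAULT_LIMIT
    if args.comments_limit is not None:
        try:
            limit = int(args.comments_limit)
        except ValueError:
            print("warning: --comments-limit must be an integer")
            return None
    from_date = None
    if args.comments_from_date is not None:
        try:
            from_date = datetime.strptime(args.comments_from_date, "%Y-%m-%d")
        except ValueError:
            print("warning: --comments-from-date must look like YYYY-MM-DD")
            return None
        if from_date < EARLIEST_DATE:
            print("warning: --comments-from-date is before 2007-03-03")
            return None
    return {"limit": limit, "from_date": from_date}
```

For example, `resolve_comment_options(["--download-comments", "--comments-limit", "50"])` yields a limit of 50 with no date, while passing `--comments-limit` alone only prints the warning.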

Hope this makes sense, but please let me know if you have any questions. Thanks for your efforts!

shovel-kun · 2025-05-06T22:06:17Z

> I think it's probably fine to use that as a limit if you specify it came from comment-zouryou.

I'll specify.

> I'd add --comments-limit and set it with a default of 1000, which is what the site provides.

On some videos, such as sm9, I get 250 comments instead of 1000 per fetch. Not sure if I'm being rate-limited, or if niconico limits the number of comments fetched based on the total number of comments on the video. Could you check if this is the case for you?

Anyway, this sounds good to me. We might not get the exact number of comments requested, since the number returned per fetch varies, but that shouldn't be an issue. I'll just note in the flag description that the count might not be exact.

> --download-comments --comments-from-date "%Y-%m-%d"

I'll make it datetime instead of just date for sake of granularity. If only date is provided, assume 11:59:59.

We should also accommodate the combination --download-comments --request-all-comments --comments-from-date "%Y-%m-%d", since the user could have stopped fetching comments halfway through and want to resume from where they left off. Hmm, I should also accept Unix timestamps so that the user can just copy and paste the last when value from their comment JSON data.
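A minimal sketch of accepting all three input forms (Unix timestamp, date only, or full datetime). `parse_comments_from` is a hypothetical helper, not the function in this PR. Two assumptions are baked in and should be checked: a date-only value is read as end of day (23:59:59), and values without a UTC offset are treated as UTC.

```python
from datetime import datetime, time, timezone


def parse_comments_from(value: str) -> int:
    """Return Unix seconds for a Unix-seconds string, a date, or an ISO 8601 datetime."""
    text = value.strip()
    if text.isdigit():
        # Already a Unix timestamp, e.g. copied from a previous run.
        return int(text)
    if len(text) == 10:
        # Date only ("YYYY-MM-DD"): assume the end of that day.
        day = datetime.strptime(text, "%Y-%m-%d").date()
        parsed = datetime.combine(day, time(23, 59, 59))
    else:
        # Full datetime, with or without a UTC offset such as +09:00.
        parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: naive values are UTC. JST may be the better default here.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

Invalid input raises `ValueError` from `strptime` or `fromisoformat`, which the caller could turn into the warning discussed earlier.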

Off-topic: do you know what easy threads/comments are? What makes them different from main?

AlexAplin · 2025-05-07T04:22:58Z

That additional combination is a good call, feel free to handle that.

main comments are normal comments. easy comments should be the preset comments you see below videos, like かわいい or うぽつ, which is a fairly recent addition. Nicopedia article about them (says they were added 2020-07-27): https://dic.nicovideo.jp/a/%E3%81%8B%E3%82%93%E3%81%9F%E3%82%93%E3%82%B3%E3%83%A1%E3%83%B3%E3%83%88
In the past there was also an owner thread, no idea if that's still around.

> On some videos, such as sm9, I get 250 comments instead of 1000 per fetch. Not sure if I'm being rate-limited, or if niconico limits the number of comments fetched based on the total number of comments on the video. Could you check if this is the case for you?

On sm9 I get 980 comments received in the browser request to https://public.nvcomment.nicovideo.jp/v1/threads. Not sure why you'd get less, except maybe if your account's language is set differently, but it's likely to not always be 1000 anyway because of deleted and moderated comments.

shovel-kun · 2025-05-07T15:43:00Z

Hi @AlexAplin, thanks for the reply. Yes, the owner thread can still be fetched (sm9 has that).

I've added the flags you've requested. Note that I've changed --request-all-comments to just --all-comments to make it shorter.

You can test the following commands:

python nndownload.py -s --comments-limit 2000 "https://www.nicovideo.jp/watch/sm9"

Result: Comment downloading qualifiers --comments-limit, --request-all-comments, or --comments-from were specified, but --download-comments was not. Did you forget to set --download-comments?

python nndownload.py -sc --comments-limit 2000 "https://www.nicovideo.jp/watch/sm9"

Result: Downloads ≈2000 comments

python nndownload.py -sc --comments-limit 2000 --comments-from "2025-05-04T16:30:37+09:00" "https://www.nicovideo.jp/watch/sm9"

Result: First comment in JSON is after that date

python nndownload.py -sc --comments-limit 2000 --all-comments --comments-from "2025-05-04T16:30:37" "https://www.nicovideo.jp/watch/sm9"

Result: Ignores --comments-limit, saves downloaded comments on CTRL+C.

In addition, I've changed the comment fetching logic so that it appends to the global COMMENTS_DATA_JSON on every fetch. I did this so that if comment fetching takes too long and the user wants to stop it, nndownload will save whatever we've already fetched instead of discarding all progress.
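The idea of accumulating on every fetch and saving on interrupt can be sketched like this. It is a simplified stand-in, not the code in the PR: `download_comments` and the `pages` iterable are hypothetical, and only the global name comes from the diff in this thread.

```python
import json
from typing import Dict, Iterable, List

COMMENTS_DATA_JSON: List[Dict] = []  # module-level accumulator


def download_comments(pages: Iterable[List[Dict]], filename: str) -> int:
    """Append each fetched page to the global list and always write what we have."""
    try:
        for page in pages:
            COMMENTS_DATA_JSON.extend(page)
    except KeyboardInterrupt:
        # CTRL+C: fall through to the save below instead of losing progress.
        print("Interrupted; saving the comments fetched so far")
    finally:
        with open(filename, "w", encoding="utf-8") as file:
            json.dump(COMMENTS_DATA_JSON, file, ensure_ascii=False, sort_keys=True)
    return len(COMMENTS_DATA_JSON)
```

The `finally` clause runs on success and on interrupt alike, so the file is written exactly once in both cases.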

Since downloading comments can take quite a while, I want to implement an estimated progress output. Something that takes one line per thread, like:

Downloaded 123 out of 426 comments from main thread (est. time left: 56 seconds)

If you have any preferences on the format or tool to use for this, let me know.
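For reference, a plain-Python sketch of how that one-line status could be computed. `format_progress` is a hypothetical helper, and the estimate simply assumes the remaining comments arrive at the average rate seen so far.

```python
def format_progress(done: int, total: int, elapsed_seconds: float, thread: str = "main") -> str:
    """Build the one-line status with a naive linear time estimate."""
    if done <= 0:
        # No rate information yet.
        estimate = "unknown"
    else:
        remaining = max(total - done, 0)
        seconds_left = round(elapsed_seconds * remaining / done)
        estimate = f"{seconds_left} seconds"
    return (
        f"Downloaded {done} out of {total} comments from {thread} thread "
        f"(est. time left: {estimate})"
    )


print(format_progress(100, 400, 10.0))
```

Since the number of comments returned per fetch varies, the total itself is only an estimate, so the time left should be treated as approximate.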

AlexAplin · 2025-05-07T22:00:47Z

We settled on using rich for progress bars, so specifying the total should work, and you can set up a task for each thread if desired.

shovel-kun · 2025-05-13T20:24:29Z

@AlexAplin I've added the progress bar, plus a check on whether comments.json exists so that we don't accidentally overwrite previous progress.
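One possible way to guard against overwriting an existing file is to open it in exclusive-creation mode. This is a sketch under that assumption, not necessarily how the check in this PR works; `write_if_absent` is a hypothetical helper.

```python
def write_if_absent(filename: str, text: str) -> bool:
    """Write text to filename only if the file does not exist yet."""
    try:
        # Mode "x" raises FileExistsError instead of truncating an existing file.
        with open(filename, "x", encoding="utf-8") as file:
            file.write(text)
    except FileExistsError:
        print(f"{filename} already exists; not overwriting")
        return False
    return True
```

Compared with calling `os.path.exists` first, this avoids a gap between the check and the write.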

shovel-kun · 2025-05-13T21:19:04Z

nndownload/nndownload.py

        # Save in case of success and on interrupt
        with open(filename, "w", encoding="utf-8") as file:
-            json.dump(COMMENTS_DATA_JSON, file, indent=4, ensure_ascii=False, sort_keys=True)
+            json.dump(COMMENTS_DATA_JSON, file, indent=None, ensure_ascii=False, sort_keys=True)


This roughly halves the file size, from my very limited testing.
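A quick way to check the size difference yourself. The sample comment fields below are made up for illustration and do not match the real comment schema, so the exact ratio will differ on real data.

```python
import json

# Hypothetical sample data; real comments have more fields.
sample = [{"id": i, "body": "うぽつ", "vposMs": i * 1000} for i in range(1000)]

pretty = json.dumps(sample, indent=4, ensure_ascii=False, sort_keys=True)
compact = json.dumps(sample, indent=None, ensure_ascii=False, sort_keys=True)

pretty_size = len(pretty.encode("utf-8"))
compact_size = len(compact.encode("utf-8"))
print(f"indent=4: {pretty_size} bytes, indent=None: {compact_size} bytes")
```

Passing `separators=(",", ":")` as well would drop the remaining spaces after commas and colons, at the cost of readability.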

AlexAplin · 2025-05-22T05:26:33Z

This got really spaghetti-fied. I've simplified a lot of the logic and made improvements. I've tested the different flows but may need reverification. I also removed the save on Ctrl+C because of the threading changes, but should be decently easy to add back I think.

I'll leave some feedback to explain my changes

nndownload/nndownload.py

AlexAplin · 2025-05-22T05:41:50Z

nndownload/nndownload.py

-                    # There are no comments before lastTime to fetch
-                    break
-
-                # If we got the same comments as last time, we should stop


This seems really unnecessary, especially iterating over every comment. Isn't it really unlikely that we would retrieve the exact same response twice? If this is a concern, it should be sufficient to see if one specific last ID is present in both requests. I've taken this out for now

shovel-kun · May 22, 2025

I have run into this issue before on some videos. I think that is why comment-zouryou stops fetching once it sees a comment ID from 1 to 5, but that can miss IDs 1-4 if 5 is found first. That is why I introduced this check, so that we don't get stuck in a loop fetching the same IDs again.

Your check is way better though, I'll implement that.
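One reading of the single-ID check is to compare the last comment ID of each response with the last ID of the previous one, and stop when they match. The sketch below follows that interpretation; `collect_without_looping`, `fetch_page`, and the `id` and `posted` fields are hypothetical names rather than the PR's actual code.

```python
from typing import Callable, Dict, List, Optional


def collect_without_looping(
    fetch_page: Callable[[int], List[Dict[str, int]]],
    start_when: int,
) -> List[Dict[str, int]]:
    """Fetch pages backwards in time, stopping if a response repeats the previous tail."""
    collected: List[Dict[str, int]] = []
    previous_last_id: Optional[int] = None
    when = start_when
    while True:
        page = fetch_page(when)
        if not page:
            break  # nothing older to fetch
        last_id = page[-1]["id"]
        if last_id == previous_last_id:
            break  # same tail as the previous response: we made no progress
        collected.extend(page)
        previous_last_id = last_id
        # Continue from the oldest comment in this page (assumed to be last).
        when = page[-1]["posted"]
    return collected
```

This costs one comparison per request instead of iterating over every comment, but it will not detect responses that only partially overlap.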

AlexAplin · Jun 7, 2025

Are you planning to work on this still?

nndownload/nndownload.py

README.md

shovel-kun · 2025-06-07T11:32:29Z

@AlexAplin sorry, been a bit busy. I'll try to finish this up in around 3-5 days.

AlexAplin · 2025-06-07T19:46:05Z

> @AlexAplin sorry, been a bit busy. I'll try to finish this up in around 3-5 days.

No rush, thanks for your efforts

AlexAplin · 2025-07-16T00:47:05Z

@shovel-kun Hi again, just checking in. I'd like to get this merged before #205. If it's okay, I can make some of the changes on the branch

shovel-kun · 2025-07-16T05:31:40Z

Yes, sorry, go ahead @AlexAplin

shovel-kun added 2 commits May 5, 2025 23:32

feat: get rolling comments

4ba87cb

fix: remove unused urlencode

ebe02fd

AlexAplin requested changes May 5, 2025

AlexAplin added the enhancement label May 5, 2025

shovel-kun added 5 commits May 6, 2025 02:06

perf: remove 1 sec rate limiting for comments

c68e3ba

add attribution

d512349

use API_HEADERS and specify thread refresh url as global

4369a44

easy wins

79e6c8b

resolve most convos

af3781d

remove extra space

440896f

feat: add comment qualifier cli flags with impl

2720d09

shovel-kun added 2 commits May 14, 2025 04:15

feat: comments fetching progress bar

bb58443

fix: file exists check on comments.json

34174ef

perf: no indents when dumping comments json

05750a0

shovel-kun commented May 13, 2025

Make changes and improvements

2b80b0a

AlexAplin reviewed May 22, 2025

AlexAplin added 3 commits May 22, 2025 01:53

Standardize outputs

498b63b

Update README

c971c49

Clarify constant

3889fee
