-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YouTube] Support csv and zip subscription import (Google Takeout) #709
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this needed usability improvement!
|
||
import javax.annotation.Nonnull; | ||
|
||
import static org.schabi.newpipe.extractor.subscription.SubscriptionExtractor.ContentSource.INPUT_STREAM; | ||
|
||
/** | ||
* Extract subscriptions from a Google takout export (the user has to get the JSON out of the zip) | ||
* Extract subscriptions from a Google takeout export (the user has to get the JSON out of the zip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Extract subscriptions from a Google takeout export (the user has to get the JSON out of the zip) | |
* Extract subscriptions from a Google takeout export |
public static final String[] ENTRY_PATHS = { | ||
"Takeout/YouTube and YouTube Music/subscriptions/subscriptions.csv", | ||
"Takeout/YouTube y YouTube Music/suscripciones/suscripciones.csv" // There is a <NBSP> between YouTube and Music | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't rely on a list of paths. It will be different for each language. I'd navigate the whole zip searching for a valid file instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possible way to find this file is to look for files in the glob Takeout/YouTube*/*/*.csv
and find one where the file starts with a match for this regex (this regex is a JS one, unsure of the Java equiv):
/^.*\n([0-9A-Za-z_-]{20,}),[^,]*?\/\1,.+\n/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ intuition, this regex skips the first line, checks the second line to see if it starts with something that looks like a channel id, then looks for /
followed by that same id again followed by a comma ,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, it should work out fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done--brief summary of changes:
CSV: after 5 lines, checks to make sure it's found subscription items (ensures the 2nd entry is a channel url)
if there's no items, then it exits out of the function, returning an empty list.
ZIP: iterates over every CSV file, gets the subscription items from that CSV file, if there's an error or no items in the list, it goes to the next CSV file in the zip.
Thanks for your feedback -- ready for review again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this approach looks good to me! Have you tested it? Also, please put final
in front of every variable you never change after declaration.
@Stypox Final: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost good ;-)
...a/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeSubscriptionExtractor.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good code-wise :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good 😄
Test could be in theory parametrized, however because this is pain in JUnit 4 and creates a bunch of overhead, it's okay for me to go with a foreach
-loop.
...a/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeSubscriptionExtractor.java
Outdated
Show resolved
Hide resolved
I think this is good to be merged now. The failed tests are unrelated to my changes (a Peertube test is failing). |
@TobiGr please merge this, I don't have permission as apparently |
@Stypox Synk fails because of outdated jsoup version having a vulnerability. The version was already updated and merged. So if this is rebased, the check should succeed |
d5e74f3
to
ca6d64c
Compare
csv: Improved error messages Exits early if it hasnt found any items in the first few lines zip: Now checks all CSV files instead of hard-coded paths final qualifiers for immutable locals and parameters Co-authored-by: litetex <40789489+litetex@users.noreply.github.com>
Rebased, thank you! |
...a/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeSubscriptionExtractor.java
Show resolved
Hide resolved
Importing the CSV file doesn't work in the latest NewPipe available from F-Droid. |
@Simon311 The version with this fix isn't released yet. You can install and test the Release Candidate by checking the pinned issue. |
Extractor fix for issue TeamNewPipe/NewPipe#6757
This is the extractor PR for supporting the subscriptions.csv file from Google Takeout.
Also supports selecting a ZIP file (which will in turn find the subscriptions.csv file)
I introduced a new method on
SubscriptionExtractor
to determine what the stream is (e.g. json, zip, or csv):List<SubscriptionItem> fromInputStream(@Nonnull final InputStream contentInputStream, String contentType)
Ready for review and happy to make any changes to get this accepted.