Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YouTube] Support csv and zip subscription import (Google Takeout) #709

Merged
merged 1 commit into from
Aug 24, 2021

Conversation

talanc
Copy link
Contributor

@talanc talanc commented Aug 8, 2021

  • I carefully read the contribution guidelines and agree to them.
  • I have tested the API against NewPipe.
  • I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API.

Extractor fix for issue TeamNewPipe/NewPipe#6757

This is the extractor PR for supporting the subscriptions.csv file from Google Takeout.
Also supports selecting a ZIP file (which will in turn find the subscriptions.csv file)
I introduced a new method on SubscriptionExtractor to determine what the stream is (e.g. json, zip, or csv):
List<SubscriptionItem> fromInputStream(@Nonnull final InputStream contentInputStream, String contentType)

Ready for review and happy to make any changes to get this accepted.

Copy link
Member

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this needed usability improvement!


import javax.annotation.Nonnull;

import static org.schabi.newpipe.extractor.subscription.SubscriptionExtractor.ContentSource.INPUT_STREAM;

/**
* Extract subscriptions from a Google takout export (the user has to get the JSON out of the zip)
* Extract subscriptions from a Google takeout export (the user has to get the JSON out of the zip)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Extract subscriptions from a Google takeout export (the user has to get the JSON out of the zip)
* Extract subscriptions from a Google takeout export

Comment on lines 124 to 127
public static final String[] ENTRY_PATHS = {
"Takeout/YouTube and YouTube Music/subscriptions/subscriptions.csv",
"Takeout/YouTube y YouTube Music/suscripciones/suscripciones.csv" // There is a <NBSP> between YouTube and Music
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't rely on a list of paths. It will be different for each language. I'd navigate the whole zip searching for a valid file instead.

Copy link

@tbjgolden tbjgolden Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible way to find this file is to look for files in the glob Takeout/YouTube*/*/*.csv and find one where the file starts with a match for this regex (this regex is a JS one, unsure of the Java equiv):

/^.*\n([0-9A-Za-z_-]{20,}),[^,]*?\/\1,.+\n/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ intuition, this regex skips the first line, checks the second line to see if it starts with something that looks like a channel id, then looks for / followed by that same id again followed by a comma ,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, it should work out fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done--brief summary of changes:

CSV: after 5 lines, checks to make sure it's found subscription items (ensures the 2nd entry is a channel url)
if there's no items, then it exits out of the function, returning an empty list.

ZIP: iterates over every CSV file, gets the subscription items from that CSV file, if there's an error or no items in the list, it goes to the next CSV file in the zip.

@talanc
Copy link
Contributor Author

talanc commented Aug 10, 2021

Thanks for your feedback -- ready for review again.

Copy link
Member

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this approach looks good to me! Have you tested it? Also, please put final in front of every variable you never change after declaration.

@talanc
Copy link
Contributor Author

talanc commented Aug 14, 2021

@Stypox
Testing:
Yes.
It has unit tests for the CSV and ZIP import.
I've manually tested (via an emulator) with two different subscription files.
And I'm sure there will be another round of testing when it gets merged and the UI layer is tested: TeamNewPipe/NewPipe#6882

Final:
I've added final to variables where appropriate.

Copy link
Member

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good ;-)

Copy link
Member

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good code-wise :-)

Copy link
Member

@litetex litetex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good 😄
Test could be in theory parametrized, however because this is pain in JUnit 4 and creates a bunch of overhead, it's okay for me to go with a foreach-loop.

@talanc
Copy link
Contributor Author

talanc commented Aug 17, 2021

I think this is good to be merged now.

The failed tests are unrelated to my changes (a Peertube test is failing).

@Stypox
Copy link
Member

Stypox commented Aug 24, 2021

@TobiGr please merge this, I don't have permission as apparently snyk failed. And I don't have access to Snyk.

@XiangRongLin
Copy link
Collaborator

@Stypox Synk fails because of outdated jsoup version having a vulnerability. The version was already updated and merged. So if this is rebased, the check should succeed

csv:
Improved error messages
Exits early if it hasnt found any items in the first few lines

zip:
Now checks all CSV files instead of hard-coded paths

final qualifiers for immutable locals and parameters

Co-authored-by: litetex <40789489+litetex@users.noreply.github.com>
@Stypox Stypox changed the title YoutubeSubscriptionExtractor: Support for subscriptions.csv (and zip) [YouTube] Support csv and zip subscription import (Google Takeout) Aug 24, 2021
@Stypox
Copy link
Member

Stypox commented Aug 24, 2021

Rebased, thank you!

@Stypox Stypox merged commit db6c729 into TeamNewPipe:dev Aug 24, 2021
@Simon311
Copy link

Importing the CSV file doesn't work in the latest NewPipe available from F-Droid.
But perhaps more importantly, my Google export only returned 6 channels out of hundreds. I know there's nothing you guys can do about it, but has anyone had this same issue? Is there a way to report this to Google, since this is a clear GDPR violation?

@opusforlife2
Copy link
Collaborator

Importing the CSV file doesn't work in the latest NewPipe available from F-Droid.

@Simon311 The version with this fix isn't released yet. You can install and test the Release Candidate by checking the pinned issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is related to a bug youtube service, https://www.youtube.com/
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants