-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BrainPOP] Add new extractor (Closes #10000) #10025
Conversation
youtube_dl/extractor/brainpop.py
Outdated
|
||
self.report_extraction(video_id) | ||
|
||
ec_token = self._html_search_regex(r'ec_token : \'(.+)\'', webpage, 'token') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its easier to parse the JSON instead of searching through it manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TRox1972 Right, but the JSON is mixed in with the HTML. How should I go about extracting it?
EDIT: I think I found a way.
youtube_dl/extractor/brainpop.py
Outdated
|
||
self.report_extraction(video_id) | ||
|
||
ec_token = self._html_search_regex(r"ec_token : '([^']*)'", webpage, 'token') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It this needed? Does it work without it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The token? Yea, you can't access the videos without it.
…bscription video detection
youtube_dl/extractor/brainpop.py
Outdated
'id': content['category']['unit']['topic']['EntryID'], | ||
'display_id': display_id, | ||
'title': remove_end(settings['title'], ' - BrainPOP'), | ||
'description': settings['description'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be non-fatal
'display_id': display_id, | ||
'title': remove_end(settings['title'], ' - BrainPOP'), | ||
'description': settings['description'], | ||
'title': remove_end(settings.get('title', display_id), ' - BrainPOP'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The title
should be mandatory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such a pattern is OK if title
is missing in settings
. @nehalvpatel Do you have an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
title
can be extracted from <title>(.*)</title>
also. One of them should be fatal, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way is fine. In general <title>
is better than extracting from embedded data as the former is less likely to change. If there's an example that <title>
or settings['title']
is missing, a fallback should be provided, otherwise the extraction should be fatal.
Does this thing still work? |
Authored by: MinePlayersPE Based on ytdl-org/youtube-dl#10025
Authored by: MinePlayersPE Based on ytdl-org/youtube-dl#10025
Before submitting a pull request make sure you have:
What is the purpose of your pull request?
Description of your pull request and other information
Added a new extractor for BrainPOP. In addition to the video, it extracts the description and the thumbnail. It should work on any video on the site, even if it's in a different language.