[ARD:mediathek] Unable to extract media id #32518

Open
5 tasks done
zeuner opened this issue Aug 19, 2023 · 5 comments
Labels
broken-IE problem with existing site extraction

Comments

@zeuner

zeuner commented Aug 19, 2023

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
    (the URL is alive as of now, but it might be taken down at some point)
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones
    (there were multiple similar issues for other URLs, but these are marked as fixed, while the currently encountered error is still there)

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.ardmediathek.de/video/tom-sawyer/24-wie-tom-ein-rudel-woelfe-baendigte/kika/MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh?isChildContent']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 86e3cf5e5
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-5.15.0-78-generic-x86_64-with-glibc2.35 - OpenSSL 3.0.2 15 Mar 2022 - glibc 2.35
[debug] exe versions: ffmpeg 4.4.2, ffprobe 4.4.2, phantomjs ., rtmpdump 2.4
[debug] Proxy map: {'no': 'localhost,127.0.0.0/8,::1'}
[debug] Using fake IP 53.53.132.33 (DE) as X-Forwarded-For.
[ARD:mediathek] MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh: Downloading webpage
ERROR: Unable to extract media id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/tmp/youtube-dl/youtube_dl/YoutubeDL.py", line 859, in wrapper
    return func(self, *args, **kwargs)
  File "/tmp/youtube-dl/youtube_dl/YoutubeDL.py", line 955, in __extract_info
    ie_result = ie.extract(url)
  File "/tmp/youtube-dl/youtube_dl/extractor/common.py", line 565, in extract
    ie_result = self._real_extract(url)
  File "/tmp/youtube-dl/youtube_dl/extractor/ard.py", line 235, in _real_extract
    video_id = self._search_regex(
  File "/tmp/youtube-dl/youtube_dl/extractor/common.py", line 1045, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract media id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description

The video URL https://www.ardmediathek.de/video/tom-sawyer/24-wie-tom-ein-rudel-woelfe-baendigte/kika/MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh?isChildContent fails with the error above, as do the other episodes I have tried from https://www.ardmediathek.de/sendung/tom-sawyer/Y3JpZDovL2hyLW9ubGluZS8zODIyMDEwNA?isChildContent, although all of them play fine in a browser such as Firefox. This prevents playback from CLI players like mpv.

Any hints on how to investigate this further are appreciated.

@dirkf
Contributor

dirkf commented Aug 19, 2023

Working in master: try the nightly build (#30839):

python -m youtube_dl -v -F 'https://www.ardmediathek.de/video/tom-sawyer/24-wie-tom-ein-rudel-woelfe-baendigte/kika/MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh?isChildContent'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.ardmediathek.de/video/tom-sawyer/24-wie-tom-ein-rudel-woelfe-baendigte/kika/MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh?isChildContent']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 86e3cf5e5
[debug] Python 2.7.15 (CPython i686 32bit) - Linux-6.1.0-11-686-pae-i686-with-debian-12.1 - OpenSSL 1.1.1a  20 Nov 2018 - glibc 2.1.3
[debug] exe versions: none
[debug] Proxy map: {}
[debug] Using fake IP 53.116.133.201 (DE) as X-Forwarded-For.
[ARDBetaMediathek] MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh: Downloading webpage
[ARDBetaMediathek] MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh: Downloading m3u8 information
[info] Available formats for MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh:
format code  extension  resolution note
SD           mp4        640x360    [de] 1200k
SD_480p      mp4        960x540    [de] 1600k
HD_Ready     mp4        1280x720   [de] 3200k
Full_HD      mp4        1920x1080  [de] 5000k
hls-418      mp4        480x270    [de]  418k , avc1.4d401e, 50.0fps, mp4a.40.2
hls-593      mp4        640x360    [de]  593k , avc1.4d401f, 50.0fps, mp4a.40.2
hls-928      mp4        960x540    [de]  928k , avc1.4d401f, 50.0fps, mp4a.40.2
hls-1400     mp4        1280x720   [de] 1400k , avc1.640020, 50.0fps, mp4a.40.2
hls-2171     mp4        1920x1080  [de] 2171k , avc1.64002a, 50.0fps, mp4a.40.2 (best)
$

@zeuner
Author

zeuner commented Aug 19, 2023

@dirkf The error occurs on master. And I notice your log shows the same git commit hash. Any idea what could lead to a differing behaviour here? It seems that your setup calls ARDBetaMediathek instead of ARD:mediathek.

@zeuner
Author

zeuner commented Aug 19, 2023

When I use the release from https://github.com/ytdl-org/ytdl-nightly/releases/tag/2023.08.07, the outcome differs only marginally:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.ardmediathek.de/video/tom-sawyer/24-wie-tom-ein-rudel-woelfe-baendigte/kika/MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh?isChildContent']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2023.08.07 [86e3cf5e5]
[debug] ** This version was built from the latest master code at https://github.com/ytdl-org/youtube-dl.
[debug] ** For support, visit the main site.
[debug] Python 2.7.18 (CPython x86_64 64bit) - Linux-5.15.0-78-generic-x86_64-with-Ubuntu-22.04-jammy - OpenSSL 3.0.2 15 Mar 2022 - glibc 2.35
[debug] exe versions: ffmpeg 4.4.2, ffprobe 4.4.2, phantomjs ., rtmpdump 2.4
[debug] Proxy map: {'no': 'localhost,127.0.0.0/8,::1'}
[debug] Using fake IP 53.152.102.83 (DE) as X-Forwarded-For.
[ARD:mediathek] MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh: Downloading webpage
ERROR: Unable to extract media id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "youtube_dl/YoutubeDL.py", line 863, in wrapper
    return func(self, *args, **kwargs)
  File "youtube_dl/YoutubeDL.py", line 959, in __extract_info
    ie_result = ie.extract(url)
  File "youtube_dl/extractor/common.py", line 565, in extract
    ie_result = self._real_extract(url)
  File "youtube_dl/extractor/ard.py", line 236, in _real_extract
    r'/play/(?:config|media)/(\d+)', webpage, 'media id')
  File "youtube_dl/extractor/common.py", line 1045, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
RegexNotFoundError: Unable to extract media id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
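
For readers following the traceback, here is a minimal sketch of what the failing search does. The pattern is copied from the traceback above; the two page fragments and the helper name `find_media_id` are invented for illustration and are not taken from the real ARD pages or from youtube-dl. The point is only that the pattern needs a numeric id inside a `/play/config/` or `/play/media/` path, so a page without such a link yields no match.

```python
import re

# Pattern copied from the traceback (youtube_dl/extractor/ard.py).
MEDIA_ID_RE = r'/play/(?:config|media)/(\d+)'


def find_media_id(webpage):
    """Return the numeric media id as a string, or None if nothing matches.

    Hypothetical helper; youtube-dl's own _search_regex raises
    RegexNotFoundError instead of returning None.
    """
    match = re.search(MEDIA_ID_RE, webpage)
    return match.group(1) if match else None


# Made-up fragments (assumptions): one with the kind of link the pattern
# expects, one that only carries the alphanumeric id seen in the URL.
old_style_page = '<a href="/play/media/12345678?devicetype=pc">'
new_style_page = '<script>{"id": "MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh"}</script>'

print(find_media_id(old_style_page))
print(find_media_id(new_style_page))
```

The second call finds nothing, which corresponds to the "Unable to extract media id" error in the log.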

@dirkf
Contributor

dirkf commented Aug 19, 2023

Absolutely. I forgot that I brought a dev version of the ARD extractor on holiday with me! Sorry to confuse things, but at least it shows that the new version solves the issue. The changes are pretty extensive, so it's not practical to post a diff.
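
As a rough illustration of how a modified extractor can change which extractor name shows up in the log: youtube-dl tries its extractors in order and uses the first one whose URL pattern matches. The sketch below models that with plain regular expressions. All patterns, both extractor lists and the function `pick_extractor` are assumptions made up for this example; the real `_VALID_URL` values in `youtube_dl/extractor/ard.py` differ.

```python
import re

# Generic fallback pattern (hypothetical).
FALLBACK = ('ARD:mediathek', r'https?://(?:www\.)?ardmediathek\.de/.+')

# Hypothetical "released" list: the specific pattern does not cover /video/ URLs.
released = [
    ('ARDBetaMediathek',
     r'https?://(?:www\.)?ardmediathek\.de/player/(?P<id>[A-Za-z0-9]+)'),
    FALLBACK,
]

# Hypothetical "dev" list: the specific pattern also covers /video/ URLs.
dev = [
    ('ARDBetaMediathek',
     r'https?://(?:www\.)?ardmediathek\.de/video/'
     r'[^/?#]+/[^/?#]+/[^/?#]+/(?P<id>[A-Za-z0-9]+)'),
    FALLBACK,
]


def pick_extractor(url, extractors):
    """Return the name of the first extractor whose pattern matches url."""
    for name, pattern in extractors:
        if re.match(pattern, url):
            return name
    return None


url = ('https://www.ardmediathek.de/video/tom-sawyer/'
       '24-wie-tom-ein-rudel-woelfe-baendigte/kika/'
       'MDdkMjc0ZmYtOWJkNi00ZTg0LTlkMDUtMzAzYzI3YjRlOGZh?isChildContent')

print(pick_extractor(url, released))
print(pick_extractor(url, dev))
```

Under these assumed patterns the same URL goes to the fallback in one list and to the specific extractor in the other, which is the kind of difference visible between the two logs in this thread. In a real checkout, each extractor class exposes `suitable(url)` and `IE_NAME`, which can be used to check which one claims a URL.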

@zeuner
Author

zeuner commented Aug 21, 2023

Great to hear. Happy to test in case the changes become available in some branch.

@dirkf dirkf added the broken-IE problem with existing site extraction label Aug 21, 2023