feat(wistia): add subtitle download support #3

kovyrin · 2025-10-14T15:58:45Z

add full Wistia subtitle discovery and queuing; captions now download beside their videos with sanitized filenames
guard resume cache: when subtitle support is enabled, outdated .thinkific_progress.json files are regenerated so caption tasks aren’t skipped
expose SUBTITLE_DOWNLOAD_ENABLED in Settings and .env.example, wiring the toggle into runtime behavior
document subtitle capability and new env flag in README, including table and feature highlights updates

Verification

python -m compileall thinkific_downloader
uv run python thinkificdownloader.py (full course download with subtitles queued and saved)

gemini-code-assist · 2025-10-14T15:59:03Z

Summary of Changes

Hello @kovyrin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new feature: the ability to download subtitles and captions for Wistia-hosted videos. It integrates subtitle discovery and queuing directly into the download process, ensures proper configuration through a new environment variable, and intelligently manages the download resume cache to accommodate this new content type. The changes enhance the utility of the downloader by providing a more complete content acquisition experience.

Highlights

Wistia Subtitle Support: Added comprehensive functionality to discover and download subtitles/captions for Wistia videos, saving them alongside their respective video files with sanitized names.
Configuration Toggle: Introduced a SUBTITLE_DOWNLOAD_ENABLED setting in .env.example and Settings to allow users to enable or disable subtitle downloads.
Resume Cache Guarding: Implemented logic to automatically regenerate the .thinkific_progress.json cache if subtitle downloading is enabled and the existing cache does not contain subtitle tasks, preventing skipped caption downloads.
Documentation Updates: Updated the README.md to reflect the new subtitle download capability, including its entry in the feature table and highlights section.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a great new feature for downloading Wistia video subtitles. The implementation is thorough, covering configuration, subtitle discovery, and download queuing. I appreciate the addition of cache invalidation to handle courses that were analyzed before this feature was added. My review focuses on improving exception handling to be more specific and suggesting a refactor for a large new function to enhance maintainability by reducing code duplication.

thinkific_downloader/downloader.py

thinkific_downloader/wistia_downloader.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

kovyrin · 2025-10-14T16:36:09Z

/gemini review

kovyrin · 2025-10-14T16:36:55Z

@gemini-code-assist review

gemini-code-assist

Code Review

This pull request introduces subtitle download support for Wistia videos, a valuable addition. The implementation includes a new configuration flag, comprehensive logic for discovering subtitle tracks from the Wistia API, and intelligent cache invalidation to ensure subtitles are queued for existing partial downloads. The documentation has also been updated accordingly. My review focuses on improving code clarity, maintainability, and error handling for even greater robustness.

thinkific_downloader/downloader.py

gemini-code-assist · 2025-10-14T16:38:53Z

thinkific_downloader/downloader.py

+        filtered_tasks: List[Dict[str, Any]] = []
+        skipped_count = 0
+        for task in saved_tasks:
+            content_type = (task.get('content_type') or 'video').lower()
+            if content_type == 'subtitle':
+                skipped_count += 1
+                continue
+            filtered_tasks.append(task)
+        restored_tasks = filtered_tasks


This block of code for filtering out subtitle tasks can be made more concise and Pythonic by using a list comprehension. This improves readability and reduces the lines of code.

restored_tasks = [ task for task in saved_tasks if (task.get('content_type') or 'video').lower() != 'subtitle' ] skipped_count = len(saved_tasks) - len(restored_tasks)

thinkific_downloader/wistia_downloader.py

gemini-code-assist · 2025-10-14T16:38:53Z

thinkific_downloader/wistia_downloader.py

+        language_part = track.get('language') or track.get('label') or ''
+        if isinstance(language_part, (list, dict)):
+            language_part = ''
+        language_part = str(language_part or '')


This block of code to safely get and convert language_part to a string can be simplified. You can achieve the same goal more concisely by checking the type and using a conditional expression, which also handles edge cases like None or non-string types more cleanly.

language_part_raw = track.get('language') or track.get('label') language_part = str(language_part_raw) if isinstance(language_part_raw, str) else ''

gemini-code-assist

Code Review

This pull request adds a great new feature: downloading subtitles for Wistia videos. The implementation is thorough, covering discovery of subtitles from various fields in the Wistia API response, adding a feature flag to control it, and even invalidating the resume cache to ensure subtitles are picked up on existing partial downloads. The documentation and configuration changes are also well-executed.

My review includes a few suggestions to improve maintainability and code structure. I've pointed out a circular dependency between modules and suggested a refactoring to decouple them. I've also recommended simplifying a complex function by breaking it down and making a piece of logic more concise. These changes should make the new code even more robust and easier to work with in the future.

gemini-code-assist · 2025-10-14T16:41:12Z

thinkific_downloader/wistia_downloader.py

+def queue_wistia_subtitle_downloads(media: Dict[str, Any], dest_dir: Path, video_base_name: str):
+    """Queue subtitle download tasks for a Wistia media object."""
+    from .downloader import SETTINGS, add_download_task, init_settings
+
+    if not isinstance(dest_dir, Path):
+        dest_dir = Path(dest_dir)
+
+    init_settings()
+    settings = SETTINGS
+    if settings and hasattr(settings, 'subtitle_download_enabled') and not settings.subtitle_download_enabled:
+        return
+
+    tracks = extract_wistia_subtitle_tracks(media)
+    if not tracks:
+        return
+
+    base_name = Path(video_base_name).stem
+    if not base_name:
+        fallback_name = media.get('name') or media.get('hashedId') or 'captions'
+        base_name = filter_filename(str(fallback_name))
+    else:
+        base_name = filter_filename(base_name)
+
+    if not base_name:
+        base_name = "captions"
+
+    counter = 1
+    for track in tracks:
+        url = track.get('url')
+        if not url:
+            continue
+
+        ext = (track.get('ext') or _infer_track_extension(url)).lstrip('.').lower() or DEFAULT_SUBTITLE_EXTENSION
+        language_part = track.get('language') or track.get('label') or ''
+        if isinstance(language_part, (list, dict)):
+            language_part = ''
+        language_part = str(language_part or '')
+        language_part = _LANGUAGE_SANITIZE_PATTERN.sub('-', language_part).strip('-')
+
+        if not language_part:
+            language_part = 'captions' if counter == 1 else f"captions-{counter}"
+
+        subtitle_filename = filter_filename(f"{base_name}.{language_part}.{ext}")
+        if not subtitle_filename:
+            subtitle_filename = filter_filename(f"{base_name}.captions-{counter}.{ext}")
+
+        print(f"   [Subs] Queued subtitles: {subtitle_filename}")
+        add_download_task(url, dest_dir / subtitle_filename, "subtitle")
+        counter += 1


This function imports from downloader.py, which in turn imports from wistia_downloader.py, creating a circular dependency. While using local imports makes this work at runtime, it's a code smell that makes the modules tightly coupled and harder to maintain and test.

A better approach would be to refactor this function to not have side effects in another module. Instead of calling add_download_task directly, it could return a list of subtitle download tasks. The caller in downloader.py would then be responsible for adding them to the download queue. This is an example of Inversion of Control.

This would involve:

Renaming queue_wistia_subtitle_downloads to something like get_wistia_subtitle_tasks and making it return a list of task dictionaries.

Passing the settings object to it to check subtitle_download_enabled.

Updating the call sites in downloader.py and wistia_downloader.py to handle the returned list of tasks.

This change would break the dependency from wistia_downloader.py to downloader.py, improving the overall module architecture.

thinkific_downloader/downloader.py

gemini-code-assist · 2025-10-14T16:41:13Z

thinkific_downloader/wistia_downloader.py

+def extract_wistia_subtitle_tracks(media: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
+    """Extract subtitle/caption track metadata from Wistia media JSON."""
+    if not isinstance(media, dict):
+        return []
+
+    hashed_id = media.get('hashedId') or media.get('hashed_id')
+    tracks: List[Dict[str, Optional[str]]] = []
+
+    def add_track(url: Optional[str], language: Optional[str], label: Optional[str], ext: Optional[str]):
+        normalized = _normalize_wistia_track_url(url)
+        if not normalized and hashed_id and language:
+            normalized = _build_caption_url(hashed_id, language, ext)
+        if not normalized:
+            return
+        tracks.append({
+            'url': normalized,
+            'language': language,
+            'label': label,
+            'ext': (ext or '').lstrip('.') or None
+        })
+
+    for track in media.get('captions') or []:
+        if isinstance(track, dict):
+            add_track(
+                track.get('url') or track.get('src'),
+                track.get('language') or track.get('lang'),
+                track.get('languageName') or track.get('label') or track.get('name'),
+                track.get('ext')
+            )
+
+    def iter_track_dicts(collection: Optional[Iterable[Dict[str, Any]]]):
+        for item in collection or []:
+            if isinstance(item, dict):
+                yield item
+
+    def extract_label(track_dict: Dict[str, Any], label_keys: Iterable[str]) -> Optional[str]:
+        for key in label_keys:
+            value = track_dict.get(key)
+            if value:
+                return value
+        return None
+
+    def iter_track_sources(track_dict: Dict[str, Any]):
+        sources = track_dict.get('sources') or []
+        if not sources:
+            yield track_dict.get('url') or track_dict.get('src'), track_dict.get('ext')
+            return
+        for source in sources:
+            if isinstance(source, dict):
+                yield source.get('url') or source.get('src'), source.get('ext') or track_dict.get('ext')
+
+    def process_track_collection(collection: Optional[Iterable[Dict[str, Any]]], label_keys: Iterable[str]):
+        for track in iter_track_dicts(collection):
+            language = track.get('language') or track.get('lang')
+            label = extract_label(track, label_keys)
+            for source_url, source_ext in iter_track_sources(track):
+                add_track(source_url, language, label, source_ext)
+
+    process_track_collection(media.get('text_tracks'), ('name', 'label'))
+    process_track_collection(media.get('textTracks'), ('name', 'label', 'title'))
+
+    for asset in media.get('assets') or []:
+        if isinstance(asset, dict):
+            asset_type = (asset.get('type') or '').lower()
+            asset_kind = (asset.get('kind') or '').lower()
+            if asset_type in ('caption', 'captions', 'subtitle', 'subtitles') or asset_kind in ('caption', 'captions', 'subtitle', 'subtitles'):
+                add_track(
+                    asset.get('url') or asset.get('src'),
+                    asset.get('language') or asset.get('lang'),
+                    asset.get('display_name') or asset.get('name'),
+                    asset.get('ext')
+                )
+
+    available_transcripts = media.get('availableTranscripts') or []
+    if hashed_id and available_transcripts:
+        for transcript in available_transcripts:
+            if not isinstance(transcript, dict) or not transcript.get('hasCaptions'):
+                continue
+            language = transcript.get('language') or transcript.get('wistiaLanguageCode') or transcript.get('bcp47LanguageTag')
+            if not language:
+                continue
+            add_track(
+                _build_caption_url(hashed_id, language, DEFAULT_SUBTITLE_EXTENSION),
+                language,
+                transcript.get('name') or transcript.get('familyName') or language,
+                DEFAULT_SUBTITLE_EXTENSION
+            )
+
+    unique_tracks: Dict[str, Dict[str, Optional[str]]] = {}
+    for track in tracks:
+        url = track['url']
+        if not url:
+            continue
+        if url not in unique_tracks:
+            unique_tracks[url] = track
+        else:
+            existing = unique_tracks[url]
+            # Prefer track data that includes language/label/ext
+            if not existing.get('language') and track.get('language'):
+                existing['language'] = track['language']
+            if not existing.get('label') and track.get('label'):
+                existing['label'] = track['label']
+            if not existing.get('ext') and track.get('ext'):
+                existing['ext'] = track['ext']
+
+    return list(unique_tracks.values())
+


This function is quite long and complex, making it difficult to read and maintain. It handles extracting subtitle tracks from several different structures within the Wistia media JSON.

To improve readability and maintainability, consider breaking this function down into several smaller helper functions, each responsible for extracting tracks from a specific part of the media object (e.g., captions, text_tracks, assets). The main function would then orchestrate calls to these helpers.

Example structure:

def _extract_from_captions(media_captions: list, add_track: callable): # ... logic for 'captions' key def _extract_from_text_tracks(text_tracks: list, add_track: callable): # ... logic for 'text_tracks' key # ... other helpers for 'assets', 'availableTranscripts' def extract_wistia_subtitle_tracks(media: Dict[str, Any]) -> List[Dict[str, Optional[str]]]: if not isinstance(media, dict): return [] hashed_id = media.get('hashedId') or media.get('hashed_id') tracks: List[Dict[str, Optional[str]]] = [] def add_track(url: Optional[str], language: Optional[str], label: Optional[str], ext: Optional[str]): # ... (current add_track logic) tracks.append(...) _extract_from_captions(media.get('captions') or [], add_track) _extract_from_text_tracks(media.get('text_tracks') or [], add_track) # ... call other helpers # De-duplication logic remains here unique_tracks: Dict[str, Dict[str, Optional[str]]] = {} # ... return list(unique_tracks.values())

This refactoring would make the code more modular and easier to test and debug.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

ajimix · 2025-11-03T03:30:52Z

@kovyrin what is missing for this to be merged?

feat(wistia): add subtitle download support

a4dca4a

gemini-code-assist bot reviewed Oct 14, 2025

View reviewed changes

thinkific_downloader/downloader.py Outdated Show resolved Hide resolved

thinkific_downloader/downloader.py Outdated Show resolved Hide resolved

thinkific_downloader/wistia_downloader.py Outdated Show resolved Hide resolved

thinkific_downloader/wistia_downloader.py Show resolved Hide resolved

kovyrin marked this pull request as draft October 14, 2025 16:03

kovyrin and others added 7 commits October 14, 2025 12:05

Respect subtitle flag when restoring cached tasks

f2cd39d

Update thinkific_downloader/downloader.py

80129f8

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update thinkific_downloader/downloader.py

7f4a58a

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update thinkific_downloader/wistia_downloader.py

60f8273

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Refactor Wistia track extraction helper

e02aca2

Factor cache restore helpers

72573ac

Decompose Wistia track processing helpers

bd00862

kovyrin marked this pull request as ready for review October 14, 2025 16:35

gemini-code-assist bot reviewed Oct 14, 2025

View reviewed changes

kovyrin and others added 5 commits October 14, 2025 13:05

Update thinkific_downloader/downloader.py

c705cc3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update thinkific_downloader/wistia_downloader.py

08b0129

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update thinkific_downloader/downloader.py

47192c4

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Improve resume cache handling for subtitles

98112af

Return Wistia subtitle tasks for callers

462b3b6

kovyrin mentioned this pull request Oct 15, 2025

Add offline course viewer generator #4

Open

Uh oh!

feat(wistia): add subtitle download support #3

Are you sure you want to change the base?

feat(wistia): add subtitle download support #3

Uh oh!

Conversation

kovyrin commented Oct 14, 2025

Uh oh!

gemini-code-assist bot commented Oct 14, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kovyrin commented Oct 14, 2025

Uh oh!

kovyrin commented Oct 14, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist bot Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist bot Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist bot Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

ajimix commented Nov 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants