Skip to content

Conversation

@kovyrin
Copy link

@kovyrin kovyrin commented Oct 14, 2025

  • add full Wistia subtitle discovery and queuing; captions now download beside their videos with sanitized filenames
  • guard resume cache: when subtitle support is enabled, outdated .thinkific_progress.json files are regenerated so caption tasks aren’t skipped
  • expose SUBTITLE_DOWNLOAD_ENABLED in Settings and .env.example, wiring the toggle into runtime behavior
  • document subtitle capability and new env flag in README, including table and feature highlights updates

Verification

  • python -m compileall thinkific_downloader
  • uv run python thinkificdownloader.py (full course download with subtitles queued and saved)

@gemini-code-assist
Copy link

Summary of Changes

Hello @kovyrin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new feature: the ability to download subtitles and captions for Wistia-hosted videos. It integrates subtitle discovery and queuing directly into the download process, ensures proper configuration through a new environment variable, and intelligently manages the download resume cache to accommodate this new content type. The changes enhance the utility of the downloader by providing a more complete content acquisition experience.

Highlights

  • Wistia Subtitle Support: Added comprehensive functionality to discover and download subtitles/captions for Wistia videos, saving them alongside their respective video files with sanitized names.
  • Configuration Toggle: Introduced a SUBTITLE_DOWNLOAD_ENABLED setting in .env.example and Settings to allow users to enable or disable subtitle downloads.
  • Resume Cache Guarding: Implemented logic to automatically regenerate the .thinkific_progress.json cache if subtitle downloading is enabled and the existing cache does not contain subtitle tasks, preventing skipped caption downloads.
  • Documentation Updates: Updated the README.md to reflect the new subtitle download capability, including its entry in the feature table and highlights section.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a great new feature for downloading Wistia video subtitles. The implementation is thorough, covering configuration, subtitle discovery, and download queuing. I appreciate the addition of cache invalidation to handle courses that were analyzed before this feature was added. My review focuses on improving exception handling to be more specific and suggesting a refactor for a large new function to enhance maintainability by reducing code duplication.

@kovyrin kovyrin marked this pull request as draft October 14, 2025 16:03
kovyrin and others added 7 commits October 14, 2025 12:05
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@kovyrin kovyrin marked this pull request as ready for review October 14, 2025 16:35
@kovyrin
Copy link
Author

kovyrin commented Oct 14, 2025

/gemini review

@kovyrin
Copy link
Author

kovyrin commented Oct 14, 2025

@gemini-code-assist review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces subtitle download support for Wistia videos, a valuable addition. The implementation includes a new configuration flag, comprehensive logic for discovering subtitle tracks from the Wistia API, and intelligent cache invalidation to ensure subtitles are queued for existing partial downloads. The documentation has also been updated accordingly. My review focuses on improving code clarity, maintainability, and error handling for even greater robustness.

Comment on lines 429 to 437
filtered_tasks: List[Dict[str, Any]] = []
skipped_count = 0
for task in saved_tasks:
content_type = (task.get('content_type') or 'video').lower()
if content_type == 'subtitle':
skipped_count += 1
continue
filtered_tasks.append(task)
restored_tasks = filtered_tasks

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block of code for filtering out subtitle tasks can be made more concise and Pythonic by using a list comprehension. This improves readability and reduces the lines of code.

        restored_tasks = [
            task for task in saved_tasks
            if (task.get('content_type') or 'video').lower() != 'subtitle'
        ]
        skipped_count = len(saved_tasks) - len(restored_tasks)

Comment on lines 204 to 207
language_part = track.get('language') or track.get('label') or ''
if isinstance(language_part, (list, dict)):
language_part = ''
language_part = str(language_part or '')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block of code to safely get and convert language_part to a string can be simplified. You can achieve the same goal more concisely by checking the type and using a conditional expression, which also handles edge cases like None or non-string types more cleanly.

        language_part_raw = track.get('language') or track.get('label')
        language_part = str(language_part_raw) if isinstance(language_part_raw, str) else ''

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a great new feature: downloading subtitles for Wistia videos. The implementation is thorough, covering discovery of subtitles from various fields in the Wistia API response, adding a feature flag to control it, and even invalidating the resume cache to ensure subtitles are picked up on existing partial downloads. The documentation and configuration changes are also well-executed.

My review includes a few suggestions to improve maintainability and code structure. I've pointed out a circular dependency between modules and suggested a refactoring to decouple them. I've also recommended simplifying a complex function by breaking it down and making a piece of logic more concise. These changes should make the new code even more robust and easier to work with in the future.

Comment on lines 171 to 219
def queue_wistia_subtitle_downloads(media: Dict[str, Any], dest_dir: Path, video_base_name: str):
"""Queue subtitle download tasks for a Wistia media object."""
from .downloader import SETTINGS, add_download_task, init_settings

if not isinstance(dest_dir, Path):
dest_dir = Path(dest_dir)

init_settings()
settings = SETTINGS
if settings and hasattr(settings, 'subtitle_download_enabled') and not settings.subtitle_download_enabled:
return

tracks = extract_wistia_subtitle_tracks(media)
if not tracks:
return

base_name = Path(video_base_name).stem
if not base_name:
fallback_name = media.get('name') or media.get('hashedId') or 'captions'
base_name = filter_filename(str(fallback_name))
else:
base_name = filter_filename(base_name)

if not base_name:
base_name = "captions"

counter = 1
for track in tracks:
url = track.get('url')
if not url:
continue

ext = (track.get('ext') or _infer_track_extension(url)).lstrip('.').lower() or DEFAULT_SUBTITLE_EXTENSION
language_part = track.get('language') or track.get('label') or ''
if isinstance(language_part, (list, dict)):
language_part = ''
language_part = str(language_part or '')
language_part = _LANGUAGE_SANITIZE_PATTERN.sub('-', language_part).strip('-')

if not language_part:
language_part = 'captions' if counter == 1 else f"captions-{counter}"

subtitle_filename = filter_filename(f"{base_name}.{language_part}.{ext}")
if not subtitle_filename:
subtitle_filename = filter_filename(f"{base_name}.captions-{counter}.{ext}")

print(f" [Subs] Queued subtitles: {subtitle_filename}")
add_download_task(url, dest_dir / subtitle_filename, "subtitle")
counter += 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This function imports from downloader.py, which in turn imports from wistia_downloader.py, creating a circular dependency. While using local imports makes this work at runtime, it's a code smell that makes the modules tightly coupled and harder to maintain and test.

A better approach would be to refactor this function to not have side effects in another module. Instead of calling add_download_task directly, it could return a list of subtitle download tasks. The caller in downloader.py would then be responsible for adding them to the download queue. This is an example of Inversion of Control.

This would involve:

  1. Renaming queue_wistia_subtitle_downloads to something like get_wistia_subtitle_tasks and making it return a list of task dictionaries.
  2. Passing the settings object to it to check subtitle_download_enabled.
  3. Updating the call sites in downloader.py and wistia_downloader.py to handle the returned list of tasks.

This change would break the dependency from wistia_downloader.py to downloader.py, improving the overall module architecture.

Comment on lines 63 to 169
def extract_wistia_subtitle_tracks(media: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
"""Extract subtitle/caption track metadata from Wistia media JSON."""
if not isinstance(media, dict):
return []

hashed_id = media.get('hashedId') or media.get('hashed_id')
tracks: List[Dict[str, Optional[str]]] = []

def add_track(url: Optional[str], language: Optional[str], label: Optional[str], ext: Optional[str]):
normalized = _normalize_wistia_track_url(url)
if not normalized and hashed_id and language:
normalized = _build_caption_url(hashed_id, language, ext)
if not normalized:
return
tracks.append({
'url': normalized,
'language': language,
'label': label,
'ext': (ext or '').lstrip('.') or None
})

for track in media.get('captions') or []:
if isinstance(track, dict):
add_track(
track.get('url') or track.get('src'),
track.get('language') or track.get('lang'),
track.get('languageName') or track.get('label') or track.get('name'),
track.get('ext')
)

def iter_track_dicts(collection: Optional[Iterable[Dict[str, Any]]]):
for item in collection or []:
if isinstance(item, dict):
yield item

def extract_label(track_dict: Dict[str, Any], label_keys: Iterable[str]) -> Optional[str]:
for key in label_keys:
value = track_dict.get(key)
if value:
return value
return None

def iter_track_sources(track_dict: Dict[str, Any]):
sources = track_dict.get('sources') or []
if not sources:
yield track_dict.get('url') or track_dict.get('src'), track_dict.get('ext')
return
for source in sources:
if isinstance(source, dict):
yield source.get('url') or source.get('src'), source.get('ext') or track_dict.get('ext')

def process_track_collection(collection: Optional[Iterable[Dict[str, Any]]], label_keys: Iterable[str]):
for track in iter_track_dicts(collection):
language = track.get('language') or track.get('lang')
label = extract_label(track, label_keys)
for source_url, source_ext in iter_track_sources(track):
add_track(source_url, language, label, source_ext)

process_track_collection(media.get('text_tracks'), ('name', 'label'))
process_track_collection(media.get('textTracks'), ('name', 'label', 'title'))

for asset in media.get('assets') or []:
if isinstance(asset, dict):
asset_type = (asset.get('type') or '').lower()
asset_kind = (asset.get('kind') or '').lower()
if asset_type in ('caption', 'captions', 'subtitle', 'subtitles') or asset_kind in ('caption', 'captions', 'subtitle', 'subtitles'):
add_track(
asset.get('url') or asset.get('src'),
asset.get('language') or asset.get('lang'),
asset.get('display_name') or asset.get('name'),
asset.get('ext')
)

available_transcripts = media.get('availableTranscripts') or []
if hashed_id and available_transcripts:
for transcript in available_transcripts:
if not isinstance(transcript, dict) or not transcript.get('hasCaptions'):
continue
language = transcript.get('language') or transcript.get('wistiaLanguageCode') or transcript.get('bcp47LanguageTag')
if not language:
continue
add_track(
_build_caption_url(hashed_id, language, DEFAULT_SUBTITLE_EXTENSION),
language,
transcript.get('name') or transcript.get('familyName') or language,
DEFAULT_SUBTITLE_EXTENSION
)

unique_tracks: Dict[str, Dict[str, Optional[str]]] = {}
for track in tracks:
url = track['url']
if not url:
continue
if url not in unique_tracks:
unique_tracks[url] = track
else:
existing = unique_tracks[url]
# Prefer track data that includes language/label/ext
if not existing.get('language') and track.get('language'):
existing['language'] = track['language']
if not existing.get('label') and track.get('label'):
existing['label'] = track['label']
if not existing.get('ext') and track.get('ext'):
existing['ext'] = track['ext']

return list(unique_tracks.values())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function is quite long and complex, making it difficult to read and maintain. It handles extracting subtitle tracks from several different structures within the Wistia media JSON.

To improve readability and maintainability, consider breaking this function down into several smaller helper functions, each responsible for extracting tracks from a specific part of the media object (e.g., captions, text_tracks, assets). The main function would then orchestrate calls to these helpers.

Example structure:

def _extract_from_captions(media_captions: list, add_track: callable):
    # ... logic for 'captions' key
    
def _extract_from_text_tracks(text_tracks: list, add_track: callable):
    # ... logic for 'text_tracks' key

# ... other helpers for 'assets', 'availableTranscripts'

def extract_wistia_subtitle_tracks(media: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    if not isinstance(media, dict):
        return []

    hashed_id = media.get('hashedId') or media.get('hashed_id')
    tracks: List[Dict[str, Optional[str]]] = []

    def add_track(url: Optional[str], language: Optional[str], label: Optional[str], ext: Optional[str]):
        # ... (current add_track logic)
        tracks.append(...)

    _extract_from_captions(media.get('captions') or [], add_track)
    _extract_from_text_tracks(media.get('text_tracks') or [], add_track)
    # ... call other helpers

    # De-duplication logic remains here
    unique_tracks: Dict[str, Dict[str, Optional[str]]] = {}
    # ...

    return list(unique_tracks.values())

This refactoring would make the code more modular and easier to test and debug.

kovyrin and others added 5 commits October 14, 2025 13:05
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ajimix
Copy link

ajimix commented Nov 3, 2025

@kovyrin what is missing for this to be merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants