[Feature Suggestion] Ignore extra header lines #18

Closed
abstractvector opened this issue Mar 13, 2019 · 4 comments


@abstractvector
Contributor

I'm using this module to ingest data from Google's YouTube Captions API. Unfortunately, the content it generates has extra lines after the opening WEBVTT line, for example:

WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:00.960
[Happy music]

According to MDN, this is not allowed; nonetheless, these lines appear in the API's output. At the moment I'm working around it by altering the string before passing it to parse():

const adjustedCaption = caption.replace(/^WEBVTT[\s\S]*?\n\n/, "WEBVTT\n\n");

Without this workaround, I receive the error "Missing blank line after signature". It would be preferable if this module instead accepted an option to ignore the extra lines that follow the signature. Looking at the code, this wouldn't have adverse effects on the parsing. Alternatively, these lines could be parsed and added as metadata to the parsed output.
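The replace() workaround above only matches LF line endings: a file with CRLF endings contains no "\n\n", so the regex matches nothing and the extra lines remain. A slightly more defensive sketch, assuming the goal is simply to drop every line between the signature and the first blank line (stripHeaderLines is a hypothetical helper, not part of this module):

```javascript
// Hypothetical helper (not part of this module): removes the lines between
// the "WEBVTT" signature line and the first blank line.
function stripHeaderLines(input) {
  // Normalise CRLF and lone CR to LF so one code path handles every file.
  const lines = input.replace(/\r\n?/g, "\n").split("\n");

  if (!lines[0].startsWith("WEBVTT")) {
    // Doesn't look like WebVTT; return it untouched and let parse() complain.
    return input;
  }

  // Index of the first blank line after the signature, or -1 if none.
  const blank = lines.indexOf("", 1);
  if (blank === -1) {
    // No blank line at all, so there are no cues: keep only the signature.
    return lines[0] + "\n";
  }

  // Keep the signature line, drop the extra header lines, keep everything
  // from the blank line onwards. Output always uses LF line endings.
  return [lines[0], ...lines.slice(blank)].join("\n");
}
```

Unlike the regex, this keeps the signature line as written (any text after WEBVTT on that line survives) and always emits LF line endings.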

I'd be happy to open a PR for this if you're comfortable with the approach; if you have a better suggestion, I can look at implementing that instead.

@osk
Owner

osk commented Mar 13, 2019

Hi @abstractvector, that sounds like a fine way to go about it and I'd be happy to merge a PR.

@abstractvector
Contributor Author

@osk, would you be happy if I went the route of exposing these fields as an extra top-level object called meta? For the example above, parse() would return something like:

{
  "valid": true,
  "meta": {
    "Kind": "captions",
    "Language": "en"
  },
  "cues": [
    // ..
  ]
}

@osk
Owner

osk commented Mar 13, 2019

This looks like something being added via the WebM project:
http://wiki.webmproject.org/webm-metadata/temporal-metadata/webvtt-metadata
though that's only after a few minutes of research.

It should be fine for webvtt.parse(input) to take an options parameter, so that e.g. webvtt.parse(input, { meta: true }) would look for these meta fields and add them to the output. If the option is passed but no metadata is found, it should return "meta": null or something to that effect.
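A minimal sketch of the option described here, assuming header lines take the form "Key: value" and sit between the signature and the first blank line. parseHeader and its { meta, body } return shape are hypothetical illustrations, not this module's actual API; in a real PR the logic would live inside parse(), which would then go on to parse the cues in body.

```javascript
// Hypothetical sketch of an opt-in `meta` option; not the module's real API.
// Assumes the first line is the WEBVTT signature (validation is left to parse()).
function parseHeader(input, options = {}) {
  // Normalise line endings so LF and CRLF files behave the same.
  const lines = input.replace(/\r\n?/g, "\n").split("\n");

  // Header lines are everything between the signature and the first blank line.
  const blank = lines.indexOf("", 1);
  const end = blank === -1 ? lines.length : blank;
  const headerLines = lines.slice(1, end);
  const body = lines.slice(end + 1).join("\n");

  if (!options.meta) {
    // Without the option, mirror the strict behaviour reported in this issue.
    if (headerLines.length > 0) {
      throw new Error("Missing blank line after signature");
    }
    return { body };
  }

  // With the option, collect "Key: value" lines; meta stays null if none match.
  let meta = null;
  for (const line of headerLines) {
    const sep = line.indexOf(":");
    // Skip lines without a key, and cue timings that slipped into the header.
    if (sep <= 0 || line.includes("-->")) continue;
    meta = meta || {};
    meta[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return { meta, body };
}
```

With { meta: true }, the example file from the first comment yields meta equal to { Kind: "captions", Language: "en" }, matching the shape proposed earlier in the thread; a file with no header lines yields meta: null, and omitting the option keeps today's error.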

@abstractvector
Contributor Author

@osk Sounds great! I'm not an expert on this topic by any means, so I appreciate the steer! I'll work on a PR for it.
