[FilePreview] Use syntax highlighting for .srt #35651

Draft
wants to merge 3 commits into
base: main

Conversation

PesBandi
Contributor

@PesBandi PesBandi commented Oct 29, 2024

Summary of the Pull Request

Adds syntax highlighting to .srt preview (Peek and Preview Pane).

PR Checklist

  • Closes: [Monaco] Use syntax highlighting for .srt #35152
  • Communication: I've discussed this with core contributors already.
  • Tests: All pass
  • Localization: All end user facing strings can be localized
  • Dev docs: Updated
  • New binaries: None
  • Documentation updated: No need

Detailed Description of the Pull Request / Additional comments

  • Removes srt from txtExt and registers it as a new language
  • Changes customTokenColors to customTokenThemeRules
  • Adds custom Monarch definition
    • Block numbers tokenized as number
    • Timestamp tokenized as tag
    • Subtitle content tokenized as string
    • Bold, italic and underline are the same color as the rest of the subtitle, just in their respective font style
    • Partially assumes that the file is valid to make the definition simpler, which means:
      • All text on the same line as a timestamp will be colored the same as the subtitle text, even though it should be ignored
      • Assumes that a block number is followed by a timestamp (doesn't break if it isn't, just incorrectly tokenizes the number as a number instead of ignoring it)
      • Assumes that all format tags are closed (if not, the rest of that subtitle block is in that format). Correction: that's the intended behavior.
      • Only timestamps in the HH:mm:ss,mmm format are considered valid, even though something like 00:123:01,1234 --> 00:00:02,000 is technically valid and would get interpreted as 02:03:02,234 --> 1193:02:49,296 (I have no idea what's going on)
    • I couldn't find any official description of the srt format, so I used ffmpeg to determine what is considered valid
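
As a rough illustration of the points above, a Monarch definition along these lines might look like the following sketch. It is not the definition from this PR: the function name `srtDefinitionSketch` and the `bold` state name are hypothetical, only bold is shown (italic and underline would follow the same pattern), and the sketch does not end an unclosed format at the end of a subtitle block.

```javascript
// Hypothetical sketch of a Monarch definition for .srt; NOT the code from this PR.
function srtDefinitionSketch() {
  // Strict HH:mm:ss,mmm timestamp, as described in the list above.
  const time = /\d{2}:\d{2}:\d{2},\d{3}/.source;
  return {
    defaultToken: "",
    tokenizer: {
      root: [
        // A line consisting only of digits is treated as a block number.
        [/^\d+$/, "number"],
        // "start --> end" at the beginning of a line is the timestamp.
        [new RegExp(`^${time} --> ${time}`), "tag"],
        // An opening <b> switches to the bold state until </b> is seen.
        [/<b>/, "string.bold", "@bold"],
        // Everything else is subtitle text.
        [/[^<]+/, "string"],
        [/</, "string"],
      ],
      bold: [
        [/<\/b>/, "string.bold", "@pop"],
        [/[^<]+/, "string.bold"],
        [/</, "string.bold"],
      ],
    },
  };
}
```

Because the timestamp rule requires exactly two digits per field and three for the milliseconds, a line such as 00:123:01,1234 --> 00:00:02,000 would not be tokenized as a timestamp by this sketch.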

Screenshot:
image

Validation Steps Performed

Tested previewing srt files with Peek and Preview Pane. Monarch definition tested for edge cases.

Comment on lines -8 to +18
-registerAdditionalLanguage("cppExt", [".ino", ".pde"], "cpp", monaco)
-registerAdditionalLanguage("xmlExt", [".wsdl", ".csproj", ".vcxproj", ".vbproj", ".fsproj"], "xml", monaco)
-registerAdditionalLanguage("txtExt", [".sln", ".log", ".vsconfig", ".env", ".srt"], "txt", monaco)
-registerAdditionalLanguage("razorExt", [".razor"], "razor", monaco)
-registerAdditionalLanguage("vbExt", [".vbs"], "vb", monaco)
-registerAdditionalLanguage("iniExt", [".inf", ".gitconfig", ".gitattributes", ".editorconfig"], "ini", monaco)
-registerAdditionalLanguage("shellExt", [".ksh", ".zsh", ".bsh"], "shell", monaco)
-registerAdditionalNewLanguage("reg", [".reg"], regDefinition(), monaco)
-registerAdditionalNewLanguage("gitignore", [".gitignore"], gitignoreDefinition(), monaco)
+registerAdditionalLanguage("cppExt", [".ino", ".pde"], "cpp", monaco);
+registerAdditionalLanguage("xmlExt", [".wsdl", ".csproj", ".vcxproj", ".vbproj", ".fsproj"], "xml", monaco);
+registerAdditionalLanguage("txtExt", [".sln", ".log", ".vsconfig", ".env"], "txt", monaco);
+registerAdditionalLanguage("razorExt", [".razor"], "razor", monaco);
+registerAdditionalLanguage("vbExt", [".vbs"], "vb", monaco);
+registerAdditionalLanguage("iniExt", [".inf", ".gitconfig", ".gitattributes", ".editorconfig"], "ini", monaco);
+registerAdditionalLanguage("shellExt", [".ksh", ".zsh", ".bsh"], "shell", monaco);
+registerAdditionalNewLanguage("reg", [".reg"], regDefinition(), monaco);
+registerAdditionalNewLanguage("gitignore", [".gitignore"], gitignoreDefinition(), monaco);
+registerAdditionalNewLanguage("srt", [".srt"], srtDefinition(), monaco);
Collaborator

Why are all of these lines changed?

Contributor Author

I don't know, I don't see any changes there. Maybe it's just git being weird?

Contributor Author

Oh no, sorry, I see it now. There are semicolons added to each line. I guess I hit format and it added them without me noticing. But I guess we can keep that in, right? After all, they're supposed to be there.
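
Semicolons aside, the functional change in this hunk is that .srt leaves the txtExt list and gets its own registration. For readers unfamiliar with Monaco, a helper of this kind could plausibly be built on `monaco.languages.register` and `monaco.languages.setMonarchTokensProvider`, as in the sketch below. The name `registerNewLanguageSketch` is hypothetical, and the repository's actual `registerAdditionalNewLanguage` may do more or work differently.

```javascript
// Hypothetical helper; the real registerAdditionalNewLanguage may differ.
function registerNewLanguageSketch(id, extensions, definition, monaco) {
  // Tell Monaco about the language id and the file extensions mapped to it.
  monaco.languages.register({ id: id, extensions: extensions });
  // Attach the Monarch definition that tokenizes files of this language.
  monaco.languages.setMonarchTokensProvider(id, definition);
}

// Intended usage, mirroring the added line in the diff:
// registerNewLanguageSketch("srt", [".srt"], srtDefinition(), monaco);
```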

Comment on lines +3 to +5
{token: 'string.bold', fontStyle: 'bold'},
{token: 'string.emphasis', fontStyle: 'italic'},
{token: 'string.underline', fontStyle: 'underline'}
Collaborator

Don't we need combined styles for the nested formats? Or is this too complicated?

Contributor Author

@PesBandi PesBandi Oct 29, 2024

Yes, we would for proper nesting. However, as far as I know there isn't any nice way to do that: we would need a separate state for every possible combination (bold, italic, underline, bold-italic, bold-underline, italic-underline and so on), and not even VSCode does that for Markdown, so I think it's unnecessary.
The tags themselves are styled, so even if you have something like <b><i>text</i></b> you can see both bold and italic.
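
To make the combinatorial cost concrete, the following sketch enumerates the theme rules (and, by extension, tokenizer states) that full nesting would require for three styles. The token names are hypothetical, and it assumes Monaco accepts space-separated `fontStyle` values such as `bold italic`.

```javascript
// Hypothetical illustration: count the rules that full nesting would need.
const styles = ["bold", "italic", "underline"];

function styleCombinations(items) {
  const combos = [];
  // Each bit of `mask` decides whether the corresponding style is active.
  for (let mask = 1; mask < (1 << items.length); mask++) {
    combos.push(items.filter((_, i) => (mask & (1 << i)) !== 0));
  }
  return combos;
}

// One theme rule (and one tokenizer state) per non-empty combination.
const nestedRules = styleCombinations(styles).map((combo) => ({
  token: "string." + combo.join("-"),
  fontStyle: combo.join(" "),
}));

console.log(nestedRules.length); // 7
```

Three styles already give seven combinations, and each additional style roughly doubles that number.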

src/Monaco/customLanguages/gitignore.js (resolved)
@htcfreek
Collaborator

All text on the same line as a timestamp will be colored the same as the subtitle text, even though it should be ignored

What color?

Can we instead color it white? Then it is more clear what is used as sub title.

  • Maybe define the end of the timestamp coloring correctly.
  • Maybe require that the subtitle text starts in a new line.

@htcfreek
Collaborator

htcfreek commented Oct 29, 2024

You might look at this:
https://docs.lokalise.com/en/articles/5365539-srt-files-and-all-you-need-to-know-about-subrip-subtitles

This page contains additional syntax that you currently do not support in your PR.
And I am wondering if we really should style the text according to the format tags it contains, because applying the color format definition will be tricky. Maybe we should only show the text in one color and the format tags in another color.


Edited at 30. October 2024 00:08 AM.

@PesBandi
Contributor Author

This page contains additional syntax that you currently do not support in your PR.

Yes, it does contain the following syntax that I don't support:

  • Curly braces for tags - {b}bold{/b}
  • Font color with <font>
  • Line position with X1:… X2:… Y1:… Y2:…

I'm aware of this, but I have decided not to support these features for the following reasons:

  • Most sources never mention curly braces for tags. I don't think they were ever an official part of the srt format. My guess is that this misconception comes from the fact that the SubStation Alpha (ASS) format uses curly braces: someone published one wrong article and everyone copied it. Moreover, FFmpeg, one of the most trusted video tools, doesn't support that kind of syntax and just ignores it.
  • There is no way I can support proper coloring, so there are three options:
    • Ignore <font> completely
    • Color/style all <font> tags in one way to make it clear that there is one
    • Support some of the basic colors (red, green, blue...) and do option 2 for the rest of the colors
  • X1:… X2:… Y1:… Y2:… was never officially supported, it's a story similar to {b}

@PesBandi
Contributor Author

PesBandi commented Oct 30, 2024

Can we instead color it white? Then it is more clear what is used as sub title.

I am working on this. It turns out it's a little more complicated than I thought, so it may take some time for me to get it working. Update: solved it.

@htcfreek
Collaborator

Can we instead color it white? Then it is more clear what is used as sub title.

I am working on this, turns out it's a little more complicated than I thought though, so it may take some time for me to get it working

You know the test page?
https://microsoft.github.io/monaco-editor/monarch.html

@PesBandi
Contributor Author

Can we instead color it white? Then it is more clear what is used as sub title.

I am working on this, turns out it's a little more complicated than I thought though, so it may take some time for me to get it working

You know the test page? https://microsoft.github.io/monaco-editor/monarch.html

Yeah, I do, that's where I did most of my testing. Thanks for making sure

@PesBandi
Contributor Author

I pushed the changes, it now looks like this
image

@htcfreek
Collaborator

I'm wondering whether we should style only the tags in the text instead of mimicking the formatting. This would help with the font tag problem.

We could use a lighter color or make the tags italic. Then you can see whether a tag is valid or not.
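
A minimal sketch of what such tag-only styling might look like as Monarch rules is shown below. It is only an illustration of the suggestion, not code from this PR: the function name `srtTagOnlyRulesSketch` and the token name `string.tag` are hypothetical, and the set of recognized tags (b, i, u, font) is an assumption.

```javascript
// Hypothetical sketch of the "style only the tags" idea; NOT code from this PR.
function srtTagOnlyRulesSketch() {
  return [
    // Opening or closing b/i/u/font tag, with optional attributes.
    [/<\/?(?:b|i|u|font)(?:\s[^>]*)?>/, "string.tag"],
    // Any other text, including unrecognized tags, stays plain subtitle text.
    [/[^<]+/, "string"],
    [/</, "string"],
  ];
}

// A matching theme rule could then make recognized tags italic:
const tagThemeRule = { token: "string.tag", fontStyle: "italic" };
```

With rules like these, an unrecognized tag such as <bold> would keep the plain text color, which makes invalid tags easy to spot.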

2 participants