Editing VTT breaks INDEXTRANSCRIPT #151

Open
DonRichards opened this issue Oct 3, 2019 · 20 comments

@DonRichards
Member

When first uploaded, the transcripts were correct. After the file is edited, the module creates empty INDEXTRANSCRIPT files. Regenerating INDEXTRANSCRIPT also results in an empty (47 B) file.

Screen Shot 2019-10-03 at 9 42 06 AM

Here is the transcript. I've tried to remove any special characters but it still seems broken.

@MarcusBarnes
Contributor

@DonRichards Thanks for reporting. Next step is for me to reproduce. Thanks for your patience while I try to work this task into my work schedule.

MarcusBarnes self-assigned this Oct 3, 2019

@DonRichards
Member Author

@MarcusBarnes Are you able to reproduce the error?

@DonRichards
Member Author

Ping @MarcusBarnes

@MarcusBarnes
Contributor

@DonRichards I've been away. I'll look into this this week. Would you please clarify how the VTT was edited?

@DonRichards
Member Author

Here's an example of a transcript that is failing. It works upon ingest but it fails when the editor is used.
test.vtt.txt
Screen Shot 2019-10-22 at 11 25 11 AM

@DonRichards
Member Author

One discrepancy you might have noticed between the screenshot and the VTT file is the closing </v> tag. I've tried it both ways with no luck.

@DonRichards
Member Author

I tried regenerating the INDEXTRANSCRIPT file but it creates a blank (47 B) file.

@MarcusBarnes
Contributor

@DonRichards Would you please confirm that WEBVTT was used for the transcript datastream when creating the initial oral history object? That is, you did not use transcript XML for the transcript datastream and then have WebVTT generated from the transcript XML?

@MarcusBarnes
Contributor

@DonRichards I was able to reproduce the behaviour you reported. I've labeled this as a bug. I'll note that the screenshot you shared in #151 (comment) is not the default that ships with the solution pack, but that the issue is not related to that customization.

@DonRichards
Member Author

I uploaded a WebVTT file as the transcript when I ingested the object.

Screen Shot 2019-10-22 at 1 03 14 PM

Sorry, I wrote this and didn't click the green button. >:-|

@MarcusBarnes
Contributor

@DonRichards Thank you for confirming.

@MarcusBarnes
Contributor

@DonRichards For the example object above, please grab the text file below, remove the .txt extension (so that the file name and extension is unixlf.vtt), and then replace the TRANSCRIPT datastream with this file via the manage datastreams interface. Please do not otherwise open or edit the file.

unixlf.vtt.txt

After the TRANSCRIPT datastream has been replaced, click the regenerate operation for the INDEXTRANSCRIPT datastream.

Please let me know if you get the 47 B file (as per #151 (comment)) for the INDEXTRANSCRIPT datastream or not.

@DonRichards
Member Author

Doing those steps does fix the issue.

@DonRichards
Member Author

Is this to identify whether any \r or \n characters are the issue?

@MarcusBarnes
Contributor

@DonRichards Correct. It seems that the parse_vtt function is breaking on the CR \r characters.
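
To check whether a given VTT file contains carriage returns before uploading it, one option is to count its line-ending styles. This is a minimal Python sketch and not part of the module; the function name `count_line_endings` is hypothetical, and it assumes the file has been read as raw bytes.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone CR, and lone LF line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the CR and LF characters that stand alone.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "cr": lone_cr, "lf": lone_lf}


# Example with an in-memory sample (hypothetical content, not the attached file).
sample = b"WEBVTT\r\n\r\n00:00.000 --> 00:01.000\r\nHello\n"
print(count_line_endings(sample))
```

A non-zero `crlf` or `cr` count suggests the file would hit the behaviour described in this thread.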

@DonRichards
Member Author

DonRichards commented Oct 28, 2019

@MarcusBarnes I wonder why the module is generating a \r character instead of the typical \n. It should be easy enough to sanitize this.

@DonRichards
Member Author

@MarcusBarnes What were the steps you took to strip out those characters? I've run a few tests (replacing \n with \r, and also trying \r\n) with no luck.

@MarcusBarnes
Contributor

@DonRichards I opened the sample VTT you provided in my text editor. My text editor (currently BBEdit) has the option of changing line ending characters from Windows (CRLF) to Unix (LF). If you're working on Windows, Notepad++ provides similar functionality. After changing the line ending settings, I saved.

@DonRichards
Member Author

I got it. Thanks. For the sake of posterity, in case others come across this issue before it gets resolved, I think the fix is easy enough. Steps to work around this issue:

  1. Download the VTT file (for example, let's call it view.vtt)
  2. Run this command against it: $ dos2unix -ic view.vtt | xargs dos2unix
  3. Replace data stream ( manage > datastreams > TRANSCRIPT > Replace > Upload)
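
If dos2unix is not available, the conversion in step 2 can be approximated with a short Python script. This is a sketch under assumptions, not a drop-in replacement: the function names are hypothetical, and unlike dos2unix in its default mode it also converts lone CR characters to LF.

```python
from pathlib import Path


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so a pair does not become two LF characters.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite the file in place with LF endings; return True if it changed."""
    target = Path(path)
    original = target.read_bytes()
    converted = normalize_line_endings(original)
    if converted != original:
        target.write_bytes(converted)
        return True
    return False
```

Calling `convert_file("view.vtt")` on the downloaded file and then replacing the TRANSCRIPT datastream as in step 3 follows the same workflow as the command-line version.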

@DonRichards
Member Author

The IDE solution works as well. Sorry, I should have mentioned that too. Command-line solutions avoid the IDE configuration craziness (like working with Atom vs. Notepad++). I hope this helps.
