Editing of one entry cannot be successful whatsoever #16

Open
abuali129 opened this issue Feb 10, 2017 · 25 comments

@abuali129

abuali129 commented Feb 10, 2017

I'm still continuing with my project, and I have developed a pattern for localizing TPP to Arabic, which appears to be successful in every .subp file and all the entries in them.
But there is one entry that gets corrupted inside the game whatever I do:

Entry Id="2161021477" in the tape.subp
Cassette tape is Skull Face's Objective [4]
Track Secret Recording of Skull Face and Code Talker [2]

Whenever I modify this entry and put it in the game, the subtitles don't show, and the rewind and forward buttons also get corrupted.
I have provided two .subp samples containing just the entry in question, one original and one modified.
Also see the videos to compare the original behavior with the corrupted one.

Videos
original https://www.youtube.com/watch?v=VpmW2LG6oBk
corrupted https://www.youtube.com/watch?v=xNhw1nteyLg

Samples
Original https://mega.nz/#!zM8khIwb!L-ORh-oHNcA3H1NqgC7YUOijx6oeTvp4BelGiJW6MbU
corrupted https://mega.nz/#!rEkFDSaL!OpJ2ntxyVx3ittkq2r6i72V2U_GoRYIU7LI5YBhx24E

@Atvaark
Owner

Atvaark commented Feb 10, 2017

The only difference between the two files you provided is the text content and length (if they are both utf-8 encoded).

The "corrupted" one seems to be missing the character ID prefixes ([C=37]) in each line. Perhaps the game can only "skip" to lines with a character ID.

Original:

<Line Text="[C=37]Forgive me, but my schedule has changed.">
    <Timing Start="576" End="830" />
</Line>

Modded:

<Line Text=".ばチをす ぬウ タガぐケろぅ オズぬち コォガ ぁタガ つケふぉ         ">
    <Timing Start="576" End="830" />
</Line>
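
A quick way to check this across a whole file is to scan every Line element for a leading [C=n] tag. The following is a minimal sketch, assuming the XML layout quoted above. The `<Entry>` wrapper, the second sample line with its timing values, and the helper name `lines_missing_prefix` are illustrative assumptions, not part of the tool.

```python
import re
import xml.etree.ElementTree as ET

# A character ID tag such as [C=37]; match() anchors it at the start of the text.
CHARACTER_PREFIX = re.compile(r"\[C=\d+\]")


def lines_missing_prefix(xml_text):
    """Return the Text of every <Line> that does not start with a [C=n] tag."""
    root = ET.fromstring(xml_text.strip())
    missing = []
    for line in root.iter("Line"):
        text = line.get("Text", "")
        if not CHARACTER_PREFIX.match(text):
            missing.append(text)
    return missing


# Hypothetical wrapper element and made-up second line, for illustration only.
sample = """
<Entry>
  <Line Text="[C=37]Forgive me, but my schedule has changed.">
    <Timing Start="576" End="830" />
  </Line>
  <Line Text="No prefix here">
    <Timing Start="831" End="900" />
  </Line>
</Entry>
"""
print(lines_missing_prefix(sample))
```

Running it on both the original and the modded export would show at a glance which lines lost their prefix.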

@abuali129
Author

I already tried keeping the character ID prefixes, but got the same result.
I found other entries that have the same issue. I'll have them ready shortly.

@abuali129
Author

Entry Id="856784307"
Cassette tape is Truth Records
Track Secret Recording with PAZ and ZERO

Original https://mega.nz/#!uRVTSDTa!UB7xpGYQY5WcEQrw02G0JF2MqC1-YX9YZPiCWPUkKtg
Modified
https://mega.nz/#!7A1WmChD!YxvS2283CXup87wsXOtNNUTA9Cc-eo5t155vx07DxZg
Corrupted
https://mega.nz/#!Td8RXZwZ!Un70HOvJIJcFP8efntkQ0HVSRXm-8FZnhCroHrzuoiw

Look at lines 138 and 336.

In the Modified version the file works properly, because I didn't touch these 2 lines.
In the Corrupted version I added my text and the same issue as before happened.

@abuali129
Author

I did some tests on the last sample that I provided.
It appears that the problem occurs if the length of the whole file exceeds 17,572. I didn't take length into account in my project before; I thought it was not going to cause a problem...
I will do some tests on the first sample to get a good picture of what is happening.

@Atvaark
Owner

Atvaark commented Feb 11, 2017

It could also be related to the characters in a line and the line length.

Your example is a lot larger than the original line.
[C=20]ザはズぐぢケガく ホざアばをガく タア クスゴぉ

Could you try replacing the original line with substrings of varying length of your modded line?
(1, 2, 3... characters)
Maybe you can find out which length triggers the corruption.
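
One way to run that experiment systematically is to generate growing prefixes of the modded line and record how many bytes each one takes in UTF-8. This is only a sketch: `prefixes_with_sizes` is a hypothetical helper, not part of the tool, and the sample text is the line quoted above.

```python
def prefixes_with_sizes(text, step=1):
    """Yield (prefix, code_point_count, utf8_byte_count) for growing prefixes.

    The full text is always the last prefix, even if step does not divide its length.
    """
    if not text:
        return
    ends = list(range(step, len(text), step)) + [len(text)]
    for end in ends:
        prefix = text[:end]
        yield prefix, len(prefix), len(prefix.encode("utf-8"))


modded = "[C=20]ザはズぐぢケガく ホざアばをガく タア クスゴぉ"
for prefix, chars, size in prefixes_with_sizes(modded, step=5):
    print(f"{chars:3d} code points, {size:3d} bytes: {prefix}")
```

Testing each prefix in the game and noting the first one that breaks would give the threshold in both code points and bytes.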

@abuali129
Author

Yes, I tried that already. I even put English words in instead, while still going over the maximum length, and I am pretty sure that the problem happens because I exceeded the maximum length of the entry.
I did some tests on both files and I got a clear picture of what happened.

@abuali129
Author

The conclusion is that the problem occurs whenever the length of the whole entry exceeds a specific limit for that entry. It does not matter which substring pushes it over; the substring itself is not directly related, only the length of the whole thing.

I hope there is some way to increase the "length limit".
By the way, not all characters are equal in length; some single letters add +4 to the length. I think that's related to the Unicode encoding of the characters.

@Atvaark
Owner

Atvaark commented Feb 12, 2017

You're right about the different sizes for different UTF-8 codepoints.

As each entry in a subp file is saved as a single string with $-characters separating the lines, the max character limit per entry should be (assuming the entry has at least one line):
2^16-len(lines)-1

  1. 2^16-1 is max 16bit
  2. len(lines)-1 is the amount of $-characters required to separate the lines
  3. -1 for the NULL-terminator of the entry
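
Adding up the three terms in the list gives 65,535 − (len(lines) − 1) − 1 for the combined text of an entry. The sketch below works that out and also computes the serialized size of a given entry. It assumes the 16-bit limit counts UTF-8 bytes and that only the line texts, the `$` separators, and the terminator count toward it; neither assumption is confirmed in this thread, and the function names are hypothetical.

```python
SEPARATOR = "$"            # line separator inside a serialized entry
MAX_UINT16 = 2**16 - 1     # largest value a 16-bit length can hold


def serialized_entry_size(lines):
    """Bytes for an entry: UTF-8 text, '$' separators, and one NUL terminator."""
    return len(SEPARATOR.join(lines).encode("utf-8")) + 1


def max_text_bytes(line_count):
    """Upper bound on the combined text of an entry with line_count lines."""
    if line_count < 1:
        raise ValueError("an entry needs at least one line")
    separators = line_count - 1
    terminator = 1
    return MAX_UINT16 - separators - terminator


lines = ["[C=37]Forgive me, but my schedule has changed.", "Second line"]
print(serialized_entry_size(lines), max_text_bytes(len(lines)))
```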

As the game can't load these files correctly there have to be some other limits.
Could you perhaps check which unmodified file has the largest entry and check if the supported size can be increased by changing the flags?

@abuali129
Author

Update: it seems that reducing the length in the first sample does not help either.
I managed to shrink the length of the first sample to 21987 by combining some line text strings and changing their timing values. Maybe the method itself is not working? I still don't know.
Here is the result:
https://mega.nz/#!vYcTHKaC!eHBW258SCIP0UXOWFPJ5xHrxvKfIYbjKHtPMP_v1ISw

As for the flags you mentioned, I reported earlier in another issue that the Flags value is related to the content:
1024 is for cutscenes, 768 is for cassette tapes, and other values have other uses.

@Atvaark
Owner

Atvaark commented Feb 12, 2017

Combining 2 lines will just save a single byte.
3472 of the 4525 UTF-8 codepoints used in your latest example are 3 bytes wide (the rest are 1 byte wide).
So you won't save much space by combining them.

Did you check if the size limit you found is the same for each subp file or if some of them have different limits?
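
For reference, a count like the one above can be reproduced with a few lines of Python. This is a sketch, not the method actually used for the figures in this comment; `utf8_width_histogram` is a hypothetical helper name and the sample string is only a fragment in the style of the modded lines.

```python
from collections import Counter


def utf8_width_histogram(text):
    """Count the code points in text by how many bytes each takes in UTF-8."""
    return Counter(len(ch.encode("utf-8")) for ch in text)


sample = "[C=37]ばチをす ぬウ"
hist = utf8_width_histogram(sample)
total_bytes = sum(width * count for width, count in hist.items())
print(dict(hist), total_bytes)
```

The total computed from the histogram equals `len(sample.encode("utf-8"))`, which is a handy cross-check.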

@abuali129
Author

The size limit I'm talking about is not that of the .subp file, because I have files that contain more bytes and work perfectly.

But the size limit of an entry inside the .subp differs from one entry to another.
For the provided samples, the first sample's length limit is 22,420 and the second sample's length limit is 17,572.

@Atvaark
Owner

Atvaark commented Feb 13, 2017

You're mapping arbitrary Japanese UTF-8 codepoints to Arabic letters, right? Could you try using only codepoints that are 1 byte wide instead of using the 3 byte ones?
That alone could net you 6944 additional codepoints (to your latest example) before the corruption will start again.
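
The 6944 figure follows from the earlier count: each of the 3472 three-byte codepoints would shrink by 2 bytes. A small sketch of that arithmetic, with `bytes_saved` as a hypothetical helper name:

```python
def bytes_saved(counts_by_width, target_width=1):
    """Bytes freed if every code point wider than target_width were re-mapped to it."""
    return sum(
        (width - target_width) * count
        for width, count in counts_by_width.items()
        if width > target_width
    )


# Figures from the thread: 4525 code points, 3472 of them 3 bytes wide, the rest 1 byte.
counts = {3: 3472, 1: 4525 - 3472}
print(bytes_saved(counts))                  # re-mapping to 1-byte code points
print(bytes_saved(counts, target_width=2))  # re-mapping to 2-byte code points instead
```

The second call shows that falling back to 2-byte codepoints would still recover half of that space.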

@abuali129
Author

abuali129 commented Feb 14, 2017

I also had a third entry sample which was corrupted, but now I have managed to fix it. If you want to look at it just let me know. The 2nd sample is also fixed, just by removing some unwanted spaces, but the first one cannot be repaired.
Here is the second sample, fixed:
https://mega.nz/#!mQ8FVZwI!pC4-oZslXXptnyJWbYjiLqgwvgZQHI9NPDUXlD5a1UU

I tried using 1-byte letters for the first sample as you suggested, but they can't cover all of the Arabic letters, so I ended up using 2-byte letters along with them. The file is still corrupted. Even merging line texts, which was the solution for the third sample, did not help. I managed to shrink the length to 21862 with 2- and 3-byte letters, and to 17,537 by merging line texts.

@Atvaark
Owner

Atvaark commented Feb 14, 2017

How many distinct letters are there in the Arabic alphabet (+numerics and punctuation)?
You should make sure that the most frequently used letters are encoded as 1- or 2-byte codepoints to save additional space. Either use this as source or analyze the frequency of your own subtitles.
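
Analyzing your own subtitles could look roughly like this: count how often each letter occurs, then hand the cheapest stand-in codepoints to the most frequent letters. This is a sketch under assumptions; `assign_stand_ins` and the stand-in pool are hypothetical, and the pool has to consist of codepoints that your font mapping can actually redraw as Arabic glyphs.

```python
from collections import Counter


def assign_stand_ins(text, letters, stand_ins):
    """Map each letter to a stand-in, cheapest stand-ins going to the most frequent letters.

    text      -- the translated subtitles to analyze
    letters   -- list of characters that need a stand-in
    stand_ins -- list of candidate replacement characters (at least as many)
    """
    if len(stand_ins) < len(letters):
        raise ValueError("not enough stand-in code points")
    freq = Counter(ch for ch in text if ch in letters)
    by_frequency = sorted(letters, key=lambda ch: -freq[ch])
    by_cost = sorted(stand_ins, key=lambda ch: len(ch.encode("utf-8")))
    return dict(zip(by_frequency, by_cost))


# Illustrative input: alef (U+0627) occurs twice, beh (U+0628) once.
mapping = assign_stand_ins(
    "\u0627\u0627\u0628",
    letters=["\u0628", "\u0627"],
    stand_ins=["\u3041", "\u00e9"],  # one 3-byte and one 2-byte candidate
)
print(mapping)
```

Here the more frequent letter ends up with the 2-byte stand-in and the rarer one with the 3-byte stand-in.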

@abuali129
Author

Counting only the letters and punctuation, there are 140 in total; numbers and symbols are shared with Latin. Also, I cannot replace the one-byte Latin letters, as I use almost all of them.
Anyway, it looks like I will skip translating Entry Id="2161021477".

@Atvaark
Owner

Atvaark commented Feb 14, 2017

That's unfortunate.

Since I can't change the limits imposed by the engine I'd rather print an error if one of the subtitles doesn't fit in an entry.

I'll have to analyze all the unedited subp files to get some more facts about the limits.

@abuali129
Author

Is there any information I can provide for this? You just have to ask. And thanks a hundred times for the awesome tool.

@Atvaark
Owner

Atvaark commented Feb 14, 2017

Could you perhaps upload a zip archive with all subtitles? I don't have the game installed right now and would have to redownload it first.

Add me on Steam, as sharing all these files publicly here is likely against the GitHub ToS.

@abuali129
Author

All right, is your Steam ID the same as here?

@abuali129
Author

There are three users by your name, I cannot identify you :)

@Atvaark
Owner

Atvaark commented Feb 14, 2017

Link

@abuali129
Author

Invitation sent

@Atvaark
Owner

Atvaark commented Feb 15, 2017

The entry with id 2161021477 is indeed the largest one in all subs.

The max sizes (in bytes) in the unmodified files are as follows:
File: 513497
Entry: 7517
Line: 308

As long as these aren't exceeded the game should load them fine. Anything above these values needs some more testing.
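
A check against these figures could be sketched as follows. It only flags entries that go beyond what the unmodified files contain; it does not claim those are the engine's real limits. The function name is hypothetical, and it assumes an entry's size is its lines joined by `$` plus one NUL byte, measured in UTF-8 bytes, which may differ from how the numbers above were measured.

```python
# Largest sizes in bytes found in the unmodified files (figures from this comment).
KNOWN_GOOD_MAX_ENTRY = 7517
KNOWN_GOOD_MAX_LINE = 308


def check_entry(entry_id, lines):
    """Return warnings for an entry that goes beyond the known-good sizes."""
    warnings = []
    for number, text in enumerate(lines, start=1):
        size = len(text.encode("utf-8"))
        if size > KNOWN_GOOD_MAX_LINE:
            warnings.append(
                f"entry {entry_id} line {number}: {size} bytes exceeds {KNOWN_GOOD_MAX_LINE}"
            )
    # Assumed serialization: lines joined by '$' plus a NUL terminator.
    entry_size = len("$".join(lines).encode("utf-8")) + 1
    if entry_size > KNOWN_GOOD_MAX_ENTRY:
        warnings.append(
            f"entry {entry_id}: {entry_size} bytes exceeds {KNOWN_GOOD_MAX_ENTRY}"
        )
    return warnings


for message in check_entry(2161021477, ["x" * 309]):
    print(message)
```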

@abuali129
Author

I will look at this in the evening, thanks.

@abuali129
Author

I got 573,670 bytes for the modified file, which works fine except for entry id 2161021477.
When I put the original entry id 2161021477 back alongside the modified files, the result is 572,211 bytes without any problems.
However, the last 19 entries are still in their original state:

Entry Id="3976005522"
Entry Id="3983176914"
Entry Id="3985838335"
Entry Id="4015605908"
Entry Id="4033776865"
Entry Id="4038494047"
Entry Id="4044911970"
Entry Id="4084410907"
Entry Id="4123126871"
Entry Id="4131631805"
Entry Id="4181857144"
Entry Id="4201908311"
Entry Id="4205344688"
Entry Id="4209208445"
Entry Id="4230996696"
Entry Id="4272505980"
Entry Id="4275351727"
Entry Id="4277855698"
Entry Id="4289530536"
