I found that lyrics are being mis-decoded. After some debugging, I found that the USLT frame actually stores its frame size as a plain 32-bit integer. (What about the other frames?) If we read the USLT frame size as a plain 32-bit integer instead of a synchsafe integer, it works.
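To illustrate why the choice matters, here is a minimal sketch of the two ways to read the 4-byte size field. The function names and the sample bytes are made up for illustration and are not taken from this project's code. A synchsafe integer keeps only the low 7 bits of each byte, so the two readings agree only for sizes below 128 and diverge for anything larger, which would fit a long lyrics frame being cut short.

```python
def decode_synchsafe(size_bytes: bytes) -> int:
    """Read 4 bytes as a synchsafe integer: 7 useful bits per byte."""
    if len(size_bytes) != 4:
        raise ValueError("frame size field must be exactly 4 bytes")
    value = 0
    for byte in size_bytes:
        # Drop the top bit of each byte and append the remaining 7 bits.
        value = (value << 7) | (byte & 0x7F)
    return value


def decode_uint32_be(size_bytes: bytes) -> int:
    """Read 4 bytes as a plain big-endian unsigned 32-bit integer."""
    if len(size_bytes) != 4:
        raise ValueError("frame size field must be exactly 4 bytes")
    return int.from_bytes(size_bytes, "big")


# Hypothetical size field, chosen only to show the two readings diverging.
sample = bytes([0x00, 0x00, 0x01, 0x80])
print(decode_synchsafe(sample))   # 128
print(decode_uint32_be(sample))   # 384
```

Which reading is correct for a given file probably depends on the tag version and on the program that wrote it, so a fix limited to USLT may not cover the other frames.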