Why are \n generated in the output #128

Cgrandjean · 2024-08-01T20:44:41Z

Hello guys,
I would like to train the model with format-enforced output. I plan to mask the forced tokens so the model does not learn that part.
But I see that a lot of \n are added. These are tokens too, and I was wondering what the logic behind that is. Why are they added? Are they always there and that's normal, in which case I should mask them, or is it something else?

Thanks for all answers!

aw632 · 2024-08-07T16:27:02Z

They're whitespace tokens (\n, \t, \r, etc.), which the parser typically allows. You can disable them using the CharacterLevelParserConfig.
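If you keep whitespace enabled and want to leave it out of the loss along with the forced tokens, here is a minimal sketch of how the label mask could be built. It is plain Python and does not call the library: `build_labels`, `forced_positions`, and the toy token ids are hypothetical names for illustration, and it assumes you have already recorded which positions the enforcer forced (positions where only one token was allowed). `-100` is used because it is the default `ignore_index` of PyTorch's cross-entropy loss; adjust it if your trainer expects something else.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_labels(token_ids, token_texts, forced_positions, mask_whitespace=True):
    """Return training labels with forced (and optionally whitespace) tokens masked.

    token_ids        -- generated token ids, one per position
    token_texts      -- decoded text of each token, same length as token_ids
    forced_positions -- set of positions where the enforcer left a single choice
                        (assumption: you collected these during generation)
    mask_whitespace  -- also mask tokens that decode to whitespace only
    """
    labels = []
    for pos, (tok_id, text) in enumerate(zip(token_ids, token_texts)):
        is_forced = pos in forced_positions
        # A token counts as whitespace if it is non-empty and strips to nothing.
        is_whitespace = mask_whitespace and text != "" and text.strip() == ""
        labels.append(IGNORE_INDEX if (is_forced or is_whitespace) else tok_id)
    return labels


# Toy example: the ids are made up and stand in for real tokenizer output.
token_texts = ["{", "\n", '"name"', ":", " ", '"Bob"', "\n", "}"]
token_ids = [10, 11, 12, 13, 14, 15, 11, 16]
forced_positions = {0, 2, 3, 7}  # structural tokens dictated by the schema

labels_masked_ws = build_labels(token_ids, token_texts, forced_positions)
labels_keep_ws = build_labels(
    token_ids, token_texts, forced_positions, mask_whitespace=False
)
print(labels_masked_ws)
print(labels_keep_ws)
```

Note that whitespace tokens are allowed by the parser rather than forced, so the model did choose them. Whether to mask them is a judgment call: masking keeps formatting noise out of the loss, while keeping them lets the model learn a consistent layout.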
