Contributor

@HanNayeoniee HanNayeoniee commented Jul 23, 2023

What does this PR do?

Translated the `tokenizer_summary.md` file of the documentation to Korean.
Thank you in advance for your review.

Part of #20179

Before reviewing

  • Check for missing / redundant translations
  • Grammar check
  • Review or add new terms to the glossary
  • Check inline TOC (e.g. [[lowercased-header]])
  • Check live-preview for gotchas (confirm it renders correctly)

Who can review? (Initial)

@sronger, @TaeYupNoh, @kj021, @HanNayeoniee, @eenzeenee, @sim-so

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review? (Final)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Contributor

@sim-so sim-so left a comment

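Nayeon, I always appreciate how you translate into easy-to-understand language!
On top of that, this document gave me a useful tour of tokenizers 😊

While reviewing, I left a few glossary-related suggestions.
Please take a look!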

Co-Authored-By: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-Authored-By: Juntae <79131091+sronger@users.noreply.github.com>
Co-Authored-By: Injin Paek <71638597+eenzeenee@users.noreply.github.com>
Contributor

sronger commented Jul 30, 2023

I wrote a review but forgot to submit it... :(

@HanNayeoniee
Contributor Author

> I wrote a review but forgot to submit it... :(

Haha, luckily I'm in the middle of revising the translation right now, so I saw it!

@HanNayeoniee
Contributor Author

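> Nayeon, I always appreciate how you translate into easy-to-understand language! On top of that, this document gave me a useful tour of tokenizers 😊
>
> While reviewing, I left a few glossary-related suggestions. Please take a look!

Maybe because it has been a while since I last translated, there are a lot of glossary-related fixes... Thank you for the thorough review!!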

Member

@stevhliu stevhliu left a comment

Great job! Looks good overall, except for a few formatting things 👍


<a id='byte-pair-encoding'></a>

### 바이트 페어 인코딩(Byte-Pair Encoding, BPE)[[bytepair-encoding-bpe]]

Suggested change
### 바이트 페어 인코딩(Byte-Pair Encoding, BPE)[[bytepair-encoding-bpe]]
### 바이트 페어 인코딩 (Byte-Pair Encoding, BPE)[[bytepair-encoding-bpe]]

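As mentioned earlier, the vocabulary size (i.e. the base vocabulary size + the number of merges) is a hyperparameter to choose.
For example, [GPT](model_doc/gpt) has a base vocabulary of 478 symbols and stops training after 40,000 merges, so its vocabulary size is 40,478.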
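
To make the relationship in the passage above concrete (vocabulary size = base vocabulary size + number of merges), here is a minimal sketch of BPE training on a toy corpus. The function name `train_bpe` and the word frequencies are hypothetical and chosen only for illustration; this is not the implementation used by any particular library, and it skips details such as pre-tokenization and end-of-word markers.

```python
from collections import Counter


def train_bpe(word_freqs, num_merges):
    """Toy BPE trainer: returns (base_vocab, merges) for a word-frequency dict."""
    # Every word starts out split into single characters.
    splits = {word: tuple(word) for word in word_freqs}
    base_vocab = sorted({ch for word in word_freqs for ch in word})
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for word, freq in word_freqs.items():
            symbols = splits[word]
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break  # nothing left to merge
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the chosen merge to every word.
        for word, symbols in splits.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            splits[word] = tuple(merged)
    return base_vocab, merges


# Hypothetical toy corpus: word -> frequency.
word_freqs = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}
base_vocab, merges = train_bpe(word_freqs, num_merges=3)

# Each merge adds exactly one new symbol in this example,
# so the final size is base size + number of merges.
vocab_size = len(base_vocab) + len(merges)
print(len(base_vocab), len(merges), vocab_size)  # 7 3 10

# The same arithmetic with the GPT figures quoted above.
print(478 + 40_000)  # 40478
```

With 7 base characters and 3 merges the toy vocabulary has 10 entries, mirroring how 478 base symbols plus 40,000 merges give 40,478.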

#### 바이트 수준 (Byte-level) BPE[[bytelevel-bpe]]

Suggested change
#### 바이트 수준 (Byte-level) BPE[[bytelevel-bpe]]
#### 바이트 수준 (Byte-level BPE) [[bytelevel-bpe]]

Contributor

@wonhyeongseo wonhyeongseo Aug 18, 2023

Thank you for the translations, @HanNayeoniee! Since 바이트 수준 only covers "Byte-level", either of the following would help:

  • repeating BPE inside the parentheses for clarity, or
  • removing the space before the parentheses

Suggested change
#### 바이트 수준 (Byte-level) BPE[[bytelevel-bpe]]
#### 바이트 수준 BPE (Byte-level BPE)[[bytelevel-bpe]]

Please resolve the remaining suggestions as well to merge the PR 😄


<a id='sentencepiece'></a>

### 센텐스피스(SentencePiece)[[sentencepiece]]

Suggested change
### 센텐스피스(SentencePiece)[[sentencepiece]]
### 센텐스피스 (SentencePiece)[[sentencepiece]]

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
