Multilanguage support #103

Open
janvarev opened this issue May 20, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@janvarev

Feature request

Increase the efficiency of the system by translating input and output data to and from the user's language.

What do we need?

  • Translate input fragments from UserLang to English
  • Translate output fragments to UserLang (if needed)

This way the engine performs the search in English (where it works correctly), while the input and output are translated.

Motivation

I need to efficiently index docs in my native language.

Your contribution

I recommend using this compact function from my project: https://github.com/janvarev/kobold_api_multilang_proxy/blob/main/server.py#L25

def translator_main(string,from_lang:str,to_lang:str) -> str:

It translates strings either with GoogleTranslator from the deep_translator library (the standard option, no special setup required) or with my project OneRingTranslator (which requires setting up a REST translation server).
OneRingTranslator lets the user choose the translation engine and can even translate locally with the Meta NLLB neural network.

The default settings for the user can be simple:

  • UserLang=en
  • TranslationEngine=GoogleTranslate

If we call translator_main("string","en","en"), the string is returned unchanged. So, by default, nothing changes for users who don't want to use translation in their project.

Users who want translation can change the UserLang and TranslationEngine options.
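A minimal sketch of the behaviour described above, not a copy of the linked function. It assumes the deep_translator package for the Google path; the `TranslationSettings` class and the `backend` parameter are hypothetical names added for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Signature of a translation backend: (text, from_lang, to_lang) -> translated text.
Backend = Callable[[str, str, str], str]


@dataclass
class TranslationSettings:
    """Hypothetical settings container mirroring the proposed defaults."""
    user_lang: str = "en"
    translation_engine: str = "GoogleTranslate"


def google_backend(text: str, from_lang: str, to_lang: str) -> str:
    # Imported lazily so English-only users do not need deep_translator installed.
    from deep_translator import GoogleTranslator
    return GoogleTranslator(source=from_lang, target=to_lang).translate(text)


def translator_main(string: str, from_lang: str, to_lang: str,
                    backend: Optional[Backend] = None) -> str:
    # Same source and target language: return the input untouched,
    # so the default UserLang=en setup never calls a translation service.
    if from_lang == to_lang or not string.strip():
        return string
    # `backend` is an assumption of this sketch; it lets another engine
    # (for example a OneRingTranslator client) replace the Google path.
    return (backend or google_backend)(string, from_lang, to_lang)
```

With the default settings the call is a no-op, so existing users would see no change; a different engine can be plugged in by passing another `backend` callable.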

@su77ungr
Owner

Could be an addition.

So you are basically setting up a translation service in front of your stdout? Do we know the depth of supported languages on the base model?

Split this into two parts:

  • input data

Might be a thing if you already want the system to check out with the right language.

  • output data

How would that compare with the localisation skills of an LLM queried directly, as opposed to a raw translation?

@su77ungr su77ungr added the enhancement New feature or request label May 20, 2023
@janvarev
Author

Hi!
Sorry, for some reason I can't install CASALIOY with poetry :((, but I can install privateGPT with pip. (I hope you will support pip in the future...)

I've prepared a PR for privateGPT; you can see it here: zylon-ai/private-gpt#325 or here: https://github.com/janvarev/privateGPT

There is translation logic "before" generation and "after" generation; both are optional. The first translates queries to English; the second translates the result back (not necessary, but handy for the user).
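The "before" and "after" steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: `answer_in_user_lang`, `generate`, `translate`, and the two flags are hypothetical names, and `generate` stands in for whatever function runs the English-only retrieval and LLM call.

```python
from typing import Callable

# (text, from_lang, to_lang) -> translated text; hypothetical signature.
Translate = Callable[[str, str, str], str]


def answer_in_user_lang(query: str,
                        user_lang: str,
                        generate: Callable[[str], str],
                        translate: Translate,
                        translate_input: bool = True,
                        translate_output: bool = True) -> str:
    needs_translation = user_lang != "en"

    # "Before" step: bring the query into English so search and generation
    # run in the language the model handles best.
    if translate_input and needs_translation:
        query = translate(query, user_lang, "en")

    answer = generate(query)

    # "After" step: optional, only for the user's convenience.
    if translate_output and needs_translation:
        answer = translate(answer, "en", user_lang)
    return answer
```

Keeping both steps behind flags matches the idea that either one can be switched off, for example when the user can read English answers but prefers to ask in their own language.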


Supported translation logic:

  • GoogleTranslate,
  • via OneRingTranslator
    • Google Translate (online)
    • Libre Translate (online or offline)
    • FB NLLB neuronet (offline)

A word about working with the LLM without English translation: the results are simply significantly worse. Yes, we can query in the native language, but in 50% or more of cases the answers are bad (not related to the topic, etc.).

@janvarev
Author

UPD: you can take any code you need from the PR above; I'll be glad to see it implemented.

@hippalectryon-0
Contributor

> Sorry, due to some reason I can't install CASALIOY with poetry :((, but can install privateGPT with pip. (I hope you will support pip in future...)

Can you open a separate issue detailing what doesn't work?

@su77ungr su77ungr pinned this issue May 23, 2023
@su77ungr
Owner

I'm going to perform a comparison of the translation and localisation results by using both OneRingTranslator and guidance.

Google won't make it into this repo tho. Just a brief status update

@janvarev
Author

@su77ungr I've added BLEU measurements, and a script to run them, to OneRingTranslator.

3 participants