Closed as not planned
I expected `TokenizerBuilder` to produce a `Tokenizer` from the `build()` result, but instead `build()` returns a `TokenizerImpl`, which `Tokenizer` wraps.
No problem, I see that there is an `impl From<TokenizerImpl> for Tokenizer`, but it's attempting to do quite a bit more for some reason? Meanwhile, I cannot use `Tokenizer(unwrapped_build_result_here)` because the tuple struct's field is private 🤔 (and the `Tokenizer::new()` method won't take this either).
let mut tokenizer = Tokenizer::from(TokenizerBuilder::new()
    .with_model(unigram)
    .with_decoder(Some(decoder))
    .with_normalizer(Some(normalizer))
    .build()
    .map_err(anyhow::Error::msg)?
);
error[E0283]: type annotations needed
   --> mistralrs-core/src/pipeline/gguf_tokenizer.rs:139:41
    |
139 |     let mut tokenizer = Tokenizer::from(TokenizerBuilder::new()
    |                                         ^^^^^^^^^^^^^^^^^^^^^ cannot infer type of the type parameter `PT` declared on the struct `TokenizerBuilder`
    |
    = note: cannot satisfy `_: tokenizers::PreTokenizer`
    = help: the following types implement trait `tokenizers::PreTokenizer`:
              tokenizers::pre_tokenizers::bert::BertPreTokenizer
              tokenizers::decoders::byte_level::ByteLevel
              tokenizers::pre_tokenizers::delimiter::CharDelimiterSplit
              tokenizers::pre_tokenizers::digits::Digits
              tokenizers::decoders::metaspace::Metaspace
              tokenizers::pre_tokenizers::punctuation::Punctuation
              tokenizers::pre_tokenizers::sequence::Sequence
              tokenizers::pre_tokenizers::split::Split
            and 4 others
note: required by a bound in `tokenizers::TokenizerBuilder::<M, N, PT, PP, D>::new`
   --> /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/mod.rs:314:9
    |
314 |         PT: PreTokenizer,
    |             ^^^^^^^^^^^^ required by this bound in `TokenizerBuilder::<M, N, PT, PP, D>::new`
...
319 |     pub fn new() -> Self {
    |            --- required by a bound in this associated function
help: consider specifying the generic arguments
    |
139 |     let mut tokenizer = Tokenizer::from(TokenizerBuilder::<tokenizers::models::unigram::Unigram, tokenizers::NormalizerWrapper, PT, PP, tokenizers::DecoderWrapper>::new()
    |                                                         +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Why is this an issue? Isn't the point of the builder that you don't have to specify types for the optional components you never set?
cannot infer type of the type parameter `PT` declared on the struct `TokenizerBuilder`
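For readers hitting the same error, here is a minimal sketch of the mechanism using only the standard library. `Builder`, `Built`, `PreTok`, and `Whitespace` are hypothetical stand-ins, not types from the `tokenizers` crate. The assumption illustrated is that a builder with one type parameter per component needs every parameter resolved at compile time, even when the matching field stays `None`, so a component that is never set gives the compiler nothing to infer from.

```rust
// Hypothetical stand-in for a pre-tokenizer trait (not the tokenizers crate's).
trait PreTok {
    fn name(&self) -> &'static str;
}

// Hypothetical concrete pre-tokenizer.
struct Whitespace;

impl PreTok for Whitespace {
    fn name(&self) -> &'static str {
        "whitespace"
    }
}

// Builder with one type parameter per component, mirroring the shape of the
// builder discussed in this issue.
struct Builder<M, PT> {
    model: Option<M>,
    pre_tokenizer: Option<PT>,
}

struct Built<M, PT> {
    model: M,
    pre_tokenizer: Option<PT>,
}

impl<M, PT: PreTok> Builder<M, PT> {
    fn new() -> Self {
        Builder { model: None, pre_tokenizer: None }
    }

    fn with_model(mut self, model: M) -> Self {
        self.model = Some(model);
        self
    }

    fn with_pre_tokenizer(mut self, pre_tokenizer: Option<PT>) -> Self {
        self.pre_tokenizer = pre_tokenizer;
        self
    }

    fn build(self) -> Result<Built<M, PT>, String> {
        let model = self.model.ok_or_else(|| "model missing".to_string())?;
        Ok(Built { model, pre_tokenizer: self.pre_tokenizer })
    }
}

impl<PT: PreTok> Built<String, PT> {
    fn describe(&self) -> String {
        let pt = self.pre_tokenizer.as_ref().map_or("none", |p| p.name());
        format!("model={}, pre_tokenizer={}", self.model, pt)
    }
}

// `PT` is inferred from the argument passed to `with_pre_tokenizer`.
fn build_with_pre_tokenizer() -> Result<String, String> {
    let built = Builder::new()
        .with_model("unigram".to_string())
        .with_pre_tokenizer(Some(Whitespace))
        .build()?;
    Ok(built.describe())
}

// No pre-tokenizer is set, so nothing constrains `PT`. Writing plain
// `Builder::new()` here would fail with "type annotations needed";
// the turbofish names a type for the unused slot.
fn build_without_pre_tokenizer() -> Result<String, String> {
    let built = Builder::<String, Whitespace>::new()
        .with_model("unigram".to_string())
        .build()?;
    Ok(built.describe())
}

fn main() {
    println!("{}", build_with_pre_tokenizer().unwrap());
    println!("{}", build_without_pre_tokenizer().unwrap());
}
```

In this sketch the type chosen for the unused slot is arbitrary, since the field stays `None`. Applied to the snippet above, the analogous step would presumably be to fill in the `PT` and `PP` placeholders from the compiler's suggestion with concrete types, though which types are appropriate for `tokenizers` is not confirmed here.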
I had a glance over the source on GitHub but didn't see an example or test that uses this API, and the docs don't really cover it either.
Meanwhile, with `Tokenizer` instead of `TokenizerBuilder`, this works:
let mut tokenizer = Tokenizer::new(tokenizers::ModelWrapper::Unigram(unigram));
tokenizer.with_decoder(decoder);
tokenizer.with_normalizer(normalizer);