FAQs

Q: How do I choose between sampling and beam search for inference?

Beam search and sampling can produce results of different quality depending on the scenario. Choose a decoding strategy based on the following considerations:

If you have the following needs, consider using sampling decoding:

You need faster inference.
You want streaming generation.
Your task calls for open-ended responses.

If your task calls for deterministic answers, try beam search and check whether it gives better results.
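The criteria above can be sketched as a small helper that turns them into keyword arguments for model.chat. This is only an illustration: build_chat_kwargs is a hypothetical name, and the sampling and stream keywords are assumed from the MiniCPM-V README rather than stated in this FAQ, so check them against the version you are running.

```python
def build_chat_kwargs(open_ended: bool, streaming: bool) -> dict:
    """Pick a decoding strategy for a MiniCPM-V style model.chat call.

    Assumption: model.chat accepts `sampling` (True = sampling,
    False = beam search) and `stream`. Neither is confirmed by this FAQ.
    """
    # Streaming and open-ended tasks both point to sampling.
    use_sampling = open_ended or streaming
    kwargs = {"sampling": use_sampling}
    if streaming:
        kwargs["stream"] = True
    return kwargs


# Open-ended chat with streamed output: sampling.
print(build_chat_kwargs(open_ended=True, streaming=True))
# Deterministic answer without streaming: beam search.
print(build_chat_kwargs(open_ended=False, streaming=False))
```

The result would then be unpacked into the call, for example model.chat(image=None, msgs=msgs, tokenizer=tokenizer, **build_chat_kwargs(open_ended=False, streaming=False)).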

Q: How do I ensure that the model generates results of sufficient length?

We have observed that generation sometimes ends prematurely during multilingual inference on MiniCPM-V 2.6. Passing the min_new_tokens parameter, which sets a minimum number of newly generated tokens, can improve the results.

# model, tokenizer, and msgs are assumed to be set up already.
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    min_new_tokens=100,  # generate at least 100 new tokens
)
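To see why a minimum length prevents early stopping, here is a toy sketch of the usual mechanism: the end-of-sequence (EOS) score is masked until enough new tokens have been generated. It assumes min_new_tokens is applied this way, as is common in generation libraries; mask_eos and the toy scores are made up for illustration and are not part of MiniCPM-V.

```python
import math


def mask_eos(scores, eos_id, generated, min_new_tokens):
    """Return a copy of scores with EOS disabled while generated < min_new_tokens."""
    out = list(scores)
    if generated < min_new_tokens:
        out[eos_id] = -math.inf  # EOS cannot be selected yet
    return out


scores = [0.1, 2.5, 0.3]  # toy scores; index 1 plays the role of EOS

# After 10 new tokens the minimum of 100 is not reached, so EOS is masked.
early = mask_eos(scores, eos_id=1, generated=10, min_new_tokens=100)
# After 100 new tokens the constraint is satisfied and EOS may win again.
late = mask_eos(scores, eos_id=1, generated=100, min_new_tokens=100)

print(early.index(max(early)))  # highest-scoring token is no longer EOS
print(late.index(max(late)))    # EOS is selectable again
```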
