Q: How do I choose between sampling and beam search for inference?
The quality of results from beam search and from sampling can differ depending on the scenario. You can choose a decoding strategy based on the following considerations:
Consider sampling if any of the following apply:
- You need faster inference.
- You want streaming generation.
- Your task calls for open-ended responses.
If your task requires deterministic answers, try beam search and check whether it gives better results.
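The checklist above can be expressed as a small helper. This is a minimal sketch, not part of the project: the function choose_decoding_kwargs is hypothetical, and the keyword names sampling and stream are assumed to match the chat() interface of your MiniCPM-V version, so check them against the model code you are running.

```python
def choose_decoding_kwargs(needs_speed=False, needs_streaming=False, open_ended=False):
    """Map the checklist above to keyword arguments for a chat-style call.

    Hypothetical helper. The keys "sampling" and "stream" are assumptions
    about the chat() signature; verify them for your model version.
    """
    # Any one of the three needs is enough to prefer sampling.
    use_sampling = needs_speed or needs_streaming or open_ended
    kwargs = {"sampling": use_sampling}
    if needs_streaming:
        # Streaming is only requested when explicitly needed.
        kwargs["stream"] = True
    return kwargs


# Hypothetical usage, assuming model, msgs and tokenizer are already loaded:
# res = model.chat(image=None, msgs=msgs, tokenizer=tokenizer,
#                  **choose_decoding_kwargs(open_ended=True))
```

With no needs set, the helper returns sampling=False, which corresponds to trying beam search for deterministic tasks. Comparing both settings on a few of your own prompts is still the most reliable way to decide.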
Q: How do I ensure that the model generates results of sufficient length?
We've observed that during multilingual inference with MiniCPM-V 2.6, generation sometimes ends prematurely. You can improve the results by passing a min_new_tokens parameter:
res = model.chat(
image=None,
msgs=msgs,
tokenizer=tokenizer,
min_new_tokens=100
)