-- **relevance**: Relevance is disabled by default. When enabled, two models are run in parallel: the primary model and the mask model. The mask model is scaled by the relevance value, and then the probabilities of the primary model are multiplied by the complement of the mask model before sampling. The state of the mask model is reset upon each newline character. The net effect is that the model is encouraged to choose a line of dialogue that is most relevant to the prior line of dialogue, even if a more generic response (e.g. "I don't know anything about that") may be more absolutely probable. Lower relevance values put more pressure on the model to produce relevant responses, at the cost of the coherence of the responses. Going much below 1.5 compromises the quality of the responses; 2-3 is the recommended range. However, relevance is disabled by default as it halves the speed of sampling and I'm not confident that it improved the outputs.
0 commit comments