Conversation

@gante gante commented Nov 28, 2023

What does this PR do?

A common trend is starting to pop up: people are experimenting with new strategies to generate candidate sequences, which are then verified with an assisted-generation-like loop. A key example is the new technique in #27722, which is identical to assisted_decoding except for the candidate generation part. That technique achieves nice speedups in some settings and doesn't need an assistant model -- a model-free speedup!

To facilitate experimentation and the addition of new candidate generation techniques, this PR abstracts the candidate generation part of assisted_decoding into a new class with a stable API. This was inspired by classes like LogitsProcessor or StoppingCriteria -- components of generate that can easily be replaced. All these changes are backwards compatible! 🤗
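As a rough sketch of the shape this abstraction could take (the method names and signatures below are illustrative assumptions, not necessarily the exact API in candidate.py):

```python
import torch


class CandidateGenerator:
    """Abstract base class: proposes candidate continuations for assisted decoding."""

    def get_candidates(self, input_ids: torch.LongTensor) -> torch.LongTensor:
        # Return `input_ids` extended with candidate tokens, to be verified by
        # the main model in a single forward pass.
        raise NotImplementedError

    def update_candidate_strategy(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, num_matches: int):
        # Optional hook: adapt internal state (e.g. how many tokens to draft next
        # time) based on how many candidates the main model accepted.
        raise NotImplementedError
```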

Suggested review order:

  1. utils.py, to see the shape of assisted_decoding under the abstracted API
  2. candidate.py, to see the structure of the new base class (and the specific case of the original assisted generation)

The following tests are passing:

  1. RUN_SLOW=1 py.test tests/models/whisper/ -k speculative
  2. py.test tests/ -k test_assisted (which catches mixin and integration tests associated with assisted generation)

Happy to add more tests if needed :)

gante (PR author) commented on the diff:

Note: these functions were moved here to avoid circular imports

A collaborator commented on the diff:

Might be good to use the new cache format, no?

gante (PR author) replied:

Soon! (We still need to maintain backward compatibility.)

gante (PR author) commented on the diff:

This function will be expanded as we add more CandidateGenerator classes :)

apoorvumang (Contributor) commented on the diff:

So will the logic here be something like:

  • check params in generation_config (some if/else condition)
  • based on those params, set candidate_generator

gante (PR author) replied:

@apoorvumang exactly!
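To make that dispatch concrete, here is a minimal sketch of what such a helper could look like (the function name, the prompt_lookup_num_tokens flag, and the two generator classes are hypothetical stand-ins, not the merged implementation):

```python
class AssistedCandidateGenerator:
    """Placeholder stub: candidates drafted by a smaller assistant model."""
    def __init__(self, **kwargs): ...


class PromptLookupCandidateGenerator:
    """Placeholder stub: model-free candidates from n-gram matches in the prompt (see #27722)."""
    def __init__(self, **kwargs): ...


def get_candidate_generator(generation_config, input_ids=None, assistant_model=None):
    # Check params in generation_config and pick the matching CandidateGenerator.
    if getattr(generation_config, "prompt_lookup_num_tokens", None) is not None:
        return PromptLookupCandidateGenerator(num_output_tokens=generation_config.prompt_lookup_num_tokens)
    if assistant_model is not None:
        return AssistedCandidateGenerator(input_ids=input_ids, assistant_model=assistant_model)
    raise ValueError("No candidate generation strategy was configured.")
```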

apoorvumang (Contributor) replied:

got it, so I'll write a PromptLookupCandidateGenerator that implements CandidateGenerator, and then wire it up in this function
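For illustration, the core of the prompt-lookup idea from #27722 could look roughly like this (a sketch under simplifying assumptions, not apoorvumang's actual implementation): match the trailing n-gram of the sequence against earlier positions in the prompt and propose the tokens that followed the match as candidates.

```python
import torch


def prompt_lookup_candidates(input_ids: torch.LongTensor, max_ngram: int = 3, num_candidates: int = 10) -> torch.LongTensor:
    """Sketch: propose candidate tokens by matching the trailing n-gram against the prompt."""
    seq = input_ids[0].tolist()
    for n in range(max_ngram, 0, -1):
        tail = seq[-n:]
        # Walk backwards to find the most recent earlier occurrence of the trailing n-gram.
        for start in range(len(seq) - n - 1, -1, -1):
            if seq[start : start + n] == tail:
                continuation = seq[start + n : start + n + num_candidates]
                if continuation:
                    # Candidates are simply the tokens that followed the match in the prompt.
                    return torch.tensor([seq + continuation], dtype=input_ids.dtype, device=input_ids.device)
    # No match: propose nothing and return the input unchanged.
    return input_ids
```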

ArthurZucker (Collaborator) commented on the diff:

So the plan is to have checks similar to the ones we have for the supported logits processors, I guess?

gante (PR author) replied:

@ArthurZucker precisely, we will have flags to control which candidate generation strategies we have in place. I suspect that, because some candidate generation strategies are so cheap (like the one proposed in #27722), assisted generation may become mainstream!

@HuggingFaceDocBuilderDev (bot) commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@apoorvumang apoorvumang (Contributor) commented:

One more place needs to change, I think: generation_mode is currently set using _get_generation_mode, where this is the logic:

    if assistant_model is not None:
        if generation_mode in ("greedy_search", "sample"):
            generation_mode = GenerationMode.ASSISTED_GENERATION
        else:
            raise ValueError(
                "You've set `assistant_model`, which triggers assisted generate. Currently, assisted generate "
                "is only supported with Greedy Search and Sample."
            )

How do you suggest this should change to support prompt lookup decoding? @gante

gante commented Nov 30, 2023

@apoorvumang I'd add an or clause to the if assistant_model is not None check.
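Something along these lines, for example, where prompt_lookup_num_tokens is just a hypothetical flag name for enabling prompt lookup (a sketch of the changed condition, not the final code):

```python
wants_candidate_decoding = assistant_model is not None or (
    getattr(generation_config, "prompt_lookup_num_tokens", None) is not None
)
if wants_candidate_decoding:
    if generation_mode in ("greedy_search", "sample"):
        generation_mode = GenerationMode.ASSISTED_GENERATION
    else:
        raise ValueError(
            "Assisted generation (including prompt lookup) is only supported with Greedy Search and Sample."
        )
```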

@ArthurZucker ArthurZucker (Collaborator) left a comment:

🔥 would maybe rename the file to candidate_generators, like we have logits_processor, but otherwise great!

@gante gante force-pushed the arbitrary_candidate_fn branch from 77a8b67 to a57367b on December 11, 2023 at 19:57
@gante gante merged commit 4b759da into huggingface:main Dec 12, 2023
@gante gante deleted the arbitrary_candidate_fn branch December 12, 2023 09:26
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
…ors (huggingface#27750)

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>