parallel computation of mask in constrained sampling

Currently (as per #899), the masks are always computed in sequence.

Are the logit tensors typically async? That is, are they not yet synchronized when `sample()` is called, so that `logits.to_vec1()` in `sample()` is what takes almost all of the time?

If so, I guess we could stay with the current interface and just do `tokio_rayon::spawn()` for the mask as you do for the sampling.

Otherwise, it would be good to kick off mask computation before starting the forward pass.
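
To make the second option concrete, here is a rough sketch of what I mean by overlapping the two. It uses only the standard library: `std::thread::spawn` stands in for `tokio_rayon::spawn()`, and `compute_mask`, `forward_pass`, and `sample_constrained` are hypothetical placeholders, not the actual functions in this repo.

```rust
use std::thread;

/// Hypothetical stand-in for the constraint engine: marks which token ids
/// are allowed for the next step.
fn compute_mask(vocab_size: usize, allowed: &[usize]) -> Vec<bool> {
    let mut mask = vec![false; vocab_size];
    for &id in allowed {
        if id < vocab_size {
            mask[id] = true;
        }
    }
    mask
}

/// Hypothetical stand-in for the model forward pass: one logit per token,
/// increasing with the token id so the result is easy to predict.
fn forward_pass(vocab_size: usize) -> Vec<f32> {
    (0..vocab_size).map(|i| i as f32 * 0.5).collect()
}

/// Starts the mask computation on a worker thread, runs the forward pass on
/// the calling thread, then joins and greedily picks the best allowed token.
fn sample_constrained(vocab_size: usize, allowed: Vec<usize>) -> Option<usize> {
    // Kick off the mask before the forward pass so the two overlap.
    let mask_handle = thread::spawn(move || compute_mask(vocab_size, &allowed));

    let logits = forward_pass(vocab_size);

    // Block on the mask only after the logits are available.
    let mask = mask_handle.join().expect("mask thread panicked");

    // Greedy pick over allowed tokens; None if the mask allows nothing.
    let mut best: Option<(usize, f32)> = None;
    for (id, (&logit, &ok)) in logits.iter().zip(mask.iter()).enumerate() {
        if ok && best.map_or(true, |(_, b)| logit > b) {
            best = Some((id, logit));
        }
    }
    best.map(|(id, _)| id)
}
```

With these placeholders, `sample_constrained(10, vec![2, 7, 4])` picks the allowed token with the highest logit. Whether this helps in practice depends on the answer to the question above: if the forward pass returns before the tensor is synchronized, spawning the mask right after it may already give the same overlap.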

This also depends a little on what we do with #963 


parallel computation of mask in constrained sampling #964
