Description
🚀 The feature, motivation and pitch
Currently torchchat has its own implementations of these features:
- Utils to optimize, quantize, and export an eager model to ExecuTorch.
- An LLM runner for AOTI and ExecuTorch.
- Tokenizers (sentencepiece and tiktoken) used by both the AOTI runner and the ET runner.
The problem with this design is that torchchat does not pick up new features checked into ExecuTorch's export flow. What's worse, the demo apps hosted in torchchat expect a .pte file produced by ExecuTorch's export flow rather than torchchat's, and that will easily break when either side changes.
A similar story applies to the C++ tokenizer implementations. The tokenizers in ExecuTorch are very similar to the ones in torchchat, and the code should be unified to avoid duplication.
Alternatives
An alternative is to do nothing. If we keep the status quo, DevX will deteriorate due to the constant changes from ExecuTorch that we need to incorporate into torchchat.
Additional context
No response
RFC (Optional)
Proposal
On a high level we want to:
- Reuse the export flow in ExecuTorch's `extension/llm` directory.
- Set up a new repo for tokenizers/samplers under `pytorch-labs`.
- Let runner code depend on the new tokenizer repo.
Details
Export flow:
Currently torchchat uses `export.py` to export a model to ET's .pte file.
Proposal: fully migrate to ET's `extension/llm`.
New dependency: ET nightly build in pip.
Runner:
The torchchat C++ runner needs to work for both AOTI and ET, so it's quite complicated.
Proposal 1 (preferred):
- Set up a separate repo for runner and tokenizer code; both ET and torchchat depend on it.
  - Add a public repo under the `pytorch-labs` organization, say `pytorch-labs/tokenizers`.
  - Split the existing `run.cpp` into `et_run.cpp` and `aoti_run.cpp`:
    - `et_run.cpp` depends on ExecuTorch as well as `pytorch-labs/tokenizers`.
    - `aoti_run.cpp` only depends on `pytorch-labs/tokenizers`.
- Pros: no code duplication, clear dependencies.
- Cons: maintenance cost for a new repo.
Proposal 2 (short term?):
- Use runner building blocks and the tokenizer from ET. Refactor the existing `run.cpp` to reuse those components. Add ET as a git submodule.
- Pros: no code duplication.
- Cons: if a user only wants to build an AOTI runner, it's awkward to pull in tokenizer code from ET.
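To illustrate what a shared tokenizer library buys us, here is a hypothetical sketch of the common interface `pytorch-labs/tokenizers` could expose so that both runners depend on one surface; the names (`Tokenizer`, `encode`, `decode`) and the toy whitespace implementation are assumptions for illustration, not an existing API:

```python
from abc import ABC, abstractmethod

class Tokenizer(ABC):
    """Hypothetical common surface for sentencepiece- and tiktoken-backed
    tokenizers, so the AOTI and ET runners share one interface."""

    @abstractmethod
    def encode(self, text: str) -> list[int]: ...

    @abstractmethod
    def decode(self, ids: list[int]) -> str: ...

class WhitespaceTokenizer(Tokenizer):
    """Toy stand-in implementation; real backends would wrap
    sentencepiece or tiktoken behind the same two methods."""

    def __init__(self):
        self.vocab: dict[str, int] = {}
        self.inv: dict[int, str] = {}

    def encode(self, text: str) -> list[int]:
        ids = []
        for tok in text.split():
            if tok not in self.vocab:
                idx = len(self.vocab)
                self.vocab[tok] = idx
                self.inv[idx] = tok
            ids.append(self.vocab[tok])
        return ids

    def decode(self, ids: list[int]) -> str:
        return " ".join(self.inv[i] for i in ids)
```

Under Proposal 1, `et_run.cpp` and `aoti_run.cpp` would both program against an interface like this (in C++), which is what makes the AOTI runner buildable without pulling in ET.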
Model definition:
Torchchat depends on torchtune for model definitions. All source transformations will come from the ET `extension/llm` library. Modules that are modified to be `torch.export`-able will be hosted in ET `extension/llm`, and torchchat should use those as well.
Example: torchtune's `MultiHeadAttention` has an input-dependent condition that needs to be rewritten with `torch.cond` so that it's exportable. This lives in `extension/llm/modules` and should be used by torchchat. [Pending discussion] If torchtune is open to hosting these exportable modules, torchchat should depend on torchtune to get them.
Demo app:
For both Android and iOS, we want to build the runner and tokenizer as libraries, package them into artifacts, and distribute them to torchchat.
We are already doing this for the Android demo app.
The iOS demo app code should live in torchchat as well, and both demo apps should be removed from ET in the future.