
Using Hugging Face model card name in export_llama #8872

Closed
@iseeyuan

Description


🚀 The feature, motivation and pitch

Currently, users need to manually download the Hugging Face safetensors, convert them to the llama_transformer format, and load the checkpoint and config for export and inference.

It would be great to directly download and cache the converted checkpoints (so they don't have to be converted or loaded again) and run inference, similar to what mlx does:

```python
from mlx_lm import load, generate

# load() downloads and caches the model and tokenizer by model card name.
model, tokenizer = load("mlx-community/dolphin3.0-llama3.2-3B-4Bit")

prompt = "hello"

# Apply the chat template when the tokenizer ships one.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
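
For the download-and-cache part, one possible shape is to lean on huggingface_hub, which already caches snapshots locally by repo id. The sketch below is only an illustration of that idea, not an existing export_llama API: the model card name is just an example, and the conversion/export step is left as a comment.

```python
from huggingface_hub import snapshot_download


def fetch_hf_checkpoint(model_id: str) -> str:
    # snapshot_download stores files under the Hugging Face cache
    # (~/.cache/huggingface by default), so a second call with the same
    # model_id reuses the local copy instead of downloading again.
    return snapshot_download(
        repo_id=model_id,
        allow_patterns=["*.safetensors", "*.json", "tokenizer.*"],
    )


if __name__ == "__main__":
    # Example model card name; gated repos would also need an HF token.
    local_dir = fetch_hf_checkpoint("meta-llama/Llama-3.2-1B-Instruct")
    # The safetensors -> llama_transformer conversion and the export_llama
    # call would plug in here; automating that glue is what this issue asks for.
    print(f"checkpoint files cached at {local_dir}")
```

The tokenizer files could be fetched and cached the same way, so export and runtime could both key off a single model card name.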

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

cc @mergennachin @cccclai @helunwencser @jackzhxng

Labels

module: llm (Issues related to LLM examples and apps, and to the extensions/llm/ code), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
