
@alay2shah

- Transformers: text generation, streaming, vision (minimal sketch below)
- vLLM: text generation, batching, vision
- llama.cpp: CLI and server API, vision
- MLX: text generation, streaming, vision (Apple Silicon note)
- Ollama: Python/curl API examples, vision

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
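For reference, a minimal sketch of the Transformers text-generation and streaming path listed above. The model ID is a placeholder for the repo this PR documents, and `device_map="auto"` assumes accelerate is installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

MODEL_ID = "org/model"  # placeholder: substitute the model this PR documents

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, dtype="bfloat16", device_map="auto"  # dtype=, not deprecated torch_dtype=
)

messages = [{"role": "user", "content": "Explain KV caching in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Stream decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(**inputs, max_new_tokens=256, streamer=streamer)
```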
Use return_dict=True with apply_chat_template and access
input_ids properly to avoid AttributeError on shape access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
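What this fix looks like in practice, continuing the setup above (a sketch; it assumes the AttributeError came from calling `.shape` on the object returned by `apply_chat_template`):

```python
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

# `inputs` is a dict-like BatchEncoding, not a tensor, so `inputs.shape`
# raises AttributeError; index the tensor explicitly instead:
prompt_len = inputs["input_ids"].shape[1]

output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True))
```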
- Use dtype="bfloat16" (not torch_dtype)
- Use batch_decode as in docs
- Keep tokenize=True as per docs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
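A short sketch of the decode step as the docs show it, reusing `model`, `tokenizer`, and `inputs` from the snippets above:

```python
# Decode every generated sequence in the batch, as in the docs:
output_ids = model.generate(**inputs, max_new_tokens=128)
texts = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
for text in texts:
    print(text)
```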
IMPORTANT: tie lm_head to input embeddings to fix missing
weights issue in transformers v5.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
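One way the tying can be expressed (a sketch, not necessarily the exact change in this commit; it assumes the checkpoint ships no standalone lm_head, so `tie_weights()` points the output head at the input embeddings):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, dtype="bfloat16")

# Without a separate lm_head in the checkpoint, tying it to the input
# embeddings stops transformers v5 from reporting missing weights.
model.config.tie_word_embeddings = True
model.tie_weights()
```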
- Pin torch==2.9.0 to prevent errors
- Use dtype instead of torch_dtype (deprecated)
- Remove fallback note

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use pip with --extra-index-url for CUDA support
  (--torch-backend is a uv flag, not pip)
- Fix chat() API to use message format instead of raw strings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
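Assuming this commit refers to the vLLM example, a sketch of both fixes: installing with plain pip (since `--torch-backend` is a uv flag) and calling `chat()` with message dicts rather than raw strings. The CUDA tag in the index URL is illustrative; adjust it to your setup:

```python
# pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
from vllm import LLM, SamplingParams

llm = LLM(model=MODEL_ID)  # MODEL_ID as in the sketches above
params = SamplingParams(max_tokens=256)

# chat() takes chat messages, not raw prompt strings:
messages = [{"role": "user", "content": "Summarize the attention mechanism."}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```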