# LLA Research Notes

Notes & lightweight experiments on LLMs, open-weight models, and multimodal systems.
## Goals

- Track learning & decisions clearly (theory → practice).
- Build a minimal but solid stack for open-weight models (download, storage, inference).
- Explore “core block / living core” integration as a state layer on top of base models.
## Stack

- Models: gpt-oss-120B, Llama 3/4, Mixtral (GGUF); served via vLLM, Transformers, or Ollama.
- Infra: Vertex AI endpoints, Hugging Face Hub, Cloud Storage (GCS/S3), basic MLOps (download sketch below).
- Techniques: prompting, RAG, external agent state, light SFT/LoRA.
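
As a concrete starting point for the download/storage piece, a minimal sketch using `huggingface_hub`; the model ID and local path are placeholders, not settled choices:

```python
# Pull an open-weight model from the Hugging Face Hub into a local directory
# that can later be synced to GCS/S3 as part of the storage strategy.
from huggingface_hub import snapshot_download

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder repo ID; gated repos need an HF token

local_dir = snapshot_download(
    repo_id=MODEL_ID,
    local_dir="./models/llama-3.1-8b-instruct",           # keep weights out of git
    allow_patterns=["*.safetensors", "*.json", "*.txt"],  # skip alternate weight formats
)
print(f"Model files downloaded to: {local_dir}")
```

From there, `gsutil rsync` (or `aws s3 sync`) can move the directory into bucket storage.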
## Log

- [YYYY-MM-DD] Init repo & plan structure.
- [YYYY-MM-DD] Notes on downloading open-weight models & storage strategy.
- [YYYY-MM-DD] Vertex AI endpoint reading + cost model (per-token vs per-hour); see the break-even sketch below.
Update this section as you go.
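
To make the per-token vs per-hour comparison concrete, a back-of-envelope sketch; all prices and throughput numbers are illustrative placeholders, not quoted Vertex AI rates:

```python
# Break-even between per-token and per-hour endpoint pricing.
# Placeholder numbers; substitute real quotes before deciding anything.
PRICE_PER_1K_TOKENS = 0.002   # $ per 1k tokens on a per-token plan
PRICE_PER_HOUR = 3.50         # $ per hour for a dedicated endpoint
THROUGHPUT_TOK_PER_S = 500    # sustained tokens/second the endpoint can serve

tokens_per_hour = THROUGHPUT_TOK_PER_S * 3600
per_token_cost_per_hour = tokens_per_hour / 1000 * PRICE_PER_1K_TOKENS

print(f"Per-token cost at full utilization: ${per_token_cost_per_hour:.2f}/h")
print(f"Dedicated endpoint: ${PRICE_PER_HOUR:.2f}/h")

# The flat hourly rate wins once sustained utilization pushes per-token spend
# above it; below that utilization, pay-per-token is cheaper.
break_even_util = PRICE_PER_HOUR / per_token_cost_per_hour
print(f"Break-even utilization: {break_even_util:.0%} of full throughput")
```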
## Next experiments

- Minimal inference with vLLM (local or cloud); see the first sketch below.
- Compare memory/storage footprints (dense vs MoE, quantization); see the footprint estimate below.
- State management layer for the “core block” (external orchestration); see the state-layer sketch below.
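
For the first item, a minimal offline-inference sketch with vLLM; the model ID is a placeholder and any Hub-hosted open-weight model should work the same way:

```python
# Minimal offline inference with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

# generate() takes a list of prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Explain mixture-of-experts in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```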
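For the footprint comparison, a back-of-envelope estimate of weight-storage size; parameter counts are rough (the MoE row assumes a Mixtral-8x22B-like total of ~141B parameters) and only weights are counted, not KV cache or activations:

```python
# Rough weight-storage footprints: dense vs MoE at different precisions.
# MoE stores all expert weights even though only a subset is active per token.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "q4 (GGUF)": 0.5}

models = {
    "dense 70B": 70e9,                  # all parameters active every token
    "MoE 8x22B (~141B total)": 141e9,   # assumed total; ~39B active per token
}

for name, params in models.items():
    for prec, bytes_per in BYTES_PER_PARAM.items():
        gb = params * bytes_per / 1e9
        print(f"{name:28s} {prec:10s} ~{gb:,.0f} GB")
```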
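For the state layer, a sketch of the external-orchestration idea: the base model stays stateless while the orchestrator owns persistent state and injects it into every prompt. `CoreBlock` and `call_model` are hypothetical names for illustration, not an existing API:

```python
# "Core block" as external state: serialized into the system prompt per call.
from dataclasses import dataclass, field

@dataclass
class CoreBlock:
    """Persistent state carried across otherwise stateless model calls."""
    identity: str = "research assistant"
    facts: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Serialize state into a system-prompt block prepended to every call.
        lines = [f"You are a {self.identity}."] + [f"- {f}" for f in self.facts]
        return "\n".join(lines)

def call_model(system: str, user: str) -> str:
    # Stub for any backend (vLLM, Ollama, an HTTP endpoint); swap in real inference.
    return f"[model reply to {user!r} given {len(system)} chars of state]"

core = CoreBlock()
core.facts.append("The user is comparing dense vs MoE storage footprints.")
print(call_model(core.render(), "Which quantization should I try first?"))
```

The design choice being tested: the model never mutates its own state; only the orchestrator updates the core block, which keeps the state portable across backends.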
## References

- Hugging Face Hub, vLLM, and Transformers documentation
- Vertex AI docs (Model Garden, Endpoints)
- Llama model cards & licenses
## License

MIT