

Multimodal-Large-Language-Model (MLLM)


Thank you for checking out the Multimodal-Large-Language-Model project. Please note that this project was created for research purposes.

For a more robust and well-developed solution, you may consider using open-webui/open-webui with ollama/ollama.

Demo image

Documentation

You can access the project documentation at [GitHub Pages].

Host requirements

  • Docker: [Installation Guide]
  • Docker Compose: [Installation Guide]
  • Compatible with Linux and Windows hosts
  • Ensure ports 8501 and 11434 are not already in use (see the check after this list)
  • You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models. [Source]
  • The project can be run on either CPU or GPU
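On Linux, a quick (illustrative) way to confirm both ports are free before starting the stack; any equivalent tool works:

    # Report anything already listening on the Streamlit (8501) or Ollama (11434) ports
    ss -tuln | grep -E ':(8501|11434)\b' || echo "Ports 8501 and 11434 are free"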

Running on GPU

Tested Model(s)

| Model Name | Size  | Link |
| ---------- | ----- | ---- |
| llava:7b   | 4.7GB | Link |
| llava:34b  | 20GB  | Link |

Llava is pulled and loaded by default; other models from the Ollama library can be added in ollama/ollama-build.sh
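As an illustration, assuming ollama-build.sh pulls models with the standard ollama CLI, an extra model can be added with one more pull line (llava:34b is just an example from the table above):

    # ollama/ollama-build.sh (hypothetical excerpt)
    ollama pull llava:7b     # pulled by default
    ollama pull llava:34b    # additional model from the Ollama library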

Usage

Note

The project runs on GPU by default. To run on CPU, use docker-compose.cpu.yml instead.
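For example, a CPU-only build and start using the standard Compose file override (the service definitions themselves live in docker-compose.cpu.yml):

    # Build and run the stack without GPU support
    docker compose -f docker-compose.cpu.yml up -d --build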

  1. Clone this repository and navigate to the project folder:

     git clone https://github.com/NotYuSheng/Multimodal-Large-Language-Model.git
     cd Multimodal-Large-Language-Model

  2. Build the Docker images:

     docker compose build

  3. Run the images:

     docker compose up -d

  4. Access the Streamlit webpage from the host:

     <host-ip>:8501

API calls to the Ollama server can be made at:

<host-ip>:11434
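For example, assuming the standard Ollama REST API, you can list the available models or send a multimodal prompt with curl (the base64 image value is a placeholder):

    # List the models currently available on the server
    curl http://<host-ip>:11434/api/tags

    # Ask the default llava model to describe a base64-encoded image
    curl http://<host-ip>:11434/api/generate -d '{
      "model": "llava:7b",
      "prompt": "Describe this image.",
      "images": ["<base64-encoded-image>"]
    }'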

About

Localized Multimodal Large Language Model (MLLM) integrated with Streamlit and Ollama for text and image processing tasks.
