Description
LocalAI version:
2.28.0
Environment, CPU architecture, OS, and Version:
Docker
OS: Ubuntu 24.10
CPU: AMD Ryzen 7 9800X3D
GPU: RTX 5090
Describe the bug
Using the flux.1-schnell model to generate an image from the UI throws an error:
failed to load model with internal loader: could not load model (no success): Unexpected err=GatedRepoError('401 Client Error. (Request ID: Root=1-67ff6f8d-71d5575729dabf4c417800f6;0a8a3758-810b-4bb2-8368-6b681bc4a6bb)\n\nCannot access gated repo for url https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/model_index.json.\nAccess to model black-forest-labs/FLUX.1-schnell is restricted. You must have access to it and be authenticated to access it. Please log in.'), type(err)=<class 'huggingface_hub.errors.GatedRepoError'>
The model configuration in the gallery is not usable because https://huggingface.co/black-forest-labs/FLUX.1-schnell is a gated repository and cannot be downloaded without authentication. The configuration needs to be updated to point to a mirror.
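As a possible workaround until the gallery configuration is fixed, and assuming the diffusers backend honors the standard huggingface_hub environment variables, a Hugging Face token can be passed into the container so the gated repo can be fetched (after requesting access on the model page). A minimal docker-compose sketch, where the token value is a placeholder:

```yaml
# Sketch: pass a Hugging Face token into the LocalAI container so
# huggingface_hub can authenticate against the gated
# black-forest-labs/FLUX.1-schnell repository.
services:
  localai:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    environment:
      # HF_TOKEN is the standard huggingface_hub token variable;
      # "hf_xxxxxxxxxxxxxxxx" is a placeholder, not a real token.
      - HF_TOKEN=hf_xxxxxxxxxxxxxxxx
    ports:
      - "8080:8080"
    volumes:
      - ./models:/build/models
```

This only helps users who have been granted access to the gated repo; switching the gallery entry to an openly downloadable mirror remains the proper fix.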
To Reproduce
- Download the flux.1-schnell image generation model
- Try to generate an image with it
Expected behavior
The image is generated without errors.
Logs
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/google/protobuf/runtime_version.py:98: UserWarning: Protobuf gencode version 5.29.0 is exactly one major version older than the runtime version 6.30.2 at backend.proto. Please update the gencode to avoid compatibility violations in the next runtime release.
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr warnings.warn(
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/utils/hub.py:105: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr warnings.warn(
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Server started. Listening on: 127.0.0.1:39481
8:51AM DBG GRPC Service Ready
8:51AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00034ce58} sizeCache:0 unknownFields:[] Model:black-forest-labs/FLUX.1-schnell ContextSize:1024 Seed:358204100 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:true Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/black-forest-labs/FLUX.1-schnell Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType:FluxPipeline SchedulerType: CUDA:true CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Loading model black-forest-labs/FLUX.1-schnell...
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Request Model: "black-forest-labs/FLUX.1-schnell"
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr ContextSize: 1024
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Seed: 358204100
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr NBatch: 512
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr F16Memory: true
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr MMap: true
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr LowVRAM: true
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr NGPULayers: 99999999
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Threads: 8
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr ModelFile: "/build/models/black-forest-labs/FLUX.1-schnell"
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr PipelineType: "FluxPipeline"
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr CUDA: true
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr ModelPath: "/build/models"
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr
8:51AM ERR Server error error="failed to load model with internal loader: could not load model (no success): Unexpected err=GatedRepoError('401 Client Error. (Request ID: Root=1-67ff6f8d-71d5575729dabf4c417800f6;0a8a3758-810b-4bb2-8368-6b681bc4a6bb)\n\nCannot access gated repo for url https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/model_index.json.\\nAccess to model black-forest-labs/FLUX.1-schnell is restricted. You must have access to it and be authenticated to access it. Please log in.'), type(err)=<class 'huggingface_hub.errors.GatedRepoError'>"
Additional context
I am using the Docker image localai/localai:latest-aio-gpu-nvidia-cuda-12.