Enhanced Web Interface built upon the official FLOAT implementation
Developed from the original FLOAT research implementation with significant usability improvements:
- **Intuitive Web Interface**
  - Drag-and-drop file uploads
  - Real-time previews
  - One-click generation
- **Simplified Workflow**
  - Automatic file handling
  - Clean output management
  - Progress indicators
- **Extended Accessibility**
  - Public sharing option (`--share`)
  - Custom server configuration (`--port`, `--server`); see the launch sketch below
  - Mobile-friendly design
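The `--share`, `--port`, and `--server` options suggest a Gradio-style launcher. The snippet below is a minimal, hypothetical sketch of how such flags could be forwarded to a Gradio `launch()` call; it is not the actual `app.py`.

```python
# Hypothetical sketch only: forwards --share/--port/--server style flags
# to a Gradio launch call. The real app.py may be organized differently.
import argparse

import gradio as gr


def build_demo() -> gr.Blocks:
    # Placeholder UI; the real interface exposes image/audio uploads,
    # emotion controls, and a "Generate" button.
    with gr.Blocks() as demo:
        gr.Markdown("FLOAT Web UI (placeholder)")
    return demo


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--share", action="store_true", help="create a public Gradio link")
    parser.add_argument("--port", type=int, default=7860, help="local server port")
    parser.add_argument("--server", type=str, default="0.0.0.0", help="bind address")
    args = parser.parse_args()

    build_demo().launch(share=args.share, server_port=args.port, server_name=args.server)
```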
```bash
# 1. Create Conda environment
conda create -n float-web-ui python=3.8.5
conda activate float-web-ui

# 2. Install torch and requirements
sh environments.sh

# or install manually
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```

- Tested on Linux with an RTX 4060 Ti (16 GB VRAM) and 32 GB RAM.
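After installation, an optional sanity check (not part of the repo) can confirm that the pinned builds are installed and the GPU is visible:

```python
# Optional post-install check: confirm the pinned PyTorch builds and GPU access.
import torch
import torchvision
import torchaudio

print("torch:", torch.__version__)              # expected: 2.0.1+cu118
print("torchvision:", torchvision.__version__)  # expected: 0.15.2+cu118
print("torchaudio:", torchaudio.__version__)    # expected: 2.0.2+cu118
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```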
- Download the checkpoints automatically:

  ```bash
  sh download_checkpoints.sh
  ```

  or download the checkpoints manually from the linked Google Drive.

- The checkpoints should be organized as follows:

  ```
  ./checkpoints
  |-- checkpoints_here
  |-- float.pth                                       # main model
  |-- wav2vec2-base-960h/                             # audio encoder
  |   |-- .gitattributes
  |   |-- config.json
  |   |-- feature_extractor_config.json
  |   |-- model.safetensors
  |   |-- preprocessor_config.json
  |   |-- pytorch_model.bin
  |   |-- README.md
  |   |-- special_tokens_map.json
  |   |-- tf_model.h5
  |   |-- tokenizer_config.json
  |   `-- vocab.json
  `-- wav2vec-english-speech-emotion-recognition/     # emotion encoder
      |-- .gitattributes
      |-- config.json
      |-- preprocessor_config.json
      |-- pytorch_model.bin
      |-- README.md
      `-- training_args.bin
  ```

- The wav2vec-based models can also be found at these links: wav2vec2-base-960h and wav2vec-english-speech-emotion-recognition.
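A small helper (not part of the repo) can verify that the layout above is in place before launching anything:

```python
# Ad-hoc check (not part of the repo): verify the checkpoint layout shown above.
from pathlib import Path

required = [
    "checkpoints/float.pth",
    "checkpoints/wav2vec2-base-960h/config.json",
    "checkpoints/wav2vec2-base-960h/pytorch_model.bin",
    "checkpoints/wav2vec-english-speech-emotion-recognition/config.json",
    "checkpoints/wav2vec-english-speech-emotion-recognition/pytorch_model.bin",
]

missing = [p for p in required if not Path(p).is_file()]
if missing:
    print("Missing checkpoint files:")
    for p in missing:
        print("  -", p)
else:
    print("All expected checkpoint files are present.")
```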
- Prepare Inputs:
  - Image: front-facing portrait (512×512 recommended)
  - Audio: clean speech (WAV format, 16 kHz recommended)
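  A minimal preprocessing sketch (assumptions: Pillow is available, torchaudio comes from the setup above, and `portrait.jpg` / `speech_raw.wav` are placeholder file names) that brings arbitrary inputs to the recommended formats:

  ```python
  # Preprocessing sketch (assumptions: Pillow installed, torchaudio from the setup above;
  # "portrait.jpg" and "speech_raw.wav" are placeholder input files).
  from PIL import Image
  import torchaudio
  import torchaudio.functional as F

  # Image: center-crop to a square, then resize to 512x512.
  img = Image.open("portrait.jpg").convert("RGB")
  side = min(img.size)
  left, top = (img.width - side) // 2, (img.height - side) // 2
  img.crop((left, top, left + side, top + side)).resize((512, 512)).save("image.png")

  # Audio: downmix to mono and resample to 16 kHz, saved as WAV.
  wav, sr = torchaudio.load("speech_raw.wav")
  wav = wav.mean(dim=0, keepdim=True)
  wav = F.resample(wav, orig_freq=sr, new_freq=16000)
  torchaudio.save("audio.wav", wav, 16000)
  ```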
- Web Interface:

  ```bash
  python app.py --port 7860 --share
  ```

  1. Drag & drop your files
  2. Select emotion and intensity
  3. Click "Generate"
- Command Line:

  ```bash
  python generate.py \
      --ref_path image.png \
      --aud_path audio.wav \
      --emo happy \
      --e_cfg_scale 5
  ```
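For many clips, the same call can be scripted. The sketch below is not part of the repo; `inputs/` and `image.png` are placeholders, and only the flags documented above are used:

```python
# Batch sketch: loop over a folder of WAV files and call generate.py for each one.
# Assumptions: "inputs/" holds the audio clips and "image.png" is the reference portrait.
import subprocess
from pathlib import Path

ref_image = "image.png"
for audio in sorted(Path("inputs").glob("*.wav")):
    print("Generating for", audio.name)
    subprocess.run(
        [
            "python", "generate.py",
            "--ref_path", ref_image,
            "--aud_path", str(audio),
            "--emo", "happy",
            "--e_cfg_scale", "5",
        ],
        check=True,
    )
```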
**Advanced Options**

| Parameter | Description | Recommended |
| --- | --- | --- |
| `--a_cfg_scale` | Audio influence (1-10) | 2-3 |
| `--e_cfg_scale` | Emotion intensity (1-10) | 5-7 for a strong effect |
| `--no_crop` | Disable automatic face cropping | Only for pre-cropped images |
| `--seed` | Random seed | 15-100 |

**Pro Tips:**
- Use `--emo neutral` for subtle lip-sync only.
- For musical audio, extract the vocals first.
- Higher `--e_cfg_scale` values (8-10) create dramatic expressions.
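To see how emotion intensity changes the result, one option is to sweep `--e_cfg_scale` over a few values and compare the outputs. This sketch is not part of the repo; it uses only the flags documented above, with a fixed `--seed` to keep the runs comparable:

```python
# Sweep sketch: compare emotion intensities using only the documented flags.
# A fixed --seed keeps runs comparable; file names are placeholders.
import subprocess

for scale in (5, 7, 10):  # moderate -> strong -> dramatic (per the tips above)
    print(f"Running with --e_cfg_scale {scale}")
    subprocess.run(
        [
            "python", "generate.py",
            "--ref_path", "image.png",
            "--aud_path", "audio.wav",
            "--emo", "happy",
            "--e_cfg_scale", str(scale),
            "--seed", "15",
        ],
        check=True,
    )
```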
