Pixel Polyglots: Pronunciation Enhancement in Online Language Learning
Language learning applications like Duolingo and Babbel have catalyzed a digital revolution, yet a critical gap persists in effectively teaching pronunciation and speech. As linguists emphasize, conversing with native speakers is one of the most effective paths to fluency, but the absence of comprehensive speech-visualization tools limits the immersive experience many learners seek. This gap motivates our solution: a service that uses AI-generated deepfake avatars to give users a realistic visualization of themselves speaking their target language, processed directly on their mobile devices with minimal GPU usage.
The underlying talking-head generation follows SadTalker, which animates a single portrait image from driving audio using three main components. The first, ExpNet, takes the audio features and generates realistic facial expression coefficients for a 3D face model (3DMM) over time. It is trained using a distillation loss from a pretrained lip-sync model, a landmark loss on rendered faces, and a lip-reading loss.
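The sketch below is not the ExpNet implementation, only a minimal illustration of the idea: a small PyTorch model (the class name `AudioToExpression`, the 80-dim mel features, the 64 expression coefficients, and all layer sizes are assumptions) that maps per-frame audio features to per-frame 3DMM expression coefficients. The training losses are noted in comments but omitted.

```python
# Minimal sketch of an audio-to-expression network (not the SadTalker code).
# It maps a sequence of audio features (e.g., mel-spectrogram frames) to
# per-frame 3DMM expression coefficients.
import torch
import torch.nn as nn

class AudioToExpression(nn.Module):
    def __init__(self, audio_dim=80, hidden_dim=256, exp_dim=64):
        super().__init__()
        # Frame-wise audio encoder
        self.encoder = nn.Sequential(
            nn.Linear(audio_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Temporal model to keep coefficients smooth across frames
        self.temporal = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Decoder to 3DMM expression coefficients
        self.decoder = nn.Linear(hidden_dim, exp_dim)

    def forward(self, audio_feats):
        # audio_feats: (batch, num_frames, audio_dim)
        h = self.encoder(audio_feats)
        h, _ = self.temporal(h)
        return self.decoder(h)  # (batch, num_frames, exp_dim)

# Example: 2 clips, 25 frames each, 80-dim audio features per frame
model = AudioToExpression()
coeffs = model(torch.randn(2, 25, 80))
print(coeffs.shape)  # torch.Size([2, 25, 64])

# Training (sketch only): the predicted coefficients would be supervised with
# (1) a distillation target from a pretrained lip-sync model,
# (2) a landmark loss on faces rendered from the coefficients, and
# (3) a lip-reading loss; all three are omitted here.
```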
The second component, PoseVAE, is a conditional variational autoencoder that takes the audio and an identity code as input and outputs a diverse sequence of head-pose coefficients over time. It is trained using reconstruction, KL-divergence, and adversarial (GAN) losses.
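Again as a rough sketch rather than the actual PoseVAE (the class name, the 6-dim pose output, and all feature sizes are assumptions), the conditional VAE below shows the basic mechanism: encode pose together with audio and identity conditioning, sample a latent via the reparameterization trick, and at inference sample from the prior to produce diverse pose sequences.

```python
import torch
import torch.nn as nn

class PoseVAESketch(nn.Module):
    """Conditional VAE sketch: audio features + identity code -> head-pose
    coefficients (here assumed to be rotation + translation = 6 per frame)."""
    def __init__(self, audio_dim=80, id_dim=16, pose_dim=6, latent_dim=32, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(pose_dim + audio_dim + id_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + audio_dim + id_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim))

    def forward(self, pose, audio, identity):
        h = self.enc(torch.cat([pose, audio, identity], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.dec(torch.cat([z, audio, identity], dim=-1))
        return recon, mu, logvar

    @torch.no_grad()
    def sample(self, audio, identity):
        # At inference, sample z from the prior to get diverse head poses
        z = torch.randn(audio.shape[0], audio.shape[1], self.to_mu.out_features)
        return self.dec(torch.cat([z, audio, identity], dim=-1))

# Example usage with random tensors (2 clips, 25 frames)
vae = PoseVAESketch()
audio = torch.randn(2, 25, 80)
ident = torch.randn(2, 1, 16).expand(2, 25, 16)
recon, mu, logvar = vae(torch.randn(2, 25, 6), audio, ident)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
poses = vae.sample(audio, ident)  # (2, 25, 6)
```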
The third component, a 3D-aware face render, maps the generated 3DMM coefficients to an unsupervised space of facial keypoints. It then uses warping and blending of the source image to generate the final talking-head video that matches those coefficients.
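The following is a simplified illustration, not the actual renderer: a small mapping network turns motion coefficients into a set of unsupervised 2D keypoints, and a `grid_sample`-based helper warps a source image with a dense flow field. In the real module the flow would be estimated from the source and driving keypoints; here it is left as a zero placeholder, and all names and sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoeffToKeypoints(nn.Module):
    """Sketch: map 3DMM motion coefficients (expression + pose) to K
    unsupervised 2D keypoints in normalized [-1, 1] image coordinates."""
    def __init__(self, coeff_dim=70, num_kp=15, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(coeff_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_kp * 2), nn.Tanh())
        self.num_kp = num_kp

    def forward(self, coeffs):
        return self.net(coeffs).view(-1, self.num_kp, 2)

def warp_image(src, flow):
    """Warp a source image with a dense flow field via bilinear sampling.
    src: (B, C, H, W); flow: (B, H, W, 2) offsets in normalized coords."""
    B, _, H, W = src.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, H, W, 2)
    return F.grid_sample(src, base_grid + flow, align_corners=True)

# Example: keypoints are predicted from coefficients; the flow is a zero
# placeholder, whereas a real renderer would derive it from the difference
# between source and driving keypoints before blending the result.
mapper = CoeffToKeypoints()
kp = mapper(torch.randn(1, 70))
warped = warp_image(torch.randn(1, 3, 64, 64), torch.zeros(1, 64, 64, 2))
print(kp.shape, warped.shape)  # (1, 15, 2) and (1, 3, 64, 64)
```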
SadTalker struggles to represent eye and teeth variation because of limitations in the 3D Morphable Model (3DMM) it relies on: it can fail to capture facial landmarks accurately and treat expressive input images as neutral, which leads to distorted video. To address this, our method gains finer control over image style and facial features by manipulating latent codes in a redesigned generator architecture. Disentanglement in the intermediate latent space improves this control and makes it possible to correct specific attributes, including facial expressions.
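As a toy illustration of latent-code manipulation, not tied to the specific generator used here, the snippet below moves a latent code along a hypothetical attribute direction in a disentangled latent space; the 512-dim code, the direction vector, and the `edit_latent` helper are all assumptions, and in practice such directions are typically found with attribute classifiers or PCA on sampled latents.

```python
import torch

def edit_latent(w, direction, strength=1.5):
    """Move a latent code w along a normalized attribute direction,
    e.g. one that opens the eyes or adjusts the mouth expression."""
    direction = direction / direction.norm()
    return w + strength * direction

# Hypothetical 512-dim latent code and a learned attribute direction
w = torch.randn(1, 512)
eye_open_direction = torch.randn(512)  # placeholder for a learned direction
w_edited = edit_latent(w, eye_open_direction)
print(w_edited.shape)  # torch.Size([1, 512])
```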
Made with ❤️ by:
- Saket Pradhan
- Kanishka Gabel
- Srushti Hippargi
- Shrey Shah