Follow the main developer's stream for live R&D:
| Status | License | Contribution |
|---|---|---|
| In-Development | LGPL-3.0 | Welcomes Contributions |
The ALIZARIN Engine is currently in Phase 1: Foundation & Data Generation. This project is being developed live on stream as a public-facing R&D process.
The engine's first voice, its mascot and proof of concept, is being developed under the name ALIZARIN, sharing its name with the engine.
The ALIZARIN Engine is a high-level, LGPLv3-licensed voice synthesis framework designed for generating multiple voice products (real-time TTS, AI singing, and concatenative singing) from a single, 100% synthetic source. This initiative is inspired by the pioneering work of missile_39's Adachi Rei, but takes a different approach to the "no-human-inside" concept: a hybrid method that combines modern latent-space AI with algorithmic formant synthesis reminiscent of classic systems such as the Votrax SC-01 and the Tandy TRS-80 Voice Synthesizer built around it.
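To make the "algorithmic formant synthesis" half of that hybrid concrete, here is a minimal sketch of classic parallel-formant synthesis in Pyo. The source/filter layout and the textbook /a/ vowel formant frequencies are illustrative assumptions; the actual ALIZARIN formant scripts are the project's own and are not shown here.

```python
# Minimal parallel-formant synthesis sketch in Pyo (illustrative only).
# A band-limited impulse train stands in for a glottal "buzz" source,
# and three parallel band-pass filters carve out the vowel formants.
from pyo import Server, Blit, Biquad

s = Server().boot()

source = Blit(freq=110, harms=40, mul=0.3)   # buzzy glottal-like source

# Textbook formant frequencies for an /a/ vowel (type=2 is band-pass).
f1 = Biquad(source, freq=730, q=5, type=2)
f2 = Biquad(source, freq=1090, q=5, type=2)
f3 = Biquad(source, freq=2440, q=5, type=2)

vowel = (f1 + f2 + f3).out()                 # sum the bands to the output
s.gui(locals())                              # keep the server running
```

Sweeping each filter's `freq` between vowel targets over time is what turns this static buzz into speech-like output.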
The name is an acronym for the core components of its design:
- Augmented (The layered texture system)
- Latent (The core AI representation)
- Intonation (The quality of the speech output)
- Synthesizer (The engine's primary function)
- Algorithm (The base Pyo formant system)
- Recursive (The neural network's processing method)
- Inference (The process of generating audio)
- Network (The foundational neural architecture)
The ALIZARIN Engine is an open-source, community-focused project with a specific set of goals:
- Unique, Non-Human Voices: The engine's core philosophy is "no human inside." It is designed to create voices from the ground up using algorithmic and synthetic sources, allowing for truly unique, non-human character identities like missile_39's Adachi Rei voice. I aim not to replace or copy their work, but to pay tribute to it by adding new, unique-sounding voice models and styles to the archetype their models established.
- Ethical Commercial Voices: This framework provides a high-quality, free, and open-source path to commercial voices without the need to record, clone, or "steal" an existing human voice.
- Protect Voice Actors: The synthetic nature of the voice makes it identifiable as non-human, reducing the chance that it is used for fraud or to replace an existing voice actor. By offering a creative alternative to unauthorized voice clones, the engine promotes ethical behavior that I hope many creators will adopt.
Goal: To create an "all-encompassing" voice solution, pairing modern TTS with expressive singers (DiffSinger, UTAU) to allow creators to build complete character identities with consistent voices across delivery methods.
This project is built on an ideology of community contribution and creator freedom. We use the LGPLv3 License to achieve a "best-of-both-worlds" scenario:
- A Free, Shared Community Resource: The ALIZARIN Engine framework is (and always will be) open-source. Any modifications or improvements made to the engine framework itself must be shared back with the community.
- Allowance for Unique, Private Voices: The LGPLv3 license draws a "bright line" between "The Library" (our engine) and "A Work that Uses The Library" (your voice). Your "secret sauce" (formant scripts, texture kits, training data) remains 100% your own property and does not need to be shared, allowing unique, privately licensed voices to be built on a free and growing community ecosystem.
The following points are not requirements or conditions of the LGPLv3 license; in fact, the license does not allow us to impose them as binding restrictions. They are our sincere requests to you, the user, to foster a positive and ethical community environment.
- A Request from the Creator (Sharing): We would love for you to share your public creations with the community and the ALIZARIN Engine team. If your voice is posted publicly, please consider sending us a link and using #ALIZARIN on your social media posts to tag us!
- A Request from the Creator (Ethical Usage): We ask all creators using this framework to voluntarily adopt the following minimal restrictions for any voice derived from the ALIZARIN Engine:
- No Hateful Content: Do not use the voice to create or distribute hate speech, harassment, severe threats, or content promoting violence or illegal acts.
- No Impersonation or Fraud: Do not use the voice to impersonate identifiable individuals, especially for the purpose of financial fraud, manipulation, or unauthorized commercial exploitation.
- Transparent Disclosure: Disclose clearly that the voice used in any public-facing content is synthetic and was created by a computational process.
This section summarizes the project's core documentation. For full details, please refer to the dedicated pages:
- VOICE EULA & GUIDELINES (USAGE_GUIDE.md): (READ FIRST) The full EULA, IP rules for the primary ALIZARIN voice, and the special usage request form.
- MULTILINGUAL_RVC_HYBRID_GUIDE.md: The official guide for creating voicebanks in other languages. This is the only potentially permitted exception to our "no-AI-training" rule.
- CREATOR'S GUIDE: Detailed suggestions, Q&A, and best practices for creating and protecting your own voice and character IP.
- THIRD-PARTY LICENSES: A full list of the open-source software that powers this engine and their licenses.
This section details the rules governing the use of the audio files and voice data for the primary ALIZARIN voice, alongside the minimum suggested guidelines we ask users to adopt for all community-created voices.
| Rule Category | ALIZARIN Voice Requirements | ALIZARIN Minimum Suggested Guidelines |
|---|---|---|
| Commercial Music Use | Allowed (Monetization permitted) | Allowed (Monetization permitted) |
| Hate Speech/Slander | Strictly Prohibited and enforced. | Strictly Prohibited. |
| Impersonation/Fraud | Strictly Prohibited and enforced. | Strictly Prohibited. |
| R-18/Adult Content | Prohibited (unless explicitly permitted, see usage request form). | (Not included in minimum suggested guidelines) |
| Religious/Political Usage | Prohibited (see documentation for details). | (Not included in minimum suggested guidelines) |
| Voice Redistribution | Prohibited (Requires EULA agreement). | Prohibited. |
| Voice Alterations/Modifications | Prohibited, with Exceptions. (Prohibited modifications are detailed on the documentation page.) | (Not included in minimum suggested guidelines) |
This section details the rules governing the use of the visual art, name, and personality of the primary ALIZARIN character.
| Rule Category | ALIZARIN Character Requirements | ALIZARIN Minimum Suggested Guidelines |
|---|---|---|
| Commercial Character IP or Visual Art Use | Requires permission for attachment to a product, API, software, or similar product. (For inquiries, see the special requests form). | Prohibited for commercial use without permission from the IP holder. |
| Name/Identity Use | Requires permission. (Cannot be used as a primary brand for a product/API, for inquiries see the special requests form.) | Prohibited for use as the brand/name of a product, API, or software. |
| Hate Speech/Slander | Strictly Prohibited and enforced. | Strictly Prohibited. |
| Religious/Political Depictions | Prohibited (see documentation for details). | (Not included in minimum suggested guidelines) |
| R-18/Adult Content | Prohibited (unless explicitly permitted, see special requests form). | (Decide this at your own discretion) |
| Derivative Art/Fan Work | Allowed and Encouraged (Must adhere to ethical usage policies). | Allowed. |
| Usage Request Form | Required for all special usages (e.g., visual art, format conversion). | (Separate form is not required for community voices) |
This repository will be updated with generalized, open-source scripts (licensed LGPLv3) as development of the main "ALIZARIN" voice is completed.
- 1.1: Create the "Texture Kit" (static, storms, hums).
- 1.2: Design and build the "Formant" synthesizer algorithm in Pyo.
- 1.3: Create the "Hybrid" audio generation script (a minimal mixing sketch appears after this roadmap).
- 1.4: Generate the 30-60 minute hybrid audio training dataset.
- 2.1: Train the MeloTTS (Real-Time TTS) model for the main ALIZARIN voice.
- 2.2: Train the DiffSinger (AI Singing) model for the main ALIZARIN voice.
- 2.3: Generate and configure the OpenUTAU (Concatenative) voicebank for the main ALIZARIN voice.
- 2.4: (Internal) Train an RVC model for personal use.
- 2.5: [Community Base Model Goal] Generate untextured training data for the High-Pitch and Low-Pitch Community Base Models.
- 2.6: [Conditional Goal] Train and Release Open-Source Community Base Models (TTS/DiffSinger) (IF CROWDFUNDING GOAL MET).
- 3.1: Build and test the public-facing commercial TTS API.
- 3.2: Build and test the internal, low-latency streaming API (based on the acrylicc concept; a `.ust`-generation sketch also follows this roadmap).
- 3.3: Create the "live-sing" DiffSinger & UTAU endpoints for on-stream use.
- 4.1: Draft the Voicebank EULA and Character License.
- 4.2: Finalize packaging and distribution plan.
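As referenced in item 1.3 above, the "hybrid" step layers the texture kit under the dry formant output. The sketch below shows one plausible shape for that script using NumPy and the `soundfile` package; the filenames, mix levels, and mono assumption are all illustrative, not the project's actual code.

```python
# Hypothetical Phase 1 hybrid-mixing sketch: layer a texture-kit bed
# (static/storms/hums) under a dry formant-synth phrase. Assumes mono
# files at the same sample rate; all names and gains are illustrative.
import numpy as np
import soundfile as sf

voice, sr = sf.read("formant_line_001.wav")   # dry formant-synth phrase
texture, tsr = sf.read("texture_static.wav")  # one texture-kit layer
assert sr == tsr, "resample the texture to match the voice first"

# Tile or trim the texture bed so it spans the whole phrase.
reps = int(np.ceil(len(voice) / len(texture)))
bed = np.tile(texture, reps)[: len(voice)]

# Keep the texture well below the voice so phonemes stay intelligible,
# then peak-normalize to avoid clipping.
hybrid = 0.9 * voice + 0.15 * bed
hybrid /= max(1.0, np.abs(hybrid).max())

sf.write("hybrid_line_001.wav", hybrid, sr)
```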
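For items 3.2 and 3.3, the internal streaming path ultimately has to hand OpenUTAU something it can render. Below is a hedged sketch of the `.ust`-generating idea: converting a list of syllables into a minimal UST file. The tempo, note length, pitch, and settings keys are illustrative defaults from the classic UST format, not the internal script.

```python
# Hypothetical .ust-generation sketch: one note per syllable, written in
# a minimal classic-UST layout that OpenUTAU can import. Values below
# (tempo, note length, pitch) are illustrative defaults.
def write_ust(syllables, path, tempo=120.0, note_num=60, length=480):
    lines = ["[#VERSION]", "UST Version1.2",
             "[#SETTING]", f"Tempo={tempo}", "Tracks=1", "Mode2=True"]
    for i, lyric in enumerate(syllables):
        lines += [f"[#{i:04d}]", f"Length={length}",
                  f"Lyric={lyric}", f"NoteNum={note_num}"]
    lines.append("[#TRACKEND]")
    # Classic UTAU tools expect Shift-JIS; OpenUTAU also reads UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

write_ust(["a", "li", "za", "rin"], "line.ust")
```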
The following goals represent tangible product releases and services that will be delivered upon successful funding.
Please Note: These goal amounts are early estimates based on current server costs and R&D requirements. They are subject to change if we encounter unforeseen technical difficulties or significant shifts in cloud-computing costs, and we will be 100% transparent with the community about any adjustments. The funding almost entirely covers training costs and allocates very little to development time; this project is a work of passion, and development will still likely run at a heavy loss if we meet two goals or fewer.
| Goal | Description | Cost | Funded |
|---|---|---|---|
| Upgrade 1: Finetune Duo Voice Pack | HARD GOAL (Singing/Data): Fund the complete process of generating, training, and releasing the two pre-trained DiffSinger Community Base Models and their raw, untextured synthetic training data. This includes all required server time and R&D hours to train two models from scratch. | $750 | ❌ |
| Upgrade 2: Finetune Duo TTS Pack | HARD GOAL (Speaking/TTS): Fund the process of generating TTS-specific datasets and fine-tuning the base MeloTTS model with the new speakers. This results in the release of the two pre-trained MeloTTS Community Base Models, making custom TTS creation easier for users. | $900 | ❌ |
| Upgrade 3: Public RVC Sandbox | Release a specialized Python/Colab notebook for easy, localized, fine-tuning of the base models into a real-time RVC voice changer. | $1050 | ❌ |
| Upgrade 4: "ALIZARIN Streamer" TTS App | Develop and release a standalone desktop application (.exe) for streamers. This tool will provide a user-friendly GUI for the TTS engine, connecting to Twitch (for bits/redeems/etc.) and allowing users to plug in the two Community Models or their own custom-trained ALIZARIN voices. | $1300 | ❌ |
These features rely on collaboration and linguistic expertise from the community, as the creator only speaks English.
| Goal | Description |
|---|---|
| Multilingual Phonemizer Integration | Integrate, test, and validate code to support additional languages (e.g., Japanese/Spanish) in the ALIZARIN Engine. Requires collaboration with native speakers for linguistic validation and accent refinement (a short sketch follows this table). |
| Advanced Accent/Pronunciation Support | Research and develop methods to incorporate slight characterization or accent without losing the non-human synthetic core. |
| Non-English Language Pack Integration | Integrate and validate open-source phonemizer tools for non-English languages (requires external linguistic testing and time). |
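As a sketch of what the phonemizer integration mentioned above might look like, the snippet below uses the open-source `phonemizer` package with its espeak-ng backend. Which tool the engine actually adopts per language is still an open design question.

```python
# Hedged sketch: grapheme-to-phoneme conversion with the `phonemizer`
# package (espeak-ng backend). The engine's final per-language tooling
# is undecided; this only illustrates the integration point.
from phonemizer import phonemize

print(phonemize("alizarin sings", language="en-us", backend="espeak", strip=True))
print(phonemize("alizarina canta", language="es", backend="espeak", strip=True))
```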
These are complex, long-term experimental projects that will be prioritized after the main ALIZARIN voice is successfully delivered and its core technology is stable.
| Goal | Description |
|---|---|
| Consenting Voice Actor Module | Develop the workflow and legal framework for co-owning hybrid voices made by mixing synthetic audio with small human voice samples. The goal is to provide slight characterization and clarity in pronunciation or accent as an alternative option while strictly maintaining the non-human sound and personality of the final voice. |
| Hybrid Data Training & Automation Tool | Research and develop a tool to automate the mixing of the current synthetic output (Formant + Textures) with additional, specialized synthetic elements (like Adachi Rei's pure sine/buzz sounds). The goal is to create novel training sets that improve tone and quality by precisely blending these core synthetic sources. |
| Initial Paper & Methodology Release | Publish a detailed paper/blog post explaining the full technical and ethical methodology of the ALIZARIN Engine. This serves as an official output goal to establish academic recognition and attract contributors. |
The ALIZARIN Engine is not a single piece of software, but an integrated pipeline of open-source tools; a minimal sketch of wiring two of them together follows the table.
| Component | Technology | Purpose |
|---|---|---|
| Base Voice Synthesis | Pyo | (LGPLv3) A Python DSP library for building the "formant" synthesizer. |
| Real-Time TTS | MeloTTS | (Apache 2.0) The engine for the real-time, low-latency TTS. |
| AI Singing Voice | DiffSinger (Amphion) | (MIT) The engine for the high-quality, expressive AI singing model. |
| Concatenative Singing | OpenUTAU | (MIT) The editor and platform for the "classic" UTAU voicebank. |
| Concatenative Engine | WavRS | (MIT) The real-time UTAU renderer server for TTS/live-singing. |
| Internal API Blueprint | speech-to-text-to-teto | (No License) Inspiration and blueprint for the internal `.ust`-generating script. |
| API & Deployment | FastAPI | (MIT) The framework for building all public (commercial) and internal (streaming) API endpoints. |
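As a rough illustration of how two of these components connect, the sketch below wraps MeloTTS in a FastAPI endpoint. The route name, speaker key, and single-file output are assumptions for illustration, not the project's actual API design.

```python
# Hypothetical sketch: a minimal FastAPI endpoint wrapping MeloTTS.
# Route, speaker key, and file handling are illustrative assumptions.
from fastapi import FastAPI
from fastapi.responses import FileResponse
from melo.api import TTS

app = FastAPI()
model = TTS(language="EN", device="cpu")      # load the model once at startup
speaker_id = model.hps.data.spk2id["EN-US"]   # one of MeloTTS's stock speakers

@app.get("/tts")
def tts(text: str, speed: float = 1.0):
    out_path = "out.wav"                      # use per-request temp files in practice
    model.tts_to_file(text, speaker_id, out_path, speed=speed)
    return FileResponse(out_path, media_type="audio/wav")
```

Run with `uvicorn app:app` (assuming the file is `app.py`) and query `/tts?text=hello`.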
This project stands on the shoulders of many contributors. We wish to extend our deepest thanks to:
- missile_39 and the RepliVoice Team: For creating the pioneering Adachi Rei voice. Their successful development of a fully synthetic voice and commercialization of the RepliVoice technology inspired the core "no-human-inside" philosophy of this engine. The ALIZARIN Engine is designed to be an original work that builds upon this concept, not a remake or copy of the distinct qualities of the Adachi Rei voice. They are a great inspiration and motivation to me.
- The Developers of DiffSinger: We specifically thank Jinglin Liu (MoonInTheRiver) and the research team for creating DiffSinger, the technology at the heart of our AI singing product.
- Original Author's Repository: MoonInTheRiver/DiffSinger
- Toolkit Implementation: We utilize the DiffSinger implementation provided by the Amphion toolkit.
- acrylicc: For their `speech-to-text-to-teto` project, which provided the conceptual blueprint for the real-time UTAU-TTS/singing API.
- stakira and the OpenUTAU community: For creating `OpenUTAU` and `WavRS`, which remain the heart of the concatenative synthesis world.
- The Developers of Pyo: For creating the powerful DSP tools that make the formant engine possible.
- The Developers of MeloTTS: For providing a high-quality, commercially-permissive TTS model.