14 changes: 8 additions & 6 deletions README.md
@@ -31,6 +31,8 @@
<a href="https://github.com/existence-master/Sentient/issues/">Report Bug</a>
<span> · </span>
<a href="https://github.com/existence-master/Sentient/issues/">Request Feature</a>
<span> · </span>
<a href="https://www.youtube.com/watch?v=l481bvpCjbc">Watch our Ad!</a>
</h4>
</div>

@@ -75,27 +77,27 @@ We at [Existence](https://existence.technology) believe that AI won't simply die
### :camera: Screenshots

<div align="center">
<img src="https://i.postimg.cc/jqNX99VF/image.png" alt="screenshot" />
<p align="center">Context is streamed in from your apps - Sentient uses this context to 👇</p>
</div>
<div align="center">
<img src="https://i.postimg.cc/FRVMVKxj/image.png" alt="screenshot" />
<p align="center">Learn Long-Term Memories about you</p>
</div>
<div align="center">
<img src="https://i.postimg.cc/hth7Fzzt/image.png" alt="screenshot" />
<p align="center">Learn Short-Term Memories about you</p>
</div>
<div align="center">
<img src="https://i.postimg.cc/FFM9FYBK/image.png" alt="screenshot" />
  <p align="center">Perform actions for you asynchronously, combining all the tools it needs to complete a task.</p>
</div>
<div align="center">
<img src="https://i.postimg.cc/TPpSW9yv/image.png" alt="screenshot" />
<p align="center">You can also voice-call Sentient anytime for a low-latency, human-like interactive experience.</p>
</div>
<div align="center">
<img src="https://i.postimg.cc/tJSWPhZ8/image.png" alt="screenshot" />
<p align="center">Your profile can also be enriched with data from other social media sites.</p>
</div>

124 changes: 81 additions & 43 deletions src/server/tests/test_orpheus.py
@@ -137,7 +137,7 @@ def run_async():
from llama_cpp import Llama

# Path to the GGUF model file (update this to match your setup)
MODEL_PATH = "../voice/models/orpheus-3b-0.1-ft-q4_k_m.gguf"

# Number of layers to offload to GPU (adjust based on your GPU memory, e.g., 30 for 8GB VRAM)
N_GPU_LAYERS = 20
@@ -161,6 +161,23 @@ def run_async():
END_TOKEN_IDS = [128009, 128260, 128261, 128257]
CUSTOM_TOKEN_PREFIX = "<custom_token_"
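The `CUSTOM_TOKEN_PREFIX` constant suggests the model emits audio codes as strings like `<custom_token_1234>`. A minimal, hypothetical sketch of pulling the numeric ID back out of such a string (the real decoding logic lives further down in this script and is not shown in this diff):

```python
# Hypothetical helper, not part of the PR: extract the integer ID from a
# token string such as "<custom_token_1234>"; returns None for anything else.
CUSTOM_TOKEN_PREFIX = "<custom_token_"

def parse_custom_token(piece):
    if piece.startswith(CUSTOM_TOKEN_PREFIX) and piece.endswith(">"):
        digits = piece[len(CUSTOM_TOKEN_PREFIX):-1]
        if digits.isdigit():
            return int(digits)
    return None

print(parse_custom_token("<custom_token_1234>"))  # → 1234
print(parse_custom_token("hello"))                # → None
```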

# Default text to be spoken if no text is provided
DEFAULT_TEXT = "This is a default sentence."
BATCH_SENTENCES = [
    "Good morning Kabeer!",
    "You've got a busy day ahead.",
    "Meetings, presentations and even a night out with the boys! <chuckle>",
    "You ready to crush this?",
]

def create_filename(sentence, max_words=3, max_length=50):
    """Build a filesystem-safe .wav filename from the first words of a sentence."""
    words = sentence.split()[:max_words]
    base = "_".join(words)
    safe_base = "".join(c for c in base if c.isalnum() or c in ("_", "-"))
    return safe_base[:max_length] + ".wav"
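For reference, the filename helper behaves like this (redefined here so the snippet runs standalone):

```python
def create_filename(sentence, max_words=3, max_length=50):
    """Build a filesystem-safe .wav filename from the first words of a sentence."""
    words = sentence.split()[:max_words]
    base = "_".join(words)
    safe_base = "".join(c for c in base if c.isalnum() or c in ("_", "-"))
    return safe_base[:max_length] + ".wav"

print(create_filename("Good morning Kabeer!"))          # → Good_morning_Kabeer.wav
print(create_filename("You've got a busy day ahead."))  # → Youve_got_a.wav
```

Note that punctuation is stripped, so two sentences that differ only in punctuation can collide on the same filename.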

def format_prompt(prompt, voice=DEFAULT_VOICE):
    """Format prompt for Orpheus model with voice prefix and special tokens."""
    if voice not in AVAILABLE_VOICES:
@@ -351,55 +368,76 @@ def list_available_voices():
    print("<laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>")

def main():
    # Parse command-line arguments
    parser = argparse.ArgumentParser(description="Generate speech from text.")
    parser.add_argument("text", nargs="*", help="Text to convert to speech")
    parser.add_argument("--voice", default=DEFAULT_VOICE, help=f"Voice to use (default: {DEFAULT_VOICE})")
    parser.add_argument("--output", help="Output WAV file, or directory in batch mode")
    parser.add_argument("--batch", action="store_true", help="Process the predefined batch of sentences")
    parser.add_argument("--temperature", type=float, default=TEMPERATURE, help="Temperature for generation")
    parser.add_argument("--top_p", type=float, default=TOP_P, help="Top-p sampling parameter")
    parser.add_argument("--repetition_penalty", type=float, default=REPETITION_PENALTY,
                        help="Repetition penalty (>=1.1 required for stable generation)")
    parser.add_argument("--list-voices", action="store_true", help="List available voices")

    args = parser.parse_args()

    if args.list_voices:
        list_available_voices()
        return

    if args.batch:
        # Batch mode: synthesize each predefined sentence into its own file
        batch_dir = args.output if args.output else "outputs"
        os.makedirs(batch_dir, exist_ok=True)

        for sentence in BATCH_SENTENCES:
            filename = create_filename(sentence)
            output_file = os.path.join(batch_dir, filename)
            print(f"Generating audio for: {sentence}")
            start_time = time.time()
            generate_speech_from_api(
                prompt=sentence,
                voice=args.voice,
                temperature=args.temperature,
                top_p=args.top_p,
                repetition_penalty=args.repetition_penalty,
                output_file=output_file,
            )
            end_time = time.time()
            print(f"Speech generation for '{sentence}' completed in {end_time - start_time:.2f} seconds")
            print(f"Audio saved to {output_file}")
    else:
        # Single-utterance mode
        if args.text:
            prompt = " ".join(args.text)
        else:
            prompt = DEFAULT_TEXT
            print(f"No text provided. Using default text: {DEFAULT_TEXT}")

        if args.output:
            output_file = args.output
        else:
            os.makedirs("outputs", exist_ok=True)
            timestamp = time.strftime("%Y%m%d_%H%M%S")
            output_file = f"outputs/{args.voice}_{timestamp}.wav"
            print(f"No output file specified. Saving to {output_file}")

        start_time = time.time()
        generate_speech_from_api(
            prompt=prompt,
            voice=args.voice,
            temperature=args.temperature,
            top_p=args.top_p,
            repetition_penalty=args.repetition_penalty,
            output_file=output_file,
        )
        end_time = time.time()
        print(f"Speech generation completed in {end_time - start_time:.2f} seconds")
        print(f"Audio saved to {output_file}")

if __name__ == "__main__":
main()
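The reworked CLI accepts free-form positional text alongside the flags. A small standalone sketch of how such a mixed invocation resolves (the parser below mirrors the script's argument surface; the voice default and test values are illustrative, not taken from the PR):

```python
import argparse

# Rebuild the argument surface to show how positional text and flags interact.
parser = argparse.ArgumentParser(description="Generate speech from text.")
parser.add_argument("text", nargs="*", help="Text to convert to speech")
parser.add_argument("--voice", default="tara")      # illustrative default
parser.add_argument("--output")
parser.add_argument("--batch", action="store_true")

# argparse consumes leading words into the nargs="*" positional,
# then picks up the optional flags that follow.
args = parser.parse_args(["Good", "morning", "--batch", "--output", "clips"])
print(" ".join(args.text), args.voice, args.batch, args.output)
# → Good morning tara True clips
```

Because the positional is `nargs="*"`, running the script with no text at all is still valid, which is what triggers the `DEFAULT_TEXT` fallback.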