feat: voice mode tts endpoint #7294
base: main
Conversation
…ove code readability and maintainability
📝 (chat-input.tsx): Add functionality to set voice assistant active state when showAudioInput is true
📝 (voice-assistant.tsx): Add functionality to set voice assistant active state and scroll to bottom when closing audio input
📝 (chat-view.tsx): Update ChatView component to consider sidebarOpen and isVoiceAssistantActive states
📝 (voiceStore.ts): Add isVoiceAssistantActive state and setIsVoiceAssistantActive function to voice store
📝 (index.ts, voice.types.ts): Update types to include sidebarOpen prop in chatViewProps and isVoiceAssistantActive state in VoiceStoreType
…ty to chat input component
🔧 (voice-button.tsx): Update voice button to set new session close voice assistant state
🔧 (sidebar-open-view.tsx): Update sidebar open view to set new session close voice assistant state
🔧 (voiceStore.ts, voice.types.ts): Add new session close voice assistant state and setter to voice store and types
…unction to clean up code and improve readability
CodSpeed Performance Report: Merging #7294 will improve performance by 58.55%.
Benchmarks breakdown
…-tts`) To optimize the provided `get_tts_config` function for better runtime performance, particularly in cases where `session_id` is not `None` but frequently not yet in the `tts_config_cache`, we can eliminate some redundant lookups and minimize dictionary access. Here is the optimized version.

Optimizations:
1. **Cache lookup**: Instead of checking for the key's existence and then retrieving or setting it, we attempt to retrieve the value directly inside a try-except block. This uses a single dictionary lookup in the common case where the session ID is found in the cache.
2. **Exception handling**: A `KeyError` handler in the `except` block covers the case where the session ID is not found in the cache.

This approach is faster due to fewer dictionary operations in the normal flow: the function performs fewer dictionary lookups under typical usage, leading to improved performance.
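As a rough illustration of the try/except pattern described above, here is a minimal sketch, assuming the module-level `tts_config_cache` dict and the `TTSConfig` class shown in the generated tests further down; the exact body merged in the PR may differ:

```python
# Hypothetical sketch of the EAFP-style lookup described above; not the actual PR diff.
def get_tts_config(session_id: str, openai_key: str) -> "TTSConfig":
    if session_id is None:
        raise ValueError("session_id cannot be None")
    try:
        # Single dictionary lookup on the common cache-hit path.
        return tts_config_cache[session_id]
    except KeyError:
        # Cache miss: build the config once and memoize it.
        config = TTSConfig(session_id, openai_key)
        tts_config_cache[session_id] = config
        return config
```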
…ter in speech creation function
🐛 (use-start-conversation.ts): update WebSocket URL to use flow_tts endpoint and add support for audio language and input audio transcription model in WebSocket session update configuration
…-tts`) To optimize the `get_tts_config` function for speed, we can reduce the number of lookups on the `tts_config_cache` dictionary and ensure that instance creation is kept minimal. Here is the revised version.

In this optimized version:
1. Removed the unnecessary variable `msg`.
2. Reduced the lookups on `tts_config_cache` by using `dict.get()`, which avoids an explicit membership check before accessing the value.
3. This approach still lazily initializes the `TTSConfig` and ensures thread safety by directly working on the cache after a single not-found check.

By minimizing the dictionary lookups and being more direct in our conditional checks, we should achieve a slight performance improvement, especially when cache misses are not very frequent.
```python
if session_id not in tts_config_cache:
    tts_config_cache[session_id] = TTSConfig(session_id, openai_key)
return tts_config_cache[session_id]
```
⚡️ Codeflash found 22% (0.22x) speedup for `get_tts_config`
⏱️ Runtime: 1.04 milliseconds → 857 microseconds (best of 5 runs)
📝 Explanation and details
To optimize the `get_tts_config` function for speed, we can reduce the number of lookups on the `tts_config_cache` dictionary and ensure that instance creation is kept minimal. Here is the revised version.

In this optimized version:

- Removed the unnecessary variable `msg`.
- Reduced the lookups on `tts_config_cache` by using `dict.get()`, which avoids an explicit membership check before accessing the value.
- This approach still lazily initializes the `TTSConfig` and ensures thread safety by directly working on the cache after a single not-found check.

By minimizing the dictionary lookups and being more direct in our conditional checks, we should achieve a slight performance improvement, especially when cache misses are not very frequent.
✅ Correctness verification report:
| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 6 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | undefined |
🌀 Generated Regression Tests Details
```python
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.api.v1.voice_mode import get_tts_config
from openai import OpenAI


# function to test
class TTSConfig:
    def __init__(self, session_id: str, openai_key: str):
        self.session_id = session_id
        self.barge_in_enabled = False
        self.default_tts_session = {
            "type": "transcription_session.update",
            "session": {
                "input_audio_format": "pcm16",
                "input_audio_transcription": {
                    "model": "gpt-4o-mini-transcribe",
                    "language": "en",
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.5,  # Placeholder value
                    "prefix_padding_ms": 300,  # Placeholder value
                    "silence_duration_ms": 500,  # Placeholder value
                },
                "input_audio_noise_reduction": {"type": "near_field"},
                "include": [],
            },
        }
        self.tts_session: dict[str, Any] = {}
        self.oai_client = OpenAI(api_key=openai_key)

    def get_session_dict(self):
        """Return a copy of the default session dictionary with current settings."""
        return dict(self.default_tts_session)

    def get_openai_client(self):
        return self.oai_client


tts_config_cache: dict[str, TTSConfig] = {}

from langflow.api.v1.voice_mode import get_tts_config


# unit tests
def test_session_id_none():
    # Test with session_id as None
    with pytest.raises(ValueError, match="session_id cannot be None"):
        get_tts_config(None, "key1")


def test_non_string_session_id():
    # Test with non-string session_id
    with pytest.raises(TypeError):
        get_tts_config(123, "key1")
    with pytest.raises(TypeError):
        get_tts_config(["list"], "key1")


def test_non_string_openai_key():
    # Test with non-string openai_key
    with pytest.raises(TypeError):
        get_tts_config("session1", 123)
    with pytest.raises(TypeError):
        get_tts_config("session1", ["list"])
```

```python
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.api.v1.voice_mode import get_tts_config

# function to test
from openai import OpenAI

# Define constants used in the TTSConfig class
SILENCE_THRESHOLD = 0.5
PREFIX_PADDING_MS = 300
SILENCE_DURATION_MS = 700


class TTSConfig:
    def __init__(self, session_id: str, openai_key: str):
        self.session_id = session_id
        self.barge_in_enabled = False
        self.default_tts_session = {
            "type": "transcription_session.update",
            "session": {
                "input_audio_format": "pcm16",
                "input_audio_transcription": {
                    "model": "gpt-4o-mini-transcribe",
                    # "prompt": "expect words in english",
                    "language": "en",
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": SILENCE_THRESHOLD,
                    "prefix_padding_ms": PREFIX_PADDING_MS,
                    "silence_duration_ms": SILENCE_DURATION_MS,
                },
                "input_audio_noise_reduction": {"type": "near_field"},
                "include": [],
            },
        }
        self.tts_session: dict[str, Any] = {}
        self.oai_client = OpenAI(api_key=openai_key)

    def get_session_dict(self):
        """Return a copy of the default session dictionary with current settings."""
        return dict(self.default_tts_session)

    def get_openai_client(self):
        return self.oai_client


tts_config_cache: dict[str, TTSConfig] = {}

from langflow.api.v1.voice_mode import get_tts_config


# unit tests
# Valid Inputs
def test_none_session_id_raises_value_error():
    with pytest.raises(ValueError, match="session_id cannot be None"):
        get_tts_config(None, "valid_key_123")
```
To test or edit this optimization locally: `git merge codeflash/optimize-pr7294-2025-03-27T14.00.20`
```python
if session_id is None:
    raise ValueError("session_id cannot be None")
# Use get with a default to reduce dictionary lookups
config = tts_config_cache.get(session_id)
if config is None:
    config = TTSConfig(session_id, openai_key)
    tts_config_cache[session_id] = config
return config
```
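For illustration only, a hypothetical call sequence showing the memoization behavior of the snippet above (the key string is a placeholder, not a real credential):

```python
first = get_tts_config("session-1", "sk-placeholder")   # cache miss: constructs a TTSConfig
second = get_tts_config("session-1", "sk-placeholder")  # cache hit: returns the cached instance
assert first is second
```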
…fig class for voice mode customization
♻️ (use-start-conversation.ts): refactor code to use "transcription_session.update" type and update session attributes based on audioSettings and audioLanguage variables
…-mode-tts`) We can optimize the given `create_event_logger` function for better runtime performance and memory usage:
1. Use the `nonlocal` keyword for state variables to avoid dictionary key access overhead.
2. Simplify the event count increment using the `+=` operator, initializing it to 0 by default.

Key changes:
- Replaced the `state` dictionary with local variables `last_event_type` and `event_count`.
- Used `nonlocal` to modify `last_event_type` and `event_count` from the inner function.
- Simplified event count initialization and increment.
```python
    Args:
        session_id: The session ID to include in log messages
    """
    state = {"last_event_type": None, "event_count": 0}
```
⚡️ Codeflash found 69% (0.69x) speedup for `create_event_logger`
⏱️ Runtime: 35.4 microseconds → 21.0 microseconds (best of 19 runs)
📝 Explanation and details
We can optimize the given `create_event_logger` function for better runtime performance and memory usage:

- Use the `nonlocal` keyword for state variables to avoid dictionary key access overhead.
- Simplify the event count increment using the `+=` operator, initializing it to 0 by default.

Key changes:

- Replaced the `state` dictionary with local variables `last_event_type` and `event_count`.
- Used `nonlocal` to modify `last_event_type` and `event_count` from the inner function.
- Simplified event count initialization and increment.
✅ Correctness verification report:
| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 14 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | undefined |
🌀 Generated Regression Tests Details
```python
from unittest.mock import patch

# imports
import pytest  # used for our unit tests
from langflow.api.v1.voice_mode import create_event_logger

# function to test
from langflow.logging import logger


# unit tests
# Basic Functionality
def test_mixed_event_types():
    with patch.object(logger, 'debug') as mock_debug:
        codeflash_output = create_event_logger("session1"); log_event = codeflash_output
        log_event({"type": "event1"}, "Client → OpenAI")
        log_event({"type": "event2"}, "Client → OpenAI")
        log_event({"type": "event1"}, "Client → OpenAI")


# Edge Cases
def test_empty_event_dictionary():
    with patch.object(logger, 'debug') as mock_debug:
        codeflash_output = create_event_logger("session1"); log_event = codeflash_output
        with pytest.raises(KeyError):
            log_event({}, "Client → OpenAI")


def test_missing_event_type_key():
    with patch.object(logger, 'debug') as mock_debug:
        codeflash_output = create_event_logger("session1"); log_event = codeflash_output
        with pytest.raises(KeyError):
            log_event({"data": "value"}, "Client → OpenAI")


def test_none_as_event():
    with patch.object(logger, 'debug') as mock_debug:
        codeflash_output = create_event_logger("session1"); log_event = codeflash_output
        with pytest.raises(TypeError):
            log_event(None, "Client → OpenAI")


def test_non_dict_event():
    with patch.object(logger, 'debug') as mock_debug:
        codeflash_output = create_event_logger("session1"); log_event = codeflash_output
        with pytest.raises(TypeError):
            log_event("event", "Client → OpenAI")


# Large Scale Test Cases
def test_high_volume_of_events():
    with patch.object(logger, 'debug') as mock_debug:
        codeflash_output = create_event_logger("session1"); log_event = codeflash_output
        for i in range(1000):
            log_event({"type": f"event{i}"}, "Client → OpenAI")


def test_high_volume_of_same_events():
    with patch.object(logger, 'debug') as mock_debug:
        codeflash_output = create_event_logger("session1"); log_event = codeflash_output
        for i in range(1000):
            log_event({"type": "event1"}, "Client → OpenAI")


# Direction Variations
def test_different_session_ids():
    with patch.object(logger, 'debug') as mock_debug:
        codeflash_output = create_event_logger("session1"); log_event1 = codeflash_output
        codeflash_output = create_event_logger("session2"); log_event2 = codeflash_output
        log_event1({"type": "event1"}, "Client → OpenAI")
        log_event2({"type": "event1"}, "Client → OpenAI")


# State Persistence
def test_state_reset():
    with patch.object(logger, 'debug') as mock_debug:
        codeflash_output = create_event_logger("session1"); log_event = codeflash_output
        log_event({"type": "event1"}, "Client → OpenAI")
        log_event({"type": "event2"}, "Client → OpenAI")
        log_event({"type": "event1"}, "Client → OpenAI")


# Invalid Inputs
```

```python
from unittest.mock import call, patch

# imports
import pytest  # used for our unit tests
from langflow.api.v1.voice_mode import create_event_logger

# function to test
from langflow.logging import logger


# unit tests
# Basic Functionality
def test_empty_event_dictionary():
    logger_mock = patch('langflow.logging.logger').start()
    codeflash_output = create_event_logger("session1"); event_logger = codeflash_output
    with pytest.raises(KeyError):
        event_logger({}, "Client → OpenAI")


def test_missing_event_type():
    logger_mock = patch('langflow.logging.logger').start()
    codeflash_output = create_event_logger("session1"); event_logger = codeflash_output
    with pytest.raises(KeyError):
        event_logger({"data": "value"}, "Client → OpenAI")


def test_null_event():
    logger_mock = patch('langflow.logging.logger').start()
    codeflash_output = create_event_logger("session1"); event_logger = codeflash_output
    with pytest.raises(TypeError):
        event_logger(None, "Client → OpenAI")


# Event Deduplication
def test_count_increment_for_same_event_type():
    logger_mock = patch('langflow.logging.logger').start()
    codeflash_output = create_event_logger("session1"); event_logger = codeflash_output
    event_logger({"type": "event1"}, "Client → OpenAI")
    event_logger({"type": "event1"}, "Client → OpenAI")


def test_count_reset_on_different_event_type():
    logger_mock = patch('langflow.logging.logger').start()
    codeflash_output = create_event_logger("session1"); event_logger = codeflash_output
    event_logger({"type": "event1"}, "Client → OpenAI")
    event_logger({"type": "event2"}, "Client → OpenAI")


# Session ID Inclusion
def test_high_volume_of_events():
    logger_mock = patch('langflow.logging.logger').start()
    codeflash_output = create_event_logger("session1"); event_logger = codeflash_output
    for i in range(1000):
        event_logger({"type": f"event{i % 10}"}, "Client → OpenAI")


def test_high_frequency_of_type_changes():
    logger_mock = patch('langflow.logging.logger').start()
    codeflash_output = create_event_logger("session1"); event_logger = codeflash_output
    for i in range(1000):
        event_logger({"type": f"event{i % 2}"}, "Client → OpenAI")


# Direction Handling
```
To test or edit this optimization locally: `git merge codeflash/optimize-pr7294-2025-03-27T15.17.27`
```python
last_event_type = None
event_count = 0

def log_event(event: dict, direction: str) -> None:
    """Log WebSocket events with deduplication and counting.

    Args:
        event: The event dictionary to log
        direction: The direction of the event (e.g., "Client → OpenAI")
    """
    nonlocal last_event_type, event_count
    event_type = event.get("type")
    logger.debug(f"Event (session - {session_id}): {direction} {event_type}")
    if event_type != last_event_type:
        last_event_type = event_type
        event_count = 0
    event_count += 1
```
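A hypothetical usage sketch of the closure above; the `create_event_logger` and `log_event` signatures match the diff, but the event type strings are made up for illustration:

```python
log_event = create_event_logger("session-1")
log_event({"type": "response.audio.delta"}, "OpenAI → Client")  # new type: count resets, then increments
log_event({"type": "response.audio.delta"}, "OpenAI → Client")  # same type: count keeps incrementing
log_event({"type": "response.done"}, "OpenAI → Client")         # type change: count resets again
```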
…-tts`) To optimize the `get_tts_config` function, we need to ensure that our cache lookups and assignments are as efficient as possible. Here is a more optimized version of the code.

Changes:
1. Replaced the check `if session_id not in tts_config_cache:` with the cache lookup `tts_config = tts_config_cache.get(session_id)` to avoid redundant dictionary lookups and make the code cleaner.
2. Assigned `TTSConfig` to a variable only when necessary, which keeps the function's logic straightforward and efficient.

These changes minimize the number of dictionary lookups and streamline the cache retrieval and assignment process.
…Modal component for managing audio playback state
🔧 (voice-assistant.tsx): pass isPlayingRef prop to VoiceAssistant component for controlling audio playback state
…(`voice-mode-tts`) In the provided code, the primary focus should be on optimizing the `VoiceConfig` class and its methods for more efficient memory usage and faster execution. To achieve this, we can streamline the initialization process and minimize repeated computations. Here is an updated version of the `VoiceConfig` class with optimizations.

Key optimizations:
1. **Static method for default session**: A static method `_create_default_session` generates the default session dictionary. The dictionary is defined once rather than being rebuilt inside every instance, which saves memory and processing time.
2. **Use of `.copy()`**: Instead of creating a new dictionary from scratch in `get_session_dict`, the `copy()` method returns a new dictionary based on the `default_openai_realtime_session` template, reusing the same template and avoiding unnecessary computation.
3. **Minimized default attribute initialization**: Avoiding instantiation of certain attributes unless necessary reduces initial memory usage.

These changes streamline initialization and attribute access in the `VoiceConfig` class for better performance and more efficient memory usage.
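The `VoiceConfig` diff itself is collapsed on this page, so the following is only a minimal sketch of the described pattern, assuming session-template fields borrowed from the `TTSConfig` classes above; all details other than the names `_create_default_session`, `default_openai_realtime_session`, and `get_session_dict` are illustrative:

```python
from typing import Any


class VoiceConfig:
    """Sketch of the memory optimization described above (not the actual PR code)."""

    @staticmethod
    def _create_default_session() -> dict[str, Any]:
        # The session template is defined in exactly one place.
        return {
            "type": "transcription_session.update",
            "session": {
                "input_audio_format": "pcm16",
                "input_audio_transcription": {
                    "model": "gpt-4o-mini-transcribe",
                    "language": "en",
                },
            },
        }

    def __init__(self, session_id: str):
        self.session_id = session_id

    def get_session_dict(self) -> dict[str, Any]:
        # Shallow copy of the shared class-level template instead of
        # rebuilding the nested dict on every call.
        return VoiceConfig.default_openai_realtime_session.copy()


# Built once at import time and shared by all instances.
VoiceConfig.default_openai_realtime_session = VoiceConfig._create_default_session()
```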
⚡️ Codeflash found optimizations for this PR: 📄 14% (0.14x) speedup for …
…e-mode-tts`) To optimize this program, we can reduce unnecessary data creation and improve memory usage. `numpy` itself is already highly optimized for performance, but we should make sure we use its capabilities as efficiently as possible.

Changes made:
1. Passed `order='C'` to the `astype` call so the conversion is done in C-style memory order, which is generally faster for contiguous arrays.
2. Added the `copy=False` argument, which prevents creating a new array when the dtype conversion can be done in place, saving memory and time.

This should result in a more memory-efficient conversion process while maintaining the same functionality and data integrity.
```python
def pcm16_to_float_array(pcm_data):
    values = np.frombuffer(pcm_data, dtype=np.int16).astype(np.float32)
```
⚡️ Codeflash found 72% (0.72x) speedup for `pcm16_to_float_array`
⏱️ Runtime: 3.82 milliseconds → 2.22 milliseconds (best of 202 runs)
📝 Explanation and details
To optimize this program, we can reduce unnecessary data creation and improve memory usage. `numpy` itself is already highly optimized for performance, but let's ensure that we are making the most efficient use of its capabilities.

Changes made:

- Passed `order="C"` to the `astype` call so the conversion is done in C-style memory order, which is generally faster for contiguous arrays.
- Added the `copy=False` argument, which prevents creating a new array when the dtype conversion can be done in place, saving memory and time.

This should result in a more memory-efficient conversion process while maintaining the same functionality and data integrity.
✅ Correctness verification report:
| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 28 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | undefined |
🌀 Generated Regression Tests Details
```python
import numpy as np

# imports
import pytest  # used for our unit tests
from langflow.api.v1.voice_mode import pcm16_to_float_array


# unit tests
def test_basic_functionality():
    # Standard PCM Data
    pcm_data = b'\x00\x00\xff\x7f\x01\x80'  # [0, 32767, -32767]
    expected = np.array([0, 32767, -32767], dtype=np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_edge_empty_buffer():
    # Empty Buffer
    pcm_data = b''
    expected = np.array([], dtype=np.float32)
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_edge_single_value():
    # Single Value
    pcm_data = b'\x00\x00'  # [0]
    expected = np.array([0], dtype=np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_edge_max_positive_value():
    # Maximum Positive Value
    pcm_data = b'\xff\x7f'  # [32767]
    expected = np.array([32767], dtype=np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_edge_max_negative_value():
    # Maximum Negative Value
    pcm_data = b'\x00\x80'  # [-32768]
    expected = np.array([-32768], dtype=np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_boundary_alternating_extremes():
    # Alternating Extremes
    pcm_data = b'\xff\x7f\x00\x80\xff\x7f\x00\x80'  # [32767, -32768, 32767, -32768]
    expected = np.array([32767, -32768, 32767, -32768], dtype=np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_large_scale_large_buffer():
    # Large Buffer
    pcm_data = b'\x00\x00' * 1000000  # Array of one million zeros
    expected = np.zeros(1000000, dtype=np.float32)
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_random_data():
    # Random PCM Data
    np.random.seed(0)  # Seed for reproducibility
    random_pcm = np.random.randint(-32768, 32767, size=100, dtype=np.int16)
    pcm_data = random_pcm.tobytes()
    expected = random_pcm.astype(np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_special_all_zeros():
    # All Zeros
    pcm_data = b'\x00\x00\x00\x00\x00\x00'  # [0, 0, 0]
    expected = np.array([0, 0, 0], dtype=np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_special_all_ones():
    # All Ones
    pcm_data = b'\x01\x00\x01\x00\x01\x00'  # [1, 1, 1]
    expected = np.array([1, 1, 1], dtype=np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_invalid_non_pcm_data():
    # Non-PCM Data
    pcm_data = b'\x01\x02\x03\x04'  # Random bytes
    expected = np.array([513, 1027], dtype=np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


def test_mixed_values():
    # Mixed Positive and Negative Values
    pcm_data = b'\x00\x00\xff\x7f\x00\x80\x01\x00'  # [0, 32767, -32768, 1]
    expected = np.array([0, 32767, -32768, 1], dtype=np.float32) / 32768.0
    codeflash_output = pcm16_to_float_array(pcm_data); result = codeflash_output


# Run the tests
if __name__ == "__main__":
    pytest.main()


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
import numpy as np

# imports
import pytest  # used for our unit tests
from langflow.api.v1.voice_mode import pcm16_to_float_array


# unit tests
def test_basic_functionality():
    # Typical PCM Data
    pcm_data = b'\x00\x00\xff\x7f\x00\x80\x00@\xff\xbf'  # [0, 32767, -32768, 16384, -16384]
    expected = np.array([0, 32767, -32768, 16384, -16384], dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)
    # Repeating Pattern
    pcm_data = b'\xe8\x03\x18\xfc'  # [1000, -1000]
    expected = np.array([1000, -1000], dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)


def test_edge_cases():
    # Empty Buffer
    pcm_data = b''
    expected = np.array([], dtype=np.float32)
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)
    # Single Value
    pcm_data = b'\x00\x00'  # [0]
    expected = np.array([0], dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)
    # Minimum and Maximum Values
    pcm_data = b'\x00\x80\xff\x7f'  # [-32768, 32767]
    expected = np.array([-32768, 32767], dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)


def test_invalid_inputs():
    # Non-Bytes Input
    with pytest.raises(TypeError):
        pcm16_to_float_array("not a byte buffer")
    # Odd-Length Buffer
    with pytest.raises(ValueError):
        pcm16_to_float_array(b'\x00\x00\x00')


def test_large_scale():
    # Large Buffer
    pcm_data = np.random.randint(-32768, 32767, size=1000000, dtype=np.int16).tobytes()
    expected = np.frombuffer(pcm_data, dtype=np.int16).astype(np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)


def test_special_values():
    # All Zeros
    pcm_data = b'\x00\x00' * 100
    expected = np.zeros(100, dtype=np.float32)
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)
    # All Ones
    pcm_data = b'\x01\x00' * 100  # [1, 1, 1, ..., 1]
    expected = np.ones(100, dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)
    # Alternating Extremes
    pcm_data = b'\x00\x80\xff\x7f' * 50  # [-32768, 32767, -32768, 32767, ...]
    expected = np.array([-32768, 32767] * 50, dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)


def test_boundary_conditions():
    # Near Zero Values
    pcm_data = b'\x01\x00\xff\xff'  # [1, -1]
    expected = np.array([1, -1], dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)


def test_real_world_data():
    # Audio Snippet
    pcm_data = b'\x00\x00\xff\x7f\x00\x80\x00@\xff\xbf'  # [0, 32767, -32768, 16384, -16384]
    expected = np.array([0, 32767, -32768, 16384, -16384], dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)


def test_endianness():
    # Little-Endian and Big-Endian
    pcm_data_le = b'\x01\x00\x00\x80'  # [1, -32768] in little-endian
    pcm_data_be = b'\x00\x01\x80\x00'  # [1, -32768] in big-endian
    expected_le = np.array([1, -32768], dtype=np.float32) / 32768.0
    expected_be = np.array([256, -32768], dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data_le), expected_le)
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data_be), expected_be)


def test_mixed_values():
    # Mixed Positive and Negative Values
    pcm_data = b'\x00\x00\xff\x7f\x00\x80'  # [0, 32767, -32768]
    expected = np.array([0, 32767, -32768], dtype=np.float32) / 32768.0
    np.testing.assert_array_almost_equal(pcm16_to_float_array(pcm_data), expected)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
To test or edit this optimization locally: `git merge codeflash/optimize-pr7294-2025-03-27T21.14.20`
```python
# Use the same buffer for float conversion to save memory
values = np.frombuffer(pcm_data, dtype=np.int16)
return values.astype(np.float32, order="C", copy=False) / 32768.0
```
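As a quick sanity check mirroring the generated tests above (a sketch for illustration, not part of the PR):

```python
import numpy as np

# Three sample int16 values round-tripped through the optimized conversion.
pcm = np.array([0, 32767, -32768], dtype=np.int16).tobytes()
floats = pcm16_to_float_array(pcm)
np.testing.assert_array_almost_equal(
    floats, np.array([0, 32767, -32768], dtype=np.float32) / 32768.0
)
```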
…-mode-tts`) To optimize the `create_event_logger` function, we can make a few changes to improve performance, such as reducing the number of dictionary accesses and avoiding repeated calculations. Instead of repeatedly accessing dictionary keys, we can store their values in local variables. Here is the optimized version.

Changes made:
1. Replaced the nested `state` dictionary with local variables `last_event_type` and `event_count`; using local variables is faster than accessing dictionary items.
2. Changed `state["event_count"] = int(state["event_count"]) + 1` to `event_count += 1`, which is more direct and eliminates the redundant conversion to `int`.

These changes should provide better performance by minimizing dictionary accesses and simplifying operations.
lgtm
This pull request introduces several changes to the voice mode API, including the addition of new WebSocket endpoints, improvements to event logging, and updates to the frontend to support session-based interactions. The most important changes are summarized below:
Backend Changes:
- Added the `OpenAI` import to `voice_mode.py` to enable integration with OpenAI's API.
- Added a `create_event_logger` function for logging WebSocket events with deduplication and counting.
- Added the `TTSConfig` class and related functions.
- Replaced the `log_event` function in `process_vad_audio` with the new `create_event_logger` function.

Frontend Changes:
- Added `session_id` to `MessagesQueryParams` in `use-get-messages-polling.ts` to support session-based interactions.
- Updated the `SessionSelector` component to use `setNewSessionCloseVoiceAssistant` from `voiceStore` to manage voice assistant sessions.
- Updated `ChatViewWrapper` to pass the `sidebarOpen` prop to `ChatView` and adjust layout based on sidebar state.