shaohuzhang1
Contributor

feat: Support iFLYTEK large model for Chinese-English speech recognition

f2c-ci-robot bot commented Aug 28, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

f2c-ci-robot bot commented Aug 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

except asyncio.TimeoutError:
    break

return result_text
Contributor Author

The provided Python code seems to be an implementation of a web service client for interfacing with a Spark-based speech-to-text API using the XFZhEnSparkSpeechToText class. Here are some general comments on the code:

  1. SSL Context: An SSLContext created without explicit settings may skip certificate or hostname verification, depending on how it is constructed. Configure it explicitly for production, for example by starting from ssl.create_default_context().

  2. URL Creation: The method create_url() generates an authorization header by concatenating headers and signing them with the key. This approach is fine, but it would benefit from explicit exception handling around the cryptographic operations.

  3. Audio Sending Logic: The loop that sends chunks breaks as soon as a read returns zero bytes, without confirming that the stream has actually ended. Consider detecting EOF explicitly and raising an error if the stream ends unexpectedly.

  4. WebSocket Connection Handling: Sending audio and receiving responses are currently handled in a single function. Separating the two would make the flow easier to follow and to test.

  5. Error Handling: Methods like handle_audio(), send_audio(), and speech_to_text() catch exceptions locally and only log them through maxkb_logger.error, so the caller may never learn that a request failed. Re-raise or return an explicit error so that failures are not silent.

  6. Asynchronous vs Synchronous Calls: Most of the code assumes asynchronous execution behind a synchronous entry point (e.g., asyncio.run(handle())). Make sure this matches your application architecture: asyncio.run() cannot be called from a thread that already has a running event loop.

  7. Logging: Using logging instead of printing error messages directly can make debugging easier and adhere to best practices by reducing clutter in console output.

  8. Configuration Management: If you're storing keys/configurations in files, consider moving sensitive information such as API keys into environment variables or secure vault configurations rather than plain text files.

  9. Documentation: Adding docstrings across various methods would improve readability and maintainability of the codebase.
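As an illustration of point 3, here is a minimal sketch of explicit end-of-stream detection. It reads one chunk ahead so that the final chunk can be flagged before it is sent. The function name iter_audio_frames, the status constants, and the 1280-byte chunk size are assumptions made for this example and are not taken from the PR.

```python
import io
from typing import BinaryIO, Iterator, Tuple

# Frame status flags; the numeric values are an assumption for illustration.
STATUS_FIRST, STATUS_CONTINUE, STATUS_LAST = 0, 1, 2


def iter_audio_frames(stream: BinaryIO, chunk_size: int = 1280) -> Iterator[Tuple[int, bytes]]:
    """Yield (status, chunk) pairs, marking the final chunk explicitly."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    current = stream.read(chunk_size)
    if not current:
        # An empty stream is treated as an error rather than a silent no-op.
        raise ValueError("audio stream is empty")
    first = True
    while True:
        upcoming = stream.read(chunk_size)  # read ahead to detect EOF
        if not upcoming:
            yield (STATUS_LAST, current)
            return
        yield (STATUS_FIRST if first else STATUS_CONTINUE, current)
        first = False
        current = upcoming
```

Note that a stream fitting in a single chunk yields only a STATUS_LAST frame in this sketch; whether the service expects a separate first frame in that case would need to be checked against its documentation.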

Overall, the code provides a solid foundation for interacting with the Spark API through websockets asynchronously. Continuous testing and improvement are recommended to address performance bottlenecks and robustness issues.
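To make the separation suggested in point 4 concrete, the following sketch runs a sender and a receiver as two coroutines with asyncio.gather. FakeConnection is a hypothetical in-memory stand-in for the WebSocket connection, and the "<end>" marker and the function names are likewise assumptions for this example. The receiver stops on a timeout in the same way as the except asyncio.TimeoutError: break fragment in the diff.

```python
import asyncio


class FakeConnection:
    """Hypothetical in-memory stand-in for a WebSocket connection."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def send(self, message: str) -> None:
        # Pretend the server answers every message with its upper-cased text.
        await self._queue.put(message.upper())

    async def recv(self) -> str:
        return await self._queue.get()


async def send_audio(conn, chunks):
    """Send every chunk, then an end marker."""
    for chunk in chunks:
        await conn.send(chunk)
    await conn.send("<end>")


async def receive_results(conn, timeout=1.0):
    """Collect responses until the end marker arrives or the timeout expires."""
    parts = []
    while True:
        try:
            message = await asyncio.wait_for(conn.recv(), timeout=timeout)
        except asyncio.TimeoutError:
            break
        if message == "<END>":
            break
        parts.append(message)
    return "".join(parts)


async def transcribe(chunks):
    conn = FakeConnection()
    # Sender and receiver run concurrently and can be tested separately.
    _, text = await asyncio.gather(send_audio(conn, chunks), receive_results(conn))
    return text
```

With this split, each coroutine has a single responsibility, and an error raised in either one propagates out of asyncio.gather to the caller instead of being swallowed.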

return {**model, 'spark_api_secret': super().encryption(model.get('spark_api_secret', ''))}

def get_model_params_setting_form(self, model_name):
    pass
\ No newline at end of file
Contributor Author

This Python code snippet appears to be part of a Django application that validates and manages model credentials for speech-to-text processing using iFLYTEK's (XunFei) API, via the ZhEnXunFeiSTTModelCredential class. Here are some potential issues and optimizations:

Potential Issues:

  1. Type Annotations: Dict[str, object] for model_credential is an awkward choice: a type checker will reject most operations on the values without a cast, and nothing is validated at runtime. A more conventional annotation such as dict[str, Any] would state the intent more clearly.

  2. String Formatting with _: User-facing strings are wrapped in the gettext alias _. If any of them still carry the u'...' prefix, it is redundant in Python 3 and can be dropped; it has no effect on behavior.

  3. Traceback Logging: traceback.print_exc() is useful during development, but in production it is better to route exception details through the application logger, where the level and destination can be controlled.

  4. Empty List Check: The line if not any(list(filter(lambda mt: mt.get('value') == model_type, model_type_list))):

    • Is there a reason for wrapping the filter(...) result in list()? any() accepts any iterable, so the conversion seems unnecessary.
    • It would also be clearer to check whether model_type_list is empty beforehand and fail early.
  5. Empty get_model_params_setting_form: This method only contains pass. Either implement it or document why it is intentionally empty.

  6. Missing Newline at End of File: The diff reports that the file has no trailing newline; add one.
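For point 4, the membership test can be written with a generator expression, which avoids both the lambda and the intermediate list. This is only a sketch: the helper name has_model_type is hypothetical, and the entries are assumed to be dicts with a 'value' key, as the original line implies.

```python
def has_model_type(model_type_list, model_type):
    """Return True if any entry's 'value' equals model_type."""
    # any() stops at the first match and never builds an intermediate list.
    return any(mt.get('value') == model_type for mt in model_type_list)
```

The original condition would then read if not has_model_type(model_type_list, model_type):, and an empty model_type_list naturally yields False.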

Optimizations:

  1. Use Type Annotations Accurately:

    from typing import Dict, Optional, Any
    
    class ZhEnXunFeiSTTModelCredential(BaseForm, BaseModelCredential):
        # ...
  2. Check for an Empty List Up Front:

    model_type_list = provider.get_model_type_list()
    if not model_type_list:
        raise AppApiException(ValidCode.valid_error.value, _("No valid model types found"))
    # ... rest of the validation logic remains unchanged ...
  3. Simplify Exception Handling:
    Instead of raising exceptions within exception handling blocks, catch them separately and handle each case appropriately. For example:

    try:
        model = provider.get_model(model_type, model_name, credential)
        model.check_auth()
    except AppApiException:
        # Known application error: propagate it unchanged when requested
        if not raise_exception:
            return False
        raise
    except Exception:
        # Unexpected failure: print the traceback, then report a generic error
        traceback.print_exc()
        if raise_exception:
            raise AppApiException(ValidCode.valid_error.value, _('Verification failed'))
        return False
  4. Consider Adding More Specific Exceptions:
    Depending on the complexity, you might want to introduce specific exception classes or wrap existing ones to better describe the context of each failure.
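One possible shape for point 4 is a small exception hierarchy, so that callers can tell an unsupported model type apart from a failed authentication check. All class and function names below are hypothetical and exist only for this sketch; the project's own AppApiException is deliberately left out so the example stays self-contained.

```python
class CredentialError(Exception):
    """Base class for credential validation failures (hypothetical)."""


class UnsupportedModelTypeError(CredentialError):
    """The requested model type is not offered by the provider."""


class AuthenticationFailedError(CredentialError):
    """The provider rejected the supplied credentials."""


def validate(model_type, supported_types, check_auth, raise_exception=False):
    """Return True if valid; otherwise return False or raise, per raise_exception."""
    try:
        if model_type not in supported_types:
            raise UnsupportedModelTypeError(f"{model_type} is not supported")
        try:
            check_auth()
        except Exception as exc:
            # Keep the original error attached as __cause__ for debugging.
            raise AuthenticationFailedError("Verification failed") from exc
    except CredentialError:
        if raise_exception:
            raise
        return False
    return True
```

Chaining with raise ... from exc preserves the underlying error, so the generic message shown to the user does not hide the real cause from the logs.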

Apply these improvements where appropriate to ensure the code maintains correctness, readability, and maintainability while also considering future scalability needs.

@@ -47,7 +54,8 @@
 .append_default_model_info(
 ModelInfo('generalv3.5', '', ModelTypeConst.LLM, xunfei_model_credential, XFChatSparkLLM))
 .append_default_model_info(
-ModelInfo('iat', _('Chinese and English recognition'), ModelTypeConst.STT, stt_model_credential, XFSparkSpeechToText),
+ModelInfo('iat', _('Chinese and English recognition'), ModelTypeConst.STT, stt_model_credential,
+XFSparkSpeechToText),
 )
 .append_default_model_info(
 ModelInfo('tts', '', ModelTypeConst.TTS, tts_model_credential, XFSparkTextToSpeech))
Contributor Author

There are no significant issues with the provided Python code snippet. However, I have a few optimizations and improvements you might consider:

  1. Check append_default_model_info Usage: If this is intended to add default model information, ensure that the class DefaultModelInfoProvider supports repeated appends without side effects such as duplicate entries.

  2. Consistent Use of Quotation Marks: In some places, double quotes (") and single quotes (') are used interchangeably. Picking one style and using it throughout is preferable.

  3. Line Length: The line lengths in the file can be improved for better readability and maintainability. Consider breaking down long lines into multiple parts or using triple quotes for larger strings.

Here’s an updated version of the code with these considerations:

from models_provider.impl.xf_model_provider.credential.llm import XunFeiLLMModelCredential
from models_provider.impl.xf_model_provider.credential.stt import XunFeiSTTModelCredential
from models_provider.impl.xf_model_provider.credential.tts import XunFeiTTSModelCredential
from models_provider.impl.xf_model_provider.credential.zh_en_stt import ZhEnXunFeiSTTModelCredential

from models_provider.impl.xf_model_provider.model.embedding import XFEmbedding
from models_provider.impl.xf_model_provider.model.image import XFSparkImage
from models_provider.impl.xf_model_provider.model.llm import XFChatSparkLLM

from maxkb.conf import PROJECT_DIR
from django.utils.translation import gettext as _

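import ssl

# The hook must be a callable that returns a context, so assign the function
# itself rather than the result of calling it.
ssl._create_default_https_context = ssl.create_default_context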

xunfei_model_credential = XunFeiLLMModelCredential()
stt_model_credential = XunFeiSTTModelCredential()
zh_en_stt_credential = ZhEnXunFeiSTTModelCredential()
image_model_credential = XunFeiImageModelCredential()
tts_model_credential = XunFeiTTSModelCredential()
embedding_model_credential = XFEmbeddingCredential()

model_info_list = [
    ModelInfo('generalv3.5', '', ModelTypeConst.LLM, xunfei_model_credential, XFChatSparkLLM),
    ModelInfo('', 'General Version v3.0', ModelTypeConst.LLM, xunfei_model_credential, XFChatSparkLLM),
    ModelInfo('', 'General Version v2.0', ModelTypeConst.LLM, xunfei_model_credential, XFChatSparkLLM),

    # Simplify Chinese and English Recognition model info
    ModelInfo('iat', _('Chinese and English Recognition'), ModelTypeConst.STT,
              stt_model_credential, XFSparkSpeechToText),
    
    # New STT model info for Chinese and English
    ModelInfo('slm', _('Chinese and English Recognition'), ModelTypeConst.STT,
              zh_en_stt_credential, XFZhEnSparkSpeechToText),

    ModelInfo('', 'Text-to-Speech', ModelTypeConst.TTS, tts_model_credential, XFSparkTextToSpeech),
    ModelInfo('embedding', 'Sentence Embeddings', ModelTypeConst.EMBEDDING,
              embedding_model_credential, XFEmbedding)
]

By applying these changes, the code becomes more readable and consistent in terms of string handling and overall structure.
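The ssl line in the snippet above patches a private hook that affects every HTTPS connection in the process. A more explicit alternative, shown here only as a sketch, is to build a verifying context once and pass it to whichever client opens the connection. The helper name build_client_ssl_context and the TLS 1.2 floor are assumptions for this example.

```python
import ssl


def build_client_ssl_context() -> ssl.SSLContext:
    """Create a client-side context that verifies certificates and hostnames."""
    # create_default_context() enables CERT_REQUIRED and hostname checking.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (an assumed policy).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Keeping the context local to the client avoids changing the behavior of unrelated libraries that rely on the process-wide default.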

@zhanweizhang7 zhanweizhang7 merged commit 4786970 into v2 Aug 28, 2025
3 of 6 checks passed
@zhanweizhang7 zhanweizhang7 deleted the pr@v2@feat_support_xunfei_chinese_english_speech_recognition branch August 28, 2025 02:57