
RuntimeError: The size of tensor a (1174) must match the size of tensor b (903) at non-singleton dimension 1 #10

Open
@KeckiTizii


In my code, I call infer_file() inside the async function rvc_tts(), which is awaited in a while loop. The first call to rvc_tts() works fine, but on the next iteration of the loop it fails with this error:

2024-05-11 13:54:26 | WARNING | rvc_python.modules.vc.modules | Traceback (most recent call last):
  File "d:\AI\MeguminAIProject\venv\lib\site-packages\rvc_python\modules\vc\modules.py", line 184, in vc_single
    audio_opt = self.pipeline.pipeline(
  File "d:\AI\MeguminAIProject\venv\lib\site-packages\rvc_python\modules\vc\pipeline.py", line 415, in pipeline
    self.vc(
  File "d:\AI\MeguminAIProject\venv\lib\site-packages\rvc_python\modules\vc\pipeline.py", line 268, in vc
    feats = feats * pitchff + feats0 * (1 - pitchff)
RuntimeError: The size of tensor a (1174) must match the size of tensor b (903) at non-singleton dimension 1

The sizes of tensor a and tensor b change from run to run. For example: The size of tensor a (2590) must match the size of tensor b (1322) at non-singleton dimension 1.
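For context on what the message means: in the failing line `feats = feats * pitchff + feats0 * (1 - pitchff)`, dimension 1 is the time-frame axis, so the HuBERT features (tensor a) and the per-frame pitch mask (tensor b) cover a different number of frames. The sketch below reproduces only the shape logic of that line with NumPy, which uses the same broadcasting rules as PyTorch here. It is an approximation written for illustration; `FEAT_DIM = 768` and the helper `blend` are assumptions, not code from rvc_python.

```python
import numpy as np

FEAT_DIM = 768  # illustrative feature width (assumption)


def blend(n_feat_frames: int, n_pitch_frames: int, protect: float = 0.33) -> np.ndarray:
    """Approximate the shape logic of the failing line in pipeline.py (sketch only)."""
    feats = np.ones((1, n_feat_frames, FEAT_DIM))  # (batch, frames, features)
    feats0 = feats.copy()
    pitchf = np.ones((1, n_pitch_frames))          # f0 per frame; all voiced in this toy
    # Frames with f0 below 1 get `protect`, the others get 1.
    pitchff = np.where(pitchf < 1, protect, 1.0)
    pitchff = pitchff[..., np.newaxis]             # (1, frames, 1) so it broadcasts over FEAT_DIM
    return feats * pitchff + feats0 * (1 - pitchff)


for n_feat, n_pitch in [(903, 903), (1174, 903)]:
    try:
        print(n_feat, n_pitch, "ok", blend(n_feat, n_pitch).shape)
    except ValueError as err:
        # Frame counts differ and neither is 1, so broadcasting fails.
        print(n_feat, n_pitch, "error:", err)
```

Matching frame counts blend fine; mismatched counts fail the same way the traceback shows, which suggests the pitch curve and the features are being computed from different audio.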
Here is my code:

import asyncio
import edge_tts
import os
from pydub import AudioSegment, playback
from rvc_python.infer import infer_file
from googletrans import Translator
from characterai import aiocai

translator = Translator()
char = ""
client = aiocai.Client("")
OUTPUT_EDGE = "audio/megumin-edge.wav"
OUTPUT_RVC = "audio/megumin-rvc.wav"
VOICES = ['ja-JP-NanamiNeural']
VOICE = VOICES[0]

# Named generate_edge_tts so it does not shadow the imported edge_tts module;
# with the name edge_tts, edge_tts.Communicate would be looked up on this function.
async def generate_edge_tts(translation):
    communicate = edge_tts.Communicate(translation, VOICE, rate="+20%")
    await communicate.save(OUTPUT_EDGE)

async def rvc_tts():
    # infer_file() is synchronous, so it blocks the event loop while it runs.
    infer_file(
        input_path=OUTPUT_EDGE,
        model_path="model/megumin.pth",
        device="cuda",  # Use cpu or cuda
        f0method="harvest",  # Choose between 'harvest', 'crepe', 'rmvpe', 'pm'
        f0up_key=2,  # Transpose setting
        opt_path=OUTPUT_RVC,  # Output file path
        filter_radius=3,
        resample_sr=0,  # Set to desired sample rate or 0 for no resampling.
        rms_mix_rate=0.25,
        protect=0.33,
        version="v2",
    )

async def main():
    me = await client.get_me()
    async with await client.connect() as chat:
        new, answer = await chat.new_chat(
            char, me.id
        )

        print(f'{answer.name}: {answer.text}')

        while True:
            text = input("You: ")

            message = await chat.send_message(
                char, new.chat_id, text
            )

            translation = translator.translate(message.text, dest='ja').text
            print(f'{message.name}: {translation}')
            await generate_edge_tts(translation)
            await rvc_tts()
            PLAY_RVC = AudioSegment.from_wav(OUTPUT_RVC)
            playback.play(PLAY_RVC)
            os.remove(OUTPUT_EDGE)
            os.remove(OUTPUT_RVC)

asyncio.run(main())
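One possible explanation, assumed from the upstream RVC WebUI pipeline code and not confirmed for rvc_python in this thread: the `harvest` f0 extractor there is memoised with `functools.lru_cache` keyed on the input file path. Because the loop above reuses `audio/megumin-edge.wav` on every iteration, the second call may get the pitch curve of the first clip, whose frame count no longer matches the new features. The stdlib sketch below only illustrates that failure mode; `cached_frame_count`, `write_clip`, and `unique_path` are hypothetical helpers, and "one byte equals one frame" is a toy model.

```python
import functools
import os
import tempfile
import uuid


@functools.lru_cache(maxsize=None)
def cached_frame_count(path: str) -> int:
    """Hypothetical stand-in for an f0 extractor memoised on the file path.

    Only the path is part of the cache key, so each path is read at most once.
    """
    with open(path, "rb") as f:
        return len(f.read())  # toy model: one byte == one frame


def write_clip(path: str, n_frames: int) -> None:
    """Overwrite `path` with a toy 'clip' of n_frames bytes."""
    with open(path, "wb") as f:
        f.write(b"\0" * n_frames)


def unique_path(directory: str, prefix: str = "megumin-edge") -> str:
    """Return a fresh .wav path so a path-keyed cache cannot return a stale hit."""
    return os.path.join(directory, f"{prefix}-{uuid.uuid4().hex}.wav")


def demo() -> tuple[int, int, int]:
    with tempfile.TemporaryDirectory() as tmp:
        # Reusing one fixed path, like OUTPUT_EDGE in the loop.
        fixed = os.path.join(tmp, "megumin-edge.wav")
        write_clip(fixed, 903)
        first = cached_frame_count(fixed)
        write_clip(fixed, 1174)            # new, longer clip at the same path
        stale = cached_frame_count(fixed)  # cache hit: still the old length

        # Writing the new clip to a unique path forces a fresh computation.
        fresh_path = unique_path(tmp)
        write_clip(fresh_path, 1174)
        fresh = cached_frame_count(fresh_path)
    return first, stale, fresh


if __name__ == "__main__":
    print(demo())
```

If this is the cause, writing each Edge TTS clip to a new path per iteration and passing that same path to `infer_file(input_path=...)`, or trying another `f0method` such as `rmvpe`, would likely avoid the mismatch.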

Thanks for the help.
