Currently, users need to register separate pre- and post-processing functions for each modality (e.g., text, image, audio, embedding) (related PR). This approach is not user-friendly and increases the risk of misconfiguration.
Therefore, we need a single, unified pre-processing and post-processing function for the Nova MME model that automatically detects the input modality and routes the request to the appropriate processing logic.
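A minimal sketch of the detect-and-route idea, assuming a hypothetical convention where each request payload carries exactly one modality key (`text`, `image`, or `audio`). The handler names, payload keys, and output shape are illustrative only and are not the actual Nova MME request schema. A post-processing table could be keyed the same way.

```python
from typing import Any, Callable, Dict


# Hypothetical per-modality pre-processors (not the real Nova MME schema).
def _pre_text(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Normalize surrounding whitespace in text input.
    return {"modality": "text", "content": payload["text"].strip()}


def _pre_image(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Expect raw image bytes.
    if not isinstance(payload["image"], bytes):
        raise TypeError("image payload must be bytes")
    return {"modality": "image", "content": payload["image"]}


def _pre_audio(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Expect raw audio bytes.
    if not isinstance(payload["audio"], bytes):
        raise TypeError("audio payload must be bytes")
    return {"modality": "audio", "content": payload["audio"]}


# Dispatch table: modality key -> handler. Users register nothing per modality.
PRE_HANDLERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "text": _pre_text,
    "image": _pre_image,
    "audio": _pre_audio,
}


def detect_modality(payload: Dict[str, Any]) -> str:
    """Return the single modality key present in the payload."""
    found = [m for m in PRE_HANDLERS if m in payload]
    if len(found) != 1:
        raise ValueError(f"expected exactly one modality key, got {found or 'none'}")
    return found[0]


def unified_pre_process(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Single entry point: detect the modality, then route to its handler."""
    return PRE_HANDLERS[detect_modality(payload)](payload)
```

For example, `unified_pre_process({"text": "  hello "})` routes to the text handler, while a payload containing both `text` and `image` is rejected instead of being silently sent to the wrong handler.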