Skip to content

feat(gen-song): add song generation action#404

Merged
mudler merged 1 commit intomainfrom
feat/song-gen
Feb 5, 2026
Merged

feat(gen-song): add song generation action#404
mudler merged 1 commit intomainfrom
feat/song-gen

Conversation

@mudler
Copy link
Owner

@mudler mudler commented Feb 5, 2026

This adds a new action to enable sound generation.

Needs/follows: mudler/LocalAI#8396

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Copilot AI review requested due to automatic review settings February 5, 2026 17:45
@mudler mudler merged commit 4f4ad50 into main Feb 5, 2026
7 checks passed
@mudler mudler deleted the feat/song-gen branch February 5, 2026 18:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a new generate_song action that enables AI agents to generate music/songs using LocalAI's /sound endpoint (ACE-Step mode). The feature allows agents to create audio based on captions, lyrics, BPM, key scale, language, and duration parameters. Generated songs are saved to local disk and automatically sent to users through Telegram and Slack connectors.

Changes:

  • Added GenSongAction with comprehensive audio format detection and file handling
  • Integrated song sending capability in Telegram and Slack connectors
  • Registered the new action in the service layer with appropriate configuration metadata

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
services/actions/gensong.go New action implementation for generating songs via LocalAI /sound endpoint with audio format detection and local file storage
services/actions.go Registration of generate_song action constant, availability list, default configuration, and factory method
services/connectors/telegram.go Added sendSongToTelegram function and MetadataSongs handling in handleMultimediaContent
services/connectors/slack.go Added song file upload support in generateAttachmentsFromJobResponse for MetadataSongs metadata
go.mod Added github.com/dhowden/tag dependency for audio format detection
go.sum Checksums for the new dhowden/tag library dependency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +152 to +160
reqBody := soundRequest{
Model: a.model,
Caption: result.Caption,
Lyrics: result.Lyrics,
Keyscale: result.Keyscale,
Language: result.Language,
DurationSeconds: result.Duration,
BPM: result.BPM,
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model parameter from the Run method's input is being ignored. The code always uses a.model (from action configuration) instead of checking if result.Model is provided. If the user specifies a model in the action parameters, it should override the default configuration model.

Consider using result.Model when it's not empty, falling back to a.model when it is empty.

Copilot uses AI. Check for mistakes.

// Handle songs from generate_song action (local file paths)
if songPaths, exists := state.Metadata[actions.MetadataSongs]; exists {
for _, path := range xstrings.UniqueSlice(songPaths.([]string)) {
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type assertion songPaths.([]string) is unsafe and will panic if the metadata value is not exactly []string. The codebase shows similar unsafe type assertions for other metadata (e.g., urls.([]string) in irc.go:225, slack.go:187, telegram.go:570), but this is still a potential runtime panic risk.

Consider adding a type assertion check with the two-value form to gracefully handle type mismatches.

Copilot uses AI. Check for mistakes.

// coming from the generate_song action (local file paths)
if songPaths, exists := state.Metadata[actions.MetadataSongs]; exists {
for _, path := range xstrings.UniqueSlice(songPaths.([]string)) {
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type assertion songPaths.([]string) is unsafe and will panic if the metadata value is not exactly []string. The codebase shows similar unsafe type assertions for other metadata (e.g., urls.([]string) in irc.go:225, slack.go:187, telegram.go:570), but this is still a potential runtime panic risk.

Consider adding a type assertion check with the two-value form to gracefully handle type mismatches.

Copilot uses AI. Check for mistakes.
req.Header.Set("xi-api-key", a.apiKey)
}

client := &http.Client{}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTTP client is created without any timeout configuration. Long-running song generation requests could hang indefinitely if the backend service becomes unresponsive. This pattern is consistent with other actions in the codebase (browse.go:32, webhook.go:127), but it's still a potential issue.

Consider setting reasonable timeouts on the HTTP client or relying on the context timeout from the caller.

Suggested change
client := &http.Client{}
client := &http.Client{
Timeout: 60 * time.Second,
}

Copilot uses AI. Check for mistakes.
Comment on lines +130 to +217
func (a *GenSongAction) Run(ctx context.Context, sharedState *types.AgentSharedState, params types.ActionParams) (types.ActionResult, error) {
result := struct {
Caption string `json:"caption"`
Lyrics string `json:"lyrics"`
BPM *int `json:"bpm"`
Keyscale string `json:"keyscale"`
Language string `json:"language"`
Duration *float64 `json:"duration_seconds"`
Model string `json:"model"`
}{}
if err := params.Unmarshal(&result); err != nil {
return types.ActionResult{}, err
}

if result.Caption == "" {
return types.ActionResult{}, fmt.Errorf("caption is required")
}

if a.outputDir == "" {
return types.ActionResult{}, fmt.Errorf("outputDir is required for generate_song (configure the action with an output directory)")
}

reqBody := soundRequest{
Model: a.model,
Caption: result.Caption,
Lyrics: result.Lyrics,
Keyscale: result.Keyscale,
Language: result.Language,
DurationSeconds: result.Duration,
BPM: result.BPM,
}

body, err := json.Marshal(reqBody)
if err != nil {
return types.ActionResult{}, err
}

url := a.apiURL + "/sound"
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
if err != nil {
return types.ActionResult{}, err
}
req.Header.Set("Content-Type", "application/json")
if a.apiKey != "" {
req.Header.Set("xi-api-key", a.apiKey)
}

client := &http.Client{}
resp, err := client.Do(req)
if err != nil {
return types.ActionResult{Result: "Failed to generate song: " + err.Error()}, err
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
msg, _ := io.ReadAll(resp.Body)
return types.ActionResult{}, fmt.Errorf("sound endpoint failed: %s: %s", resp.Status, string(msg))
}

audioBytes, err := io.ReadAll(resp.Body)
if err != nil {
return types.ActionResult{}, err
}
if len(audioBytes) == 0 {
return types.ActionResult{}, fmt.Errorf("no audio data returned")
}

ext := audioExtensionFromContentType(resp.Header.Get("Content-Type"))
if ext == "" {
ext = audioExtensionFromTag(audioBytes)
}
if ext == "" {
ext = "flac" // default when unknown (e.g. ACE-Step)
}

filename := fmt.Sprintf("song_%d.%s", time.Now().UnixNano(), ext)
savedPath := filepath.Join(a.outputDir, filename)
if err := os.WriteFile(savedPath, audioBytes, 0644); err != nil {
return types.ActionResult{}, fmt.Errorf("failed to save song: %w", err)
}

return types.ActionResult{
Result: fmt.Sprintf("The song was generated and saved to: %s", savedPath),
Metadata: map[string]interface{}{
MetadataSongs: []string{savedPath},
},
}, nil
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new generate_song action does not have any test coverage. The codebase has test coverage for similar actions like genimage_test.go, memory_test.go, and various GitHub-related action tests. Following the established pattern in the codebase, this action should have corresponding test coverage for at least the basic success and error scenarios.

Consider adding a test file gensong_test.go with tests similar to the pattern in genimage_test.go.

Copilot uses AI. Check for mistakes.
Comment on lines +205 to +209
filename := fmt.Sprintf("song_%d.%s", time.Now().UnixNano(), ext)
savedPath := filepath.Join(a.outputDir, filename)
if err := os.WriteFile(savedPath, audioBytes, 0644); err != nil {
return types.ActionResult{}, fmt.Errorf("failed to save song: %w", err)
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated song files are saved to the local filesystem but are never cleaned up after being sent to connectors. This could lead to disk space accumulation over time, especially if the song generation feature is used frequently. While there is a cleanOnStart option, this only cleans files on startup, not after they've been sent.

Consider implementing a mechanism to clean up generated song files after they have been successfully sent to connectors, or add documentation about manual cleanup requirements.

Copilot uses AI. Check for mistakes.
}

return types.ActionResult{
Result: fmt.Sprintf("The song was generated and saved to: %s", savedPath),
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message includes the full file path in the action result, which could expose internal server directory structure to users. For example: "The song was generated and saved to: /var/app/songs/song_1234567890.flac". Consider using just the filename or a more generic message.

While this might be acceptable for an internal system, it's worth considering if this information should be exposed to end users through chat connectors.

Suggested change
Result: fmt.Sprintf("The song was generated and saved to: %s", savedPath),
Result: fmt.Sprintf("The song was generated and saved as file: %s", filepath.Base(savedPath)),

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +105
if a.outputDir != "" {
if err := os.MkdirAll(a.outputDir, 0755); err != nil {
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outputDir is not validated when reading from configuration. If the directory path contains special characters or path traversal sequences (e.g., "../"), this could potentially lead to files being written outside the intended directory. While this is configuration-based rather than user input, it's still worth validating the path to ensure it points to a valid, safe location.

Consider adding path validation to ensure outputDir is an absolute path or at least properly sanitized.

Copilot uses AI. Check for mistakes.
}
filename := filepath.Base(path)
if filename == "" || filename == "." {
filename = "audio"
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename fallback logic sets filename to "audio" when filepath.Base returns empty string or ".". However, this filename has no file extension, which could cause issues when uploading to Slack. The file might not be recognized as an audio file without an extension.

Consider adding a default extension to the fallback filename (e.g., "audio.flac" or "audio.mp3") to ensure proper file type recognition.

Suggested change
filename = "audio"
filename = "audio.mp3"

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +116
if a.outputDir != "" {
if err := os.MkdirAll(a.outputDir, 0755); err != nil {
// log but continue; Run will fail with a clear error when saving
_ = err
}
if a.cleanOnStart {
entries, err := os.ReadDir(a.outputDir)
if err == nil {
for _, e := range entries {
_ = os.Remove(filepath.Join(a.outputDir, e.Name()))
}
}
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code silently ignores errors from os.MkdirAll and os.Remove operations during initialization. While this might be intentional to allow graceful degradation, it could lead to confusing failures later when trying to save files. The comment suggests Run will fail with a clear error, but it would be more user-friendly to log these initialization errors for debugging purposes.

Consider at least logging initialization errors even if the application continues to run.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant