Conversation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
There was a problem hiding this comment.
Pull request overview
This pull request adds a new generate_song action that enables AI agents to generate music/songs using LocalAI's /sound endpoint (ACE-Step mode). The feature allows agents to create audio based on captions, lyrics, BPM, key scale, language, and duration parameters. Generated songs are saved to local disk and automatically sent to users through Telegram and Slack connectors.
Changes:
- Added
GenSongActionwith comprehensive audio format detection and file handling - Integrated song sending capability in Telegram and Slack connectors
- Registered the new action in the service layer with appropriate configuration metadata
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| services/actions/gensong.go | New action implementation for generating songs via LocalAI /sound endpoint with audio format detection and local file storage |
| services/actions.go | Registration of generate_song action constant, availability list, default configuration, and factory method |
| services/connectors/telegram.go | Added sendSongToTelegram function and MetadataSongs handling in handleMultimediaContent |
| services/connectors/slack.go | Added song file upload support in generateAttachmentsFromJobResponse for MetadataSongs metadata |
| go.mod | Added github.com/dhowden/tag dependency for audio format detection |
| go.sum | Checksums for the new dhowden/tag library dependency |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| reqBody := soundRequest{ | ||
| Model: a.model, | ||
| Caption: result.Caption, | ||
| Lyrics: result.Lyrics, | ||
| Keyscale: result.Keyscale, | ||
| Language: result.Language, | ||
| DurationSeconds: result.Duration, | ||
| BPM: result.BPM, | ||
| } |
There was a problem hiding this comment.
The model parameter from the Run method's input is being ignored. The code always uses a.model (from action configuration) instead of checking if result.Model is provided. If the user specifies a model in the action parameters, it should override the default configuration model.
Consider using result.Model when it's not empty, falling back to a.model when it is empty.
|
|
||
| // Handle songs from generate_song action (local file paths) | ||
| if songPaths, exists := state.Metadata[actions.MetadataSongs]; exists { | ||
| for _, path := range xstrings.UniqueSlice(songPaths.([]string)) { |
There was a problem hiding this comment.
The type assertion songPaths.([]string) is unsafe and will panic if the metadata value is not exactly []string. The codebase shows similar unsafe type assertions for other metadata (e.g., urls.([]string) in irc.go:225, slack.go:187, telegram.go:570), but this is still a potential runtime panic risk.
Consider adding a type assertion check with the two-value form to gracefully handle type mismatches.
|
|
||
| // coming from the generate_song action (local file paths) | ||
| if songPaths, exists := state.Metadata[actions.MetadataSongs]; exists { | ||
| for _, path := range xstrings.UniqueSlice(songPaths.([]string)) { |
There was a problem hiding this comment.
The type assertion songPaths.([]string) is unsafe and will panic if the metadata value is not exactly []string. The codebase shows similar unsafe type assertions for other metadata (e.g., urls.([]string) in irc.go:225, slack.go:187, telegram.go:570), but this is still a potential runtime panic risk.
Consider adding a type assertion check with the two-value form to gracefully handle type mismatches.
| req.Header.Set("xi-api-key", a.apiKey) | ||
| } | ||
|
|
||
| client := &http.Client{} |
There was a problem hiding this comment.
The HTTP client is created without any timeout configuration. Long-running song generation requests could hang indefinitely if the backend service becomes unresponsive. This pattern is consistent with other actions in the codebase (browse.go:32, webhook.go:127), but it's still a potential issue.
Consider setting reasonable timeouts on the HTTP client or relying on the context timeout from the caller.
| client := &http.Client{} | |
| client := &http.Client{ | |
| Timeout: 60 * time.Second, | |
| } |
| func (a *GenSongAction) Run(ctx context.Context, sharedState *types.AgentSharedState, params types.ActionParams) (types.ActionResult, error) { | ||
| result := struct { | ||
| Caption string `json:"caption"` | ||
| Lyrics string `json:"lyrics"` | ||
| BPM *int `json:"bpm"` | ||
| Keyscale string `json:"keyscale"` | ||
| Language string `json:"language"` | ||
| Duration *float64 `json:"duration_seconds"` | ||
| Model string `json:"model"` | ||
| }{} | ||
| if err := params.Unmarshal(&result); err != nil { | ||
| return types.ActionResult{}, err | ||
| } | ||
|
|
||
| if result.Caption == "" { | ||
| return types.ActionResult{}, fmt.Errorf("caption is required") | ||
| } | ||
|
|
||
| if a.outputDir == "" { | ||
| return types.ActionResult{}, fmt.Errorf("outputDir is required for generate_song (configure the action with an output directory)") | ||
| } | ||
|
|
||
| reqBody := soundRequest{ | ||
| Model: a.model, | ||
| Caption: result.Caption, | ||
| Lyrics: result.Lyrics, | ||
| Keyscale: result.Keyscale, | ||
| Language: result.Language, | ||
| DurationSeconds: result.Duration, | ||
| BPM: result.BPM, | ||
| } | ||
|
|
||
| body, err := json.Marshal(reqBody) | ||
| if err != nil { | ||
| return types.ActionResult{}, err | ||
| } | ||
|
|
||
| url := a.apiURL + "/sound" | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body)) | ||
| if err != nil { | ||
| return types.ActionResult{}, err | ||
| } | ||
| req.Header.Set("Content-Type", "application/json") | ||
| if a.apiKey != "" { | ||
| req.Header.Set("xi-api-key", a.apiKey) | ||
| } | ||
|
|
||
| client := &http.Client{} | ||
| resp, err := client.Do(req) | ||
| if err != nil { | ||
| return types.ActionResult{Result: "Failed to generate song: " + err.Error()}, err | ||
| } | ||
| defer resp.Body.Close() | ||
|
|
||
| if resp.StatusCode != http.StatusOK { | ||
| msg, _ := io.ReadAll(resp.Body) | ||
| return types.ActionResult{}, fmt.Errorf("sound endpoint failed: %s: %s", resp.Status, string(msg)) | ||
| } | ||
|
|
||
| audioBytes, err := io.ReadAll(resp.Body) | ||
| if err != nil { | ||
| return types.ActionResult{}, err | ||
| } | ||
| if len(audioBytes) == 0 { | ||
| return types.ActionResult{}, fmt.Errorf("no audio data returned") | ||
| } | ||
|
|
||
| ext := audioExtensionFromContentType(resp.Header.Get("Content-Type")) | ||
| if ext == "" { | ||
| ext = audioExtensionFromTag(audioBytes) | ||
| } | ||
| if ext == "" { | ||
| ext = "flac" // default when unknown (e.g. ACE-Step) | ||
| } | ||
|
|
||
| filename := fmt.Sprintf("song_%d.%s", time.Now().UnixNano(), ext) | ||
| savedPath := filepath.Join(a.outputDir, filename) | ||
| if err := os.WriteFile(savedPath, audioBytes, 0644); err != nil { | ||
| return types.ActionResult{}, fmt.Errorf("failed to save song: %w", err) | ||
| } | ||
|
|
||
| return types.ActionResult{ | ||
| Result: fmt.Sprintf("The song was generated and saved to: %s", savedPath), | ||
| Metadata: map[string]interface{}{ | ||
| MetadataSongs: []string{savedPath}, | ||
| }, | ||
| }, nil | ||
| } |
There was a problem hiding this comment.
The new generate_song action does not have any test coverage. The codebase has test coverage for similar actions like genimage_test.go, memory_test.go, and various GitHub-related action tests. Following the established pattern in the codebase, this action should have corresponding test coverage for at least the basic success and error scenarios.
Consider adding a test file gensong_test.go with tests similar to the pattern in genimage_test.go.
| filename := fmt.Sprintf("song_%d.%s", time.Now().UnixNano(), ext) | ||
| savedPath := filepath.Join(a.outputDir, filename) | ||
| if err := os.WriteFile(savedPath, audioBytes, 0644); err != nil { | ||
| return types.ActionResult{}, fmt.Errorf("failed to save song: %w", err) | ||
| } |
There was a problem hiding this comment.
Generated song files are saved to the local filesystem but are never cleaned up after being sent to connectors. This could lead to disk space accumulation over time, especially if the song generation feature is used frequently. While there is a cleanOnStart option, this only cleans files on startup, not after they've been sent.
Consider implementing a mechanism to clean up generated song files after they have been successfully sent to connectors, or add documentation about manual cleanup requirements.
| } | ||
|
|
||
| return types.ActionResult{ | ||
| Result: fmt.Sprintf("The song was generated and saved to: %s", savedPath), |
There was a problem hiding this comment.
The error message includes the full file path in the action result, which could expose internal server directory structure to users. For example: "The song was generated and saved to: /var/app/songs/song_1234567890.flac". Consider using just the filename or a more generic message.
While this might be acceptable for an internal system, it's worth considering if this information should be exposed to end users through chat connectors.
| Result: fmt.Sprintf("The song was generated and saved to: %s", savedPath), | |
| Result: fmt.Sprintf("The song was generated and saved as file: %s", filepath.Base(savedPath)), |
| if a.outputDir != "" { | ||
| if err := os.MkdirAll(a.outputDir, 0755); err != nil { |
There was a problem hiding this comment.
The outputDir is not validated when reading from configuration. If the directory path contains special characters or path traversal sequences (e.g., "../"), this could potentially lead to files being written outside the intended directory. While this is configuration-based rather than user input, it's still worth validating the path to ensure it points to a valid, safe location.
Consider adding path validation to ensure outputDir is an absolute path or at least properly sanitized.
| } | ||
| filename := filepath.Base(path) | ||
| if filename == "" || filename == "." { | ||
| filename = "audio" |
There was a problem hiding this comment.
The filename fallback logic sets filename to "audio" when filepath.Base returns empty string or ".". However, this filename has no file extension, which could cause issues when uploading to Slack. The file might not be recognized as an audio file without an extension.
Consider adding a default extension to the fallback filename (e.g., "audio.flac" or "audio.mp3") to ensure proper file type recognition.
| filename = "audio" | |
| filename = "audio.mp3" |
| if a.outputDir != "" { | ||
| if err := os.MkdirAll(a.outputDir, 0755); err != nil { | ||
| // log but continue; Run will fail with a clear error when saving | ||
| _ = err | ||
| } | ||
| if a.cleanOnStart { | ||
| entries, err := os.ReadDir(a.outputDir) | ||
| if err == nil { | ||
| for _, e := range entries { | ||
| _ = os.Remove(filepath.Join(a.outputDir, e.Name())) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The code silently ignores errors from os.MkdirAll and os.Remove operations during initialization. While this might be intentional to allow graceful degradation, it could lead to confusing failures later when trying to save files. The comment suggests Run will fail with a clear error, but it would be more user-friendly to log these initialization errors for debugging purposes.
Consider at least logging initialization errors even if the application continues to run.
This adds a new action to enable sound generation.
Needs/follows: mudler/LocalAI#8396