Collaborator

@nfx nfx commented Jan 6, 2024

Summary

The go-libs/llnotes package gains several new features, including three new files: chain.go, pull_request.go, and release_notes.go. The chain.go file introduces a History type, a list of messages with methods to manage and manipulate them. Three concrete types, SystemMessage, UserMessage, and AssistantMessage, represent the different roles in a chat conversation. The pull_request.go file introduces the PullRequest() method, which fetches the pull request using GitHub's GraphQL API and downloads the diff data. It then iterates through the file diffs one by one, wrapping each diff in a UserMessage and appending it to the History. The release_notes.go file introduces the ReleaseNotes() method, which generates a release note blog post for the specified GitHub repository.

The llnotes directory has also been added to the use directive in the go.work file, and a new README.md file documents the llnotes tool. Additionally, the go.mod file specifies a new dependency on the github.com/databrickslabs/sandbox package version v0.1.0-alpha.1. Overall, these changes add functionality for processing pull request diffs, generating release notes for GitHub repositories, and interfacing with the Databricks Serving Endpoints API.

Details

This is a new Go source file, chain.go, added to the go-libs/llnotes package. The file defines several types, including message, SystemMessage, UserMessage, AssistantMessage, History, along with methods for these types. The message type is an interface with a single method, ChatMessage, that returns a serving.ChatMessage. The SystemMessage, UserMessage, and AssistantMessage types implement the message interface and are used to represent different roles in a chat conversation. The History type holds a slice of messages and provides methods for working with the chat history, such as Messages, which converts the history to an array of serving.ChatMessages, and With, which appends a new message to the history while ensuring the total number of tokens in the history does not exceed 32768. The messageTokens method calculates the number of tokens in a given message, and totalTokens computes the total number of tokens in the history. The Last method returns the content of the most recent message in the history.
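The types described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `chatMessage` is a local stand-in for `serving.ChatMessage`, the four-characters-per-token estimate is an assumption, and dropping the oldest non-system messages is only one plausible way for `With` to stay under the 32768-token limit.

```go
package main

import "fmt"

// chatMessage is a local stand-in for serving.ChatMessage from the
// Databricks SDK, so that the sketch has no external dependencies.
type chatMessage struct {
	Role    string
	Content string
}

// message is implemented by every role-specific message type.
type message interface {
	ChatMessage() chatMessage
}

type (
	SystemMessage    string
	UserMessage      string
	AssistantMessage string
)

func (m SystemMessage) ChatMessage() chatMessage {
	return chatMessage{Role: "system", Content: string(m)}
}

func (m UserMessage) ChatMessage() chatMessage {
	return chatMessage{Role: "user", Content: string(m)}
}

func (m AssistantMessage) ChatMessage() chatMessage {
	return chatMessage{Role: "assistant", Content: string(m)}
}

// History is an ordered list of chat messages.
type History []message

const maxTokens = 32768

// messageTokens estimates the token count of one message.
// ASSUMPTION: roughly four bytes per token; the PR may count differently.
func (h History) messageTokens(m message) int {
	return len(m.ChatMessage().Content)/4 + 1
}

// totalTokens sums the estimates over the whole history.
func (h History) totalTokens() int {
	total := 0
	for _, m := range h {
		total += h.messageTokens(m)
	}
	return total
}

// Messages converts the history to a slice of chatMessage values.
func (h History) Messages() []chatMessage {
	out := make([]chatMessage, 0, len(h))
	for _, m := range h {
		out = append(out, m.ChatMessage())
	}
	return out
}

// With returns a copy of the history with m appended.
// ASSUMPTION: when over budget, the oldest non-system messages are dropped.
func (h History) With(m message) History {
	out := append(append(History{}, h...), m)
	for out.totalTokens() > maxTokens {
		drop := -1
		for i := 0; i < len(out)-1; i++ { // never drop the message just added
			if _, isSystem := out[i].(SystemMessage); !isSystem {
				drop = i
				break
			}
		}
		if drop < 0 {
			break // nothing left that may be dropped
		}
		out = append(out[:drop], out[drop+1:]...)
	}
	return out
}

// Last returns the content of the most recent message, or "" if empty.
func (h History) Last() string {
	if len(h) == 0 {
		return ""
	}
	return h[len(h)-1].ChatMessage().Content
}

func main() {
	h := History{SystemMessage("You summarize diffs.")}
	h = h.With(UserMessage("diff --git a/x b/x"))
	h = h.With(AssistantMessage("Renamed x."))
	fmt.Println(len(h.Messages()), h.Last())
}
```

Returning a new History from With, rather than mutating the receiver, keeps earlier histories reusable as conversation prefixes.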

This is a new Go source file, pull_request.go, added to the go-libs/llnotes package. The file introduces a new method, PullRequest(ctx context.Context, number int), that fetches a GitHub pull request's diff and feeds it to a language model for summarization. The method first retrieves the pull request using the lln.gh.GetPullRequest function and then fetches the diff via an HTTP GET request to the GitHub API. The diff is then parsed into per-file diffs, and each file diff is passed to the Talk function for summarization by a language model. The language model's responses are normalized and accumulated in a History. The method then starts another conversation with the language model to reduce the accumulated summaries to a single paragraph, which is returned as part of the History. Additionally, the file contains constants and variables used for regex-based normalization of the language model's responses.
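The summarize-then-reduce flow described above might look roughly like this. It is a sketch under several assumptions: the diff is split on `diff --git` headers with the standard library instead of a diff-parsing library, `talkFunc` is a hypothetical stand-in for the Talk method, both prompts are invented, and the whitespace-collapsing `normalize` only illustrates regex-based normalization, since the actual patterns are not shown in the description.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// talkFunc is a hypothetical stand-in for a call to the language model.
type talkFunc func(prompt, input string) (string, error)

// Hypothetical prompts; the wording used in the PR is not shown.
const (
	filePrompt   = "Summarize this file diff in one sentence."
	reducePrompt = "Combine these summaries into a single paragraph."
)

var spaces = regexp.MustCompile(`\s+`)

// normalize collapses runs of whitespace. ASSUMPTION: illustration only.
func normalize(s string) string {
	return strings.TrimSpace(spaces.ReplaceAllString(s, " "))
}

// splitFileDiffs splits a multi-file diff into one chunk per file,
// starting a new chunk at every "diff --git" header line.
func splitFileDiffs(diff string) []string {
	if strings.TrimSpace(diff) == "" {
		return nil
	}
	var files, cur []string
	for _, line := range strings.Split(diff, "\n") {
		if strings.HasPrefix(line, "diff --git ") && len(cur) > 0 {
			files = append(files, strings.Join(cur, "\n"))
			cur = nil
		}
		cur = append(cur, line)
	}
	return append(files, strings.Join(cur, "\n"))
}

// summarize asks the model about each file diff (map step), then asks it
// to merge the per-file summaries into one paragraph (reduce step).
func summarize(diff string, talk talkFunc) (string, error) {
	var summaries []string
	for _, fileDiff := range splitFileDiffs(diff) {
		s, err := talk(filePrompt, fileDiff)
		if err != nil {
			return "", fmt.Errorf("file summary: %w", err)
		}
		summaries = append(summaries, normalize(s))
	}
	return talk(reducePrompt, strings.Join(summaries, "\n"))
}

func main() {
	diff := "diff --git a/a.go b/a.go\n+x\ndiff --git a/b.go b/b.go\n-y\n"
	// A fake model that only reports how much input it received.
	fake := func(prompt, input string) (string, error) {
		return fmt.Sprintf("saw %d bytes", len(input)), nil
	}
	out, err := summarize(diff, fake)
	fmt.Println(out, err)
}
```

Summarizing one file at a time keeps each request small, which matters when a whole pull request diff would not fit in the model's context window.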

This is a new Go source file, release_notes.go, added to the go-libs/llnotes package. The file introduces a new method, ReleaseNotes(ctx context.Context), that generates release notes for a GitHub repository. The method first retrieves the repository's versions and compares the latest tagged version to the default branch, collecting commit messages along the way. These commit messages are then passed to the Talk function for summarization by the language model. The language model's responses are expected to summarize the most important features in a fluent, multi-sentence paragraph, suitable for a release note blog post. The blogPrompt constant is used to frame the conversation with the language model, emphasizing the need for a coherent, engaging, and informative summary. The generated release notes are then returned as part of the History slice.
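As a rough illustration of the framing step, the sketch below turns collected commit messages into a system prompt plus one user message. Everything here is assumed for illustration: the `chatMessage` type is a local stand-in, the `blogPrompt` wording is invented, and keeping only the subject line of each commit is a simplification that the PR may not make.

```go
package main

import (
	"fmt"
	"strings"
)

// chatMessage is a local stand-in for the SDK's chat message type.
type chatMessage struct {
	Role    string
	Content string
}

// blogPrompt frames the conversation. ASSUMPTION: hypothetical wording.
const blogPrompt = "Write a fluent, multi-sentence paragraph for a release " +
	"note blog post that covers the most important features listed below."

// releaseNotesMessages builds the conversation sent to the model:
// the framing prompt, then one bullet per commit subject line.
func releaseNotesMessages(commits []string) []chatMessage {
	var b strings.Builder
	for _, c := range commits {
		subject, _, _ := strings.Cut(strings.TrimSpace(c), "\n")
		if subject == "" {
			continue // skip empty commit messages
		}
		b.WriteString("- " + subject + "\n")
	}
	return []chatMessage{
		{Role: "system", Content: blogPrompt},
		{Role: "user", Content: b.String()},
	}
}

func main() {
	msgs := releaseNotesMessages([]string{
		"Added llnotes experiment\n\nLonger description.",
		"Fixed timeout handling",
	})
	fmt.Print(msgs[1].Content)
}
```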

This is a Go source file, talk.go, added to the go-libs/llnotes package. The file introduces several types and functions for communicating with a language model through a Databricks endpoint. The Settings struct holds the configuration needed to instantiate a new llNotes instance: the Databricks and GitHub configurations, the organization and repository names, commit references, and the identifier of the language model. The httpclient package is used to create an httpclient.ApiClient for communication, while the github and databricks-sdk-go packages are used to interact with the GitHub and Databricks APIs, respectively. The New(cfg *Settings) function initializes a new llNotes instance, sets the Databricks HTTP timeout to 300 seconds, and returns a pointer to the instance. The llNotes struct contains a databricks.WorkspaceClient, a github.GitHubClient, an httpclient.ApiClient, the language model identifier, and the GitHub organization and repository names. The Talk(ctx context.Context, h History) method is defined on the llNotes struct and sends a query to the specified language model using the Databricks API. On success it returns the History extended with the language model's generated content; otherwise it returns an error.

The go.work file configures the Go workspace, which allows several modules to be developed and built together. This change adds a new entry for the ./llnotes directory to the use section of the go.work file. As a result, the llnotes directory is now part of the Go workspace and is built together with the other modules in it.

This change creates a new file, README.md, in the llnotes directory to document the llnotes package. The file is written in Markdown and starts with YAML front matter, which sets the title, language, author, date, and tags for the documentation. The main content begins with a header, "Generate GitHub release notes with LLMs hosted on Databricks Model Serving", which is also the title. It then briefly describes what the package does: generate GitHub release notes using a large language model (LLM) hosted on Databricks Model Serving.

This change creates a go.mod file for the llnotes package. The module line specifies the module's name as github.com/databrickslabs/sandbox/llnotes. The go line specifies the required Go version as 1.21.0. The require section lists the required dependencies and their specific versions:

* github.com/databricks/databricks-sdk-go version v0.33.0
* github.com/sourcegraph/go-diff version v0.7.0
* github.com/spf13/pflag version v1.0.5

The go.mod file is maintained by the go command and records the module's dependencies and their versions, which are used for dependency management and reproducible builds.

This change creates a new file, main.go, in the llnotes package. The main.go file contains the main() function, which is the entry point for the package. The main() function initializes a context, sets the product name and version, and then initializes and runs a new lite.Init[llnotes.Settings] instance. The lite.Init[llnotes.Settings] instance creates a new root command, adds two subcommands, "pull-request" and "release-notes", and then runs the root command.

* The "pull-request" subcommand extracts a pull request number and calls the llnotes.PullRequest() function, printing the summary of the pull request.
* The "release-notes" subcommand calls the llnotes.ReleaseNotes() function, printing the release notes.

The lite.New[llnotes.Settings](...) function initializes a new lite CLI instance with the specified configuration and subcommands. The lite.Command[llnotes.Settings, req]{...} instances define the two subcommands, their flags, and their corresponding handlers. The main.go file provides the llnotes package with a user-facing CLI. The CLI allows users to generate GitHub release notes and pull request summaries using a large language model (LLM) hosted on Databricks Model Serving.
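The two-subcommand layout can be illustrated with the standard library alone. The sketch below deliberately does not use the lite package, whose API is not shown here: the `run` dispatcher, the `--number` flag name, and the handler signatures are all hypothetical stand-ins for the real command definitions.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// run dispatches to one of two subcommands. The handlers are passed in
// so the dispatch logic can be exercised without GitHub or a model.
func run(args []string, pullRequest func(number int) (string, error), releaseNotes func() (string, error)) (string, error) {
	if len(args) == 0 {
		return "", errors.New("usage: llnotes <pull-request|release-notes> [flags]")
	}
	switch args[0] {
	case "pull-request":
		fs := flag.NewFlagSet("pull-request", flag.ContinueOnError)
		// ASSUMPTION: the flag name is hypothetical.
		number := fs.Int("number", 0, "pull request number")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		if *number <= 0 {
			return "", errors.New("--number is required")
		}
		return pullRequest(*number)
	case "release-notes":
		return releaseNotes()
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
}

func main() {
	args := os.Args[1:]
	if len(args) == 0 {
		args = []string{"release-notes"} // demo default so the sketch runs bare
	}
	// Stub handlers standing in for PullRequest() and ReleaseNotes().
	pr := func(n int) (string, error) { return fmt.Sprintf("summary of PR #%d", n), nil }
	rn := func() (string, error) { return "release notes", nil }
	out, err := run(args, pr, rn)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```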

github-actions bot commented Feb 27, 2024

✅ 5/5 passed, 1 skipped, 14s total

Running from acceptance #128

@nfx nfx marked this pull request as ready for review March 2, 2024 14:47
@nfx nfx requested a review from a team as a code owner March 2, 2024 14:47
@nfx nfx requested a review from priyal-c March 2, 2024 14:47
@nfx nfx changed the title LLNotes experiment Added llnotes experiment Mar 2, 2024
@nfx nfx merged commit 12d87fc into main Mar 2, 2024
@nfx nfx deleted the llnotes branch March 2, 2024 14:58