Commit 12d87fc

Added llnotes experiment (#42)
## Summary

The `go-libs/llnotes` package has been updated with several new features, including the addition of three new files: `chain.go`, `pull_request.go`, and `release_notes.go`.

The `chain.go` file introduces a `History` type, which is a list of messages with methods to manage and manipulate the messages. Three concrete types, `SystemMessage`, `UserMessage`, and `AssistantMessage`, have been added to represent different roles in a chat conversation.

The `pull_request.go` file introduces the `PullRequest()` method, which fetches and processes pull request diffs using GitHub's GraphQL API and downloads the diff data. It then iterates through the file diffs and processes them one by one, generating `UserMessage` with the diff content and appending it to the `History`.

The `release_notes.go` file introduces the `ReleaseNotes()` method, which generates a release note blog post for the specified GitHub repository.

The `llnotes` directory has also been added to the `use` directive in the `go.work` file, and a new `README.md` file has been added to provide documentation for the `llnotes` tool. Additionally, the `go.mod` file specifies a new dependency on the `github.com/databrickslabs/sandbox` package version `v0.1.0-alpha.1`.

Overall, these changes add new functionality to the `llnotes` package for processing pull request diffs, generating release notes for GitHub repositories, and interfacing with the Databricks Serving Endpoints API.

## Details

This is a new Go source file, `chain.go`, added to the `go-libs/llnotes` package. The file defines several types, including `message`, `SystemMessage`, `UserMessage`, `AssistantMessage`, `History`, along with methods for these types. The `message` type is an interface with a single method, `ChatMessage`, that returns a `serving.ChatMessage`. The `SystemMessage`, `UserMessage`, and `AssistantMessage` types implement the `message` interface and are used to represent different roles in a chat conversation.
The `History` type holds a slice of `message`s and provides methods for working with the chat history, such as `Messages`, which converts the history to a slice of `serving.ChatMessage`s, and `With`, which appends a new `message` to the history and truncates that message first if its approximate token count exceeds 32768. The `messageTokens` method approximates the number of tokens in a given message by counting space-separated words, and `totalTokens` computes the total number of tokens in the history. The `Last` method returns the content of the most recent message in the history.

This is a new Go source file, `pull_request.go`, added to the `go-libs/llnotes` package. The file introduces a new method, `PullRequest(ctx context.Context, number int)`, that fetches a GitHub pull request's diff and feeds it to a language model for summarization. The method first retrieves the pull request using the `lln.gh.GetPullRequest` function and then fetches the diff via an HTTP GET request to the GitHub API. The file diff is then parsed and processed, with each file diff being passed to the `Talk` function for summarization by a language model. The language model's responses are normalized and accumulated in a `History` slice. The method then initiates another conversation with the language model for reducing the accumulated summaries to a single paragraph, which is then returned as part of the `History` slice. Additionally, the file contains constants and variables used for regex-based normalization of the language model's responses.

This is a new Go source file, `release_notes.go`, added to the `go-libs/llnotes` package. The file introduces a new method, `ReleaseNotes(ctx context.Context)`, that generates release notes for a GitHub repository. The method first retrieves the repository's versions and compares the latest tagged version to the default branch, collecting commit messages along the way. These commit messages are then passed to the `Talk` function for summarization by the language model.
The language model's responses are expected to summarize the most important features in a fluent, multi-sentence paragraph, suitable for a release note blog post. The `blogPrompt` constant is used to frame the conversation with the language model, emphasizing the need for a coherent, engaging, and informative summary. The generated release notes are then returned as part of the `History` slice.

This is a Go source file, `talk.go`, added to the `go-libs/llnotes` package. The file introduces several types and functions to facilitate communication with a language model through a Databricks endpoint. The `Settings` struct holds the necessary configuration to instantiate a new `llNotes` instance, including Databricks and GitHub configurations, the organization, repository names, commit references, and the identifier for the language model. The `httpclient` package is used to create an `httpclient.ApiClient` for communication, while the `github` and `databricks-sdk-go` packages are utilized to interact with the GitHub and Databricks APIs, respectively.

The `New(cfg *Settings)` function initializes a new `llNotes` instance and sets the Databricks HTTP timeout to 300 seconds. It then returns a pointer to the new `llNotes` instance. The `llNotes` struct contains a `databricks.WorkspaceClient`, `github.GitHubClient`, `httpclient.ApiClient`, the language model identifier, the GitHub organization, and repository names. The `Talk(ctx context.Context, h History)` method is defined on the `llNotes` struct and sends a query to the specified language model using the Databricks API. The method returns a response containing the language model's generated content as part of the `History` slice. If an error occurs, the method returns an error.

The `go.work` file is used to configure the Go workspace, which allows managing dependencies and build settings for Go projects.
This specific change modifies the `use` section of the `go.work` file, adding a new entry for the `./llnotes` directory. This change indicates that the `llnotes` directory is now part of the Go workspace, allowing other modules in the workspace to import and use packages within the `llnotes` directory. Overall, this change integrates the `llnotes` package into the Go workspace, making it available for other modules within the workspace to use.

This change creates a new file, `README.md`, in the `llnotes` directory. The `README.md` file provides documentation for the `llnotes` package. The file starts with YAML front matter, which sets the title, language, author, date, and tags for the documentation. The main content begins with a header, "Generate GitHub release notes with LLMs hosted on Databricks Model Serving", which is also the title. The documentation provides a brief description of the functionality provided by the `llnotes` package, which is to generate GitHub release notes using a large language model (LLM) hosted on Databricks Model Serving. The `README.md` file follows the Markdown format, which allows for formatting and styling the text with headers, paragraphs, and other markdown elements. The `README.md` is an important file for documenting and introducing the purpose and functionality of the `llnotes` package.

This change creates a `go.mod` file for the `llnotes` package. The `module` line specifies the module's name as `github.com/databrickslabs/sandbox/llnotes`. The `go` line specifies the required Go version as `1.21.0`. The `require` section lists the required dependencies and their specific versions. The dependencies for this package are:

* `github.com/databricks/databricks-sdk-go` version `v0.33.0`
* `github.com/sourcegraph/go-diff` version `v0.7.0`
* `github.com/spf13/pflag` version `v1.0.5`

The `go.mod` file is automatically generated by the `go` command, and it specifies the package's dependencies and their versions.
This information is used for dependency management and build isolation in Go modules. The `go.mod` file is essential for the proper functioning and management of the `llnotes` package and its dependencies.

This change creates a new file, `main.go`, in the `llnotes` package. The `main.go` file contains the `main()` function, which is the entry point for the package. The `main()` function initializes a context, sets the product name and version, and then initializes and runs a new `lite.Init[llnotes.Settings]` instance. The `lite.Init[llnotes.Settings]` instance creates a new root command, adds two subcommands, "pull-request" and "release-notes", and then runs the root command.

* The "pull-request" subcommand extracts a pull request number and calls the `llnotes.PullRequest()` function, printing the summary of the pull request.
* The "release-notes" subcommand calls the `llnotes.ReleaseNotes()` function, printing the release notes.

The `lite.New[llnotes.Settings](...)` function initializes a new `lite` CLI instance with the specified configuration and subcommands. The `lite.Command[llnotes.Settings, req]{...}` instances define the two subcommands, their flags, and their corresponding handlers. The `main.go` file provides the `llnotes` package with a user-facing CLI. The CLI allows users to generate GitHub release notes and pull request summaries using a large language model (LLM) hosted on Databricks Model Serving.
1 parent 980bc32 commit 12d87fc

24 files changed, +1389 -12 lines changed

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+name: Release llnotes
+
+on:
+  push:
+    tags:
+      - 'llnotes/v*'
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write
+      contents: write
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: 1.21
+
+      - name: Build
+        run: /bin/bash .github/workflows/build.sh llnotes
+
+      - name: Compress
+        run: /bin/bash .github/workflows/compress.sh
+
+      - name: Create release
+        uses: softprops/action-gh-release@v1
+        with:
+          files: dist/*

.github/workflows/push-llnotes.yml

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+name: build-llnotes
+
+on:
+  pull_request:
+    types: [opened, synchronize]
+    paths: ['llnotes/**']
+
+permissions:
+  id-token: write
+  contents: read
+  pull-requests: write
+
+jobs:
+  tests:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: 1.21
+      - name: Install Tools
+        run: go install gotest.tools/gotestsum@latest
+      - name: Test
+        working-directory: llnotes
+        run: make test
+
+  fmt:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Setup Go
+        uses: actions/setup-go@v4
+        with:
+          go-version: 1.21.0
+      - name: Install Tools
+        run: |
+          go install golang.org/x/tools/cmd/goimports@latest
+          go install honnef.co/go/tools/cmd/staticcheck@latest
+      - name: Run make fmt
+        working-directory: llnotes
+        run: make fmt
+      - name: Run go mod tidy
+        working-directory: llnotes
+        run: go mod tidy
+      - name: Fail on differences
+        run: |
+          # Exit with status code 1 if there are differences (i.e. unformatted files)
+          git diff --exit-code

Makefile

Lines changed: 4 additions & 1 deletion
@@ -35,4 +35,7 @@ fmt-go-libs:
 fmt-acceptance:
 	cd acceptance && make fmt
 
-fmt: fmt-acceptance fmt-go-libs
+fmt-llnotes:
+	cd llnotes && make fmt
+
+fmt: fmt-acceptance fmt-go-libs fmt-llnotes

NOTICE

Lines changed: 8 additions & 1 deletion
@@ -10,4 +10,11 @@ This Software contains code from the following projects, licensed under the BSD-
 
 go-github
 Copyright 2013 The go-github AUTHORS. All rights reserved.
-License - https://github.com/google/go-github/blob/master/LICENSE
+License - https://github.com/google/go-github/blob/master/LICENSE
+
+This Software contains code from the following projects, licensed under the MIT license:
+
+go-diff
+Copyright (c) 2014 Sourcegraph, Inc.
+Copyright (c) 2012 Matias Bordese
+https://github.com/sourcegraph/go-diff/blob/master/LICENSE

go-libs/Makefile

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ coverage: test
 
 vendor:
 	@echo "✓ Filling vendor folder with library code ..."
+	@go mod tidy
 	@go mod vendor
 
 .PHONY: build vendor coverage test fmt

go-libs/github/github.go

Lines changed: 7 additions & 0 deletions
@@ -211,6 +211,13 @@ func (c *GitHubClient) ListCommits(ctx context.Context, org, repo string, req *L
 	})
 }
 
+func (c *GitHubClient) GetCommit(ctx context.Context, org, repo string, sha string) (*RepositoryCommit, error) {
+	var res RepositoryCommit
+	path := fmt.Sprintf("%s/repos/%s/%s/commits/%s", gitHubAPI, org, repo, sha)
+	err := c.api.Do(ctx, "GET", path, httpclient.WithResponseUnmarshal(&res))
+	return &res, err
+}
+
 func (c *GitHubClient) CompareCommits(ctx context.Context, org, repo, base, head string) listing.Iterator[RepositoryCommit] {
 	type response struct {
 		Commits []RepositoryCommit `json:"commits,omitempty"`

go-libs/go.mod

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ require (
 	github.com/fatih/color v1.16.0
 	github.com/google/go-querystring v1.1.0
 	github.com/nwidger/jsoncolor v0.3.2
+	github.com/sourcegraph/go-diff v0.7.0
 	github.com/spf13/cobra v1.8.0
 	github.com/spf13/pflag v1.0.5
 	github.com/spf13/viper v1.18.2

go-libs/go.sum

Lines changed: 4 additions & 0 deletions
@@ -111,8 +111,12 @@ github.com/sagikazarmark/locafero v0.4.0 h1:HApY1R9zGo4DBgr7dqsTH/JJxLTTsOt7u6ke
 github.com/sagikazarmark/locafero v0.4.0/go.mod h1:Pe1W6UlPYUk/+wc/6KFhbORCfqzgYEpgQ3O5fPuL3H4=
 github.com/sagikazarmark/slog-shim v0.1.0 h1:diDBnUNK9N/354PgrxMywXnAwEr1QZcOr6gto+ugjYE=
 github.com/sagikazarmark/slog-shim v0.1.0/go.mod h1:SrcSrq8aKtyuqEI1uvTDTK1arOWRIczQRv+GVI1AkeQ=
+github.com/shurcooL/go v0.0.0-20180423040247-9e1955d9fb6e/go.mod h1:TDJrrUr11Vxrven61rcy3hJMUqaf/CLWYhHNPmT14Lk=
+github.com/shurcooL/go-goon v0.0.0-20170922171312-37c2f522c041/go.mod h1:N5mDOmsrJOB+vfqUK+7DmDyjhSLIIBnXo9lvZJj3MWQ=
 github.com/sourcegraph/conc v0.3.0 h1:OQTbbt6P72L20UqAkXXuLOj79LfEanQ+YQFNpLA9ySo=
 github.com/sourcegraph/conc v0.3.0/go.mod h1:Sdozi7LEKbFPqYX2/J+iBAM6HpqSLTASQIKqDmF7Mt0=
+github.com/sourcegraph/go-diff v0.7.0 h1:9uLlrd5T46OXs5qpp8L/MTltk0zikUGi0sNNyCpA8G0=
+github.com/sourcegraph/go-diff v0.7.0/go.mod h1:iBszgVvyxdc8SFZ7gm69go2KDdt3ag071iBaWPF6cjs=
 github.com/spf13/afero v1.11.0 h1:WJQKhtpdm3v2IzqG8VMqrr6Rf3UYpEF239Jy9wNepM8=
 github.com/spf13/afero v1.11.0/go.mod h1:GH9Y3pIexgf1MTIWtNGyogA5MwRIDXGUr+hbWNoBjkY=
 github.com/spf13/cast v1.6.0 h1:GEiTHELF+vaR5dhz3VqZfFSzZjYbgeKDpBxQVS4GYJ0=

go-libs/llnotes/announce.go

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+package llnotes
+
+import (
+	"context"
+	"fmt"
+	"strings"
+
+	"github.com/databricks/databricks-sdk-go/listing"
+	"github.com/databricks/databricks-sdk-go/logger"
+)
+
+var blogPrompt = MessageTemplate(`Do not hallucinate.
+You are professional technical writer and you receive draft release notes for {{.version}} of project called {{.repo.Name}} in a markdown format from multiple team members.
+Project can be described as "{{.repo.Description}}"
+
+You write a long post announcement that takes at least 5 minutes to read, summarize the most important features, and mention them on top. Keep the markdown links when relevant.
+Do not use headings. Write fluent paragraphs, that are at least few sentences long. Blog post title should nicely summarize the feature increments of this release.
+
+Don't abuse lists. paragraphs should have at least 3-4 sentences. The title should be one-sentence summary of the incremental updates for this release
+
+You aim at driving more adoption of the project on Medium.`)
+
+func (lln *llNotes) versionNotes(ctx context.Context, newVersion string) ([]string, error) {
+	versions, err := listing.ToSlice(ctx, lln.gh.Versions(ctx, lln.org, lln.repo))
+	if err != nil {
+		return nil, fmt.Errorf("versions: %w", err)
+	}
+	if newVersion == "" {
+		newVersion = versions[0].Version
+	}
+	prevVersion := "v0.0.0"
+	for i, v := range versions {
+		if v.Version == newVersion {
+			prevVersion = versions[i+1].Version
+			break
+		}
+	}
+	return lln.ReleaseNotesDiff(ctx, prevVersion, newVersion)
+}
+
+func (lln *llNotes) Announce(ctx context.Context, newVersion string) (History, error) {
+	notes, err := lln.versionNotes(ctx, newVersion)
+	if err != nil {
+		return nil, fmt.Errorf("parallel: %w", err)
+	}
+	repo, err := lln.gh.GetRepo(ctx, lln.org, lln.repo)
+	if err != nil {
+		return nil, fmt.Errorf("get repo: %w", err)
+	}
+	rawNotes := strings.Join(notes, "\n")
+	logger.Debugf(ctx, "Raw notes: %s", rawNotes)
+	return lln.Talk(ctx, History{
+		blogPrompt.AsSystem(map[string]any{
+			"version": newVersion,
+			"repo":    repo,
+		}),
+		UserMessage(rawNotes),
+	})
+}
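
In `versionNotes` above, `versions[0]` is read without a length check and `versions[i+1]` is read without a bounds check, so an empty version list or a request for the oldest release would panic at runtime. A possible bounds-checked variant of the lookup is sketched below. It assumes, as the `i+1` indexing implies, that versions are ordered newest first. The name `versionRange` and the plain `[]string` input are hypothetical simplifications of the repository's version type.

```go
package main

import (
	"errors"
	"fmt"
)

// versionRange returns the previous and current version for a release.
// versions is assumed to be ordered newest first. An empty newVersion
// selects the newest entry, mirroring the behaviour in the diff.
func versionRange(versions []string, newVersion string) (prev, curr string, err error) {
	if len(versions) == 0 {
		return "", "", errors.New("no versions found")
	}
	if newVersion == "" {
		newVersion = versions[0]
	}
	// Same "v0.0.0" placeholder as the diff: used when the version is the
	// oldest one or is not present in the list at all.
	prev = "v0.0.0"
	for i, v := range versions {
		if v == newVersion && i+1 < len(versions) {
			prev = versions[i+1]
			break
		}
	}
	return prev, newVersion, nil
}

func main() {
	versions := []string{"v0.3.0", "v0.2.0", "v0.1.0"}

	prev, curr, _ := versionRange(versions, "")
	fmt.Println(prev, curr) // v0.2.0 v0.3.0

	prev, curr, _ = versionRange(versions, "v0.1.0")
	fmt.Println(prev, curr) // v0.0.0 v0.1.0

	_, _, err := versionRange(nil, "")
	fmt.Println(err) // no versions found
}
```

Whether an unknown version should fall back to `v0.0.0` or return an error is a judgment call; the sketch keeps the fallback only to stay close to the committed behaviour.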

go-libs/llnotes/chain.go

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+package llnotes
+
+import (
+	"fmt"
+	"strings"
+
+	"github.com/databricks/databricks-sdk-go/service/serving"
+	"github.com/databrickslabs/sandbox/go-libs/sed"
+)
+
+type message interface {
+	ChatMessage() serving.ChatMessage
+}
+
+type SystemMessage string
+
+func (m SystemMessage) ChatMessage() serving.ChatMessage {
+	return serving.ChatMessage{
+		Role:    serving.ChatMessageRoleSystem,
+		Content: string(m),
+	}
+}
+
+type UserMessage string
+
+func (m UserMessage) ChatMessage() serving.ChatMessage {
+	return serving.ChatMessage{
+		Role:    serving.ChatMessageRoleUser,
+		Content: string(m),
+	}
+}
+
+type AssistantMessage string
+
+func (m AssistantMessage) ChatMessage() serving.ChatMessage {
+	return serving.ChatMessage{
+		Role:    serving.ChatMessageRoleAssistant,
+		Content: string(m),
+	}
+}
+
+type History []message
+
+func (h History) Messages() (out []serving.ChatMessage) {
+	for _, v := range h {
+		out = append(out, v.ChatMessage())
+	}
+	return
+}
+
+func (h History) messageTokens(m message) int {
+	// this is good enough approximation of message token count
+	content := m.ChatMessage().Content
+	return len(strings.Split(content, " "))
+}
+
+func (h History) truncated(m message, maxTokens int) message {
+	// this is good enough approximation of message token count
+	cm := m.ChatMessage()
+	tokens := strings.Split(cm.Content, " ")
+	if len(tokens) < maxTokens {
+		return m
+	}
+	switch m.(type) {
+	case UserMessage:
+		return UserMessage(strings.Join(tokens[:maxTokens-100], " "))
+	case SystemMessage:
+		return SystemMessage(strings.Join(tokens[:maxTokens-100], " "))
+	case AssistantMessage:
+		return AssistantMessage(strings.Join(tokens[:maxTokens-100], " "))
+	}
+	panic("cannot truncate message")
+}
+
+func (h History) totalTokens() int {
+	totalTokens := 0
+	for _, m := range h {
+		totalTokens += h.messageTokens(m)
+	}
+	return totalTokens
+}
+
+func (h History) With(m message) History {
+	maxContextSize := 32768
+	increment := h.messageTokens(m)
+	if increment > maxContextSize {
+		m = h.truncated(m, 32000)
+	}
+	return append(h, m)
+}
+
+func (h History) Last() string {
+	return h[len(h)-1].ChatMessage().Content
+}
+
+func (h History) Excerpt(n int) string {
+	var out []string
+	oneLine := sed.Rule(`\n|\s+`, ` `)
+	for i, v := range h {
+		m := v.ChatMessage()
+		out = append(out, fmt.Sprintf("(%d/%d) %s: %s",
+			i+1, len(h),
+			strings.ToUpper(m.Role.String()),
+			h.onlyNBytes(oneLine.Apply(m.Content), n)))
+	}
+	return strings.Join(out, "\n")
+}
+
+func (h History) onlyNBytes(j string, numBytes int) string {
+	diff := len([]byte(j)) - numBytes
+	if diff > 0 {
+		return fmt.Sprintf("%s... (%d more bytes)", j[:numBytes], diff)
+	}
+	return j
+}
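
Three details of the helpers above may be worth a second look. `strings.Split(content, " ")` counts an empty string for every repeated space and does not split on newlines. `tokens[:maxTokens-100]` would panic for a `maxTokens` below 100. `j[:numBytes]` can cut a multi-byte UTF-8 character in half. The sketch below shows one way to handle these cases with the standard library only. The function names are hypothetical, and `strings.Fields` is a swapped-in word splitter, not the approach used in the commit.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// approxTokens approximates a token count as the number of
// whitespace-separated words, ignoring repeated spaces and newlines.
func approxTokens(content string) int {
	return len(strings.Fields(content))
}

// truncateWords keeps at most maxWords words of content. Content that
// already fits is returned unchanged; truncated content has its
// whitespace runs collapsed to single spaces.
func truncateWords(content string, maxWords int) string {
	if maxWords < 0 {
		maxWords = 0
	}
	words := strings.Fields(content)
	if len(words) <= maxWords {
		return content
	}
	return strings.Join(words[:maxWords], " ")
}

// onlyNBytes shortens s to at most numBytes bytes, backing up to the
// nearest rune boundary so that no UTF-8 sequence is split.
func onlyNBytes(s string, numBytes int) string {
	if numBytes < 0 {
		numBytes = 0
	}
	if len(s) <= numBytes {
		return s
	}
	cut := numBytes
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return fmt.Sprintf("%s... (%d more bytes)", s[:cut], len(s)-cut)
}

func main() {
	fmt.Println(approxTokens("a   b\nc"))                 // 3
	fmt.Println(truncateWords("one two three four", 2)) // one two
	fmt.Println(onlyNBytes("héllo", 2))                  // h... (5 more bytes)
}
```

In the last call, byte offset 2 falls inside the two-byte `é`, so the cut moves back to offset 1 and the remaining five bytes are reported.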
