Commit 4bc5538

.Net: Updates for ChatMessageContent to support complex content model (#4113)
### Motivation and Context

Resolves: #3874
Resolves: #3522
Resolves: #3768

This PR contains changes to prepare the public API of `ChatMessageContent` to support an extensible content model in the future.

With the latest OpenAI API, it's possible to have content as a `string` or as an `array` of content objects (a scenario for GPT vision):

![image](https://github.com/microsoft/semantic-kernel/assets/13853051/53817e93-1cdc-4b81-aab9-0d6369e9f1e6)

### Description

1. Added `ChatMessageContentItemCollection` to store multiple contents of different types in a chat message.
2. Added `ChatMessageContentItemCollection? Items` property to `ChatMessageContent`, made `string? Content` property nullable.
3. Added `ImageContent` class that stores an image URI.
4. Added ADR and diagram of chat and text content models.
5. Added `Example74_GPTVision` to demonstrate the functionality with the GPT Vision model.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: SergeyMenshykh <68852919+SergeyMenshykh@users.noreply.github.com>
1 parent 2000338 commit 4bc5538

14 files changed: +739 -12 lines changed

@@ -0,0 +1,305 @@
---
# These are optional elements. Feel free to remove any of them.
status: accepted
contact: dmytrostruk
date: 2023-12-08
deciders: SergeyMenshykh, markwallace, rbarreto, mabolan, stephentoub, dmytrostruk
consulted:
informed:
---
# Chat Models

## Context and Problem Statement

In the latest OpenAI API, the `content` property of the `chat message` object accepts two types of values: `string` or `array` ([Documentation](https://platform.openai.com/docs/api-reference/chat/create)).

We should update the current implementation of the `ChatMessageContent` class, which has a `string Content` property, to support this API.
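
To make the two shapes concrete, here is a minimal sketch that serializes a user message both ways with `System.Text.Json`. The part layout (`type: "text"` and `type: "image_url"`) follows the OpenAI chat completions request format; the class name `ChatContentShapes` and its methods are hypothetical and are not part of Semantic Kernel.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class ChatContentShapes
{
    // `content` as a plain string.
    public static string BuildStringContentMessage(string text) =>
        JsonSerializer.Serialize(new { role = "user", content = text });

    // `content` as an array of typed parts (text plus an image URL).
    public static string BuildArrayContentMessage(string text, Uri imageUri) =>
        JsonSerializer.Serialize(new
        {
            role = "user",
            content = new object[]
            {
                new { type = "text", text },
                new { type = "image_url", image_url = new { url = imageUri.ToString() } }
            }
        });
}
```

A `string Content` property can carry only the first shape, which is why the options below introduce a collection of content items.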

## Decision Drivers

1. The new design should not be coupled to the OpenAI API and should work for other AI providers.
2. Naming of classes and properties should be consistent and intuitive.

## Considered Options

Some of the option variations can be combined.

### Option #1: Naming updates and new data type for `chat message content`

Since `chat message content` can now be an object instead of a `string`, it needs a dedicated name so that the domain stays easy to understand.

1. `ChatMessageContent` will be renamed to `ChatMessage` (the same applies to `StreamingChatMessageContent`).
2. `GetChatMessageContent` methods will be renamed to `GetChatMessage`.
3. A new abstract class `ChatMessageContent` will have a property `ChatMessageContentType Type` with values `text` and `image` (to be extended with `audio` and `video` in the future).
4. `ChatMessage` will contain a collection of `ChatMessageContent` objects: `IList<ChatMessageContent> Contents`.
5. There will be concrete implementations of `ChatMessageContent`: `ChatMessageTextContent` and `ChatMessageImageContent`.

New _ChatMessageContentType.cs_

```csharp
public readonly struct ChatMessageContentType : IEquatable<ChatMessageContentType>
{
    public static ChatMessageContentType Text { get; } = new("text");

    public static ChatMessageContentType Image { get; } = new("image");

    public string Label { get; }

    // Implementation of `IEquatable`...
}
```
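
The elided `IEquatable` part could look roughly like the sketch below. It is one possible completion and not the code from the ADR: the constructor, the case-insensitive comparison of `Label`, and the operators are assumptions made for illustration.

```csharp
#nullable enable
using System;

public readonly struct ChatMessageContentType : IEquatable<ChatMessageContentType>
{
    public static ChatMessageContentType Text { get; } = new("text");

    public static ChatMessageContentType Image { get; } = new("image");

    public string Label { get; }

    // Assumption: labels must be non-empty.
    public ChatMessageContentType(string label)
    {
        if (string.IsNullOrWhiteSpace(label))
        {
            throw new ArgumentException("Label must not be empty.", nameof(label));
        }

        this.Label = label;
    }

    // Assumption: two types are equal when their labels match, ignoring case.
    public bool Equals(ChatMessageContentType other) =>
        string.Equals(this.Label, other.Label, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object? obj) =>
        obj is ChatMessageContentType other && this.Equals(other);

    // Uses the same comparer as Equals; default(struct) has a null Label.
    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(this.Label ?? string.Empty);

    public static bool operator ==(ChatMessageContentType left, ChatMessageContentType right) =>
        left.Equals(right);

    public static bool operator !=(ChatMessageContentType left, ChatMessageContentType right) =>
        !left.Equals(right);

    public override string ToString() => this.Label ?? string.Empty;
}
```

Modelling the type as a struct with a string label instead of an `enum` lets providers introduce new content types (for example `audio`) without changing a shared enumeration.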

New _ChatMessageContent.cs_

```csharp
public abstract class ChatMessageContent
{
    public ChatMessageContentType Type { get; set; }

    public ChatMessageContent(ChatMessageContentType type)
    {
        this.Type = type;
    }
}
```

Updated _ChatMessage.cs_:

```csharp
public class ChatMessage : ContentBase
{
    public AuthorRole Role { get; set; }

    public IList<ChatMessageContent> Contents { get; set; }
}
```

New _ChatMessageTextContent.cs_

```csharp
public class ChatMessageTextContent : ChatMessageContent
{
    public string Text { get; set; }

    public ChatMessageTextContent(string text) : base(ChatMessageContentType.Text)
    {
        this.Text = text;
    }
}
```

New _ChatMessageImageContent.cs_

```csharp
public class ChatMessageImageContent : ChatMessageContent
{
    public Uri Uri { get; set; }

    public ChatMessageImageContent(Uri uri) : base(ChatMessageContentType.Image)
    {
        this.Uri = uri;
    }
}
```

Usage:

```csharp
var chatHistory = new ChatHistory("You are a friendly assistant.");

// Construct request
var userContents = new List<ChatMessageContent>
{
    new ChatMessageTextContent("What's in this image?"),
    new ChatMessageImageContent(new Uri("https://link-to-image.com"))
};

chatHistory.AddUserMessage(userContents);

// Get response
var message = await chatCompletionService.GetChatMessageAsync(chatHistory);

foreach (var content in message.Contents)
{
    // Possibility to get content type (text or image).
    var contentType = content.Type;

    // Cast for specific content type
    // Extension methods can be provided for better usability
    // (e.g. message.GetContent<ChatMessageTextContent>()).
    if (content is ChatMessageTextContent textContent)
    {
        Console.WriteLine(textContent.Text);
    }

    if (content is ChatMessageImageContent imageContent)
    {
        Console.WriteLine(imageContent.Uri);
    }
}
```
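
The extension method mentioned in the comments above could be sketched as follows. This is a hypothetical helper and not an existing Semantic Kernel API; the content types are reduced to minimal stand-ins so that the example is self-contained.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the Option #1 types (assumptions for illustration).
public abstract class ChatMessageContent { }

public sealed class ChatMessageTextContent : ChatMessageContent
{
    public string Text { get; }

    public ChatMessageTextContent(string text) => this.Text = text;
}

public sealed class ChatMessageImageContent : ChatMessageContent
{
    public Uri Uri { get; }

    public ChatMessageImageContent(Uri uri) => this.Uri = uri;
}

public sealed class ChatMessage
{
    public IList<ChatMessageContent> Contents { get; } = new List<ChatMessageContent>();
}

public static class ChatMessageExtensions
{
    // Returns the first content of the requested type, or null when none is present.
    public static T? GetContent<T>(this ChatMessage message) where T : ChatMessageContent =>
        message.Contents.OfType<T>().FirstOrDefault();
}
```

With such a helper, `message.GetContent<ChatMessageTextContent>()?.Text` would replace the explicit `is` checks in the loop when only one content type is of interest.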

### Option #2: Avoid renaming and new data type for `chat message content`

Same as Option #1, but without naming changes. To differentiate the actual `chat message` from the `chat message content`:

- `Chat Message` will be `ChatMessageContent` (as it is right now).
- `Chat Message Content` will be `ChatMessageContentItem`.

1. A new abstract class `ChatMessageContentItem` will have a property `ChatMessageContentItemType Type` with values `text` and `image` (to be extended with `audio` and `video` in the future).
2. `ChatMessageContent` will contain a collection of `ChatMessageContentItem` objects: `IList<ChatMessageContentItem> Items`.
3. There will be concrete implementations of `ChatMessageContentItem`: `ChatMessageTextContentItem` and `ChatMessageImageContentItem`.

New _ChatMessageContentItemType.cs_

```csharp
public readonly struct ChatMessageContentItemType : IEquatable<ChatMessageContentItemType>
{
    public static ChatMessageContentItemType Text { get; } = new("text");

    public static ChatMessageContentItemType Image { get; } = new("image");

    public string Label { get; }

    // Implementation of `IEquatable`...
}
```

New _ChatMessageContentItem.cs_

```csharp
public abstract class ChatMessageContentItem
{
    public ChatMessageContentItemType Type { get; set; }

    public ChatMessageContentItem(ChatMessageContentItemType type)
    {
        this.Type = type;
    }
}
```

Updated _ChatMessageContent.cs_:

```csharp
public class ChatMessageContent : ContentBase
{
    public AuthorRole Role { get; set; }

    public IList<ChatMessageContentItem> Items { get; set; }
}
```

New _ChatMessageTextContentItem.cs_

```csharp
public class ChatMessageTextContentItem : ChatMessageContentItem
{
    public string Text { get; set; }

    public ChatMessageTextContentItem(string text) : base(ChatMessageContentItemType.Text)
    {
        this.Text = text;
    }
}
```

New _ChatMessageImageContentItem.cs_

```csharp
public class ChatMessageImageContentItem : ChatMessageContentItem
{
    public Uri Uri { get; set; }

    public ChatMessageImageContentItem(Uri uri) : base(ChatMessageContentItemType.Image)
    {
        this.Uri = uri;
    }
}
```

Usage:

```csharp
var chatHistory = new ChatHistory("You are a friendly assistant.");

// Construct request
var userContentItems = new List<ChatMessageContentItem>
{
    new ChatMessageTextContentItem("What's in this image?"),
    new ChatMessageImageContentItem(new Uri("https://link-to-image.com"))
};

chatHistory.AddUserMessage(userContentItems);

// Get response
var message = await chatCompletionService.GetChatMessageContentAsync(chatHistory);

foreach (var contentItem in message.Items)
{
    // Possibility to get content type (text or image).
    var contentItemType = contentItem.Type;

    // Cast for specific content type
    // Extension methods can be provided for better usability
    // (e.g. message.GetContent<ChatMessageTextContentItem>()).
    if (contentItem is ChatMessageTextContentItem textContentItem)
    {
        Console.WriteLine(textContentItem.Text);
    }

    if (contentItem is ChatMessageImageContentItem imageContentItem)
    {
        Console.WriteLine(imageContentItem.Uri);
    }
}
```

### Option #3: Add new property to `ChatMessageContent` - collection of content items

This option keeps the `string Content` property as it is, but adds a new property: a collection of `ContentBase` items.

Updated _ChatMessageContent.cs_

```csharp
public class ChatMessageContent : ContentBase
{
    public AuthorRole Role { get; set; }

    public string? Content { get; set; }

    public ChatMessageContentItemCollection? Items { get; set; }
}
```

New _ChatMessageContentItemCollection.cs_

```csharp
public class ChatMessageContentItemCollection : IList<ContentBase>, IReadOnlyList<ContentBase>
{
    // Implementation of IList<ContentBase>, IReadOnlyList<ContentBase> to catch null values.
}
```
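
A collection that "catches null values" could be sketched as a thin wrapper around `List<ContentBase>` that rejects `null` on every write path. The code below is an illustration under that assumption and not the implementation from this commit; `ContentBase` and `TextContent` are minimal stand-ins.

```csharp
#nullable enable
using System;
using System.Collections;
using System.Collections.Generic;

// Minimal stand-ins for the real content types (assumptions for illustration).
public abstract class ContentBase { }

public sealed class TextContent : ContentBase
{
    public string? Text { get; }

    public TextContent(string? text) => this.Text = text;
}

public class ChatMessageContentItemCollection : IList<ContentBase>, IReadOnlyList<ContentBase>
{
    private readonly List<ContentBase> _items = new();

    public ContentBase this[int index]
    {
        get => this._items[index];
        set => this._items[index] = value ?? throw new ArgumentNullException(nameof(value));
    }

    public int Count => this._items.Count;

    public bool IsReadOnly => false;

    // Write paths validate their argument before touching the inner list.
    public void Add(ContentBase item) =>
        this._items.Add(item ?? throw new ArgumentNullException(nameof(item)));

    public void Insert(int index, ContentBase item) =>
        this._items.Insert(index, item ?? throw new ArgumentNullException(nameof(item)));

    public void Clear() => this._items.Clear();

    public bool Contains(ContentBase item) => this._items.Contains(item);

    public void CopyTo(ContentBase[] array, int arrayIndex) => this._items.CopyTo(array, arrayIndex);

    public int IndexOf(ContentBase item) => this._items.IndexOf(item);

    public bool Remove(ContentBase item) => this._items.Remove(item);

    public void RemoveAt(int index) => this._items.RemoveAt(index);

    public IEnumerator<ContentBase> GetEnumerator() => this._items.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => this._items.GetEnumerator();
}
```

Because the class exposes a public `Add` method and implements `IEnumerable`, it also supports the collection-initializer syntax used in the Option #3 usage example.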

Usage:

```csharp
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

var chatHistory = new ChatHistory("You are a friendly assistant.");

chatHistory.AddUserMessage(new ChatMessageContentItemCollection
{
    new TextContent("What’s in this image?"),
    new ImageContent(new Uri(ImageUri))
});

var reply = await chatCompletionService.GetChatMessageContentAsync(chatHistory);

Console.WriteLine(reply.Content);
```

## Decision Outcome

Option #3 was preferred because it requires only a small number of changes to the existing hierarchy and provides clean usability for the end user.

Diagram:
![Chat and Text models diagram](diagrams/chat-text-models.png)
