core[minor]: Add XML output parser #4258
Merged
Changes from 5 commits
10 commits
1715d0e core[minor]: Add XML output parser (bracesproul)
a2d84e7 cr (bracesproul)
5ec4e84 docs (bracesproul)
a145b4c chore: lint files (bracesproul)
438792e cr (bracesproul)
13e23f2 Merge branch 'main' into brace/xml-output-parser (bracesproul)
d4eccb8 streaming & docs (bracesproul)
16c3a93 cr (bracesproul)
e52093d Merge branch 'main' into brace/xml-output-parser (bracesproul)
c6dd550 chore: lint files (bracesproul)
20 changes: 20 additions & 0 deletions
docs/core_docs/docs/modules/model_io/output_parsers/types/xml.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# XML output parser | ||
|
||
The `XMLOutputParser` takes language model output which contains XML and parses it into a JSON object. | ||
|
||
The output parser currently does _not_ support streaming results. | ||
|
||
## Usage | ||
|
||
import CodeBlock from "@theme/CodeBlock"; | ||
import XMLExample from "@examples/prompts/xml_output_parser.ts"; | ||
|
||
import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx"; | ||
|
||
<IntegrationInstallTooltip></IntegrationInstallTooltip> | ||
|
||
```bash npm2yarn | ||
npm install @langchain/core | ||
``` | ||
|
||
<CodeBlock language="typescript">{XMLExample}</CodeBlock> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
import { XMLOutputParser } from "@langchain/core/output_parsers"; | ||
|
||
const XML_EXAMPLE = `<?xml version="1.0" encoding="UTF-8"?> | ||
<userProfile> | ||
<userID>12345</userID> | ||
<name>John Doe</name> | ||
<email>john.doe@example.com</email> | ||
<roles> | ||
<role>Admin</role> | ||
<role>User</role> | ||
</roles> | ||
<preferences> | ||
<theme>Dark</theme> | ||
<notifications> | ||
<email>true</email> | ||
<sms>false</sms> | ||
</notifications> | ||
</preferences> | ||
</userProfile>`; | ||
|
||
type MySchema = { | ||
userProfile: { | ||
userID: number; | ||
name: string; | ||
email: string; | ||
roles: { role: string[] }; | ||
preferences: { | ||
theme: string; | ||
notifications: { | ||
email: boolean; | ||
sms: boolean; | ||
}; | ||
}; | ||
}; | ||
}; | ||
|
||
// Pass in a generic type for the schema | ||
const parser = new XMLOutputParser<MySchema>(); | ||
|
||
const result = await parser.invoke(XML_EXAMPLE); | ||
|
||
console.log(JSON.stringify(result, null, 2)); | ||
/* | ||
{ | ||
"userProfile": { | ||
"userID": 12345, | ||
"name": "John Doe", | ||
"email": "john.doe@example.com", | ||
"roles": { | ||
"role": [ | ||
"Admin", | ||
"User" | ||
] | ||
}, | ||
"preferences": { | ||
"theme": "Dark", | ||
"notifications": { | ||
"email": true, | ||
"sms": false | ||
} | ||
} | ||
} | ||
} | ||
*/ |
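A note on the generic parameter in the example above: `MySchema` is a compile-time annotation only. In this PR, `parse` returns whatever the underlying XML parser produces, typed as `T`, with no runtime check. Shapes can therefore drift from the declared type. For instance, a `<roles>` element with a single `<role>` child would likely come back as a plain string rather than a one-element array under the XML library's defaults. Callers who need certainty could add their own guard. The following is a minimal sketch, not part of the PR; `UserProfileDoc`, `isRecord`, and `isUserProfileDoc` are hypothetical names, and the schema is trimmed for brevity.

```typescript
// Hypothetical, trimmed-down version of the example schema (not from the PR).
type UserProfileDoc = {
  userProfile: {
    userID: number;
    name: string;
    roles: { role: string[] };
  };
};

// True when `value` is a non-null, non-array object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical runtime guard that checks the fields the caller relies on.
function isUserProfileDoc(value: unknown): value is UserProfileDoc {
  if (!isRecord(value)) return false;
  const profile = value.userProfile;
  if (!isRecord(profile)) return false;
  if (typeof profile.userID !== "number") return false;
  if (typeof profile.name !== "string") return false;
  const roles = profile.roles;
  if (!isRecord(roles)) return false;
  const role = roles.role;
  // A single <role> child may arrive as a string, which this check rejects.
  return Array.isArray(role) && role.every((r) => typeof r === "string");
}

// Stand-in for a parsed result; in real use this would come from the parser.
const parsed: unknown = {
  userProfile: {
    userID: 12345,
    name: "John Doe",
    roles: { role: ["Admin", "User"] },
  },
};

if (isUserProfileDoc(parsed)) {
  console.log(parsed.userProfile.roles.role.join(", ")); // prints "Admin, User"
}
```

A schema-validation library would serve the same purpose; the hand-written guard is used here only to keep the sketch dependency-free.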
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
import { XMLOutputParser } from "../xml.js"; | ||
|
||
const XML_EXAMPLE = `<?xml version="1.0" encoding="UTF-8"?> | ||
<userProfile> | ||
<userID>12345</userID> | ||
<name>John Doe</name> | ||
<email>john.doe@example.com</email> | ||
<roles> | ||
<role>Admin</role> | ||
<role>User</role> | ||
</roles> | ||
<preferences> | ||
<theme>Dark</theme> | ||
<notifications> | ||
<email>true</email> | ||
<sms>false</sms> | ||
</notifications> | ||
</preferences> | ||
</userProfile>`; | ||
|
||
const BACKTICK_WRAPPED_XML = `\`\`\`xml\n${XML_EXAMPLE}\n\`\`\``; | ||
|
||
type MySchema = { | ||
userProfile: { | ||
userID: number; | ||
name: string; | ||
email: string; | ||
roles: { role: string[] }; | ||
preferences: { | ||
theme: string; | ||
notifications: { | ||
email: boolean; | ||
sms: boolean; | ||
}; | ||
}; | ||
}; | ||
}; | ||
|
||
const expectedResult = { | ||
userProfile: { | ||
userID: 12345, | ||
name: "John Doe", | ||
email: "john.doe@example.com", | ||
roles: { role: ["Admin", "User"] }, | ||
preferences: { | ||
theme: "Dark", | ||
notifications: { | ||
email: true, | ||
sms: false, | ||
}, | ||
}, | ||
}, | ||
}; | ||
|
||
test("Can parse XML", async () => { | ||
const parser = new XMLOutputParser<MySchema>(); | ||
|
||
const result = await parser.invoke(XML_EXAMPLE); | ||
expect(result).toStrictEqual(expectedResult); | ||
}); | ||
|
||
test("Can parse backtick wrapped XML", async () => { | ||
const parser = new XMLOutputParser<MySchema>(); | ||
|
||
const result = await parser.invoke(BACKTICK_WRAPPED_XML); | ||
expect(result).toStrictEqual(expectedResult); | ||
}); | ||
|
||
test("Can format instructions with passed tags.", async () => { | ||
const tags = ["tag1", "tag2", "tag3"]; | ||
const parser = new XMLOutputParser<MySchema>({ tags }); | ||
|
||
const formatInstructions = parser.getFormatInstructions(); | ||
|
||
expect(formatInstructions).toContain("tag1, tag2, tag3"); | ||
}); |
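The last test above covers only the case where tags are passed. As the parser is written in this PR, `getFormatInstructions()` returns the template unchanged when no tags are given, so the literal `{tags}` placeholder stays in the prompt. The sketch below reproduces that substitution logic in isolation; `TEMPLATE` is a shortened stand-in for `XML_FORMAT_INSTRUCTIONS`, and `formatInstructions` is a hypothetical free function, not the class method.

```typescript
// Shortened stand-in for XML_FORMAT_INSTRUCTIONS; only the placeholder matters here.
const TEMPLATE = "Here are the output tags:\n{tags}";

// Hypothetical free function mirroring the substitution in getFormatInstructions().
function formatInstructions(tags?: string[]): string {
  if (tags && tags.length > 0) {
    // replace() with a string pattern substitutes the first occurrence only.
    return TEMPLATE.replace("{tags}", tags.join(", "));
  }
  // No tags: the template is returned as-is, placeholder included.
  return TEMPLATE;
}

console.log(formatInstructions(["tag1", "tag2", "tag3"]));
// Here are the output tags:
// tag1, tag2, tag3

console.log(formatInstructions());
// Here are the output tags:
// {tags}
```

Whether the untouched placeholder is acceptable in the no-tags case is a judgment call for the reviewers; an extra test asserting the intended output would pin the behavior down either way.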
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
import { XMLParser } from "fast-xml-parser"; | ||
import { BaseOutputParser } from "./base.js"; | ||
|
||
export const XML_FORMAT_INSTRUCTIONS = `The output should be formatted as an XML file. | ||
1. Output should conform to the tags below. | ||
2. If tags are not given, make them on your own. | ||
3. Remember to always open and close all the tags. | ||
|
||
As an example, for the tags ["foo", "bar", "baz"]: | ||
1. String "<foo>\n <bar>\n <baz></baz>\n </bar>\n</foo>" is a well-formatted instance of the schema. | ||
2. String "<foo>\n <bar>\n </foo>" is a badly-formatted instance. | ||
3. String "<foo>\n <tag>\n </tag>\n</foo>" is a badly-formatted instance. | ||
|
||
Here are the output tags: | ||
\`\`\ | ||
Review comment: Missing a backtick? |
||
{tags} | ||
\`\`\``; | ||
|
||
export interface XMLOutputParserFields { | ||
/** | ||
* Optional list of tags that the output should conform to. | ||
* Only used in formatting of the prompt. | ||
*/ | ||
tags?: string[]; | ||
} | ||
|
||
export class XMLOutputParser< | ||
// eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
T extends Record<string, any> = Record<string, any> | ||
> extends BaseOutputParser<T> { | ||
tags?: string[]; | ||
|
||
constructor(fields?: XMLOutputParserFields) { | ||
super(); | ||
this.tags = fields?.tags; | ||
} | ||
|
||
static lc_name() { | ||
return "XMLOutputParser"; | ||
} | ||
|
||
lc_namespace = ["langchain_core", "output_parsers"]; | ||
|
||
lc_serializable = true; | ||
|
||
async parse(text: string): Promise<T> { | ||
return parseXMLMarkdown<T>(text); | ||
} | ||
|
||
getFormatInstructions(): string { | ||
const withTags = !!(this.tags && this.tags.length > 0); | ||
return withTags | ||
? XML_FORMAT_INSTRUCTIONS.replace("{tags}", this.tags?.join(", ") ?? "") | ||
: XML_FORMAT_INSTRUCTIONS; | ||
} | ||
} | ||
|
||
export function parseXMLMarkdown< | ||
// eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
T extends Record<string, any> = Record<string, any> | ||
>(s: string) { | ||
const parser = new XMLParser(); | ||
const newString = s.trim(); | ||
// Try to find XML string within triple backticks. | ||
const match = /```(xml)?(.*)```/s.exec(newString); | ||
let parsedResult: T; | ||
if (!match) { | ||
// No fenced block found, so parse the whole trimmed string. | ||
parsedResult = parser.parse(newString); | ||
} else { | ||
// Match found, so parse only the content within the backticks. | ||
parsedResult = parser.parse(match[2]); | ||
} | ||
|
||
if (parsedResult && "?xml" in parsedResult) { | ||
delete parsedResult["?xml"]; | ||
} | ||
return parsedResult; | ||
} |
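To make the fence handling in `parseXMLMarkdown` easier to review, here is a dependency-free sketch of just the extraction step. It is not the PR's code: `extractXmlCandidate` is a hypothetical helper, `[\s\S]*` is swapped in for `.*` with the `s` flag so the snippet does not depend on the compile target, and the captured text is trimmed for readable output (the PR passes it to the XML parser untrimmed).

```typescript
// Hypothetical helper: returns the text that would be handed to the XML parser.
function extractXmlCandidate(s: string): string {
  const trimmed = s.trim();
  // [\s\S]* matches across newlines, standing in for `.*` with the `s` flag.
  const match = /```(xml)?([\s\S]*)```/.exec(trimmed);
  // Group 2 is everything between the opening fence (plus optional "xml" tag)
  // and the LAST closing fence, because the quantifier is greedy.
  const inner = match?.[2];
  return inner !== undefined ? inner.trim() : trimmed;
}

const wrapped = "```xml\n<foo><bar>1</bar></foo>\n```";
console.log(extractXmlCandidate(wrapped)); // <foo><bar>1</bar></foo>

// Without fences the trimmed input is used as-is.
console.log(extractXmlCandidate("  <foo/>  ")); // <foo/>

// Surrounding prose outside the fences is dropped.
console.log(extractXmlCandidate("Sure:\n```\n<a>1</a>\n```\nDone.")); // <a>1</a>
```

Because the match is greedy, model output containing two separate fenced blocks would be captured as one span that includes the text between them. That case is not covered by the tests in this PR.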
Hey there! I noticed that the recent PR added a new dependency "sax" and changed the version of "p-retry". This might impact the project's peer/dev/hard dependencies. I'm flagging this for your review. Keep up the great work!