This repository was archived by the owner on Jan 2, 2025. It is now read-only.

@calyptobai
Contributor

Remove code chunks instead of redacting them. This should hopefully reduce how often we see [REDACTED] in model output.

Closes BLO-1842

Contributor

@ggordonhall left a comment

In testing this works well, but largely because the model rarely generates either QuotedCode or GeneratedCode. It prefers markdown code blocks, so it's tricky to evaluate the effect of this change.

)
}
})
let xml = fixup_xml_code(xml);
Contributor

Do we need to do this work at all if we're returning an empty string?

Contributor Author

We do need to do this for partially generated messages. For example, if a message has only generated <GeneratedCode><Code>foo so far, we need to call fixup_xml_code to complete the block so that it can still be parsed on the line right below. If parsing fails there (e.g. the block is a different kind of XML, perhaps HTML in the markdown), we know the block is not a code chunk and should be kept.
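
To make the flow above concrete, here is a minimal sketch, not the code from this PR. The helper names complete_partial_block, parse_code_chunk, and strip_if_code_chunk are hypothetical, and the exact tag layout (no attributes, no whitespace between tags) is an assumption; the real fixup_xml_code and parser may behave differently.

```rust
/// Hypothetical stand-in for fixup_xml_code: appends any closing tags
/// that are missing from a partially streamed block.
fn complete_partial_block(xml: &str) -> String {
    let mut out = xml.to_owned();
    // Close the inner tag first so the appended closers nest correctly.
    for tag in ["Code", "GeneratedCode"] {
        let open = format!("<{tag}>");
        let close = format!("</{tag}>");
        if out.contains(&open) && !out.contains(&close) {
            out.push_str(&close);
        }
    }
    out
}

/// Hypothetical parser: returns the code body if the block is a complete
/// GeneratedCode chunk, or None for any other kind of markup.
fn parse_code_chunk(xml: &str) -> Option<&str> {
    xml.strip_prefix("<GeneratedCode><Code>")?
        .strip_suffix("</Code></GeneratedCode>")
}

/// Removes the block if it parses as a code chunk; keeps it otherwise.
fn strip_if_code_chunk(block: &str) -> String {
    let fixed = complete_partial_block(block);
    match parse_code_chunk(&fixed) {
        // A code chunk, possibly still streaming: drop it entirely.
        Some(_) => String::new(),
        // Not a code chunk (e.g. HTML in the markdown): keep the original.
        None => block.to_owned(),
    }
}
```

Under these assumptions, the partial input <GeneratedCode><Code>foo is completed, parsed, and replaced with an empty string, while a block such as <div>hello</div> fails to parse and is returned unchanged. Skipping the fixup step would make the partial block fail to parse, so it would be kept by mistake.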

@calyptobai merged commit 2635185 into main on Nov 16, 2023
@calyptobai deleted the remove-redacted branch on November 16, 2023 20:45
2 participants