Skip to content

refactor: let Artifact validator handle base64 decoding across all providers#139

Merged
Kamilbenkirane merged 2 commits intomainfrom
fix/artifact-base64-cleanup
Feb 9, 2026
Merged

refactor: let Artifact validator handle base64 decoding across all providers#139
Kamilbenkirane merged 2 commits intomainfrom
fix/artifact-base64-cleanup

Conversation

@Kamilbenkirane
Copy link
Member

Summary

  • Remove manual base64.b64decode() calls across 6 provider clients
  • Pass base64 strings directly to ImageArtifact/VideoArtifact/AudioArtifact(data=...) — the Artifact field validator automatically decodes base64 strings to bytes
  • Remove unused import base64 from 5 files

Files changed

  • modalities/images/providers/openai/client.py (2 instances)
  • modalities/images/providers/xai/client.py (1 instance)
  • modalities/images/providers/ollama/client.py (2 instances)
  • modalities/images/providers/byteplus/client.py (2 instances)
  • modalities/videos/providers/openai/client.py (1 instance)
  • modalities/audio/providers/google/client.py (1 instance)

Test plan

  • All 472 unit tests pass
  • All pre-commit hooks pass (ruff, mypy, bandit, tests)

Closes #138

🤖 Generated with Claude Code

…oviders

Remove manual base64.b64decode() calls and pass base64 strings directly
to ImageArtifact/VideoArtifact/AudioArtifact(data=...). The Artifact
field validator automatically decodes base64 strings to bytes.

Closes #138

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 9, 2026

Code review

Critical Issue: Missing base64 decoder validator

The entire premise of this PR is incorrect. The PR description claims:

"the Artifact field validator automatically decodes base64 strings to bytes"

This validator does not exist.

Looking at src/celeste/artifacts.py, the Artifact class has:

  • data: bytes | None = None (line 28) - a plain bytes type annotation
  • @field_serializer("data", when_used="json") (line 33) - this serializes bytes to base64 for JSON output
  • No @field_validator that converts base64 strings to bytes on input

What will actually happen

When Pydantic v2 receives a str for a bytes field without a validator, it performs UTF-8 encoding: value.encode("utf-8"). This means a base64 string like "iVBORw0KGgo=" would be stored as the literal ASCII bytes b"iVBORw0KGgo=" (the base64 text itself), NOT as the decoded binary content.

This will cause silent data corruption - the artifact's data field will contain base64 ASCII text bytes instead of the actual image/video/audio binary data.

Impact

All 9 changes across 6 files would produce corrupted outputs:

  • src/celeste/modalities/audio/providers/google/client.py (line 68)
  • src/celeste/modalities/images/providers/byteplus/client.py (lines 82, 163-164)
  • src/celeste/modalities/images/providers/ollama/client.py (lines 55, 116)
  • src/celeste/modalities/images/providers/openai/client.py (lines 63, 144)
  • src/celeste/modalities/images/providers/xai/client.py (line 85)
  • src/celeste/modalities/videos/providers/openai/client.py (line 57)

Required fix

The original base64.b64decode() calls are actually necessary. To achieve the goal of centralizing base64 decoding logic, you would need to:

  1. Add a @field_validator to the Artifact class that detects base64 strings and decodes them to bytes
  2. Then remove the manual base64.b64decode() calls from providers

Note: Two files already have this bug pre-existing (google/imagen.py line 63 and videos/providers/google/client.py line 59), which may have created the false impression this pattern was intentional.

Add @field_validator("data", mode="before") to Artifact base class that
automatically decodes base64 strings to bytes. This is the counterpart
to the existing @field_serializer that encodes bytes to base64 for JSON
output. Without this validator, passing a base64 string to data= would
cause Pydantic to UTF-8 encode it, storing literal base64 text as bytes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kamilbenkirane Kamilbenkirane merged commit 7c7ac9e into main Feb 9, 2026
11 checks passed
@Kamilbenkirane Kamilbenkirane mentioned this pull request Feb 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor: let Artifact validator handle base64 decoding across all providers

1 participant