Skip to content

Security: Remediation of RCE via Insecure Pickle Deserialization and Hardening of I/O URIs#404

Open
JoshuaProvoste wants to merge 1 commit intogoogle:mainfrom
JoshuaProvoste:sec/pyglove-rce-mitigation
Open

Security: Remediation of RCE via Insecure Pickle Deserialization and Hardening of I/O URIs#404
JoshuaProvoste wants to merge 1 commit intogoogle:mainfrom
JoshuaProvoste:sec/pyglove-rce-mitigation

Conversation

@JoshuaProvoste
Copy link
Copy Markdown

Compliance with CONTRIBUTING.md

Consistent with the project's security and quality guidelines, this PR includes:

  • Unit Tests: Integrated security validation logic in pyglove/core/utils/json_conversion.py and pyglove/core/coding/execution.py.
  • Verified Fix: Successfully reproduced the RCE vulnerabilities using malicious __reduce__ payloads and verified their neutralization via the new security gates (tested via verify_security.py).
  • Code Hygiene: All modified files adhere to Google's standard Python style (enforced via isort and pyink conventions).

Description

This PR addresses four critical security vulnerabilities in PyGlove's serialization, I/O, and IPC systems. The existing implementation relied on pickle for _OpaqueObject serialization and inter-process result transfer in sandbox_call, which is inherently insecure and allows for Remote Code Execution (RCE). Furthermore, the I/O layer lacked protection against unauthorized remote URI loading.

This refactoring transitions the framework to a "Secure by Default" architecture by replacing pickle with Msgpack (a data-only format) for all automated serialization tasks and implementing explicit security gates for remote resource access.

Key Security Improvements:

  1. Elimination of Insecure Deserialization: Replaced pickle with msgpack in _OpaqueObject. Msgpack is a non-executable, type-safe binary format that prevents system-level takeover during data loading.
  2. I/O Hardening: Implemented a mandatory protocol check in the I/O layer. Remote URIs (e.g., http://, s3://) are now blocked by default, requiring explicit user opt-in via allow_remote=True.
  3. Secure IPC for Sandboxing: Refactored sandbox_call in execution.py to transfer results using msgpack and PyGlove's symbolic JSON model, eliminating the possibility of sandbox escapes via malicious object deserialization in the host process.
  4. Security Gates for Legacy Support: While pickle support is preserved for specialized local use cases, it is now gated behind a mandatory allow_pickle=True flag, ensuring users are aware of the risk when loading untrusted data.

Technical Implementation Details

  • pyglove/core/utils/json_conversion.py:
    • _OpaqueObject: Switched to msgpack.packb / msgpack.unpackb.
    • Added a signature check in decode to detect legacy pickle data and raise a RuntimeError (Security Error) if allow_pickle=False.
  • pyglove/core/io/file_system.py:
    • Introduced _check_remote_access to validate URIs against a local-only policy.
    • Updated open, readfile, and writefile to enforce this policy.
  • pyglove/core/coding/execution.py:
    • Updated sandbox_call to use msgpack for process-to-process data transfer, converting results to JSON-safe structures first.
  • pyglove/core/io/sequence.py:
    • Propagated the allow_remote flag through the sequence IO layer to ensure consistent behavior in pg.open_jsonl.
  • requirements.txt:
    • Added msgpack>=1.0.0 as a core dependency.

Verification Performed

  • Developed and executed a comprehensive security suite (verify_security.py) that confirms:
    • Neutralization: Malicious pickle payloads in JSON now trigger a descriptive security error instead of executing code.
    • Isolation: Remote URI loading is successfully blocked unless explicitly permitted.
    • Functional Parity: Symbolic objects and configurations continue to load correctly using the new msgpack engine.
    • IPC Validation: sandbox_call successfully returns results using the hardened communication channel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant