@codeflash-ai codeflash-ai bot commented Nov 21, 2025

📄 19% (0.19x) speedup for parse_uri_to_path in skyvern/forge/sdk/api/files.py

⏱️ Runtime : 357 microseconds → 299 microseconds (best of 250 runs)
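
For readers checking the headline figure, the sketch below shows one way the speedup can be derived from the two runtimes. The formula (time saved divided by the optimized runtime) is an assumption inferred from the reported numbers, and the helper name `speedup` is hypothetical rather than part of any tool's API.

```python
def speedup(original_us: float, optimized_us: float) -> float:
    """Relative speedup: time saved, expressed as a fraction of the optimized runtime."""
    return (original_us - optimized_us) / optimized_us


# Overall runtimes reported for this PR, in microseconds.
overall = speedup(357, 299)
print(f"{overall:.2f}x ({overall:.0%})")  # prints "0.19x (19%)"
```

The same calculation applied to a per-test annotation such as `4.06μs -> 3.62μs` gives about 12.2%, which agrees with the figure printed next to that test.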

📝 Explanation and details

The optimized code achieves a 19% speedup by avoiding unnecessary URL decoding when no percent-encoded characters are present. The key optimization is adding a conditional check if '%' in path: before calling the decoding function.

What specific optimizations were applied:

  1. Conditional decoding: Only decode URLs that actually contain percent-encoded characters (indicated by '%')
  2. More efficient decoding: Use unquote_to_bytes() followed by UTF-8 decoding instead of unquote() for better performance when decoding is needed
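
The two changes can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the code from the PR: the function bodies are inferred from the expected values in the generated tests (scheme check, then `netloc + path`), the names `parse_uri_to_path_original` and `parse_uri_to_path_optimized` are hypothetical, and passing `errors="replace"` is an assumption made here so that malformed escapes such as `%FF` decode the same way `unquote()` does by default.

```python
from urllib.parse import unquote, unquote_to_bytes, urlparse


def parse_uri_to_path_original(uri: str) -> str:
    """Assumed shape of the original: always decode."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"Invalid URI scheme: {parsed.scheme!r}, expected 'file'")
    return unquote(parsed.netloc + parsed.path)


def parse_uri_to_path_optimized(uri: str) -> str:
    """Assumed shape of the optimized version: decode only when '%' is present."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"Invalid URI scheme: {parsed.scheme!r}, expected 'file'")
    path = parsed.netloc + parsed.path
    if "%" in path:
        # errors="replace" mirrors the default error handling of unquote().
        return unquote_to_bytes(path).decode("utf-8", errors="replace")
    return path


if __name__ == "__main__":
    for uri in ("file:///home/user/file.txt",
                "file:///home/user/My%20Documents/file%20name.txt"):
        print(parse_uri_to_path_optimized(uri))
```

Without `errors="replace"`, a plain `.decode("utf-8")` would raise `UnicodeDecodeError` on byte sequences that are not valid UTF-8, whereas `unquote()` substitutes the replacement character, so the error-handling argument matters if the two versions are meant to stay interchangeable.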

Why this leads to speedup:

  • The original code always calls unquote() on every path, even when no decoding is needed
  • The optimized version skips the expensive decoding operation for paths without '%' characters (78% of test cases in the profile)
  • When decoding is needed, unquote_to_bytes() + decode() is more efficient than unquote() alone
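
A quick way to check these claims locally is a small `timeit` comparison of the two decoding strategies. The sketch below is illustrative only: the helper names (`decode_always`, `decode_if_needed`, `best_of`) and the sample paths are assumptions, and absolute timings will vary with the Python version and machine.

```python
import timeit
from urllib.parse import unquote, unquote_to_bytes

PLAIN = "/home/user/file.txt"
ENCODED = "/home/user/My%20Documents/file%20name.txt"


def decode_always(path: str) -> str:
    # Strategy attributed to the original code: call unquote() unconditionally.
    return unquote(path)


def decode_if_needed(path: str) -> str:
    # Strategy attributed to the optimized code: skip decoding when no '%' is present.
    if "%" in path:
        return unquote_to_bytes(path).decode("utf-8", errors="replace")
    return path


def best_of(func, arg: str, repeat: int = 5, number: int = 20_000) -> float:
    """Best per-call time in seconds over `repeat` timing runs."""
    return min(timeit.repeat(lambda: func(arg), repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for label, sample in (("plain", PLAIN), ("encoded", ENCODED)):
        before = best_of(decode_always, sample)
        after = best_of(decode_if_needed, sample)
        print(f"{label}: {before * 1e6:.2f}μs -> {after * 1e6:.2f}μs")
```

Taking the minimum over several repeats follows the same "best of N runs" idea used for the runtime reported above, which reduces the influence of scheduler noise.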

Performance impact based on test results:

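  • Simple paths (no encoding): roughly 4-17% faster, because the decoding call is skipped entirely
  • Percent-encoded paths: 39-50% faster on short inputs, due to the more efficient decoding method
  • Large-scale encoded paths: 11-34% faster; the smallest gain (10.6%) is on the path built from 100 four-byte UTF-8 sequences
  • Invalid URIs that raise ValueError: unchanged within measurement noise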

Impact on existing workloads:
The function is called from download_file() when processing file:// URIs in local environments. Since file paths often don't contain percent-encoded characters, this optimization should give a small, consistent improvement for local file downloads. The generated regression tests detected no behavioral changes.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 54 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
from urllib.parse import unquote, urlparse

# imports
import pytest  # used for our unit tests
from skyvern.forge.sdk.api.files import parse_uri_to_path

# unit tests

# 1. Basic Test Cases

def test_basic_local_file_uri():
    # Local file URI with no netloc, simple path
    uri = "file:///home/user/file.txt"
    expected = "/home/user/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 4.06μs -> 3.62μs (12.2% faster)

def test_basic_windows_file_uri():
    # Windows-style file URI with netloc (drive letter)
    uri = "file://C:/Windows/System32/drivers/etc/hosts"
    expected = "C:/Windows/System32/drivers/etc/hosts"
    codeflash_output = parse_uri_to_path(uri) # 3.90μs -> 3.43μs (13.7% faster)

def test_basic_network_file_uri():
    # Network file URI, netloc as server name
    uri = "file://server/share/folder/file.txt"
    expected = "server/share/folder/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.72μs -> 3.27μs (13.9% faster)

def test_basic_uri_with_encoded_characters():
    # URI with percent-encoded characters
    uri = "file:///home/user/My%20Documents/file%20name.txt"
    expected = "/home/user/My Documents/file name.txt"
    codeflash_output = parse_uri_to_path(uri) # 10.3μs -> 6.90μs (50.0% faster)

def test_basic_uri_with_netloc_and_encoded_path():
    # URI with netloc and encoded path
    uri = "file://server/share%20name/file%20name.txt"
    expected = "server/share name/file name.txt"
    codeflash_output = parse_uri_to_path(uri) # 9.18μs -> 6.23μs (47.3% faster)

# 2. Edge Test Cases

def test_empty_uri():
    # Empty string should raise ValueError due to missing scheme
    with pytest.raises(ValueError):
        parse_uri_to_path("") # 3.64μs -> 3.67μs (0.736% slower)

def test_missing_scheme():
    # URI missing scheme should raise ValueError
    uri = "/home/user/file.txt"
    with pytest.raises(ValueError):
        parse_uri_to_path(uri) # 3.49μs -> 3.50μs (0.228% slower)

def test_wrong_scheme():
    # URI with wrong scheme should raise ValueError
    uri = "http:///home/user/file.txt"
    with pytest.raises(ValueError):
        parse_uri_to_path(uri) # 3.66μs -> 3.79μs (3.38% slower)

def test_file_uri_with_no_path():
    # file:// only, no path
    uri = "file://"
    expected = ""
    codeflash_output = parse_uri_to_path(uri) # 3.80μs -> 3.30μs (15.1% faster)

def test_file_uri_with_only_slash():
    # file:///
    uri = "file:///"
    expected = "/"
    codeflash_output = parse_uri_to_path(uri) # 3.64μs -> 3.16μs (15.2% faster)

def test_file_uri_with_dot_path():
    # file:///.
    uri = "file:///."
    expected = "/."
    codeflash_output = parse_uri_to_path(uri) # 3.61μs -> 3.27μs (10.6% faster)

def test_file_uri_with_double_slash_path():
    # file:////tmp/file.txt
    uri = "file:////tmp/file.txt"
    expected = "//tmp/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.45μs -> 3.27μs (5.38% faster)

def test_file_uri_with_trailing_slash():
    # file:///home/user/
    uri = "file:///home/user/"
    expected = "/home/user/"
    codeflash_output = parse_uri_to_path(uri) # 3.48μs -> 3.09μs (12.9% faster)

def test_file_uri_with_special_characters():
    # '#' starts the URI fragment, so everything from '#' onward is dropped from the path
    uri = "file:///home/user/file@#$.txt"
    expected = "/home/user/file@"
    codeflash_output = parse_uri_to_path(uri) # 3.69μs -> 3.29μs (11.9% faster)

def test_file_uri_with_unicode_characters():
    # file:///home/user/файл.txt (unicode in path)
    uri = "file:///home/user/%D1%84%D0%B0%D0%B9%D0%BB.txt"
    expected = "/home/user/файл.txt"
    codeflash_output = parse_uri_to_path(uri) # 12.3μs -> 8.70μs (41.4% faster)

def test_file_uri_with_query_and_fragment():
    # file:///home/user/file.txt?version=2#section
    # Query and fragment should be ignored
    uri = "file:///home/user/file.txt?version=2#section"
    expected = "/home/user/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.63μs -> 3.38μs (7.49% faster)

def test_file_uri_with_netloc_only():
    # file://server
    uri = "file://server"
    expected = "server"
    codeflash_output = parse_uri_to_path(uri) # 3.61μs -> 3.28μs (9.90% faster)

def test_file_uri_with_netloc_and_empty_path():
    # file://server/
    uri = "file://server/"
    expected = "server/"
    codeflash_output = parse_uri_to_path(uri) # 3.50μs -> 3.35μs (4.32% faster)

def test_file_uri_with_multiple_slashes_in_path():
    # file:///home//user///file.txt
    uri = "file:///home//user///file.txt"
    expected = "/home//user///file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.48μs -> 3.21μs (8.34% faster)

def test_file_uri_with_semicolon_in_path():
    # file:///home/user/file;v2.txt
    uri = "file:///home/user/file;v2.txt"
    expected = "/home/user/file;v2.txt"
    codeflash_output = parse_uri_to_path(uri) # 5.51μs -> 5.02μs (9.85% faster)

def test_file_uri_with_colon_in_path():
    # file:///home/user/file:v2.txt
    uri = "file:///home/user/file:v2.txt"
    expected = "/home/user/file:v2.txt"
    codeflash_output = parse_uri_to_path(uri) # 4.03μs -> 3.63μs (11.0% faster)

def test_file_uri_with_ipv6_netloc():
    # file://[2001:db8::1]/share/file.txt
    uri = "file://[2001:db8::1]/share/file.txt"
    expected = "[2001:db8::1]/share/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.93μs -> 3.61μs (8.92% faster)

def test_file_uri_with_port_in_netloc():
    # file://server:8080/share/file.txt
    uri = "file://server:8080/share/file.txt"
    expected = "server:8080/share/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.84μs -> 3.46μs (10.9% faster)

def test_file_uri_with_username_and_password():
    # file://user:pass@server/share/file.txt
    uri = "file://user:pass@server/share/file.txt"
    expected = "user:pass@server/share/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.77μs -> 3.38μs (11.4% faster)

# 3. Large Scale Test Cases

def test_large_path_length():
    # Very long path, but under 1000 characters
    long_folder = "a" * 500
    long_file = "b" * 400 + ".txt"
    uri = f"file:///{long_folder}/{long_file}"
    expected = f"/{long_folder}/{long_file}"
    codeflash_output = parse_uri_to_path(uri) # 4.27μs -> 3.94μs (8.34% faster)

def test_large_number_of_folders():
    # Many nested folders in path
    folders = "/".join([f"folder{i}" for i in range(100)])
    uri = f"file:///{folders}/file.txt"
    expected = f"/{folders}/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 4.17μs -> 3.68μs (13.4% faster)

def test_large_netloc():
    # Very long netloc (e.g., long server name)
    netloc = "server" + "x" * 500
    uri = f"file://{netloc}/share/file.txt"
    expected = f"{netloc}/share/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 4.18μs -> 3.79μs (10.2% faster)

def test_many_percent_encoded_characters():
    # Path with many percent-encoded spaces
    encoded_path = "/".join([f"folder%20{i}" for i in range(100)])
    uri = f"file:///{encoded_path}/file%20name.txt"
    decoded_path = "/".join([f"folder {i}" for i in range(100)])
    expected = f"/{decoded_path}/file name.txt"
    codeflash_output = parse_uri_to_path(uri) # 28.7μs -> 22.4μs (28.0% faster)

def test_large_scale_mixed():
    # Large netloc and path with mixed encoding
    netloc = "server" + "x" * 300
    folders = "/".join([f"folder%20{i}" for i in range(50)])
    file = "file%20name%20" + "y" * 200 + ".txt"
    uri = f"file://{netloc}/{folders}/{file}"
    decoded_folders = "/".join([f"folder {i}" for i in range(50)])
    expected = f"{netloc}/{decoded_folders}/file name {'y'*200}.txt"
    codeflash_output = parse_uri_to_path(uri) # 19.9μs -> 14.8μs (34.2% faster)

def test_large_scale_all_ascii_printable():
    # Path with all printable ASCII characters (except %)
    import string
    ascii_chars = ''.join(c for c in string.printable if c != '%')
    uri = f"file:///{ascii_chars}"
    expected = f"/{ascii_chars}"
    codeflash_output = parse_uri_to_path(uri) # 4.10μs -> 3.71μs (10.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from urllib.parse import unquote, urlparse

# imports
import pytest  # used for our unit tests
from skyvern.forge.sdk.api.files import parse_uri_to_path

# unit tests

# 1. Basic Test Cases
def test_basic_file_uri_with_path_only():
    # Standard file URI with path only
    uri = "file:///home/user/file.txt"
    expected = "/home/user/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.64μs -> 3.41μs (6.74% faster)

def test_basic_file_uri_with_netloc_and_path():
    # File URI with netloc (e.g., Windows UNC path)
    uri = "file://server/share/file.txt"
    expected = "server/share/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.83μs -> 3.34μs (14.6% faster)

def test_basic_file_uri_with_encoded_characters():
    # URI with percent-encoded space
    uri = "file:///home/user/My%20Documents/file.txt"
    expected = "/home/user/My Documents/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 9.37μs -> 6.29μs (49.0% faster)

def test_basic_file_uri_with_root_path():
    # Root path only
    uri = "file:///"
    expected = "/"
    codeflash_output = parse_uri_to_path(uri) # 3.72μs -> 3.25μs (14.6% faster)

def test_basic_file_uri_with_windows_drive_letter():
    # Windows-style drive letter in URI
    uri = "file:///C:/Users/Name/file.txt"
    expected = "/C:/Users/Name/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.69μs -> 3.39μs (8.75% faster)

# 2. Edge Test Cases

def test_non_file_scheme_raises_valueerror():
    # Should raise ValueError for non-file scheme
    uri = "http://example.com/file.txt"
    with pytest.raises(ValueError):
        parse_uri_to_path(uri) # 3.88μs -> 3.85μs (0.831% faster)

def test_empty_uri_raises_valueerror():
    # Should raise ValueError for empty URI (scheme missing)
    uri = ""
    with pytest.raises(ValueError):
        parse_uri_to_path(uri) # 3.57μs -> 3.58μs (0.252% slower)

def test_file_uri_with_no_path():
    # File URI with no path, only scheme
    uri = "file://"
    expected = ""
    codeflash_output = parse_uri_to_path(uri) # 3.83μs -> 3.42μs (11.9% faster)

def test_file_uri_with_only_scheme_and_slashes():
    # File URI with only scheme and slashes
    uri = "file:///"
    expected = "/"
    codeflash_output = parse_uri_to_path(uri) # 3.67μs -> 3.26μs (12.6% faster)

def test_file_uri_with_special_characters():
    # File URI with special characters, percent-encoded
    uri = "file:///tmp/%E2%9C%93%20test.txt"
    expected = "/tmp/✓ test.txt"
    codeflash_output = parse_uri_to_path(uri) # 11.6μs -> 8.37μs (39.0% faster)

def test_file_uri_with_multiple_slashes():
    # File URI with multiple consecutive slashes in path
    uri = "file:///tmp///folder//file.txt"
    expected = "/tmp///folder//file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.74μs -> 3.20μs (17.1% faster)

def test_file_uri_with_trailing_slash():
    # File URI with trailing slash
    uri = "file:///tmp/folder/"
    expected = "/tmp/folder/"
    codeflash_output = parse_uri_to_path(uri) # 3.56μs -> 3.06μs (16.5% faster)

def test_file_uri_with_query_and_fragment():
    # Query and fragment should be ignored
    uri = "file:///tmp/file.txt?foo=bar#section"
    expected = "/tmp/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.81μs -> 3.42μs (11.3% faster)

def test_file_uri_with_unicode_characters():
    # URI with unicode characters in percent-encoding
    uri = "file:///tmp/%F0%9F%92%A9.txt"
    expected = "/tmp/💩.txt"
    codeflash_output = parse_uri_to_path(uri) # 11.0μs -> 7.69μs (42.7% faster)

def test_file_uri_with_netloc_only():
    # File URI with netloc only, no path
    uri = "file://server"
    expected = "server"
    codeflash_output = parse_uri_to_path(uri) # 3.77μs -> 3.38μs (11.6% faster)

def test_file_uri_with_netloc_and_empty_path():
    # File URI with netloc and empty path
    uri = "file://server/"
    expected = "server/"
    codeflash_output = parse_uri_to_path(uri) # 3.70μs -> 3.27μs (13.1% faster)

def test_file_uri_with_path_that_is_percent_encoded_slash():
    # Percent-encoded slashes (%2F) are decoded to literal '/' characters in the returned path
    uri = "file:///tmp%2Ffolder%2Ffile.txt"
    expected = "/tmp/folder/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 9.48μs -> 6.49μs (46.1% faster)

def test_file_uri_with_colon_in_path():
    # Path containing colon (common in Windows paths)
    uri = "file:///C:/Windows/System32/drivers/etc/hosts"
    expected = "/C:/Windows/System32/drivers/etc/hosts"
    codeflash_output = parse_uri_to_path(uri) # 3.52μs -> 3.18μs (10.7% faster)

def test_file_uri_with_dot_and_dotdot_in_path():
    # Path containing . and .. segments
    uri = "file:///tmp/./folder/../file.txt"
    expected = "/tmp/./folder/../file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.36μs -> 3.13μs (7.24% faster)

# 3. Large Scale Test Cases

def test_large_file_uri_path():
    # Large path (over 1000 characters)
    large_folder = "a" * 500
    large_file = "b" * 500 + ".txt"
    uri = f"file:///{large_folder}/{large_file}"
    expected = f"/{large_folder}/{large_file}"
    codeflash_output = parse_uri_to_path(uri) # 3.98μs -> 3.46μs (15.2% faster)

def test_large_file_uri_with_many_subfolders():
    # File URI with many subfolders (less than 1000)
    subfolders = "/".join([f"folder{i}" for i in range(100)])
    uri = f"file:///{subfolders}/file.txt"
    expected = f"/{subfolders}/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 3.89μs -> 3.58μs (8.61% faster)

def test_large_file_uri_with_long_netloc():
    # Large netloc
    netloc = "server" + "x" * 500
    uri = f"file://{netloc}/share/file.txt"
    expected = f"{netloc}/share/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 4.09μs -> 3.64μs (12.2% faster)

def test_large_file_uri_with_many_percent_encoded_chars():
    # Large URI with many percent-encoded spaces
    folder = "My%20Documents" * 50
    uri = f"file:///{folder}/file.txt"
    expected = f"/{'My Documents'*50}/file.txt"
    codeflash_output = parse_uri_to_path(uri) # 19.6μs -> 14.7μs (32.8% faster)

def test_large_file_uri_with_unicode_chars():
    # Large URI with many unicode percent-encoded characters
    # U+1F600 (😀) encoded as %F0%9F%98%80
    unicode_segment = "%F0%9F%98%80" * 100
    uri = f"file:///tmp/{unicode_segment}.txt"
    expected = f"/tmp/{'😀'*100}.txt"
    codeflash_output = parse_uri_to_path(uri) # 52.6μs -> 47.6μs (10.6% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-parse_uri_to_path-mi88p274 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 21, 2025 02:27
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Nov 21, 2025