

@rithin-pullela-aws (Contributor) commented Nov 28, 2025

Description

The tests testBM25WithOpenAIWithImage and testBM25WithOpenAIWithConversationAndImage are failing with error:

"Error while downloading https://upload.wikimedia.org/wikipedia/commons/..."
"code": "invalid_image_url"

This appears to be a known issue: OpenAI's backend intermittently fails to download images hosted on Wikimedia, the Facebook Graph API, and similar sources (see the example reference here).

  • Replaced the Wikimedia Commons URL with an Unsplash image URL (https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=800)
  • Verified the new URL works reliably with OpenAI's API using direct API calls
  • Both tests now pass successfully
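The description mentions verifying the Unsplash URL with direct API calls. A minimal Java sketch of such a check follows. It assumes the public OpenAI Chat Completions endpoint and its `image_url` content part. The class name `ImageUrlProbe`, the model name in the usage note, and the hand-rolled JSON building are illustrative assumptions, not the procedure actually used for this PR.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/**
 * Hypothetical probe: sends one chat request that references an image by URL,
 * so OpenAI itself has to download the image (the step that failed in CI).
 */
final class ImageUrlProbe {
    static final String ENDPOINT = "https://api.openai.com/v1/chat/completions";

    /** Builds a minimal JSON body with one text part and one image_url part. */
    static String buildBody(String model, String prompt, String imageUrl) {
        return "{"
            + "\"model\":\"" + escape(model) + "\","
            + "\"messages\":[{\"role\":\"user\",\"content\":["
            + "{\"type\":\"text\",\"text\":\"" + escape(prompt) + "\"},"
            + "{\"type\":\"image_url\",\"image_url\":{\"url\":\"" + escape(imageUrl) + "\"}}"
            + "]}]}";
    }

    /** Escapes backslashes and double quotes; enough for this sketch, not a full JSON encoder. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Posts the body; a non-200 response whose body mentions invalid_image_url means the fetch failed. */
    static HttpResponse<String> send(String apiKey, String body) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(10)).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
            .timeout(Duration.ofSeconds(60))
            .header("Authorization", "Bearer " + apiKey)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

For example, `ImageUrlProbe.send(System.getenv("OPENAI_API_KEY"), ImageUrlProbe.buildBody("gpt-4o-mini", "Describe this image.", url))` should return status 200 when OpenAI can fetch `url`. Because the failures were intermittent, a single successful call is weak evidence; repeating the call several times gives a better signal.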

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Tests
    • Updated test data sources for validation purposes.


Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
@coderabbitai bot commented Nov 28, 2025

Walkthrough

The integration test file was modified to change the image URLs in two test methods from Wikimedia hosting to Unsplash. Only test data values changed; there are no logic, control flow, or error-handling modifications.

Changes

Cohort: Test Data URL Updates
File: plugin/src/test/java/org/opensearch/ml/rest/RestMLRAGSearchProcessorIT.java
Summary: Updated the imageData URLs in testBM25WithOpenAIWithImage and testBM25WithOpenAIWithConversationAndImage from Wikimedia hosting to Unsplash URLs.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

  • Single test file with straightforward URL value replacements
  • No logic or control flow changes

Poem

🐰 URLs hop from Wiki's hall,
To Unsplash's sunny sprawl,
Test data dances, bright and new,
Little changes, clean and true! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title clearly and specifically describes the main change: replacing a Wikimedia image URL with an Unsplash URL to fix failing OpenAI RAG integration tests.
  • Description check: ✅ Passed. The description includes context, root cause analysis, the specific fix applied, verification details, and test results. However, the Related Issues section is incomplete (the issue number placeholder is not filled in).

@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval November 28, 2025 18:40 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval November 28, 2025 18:40 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval November 28, 2025 18:40 — with GitHub Actions Error
@coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/src/test/java/org/opensearch/ml/rest/RestMLRAGSearchProcessorIT.java (1)

707-731: Fix response variable mix‑ups so this new URL path is actually exercised

The new Unsplash URL at Line 719 looks fine for imageType = "url" and matches the intent of avoiding the Wikimedia download failures. However, the second request block isn't actually validated, because its assertions still reference the previous request's response:

  • Line 721 asserts on response2 instead of response3.
  • Line 724 reads ext from responseMap2 instead of responseMap3.

That means the assertions never check the result of the URL‑based image request.

Suggested patch:

         requestParameters.imageType = "url";
         requestParameters.imageData = "https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=800";
         Response response3 = performSearch(INDEX_NAME, "pipeline_test", 5, requestParameters);
-        assertEquals(200, response2.getStatusLine().getStatusCode());
+        assertEquals(200, response3.getStatusLine().getStatusCode());
 
         Map responseMap3 = parseResponseToMap(response3);
-        ext = (Map) responseMap2.get("ext");
+        ext = (Map) responseMap3.get("ext");

This makes the second half of the test actually validate the new URL flow.

🧹 Nitpick comments (1)
plugin/src/test/java/org/opensearch/ml/rest/RestMLRAGSearchProcessorIT.java (1)

1060-1073: URL update looks good; consider deduplicating the test image URL

Using the Unsplash URL at Line 1072 is consistent with the earlier test and with imageType = "url". To avoid drift and simplify future changes, consider extracting this URL into a single static final constant (e.g., TEST_IMAGE_URL) and reusing it in both tests.
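A minimal sketch of the suggested refactor, assuming the constant name `TEST_IMAGE_URL` from the review comment. `RagImageTestFixtures`, `ImageParams`, and `useRemoteImage` are hypothetical stand-ins for the real test class and its request-parameter object.

```java
/** Hypothetical excerpt: one shared constant for the remote test image. */
final class RagImageTestFixtures {
    // Name taken from the review suggestion; the value is the URL this PR introduced.
    static final String TEST_IMAGE_URL =
        "https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=800";

    // Stand-in for the IT's request-parameter object (hypothetical; the real test has its own type).
    static final class ImageParams {
        String imageType;
        String imageData;
    }

    /** Both image tests would call this instead of repeating the URL literal. */
    static void useRemoteImage(ImageParams params) {
        params.imageType = "url";
        params.imageData = TEST_IMAGE_URL;
    }
}
```

With a single constant, the next URL swap becomes a one-line change instead of two edits that can drift apart.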

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87e742e and 0547dfb.

📒 Files selected for processing (1)
  • plugin/src/test/java/org/opensearch/ml/rest/RestMLRAGSearchProcessorIT.java (2 hunks)

@mingshl (Collaborator) commented Nov 28, 2025

This image looks great. Hopefully this link won't expire soon.

 requestParameters.imageType = "url";
-requestParameters.imageData =
-"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"; // imageContent;
+requestParameters.imageData = "https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=800";
Collaborator


Just curious: why do we need to download the image? What is the goal of this integration test?
Can we keep an image in our resources folder instead? What if this image link breaks again?

Contributor Author


This particular IT covers the case where an image URL is sent to OpenAI instead of a Base64-encoded local image. We already have a test that covers the local image case (line 664).
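To illustrate the distinction between the two test cases, here is a Java sketch of how `imageData` might be filled for each input mode. Only `imageType = "url"` appears in the test; the `"base64"` label, the class `ImageInput`, and its method are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical sketch of the two image inputs; "base64" is an assumed label for the local-image path. */
final class ImageInput {
    /** Returns the string to place in imageData for the given imageType. */
    static String imageData(String imageType, Path localImage, String imageUrl) throws IOException {
        switch (imageType) {
            case "url":
                // The remote model must download this URL itself; this is the step that failed intermittently.
                return imageUrl;
            case "base64":
                // The bytes travel inside the request, so no third-party download is involved.
                return Base64.getEncoder().encodeToString(Files.readAllBytes(localImage));
            default:
                throw new IllegalArgumentException("unknown imageType: " + imageType);
        }
    }
}
```

Only the `url` branch depends on a third-party host being reachable from OpenAI's side. This is why, per the reply, a bundled image in the resources folder would exercise a different code path rather than replace this test.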

@rithin-pullela-aws (Contributor Author)

CI is failing because the runner ran out of disk space:

Exec output and error:
| Output for ./bin/opensearch-plugin:-> Installing file:/home/ci-runner/.gradle/caches/modules-2/files-2.1/org.opensearch.plugin/opensearch-job-scheduler/3.4.0.0-SNAPSHOT/28a94a61d1c6d719e4dfb482cc3371086f72d442/opensearch-job-scheduler-3.4.0.0-SNAPSHOT.zip
| -> Downloading file:/home/ci-runner/.gradle/caches/modules-2/files-2.1/org.opensearch.plugin/opensearch-job-scheduler/3.4.0.0-SNAPSHOT/28a94a61d1c6d719e4dfb482cc3371086f72d442/opensearch-job-scheduler-3.4.0.0-SNAPSHOT.zip
| -> Installed opensearch-job-scheduler with folder name opensearch-job-scheduler
| -> Installing file:/__w/ml-commons/ml-commons/plugin/build/distributions/opensearch-ml-3.4.0.0-SNAPSHOT.zip
| -> Downloading file:/__w/ml-commons/ml-commons/plugin/build/distributions/opensearch-ml-3.4.0.0-SNAPSHOT.zip
| -> Failed installing file:/__w/ml-commons/ml-commons/plugin/build/distributions/opensearch-ml-3.4.0.0-SNAPSHOT.zip
| -> Rolling back opensearch-job-scheduler
| -> Rolled back opensearch-job-scheduler
| -> Rolling back file:/__w/ml-commons/ml-commons/plugin/build/distributions/opensearch-ml-3.4.0.0-SNAPSHOT.zip
| -> Rolled back file:/__w/ml-commons/ml-commons/plugin/build/distributions/opensearch-ml-3.4.0.0-SNAPSHOT.zip
| Exception in thread "main" java.io.IOException: No space left on device
| 	at java.base/sun.nio.ch.UnixFileDispatcherImpl.write0(Native Method)
| 	at java.base/sun.nio.ch.UnixFileDispatcherImpl.write(UnixFileDispatcherImpl.java:69)
| 	at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:137)
| 	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:102)
| 	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:72)
| 	at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:338)
| 	at java.base/sun.nio.ch.ChannelOutputStream.writeFully(ChannelOutputStream.java:68)
| 	at java.base/sun.nio.ch.ChannelOutputStream.write(ChannelOutputStream.java:105)
| 	at java.base/java.io.InputStream.transferTo(InputStream.java:797)
| 	at java.base/java.io.FileInputStream.transferTo(FileInputStream.java:397)
| 	at java.base/java.io.BufferedInputStream.implTransferTo(BufferedInputStream.java:652)
| 	at java.base/java.io.BufferedInputStream.transferTo(BufferedInputStream.java:631)
| 	at java.base/java.nio.file.Files.copy(Files.java:3164)
| 	at org.opensearch.tools.cli.plugin.InstallPluginCommand.downloadZip(InstallPluginCommand.java:440)
| 	at org.opensearch.tools.cli.plugin.InstallPluginCommand.download(InstallPluginCommand.java:335)
| 	at org.opensearch.tools.cli.plugin.InstallPluginCommand.execute(InstallPluginCommand.java:274)
| 	at org.opensearch.tools.cli.plugin.InstallPluginCommand.execute(InstallPluginCommand.java:251)
| 	at org.opensearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:110)
| 	at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
| 	at org.opensearch.cli.MultiCommand.execute(MultiCommand.java:104)
| 	at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
| 	at org.opensearch.cli.Command.main(Command.java:101)
| 	at org.opensearch.tools.cli.plugin.PluginCli.main(PluginCli.java:66)

@mingshl (Collaborator) commented Nov 28, 2025

Try running the CI jobs one at a time to avoid the disk issue.

@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval November 28, 2025 20:00 — with GitHub Actions Failure
@rithin-pullela-aws (Contributor Author)

CI failed with:

FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':opensearch-ml-spi:compileJava'.
> Could not resolve all files for configuration ':opensearch-ml-spi:compileClasspath'.
   > Could not resolve com.github.luben:zstd-jni:1.5.6-1.
     Required by:
         project :opensearch-ml-spi > org.opensearch:opensearch:3.4.0-SNAPSHOT:20251128.073210-153 > org.opensearch:opensearch-compress:3.4.0-SNAPSHOT:20251128.073210-153
      > Could not resolve com.github.luben:zstd-jni:1.5.6-1.
         > Could not get resource 'https://ci.opensearch.org/ci/dbc/snapshots/maven/com/github/luben/zstd-jni/1.5.6-1/zstd-jni-1.5.6-1.pom'.
            > Could not HEAD 'https://ci.opensearch.org/ci/dbc/snapshots/maven/com/github/luben/zstd-jni/1.5.6-1/zstd-jni-1.5.6-1.pom'. Received status code 503 from server: Service Unavailable
> There are 5 more failures with identical causes.
* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.
[Incubating] Problems report is available at: file:///__w/ml-commons/ml-commons/build/reports/problems/problems-report.html
> Get more help at https://help.gradle.org./
Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.
BUILD FAILED in 15s
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
For more on this, please refer to https://docs.gradle.org/8.14.3/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.
9 actionable tasks: 1 executed, 8 up-to-date
Error: Process completed with exit code 1.

@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval November 28, 2025 20:49 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval November 28, 2025 20:49 — with GitHub Actions Error
@rithin-pullela-aws rithin-pullela-aws temporarily deployed to ml-commons-cicd-env-require-approval November 28, 2025 20:49 — with GitHub Actions Inactive
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval December 1, 2025 18:54 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval December 1, 2025 18:54 — with GitHub Actions Error
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval December 1, 2025 20:42 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval December 1, 2025 21:58 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval December 1, 2025 21:58 — with GitHub Actions Error
@b4sjoo (Collaborator) commented Dec 1, 2025

It's still failing, now with a timeout:

RestMLRAGSearchProcessorIT > testBM25WithOpenAIWithConversationAndImage STANDARD_OUT
    [2025-12-01T10:23:30,631][INFO ][o.o.m.r.RestMLRAGSearchProcessorIT] [testBM25WithOpenAIWithConversationAndImage] before test
    Running testBM25WithOpenAIWithConversationAndImage
    [2025-12-01T10:23:54,503][INFO ][o.o.m.r.RestMLRAGSearchProcessorIT] [testBM25WithOpenAIWithConversationAndImage] after test


RestMLRAGSearchProcessorIT > testBM25WithOpenAIWithConversationAndImage STANDARD_ERROR
REPRODUCE WITH: ./gradlew ':opensearch-ml-plugin:integTest' --tests 'org.opensearch.ml.rest.RestMLRAGSearchProcessorIT.testBM25WithOpenAIWithConversationAndImage' -Dtests.seed=12178F8EBBFD049E -Dtests.security.manager=false -Dtests.locale=hu-HU -Dtests.timezone=Etc/GMT+12 -Druntime.java=21
    REPRODUCE WITH: ./gradlew ':opensearch-ml-plugin:integTest' --tests 'org.opensearch.ml.rest.RestMLRAGSearchProcessorIT.testBM25WithOpenAIWithConversationAndImage' -Dtests.seed=12178F8EBBFD049E -Dtests.security.manager=false -Dtests.locale=hu-HU -Dtests.timezone=Etc/GMT+12 -Druntime.java=21

RestMLRAGSearchProcessorIT > testBM25WithOpenAIWithConversationAndImage FAILED
    org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:35035/], URI [/test/_search?size=5&search_pipeline=pipeline_test], status line [HTTP/1.1 400 Bad Request]
    {"error":{"root_cause":[{"type":"status_exception","reason":"Error from remote service: {\n  \"error\": {\n    \"message\": \"Timeout while downloading https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=800.\",\n    \"type\": \"invalid_request_error\",\n    \"param\": null,\n    \"code\": \"invalid_image_url\"\n  }\n}"}],"type":"status_exception","reason":"Error from remote service: {\n  \"error\": {\n    \"message\": \"Timeout while downloading https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=800.\",\n    \"type\": \"invalid_request_error\",\n    \"param\": null,\n    \"code\": \"invalid_image_url\"\n  }\n}"},"status":400}
        at __randomizedtesting.SeedInfo.seed([12178F8EBBFD049E:DE0CBBE8F8E271D7]:0)
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:199)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:172)
        at app//org.opensearch.ml.rest.RestMLRAGSearchProcessorIT.performSearch(RestMLRAGSearchProcessorIT.java:1448)
        at app//org.opensearch.ml.rest.RestMLRAGSearchProcessorIT.testBM25WithOpenAIWithConversationAndImage(RestMLRAGSearchProcessorIT.java:1073)
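Both failures in this thread (the original "Error while downloading" and this "Timeout while downloading") carry the same `invalid_image_url` code. A hypothetical helper, not part of ml-commons, could classify such error text so that a test reports an external image-fetch failure separately from a regression. The class and enum names are assumptions.

```java
/**
 * Hypothetical helper: classifies error text returned through the RAG pipeline
 * by the remote image-download failure modes seen in this thread.
 */
final class RemoteImageFailures {
    enum Kind { DOWNLOAD_TIMEOUT, DOWNLOAD_ERROR, OTHER }

    // Error code OpenAI returned for both the Wikimedia and the Unsplash failures above.
    private static final String IMAGE_URL_ERROR_CODE = "invalid_image_url";

    static Kind classify(String errorText) {
        if (errorText == null || !errorText.contains(IMAGE_URL_ERROR_CODE)) {
            return Kind.OTHER; // not an image-fetch problem; treat as a real failure
        }
        return errorText.contains("Timeout while downloading") ? Kind.DOWNLOAD_TIMEOUT : Kind.DOWNLOAD_ERROR;
    }
}
```

For example, the IT could catch `ResponseException`, whose message includes the error body as shown in the log above, and call `classify(e.getMessage())`. Whether to then skip the test (for instance with JUnit's `Assume.assumeFalse`) or fail it is a judgment call, since skipping would also hide a URL path that is genuinely broken.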

@cwperks cwperks merged commit 2854ad2 into opensearch-project:main Dec 1, 2025
38 of 52 checks passed
@opensearch-trigger-bot (Contributor)

The backport to 3.3 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-3.3 3.3
# Navigate to the new working tree
cd .worktrees/backport-3.3
# Create a new branch
git switch --create backport/backport-4472-to-3.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 2854ad26d81e4788e2b7908c86e5eea4afe67e3e
# Push it to GitHub
git push --set-upstream origin backport/backport-4472-to-3.3
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-3.3

Then, create a pull request where the base branch is 3.3 and the compare/head branch is backport/backport-4472-to-3.3.

