[Bug]: MongoDBtoBigQuery with UDF: ScriptObjectMirror cannot be cast to bson.Document #582

Closed
NichaRoj opened this issue Feb 7, 2023 · 4 comments · Fixed by #588

NichaRoj commented Feb 7, 2023

Related Template(s)

MongoDB to BigQuery

What happened?

When I create a job with the MongoDB to BigQuery template and specify a UDF JavaScript file and function, the job fails to start. See the relevant log output below.

When no UDF file and function are specified, jobs using the same template run normally.

Beam Version

Newer than 2.43.0

Relevant log output

com.google.cloud.teleport.v2.common.UncaughtExceptionLogger - The template launch failed.
java.lang.ClassCastException: class org.openjdk.nashorn.api.scripting.ScriptObjectMirror cannot be cast to class org.bson.Document (org.openjdk.nashorn.api.scripting.ScriptObjectMirror and org.bson.Document are in unnamed module of loader 'app')
	at com.google.cloud.teleport.v2.mongodb.templates.MongoDbUtils.getTableFieldSchemaForUDF(MongoDbUtils.java:167)
	at com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.run(MongoDbToBigQuery.java:95)
	at com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.main(MongoDbToBigQuery.java:82)
NichaRoj added the bug, needs triage, and p2 labels on Feb 7, 2023

Polber commented Feb 7, 2023

This is an issue with how the UDF output is transformed into a Document object that is then passed into the pipeline. The output should be parsed from JSON instead of being cast directly. I can work on a fix and update this issue this week.


Polber commented Feb 10, 2023

I created a PR to fix this issue: #588. It does introduce a couple of side effects that I wanted to discuss with the MongoDB team.

@theshanbhag Since you added the feature, I thought I would tie you in to get your input.

Currently, MongoDB templates that use JavaScript UDFs pass a bson Document to the JavaScript function and expect a bson Document back. What actually happens is that the Nashorn JavaScript engine converts the bson Document to a String before passing it to the function. This String is difficult to manipulate with native JavaScript because it cannot easily be parsed into a JSON object, and the returned object cannot easily be cast to a Document.
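
To illustrate why the stringified form is awkward to work with, here is a minimal sketch. It assumes the UDF receives something shaped like `Document{{name=Ada, age=36}}`; the exact string that Nashorn passes is not shown in this thread, so the sample value and the helper name `tryParse` are illustrative only.

```javascript
// Assumed sample of a stringified bson Document (illustrative, not taken from a real run).
var stringified = "Document{{name=Ada, age=36}}";
// The same record written as a JSON string.
var json = '{"name": "Ada", "age": 36}';

// Hypothetical helper: returns the parsed object, or null if the input is not valid JSON.
function tryParse(text) {
    try {
        return JSON.parse(text);
    } catch (e) {
        return null;
    }
}

var fromStringified = tryParse(stringified); // JSON.parse throws, so this is null
var fromJson = tryParse(json);               // a plain object with name and age
```

With the non-JSON form, a UDF would need its own ad hoc string handling to read or change a field, whereas the JSON form works with the built-in `JSON.parse`.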

The solution I introduced in #588 instead passes a JSON string representation of the Document to the JavaScript function, and the function is expected to return either an object that can be cast to a Document or a JSON String that can easily be parsed into a Document.

This would allow for a UDF similar to below:

/**
 * A simple transform function.
 * @param {string} inJson
 * @return {string} outJson
 */
function transform(inJson) {
    var outJson = JSON.parse(inJson);
    outJson.key = "value";
    return JSON.stringify(outJson);
}
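
As a rough illustration of how such a UDF behaves, the sketch below repeats the function and calls it with a sample JSON string. The input record and the `sampleInput` name are made up for the example; in the template, the pipeline would supply the JSON string rather than the script itself.

```javascript
/**
 * Same shape as the transform above: JSON string in, JSON string out.
 * @param {string} inJson
 * @return {string} outJson
 */
function transform(inJson) {
    var doc = JSON.parse(inJson);
    doc.key = "value";
    return JSON.stringify(doc);
}

// Hypothetical input; stands in for the JSON representation of a MongoDB document.
var sampleInput = '{"_id": "abc123", "count": 2}';
var result = transform(sampleInput);

// The result is a plain string that parses back into an object with the added field.
var parsed = JSON.parse(result);
```

Because the function returns a string rather than a script object, the caller can parse it as JSON instead of relying on a cast.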

Thoughts on this?


theshanbhag commented Feb 13, 2023 via email


Polber commented Mar 6, 2023

Hi @theshanbhag, has there been any progress on this?
