Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: Run Iceberg with JDK 17 #7391

Merged

Conversation

singhpk234
Copy link
Contributor

@singhpk234 singhpk234 commented Apr 20, 2023

About the change

This change adds a CI to run iceberg with JDK 17, considering now most of the engines are adding support for jdk 17 for ex : trino, spark.

Fixes: #7290

cc @jackye1995

private static final Pattern TIMESTAMPTZ =
Pattern.compile(
"\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}(:\\d{2}(.\\d{1,6})?)?([-+]\\d{2}:\\d{2}|Z)");
"\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}(:\\d{2}(.\\d{1,10})?)?([-+]\\d{2}:\\d{2}|Z)");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when running with java 17, i get 2023-04-20T19:17:17.748170628Z when using OffsetTime class, it only happens with me in ubuntu and not in my developement mac.
As we need to support both the env's hence adjusted the regex to account for the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 10 instead of 9?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, counting mistake :P

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would TIME also be affected by this issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it would be 9 places even for java 8, looks like we were not hitting this case earlier

    public static final DateTimeFormatter ISO_LOCAL_TIME;
    static {
        ISO_LOCAL_TIME = new DateTimeFormatterBuilder()
                .appendValue(HOUR_OF_DAY, 2)
                .appendLiteral(':')
                .appendValue(MINUTE_OF_HOUR, 2)
                .optionalStart()
                .appendLiteral(':')
                .appendValue(SECOND_OF_MINUTE, 2)
                .optionalStart()
                .appendFraction(NANO_OF_SECOND, 0, 9, true)
                .toFormatter(ResolverStyle.STRICT, null);
    }

https://github.com/corretto/corretto-8/blob/develop/jdk/src/share/classes/java/time/format/DateTimeFormatter.java#L811

fixed the hard coded test in time as well as updated the regex

build.gradle Outdated Show resolved Hide resolved
@rdblue rdblue changed the title Build / Infra: Run Iceberg Java Test with JDK 17 Build: Run Iceberg Java Test with JDK 17 Apr 20, 2023
@singhpk234
Copy link
Contributor Author

singhpk234 commented Apr 21, 2023

Have added java 17 CI for all engine's as well, as part of commit 1bc99b8, except Flink couldn't find minimum flink version since when java 17 is supported, found this issue https://issues.apache.org/jira/browse/FLINK-15736, but it seems un-resolved (cc @stevenzwu, who might have more context here)


This has now added new CI's, we can reduce the waiting time by the idea, i proposed idea a while back as part of reducing the load on main iceberg repo here (#5153 (comment)) :
The idea is based on apache spark :

This is what Apache Spark does presently :

    We create a PR and our repository triggers the workflow. Our PR uses the resources allocated to us for testing.
    Apache Spark repository finds our workflow, and links it in a comment in our PR

We can do the same for iceberg, i.e run ci on our fork and iceberg just finds the run and attaches the workflow to our pr in iceberg repo.

As per spark folks :

Spark was one of the projects that uses the GitHub resources most in ASF, and now it's one of the lowest after this change :-).

Happy to create a pr to do same for iceberg.

@singhpk234 singhpk234 changed the title Build: Run Iceberg Java Test with JDK 17 Build: Run Iceberg with JDK 17 Apr 21, 2023
build.gradle Outdated Show resolved Hide resolved
@singhpk234 singhpk234 force-pushed the enhancement/run-iceberg-java-with-jdk17 branch from 1bc99b8 to 2c9a60e Compare April 21, 2023 19:21
@singhpk234 singhpk234 force-pushed the enhancement/run-iceberg-java-with-jdk17 branch 2 times, most recently from 791541f to b5d247d Compare April 22, 2023 04:23
} else if (JavaVersion.current() == JavaVersion.VERSION_17) {
project.ext.jdkVersion = '17'
project.ext.extraJvmArgs = ["-XX:+IgnoreUnrecognizedVMOptions",
"--add-opens", "java.base/java.io=ALL-UNNAMED",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to open all these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I collected these flag by debugging individual failures of ut's from test projects (spark / arrow / aws (kryo)) this is a superset i would say we may define only the relevant one's per package, but since there was no harm in passing additional flags hence passed all the flags in all the projects. please let me know your thoughts if the above makes sense, happy to add required flags per projects

build.gradle Outdated Show resolved Hide resolved
@singhpk234 singhpk234 force-pushed the enhancement/run-iceberg-java-with-jdk17 branch from b5d247d to d2faa94 Compare April 23, 2023 07:52
@singhpk234 singhpk234 force-pushed the enhancement/run-iceberg-java-with-jdk17 branch from d2faa94 to 9bcbc9d Compare April 23, 2023 16:55
@github-actions github-actions bot removed the MR label Apr 23, 2023
"--add-opens", "java.base/sun.nio.ch=ALL-UNNAMED",
"--add-opens", "java.base/sun.nio.cs=ALL-UNNAMED",
"--add-opens", "java.base/sun.security.action=ALL-UNNAMED",
"--add-opens", "java.base/sun.util.calendar=ALL-UNNAMED"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could provide a list of classes to open and generate the --add-opens and =ALL-UNNAMED boilerplates. But that might be over-engineering, and I am fine as is. Curious what other people think.

@singhpk234 singhpk234 force-pushed the enhancement/run-iceberg-java-with-jdk17 branch from 4ef73e0 to 3d2fae1 Compare April 24, 2023 21:43
Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work! Looks good to me, except for some open questions waiting for other comments from other people.

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks mostly good to me (I've also double-checked whether we can specify all JVM args in gradle.properties via org.gradle.jvmargs but those aren't properly passed down).
I think it would be good to also double-check the difference in the sizes that required the test changes

@singhpk234
Copy link
Contributor Author

I think it would be good to also double-check the difference in the sizes that required the test changes

+1 here is the analysis for the same, basically JDK 8 was reporting column size for binaryColfor 44 where as JDK 17 was reporting it as 43, these stats were directly taken from the parquet footer hence to make the size / stats of the parquet file written env independent, I directly read the columnSize from dataFile itself, and made the changes here.

@@ -47,10 +47,10 @@ public class ExpressionUtil {
private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
private static final Pattern TIME = Pattern.compile("\\d{2}:\\d{2}(:\\d{2}(.\\d{1,6})?)?");
private static final Pattern TIMESTAMP =
Pattern.compile("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}(:\\d{2}(.\\d{1,6})?)?");
Pattern.compile("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}(:\\d{2}(.\\d{1,9})?)?");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry forgot to ask, why is this changed from 6 to 9? Does JDK17 change default precision of timestamp or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes with my mac on using JDK 17 existing regex would pass, but with ubuntu with JDK 17, was getting 2023-04-20T19:17:17.748170628Z more details here: #7391 (comment)

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waiting for CI to complete

@jackye1995
Copy link
Contributor

Thanks for the work and everyone's review! Looks like everything is passing, and all comments are addressed. I will go ahead and merge it, we can fix anything remaining in follow-up PRs.

@jackye1995 jackye1995 merged commit f70e52c into apache:master Apr 25, 2023
Fokko added a commit that referenced this pull request Apr 26, 2023
* Updated README.md

#7391 
It's just a simple change in my thinking.

* Reduce writing

I like that it's shorter.

Co-authored-by: Fokko Driesprong <fokko@apache.org>

---------

Co-authored-by: Fokko Driesprong <fokko@apache.org>
manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023
manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023
* Updated README.md

apache#7391 
It's just a simple change in my thinking.

* Reduce writing

I like that it's shorter.

Co-authored-by: Fokko Driesprong <fokko@apache.org>

---------

Co-authored-by: Fokko Driesprong <fokko@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Java 17 for Iceberg
5 participants