Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-12923 Added append avro mode to PutHDFS #8544

Closed
wants to merge 4 commits into from

Conversation

balazsgerner
Copy link
Contributor

Summary

NIFI-12923

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@balazsgerner balazsgerner marked this pull request as ready for review March 22, 2024 10:34
&& AVRO_APPEND_MODE.equals(appendMode)
&& destinationExists) {
getLogger().info("Appending avro record to existing avro file");
try (final var reader = new DataFileStream<>(in, new GenericDatumReader<>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't backport to 1.x because var wasn't introduced until Java 10 and we still have to support Java 8 on 1.x for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One additional note, use of var should be avoided in general, even when supported in the main branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, thank you for the feedback. Fixed in 04a1ed7.

@@ -160,6 +170,14 @@ public class PutHDFS extends AbstractHadoopProcessor {
.allowableValues(WRITE_AND_RENAME_AV, SIMPLE_WRITE_AV)
.build();

public static final PropertyDescriptor APPEND_MODE = new PropertyDescriptor.Builder()
.name("Append mode")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.name("Append mode")
.name("Append Mode")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@mattyb149
Copy link
Contributor

In order for the append to work I had to set Writing Strategy to Simple write, if I leave the default Write and rename, it actually deletes the file. Is this intended? If not we should add another dependent property so if AVRO is chosen, the Writing Strategy must be Simple Write.

@balazsgerner
Copy link
Contributor Author

In order for the append to work I had to set Writing Strategy to Simple write, if I leave the default Write and rename, it actually deletes the file. Is this intended? If not we should add another dependent property so if AVRO is chosen, the Writing Strategy must be Simple Write.

To be honest, during the whole dev testing I left the 'Writing Strategy' property on default value which is 'Write and rename', and it worked completely fine for me. I double checked it now and it still does. As I see it, my ticket does not affect the connection between writing strategy and conflict resolution strategy so it should work just as before.

I also checked 'Writing strategy'='Simple write', and that also works for me without any problems.

If you experience file deletion I assume your flow might need a tweak. I do not know the processors you use exactly, but I had file deletion problems when I left the 'GetHDFS' or the 'GetFile' processors 'Keep source file' on default which is 'false'. In order to make my flow reexecutable I changed that value to 'true'.

For testing I used the following flow:

  1. GetFile('Keep source file'='true') -> PutHDFS(append,AVRO,'Write and rename'), but it also works with 'Simple write'
  2. GetHDFS('Keep source file'='true') -> ConvertAvroToJSON (I just used this step to validate the content of the avro file, not sure if the convert processor exists upstream, I used a custom nifi instance for this)

@balazsgerner balazsgerner force-pushed the NIFI-12923 branch 2 times, most recently from 0d92f2d to 909810b Compare April 10, 2024 09:45
@balazsgerner
Copy link
Contributor Author

I had to rebase to main in order resolve merge conflicts. Sorry for the force push.

@mattyb149
Copy link
Contributor

Now I'm running into this in my test environment, trying to change the default filesystem now.

@mattyb149
Copy link
Contributor

+1 LGTM, verified the Avro append capability works as expected. Thanks for the improvement! Merging to main and support/nifi-1.x

@mattyb149 mattyb149 closed this in 3239f59 Apr 19, 2024
shubhluck pushed a commit to shubhluck/nifi that referenced this pull request Jun 1, 2024
NIFI-12923 remove var keyword

NIFI-12923 change property name

NIFI-12923 Added property dependency for append_mode

Signed-off-by: Matt Burgess <mattyb149@apache.org>

This closes apache#8544
shubhluck pushed a commit to shubhluck/nifi that referenced this pull request Jun 1, 2024
NIFI-12923 remove var keyword

NIFI-12923 change property name

NIFI-12923 Added property dependency for append_mode

Signed-off-by: Matt Burgess <mattyb149@apache.org>

This closes apache#8544
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants