Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38255: [Java] Implement Flight SQL Bulk Ingestion #43551

Merged
merged 15 commits into from
Sep 5, 2024

Conversation

eramitmittal
Copy link
Contributor

@eramitmittal eramitmittal commented Aug 4, 2024

Please look at #38255 for details on this functionality. Support for Go and C++ was added as part of #38385.
This pull request is to add the required support for Java.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Aug 5, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 5, 2024
@eramitmittal
Copy link
Contributor Author

hi @lidavidm, any update please?

@lidavidm
Copy link
Member

lidavidm commented Aug 7, 2024

Sorry @eramitmittal, I will try to get to it tomorrow

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Aug 8, 2024
@lidavidm
Copy link
Member

lidavidm commented Aug 8, 2024

It appears this fails the actual integration tests, though

@eramitmittal
Copy link
Contributor Author

Thanks looking into the integration failures. They passed on my local machine.

@eramitmittal
Copy link
Contributor Author

eramitmittal commented Aug 9, 2024

@lidavidm any hint on how to proceed?

Only Java 21+ runs are failing on Ubuntu complaining about memory leak in flightSqlIngestion test of IntegrationTest class.
However, I am not able to reproduce the problem.
I see a successful run when I manually run on Ubuntu with Java 21 in my local environment.
image

I have also tried to statically study the FlightSqlIngestionScenario code and I don't see how mem leak could be occurring.

Since the problem occurs only in CI, is there any possibility to run the tests in CI with DEBUG on for allocator?
image

@lidavidm
Copy link
Member

lidavidm commented Aug 9, 2024

That's not the error I'm concerned about, I'm more wondering why the integration tests fail:

 --------------
WARNING: Unknown module: org.apache.arrow.flight.core specified to --add-reads
WARNING: Unknown module: org.apache.arrow.memory.core specified to --add-opens
SLF4J(W): No SLF4J providers were found.
SLF4J(W): Defaulting to no-operation (NOP) logger implementation
SLF4J(W): See https://www.slf4j.org/codes.html#noProviders for further details.
Exception in thread "main" java.lang.AssertionError: Expected:
false
but got:
null
	at org.apache.arrow.flight.integration.tests.IntegrationAssertions.assertEquals(IntegrationAssertions.java:54)
	at org.apache.arrow.flight.integration.tests.FlightSqlScenario.lambda$validateMetadataRetrieval$0(FlightSqlScenario.java:172)
	at org.apache.arrow.flight.integration.tests.FlightSqlScenario.validate(FlightSqlScenario.java:227)
	at org.apache.arrow.flight.integration.tests.FlightSqlScenario.validateMetadataRetrieval(FlightSqlScenario.java:161)
	at org.apache.arrow.flight.integration.tests.FlightSqlScenario.client(FlightSqlScenario.java:76)
	at org.apache.arrow.flight.integration.tests.IntegrationTestClient.run(IntegrationTestClient.java:96)
	at org.apache.arrow.flight.integration.tests.IntegrationTestClient.main(IntegrationTestClient.java:66)

--------------

@lidavidm
Copy link
Member

lidavidm commented Aug 9, 2024

As for the memory leak, @vibhatha since we disabled the debug logging, how can it be reenabled here?

@eramitmittal
Copy link
Contributor Author

@lidavidm the integration tests should be fixed now with new pushed commit.

@vibhatha
Copy link
Collaborator

As for the memory leak, @vibhatha since we disabled the debug logging, how can it be reenabled here?

I need to take a look @lidavidm, I haven't enabled it.

@eramitmittal
Copy link
Contributor Author

Integration tests are passing now.

2 jobs are failing.

Java / AMD64 Ubuntu 22.04 Java JDK 17 Maven 3.9.6 (pull_request):
Failure is related to memory leak which I am unable to reproduce in my local environment. To diagnose the problem I need a CI run with allocator memory Debug turned on. Any suggestion on how to proceed?

Java JNI / AMD64 manylinux2014 Java JNI (pull_request):
These failure are strange given that I didn't touch Gandiva or Dataset projects:
image

image

Are we sure these are not some intermittent issues?

…llocation debug trail in case tests fail due to memory leak)
@eramitmittal
Copy link
Contributor Author

@lidavidm I have added an allocationListener to the tests to keep track of allocation and release. In case of failure due to IllegalStateException I am then printing the allocation/release trail. Hopefully this should be able to provide information on where the mem leak occurs. Can you please approve the CI?

@lidavidm
Copy link
Member

lidavidm commented Sep 4, 2024

Done. It seems it's either the server or the client's handling of putResult that's the issue? (Maybe sometimes we aren't draining the stream on the client properly?)

@eramitmittal
Copy link
Contributor Author

eramitmittal commented Sep 4, 2024

I see client side allocating and releasing PutResult fine.

allocate: 2: 
	org.apache.arrow.flight.integration.tests.TestBufferAllocationListener.onAllocation(TestBufferAllocationListener.java:39)
	org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:337)
	org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:297)
	org.apache.arrow.flight.PutResult.fromProtocol(PutResult.java:82)
	org.apache.arrow.flight.FlightClient$SetStreamObserver.onNext(FlightClient.java:488)
	org.apache.arrow.flight.FlightClient$SetStreamObserver.onNext(FlightClient.java:476)
release: 2: 
	org.apache.arrow.flight.integration.tests.TestBufferAllocationListener.onRelease(TestBufferAllocationListener.java:43)
	org.apache.arrow.memory.AllocationManager.release(AllocationManager.java:171)
	org.apache.arrow.memory.BufferLedger.decrement(BufferLedger.java:153)
	org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:123)
	org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104)
	org.apache.arrow.memory.ArrowBuf.close(ArrowBuf.java:1032)
	org.apache.arrow.flight.PutResult.close(PutResult.java:97)
	org.apache.arrow.flight.sql.FlightSqlClient.executeIngest(FlightSqlClient.java:328)

Problem is at server side:

//FlightSqlScenarioProducer.java
  private Runnable acceptPutReturnConstant(StreamListener<PutResult> ackStream, long value) {
    return () -> {
      final FlightSql.DoPutUpdateResult build =
          FlightSql.DoPutUpdateResult.newBuilder().setRecordCount(value).build();

      try (final ArrowBuf buffer = allocator.buffer(build.getSerializedSize())) {
        buffer.writeBytes(build.toByteArray());
        ackStream.onNext(PutResult.metadata(buffer));
        ackStream.onCompleted();
      }
    };
  }

I see release corresponding to ackStream.onCompleted()

release: 32: 
	org.apache.arrow.flight.integration.tests.TestBufferAllocationListener.onRelease(TestBufferAllocationListener.java:43)
        ...........
	org.apache.arrow.flight.StreamPipe.onCompleted(StreamPipe.java:99)
org.apache.arrow.flight.integration.tests.FlightSqlScenarioProducer.lambda$acceptPutReturnConstant$0(FlightSqlScenarioProducer.java:526)
	org.apache.arrow.flight.FlightService.lambda$doPutCustom$0(FlightService.java:244)	

..but no trace of buffer getting closed due to try-with-resources. It seems as if executor thread got completed before calling the finally!!!

  /** Request that the server shut down. */
  public void shutdown() {
    server.shutdown();
    if (grpcExecutor != null) {
      grpcExecutor.shutdown();
    }
  }

In the tests client finishes and then server.close() gets called which shutdown the ExecutorService without waiting for pending tasks to finish.

Flow similar to FlightServer.close() is needed

@vibhatha
Copy link
Collaborator

vibhatha commented Sep 4, 2024

@eramitmittal I guess then in this case we need to make sure we gracefully release all resources.

@eramitmittal
Copy link
Contributor Author

@vibhatha in general FlightServer.close() and shutdown methods need a review to account for pending tasks in ExecutorService as well. Other way obviously is that I can introduce a short delay in the IntegrationTest.testScenario method to allow ExecutorService threads to finish. But that will not solve the problem getting highlighted in FlightServer. I think graceful shutdown of FlightServer can be handled as a separate issue and for now I can just make the tests pass. What do you think?

@eramitmittal
Copy link
Contributor Author

Pushed a wait for ExecutorService to terminate in IntegrationTest.testScenario method. @lidavidm please approve the CI

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Sep 4, 2024
…to explain wait for executorService to finish)
@eramitmittal
Copy link
Contributor Author

eramitmittal commented Sep 4, 2024

All tests passed in last run except: AMD64 manylinux2014 Java JNI
Failure in that are not related to my changes:
image

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 4, 2024
@vibhatha
Copy link
Collaborator

vibhatha commented Sep 4, 2024

All tests passed in last run except: AMD64 manylinux2014 Java JNI

Failure in that are not related to my changes:

image

Yes, this is irrelevant.

@eramitmittal
Copy link
Contributor Author

So nice to see the integration tests passing consistently now :). Hope to see this merged soon. Thanks

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Sep 5, 2024
@lidavidm lidavidm merged commit 5ca12bd into apache:main Sep 5, 2024
17 of 18 checks passed
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Sep 5, 2024
@lidavidm
Copy link
Member

lidavidm commented Sep 5, 2024

Argh, I didn't notice that this used a closed GitHub issue.

Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 5ca12bd.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Sep 6, 2024
)

Please look at apache#38255 for details on this functionality. Support for Go and C++ was added as part of apache#38385.
This pull request is to add the required support for Java.
* GitHub Issue: apache#38255

Lead-authored-by: Amit Mittal <amit.mittal@iongroup.com>
Co-authored-by: Amit Mittal <eramitmittal@users.noreply.github.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
khwilson pushed a commit to khwilson/arrow that referenced this pull request Sep 14, 2024
)

Please look at apache#38255 for details on this functionality. Support for Go and C++ was added as part of apache#38385.
This pull request is to add the required support for Java.
* GitHub Issue: apache#38255

Lead-authored-by: Amit Mittal <amit.mittal@iongroup.com>
Co-authored-by: Amit Mittal <eramitmittal@users.noreply.github.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants