-
Notifications
You must be signed in to change notification settings - Fork 1.4k
SQlite data source for RDataFrame #2322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@dpiparo @bluehood From my side this is ready for review. I think it makes sense to have remote reading of files in a possible follow-up PR. Please let me know if you think it makes sense to add an integration test to roottest. In this case, how would I deal with the fact that this datasource depends on the sqlite availability? |
hi, alright i'm back at cern on monday, i'll review asap (and let you know about the roottest, but in first approximation you can follow what rarrowds does). |
hi, cleaned up obsolete jenkins messages, will start a new build and start reviewing today |
@phsft-bot build please |
Starting build on |
It should be rebased now. Is there a standard ROOT method for printing warnings? |
Starting build on |
Build failed on fedora28/native. Errors:
|
Build failed on windows10/vc15. Errors:
|
Build failed on centos7/gcc62. Failing tests: |
Yes, TObject::Warning or ::Warning. |
Starting build on |
Build failed on windows10/vc15. Errors:
|
Build failed on fedora28/native. Errors:
|
@bluehood Yes from my side |
@phsft-bot build |
Starting build on |
Build failed on fedora28/native. Errors:
|
Build failed on windows10/vc15. Errors:
|
@@ -22,11 +22,16 @@ if(NOT ARROW_FOUND) | |||
list(REMOVE_ITEM sources ${CMAKE_CURRENT_SOURCE_DIR}/src/RArrowDS.cxx) | |||
endif() | |||
|
|||
if(NOT sqlite) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may need to use ${ROOT_sqlite_FOUND}
or equivalent here and elsewhere.
Starting build on |
Build failed on fedora28/native. Errors:
|
Starting build on |
Starting build on |
thank you @jblomer for being so responsive to comments! |
This PR adds a new data source for
RDataFrame
that is able to provide data from SQliteSELECT
queries. It will be useful for cvmfs, where we have file catalogs and monitoring information in sqlite files. For instance, one can doto show the distribution of file name sizes. There are probably more use cases, for instance sqlite export of conditions data.
This is work in progress, I'm posting it for early comments and feedback.
My open points are
GetEntryRanges()
to make it thread-safe but that's apparently not enough. So I'm now thinking to wrapSetEntry()
andGetEntryRanges()
in a mutex.TSQLiteServer
. It felt like it requires stretching theTSQLServer
interface in perhaps unwanted ways, for saving only a handful of direct sqlite API calls. On the other hand, with (probably quite a bit) more work it might be possible to have a more general data source that works on any SQL result set.NULL
where subsequent rows actually have meaningful values. My guess is that the current heuristic is practical enough, and of course the user can formulate theSELECT
query to avoid ambiguity.