…types returns no results [MapR-DB JSON Tables] + Added `enablePushdown` option to enable/disable all filter pushdown, enabled by default.
+ Fail query on schema change. + Added a configuration option 'ignoreSchemaChange' which, when enabled, drops the rows from the result
Hive's HBaseStorageHandler uses HBase's TableInputFormat which is in hbase-server module.
…istribution. The default build/test/packaging behaviors for the mapr-format-plugin module are: 1. BUILD of mapr-format-plugin is ENABLED. 2. Unit tests of the mapr-format-plugin module are DISABLED (use `-Pmapr` to enable). 3. Packaging of mapr-format-plugin is DISABLED (use `-Pmapr` to enable). Please see LEGAL-251 for the discussion/conclusion regarding inclusion of source code with a non-open-source dependency.
Exclude 'hadoop-mapreduce-client-core' and 'hadoop-auth' as transitive dependencies from 'hbase-server'
…ated errors while closing the new group and issue a more detailed error message. close apache#591
…tinct) exists close apache#588
Tests for different data types close apache#598
… logging is not enabled.
+ Function visitor should not use previous function holder if this function is non-deterministic closes apache#509
+ Previously, binary_string used the input buffer as the output buffer, so after calling binary_string the original content was destroyed, and other expressions/functions that needed to access the original input buffer got wrong results. + This fix also sets readerIndex and writerIndex correctly for the output buffer; otherwise the consumer of the output buffer will hit issues. closes apache#604
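The buffer-reuse bug described above can be illustrated with plain `java.nio` buffers (a hedged sketch only: DrillBuf and the real binary_string implementation are not shown here, and `toUpperCopy` is a made-up stand-in function):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferCopySketch {
    // Fixed pattern: decode into a fresh output buffer and set its
    // position/limit (the java.nio analog of readerIndex/writerIndex)
    // so consumers see exactly the written bytes. The input buffer is
    // left untouched for any later reader.
    static ByteBuffer toUpperCopy(ByteBuffer in) {
        ByteBuffer out = ByteBuffer.allocate(in.remaining());
        for (int i = in.position(); i < in.limit(); i++) {
            out.put((byte) Character.toUpperCase((char) in.get(i)));
        }
        out.flip();  // position = 0, limit = number of bytes written
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap("abc".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer out = toUpperCopy(in);
        System.out.println(StandardCharsets.US_ASCII.decode(out));  // ABC
        System.out.println(StandardCharsets.US_ASCII.decode(in));   // abc (original preserved)
    }
}
```

The broken pattern was the opposite: writing the result back into `in`, which destroyed the original bytes for every downstream consumer.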
These changes are a subset of the original pull request from DRILL-4539 (PR-462). - Added changes to support Null Equality Joins; - Created tests for it. close apache#603
…ill JDBC The Drill JDBC driver uses Optiq Avatica as its basis, but this dependency moved over to Calcite quite some time ago without the Drill code being updated for it. This patch updates the Avatica version to the version from Calcite (1.4.0-drill-r19). It also refactors the Drill JDBC driver to comply with the package and API changes in Avatica. Finally it fixes the SQL types for lists and structs, since Drill doesn't support the java.sql.Array and java.sql.Struct interfaces. this closes apache#395 Change-Id: Ia608adf900e8708d9e6f6f58ed41e104321a9914
Support loading the Drill driver using ServiceLoader. From the user's perspective,
this means being able to use the driver without registering it first, e.g. without
calling Class.forName("org.apache.drill.jdbc.Driver").
this closes apache#596
Change-Id: Id26922ee42bef5fbce46ac2bcbb83f1859e9bb48
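The ServiceLoader mechanism the driver now participates in works roughly as follows (a sketch of the standard JDBC 4.0 convention, not Drill-specific code: a jar lists its Driver implementation in `META-INF/services/java.sql.Driver`, and DriverManager discovers it automatically):

```java
import java.sql.Driver;
import java.util.ServiceLoader;

public class DriverDiscovery {
    public static void main(String[] args) {
        // JDBC 4.0+: DriverManager uses ServiceLoader to find every Driver
        // implementation named in a META-INF/services/java.sql.Driver file
        // on the classpath, so Class.forName("org.apache.drill.jdbc.Driver")
        // is no longer required before DriverManager.getConnection(...).
        ServiceLoader<Driver> loader = ServiceLoader.load(Driver.class);
        for (Driver d : loader) {
            System.out.println(d.getClass().getName());
        }
    }
}
```

With the Drill JDBC jar on the classpath, its driver class would appear in this listing without any explicit registration call.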
Change MetadataProvider to return metadata results ordered (following convention used by ODBC and JDBC specs). this closes apache#614 Change-Id: Iff59b7fada7040602f1735bccc13bc6bf5c9a252
- Adding a tableType filter to the GetTablesReq query (needed for the JDBC and ODBC drivers). - Fix the table type returned by sys and INFORMATION_SCHEMA tables. - Also fixes some protobuf typos in related classes. this closes apache#612 Change-Id: If95246a312f6c6d64a88872936f516308874c2d2
…umber even during the unit tests. This is now a build-time generated class, rather than one that looks on the classpath for META-INF files. This pattern for file generation with parameters passed from the POM files was borrowed from parquet-mr.
Drill was writing non-standard dates into Parquet files for all releases before 1.9.0. The values were read back correctly by Drill, but external tools like Spark reading the files would see corrupted values for all dates written by Drill. This change corrects the behavior of the Drill Parquet writer to store dates in the format given in the Parquet specification. To maintain compatibility with old files, the Parquet reader code has been updated to check for the old format and automatically shift the corrupted values into corrected ones. The test cases included here should ensure that all files produced by historical versions of Drill will continue to return the same values they did in previous releases. For compatibility with external tools, any old files with corrupted dates can be re-written using the CREATE TABLE AS command (as the writer will now only produce specification-compliant values, even when reading out of older corrupt files). While the old behavior was a consistent shift into a range unlikely to be used in a modern database (over 10,000 years in the future), these are still valid date values. In case such values were written into files intentionally, and we cannot be certain from the metadata that Drill produced the files, an option is included to turn off the auto-correction. Use of this option is assumed to be extremely unlikely, but it is included for completeness. This patch was originally written against version 1.5.0; when rebasing, the corruption threshold was updated to 1.9.0. Added regenerated binary files and updated metadata cache files accordingly. One small fix in ParquetGroupScan accommodates changes in master that changed when metadata is read. Tests for bugs revealed by the regression suite. Fixed the Drill version number in metadata file generation.
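The auto-correction idea can be sketched as follows. This is a hedged illustration, not Drill's actual reader code: the shift and threshold constants below are assumptions chosen to be self-consistent (the only borrowed fact is that 2,440,588 is the Julian day number of 1970-01-01, and that Parquet's DATE type counts days since the Unix epoch).

```java
public class DateCorrectionSketch {
    static final int JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588; // Julian day of 1970-01-01
    // Assumed corruption offset for illustration: an epoch offset applied twice.
    static final int ASSUMED_SHIFT = 2 * JULIAN_DAY_OF_UNIX_EPOCH;
    // Assumed cutoff: any stored date past roughly year 10000 is treated as corrupt.
    static final int CORRUPTION_THRESHOLD = 2_932_896;

    // Reader-side fix-up: values past the threshold are assumed to be output
    // of an old writer and are shifted back into the valid range, unless the
    // user has disabled auto-correction.
    static int maybeCorrect(int storedDays, boolean autoCorrect) {
        if (autoCorrect && storedDays > CORRUPTION_THRESHOLD) {
            return storedDays - ASSUMED_SHIFT;
        }
        return storedDays;
    }

    public static void main(String[] args) {
        int correct = 17_000;                   // a normal days-since-epoch value
        int corrupt = correct + ASSUMED_SHIFT;  // what an old writer would have stored
        System.out.println(maybeCorrect(corrupt, true));   // 17000
        System.out.println(maybeCorrect(corrupt, false));  // left as-is when the option is off
    }
}
```

The key property, which the real fix also relies on, is that the corrupt range and the valid range do not overlap, so detection by threshold is safe for any plausible modern date.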
…eld in the parquet meta info "is.date.correct = true"; - Removed unnecessary double conversion of value with Julian day; - Added ability to correct corrupted dates for parquet files with the second version old metadata cache file as well. This closes apache#595
1) Configuration / parsing / options / protos 2) Zookeeper integration 3) Registration / unregistration / lazy-init 4) Unit tests This closes apache#574
…ressing This closes apache#518
No name or version is exchanged between client and server over the User RPC channel. On the client side, having access to the server name and version is useful for exposing it to the user (through a JDBC or ODBC API like DatabaseMetadata#getDatabaseProductVersion()), or for implementing a fallback strategy when some recent APIs are not available (like the metadata API). On the server side, having access to the client version might be useful for audit purposes and eventually for implementing a fallback strategy, if that doesn't require an RPC version change. this closes apache#622
Unit testing revealed that the VectorAccessorSerializable class claims to serialize SV2s but, in fact, does not. Actually, it writes them but does not read them, resulting in corrupted data on read. Fortunately, no code appears to serialize SV2s at present. Still, it is a bug and needs to be fixed. The first task was to add serialization code for the SV2. That revealed that the recently-added code to save DrillBufs using a shared buffer had a bug: it relied on the writer index to know how much data is in the buffer, and it turns out SV2 buffers don't set this index. So, a new version of the write function takes a write length. Then, closer inspection of the read code revealed duplicated code, so DrillBuf allocation moved into a version of the read function that now does both reading and DrillBuf allocation. It turns out that value vectors, but not SV2s, can be built from a DrillBuf; added a matching constructor to the SV2 class. Finally, cleaned up the code a bit to make it easier to follow. Also allowed test code to access the handy timer already present in the code. closes apache#800
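The write-length fix can be sketched with plain streams (hedged: the real code uses DrillBuf and the vector serialization classes, not byte arrays, and these method names are illustrative). Since an SV2 buffer never sets a writer index, the writer must be told explicitly how many bytes to emit, and the reader must rely on a recorded length rather than any index stored in the buffer itself:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class Sv2SerializationSketch {
    // Write exactly 'length' bytes, preceded by the length itself, instead
    // of trusting a writer index that SV2 buffers never set.
    static void write(DataOutputStream out, byte[] buf, int length) throws IOException {
        out.writeInt(length);
        out.write(buf, 0, length);
    }

    // Read back using the recorded length, so no index on the buffer is needed.
    static byte[] read(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] buf = new byte[length];
        in.readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] sv2 = {0, 0, 0, 1, 0, 2};  // three 2-byte selection indices
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        write(new DataOutputStream(bytes), sv2, sv2.length);
        byte[] back = read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(back.length);  // 6
    }
}
```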
…atch Unit tests showed that the “priority queue copier” does not handle an empty batch. This has not been an issue because code elsewhere in the sort specifically works around this issue. This fix resolves the issue at the source to avoid the need for future work-arounds. closes apache#778
Refactors ScanBatch to allow unit testing of record reader implementations, especially the “writer” classes. See JIRA for details. closes apache#811
… functions 1. Revisited calculation logic for string literals and some string functions (cast, upper, lower, initcap, reverse, concat, concat operator, rpad, lpad, case statement, coalesce, first_value, last_value, lag, lead). Synchronized return type length calculation logic between limit 0 and regular queries. 2. Deprecated width and changed it to precision for string types in MajorType. 3. Revisited FunctionScope and split it into FunctionScope and ReturnType. FunctionScope now indicates only function usage in terms of the number of in/out rows (n -> 1, 1 -> 1, 1 -> n). A new UDF annotation, ReturnType, indicates which return type strategy should be used. 4. Changed MAX_VARCHAR_LENGTH from 65536 to 65535. 5. Updated calculation of precision and display size for INTERVALYEAR & INTERVALDAY. 6. Refactored part of the function code-gen logic (ValueReference, WorkspaceReference, FunctionAttributes, DrillFuncHolder). This closes apache#819
…nd reuse table and tabletInfo per query instead of fetching them multiple times. Compute rowCount from tabletInfo instead of expensive tableStats call. This closes apache#817
…rectly This closes apache#821
…s consisting of multiple operators. This closes apache#823
…ble - A small refactoring of the original fix for this issue (DRILL-4039); - Added a test for the fix.
If the Hive server restarts, Drill either hangs or continually reports errors when retrieving schemas. The problem is that the Hive plugin tries to handle connection failures, but does not do so correctly for the secure connection case. The problem is complex, see DRILL-5496 for details. This is a workaround: we discard the entire Hive schema cache when we encounter an unhandled connection exception, then we rebuild a new one. This is not a proper fix; for that we'd have to restructure the code. This will, however, solve the immediate problem until we do the needed restructuring.
See DRILL-5498 for details. Replaced the repeated-varchar reader for reading columns with a purpose-built column parser. Implemented rules to recover from invalid column headers. Added a missing test method. Changes per code review comments. Backed out a testing-only change. close apache#830
NOTE: This pull request provides support for on-wire encryption using the SASL framework. The communication channels covered are:
1) Between the Drill JDBC client and a Drillbit.
2) Between Drillbits, i.e. the control/data channels.
3) A Web UI change to show on which network channels encryption is enabled, along with the number of encrypted/unencrypted connections for the
user/control/data channels.
close apache#773
NOTE: This pull request provides support for on-wire encryption using the SASL framework. The communication channel covered is:
1) Between the C++ Drill client and a Drillbit.
close apache#809
…thod Changes: 1. Fixed double-checked locking (DCL) in the FunctionInitializer.checkInit() method (update the flag parameter when the function body is loaded). 2. Fixed the ImportGrabber.getImports() method to return the list of imports. 3. Added unit tests for FunctionInitializer. 4. Minor refactoring (renamed methods, added javadoc). closes apache#843
Standardizes error handling to throw a UserException. Prior code threw various exceptions, called the fail() method, or returned a variety of status codes. closes apache#838
Validates offset vectors in VarChar and repeated vectors. Validates the special case of repeated VarChar vectors (two layers of offsets). Provides two new session variables to turn on validation: one enables the existing operator (iterator) validation, the other adds vector validation. This allows validation to occur in a “production” Drill (without restarting Drill with assertions, as previously required). Unit tests validate the validator. Another test validates the integration, but requires manual steps, so it is ignored by default. This version is a first cut: all work is done within a single class. This allows back-porting to an earlier version to solve a specific issue. A revision should move some of the work into generated code (or refactor vectors to allow outside access), since offset vectors appear in each subclass, not on a base class that would allow generic operations. * Added boot-time options to allow enabling vector validation in Maven unit tests. * Code cleanup per suggestions. * Additional (manual) tests for boot-time options and default options. closes apache#832
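The offset-vector checks can be sketched as follows (a hedged sketch: Drill's validator operates on value vectors, not int arrays, and this helper is invented for illustration). An offset vector for n values has n+1 entries, starts at 0, must be monotonically non-decreasing, and its last entry must not exceed the data buffer length:

```java
public class OffsetVectorCheck {
    // Returns null if the offsets are valid, otherwise a description of the
    // first problem found.
    static String validate(int[] offsets, int valueCount, int dataLength) {
        if (offsets.length != valueCount + 1) {
            return "expected " + (valueCount + 1) + " offsets, found " + offsets.length;
        }
        if (offsets[0] != 0) {
            return "first offset is " + offsets[0] + ", expected 0";
        }
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                return "offset " + i + " decreases: " + offsets[i - 1] + " -> " + offsets[i];
            }
        }
        if (offsets[valueCount] > dataLength) {
            return "last offset " + offsets[valueCount] + " exceeds data length " + dataLength;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate(new int[]{0, 3, 5, 9}, 3, 9));  // null (valid)
        System.out.println(validate(new int[]{0, 5, 3, 9}, 3, 9));  // reports the decreasing offset
    }
}
```

For repeated VarChar vectors the same check would apply twice: once to the outer (repetition) offsets and once to the inner (character) offsets.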
1. Introduced an InMemoryStoreProvider with the ability to maintain a max capacity. 2. DrillbitContext now explicitly has a profileStoreProvider that, by default, re-uses the general PersistentStoreProvider, unless it is InMemory, in which case #1 is used. 3. Cleanly separated out QueryProfileStoreContext. 4. Converted literal values to constants within ExecConstants. 5. Updated drill-module.conf for default capacity. closes apache#834
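The max-capacity behavior of an in-memory profile store can be sketched with a LinkedHashMap eviction policy (hedged: Drill's InMemoryStoreProvider is not shown, and the class name and capacity value here are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedProfileStore<K, V> extends LinkedHashMap<K, V> {
    private final int maxCapacity;

    BoundedProfileStore(int maxCapacity) {
        super(16, 0.75f, false);  // insertion order: the oldest profile is evicted first
        this.maxCapacity = maxCapacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxCapacity;  // drop the oldest entry once capacity is exceeded
    }

    public static void main(String[] args) {
        BoundedProfileStore<String, String> store = new BoundedProfileStore<>(2);
        store.put("q1", "profile-1");
        store.put("q2", "profile-2");
        store.put("q3", "profile-3");  // evicts q1
        System.out.println(store.keySet());  // [q2, q3]
    }
}
```

The point of the cap is that query profiles accumulate without bound on a long-running Drillbit; a bounded in-memory store keeps only the most recent ones.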
Provide an option to specify blocksize during file creation. This will help create parquet files with single block on HDFS, helping improve performance when we read those files. See DRILL-5379 for details. closes apache#826
The Parquet reader is Drill's premier data source and has worked very well for many years. As with any piece of code, it has grown in complexity over that time and has become hard to understand and maintain. In work on another project, we found that Parquet was accidentally creating "low density" batches: record batches with little actual data compared to the amount of memory allocated. We'd like to fix that. However, the current complexity of the reader code creates a barrier to making improvements: the code is so complex that it is often better to leave bugs unfixed than to risk spending large amounts of time struggling to make small changes. This commit helps revitalize the Parquet reader. Functionality is identical to the code in master, but the code has been pulled apart into various classes, each of which focuses on one part of the task: building up a schema, keeping track of read state, a strategy for reading various combinations of records, etc. The idea is that it is easier to understand several small, focused classes than one huge, complex class. Indeed, the idea of small, focused classes is common in the industry; it is nothing new. Unit tests pass with the change. Since no logic has changed and we only moved lines of code, that is a good indication that everything still works. Also includes fixes based on review comments. closes apache#789
…rd batch has a large number of fields. - Changed the estimation of the max index value and added comments. close apache#818
…torage plugin is enabled close apache#845
1. Added WebUserConnection/AnonWebUserConnection and their providers for authenticated and anonymous web users. 2. Updated to store the UserSession, BufferAllocator and other session state inside the HttpSession of Jetty instead of in DrillUserPrincipal. A new instance of WebUserConnection is now created for each request. However, for authenticated users the UserSession and other state will be re-used, whereas for anonymous users it will be created for each request and recycled after query execution. close apache#829
- Since the Parquet version of PageWriter doesn't allow using direct memory for allocating ByteBuffers, this PR introduces another version of PageWriter and PageWriteStore. See more: https://issues.apache.org/jira/browse/PARQUET-1006
vdiravka pushed a commit that referenced this pull request on Jan 16, 2019.
Currently, the WebServer needs to process the entire set of results and stream them back to the WebClient. Since the WebUI paginates results anyway, we can load a larger set for pagination on the browser client and relieve pressure on the WebServer, which otherwise hosts all the data (most of which will never be streamed to the browser). E.g. fetching all rows from a 1-billion-record table is impractical and can be capped at (say) 1K. Currently, the user has to explicitly specify LIMIT in the submitted query. An option is now provided in the query field to allow for this, and it can be set to selected by default for the Web UI. The submitted query indicates that an auto-limiting wrapper was applied. [Update #1] Updated as per comments 1. Limit wrapping unchecked by default 2. Full list configuration of results [Update #2] Minor update [Update #3] Followup closes apache#1593
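The auto-limit wrapping described above can be sketched as a simple query rewrite (hedged: the method name and the default cap below are illustrative, not Drill's actual implementation):

```java
public class AutoLimitSketch {
    // Wrap the user's query in an outer LIMIT so the WebServer never has to
    // buffer more than 'maxRows' rows for the browser.
    static String wrapWithLimit(String userQuery, int maxRows) {
        // Strip a trailing semicolon so the wrapper stays valid SQL.
        String q = userQuery.trim();
        if (q.endsWith(";")) {
            q = q.substring(0, q.length() - 1);
        }
        return "SELECT * FROM (" + q + ") LIMIT " + maxRows;
    }

    public static void main(String[] args) {
        System.out.println(wrapWithLimit("SELECT * FROM big_table;", 1000));
        // SELECT * FROM (SELECT * FROM big_table) LIMIT 1000
    }
}
```

Because the limit is applied as an outer wrapper rather than by editing the user's SQL, the original query text stays intact and the WebUI can indicate that the cap was applied automatically.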
Test PR for own repo