
Test PR for own repo #1

Merged
vdiravka merged 354 commits into DRILL-test from DRILL-5544 on Jun 7, 2018

Conversation


@vdiravka vdiravka commented Jun 7, 2018

Test PR for own repo

Patrick Wong and others added 30 commits September 9, 2016 10:08
…types returns no results [MapR-DB JSON Tables]

+ Added `enablePushdown` option to enable/disable all filter pushdown, enabled by default.
+ Fail query on schema change.
+ Added a configuration option 'ignoreSchemaChange', which, when enabled, drops the rows from the result
Hive's HBaseStorageHandler uses HBase's TableInputFormat, which is in the hbase-server module.
…istribution.

The default build/test/packaging behavior for the mapr-format-plugin module is:

1. BUILD of mapr-format-plugin is ENABLED.
2. Unit tests of mapr-format-plugin module are DISABLED (use `-Pmapr` to enable).
3. Packaging of mapr-format-plugin is DISABLED (use `-Pmapr` to enable).

Please see LEGAL-251 for discussion/conclusion regarding inclusion of source code with non-open-source dependency.
Exclude 'hadoop-mapreduce-client-core' and 'hadoop-auth' as transitive dependencies from 'hbase-server'
…ated errors while closing the new group and issue a more detailed error message.

close apache#591
+ Function visitor should not use previous function holder if this function is non-deterministic

closes apache#509
+ Previously, binary_string used the input buffer as the output buffer, so after calling binary_string the original content was destroyed. Other expressions/functions that need to access the original input buffer get wrong results.
+ This fix also sets readerIndex and writerIndex correctly for the output buffer; otherwise the consumer of the output buffer will hit issues.
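The buffer-handling principle behind this fix can be sketched with plain java.nio buffers (a hypothetical stand-in: Drill itself operates on DrillBuf with readerIndex/writerIndex, and the escape decoding here is only illustrative): decode into a fresh output buffer instead of reusing the input, and set the buffer's indices so consumers see exactly the decoded bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BinaryStringFixSketch {
    /**
     * Decodes \xNN escapes into a NEW buffer rather than reusing the input
     * (the bug described above), and flips the buffer so position/limit
     * bracket exactly the decoded bytes, mirroring readerIndex/writerIndex.
     */
    static ByteBuffer decodeHexEscapes(ByteBuffer in) {
        byte[] src = new byte[in.remaining()];
        in.duplicate().get(src);            // read without disturbing the input
        ByteBuffer out = ByteBuffer.allocate(src.length);
        for (int i = 0; i < src.length; ) {
            if (src[i] == '\\' && i + 3 < src.length && src[i + 1] == 'x') {
                int hi = Character.digit(src[i + 2], 16);
                int lo = Character.digit(src[i + 3], 16);
                out.put((byte) ((hi << 4) | lo));
                i += 4;
            } else {
                out.put(src[i++]);
            }
        }
        out.flip();                          // consumer sees only decoded bytes
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap("\\x41\\x42C".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer out = decodeHexEscapes(in);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // ABC
        System.out.println(in.remaining()); // input untouched: still 9
    }
}
```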

closes apache#604
These changes are a subset of the original pull request from DRILL-4539 (PR-462).
- Added changes to support Null Equality Joins;
- Created tests for it.

close apache#603
…ill JDBC

The Drill JDBC driver uses Optiq Avatica as its basis, but this dependency
moved over to Calcite quite some time ago without the Drill code being
updated for it.

This patch updates Avatica version to the version from Calcite
(1.4.0-drill-r19). It also refactors Drill JDBC driver to comply with the
packages and API changes in Avatica. Finally it fixes the SQL types for
lists and structs, since Drill doesn't support java.sql.Array and
java.sql.Struct interfaces.

this closes apache#395

Change-Id: Ia608adf900e8708d9e6f6f58ed41e104321a9914
Support loading the Drill driver using ServiceLoader. From the user's perspective,
it means being able to use the driver without first registering it via
Class.forName("org.apache.drill.jdbc.Driver").
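A minimal sketch of what JDBC 4.0 ServiceLoader-based discovery looks like from the caller's side (no Drill-specific code; any driver JAR that ships a META-INF/services/java.sql.Driver entry is discovered the same way):

```java
import java.sql.Driver;
import java.util.ServiceLoader;

public class DriverDiscovery {
    public static void main(String[] args) {
        // DriverManager performs this discovery internally, which is why an
        // explicit Class.forName("org.apache.drill.jdbc.Driver") call is no
        // longer required once the driver JAR is on the classpath.
        for (Driver d : ServiceLoader.load(Driver.class)) {
            System.out.println("found driver: " + d.getClass().getName());
        }
    }
}
```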

this closes apache#596

Change-Id: Id26922ee42bef5fbce46ac2bcbb83f1859e9bb48
Change MetadataProvider to return metadata results ordered (following
convention used by ODBC and JDBC specs).

this closes apache#614

Change-Id: Iff59b7fada7040602f1735bccc13bc6bf5c9a252
- Adding tableType filter to GetTablesReq query (needed for JDBC and ODBC
drivers).
- Fix table type returned by sys and INFORMATION_SCHEMA tables
- Also fixes some protobuf typos to related classes.

this closes apache#612

Change-Id: If95246a312f6c6d64a88872936f516308874c2d2
…umber even during the unit tests.

This is now a build-time generated class, rather than one that looks on the
classpath for META-INF files.

This pattern for file generation with parameters passed from the POM files
was borrowed from parquet-mr.
Drill was writing non-standard dates into parquet files for all releases
before 1.9.0. The values have been read correctly by Drill, but
external tools like Spark reading the files will see corrupted values for
all dates that have been written by Drill.

This change corrects the behavior of the Drill parquet writer to correctly
store dates in the format given in the parquet specification.

To maintain compatibility with old files, the parquet reader code has
been updated to check for the old format and automatically shift the
corrupted values into corrected ones.

The test cases included here should ensure that all files produced by
historical versions of Drill will continue to return the same values they
had in previous releases. For compatibility with external tools, any old
files with corrupted dates can be re-written using the CREATE TABLE AS
command (as the writer will now only produce specification-compliant
values, even after reading from older corrupt files).

While the old behavior was a consistent shift into a range unlikely
to be used in a modern database (over 10,000 years in the future), these are still
valid date values. In the case where these may have been written into
files intentionally, and we cannot be certain from the metadata if Drill
produced the files, an option is included to turn off the auto-correction.
Use of this option is assumed to be extremely unlikely, but it is included
for completeness.
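The arithmetic behind the auto-correction can be sketched as follows. The shift constant is twice the Julian day number of the Unix epoch (the writer added the epoch offset where it should have subtracted it); the detection cutoff below is illustrative, not Drill's exact threshold.

```java
public class ParquetDateCorrection {
    // Julian day number of the Unix epoch (1970-01-01).
    static final int JULIAN_EPOCH_DAY = 2_440_588;
    // Corrupt dates are shifted forward by twice the epoch's Julian day,
    // i.e. by about 13,000 years -- the "over 10,000 years" range above.
    static final int CORRUPT_SHIFT = 2 * JULIAN_EPOCH_DAY;

    /**
     * If the stored days-since-epoch value lies in the implausible
     * far-future range, undo the shift; otherwise leave it unchanged.
     */
    static int correct(int storedDaysSinceEpoch) {
        return storedDaysSinceEpoch >= CORRUPT_SHIFT
            ? storedDaysSinceEpoch - CORRUPT_SHIFT
            : storedDaysSinceEpoch;
    }

    public static void main(String[] args) {
        int corrupt = 10 + CORRUPT_SHIFT;      // a corrupted "1970-01-11"
        System.out.println(correct(corrupt));  // prints 10
        System.out.println(correct(10));       // already valid: unchanged
    }
}
```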

This patch was originally written against version 1.5.0, when rebasing
the corruption threshold was updated to 1.9.0.

Added regenerated binary files, updated metadata cache files accordingly.

One small fix in the ParquetGroupScan to accommodate changes in master that changed
when metadata is read.

Tests for bugs revealed by the regression suite.

Fix drill version number in metadata file generation
…eld in the parquet meta info "is.date.correct = true";
- Removed unnecessary double conversion of the value with the Julian day;
- Added the ability to correct corrupted dates for parquet files with the second version of the old metadata cache file as well.

This closes apache#595
1) Configuration / parsing / options / protos
2) Zookeeper integration
3) Registration / unregistration / lazy-init
4) Unit tests

This closes apache#574
Currently, no name or version is exchanged between client and server over the User RPC
channel.

On the client side, having access to the server name and version is useful to expose it
to the user (through a JDBC or ODBC API like DatabaseMetaData#getDatabaseProductVersion()),
or to implement a fallback strategy when some recent APIs are not available (like the
metadata API).

On the server side, having access to the client version might be useful for audit
purposes and eventually to implement a fallback strategy if it doesn't require an RPC
version change.

this closes apache#622
Paul Rogers and others added 28 commits May 5, 2017 15:43
Unit testing revealed that the VectorAccessorSerializable class claims
to serialize SV2s, but, in fact, does not. Actually, it writes them,
but does not read them, resulting in corrupted data on read.

Fortunately, no code appears to serialize sv2s at present. Still, it is
a bug and needs to be fixed.

First task is to add serialization code for the sv2.

That revealed that the recently-added code to save DrillBufs using a
shared buffer had a bug: it relied on the writer index to know how much
data is in the buffer. Turns out sv2 buffers don’t set this index. So,
a new version of the write function takes a write length.

Then, closer inspection of the read code revealed duplicated code. So,
DrillBuf allocation moved into a version of the read function that now
does reading and DrillBuf allocation.

Turns out that value vectors, but not SV2s, can be built from a
Drillbuf. Added a matching constructor to the SV2 class.

Finally, cleaned up the code a bit to make it easier to follow. Also
allowed test code to access the handy timer already present in the code.
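The write-length change described above can be sketched with a plain byte[] standing in for DrillBuf (the names here are illustrative, not Drill's actual API):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class BufferWriteSketch {
    /**
     * Instead of trusting a writer index that SV2 buffers never set, the
     * caller passes the number of valid bytes to write explicitly.
     */
    static void writeBuffer(ByteArrayOutputStream out, byte[] buf, int writeLength)
            throws IOException {
        out.write(buf, 0, writeLength);    // write exactly writeLength bytes
    }

    public static void main(String[] args) throws IOException {
        // An 8-byte buffer holding only 3 valid two-byte SV2 entries.
        byte[] sv2 = new byte[] {1, 0, 2, 0, 3, 0, 0, 0};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBuffer(out, sv2, 6);          // serialize only the valid 6 bytes
        System.out.println(out.size());    // prints 6
    }
}
```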

closes apache#800
…atch

Unit tests showed that the “priority queue copier” does not handle an
empty batch. This has not been an issue because code elsewhere in the
sort specifically works around this issue. This fix resolves the issue
at the source to avoid the need for future work-arounds.

closes apache#778
Refactors ScanBatch to allow unit testing of record reader
implementations, especially the “writer” classes.

See JIRA for details.

closes apache#811
… functions

1. Revisited calculation logic for string literals and some string functions
(cast, upper, lower, initcap, reverse, concat, concat operator, rpad, lpad, case statement,
 coalesce, first_value, last_value, lag, lead).
Synchronized return type length calculation logic between limit 0 and regular queries.

2. Deprecated width and changed it to precision for string types in MajorType.

3. Revisited FunctionScope and split it into FunctionScope and ReturnType.
FunctionScope now indicates only function usage in terms of the number of in/out rows (n -> 1, 1 -> 1, 1 -> n).
A new UDF annotation, ReturnType, indicates which return type strategy should be used.

4. Changed MAX_VARCHAR_LENGTH from 65536 to 65535.

5. Updated calculation of precision and display size for INTERVALYEAR & INTERVALDAY.

6. Refactored part of function code-gen logic (ValueReference, WorkspaceReference, FunctionAttributes, DrillFuncHolder).

This closes apache#819
…nd reuse table and tabletInfo per query instead of fetching them multiple times. Compute rowCount from tabletInfo instead of expensive tableStats call.

This closes apache#817
…ble
- A small refactoring of the original fix for this issue (DRILL-4039);
- Added a test for the fix.
If the Hive server restarts, Drill either hangs or continually reports
errors when retrieving schemas. The problem is that the Hive plugin
tries to handle connection failures, but does not do so correctly for
the secure connection case. The problem is complex, see DRILL-5496 for
details.

This is a workaround: we discard the entire Hive schema cache when we
encounter an unhandled connection exception, then we rebuild a new one.

This is not a proper fix; for that we'd have to restructure the code.

This will, however, solve the immediate problem until we do the needed
restructuring.
See DRILL-5498 for details.

Replaced the repeated varchar reader for reading columns with a purpose-built
column parser. Implemented rules to recover from invalid column
headers.

Added missing test method

Changes re code review comments

Back out testing-only change

close apache#830
    NOTE: This pull request provides support for on-wire encryption using the SASL framework. The communication channels covered are:
    1) Between the Drill JDBC client and a Drillbit.
    2) Between Drillbits, i.e. the control/data channels.
    3) A UI change shows on which network channels encryption is enabled and the number of encrypted/unencrypted connections for
       user/control/data connections.

close apache#773
NOTE: This pull request provides support for on-wire encryption using SASL framework. Communication channel covered is:
      1) C++ Drill Client and Drillbit channel.

close apache#809
…thod

Changes:
1. Fixed DCL in the FunctionInitializer.checkInit() method (the initialization flag is now updated only after the function body is loaded).
2. Fixed ImportGrabber.getImports() method to return the list with imports.
3. Added unit tests for FunctionInitializer.
4. Minor refactoring (renamed methods, added javadoc).
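A minimal sketch of the corrected double-checked locking (DCL) pattern, with illustrative names (the real FunctionInitializer differs in detail): the flag must be volatile, and it must be set only after the function body is loaded.

```java
public class FunctionInitializerSketch {
    // volatile is required for safe DCL: without it, another thread could
    // observe 'initialized' as true before the body's write is visible.
    private volatile boolean initialized;
    private String functionBody;

    String getFunctionBody() {
        if (!initialized) {                  // first (unsynchronized) check
            synchronized (this) {
                if (!initialized) {          // second check, under the lock
                    functionBody = loadBody();
                    initialized = true;      // set only AFTER loading (the fix)
                }
            }
        }
        return functionBody;
    }

    private String loadBody() {
        return "parsed function body";       // stands in for real parsing
    }

    public static void main(String[] args) {
        System.out.println(new FunctionInitializerSketch().getFunctionBody());
    }
}
```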

closes apache#843
Standardizes error handling to throw a UserException. Prior code threw
various exceptions, called the fail() method, or returned a variety of
status codes.

closes apache#838
Validates offset vectors in VarChar and repeated vectors. Validates the
special case of repeated VarChar vectors (two layers of offsets).

Provides two new session variables to turn on validation. One enables
the existing operator (iterator) validation, the other adds vector
validation. This allows validation to occur in a “production” Drill
(without restarting Drill with assertions, as previously required).

Unit tests validate the validator. Another test validates the
integration, but requires manual steps, so is ignored by default.

This version is first-cut: all work is done within a single class, which
allows back-porting to an earlier version to solve a specific issue. A
revision should move some of the work into generated code (or refactor
vectors to allow outside access), since offset vectors appear for each
subclass, not on a base class that would allow generic operations.

* Added boot-time options to allow enabling vector validation in Maven
unit tests.
* Code cleanup per suggestions.
* Additional (manual) tests for boot-time options and default options.
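The offset-vector invariant being validated can be sketched over a plain int[] (Drill's validator works on the actual vector classes; this is illustrative): offsets start at 0, never decrease, and the last offset must not exceed the data buffer's length, so entry i spans [offsets[i], offsets[i+1]) in the data.

```java
public class OffsetVectorValidator {
    /** Returns true if the offsets satisfy the VarChar-vector invariants. */
    static boolean isValid(int[] offsets, int dataLength) {
        if (offsets.length == 0 || offsets[0] != 0) {
            return false;                    // first offset must be zero
        }
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                return false;                // offsets must be non-decreasing
            }
        }
        return offsets[offsets.length - 1] <= dataLength;
    }

    public static void main(String[] args) {
        // Three entries: "abc", "" (empty), "defg" in a 7-byte data buffer.
        System.out.println(isValid(new int[] {0, 3, 3, 7}, 7));  // true
        System.out.println(isValid(new int[] {0, 5, 2}, 7));     // false
    }
}
```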

closes apache#832
1. Introduced an InMemoryStoreProvider with the ability to maintain a max capacity
2. DrillbitContext now explicitly has a profileStoreProvider that, by default, re-uses the general PersistentStoreProvider, unless it is InMemory, in which case #1 is used.
3. Cleanly separated out QueryProfileStoreContext
4. Converted literal values to constants within ExecConstants
5. Updated drill-module.conf for default capacity

closes apache#834
Provide an option to specify the blocksize during file creation.
This will help create parquet files with a single block on HDFS, improving performance when we read those files.

See DRILL-5379 for details.

closes apache#826
The Parquet reader is Drill's premier data source and has worked very well
for many years. As with any piece of code, it has grown in complexity over
that time and has become hard to understand and maintain.

In work in another project, we found that Parquet is accidentally creating
"low density" batches: record batches with little actual data compared to
the amount of memory allocated. We'd like to fix that.

However, the current complexity of the reader code creates a barrier to
making improvements: the code is so complex that it is often better to
leave bugs unfixed, or risk spending large amounts of time struggling to
make small changes.

This commit offers to help revitalize the Parquet reader. Functionality is
identical to the code in master; but code has been pulled apart into
various classes each of which focuses on one part of the task: building
up a schema, keeping track of read state, a strategy for reading various
combinations of records, etc. The idea is that it is easier to understand
several small, focused classes than one huge, complex class. Indeed, the
idea of small, focused classes is common in the industry; it is nothing new.

Unit tests pass with the change. Since no logic has changed and we only moved
lines of code, that is a good indication that everything still works.

Also includes fixes based on review comments.

closes apache#789
…rd batch has large number of fields.

- Changed estimation of max index value and added comments.

close apache#818
1. Added WebUserConnection/AnonWebUserConnection and their providers for authenticated and anonymous web users.
2. Updated to store the UserSession, BufferAllocator and other session state inside Jetty's HttpSession instead
	of storing it in DrillUserPrincipal. A new instance of WebUserConnection is now created for each request. However,
	for authenticated users the UserSession and other state is re-used, whereas for anonymous users it is created
	for each request and recycled after query execution.

close apache#829
- Since the parquet version of PageWriter doesn't allow using direct memory for allocating ByteBuffers,
  this PR introduces another version of PageWriter and PageWriteStore. See more: https://issues.apache.org/jira/browse/PARQUET-1006
@vdiravka vdiravka merged commit 4b18c31 into DRILL-test Jun 7, 2018
vdiravka pushed a commit that referenced this pull request Jan 16, 2019
Currently, the WebServer needs to process the entire result set and stream it back to the WebClient.
Since the WebUI paginates results, we can load a larger set for pagination on the browser client and relieve pressure on the WebServer, which otherwise hosts all the data (most of which will never be streamed to the browser).
E.g. fetching all rows from a 1-billion-record table is impractical and can be capped at (say) 1K. Currently, the user has to explicitly specify LIMIT in the submitted query.
An option is provided in the query field to allow for this entry, and it can be set to selected by default for the Web UI.
The submitted query indicates that an auto-limiting wrapper was applied.
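Conceptually, the wrapper rewrites the submitted query with an outer LIMIT; the sketch below is illustrative (names and the exact rewrite Drill applies are assumptions, not its real implementation):

```java
public class AutoLimitWrapper {
    /** Wraps a user query in an outer LIMIT so at most maxRows reach the UI. */
    static String wrap(String query, int maxRows) {
        // The subquery keeps the original statement intact; only the outer
        // SELECT applies the row cap.
        return "SELECT * FROM (" + query + ") LIMIT " + maxRows;
    }

    public static void main(String[] args) {
        System.out.println(wrap("SELECT * FROM cp.`employee.json`", 1000));
    }
}
```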
[Update #1] Updated as per comments
1. Limit Wrapping Unchecked by default
2. Full List configuration of results
[Update #2] Minor update
[Update #3] Followup
closes apache#1593