
Test PR for own repo #1

Merged
vdiravka merged 354 commits into DRILL-test from DRILL-5544 on Jun 7, 2018

Conversation


@vdiravka vdiravka commented Jun 7, 2018

Test PR for own repo

Patrick Wong and others added 30 commits September 9, 2016 10:08
…types returns no results [MapR-DB JSON Tables]

+ Added `enablePushdown` option to enable/disable all filter pushdown, enabled by default.
+ Fail query on schema change.
+ Added a configuration option 'ignoreSchemaChange', which, when enabled, drops the rows from the result
Hive's HBaseStorageHandler uses HBase's TableInputFormat, which is in the hbase-server module.
…istribution.

The default build/test/packaging behavior for the mapr-format-plugin module is:

1. BUILD of mapr-format-plugin is ENABLED.
2. Unit tests of mapr-format-plugin module are DISABLED (use `-Pmapr` to enable).
3. Packaging of mapr-format-plugin is DISABLED (use `-Pmapr` to enable).

Please see LEGAL-251 for discussion/conclusion regarding inclusion of source code with non-open-source dependency.
Exclude 'hadoop-mapreduce-client-core' and 'hadoop-auth' as transitive dependencies from 'hbase-server'
…ated errors while closing the new group and issue a more detailed error message.

close apache#591
+ Function visitor should not use previous function holder if this function is non-deterministic

closes apache#509
+ Previously, binary_string used the input buffer as the output buffer, so after calling binary_string the original content was destroyed. Other expressions/functions that need to access the original input buffer get wrong results.
+ This fix also sets readerIndex and writerIndex correctly for the output buffer; otherwise the consumer of the output buffer will hit issues.
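The buffer-handling principle behind this fix can be sketched with plain java.nio buffers (a hypothetical stand-in: Drill itself operates on DrillBuf with readerIndex/writerIndex, and the escape decoding here is only illustrative): decode into a fresh output buffer instead of reusing the input, and set the buffer's indices so consumers see exactly the decoded bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BinaryStringFixSketch {
    /**
     * Decodes \xNN escapes into a NEW buffer rather than reusing the input
     * (the bug described above), and flips the buffer so position/limit
     * bracket exactly the decoded bytes, mirroring readerIndex/writerIndex.
     */
    static ByteBuffer decodeHexEscapes(ByteBuffer in) {
        byte[] src = new byte[in.remaining()];
        in.duplicate().get(src);            // read without disturbing the input
        ByteBuffer out = ByteBuffer.allocate(src.length);
        for (int i = 0; i < src.length; ) {
            if (src[i] == '\\' && i + 3 < src.length && src[i + 1] == 'x') {
                int hi = Character.digit(src[i + 2], 16);
                int lo = Character.digit(src[i + 3], 16);
                out.put((byte) ((hi << 4) | lo));
                i += 4;
            } else {
                out.put(src[i++]);
            }
        }
        out.flip();                          // consumer sees only decoded bytes
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap("\\x41\\x42C".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer out = decodeHexEscapes(in);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // ABC
        System.out.println(in.remaining()); // input untouched: still 9
    }
}
```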

closes apache#604
These changes are a subset of the original pull request from DRILL-4539 (PR-462).
- Added changes to support Null Equality Joins;
- Created tests for it.

close apache#603
…ill JDBC

The Drill JDBC driver uses Optiq Avatica as its basis, but this dependency
moved over to Calcite quite some time ago without the Drill code being
updated for it.

This patch updates Avatica version to the version from Calcite
(1.4.0-drill-r19). It also refactors Drill JDBC driver to comply with the
packages and API changes in Avatica. Finally it fixes the SQL types for
lists and structs, since Drill doesn't support java.sql.Array and
java.sql.Struct interfaces.

this closes apache#395

Change-Id: Ia608adf900e8708d9e6f6f58ed41e104321a9914
Support loading the Drill driver using ServiceLoader. From the user's perspective,
it means being able to use the driver without first registering it via
Class.forName("org.apache.drill.jdbc.Driver").
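A minimal sketch of what JDBC 4.0 ServiceLoader-based discovery looks like from the caller's side (no Drill-specific code; any driver JAR that ships a META-INF/services/java.sql.Driver entry is discovered the same way):

```java
import java.sql.Driver;
import java.util.ServiceLoader;

public class DriverDiscovery {
    public static void main(String[] args) {
        // DriverManager performs this discovery internally, which is why an
        // explicit Class.forName("org.apache.drill.jdbc.Driver") call is no
        // longer required once the driver JAR is on the classpath.
        for (Driver d : ServiceLoader.load(Driver.class)) {
            System.out.println("found driver: " + d.getClass().getName());
        }
    }
}
```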

this closes apache#596

Change-Id: Id26922ee42bef5fbce46ac2bcbb83f1859e9bb48
Change MetadataProvider to return metadata results ordered (following
convention used by ODBC and JDBC specs).

this closes apache#614

Change-Id: Iff59b7fada7040602f1735bccc13bc6bf5c9a252
- Adding tableType filter to GetTablesReq query (needed for JDBC and ODBC
drivers).
- Fix table type returned by sys and INFORMATION_SCHEMA tables
- Also fixes some protobuf typos to related classes.

this closes apache#612

Change-Id: If95246a312f6c6d64a88872936f516308874c2d2
…umber even during the unit tests.

This is now a build-time generated class, rather than one that looks on the
classpath for META-INF files.

This pattern for file generation with parameters passed from the POM files
was borrowed from parquet-mr.
Drill was writing non-standard dates into parquet files for all releases
before 1.9.0. The values have been read correctly by Drill, but
external tools like Spark reading the files will see corrupted values for
all dates that have been written by Drill.

This change corrects the behavior of the Drill parquet writer to correctly
store dates in the format given in the parquet specification.

To maintain compatibility with old files, the parquet reader code has
been updated to check for the old format and automatically shift the
corrupted values into corrected ones.

The test cases included here should ensure that all files produced by
historical versions of Drill will continue to return the same values they
had in previous releases. For compatibility with external tools, any old
files with corrupted dates can be re-written using the CREATE TABLE AS
command (as the writer will now only produce specification-compliant
values, even after reading from older corrupt files).

While the old behavior was a consistent shift into a range unlikely
to be used in a modern database (over 10,000 years in the future), these are still
valid date values. In the case where these may have been written into
files intentionally, and we cannot be certain from the metadata if Drill
produced the files, an option is included to turn off the auto-correction.
Use of this option is assumed to be extremely unlikely, but it is included
for completeness.
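The arithmetic behind the auto-correction can be sketched as follows. The shift constant is twice the Julian day number of the Unix epoch (the writer added the epoch offset where it should have subtracted it); the detection cutoff below is illustrative, not Drill's exact threshold.

```java
public class ParquetDateCorrection {
    // Julian day number of the Unix epoch (1970-01-01).
    static final int JULIAN_EPOCH_DAY = 2_440_588;
    // Corrupt dates are shifted forward by twice the epoch's Julian day,
    // i.e. by about 13,000 years -- the "over 10,000 years" range above.
    static final int CORRUPT_SHIFT = 2 * JULIAN_EPOCH_DAY;

    /**
     * If the stored days-since-epoch value lies in the implausible
     * far-future range, undo the shift; otherwise leave it unchanged.
     */
    static int correct(int storedDaysSinceEpoch) {
        return storedDaysSinceEpoch >= CORRUPT_SHIFT
            ? storedDaysSinceEpoch - CORRUPT_SHIFT
            : storedDaysSinceEpoch;
    }

    public static void main(String[] args) {
        int corrupt = 10 + CORRUPT_SHIFT;      // a corrupted "1970-01-11"
        System.out.println(correct(corrupt));  // prints 10
        System.out.println(correct(10));       // already valid: unchanged
    }
}
```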

This patch was originally written against version 1.5.0, when rebasing
the corruption threshold was updated to 1.9.0.

Added regenerated binary files, updated metadata cache files accordingly.

One small fix in the ParquetGroupScan to accommodate changes in master that changed
when metadata is read.

Tests for bugs revealed by the regression suite.

Fix drill version number in metadata file generation
…eld in the parquet meta info "is.date.correct = true";
- Removed unnecessary double conversion of the value with the Julian day;
- Added the ability to correct corrupted dates for parquet files with the second version of the old metadata cache file as well.

This closes apache#595
1) Configuration / parsing / options / protos
2) Zookeeper integration
3) Registration / unregistration / lazy-init
4) Unit tests

This closes apache#574
Currently, no name or version is exchanged between client and server over the User RPC
channel.

On the client side, having access to the server name and version is useful to expose it
to the user (through a JDBC or ODBC API like DatabaseMetaData#getDatabaseProductVersion()),
or to implement a fallback strategy when some recent APIs are not available (like the
metadata API).

On the server side, having access to the client version might be useful for audit
purposes and eventually to implement a fallback strategy if it doesn't require an RPC
version change.

this closes apache#622
Paul Rogers and others added 28 commits May 5, 2017 15:43
Unit testing revealed that the VectorAccessorSerializable class claims
to serialize SV2s, but, in fact, does not. Actually, it writes them,
but does not read them, resulting in corrupted data on read.

Fortunately, no code appears to serialize sv2s at present. Still, it is
a bug and needs to be fixed.

First task is to add serialization code for the sv2.

That revealed that the recently-added code to save DrillBufs using a
shared buffer had a bug: it relied on the writer index to know how much
data is in the buffer. Turns out sv2 buffers don’t set this index. So,
a new version of the write function takes a write length.

Then, closer inspection of the read code revealed duplicated code. So,
DrillBuf allocation moved into a version of the read function that now
does reading and DrillBuf allocation.

Turns out that value vectors, but not SV2s, can be built from a
Drillbuf. Added a matching constructor to the SV2 class.

Finally, cleaned up the code a bit to make it easier to follow. Also
allowed test code to access the handy timer already present in the code.
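The write-length change described above can be sketched with a plain byte[] standing in for DrillBuf (the names here are illustrative, not Drill's actual API):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class BufferWriteSketch {
    /**
     * Instead of trusting a writer index that SV2 buffers never set, the
     * caller passes the number of valid bytes to write explicitly.
     */
    static void writeBuffer(ByteArrayOutputStream out, byte[] buf, int writeLength)
            throws IOException {
        out.write(buf, 0, writeLength);    // write exactly writeLength bytes
    }

    public static void main(String[] args) throws IOException {
        // An 8-byte buffer holding only 3 valid two-byte SV2 entries.
        byte[] sv2 = new byte[] {1, 0, 2, 0, 3, 0, 0, 0};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBuffer(out, sv2, 6);          // serialize only the valid 6 bytes
        System.out.println(out.size());    // prints 6
    }
}
```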

closes apache#800
…atch

Unit tests showed that the “priority queue copier” does not handle an
empty batch. This has not been an issue because code elsewhere in the
sort specifically works around this issue. This fix resolves the issue
at the source to avoid the need for future work-arounds.

closes apache#778
Refactors ScanBatch to allow unit testing of record reader
implementations, especially the “writer” classes.

See JIRA for details.

closes apache#811
… functions

1. Revisited calculation logic for string literals and some string functions
(cast, upper, lower, initcap, reverse, concat, concat operator, rpad, lpad, case statement,
 coalesce, first_value, last_value, lag, lead).
Synchronized return type length calculation logic between limit 0 and regular queries.

2. Deprecated width and changed it to precision for string types in MajorType.

3. Revisited FunctionScope and split it into FunctionScope and ReturnType.
FunctionScope now indicates only function usage in terms of the number of in/out rows (n -> 1, 1 -> 1, 1 -> n).
A new UDF annotation, ReturnType, indicates which return type strategy should be used.

4. Changed MAX_VARCHAR_LENGTH from 65536 to 65535.

5. Updated calculation of precision and display size for INTERVALYEAR & INTERVALDAY.

6. Refactored part of function code-gen logic (ValueReference, WorkspaceReference, FunctionAttributes, DrillFuncHolder).

This closes apache#819
…nd reuse table and tabletInfo per query instead of fetching them multiple times. Compute rowCount from tabletInfo instead of expensive tableStats call.

This closes apache#817
…ble
- A small refactoring of the original fix for this issue (DRILL-4039);
- Added a test for the fix.
If the Hive server restarts, Drill either hangs or continually reports
errors when retrieving schemas. The problem is that the Hive plugin
tries to handle connection failures, but does not do so correctly for
the secure connection case. The problem is complex, see DRILL-5496 for
details.

This is a workaround: we discard the entire Hive schema cache when we
encounter an unhandled connection exception, then we rebuild a new one.

This is not a proper fix; for that we'd have to restructure the code.

This will, however, solve the immediate problem until we do the needed
restructuring.
See DRILL-5498 for details.

Replaced the repeated varchar reader for reading columns with a purpose-built
column parser. Implemented rules to recover from invalid column
headers.

Added missing test method

Changes re code review comments

Back out testing-only change

close apache#830
    NOTE: This pull request provides support for on-wire encryption using the SASL framework. The communication channels covered are:
    1) Between the Drill JDBC client and a Drillbit.
    2) Between Drillbits, i.e. the control/data channels.
    3) A UI change shows on which network channels encryption is enabled and the number of encrypted/unencrypted connections for
       user/control/data connections.

close apache#773
NOTE: This pull request provides support for on-wire encryption using SASL framework. Communication channel covered is:
      1) C++ Drill Client and Drillbit channel.

close apache#809
…thod

Changes:
1. Fixed DCL in the FunctionInitializer.checkInit() method (the initialization flag is now updated only after the function body is loaded).
2. Fixed ImportGrabber.getImports() method to return the list with imports.
3. Added unit tests for FunctionInitializer.
4. Minor refactoring (renamed methods, added javadoc).
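A minimal sketch of the corrected double-checked locking (DCL) pattern, with illustrative names (the real FunctionInitializer differs in detail): the flag must be volatile, and it must be set only after the function body is loaded.

```java
public class FunctionInitializerSketch {
    // volatile is required for safe DCL: without it, another thread could
    // observe 'initialized' as true before the body's write is visible.
    private volatile boolean initialized;
    private String functionBody;

    String getFunctionBody() {
        if (!initialized) {                  // first (unsynchronized) check
            synchronized (this) {
                if (!initialized) {          // second check, under the lock
                    functionBody = loadBody();
                    initialized = true;      // set only AFTER loading (the fix)
                }
            }
        }
        return functionBody;
    }

    private String loadBody() {
        return "parsed function body";       // stands in for real parsing
    }

    public static void main(String[] args) {
        System.out.println(new FunctionInitializerSketch().getFunctionBody());
    }
}
```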

closes apache#843
Standardizes error handling to throw a UserException. Prior code threw
various exceptions, called the fail() method, or returned a variety of
status codes.

closes apache#838
Validates offset vectors in VarChar and repeated vectors. Validates the
special case of repeated VarChar vectors (two layers of offsets).

Provides two new session variables to turn on validation. One enables
the existing operator (iterator) validation, the other adds vector
validation. This allows validation to occur in a “production” Drill
(without restarting Drill with assertions, as previously required).

Unit tests validate the validator. Another test validates the
integration, but requires manual steps, so is ignored by default.

This version is first-cut: all work is done within a single class, which
allows back-porting to an earlier version to solve a specific issue. A
revision should move some of the work into generated code (or refactor
vectors to allow outside access), since offset vectors appear for each
subclass, not on a base class that would allow generic operations.

* Added boot-time options to allow enabling vector validation in Maven
unit tests.
* Code cleanup per suggestions.
* Additional (manual) tests for boot-time options and default options.
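The offset-vector invariant being validated can be sketched over a plain int[] (Drill's validator works on the actual vector classes; this is illustrative): offsets start at 0, never decrease, and the last offset must not exceed the data buffer's length, so entry i spans [offsets[i], offsets[i+1]) in the data.

```java
public class OffsetVectorValidator {
    /** Returns true if the offsets satisfy the VarChar-vector invariants. */
    static boolean isValid(int[] offsets, int dataLength) {
        if (offsets.length == 0 || offsets[0] != 0) {
            return false;                    // first offset must be zero
        }
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                return false;                // offsets must be non-decreasing
            }
        }
        return offsets[offsets.length - 1] <= dataLength;
    }

    public static void main(String[] args) {
        // Three entries: "abc", "" (empty), "defg" in a 7-byte data buffer.
        System.out.println(isValid(new int[] {0, 3, 3, 7}, 7));  // true
        System.out.println(isValid(new int[] {0, 5, 2}, 7));     // false
    }
}
```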

closes apache#832
1. Introduced an InMemoryStoreProvider with the ability to maintain a max capacity
2. DrillbitContext now explicitly has a profileStoreProvider that, by default, re-uses the general PersistentStoreProvider, unless it is InMemory, in which case #1 is used.
3. Cleanly separated out QueryProfileStoreContext
4. Converted literal values to constants within ExecConstants
5. Updated drill-module.conf for default capacity

closes apache#834
Provide an option to specify the blocksize during file creation.
This will help create parquet files with a single block on HDFS, improving performance when we read those files.

See DRILL-5379 for details.

closes apache#826
The Parquet reader is Drill's premier data source and has worked very well
for many years. As with any piece of code, it has grown in complexity over
that time and has become hard to understand and maintain.

In work in another project, we found that Parquet is accidentally creating
"low density" batches: record batches with little actual data compared to
the amount of memory allocated. We'd like to fix that.

However, the current complexity of the reader code creates a barrier to
making improvements: the code is so complex that it is often better to
leave bugs unfixed, or risk spending large amounts of time struggling to
make small changes.

This commit offers to help revitalize the Parquet reader. Functionality is
identical to the code in master; but code has been pulled apart into
various classes each of which focuses on one part of the task: building
up a schema, keeping track of read state, a strategy for reading various
combinations of records, etc. The idea is that it is easier to understand
several small, focused classes than one huge, complex class. Indeed, the
idea of small, focused classes is common in the industry; it is nothing new.

Unit tests pass with the change. Since no logic has changed and we only moved
lines of code, that is a good indication that everything still works.

Also includes fixes based on review comments.

closes apache#789
…rd batch has large number of fields.

- Changed estimation of max index value and added comments.

close apache#818
1. Added WebUserConnection/AnonWebUserConnection and their providers for authenticated and anonymous web users.
2. Updated to store the UserSession, BufferAllocator and other session state inside Jetty's HttpSession instead
	of storing it in DrillUserPrincipal. A new instance of WebUserConnection is now created for each request. However,
	for authenticated users the UserSession and other state is re-used, whereas for anonymous users it is created
	for each request and recycled after query execution.

close apache#829
- Since the parquet version of PageWriter doesn't allow using direct memory for allocating ByteBuffers,
  this PR introduces another version of PageWriter and PageWriteStore. See more: https://issues.apache.org/jira/browse/PARQUET-1006
@vdiravka vdiravka merged commit 4b18c31 into DRILL-test Jun 7, 2018
vdiravka pushed a commit that referenced this pull request Jan 16, 2019
Currently, the WebServer needs to process the entire result set and stream it back to the WebClient.
Since the WebUI paginates results, we can load a larger set for pagination on the browser client and relieve pressure on the WebServer, which otherwise hosts all the data (most of which will never be streamed to the browser).
E.g. fetching all rows from a 1-billion-record table is impractical and can be capped at (say) 1K. Currently, the user has to explicitly specify LIMIT in the submitted query.
An option is provided in the query field to allow for this entry, and it can be set to selected by default for the Web UI.
The submitted query indicates that an auto-limiting wrapper was applied.
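Conceptually, the wrapper rewrites the submitted query with an outer LIMIT; the sketch below is illustrative (names and the exact rewrite Drill applies are assumptions, not its real implementation):

```java
public class AutoLimitWrapper {
    /** Wraps a user query in an outer LIMIT so at most maxRows reach the UI. */
    static String wrap(String query, int maxRows) {
        // The subquery keeps the original statement intact; only the outer
        // SELECT applies the row cap.
        return "SELECT * FROM (" + query + ") LIMIT " + maxRows;
    }

    public static void main(String[] args) {
        System.out.println(wrap("SELECT * FROM cp.`employee.json`", 1000));
    }
}
```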
[Update #1] Updated as per comments
1. Limit Wrapping Unchecked by default
2. Full List configuration of results
[Update #2] Minor update
[Update #3] Followup
closes apache#1593