jj #3

tooptoop4 · 2020-05-23T11:11:43Z

No description provided.

The visitor always returned `null` result, and using `Node` as result type was misleading.

Default implementation provided by `DefaultTraversalVisitor` is suitable for `Void` result. Moreover, `DefaultTraversalVisitor` is useful base class for when traversing a tree (a `Void` result), not when transforming it (non-`Void` result).

The decision of how unnest expressions map to output fields is now determined by the analyzer. Doing this in the planner duplicates effort and is brittle.

This simplifies the join operator:

1. The probe is spilled in just one place. Previously, the probe was spilled when a page was added or output was requested.
2. Join operator state is now split between two internal WorkProcessors, which a) join the probe and b) manage final unspilling.
3. Operator state is not managed across multiple classic in/out operator methods.

It's the caller's responsibility to choose when to append the outer row.

Stress testing showed that Rubix caching is not stable with parallel warmup enabled. Temporarily disabling it by default.

In our testing, a cloud network proved unreliable. We observed data corruption when transmitting data over TCP between Presto nodes (internal communication unsecured, no compression). Verify data integrity to prevent incorrect query results. Optionally retry when data corruption is detected.
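A minimal sketch of the verify-and-retry idea described above, not the actual Presto implementation. The commit message does not say which checksum is used; CRC32 from the JDK is an assumption here, and the class and method names (`PageChecksumSketch`, `frame`, `verify`, `fetchWithRetry`) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.function.Supplier;
import java.util.zip.CRC32;

final class PageChecksumSketch {
    // Sender side: prefix the payload with its CRC32 (assumed checksum algorithm).
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(Long.BYTES + payload.length)
                .putLong(checksum(payload))
                .put(payload)
                .array();
    }

    // Receiver side: recompute the checksum and reject the data on mismatch.
    static byte[] verify(byte[] framed) {
        ByteBuffer buffer = ByteBuffer.wrap(framed);
        long expected = buffer.getLong();
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        if (checksum(payload) != expected) {
            throw new IllegalStateException("Checksum mismatch: data corrupted in transit");
        }
        return payload;
    }

    // Optional retry: fetch again when corruption is detected, up to maxAttempts times.
    static byte[] fetchWithRetry(Supplier<byte[]> fetch, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IllegalStateException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return verify(fetch.get());
            }
            catch (IllegalStateException e) {
                last = e;
            }
        }
        throw last;
    }

    private static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }
}
```

Failing the read instead of returning the corrupted bytes is the point of the design: a rejected transfer can be retried, while a silently corrupted page would produce incorrect query results.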

Internal Rubix configuration is modified by Rubix (BookKeeper and LocalDataTransferServer). If the same cache key is used, the FileSystem for such a configuration can be cached by PrestoFileSystemCache, causing the cache to be disabled.

Co-authored-by: qqibrow <qqibrow@gmail.com>
Co-authored-by: Zhenxiao Luo <luoz@uber.com>

This allows running queries over the results of a raw Elasticsearch query. It extends the syntax of the enhanced ES table names with the following:

    SELECT * FROM es.default."<index>$query:<base32-encoded ES query>"

The query is base32-encoded to avoid having to deal with escaping quotes and case sensitivity issues in table identifiers. The result of these query tables is a table with a single row and a single column named "result" of type JSON.
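As a rough illustration of how such a table name could be built on the client side, the sketch below base32-encodes a query string and appends it to the index name. The JDK has no built-in base32 encoder, so a small RFC 4648 encoder is written out by hand. Whether the connector expects `=` padding or a particular letter case is an assumption, and `EsQueryTableName`, `base32`, and `tableName` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class EsQueryTableName {
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // RFC 4648 base32 with '=' padding (padding is an assumption about the expected format).
    static String base32(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0; // holds the bits not yet emitted
        int bits = 0;   // number of valid low-order bits in buffer
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                out.append(ALPHABET.charAt((buffer >> (bits - 5)) & 31));
                bits -= 5;
            }
            buffer &= (1 << bits) - 1; // keep only the leftover bits
        }
        if (bits > 0) {
            // Pad the final group with zero bits on the right.
            out.append(ALPHABET.charAt((buffer << (5 - bits)) & 31));
        }
        while (out.length() % 8 != 0) {
            out.append('=');
        }
        return out.toString();
    }

    // Builds the "<index>$query:<base32-encoded ES query>" identifier.
    static String tableName(String index, String esQuery) {
        return index + "$query:" + base32(esQuery.getBytes(StandardCharsets.UTF_8));
    }
}
```

A call such as `EsQueryTableName.tableName("orders", "{\"query\": {\"match_all\": {}}}")` yields the identifier to place inside the double quotes of the `SELECT` statement; the index name `orders` is only an example.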

We can directly use the stream method in any collection.

In real deployments, Rubix on the coordinator is not part of the caching node pool. This pollutes the coordinator log with Rubix error messages whenever the coordinator reads a file (e.g. an ACID version file). This commit disables caching on the coordinator by default. Caching on the coordinator can still be enabled via a feature toggle for development purposes.

Presto's JSON type imposes additional constraints that are not desirable for this use case: JSON values must be equatable and orderable. This requires parsing and re-organizing the document to canonicalize field ordering. Using VARCHAR will also play better with the SQL 2016 JSON features, which operate on binary or string data directly.

Additionally:
- remove spfileXE.ora and set parameters programmatically
- remove redundant "super." in TestingOracleServer

findepi and others added 30 commits April 27, 2020 21:12

Move non-contextual state from Context to field

e094af1

Fix incorrect visitor result type

2b02894

The visitor always returned `null` result, and using `Node` as result type was misleading.

Enforce DefaultTraversalVisitor Void result

1ce21bc

Default implementation provided by `DefaultTraversalVisitor` is suitable for `Void` result. Moreover, `DefaultTraversalVisitor` is useful base class for when traversing a tree (a `Void` result), not when transforming it (non-`Void` result).
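The distinction between traversing and transforming can be sketched with a toy AST. Everything below (`TraversalSketch`, `Node`, `Identifier`, `Call`, `TraversalVisitor`) is hypothetical and much simpler than Presto's real classes; it only illustrates why a traversal base class fixes the result type to `Void` and passes results through the context instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TraversalSketch {
    // Hypothetical mini AST; these are not Presto's real node classes.
    interface Node {
        List<Node> children();
    }

    static final class Identifier implements Node {
        final String name;

        Identifier(String name) {
            this.name = name;
        }

        public List<Node> children() {
            return Collections.emptyList();
        }
    }

    static final class Call implements Node {
        final List<Node> arguments;

        Call(Node... arguments) {
            this.arguments = Arrays.asList(arguments);
        }

        public List<Node> children() {
            return arguments;
        }
    }

    // Traversal base class: the result type is fixed to Void, so subclasses
    // report findings through the context object rather than return values.
    static class TraversalVisitor<C> {
        Void process(Node node, C context) {
            if (node instanceof Identifier) {
                return visitIdentifier((Identifier) node, context);
            }
            return visitChildren(node, context);
        }

        Void visitIdentifier(Identifier node, C context) {
            return visitChildren(node, context);
        }

        Void visitChildren(Node node, C context) {
            for (Node child : node.children()) {
                process(child, context);
            }
            return null; // the only possible Void value
        }
    }

    // Example traversal: collect identifier names in visiting order.
    static List<String> collectIdentifiers(Node root) {
        List<String> names = new ArrayList<>();
        new TraversalVisitor<List<String>>() {
            @Override
            Void visitIdentifier(Identifier node, List<String> context) {
                context.add(node.name);
                return null;
            }
        }.process(root, names);
        return names;
    }
}
```

With a generic result type, the default child-visiting method would have no meaningful value to return, which is how a visitor ends up always returning `null` under a misleading result type.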

Use Zulu JDK 11 to run product tests

aa89cb6

Remove fallback for Java 8

ea8f7bd

Properly initialize HDFS configuration in RubixInitializer

a4ed120

Rename variable

573dc38

Expose JdbcClient invocation statistics in JMX

7d211ef

Remove "Java 8" phrase from "stream API" comments

dc78d33

Update server RPM for Java 11

44be687

Remove documentation references to Java 8

9432607

Add random suffix to all tables in TestDistributedQueries

5bee687

Clean up planning of unnest

69f8c05

The decision of how unnest expressions map to output fields is now determined by the analyzer. Doing this in the planner duplicates effort and is brittle.

Fix javadoc

1ab3968

Remove unsound check

d5bcdf9

It's the caller's responsibility to choose when to append the outer row.

Add Like predicate to SHOW COLUMNS

a246feb

Make Hive caching and S3 security mapping mutually exclusive

bc35fdb

Make Hive caching and GCS access token mutually exclusive

6912112

Verify support for correlation during analysis

5972bd3

Check Limit with ties rewritten

c14d78f

Support order sensitivity in RowNumberNode

052214e

Allow reading ORC files which do not have row-group information

ed1d66c

Do not use Rubix parallel warmup by default

eafbdcd

Stress testing showed that Rubix caching is not stable with parallel warmup enabled. Temporarily disabling it by default.

Replace hive.cache.parallel-warmup-enabled with hive.cache.read-mode

c970ae0

Fix wrong return value in geospatial document

cfd3350

Add data mapping tests with padded chars to TestDistributedQueries

a380c5d

Verify number of deserialized pages

780e272

Skip whitespace padded char data mapping test for Kudu

b1a2969

sopel39 and others added 29 commits May 21, 2020 14:32

Make it explicit that none of the futures should fail

360ed2b

Static import MoreFutures#getFutureValue

07eefdf

Use dedicated cache key for internal Rubix configuration

ed96422

Internal Rubix configuration is modified by Rubix (BookKeeper and LocalDataTransferServer). If the same cache key is used, the FileSystem for such a configuration can be cached by PrestoFileSystemCache, causing the cache to be disabled.

Test Hive caching with two tables created in sequence

0680bef

Remove unused method

7be38e6

Fix error message in WindowNode.Function constructor

7537ef0

Support multiple unnest outputs in unnest matcher

f00fae5

Pushdown dereference expressions in the query plan

69ef682

Co-authored-by: qqibrow <qqibrow@gmail.com>
Co-authored-by: Zhenxiao Luo <luoz@uber.com>

Plan assertions for end to end dereference pushdown in hive

5dedde4

Eliminate deprecated stream method usage

e471fed

We can directly use the stream method in any collection.
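For illustration, the direct form looks like the sketch below. The commit message does not name the deprecated helper that was removed, so only the replacement call is shown; `CollectionStreamSketch` and `upperCase` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

final class CollectionStreamSketch {
    static List<String> upperCase(List<String> values) {
        // Collection#stream() is available on every java.util.Collection,
        // so no wrapper utility is needed to obtain a Stream.
        return values.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```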

Fix Launcher local startup failure

75a539d

Use MBean to obtain caching stats in tests

532bdba

Use assertEventually in Rubix tests

dc2c82e

Add Rubix read mode tests

73b93c2

Use generated table data in Rubix product test

f921ef0

Use Rubix async read mode by default

9c8c815

Add Java requirement to JDBC driver documentation

fa92f95

Add ISO date and time examples

1d9c37f

Fix documentation beginQuery and cleanupQuery

b4bd966

show join distribution type in webui live plan

c8f7a93

Make TestHiveCaching#testReadFromTable more resilient

eae0a0a

Add requirements, pspg and authentication for CLI

a2b0697

Add software requirements

5969330

Detail Hive connector requirements

d91c816

Increase process numbers to 1000 from 500 in Oracle test

3e0574c

Additionally:
- remove spfileXE.ora and set parameters programmatically
- remove redundant "super." in TestingOracleServer

Remove unused code in TypeSignatureParser

a9f2b2c

tooptoop4 merged commit 9f1f415 into tooptoop4:master May 23, 2020
