Commit e57e14e

Author: QP Hou
update datafusion website (#162)
1 parent 20861e5 commit e57e14e

17 files changed, +787 -68 lines changed

datafusion/_modules/datafusion.html

Lines changed: 546 additions & 0 deletions
Large diffs are not rendered by default.

datafusion/_modules/index.html

Lines changed: 3 additions & 2 deletions
@@ -397,8 +397,9 @@

 <h1>All modules for which code is available</h1>
 <ul><li><a href="builtins.html">builtins</a></li>
-<li><a href="datafusion/functions.html">datafusion.functions</a></li>
-<li><a href="functions.html">functions</a></li>
+<li><a href="datafusion.html">datafusion</a></li>
+<ul><li><a href="datafusion/functions.html">datafusion.functions</a></li>
+</ul><li><a href="functions.html">functions</a></li>
 </ul>

 </div>

datafusion/_sources/cli/index.rst.txt

Lines changed: 18 additions & 0 deletions
@@ -23,6 +23,24 @@ The Arrow DataFusion CLI is a command-line interactive SQL utility that allows
 queries to be executed against CSV and Parquet files. It is a convenient way to
 try DataFusion out with your own data sources.

+Install and run using Homebrew (on MacOS)
+=========================================
+
+The easiest way to give DataFusion CLI a spin is via Homebrew (on MacOS). Install it as any other pre-built software like this:
+
+.. code-block:: bash
+
+    brew install datafusion
+    # ==> Downloading https://ghcr.io/v2/homebrew/core/datafusion/manifests/5.0.0
+    # ######################################################################## 100.0%
+    # ==> Downloading https://ghcr.io/v2/homebrew/core/datafusion/blobs/sha256:9ecc8a01be47ceb9a53b39976696afa87c0a8
+    # ==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:9ecc8a01be47ceb9a53b39976
+    # ######################################################################## 100.0%
+    # ==> Pouring datafusion--5.0.0.big_sur.bottle.tar.gz
+    # 🍺 /usr/local/Cellar/datafusion/5.0.0: 9 files, 17.4MB
+
+    datafusion-cli
+
 Run using Cargo
 ===============

datafusion/_sources/python/index.rst.txt

Lines changed: 6 additions & 7 deletions
@@ -39,11 +39,10 @@ Simple usage:
 .. code-block:: python

     import datafusion
+    from datafusion import functions as f
+    from datafusion import col
     import pyarrow

-    # an alias
-    f = datafusion.functions
-
     # create a context
     ctx = datafusion.ExecutionContext()

@@ -56,8 +55,8 @@ Simple usage:

     # create a new statement
     df = df.select(
-        f.col("a") + f.col("b"),
-        f.col("a") - f.col("b"),
+        col("a") + col("b"),
+        col("a") - col("b"),
     )

     # execute and collect the first (and only) batch
@@ -77,7 +76,7 @@ UDFs

     udf = f.udf(is_null, [pyarrow.int64()], pyarrow.bool_())

-    df = df.select(udf(f.col("a")))
+    df = df.select(udf(col("a")))


 UDAF
@@ -117,7 +116,7 @@ UDAF

     df = df.aggregate(
         [],
-        [udaf(f.col("a"))]
+        [udaf(col("a"))]
     )
123122

datafusion/_sources/specification/roadmap.md.txt

Lines changed: 24 additions & 3 deletions
@@ -61,6 +61,7 @@ to provide:
 - Additional constant folding / partial evaluation [#1070](https://github.com/apache/arrow-datafusion/issues/1070)
 - More sophisticated cost based optimizer for join ordering
 - Implement advanced query optimization framework (Tokomak) #440
+- Finer optimizations for group by and aggregate functions

 ## Datasources

@@ -92,8 +93,28 @@ Note: There are some additional thoughts on a datafusion-cli vision on [#1096](h
 - publishing to apt, brew, and possible NuGet registry so that people can use it more easily
 - adopt a shorter name, like dfcli?

-## Ballista
+# Ballista

-# Vision
+Ballista is a distributed compute platform based on Apache Arrow and DataFusion. It provides a query scheduler that
+breaks a physical plan into stages and tasks and then schedules tasks for execution across the available executors
+in the cluster.

-TBD
+Having Ballista as part of the DataFusion codebase helps ensure that DataFusion remains suitable for distributed
+compute. For example, it helps ensure that physical query plans can be serialized to protobuf format and that they
+remain language-agnostic so that executors can be built in languages other than Rust.
+
+## Ballista Roadmap
+
+## Move query scheduler into DataFusion
+
+The Ballista scheduler has some advantages over DataFusion query execution because it doesn't try to eagerly execute
+the entire query at once but breaks it down into a directionally-acyclic graph (DAG) of stages and executes a
+configurable number of stages and tasks concurrently. It should be possible to push some of this logic down to
+DataFusion so that the same scheduler can be used to scale across cores in-process and across nodes in a cluster.
+
+## Implement execution-time cost-based optimizations based on statistics
+
+After the execution of a query stage, accurate statistics are available for the resulting data. These statistics
+could be leveraged by the scheduler to optimize the query during execution. For example, when performing a hash join
+it is desirable to load the smaller side of the join into memory and in some cases we cannot predict which side will
+be smaller until execution time.
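
Note on the "Move query scheduler into DataFusion" roadmap item in the hunk above: the idea of running a graph of stages with a configurable concurrency limit can be illustrated with a minimal Python sketch. This is not Ballista's scheduler (which is written in Rust) and mirrors none of its API; `run_stages`, the `stages`/`deps` dictionaries, and `max_concurrent` are hypothetical names chosen for illustration, and the graph is assumed to be a directed acyclic graph.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_stages(stages, deps, max_concurrent=2):
    """Run each stage once all of its dependencies have finished.

    stages: dict mapping a stage name to a zero-argument callable (hypothetical).
    deps:   dict mapping a stage name to the set of stages it depends on.
    Returns the stage names in the order they completed.
    """
    remaining = {name: set(deps.get(name, ())) for name in stages}
    finished = []
    running = {}  # future -> stage name
    # The pool size caps how many stages execute at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        while remaining or running:
            # Submit every stage whose dependencies are all satisfied.
            ready = [name for name, pending in remaining.items() if not pending]
            for name in ready:
                del remaining[name]
                running[pool.submit(stages[name])] = name
            if not running:
                raise ValueError("cycle or missing dependency in stage graph")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any exception from the stage
                finished.append(name)
                for pending in remaining.values():
                    pending.discard(name)
    return finished


if __name__ == "__main__":
    order = run_stages(
        {name: (lambda: None) for name in ("scan_a", "scan_b", "join", "agg")},
        {"join": {"scan_a", "scan_b"}, "agg": {"join"}},
    )
    print(order)
```

The two scan stages have no dependencies, so they may complete in either order; the join stage only starts after both, and the aggregate stage after the join.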

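Note on the "execution-time cost-based optimizations" roadmap item in the same hunk: the hash-join example it gives (load the smaller side into memory once its size is actually known) can be sketched as follows. This is an illustrative assumption, not DataFusion or Ballista code: rows are plain Python dictionaries, the row count stands in for the statistics available after a stage has run, and `choose_build_side` and `hash_join` are hypothetical helpers.

```python
def choose_build_side(left_rows, right_rows):
    """Pick the side to load into the in-memory hash table.

    The row counts play the role of post-execution statistics (an assumption
    of this sketch); ties go to the left side.
    """
    return "left" if len(left_rows) <= len(right_rows) else "right"


def hash_join(left_rows, right_rows, key):
    """Inner equi-join on `key`, building the hash table on the smaller input."""
    build_is_left = choose_build_side(left_rows, right_rows) == "left"
    build, probe = (left_rows, right_rows) if build_is_left else (right_rows, left_rows)

    # Build phase: index the smaller input by join key.
    table = {}
    for row in build:
        table.setdefault(row[key], []).append(row)

    # Probe phase: stream the larger input and emit one row per match.
    joined = []
    for row in probe:
        for match in table.get(row[key], []):
            left, right = (match, row) if build_is_left else (row, match)
            joined.append({**left, **right})
    return joined


if __name__ == "__main__":
    small = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
    large = [{"id": 1, "b": 10}, {"id": 1, "b": 11}, {"id": 3, "b": 12}]
    print(choose_build_side(small, large))
    print(hash_join(small, large, "id"))
```

Because the build side is chosen from measured sizes rather than estimates, swapping the argument order changes which input is hashed but not which rows are joined.
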
datafusion/cli/index.html

Lines changed: 21 additions & 0 deletions
@@ -395,6 +395,11 @@

 <nav id="bd-toc-nav">
 <ul class="visible nav section-nav flex-column">
+<li class="toc-h2 nav-item toc-entry">
+<a class="reference internal nav-link" href="#install-and-run-using-homebrew-on-macos">
+Install and run using Homebrew (on MacOS)
+</a>
+</li>
 <li class="toc-h2 nav-item toc-entry">
 <a class="reference internal nav-link" href="#run-using-cargo">
 Run using Cargo
@@ -453,6 +458,22 @@ <h1>DataFusion Command-line<a class="headerlink" href="#datafusion-command-line"
 <p>The Arrow DataFusion CLI is a command-line interactive SQL utility that allows
 queries to be executed against CSV and Parquet files. It is a convenient way to
 try DataFusion out with your own data sources.</p>
+<div class="section" id="install-and-run-using-homebrew-on-macos">
+<h2>Install and run using Homebrew (on MacOS)<a class="headerlink" href="#install-and-run-using-homebrew-on-macos" title="Permalink to this headline"></a></h2>
+<p>The easiest way to give DataFusion CLI a spin is via Homebrew (on MacOS). Install it as any other pre-built software like this:</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>brew install datafusion
+<span class="c1"># ==&gt; Downloading https://ghcr.io/v2/homebrew/core/datafusion/manifests/5.0.0</span>
+<span class="c1"># ######################################################################## 100.0%</span>
+<span class="c1"># ==&gt; Downloading https://ghcr.io/v2/homebrew/core/datafusion/blobs/sha256:9ecc8a01be47ceb9a53b39976696afa87c0a8</span>
+<span class="c1"># ==&gt; Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:9ecc8a01be47ceb9a53b39976</span>
+<span class="c1"># ######################################################################## 100.0%</span>
+<span class="c1"># ==&gt; Pouring datafusion--5.0.0.big_sur.bottle.tar.gz</span>
+<span class="c1"># 🍺 /usr/local/Cellar/datafusion/5.0.0: 9 files, 17.4MB</span>
+
+datafusion-cli
+</pre></div>
+</div>
+</div>
 <div class="section" id="run-using-cargo">
 <h2>Run using Cargo<a class="headerlink" href="#run-using-cargo" title="Permalink to this headline"></a></h2>
 <p>Use the following commands to clone this repository and run the CLI. This will require the Rust toolchain to be installed. Rust can be installed from <a class="reference external" href="https://rustup.rs/">https://rustup.rs</a>.</p>

datafusion/genindex.html

Lines changed: 24 additions & 8 deletions
@@ -477,18 +477,24 @@ <h2 id="B">B</h2>
 <h2 id="C">C</h2>
 <table style="width: 100%" class="indextable genindextable"><tr>
 <td style="width: 33%; vertical-align: top;"><ul>
+<li><a href="python/generated/datafusion.Expression.html#datafusion.Expression.cast">cast() (datafusion.Expression method)</a>
+</li>
+<li><a href="python/generated/datafusion.ExecutionContext.html#datafusion.ExecutionContext.catalog">catalog() (datafusion.ExecutionContext method)</a>
+</li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.ceil">ceil() (in module datafusion.functions)</a>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.character_length">character_length() (in module datafusion.functions)</a>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.chr">chr() (in module datafusion.functions)</a>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.col">col() (in module datafusion.functions)</a>
-</li>
-<li><a href="python/generated/datafusion.DataFrame.html#datafusion.DataFrame.collect">collect() (datafusion.DataFrame method)</a>
 </li>
 </ul></td>
 <td style="width: 33%; vertical-align: top;"><ul>
+<li><a href="python/generated/datafusion.DataFrame.html#datafusion.DataFrame.collect">collect() (datafusion.DataFrame method)</a>
+</li>
+<li><a href="python/generated/datafusion.Expression.html#datafusion.Expression.column">column() (datafusion.Expression static method)</a>
+</li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.concat">concat() (in module datafusion.functions)</a>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.concat_ws">concat_ws() (in module datafusion.functions)</a>
@@ -519,6 +525,8 @@ <h2 id="D">D</h2>
 <h2 id="E">E</h2>
 <table style="width: 100%" class="indextable genindextable"><tr>
 <td style="width: 33%; vertical-align: top;"><ul>
+<li><a href="python/generated/datafusion.ExecutionContext.html#datafusion.ExecutionContext.empty_table">empty_table() (datafusion.ExecutionContext method)</a>
+</li>
 <li><a href="python/generated/datafusion.ExecutionContext.html#datafusion.ExecutionContext">ExecutionContext (class in datafusion)</a>
 </li>
 </ul></td>
@@ -547,11 +555,13 @@ <h2 id="I">I</h2>
 <td style="width: 33%; vertical-align: top;"><ul>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.Volatility.immutable">immutable() (datafusion.functions.Volatility static method)</a>
 </li>
-</ul></td>
-<td style="width: 33%; vertical-align: top;"><ul>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.in_list">in_list() (in module datafusion.functions)</a>
 </li>
+</ul></td>
+<td style="width: 33%; vertical-align: top;"><ul>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.initcap">initcap() (in module datafusion.functions)</a>
+</li>
+<li><a href="python/generated/datafusion.Expression.html#datafusion.Expression.is_null">is_null() (datafusion.Expression method)</a>
 </li>
 </ul></td>
 </tr></table>
@@ -572,6 +582,8 @@ <h2 id="L">L</h2>
 <li><a href="python/generated/datafusion.DataFrame.html#datafusion.DataFrame.limit">limit() (datafusion.DataFrame method)</a>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.lit">lit() (in module datafusion.functions)</a>
+</li>
+<li><a href="python/generated/datafusion.Expression.html#datafusion.Expression.literal">literal() (datafusion.Expression static method)</a>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.ln">ln() (in module datafusion.functions)</a>
 </li>
@@ -657,6 +669,8 @@ <h2 id="R">R</h2>
 <h2 id="S">S</h2>
 <table style="width: 100%" class="indextable genindextable"><tr>
 <td style="width: 33%; vertical-align: top;"><ul>
+<li><a href="python/generated/datafusion.DataFrame.html#datafusion.DataFrame.schema">schema() (datafusion.DataFrame method)</a>
+</li>
 <li><a href="python/generated/datafusion.DataFrame.html#datafusion.DataFrame.select">select() (datafusion.DataFrame method)</a>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.sha224">sha224() (in module datafusion.functions)</a>
@@ -673,14 +687,14 @@ <h2 id="S">S</h2>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.sin">sin() (in module datafusion.functions)</a>
 </li>
+</ul></td>
+<td style="width: 33%; vertical-align: top;"><ul>
 <li><a href="python/generated/datafusion.DataFrame.html#datafusion.DataFrame.sort">sort() (datafusion.DataFrame method)</a>

 <ul>
 <li><a href="python/generated/datafusion.Expression.html#datafusion.Expression.sort">(datafusion.Expression method)</a>
 </li>
 </ul></li>
-</ul></td>
-<td style="width: 33%; vertical-align: top;"><ul>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.split_part">split_part() (in module datafusion.functions)</a>
 </li>
 <li><a href="python/generated/datafusion.ExecutionContext.html#datafusion.ExecutionContext.sql">sql() (datafusion.ExecutionContext method)</a>
@@ -703,14 +717,16 @@ <h2 id="S">S</h2>
 <h2 id="T">T</h2>
 <table style="width: 100%" class="indextable genindextable"><tr>
 <td style="width: 33%; vertical-align: top;"><ul>
+<li><a href="python/generated/datafusion.ExecutionContext.html#datafusion.ExecutionContext.table">table() (datafusion.ExecutionContext method)</a>
+</li>
 <li><a href="python/generated/datafusion.ExecutionContext.html#datafusion.ExecutionContext.tables">tables() (datafusion.ExecutionContext method)</a>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.tan">tan() (in module datafusion.functions)</a>
-</li>
-<li><a href="python/generated/datafusion.functions.html#datafusion.functions.to_hex">to_hex() (in module datafusion.functions)</a>
 </li>
 </ul></td>
 <td style="width: 33%; vertical-align: top;"><ul>
+<li><a href="python/generated/datafusion.functions.html#datafusion.functions.to_hex">to_hex() (in module datafusion.functions)</a>
+</li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.translate">translate() (in module datafusion.functions)</a>
 </li>
 <li><a href="python/generated/datafusion.functions.html#datafusion.functions.trim">trim() (in module datafusion.functions)</a>

datafusion/index.html

Lines changed: 1 addition & 1 deletion
@@ -458,7 +458,7 @@ <h2>Table of content<a class="headerlink" href="#table-of-content" title="Permal
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="specification/roadmap.html">Roadmap</a></li>
 <li class="toctree-l1"><a class="reference internal" href="specification/roadmap.html#datafusion">DataFusion</a></li>
-<li class="toctree-l1"><a class="reference internal" href="specification/roadmap.html#vision">Vision</a></li>
+<li class="toctree-l1"><a class="reference internal" href="specification/roadmap.html#ballista">Ballista</a></li>
 <li class="toctree-l1"><a class="reference internal" href="specification/invariants.html">DataFusion’s Invariants</a></li>
 <li class="toctree-l1"><a class="reference internal" href="specification/output-field-name-semantic.html">Datafusion output field name semantic</a></li>
 </ul>

datafusion/objects.inv

47 Bytes
Binary file not shown.
