Merged
1 change: 1 addition & 0 deletions datafusion/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
_sources
@@ -0,0 +1,5 @@
A,b,c
1,2,3
1,10,5
2,5,6
2,1,4
@@ -0,0 +1,2 @@
a,b,c
1,2,3
6 changes: 3 additions & 3 deletions datafusion/contributor-guide/roadmap.html
@@ -421,7 +421,7 @@ <h3>Additional SQL Language Features<a class="headerlink" href="#additional-sql-
<h3>Query Optimizer<a class="headerlink" href="#query-optimizer" title="Permalink to this heading">¶</a></h3>
<ul class="simple">
<li><p>More sophisticated cost based optimizer for join ordering</p></li>
<li><p>Implement advanced query optimization framework (Tokomak) #440</p></li>
<li><p>Implement advanced query optimization framework (Tokomak) <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/440">#440</a></p></li>
<li><p>Finer optimizations for group by and aggregate functions</p></li>
</ul>
</section>
@@ -436,8 +436,8 @@ <h3>Datasources<a class="headerlink" href="#datasources" title="Permalink to thi
<h3>Runtime / Infrastructure<a class="headerlink" href="#runtime-infrastructure" title="Permalink to this heading">¶</a></h3>
<ul class="simple">
<li><p>Migrate to some sort of arrow2 based implementation (see <a class="reference external" href="https://github.com/apache/arrow-datafusion/milestone/3">milestone</a> for more details)</p></li>
<li><p>Add DataFusion to h2oai/db-benchmark <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/147">147</a></p></li>
<li><p>Improve build time <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/348">348</a></p></li>
<li><p>Add DataFusion to h2oai/db-benchmark <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/147">#147</a></p></li>
<li><p>Improve build time <a class="reference external" href="https://github.com/apache/arrow-datafusion/issues/348">#348</a></p></li>
</ul>
</section>
<section id="resource-management">
Binary file modified datafusion/objects.inv
Binary file not shown.
2 changes: 1 addition & 1 deletion datafusion/searchindex.js

Large diffs are not rendered by default.

60 changes: 44 additions & 16 deletions datafusion/user-guide/cli.html
@@ -294,6 +294,11 @@
Usage
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#selecting-files-directly">
Selecting files directly
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#registering-parquet-data-sources">
Registering Parquet Data Sources
@@ -367,26 +372,43 @@
-->
<section id="datafusion-command-line-sql-utility">
<h1>DataFusion Command-line SQL Utility<a class="headerlink" href="#datafusion-command-line-sql-utility" title="Permalink to this heading">¶</a></h1>
<p>The DataFusion CLI is a command-line interactive SQL utility that allows
queries to be executed against any supported data files. It is a convenient way to
<p>The DataFusion CLI is a command-line interactive SQL utility for executing
queries against any supported data files. It is a convenient way to
try DataFusion out with your own data sources, and test out its SQL support.</p>
<section id="example">
<h2>Example<a class="headerlink" href="#example" title="Permalink to this heading">¶</a></h2>
<p>Create a CSV file to query.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">echo</span> <span class="s2">&quot;1,2&quot;</span> &gt; data.csv
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">echo</span> <span class="s2">&quot;a,b&quot;</span> &gt; data.csv
$ <span class="nb">echo</span> <span class="s2">&quot;1,2&quot;</span> &gt;&gt; data.csv
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ datafusion-cli
DataFusion CLI v12.0.0
❯ CREATE EXTERNAL TABLE foo STORED AS CSV LOCATION <span class="s1">&#39;data.csv&#39;</span><span class="p">;</span>
<span class="m">0</span> rows <span class="k">in</span> set. Query took <span class="m">0</span>.017 seconds.
❯ <span class="k">select</span> * from foo<span class="p">;</span>
+----------+----------+
<span class="p">|</span> column_1 <span class="p">|</span> column_2 <span class="p">|</span>
+----------+----------+
<span class="p">|</span> <span class="m">1</span> <span class="p">|</span> <span class="m">2</span> <span class="p">|</span>
+----------+----------+
<span class="m">1</span> row <span class="k">in</span> set. Query took <span class="m">0</span>.012 seconds.
<p>Query that single file (the CLI also supports Parquet, compressed CSV, Avro, JSON, and more):</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$ datafusion-cli
DataFusion CLI v17.0.0
❯ <span class="k">select</span> * from <span class="s1">&#39;data.csv&#39;</span><span class="p">;</span>
+---+---+
<span class="p">|</span> a <span class="p">|</span> b <span class="p">|</span>
+---+---+
<span class="p">|</span> <span class="m">1</span> <span class="p">|</span> <span class="m">2</span> <span class="p">|</span>
+---+---+
<span class="m">1</span> row <span class="k">in</span> set. Query took <span class="m">0</span>.007 seconds.
</pre></div>
</div>
<p>You can also query directories of files with compatible schemas:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$ ls data_dir/
data.csv data2.csv
</pre></div>
</div>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$ datafusion-cli
DataFusion CLI v16.0.0
❯ <span class="k">select</span> * from <span class="s1">&#39;data_dir&#39;</span><span class="p">;</span>
+---+---+
<span class="p">|</span> a <span class="p">|</span> b <span class="p">|</span>
+---+---+
<span class="p">|</span> <span class="m">3</span> <span class="p">|</span> <span class="m">4</span> <span class="p">|</span>
<span class="p">|</span> <span class="m">1</span> <span class="p">|</span> <span class="m">2</span> <span class="p">|</span>
+---+---+
<span class="m">2</span> rows <span class="k">in</span> set. Query took <span class="m">0</span>.007 seconds.
</pre></div>
</div>
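<p>As a minimal sketch, the <code class="docutils literal notranslate"><span class="pre">data_dir</span></code> directory queried above could be set up as follows. The contents of <code class="docutils literal notranslate"><span class="pre">data2.csv</span></code> are an assumption inferred from the query output (a row <code class="docutils literal notranslate"><span class="pre">3,4</span></code> under the same <code class="docutils literal notranslate"><span class="pre">a,b</span></code> header); the listing above shows only the file names.</p>

```shell
# Create the directory and two CSV files that share the header a,b.
mkdir -p data_dir
printf 'a,b\n1,2\n' > data_dir/data.csv
# Assumed contents: inferred from the rows returned by the query above.
printf 'a,b\n3,4\n' > data_dir/data2.csv
# List the directory to confirm both files are present.
ls data_dir/
```

<p>Both files use the same header, which is what makes their schemas compatible for a single directory query.</p>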
</section>
@@ -430,6 +452,7 @@ <h3>Run using Docker<a class="headerlink" href="#run-using-docker" title="Permal
</section>
<section id="usage">
<h2>Usage<a class="headerlink" href="#usage" title="Permalink to this heading">¶</a></h2>
<p>To see the current usage, run <code class="docutils literal notranslate"><span class="pre">datafusion-cli</span> <span class="pre">--help</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>Apache Arrow &lt;dev@arrow.apache.org&gt;
Command Line Client <span class="k">for</span> DataFusion query engine.

@@ -446,11 +469,16 @@ <h2>Usage<a class="headerlink" href="#usage" title="Permalink to this heading">
-q, --quiet Reduce printing other than the results and work quietly
-r, --rc &lt;RC&gt;... Run the provided files on startup instead of ~/.datafusionrc
-V, --version Print version information

Type <span class="sb">`</span><span class="nb">exit</span><span class="sb">`</span> or <span class="sb">`</span>quit<span class="sb">`</span> to <span class="nb">exit</span> the CLI.
</pre></div>
</div>
</section>
<section id="selecting-files-directly">
<h2>Selecting files directly<a class="headerlink" href="#selecting-files-directly" title="Permalink to this heading">¶</a></h2>
<p>Files can be queried directly by enclosing the file or
directory name in single quotes (<code class="docutils literal notranslate"><span class="pre">'</span></code>), as shown in the example.</p>
<p>It is also possible to create a table backed by files explicitly
via <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> as shown below.</p>
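<p>The two approaches can be compared with a short script. This is a sketch under assumptions, not the project's recommended workflow: the file name <code class="docutils literal notranslate"><span class="pre">query.sql</span></code> and the table name <code class="docutils literal notranslate"><span class="pre">foo</span></code> are arbitrary, the <code class="docutils literal notranslate"><span class="pre">WITH</span> <span class="pre">HEADER</span> <span class="pre">ROW</span></code> clause is assumed to match the release this page documents (other releases may spell the header option differently), and the <code class="docutils literal notranslate"><span class="pre">-f</span></code> flag should be confirmed against <code class="docutils literal notranslate"><span class="pre">datafusion-cli</span> <span class="pre">--help</span></code>.</p>

```shell
# Recreate the CSV file from the example: a header row and one data row.
printf 'a,b\n1,2\n' > data.csv

# Write both query styles to a file (query.sql is a hypothetical name).
cat > query.sql <<'EOF'
-- Direct selection: the file name is enclosed in single quotes.
SELECT * FROM 'data.csv';
-- Explicit registration: assumed syntax for a CSV file with a header row.
CREATE EXTERNAL TABLE foo STORED AS CSV WITH HEADER ROW LOCATION 'data.csv';
SELECT * FROM foo;
EOF

# Run the file only if the CLI is installed; -f is assumed to execute a file and exit.
if command -v datafusion-cli >/dev/null 2>&1; then
  datafusion-cli -f query.sql || echo "datafusion-cli reported an error" >&2
fi
```

<p>Registering a table gives the data a stable name that later statements in the same session can reuse, whereas direct selection is convenient for one-off queries.</p>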
</section>
<section id="registering-parquet-data-sources">
<h2>Registering Parquet Data Sources<a class="headerlink" href="#registering-parquet-data-sources" title="Permalink to this heading">¶</a></h2>
<p>Parquet data sources can be registered by executing a <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement. It is not necessary to provide schema information for Parquet files.</p>