
Commit a340748

docs: Move existing documentation into new Contributor Guide and add Getting Started section (apache#334)
1 parent 897dde7 commit a340748

File tree: 11 files changed (+221, -140 lines)

.github/workflows/benchmark-tpch.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -25,10 +25,12 @@ on:
   push:
     paths-ignore:
       - "doc/**"
+      - "docs/**"
       - "**.md"
   pull_request:
     paths-ignore:
       - "doc/**"
+      - "docs/**"
       - "**.md"
   # manual trigger
   # https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow
```

.github/workflows/benchmark.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -25,10 +25,12 @@ on:
   push:
     paths-ignore:
       - "doc/**"
+      - "docs/**"
       - "**.md"
   pull_request:
     paths-ignore:
       - "doc/**"
+      - "docs/**"
       - "**.md"
   # manual trigger
   # https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow
```

.github/workflows/pr_build.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -25,10 +25,12 @@ on:
   push:
     paths-ignore:
       - "doc/**"
+      - "docs/**"
       - "**.md"
   pull_request:
     paths-ignore:
       - "doc/**"
+      - "docs/**"
       - "**.md"
   # manual trigger
   # https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow
```

.github/workflows/spark_sql_test.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -25,10 +25,12 @@ on:
   push:
     paths-ignore:
       - "doc/**"
+      - "docs/**"
       - "**.md"
   pull_request:
     paths-ignore:
       - "doc/**"
+      - "docs/**"
       - "**.md"
   # manual trigger
   # https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow
```
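
All four workflow files receive the same two-line change: paths under the new `docs/` directory join `doc/**` and `**.md` in the ignore lists, so documentation-only changes no longer trigger these CI jobs. For reference, a sketch of the resulting trigger block is shown below; the indentation and the trailing `workflow_dispatch` entry are assumed from typical GitHub Actions layout rather than copied from the full files.

```yaml
# Sketch of each workflow's trigger section after this commit; the jobs section
# and other keys are unchanged and omitted here.
on:
  push:
    paths-ignore:
      - "doc/**"
      - "docs/**"
      - "**.md"
  pull_request:
    paths-ignore:
      - "doc/**"
      - "docs/**"
      - "**.md"
  # manual trigger
  # https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow
  workflow_dispatch:   # assumed from the "manual trigger" comment, not shown in the hunk
```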

EXPRESSIONS.md

Lines changed: 0 additions & 109 deletions
This file was deleted.
docs/source/contributor-guide/contributing.md (new file)

Lines changed: 52 additions & 0 deletions
```diff
@@ -0,0 +1,52 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements. See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership. The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied. See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Contributing to Apache DataFusion Comet
+
+We welcome contributions to Comet in many areas, and encourage new contributors to get involved.
+
+Here are some areas where you can help:
+
+- Testing Comet with existing Spark jobs and reporting issues for any bugs or performance issues
+- Contributing code to support Spark expressions, operators, and data types that are not currently supported
+- Reviewing pull requests and helping to test new features for correctness and performance
+- Improving documentation
+
+## Finding issues to work on
+
+We maintain a list of good first issues in GitHub [here](https://github.com/apache/datafusion-comet/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
+
+## Reporting issues
+
+We use [GitHub issues](https://github.com/apache/datafusion-comet/issues) for bug reports and feature requests.
+
+## Asking for Help
+
+The Comet project uses the same Slack and Discord channels as the main Apache DataFusion project. See details at
+[Apache DataFusion Communications]. There are dedicated Comet channels in both Slack and Discord.
+
+## Regular public meetings
+
+The Comet contributors hold regular video calls where new and current contributors are welcome to ask questions and
+coordinate on issues that they are working on.
+
+See the [Apache DataFusion Comet community meeting] Google document for more information.
+
+[Apache DataFusion Communications]: https://datafusion.apache.org/contributor-guide/communication.html
+[Apache DataFusion Comet community meeting]: https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing
```

DEBUGGING.md renamed to docs/source/contributor-guide/debugging.md

Lines changed: 27 additions & 19 deletions
````diff
@@ -20,12 +20,13 @@ under the License.
 # Comet Debugging Guide
 
 This HOWTO describes how to debug JVM code and Native code concurrently. The guide assumes you have:
+
 1. Intellij as the Java IDE
 2. CLion as the Native IDE. For Rust code, the CLion Rust language plugin is required. Note that the
-   Intellij Rust plugin is not sufficient.
+   Intellij Rust plugin is not sufficient.
 3. CLion/LLDB as the native debugger. CLion ships with a bundled LLDB and the Rust community has
-   its own packaging of LLDB (`lldb-rust`). Both provide a better display of Rust symbols than plain
-   LLDB or the LLDB that is bundled with XCode. We will use the LLDB packaged with CLion for this guide.
+   its own packaging of LLDB (`lldb-rust`). Both provide a better display of Rust symbols than plain
+   LLDB or the LLDB that is bundled with XCode. We will use the LLDB packaged with CLion for this guide.
 4. We will use a Comet _unit_ test as the canonical use case.
 
 _Caveat: The steps here have only been tested with JDK 11_ on Mac (M1)
@@ -42,21 +43,24 @@ use advanced `lldb` debugging.
 1. Add a Debug Configuration for the unit test
 
 1. In the Debug Configuration for that unit test add `-Xint` as a JVM parameter. This option is
-   undocumented *magic*. Without this, the LLDB debugger hits a EXC_BAD_ACCESS (or EXC_BAD_INSTRUCTION) from
-   which one cannot recover.
+   undocumented _magic_. Without this, the LLDB debugger hits a EXC_BAD_ACCESS (or EXC_BAD_INSTRUCTION) from
+   which one cannot recover.
+
+1. Add a println to the unit test to print the PID of the JVM process. (jps can also be used but this is less error prone if you have multiple jvm processes running)
+
+   ```JDK8
+   println("Waiting for Debugger: PID - ", ManagementFactory.getRuntimeMXBean().getName())
+   ```
+
+   This will print something like : `PID@your_machine_name`.
 
-1. Add a println to the unit test to print the PID of the JVM process. (jps can also be used but this is less error prone if you have multiple jvm processes running)
-   ``` JDK8
-   println("Waiting for Debugger: PID - ", ManagementFactory.getRuntimeMXBean().getName())
-   ```
-   This will print something like : `PID@your_machine_name`.
+   For JDK9 and newer
 
-   For JDK9 and newer
-   ```JDK9
-   println("Waiting for Debugger: PID - ", ProcessHandle.current.pid)
-   ```
+   ```JDK9
+   println("Waiting for Debugger: PID - ", ProcessHandle.current.pid)
+   ```
 
-   ==> Note the PID
+   ==> Note the PID
 
 1. Debug-run the test in Intellij and wait for the breakpoint to be hit
 
````
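
As an aside, here is a minimal sketch of where the PID println from the step above might sit in practice; the ScalaTest suite, test name, and placeholder body are illustrative assumptions, not part of the guide.

```scala
import java.lang.management.ManagementFactory

import org.scalatest.funsuite.AnyFunSuite

// Hypothetical suite; Comet's real unit tests extend the project's own base classes.
class NativeDebugExampleSuite extends AnyFunSuite {
  test("exercise a native code path") {
    // JDK 8 style; on JDK 9+ use ProcessHandle.current.pid instead.
    println("Waiting for Debugger: PID - " + ManagementFactory.getRuntimeMXBean().getName())
    // Set an IntelliJ breakpoint below, debug-run the test, and attach
    // CLion/LLDB to the printed PID before resuming.
    val result = 1 + 1 // placeholder for the real call into Comet's native code
    assert(result == 2)
  }
}
```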

````diff
@@ -96,7 +100,8 @@ Detecting the debugger
 https://stackoverflow.com/questions/5393403/can-a-java-application-detect-that-a-debugger-is-attached#:~:text=No.,to%20let%20your%20app%20continue.&text=I%20know%20that%20those%20are,meant%20with%20my%20first%20phrase).
 
 # Verbose debug
-By default, Comet outputs the exception details specific for Comet.
+
+By default, Comet outputs the exception details specific for Comet.
 
 ```scala
 scala> spark.sql("my_failing_query").show(false)
@@ -112,7 +117,7 @@ This was likely caused by a bug in DataFusion's code and we would welcome that y
 ```
 
 There is a verbose exception option by leveraging DataFusion [backtraces](https://arrow.apache.org/datafusion/user-guide/example-usage.html#enable-backtraces)
-This option allows to append native DataFusion stacktrace to the original error message.
+This option allows to append native DataFusion stacktrace to the original error message.
 To enable this option with Comet it is needed to include `backtrace` feature in [Cargo.toml](https://github.com/apache/arrow-datafusion-comet/blob/main/core/Cargo.toml) for DataFusion dependencies
 
 ```
@@ -129,15 +134,16 @@ RUST_BACKTRACE=1 $SPARK_HOME/spark-shell --jars spark/target/comet-spark-spark3.
 ```
 
 Get the expanded exception details
+
 ```scala
 scala> spark.sql("my_failing_query").show(false)
 24/03/05 17:00:49 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
 org.apache.comet.CometNativeException: Internal error: MIN/MAX is not expected to receive scalars of incompatible types (Date32("NULL"), Int32(15901))
 
-backtrace:
+backtrace:
 0: std::backtrace::Backtrace::create
 1: datafusion_physical_expr::aggregate::min_max::min
-2: <datafusion_physical_expr::aggregate::min_max::MinAccumulator as datafusion_expr::accumulator::Accumulator>::update_batch
+2: <datafusion_physical_expr::aggregate::min_max::MinAccumulator as datafusion_expr::accumulator::Accumulator>::update_batch
 3: <futures_util::stream::stream::fuse::Fuse<S> as futures_core::stream::Stream>::poll_next
 4: comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}
 5: _Java_org_apache_comet_Native_executePlan
@@ -151,6 +157,8 @@ at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:126)
 (reduced)
 
 ```
+
 Note:
+
 - The backtrace coverage in DataFusion is still improving. So there is a chance the error still not covered, if so feel free to file a [ticket](https://github.com/apache/arrow-datafusion/issues)
 - The backtrace evaluation comes with performance cost and intended mostly for debugging purposes
````
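
The Cargo.toml edit itself is not visible in these hunks. A minimal sketch of what enabling the `backtrace` feature for the DataFusion dependency might look like, assuming an illustrative version number and omitting the rest of the real dependency block:

```toml
# Hypothetical excerpt from core/Cargo.toml -- the version is illustrative only.
[dependencies]
datafusion = { version = "36.0.0", features = ["backtrace"] }
```

With the feature compiled in, launching the shell with `RUST_BACKTRACE=1` as shown above appends the native DataFusion stack trace to the original error message.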

DEVELOPMENT.md renamed to docs/source/contributor-guide/development.md

Lines changed: 13 additions & 7 deletions
```diff
@@ -49,25 +49,29 @@ A few common commands are specified in project's `Makefile`:
 - `make clean`: clean up the workspace
 - `bin/comet-spark-shell -d . -o spark/target/` run Comet spark shell for V1 datasources
 - `bin/comet-spark-shell -d . -o spark/target/ --conf spark.sql.sources.useV1SourceList=""` run Comet spark shell for V2 datasources
-
+
 ## Development Environment
+
 Comet is a multi-language project with native code written in Rust and JVM code written in Java and Scala.
-For Rust code, the CLion IDE is recommended. For JVM code, IntelliJ IDEA is recommended.
+For Rust code, the CLion IDE is recommended. For JVM code, IntelliJ IDEA is recommended.
 
 Before opening the project in an IDE, make sure to run `make` first to generate the necessary files for the IDEs. Currently, it's mostly about
 generating protobuf message classes for the JVM side. It's only required to run `make` once after cloning the repo.
 
 ### IntelliJ IDEA
-First make sure to install the Scala plugin in IntelliJ IDEA.
+
+First make sure to install the Scala plugin in IntelliJ IDEA.
 After that, you can open the project in IntelliJ IDEA. The IDE should automatically detect the project structure and import as a Maven project.
 
 ### CLion
+
 First make sure to install the Rust plugin in CLion or you can use the dedicated Rust IDE: RustRover.
 After that you can open the project in CLion. The IDE should automatically detect the project structure and import as a Cargo project.
 
 ### Running Tests in IDEA
+
 Like other Maven projects, you can run tests in IntelliJ IDEA by right-clicking on the test class or test method and selecting "Run" or "Debug".
-However if the tests is related to the native side. Please make sure to run `make core` or `cd core && cargo build` before running the tests in IDEA.
+However if the tests is related to the native side. Please make sure to run `make core` or `cd core && cargo build` before running the tests in IDEA.
 
 ## Benchmark
 
@@ -82,9 +86,11 @@ To run TPC-H or TPC-DS micro benchmarks, please follow the instructions
 in the respective source code, e.g., `CometTPCHQueryBenchmark`.
 
 ## Debugging
+
 Comet is a multi-language project with native code written in Rust and JVM code written in Java and Scala.
-It is possible to debug both native and JVM code concurrently as described in the [DEBUGGING guide](DEBUGGING.md)
+It is possible to debug both native and JVM code concurrently as described in the [DEBUGGING guide](debugging)
 
 ## Submitting a Pull Request
-Comet uses `cargo fmt`, [Scalafix](https://github.com/scalacenter/scalafix) and [Spotless](https://github.com/diffplug/spotless/tree/main/plugin-maven) to
-automatically format the code. Before submitting a pull request, you can simply run `make format` to format the code.
+
+Comet uses `cargo fmt`, [Scalafix](https://github.com/scalacenter/scalafix) and [Spotless](https://github.com/diffplug/spotless/tree/main/plugin-maven) to
+automatically format the code. Before submitting a pull request, you can simply run `make format` to format the code.
```
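
Pulling together the commands referenced in this guide, a rough local sequence before opening a pull request might look like the following sketch; the authoritative list of targets is the project's `Makefile`.

```shell
# Rough pre-PR sequence assembled from the commands mentioned above.
make          # once after cloning: generates protobuf classes for the JVM side
make core     # or: cd core && cargo build  (before running native-related tests in the IDE)
make format   # cargo fmt + Scalafix + Spotless before submitting the PR
```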

docs/source/index.rst

Lines changed: 11 additions & 4 deletions
```diff
@@ -35,16 +35,23 @@ Apache DataFusion Comet
 Apache DataFusion Comet is an Apache Spark plugin that uses Apache DataFusion
 as a native runtime to achieve improvement in terms of query efficiency and query runtime.
 
-This documentation site is currently being developed. The most up-to-date documentation can be found in the
-GitHub repository at https://github.com/apache/datafusion-comet.
+.. _toc.links:
+.. toctree::
+   :maxdepth: 1
+   :caption: User Guide
+
+   Supported Expressions <user-guide/expressions>
+   user-guide/compatibility
 
 .. _toc.links:
 .. toctree::
    :maxdepth: 1
-   :caption: Project Links
+   :caption: Contributor Guide
 
-   compatibility
+   Getting Started <contributor-guide/contributing>
    Github and Issue Tracker <https://github.com/apache/datafusion-comet>
+   contributor-guide/development
+   contributor-guide/debugging
 
 .. _toc.asf-links:
 .. toctree::
```
