
Commit de42281

kjmrknsn authored and srowen committed
[MINOR][DOCS][WIP] Fix Typos
## What changes were proposed in this pull request?

Fix Typos.

## How was this patch tested?

NA

Closes #23145 from kjmrknsn/docUpdate.

Authored-by: Keiji Yoshida <kjmrknsn@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
1 parent: 31c4fab

13 files changed (+34, -34 lines)

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -66,8 +66,8 @@ Example applications are also provided in Python. For example,

./bin/spark-submit examples/src/main/python/pi.py 10

-Spark also provides an experimental [R API](sparkr.html) since 1.4 (only DataFrames APIs included).
-To run Spark interactively in a R interpreter, use `bin/sparkR`:
+Spark also provides an [R API](sparkr.html) since 1.4 (only DataFrames APIs included).
+To run Spark interactively in an R interpreter, use `bin/sparkR`:

./bin/sparkR --master local[2]

docs/rdd-programming-guide.md

Lines changed: 4 additions & 4 deletions
@@ -332,7 +332,7 @@ One important parameter for parallel collections is the number of *partitions* t

Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html).

-Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation:
+Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation:

{% highlight scala %}
scala> val distFile = sc.textFile("data.txt")

@@ -365,7 +365,7 @@ Apart from text files, Spark's Scala API also supports several other data format

Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html).

-Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation:
+Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation:

{% highlight java %}
JavaRDD<String> distFile = sc.textFile("data.txt");

@@ -397,7 +397,7 @@ Apart from text files, Spark's Java API also supports several other data formats

PySpark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html).

-Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation:
+Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation:

{% highlight python %}
>>> distFile = sc.textFile("data.txt")

@@ -1122,7 +1122,7 @@ costly operation.

#### Background

-To understand what happens during the shuffle we can consider the example of the
+To understand what happens during the shuffle, we can consider the example of the
[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all
values for a single key are combined into a tuple - the key and the result of executing a reduce
function against all values associated with that key. The challenge is that not all values for a
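
For readers skimming this hunk, here is a minimal PySpark sketch (not part of the commit) of the `textFile` and `reduceByKey` behavior the surrounding prose describes; the local `data.txt` path is a stand-in, as in the guide's own example.

```python
from pyspark import SparkContext

sc = SparkContext("local[2]", "textfile-shuffle-sketch")

# textFile takes a URI (a local path, or hdfs://, s3a://, etc.) and yields an RDD of lines.
lines = sc.textFile("data.txt")

# reduceByKey triggers a shuffle: all values for a key are combined by the given function,
# even though they may start out on different partitions.
word_counts = (lines.flatMap(lambda line: line.split())
                    .map(lambda word: (word, 1))
                    .reduceByKey(lambda a, b: a + b))

print(word_counts.take(5))
sc.stop()
```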

docs/running-on-mesos.md

Lines changed: 1 addition & 1 deletion
@@ -687,7 +687,7 @@ See the [configuration page](configuration.html) for information on Spark config
<td><code>0</code></td>
<td>
Set the maximum number GPU resources to acquire for this job. Note that executors will still launch when no GPU resources are found
-since this configuration is just a upper limit and not a guaranteed amount.
+since this configuration is just an upper limit and not a guaranteed amount.
</td>
</tr>
<tr>
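
The row being fixed documents the Mesos GPU cap. Below is a minimal sketch of setting it from application code, assuming the key is `spark.mesos.gpus.max` as documented elsewhere on that page (the key itself is outside this hunk) and using a hypothetical Mesos master URL.

```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("mesos-gpu-cap-sketch")
        .setMaster("mesos://zk://zkhost:2181/mesos")   # hypothetical Mesos master URL
        .set("spark.mesos.gpus.max", "2"))             # an upper limit, not a guaranteed amount

sc = SparkContext(conf=conf)
```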

docs/sql-data-sources-avro.md

Lines changed: 3 additions & 3 deletions
@@ -66,9 +66,9 @@ write.df(select(df, "name", "favorite_color"), "namesAndFavColors.avro", "avro")
## to_avro() and from_avro()
The Avro package provides function `to_avro` to encode a column as binary in Avro
format, and `from_avro()` to decode Avro binary data into a column. Both functions transform one column to
-another column, and the input/output SQL data type can be complex type or primitive type.
+another column, and the input/output SQL data type can be a complex type or a primitive type.

-Using Avro record as columns are useful when reading from or writing to a streaming source like Kafka. Each
+Using Avro record as columns is useful when reading from or writing to a streaming source like Kafka. Each
Kafka key-value record will be augmented with some metadata, such as the ingestion timestamp into Kafka, the offset in Kafka, etc.
* If the "value" field that contains your data is in Avro, you could use `from_avro()` to extract your data, enrich it, clean it, and then push it downstream to Kafka again or write it out to a file.
* `to_avro()` can be used to turn structs into Avro records. This method is particularly useful when you would like to re-encode multiple columns into a single one when writing data out to Kafka.

@@ -151,7 +151,7 @@ Data source options of Avro can be set via:
<tr>
<td><code>avroSchema</code></td>
<td>None</td>
-<td>Optional Avro schema provided by an user in JSON format. The date type and naming of record fields
+<td>Optional Avro schema provided by a user in JSON format. The date type and naming of record fields
should match the input Avro data or Catalyst data, otherwise the read/write action will fail.</td>
<td>read and write</td>
</tr>
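
To illustrate the `to_avro`/`from_avro` flow the first hunk describes, here is a minimal PySpark sketch. It assumes a recent PySpark where `pyspark.sql.avro.functions` exists (at the time of this commit the functions were Scala/Java only), the Kafka and Avro connector packages on the classpath, and hypothetical broker, topic, and schema values.

```python
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro, to_avro  # newer PySpark releases only

spark = SparkSession.builder.appName("avro-column-sketch").getOrCreate()

# Hypothetical Avro schema (JSON format) for the Kafka "value" column.
value_schema = """
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "favorite_color", "type": ["string", "null"]}]}
"""

kafka_df = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical brokers
            .option("subscribe", "users")                       # hypothetical topic
            .load())

# from_avro decodes the Avro binary column into a struct column;
# to_avro re-encodes a struct back to Avro bytes, e.g. before writing to Kafka again.
decoded = kafka_df.select(from_avro("value", value_schema).alias("user"))
reencoded = decoded.select(to_avro("user").alias("value"))
```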

docs/sql-data-sources-hive-tables.md

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ creating table, you can create a table using storage handler at Hive side, and u
<td><code>inputFormat, outputFormat</code></td>
<td>
These 2 options specify the name of a corresponding `InputFormat` and `OutputFormat` class as a string literal,
-e.g. `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. These 2 options must be appeared in pair, and you can not
+e.g. `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. These 2 options must be appeared in a pair, and you can not
specify them if you already specified the `fileFormat` option.
</td>
</tr>
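
A minimal sketch of the `inputFormat`/`outputFormat` pairing this row describes, assuming Hive support is enabled and using the ORC classes from the row's own example; the table name and the `serde` choice are illustrative, not taken from the commit.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-storage-options-sketch")
         .enableHiveSupport()
         .getOrCreate())

# inputFormat and outputFormat must be supplied together and cannot be combined with fileFormat.
spark.sql("""
  CREATE TABLE hive_orc_example(key INT, value STRING)
  USING hive
  OPTIONS(
    inputFormat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat',
    outputFormat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat',
    serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
  )
""")
```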

docs/sql-data-sources-jdbc.md

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ the following case-insensitive options:
as a subquery in the <code>FROM</code> clause. Spark will also assign an alias to the subquery clause.
As an example, spark will issue a query of the following form to the JDBC Source.<br><br>
<code> SELECT &lt;columns&gt; FROM (&lt;user_specified_query&gt;) spark_gen_alias</code><br><br>
-Below are couple of restrictions while using this option.<br>
+Below are a couple of restrictions while using this option.<br>
<ol>
<li> It is not allowed to specify `dbtable` and `query` options at the same time. </li>
<li> It is not allowed to specify `query` and `partitionColumn` options at the same time. When specifying
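
A minimal sketch of the `query` option these restrictions apply to, with a hypothetical JDBC URL, credentials, and query.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-query-option-sketch").getOrCreate()

# Spark wraps the query as a subquery with a generated alias; per the restrictions above,
# the option cannot be combined with dbtable or partitionColumn.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/shop")  # hypothetical URL and credentials
      .option("query", "SELECT id, name FROM employees WHERE active = true")
      .option("user", "spark")
      .option("password", "secret")
      .load())

df.show()
```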

docs/sql-data-sources-load-save-functions.md

Lines changed: 1 addition & 1 deletion
@@ -324,4 +324,4 @@ CLUSTERED BY(name) SORTED BY (favorite_numbers) INTO 42 BUCKETS;
`partitionBy` creates a directory structure as described in the [Partition Discovery](sql-data-sources-parquet.html#partition-discovery) section.
Thus, it has limited applicability to columns with high cardinality. In contrast
`bucketBy` distributes
-data across a fixed number of buckets and can be used when a number of unique values is unbounded.
+data across a fixed number of buckets and can be used when the number of unique values is unbounded.
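
A minimal sketch contrasting `partitionBy` and `bucketBy` as the fixed sentence describes, with hypothetical rows echoing the columns used in the surrounding docs.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bucketing-sketch").getOrCreate()

# Hypothetical people dataset.
people = spark.createDataFrame(
    [("Alice", "red", 3), ("Bob", "blue", 7)],
    ["name", "favorite_color", "favorite_number"])

# partitionBy creates one directory per distinct value (suits low cardinality);
# bucketBy hashes rows into a fixed number of buckets, so unbounded distinct values are fine.
(people.write
       .partitionBy("favorite_color")
       .bucketBy(42, "name")
       .sortBy("favorite_number")
       .saveAsTable("people_partitioned_bucketed"))
```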

docs/sql-getting-started.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ Here we include some basic examples of structured data processing using Datasets
<div data-lang="scala" markdown="1">
{% include_example untyped_ops scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}

-For a complete list of the types of operations that can be performed on a Dataset refer to the [API Documentation](api/scala/index.html#org.apache.spark.sql.Dataset).
+For a complete list of the types of operations that can be performed on a Dataset, refer to the [API Documentation](api/scala/index.html#org.apache.spark.sql.Dataset).

In addition to simple column references and expressions, Datasets also have a rich library of functions including string manipulation, date arithmetic, common math operations and more. The complete list is available in the [DataFrame Function Reference](api/scala/index.html#org.apache.spark.sql.functions$).
</div>
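
A minimal PySpark sketch of the untyped operations the included Scala example covers; the rows below are illustrative, not from the commit.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("untyped-ops-sketch").getOrCreate()

df = spark.createDataFrame(
    [("Michael", 29), ("Andy", 30), ("Justin", 19)], ["name", "age"])

df.select(df["name"], df["age"] + 1).show()  # column expressions
df.filter(df["age"] > 21).show()             # row filtering
df.groupBy("age").count().show()             # aggregation
```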

docs/sql-programming-guide.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ title: Spark SQL and DataFrames
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided
by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally,
Spark SQL uses this extra information to perform extra optimizations. There are several ways to
-interact with Spark SQL including SQL and the Dataset API. When computing a result
+interact with Spark SQL including SQL and the Dataset API. When computing a result,
the same execution engine is used, independent of which API/language you are using to express the
computation. This unification means that developers can easily switch back and forth between
different APIs based on which provides the most natural way to express a given transformation.
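
A minimal sketch of the unification claim in this paragraph, running the same computation once through SQL and once through the DataFrame API; the data is hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-vs-dataframe-sketch").getOrCreate()

people = spark.createDataFrame([("Andy", 30), ("Justin", 19)], ["name", "age"])
people.createOrReplaceTempView("people")

# The same question asked through SQL and through the DataFrame API;
# both routes go through the same execution engine.
spark.sql("SELECT name FROM people WHERE age > 20").show()
people.filter(people["age"] > 20).select("name").show()
```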

docs/sql-pyspark-pandas-with-arrow.md

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/p

Currently, all Spark SQL data types are supported by Arrow-based conversion except `MapType`,
`ArrayType` of `TimestampType`, and nested `StructType`. `BinaryType` is supported only when
-installed PyArrow is equal to or higher then 0.10.0.
+installed PyArrow is equal to or higher than 0.10.0.

### Setting Arrow Batch Size
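
A minimal scalar Pandas UDF sketch of the Arrow-based path this page covers, assuming `pandas` and a PyArrow version meeting the requirement the fixed sentence states.

```python
import pandas as pd  # noqa: F401 (pandas must be installed for Pandas UDFs)
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("arrow-pandas-udf-sketch").getOrCreate()

# A scalar Pandas UDF: executed via Arrow, one pandas.Series in, one pandas.Series out.
@pandas_udf("double", PandasUDFType.SCALAR)
def plus_one(v):
    return v + 1

df = spark.range(5).selectExpr("CAST(id AS double) AS x")
df.select(plus_one("x")).show()
```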
