Commit 506ff91

Fix typos found with codespell
1 parent 2e1b713 commit 506ff91

12 files changed, with 25 additions and 25 deletions.

README.md

Lines changed: 1 addition & 1 deletion
@@ -225,7 +225,7 @@ and for `uv run` commands the additional parameter `--no-project`
 ```bash
 # fetch this repo
 git clone git@github.com:apache/datafusion-python.git
-# create the virtual enviornment
+# create the virtual environment
 uv sync --dev --no-install-package datafusion
 # activate the environment
 source .venv/bin/activate

docs/source/contributor-guide/ffi.rst

Lines changed: 1 addition & 1 deletion
@@ -195,7 +195,7 @@ optimization levels. If you wish to go down this route, there are two approaches
 have identified you can use.

 #. Re-export all of ``datafusion-python`` yourself with your extensions built in.
-#. Carefully synchonize your software releases with the ``datafusion-python`` CI build
+#. Carefully synchronize your software releases with the ``datafusion-python`` CI build
 system so that your libraries use the exact same compiler, features, and
 optimization level.

docs/source/contributor-guide/introduction.rst

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ Bootstrap:

 # fetch this repo
 git clone git@github.com:apache/datafusion-python.git
-# create the virtual enviornment
+# create the virtual environment
 uv sync --dev --no-install-package datafusion
 # activate the environment
 source .venv/bin/activate

docs/source/user-guide/common-operations/expressions.rst

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ Arrays
 ------

 For columns that contain arrays of values, you can access individual elements of the array by index
-using bracket indexing. This is similar to callling the function
+using bracket indexing. This is similar to calling the function
 :py:func:`datafusion.functions.array_element`, except that array indexing using brackets is 0 based,
 similar to Python arrays and ``array_element`` is 1 based indexing to be compatible with other SQL
 approaches.
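
The off-by-one difference described in this hunk can be illustrated without DataFusion at all. The sketch below is plain Python; `array_element_one_based` is a hypothetical stand-in written for this note, not the library function, and it only mimics the 1-based convention the documentation describes.

```python
def array_element_one_based(values, index):
    """Hypothetical stand-in for a 1-based lookup (not the DataFusion API)."""
    if index < 1:
        raise IndexError("1-based index must be >= 1")
    # Shift the SQL-style 1-based index to Python's 0-based index.
    return values[index - 1]


values = [10, 20, 30]

# Bracket indexing is 0 based, so the first element is at index 0.
first_by_bracket = values[0]

# The 1-based convention puts the same element at index 1.
first_by_function = array_element_one_based(values, 1)
```

Both lookups return the first element; only the index convention differs.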

docs/source/user-guide/common-operations/windows.rst

Lines changed: 4 additions & 4 deletions
@@ -24,7 +24,7 @@ In this section you will learn about window functions. A window function utilize
 multiple rows to produce a result for each individual row, unlike an aggregate function that
 provides a single value for multiple rows.

-The window functions are availble in the :py:mod:`~datafusion.functions` module.
+The window functions are available in the :py:mod:`~datafusion.functions` module.

 We'll use the pokemon dataset (from Ritchie Vink) in the following examples.

@@ -99,8 +99,8 @@ If you do not specify a Window Frame, the frame will be set depending on the fol
 criteria.

 * If an ``order_by`` clause is set, the default window frame is defined as the rows between
-unbounded preceeding and the current row.
-* If an ``order_by`` is not set, the default frame is defined as the rows betwene unbounded
+unbounded preceding and the current row.
+* If an ``order_by`` is not set, the default frame is defined as the rows between unbounded
 and unbounded following (the entire partition).

 Window Frames are defined by three parameters: unit type, starting bound, and ending bound.
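
The two default frames described in this hunk can be sketched in plain Python. This is an illustration only: `default_frame_avg` is a hypothetical helper, it does not call DataFusion, and it assumes a single partition whose values are already sorted and distinct (so ties between rows do not come into play).

```python
def default_frame_avg(values, ordered):
    """Average over the default frame for each row (illustrative helper).

    ordered=True  -> frame is unbounded preceding .. current row
    ordered=False -> frame is the entire partition
    """
    result = []
    for i in range(len(values)):
        frame = values[: i + 1] if ordered else values
        result.append(sum(frame) / len(frame))
    return result


partition = [1.0, 2.0, 3.0]
with_order = default_frame_avg(partition, ordered=True)   # running average
no_order = default_frame_avg(partition, ordered=False)    # partition average
```

With an ordering, each row sees only itself and the rows before it, so the result is a running average; without one, every row sees the whole partition and receives the same value.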
@@ -116,7 +116,7 @@ The unit types available are:
 ``order_by`` clause.

 In this example we perform a "rolling average" of the speed of the current Pokemon and the
-two preceeding rows.
+two preceding rows.

 .. ipython:: python
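
The documentation's own example is cut off by the hunk boundary. As a rough stand-in, the "current row plus two preceding rows" frame can be sketched in plain Python; `rolling_avg` and the sample speeds are made up for this note and are not taken from the pokemon dataset or the DataFusion API.

```python
def rolling_avg(values, preceding=2):
    """Average of the current row and up to `preceding` earlier rows."""
    out = []
    for i in range(len(values)):
        # The frame is clipped at the start of the partition.
        start = max(0, i - preceding)
        frame = values[start : i + 1]
        out.append(sum(frame) / len(frame))
    return out


speeds = [10.0, 20.0, 30.0, 40.0]  # hypothetical values
averages = rolling_avg(speeds)
```

The first two rows average over fewer than three values because the frame cannot extend before the start of the partition.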

docs/source/user-guide/data-sources.rst

Lines changed: 2 additions & 2 deletions
@@ -25,7 +25,7 @@ DataFusion provides a wide variety of ways to get data into a DataFrame to perfo
 Local file
 ----------

-DataFusion has the abilty to read from a variety of popular file formats, such as :ref:`Parquet <io_parquet>`,
+DataFusion has the ability to read from a variety of popular file formats, such as :ref:`Parquet <io_parquet>`,
 :ref:`CSV <io_csv>`, :ref:`JSON <io_json>`, and :ref:`AVRO <io_avro>`.

 .. ipython:: python
@@ -120,7 +120,7 @@ DataFusion can import DataFrames directly from other libraries, such as
 `Polars <https://pola.rs/>`_ and `Pandas <https://pandas.pydata.org/>`_.
 Since DataFusion version 42.0.0, any DataFrame library that supports the Arrow FFI PyCapsule
 interface can be imported to DataFusion using the
-:py:func:`~datafusion.context.SessionContext.from_arrow` function. Older verions of Polars may
+:py:func:`~datafusion.context.SessionContext.from_arrow` function. Older versions of Polars may
 not support the arrow interface. In those cases, you can still import via the
 :py:func:`~datafusion.context.SessionContext.from_polars` function.

python/datafusion/dataframe.py

Lines changed: 2 additions & 2 deletions
@@ -588,7 +588,7 @@ def tail(self, n: int = 5) -> DataFrame:
 def collect(self) -> list[pa.RecordBatch]:
 """Execute this :py:class:`DataFrame` and collect results into memory.

-Prior to calling ``collect``, modifying a DataFrme simply updates a plan
+Prior to calling ``collect``, modifying a DataFrame simply updates a plan
 (no actual computation is performed). Calling ``collect`` triggers the
 computation.

@@ -767,7 +767,7 @@ def explain(self, verbose: bool = False, analyze: bool = False) -> None:

 Args:
 verbose: If ``True``, more details will be included.
-analyze: If ``Tru`e``, the plan will run and metrics reported.
+analyze: If ``True``, the plan will run and metrics reported.
 """
 self.df.explain(verbose, analyze)

python/datafusion/functions.py

Lines changed: 6 additions & 6 deletions
@@ -1673,7 +1673,7 @@ def approx_percentile_cont(
 between two of the values.

 This function uses the [t-digest](https://arxiv.org/abs/1902.04023) algorithm to
-compute the percentil. You can limit the number of bins used in this algorithm by
+compute the percentile. You can limit the number of bins used in this algorithm by
 setting the ``num_centroids`` parameter.

 If using the builder functions described in ref:`_aggregation` this function ignores
@@ -2415,7 +2415,7 @@ def lead(
 Lead operation will return the argument that is in the next shift_offset-th row in
 the partition. For example ``lead(col("b"), shift_offset=3, default_value=5)`` will
 return the 3rd following value in column ``b``. At the end of the partition, where
-no futher values can be returned it will return the default value of 5.
+no further values can be returned it will return the default value of 5.

 Here is an example of both the ``lead`` and :py:func:`datafusion.functions.lag`
 functions on a simple DataFrame::
@@ -2469,7 +2469,7 @@ def lag(

 Lag operation will return the argument that is in the previous shift_offset-th row
 in the partition. For example ``lag(col("b"), shift_offset=3, default_value=5)``
-will return the 3rd previous value in column ``b``. At the beginnig of the
+will return the 3rd previous value in column ``b``. At the beginning of the
 partition, where no values can be returned it will return the default value of 5.

 Here is an example of both the ``lag`` and :py:func:`datafusion.functions.lead`
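
The docstring's own example falls outside this hunk. As a rough illustration of the semantics the `lead` and `lag` docstrings describe, here is a plain-Python sketch over a single partition. The two helpers below merely reuse the parameter names from the docstrings; they are hypothetical list-based stand-ins, not the DataFusion functions.

```python
def lead(values, shift_offset=1, default_value=None):
    """Value `shift_offset` rows ahead, or the default past the partition end."""
    n = len(values)
    return [
        values[i + shift_offset] if i + shift_offset < n else default_value
        for i in range(n)
    ]


def lag(values, shift_offset=1, default_value=None):
    """Value `shift_offset` rows back, or the default before the partition start."""
    return [
        values[i - shift_offset] if i - shift_offset >= 0 else default_value
        for i in range(len(values))
    ]


b = [10, 20, 30, 40, 50]  # hypothetical column values
lead_b = lead(b, shift_offset=3, default_value=5)
lag_b = lag(b, shift_offset=3, default_value=5)
```

The default value fills the last three rows for `lead` and the first three rows for `lag`, mirroring the "end of the partition" and "beginning of the partition" wording in the two docstrings.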
@@ -2554,7 +2554,7 @@ def rank(

 Returns the rank based upon the window order. Consecutive equal values will receive
 the same rank, but the next different value will not be consecutive but rather the
-number of rows that preceed it plus one. This is similar to Olympic medals. If two
+number of rows that precede it plus one. This is similar to Olympic medals. If two
 people tie for gold, the next place is bronze. There would be no silver medal. Here
 is an example of a dataframe with a window ordered by descending ``points`` and the
 associated rank.
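
The "Olympic medal" rule in this docstring can be sketched in a few lines of plain Python. `olympic_rank` is a hypothetical helper written for this note; it assumes the input is already sorted by descending points, as in the docstring's description, and does not use DataFusion.

```python
def olympic_rank(points):
    """Rank rows of an already-sorted (descending) list, leaving gaps after ties."""
    ranks = []
    for i, value in enumerate(points):
        if i > 0 and value == points[i - 1]:
            # Tied with the previous row: share its rank.
            ranks.append(ranks[-1])
        else:
            # Number of rows that precede this one, plus one.
            ranks.append(i + 1)
    return ranks


points = [100, 100, 50, 25]  # hypothetical scores
ranks = olympic_rank(points)
```

Two rows tie for first place, so the next distinct value is ranked third and rank 2 is never assigned.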
@@ -2667,7 +2667,7 @@ def cume_dist(
 """Create a cumulative distribution window function.

 This window function is similar to :py:func:`rank` except that the returned values
-are the ratio of the row number to the total numebr of rows. Here is an example of a
+are the ratio of the row number to the total number of rows. Here is an example of a
 dataframe with a window ordered by descending ``points`` and the associated
 cumulative distribution::

@@ -2748,7 +2748,7 @@ def string_agg(
 """Concatenates the input strings.

 This aggregate function will concatenate input strings, ignoring null values, and
-seperating them with the specified delimiter. Non-string values will be converted to
+separating them with the specified delimiter. Non-string values will be converted to
 their string equivalents.

 If using the builder functions described in ref:`_aggregation` this function ignores

python/datafusion/user_defined.py

Lines changed: 1 addition & 1 deletion
@@ -528,7 +528,7 @@ def memoize(self) -> None:
 """

 def get_range(self, idx: int, num_rows: int) -> tuple[int, int]: # noqa: ARG002
-"""Return the range for the window fuction.
+"""Return the range for the window function.

 If `uses_window_frame` flag is `false`. This method is used to
 calculate required range for the window function during

python/tests/test_dataframe.py

Lines changed: 4 additions & 4 deletions
@@ -734,8 +734,8 @@ def test_window_frame_defaults_match_postgres(partitioned_df):

 assert df_1.sort(col_a).to_pydict() == expected

-# When order is not set, the default frame should be unounded preceeding to
-# unbounded following. When order is set, the default frame is unbounded preceeding
+# When order is not set, the default frame should be unbounded preceding to
+# unbounded following. When order is set, the default frame is unbounded preceding
 # to current row.
 no_order = f.avg(col_a).over(Window()).alias("over_no_order")
 with_order = f.avg(col_a).over(Window(order_by=[col_a])).alias("over_with_order")
@@ -1009,14 +1009,14 @@ def test_html_formatter_repr_rows(df, clean_formatter_state):
 html_output = df._repr_html_()

 tr_count = count_table_rows(html_output)
-# Tabe should have header row (1) + 2 data rows = 3 rows
+# Table should have header row (1) + 2 data rows = 3 rows
 assert tr_count == 3

 configure_formatter(min_rows_display=2, repr_rows=3)
 html_output = df._repr_html_()

 tr_count = count_table_rows(html_output)
-# Tabe should have header row (1) + 3 data rows = 4 rows
+# Table should have header row (1) + 3 data rows = 4 rows
 assert tr_count == 4
