Commit c816d84

docs: Add docstring code samples for Series.apply and DataFrame.map (#185)
* docs: Add docstring code samples for `Series.apply` and `DataFrame.map`
* improved docstring with concurrency-safe code samples
* Correct indentation of text in code samples
1 parent 37914a4 commit c816d84

2 files changed: +123 -7 lines changed

third_party/bigframes_vendored/pandas/core/frame.py

Lines changed: 61 additions & 1 deletion
@@ -2159,8 +2159,68 @@ def map(self, func, na_action: Optional[str] = None) -> DataFrame:
         In pandas 2.1.0, DataFrame.applymap is deprecated and renamed to
         DataFrame.map.
 
+        **Examples:**
+
+            >>> import bigframes.pandas as bpd
+            >>> bpd.options.display.progress_bar = None
+
+        Let's use ``reuse=False`` flag to make sure a new ``remote_function``
+        is created every time we run the following code, but you can skip it
+        to potentially reuse a previously deployed ``remote_function`` from
+        the same user defined function.
+
+            >>> @bpd.remote_function([int], float, reuse=False)
+            ... def minutes_to_hours(x):
+            ...     return x/60
+
+            >>> df_minutes = bpd.DataFrame(
+            ...     {"system_minutes" : [0, 30, 60, 90, 120],
+            ...      "user_minutes" : [0, 15, 75, 90, 6]})
+            >>> df_minutes
+               system_minutes  user_minutes
+            0               0             0
+            1              30            15
+            2              60            75
+            3              90            90
+            4             120             6
+            <BLANKLINE>
+            [5 rows x 2 columns]
+
+            >>> df_hours = df_minutes.map(minutes_to_hours)
+            >>> df_hours
+               system_minutes  user_minutes
+            0             0.0           0.0
+            1             0.5          0.25
+            2             1.0          1.25
+            3             1.5           1.5
+            4             2.0           0.1
+            <BLANKLINE>
+            [5 rows x 2 columns]
+
+        If there are ``NA``/``None`` values in the data, you can ignore
+        applying the remote function on such values by specifying
+        ``na_action='ignore'``.
+
+            >>> df_minutes = bpd.DataFrame(
+            ...     {
+            ...         "system_minutes" : [0, 30, 60, None, 90, 120, bpd.NA],
+            ...         "user_minutes" : [0, 15, 75, 90, 6, None, bpd.NA]
+            ...     }, dtype="Int64")
+            >>> df_hours = df_minutes.map(minutes_to_hours, na_action='ignore')
+            >>> df_hours
+               system_minutes  user_minutes
+            0             0.0           0.0
+            1             0.5          0.25
+            2             1.0          1.25
+            3            <NA>           1.5
+            4             1.5           0.1
+            5             2.0          <NA>
+            6            <NA>          <NA>
+            <BLANKLINE>
+            [7 rows x 2 columns]
+
         Args:
-            func:
+            func (function):
                 Python function wrapped by ``remote_function`` decorator,
                 returns a single value from a single value.
             na_action (Optional[str], default None):
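
Note on the `na_action='ignore'` sample above: the behavior it documents can be sketched locally without deploying a remote function. The snippet below is only an illustration in plain Python. `map_elementwise` and `is_missing` are hypothetical helpers, not part of bigframes or pandas, and `None` stands in for `bpd.NA`.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (a simplification of NA handling)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def map_elementwise(columns, func, na_action=None):
    """Apply func to every scalar in a dict of columns.

    Hypothetical helper for illustration only; it mimics the documented
    contract: with na_action='ignore', missing values are passed through
    without calling func.
    """
    if na_action not in (None, "ignore"):
        raise ValueError("na_action must be None or 'ignore'")
    result = {}
    for name, values in columns.items():
        result[name] = [
            value if (na_action == "ignore" and is_missing(value)) else func(value)
            for value in values
        ]
    return result


def minutes_to_hours(x):
    # Same body as the docstring sample, without the remote_function decorator.
    return x / 60


minutes = {
    "system_minutes": [0, 30, 60, None, 90, 120],
    "user_minutes": [0, 15, 75, 90, 6, None],
}
hours = map_elementwise(minutes, minutes_to_hours, na_action="ignore")
```

In this local sketch, dropping `na_action="ignore"` makes `minutes_to_hours(None)` raise a `TypeError`, which is the situation the flag is meant to avoid.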

third_party/bigframes_vendored/pandas/core/series.py

Lines changed: 62 additions & 6 deletions
@@ -728,18 +728,74 @@ def apply(
         func,
     ) -> DataFrame | Series:
         """
-        Invoke function on values of Series.
+        Invoke function on values of a Series.
 
-        Can be ufunc (a NumPy function that applies to the entire Series)
-        or a Python function that only works on single values.
+        **Examples:**
+
+            >>> import bigframes.pandas as bpd
+            >>> bpd.options.display.progress_bar = None
+
+        Let's use ``reuse=False`` flag to make sure a new ``remote_function``
+        is created every time we run the following code, but you can skip it
+        to potentially reuse a previously deployed ``remote_function`` from
+        the same user defined function.
+
+            >>> @bpd.remote_function([int], float, reuse=False)
+            ... def minutes_to_hours(x):
+            ...     return x/60
+
+            >>> minutes = bpd.Series([0, 30, 60, 90, 120])
+            >>> minutes
+            0      0
+            1     30
+            2     60
+            3     90
+            4    120
+            dtype: Int64
+
+            >>> hours = minutes.apply(minutes_to_hours)
+            >>> hours
+            0    0.0
+            1    0.5
+            2    1.0
+            3    1.5
+            4    2.0
+            dtype: Float64
+
+        You could turn a user defined function with external package
+        dependencies into a BigQuery DataFrames remote function. You would
+        provide the names of the packages via ``packages`` param.
+
+            >>> @bpd.remote_function(
+            ...     [str],
+            ...     str,
+            ...     reuse=False,
+            ...     packages=["cryptography"],
+            ... )
+            ... def get_hash(input):
+            ...     from cryptography.fernet import Fernet
+            ...
+            ...     # handle missing value
+            ...     if input is None:
+            ...         input = ""
+            ...
+            ...     key = Fernet.generate_key()
+            ...     f = Fernet(key)
+            ...     return f.encrypt(input.encode()).decode()
+
+            >>> names = bpd.Series(["Alice", "Bob"])
+            >>> hashes = names.apply(get_hash)
 
         Args:
             func (function):
-                Python function or NumPy ufunc to apply.
+                BigFrames DataFrames ``remote_function`` to apply. The function
+                should take a scalar and return a scalar. It will be applied to
+                every element in the ``Series``.
 
         Returns:
-            bigframes.series.Series: If func returns a Series object the result
-            will be a DataFrame.
+            bigframes.series.Series: A new Series with values representing the
+            return value of the ``func`` applied to each element of the original
+            Series.
         """
         raise NotImplementedError(constants.ABSTRACT_METHOD_ERROR_MESSAGE)
 
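
Note on the `get_hash` sample above: it generates a fresh Fernet key on every call and returns an encrypted token, so the same input yields a different string each time, and the result is not a hash in the strict sense. The sample does not show its output, which is consistent with that. If a repeatable digest were wanted, one possible sketch uses only the standard library. `stable_digest` is a hypothetical name, not part of the commit or of bigframes, and it is shown as a plain function without the `remote_function` decorator.

```python
import hashlib


def stable_digest(text):
    """Return a deterministic SHA-256 hex digest of a string.

    Hypothetical alternative to the get_hash sample; missing values are
    treated as the empty string, as in the original.
    """
    if text is None:
        text = ""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


names = ["Alice", "Bob", None]
digests = [stable_digest(name) for name in names]
```

Because the digest depends only on the input, a doctest could show its output, which the Fernet-based sample cannot do.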
