-This tutorial is intended as a comparison between using **PyMongoArrow**,
-versus just PyMongo. The reader is assumed to be familiar with basic
-`PyMongo <https://pymongo.readthedocs.io/en/stable/tutorial.html>`_ and
-`MongoDB <https://docs.mongodb.com>`_ concepts.
-
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 1
+   :class: singlecol
+
+.. facet::
+   :name: genre
+   :values: reference
+
+.. meta::
+   :keywords: PyMongo, equivalence
+
+In this guide, you can learn about the differences between {+driver-short+} and the
+PyMongo driver. This guide assumes familiarity with basic :driver:`PyMongo
+</pymongo>` and `MongoDB <https://docs.mongodb.com>`__ concepts.
 
 Reading Data
 ------------
@@ -17,93 +28,98 @@ The most basic way to read data using PyMongo is:
 
 .. code-block:: python
 
-  coll = db.benchmark
-  f = list(coll.find({}, projection={"_id": 0}))
-  table = pyarrow.Table.from_pylist(f)
+   coll = db.benchmark
+   f = list(coll.find({}, projection={"_id": 0}))
+   table = pyarrow.Table.from_pylist(f)
 
-This works, but we have to exclude the "_id" field because otherwise we get this error::
+This works, but you have to exclude the ``_id`` field, otherwise you get the following error:
 
-  pyarrow.lib.ArrowInvalid: Could not convert ObjectId('642f2f4720d92a85355671b3') with type ObjectId: did not recognize Python value type when inferring an Arrow data type
+.. code-block:: python
 
-The workaround gets ugly (especially if you're using more than ObjectIds):
+   pyarrow.lib.ArrowInvalid: Could not convert ObjectId('642f2f4720d92a85355671b3') with type ObjectId: did not recognize Python value type when inferring an Arrow data type
 
-.. code-block:: pycon
+The following code example shows a workaround for the preceding error when
+using PyMongo:
 
-  >>> f = list(coll.find({}))
-  >>> for doc in f:
-  ...     doc["_id"] = str(doc["_id"])
-  ...
-  >>> table = pyarrow.Table.from_pylist(f)
-  >>> print(table)
-  pyarrow.Table
-  _id: string
-  x: int64
-  y: double
+.. code-block:: python
 
-Even though this avoids the error, an unfortunate drawback is that Arrow cannot identify that it is an ObjectId,
-as noted by the schema showing "_id" is a string.
-The primary benefit that PyMongoArrow gives is support for BSON types through Arrow/Pandas Extension Types. This allows you to avoid the ugly workaround:
+   >>> f = list(coll.find({}))
+   >>> for doc in f:
+   ...     doc["_id"] = str(doc["_id"])
+   ...
+   >>> table = pyarrow.Table.from_pylist(f)
+   >>> print(table)
+   pyarrow.Table
+   _id: string
+   x: int64
+   y: double
 
-.. code-block:: pycon
+Even though this avoids the error, a drawback is that Arrow can't identify that ``_id`` is an ObjectId,
+as noted by the schema showing ``_id`` as a string.
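
The removed text notes that the string-conversion workaround gets harder once documents contain more than ObjectIds. As a rough illustration of that point (not part of this change), the workaround could be generalized along the following lines. This is a minimal sketch under stated assumptions: ``FakeObjectId``, ``stringify_unsupported``, and the ``ARROW_FRIENDLY`` tuple are hypothetical names introduced here, and ``FakeObjectId`` only stands in for ``bson.ObjectId`` so that the snippet runs without PyMongo or a MongoDB server.

```python
from datetime import datetime

# Python types that this sketch treats as safe to hand to Arrow unchanged.
# The exact set is an assumption made for illustration only.
ARROW_FRIENDLY = (type(None), bool, int, float, str, bytes, datetime)


class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId so the sketch runs without PyMongo."""

    def __init__(self, hex_value):
        self.hex_value = hex_value

    def __str__(self):
        return self.hex_value


def stringify_unsupported(value):
    """Recursively replace values of unrecognized types with their str() form."""
    if isinstance(value, dict):
        return {key: stringify_unsupported(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify_unsupported(item) for item in value]
    if isinstance(value, ARROW_FRIENDLY):
        return value
    # Anything else (an ObjectId, for example) is flattened to a plain string.
    return str(value)


docs = [
    {"_id": FakeObjectId("642f2f4720d92a85355671b3"), "x": 1, "y": 2.5},
    {"_id": FakeObjectId("642f2f4720d92a85355671b4"), "x": 2, "y": 3.5},
]
cleaned = [stringify_unsupported(doc) for doc in docs]
print(cleaned[0])
```

With real data, a list like ``cleaned`` could then be passed to ``pyarrow.Table.from_pylist`` as in the example in the diff. The same drawback applies: every converted field arrives in Arrow as a plain string, so its original BSON type is lost.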