1 | | -# Load JSON Lines files |
| 1 | +# Load JSON Lines files |
2 | 2 |
3 | 3 | Load JSON Lines **(jsonl)** files incrementally, supporting both uncompressed and compressed formats, handling broken |
4 | 4 | lines, and allowing custom deserialization and opener callbacks. |
@@ -45,7 +45,7 @@ Check [note](#note-compression) for more details |
45 | 45 |
46 | 46 | import jsonl |
47 | 47 |
48 | | -path = "file.jsonl.gz" # gzip compressed file, but it can be ".bz2" or ".xz" |
| 48 | +path = "file.jsonl.gz" # gzip compressed file, but it can be ".bz2" or ".xz" |
49 | 49 |
50 | 50 | # Example data to save in the file |
51 | 51 | data = [ |
@@ -90,7 +90,7 @@ with open(path) as fp: |
90 | 90 |
91 | 91 | #### Load from a URL |
92 | 92 |
93 | | -You can load a JSON Lines directly from a URL incrementally, if needed you can also create custom |
| 93 | +You can load a JSON Lines file directly from a URL incrementally. If needed, you can also create custom |
94 | 94 | requests using `urllib.request.Request`. |
95 | 95 |
96 | 96 | ```python |
@@ -140,6 +140,8 @@ WARNING:root:Broken line at 2: Expecting ',' delimiter: line 2 column 1 (char 28 |
140 | 140 |
141 | 141 | #### Load a file using a custom deserialization |
142 | 142 |
| 143 | +##### Passing a `json_loads` function |
| 144 | + |
143 | 145 | The `json_loads` parameter allows for custom deserialization and must take a JSON-formatted |
144 | 146 | string as input and return a Python object. |
145 | 147 |
@@ -185,6 +187,37 @@ print(tuple(iterator1)) |
185 | 187 | print(tuple(iterator2)) |
186 | 188 | ``` |
187 | 189 |
| 190 | +##### Passing additional keyword arguments |
| 191 | + |
| 192 | +The `jsonl.load` function also accepts additional keyword arguments, which are passed to the underlying |
| 193 | +JSON deserialization function (by default, `json.loads`). This is useful when you want to customize |
| 194 | +deserialization behavior. |
| 195 | + |
| 196 | +Here’s an example using the built-in `json` module to parse float values as `decimal.Decimal`: |
| 197 | + |
| 198 | +```python |
| 199 | +# -*- coding: utf-8 -*- |
| 200 | + |
| 201 | +import decimal |
| 202 | + |
| 203 | +import jsonl |
| 204 | + |
| 205 | +path = "file.jsonl" |
| 206 | + |
| 207 | +# Example data to save in the file with `float` values |
| 208 | +data = [ |
| 209 | + {"name": "Gilbert", "wins_avg": 2.5}, |
| 210 | + {"name": "May", "wins_avg": 3.75}, |
| 211 | +] |
| 212 | + |
| 213 | +# Save the data to the jsonl file |
| 214 | +jsonl.dump(data, path) |
| 215 | + |
| 216 | +# Load the data back, parsing `float` values as `decimal.Decimal` using the `parse_float` keyword argument |
| 217 | +iterator = jsonl.load(path, parse_float=decimal.Decimal) |
| 218 | +print(tuple(iterator)) |
| 219 | +``` |
| 220 | + |
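To see what the forwarded keyword argument does for each line, the sketch below calls the standard library's `json.loads` directly on a single JSON string. It assumes, as the text above states, that `jsonl.load` hands extra keyword arguments to `json.loads` by default; the sample line and variable names are illustrative only.

```python
import decimal
import json

# One JSON Lines record, as it would appear on a single line of the file
line = '{"name": "Gilbert", "wins_avg": 2.5}'

# Default behavior: JSON numbers with a fractional part become `float`
default_obj = json.loads(line)

# With `parse_float`, the same number is built by calling `decimal.Decimal`
# on the number's original text
decimal_obj = json.loads(line, parse_float=decimal.Decimal)

print(type(default_obj["wins_avg"]).__name__)  # float
print(repr(decimal_obj["wins_avg"]))  # Decimal('2.5')
```

Other `json.loads` keyword arguments, such as `parse_int` or `object_hook`, should behave the same way if they are forwarded unchanged.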
188 | 221 | #### Load a file using a custom opener |
189 | 222 |
190 | 223 | The `opener` parameter allows loading files from custom sources, such as a ZIP archive. Here’s how to use it: |