@@ -6,6 +6,12 @@ Supported Partition Attribute Types
66
77.. default-domain:: mongodb
88
9+ .. contents:: On this page
10+ :local:
11+ :backlinks: none
12+ :depth: 2
13+ :class: singlecol
14+
915The following table lists the supported data types for partition attributes and
1016an example :datalakeconf:`~databases.[n].collections.[n].dataSources.[n].path`
1117for each data type:
@@ -35,6 +41,10 @@ for each data type:
3541 In the above ``path`` examples, ``phone`` is interpreted
3642 as a string.
3743
44+ .. seealso::
45+
46+ :ref:`parse-null-values`
47+
3848 * - ``int``
3949 - Parses the filename as an integer.
4050 - filename: ``/zipcodes/90210.json``
@@ -44,6 +54,10 @@ for each data type:
4454 In the above example, ``zipcode`` is interpreted
4555 as an integer.
4656
57+ .. seealso::
58+
59+ :ref:`parse-padded-numeric-values`
60+
4761 * - ``isodate``
4862 - Parses the filename in `RFC 3339 <https://tools.ietf.org/html/rfc3339>`_
4963 format as an ISO-8601 format date.
@@ -89,6 +103,10 @@ for each data type:
89103 In the above example, ``startTimestamp`` is interpreted
90104 as a Unix timestamp in seconds.
91105
106+ .. seealso::
107+
108+ :ref:`parse-padded-numeric-values`
109+
92110 * - ``epoch_millis``
93111 - Parses the filename as a Unix timestamp in milliseconds.
94112 - filename: ``/metrics/1549046112000.json``
@@ -98,6 +116,10 @@ for each data type:
98116 In the above example, ``startTimestamp`` is interpreted
99117 as a Unix timestamp in milliseconds.
100118
119+ .. seealso::
120+
121+ :ref:`parse-padded-numeric-values`
122+
101123 * - ``objectid``
102124 - Parses the filename as an
103125 :manual:`ObjectId </reference/method/ObjectId/>`.
@@ -123,7 +145,9 @@ for each data type:
123145 {+adl+} supports the `Package Syntax
124146 <https://golang.org/pkg/regexp/syntax/>`__ for regular expressions
125147 in the path to the filename.
126-
148+
149+ .. _parse-null-values:
150+
127151Parsing Null Values from Filenames
128152----------------------------------
129153
@@ -142,3 +166,31 @@ attribute types except ``string``. For example, consider the following |s3|
142166For the path ``/records/{month string}/*``, {+dl+} does not add any
143167computed fields for the ``month`` attribute to documents generated
144168from the third record in the above store.
169+
170+ .. _parse-padded-numeric-values:
171+
172+ Parsing Padded Numbers from Filenames
173+ -------------------------------------
174+
175+ For numeric attribute types such as ``int``, ``epoch_millis``, and
176+ ``epoch_secs``, to have {+dl+} correctly parse values that are padded
177+ with leading zeros in the path to the file, use a regular expression
178+ to specify the number of digits in the padded value. For example,
179+ consider a |s3| store with the following files:
180+
181+ .. code-block:: text
182+ :copyable: false
183+
184+ |--users
185+ |--001.json
186+ |--002.json
187+ ...
188+
189+ The following ``path`` syntax uses a regular expression with the
190+ ``int`` attribute type to specify the number of digits in the
191+ filename:
192+
193+ .. code-block:: sh
194+ :copyable: false
195+
196+ /users/{user_id int:\\d{3}}
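
As a rough illustration of what the fixed-width pattern above does, the following minimal Python sketch matches exactly three digits and then converts them to an integer. It is not {+dl+} code: ``parse_user_id`` and ``PATH_PATTERN`` are hypothetical names, Python's ``re`` module stands in for the Go regular expression syntax the service accepts, and matching the ``.json`` extension explicitly is an assumption made only for this example.

```python
import re

# Hypothetical stand-in for the path template /users/{user_id int:\d{3}}.
# The named group plays the role of the user_id partition attribute.
# re.ASCII restricts \d to 0-9 (an assumption chosen for this sketch).
PATH_PATTERN = re.compile(r"/users/(?P<user_id>\d{3})\.json", re.ASCII)


def parse_user_id(path):
    """Return the integer user_id for a matching path, or None otherwise."""
    match = PATH_PATTERN.fullmatch(path)
    if match is None:
        # The filename does not have exactly three digits.
        return None
    # int() discards the leading zeros, so "001" becomes 1.
    return int(match.group("user_id"))


if __name__ == "__main__":
    for candidate in ("/users/001.json", "/users/042.json", "/users/7.json"):
        print(candidate, "->", parse_user_id(candidate))
```

In this sketch, a filename such as ``7.json`` does not match because the pattern requires exactly three digits, which mirrors the intent of stating the padded width in the path.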