Clarify encoding of colon between scheme and type #361
base: master

@@ -55,7 +55,7 @@ sometimes look like a ``host`` but its interpretation is specific to a ``type``.

Some ``purl`` examples
~~~~~~~~~~~~~~~~~~~~~~

::

@@ -72,7 +72,7 @@ Some ``purl`` examples

A ``purl`` is a URL
~~~~~~~~~~~~~~~~~~~

- A ``purl`` is a valid URL and URI that conforms to the URL definitions or
  specifications at:

@@ -110,7 +110,7 @@ A ``purl`` is a URL

Rules for each ``purl`` component
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A ``purl`` string is an ASCII URL string composed of seven components.

@@ -122,27 +122,11 @@ The rules for each component are:

- **scheme**:

  - The ``scheme`` is a constant with the value "pkg".
  - The ``scheme`` and ``type`` MUST be separated by a colon ':'.
  - ``purl`` parsers MUST accept URLs in which the ``scheme`` and colon ':' are
    followed by one or more slash '/' characters, such as 'pkg://', and MUST
    ignore -- i.e., normalize by removing -- all such '/' characters.

- **type**:

@@ -234,21 +218,24 @@ The rules for each component are:

Character encoding
~~~~~~~~~~~~~~~~~~

For clarity and simplicity, a ``purl`` is always an ASCII string. To ensure that
there is no ambiguity when parsing a ``purl``, unless otherwise provided in
this specification, separator characters and non-ASCII characters MUST be
encoded as UTF-8 and then percent-encoded as defined at::

    https://en.wikipedia.org/wiki/Percent-encoding

Review comment: We need to refactor this section separately. @mprpic volunteered to make this PR.

Use these rules for percent-encoding and decoding ``purl`` components:

- the ``type`` must NOT be encoded and must NOT contain separators

- the '#', '?', '@' and ':' characters MUST remain unencoded and displayed
  as-is when used as separators. They may need to be encoded elsewhere.

- the colon ':' separator between ``scheme`` and ``type`` MUST remain unencoded.
  For example, in the PURL snippet ``pkg:npm`` the colon ':' MUST remain
  unencoded and displayed as-is, i.e., ``pkg:npm``, and the PURL snippet
  ``pkg%3Anpm`` is invalid.

Review comment: @gernot-h @pombredanne Consider adding at the top of the file, perhaps as a new one-line paragraph following the current first paragraph, something along the lines of the following:

Or perhaps a slight modification to the example provided by RFC 2119:

(Note that the core spec currently contains a great deal of language that will need to be modified to implement RFC 2119/8174.)

- the '/' used as ``type``/``namespace``/``name`` and ``subpath`` segments separator
  does not need to and must NOT be percent-encoded. It is unambiguous unencoded

@@ -259,7 +246,7 @@ Use these rules for percent-encoding and decoding ``purl`` components:

- the '=' ``qualifiers`` key/value separator must NOT be encoded
- the '#' ``subpath`` separator must be encoded as ``%23`` elsewhere

- All non-ASCII characters MUST be encoded as UTF-8 and then percent-encoded.

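
The rules above can be illustrated with a short Python sketch. It is an example under stated assumptions, not a reference implementation: the helper names ``encode_segment``, ``encode_namespace`` and ``decode_segment`` are hypothetical, and the sketch relies on the standard library's ``urllib.parse.quote``, which encodes a ``str`` as UTF-8 before percent-encoding it. It conservatively encodes every character of a segment except ASCII letters, digits and ``_.-~``, keeps the '/' between ``namespace`` segments unencoded, and is not meant to be applied to the ``type``.

.. code-block:: python

    from urllib.parse import quote, unquote


    def encode_segment(value: str) -> str:
        """Percent-encode one component segment (hypothetical helper).

        quote() encodes the str as UTF-8, then percent-encodes every byte
        except ASCII letters, digits and '_.-~'. With safe="" the characters
        '#', '?', '@', ':' and '/' inside a segment are encoded as well.
        Never call this on the type, which must not be encoded.
        """
        return quote(value, safe="")


    def encode_namespace(namespace: str) -> str:
        """Encode each '/'-separated segment; the '/' separators stay unencoded."""
        return "/".join(encode_segment(segment) for segment in namespace.split("/"))


    def decode_segment(value: str) -> str:
        """Percent-decode a segment and interpret the resulting bytes as UTF-8."""
        return unquote(value)


    print(encode_segment("café"))             # caf%C3%A9
    print(encode_segment("a#b"))              # a%23b
    print(encode_namespace("@angular/core"))  # %40angular/core

Encoding more than strictly necessary keeps the sketch simple; the paragraph that follows states that percent-encoding components is acceptable except for the ``type``.
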
It is OK to percent-encode ``purl`` components otherwise except for the ``type``.
Parsers and builders must always percent-decode and percent-encode ``purl``

@@ -268,7 +255,7 @@ build" sections.

How to build ``purl`` string from its components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Building a ``purl`` ASCII string works from left to right, from ``type`` to
``subpath``.

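
A minimal Python sketch of this left-to-right order follows, assuming a hypothetical function named ``build_purl``. It covers only ``scheme``, ``type``, ``namespace``, ``name`` and ``version``; ``qualifiers`` and ``subpath`` handling is deliberately left out. It writes the constant ``pkg`` followed directly by an unencoded ':' with no '/' before the ``type``, lowercases the ``type`` (an assumption mirroring the parsing rule that lowercases it), and percent-encodes each ``namespace`` segment, the ``name`` and the ``version`` with ``urllib.parse.quote``.

.. code-block:: python

    from urllib.parse import quote


    def build_purl(type_: str, name: str, namespace: str = "", version: str = "") -> str:
        """Build a purl string left to right (simplified, hypothetical helper)."""
        # Scheme: the constant "pkg" followed directly by ':' and no '/' characters.
        parts = ["pkg:", type_.lower()]  # the type is lowercased, never percent-encoded
        if namespace:
            # Encode each non-empty segment but keep the '/' separators unencoded.
            segments = [quote(s, safe="") for s in namespace.split("/") if s]
            parts.append("/" + "/".join(segments))
        parts.append("/" + quote(name, safe=""))
        if version:
            parts.append("@" + quote(version, safe=""))
        return "".join(parts)


    print(build_purl("gem", "ruby-advisory-db-check", version="0.12.4"))
    # pkg:gem/ruby-advisory-db-check@0.12.4
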
@@ -343,7 +330,7 @@ To build a ``purl`` string from its components:

How to parse a ``purl`` string in its components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Parsing a ``purl`` ASCII string into its components works from right to left,
from ``subpath`` to ``type``.

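
The Python sketch below is one possible reading of that order, assuming a hypothetical function named ``parse_purl`` that returns a plain ``dict``. It splits off the ``subpath`` and the ``qualifiers`` from the right, takes the ``scheme`` and the ``type`` from the left (stripping every leading and trailing '/' so that 'pkg://' is accepted), and then splits the ``version``, ``name`` and ``namespace`` from the right. As a simplification, ``qualifiers`` and ``subpath`` are returned as raw strings instead of being split and normalized.

.. code-block:: python

    from urllib.parse import unquote


    def parse_purl(purl: str) -> dict:
        """Parse a purl string into its components (simplified, hypothetical helper)."""
        remainder = purl
        # Right to left: the subpath follows the rightmost '#'.
        subpath = ""
        if "#" in remainder:
            remainder, subpath = remainder.rsplit("#", 1)
        # The qualifiers follow the rightmost '?'.
        qualifiers = ""
        if "?" in remainder:
            remainder, qualifiers = remainder.rsplit("?", 1)
        # From the left: the scheme precedes the first ':' and must be "pkg".
        scheme, sep, remainder = remainder.partition(":")
        if not sep or scheme.lower() != "pkg":
            raise ValueError(f"expected 'pkg:' at the start of {purl!r}")
        # Drop every leading and trailing '/', so 'pkg://' is tolerated.
        remainder = remainder.strip("/")
        type_, sep, remainder = remainder.partition("/")
        if not sep or not type_:
            raise ValueError(f"missing type or name in {purl!r}")
        # Back to the right: the version follows the rightmost '@'.
        version = ""
        if "@" in remainder:
            remainder, version = remainder.rsplit("@", 1)
        # The last '/'-separated segment is the name; anything before it is the namespace.
        namespace, _, name = remainder.rpartition("/")
        return {
            "scheme": "pkg",
            "type": type_.lower(),
            "namespace": "/".join(unquote(s) for s in namespace.split("/") if s),
            "name": unquote(name),
            "version": unquote(version),
            "qualifiers": qualifiers,  # raw, not split into key=value pairs
            "subpath": subpath,  # raw, not normalized
        }

Taking the ``scheme`` and ``type`` from the left only after the ``subpath`` and ``qualifiers`` are removed means a ':' or '/' inside a qualifier value (for example in a URL) cannot be mistaken for a separator.
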
@@ -386,7 +373,8 @@ To parse a ``purl`` string in its components:

  - The left side lowercased is the ``scheme``
  - The right side is the ``remainder``

- Strip all leading and trailing '/' characters (e.g., '/', '//', '///' and
  so on) from the ``remainder``

  - Split this once from left on '/'
  - The left side lowercased is the ``type``

@@ -424,7 +412,7 @@ There are several known ``purl`` package type definitions tracked in the

separate `<PURL-TYPES.rst>`_ document.

Known ``qualifiers`` key/value pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note: Do not abuse ``qualifiers``: it can be tempting to use many qualifier
keys but their usage should be limited to the bare minimum for proper package

@@ -0,0 +1,93 @@

Frequently Asked Questions
==========================

The following FAQs are organized into:

- a "Components" section that includes each of the seven PURL components
  (``scheme``, ``type``, ``namespace``, ``name``, ``version``, ``qualifiers``
  and ``subpath``), and

- a "General" section containing a mix of questions and answers that don't fit
  neatly into a component-focused category.

If you have a question about the PURL specification and don't find an answer
below, you can `open an issue <https://github.com/package-url/purl-spec/issues/new?template=Blank+issue>`_.

Components
~~~~~~~~~~

Scheme
------

**QUESTION**: Can the ``scheme`` component be followed by a colon and two slashes, like a URI?

No. Since a Package-URL, or PURL, never contains a URL Authority, its ``scheme`` should not be suffixed with a double slash as in 'pkg://' and should use 'pkg:' instead. Otherwise this would be an invalid URI per RFC 3986 at https://tools.ietf.org/html/rfc3986#section-3.3::

    If a URI does not contain an authority component, then the path
    cannot begin with two slash characters ("//").

This rule applies to all slash '/' characters between the ``scheme``'s colon separator and the ``type`` component, e.g., ':/', '://', ':///' and so on.

In its canonical form, a PURL must not use any such ':/' ``scheme`` suffix and may only use ':' as a ``scheme`` suffix. This means that:

- PURL parsers must accept URLs such as 'pkg://' and must ignore -- i.e., normalize by deleting -- all such '/' characters.
- PURL builders should not create invalid URLs with one or more slash '/' characters between 'pkg:' and the ``type`` component.

For example, although these two PURLs are strictly equivalent, the first is in canonical form, while the second -- with a '//' between 'pkg:' and the ``type`` 'gem' -- is an acceptable PURL but is an invalid URI/URL per RFC 3986::

    pkg:gem/ruby-advisory-db-check@0.12.4

    pkg://gem/ruby-advisory-db-check@0.12.4
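
A minimal Python sketch of that normalization, assuming a hypothetical helper named ``canonicalize_scheme``: it checks that the string starts with the ``scheme`` and its ':' separator, then deletes any run of '/' characters that follows, which turns the second example above into the first.

.. code-block:: python

    def canonicalize_scheme(purl: str) -> str:
        """Remove any '/' characters between 'pkg:' and the type (hypothetical helper)."""
        scheme, sep, rest = purl.partition(":")
        if not sep or scheme.lower() != "pkg":
            raise ValueError(f"expected 'pkg:' at the start of {purl!r}")
        # Accept 'pkg:/', 'pkg://', 'pkg:///' and so on, and delete the slashes.
        return "pkg:" + rest.lstrip("/")


    print(canonicalize_scheme("pkg://gem/ruby-advisory-db-check@0.12.4"))
    # pkg:gem/ruby-advisory-db-check@0.12.4
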

**QUESTION**: Is the colon between ``scheme`` and ``type`` encoded? Can it be encoded? If yes, how?

There are two sections of the core specification that address this question:

- The "Rules for each ``purl`` component" section provides that "[t]he ``scheme`` and ``type`` MUST be separated by a colon ':'".
- The "Character encoding" section provides that

    the '#', '?', '@' and ':' characters MUST remain unencoded and displayed as-is when used as separators. . . . [T]he colon ':' separator between ``scheme`` and ``type`` MUST remain unencoded. For example, in the PURL snippet ``pkg:npm`` the colon ':' MUST remain unencoded and displayed as-is, i.e., ``pkg:npm``, and the PURL snippet ``pkg%3Anpm`` is invalid.

In this case, the colon ':' between ``scheme`` and ``type`` is being used as a separator, and consequently should be used as-is, never encoded and never requiring any decoding. Moreover, it should be a parsing error if the colon ':' does not come directly after 'pkg'. Tools are welcome to recover from this error to help with damaged PURLs, but that's not a requirement.
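
To make this concrete, here is a small Python sketch with the hypothetical name ``check_scheme_separator``. It treats anything other than a literal, unencoded 'pkg:' prefix as a parsing error. The optional ``recover`` flag is an assumption showing one way a tool might repair the specific damage of an encoded colon ('pkg%3A'); as stated above, such recovery is not required.

.. code-block:: python

    def check_scheme_separator(purl: str, recover: bool = False) -> str:
        """Require an unencoded ':' directly after 'pkg' (hypothetical helper).

        Returns the purl unchanged when it is valid. With recover=True, an
        encoded colon ('pkg%3A' or 'pkg%3a') is repaired instead of rejected.
        """
        if purl[:4].lower() == "pkg:":
            return purl
        if recover and purl[:6].lower() == "pkg%3a":
            # Decode only this one separator; the rest of the string is untouched.
            return "pkg:" + purl[6:]
        raise ValueError(f"expected an unencoded 'pkg:' at the start of {purl!r}")
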

----

Type
----

[to come]

----

Namespace
---------

[to come]

----

Name
----

[to come]

----

Version
-------

[to come]

----

Qualifiers
----------

[to come]

----

Subpath
-------

[to come]