Skip to content

Conversation

@TG1999
Copy link
Contributor

@TG1999 TG1999 commented Sep 2, 2022

Use uuid instead of base36
Reference: #811

Signed-off-by: Tushar Goel tushar.goel.dav@gmail.com

@TG1999 TG1999 requested review from pombredanne and tdruez September 2, 2022 19:28
@TG1999 TG1999 linked an issue Sep 2, 2022 that may be closed by this pull request
3 tasks
Copy link
Contributor

@tdruez tdruez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using 36 chars UUID seems overkill, do we really have a need for that many unique possibilities?
Using the first 18 chars (or even less) would be plenty imho.

@TG1999 TG1999 force-pushed the vcid branch 4 times, most recently from 5d968e5 to b82fdc4 Compare September 5, 2022 12:44
@mjherzog
Copy link
Member

mjherzog commented Sep 5, 2022

I suspect that with 36 (or even 18) we will end up with a situation similar to Docker image layers where you will need a truncated VCID for reports and other human-oriented purposes. This may be something we want to do from the start given that something like 10 characters will probably be unique within a particular set of VCIDs for a project or codebase.

@TG1999
Copy link
Contributor Author

TG1999 commented Sep 5, 2022

I have truncated the uuid length to 10, this is how a sample VCID looks like now VCID-2f5e133a-b

@tdruez
Copy link
Contributor

tdruez commented Sep 6, 2022

VCID looks like now VCID-2f5e133a-b

In that example, the last dash is not very useful. I would suggest to split the chars towards the middle for better readability.

This worth more discussion in any cases before implementation, using uuid4 maybe not be the best for our need.

@TG1999 TG1999 force-pushed the vcid branch 2 times, most recently from d53a1c1 to bef49a2 Compare September 6, 2022 13:18
@TG1999
Copy link
Contributor Author

TG1999 commented Sep 6, 2022

@tdruez I experimented a bit with shortUUID, and formed an ID like this VCID-JebnC-khqZw, does this look good ?

@tdruez
Copy link
Contributor

tdruez commented Sep 6, 2022

formed an ID like this VCID-JebnC-khqZw, does this look good ?

@TG1999 Yes, I think it looks much better than a uuid4. Do we really need a new dependency to generate this though?
It feels like this could probably done in a couple lines of Python.
Also, did you include the integers to raise the number of possibility per characters?

@pombredanne
Copy link
Member

Here is what I suggest after a discussion in our weekly call. Reuse something such as "VCID-25fx-gxgp-m24r" where we use three groups of 4 letters, in the same style the GitHub GHSA vulnerability ids.
Basically we could quite likely encode 60 to 64 bits of ids in 12 characters ( 96 bits ) alright. For instance 60 bits could be used with base32 (5 bits encoded per character). We likely can use base36 or base32 or else. Being case sensitive is better.
May be using a base32 and avoiding confusable characters such as : 0 (zero) and O (big and small O) and i (small i and big I) and l (small l and big L)... all the ASCII letters and numbers (36 signs) minus these 4 is 32 characters

"VCID-25fx-gxgp-m24r" encoded using 60 bits in three groups of base32 4-chars strings.

@pombredanne
Copy link
Member

Here is a snippet based on @TG1999 code:

>>> import uuid, hashlib, base64
>>> uid = base64.b32encode(hashlib.sha1(uuid.uuid4().bytes).digest())[:12].decode('utf-8').lower()
>>> f'VCID-{uid[:4]}-{uid[4:8]}-{uid[8:12]}'
'VCID-x2at-frm3-gg6j'
>>> uid
'x2atfrm3gg6j'

This could be something such as:

from hashlib import sha256
from uuid import uuid4


def build_vcid(prefix='VCID'):
    """
    Return a new VulnerableCode VCID unique identifier string using the ``prefix``.

    For example::
    >>> import re
    >>> vcid = build_vcid()
    >>> # VCID-6npv-94wz-hhuq
    >>> assert re.match('VCID(-[a-z1-9]{4}){3}', vcid), vcid
    """
    # we keep only 64 bits (e.g. 8 bytes)
    uid = sha256(uuid4().bytes).digest()[:8]
    # we keep only 12 encoded bytes (which corresponds to 60 bits)
    uid = base32_custom(uid)[:12].decode('utf-8').lower()
    return f'{prefix}-{uid[:4]}-{uid[4:8]}-{uid[8:12]}'


_base32_alphabet = b'ABCDEFGHJKMNPQRSTUVWXYZ123456789'
_base32_table = None


def base32_custom(btes):
    """
    Encode the ``btes`` bytes object using a Base32 encoding using a custom
    alphabet and return a bytes object.

    Code copied and modified from the Python Standard Library:
    base64.b32encode function

    SPDX-License-Identifier: Python-2.0
    Copyright (c) The Python Software Foundation

    For example::
    >>> assert base32_custom(b'abcd') == b'ABTZE25E', base32_custom(b'abcd')
    >>> assert base32_custom(b'abcde00000xxxxxPPPPP') == b'PFUGG3DFGA2DAPBTSB6HT8D2MBJFAXCT'
    """
    global _base32_table
    # Delay the initialization of the table to not waste memory
    # if the function is never called
    if _base32_table is None:
        b32tab = [bytes((i,)) for i in _base32_alphabet]
        _base32_table = [a + b for a in b32tab for b in b32tab]

    encoded = bytearray()
    from_bytes = int.from_bytes

    for i in range(0, len(btes), 5):
        c = from_bytes(btes[i: i + 5], 'big')
        encoded += (
            _base32_table[c >> 30] +  # bits 1 - 10
            _base32_table[(c >> 20) & 0x3ff] +  # bits 11 - 20
            _base32_table[(c >> 10) & 0x3ff] +  # bits 21 - 30
            _base32_table[c & 0x3ff]  # bits 31 - 40
        )
    return bytes(encoded)

@TG1999 TG1999 force-pushed the vcid branch 2 times, most recently from 683233c to 313256b Compare September 7, 2022 10:07
Use uuid instead of base36
Reference: aboutcode-org#811

Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>
('vulnerabilities', '0022_alter_vulnerability_vulnerability_id'),
]

def save_vulnerability_id(apps, schema_editor):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TG1999 would using a bulk_update() be a better approach?
@tdruez ^ ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, I've never used it, so you'll have to give it a try and let me know :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks! let me give it a shot with and without

@pombredanne pombredanne merged commit 2a70836 into aboutcode-org:main Sep 8, 2022
@pombredanne
Copy link
Member

I used bulk_update for good results in 7c708e8 and merged it all. Thank you ++ @TG1999

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Revisit VULCOID ...

4 participants