Skip to content

Move from keyczar to cryptography library for symmetric encryption / decryption #4165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Jun 13, 2018

Conversation

Kami
Copy link
Member

@Kami Kami commented Jun 6, 2018

This pull request moves our code for handling symmetric AES encryption and decryption from keyczar to cryptography library.

The change is fully backward and forward compatible as it's an implementation detail so it should in no way affect end users (and they shouldn't notice it either).

Background

When python-keyczar library was released, it was one of the best and most user-friendly crypto library with sane and safe defaults in the Python world.

Over the years the crypto world moved on (one of the notable moves forward was development of the cryptography library which was meant to be a more portable and safer / saner higher level replacement for PyCrypto, M2Crypto and other crypto libraries in the Python world).

Python-keyczar is still a relatively safe and good choice in the Python world, but one of the biggest problem is that it's not actively maintained / developed anymore and it doesn't support Python 3.

There are also some minor weakness in keyczar library, but nothing really major.

For us, the main reason for moving away from keyczar to cryptography is Python 3 support and active development and maintenance of cryptography library.

Implementation

The main goal behind this change was to make our crypto code work under Python 3 and to make it fully backward compatible aka don't require any user interaction / migration.

To be able to do that, I needed to re-implement keyczar AES Encrypt() and Decrypt() methods, but use the primitives from the cryptography library.

I'm never a fan or proponent of "reimplementing" crypto stuff, but in this case it was mostly limited to simple stuff such as safely generating IV string (using safe and recommended os.urandom), generating HMAC signature for the message and serializing the encrypted message (and vice-versa for decryption).

Even though that code is relatively simple and actual hard part such as encryption + signing is outsourced to cryptography, please perform a very thorough review / audit of the code to make sure I'm not doing anything stupid / unsafe.

In addition to that, I also needed to implement the code which knows how to read and handle JSON key files generated by keyczar which contain AES key and HMAC key.

While working on that, I also made a small change / improvement - I changed the default size for the AES and HMAC keys we generate from 128 to 256 bits.

Future Improvements

As mentioned above, re-implementing those two methods is not ideal.

In the future when we have a plan for a seamless migration in place, we should move to Fernet recipe in the cryptography library for handling symmetric encryption and decryption (https://cryptography.io/en/latest/fernet/).

Underneath, that recipe actually implements quite a similar algorithm to one used by keyczar (AES in CBC mode with SHA1 HMAC in keyczar case and AES in CBC mode with SHA256 HMAC in cryptography case), but it's better audited and maintained than our code (and that's always important when dealing with crypto / security sensitive code).

Keep in mind that the "seamless" migration in the future will at the very least require generating new secret key which will be used by cryptography fernet recipe.

With that change, we now have 90% or so of our unit tests running and passing under Python 3.6.

Resolves #3974

Kami added 16 commits June 5, 2018 15:11
functions which utilize crypto primitives from cryptography library.

We are moving from keyczar to cryptography, because keyczar doesn't
support Python 3, but cryptography does.

NOTE: Those functions utilize cryptography library primitives, but they
follow the same approach as keyczar Encrypt() and Decrypt() methods so
they are fully compatible with keyczar key files and keyczar encrypted /
decrypted data.
primitives to safely generate a new random symmetric AES and HMAC key
in keyczar compatible format.
@Kami Kami added the python3 label Jun 6, 2018
@Kami Kami added this to the 2.8.0 milestone Jun 6, 2018
@@ -25,3 +25,4 @@ psutil==5.4.5
webtest==2.0.25
rstcheck>=3.3.0,<3.4
pyrabbit
python-keyczar
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StackStorm doesn't depend on python-keyczar anymore, but we still needs it for tests where we check our code and output is fully compatible with keyczar one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add that in as a comment in the file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will do.

I thought I already added it, but probably to the other file.

Now that our crypto code is Python 3 compatible, we can also run those
tests under Python 3.
@Kami Kami requested review from lakshmi-kannan, blag and bigmstone June 7, 2018 07:34
@Kami Kami changed the title [WIP] Move from keyczar to cryptography library for symmetric encryption / decryption Move from keyczar to cryptography library for symmetric encryption / decryption Jun 7, 2018
@lakshmi-kannan
Copy link
Contributor

Whoever wrote the original code with python-keyczar had right abstractions that changing libraries was almost easy :D. I like the changes.

Copy link
Contributor

@bigmstone bigmstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Especially added assertions that key is large enough. I would almost prefer this be its own library w/ the work you put into building backwards compatibility for keyczar JSON compatibility and reading that into AESKey. 👍

@Kami
Copy link
Member Author

Kami commented Jun 11, 2018

@bigmstone

I would almost prefer this be its own library w/ the work you put into building backwards compatibility for keyczar JSON compatibility and reading that into AESKey. +1

Yeah, I was actually thinking of making a new package, cryptography-keyczar-adapter or something along those lines.

It's worth nothing that keyczar offers a lot more functionality and APIs which we don't use and as such, it's not implemented here (asymmetric / public crypto, key rotation, etc). So the adapter is / would be specifically for symmetric AES crypto and reading keyczar compatible JSON key files :)

:param key_type: Type of crypto key.
:type key_type: :class:`keyczar.keys.KeyType`
def __init__(self, aes_key_string, hmac_key_string, hmac_key_size, mode='CBC', size=128):
if mode not in 'CBC':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be:

    if mode not in ['CBC']:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, good catch 👍

return AESKey(aes_key_string=aes_key_string, hmac_key_string=hmac_key_string,
hmac_key_size=key_size, mode='CBC', size=key_size)

def __json__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not a good idea at this point, since special methods starting and ending with double underscores are reserved for the Python interpreter (read: their names get mangled by CPython).

Is this for backwards compatibility with keyczar? That's really the only good reason I can think to use a dunder for this method.

I would rename this to something like to_json or _to_json if it needs to be private.

Copy link
Member Author

@Kami Kami Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree - I will change it to to_json (__json__ is kinda a left over from the simplejson library and for consistency since we use it in some other places).

'cryptography_symmetric_encrypt',
'cryptography_symmetric_decrypt',

# NOTE: Keyczar functions are here for testing reasons - they are only used by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to export these explicitly if you tweak everything that imports them to dig them out of the module:

from st2common.util.crypto import keyczar_symmetric_encrypt

keyczar_symmetric_encrypt(...)

can be rewritten as:

import st2common.util.crypto as crypto

crypto.keyczar_symmetric_encrypt(...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing each function separately is just a personal preference. I prefer it over module imports where module names are generic as this one.

As far as exporting all the public functions and using __all__ goes - that's just a "safety guard" and good practice I tend to follow. It makes it easier to see which public methods / classes a module exports and in case someone does "from foo import *", it will only include "public" methods. I know asterik imports are a bad idea and we should have a lint check for that, but still.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I like to import each function separately too. Although I think you can still import individual functions even if they aren't "exported" with __all__, so you may not even need to export those functions at all. Whatever you think is best. 😃

Copy link
Member Author

@Kami Kami Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, __all__ just affects asterisk import (*) import behavior :)

from st2common.util.crypto import symmetric_encrypt
from st2common.util.crypto import symmetric_decrypt
from st2common.util.crypto import keyczar_symmetric_decrypt
from st2common.util.crypto import keyczar_symmetric_encrypt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be rewritten to:

import st2common.util.crypto as crypto
keyczar_symmetric_encrypt = crypto.keyczar_symmetric_encrypt

so you don't have to export keyczar_symmetric_encrypt.

Same story with keyczar_symmetric_decrypt. See my companion comment in st2common/utils/crypto.py.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above - personal preference and I think it's easier to glance at the top of the module to see what functions / classes a particular module imports and uses.


:rtype: ``str``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would raise a warning here instead of an error - UNLESS you are maintaining backwards compatibility.

Some users may want to import old (<128 bit) keys.

Copy link
Member Author

@Kami Kami Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never supported smaller keys so I think there is no need for that (our tool always generated 128 bit keys and AES doesn't support smaller keys anyway).


:rtype: :class:`AESKey`
"""
if key_size < 128:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to consider putting the minimum AES key size into its own constant to give the reader an idea of where the number 128 came from. Right now it's just a magic number to anybody who doesn't keep up on crypto recommendations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I will make a constant.

NOTE: Header itself is unused, but it's added so the format is compatible with keyczar format.

"""
assert isinstance(encrypt_key, AESKey), 'encrypt_key needs to be AESKey class instance'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These asserts look okay, just be aware that running Python with optimizations turned on bypasses all assertions made with the assert keyword. This has caused security issues in the past (example: PySAML).

Assertions are meant to be used as debugging aids, not as a mechanism for error handling.

It sucks, but you may want to use plain if statements for all of your assertions:

if not isinstance(plaintext, (six.text_type, six.string_types, six.binary_types)):
    raise CustomErrorType(...)  # Not an AssertionError

More help: https://dbader.org/blog/python-assert-tutorial

Copy link
Member Author

@Kami Kami Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I used them for what they are intended - as debugging / developer aid aka to make sure programmer passes in a correct type (stuff which should get caught during tests and not during run time for which I would use exceptions).

For dynamic run time stuff I always use exceptions and not asserts.

My reasoning is that this code never gets a dynamic user data passed in and data is always passed in by a programmer so asserts make more sense since we are guarding against invalid use of the method by the programmer and not guarding a user potentially passing in invalid data.

@@ -57,28 +308,128 @@ def symmetric_encrypt(encrypt_key, plaintext):
'Initialization Vector' per run and the IV is part of the output.

:param encrypt_key: Symmetric AES key to use for encryption.
:type encrypt_key: :class:`keyczar.keys.AesKey`
:type encrypt_key: :class:`keyczar.keys.AESKey`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is close but not quite accurate. The class should be: st2common.utils.crypto.AESKey.

Copy link
Member Author

@Kami Kami Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed.

@blag
Copy link
Contributor

blag commented Jun 12, 2018

Definitely consider releasing it as its own package even if it only implements what we need from it. Other open source devs might be interested in it and implement what they need, to the point it becomes fairly complete. Heck, the cryptography developers themselves might be interested in absorbing it as an adapter.

I've appreciated keyczar but it's not really maintained anymore, so I welcome creating a backwards-compatible replacement/adapter for it.

@Kami Kami merged commit e3eef09 into master Jun 13, 2018
@Kami Kami deleted the move_from_keyczar_to_cryptography_py3 branch June 13, 2018 08:11
Kami added a commit that referenced this pull request Jun 13, 2018
Kami added a commit that referenced this pull request Jun 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants