FIPS support for hashlib #53462
Comments
(taking the liberty of adding gregory.p.smith to the "nosy" list; hope that's OK) This is a higher-level take on bpo-9146. Some versions of OpenSSL have a FIPS mode that can refuse the use of non-certified hashes. The idea is that FIPS mode should prevent the use of non-certified hashes for security uses. For example, MD5 shouldn't be used for signatures these days (see e.g. http://www.kb.cert.org/vuls/id/836068). However, there are legitimate non-security uses of these hashes. For example, one might use MD5 hashes of objects to place them in bins for later retrieval, purely as a speed optimization (e.g. files in directories on a filesystem). I'm working on a patch to hashlib which would better support this, but it involves an API expansion, and I wanted to sound things out first. The API idea is to introduce a new keyword argument, say "usedforsecurity", to hashlib.new() and to the named hashlib constructors, such as hashlib.md5(). This would default to True. If code is using these hashes in FIPS mode, the developer needs to override this with usedforsecurity=False to mark the callsite as a non-security-sensitive location. Internally, this would lead to the EVP_MD_CTX being initialized with EVP_MD_CTX_FLAG_NON_FIPS_ALLOW. This way, if you run unaudited code in an environment that cares about FIPS, the code will raise exceptions if it uses a non-valid hash, but during code audit the callsites can be marked clearly as "usedforsecurity=False", and be used as before. In non-FIPS environments, the flag would be ignored. Am I right in thinking that the _hashlib module should be treated as an implementation detail here? The entry points within _hashlib are likely to double, with a pair of pre-initialized contexts, one with the flag, one without. Does this sound reasonable? Thanks. |
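A minimal sketch of the non-security use case described above: binning objects by an MD5 digest. It passes the `usedforsecurity` keyword as proposed in this comment (the keyword later shipped in Python 3.9's hashlib). The helper name `bin_for` and the default bin count are illustrative assumptions, and the `TypeError` fallback is only there for interpreters that do not know the keyword.

```python
import hashlib


def bin_for(name: bytes, bins: int = 256) -> int:
    """Map a name to a bin index; MD5 is used purely for speed, not security."""
    try:
        # Mark the callsite as non-security-sensitive so FIPS mode allows it.
        digest = hashlib.md5(name, usedforsecurity=False).digest()
    except TypeError:
        # Assumption: an older interpreter without the keyword argument.
        digest = hashlib.md5(name).digest()
    return digest[0] % bins
```

The same object always lands in the same bin, which is all this use needs; nothing here relies on MD5 being collision resistant.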
That sounds fine to me and I do like the usedforsecurity annotation on the API. I'll gladly review any patches. |
Attached is a patch against the py3k branch which implements this. I've checked that it builds against openssl-0.9.8o.tar.gz, openssl-1.0.0a.tar.gz, and against Fedora 12 and 13's heavily-patched openssl-1.0.0. The bulk of my testing has been against Fedora's openssl. I've added selftests to try to verify the new API. I try to detect whether OpenSSL enforces FIPS by running "openssl md5" as a subprocess and seeing if I can trigger an error. With FIPS enforcement off, all tests pass when built against 0.9.8o and 1.0.0a and F13's 1.0.0, other than those for FIPS enforcement itself, which skip. With FIPS enforcement on, all tests pass when built against F13's openssl. (I haven't yet figured out how to get the fips selftest to pass for the other builds, it's testing checksums against the wrong libcrypto for some reason; see caveat below): For all of the various contexts stored in _hashopenssl.c, we now store two: one with the override flag, one without. This required some reworking of the various preprocessor magic in that file, so I've gathered everything related to an algorithm into a structure, and moved most of the logic into functions, rather than macros. I'm assuming that these will get inlined under optimization, and that the bulk of the time that you're trying to optimize out is spent in the EVP lookups and initializations, rather than in function call overhead. How's this looking? Do I need to add a dummy "usedforsecurity" arg to all of the non-openssl message digest implementations within the tree? Unfortunately, if fips mode is on, and the fips selftest fails for the openssl library, every hash use will fail, both with and without the flag: |
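The subprocess-based detection mentioned above could look roughly like the sketch below. This is not the code from the patch: the function name `openssl_cli_rejects_md5` is hypothetical, and it assumes that a failing `openssl md5` run indicates enforcement, which may also be triggered by an unrelated problem with the binary.

```python
import shutil
import subprocess
from typing import Optional


def openssl_cli_rejects_md5() -> Optional[bool]:
    """Return True if `openssl md5` fails, False if it works, None if unknown."""
    exe = shutil.which("openssl")
    if exe is None:
        return None  # no openssl CLI on PATH, so nothing can be inferred
    try:
        proc = subprocess.run(
            [exe, "md5"],
            input=b"",  # hash empty stdin; only the exit status matters
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.returncode != 0
```

A test suite could skip its FIPS-enforcement tests unless this returns True.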
I've refreshed this patch against the latest version of the code in hg. In an attempt to make it easier to review, I've split it up into four (so far) thematic patches, which apply in sequence. |
[and yes, I used git to generate the 4 patches; sorry ] |
The cumulative effect of the above patches (to _hashlib) is equivalent to what I've applied downstream to python 2 in RHEL 6.0 and Fedora 17 onwards, and python 3 in Fedora 17 onwards. In those environments I've additionally patched hashlib to only use _hashlib, rather than falling back on _md5 etc, since otherwise you get confusing error messages from hashlib.md5() when it defers to _md5 due to FIPS enforcement. In my downstream builds we can be sure of building against OpenSSL, but this other part of the patch seems less appropriate for upstream python, given that upstream python tries to be flexible in terms of its dependencies. Hope this makes sense. |
quick summary of comments from pycon sprints discussion: this looks pretty good. i like the 0001 refactoring cleanup. a couple things to fix in error handling (better messages and some bogus handling in the test). dmalcolm has the notes on what to do. do it and commit away or ask for more review as you see fit. |
Patch 0002:
Patch 0003:
Patch 0004:
Overall:
|
My summary of our discussion was pretty terse. :) dmalcolm has more detailed TODO list notes that include things like the error cases and .rst documentation. As for how to commit it, i'd make 0001 its own commit as it is a useful refactoring otherwise unrelated to this change. I'll leave it entirely up to dmalcolm how many commits he wants 0002 onward to be. No need to be picky. usedforsecurity vs used_for_security, agreed, used_for_security is better. dmalcolm was going to make an enum to index the two element array, that'd give meaningful names instead of 0 or 1. simply using two named variables would also work but it would require the loop in 0003 to be expanded or turned into a small static method for the body (not a bad idea) instead. i'm fine with either. |
I've not done enough digging on the issue I'm presently experiencing to draw any conclusions or make any suggestions, but this change seems to break the present distribute module (version 0.6.27). It appears it will likely break a great deal of other code too. I've pasted the relevant output here and attached the full traceback. Whilst I agree with the notion behind this change, Fedora's quick actions have led to me spending the best part of an hour of the night before ship day diagnosing issues caused by undocumented (or at least under-documented) changes to code I haven't written or interfaced with. _Please_ publicise the change a little better? Pretty please!? |
These changes haven't been committed in Python, so you probably want to post on the Fedora bug tracker instead. |
While you are at it, can you edit the docs to put md5() at the bottom of the page at the back of the list in a 2-point font and raise a DeprecationWarning("This function is totally lame, and it is slower than SHA-3, get with the program.") the first time it is used? I don't agree that md5 has a legitimate place in systems designed after 1996. |
Please... don't make suggestions unrelated to the issue. Open a new issue instead. |
Everything in this issue posted until now has to be managed as vendor patch. |
It's out of scope for 3.3 but I'd love to see the feature in 3.4. |
A few thoughts: usedforsecurity=xxx seems awkward. I wouldn't want, as a user of hashlib, to have to put that in literally every use I make of it. If I understand the situation correctly, the goal is to let both linters and the runtime identify the intended purpose of a call to md5, e.g. whether there are security implications in its use (as far as FIPS is concerned). Perhaps having two separate implementations of the interfaces, one general purpose and one FIPS, would be decent, e.g. from hashlib.fips import sha1. Then the md5 that's in hashlib is by definition not FIPS ready and any code using it should be fixed. |
@rbcollins, I don't think providing a hashlib.fips module without md5() solves the problem. The idea is to have a way to call md5() in non-secure situations, and to signal to the FIPS system that the call is OK. A separate module would work if it included an md5() function that always did that signaling. But creating a separate module just to wrap one function like that seems like overkill, doesn't it? |
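For illustration, the "separate module whose md5() always does the signaling" variant could be as small as the sketch below. The module this would live in is hypothetical (nothing like it exists in the standard library), and the sketch assumes the `usedforsecurity` keyword under the spelling that later shipped in Python 3.9.

```python
import functools
import hashlib

# Hypothetical contents of a "non-security hashes" module: each constructor
# is the regular hashlib one with the opt-out flag pre-applied.
md5 = functools.partial(hashlib.md5, usedforsecurity=False)
sha1 = functools.partial(hashlib.sha1, usedforsecurity=False)
```

Callers would import `md5` from this module instead of from hashlib, which makes the opt-out visible at the import line rather than at every callsite.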
I agree with Doug. From my understanding, the intention of the patch is to allow the usage of md5 for non-security purposes, without being blocked by FIPS. |
@doug - I don't see how a separate fips module *wouldn't* solve it:
And it's way less messy: remember we're going to have this flag passed to every hashlib invocation from every project in order to *opt out* of the FIPS restrictions. Because, over time, FIPS will change, so no one can assume that any given function is and will remain FIPS compatible: and this flag is going to percolate up into e.g. the HMAC module. I think that's pretty ugly: want to calculate the sha of a blob to look it up in git? sha1sum(file.read(), usedforsecurity=False) Separately I wonder about the impact on higher layers - are they ready to be parameterised by objects, or do they look things up by name - and thus need to start accepting this new parameter and passing it down? |
The separate module idea is an interesting one, though I wonder if it aligns with users' goals. Perhaps some users simply want to set the OPENSSL_FORCE_FIPS_MODE environment variable and then run existing Python code with it to ensure that code is FIPS-compliant. A separate module assumes that the developer is the one who makes the decision of running in FIPS compliance mode or not. |
So I see the argument on both sides of this discussion. Having those optional arguments for all the functions seems like an obvious blocker. If a submodule is a blocker, what if we provide a context-manager to signal this? |
AFAICT from David's patch, there isn't a new argument in all hashlib functions but only in the digest constructors. Someone might want to correct me. |
@rbtcollins, so you mean the apps using it, shall be "fips aware" ? That will be the point of your separate module? if fips_enabled then |
@robert, I thought you were proposing a hashlib.fips module that did not include md5() at all. If it does include the function, and the function does whatever is needed to disable the "die when using MD5" on a FIPS system, then I agree it would work. Your point about the FIPS standard changing and needing to include more hash types in the future is good. |
@antoine - The idea behind introducing some API mechanism is exactly as you say, to let the developer say "this use of this algorithm is not related to security" to tell FIPS systems to not be pedantic. |
@rbtcollins, even if we go with a FIPS aware module, we'd still need to detect if md5 was used for security purposes. |
Objection from hashlib maintainer: I will reject a used_for_security flag with default of False. I'm slowly moving Python to a secure-by-default policy. Therefore used_for_security must be an explicit opt-out. I'm aware that the policy will require modifications to all software that uses MD5. To be honest that's my goal. If you care about FIPS, then any use of MD5 must be a conscious and careful decision. I want developers to move away from MD5 and replace it with SipHash24, Blake2 or SHA-2. MD5 should *only* remain when backwards incompatibility prevents migration. |
MD5 is especially not allowed in FIPS mode. |
Should this issue be closed as resolved? |
I proposed PR 19703 to expose OpenSSL FIPS_mode() as hashlib.get_fips_mode(). FIPS support was introduced in version 0.9.7 of OpenSSL and so is available in the minimum OpenSSL required to build Python 3.9. LibreSSL doesn't have FIPS_mode() on purpose. Ted Unangst wrote: "I figured I should mention our current libressl policy wrt FIPS mode. It's gone and it's not coming back." My plan is to use hashlib.get_fips_mode() to skip a few tests if the FIPS mode is enabled. Simple example: test_crypt.test_methods() checks that self.assertEqual(crypt.methods[-1], crypt.METHOD_CRYPT). Except that in FIPS mode, METHOD_CRYPT is not available since it's too weak (3DES if I recall correctly). I would like to skip this test in FIPS mode. My colleague Charalampos also plans to add a FIPS-enabled buildbot running RHEL8 to ensure that the Python test suite passes in FIPS mode, and to detect regressions in FIPS mode. |
I'm trying to understand how "portable" it is to expose OpenSSL FIPS_mode() as hashlib.get_fips_mode() which would return a boolean (True or False). It seems like FIPS is more complex than that. Other crypto libraries which implement FIPS have a different way to expose FIPS mode to the consumer of the API:
See also RHEL 8 Security Hardening documentation, "Chapter 3. Using system-wide cryptographic policies": For my needs (skip tests which are not relevant in FIPS mode), it seems like keeping the function private in _hashlib.get_fips_mode() is enough. My plan is to use it as a test.support.get_fips_mode() function which would return False if _hashlib.get_fips_mode() is missing. |
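The helper described above might look like the following sketch. It is written from the description in this comment rather than copied from test.support, and it assumes that a truthy return value from the private function means FIPS mode is on.

```python
def get_fips_mode() -> bool:
    """Best-effort FIPS check for test skipping; False when it cannot be determined."""
    try:
        import _hashlib  # private OpenSSL wrapper, absent on some builds
    except ImportError:
        return False
    getter = getattr(_hashlib, "get_fips_mode", None)
    if getter is None:
        return False  # private function not exposed by this build
    return bool(getter())
```

A test that does not apply in FIPS mode could then be decorated with `unittest.skipIf(get_fips_mode(), "not available in FIPS mode")`.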
Petr Viktorin and Christian Heimes convinced me that it's a bad idea to expose OpenSSL FIPS_mode() as a public hashlib.get_fips_mode() function. It is too specific to OpenSSL. For example, FIPS_mode() result is a number which is specific to OpenSSL. Other crypto libraries are likely to use different values. Moreover, as I wrote in my previous message, other crypto libraries expose the FIPS mode differently. It may not just be a global FIPS mode. Finally, there are different FIPS modes. For example, Gcrypt has an "Enforced FIPS" mode. So I modified PR 19703 to only expose FIPS_mode() as a private _hashlib.get_fips_mode() function. Well, as done in RHEL in fact ;-) |
I'm against exposing the function as hashlib.get_fips_mode() because it is an internal implementation detail. I don't want to confuse users or make users think that "if hashlib.get_fips_mode()" is sufficient for feature tests. For starters there are multiple levels and versions of the FIPS standard like FIPS-140-2 and FIPS-140-3. Instead of doing a FIPS test, users and applications should perform a feature test and handle the error. The approach is future-proof and can also cover crypto policy restrictions like minimum key sizes. |
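A feature test in the sense described above can be sketched as follows: try to construct the digest and treat the resulting error as "not available", instead of asking whether FIPS mode is on. The helper name `digest_available` is an assumption for illustration; the sketch relies on hashlib.new() raising ValueError for digests it cannot provide.

```python
import hashlib


def digest_available(name: str) -> bool:
    """Return True if hashlib can construct the named digest in this environment."""
    try:
        hashlib.new(name)
    except ValueError:
        # Unknown algorithm, or one blocked by the crypto library or policy.
        return False
    return True
```

Code that needs a weak digest for a non-security purpose can call this first and fall back to another algorithm when it returns False, whatever the reason for the refusal.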
Christian Heimes: "Instead of doing a FIPS test, users and applications should perform a feature test and handle the error. The approach is future-proof and can also cover crypto policy restrictions like minimum key sizes." Alright, I see. Thanks for your explanation ;-) |
_hashlib.get_fips_mode() is not compatible with new FIPS design in OpenSSL 3.0.0: The function calls 'FIPS_mode()' and 'FIPS_mode_set()' are present in OpenSSL 3.0 but always fail. You should rewrite your application to not use them. https://wiki.openssl.org/index.php/OpenSSL_3.0#Upgrading_from_the_OpenSSL_2.0_FIPS_Object_Module |
I suggest modifying the code so that the private function becomes unavailable in _hashlib on OpenSSL 3.0 and newer. What do you think? |
Memo to me: Add whatsnew |
@gpshead Is the issue relevant after transition to HACL*? |
OpenSSL is still the default. Regardless, this issue was completed ages ago as far as I can tell. Let commercial-distro-motivated folks open new specific issues for things they want to see done for anything that remains feature-wise here. |