Cryptography

It's easy to make mistakes when using cryptographic primitives by hand. This project uses cryptography primarily to sign and encrypt its tokens. If this is done incorrectly, the whole project is insecure, because forged tokens make it easy to query almost all data. This document therefore describes how cryptography is used in this project. Cryptography is also used for storing passwords, but that is handled almost entirely by a separate library, mostly with default settings.

Refresh tokens are only readable by the authorization/authentication part of the server and can therefore use symmetric encryption, which is faster and simpler than an asymmetric scheme. Since refresh tokens are not very standardized, this part is the most 'custom' and is the only part that uses cryptographic primitives directly. All these operations can be found in the auth/hazmat package. It's named hazmat because it uses the hazmat module of the Python library pyca/cryptography, and also to signify that this code should be checked thoroughly. The pyca/cryptography library relies on OpenSSL (the underlying crypto implementation), whose binaries are packaged together with the Python library. It is important to update pyca/cryptography frequently, as OpenSSL receives frequent security fixes.

Refresh tokens

For our refresh tokens, we use AES with a 256-bit key in GCM mode. Used correctly, GCM is one of the more secure modes, as it provides both confidentiality and integrity. However, it relies on nonces/IVs (random bytes generated for every encryption so that the same plaintext looks different each time) never being reused with the same key. If a nonce is reused, the XOR of the two plaintexts leaks and an attacker may even be able to forge tokens. Note also that the ciphertext always reveals the length of the plaintext. Care must therefore still be taken.
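To make the nonce requirement concrete, the following is a minimal sketch (not code from this project) of what happens when a nonce is reused with the same key. The example payloads and the helper xor_bytes are made up for illustration; only AESGCM and secrets.token_bytes are real library calls.

```python
import secrets

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

TAG_LEN = 16  # pyca/cryptography appends a 16-byte authentication tag


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (illustrative helper)."""
    return bytes(x ^ y for x, y in zip(a, b))


key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

# Two made-up plaintexts of equal length.
plain_a = b'{"id": 1, "fam": "aaaa"}'
plain_b = b'{"id": 2, "fam": "bbbb"}'

# WRONG on purpose: the same nonce is used for both messages.
reused_nonce = bytes(12)
cipher_a = aesgcm.encrypt(reused_nonce, plain_a, None)[:-TAG_LEN]
cipher_b = aesgcm.encrypt(reused_nonce, plain_b, None)[:-TAG_LEN]

# GCM encrypts with a keystream derived from (key, nonce), so with a reused
# nonce the XOR of the ciphertexts equals the XOR of the plaintexts.
leaked = xor_bytes(cipher_a, cipher_b) == xor_bytes(plain_a, plain_b)

# Correct: a fresh random nonce per encryption, so even identical plaintexts
# produce different ciphertexts.
fresh_a = aesgcm.encrypt(secrets.token_bytes(12), plain_a, None)
fresh_b = aesgcm.encrypt(secrets.token_bytes(12), plain_a, None)
```

Here leaked evaluates to True, while fresh_a and fresh_b differ even though the plaintext is identical.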

Our refresh tokens are simple, consisting only of a unique id, a family id and a random tag that makes each token unique within its family. A 'family' is a set of refresh tokens descended from a single authentication. Because they are this simple, we represent them as plain Python dicts (using pydantic), and our AES encryption works only on these dictionaries. crypt_dict.py provides the encryption and decryption for this.

To encrypt, we encode the dict as JSON (UTF-8 plaintext) and generate a random 12-byte nonce (the length recommended for AES-GCM) using Python's secrets.token_bytes function (which is recommended for such cryptographic use). We don't use any associated data (which would be left unencrypted but protected against modification), as refresh tokens can exist in only a single context, so we simply encrypt using our initialized AESGCM object. This object must already contain the secret symmetric key, which we assume is 256 bits (although 128 and 192 bits are technically also possible). We then concatenate the nonce and the encrypted data (which includes an authentication tag appended by pyca/cryptography to ensure the integrity of the data) and encode the result as a string using base64url. Since padding is not necessary, we do not add any base64 padding in the encoding step.
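The encryption steps described above can be sketched roughly as follows. This is an illustration, not the actual contents of crypt_dict.py: the function name encrypt_dict, the constant NONCE_LEN and the example field names are assumptions.

```python
import base64
import json
import secrets

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # nonce length recommended for AES-GCM


def encrypt_dict(aesgcm: AESGCM, data: dict) -> str:
    """Encrypt a dict into an unpadded base64url string (hypothetical helper)."""
    plaintext = json.dumps(data).encode("utf-8")
    nonce = secrets.token_bytes(NONCE_LEN)
    # No associated data (third argument is None). The returned bytes are the
    # ciphertext followed by the authentication tag.
    ciphertext_and_tag = aesgcm.encrypt(nonce, plaintext, None)
    token_bytes = nonce + ciphertext_and_tag
    # Strip the '=' padding, since it carries no information.
    return base64.urlsafe_b64encode(token_bytes).rstrip(b"=").decode("ascii")


# Usage with a freshly generated 256-bit key and made-up field names.
key = AESGCM.generate_key(bit_length=256)
example = {"id": 1, "family_id": "abc", "nonce": "xyz"}
token = encrypt_dict(AESGCM(key), example)
```

The resulting token is 12 nonce bytes, the ciphertext and the 16-byte tag, all inside a single base64url string.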

Decryption works the same way in reverse: we decode the base64url string, take the first twelve bytes as the nonce and everything after that as the data+tag. We then decrypt using an initialized AESGCM object and decode the JSON into a dict. A lot can go wrong, but since all refresh tokens should have been generated by this application, we only raise an opaque 'DecryptError'. It is important to note that the base64url decoding simply ignores characters outside the alphabet. Furthermore, since decoding works both with and without padding, there are multiple "encodings" of a single refresh token. The encoding holds no semantic value, so only perform logic on the fully decrypted and decoded refresh token!
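A rough sketch of the decryption side is shown below. Again, this is not the project's actual code: the function name decrypt_dict and the way padding is restored are assumptions, and DecryptError is redefined here only to keep the example self-contained.

```python
import base64
import json

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12


class DecryptError(Exception):
    """Opaque error: deliberately says nothing about what went wrong."""


def decrypt_dict(aesgcm: AESGCM, token: str) -> dict:
    """Decrypt an (un)padded base64url token into a dict (hypothetical helper)."""
    try:
        # Restore padding if it was stripped; already padded input is unchanged.
        padded = token + "=" * (-len(token) % 4)
        raw = base64.urlsafe_b64decode(padded)
        nonce, ciphertext_and_tag = raw[:NONCE_LEN], raw[NONCE_LEN:]
        plaintext = aesgcm.decrypt(nonce, ciphertext_and_tag, None)
        return json.loads(plaintext.decode("utf-8"))
    except (ValueError, InvalidTag) as exc:
        # ValueError covers base64, nonce-length, UTF-8 and JSON failures;
        # InvalidTag means the ciphertext or tag was modified or the key is wrong.
        raise DecryptError from exc


# Lenient decoding: characters outside the alphabet are silently dropped,
# so two different strings can decode to the same bytes.
same_bytes = base64.urlsafe_b64decode("QUJD") == base64.urlsafe_b64decode("QU!JD")
```

Because several strings can decode to the same token, comparing or storing the encoded string is unreliable; compare the decrypted dict instead.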

Access tokens (and id tokens)

Access tokens use asymmetric cryptography (digital signatures rather than encryption) and are JSON Web Tokens (JWTs). They contain a number of claims and are signed, meaning the resource server (which in this application is currently the same server as the auth server, although the code is mostly decoupled) only has to check the validity of the token using a freely available public key. Anyone can read the claims inside the JWT and verify their validity.

We use EdDSA with the Ed448 curve as our signature algorithm. Ed448 technically offers better security than the slightly more standard Ed25519, but the difference is small. It was just a choice. EdDSA is used over other algorithms because its keys and signatures are more compact. Note that this algorithm is not resistant to hypothetical advanced quantum computers, although we are very far from any quantum computer with enough power to break it. AES-256, by contrast, is considered resistant to known quantum attacks.

To implement signing and verifying, we use the PyJWT library, which internally also relies on pyca/cryptography (and therefore on OpenSSL) for its cryptography. sign_dict.py takes care of signing the token, which is passed as a dict. Since the authorization server never has to verify an access token, we implement verification inside our application (apiserver/lib/hazmat/tokens.py), not inside the decoupled auth component.

Passwords (OPAQUE)

We store passwords using the OPAQUE protocol (see README). The protocol uses some asymmetric cryptography (and other smart stuff) so we can have our cake and eat it too: the server never handles a plaintext password, and we are also protected against pre-computation attacks (where an attacker uses the provided salt to precompute a password dictionary before taking control of the server). See the opaquepy library (maintained privately by Tip) for details.

Keys

We use an asymmetric key pair (a public verification key and a private signing key), as well as a symmetric (secret) key. Our symmetric key is simply encoded using base64url, while our asymmetric keys (due to requirements of PyJWT) use a more complicated scheme, namely PEM format, with PKCS#8 for the private key and X.509 SubjectPublicKeyInfo for the public key. These are standardized schemes. We wrap them in our own structs to make handling easier, and they all include a kid (key id) to make it possible to store multiple keys in a database.
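The key encodings can be illustrated with the following sketch. The wrapper classes PEMKey and SymmetricKey and the kid values are hypothetical stand-ins for our own structs, and the sketch assumes the public key format is X.509 SubjectPublicKeyInfo, which to our knowledge is the PEM public key format pyca/cryptography supports for Ed448.

```python
import base64
import secrets
from dataclasses import dataclass

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed448 import Ed448PrivateKey


@dataclass(frozen=True)
class SymmetricKey:  # hypothetical wrapper
    kid: str
    private: str  # base64url-encoded secret key


@dataclass(frozen=True)
class PEMKey:  # hypothetical wrapper
    kid: str
    public: str  # PEM, SubjectPublicKeyInfo
    private: str  # PEM, PKCS#8


# Symmetric key: 32 random bytes (256 bits), encoded as base64url.
raw_symmetric = secrets.token_bytes(32)
symmetric = SymmetricKey(
    kid="sym-1",
    private=base64.urlsafe_b64encode(raw_symmetric).decode("ascii"),
)

# Asymmetric key pair: Ed448, serialized as PEM.
signing_key = Ed448PrivateKey.generate()
private_pem = signing_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)
public_pem = signing_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
asymmetric = PEMKey(
    kid="asym-1",
    public=public_pem.decode("ascii"),
    private=private_pem.decode("ascii"),
)
```

The private PEM is written unencrypted here for brevity; how the keys are protected at rest is outside the scope of this sketch.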
