The anatomy of OpenStack Keystone token formats

Tokens in Keystone are generally composed of a number of technologies layered together. All tokens can be deconstructed into at least two layers: a payload which is wrapped in some transport format. The payload provides attributes such as uniqueness, identity, and authorization context. The transport format provides the necessary packaging for transmission and validation.

What is included in the payload is fairly flexible, and impacts the required deployment infrastructure: do you need token persistence? Will token validation be chatty on the network?

The transport format must be URL-friendly: clients must be able to place a token in a URL without percent-encoding it first (as Identity API v2's token validation call requires). Beyond that, it can provide additional properties such as token integrity verification, authenticity, and non-repudiation.
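As a rough illustration of the URL-friendliness requirement, the sketch below checks whether a token could sit in a URL path segment as-is. It assumes the token ends up in the path portion of the URL and uses the character set that RFC 3986 allows there without escaping; the helper name is_path_safe is made up for this example.

```python
import string

# RFC 3986 "pchar" without percent-escapes: unreserved / sub-delims / ":" / "@"
_UNRESERVED = string.ascii_letters + string.digits + "-._~"
_SUB_DELIMS = "!$&'()*+,;="
PATH_SEGMENT_SAFE = frozenset(_UNRESERVED + _SUB_DELIMS + ":@")


def is_path_safe(token):
    """Return True if every character may appear unescaped in a path segment."""
    return all(ch in PATH_SEGMENT_SAFE for ch in token)


# A hex string passes; anything containing "/" would split the path and fails.
hex_ok = is_path_safe("2887731d2a1a46118af2340b60125865")
slash_ok = is_path_safe("abc/def")
```

Note that standard base64 output can contain "/", which is why none of the formats below use it unmodified.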

UUID

Payload: UUID4. This is the simplest token format that Keystone supports. The only data included in UUID tokens are randomly generated UUID4 values that provide nothing more than uniqueness:

>>> import uuid
>>> payload = uuid.uuid4()
>>> payload
UUID('2887731d-2a1a-4611-8af2-340b60125865')
>>> print(payload)
2887731d-2a1a-4611-8af2-340b60125865

Format: hexadecimal. These tokens are then packaged using their hexadecimal representation:

>>> token = payload.hex
>>> len(token)
32

So the token itself looks like this:

2887731d2a1a46118af2340b60125865

Because they don't include an identity or authorization context, they must be validated online with Keystone, where Keystone maps an issued UUID token to an identity and authorization context.
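To make the "validated online" point concrete, here is a toy sketch of the lookup involved. The dictionary stands in for Keystone's token persistence, and the function names and context fields are hypothetical, chosen only for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical in-memory stand-in for a persistent token backend.
_token_store = {}


def issue_token(user_id, project_id, roles, ttl=timedelta(hours=1)):
    """Create an opaque UUID token and remember what it stands for."""
    token = uuid.uuid4().hex
    _token_store[token] = {
        "user_id": user_id,
        "project_id": project_id,
        "roles": list(roles),
        "expires_at": datetime.now(timezone.utc) + ttl,
    }
    return token


def validate_token(token):
    """Map a token back to its identity and authorization context."""
    context = _token_store.get(token)
    if context is None:
        raise KeyError("unknown token")
    if context["expires_at"] <= datetime.now(timezone.utc):
        raise ValueError("token expired")
    return context
```

The token carries no meaning by itself, so every validation is a round trip to whatever holds the store. That is the persistence and network chattiness trade-off mentioned earlier.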

Fernet

Payload: MessagePack. Fernet tokens are extremely lightweight tokens that disregard the kitchen sink in favor of minimal identity information and a dynamic authorization context. They wrap just a few pieces of data into a MessagePacked payload:

>>> import msgpack
>>> data = (b_user_id, b_scope_id, issued_at_int, expires_at_int, audit_ids)
>>> payload = msgpack.packb(data)
>>> len(payload)
71
>>> print(payload)  # it's just raw binary
���YF��c�gh�����ʻaA�b^�h^��P�CH>�
>>> payload
'\x95\xb0\x1a\x7f\xcc\x0b\xfbYF\x8a\x9fc\xa4gh\x96\x9f\xe1\xb0\x9e\x14\xbe\x9f\xf9\x7fE\x08\xbb\xca\xbb\x01\x1e\x02a\xcb\xcbA\xd5;\xaeb\xe9h^\xcbA\xd5;\xb1\xe6\xe9h^\x91\xb0\xf2kP\xf4\x98\x8bN\xb0\x8e\x08\xdaMCH>\xf9'
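The b_ prefixes above hint that IDs are packed as raw bytes rather than as text, which is a large part of why the payload stays so small. The sketch below shows one way such conversions could work, using only the standard library. The helper names are invented for this example, the sample IDs are the user and audit IDs from the JSON payload shown in the PKI section, and treating an audit ID as unpadded URL-safe base64 of 16 bytes is an assumption based on its 22-character length.

```python
import base64
import uuid


def hex_id_to_bytes(hex_id):
    """32 hexadecimal characters -> 16 raw bytes."""
    return uuid.UUID(hex=hex_id).bytes


def bytes_to_hex_id(raw):
    """16 raw bytes -> 32 hexadecimal characters."""
    return uuid.UUID(bytes=raw).hex


def audit_id_to_bytes(audit_id):
    """Unpadded URL-safe base64 text -> raw bytes (padding restored first)."""
    padded = audit_id + "=" * (-len(audit_id) % 4)
    return base64.urlsafe_b64decode(padded)


user_id = "85a9af145ddb4d19a9544dfbeac5d1f0"
audit_id = "YyobSaHcTNCu7seusdTtpQ"
b_user_id = hex_id_to_bytes(user_id)      # half the size of the hex text
b_audit_id = audit_id_to_bytes(audit_id)
```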

Format: Fernet. Fernet is an opinionated format designed by Heroku explicitly for use in API tokens. It addresses many of the same problems that OpenStack faces, and reflects some of the same design considerations that have come up in the OpenStack community (such as PKI tokens, encrypted OAuth tokens, and AE tokens).

>>> from cryptography import fernet
>>> key = fernet.Fernet.generate_key()
>>> token = fernet.Fernet(key).encrypt(payload)
>>> len(token)
186

So the token itself looks like this:

gAAAAABU7roWGiCuOvgFcckec-0ytpGnMZDBLG9hA7Hr9qfvdZDHjsak39YN98HXxoYLIqVm19Egku5YR3wyI7heVrOmPNEtmr-fIM1rtahudEdEAPM4HCiMrBmiA1Lw6SU8jc2rPLC7FK7nBCia_BGhG17NVHuQu0S7waA306jyKNhHwUnpsBQ

One of the fun things about Fernet is that the token's creation time is actually built into the format itself. If you know how to use a knife, you can retrieve it:

>>> import base64
>>> import datetime
>>> import struct
>>> timestamp = struct.unpack('>Q', base64.urlsafe_b64decode(token)[1:9])[0]
>>> print(timestamp)
1424748903
>>> datetime.datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
'2015-02-24 03:35:03'

I actually wrote a library to make non-cryptographic validation of the token's structure (such as the timestamp) completely trivial.
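For a sense of what that kind of inspection involves, here is a minimal standard-library sketch that splits a token into the fields laid out in the Fernet spec (version byte, 64-bit timestamp, 128-bit IV, ciphertext, 256-bit HMAC). It is not the library mentioned above, the function name is made up, and it neither verifies the signature nor decrypts anything.

```python
import base64
import struct
from datetime import datetime, timezone


def inspect_fernet_token(token):
    """Split a Fernet token into its fields without verifying or decrypting."""
    if isinstance(token, str):
        token = token.encode("ascii")
    # Restore any stripped '=' padding so the decoder accepts the input.
    token += b"=" * (-len(token) % 4)
    raw = base64.urlsafe_b64decode(token)
    if len(raw) < 1 + 8 + 16 + 32:
        raise ValueError("too short to be a Fernet token")
    if raw[0] != 0x80:
        raise ValueError("unexpected Fernet version byte")
    (timestamp,) = struct.unpack(">Q", raw[1:9])
    return {
        "version": raw[0],
        "issued_at": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "iv": raw[9:25],
        "ciphertext": raw[25:-32],
        "hmac": raw[-32:],
    }


# A structurally valid dummy token: zeroed IV, ciphertext, and HMAC.
dummy_raw = b"\x80" + struct.pack(">Q", 1424748903) + bytes(16 + 16 + 32)
dummy_token = base64.urlsafe_b64encode(dummy_raw).decode("ascii")
fields = inspect_fernet_token(dummy_token)
```

Because nothing here is authenticated, the result is only good for debugging. A real consumer should let a Fernet implementation check the HMAC before trusting any field.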

PKI and PKIZ

PKI and PKIZ tokens are nearly identical (and in fact share the same payload), but PKIZ tokens add compression to the mix (as well as a couple of other lessons learned).

Payload: JSON. Both PKI and PKIZ tokens deliver the entire JSON response that would normally be produced as a result of online token validation as the token's payload:

{
    "token": {
        "audit_ids": [
            "YyobSaHcTNCu7seusdTtpQ"
        ],
        "catalog": [
            {
                "endpoints": [
                    {
                        "id": "9a29eaf20f7942b6b9c96cfb0aa02a3e",
                        "interface": "admin",
                        "region": null,
                        "region_id": null,
                        "url": "http://104.239.163.215:35357/v3"
                    },
                    {
                        "id": "d3233afd2b6041d4a39f8ac1233757fd",
                        "interface": "public",
                        "region": null,
                        "region_id": null,
                        "url": "http://104.239.163.215:35357/v3"
                    }
                ],
                "id": "1b796e214f8140118108a7e4e4ca6e16",
                "name": "Keystone",
                "type": "identity"
            }
        ],
        "expires_at": "2015-02-26T05:48:26.094098Z",
        "extras": {},
        "issued_at": "2015-02-26T05:33:26.094127Z",
        "methods": [
            "password"
        ],
        "project": {
            "domain": {
                "id": "default",
                "name": "Default"
            },
            "id": "59002ce739f143bb8b2cc33caf98fcf9",
            "name": "admin"
        },
        "roles": [
            {
                "id": "360b177d8c2347ff95e0ac1615ba8fb6",
                "name": "admin"
            }
        ],
        "user": {
            "domain": {
                "id": "default",
                "name": "Default"
            },
            "id": "85a9af145ddb4d19a9544dfbeac5d1f0",
            "name": "admin"
        }
    }
}

That's the shortest realistic example that I can produce.

Format: CMS + [zlib] + base64. The JSON payload is first signed using an asymmetric key and wrapped in Cryptographic Message Syntax (CMS). In the case of PKIZ tokens, the signed payload is then compressed using zlib. Next, PKI tokens are base64 encoded and then made URL-safe using an arbitrary substitution scheme, whereas PKIZ tokens are base64-URL encoded using the conventional substitution scheme. Finally, although PKI tokens do not have a fixed prefix, PKIZ tokens are explicitly prefixed with PKIZ_.
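The packaging steps after signing can be sketched with the standard library alone. The CMS signing step is skipped here: signed_blob stands in for the signed message and is just placeholder bytes. The PKI substitution is assumed to swap only "/" for "-", which is consistent with the sample PKI token in this post (it contains "+", "-", and "=" but no "/"). The function names are invented for this example.

```python
import base64
import zlib

PKIZ_PREFIX = "PKIZ_"


def pki_encode(signed_blob):
    """Standard base64, then '/' -> '-' so the token fits in a URL path."""
    return base64.b64encode(signed_blob).decode("ascii").replace("/", "-")


def pki_decode(token):
    """Undo the substitution, then base64-decode."""
    return base64.b64decode(token.replace("-", "/"))


def pkiz_encode(signed_blob):
    """zlib-compress, base64-URL encode, and add the fixed prefix."""
    compressed = zlib.compress(signed_blob)
    return PKIZ_PREFIX + base64.urlsafe_b64encode(compressed).decode("ascii")


def pkiz_decode(token):
    """Strip the prefix, base64-URL decode, and decompress."""
    if not token.startswith(PKIZ_PREFIX):
        raise ValueError("not a PKIZ token")
    encoded = token[len(PKIZ_PREFIX):]
    return zlib.decompress(base64.urlsafe_b64decode(encoded))


# Placeholder for a signed CMS message; real input would come from the signer.
signed_blob = b'{"token": {"methods": ["password"]}}' + bytes(range(256))
pki_token = pki_encode(signed_blob)
pkiz_token = pkiz_encode(signed_blob)
```

The fixed prefix lets a consumer tell a PKIZ token apart from other formats without trying to decode it first.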

A minimal PKI token might look like this:

MIIE-gYJKoZIhvcNAQcCoIIE7zCCBOsCAQExDTALBglghkgBZQMEAgEwggNMBgkqhkiG9w0BBwGgggM9BIIDOXsidG9rZW4iOnsibWV0aG9kcyI6WyJwYXNzd29yZCJdLCJyb2xlcyI6W3siaWQiOiIzNjBiMTc3ZDhjMjM0N2ZmOTVlMGFjMTYxNWJhOGZiNiIsIm5hbWUiOiJhZG1pbiJ9XSwiZXhwaXJlc19hdCI6IjIwMTUtMDItMjZUMDU6NDg6MjYuMDk0MDk4WiIsInByb2plY3QiOnsiZG9tYWluIjp7ImlkIjoiZGVmYXVsdCIsIm5hbWUiOiJEZWZhdWx0In0sImlkIjoiNTkwMDJjZTczOWYxNDNiYjhiMmNjMzNjYWY5OGZjZjkiLCJuYW1lIjoiYWRtaW4ifSwiY2F0YWxvZyI6W3siZW5kcG9pbnRzIjpbeyJyZWdpb25faWQiOm51bGwsInVybCI6Imh0dHA6Ly8xMDQuMjM5LjE2My4yMTU6MzUzNTcvdjMiLCJyZWdpb24iOm51bGwsImludGVyZmFjZSI6ImFkbWluIiwiaWQiOiI5YTI5ZWFmMjBmNzk0MmI2YjljOTZjZmIwYWEwMmEzZSJ9LHsicmVnaW9uX2lkIjpudWxsLCJ1cmwiOiJodHRwOi8vMTA0LjIzOS4xNjMuMjE1OjM1MzU3L3YzIiwicmVnaW9uIjpudWxsLCJpbnRlcmZhY2UiOiJwdWJsaWMiLCJpZCI6ImQzMjMzYWZkMmI2MDQxZDRhMzlmOGFjMTIzMzc1N2ZkIn1dLCJ0eXBlIjoiaWRlbnRpdHkiLCJpZCI6IjFiNzk2ZTIxNGY4MTQwMTE4MTA4YTdlNGU0Y2E2ZTE2IiwibmFtZSI6IktleXN0b25lIn1dLCJleHRyYXMiOnt9LCJ1c2VyIjp7ImRvbWFpbiI6eyJpZCI6ImRlZmF1bHQiLCJuYW1lIjoiRGVmYXVsdCJ9LCJpZCI6Ijg1YTlhZjE0NWRkYjRkMTlhOTU0NGRmYmVhYzVkMWYwIiwibmFtZSI6ImFkbWluIn0sImF1ZGl0X2lkcyI6WyJZeW9iU2FIY1ROQ3U3c2V1c2RUdHBRIl0sImlzc3VlZF9hdCI6IjIwMTUtMDItMjZUMDU6MzM6MjYuMDk0MTI3WiJ9fTGCAYUwggGBAgEBMFwwVzELMAkGA1UEBhMCVVMxDjAMBgNVBAgMBVVuc2V0MQ4wDAYDVQQHDAVVbnNldDEOMAwGA1UECgwFVW5zZXQxGDAWBgNVBAMMD3d3dy5leGFtcGxlLmNvbQIBATALBglghkgBZQMEAgEwDQYJKoZIhvcNAQEBBQAEggEAYJR+ETbjA4RpgToeRm0qh-zxRWyBL4RdN99hLHV6foIpcr6uXMN-DaUJvGygPDi1wi-HAbpErJAe9iRHk4+8BUnX--jQRTaYhkg237eyjpYHU8Hgt8Ydn7Wdnn0hriXKt+RZBG-ZEnnP-MZ9V9GGJz-BoAMHx42uF5j6mlfVvUxtJGSaZ2wPROkLIHAjrX-8zEo8YhtGQHi-rFvXOoP+w8TVb907R2WNsGs3LbFKRmDv-yev6pMnz+gQu8uImf2idd18hyEYdw8M9bgZc2YsGBiPSeIm-VhzH9qTX0e7fK-chhAE+saIEbl5Mw0PzybhTyKHRzqtsW4HWFOlbE0yOA==

That's 1,712 bytes. Not quite enough to break the Internet, but it's a slippery slope when you start piling on additional data.

A minimal PKIZ token, by contrast, might look like this:

PKIZ_eJxtVcmSozgUvPMVc6_oKMBgm0Mf2IzBCIpVlm4sNiAEtssLy9eP7K6Jqo4YboCUysyX7-nXL_ZopmV7_-gger784oBtm-8VcnYnbNePwlODQj-xb6tZ1zX_qquBORqx6moVreq20nAATLUyh6rygFa1F65uG0sZeE0brKqqgKLZtuHvr01pKZ8YSo3fX5scpnxmKW0x2Us4OQPae3MpKhPWnZJzdWfKxZG-fi6uTQaDxm9s2TPAgEgwe10i-9DkPWLOfkwpIJWMYq32LId4c7LgfN2-2p1c5zBhG50aW8I5bxxlHw0N3tdDtndoISh1qdtLm9gDiJMbMOwbIDgBBlpyIEZLQII7mNuJnTrDhgH2GmN1pmgRvCRgS7khSO82Oa_sjrY2ObFvaYf26ZUr_2ZgYojrEo683fPX78WmhOaw82MgITHtPCvhgWjzvpW2HLBwh4nX-kYgYENtmCd3BAX63IhgeMuYkUcmB4kbHsHxgb-8wlBuC0s5c3kfzoxafpicCcPynIvy8WVkJwu5NTA56ZQ_9Xc1X27VpTutR2AwyQTILjFFDkzSxIxZgjmZvbh4lAQ8WXyBSd9AHb2XVjrhbkNw9ATctDnzhbOb4at0Tu2RkIC4HX3DHDFBPIYhRXG1AHNKEUEy6hAPIJhw5Cju9toUXdpzGVTue_Fp1vnOzLuy04WiG56Ap3IbDn6zfoBY5V1iz34kjR4BjL4p-AQI4JkDd4HmJ4sn2hPsB9CZ-UOLDtdIfFVoKKFzzeBL4hm_fAELDhgVQy07TwwpjkMmg9a-0cqsTIJnPdPXDqBDC7sXSraRP-y1V4UyJo8dcObKbfuNSBIex7YErISFqlpgI-CxUdYotmcQOy0mxeiJKYuwR5-s825z416Otjd62Hs8KyH9OoketuGE9oAl8aa8fBHT6U8Sw0cONyzu9pKV_sz90cLodxsh3wZ_BSn8imupO8o3S6_GsSkxhjyaW55jNAVECtm37AUmlQQgK6eFJCAC-T-aP-v-J-IbAVuUf1aP--rxNklGMekrIRM290g8NxnFt6yjJOmd3qavvpiLRUrx5u_O5H62JjDMH52JJMja-hhbuooSNoEsjU0iDWyGIZ1NF6itpQqJyWk10NMUjAZR2YjyUrYKaGl6Z6bxIJAGQ0VGGgRbQ03TvPdoaZg-UIfXZr0aNlwK5Rnvg9EyVPgHAABjUS7KSaYHa3MrrJG6nffIA1tT_2c2ckbwc6CamhaoZlWZ6s5fHiM7FSN_F4LPwIZ62eK-Ck7bCCpG5gpWk55VZuJb-wZ30-Uwfh6c4_0Srgp12Ak0si9usTwdmuUcuHlIuqUjXarRXcN-_THIn6tdAN-nPSg57PGwD4Wt2Avm6qpmghnW1w0ZrGUX7cQ3MprKmr7nWFmkufamysNiZfWSqNPDabMl54Q7ykPw2Gzxx1G8gzcNvGvRvTCjTLAqtQ1dZ7xM-zxbbam8Vha3SgGNhxL8-bESItc8SiF3PhHSXD4Mfztp16N2Em_F8CYqviBlaj917zPUwf2h-1nsiVSIpWGKeu-Gdtc6rtfD2eRWEbn5VNhNU-wivHb8i14U1yo6RNH7qf0Y4ValpVTG9nR4NMHv39zrQjM94_ty-xc2_Erg

Yes, that's still 1,637 bytes. In this case, we have a minimal service catalog encoded into the token, whereas a PKIZ token with a larger service catalog would see a much higher compression ratio relative to its PKI counterpart.
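The effect of catalog size on compression is easy to try out. The sketch below builds an entirely made-up service catalog (hypothetical names, hash-derived IDs, and a documentation-range IP address) in roughly the shape of the JSON above, and compares raw and zlib-compressed sizes. It leaves out signing, so the numbers only indicate the trend, not real token sizes.

```python
import hashlib
import json
import zlib


def fake_catalog(num_services):
    """Build a hypothetical catalog with two endpoints per service."""
    def fake_id(*parts):
        # Deterministic 32-character hex ID, standing in for a real UUID.
        text = "/".join(str(part) for part in parts)
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    return [
        {
            "id": fake_id("service", n),
            "name": "service-%d" % n,
            "type": "type-%d" % n,
            "endpoints": [
                {
                    "id": fake_id("endpoint", n, interface),
                    "interface": interface,
                    "region": None,
                    "region_id": None,
                    "url": "http://192.0.2.1:%d/v1" % (8000 + n),
                }
                for interface in ("admin", "public")
            ],
        }
        for n in range(num_services)
    ]


def compression_ratio(num_services):
    """Uncompressed size divided by zlib-compressed size of the payload."""
    body = {"token": {"catalog": fake_catalog(num_services)}}
    payload = json.dumps(body).encode("utf-8")
    return len(payload) / len(zlib.compress(payload))


small = compression_ratio(1)
large = compression_ratio(20)
```

The repeated keys and near-identical URLs in a larger catalog give zlib more to work with, while the random-looking IDs remain the part that resists compression.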