Benchmarking OpenStack Keystone token formats

tl;dr: PKI and PKIZ tokens are slower than UUID tokens, and based on the June 2015 update, Fernet tokens are faster to create than UUID tokens (but also way slower to validate).

The simplest token format in Keystone today is the UUID token: a randomly generated 32-character string that must be validated online with Keystone. This means that Keystone must be aware of all the tokens it considers to be valid and be able to map tokens to user identities and authorization metadata.
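
As a rough illustration of why UUID tokens require online validation, here is a minimal Python sketch. The in-memory dictionary `token_store` and both function names are hypothetical stand-ins for Keystone's token backend, not its actual code.

```python
import uuid

# Hypothetical stand-in for Keystone's persistent token backend.
token_store = {}


def issue_uuid_token(user_id, project_id):
    """Create a random 32-character token and remember what it maps to."""
    token = uuid.uuid4().hex  # 32 hexadecimal characters, no embedded meaning
    token_store[token] = {"user_id": user_id, "project_id": project_id}
    return token


def validate_uuid_token(token):
    """Return the stored context, or None if the token is unknown."""
    return token_store.get(token)


token = issue_uuid_token("alice", "demo")
print(len(token))                  # 32
print(validate_uuid_token(token))  # {'user_id': 'alice', 'project_id': 'demo'}
```

The token itself carries no information, so every validation is a lookup against whatever store issued it.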

PKI tokens pack absolutely everything about the owner's identity and the token's authorization context into the token itself and then wrap the whole package using CMS. The resulting token is measured in kilobytes. PKIZ tokens are an evolution of PKI tokens which introduce compression into the equation, but compression can only help so much.
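
To get a feel for the size problem, the sketch below builds a made-up token body with a small service catalog and compares an encoded copy against a compressed-then-encoded copy. The body contents and the variable names are assumptions for illustration only, and the CMS signing step is left out entirely, so real PKI and PKIZ tokens would be larger still.

```python
import base64
import json
import zlib

# Hypothetical token body: identity, roles, and an embedded service catalog.
body = {
    "user": {"id": "alice", "domain": "default"},
    "project": {"id": "demo"},
    "roles": ["member"],
    "catalog": [
        {
            "type": service,
            "endpoints": [
                {
                    "interface": interface,
                    "region": "RegionOne",
                    "url": f"https://{service}.example.com:443/v1",
                }
                for interface in ("public", "internal", "admin")
            ],
        }
        for service in ("identity", "compute", "image", "network", "volume")
    ],
}

raw = json.dumps(body).encode("utf-8")

# PKI-like: encode the body directly (signing omitted for brevity).
pki_like = base64.urlsafe_b64encode(raw)

# PKIZ-like: compress first, then encode.
pkiz_like = base64.urlsafe_b64encode(zlib.compress(raw))

print(len(raw), len(pki_like), len(pkiz_like))
```

Compression shrinks the repetitive catalog, but the result is still far larger than a 32-character UUID token.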

A recently proposed alternative comes in the form of Fernet tokens (specification), which differ from UUID tokens in that they're encrypted using AES-CBC and signed using HMAC with SHA256. They also contain some metadata about the owner's identity and the available authorization context.

In any Keystone deployment, there are two primary operations where performance is critical: token creation and token validation. Every other operation tends to occur on a much less frequent basis, and thus performance is not a priority. These are the two operations worth benchmarking.

The ability to sync Keystone tokens across multiple regions is a popular feature request. The desired use case is that you'd have separate instances of Keystone deployed in two or more regions and be able to successfully use tokens generated in one region against any other region. The complexity with UUID tokens is that you need to sync tokens into every region as quickly as possible. You're then faced with two choices: either you withhold the token from the client until it's replicated everywhere, or you create it locally and asynchronously replicate it everywhere afterwards. The first scenario results in high response times when creating tokens, and the second scenario results in a race condition where the client might attempt to use the token before it's available in the target region. I don't know anyone who is a fan of race conditions, so that means we have to accept high response times, right? Perhaps...

Fernet tokens provide some interesting relief in that Fernet tokens do not need to be persisted and thus do not need to be replicated into every region. Yes, that's right:

> SELECT * FROM `token`;
Empty set (0.00 sec)

The wonderful side effect is that in a multi-region deployment, say one that spans five datacenters worldwide (Washington, DC; Chicago, Illinois; Dallas, Texas; Hong Kong; and Sydney, Australia), your response times are similar to those of a single-region deployment, but your token is immediately available for use in all regions. There's no replication to wait for, synchronous or not: response times do not suffer from replication overhead, and Keystone nodes in other regions can readily validate your token using the shared encryption key.

And for the icing on the cake, Fernet tokens should always weigh in under 255 bytes (unlike PKI tokens which sometimes exceed 8,192 bytes — thus breaking the Internet).
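
The no-persistence property can be sketched with nothing but the Python standard library. To be clear, this is not the Fernet format: it only signs the payload with HMAC-SHA256 and skips the AES-CBC encryption, timestamps, and key rotation that Fernet provides. The key value and the function names are hypothetical. What it does show is that any node holding the same key can validate a token without a database lookup.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key that every region holds (distributed out of band).
SHARED_KEY = b"example-key-shared-by-all-regions"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def issue_signed_token(user_id, project_id, key=SHARED_KEY):
    """Pack the context into the token and append an HMAC-SHA256 signature."""
    payload = json.dumps(
        {"user_id": user_id, "project_id": project_id}, sort_keys=True
    ).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + signature).decode("ascii")


def validate_signed_token(token, key=SHARED_KEY):
    """Return the embedded context if the signature checks out, else None."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii"))
    except ValueError:
        return None
    if len(raw) <= DIGEST_SIZE:
        return None
    payload, signature = raw[:-DIGEST_SIZE], raw[-DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(payload)


token = issue_signed_token("alice", "demo")  # issued in one region
print(validate_signed_token(token))          # validated in another
# {'project_id': 'demo', 'user_id': 'alice'}
```

Nothing is written anywhere at issue time, which is why there is nothing to replicate.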

Keystone + globally-distributed Galera cluster

These metrics represent a five node, globally-distributed Galera cluster performing replication over the public Internet: pretty much the first class nightmare of all the use cases we're looking at today.

Token creation performance

          Response time               Requests per second
UUID      342.4 ms (baseline)         166.9 (baseline)
PKI       351.4 ms (3% slower)        110.2 (34% slower)
PKIZ      339.7 ms (1% faster)        120.7 (28% slower)
Fernet    50.8 ms (85% faster)        237.1 (42% faster)

Token validation performance

          Response time               Requests per second
UUID      6.02 ms (baseline)          1715.7 (baseline)
PKI       6.25 ms (4% slower) **      1717.2 (0% faster) **
PKIZ      6.15 ms (2% slower) **      1676.4 (2% slower) **
Fernet    5.55 ms (8% faster)         1957.8 (14% faster)

** ApacheBench has a maximum request size, which a pair of PKI or PKIZ tokens exceed (X-Auth-Token plus X-Subject-Token). As suggested on Stack Overflow, I worked around the limitation by recompiling ApacheBench with a larger maximum request size.
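
The percentages in parentheses are not defined anywhere in this post. Assuming they are the difference from the UUID baseline, divided by the baseline and rounded to the nearest percent, the following sketch reproduces the figures in the tables above. The function names are my own.

```python
def percent_difference(value, baseline):
    """Relative difference from the baseline, rounded to a whole percent."""
    return round(abs(value - baseline) / baseline * 100)


def describe_response_time(value_ms, baseline_ms):
    """Lower response times are better."""
    word = "faster" if value_ms < baseline_ms else "slower"
    return f"{percent_difference(value_ms, baseline_ms)}% {word}"


def describe_requests_per_second(value_rps, baseline_rps):
    """Higher request rates are better."""
    word = "faster" if value_rps > baseline_rps else "slower"
    return f"{percent_difference(value_rps, baseline_rps)}% {word}"


# Fernet versus UUID token creation on the Galera cluster:
print(describe_response_time(50.8, 342.4))         # 85% faster
print(describe_requests_per_second(237.1, 166.9))  # 42% faster
```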

Keystone + single database node

So, Fernet tokens look fantastically attractive for multi-region deployments, but the more common use case is a single-region deployment, so let's benchmark that! This is still the same deployment but with all remote Galera nodes removed from the cluster, leaving a single SQL node as the primary.

Token creation performance

          Response time               Requests per second
UUID      58.0 ms (baseline)          223.7 (baseline)
PKI       71.6 ms (23% slower)        173.0 (23% slower)
PKIZ      73.0 ms (26% slower)        178.0 (20% slower)
Fernet    52.0 ms (10% faster)        237.1 (6% faster)

Token validation performance

          Response time               Requests per second
UUID      5.90 ms (baseline)          1787.3 (baseline)
PKI       6.25 ms (6% slower) **      1671.8 (6% slower) **
PKIZ      6.28 ms (6% slower) **      1689.4 (5% slower) **
Fernet    5.63 ms (5% faster)         1974.8 (10% faster)

** The same ApacheBench request size limitation and workaround as above.

DevStack stable/kilo

(June 2015 update)

With the release of stable/kilo, I've run similar benchmarks against a vanilla, bare-metal, all-in-one DevStack deployment.

For raw benchmark results and the scripts used to produce these numbers (which differ from the above benchmarks, and are likely far easier to reproduce yourself), see this gist.

Determining why Fernet appears to be significantly slower than previously reported is my next mission. Stay tuned!

Special thanks to the team at Time Warner Cable for raising suspicions that the performance of Fernet's final implementation in production did not reflect my early benchmarks (shown above), thus warranting this investigation. @openmfisch

Note: These benchmark numbers cannot be compared against the above benchmark numbers, except to show that Fernet does not have the same performance advantage over UUID that it did in the previous benchmarks.

Token creation performance

          Response time               Requests per second
UUID      72.5 ms (baseline)          57.6 (baseline)
Fernet    62.7 ms (13% faster)        74.8 (30% faster)

Token validation performance

          Response time               Requests per second
UUID      18.8 ms (baseline)          256.7 (baseline)
Fernet    93.8 ms (400% slower)       48.3 (81% slower)

Notes on benchmarking methodology (caveat emptor!):

  • The benchmarked implementation of Fernet tokens was a proof of concept that will be available in OpenStack's 2015.1 Kilo release
  • Keystone was deployed using these Ansible playbooks which include minimal (and likely idiotic) performance tuning (feedback welcome!) for Apache httpd and MariaDB
  • The benchmarking scripts and raw result data are all available for your criticism
  • All data is the best of at least three benchmark runs in each configuration (the most competitive results of each configuration are compared above)
  • Response times were measured using a single API user
  • Requests per second were measured using multiple concurrent API users
  • Both Keystone and the nearest database node were run on bare metal (all other nodes were virtual machines)
  • Ping time from the primary database node to the furthest database node in the cluster was noted to be 211 ms
  • All API calls were made using Identity API v3
  • Fernet tokens were briefly known as "Keystone lightweight tokens (KLWT)"; before that, they were known as "authenticated encryption (AE) tokens"
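
The "single API user" and "best of at least three runs" notes above can be sketched as follows. This is not the benchmarking script used for this post, only an illustration of the idea; the function names are hypothetical, and `request` is assumed to be any callable that performs one API call.

```python
import statistics
import time


def mean_response_time_ms(request, count):
    """Issue `count` sequential requests and return the mean latency in ms."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        request()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


def best_response_time_ms(request, count, runs=3):
    """Repeat the whole measurement `runs` times and keep the lowest mean."""
    return min(mean_response_time_ms(request, count) for _ in range(runs))


def placeholder_request():
    """Stand-in for a real token creation or validation call."""
    time.sleep(0.001)


print(best_response_time_ms(placeholder_request, count=10))
```

Keeping the best run rather than the average compares each configuration at its most competitive, which tends to flatter every configuration equally but hides run-to-run variance.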