Hierarchical multitenancy

Welcome to the biggest, scariest word in OpenStack. Please don't run away (yet, anyway).

Background

Keystone's original model for multitenancy was entirely flat: tenants had no relation to one another whatsoever. In Grizzly, we renamed tenants in our API to projects and introduced the concept of domains to serve as containers for collections of projects that could be administered as a group. When discussing authorization, domains can simply be thought of as groups of projects.

Domains allow us to model real-world use cases such as a single customer having multiple environments, where each environment is isolated from the others using OpenStack's existing (flat) model of multi-tenancy.

Evolution

This model only takes us so far, however. How do you subdivide an existing environment? Or delegate access to a subset of your infrastructure to a third party? How do you manage quotas for one portion of a project separately from the rest?

Real-world models of these scenarios tend to be hierarchical, with authorization applicable to a higher level in the hierarchy trickling down to the layers beneath it. Thus, we're exploring options to adapt our existing API to express a hierarchical model.

Implications

Authorization at any one node in the hierarchy logically implies that you have the same authorization throughout the tree below that node. However, my customer's customer is not my customer. As a cloud deployer, I should be able to see an aggregation of my customer's subtenant resources, but I should not have any more granular or direct access. Our solution to this is a "privacy" attribute on projects, which, once set to true, provides a hard stop for certain kinds of inheritance (effectively making those projects behave like a top-level domain).
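To make the "hard stop" concrete, here is a minimal sketch of how inherited role lookup might behave. Everything in it is an assumption for illustration: the Project class, its private flag, the roles mapping, and effective_roles() are hypothetical names, not Keystone's API, and the post does not say exactly which kinds of inheritance are blocked.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical in-memory project node (not Keystone's data model)."""
    name: str
    parent: "Project | None" = None
    private: bool = False
    # Maps a user name to the set of role names assigned directly here.
    roles: dict = field(default_factory=dict)


def effective_roles(user, project):
    """Collect a user's roles on `project`, including inherited ones.

    Walks up the parent chain and stops after the first private project,
    so assignments made above a private project never trickle into it.
    """
    roles = set()
    node = project
    while node is not None:
        roles |= node.roles.get(user, set())
        if node.private:
            break  # hard stop: nothing above this node is inherited
        node = node.parent
    return roles


cloud = Project("cloud", roles={"deployer": {"admin"}})
customer = Project("customer", parent=cloud, private=True,
                   roles={"alice": {"admin"}})
subtenant = Project("subtenant", parent=customer)

print(effective_roles("alice", subtenant))     # inherited from customer
print(effective_roles("deployer", subtenant))  # blocked by the private node
```

In this sketch the deployer's admin role on the cloud stops at the private customer project, while alice's role on the customer project still flows down to its subtenant.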

Thinking about OpenStack in terms of hierarchical resource ownership colors your perspective on just about everything. Suddenly we expect new features like RBAC policies existing at every node in the hierarchy and the ability to define project-specific roles. We are also faced with new use cases, such as sharing access to your resources with your child projects (for example, a public cloud provider sharing images in glance with all the customers in the cloud).

Constraints

  • Only allow role assignments up and down your own tree, but not across trees (which would effectively create trust relationships, which is cheating federation). This assumes that identities are still owned by projects or domains.
  • Do not allow a project to be re-parented. The implications otherwise are highly complex, but may be overcome in the future.
  • The parent of a private project (a "domain") must be another domain. The implications otherwise are highly complex, but may be overcome in the future.

Okay, but...

This all sounds like it won't scale, right?

So, how do you efficiently compute the list of all ancestors of a given project? What about all the descendants of a given project?

tl;dr pre-compute it.

Unfortunately, this requires persisting your parent-child relationships in two ways. The first is the traditional "this is my parent" reference which exists for every project (for root projects, i.e. domains, this would be null).

The second is that you need a table with just two columns: let's call them ancestor and descendant. For a parent » child hierarchy where:

  • A is B's parent.
  • A is C's parent.
  • B is D's parent.

You would then persist the following records:

#   Ancestor   Descendant
1   A          B
2   A          C
3   B          D
4   A          D

The reasoning behind the first three rows should be obvious (as we just outlined those direct relationships above), but row 4 is what enables all the magic.
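As a rough illustration of where row 4 comes from, the following sketch derives every (ancestor, descendant) pair from nothing but the "this is my parent" references. The closure_rows() function and the dictionary layout are assumptions made for this example, not part of any existing API.

```python
def closure_rows(parents):
    """Return the set of (ancestor, descendant) pairs implied by a parent map.

    `parents` maps each project name to its parent's name (None for a root).
    """
    rows = set()
    for project in parents:
        ancestor = parents[project]
        # Walk up the parent references until we pass the root.
        while ancestor is not None:
            rows.add((ancestor, project))
            ancestor = parents[ancestor]
    return rows


# The hierarchy from the example: A is the root, B and C are its children,
# and D is B's child.
parents = {"A": None, "B": "A", "C": "A", "D": "B"}
print(sorted(closure_rows(parents)))
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'D')]
```

The pair ('A', 'D') is the indirect relationship that the parent references alone only express implicitly.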

Row 4 allows us to query for all descendants of A in a single lookup, with no tree traversal (pardon the pseudo-SQL):

> SELECT descendant ... WHERE ancestor = A;
B, C, D

And for double bonus points, it allows us to query for all ancestors of D the same way:

> SELECT ancestor ... WHERE descendant = D;
B, A
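For anyone who wants to try the two queries, here is a small runnable sketch using Python's built-in sqlite3 module. The table name project_closure, the index name, and the helper functions are assumptions chosen for the example. Results are sorted alphabetically, because returning ancestors nearest-first would need something extra (such as a depth column) that is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_closure ("
    " ancestor TEXT NOT NULL,"
    " descendant TEXT NOT NULL,"
    " PRIMARY KEY (ancestor, descendant))"
)
# The primary key covers lookups by ancestor; this index covers lookups
# by descendant.
conn.execute(
    "CREATE INDEX ix_closure_descendant ON project_closure (descendant)"
)
conn.executemany(
    "INSERT INTO project_closure (ancestor, descendant) VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("B", "D"), ("A", "D")],
)


def descendants(conn, project):
    """All projects below `project`, at any depth."""
    rows = conn.execute(
        "SELECT descendant FROM project_closure"
        " WHERE ancestor = ? ORDER BY descendant",
        (project,),
    )
    return [row[0] for row in rows]


def ancestors(conn, project):
    """All projects above `project`, at any depth."""
    rows = conn.execute(
        "SELECT ancestor FROM project_closure"
        " WHERE descendant = ? ORDER BY ancestor",
        (project,),
    )
    return [row[0] for row in rows]


print(descendants(conn, "A"))  # ['B', 'C', 'D']
print(ancestors(conn, "D"))    # ['A', 'B']
```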

Building the table means doing several writes, especially when you add a project deep in the hierarchy (it needs one row for every ancestor above it), but you only have to do the tree traversal once. After that, it's basically free.
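One way to keep those writes cheap is to lean on the rows that already exist: a new project's ancestors are its parent plus all of its parent's ancestors. The sketch below assumes hypothetical project and project_closure tables and an add_project() helper; it is an illustration under those assumptions rather than a finished design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (
        id TEXT PRIMARY KEY,
        parent_id TEXT REFERENCES project (id)
    );
    CREATE TABLE project_closure (
        ancestor TEXT NOT NULL,
        descendant TEXT NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    );
""")


def add_project(conn, project, parent=None):
    """Create a project and record every ancestor relationship for it."""
    conn.execute(
        "INSERT INTO project (id, parent_id) VALUES (?, ?)",
        (project, parent),
    )
    if parent is not None:
        # One row per ancestor of the parent, plus one for the parent itself.
        conn.execute(
            "INSERT INTO project_closure (ancestor, descendant)"
            " SELECT ancestor, ? FROM project_closure WHERE descendant = ?"
            " UNION ALL SELECT ?, ?",
            (project, parent, parent, project),
        )


for project, parent in [("A", None), ("B", "A"), ("C", "A"), ("D", "B")]:
    add_project(conn, project, parent)

print(sorted(conn.execute("SELECT ancestor, descendant FROM project_closure")))
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'D')]
```

Because projects are never re-parented under the constraints above, these rows never need to be rewritten once they are inserted.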

When you delete a project, you select all the descendants first, and then delete all records where either the project being deleted or any of its descendants appears as either an ancestor or a descendant (effectively removing the entire subtree).
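The deletion steps can be sketched like this, again with sqlite3 and a hypothetical project_closure table and delete_project() helper. The sketch only cleans up the ancestor/descendant rows; removing the project records themselves is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_closure ("
    " ancestor TEXT NOT NULL,"
    " descendant TEXT NOT NULL,"
    " PRIMARY KEY (ancestor, descendant))"
)
conn.executemany(
    "INSERT INTO project_closure (ancestor, descendant) VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("B", "D"), ("A", "D")],
)


def delete_project(conn, project):
    """Remove every closure row that touches `project` or its subtree."""
    # Step 1: select all the descendants first.
    doomed = [project] + [
        row[0]
        for row in conn.execute(
            "SELECT descendant FROM project_closure WHERE ancestor = ?",
            (project,),
        )
    ]
    # Step 2: delete rows where any doomed project appears on either side.
    marks = ",".join("?" * len(doomed))
    conn.execute(
        "DELETE FROM project_closure"
        f" WHERE ancestor IN ({marks}) OR descendant IN ({marks})",
        doomed + doomed,
    )
    return doomed


delete_project(conn, "B")
print(sorted(conn.execute("SELECT ancestor, descendant FROM project_closure")))
# [('A', 'C')]
```

Deleting B removes B and D from the table, leaving only the relationship between A and C.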

Can we rename hierarchical multitenancy to something else?

Of course! How about hierarchical multiprojectcy? Kidding aside, I don't think end users will ever be faced with such daunting terminology. We're not going to use this term in any API or client interface. Colloquially, I imagine we'll adopt more natural phrasing like "project trees."