Monday, September 16, 2013

Chef Encrypted Data Bags Are A Code Smell.


Chef databags are a centralized key/value store for JSON-based data in the Chef environment. Encrypted databags are exactly what is advertised on the box: the data is encrypted with an external key before it is uploaded to the data store.

Encrypted databags are meant to help solve the problem of handling credentials that you want installed automatically but need to keep private. Examples are database login passwords, the private key of a public key pair, and other tokens that allow a server access to additional services.

Encrypted databags provide protection against two kinds of access:

Implicit Access:

This is defined to mean access outside the chef protocol to the underlying data store on your chef server. If you're using a service like Hosted Chef or just practicing general good data hygiene, DB access to the server should only expose "public" data if at all possible.

Explicit Access:

Explicit access is access via the standard chef protocol, either using the SSL key pair created as part of the chef client bootstrap to read data in the chef datastore, or via a knife command using an admin key pair.

The problem with encrypted databags in both use cases is that they only appear to solve the underlying problem by moving it out of the chef workflow. In order to use either protection effectively, you need to create an "out of band" system to manage the shared secret required to access the databag.

In the implicit access protection case, this is likely a worthwhile and manageable cost, since the key only needs to be kept secret from the chef server. However, this only protects against read-only implicit access. If the bad guy has implicit write access to your chef datastore, the game is over on your next chef client run.

The explicit access case is where the real problems arise. In this case, the intention is to prevent some admins or hosts from accessing private data. This has an unfortunate side effect: access to the shared secret becomes an "invisible access control list". Using an encryption key as an authorization object creates problems because it destroys the chain of identity. All you ever know is that "somebody with the key" accessed the data. You need to create a separate tracking and access control system to provide an auditable trail. Encrypted databags don't solve a problem in this case, they just create a whole new set of problems.
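The loss of identity can be sketched in a few lines. This is a toy model, not anything Chef ships: the names `SHARED_SECRET`, `CLIENT_KEYS`, `alice` and `web01` are hypothetical, and per-client HMAC keys stand in for the per-client key pairs a real system would use. It only illustrates that requests made with one shared key are indistinguishable, while requests made with a per-identity key can be attributed.

```python
import hashlib
import hmac

# Hypothetical shared secret, handed out of band to every admin and host.
SHARED_SECRET = b"example-shared-secret"

# Hypothetical per-identity keys (a symmetric stand-in for real key pairs).
CLIENT_KEYS = {"alice": b"alice-example-key", "web01": b"web01-example-key"}


def sign(key: bytes, item: str) -> str:
    """Return an HMAC-SHA256 tag showing the caller holds `key`."""
    return hmac.new(key, item.encode("utf-8"), hashlib.sha256).hexdigest()


def who_with_shared_secret(item: str, tag: str) -> str:
    """With one shared key, the audit entry cannot name the caller."""
    if hmac.compare_digest(sign(SHARED_SECRET, item), tag):
        return "somebody with the key"
    return "unknown"


def who_with_client_keys(item: str, tag: str) -> str:
    """With one key per identity, a valid tag maps back to a name."""
    for name, key in CLIENT_KEYS.items():
        if hmac.compare_digest(sign(key, item), tag):
            return name
    return "unknown"


# Two different callers using the shared secret produce identical requests.
alice_shared = sign(SHARED_SECRET, "db_password")
web01_shared = sign(SHARED_SECRET, "db_password")
print(alice_shared == web01_shared)
print(who_with_shared_secret("db_password", alice_shared))

# The same request signed with web01's own key can be attributed.
web01_own = sign(CLIENT_KEYS["web01"], "db_password")
print(who_with_client_keys("db_password", web01_own))
```

In the shared-secret case the only thing the log can record is that a key holder made the request, which is the "invisible access control list" problem in miniature.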

Chef Vault is an attempt to get around this problem by creating access control lists using the private key already available on each chef client. Without the strong ACL system that either Private or Hosted Chef provides, this is probably the simplest workable solution. Any other solution should implement all of the features of Chef Vault. The most general solution in the explicit access case is ACLs based on the SSL identity of the client. There are many other objects in Chef on which the Opscode Chef ACL system would provide useful security boundaries.

It's important to remember that more keys do not mean better security. Encryption keys should be used to provide data integrity, privacy and authentication (i.e. they answer the who questions: who are you? who am I? who sent that message?). They should never be used to answer the what questions (what can I read? what can I write? etc.).
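The split between the who questions and the what questions can be sketched as follows. This is a minimal illustration, not the Chef or Chef Vault API: `SecretStore`, `web01` and `web02` are hypothetical names, and the `client` argument is assumed to be an identity that has already been authenticated with that client's key pair. The key answers who; the access control list answers what, and every attempt leaves a named audit entry.

```python
from typing import Dict, List, Set, Tuple


class SecretStore:
    """Toy store: a per-item ACL decides 'what', the caller's identity is 'who'."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._readers: Dict[str, Set[str]] = {}
        # Every read attempt is recorded as (client, item, outcome).
        self.audit_log: List[Tuple[str, str, str]] = []

    def put(self, item: str, value: str, readers: Set[str]) -> None:
        """Store a value together with the set of clients allowed to read it."""
        self._values[item] = value
        self._readers[item] = set(readers)

    def read(self, client: str, item: str) -> str:
        """Return the value if `client` is on the item's ACL, logging either way."""
        # Assumption: `client` was authenticated before this call.
        allowed = client in self._readers.get(item, set())
        outcome = "allowed" if allowed else "denied"
        self.audit_log.append((client, item, outcome))
        if not allowed:
            raise PermissionError(f"{client} may not read {item}")
        return self._values[item]


store = SecretStore()
store.put("db_password", "s3cret", readers={"web01"})

print(store.read("web01", "db_password"))
try:
    store.read("web02", "db_password")
except PermissionError as err:
    print(err)
print(store.audit_log)
```

In this sketch, revoking a client means removing a name from a reader set rather than rotating and redistributing a shared key, and the log names each caller instead of recording "somebody with the key".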

Providing secrets in a scalable and secure fashion is still a largely unsolved problem, but any solution that attempts to use only shared secret encryption will not scale. Public key encryption and rings of shared access seem to be the only workable way forward, and every chef client and admin already has a key pair available. Any scalable solution should use this existing identity. If you are using encrypted databags to control explicit access, then you are building in scaling, access and audit problems for the future.