Tuesday, May 27, 2014

Publisher Program and Servers

This week I spent time working on the publisher side of Gnocchi.  I worked on building a framework for encoding, signing, and encrypting documents where all of the components are easily pluggable.  I looked into various C++ cryptography libraries and decided that Crypto++ would be the best option.

I looked at a few different crypto libraries in C++.  The most extensive and well-maintained general-purpose libraries seem to be GNU Crypto and Crypto++.  I decided to use Crypto++ because it is a modern and well-maintained library.  Three previous versions of Crypto++ have met the FIPS 140-2 Level 1 requirements, which suggests that there aren't any obvious security issues.  It also has an extremely wide variety of algorithms with a consistent interface that makes them easily interchangeable.  Crypto++ handles function composition through data pipelining, which makes it easy to encode, decode, pad, and perform other transformations on the data before it is passed to the encryption function.  It also has native support for secure random number generation and key handling.

The main focus this week was familiarizing myself with Crypto++ and then building a sensible API and command line interface around it.  My goal was to minimize dependencies between different parts of the program and to make the control flow as easy to follow as possible.  I've nearly finished implementing an early iteration of the code base.  The last few pieces of functionality left to implement are as follows:
  • Encoding the control information for encrypted documents
  • Public key and group key handling
  • Sending the processed document to the server
As with the rest of the implementation, these pieces cannot be completed until there is a formal specification for Gnocchi.  However, I am planning on writing an example implementation so that when the specification is ready it will be easy to edit and extend the example code.

Here is an example input, signed output, and encrypted signed output.  In the signed document, the control part contains the signature of the input encoded in Base64.  In the encrypted document, the control information is just the AES key encoded in Base64; this is only a placeholder for the real functionality.

The Gnocchi Publisher should be able to work with any document store, which means allowing arbitrary file upload formats.  I was thinking about having downloadable configuration files that specify how to authenticate a user and the format of the upload request.  The configuration files could be stored by domain (e.g. facebook.com, google.com, mail.google.com), where the most specific matching domain is used.  The server should be able to store the Gnocchi documents however it wants to, and neither the Gnocchi publisher nor the browser extension should need to know anything about how the server works.
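
As a rough sketch of the "most specific domain wins" lookup described above, the following Python snippet walks from the full host name toward the top-level domain and returns the first configuration it finds.  The CONFIGS table, its field names, and the find_config function are all hypothetical; no configuration format has been specified for Gnocchi yet.

```python
# Hypothetical per-domain publisher configurations; the field names are
# illustrative only and are not part of any Gnocchi specification.
CONFIGS = {
    "facebook.com": {"auth": "oauth", "upload_format": "multipart-form"},
    "google.com": {"auth": "oauth", "upload_format": "json"},
    "mail.google.com": {"auth": "oauth", "upload_format": "mime"},
}


def find_config(host, configs):
    """Return (domain, config) for the most specific configured domain
    that matches host, or None if no configured domain matches."""
    labels = host.lower().rstrip(".").split(".")
    # Try the full host first, then drop the leftmost label each time,
    # e.g. mail.google.com -> google.com -> com.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in configs:
            return candidate, configs[candidate]
    return None


if __name__ == "__main__":
    print(find_config("mail.google.com", CONFIGS))
    print(find_config("docs.google.com", CONFIGS))
    print(find_config("example.org", CONFIGS))
```

With this approach a host without its own entry, such as docs.google.com, falls back to the google.com configuration.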

However the key handling server gets implemented, I think we should provide an easy-to-use, one-command tool that lets users create their own key handling server and start it on a free PaaS (platform as a service) or on their own server.  Some ways to distribute the server code are:
  • A node module uploaded to npm, the node package manager (if implemented in Node)
  • A Docker image ready to run the server
  • A git repository with the project ready to be uploaded to Heroku or another PaaS

Friday, May 16, 2014

Getting up to speed

This is my first blog post of the summer.  I am working on the NoSSL/Gnocchi project, which is both a cryptographic protocol space that doesn't involve SSL (NoSSL) and an implementation of a protocol in this space (Gnocchi).  Gnocchi is centered around the idea that you can't steal what you don't have.  Servers shouldn't be keeping private material online, which SSL requires.  Instead, content should be encrypted using different keys for every author, to minimize catastrophic points of failure.  Gnocchi is intended to be used in the common space of read-only, append-only data, such as a user's tweets or medical sensor data.  The following are summaries of some of the reading I did this week to bring myself up to speed with the project, as well as some of the basic prototype implementation work I did.

Secure HTTP

Secure HTTP (S-HTTP) was a competitor to SSL for encrypting communication; however, it lost because both Microsoft and Netscape supported SSL (https).  S-HTTP runs on top of HTTP and encrypts the data instead of the connection.

It works by negotiating the signature and encryption schemes, and then encapsulating the HTTP message (generally) by any combination of authenticating, signing, or encrypting (including none of them) and then adding its own headers.  The recipient then unwraps the message using the specified cryptographic methods to get the clear HTTP message.  With S-HTTP the host is sent in the clear because the message needs to be routed to the server correctly; however, the path is inside the (possibly) encrypted HTTP message.

The biggest difference between S-HTTP and SSL is that SSL establishes a secure socket connection with the server and then transmits all future data through the encrypted channel.  However, in S-HTTP the plain HTTP message is encrypted and then encapsulated using the S-HTTP protocol and then that message is sent over a regular connection.  Since S-HTTP messages are clearly distinguishable from HTTP messages, they can be sent over the same port (80), while HTTPS messages require their own port (443).

Both systems have the same failure point: if the server is compromised, every user's data is exposed, since the data is decrypted once it reaches the server.  They both require the server's private key to be used online, which reduces the security of the system.

Merkle Hash Trees

Merkle hash trees are used to efficiently and securely verify the content of directory-like data structures.  To compute the top-level hash, you traverse the tree depth first: the hash of a leaf is the hash of its data, and the hash of an inner node is the hash of the concatenation of its children's hashes.  That process yields the top-level hash, which you can sign and distribute.  To verify the content of a leaf, you recompute the hash of that leaf, combine it with the existing hashes of the other relevant nodes (the siblings along the path to the root) to recompute the top-level hash, and then check that it matches the distributed signed hash.  Verification therefore takes time linear in the depth of the node, which is logarithmic in the number of nodes for a balanced tree.
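
The top-level hash computation described above can be sketched in a few lines of Python.  This is a minimal illustration, not the scheme any particular system uses: the choice of SHA-256, the nested-list tree representation, and the function names are my own assumptions, and a production design would also tag leaves and inner nodes differently so that one cannot be mistaken for the other.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_hash(node) -> bytes:
    """Hash a tree given as nested lists: a bytes object is a leaf (data),
    a list is an inner node whose elements are its children."""
    if isinstance(node, bytes):
        # Leaf: hash the data itself.
        return sha256(node)
    # Inner node: concatenate the children's hashes, then hash the result.
    return sha256(b"".join(merkle_hash(child) for child in node))


if __name__ == "__main__":
    # A tiny "directory" with two subdirectories (assumed example data).
    tree = [[b"file a", b"file b"], [b"file c"]]
    print(merkle_hash(tree).hex())
```

Changing any leaf changes the top-level hash, which is why signing only that one hash is enough to cover the whole tree.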

SFS Read Only File System

The SFS read-only file system allows for high throughput for one-writer, many-reader content hosted on untrusted servers.  All cryptographic procedures that require the author's private key happen offline, minimizing the risk of attackers being able to steal the author's private key.  When a client downloads files from a server, it verifies them using the author's public key, which means the server doesn't have to be trusted.  This also has the effect of greatly increasing the throughput of the server, because all of the cryptographic procedures are offloaded to the client or author.

The basic workflow of the SFS read only file system is as follows:

  1. The writer creates the database (file system) offline and signs the top level hash of the Merkle hash tree.
  2. The writer pushes copies of the database to the servers.
  3. The reader makes a request to the server and gets the data for that level of the filesystem, then verifies the data using the Merkle hash tree and the author's public key.
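
The reader's check in step 3 can be sketched as follows.  This is only an illustration of the Merkle path check, not the actual SFS wire format: the proof layout (a list of sibling hashes to the left and right at each level), the use of SHA-256, and the function names are assumptions, and the sketch presumes the reader has already verified the author's signature on the top-level hash.

```python
import hashlib
import hmac


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def root_from_proof(leaf_data: bytes, proof) -> bytes:
    """Recompute the top-level hash from one leaf and its proof.

    proof is a list with one (left_siblings, right_siblings) pair per level,
    ordered from the leaf's parent up to the root; each element is a list of
    sibling hashes in their original order."""
    current = sha256(leaf_data)
    for left_siblings, right_siblings in proof:
        current = sha256(b"".join(left_siblings) + current + b"".join(right_siblings))
    return current


def verify_leaf(leaf_data: bytes, proof, trusted_root: bytes) -> bool:
    # trusted_root is assumed to come from an already verified signature.
    return hmac.compare_digest(root_from_proof(leaf_data, proof), trusted_root)


# Writer side (offline): a tiny tree with leaves a, b under dir1 and c under dir2.
h_a, h_b, h_c = sha256(b"a"), sha256(b"b"), sha256(b"c")
dir1 = sha256(h_a + h_b)
dir2 = sha256(h_c)
root = sha256(dir1 + dir2)

# Reader side: the server returns leaf b plus the sibling hashes on its path.
proof_b = [([h_a], []), ([], [dir2])]

if __name__ == "__main__":
    print(verify_leaf(b"b", proof_b, root))
    print(verify_leaf(b"tampered", proof_b, root))
```

The server only supplies data and hashes; it never needs a key, which is what lets it stay untrusted.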

MIME Types

The MIME types that would make the most sense for Gnocchi are "multipart/signed" and "multipart/encrypted".  They are general-purpose MIME types for signed and encrypted content.  The encryption or signature procedure is specified in the control part of the message, where you can specify your own content type, such as "application/gnocchi-signature".  I have just started to implement a prototype of the MIME types for Gnocchi.  I'm working in Python, which has extensive MIME support built in.  The MIME type classes automatically handle the signing, verifying, encrypting, and decrypting; however, I haven't started working on any of the key management yet.  I plan on having the basic functionality of the program working sometime on Monday.  I will have to update the server (mentioned next) to work with the new MIME types, since I worked on this after I built the server, but that shouldn't take long.
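
To make the message layout concrete, here is a minimal sketch of a "multipart/signed" message built with Python's standard email package.  It is not the prototype's code: the function name is hypothetical, the "micalg" value is an assumption, and the signature bytes are just a SHA-256 digest standing in for a real signature, since key handling isn't implemented yet.

```python
import hashlib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_signed_message(body_text: str, signature: bytes) -> MIMEMultipart:
    """Wrap body_text and a detached signature in a multipart/signed message."""
    msg = MIMEMultipart(
        "signed",
        protocol="application/gnocchi-signature",  # type of the control part
        micalg="sha-256",  # assumed digest algorithm label
    )
    # First part: the content that was signed.
    msg.attach(MIMEText(body_text, "plain", "utf-8"))
    # Second part: the control part; MIMEApplication Base64-encodes the bytes.
    msg.attach(MIMEApplication(signature, "gnocchi-signature"))
    return msg


body = "Hello Gnocchi world"
# Placeholder only: a plain digest is NOT a signature.
placeholder_signature = hashlib.sha256(body.encode("utf-8")).digest()
message = build_signed_message(body, placeholder_signature)

if __name__ == "__main__":
    print(message.as_string())
```

A "multipart/encrypted" message could be built the same way, with the control part first and the encrypted payload second.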

Server

I built an extremely simple server and uploaded it to a free hosting platform here (it's not very interesting).  All it does is allow files to be uploaded, which are then stored in a database by their filename.  A file can then be viewed or downloaded by going to a URL based on the file's name.  Currently anyone can upload, view, download, and delete any file.  It is just supposed to be a very simple server for testing the Gnocchi system, and could possibly be expanded upon.  File uploads aren't actual files because it was easier to implement that way; however, that can be changed without too much effort.  An example of how it works is as follows:

Upload a 'file' (if a file with this name already exists, pick a new name or delete the old file)
curl -X POST -H "Content-Type: application/json" http://gnocchiserver.herokuapp.com/files -d '{"filename":"test", "data":"Hello Gnocchi world"}'

View the file (or just visit the URL in your browser)
curl http://gnocchiserver.herokuapp.com/files/test

Download the file (if you visit the URL in your browser it will automatically download)
curl http://gnocchiserver.herokuapp.com/download/test

Delete the file
curl -X DELETE http://gnocchiserver.herokuapp.com/files/test