Friday, August 15, 2014

Gabbing Noisily On, Concerning Code Improvements

Rewrite

I have finished working on the publisher code rewrite. The rewritten code is now able to perform all of the operations that the old code could, and it has a few improvements to the CLI. However, most of the improvements have been internal to the code.
The main goal of the rewrite was to decouple the code and make it more maintainable. In this, I believe I was successful; some metrics supporting this claim are listed below:
  • The code compiles about 4x faster: ~9s instead of ~40s.
  • Only one file depends on OpenSSL, and similarly only one file depends on Boost. Each of these files is imported by only one other file, which speeds up compilation and reduces dependencies.
  • All basic operations happen through abstract base class interfaces, which allow any part to be swapped out for a different implementation without changing the code that uses it. For example, cryptographic operations can be done through either the OpenSSL CLI or its C API, and switching between the two only requires changing one line (a sketch of the pattern follows this list). The other interfaces handle storing other people's certificates, formatting messages, and storing data to disk.
  • The structure of the code base has been changed from a flat layout, with all of the files in the same directory, to a hierarchical structure that separates the different modules.
  • The publisher API is easier to use; for example, it now stores the publisher's private key and certificate instead of asking the user to enter them every time.
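Here is a minimal sketch of the interface pattern; all of the class and function names below are hypothetical, not the actual ones from the code base:

// Hypothetical sketch of the abstract-base-class pattern described above.
#include <memory>
#include <vector>

using Bytes = std::vector<unsigned char>;

struct CryptoProvider {                      // abstract interface for crypto ops
    virtual ~CryptoProvider() {}
    virtual Bytes encrypt(const Bytes& data, const Bytes& key) = 0;
};

struct OpenSslApiCrypto : CryptoProvider {   // backed by the OpenSSL C API
    Bytes encrypt(const Bytes& data, const Bytes& key) override {
        Bytes out; /* ... EVP_* calls would go here ... */ return out;
    }
};

struct OpenSslCliCrypto : CryptoProvider {   // shells out to the openssl CLI
    Bytes encrypt(const Bytes& data, const Bytes& key) override {
        Bytes out; /* ... fork/exec of `openssl enc` would go here ... */ return out;
    }
};

// Swapping implementations means changing only this one line:
std::unique_ptr<CryptoProvider> make_crypto() {
    return std::unique_ptr<CryptoProvider>(new OpenSslApiCrypto());
}
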
During the rewrite I also made a few miscellaneous improvements, a few of which are listed below:
  • I added a script that parses a C++ source file, extracts the enums in it, and outputs functions that map the enums to and from strings (example output is sketched after this list).
  • I cleaned up data storage and serialization. This made file storage much more stable, and I separated data into different files to minimize parsing.
  • I made a much simpler makefile that generates dependencies automatically using the C++ compiler.
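To illustrate the enum script, here is the kind of mapping code it might emit for a toy enum; the functions generated in the real code base may look different:

// Hypothetical example of generated enum<->string mapping functions.
#include <stdexcept>
#include <string>

enum class CipherKind { Aes256Cbc, Aes128Cbc };

std::string to_string(CipherKind k) {
    switch (k) {
        case CipherKind::Aes256Cbc: return "Aes256Cbc";
        case CipherKind::Aes128Cbc: return "Aes128Cbc";
    }
    throw std::invalid_argument("unknown CipherKind");
}

CipherKind cipher_kind_from_string(const std::string& s) {
    if (s == "Aes256Cbc") return CipherKind::Aes256Cbc;
    if (s == "Aes128Cbc") return CipherKind::Aes128Cbc;
    throw std::invalid_argument("unknown CipherKind: " + s);
}
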

Things I am currently working on

Publisher App

Now I am working on adding extra convenience features to the publisher app to make it more user friendly. Some of the things that I am working on are:
  • Displaying information about a group such as its members, what version it is on, and the derived key of a certain version.
  • Adding nicknames to the members' certificates so it is easier to refer to a member. Currently you have to know the hex fingerprint of their certificate to add or remove them from a group.
  • Communication with the key server. This should simply be a matter of formatting the request and then using Python to send it, because I do not want to work with C++ networking when I could just use Python.

Poster

I am also just starting to work on the poster describing what I did this summer. My plan is to work on it this weekend and the beginning of next week before I start band week. I want to have a draft by Sunday night so I can spend at least the next week improving it. Currently my plan is to make it in LaTeX so that it can be viewed digitally and printed out.

Monday, August 4, 2014

Abstractly (Re)factoring the Publisher Code

The first thing I worked on was fixing a bug in the group key encryption.  On other systems it was throwing an error when generating RSA key pairs.  The bug ended up being a small scoping issue that my compiler automatically fixed, but other compilers didn't.

Then I worked on implementing encryption of group keys under each member's public key.  This works by encrypting the message once under a random AES-256 key/IV, and then encrypting the AES key under each group member's public key.  This indirection is necessary because RSA encryption can only handle inputs that are shorter than the key.
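OpenSSL's EVP "envelope" functions implement this hybrid pattern; here is a rough sketch of how that could look (the helper name is hypothetical, and error handling is omitted):

// Rough sketch of hybrid (envelope) encryption with OpenSSL's EVP API:
// one random AES-256 key/IV encrypts the payload, and that key is
// RSA-encrypted under every member's public key.
#include <openssl/evp.h>
#include <vector>

typedef std::vector<unsigned char> Bytes;

Bytes seal_for_group(std::vector<EVP_PKEY*>& member_keys,
                     std::vector<Bytes>& wrapped_keys,   // out: one per member
                     Bytes& iv,                          // out: shared IV
                     const Bytes& plaintext) {
    int n = member_keys.size();
    std::vector<Bytes> ek(n);
    std::vector<unsigned char*> ekp(n);
    std::vector<int> ekl(n);
    for (int i = 0; i < n; ++i) {
        ek[i].resize(EVP_PKEY_size(member_keys[i]));
        ekp[i] = &ek[i][0];
    }
    iv.resize(EVP_MAX_IV_LENGTH);
    EVP_CIPHER_CTX ctx;
    EVP_CIPHER_CTX_init(&ctx);
    // Generates the random AES key and wraps it under each public key.
    EVP_SealInit(&ctx, EVP_aes_256_cbc(), &ekp[0], &ekl[0], &iv[0],
                 &member_keys[0], n);
    Bytes out(plaintext.size() + EVP_MAX_BLOCK_LENGTH);
    int len = 0, total = 0;
    EVP_SealUpdate(&ctx, &out[0], &len, &plaintext[0], (int)plaintext.size());
    total += len;
    EVP_SealFinal(&ctx, &out[total], &len);
    total += len;
    out.resize(total);
    EVP_CIPHER_CTX_cleanup(&ctx);
    for (int i = 0; i < n; ++i) {
        ek[i].resize(ekl[i]);                   // trim to the wrapped size
        wrapped_keys.push_back(ek[i]);
    }
    return out;
}
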

After that I worked on refactoring the publisher code.  Since the summer is coming to a close, I want the publisher code to be extremely readable for the next person, whoever that may be.  The first goal of my refactor is to have all of the "low level" functionality, such as crypto, formatting, and file access, accessed through an abstract base class interface so the implementation can be changed out without changing the code that uses it.  This technique also allows tests on the interface to be applied to every implementation of the interface (sub-class) without any extra code.  For example, the code could switch from using the OpenSSL API to using the OpenSSL CLI by changing only which implementation of the crypto interface is used, which is located in one place.

Along with the refactor, I've been adding unit tests to the code base.  This is one of the "technical debts" that I've built up over the summer and need to pay off.  The nice thing about writing the tests alongside the refactored code is that I get to use the fancy new abstract base classes to test all implementations of an interface at once.
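For example, a test written against the interface can run against every implementation with no extra test code.  A self-contained sketch with hypothetical names:

// Sketch of interface-level testing: one test body, written against the
// abstract interface, exercises every implementation (sub-class).
#include <cassert>
#include <string>

struct Formatter {                             // abstract interface under test
    virtual ~Formatter() {}
    virtual std::string wrap(const std::string& body) = 0;
};

struct PlainFormatter : Formatter {
    std::string wrap(const std::string& b) override { return b; }
};

struct MimeFormatter : Formatter {
    std::string wrap(const std::string& b) override {
        return "Content-Type: text/plain\r\n\r\n" + b;
    }
};

void test_wrap_contains_body(Formatter& f) {   // runs against any sub-class
    std::string out = f.wrap("hello");
    assert(out.find("hello") != std::string::npos);
}

int main() {
    PlainFormatter plain;
    MimeFormatter mime;
    test_wrap_contains_body(plain);            // same test,
    test_wrap_contains_body(mime);             // every implementation
}
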

Lastly, I had to correct a mistake I made with group key encryption.  I had incorrectly assumed that the public exponent should be encrypted when distributing the group key to each group member.  Therefore, I used a symmetric cipher to encrypt the group key and then encrypted the symmetric key under each member's public key, because if you include the public exponent in the encrypted part, it is too long for RSA encryption.  However, I learned that the public exponent does not need to be encrypted, so the group key state can be directly encrypted under each member's public key.
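In sketch form, the corrected approach is a single RSA operation per member (the helper name is hypothetical, and I'm assuming OAEP padding here):

// Sketch: the group key state, small enough once the public exponent is
// excluded, is RSA-encrypted directly under one member's public key.
#include <openssl/rsa.h>
#include <vector>

std::vector<unsigned char> encrypt_state_for_member(
        const std::vector<unsigned char>& state, RSA* member_pub) {
    std::vector<unsigned char> out(RSA_size(member_pub));
    // The state must be shorter than RSA_size minus the padding overhead.
    int n = RSA_public_encrypt((int)state.size(), &state[0], &out[0],
                               member_pub, RSA_PKCS1_OAEP_PADDING);
    out.resize(n < 0 ? 0 : n);
    return out;
}
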

Monday, July 7, 2014

OpenSSL Command Line Apps

This week I worked on supporting arbitrary encryption and signing functions within the publisher app.  In order to do this I switched from using OpenSSL's C API to using its command line interface.  I made this change because when using the C API, for each function I wanted to support I needed to add about 40 lines of code, as well as understand the cipher well enough to choose a good configuration.  However, with the command line interface the same code can support any cipher that OpenSSL supports just by passing in a different cipher name.  This reduced complexity and consistent interface should help reduce subtle bugs in the cryptographic functions.  Since the cryptography is the main component of the publisher app, it is important for it to be implemented correctly.

The OpenSSL command line interface is invoked by forking the process and redirecting the child's input/output streams to pipes that communicate with the parent process, which limits the publisher application to UNIX-like systems.
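Schematically, the invocation looks something like this (hypothetical helper, error handling omitted):

// Sketch of driving the openssl CLI from C++: fork, wire the child's
// stdin/stdout to pipes, exec openssl, feed it input, collect its output.
#include <sys/wait.h>
#include <unistd.h>
#include <string>
#include <vector>

std::string run_openssl(const std::vector<std::string>& args,
                        const std::string& input) {
    int to_child[2], from_child[2];
    pipe(to_child);
    pipe(from_child);
    pid_t pid = fork();
    if (pid == 0) {                          // child: become openssl
        dup2(to_child[0], STDIN_FILENO);     // read input from the parent
        dup2(from_child[1], STDOUT_FILENO);  // write output to the parent
        close(to_child[1]);
        close(from_child[0]);
        std::vector<char*> argv;
        argv.push_back(const_cast<char*>("openssl"));
        for (size_t i = 0; i < args.size(); ++i)
            argv.push_back(const_cast<char*>(args[i].c_str()));
        argv.push_back(NULL);
        execvp("openssl", &argv[0]);
        _exit(1);                            // only reached if exec fails
    }
    close(to_child[0]);
    close(from_child[1]);
    write(to_child[1], input.data(), input.size());
    close(to_child[1]);                      // EOF tells openssl input is done
    std::string output;
    char buf[4096];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof(buf))) > 0)
        output.append(buf, n);
    close(from_child[0]);
    waitpid(pid, NULL, 0);
    return output;
}
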

While implementing encryption with the CLI, I found a bug in OpenSSL key and IV generation.  When you call openssl enc -aes-256-cbc -P -nosalt -pass stdin, you pass the password in through standard input to generate the AES-256 key and IV.  When you pass in random bytes, for example from head -c 16 /dev/random, there is a possibility that most or all of the random bytes will be ignored.  The key derivation function reads the password as a null-terminated string, so as soon as it encounters a null byte, it stops reading the data.  Therefore, if one of the first few bytes is a null byte, the generated key and IV won't have much entropy.  For example:

If you pass in an ASCII '5' (0x35), a null byte, followed by any data:
openssl enc -aes-256-cbc -P -nosalt -pass stdin
key=E4DA3B7FBBCE2345D7772B0674A318D50B6DEB36D53E14600683C32878F482DC
iv =FB205138427CA770AD173FAFA25CDA94


The output will be the same as if you had just passed in an ASCII '5':
openssl enc -aes-256-cbc -P -nosalt -pass pass:5
key=E4DA3B7FBBCE2345D7772B0674A318D50B6DEB36D53E14600683C32878F482DC
iv =FB205138427CA770AD173FAFA25CDA94


This effectively reduces the key space of the resulting key and IV to 256^n, where n is the number of bytes before the first null byte.  An attacker with a table of the ~2^16 keys generated from every password of at most two bytes can break any password with a null byte in the first three bytes.  Assuming the password passed in is uniform random bytes, about 3/256 of passwords (roughly 1%) will have a null byte in the first three bytes and therefore be in the attacker's table, which means about 1% of passwords will be incredibly weak.  Adding a salt doesn't mitigate the risk either, since the salt isn't secret information.

Thursday, June 19, 2014

Group Keys and Filesystems

I spent this week finishing up the current iteration of group keys and working on a filesystem abstraction.  At this point the whole publisher program is "working"; I have managed a whole run-through.  However, the public key interface is extremely basic: it just takes in a PEM-encoded public key file and stores it internally, and all of the actual encoding and cryptography is still up in the air without a specification.

Group Keys

The group key implementation is working.  The program allows you to create groups, add/remove other people's public keys to/from groups (change the membership), and encrypt/decrypt messages using the key derived from the group state.  Whenever a user is removed from a group, the group state is wound using the RSA scheme.  The control information that is put in the encryption function now encrypts the file key under the group key of each group that has access to it.  To get the file key for decryption, the user must be a member of at least one group that the file key is encrypted under, and must have a version greater than or equal to the key version it was encrypted under.
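As a sketch, the RSA winding/unwinding can be pictured like this, in the style of Fu, Kamara, and Kohno's KR-RSA scheme (the helper names are hypothetical and the real implementation differs):

// Schematic sketch of RSA-based key regression.  Winding (publisher-only)
// uses the private exponent d; unwinding (any member) uses the public
// exponent e, so members can recover past keys but never future ones.
#include <openssl/bn.h>
#include <openssl/sha.h>

// Publisher: advance the state one version.  s_{i+1} = s_i^d mod N
void wind(BIGNUM* state, const BIGNUM* d, const BIGNUM* N, BN_CTX* ctx) {
    BN_mod_exp(state, state, d, N, ctx);
}

// Member: step the state back one version.  s_{i-1} = s_i^e mod N
void unwind(BIGNUM* state, const BIGNUM* e, const BIGNUM* N, BN_CTX* ctx) {
    BN_mod_exp(state, state, e, N, ctx);
}

// Derive the symmetric key for a version by hashing its state.
void derive_key(const BIGNUM* state, unsigned char key[SHA256_DIGEST_LENGTH]) {
    unsigned char buf[1024];
    int len = BN_bn2bin(state, buf);  // big-endian encoding of the state
    SHA256(buf, len, key);
}
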

Filesystem

This week I also worked on a filesystem abstraction that handles the publisher program's local storage and I/O.  I built the system to:

  • allow for insertion of arbitrary middleware dynamically, such as changing the directory, or encrypting the files before they reach the disk.
  • allow for an arbitrary back end, such as saving and loading to the filesystem, or a database, or a server, or memory.
  • make testing cleaner by allowing saving and loading files to and from memory; that way testing doesn't risk polluting, or being affected by, the filesystem.
I haven't implemented any middleware for it yet, but I wanted to have the abstraction in place so that it is extremely easy to change how and where the files are saved later.
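The shape of the abstraction is roughly the following (hypothetical names; the real interface differs):

// Rough sketch: one interface, interchangeable back ends, and middleware
// as wrappers that delegate to the next layer.
#include <map>
#include <string>

struct FileStore {                                // abstract back end
    virtual ~FileStore() {}
    virtual void save(const std::string& path, const std::string& data) = 0;
    virtual std::string load(const std::string& path) = 0;
};

struct MemoryStore : FileStore {                  // in-memory back end for tests
    std::map<std::string, std::string> files;
    void save(const std::string& p, const std::string& d) override { files[p] = d; }
    std::string load(const std::string& p) override { return files.at(p); }
};

struct PrefixMiddleware : FileStore {             // middleware: redirect to a directory
    FileStore& next;
    std::string prefix;
    PrefixMiddleware(FileStore& n, const std::string& pre) : next(n), prefix(pre) {}
    void save(const std::string& p, const std::string& d) override {
        next.save(prefix + p, d);                 // rewrite the path, delegate
    }
    std::string load(const std::string& p) override { return next.load(prefix + p); }
};
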

Documentation

I also spent a day documenting all of the publisher code.  The file in the GitHub repo /publisher/cpp/doc/html/index.html has the documentation home page, and from there you can explore the documentation.  It is organized by module and class in what I hope is a sensible manner.  I had previously put this off because the code was changing very rapidly, but now that other people are looking at my code, I decided it is worthwhile to document it.

The Future

I am going to look at the OpenSSL apps and see if I can use the command line apps to do the cryptography.  This would be useful because it would allow for easier inclusion of different ciphers without adding a ton of boilerplate.  If this seems feasible and useful, I will implement it in the code to allow arbitrary ciphers to be used for signing and encryption.

Friday, June 6, 2014

Functional Progress

Progress

This week I've been working more on the publisher program.  I've implemented the basics of the group key functionality, including:

  • Encoding and decoding file control information (encrypt file keys under different group keys)
  • Group key regression using the RSA scheme, including key derivation from the regression state
  • Stringifying and parsing group keys
  • Encrypting and decrypting group keys with the users' public keys

I think most of the modular functionality of the publisher is there (although rough in a few places); it just needs to be "glued" together with an app that will handle persistent data, user interaction, and networking.

Moving Forward

The thing I want to get done in the most immediate future is a run-through of the publisher app with all of the file encryption control information in place, instead of just the key encoded in Base64.  The next thing I want to do is work on the database side of the publisher app, which will include indexing and storing the public keys of people in the publisher's groups, private key management (generating and storing the user's private key), and group management (handling which users are in which groups, and revocation).  I wanted to work on this next because all the work here should be fairly generic and work with almost any specification.  Then I want to move on to the key server, which handles the distribution of group keys to the people in the group.  This, again, should be generic, although the details will change with the specification.

Unrelated Topics

I'm extremely interested in teaching myself Haskell.  Haskell is a lazy, purely functional language with an extremely strong type system.  It started out as an effort to consolidate the many functional languages around in the late 80s into a standardized language.  Since then it has grown in popularity and slowly become a more mainstream language.  The thing I like most about Haskell is the mindset of the developers and community.  Instead of creating hacky solutions to problems, they aim for general solutions rooted in mathematics, generally category theory.  For example, in order to implement I/O, instead of allowing some functions to have side effects, like Lisp, they use an IO monad to encapsulate the state of the IO and execute it as one IO action in the main function.  There are plenty of other examples of Haskell's use of category theory, both in the language and implemented in libraries, such as Functors and Monoids.  Through the use of these abstract concepts, the design patterns in Haskell programs are kept consistent: instead of everybody writing their own {parser, HTTP server, UI, ...} API, libraries use monads, or functors, or another general interface.  The advantage is that the programmer knows exactly how a monad (or other interface) works, and can reason about the computation much more easily.  I am planning on reading (in my own time) Categories for the Working Mathematician in order to get a better understanding of the mathematics behind Haskell.  I think it will help me become a better programmer, and do so in a way that will allow me to transfer my knowledge and intuition into other fields, such as graph theory, physics, complexity theory, linguistics, etc.

Wednesday, June 4, 2014

Switch to OpenSSL

First I will list some random half-baked thoughts, then I will continue onto the weekly update.

  • A remote side channel attack that uses JavaScript to perform timing attacks.  One possibility is to use the web-worker attack that was discussed in a meeting last Wednesday to slow down the computer in a consistent way, then time how long a certain computationally heavy action takes and use the variations in that time as the side channel.  Possibly try to steal RSA exponents for an SSL connection by injecting JavaScript into a different unencrypted page loaded at the same time.
  • Provably correct implementations of cryptographic functions.  Use a proof assistant to prove the correctness of the implementation, then use a tool to extract the program code directly from the proof.  I haven't looked into it much, but I know there has been work done on provably correct micro-kernels, such as showing that they are immune to buffer overflows.


I spent most of the week switching from Crypto++ to OpenSSL for the cryptographic library in the publisher program.  It was brought to my attention that Crypto++ isn't nearly as actively maintained as OpenSSL.  I had originally avoided OpenSSL because of the implementation issues brought up; however, the OpenSSL cryptographic library is the standard cryptographic library for C/C++.  The main reason to switch was precautionary: it is unclear how many people are using Crypto++ in a major product, so we switched to the de facto standard.

I underestimated how long it would take me to switch from Crypto++ to OpenSSL.  I estimated it would take a few hours, but it ended up taking three days.  The main reason is that I was unfamiliar with the OpenSSL interface, which isn't well documented and has a steep learning curve.  I was also using Crypto++ for some tasks other than encryption and signing, such as key handling and encoding.  On top of that, I ran into a bug that I had an incredibly hard time fixing: I was allocating memory for objects using OpenSSL's allocation functions, which call malloc, and then freeing them with delete, or vice versa.  I wasn't aware that this causes undefined behaviour, which caused a bug later in the program.  I have finished the transition and implemented a command line interface around the API.

Tuesday, May 27, 2014

Publisher Program and Servers

This week I spent time working on the publisher side of Gnocchi.  I worked on building a framework for encoding, signing, and encrypting documents where all of the components are easily pluggable.  I looked into various C++ cryptography libraries and decided that Crypto++ would be the best option.

I looked at a few different crypto libraries in C++.  The most extensive and well-maintained general purpose libraries seem to be GNU Crypto and Crypto++.  I decided to use Crypto++ because it is a modern and well-maintained library.  Three previous versions of Crypto++ have met the FIPS 140-2 level 1 requirements, which suggests that there aren't any obvious security issues.  It also has an extremely wide variety of algorithms with a consistent interface that makes them easily interchangeable.  Crypto++ handles function composition using data pipelining, which makes it easy to encode, decode, pad, and perform other transformations on the data before it is run through the encryption function.  It also has native support for secure random number generation and key handling.

The main focus this week was to familiarize myself with Crypto++, and then build a sensible API and command line interface around it.  My goal was to minimize dependencies between different parts of the program and to make the control flow as easy to follow as possible.  I've nearly finished implementing an early iteration of the code base; the last few pieces of functionality left to implement are as follows:
  • Encoding the control information for encrypted documents
  • Public key and group key handling
  • Sending the processed document to the server
Similarly to the rest of the implementation, these things cannot be completed until there is a formal specification for Gnocchi.  However, I am planning on writing an example implementation so that when the specification is ready it will be easy to edit and extend the example code.

Here is an example input, signed output, and encrypted signed output.  In the signed document, the control part contains the signature of the input encoded in Base64.  In the encrypted document, the control information is just the AES key encoded in Base64, this is just a placeholder for the real functionality.

The Gnocchi Publisher should be able to work with any document store, which means allowing arbitrary file upload formats.  I was thinking about something like downloadable configuration files that specify how to authenticate a user and the format of the upload request.  The configuration files could be stored by domain, e.g. (facebook.com, google.com, mail.google.com), where the most specific domain is used.  The server should be able to store the Gnocchi documents however it wants to, and neither the Gnocchi publisher nor the browser extension should need to know anything about how the server works.

However the key handling server gets implemented, I think we should create an easy-to-use tool that allows users to create their own key store servers very simply.  There should be a one-command tool that allows users to create their own key handling server and start it on a free PaaS (platform as a service) or their own server.  Some ways to distribute the server code are:
  • A node module uploaded to npm, the node package manager (if implemented in Node)
  • A Docker image ready to run the server
  • A git repository with the project ready to be uploaded to Heroku or another PaaS

Friday, May 16, 2014

Getting up to speed

This is my first blog post of the summer.  I am working on the NoSSL/Gnocchi project, which is both a cryptographic protocol space that doesn't involve SSL (NoSSL) and an implementation of a protocol in this space (Gnocchi).  Gnocchi is centered around the idea that you can't steal what you don't have.  Servers shouldn't be keeping private material online, which SSL requires.  Instead, content should be encrypted using different keys for every author, to minimize catastrophic points of failure.  Gnocchi is intended to be used in the common space of read-only, append-only data, such as a user's tweets or medical sensor data.  Following are summaries of some of the reading I did this week to bring myself up to speed with the project, as well as some of the basic prototype implementation work I did.

Secure HTTP

Secure HTTP (S-HTTP) was a competitor with SSL for encrypting communication; however, it lost because both Microsoft and Netscape supported SSL (https).  S-HTTP runs on top of HTTP and encrypts the data instead of the connection.

It works by negotiating the signature and encryption schemes, then encapsulating the HTTP message by (generally) any combination of authenticating, signing, or encrypting (including none of them), and finally adding its own headers.  The recipient then unwraps the message using the specified cryptographic methods to get the clear HTTP message.  The host is sent in the clear in S-HTTP because the message needs to be routed to the server correctly; however, the path is inside the (possibly) encrypted HTTP message.

The biggest difference between S-HTTP and SSL is that SSL establishes a secure socket connection with the server and then transmits all future data through the encrypted channel, whereas in S-HTTP the plain HTTP message is encrypted, encapsulated using the S-HTTP protocol, and then sent over a regular connection.  Since S-HTTP messages are clearly distinguishable from HTTP messages, they can be sent over the same port (80), while HTTPS messages require their own port (443).

Both systems have the same failure point: if the server is compromised, every user's data is exposed, since the data is decrypted once it reaches the server.  They both require the server's private key to be used online, reducing the security of the system.

Merkle Hash Trees

Merkle hash trees are used to efficiently and securely verify the content of directory-like data structures.  To compute the top-level hash you go through the tree depth first: first compute the hash of each leaf (the data), then compute the hash of each inner node by concatenating the hashes of all of its children and hashing that.  This process yields the top-level hash, which you can sign and distribute.  To verify the content of a leaf, you recompute the hash of that node, then use it along with the existing hashes of the other relevant nodes to recompute the top-level hash, and check that it matches the distributed signed hash.  This allows for verification of a node in time linear in the depth of the node (logarithmic in the number of nodes for a balanced tree).
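A toy sketch of the verification step, assuming a binary tree and SHA-256 (all names hypothetical):

// Recompute the root from a leaf plus its sibling hashes and compare it
// against the signed, distributed root.
#include <openssl/sha.h>
#include <string>
#include <vector>

std::string sha256(const std::string& data) {
    unsigned char d[SHA256_DIGEST_LENGTH];
    SHA256(reinterpret_cast<const unsigned char*>(data.data()),
           data.size(), d);
    return std::string(reinterpret_cast<char*>(d), SHA256_DIGEST_LENGTH);
}

// One step up the tree: the sibling's hash, and which side we are on.
struct PathStep {
    std::string sibling;
    bool we_are_left;
};

bool verify_leaf(const std::string& leaf_data,
                 const std::vector<PathStep>& path,
                 const std::string& signed_root) {
    std::string h = sha256(leaf_data);          // hash of the leaf
    for (size_t i = 0; i < path.size(); ++i)    // walk up to the root
        h = path[i].we_are_left ? sha256(h + path[i].sibling)
                                : sha256(path[i].sibling + h);
    return h == signed_root;                    // must match the signed hash
}
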

SFS Read Only File System

The SFS read-only file system allows high throughput for one-writer, many-reader content hosted on untrusted servers.  All cryptographic procedures that require the author's private key happen offline, minimizing the risk of attackers being able to steal the author's private key.  When the client downloads files from a server, it verifies them using the author's public key, which means the server doesn't have to be trusted.  This also has the effect of greatly increasing the throughput of the server, because all of the cryptographic procedures are offloaded to the client or author.

The basic workflow of the SFS read-only file system is as follows:

  1. The writer creates the database (file system) offline and signs the top level hash of the Merkle hash tree.
  2. The writer pushes copies of the database to the servers.
  3. The reader makes a request to the server and gets the data for that level of the filesystem, then verifies the data using the Merkle hash tree and the author's public key.

MIME Types

The MIME types that would make the most sense for Gnocchi are "multipart/signed" and "multipart/encrypted".  They are general purpose MIME types for signed and encrypted content.  The encryption/signature procedure is specified in the control part of the message, where you can specify your own content type, such as "application/gnocchi-signature".  I have just started to implement a prototype of the MIME types for Gnocchi; I'm working in Python, which has an extensive MIME module built in.  The MIME type classes automatically handle the signing, verifying, encrypting, and decrypting; however, I haven't started working on any of the key management yet.  I plan on having the basic functionality of the program working sometime on Monday.  I will have to update the server (mentioned next) to work with the new MIME types, since I worked on this after I built the server, but that shouldn't take long.
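For reference, a multipart/signed message (RFC 1847) has roughly this shape; the boundary, body, and micalg value here are illustrative:

Content-Type: multipart/signed; protocol="application/gnocchi-signature";
    micalg="sha256"; boundary="gnocchi-boundary"

--gnocchi-boundary
Content-Type: text/plain

Hello Gnocchi world
--gnocchi-boundary
Content-Type: application/gnocchi-signature

(Base64-encoded signature over the first part goes here)
--gnocchi-boundary--
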

Server

I built an extremely simple server and uploaded it to a free hosting platform here (it's not very interesting).  All it does is allow files to be uploaded, which are then stored in a database by their filename.  Then the file can be viewed or downloaded by going to a URL related to the file's name.  Currently anyone can upload, view, download, and delete any file.  It is just supposed to be a very simple server for testing the Gnocchi system, and possibly expanded upon.  File uploads aren't actual files because it was easier to implement that way; however, that can be changed without too much effort.  An example of how it works is as follows:

Upload a 'file' (if a file with this name already exists pick a new name or delete the old file)
curl -X POST -H "Content-Type: application/json" http://gnocchiserver.herokuapp.com/files -d '{"filename":"test", "data":"Hello Gnocchi world"}'

View the file (or just visit the url in your browser)
curl http://gnocchiserver.herokuapp.com/files/test

Download the file (if you visit the url in your browser it will automatically download)
curl http://gnocchiserver.herokuapp.com/download/test

Delete the file
curl -X DELETE http://gnocchiserver.herokuapp.com/files/test