Friday, August 15, 2014

Gabbing Noisily On, Concerning Code Improvements

Rewrite

I have finished working on the publisher code rewrite. The rewritten code is now able to perform all of the operations that the old code could, and it has a few improvements to the CLI. However, most of the improvements have been internal to the code.
The main goal of the rewrite was to decouple the code and make it more maintainable. In this, I believe I was successful, some metrics supporting this claim are listed below:
  • The code compiles about 4x faster, ~9s instead of about ~40s.
  • Only one file depends on OpenSSL, and similarly only one file depends on Boost. Each of these files are only imported by one file, which speeds up compile time and reduces dependencies.
  • All basic operations happen through abstract base class interfaces which allow any part to be swapped out with a different implementation without needing to change any of the code. For example, cryptographic operations can be done through the OpenSSL CLI or its C API, and switching the two out only requires changing one line. The other interfaces are an interface for storing other peoples certificates, formatting messages, and storing data to disk.
  • The structure of the code base has been changed from a flat layout with all of the files in the same directory, to a heirarchical structure which separates the different modules.
  • The publisher API easier to use, which includes storing the publishers private key and certificate instead of asking the user to enter it every time.
During the rewrite I also made a few miscilanious improvements, a few are listed below:
  • I added a script that parses a C++ source file and extracts the enums in it and outputs functions that map the enums to and from strings.
  • I cleaned up data storage and serialization. This made file storage much more stable, and I seperated datum into different files to minimize parsing.
  • I made a much simpilier make file that generates dependencies automatically using the C++ compiler.

Things I am currently working on

Publisher App

Now I am working on adding extra convienence features to the publisher app to make it more user friendly. Some of the things that I am working on are:
  • Displaying information about a group such as its members, what version it is on, and the derived key of a certain version.
  • Adding nicknames to the members certificates so it is easier to refer to a member. Currently you have to know the hex fingerprint of their certificate to add or remove them from a group.
  • Communication with the key server. This should simply be a matter of formatting, then using python to send the actual request because I do not want to work with C++ networking when I could just python.

Poster

I am also just starting to work on the poster describing what I did this summer. My plan is to work on it this weekend and the beginning of next week before I start band week. I want to have a draft by Sunday night so I can spend at least the next week improving it. Currently my plan is to make it in LaTeX so that it can be viewed digitally and printed out.

Monday, August 4, 2014

Abstractly (Re)factoring the Publisher Code

The first thing I worked on is fixing a bug in the group key encryption.  On other systems it was throwing an error when generating RSA key pairs.  The bug ended up being a small scoping issue that my compiler automatically fixed, but other compilers didn't.

Then I worked on implementing encryption of group keys under each member.  This works by encrypting the message once under a random AES-256 key/IV, and then encrypting the AES key under each group members public key.  This indirection is necessary because RSA encryption can only handle inputs that are less than the length of the key.

After that I worked on refactoring the publisher code.  Since the summer is coming to a close, I want the publisher code to be extremely readable for the next person, whoever that may be.  The first goal of my refactor is to have all of the "low level" functionality, such as crypto, formatting, and file access, to be accessed through an abstract base class interface so the implementation can be changed out without changing the code that uses it.  This technique also allows tests on the interface to be applied to every implementation of the interface (sub-class) without any extra code.  For example, the code could switch from using the OpenSSL API to using the OpenSSL CLI by only changing which implementation of the crypto interface is used, which is only located in one place.

Along with the refactor, I've been adding unit tests to the code base.  This is one of the "technical debts" that I've built up over the summer that I need to pay off.  However, the nice thing about writing the tests alongside the refactored code, is that I get to use the fancy new abstract base classes to test all implementations of the interface at once.

Lastly, I had to correct a mistake I made with group key encryption.  I had incorrectly assumed that the public exponent should be encrypted when distributing the group key to each group member.  Therefore, I used a symmetric cipher to encrypt the group key and then encrypt the symmetric key under each members public key, because if you include the public exponent in encrypted part, it is too long for RSA encryption.  However, I learned that the public exponent does not need to be encrypted, so the group key state can be directly encrypted under each members public key.

Monday, July 7, 2014

OpenSSL Command Line Apps

This week I worked on supporting arbitrary encryption and signing functions within the publisher app.  In order to do this I switched from using OpenSSL's C API to using its command line interface.  I made this change because when using the C API, for each function I wanted to support I needed to add about 40 lines of code, as well as understand the cipher well enough to choose a good configuration.  However, with the command line interface the same code can support any cipher that OpenSSL supports just by passing in a different cipher name.  This reduced complexity and consistent interface should help reduce subtle bugs in the cryptographic functions.  Since the cryptography is the main component of the publisher app, it is important for it to be implemented correctly.

The OpenSSL command line interface is called by forking the process and redirecting the input/output streams to pipes to communicate with the parent process, which limits publisher application use to UNIX like systems.

While implementing encryption with the CLI, I found a bug in OpenSSL key and IV generation.  When you call openssl enc -aes-256-cbc -P -nosalt -pass stdin you pass in the password through standard input to generate the AES-256 key and IV.  When you pass in random bytes, for example from head -c 16 /dev/random, there is a possibility that most or all of the random bytes will be ignored.  The key derivation function reads the password in as a null terminated string, so as soon as the null byte is passed in, it stops reading the data.  Therefore if one of the first few bytes is the null byte, the key and IV generated won't have much entropy.  For example,

If you pass in aASCII '5' (0x35), a null byte, followed by any data.
openssl enc -aes-256-cbc -P -nosalt -pass stdin
key=E4DA3B7FBBCE2345D7772B0674A318D50B6DEB36D53E14600683C32878F482DC
iv =FB205138427CA770AD173FAFA25CDA94


The output will be the same as if you just passed in an ASCII '5'.
openssl enc -aes-256-cbc -P -nosalt -pass pass:5
key=E4DA3B7FBBCE2345D7772B0674A318D50B6DEB36D53E14600683C32878F482DC
iv =FB205138427CA770AD173FAFA25CDA94


This effectively reduces the key space of the resulting key and IV to 256^n, where n is the location of the first null byte.  An attacker with 2^16 entries containing keys generated from 0x0000 to 0xffff can break any password with a null byte in the first 3 bytes.  Assuming the password passed in is uniform random bytes, about 2/256 passwords will be in the attackers table of 2^16 elements, which means that about 1% of passwords will be incredibly weak.  Adding a salt doesn't mitigate the risk either, since the salt isn't secret information.  

Thursday, June 19, 2014

Group Keys and Filesystems

I spent this week finishing up the current iteration of group keys and working on a filesystem abstraction.  At this point the whole publisher program is "working", I have managed a whole run through.  However, the public key interface is extremely basic, it just takes in a PEM encoded public key file and stores it internally, and all of the actual encoding and cryptography is still up in the air without a specification.

Group Keys

The group key implementation is working.  The program allows you to create groups, add/remove other peoples public keys to groups (change the membership), and encrypt/decrypt messages using the key derived from the group state.  Whenever a user is removed from a group the group state is wound using the RSA scheme.  The control information that is put in the encryption function now encrypts the file key under the group key of each group that has access to it.  To get the file key for decryption the user must be a member of at least one group that the file key is encrypted under and have a version greater or equal to the key version it was encrypted under.

Filesystem

This week I also worked on a filesystem abstraction that handles the publisher program's local storage and I/O.  I built the system to:

  • allow for insertion of arbitrary middleware dynamically, such as changing the directory, or encrypting the files before they reach the disk.
  • allow for an arbitrary back end, such as saving and loading to the filesystem, or a database, or a server, or memory.
  • make testing cleaner by allowing saving and loading files to and from memory, that way testing doesn't risk polluting or being affected by the filesystem.
I haven't implemented any middleware for it yet, but I wanted to have it so that it is extremely easy to change how and where the files are saved later.

Documentation

I also spent a day documenting all of the publisher code.  The file in the github repo /publisher/cpp/doc/html/index.html has the documentation home page, and from there you can explore the documentation.  It is organized my module and class in what I hope is a sensible manner.  I had previously put this off because the code was changing very rapidly, but I decided now that other people are looking at my code it is worthwhile to document it.

The Future

I am going to look at the OpenSSL apps and see if I can use the command line apps to do the cryptography.  This would be useful because it would allow for easier inclusion of different ciphers without adding a ton of boilerplate.  If this seems feasible and useful I will implement in the code to allow for arbitrary ciphers to be used for signing and encryption.

Friday, June 6, 2014

Functional Progress

Progress

This week I've been working more on the publisher program.  I've implemented the basics of the group key functionality, including:

  • Encoding and decoding file control information (encrypt file keys under different group keys)
  • Group key regression using the RSA scheme, including key derivation from the regression state
  • Stringifying and parsing group keys
  • Encrypting and decrypting group keys with the users public keys

I think most of the modular functionality of the publisher is there (although rough in a few places) it just needs to be "glued" together with an app that will handle persistent data, user interaction, and networking.

Moving Forward

The thing I want to get done in the most immediate future is get a runthrough of the publisher app with all of the file encryption control information there, instead of just the key encoded in Base64.  The next thing I want to do is work on the database side of the publisher app, which will include indexing and storing the public keys of people in the publishers groups, private key management (generating and storing the users private key), group management (handling which users are in which groups, and revocation).  I wanted to work on this next because all the work here should be fairly generic and work with almost any specification.  Then I want to move on to the key server, which handles the distribution of group keys to the people in the group.  This again, should be generic, although it the details will change with the specification.

Unrelated Topics

I'm extremely interested in teaching myself Haskell.  Haskell is a lazy purely functional language with an extremely strong type system.  It started out as an effort to consolidate the many functional languages around in the late 80s into a standardized language.  Since then it has started to grow in popularity and slowly become a more mainstream language.  The thing I like most about Haskell is the mindset of the developers and community.  Instead of creating hacky solutions to problems, they aim for general solutions rooted in mathematics, generally category theory.  For example in order to implement I/O, instead of allowing some functions to have side effects, like Lisp, they use an IO Monad to encapsulate the state of the IO and execute it as one IO action in the main function.  There are plenty of other examples of Haskell's use of Category Theory, both in the language and implemented in libraries, such as Functors and Monoids.  Through the use of these abstract concepts the design pattern in Haskell programs is kept consistent, instead of everybody writing their own {parser, http server, UI, ...} API, libraries use monads, or functors, or another general interface.  The advantage of that is that the programmer knows exactly how a monad (or other interface) works, and can reason about the computation much more easily.  I am planning on reading (in my own time) Categories for the Working Mathematician in order to get a better understanding the mathematics behind Haskell.  I think it will help me become a better programmer, and also do so in a way that will allow me to transfer my knowledge and intuition into another field, such as graph theory, physics, complexity theory, linguistics ect.

Wednesday, June 4, 2014

Switch to OpenSSL

First I will list some random half-baked thoughts, then I will continue onto the weekly update.

  • A remote side channel attack that uses javascript to perform timing attacks.  One possibility is to use the web-worker attack that was discussed in a meeting last Wednesday to slow down the computer in a consistent way, and then time how long a certain computationally heavy action takes and use the variations in that time as the side channel.  Possibly try to steal RSA exponents for a SSL connection by injecting javascript in a different unencrypted page loaded at the same time.
  • Provably correct implementations of cryptographic functions.  Use a proof assistant to prove the correctness of the implementation, then use a tool to extract the program code directly from the proof.  I haven't looked into it much, but I know there has been work done with provably correct micro-kernels, such as showing that they are immune to buffer overflows.


I spent most of the week switching from Crypto++ to OpenSSL for the cryptographic library in the publisher program.  It was brought to my attention that Crypto++ isn't nearly as actively maintained as OpenSSL.  I had originally avoided OpenSSL because of the implementation issues brought up, however, the OpenSSL cryptographic library is the standard cryptographic library for C/C++.  The main reason to switch was precautionary, it is unclear how many people are using Crypto++ in a major product, so we switched to the de facto standard.

I underestimated how long it would take me to switch from Crypto++ to OpenSSL.  I estimated it would take me a few hours, but it ended up taking 3 days.  The main reason is that I was unfamiliar with the OpenSSL interface, which isn't well documented, and has a steep learning curve.  I was also using Crypto++ for some tasks other than encryption and signing, such as key handling and encoding.  On top of that, I ran into a bug that I had an incredibly hard time fixing.  I was allocating memory for objects using OpenSSL's allocation functions, which call malloc, and then freeing them with delete, or visa versa.  I wasn't aware that this causes undefined behaviour, which caused a bug later in the program.  I have finished the transition and implemented a command line interface around the API.

Tuesday, May 27, 2014

Publisher Program and Servers

This week I spent time working on the publisher side of the Gnocchi.  I worked on building a framework for encoding, signing, and encrypting documents where all of the components are easily pluggable.  I looked into various C++ cryptography libraries and decided that Crypto++ would be the best option.

I looked at a few different crypto libraries in C++.  The most extensive and well maintained general purpose libraries seem to be GNU Crypto and Crypto++.  I decided to use Crypto++ because it is a modern and well maintained library.  Three previous versions of Crypto++ have meet the FIPS 140-2 level 1 requirements, which suggests that there isn't any obvious security issues.  It also has an extremely wide variety of algorithms with a consistent interface that makes them easily interchangeable.  Crypto++ makes it easy to handle function composition using data pipelining which makes it easy to encode, decode, pad, and perform other transformations on the data before it is run into the encryption function.  It also has native support for secure random number generation and key handling.

The main focus this week was to familiarize myself with Crypto++, and then building a sensible API and command line interface around it.  My goal was to minimize dependencies between different parts of the program and to make the control flow as easy to follow as possible.  I've nearly finished implementing an early iteration of the code base, the last few pieces of functionality left to implement are as follows:
  • Encoding the control information for encrypted documents
  • Public key and group key handling
  • Sending the processed document to the server
Similarly to the rest of the implementation, these things cannot be completed until there is a formal specification for Gnocchi.  However, I am planning on writing an example implementation so that when the specification is ready it will be easy to edit and extend the example code.

Here is an example input, signed output, and encrypted signed output.  In the signed document, the control part contains the signature of the input encoded in Base64.  In the encrypted document, the control information is just the AES key encoded in Base64, this is just a placeholder for the real functionality.

The Gnocchi Publisher should be able to work with any document store, which means allowing arbitrary file upload formats.  I was thinking about something like having downloadable configuration files that specify how to authenticate a user and the format of the upload request.  The configuration files could be stored by domain, e.g. (facebook.com, google.com, mail.google.com), where the most specific domain is used.  The server should be able to store the Gnocchi documents however it wants to, and neither the Gnocchi publisher nor browser extension shouldn't need to know anything about how the server works.

However the key handling server gets implemented, I think we should create an easy to use tool that allows users to create their own key store servers very simply.  There should be a one command tool that allows users to create their own key handling server and start it on a free PaaS (platform as a service) or their own server.  Some ways to distribute the server code are:
  • A node module uploaded to npm, the node package manager (if implemented in Node)
  • A Docker image ready to run the server
  • A git repository with the project ready to be uploaded to Heroku or another PaaS