Anonymization by High-Performance Scalable Mixing
Original work by David Chaum, Voting Systems Institute, USA; Debajyoti Das, Aniket Kate, Purdue University, USA;
Farid Javani, Alan T. Sherman, Cyber Defense Lab, UMBC, USA; Anna Krasnova, Radboud University, NL and Joeri de Ruiter, University of Birmingham, UK.
ABSTRACTcMix is the first practical system that can prevent traffic analysis of chat messages at scale. It creates a complete anonymity set every
second for all messages sent during the previous second. cMix uniquely requires no public-key operations during the sending of a chat message—neither by the smart phone sending the message, the roughly ten nodes that process each message in sequence, nor the receiving smart phone. A typical number of public-key operations are, however, performed by each node, but only in a precomputataion.
This meansa savings in hardware of more than an order of magnitude, since computation need not be conducted while all other nodes are waiting. It also allows slower and less reliable cryptographic hardware to be used. cMix is a suite of cryptographic protocols that can replace today’s dominant chat systems. It can provide payload secrecy, sender-recipient unlinkability, sender anonymity, and sender au-
thentication for recipients—all secure unless all cMix nodes are compromised.
For each batch, the adversary may know all senders and all recipients of traffic in the underlying packet-switched network, yet the adversary cannot link any sender to recipient. cMix provides fast delivery of messages, in both the forward and reverse directions, by having each node perform only a small number of symmetric-key and simple group operations (no modular exponentiations) in real time. Performance benefits include moderately low latency (despite large batch sizes) and efficient utilization of node machines. Senders (e.g., smartphones) perform their part of the cMix real-time protocols with similarly modest amounts of computation, resulting in negligible additional delay, battery, or bandwidth usage. The performance of cMix scales linearly in terms of the number of nodes, users, and messages, highlight some of the design features that go into cMix.
By processing messages in large batches, and by requiring all messages of a batch to have the same length, cMix avoids flow analysis attacks that plague many existing anonymity systems. Asfor most batched mixnets, cMix can support a variety of ways to manage batches. One option is “threshold and timing” , where the system “fires” (sends along all messages in the buffer) every t seconds, provided there are at least β0 messages in the buffer.
These choices have implications on anonymity and latency but are independent of the cMix system. cMix’s use of a cascade is a strength that yields large anonymity sets and avoids the traffic-analysis weaknesses and resulting small anonymity sets of other designs (see Section 6.2.1). In cMix, each sender choosing to participate in a particular round sends an input to the cMix system, which after passing through a fixed cascade of mix nodes, arrives in an output buffer. Unless all nodes collude, the outputs are unlinkable to the inputs, even if the adversary knows for each batch the set of senders and the set of receivers. By contrast, Tor cannot handle this powerful adversarial
model. We envision each mix node to be a powerful highly-reliable computing system, preferably located in a separate region of the
world. With traditional mixnets, to achieve low latency of messages, node hardware must be idle for much of the time. Attempts to increase machine utilization by pipelining require smaller batch sizes. By contrast, cMix’s use of precomputation facilitates higher machine utilization by having the expensive public-key operations performed on separate dedicated hardware hardware far in advance.
Furthermore, these precomputations are highly parallelizable. We assume enough computational resources are provided for the pre-
computations that they do not become a real-time bottleneck. In addition to reducing latency, cMix can also increase throughput because it eliminates all real-time expensive public-key operations by both users (including smartphones) and nodes. If the usage rate is consistently uniform, the throughput gain could be limited; however, the cMix design increases throughput significantly, when there are any spikes in real-time usage, by permitting betterload balancing with special-purpose machines continuously running highly parallelizable precomputations. Furthermore, as mentioned in Section 9.3, we are pursuing ways to reuse the precomputations securely.
The exact format of an input depends on the application. For example, for some applications, each input might be an ordered pair
(Bi,Mi), where Bi is the recipient and Mi is the payload.
The sender encrypts the entire input using message keys shared by the sender and each mix node. Each sender establishes a long-term
shared key separately with each cMix node. Each such shared key seeds a cryptographic pseudorandom number generator to produce
a sequence of message keys, the next in the sequence being selected when the sender chooses to participate in a particular round. Each
sender encrypts its input with modular multiplication by the next message key for each cMix node.
During the real-time mixing of an input batch, each cMix node replaces each of its message keys with a precomputed random value. Then, using another random value and a random precomputation also determined in the precomputation, each node includes the new random value and applies its permutation to the buffer of messages in the batch. Finally, using a value that was precomputed in a multiparty-secure manner from all the random values and permutations, a single group operation cancels all random numbers,
leaving the permuted output batch. By freeing mix nodes from performing expensive public-key operations in real time, real-time
mixing is much faster than in previous mix networks.