Demystifying HybridDKG - A Distributed Key Generation Scheme

HybridDKG is one of the latest distributed key generation schemes and can operate in an asynchronous environment, making it the most viable DKG scheme to use over the internet.
Photo of George Gkitsas
George Gkitsas
· 15 min read

Introduction

In a previous post, we discussed important VSS and DKG constructs and presented the Aggregatable DKG. Now that the reader better understands DKG, this article elaborates on a different DKG scheme called HybridDKG, as published in Distributed Key Generation in the Wild.

HybridDKG is a DKG that operates under the asynchronous communication model which is a much more realistic model than the synchronous model that most other DKG schemes rely on. Its underlying VSS, called HybridVSS, is based on AVSS, which is the component of HybridDKG that allows it to operate asynchronously. Since HybridDKG is operating under the asynchronous model, it requires both its VSS and DKG protocols to use consensus mechanisms in order for the nodes to agree on the secrets.

Assumptions

As with every cryptographic contruct, HybridDKG relies on certain assumptions in order to prove its security properties.

Communication model

HybridDKG uses the asynchronous model of communication, in which clocks of nodes may not be accurate (meaning they can be out of sync) and messages can be delayed for arbitrary period of time. In this model, it is assumed that the adversary manages the communication channels and can delay messages as it wishes. HybridDKG assumes that a real-world adversary cannot control communication channels for all the honest nodes and, thus, that the adversary (eventually) delivers all the messages between two uncrashed nodes.

A stricter assumption, named weak synchrony, is used only in order to prove the protocol's liveness. Under the weak synchrony assumption, the time difference delay(T)delay(T) between the time a message was sent (TT) and the time it is received doesn't grow indefinitely over time. Weak synchrony is not necessary for the protocol's correctness.

It is also assumed that all communication is authenticated and the adversary is unable to spoof messages.

Faults

HybridDKG can tolerate non-byzantine node crashes, non-byzantine broken communication links and byzantine faults. It groups non-byzantine crashes and non-byzantine single broken communication links under the same fault type. The byzantine adversary is considered static, meaning that it cannot change its choice of nodes as time progresses.

The protocol can tolerate:

  • Up to tt byzantine nodes
  • Up to ff non-byzantine crashed nodes and non-byzantine failed connections at any given instant in time
  • d(κ)d(\kappa) maximum number of crashes an adversary is allowed to perform during its lifetime

The number of the participants is:

n3t+2f+1n \ge 3t+2f+1

Adversary

An assumption introduced by the specific polynomial commitment scheme used in this HybridVSS is that the adversary is computationally bounded on the DLP with a security parameter κ\kappa.

Prerequisites

We first list some mathematical properties that will be useful later on:

  • To reconstruct a univariate polynomial of degree tt we need t+1t+1 points on that polynomial
  • Having a bivariate polynomial ϕ(x,y)\phi(x,y) of degree 2t2t, we can generate univariate polynomials by fixing the value of xx (or yy):

ax=x(y)=ϕ(x,y)a_{x=x'}(y) = \phi(x', y)

  • If the bivariate polynomial is symmetric, it holds that ϕ(x,y)=ϕ(y,x)\phi(x,y) = \phi(y,x)
  • From the above we can derive that for a symmetric ϕ\phi it holds ax(y)=ay(x)a_{x'}(y) = a_{y}(x')

HybridVSS

In the asynchronous model, each node follows two assumptions: that enough nodes will get their shares and there will be enough nodes participating in reconstruction. In order to remove the need to make assumptions, HybridVSS takes it a step further and provides a mechanism for nodes to keep track of how many other nodes are participating and their states instead of relying on the above assumptions.

We will now define the sharing, reconstruction, and recovery phases of HybridVSS. All the messages described below are tagged with a unique session ID (Pd,τ)(P_d, \tau), with PdP_d being the session's dealer and τ\tau a unique number.

Sharing

Initialization

The dealer PdP_d:

  • produces a random symmetric bivariate polynomial ϕ(x,y)=i,j=0tϕijxiyj\phi(x,y) = \sum_{i,j=0}^t\phi_{ij}x^iy^j with the secret encoded in as the constant factor ϕ(0,0)=ϕ00\phi(0,0) = \phi_{00}.
  • calculates a homomorphic commitment CC to the polynomial which is a matrix with Cij=gϕijC_{ij} = g^{\phi_{ij}}.

CC can be used by nodes to verify that:

  • a given univariate polynomial ai(x)=l=0tai,lxla_i(x)=\displaystyle\sum_{l=0}^ta_{i,l}x^l belongs to ϕ\phi by checking:

gai,l=?j=0t(Cjl)ij,l[0,t]g^{a_{i,l}} \stackrel{?}{=} \displaystyle\prod_{j=0}^t(C_{jl})^{i^j}, \forall l\in[0,t]

  • a given value α\alpha corresponds to ϕ(m,i)\phi(m,i) by checking:

gα=?j,l=0t(Cjl)mjilg^\alpha \stackrel{?}{=} \displaystyle\prod_{j,l=0}^t(C_{jl})^{m^ji^l}

Dealing
PdP_d then deals to each node PiP_i a univariate polynomial ai(x)=ϕ(x,i)a_i(x) = \phi(x,i) by sending a message called send\mathbf{send}. This message also delivers CC. CC is also sent with all other type of messages between nodes so that any two participants can verify the dealer’s commitment and prevent a malicious dealer from segregating the nodes by sharing different commitments.

Each PiP_i that receives a send\mathbf{send}, verifies ai(x)a_i(x) using CC. This polynomial provides a way for PiP_i to generate a point that belongs to the polynomial aj(x)a_j(x) of another node PjP_j since ai(j)=aj(i)a_i(j) = a_j(i) due to ϕ\phi's symmetry.

Gossip-based validation and consensus

Each PiP_i that receives a valid send\mathbf{send} message, sends a message called echo\mathbf{echo} to each node PjP_j to inform them of the fact that the dealer got in touch. echo\mathbf{echo} carries ai(j)a_i(j) as a proof that it was dealt a valid ai(x)a_i(x).

After sending echo\mathbf{echo} to all nodes, PiP_i enters a listening state, where it listens for two kinds of messages, echo\mathbf{echo} and ready\mathbf{ready}.

For every received echo\mathbf{echo} from a node PjP_j the receiver PiP_i:

  • verifies aj(i)a_j(i) against CC and if verified, it stores the point (j,aj(i))(j, a_j(i)) in a set Aci{(j,aj(i))}jA_c^i \xleftarrow{}\{(j, a_j(i))\}_j
  • if enough (n+t+12\lceil\frac{n+t+1}{2}\rceil) valid echo\mathbf{echo} messages with the same commitment CC are received to convince PiP_i that enough nodes were reached by the dealer, it uses the points in AciA_c^i to interpolate a polynomial aˉi(x)\bar{a}_i(x)
  • sends a ready\mathbf{ready} to all other nodes carrying a point aˉi(j)\bar{a}_i(j) as a proof of the fact that it received enough echo\mathbf{echo}s

For every received and verified ready\mathbf{ready}, PiP_i:

  • stores the received point aˉj(i)\bar{a}_j(i) in the same set AciA_c^i
  • if enough (t+1t+1) ready\mathbf{ready} messages with the same commitment CC are received to be able to interpolate a polynomial of degree tt, PiP_i interpolates aˉi(x)\bar{a}_i(x)
  • sends a ready\mathbf{ready} that carries aˉi(j)\bar{a}_i(j) to each node PjP_j
  • if enough ready\mathbf{ready} messages (ntfn-t-f) with the same commitment CC are received for PiP_i to be convinced that enough nodes know that enough nodes are participating, it calculates its share si=aˉi(0)s_i = \bar{a}_i(0) and stops.

Additional notes
A node terminates the sharing process if it has received (ntf)(n-t-f) ready\mathbf{ready} messages. Here, the node makes sure that there is a sufficient amount of honest nodes with shares. In other words, it makes sure that the participating honest nodes form a majority in the presence of up to tt byzantine adversaries and up to ff crashed nodes.

If a node received a ready\mathbf{ready} from another node, it implies that the other node has sent an echo and it was never received. I.e., a ready\mathbf{ready} implies an echo\mathbf{echo}. It is possible for a node to finish the sharing phase by only receiving enough ready messages. In this case, it needs to interpolate aˉi(x)\bar{a}_i(x) in order to calculate its share, since the interpolation in echo\mathbf{echo} processing didn't happen. This is the reason behind the interpolation step in processing a ready\mathbf{ready} message.

An important note is that a node will only process the first send\mathbf{send}, echo\mathbf{echo} and share\mathbf{share} messages it received from a specific node PiP_i during a session (Pd,τ)(P_d, \tau). This is to make sure that:

  • nodes' echo\mathbf{echo} and ready\mathbf{ready} counters cannot be manipulated by an adversary by repeating messages
  • any recovery messages in circulation will not modify the systems state (this will become clear in the recovery subsection below)

Reconstruction

Let's first explain how the shares relate to the secret. The share of PiP_i is si=aˉi(0)=aˉ0(i)s_i = \bar{a}_i(0) = \bar{a}_0(i), which means that given t+1t+1 shares we can interpolate aˉ0(x)=ϕ(0,x)\bar{a}_0(x) = \phi(0,x). We can then evaluate this polynomial to 00 to get the secret s=aˉ0(0)=ϕ(0,0)s=\bar{a}_0(0)=\phi(0,0).

To reconstruct the secret, each node sends its share sis_i to all other nodes using the message reconstruct_share\mathbf{reconstruct\_share}. It also listens for reconstruct_share\mathbf{reconstruct\_share} messages. For each received reconstruct_share\mathbf{reconstruct\_share} from a node PjP_j, it will first verify aj(0)a_j(0) against the commitment CC:
gsi=?j=0t(Cj0)mjg^{s_i} \stackrel{?}{=} \displaystyle\prod_{j=0}^t(C_{j0})^{m^j}
and if the point is verified, it will store it. When the node has collected t+1t+1 points, it is ready to interpolate aˉ0(x)\bar{a}_0(x) and then obtain the secret aˉ0(0)\bar{a}_0(0).

Recovery

We denote d(κ)d(\kappa) (κ\kappa is the system's security parameter) the maximum number of crashes an adversary is allowed to perform during its lifetime.

During recovery, the previously crashed node will ask the help of other nodes in replaying its previous communication. Each node keeps a record of all outgoing messages along with their intended recipients in a set BB, and BlB_l indicates the subset of BB intended for the node PlP_l. Additionally, each node maintains two counters cntcnt and cntlcnt_l. The former tracks the number of overall (system-wide) help requests and the latter the number of help requests sent by each node PlP_l.

A crashed node that just recovered will first send a help\mathbf{help} message to all nodes. It will also replay messages in BB. When a node receives the help\mathbf{help} message it will first make sure that the help limits are not exceeded: cnt(t+1)d(κ)cnt \le (t+1)d(\kappa) and cntld(κ)cnt_l \le d(\kappa) and if these checks pass, it will replay all messages in BlB_l, thus helping to replay the message history that is relevant to the recovered node.

With the above in mind we can see that, if recovery messages were to be processed, many, if not all, packet counters would count a recovered node twice.

HybridDKG

As with other DKG constructions, HybridDKG makes all nodes HybridVSS dealers. Each node generates a collection of shares from several HybridVSS instances and its final share will be the sum of those shares. Unlike DKGs operating in the synchronous setting, the asynchronous model introduces the requirement of agreement on at least (t+1)(t+1) successful HybridVSS instances for a set of honest nodes. An additional complexity of the asynchronous setting is the the need to differentiate between crashed and slow nodes.

HybridDKG introduces a role called leader (L\mathcal{L}), which is responsible for making a proposal of a set of finished HybridVSS instances to be used. Since the leader might be malicious or crash, a leader-change mechanism is introduced. Nodes use timeouts to decide if a leader should be considered crashed.

HybridVSS is extended with a message, shared\mathbf{shared} which is sent by every node that finishes the HybridVSS sharing phase. It carries all the ready\mathbf{ready} messages for the session (Pd,τ)(P_d,\tau) and signatures over them, which helps the leader to provide a validity proof of its proposal.

HybridDKG uses two phases, the optimistic and the pessimistic phase: the former is responsible for the key generation and is where crash recoveries take place; the latter is responsible for executing potential leader-change operations.

Optimistic phase

HybridVSS execution

The optimistic phase starts by each node PdP_d becoming a dealer of a HybridVSS session for a secret sds_d. At the end of a HybridVSS session, each node PiP_i broadcasts a (Pd,τ,shared,Cd,si,d,Rd)(P_d, \tau, \mathbf{shared}, C_d, s_{i,d}, R_d) message. The set RdR_d includes ntfn−t−f signed ready\mathbf{ready} messages for session (Pd,τ)(P_d,\tau), si,ds_{i,d} is the share of node PiP_i and CdC_d is its commitment. All nodes collect PdP_d and RdR_d of received shared\mathbf{shared} messages:
Q^Q^{Pd}\hat{Q} \xleftarrow{} \hat{Q}\cup\{P_d\}
R^R^{Rd}\hat{R} \xleftarrow{} \hat{R}\cup\{R_d\}

VSS proposal

Once L\mathcal{L} has processed t+1t+1 shared\mathbf{shared} messages, it will broadcast a send\mathbf{send} message, proposing the set of VSS instances that these shared\mathbf{shared} messages correspond to. This message is HybridDKG specific and not to be confused with the send\mathbf{send} message of HybridVSS. send\mathbf{send} carries Q^\hat{Q} which constitutes L\mathcal{L}'s proposal on the set of HybridVSS instances to use. It also carries R^\hat{R} as a proof that the sessions referred by Q^\hat{Q} are valid.

Leader-change condition

Any node, apart from L\mathcal{L}, that processed t+1t+1 shared\mathbf{shared} messages will start a local timer that stops after delay(T)delay(T). If a node's timer stops before the node has reached the end of the protocol, the node assumes that the leader has either crashed or is malicious and requests a leader change. A leader-change request is performed by broadcasting a lead_ch\mathbf{lead\_ch} message. We defer explaining the leader-change logic until we finish the optimistic phase.

Gossip-based validation and consensus

All nodes, apart from L\mathcal{L}, will start listening for send\mathbf{send}, echo\mathbf{echo} and ready\mathbf{ready} messages. Again, these messages are HybridDKG specific and not to be confused with their synonymous HybridVSS. Conceptually though, these messages serve the same purposes as in HybridVSS.

An echo\mathbf{echo} message is send upon receipt of a valid send\mathbf{send}. The purpose of echo\mathbf{echo} is to inform the receiver that the sender was reached by L\mathcal{L} with a valid proposal. echo\mathbf{echo} carries the proposed set Q\mathcal{Q}.

A ready\mathbf{ready} message is send by nodes that have received enough (n+t+12\lceil\frac{n+t+1}{2}\rceil) echo\mathbf{echo} messages for the same set QQ or enough (t+1t+1) ready\mathbf{ready} messages for the same set QQ. At this point, the node knows that the protocol can finish successfully, since it knows enough participating nodes and the ready\mathbf{ready} message informs the receivers of that fact. The node will also update its Q^\hat{Q} set to QQ, since at this point the set is agreed, and keep the relevant signed echo\mathbf{echo} or ready\mathbf{ready} messages as a proof. This update of the set serves as a note upon which VSS instances were agreed and is used during a leader change (more on that at the section explaining the pessimistic phase).

Shares construction

Finally, when a node receives enough (ntfn-t-f) ready\mathbf{ready} messages, it can finish the sharing protocol by calculating its share:
si=PdQsi,ds_i = \displaystyle\sum_{P_d \in Q}s_{i,d}
and its commitment matrix, generated by the commitment matrices of each HybridVSS instance:
C = {Cp,q}={PdQ(Cd)p,q}C_{p,q}\} = \{\displaystyle\prod_{P_d \in Q}(C_d)_{p,q}\}

Pessimistic phase

A node enters this phase when it becomes unsatisfied with the current leader. The reasons for this dissatisfaction can be due to a time out or the receipt of an invalid message from the leader.

A predefined cyclic permutation of nodes is assumed to be agreed upfront. For simplicity, we will assume that leaders are sorted in a simple ascending order.

When entering this phase, the node will broadcast a message, called lead_ch\mathbf{lead\_ch}, informing other nodes that it proposes a leader change. The messages carries the proposed leader Lˉ\mathcal{\bar{L}} to which the nodes want to change. A dissatisfied node proposes the node that is next (in the ordering) of its current leader.

Two things need to be agreed between nodes: if the leader change is happening and, if so, which node will be the new leader.

A node that receives t+f+1t+f+1 lead_ch\mathbf{lead\_ch} messages will be convinced that a leader change is happening and will broadcast a lead_ch\mathbf{lead\_ch} proposing the smallest proposed leader it received. If the node receives t+f+1t+f+1 lead_ch\mathbf{lead\_ch} messages proposing the same leader, it will accept that leader since consensus on the leader is reached. Then, if the node is not the next leader, it will restart its counter and enter the optimistic mode.

In the case the node is the next leader, it will use the previously stored Q^\hat{Q} set as the VSS set to use. It will broadcast a send\mathbf{send} message with this set and the accompanying signed packets (shared\mathbf{shared}, echo\mathbf{echo} or ready\mathbf{ready}) as a proof and then enter the optimistic mode.

Crash recovery

For recovery of a crashed node, the exact same approach with HybridVSS is used.

Reconstruction

Reconstruction of a HybridDKG secret is performed the same way as in HybridVSS. Each node broadcasts its share and collects shares from other nodes. When a node has collected enough shares, it will interpolate the corresponding polynomial and evaluate it at 00 to calculate the secret.

Performance

HybridVSS

Assuming no crashes and thus no recovery messages:
The size of HybridVSS messages is dominated by the size of the commitment CC which has t(t+1)2\frac{t(t+1)}{2} group elements and thus the protocol's bit complexity is O(κn4)\mathcal{O}(\kappa n^4). An improvement exists that reduces the bit complexity down to O(κn3)\mathcal{O}(\kappa n^3) by using a collision-resistant hash function (AVSS, Sec. 3.4). In the following calculations, we assume that this improvement is used. The message complexity is O(n2)\mathcal{O}(n^2).

Taking into account the recovery messages, the message complexity becomes O(tdn2)\mathcal{O}(tdn^2) and the the bit complexity O(κtdn3)\mathcal{O}(κtdn^3) were d=d(κ)d = d(\kappa).

HybridDKG

With up to nn VSS instances being able to complete and with no pessimistic phases taking place we have O(tdn3)\mathcal{O}(tdn^3) message complexity and O(κtdn4)\mathcal{O}(κtdn^4) bit complexity.

Counting in pessimistic phases, we have at most O(d)\mathcal{O}(d) leader changing operations. Each leader change operation has O(tdn2)\mathcal{O}(tdn^2) message and O(κtdn3)\mathcal{O}(κtdn^3) bit complexity. Thus the overall overhead of the pessimistic phase is O(td2n2)\mathcal{O}(td^2n^2) message and O(κtd2n3)\mathcal{O}(κtd^2n^3) bit complexity.

Adding the pessimistic overhead to HybridDKG complexities, we end up with O(tdn2(n+d))\mathcal{O}(tdn^2(n + d)) message and O(κtdn3(n+d))\mathcal{O}(κtdn^3(n + d)) bit complexities.

Written by George Gkitsas, zero-knowledge cryptography researcher & protocol developer at Heliax, the team building the Anoma Network. If you're interested in zero-knowledge cryptography and cutting-edge cryptographic protocols engineering positions in Rust, check out the open positions at Heliax.