Providing secure on-chain randomness on Anoma

Anoma uses a threshold BLS digital signature to generate an unpredictable, bias-resistant and publicly verifiable on-chain pseudorandom number. This process will be implemented so that a pseudorandom number will be produced with every new block.
Alex Flory Samartino

Abstract

In order to generate randomness, an on-chain pseudorandom number should be unpredictable, bias-resistant and publicly verifiable. By implementing a threshold BLS digital signature, Anoma is able to generate a pseudorandom number that satisfies those requirements.

Introduction

Randomness is used for many applications in blockchain technology. For instance, it is used to generate private keys, and select a block producer (for some proof of stake blockchains), and for decentralized gambling or gaming applications, etc. In the context of blockchains, randomness can be classified as on-chain randomness (randomness accessible to the state machine) or off-chain randomness (randomness not directly accessible to the state machine). On-chain randomness and off-chain randomness are different and achieve distinctive tasks. They can also be combined to complement each other.

As computers are deterministic devices, it is impossible to generate a true random number without an external data source, therefore a pseudorandom number is used. A pseudorandom number looks indistinguishable from a true random number, such that an adversary can not distinguish if a number has been created randomly or via a pseudorandom number generator. Pseudorandom number functions are deterministic, i.e. given the same input, they will always produce the same output. To be suitable for generating randomness on blockchain, the pseudorandom number should have certain properties:

• Publicly verifiable: anyone should be able to check that the pseudorandom number was properly produced.
• Unpredictable: no one can predict the next pseudorandom number.
• Bias-resistant: no one should be able to bias the generation of the pseudorandom number in any way.

As we will see in the next section, the process to generate a pseudorandom number on Anoma has been selected to satisfy these constraints.

Randomness on Anoma

The process used to generate randomness on Anoma is a combination of various approaches described in different research papers (see f.i. pragmatic signature aggregation with BLS, short signatures from the Weil pairing, and Threshold Signatures, Multisignatures and Blind Signatures Based on the Gap-Diffie-Hellman-Group Signature Scheme). The setup phase consists of a distributed key generation (DKG). This DKG is then combined with a Boneh-Lynn-Shacham (BLS) digital signature process to produce a threshold BLS digital signature. On a high level, each participant will create signature shares. By communicating between them, participants gather different signature shares from each other. As soon as a participant collects enough signature shares (to meet the threshold), they are able to aggregate them to produce the signature. This signature will then be verified (using the public key generated during the DKG). If it is correct, this signature will be hashed, and this hash will be the pseudorandom number. This process allows us to achieve pseudorandomness with a threshold defined by the algorithm. As no one knows the signature beforehand (as it is only determined once the threshold is met), no one knows the pseudorandom number beforehand.

In order to understand how this pseudorandomness is achieved, we will have to dive deeper into the cryptographic principles underpinning this methodology. The following is not meant as a crash course on these principles, but rather a small initiation to help you grasp the concepts on a high level. Note that some of those concepts have already been discussed in more detail in the first article on Anoma’s blog. If you are already familiar with DKG and BLS digital signature, feel free to skip the following sections and jump straight to the last section.

Shamir Secret Sharing

Shamir secret sharing was created by Adi Shamir, and is a way to secure a secret by distributing it among multiple participants. It is built upon polynomial interpolation over a finite field. In other words, given a set of points, one needs to discover the polynomial of the lowest possible degree that passes through those points. In order to interpolate a polynomial, one needs to receive a sufficient set of values. For instance, if the polynomial that someone needs to interpolate is a linear function (i.e. $f(x) = ax + b$, with the secret being $b$), and they receive only 1 set of values $(c,d)$, they will not be able to interpolate this linear function. Indeed, they need at least two different sets of values (e.g. $(c,d)$ and $(e,f)$) to be able to interpolate this linear function and find the secret value $b$, as seen below.

Fig 1. In order to interpolate a function of degree 1, one needs at least two different sets of values. Thus, by plotting two sets of values (e.g. $(c,d)$ and $(e,f)$), one is able to interpolate the linear function and find the vertical intercept $b$ (i.e. the secret).

The set of values is usually referred to as shares, while the minimum number of shares needed to interpolate the polynomial is referred to as the threshold. More generally, a polynomial of degree $k$ is defined by $k+1$ coefficients. Thus, to interpolate a polynomial of degree $k$, one needs at least $k+1$ shares (e.g. ($x_1$,$y_1$); ($x_1$,$y_2$) ⋯($x_{k+1}$,$y_{k+1}$) ), with no two $x_i$ equal). The interpolation of a polynomial is done using Lagrange interpolation. The basic idea is similar to the example provided above. So, if you understand this concept, feel free to skip the example below.

Let us assume that Victoria wants to distribute her secret among 6 people, and would like the threshold to be 3 (i.e. the secret can be found only if someone gathers at least 3 different shares). Therefore, the polynomial to interpolate will be of degree 2 (as $k+ 1 = 3$). Suppose that Victoria’s secret is 1234 and she randomly selects the two other coefficients, she obtains the following polynomial:

$f(x) =1234 + 194x + 98x^2$

Victoria will then create 6 shares (f.i. $(1,f(1)) = (1, 1234+ 194 \cdot 1 +$ $98\cdot1^2)= (1,1526)$). She will then distribute the following 6 shares to 6 people:

$(1, 1526); (2,2014); (3,2698); (4,3578); (5,4654); (6,5926)$

In order to interpolate this function, someone needs to know at least 3 shares among the 6 that were provided by Victoria. Let us assume that someone gathers the second, third, and fifth share. Using Lagrange interpolation, this person will get (with $x_0,y_0=(2,2014)$; $x_1,y_1=(3,2698)$; $x_2,y_2=(5,4654)$):

$l_0=\frac{x-x_1}{x_0-x_1} \cdot \frac{x-x_2}{x_0-x_2} = \frac{x-3}{2-3} \cdot \frac{x-5}{2-5}=\frac{1}{3} x^2 - \frac{8}{3}x + 5$
$l_1=\frac{x-x_0}{x_1-x_0} \cdot \frac{x-x_2}{x_1-x_2} = \frac{x-2}{3-2} \cdot \frac{x-5}{3-5}= -\frac{1}{2} x^2+ \frac{7}{2}x - 5$
$l_2=\frac{x-x_0}{x_2-x_0} \cdot \frac{x-x_1}{x_2-x_1} = \frac{x-2}{5-2} \cdot \frac{x-3}{5-3}= \frac{1}{6}x^2 - \frac{5}{6} x + 1$

Thus,

$L(x)= \sum_{i=0}^{k} y_i \cdot l_i (x)= 2014 \cdot l_0 + 2698 \cdot l_1 + 4654 \cdot l_2 = 98 x^2+ 194 x+1234$

Using 3 shares and the Lagrange interpolation, that person was able to retrieve Victoria’s secret (i.e. 1234).

Distributed Key Generation

One disadvantage of Shamir Secret Sharing is that one needs to trust the dealer (i.e. the one distributing the shares). The dealer could distribute the wrong shares to some people creating friction in the group, or use their knowledge of the secret to their advantage (as they know the secret before anyone else). A distributed key generation (DKG) alleviates these disadvantages as it enables multiple parties to share a secret without relying on a trusted third party. The DKG used on Anoma is similar to multiple instances of Shamir Secret Sharing running in parallel. The secret to be found is the sum of all the individual secrets (i.e. $b=\sum b_i$). For instance, let us assume that Alice, Bob and Charlie each create a polynomial of degree 1 (i.e. Alice will create $f_A(x) = a_A x + b_A$ , Bob $f_B(x)$, and Charlie $f_C(x)$). At that stage, none of them know the secret $b$, they only know part of the secret (i.e. Alice knows $b_A$, Bob $b_B$ , and Charlie $b_C$ ). Each participant will create at least two shares and distribute them among their friends (represented as dots in figure below). By communicating to each other, they will be able to interpolate the different functions and find the secret $b$ (as seen below) functions and compute the secret $b= \sum b_i$.

Fig 2. None of the participants know the secret $b$ at the beginning of the DKG. By communicating between each other and combining multiple shares, they are able to interpolate the various functions and compute the secret $b$.

The example above highlights this basic concept. In practice, these concepts are combined with other mathematical tools (related to computationally hard problems, such as the elliptic curve discrete logarithm problem) to become more secure. There exist different DKGs, with different enhancements and twists. Thus, in practice, it is a bit more complex, but the basic idea is the same.

Note that at the time of writing, Anoma will run two DKG instances: one will be used for the on-chain pseudorandomness and another one will be used by Ferveo (enabling front-running protection). At some point in the future, the same DKG might be used by both Ferveo and the on-chain pseudorandomness process (both use the same BLS12-381 curve).

Elliptic Curve Digital Signature Algorithm

There are plenty of articles on the Elliptic curve digital signature algorithm (see f.i. here or here). On a high-level, elliptic curve scalar multiplication has the nice property of being almost irreversible (i.e. a one way function). That is to say, it is easy to compute the output of a function given an input, but it is really hard (i.e. computationally intensive, the complexity for recovering the input is exponential to the size of the curve) to compute the input given the output. Based on this idea, the elliptic curve digital signature allows one to have a private key (which is their secret) and a public key. Knowing a private key, it is easy to compute a public key. Conversely, it is extremely difficult to compute a private key given the public key. This model also allows the production of unforgeable signatures. Given a message and a private key, a signature can be produced. Anyone with access to the public key and message can verify the validity of the signature. However, only the person with the knowledge of the private key can create a valid signature (i.e. that can be verified using the message and the associated public key, as seen in figure 3 below).

In practice, DKGs are instantiated with secure cryptographic schemes such as ECDSA (see f.i. how they are combined here).

Boneh-Lynn-Shacham signature

A bilinear pairing allows the construction of very efficient signature schemes among which the Boneh-Lynn-Shacham signature (learn more about bilinear pairings by reading this post from Vitalik). On a high level, Boneh-Lynn-Shacham (BLS) digital signatures have the nice property of being aggregable. In other words, multiple shards of signatures can be combined to form one signature. Thus, one could split the private key into private key shards and distribute them to multiple participants. Given a message $M$ and their private key shard, each participant could create a signature shard. By aggregating these signature shards, a signature can be produced. The validity of this signature can then be verified using the public key and the message $M$ (as seen in figure 4 below). Note that participants can use different messages to avoid potential attacks (e.g. Rogue attack).

We now have a high-level understanding of different concepts that should allow us to understand how a threshold BLS digital signature works.

Threshold Boneh-Lynn-Shacham digital signature

It is possible (given certain conditions) to combine a BLS digital signature with a DKG, and it results in a threshold BLS digital signature. The DKG will be used as a setup phase. Thus, every participant creates a random polynomial $p_i$ with secret $s_i$. Keeping in mind that no one is in possession of the full private key, every participant uses their shard of the private key to compute a public key shard. These public key shards are communicated and verified by each participant. Once each public key shard has been verified, they are combined to form the “full” public key. Thus, at the end of the setup phase, every participant is in possession of a private key shard and the public key. The setup phase is quite complex/expensive, as every participant needs to verify the public key shards of every other participant, so this setup phase should be run seldomly.

The BLS digital signature can then be run. Given a message $M$, every participant can produce a shard of the signature using their private key shards. Participants communicate with each other to gather at least $t$ signatures (to meet the threshold $t$). Once this threshold is reached, they can compute the joint threshold signature. Using the public key and the message $M$, this signature will be verified. If correct, it will be hashed to produce the pseudorandom number. Note that, as no one knows the “full” private key, no one is able to compute the signature of the message $M$ beforehand. The only way to generate the signature is to aggregate signature shards. As this system is cryptographically secure, no one can predict the pseudorandom number, which can only be discovered once the threshold is reached.

As a pseudorandom number is generated every time the BLS threshold signature protocol is run, this construction can be chained so that the output of the algorithm is used as input (i.e. message) for the next round. The first time that the BLS is run, a random message needs to be used. Subsequently, the message used will be the pseudorandom number generated by the previous round (as seen below).

This process enables the creation of a pseudorandom number which is unpredictable, bias-resistant and publicly verifiable. One limitation of this process is that the threshold used in the DKG needs to be higher than two thirds the number of participants (i.e. $t > 2n/3$).

As this process will consume resources (computational power, etc.) it is expected to be integrated into Anoma’s protocol and thus run by validators. A pseudorandom number can then be produced every time a block is created.

Conclusion

Anoma uses a threshold BLS digital signature (combination of a distributed key generation (DKG) and a Boneh-Lynn-Shacham (BLS) digital signature to generate a pseudorandom number. On a high level, participants create a public key and each one generates shards of a private key. Note that the complete private key is not stored anywhere and is therefore not known by anyone. Using their private key shard and a message, participants can create signature shards. By aggregating enough of them (to meet the threshold), they can produce a signature. Using the message and the public key, this signature is then verified. If valid, this signature will be hashed, and this hash will be the pseudorandom number. This pseudorandom number can then be used as the message for the subsequent round. This process enables the creation of a pseudorandom number which is unpredictable, bias-resistant and publicly verifiable. It will be integrated into Anoma's protocol, and produce a pseudorandom number with every new block. This pseudorandomness will be accessible to validity predicates operating on the ledger.