Ethereum Foundation logo
  • home
  • blog
  • research
  • bounties
  • team
  • events

Cryptanalysis of the Algorand Subset-Sum Hash Function (UPDATED 25th June 2022)

Dmitry Khovratovich | June 25, 2022

1. Introduction

Algorand has proposed a compression function which is lattice-ZKP friendly and which they plan to use inside the Merkle-Damgard framework.

The public constants of the function are matrix AZqn×mA\in Z_q^{n\times m} where q=264,n=8,m=1024q=2^{64}, n=8, m=1024. Let us denote the columns of AA by AiA_i, 0i<m0\leq i <m. Matrix entries should be pseudorandomly generated.

The compression function fAf_A processes the mm-bit input x=(x0,x1,,xm1)\mathbf{x} = (x_0,x_1,\ldots,x_{m-1}) as follows:

fA(x)=i<mxiAif_A(\mathbf{x}) = \sum_{i<m}x_i A_i

with the input interpreted as a nlogq=512n\log q = 512-bit value.

The authors claim one-wayness of the compression function, though not mentioning any concrete security level.

2. Generalized birthday problem

The generalized birthday problem was first formulated by Wagner as follows:

Given lists L1,L2,,L2kL_1,L_2,\ldots,L_{2^k} of binary nn-bit strings find liLil_i\in L_i such that

l1l2l2k=(000).l_1\oplus l_2 \oplus \cdots l_{2^{k}} = (00\cdots 0).

For k=1k=1 it is the standard birthday problem and a solution can be found in 2n/22^{n/2} time (for that big lists). It is less obvious that for k>1k>1 one can do better than 2n/22^{n/2}, concretely in 2nk+12^{\frac{n}{k+1}} time. The idea can be illustrated for k=2k=2:

  • Find 2n/32^{n/3} \emph{partial collisions} between L1L_1 and L2L_2 i.e. strings that collide in the first n/3n/3 bits. This can be done in 2n/32^{n/3} time by taking 2n/32^{n/3} elements from both lists and sorting them by the first bits. Put collisions and their components into new list X={(l1l2,l1,l2)}X = \{(l_1\oplus l_2,l_1,l_2)\}
  • Repeat the same for lists L3L_3 and L4L_4 and obtain list YY.
  • Find tuples (x,l1,l2)X(x,l_1,l_2)\in X and (y,l3,l4)Y(y,l_3,l_4)\in Y such that x=yx=y. This is feasible since both xx and yy are zero in the first n/3n/3 bits so we need only 2n/32^{n/3} tuples in both lists to find a collision on the remaining 2n/32n/3 bits.

Wagner's attack illustration

The same approach works for bigger kk. Note though that there is no such algorithms for 3 lists L1,L2,L3L_1,L_2,L_3 and the best attack is still O(2n/2)O(2^{n/2}).

Finally remark that the process is memory-heavy for k>1k>1. This fact is used in the memory-hard proof-of-work Equihash, used in Zcash.

3. Attack on the Algorand hash

The compression function described in Section 1 is vulnerable to a modification of the generalized birthday attack. Let us find a collision:

fA(x)=fA(y)f_A(\mathbf{x})=f_A(\mathbf{y})

We do as follows:

  • Split the x,y\mathbf{x},\mathbf{y} into 16 64-bit chunks x1,x2,,x16,y1,y2,,y16\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_{16},\mathbf{y}_1,\mathbf{y}_2,\ldots,\mathbf{y}_{16}.
  • Interpret the output fA(x)f_A(\mathbf{x}) as a 6-tuple (a1,a2,a3,a4,a5,a6)(a_1,a_2,a_3,a_4,a_5,a_6) where a1a_1 is 32-bit and all other aia_i are 96-bit.
  • For all 264+642^{64+64} pairs (xi,yi),i<8(\mathbf{x}_{i},\mathbf{y}_{i}), i<8, find 264+6432=2962^{64+64-32}=2^{96} collisions for fAf_A in a1a_1, i.e. solutions for
fA(xi)+fA(yi)=(0,,,,,)f_A(\mathbf{x}_{i})+f_A(\mathbf{y}_{i})=(0,*,*,*,*,*)

spending 2962^{96} time and space for each ii using list sorting for birthday paradox. Store all solutions in lists LiL_i.

  • For each j<8j<8 find partial collisions between L2j+1L_{2j+1} and L2j+2L_{2j+2} in a2a_2:
zL2j+1+zL2j+2=(0,0,,,,)\underbrace{z}_{\in L_{2j+1}}+\underbrace{z'}_{\in L_{2j+2}} = (0,0,*,*,*,*)

Note that since both zz and zz' are 0 in a1a_1 they sum to 0 in it. The number of partial collisions between L2j+1L_{2j+1} and L2j+2L_{2j+2} is 296+9696=2962^{96+96-96}=2^{96}. Store the results in 8 lists LkL_k.

  • We now find 2962^{96} partial collisions between pairs of lists in a3a_3 and obtain 4 lists. Then proceed the same way with a4a_4 and get two lists.
  • Find a single collision between the two lists in a5a_5 and a6a_6 at cost 2962^{96}. It yields
fA(xi)+fA(yi)=0    fA(x)=fA(y)\sum f_A(\mathbf{x}_{i})+f_A(\mathbf{y}_i)=0\;\Leftrightarrow\; f_A(\mathbf{x}) = f_A(\mathbf{y})

i.e. a collision.

Overall the collision attack costs 2982^{98} time (it is not necessary to work on all the lists simultaneously) and thus the overall security of the subset sum hash is at most 98 bits in the time cost model.

UPDATE (25 June 2022)

4. Bug in Section 3 and its fix

The Algorand team has kindly reported us a flaw in Section 3. Concretely, if one merges fA(xi)f_A(\mathbf{x}_i) and yi\mathbf{y}_i in the first step of the attack than due to the linearity of fAf_A the number of possible pairs is 3643^{64} rather than 4644^{64}, which makes the search for 2962^{96} partial collisions more expensive.

The simple fix to this flaw is to merge instead fA(x1)f_A(\mathbf{x}_1) with fA(x2)f_A(\mathbf{x}_2), then fA(x3)f_A(\mathbf{x}_3) with fA(x4)f_A(\mathbf{x}_4), so that the inputs activate different scalars in AA. When repeating for fA(yi)f_A(\mathbf{y}_i), one should target different collision bits to avoid having x=y\mathbf{x}=\mathbf{y}. The rest of the attack remains the same with the same complexity estimate.

5. Algorand internal analysis

In response to our original post, the Algorand team has published an internal analysis. The report investigates the complexity of the Wagner attack implemented on a quantum computer. For collision search the report estimates the quantum attack complexity as 21082^{108} time and 2402^{40} memory. The same document also gives the complexity of the classical Wagner attack as 21072^{107} time and 2852^{85} memory, which is a variation (another point on the time-area tradeoff curve) of our attack in Section 3, assuming our bugfix above.

Acknowledgements

We thank Chris Peikert for fruitful discussions that has led to the attack refinements.

cryptography@ethereum.org

© 2024 Ethereum Foundation. All rights reserved.