ImperialViolet

Thanks to Alexander Sotir... (15 Mar 2009)

Thanks to Alexander Sotirov to pushing me to check that the carry chains in donna-c64 were sufficient. I don't know if I realised something when I wrote it which I'm currently missing, or if I just screwed up, but I now believe that they're wrong.

I wrote this Haskell code to check it:

This Haskell code has been written to experiment with the carry chains in
curve25519-donna-c64. It's a literate Haskell program, one can load it into
GHCI and play along.

> module Main where
>
> import Data.Bits (shiftR, (.&.))

There are two constants that we'll need.

Our five limbs are, nominally, 51 bits wide, so this is the maximum value of
their initial values.

> twoFiftyOneMinusOne = (2 ^ 51- 1

2^128 - 1 is the limit of the range of our temporary variables. If we exceed
this at any point, our calculations will be incorrect.

> two128MinusOne = (2 ^ 128- 1

Now we define a type which mimics our 128-bit unsigned type in C. It's a
disjuction of an Integer and the distinguished value 'Overflow'. 'Overflow' is
contagious: if we try to perform any operations where one or both of the
operands is 'Overflow', then the result is also 'Overflow'.

> data U128 = U128 Integer
>           | Overflow
>           deriving (Show, Eq)

We make U128 an instance of Num so that we can perform arithmetic with it.

> instance Num U128 where
>   (U128 a) + (U128 b) = mayOverflow (a + b)
>   _ + _ = Overflow
>   (U128 a) * (U128 b) = mayOverflow (a * b)
>   _ * _ = Overflow
>   (U128 a) - (U128 b) = mayOverflow (a - b)
>   _ - _ = Overflow
>   negate _ = Overflow
>   abs a@(U128 _) = a
>   abs _ = Overflow
>   signum (U128 _) = 1
>   signum _ = 0
>   fromInteger = mayOverflow

> instance Ord U128 where
>   compare (U128 a) (U128 b) = compare a b
>   compare _ _ = EQ

This function lifts an Integer to a U128. If the value is out of range, the
result is 'Overflow'

> mayOverflow :: Integer -> U128
> mayOverflow x
>   | x > two128MinusOne = Overflow
>   | x < 0 = Overflow
>   | otherwise = U128 x

Our field elements consist of five limbs. In the C code, these limbs are
actually uint64_t's, but we keep them as U128's here. We will convince ourselves
that we don't hit any 64-bit overflows later.

> data FieldElement = FieldElement { m0 :: U128, m1 :: U128, m2 :: U128,
>                                    m3 :: U128, m4 :: U128 }
>                                  deriving (Show, Eq)

Now, two helper functions:

This function takes only the bottom 51-bits of a value

> clamp :: U128 -> U128
> clamp (U128 a) = U128 $ a .&. 0x7ffffffffffff
> clamp _ = Overflow

This function drop the bottom 51-bits of a value

> topBits :: U128 -> U128
> topBits (U128 a) = U128 $ a `shiftR` 51
> topBits _ = Overflow

This function simulates the 'fsquare' function in donna-c64, including its carry
chain. If the carry chain is sufficient, then iterating this function for any
valid initial value should never overflow.

> square :: FieldElement -> FieldElement
> square e = result where
>   t0 = m0 e * m0 e
>   t1 = m0 e * m1 e +
>        m1 e * m0 e
>   t2 = m0 e * m2 e +
>        m2 e * m0 e +
>        m1 e * m1 e
>   t3 = m0 e * m3 e +
>        m3 e * m0 e +
>        m1 e * m2 e +
>        m2 e * m1 e
>   t4 = m0 e * m4 e +
>        m4 e * m0 e +
>        m3 e * m1 e +
>        m1 e * m3 e +
>        m2 e * m2 e
>   t5 = m4 e * m1 e +
>        m1 e * m4 e +
>        m2 e * m3 e +
>        m3 e * m2 e
>   t6 = m4 e * m2 e +
>        m2 e * m4 e +
>        m3 e * m3 e
>   t7 = m3 e * m4 e +
>        m4 e * m3 e
>   t8 = m4 e * m4 e
>
>   t0' = t0 + t5 * 19
>   t1' = t1 + t6 * 19
>   t2' = t2 + t7 * 19
>   t3' = t3 + t8 * 19
>
>   t1'' = t1' + topBits t0'
>   t2'' = t2' + topBits t1''
>   t3'' = t3' + topBits t2''
>   t4' = t4 + topBits t3''
>   t0'' = t0' + 19 * topBits t4'
>   t1''' = clamp t1'' + topBits t0''

At this point, we implement two carry chains. If 'currentChain' is true, then we
implement the carry chain as currently written in donna-c64. Otherwise, we
perform an extra step and carry t1 into t2.

>   result = if currentChain
>               then FieldElement (clamp t0'') t1''' (clamp t2'') (clamp t3'')
>                                 (clamp t4')
>               else FieldElement (clamp t0'') (clamp t1''') t2''' (clamp t3'')
>                                 (clamp t4') where
>                    t2''' = clamp t2'' + topBits t1'''

This is the maximum initial element: an element where all limbs are 2^51 - 1.
Inspection of the 'fexpand' function should be sufficient to convince oneself of
this.

> maxInitialElement :: FieldElement
> maxInitialElement = FieldElement twoFiftyOneMinusOne twoFiftyOneMinusOne
>                                  twoFiftyOneMinusOne twoFiftyOneMinusOne
>                                  twoFiftyOneMinusOne

This function takes two field elements and returns the worst case result: one
where the maximum of each limb is chosen.

> elementWiseMax :: FieldElement -> FieldElement -> FieldElement
> elementWiseMax x y = FieldElement (f m0) (f m1) (f m2) (f m3) (f m4) where
>   f :: (FieldElement -> U128) -> U128
>   f accessor = max (accessor x) (accessor y)

We now define a series of values generated by squaring the previous element and
setting any limb that is less than the maximum to the maximum value.

> maxSeries = iterate (elementWiseMax maxInitialElement . square)
>                     maxInitialElement

This value controls which carry chain is used in 'square', the current one or
the one with the extra carry

> currentChain = True

By running this, we can see that the current carry chain is insufficient for
this simulation:

ghci> maxSeries !! 4
FieldElement {m0 = Overflow, m1 = Overflow, m2 = Overflow, m3 = Overflow,
              m4 = Overflow}

The series overflows after only four iterations. However, if we use the
alternative carry chain, the series is stable far beyound the requirements of
the Montgomery ladder used in donna-c64:

ghci> maxSeries !! 100000
FieldElement {m0 = U128 2251799813685247, m1 = U128 2251799813685247,
              m2 = U128 2251799813685247, m3 = U128 2251799813685247,
              m4 = U128 2251799813685247}

Additionally, these values are small enough not to overflow the 64-limb limbs.