 
                                 ZIP
          A simple, efficient pseudo-random number generator
 
 
                             Chris Barker
                      barker@ling.rochester.edu
 
 
SUMMARY: I call this generator `zip' because it can be implemented by
an algorithm which reminds me of a zipper, and also because it is
relatively speedy.  Like the 1/p generator [Knuth, BBS] and the x^2
mod n generator [BBS], Zip is based on discrete logs, and can serve as
a "random access" generator (the ith element of a sequence can be
efficiently computed directly without computing intervening elements).
Blum, Blum, and Shub [BBS] show that although the 1/p generator is not
cryptographically secure, the x^2 mod n generator (though much slower)
is secure, modulo certain intractability assumptions.  Zip is an
attempt to find a variant of the 1/p generator which is not only
simple and which has an efficient implementation, long periods, and
good statistical properties, but which is also cryptographically
secure.  One reason that Zip may be more secure than the 1/p generator
is that unlike the 1/p generator, each bit of the output depends on
every bit of the previous state.  Unfortunately, I don't have a proof
that Zip is secure.  On the other hand, for modulus lengths over 128
bits, Zip is at least an order of magnitude faster than the x^2 mod n
generator.
 
Topics addressed in this document: the generator; an efficient
algorithm; time cost; comparison with the 1/p generator; comparison
with the x^2 mod n generator; period; security; and two sample
software implementations.
 
[Version of 6 April 1995.  Available via anonymous ftp as
ftp://ling.rochester.edu/pub/barker/Crypto/zip.txt.]
 
=========================================================================
 
 
THE GENERATOR: Let p be an integer greater than 2.  Then Zip(p) is the
sequence of integers x_0, x_1, x_2, ... given by (1):
 
  (1)  x_i = (2 ^ (i * (p - |p| - 1))) mod p
 
Here |p| is the length of p when p is represented in binary digits
(i.e. |p| = 1 + the integer part of log_2 p), and the mod operation
always gives the least non-negative residue.
 
Examples: if p is 5, then the sequence generated by zip is 1, 2, 4, 3,
1, 2, 4, 3, ...  Zip(29) is 1, 10, 13, 14, 24, 8, 22, 17, 25, 18, 6,
2, 20, 26, 28, 19, 16, 15, 5, 21, 7, 12, 4, 11, 23, 27, 9, 3, 1, 10, ...
 
Note that the ith element of the sequence can be calculated
efficiently without first calculating any previous elements of the
sequence.  However, given some element x_i, it is not necessary to
know i in order to extend the sequence forwards and backwards, since
x_(i+1) = (x_i * (2 ^ (p - |p| - 1))) mod p.  When calculating
elements of the sequence in order, we can call the current x_i the
STATE of the generator.  I will assume that it is infeasible in
general to calculate i given x_i, since this is an instance of the
indexing problem for discrete logarithms.
 
Choose some p and consider a subsequence whose first element is x.
Obviously, if both p and x are known, the subsequence can easily be
generated.  Therefore for cryptographic applications, only the
low-order bit or bits of the state will be used as the output of the
generator, and the remaining high-order bits will be kept secret.
(There may also be a security advantage to keeping p secret.)  Thus
for some m, x_i mod m will be the output of the generator.  Since the
identity of x determines the output of the generator, x mod m can be
thought of as an initialization vector and x/m can be thought of as
the secret key.  For example, choose p = 5387, x = 2866, and m = 64.
Then the initialization vector is 2866 % 64 = 50 and the secret key is
2866/64 % = 44.  In base 2 notation, we have:
 
           key
          ------
     x =  101100110010
                ------
                 initialization vector
 
Since 2^(5387 - 13 - 1) mod 5387 = 2923, the next element in the
sequence is (2866 * 2923) mod 5387 = 533.  Thus the subsequence
starting with 2866 begins like this:
 
              secret
              state   output
              ------  ------
  x = 2866  = 101100  110010
       533  = 001000  010101
      1116  = 010001  011100
      2933  = 101101  110101
      2442  = 100110  001010
       191  = 000010  111111
      3432  = 110101  101000
      1142  = 010001  110110
      3513  = 110110  111001
       877  = 001101  101101
 
I will assume that a subsequence cannot be extended reliably without
(in effect) knowing the state for some element of the subsequence.  If
so, then the security of the generator depends on the difficulty of
determining the state based on observation of the output of the
generator.
 
 
AN EFFICIENT ALGORITHM:
 
  1. x = x - 1
  2. repeat |p| times:
       x = (x + ((1 - (x mod 2)) * p)) / 2
  3. x = x + 1
  4. go to step 1.
 
I will ignore steps (1) and (3), since they can be omitted without
affecting any relevant property of the generator.  In step (2), each
iteration of the loop replaces x with an adjacent element in the
sequence of powers of 2 mod p; more specifically, if x = (2 ^ i) mod p,
then ((x + ((1 - (x mod 2)) * p)) / 2) = (2 ^ (i - 1)) mod p.  Thus
after |p| repetitions, x will be replaced with (2 ^ (i - |p|)) mod p.
In other words, the algorithm samples the list of the residues of the
powers of 2 at intervals of every |p| elements.
 
The following diagram will show why this algorithm reminds me of a
zipper.  The diagram represents the calculation of the first three
elements (as listed above) of Zip(29):
 
                                            0   (A)
                                     +  11101
                                     --------
                                        11101
                                    +  11101
                                    ---------
                                      1010111
                                  +  11101
                                  -----------
                                    100111111   (B)
                               +  11101
                               --------------
                                 100001111111
                              +  11101
                              ---------------
                                1011011111111
                             +  11101
                             ----------------
                               11001111111111  (C)
 
In line (A), we have x_0 - 1 = 0 (cf. step (1) in the algorithm).
Note that in lines (B) and (C), the leftmost (higher-order) digits
give x_1 - 1 = 9 and x_2 - 1 = 12.  The idea of each addition is to
shift and add a copy of p to the working number so as to turn the
rightmost zero into a 1, resulting in a longer and longer string of
1's on the right.  Imagining the 0's as gaps and the 1's as teeth,
what the algorithm does is fit the teeth into the gaps like zipping up
an endless zipper.
 
 
TIME COST: Assume that finding a minimal residue mod 2 (i.e., deciding
whether a number is even or odd) and division by 2 can be done in
negligible time.  Since x never grows larger than p, step (2) requires
at most (|p| * |p|/2) bit operations on average.  Note that Zip could
be implemented in hardware with little more than adder circuitry.
 
This time cost might be reduced further by careful choice of p.  For
instance, consider p = (2^128) + 12451.  Since p is mostly zero's (in
base 2), an optimized adder would typically have considerably fewer
than |p| bit operations during each addition operation.  However, this
may reduce security.  To see why, observe that what Zip does to
produce the next state is mix the bits of the current state with (a
function of) p.  Reducing the entropy in p presumably reduces the
entropy mixed in to the new state.
 
 
COMPARISON WITH THE 1/p GENERATOR: Assume p is prime.  In step (2) of
the algorithm, if we output the least significant bit of the state
(i.e., x mod 2) for each iteration of the loop, the output gives the
quotient digits of 1/p (in base 2, and in reverse order).  Blum, Blum,
and Shub [BBS] showed that the 1/p PRNG is not secure: given (2 * |p|)
+ 1 bits of output, it is possible to reconstruct p as well as the
state of the generator, and thus to efficiently extend the sequence to
the left and to the right.  However, each output bit of the 1/p
generator can be predicted by examining a single bit of the state
(namely, the least significant bit).  In contrast, for Zip, each
output bit depends on all of the previous state bits, so my hope is
that Zip is more secure than the 1/p generator.
 
 
COMPARISON WITH THE x^2 mod n GENERATOR: The x^2 mod n generator
outputs the low-order bits of k^(2 * i) mod n, where i > 0, n = p*q,
and p and q are each congruent to 3 mod 4.  One advantage of Zip over
the x^2 mod n generator is that Zip is more efficient.  For each
log(log(n)) bits of output, the x^2 mod n generator requires a
multiplication (more specifically, squaring) modulo n.  Assuming that
we use the low-order half of the state as output for Zip, each
multiplication mod p yields |p|/2 bits.  Thus for a modulus of 128
bits, Zip is at least 64/7 faster (almost a full order of magnitude).
Furthermore, since in the algorithm described above for Zip finding
residues modulo p corresponds to division by 2, this constitutes a
speedup of another factor of 2.  As the length of the modulus
increases, of course, the speed advantage of Zip over the x^2 mod n
generator also increases.  A second advantage of Zip over the x^2 mod
n scheme, as we will see immediately below, is that Zip can guarantee
a maximally long period for all choices of an initial state.
 
 
PERIOD: Fix p, and assume that p is prime and that 2 generates the
prime field Z/pZ (i.e., for every integer x such that 0 < x < p there
exists some i such that x = (2 ^ i) mod p).  Let k = 2^(p - |p| - 1)
mod p.  Then the period of Zip(p) will be maximal just in case k also
generates the prime field Z/pZ.  If so, then each number between 0 and
p will appear in the sequence exactly once before the sequence
repeats.  For instance, as in the example above, when p = 29, k = 10;
since the powers of 10 are members of all of the residue classes mod
29, Zip(29) has a period of length 28.  Similarly, when p = (2^128) +
12451, the period of Zip(p) is p - 1.  One advantage of choosing p to
have a maximal period is that all choices for an initial state have
the same status with respect to the generator, that is, they are
guaranteed to be members of the same sequence.  Thus any random choice
of starting state is in effect a pointer to an unpredictable location
in that sequence (unpredictable modulo solving the indexing problem
for discrete logarithms).  Furthermore, a maximal period guarantees
that the output will be evenly distributed statistically across the
(lower bits of) all numbers less than p.
 
 
SECURITY: Zip is an instance of a more general type of generator given
in (3).
 
   (3)  x_i = (2 ^ (i * (p - r))) mod p
 
When r = |p| + 1, the result is Zip.  When r = 2 (assuming only the
first low-order bit is used as output), the result is the 1/p
generator.  Since the 1/p generator is not secure, this means that any
security for such generators must depend in part on special properties
of r (more specifically, on the relationship of r to p).  Note that
using an algorithm analogous to the one described above for Zip, the
time cost of computing the next element in the sequence for a
generator as in (3) is directly proportional to the size of r.
 
Let k = (2 ^ (p - |p| - 1)) mod p.  In addition to requiring that p be
prime, and that 2 and k generate all of the residue classes modulo p,
there may be security advantages to imposing other requirements on p
and k.  In particular, I have noticed through visual inspection of
plots of x_i against i that there may be quasi-periods related to the
greatest common divisor of k and totient(p) (i.e., gcd(k, p - 1)).
Therefore it may be a good idea to choose p so that gcd(k, p - 1) = 1.
One way to try to avoid other interactions with the factors of
totient(p) is to choose p such that p/2 is also prime.  It may also be
helpful to ensure that k > p/2.  Choices for p which (probably) have
all these properties include 5387, 6899, 66107, (2^30) + 4819,
(2^128) + 12451 and 33873952540991185077834780274045047419.
 
Apart from use as a PRNG, Zip can be thought of as providing a class
of functions for mixing bits (one function for each choice of p), and
as such can provide one component of more complex crypto-systems.  For
instance, consider a two-round version of Zip, which might be
constructed as follows: take the output from one Zip sequence and use
each element as the initial state of a second Zip generator.  That is,
if zip(x, p) returns the next element after x in the sequence Zip(p),
the next element after x given by the compound generator would be
zip(zip(x, p), q).  Presumably two rounds would make it more difficult
to determine p, let alone the secret bits of the state of the first
generator.
 
 
APPENDIX A: C Code
 
/*
This appendix contains a complete C program which calculates zip
sequences.  It is limited to p's of less than 32 bits in length.
The input to the program should be p, the start state, the number of
elements in the sequence to be generated, and a modulus to use for
extracting the low-order bits of each element of the sequence.  For example:
 
   % echo 1073746643 1 5 65536 | a.out
           11
         5034
         6492
        14686
        63293
*/
 
main() {
  unsigned long int p, x, i, m, j;
  scanf("%d%d%d%d", &p, &x, &i, &m);
  for (;i;i--) {
    x = x - 1;
    for (j=p; j; j = j/2) x = (x + ((1 - (x % 2)) * p)) / 2;
    x = x + 1;
    printf("%13d\n",x%m);
  }
}
 
 
APPENDIX B: BC code
 
/*
This appendix contains code for bc.  It is intended to work with any
POSIX-compliant bc interpreter; in any case, it works under gnu bc
version 1.02.  For example:
 
   % bc zip.bc
   bc 1.02 (Mar 3, 92) Copyright (C) 1991, 1992 Free Software Foundation, Inc.
   This is free software with ABSOLUTELY NO WARRANTY.
   For details type `warranty'.
   z((2^128) + 12451, 1, 5, 2^64)
   17248172396283567379
   784527369116372560
   5509101724713233130
   11007674396980137398
   6333912974857180050
   0
*/
 
define z(p,x,i,m) {
  auto j
  for(i=i;i;i--) {
    x = x - 1
    for(j=p;j;j=j/2) x = (x + ((1 - (x % 2)) * p)) / 2
    x = x + 1
    x%m
  }
}
 
 
REFERENCES:
 
[BBS] L. Blum, M. Blum, M. Shub (1986), `A simple unpredictable
pseudo-random number generator', _SIAM Journal on Computing_, 15:2,
364--383.
 
[Knuth] D. Knuth (1981), _The Art of Computer Programming: Vol. 2,
Seminumerical Algorithms_, Addison-Wesley, Reading, MA.