The Convergence Rate
of Euclid's algorithm

[All references below can be found in refs.html. This link is not repeated below.]

This document presents information about the behaviour of Euclid's algorithm. This information must be at one's fingertips if an attempt to write a high-speed module is to meet with success.
The sorts of facts needed are :-
  (i) How large is 'q', the quotient in the 'mod' operation in algorithm E (in eucnotes.html), likely to be?
  (ii) How many passes will be required, on average, to complete algorithm E?

Consider the size of 'q' first. Ref. 11 Art. 4.5.3 provides a profound discussion of this problem.
  From page 368 we find :-

Probability of 'q' being 'n' approx=
log2( (n+1)^2 / ((n+1)^2 - 1) )
(log2(x) = 1.4427 * ln(x) will be needed on most calculators)

e.g. Probability (q = 1) = 1.4427 * ln (4/3) = 0.415.

Other values :-

  n     P(q = n)
  1     0.415
  2     0.1699
  3     0.09311
  4     0.05889
  5     0.0406
  6     0.02975
  7     0.02272
  8     0.01792
  9     0.01450
 10     0.01197
 11     0.01005
 12     0.00856

Cumulative probabilities can be derived from the same formula (the sum telescopes, giving P(q <= n) approx= log2( 2(n+1)/(n+2) )) :-

   n     P(q <= n)   P(q > n)
   1     0.415       0.585
   2     0.585       0.415
   4     0.737       0.263
   8     0.848       0.152
  16     0.918       0.0825
  32     0.957       0.0431
  64     0.978       0.0220
 128     0.989       0.0111
 256     0.9944      0.00560
 512     0.9972      0.00281
1024     0.9986      0.00141
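
[The table values are easy to regenerate. The following C sketch (mine, not part of any module described on this site) evaluates the distribution formula and the telescoped cumulative sum, reproducing the 2 tables above.

#include <stdio.h>
#include <math.h>

/* P(q = n) from the approximate distribution above */
static double p_q(double n)
{
    double t = (n + 1.0) * (n + 1.0);
    return log2(t / (t - 1.0));
}

/* P(q <= n): the sum of P(q = k) for k = 1..n telescopes to this */
static double p_q_le(double n)
{
    return log2(2.0 * (n + 1.0) / (n + 2.0));
}

int main(void)
{
    for (int n = 1; n <= 12; n++)
        printf("P(q = %2d)  = %.5f\n", n, p_q(n));

    for (int n = 1; n <= 1024; n *= 2)
        printf("P(q <= %4d) = %.4f   P(q > %4d) = %.5f\n",
               n, p_q_le(n), n, 1.0 - p_q_le(n));
    return 0;
}
]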

Contemplation of these tables leads to 2 important conclusions :-

  (a) The probability of q <= 4 is 0.737. In other words, the bulk of passes in algorithm E will have tiny 'q' values.

For these cases, easily the best technique for obtaining the 'mod' result, on Pentium processors, will be repeated subtraction.

  (b) The probability of q > 16 is 8.25%, or roughly 1 time in 12 - not at all a rare occurrence. For these cases, the use of 'DIV' to obtain the 'mod' result will probably be the better technique. Any module calculating a 'mod' by subtraction when q >= 32, say, would certainly fail to deliver speed!

Hence the design is driven towards using 2 'mod' techniques, with selection dependent on the value of 'q'.

[As a practical matter, it is impossible to know the value of 'q' in advance (only a DIV can find this - and avoiding such a call is the objective here).
Given this, the following procedure might be adopted (a C sketch appears after this note) :-
  (i) Set a pass counter.
  (ii) Do a subtraction and decrement the pass counter.
  (iii) If the subtraction work is finished, go to the next section.
  (iv) If the pass counter is now zero, jump to the section that does the 'big-q' calculation;
          else go to (ii).
A suitable value for the pass counter can be found by experimentation for best speed. See speed.html for a precise method for timing software.]
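
[A minimal C sketch of this hybrid approach. SUB_LIMIT is an assumed placeholder value; the right figure is the one found by the timing experiments just mentioned.

#include <stdint.h>

#define SUB_LIMIT 4   /* assumed value - tune by experiment (see speed.html) */

/* Compute a % b for a >= b > 0: repeated subtraction while the quotient
 * looks small, falling back to a hardware divide for the 'big-q' case. */
static uint32_t hybrid_mod(uint32_t a, uint32_t b)
{
    for (int passes = SUB_LIMIT; passes > 0; passes--) {
        a -= b;
        if (a < b)          /* subtraction work finished */
            return a;
    }
    return a % b;           /* big-q case: let DIV finish the job */
}
]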

To reinforce observation (b), the probability formula can be used to calculate the mean q-value and the second moment, E[q^2], over truncated ranges :-

Sample                    Mean q          Second moment
                         (1st moment)     E[q^2]
q = {1,2, ... 128}        5.585            173.01
q = {1,2, ... 256}        6.571            355.73
q = {1,2, ... 512}        7.564            723.031
q = {1,2, ... 1024}       8.560           1459.32
q = {1,2, ... 2048}       9.552           2920.6

[Note that the 'mean' q-value is surprisingly large. This comes about because the probability distribution has a 'fat tail' - in other words large q-values occur with sufficient probability to force up the mean.

  Note the 'approx=' in the probability distribution formula above.
This formula is not the last word on the topic - recent researchers have moved the subject forwards (see Ref 11 section 4.5.3 for a starting point). Hence the tables above are indicative only: the precision given reflects the formula, but the formula does not faithfully reflect the underlying problem. However, the tables above are quite satisfactory for algorithm design.]
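
[The moments table can be recomputed with the sketch below (mine; it simply truncates the approximate distribution at the stated limit and sums). The variance about the mean then follows as E[q^2] - mean^2.

#include <stdio.h>
#include <math.h>

int main(void)
{
    for (unsigned limit = 128; limit <= 2048; limit *= 2) {
        double mean = 0.0, second = 0.0;
        for (unsigned q = 1; q <= limit; q++) {
            double t = (q + 1.0) * (q + 1.0);
            double p = log2(t / (t - 1.0));   /* P(q) from the formula */
            mean   += p * q;
            second += p * q * q;
        }
        printf("q = {1..%4u}: mean = %.3f  E[q^2] = %.2f  variance = %.2f\n",
               limit, mean, second, second - mean * mean);
    }
    return 0;
}
]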

Now consider the number of passes needed. The number of passes is closely related to the expected value of 'q', but with some lesser influences also at work. From above, we see that 'q' from the mod operation will be 3-and-a-bit binary bits, on average. Furthermore, the remainder after the 'mod' will be from

the set {1, ...(A-1)},

and it seems only reasonable to expect it to be in the middle of this range - a further half binary bit. (Actually the remainder can be shown to be (on average) less than the middle of this range - see Ref 11 page 361.) This leads to a rough guess at the number of passes =

(sum of input bits) / (approx. 3.5).

##########

This section may be skipped: it discusses better estimates of the number of passes. For algorithm development, the comments above are adequate.

A 'worst case scenario' for the number of passes in Algorithm E can be found from the Fibonacci number sequence.
  Fibonacci numbers are generated by :-

F(0) = 0, F(1) = 1, F(2) = 1, ...
then F(n) = F(n-1) + F(n-2)

A few moments' thought will show that if 2 successive Fibonacci numbers were input to Algorithm E, the sequence of quotients during the 'mod' operation would be {1,1, ... 1} and the algorithm would converge at the slowest possible rate.

[In eucnotes.html I use the Fibonacci numbers F(47) = 0xB11924E1 and F(46) = 0x6D73E55F (the largest adjacent pair representable in 32 bits) for timing this case. For this pair, 45 passes are needed.]
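
[This worst case is easy to verify. The sketch below (mine, not the timing code from eucnotes.html) counts the passes made by the plain 'mod' form of Euclid's algorithm and prints 45 for the pair quoted above.

#include <stdint.h>
#include <stdio.h>

/* Count the 'mod' passes taken by Euclid's algorithm on (a, b) */
static unsigned euclid_passes(uint32_t a, uint32_t b)
{
    unsigned passes = 0;
    while (b != 0) {
        uint32_t r = a % b;
        a = b;
        b = r;
        passes++;
    }
    return passes;
}

int main(void)
{
    printf("passes = %u\n", euclid_passes(0xB11924E1u, 0x6D73E55Fu));
    return 0;
}
]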

At the other extreme, if A = 1 (with B anything) then only 1 pass is needed, and if A = (B-1) (with B anything) then only 2 passes are needed.

However, typical inputs will lie somewhere between these 2 extremes. From the crude estimate above, with both input values 32 bits, we get :-

the number of passes = 64/3.5 = 18.3.
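
[A quick Monte Carlo check of this figure (my own sketch; it uses the C library's rand(), which is crude but adequate here). It draws random 32-bit pairs and averages the pass counts; the result should land in the same region as the estimates above.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static unsigned euclid_passes(uint32_t a, uint32_t b)
{
    unsigned passes = 0;
    while (b != 0) {
        uint32_t r = a % b;
        a = b;
        b = r;
        passes++;
    }
    return passes;
}

/* rand() may supply fewer than 32 random bits, so combine two calls */
static uint32_t rand32(void)
{
    return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

int main(void)
{
    const int trials = 100000;
    unsigned long total = 0;

    for (int i = 0; i < trials; i++)
        total += euclid_passes(rand32(), rand32());

    printf("average passes = %.2f\n", (double)total / trials);
    return 0;
}
]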

Two minor aspects of the problem are covered in Reference 11 section 4.5.3 :-
  (i) The number of passes varies depending on whether or not the input values are mutually prime.
  (ii) The size of the lesser input value is conventionally used as the measure of the problem size.
  This is done because, if the larger value happens to be very much larger, the first 'mod' will reduce it to below the lesser value. In other words, the size of the larger value can be misleading - no matter how much larger it is, 1 pass is all that is required to remove the size difference. Furthermore, the natural logarithm of the lesser value is used as the measure of problem 'size', in preference to the bit length.

Given these assumptions, Ref 11 gives the following approximations :-

  (a) For mutually prime inputs, number of passes =

tau(n) = 0.843 * ln(n) + 1.47    (Eqn 50, section 4.5.3)

  (b) For non mutually prime inputs, number of passes =

tau(n) = 0.843 * ln(n) + 0.06    (Eqn 59, section 4.5.3)

For this document, finding multiplicative inverses (developed in eucnotes.html) is an important case; and these only exist for mutually prime inputs. Evaluating formula (a) when the lesser value is close to 2^32 gives the number of passes = 20.16 (i.e. about 3.2 bits per pass).

A case of particular interest, for me, is finding a multiplicative inverse for members of the multiplicative group of a Finite Field. For simple Finite Fields, all arithmetic is done 'mod p', where 'p' is some fixed prime.

The members of the multiplicative group of such a Field form

the set {1,2, ... (p-1)}.

Such a group can be proven to be 'cyclic', i.e. generators exist which 'cycle' through the whole set by repeated multiplication.

i.e. if b = (a * generator^n) mod p
['a' fixed, n = {1,2,...(p-1)}]

then the sequence of b-values results in each group member being recovered once only.

Finding a group generator is, in general, difficult, requiring a factorization of (p-1) to be found.
However, for

primes of the form p = 2q + 1   (with q also prime)

generators are easy to find. The value 0xFFFFFF2F is the largest prime of this form representable in 32 bits, and for this prime 0x2AE60442 is a generator.
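
[Checking a candidate generator is straightforward for primes of this form. Since the element orders must divide p - 1 = 2q, they can only be 1, 2, q or 2q; so g generates the whole group iff g^2 != 1 and g^q != 1 (mod p). A minimal C sketch of that check (mine), using the prime and generator quoted above:

#include <stdint.h>
#include <stdio.h>

/* (a * b) mod p; p fits in 32 bits, so the product fits in 64 bits */
static uint32_t mulmod(uint32_t a, uint32_t b, uint32_t p)
{
    return (uint32_t)(((uint64_t)a * b) % p);
}

/* base^exp mod p by square-and-multiply */
static uint32_t powmod(uint32_t base, uint32_t exp, uint32_t p)
{
    uint32_t result = 1;
    while (exp) {
        if (exp & 1)
            result = mulmod(result, base, p);
        base = mulmod(base, base, p);
        exp >>= 1;
    }
    return result;
}

int main(void)
{
    uint32_t p = 0xFFFFFF2Fu;       /* safe prime quoted above      */
    uint32_t q = (p - 1) / 2;       /* also prime                   */
    uint32_t g = 0x2AE60442u;       /* candidate generator          */

    int is_gen = (powmod(g, 2, p) != 1) && (powmod(g, q, p) != 1);
    printf("0x%08X %s a generator mod 0x%08X\n",
           g, is_gen ? "is" : "is not", p);
    return 0;
}
]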

[Primes of this form are associated with Sophie Germain ('q' is a Sophie Germain prime, and 'p' a safe prime). She proved that the first case of Fermat's Last Theorem holds for prime exponents q for which 2q + 1 is also prime.]

When this prime and generator were used to generate 100,000 group members, with each member fed to Euclid's algorithm, the average number of passes was found to be 19.16.
For reference, ln(0xFFFFFF2F) is 22.18, and averaging ln() over the group members used gave 21.183.
(The minimum number of passes was 5, and the maximum 33.)

For this scenario,

tau(n) = 0.843 * (ln(prime) - 1) + 1.30

is a better fit.
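
[A sketch of that experiment in C (mine; details not given in the text are assumed - in particular that the members are successive powers of the generator starting from 'g' itself, and that each pass count is for the pair (p, member)):

#include <stdint.h>
#include <stdio.h>

static uint32_t mulmod(uint32_t a, uint32_t b, uint32_t p)
{
    return (uint32_t)(((uint64_t)a * b) % p);
}

static unsigned euclid_passes(uint32_t a, uint32_t b)
{
    unsigned passes = 0;
    while (b != 0) {
        uint32_t r = a % b;
        a = b;
        b = r;
        passes++;
    }
    return passes;
}

int main(void)
{
    const uint32_t p = 0xFFFFFF2Fu;    /* prime quoted above      */
    const uint32_t g = 0x2AE60442u;    /* generator quoted above  */
    uint32_t member = 1;
    unsigned long total = 0;
    unsigned min = ~0u, max = 0;

    for (int i = 0; i < 100000; i++) {
        member = mulmod(member, g, p);          /* next group member */
        unsigned n = euclid_passes(p, member);
        total += n;
        if (n < min) min = n;
        if (n > max) max = n;
    }
    printf("average passes = %.2f  (min %u, max %u)\n",
           total / 100000.0, min, max);
    return 0;
}
]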

[A more modern estimate of the number of passes can be found in Ref 3, theorem 2.1.3.

For pairs of integers in the set {1,2 ... N},
number of passes = 12 * ln(2) * ln(N) / pi^2

When N = 2^n,   ln(N) = n * ln(2),
and number of passes = 12 * ln^2(2) * n / pi^2,
which simplifies to number of passes = 0.584 * n.

If N = 2^32, number of passes = 0.584 * 32 = 18.69.]

##########


