[All references below can be found in refs.html. This link is not repeated below.]
This document presents information about the behaviour of Euclid's algorithm.
This information must be at one's finger tips if an attempt to write a
high speed module is to meet with success.
The sort of facts needed are :-
(i) How large is 'q', the quotient, in the 'mod' operation in
algorithm E, in
eucnotes.html, likely to be.
(ii) How many passes will be required, on average, to complete algorithm E.
Consider the size of 'q' first. Ref. 11 Art. 4.5.3 provides a profound
discussion of this problem.
From page 368 we find :-
Probability of 'q' being 'n' approx=
log2 ((n+1)2 / ((n+1)2 - 1))
(log2 (x) = 1.4427 * ln (x)
will be needed on most calculators)
e.g. Probability (q = 1) = 1.4427 * ln (4/3) = 0.415.
Other values :-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Cumulative probabilities can be derived from the same formula :-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Contemplation of these tables leads to 2 important conclusions :-
(a) The Probability of n <= 4 is 0.737. In other words, the bulk of passes in algorithm E will have tiny 'q' values.
For these cases, easily the best technique
for obtaining the 'mod' result,
on Pentium processors,
will be repeated subtraction.
(b) The Probability of n > 16 is 8.25%, or roughly 1 time in 12 - not at all a rare occurrence. For these cases, the use of 'DIV', to obtain the 'mod' result, will probably be a better technique. Any module calculating a 'mod' using subtraction when q >= 32, say, would certainly fail to deliver speed!
Hence thinking is driven towards using 2 mod techniques,
with selection dependent on the value of 'q'.
[As a practical matter, it is impossible to know the value of 'q' in advance (only a DIV can
find this - and avoiding such a call is the objective here).
Given this, the following procedure might be adopted :-
(i) Set a pass counter.
(ii) Do a subtraction + decrement the pass counter.
(iii) If subtraction work is finished, goto next section.
(iv) If the pass counter now zero jump to section to do 'big-q'
calc.
else goto (ii)
A suitable value for the pass counter can be found by experimentation
for best speed. See speed.html for a precise method
for timing software.]
To reinforce observation (b), the probability formula can be used to calculate the mean q-value and the variance about the mean :-
|
(1st Moment) |
(2nd Moment) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
[Note that the 'mean' q-value is surprisingly large. This
comes about because the probability distribution has a 'fat tail' - in
other words large q-values occur with sufficient probability to force up
the mean.
Note the 'approx=' in the probability distribution formula above.
This formula is not the last word on the topic - recent researchers
have moved the subject forwards (see Ref 11 section 4.5.3 for a starting
point). Hence, the tables above are indicative only, and the precision
given reflects the formula; but the formula does not faithfully reflect
the underlying problem. However, the tables above are quite satisfactory
for algorithm design.]
Now to consider the number of passes needed. The number of passes is closely related to the expected value of 'q' but with some lesser influences also at work. From above, we see that 'q' from the mod operation will be 3-and-a-bit binary bits, on average. Furthermore, the remainder, after the 'mod' will be from
the set {1, ...(A-1)},
and it seems only reasonable to expect it to be in the middle of this range - a further half binary bit. (Actually the remainder can be shown to be (on average) less than the middle of this range - see Ref 11 page 361.) This leads to a rough guess at the number of passes =
(sum of input bits) / (approx. 3.5).
##########
This section might be skipped. I am going to discuss better estimates of the number of passes. For algorithmic developments, the comments above are adequate.
A 'worst case scenario' for the number of passes in Algorithm E can
be found from the Fibonacci number sequence.
Fibonacci numbers are generated by :-
F0 = 0, F1 = 1, F2 = 1, ...
then Fn = Fn-1 + Fn-2
A few moments thought will tell you that if 2 successive Fibonacci numbers were input to Algorithm E, the sequence of quotients during the 'mod' operation would be {1,1, ... 1} and the algorithm would converge at the slowest possible rate.
[In eucnotes.html I use Fibonacci numbers
F47 = 0xB11924E1 and F46 = 0x6D73E55F
(the 2 largest adjacent pair representable in 32 bits)
for timing this case.
For this pair, 45 passes are needed.]
At the other extreme, if A=1 with B anything then 1 pass is needed, and if A=(B-1) with B anything; then only 2 passes would be needed.
However, typical inputs will lie somewhere between these 2 extremes. From the crude estimate above, with both input values 32 bits, we get :-
the number of passes = 64/3.5 = 18.3.
Two minor aspects of the problem are covered in Reference 11 section 4.5.3 :-
(i) The number of passes varies depending on whether or not
the input values are mutually prime.
(ii) The size of the lesser input value is conventionally used
as the measure of the problem size.
This is done because if the larger value happens to be very
much larger, the first 'mod' will reduce it to below the lesser value.
In other words, the size of the larger can be misleading - no matter how
much larger it is, 1 pass is all that is required to remove the size difference.
Furthermore the natural logarithm of the lesser value is used, as a measure
of problem 'size', in preference to bit index.
Given these assumptions, Ref 11 gives the following approximations :-
(smartphone users may see the letter 't' in place of the 'Greek tau'
symbol I intend in the next few lines)
(a) For mutually prime inputs, number of passes =
tn = 0.843.ln(n) + 1.47 (Eqn 50, section 4.5.3)
(b) For non mutually prime inputs, number of passes =
tn = 0.843.ln(n) + 0.06 (Eqn 59, section 4.5.3)
For this document, finding multiplicative inverses (developed in eucnotes.html) is an important case; and these only exist for mutually prime inputs. Evaluating formula (a) when the lesser value is close to 232 gives the number of passes = 20.16. (i.e. about 3.2 bits per pass)
A case of particular interest, for me, is finding a multiplicative inverse for members of the multiplicative group of a Finite Field. For simple Finite Fields, all arithmetic is done 'mod p', where 'p' is some fixed prime.
The multiplicative group members of such a Field is
the set {1,2, ... (p-1)}.
Such a group is can be proven to be 'cyclic' i.e. generators exist which 'cycle' through the whole set by repeated multiplication.
i.e. if b = (a * generatorn) mod p
['a' fixed, 'n'= {1,2,...(p-1)}]
then the sequence of b-values results in each group member being recovered once only.
Finding a group generator is, in general, difficult, requiring a factorization
of (p-1) being found.
However, for
primes of the form p = (2.q + 1) (with q also prime)
generators are easy to find. The value 0xFFFFFF2F is the largest prime of this form representable in 32 bits, and for this prime 0x2AE60442 is a generator.
[Primes of this form were used by Sophie Germain. She found that Fermat's Last Theorem was correct (with a caveat) for exponents of this form.]
When this prime and generator were used to find 100,000 group members, with each member fed to Euclid's algorithm, then the average number of passes was found to be 19.16.
ln(0xFFFFFF2F) is 22.18,
and averaging the ln() of each group member, gave 21.183.
(The minimum number of passes was 5,
and the maximum 33.)
For this scenario,
tn = 0.843.(ln(prime) - 1) + 1.30
is a better fit.
[A more modern estimate of the number of passes can be found in Ref 3, theorem 2.1.3.
(smartphones may display 'p' where Greek pi=3.14159 is intended in this section)
For pairs of integers in the set {1,2 ... N}
Number of passes = 12.ln(2).ln(N)/p2
When N = 2n, ln(N) = n.ln(2)
and Number of passes = 12.ln2(2).n/p2
which simplifies to Number of passes = 0.584.n
If N = 232, Number of passes = 0.584.32 = 18.69
##########