PREAMBLE.
The Internet has problems.
It is not fanciful to imagine that the benefit that it has brought worldwide might be undermined in reaction to it's malignancies such as hacking, fake identities, denial of service attacks, black propaganda, the transmission of malignant virus's and others.
This document is a proposal to restrict these malignancies.
In the event that it is not adopted, hopefully it might spur on other concerned people to produce better solutions.
What are the root causes of the malignancies?
(A) The Internet enables users to be disconnected from personal responsibility.
(B) Internet communications have too little privacy.
Solving the personal responsibility problem.
The solution to this problem is well known and easily stated :-
Anyone indulging in damaging behaviour over the Internet, has to be made certain that they could be identified, suffer ostracism by their peers and possibly have their access restrained or even removed.
In other words a forensic trail to culprits is needed.
Once detection becomes likely, much scallywaggery will stop.
[I deliberately reject any role for legislatures and enforcement officers; since the Internet is a world-wide construction, getting world-wide legislation in place would defeat Solomon.]
Techniques to do this were developed 30 yrs ago and require the setting up of an Internet-wide public-key system (PKS).
PKS systems enable the connection
of a unique public and private key pair
[both very large integers]
to each user and these
act as a digital surrogate for the user.
Each user's private key is
held by, and accessible to, each user alone.
The "public key"
[another very large integer - related to the private key,
but computationally hard to find from it]
is made available to anyone, on demand,
from public key-holding companies.
Once obtained it can be used to securely
communicate with the user holding the private key
and prevent eavesdropping.
No one (not holding the private key
paired to the public key)
can decrypt and examine the communication.
[Note that what is called a "Public Key Infrastructure" (P.K.I.) (public key-holding and servicing companies) has to be waiting in the backgound.
The P.K.I. requires capital to set up and a regular income stream to maintain it -
i.e. has to be paid for.
A business plan to support these companies is needed, but I do not discuss this here.
Note that public keys, once obtained by a user, might be saved into "contact lists", just as with mobile phone numbers. Hence the rate of users contacting the P.K.I. companies will reduce after an early surge.]
In general, public-key ciphers are slow and are used to set up session keys for a second, much faster, cipher system that does the bulk of the communication encryption and decryption work.
The Stream Cipher, below, is my candidate for this secondary task.
[With an Internet-wide cipher, the approach above may not be satisfactory for short messages e.g. text messaging. For such items the public key cipher, alone, might be quicker.
Better yet, some ciphers (like Jefferson wheels) are capable of being extended and configured in a very large number of ways. It would be possible to have sender and receiver exchange, say, a 128 bit random number and use this to build a unique (virtual) Jefferson wheel and use this for text messaging.
[With schemes like this, the 'key' is partly fixed in the configuration and this part is subject to attrition. So long as each users configuration was time limited and anyway resettable, moderate security with high speed would result.]
There are many alternatives, but I cannot discuss them here.]
With a little more effort, each communication
can optionally be "signed" by the sender
with this digital signature
very close to non-counterfeitable
- such a digital signature has far more resistance
to counterfeiting than a written signature.
Signing, in this way,
enables the originator of any communication
to be uniquely identified and
only the intended recipient can decrypt it.
This mode of operation is ideal
for minor contract negotiations.
If the communication causes damage (fraud or a malignant attachment perhaps) a court of law can penalize the originator unless some weakness of the signing procedure can be demonstrated to the judge. If a weakness is successfully proven in court, then the signing procedure would need to be made stronger - a move that can be anticipated, pre-planned with the plan ready to go at short notice.
All this is well known, and signed communications would largely solve the personal responsibility problem.
Major organisations facing reputational or financial losses will want to refuse access to non-signed communications. I see major organisations as the biggest drivers of this system - private individuals will not be keen to pay the P.K.I. companies but if they want to continue to access major organisations they would have to.
No Internet-wide scheme can be entirely bullet proof, including mine.
Nevertheless, I hope to demonstrate that my proposal can be made robust and self repairing.
Improving communication privacy.
From a recent radio program I learnt a little of the early days of the Postal Service in the U.K..
Back in Cromwell's day, mail was opened regularly by state operatives. Everyone knew this was happening, consequently the contents of mail communication was stulted.
If business or political ideas cannot be discussed, revised, discussed ... in private, decision making will be degraded.
This will put a brake on the development of the host society.
Well, Cromwell is back in town!
As recent revelations have made clear, security agencies all over the world are snooping on Internet traffic, eagerly assisted by many hardware and software companies, reckless of reputational damage.
As in Cromwell's day, this will be acting as a drag on societal development.
Of course, the agencies have a job to do but they appear to have little self control and the government departments responsible for reining them in are out to lunch.
What the Internet needs is a cipher system of moderate strength so that breaking in would need significant effort but not defeat the agencies. If it could defeat the agencies they would take steps to prevent it's installation.
Most Internet traffic is of low to moderate value, with any value of short duration, and when the cost of breaking in exceeds the value of the contents, few will trouble to break in.
One group of people that will try to break in are the mathematically talented and software literate types in search of fame. For a moderate strength cipher, such people can be expected to succeed - then shout their success from the rooftops! Who among them could resist the glory of becoming known as
"The Man who broke the Internet cipher".
Looking beyond the obvious media frenzy, such events would provide valuable feedback to the effect that the cipher strength needed to be stepped up. This, in turn, means that any Internet-wide cipher must have multiple fallback positions, prepared in advance, to avoid systemic collapse.
A cipher with the characteristic of 'keeping a step ahead of the hounds' might remain efficacious for decades.
Given that an adaptable defensive strategy can thwart evolving attack attempts, the next biggest risk facing such a cipher is that it might be incompetent from the off.
If a strategic flaw were discovered after the date of introduction, it is possible that the fallbacks might also be flawed - followed by the collapse of the whole system.
The best mechanisms for defeating this possibility are :-
Lesser Issues.
##############################
This document extends document
multiplicative random number generators's.
[Referred to as 'tmrng' below.]
The PRNG's (pseudo random number generators)
introduced there, are a component of the
Stream Cipher discussed below.
You will need to have read this document.
------------
[Some textbooks are referred to below. The link to my site's page of references is :-
refs.html - includes ISBN numbers should you wish to purchase them.
The references of particular relevance to this document are :-
Ref [16] "Handbook of Applied Cryptography"
by A.J. Menezes, P.C. van Oorschot, S.A. Vanstone.
(Below I refer to this as 'Menezes')
Ref [11] "The Art of Computer Programming, Vol 2,
Seminumerical Algorithms" by Knuth.
(Below I refer to this as 'Knuth')
Another book referred to is :-
"Applied Cryptography" by Bruce Schneier
In a 'specialist references' document.
(Below I refer to this as 'Schneier')]
[The term 'word' below means 64-bit items
unless explicitly stated otherwise.]
------------
[In this document, I call the raw material being encrypted "plaintext" whatever the format. After encryption, plaintext is turned into "ciphertext". In this document this will be in binary format initially. This may need mapping into hexadecimal text or base64 text symbols for transmission down some channels.]
Below, I am going to present a Stream Cipher to act as the secondary cipher as discussed above. This cipher is intended to have a very high encryption and decryption speed with moderate security. To make such a cipher suitable for Internet-wide use, the need to prevent systemic collapse dominates.
What are the basic alternatives?
Today, cipher systems use either 'block ciphers' or 'stream ciphers'.
Block ciphers encrypt the data in 32 (or 64 or 128 ...) bit blocks and have many good security features. Each block encryption uses multiple steps in multiple stages, and hence they are not fast.
This makes them unattractive for an Internet-wide cipher.
Stream ciphers typically generate
a supply of random bits and
XOR them with the plaintext
to derive the ciphertext.
[XORing can be thought of as carryless bit-addition.]
[Conventionally, stream ciphers operate bit-by-bit or even byte-by-byte.
On todays computers, operating 64-bit word by 64-bit word gives a speed gain of more than 8, over byte-by-byte. This is too beneficial to refuse.
Some files will not be a multiple of '8' bytes in length; but such problems are easily solved in software.
Another problem is that some database records may have data fields that may not be a multiple of 8 bytes in size, and which may need to be accessed without decrypting the whole record.
[e.g. My bank branch code is a 6 digit decimal number.]
These, and similar problems, are easily handled in software, only a protocol being needed.]
XOR'ing is an invertible operation in the sense that XOR'ing the resulting ciphertext with the same sequence of random bits, will recover the original plaintext.
This is the decryption method.
Many Stream ciphers are presented and discussed in depth in 'Schneier' chapters 16 and 17. Most of the designs involve multiple bit-feedback of large bit-holding arrays which are shifted 1 bit per cycle. These shift-type Stream ciphers can be made very fast in hardware and they are a mainstay of military communication equipment.
However, the need for a circuit board to hold a hardware Stream cipher, is impractical here.
Furthermore, a circuit board cannot easily be made adaptable to different shift-type Stream cipher designs; i.e. incapable of assisting with the 'multiple fallback' requirement.
In software their speed performance is poor.
The x86 opcode set is not conducive to shift-type Stream ciphers of this type. The work can be done, but each iteration might take several 10's of clock ticks per random bit generated.
[Here down, I start to use some nomenclature and information from 'tmrng'.]
Might congruential generators (e.g. the PRNG's discussed in 'tmrng') be used?
From 'tmrng' it will be clear that each iteration of the PRNG's
takes a time proportional to 3 multiplications (when operated in the Montgomery domain) and since a single MUL opcode takes 5 ticks on my AMD A8 and 3 ticks on my Intel i3 (for 64-bit word sized operands), it is clear that the 64-bit PRNG (MG_64) generates several random bits per clock tick.
[Contrast this with the crude speed estimate for shift-type Stream ciphers (and the implied speed of block ciphers) whose speed is 'many ticks per bit'!]
[This PRNG speed estimate has to be tempered by the fact that several other opcodes are needed to perform secondary tasks - and these add to the clock ticks.
Furthermore, for the multi-word PRNG's, the #calls to MUL goes up as the square of the PRNG #words - also reducing the speed of production.]
Clearly, the PRNG's in 'tmrng' are dazzlingly fast
and would meet any speed requirements.
[Below, I demonstrate that using a PRNG
with simple XORing, as above, is insecure.
To counter this, some post-processing
of PRNG output is needed.
This will reduce speed a little but
the high PRNG speed makes this
of little consequence.]
Secondly, as discussed below, a new 'G' value can be found in a fraction of a second; this means that a unique PRNG can be set up for each communication. This goes a long way to meeting the 'Rebuild the cipher...' and the 'prevent systemic collapse...' requirements.
[If a new 'G' value is found, it has to be exchanged with the receiver as a part of the session key.
The agencies would (I guess) be able to break into a 128 bit Elliptic Curve PKS, to recover this 'G' value.]
[Finding a new Safe prime is more difficult.
It might be done, but it would not meet online speed requirments, and is best done offline.]
Achieving Security from PRNG based Stream Ciphers
Cryptographic security, from Stream Ciphers using PRNG's, is not straightforward :-
'Schneier' (pg 371) says
"The ... evidence is that congruential generators
[PRNG's] aren't useful for cryptography."!
These comments reflect mainstream cryptographic opinion!
Below, I hope to show that a decent Stream Cipher
using congruential generators can be built.
Nevertheless, caveat emptor applies.
[I need the following shortly :-
There is a theorem in mathematics called The XOR Theorem.
This states that, after XORing 2 bitstreams together
the entropy of the resulting bitstream
is greater than or equal to the greater of
the entropy of either input bitstreams.
[Entropy is a measure of randomness (or equivocation) useful in cryptanalysis.
[If an event is certain (either it certainly has occurred or it certainly has not occurred, the event (posterior) probabilities would be 1.0 and 0.0 respectively.
This is not convenient in cryptanalysis - when an event is certain, its equivocation is zero.
Entropy (a parameter derived from probabilites - it might be viewed as a weighted probability) counters this difficulty. The entropy of either event is 0.0.]
Below I use the word 'equivocation' in place of 'entropy'.]
It is the task of cryptanalysts to reduce the equivocation of a ciphertext to zero;
and the task of cipher developers to prevent them.
The XOR Theorem immediately throws attention onto the PRNG output
- to make the ciphertext close to maximal equivocation
the PRNG output has to be close to maximal equivocation
since the plaintext will be of moderate or low equivocation
(zero if known, low even when it is unknown).
[Document 'tmrng' displays results (towards the bottom) showing that the PRNG's discussed there pass Maurer's test (see 'Menezes' pg 183) - a severe test of the 'randomness' of the PRNG's output. This means that the result of XORing the output from these PRNG's with any plaintext will produce a near maximal equivocation ciphertext (from the XOR Theorem).]
An attacker in possession of only a single near maximal equivocation ciphertext is in a weak position - statistical-type analysis (using Entropy in preference to probability) is his only method of attack, and this will almost certainly fail.
[This scenario is called a "Ciphertext only attack".]
This is excellent news! But unfortunately other methods of attack exist.
The big problem is the "Known Plaintext attack".
Ciphers that cannot resist this attack are useless.
In this attack the analyst knows all or part of the plaintext in advance - possibly he wrote the message himself and inveigled it into the sender's network; possibly because his studies of the network have revealed to him proforma sections of your messages.
[Proforma sections are vigorously suppressed by skilled network security operatives - but on the Internet this form of editing is, in general, not going to occur.
This means that, on the Internet, the first line of defence against the Known Plaintext attack will, in general, be absent.]
[I use the term 'Streambits', here down, to indicate the random bits used to encrypt the plaintext.]
[The following relations are needed shortly :-
{Plaintext} XOR {Streambits} => {Ciphertext} -- P1
{Ciphertext} XOR {Streambits} => {Plaintext} -- P2
{Ciphertext} XOR {Plaintext} => {Streambits} -- P3
Possibilities P1 and P2 are just the encryption and decryption stages.
Possibility P3 is the step used by an attacker in a Known Plaintext attack.
When using P3, he has intercepted a Ciphertext and he knows some (or all) of the underlying plaintext so P3 enables him to get some of the Streambits.]
On first encounter, a natural reaction to P3 is "So what?"
The Streambits found are far from irrelevant :-
If he can recover enough Streambits to find one PRNG output word {Xn} and happens to know pS plus the PRNG 'G' value then all subsequent and previous PRNG outputs can be found.
Subsequent values come from the defining formula [B] (see 'tmrng').
Previous values can be found by rearranging the defining formula [B] as :-
Xn = G-1.Xn+1 (mod pS) --- Eqn. Z
Eqn. Z and formula [B] enable the whole plaintext to be recovered!
If he can recover 2 successive PRNG outputs {Xn+1, Xn}and happens to know just pS alone, then the PRNG 'G' can be found by rearranging the defining formula [B] :-
G = Xn+1.Xn-1 (mod pS) --- Eqn. X
Then Eqn. Z and defining formula [B]
enable recovery of the whole of the PRNG output
and the whole of the plaintext.
['Schneier', pg 6, lists 3 variants of known-plaintext attack, but that sophistry is of minor relevance here.]
[You might think that "... know pS ..." is unlikely.
A book called "La cryptographie militaire" written in 1883 by a luminary of cryptography called Kerckhoffs is relevant here. He updated cryptography for the telegraph age - the most egregious problem of that time was the explosive growth in message and ciphertext volumes, brought about by the introduction of the telegraph, and the ease of interception.
Today's transmission methods have moved a long way from those of Kerckhoffs day, but his principles still apply.
He published a set of 6 fundamental rules that a military cipher system should follow :-
Rule #2 divides cipher systems into 2 components - 'system' and 'key'. The 'system' will be in place for a long time and be known about by many participants. The combination of long time and many individuals with knowledge, means that secrecy will be unlikely - and the only safe assumption will be that it does not exist.
The 'key' must be short-lived (ideally single use) and known to very few. In these circumstances, secrecy of the 'key' is a reasonable assumption. This immediately makes the 'key' the sole repository of security for any cipher system.
Applied to this document, the fact that a PRNG is being used would be firmly in the 'system' category.
The list of items possibly in the 'key' category for a PRNG is {X0, G, and pS} (in formula [B]).
However, it is impractical to vary pS rapidly. Hence I consider pS should generally be classified as a 'system' item.
This leaves {X0 and G} as the 'key'.
There may be circumstances where 'G' is fixed for significant periods - i.e. should then be considered a 'system' component.
However, my proposal below is that 'G' should change for every communication.]
Operational method I has zero key material; hence zero security!
Operational method II has a little security, but is a sitting duck for a Known Plaintext attack.
Operational method III appears to increase security significantly - but this is not so. Following a Known Plaintext attack, Eqn X above can be used to find 'G', followed by Eqn Z and formula [B] to finish the break.
Operational method IV is the most secure method, but is much less practical. Finding a Safe prime 'on the fly' is difficult - the 2048 bit Safe prime presented in 'tmrng' took several minutes to find.
Given the comments above, operational methods I, II, and III are seen to have little crytographic merit in the face of a Known Plaintext attack. It is only a slight exageration to say that they have none!
Operational method IV is the best - but not practical for an Internet-wide cipher.
From considerations of the Known Plaintext attack above, 2 problems are clear :-
Thinking about these problems, the solution is obvious - the PRNG output needs to be hidden.
This idea immediately leads to several candidate solutions.
These items might be combined to derive yet more options.
[My first attempt used 'A' and 'D' in combination :-
In this design the least significant word of an MG_128 provided the source of random bits
and these words were then shuffled using algorithm M.
However, algorithm M has little cryptographic merit unless the buffer it uses is large; but if this buffer is large the speed penalty is considerable.
[All random access buffer operations, as in algorithm M, are cache unfriendly. If the caches are over-stressed they will force the processor to stop, until further demands can be accommodated.]
I worked up this solution fully, but the unimpressive speed led me to abandon it.]
[If buffers are written to (or read) sequentially this problem is much reduced.
(see document Speed Testing for explanations of this.)]
My second attempt used 'A' and 'C' in combination.
In this design the least significant 64 bit word
of an MG_128 is the main source of random bits.
[i.e. half of each Xn per iteration is used.
Below, I use the label 'main PRNG'
to indicate this MG_128.]
Each sender finds new X0 and 'G' values
for the main PRNG for each communication.
[This contributes 256 bits to the key material.]
The word from the MG_128 is then XOR'd with
the output from an MG_64
to get the next Streambits word.
[I shall call this MG_64 'the masking PRNG' below,
since its role is to 'mask'
the partial output of the MG_128.]
[I propose that the pS and 'G' values
of the masking PRNG be fixed, network wide,
with only a random initial value X0
(set by the sender)
changing with each communication.
[This contributes 64 more bits to the key material.]
[I refer to this design as 'the proposal' below.]
[My MG_128 code fits in the x64 processor registers, in the main, and hence is very fast.
The MG_256 (and larger) designs overflow the available register space and so I have subroutine-ized them. Other than the 'square of the size' ratio, mentioned above, a little subroutine calling overhead and register pushing and popping will chip away at their speed.]
I propose that the MG_256 ... MG_2048 main PRNGs be held in reserve to counter future security problems.
I also propose that only the least significant word of each of these PRNGs is used - this means that the fraction of output used decreases as the main PRNG size increases.
My guess is that the design outlined is resistant to Known Plaintext attacks.
If anyone knows differently, let me know!
Another means of attack :-
With a very large network producing large numbers of ciphertexts, an analyst may be able to exploit similarities between ciphertexts.
If he is lucky, (and the network members are either dim or over confident or desperate) he may get a pair of ciphertexts with either the same (or similar) plaintexts or the same (or similar) key material.
[On the Internet most users can be expected to be cryptographically dim, even if otherwise intelligent.]
[Attacks of this general type were discussed by Kerckhoffs. His observations are even more relevant to an Internet-wide cipher where traffic volumes are immense.]
An example :- suppose 2 ciphertexts are generated from 2 plaintexts with identical 'key' material (hence identical Streambits) and transmitted.
If an attacker learns what has been done and XOR's these ciphertexts together, the Streambits used to generate them will be stripped out and he will end up with the 2 plaintexts XOR'd together.
This result may not appear immediately useful, but the attacker has made a large step forwards - it is likely that the equivocation has been reduced.
[The equivocation of plaintext being low, by the XOR Theorem the equivocation of the intermediate result will probably be not much larger than that of either plaintext.]
The result is a sitting duck for old-style statistical analysis and this will surely succeed and recover both plaintexts!
The counter to this attack is :- Never use the same stream cipher 'key' material twice!
Given the low skill level of most Internet users, this has to be enforced by the software.
[A extension of this attack was used against Russian agent ciphertext transmitted from the U.K. in WW II.
This ciphertext was formed using 'one-time pads' - the only uncrackable cipher system.
When German armies approached Moscow, the unit responsible for one-time pad production, based in Moscow, was moved eastwards for safety, and this disrupted pad production.
The organisation responsible for pad production was forced to issue identical sheets to different agents as a 'solution' to the pad shortage problem. After an enormous amount of work on the lines discussed above, a few plaintexts were obtained - one consequence of which was the unmasking of Maclean (one of the Cambridge spies) and hundreds of Soviet agents in the American establishment.
[One-time pads operate by carryless addition of base10 arithmetic - i.e. not dissimilar to XORing.]
A break in similar circumstances was made into German diplomatic traffic coming from the German embassy in the Irish Republic in WW II. Pad deliveries were restricted by the Royal Navy and security personnel around the embassy.]
[In probability theory the word 'ensemble' is used to indicate 'all possible outcomes'. 'Ensemble average' then means averaging over all possible outcomes (a possibly infinite number!).]
In cryptography, the word ensemble typically applies to all the ciphertext from all of the members of some organisation - i.e. some subset of total Internet traffic.
As the size of the ensemble of ciphertexts grows, statistics derived from it grow more accurate and faint patterns become more apparent.
With the Internet, the number of samples in some ensembles can be huge.
In these circumstances the chances of no information being teased out of communications encrypted by an Internet-wide cipher is small.
The considerations above lead to the conclusion
that security, using any Internet-wide cipher,
is never going to be impenetrable.
Hence, a more practical objective
is to restrict the number of penetrations.
The way to achieve this is to
make the cipher "a moving target".
In my proposal, there are 320 bits of key material
- all of these help with this problem.
[Provided the key bits are of high equivocation!
- the software needs to impose this.]
However, changing 'G' for each communication
is the principal mechanism for defeating
ensemble attacks.
Changing 'G' sets up a significantly different
main PRNG for each use.
Once again, let me know if you disagree
with any of this!
[All experimental results given below were obtained under 'hot cache' conditions.
(Use the Speed testing link, above, for an explanation of this term.)
The reason for this is that 'hot cache' timings are the only experimentally repeatable results.
On the down side, 'hot cache' times are minima, and when a processor is busy multi-tasking this will 'cool' the caches down and increase execution times. This means that when the cipher system is embedded within other tasks (a certainty with my proposal), it will run more slowly.
[By 'more slowly' I mean factors up to 2-3 are possible.
'Software Optimization' techniques can be used to reduce this problem (see e.g. Ref[7]).]
Hence values quoted below have to be used with caution.]
The most obvious speedup for my proposal comes from :-
Inspecting the defining formula for the type [B] PRNG, it should be clear that the bulk of the execution time will be from the 'mod p' step.
On an x86 processor and a 64-bit PRNG, the obvious way to perform this step is to use opcode DIV. This opcode takes over 70 clock ticks to execute on recent 64 bit processors (compared to 3-5 for a MUL).
Can a faster method be found to perform this step?
Yes - the Montgomery 'mod' technique can achieve the same result for the price of 3 MUL's + a few more fast ops.
[The logic above is correct so long as the PRNG is small. Larger PRNG's (multi-64 bit words in size) see the advantage of using Montgomery modding fade away. A 64-bit PRNG is about 3 times faster when using this technique, but a 2048-bit PRNG is only about 30% faster.
For interest, PRNG's a little larger than 2048 bits operate faster in the integer domain.
When the word size is very large (> 160 words) operating in the Montgomery domain comes back into contention for speed. The reason for this lies with the speed gains available from the Toom Cook method when multi-word multiplications are being done.
These observations are also relevant when Finite Field arithmetic is being done.]
The problem with Montgomery modding is that operations take place in another domain - I shall call it the Montgomery domain. If calculations are being done in the domain of the integers, it is necessary to map the integers into the Montgomery domain, perform Montgomery modding, then map the result back to the integer domain. These 'mappings' more than cancel the speed gain that a single Montgomery modding provides.
['Mapping in' requires the use of a slow DIV for example.]
[A Montgomery Primer, (for word sized arguments) can be reached by clicking this link.]
[This problem means that Montgomery modding is commonly only used for evaluating exponentials (mod an odd composite integer or a prime). For this type of work, mappings are needed only at the start and end of the heavy exponential work load - i.e. the mapping overhead is small in proportion to the time needed to calculate the exponential.
This point is relevant to speeding up finding generator 'G' ('Menezes' algo 4.80), where an expontential (mod prime) has to be found.
An indication of the performance of algo 4.80 (for an unrelated problem with smaller arguments using the Montgomery method) can be found in A Performance Summary ...
For the problem here, times are given below.]
Can a method be found to avoid the mappings?
Yes - map the PRNG parameters into the Montgomery domain
(a one-off overhead at setup time)
then work the PRNG in the Montgomery domain thereafter,
using the Montgomery domain results directly
- no mapping out - as the output of the PRNG.
This suggestion raises an obvious problem :-
"What is the quality of the output of such a PRNG?"
If quality were degraded, then this speedup technique would have to be abandoned.
In summary, the quality is high. Some evidence for this is in 'tmrng' and some more is given below.
Another speed up technique is to use 'lines' as the unit of Stream cipher output.
[x86 processors, move data (and code) in lines at the hardware level.
[One of my old machines used 32 byte lines and another 128 byte lines.]
Latterly, 64 byte lines have become the standard.
The line size might be found 'on the fly' using opcode CPUID.]
Writing (or reading) data in units of line-aligned lines minimizes the stress on the cache to RAM hardware.
Line-aligning means the last 6 bits of the block start address are zero (for a 64 byte line size). All of RAM is organised into lines. Each line move to/from RAM uses the front side bus 8 times - 1 for each byte bit - since the data bus is 64 bits wide.
It is difficult to find an unbiased experiment to determine the benefit from using line-aligned line-sized movements but from my tests the speed gain appears to be a little under '2'.
[The importance of not over-stressing the hardware comes from the fact that the caches have the authority to force the processor to STOP if they become overloaded.
I was nervous about the high speed of my proposed design triggering a reaction from the caches. So I did tests filling a range of buffer sizes to try to trigger it. The problem did not occur.
My tests were done on a PC several years old and of moderate power. More modern PC's should be clear of this problem.
It is possible that smartphones and other less powerful devices could be troubled by this problem, but I have no means of testing this].
With all the assumptions discussed above
I obtained an output speed of
110 Mbyte/sec from my proposal.
In a recent newspaper article (The Times 12/12/18) the best UK broadband average transmission speed of 265.9 Mbits/sec was quoted (for an address in Bournville (extracted from a survey by uSwitch)) - say 33 Mbyte/sec.
[At this speed a 2 hour film can be downloaded in under 4 mins.]
Hence, my proposal runs 3.3 times faster than movies can currently be transmitted in the UK. This speed performance means that it meets the need to be 'unnoticed' by the end user.
The transmission of movies is probably a 'worst case scenario' for my proposal and it comes through well.
Clearly, other high speed tasks, e.g. live video feeds, can be handled with ease.
What of the time needed to find a generator? This must be small if the generator is to be included in the 'key material'. In tests I found that a 128 bit generator could be found in 0.011 secs (using the Montgomery method), and a 2048 bit generator could be found in 0.105 secs. The time to find the larger generator is about the limit for a satisfactory end-user experience.
[These results were obtained on a 3.2 GHz PC.]
Clearly, these times are not a hindrance to finding a generator for each communication.
[Several other tasks have to be done at setup time and a user would only be sensitive to total setup time; hence the time to find a 'G' has to be a fraction of the, say, 1.0-2.0 secs ideal.]
From 'tmrng', it will be clear that all the PRNG's discussed there pass Maurer's test. Since my proposal combines these PRNGs by XORing, then by the XOR Theorem, the output quality of my proposal should not be lower.
Nevertheless, tests are needed to prove this.
In the Maurer test, the resulting statistics should have a defined mean, and a defined variance. By scaling these results, a value can be derived that should have a zero mean and unit variance Gaussian probability. This derived value is a little easier to work with.
Below, I have run the test 'NTRIALS' times and then used the Kolmogorov Smirnov (KS) test to check for the quality of fit to a Gaussian curve for these nominally Gaussian values.
The KS test calculates 2 statistics - K+ and K-.
'Knuth' (see the bottom of Table 2, pg 51) provides an approximate formula (with an error of the order of 0.02 for the conditions below) to calculate the cumulative probabilities of experimental K+ and K- values.
[If NTRIALS was large, ('Knuth' suggests 1000) the K+ and K- statistics should have a Rayleigh distribution.]
TESTING Proposal (using an MG_128) for randomness :- NTRIALS=48 [nLbits is the window bit-width in Maurer's test] [All probabilities are cumulative probabilites] nLbits=6 [mean, variance found {5.21771, 0.00381988} Gaussian mean = 0.480 (Ideal= 0.5)] K_plus = 0.416954 (Prob = 0.322) K_minus = 0.436333 (Prob = 0.346) nLbits=7 [mean, variance found {6.19625, 0.00287986} (Gaussian mean = 0.427)] K_plus = 0.860929 (Prob = 0.791) K_minus = 0.308482 (Prob = 0.198) nLbits=8 [mean, variance found {7.18367, 0.00213069} (Gaussian mean = 0.560)] K_plus = 0.733356 (Prob = 0.683) K_minus = 0.856111 (Prob = 0.788) nLbits=9 [mean, variance found {8.17642, 0.00155753} (Gaussian mean = 0.586)] K_plus = 0.49270 (Prob = 0.414) K_minus = 1.30692 (Prob = 0.971) nLbits=10 [mean, variance found {9.17232, 0.00112931} (Gaussian mean = 0.496)] K_plus = 0.665242 (Prob = 0.613) K_minus = 0.352907 (Prob = 0.247) nLbits=11 [mean, variance found {10.17, 0.000814458} (Gaussian mean = 0.410)] K_plus = 0.662721 (Prob = 0.611) K_minus = 0.422171 (Prob = 0.328) nLbits=12 [mean, variance found {11.1688, 0.000585193} (Gaussian mean = 0.442)] K_plus = 0.755709 (Prob = 0.704) K_minus = 0.452946 (Prob = 0.366) nLbits=13 [mean, variance found {12.1681, 0.000419274} (Gaussian mean = 0.430)] K_plus = 0.62883 (Prob = 0.574) K_minus = 0.54918 (Prob = 0.482) nLbits=14 [mean, variance found {13.1677, 0.000299863} (Gaussian mean = 0.529)] K_plus = 0.419063 (Prob = 0.325) K_minus = 0.531976 (Prob = 0.461) nLbits=15 [mean, variance found {14.1675, 0.00021413} (Gaussian mean = 0.686)] K_plus = 0.20424 (Prob = 0.099) K_minus = 1.04244 (Prob = 0.897) nLbits=16 [mean, variance found {15.1674, 0.000152744} (Gaussian mean = 0.461)] K_plus = 0.567686 (Prob = 0.504) K_minus = 0.517116 (Prob = 0.443)
These results are satisfactory.
Internet protocols need to be improved urgently if the waning confidence of the public is to be reversed.
My proposal can achieve this.
The principal problems of the need for very high encryption and decryption speed, resistance to ciphertext only, known plaintext and ensemble attacks have been largely overcome by my proposal.
The adaptability and extendability of my proposal are also beneficial features, and necessary for any Internet-wide cipher system.
More controversial is the need to accommodate the intelligence agencies.
I provide this by suggesting an Elliptic Curve (EC) PKS of moderate strength. My guess is that this PKS will be easier to penetrate than the Stream Cipher; but few, outside the agencies, would be able to do this.
[There is room for debate on this topic - I have no hard evidence about it. If my guess is wrong there are many adjustments available to rectify matters.]
[If it became clear that the Stream cipher had been penetrated, it should be clear several stronger versions are available and ready to go at short notice.]
Since several intelligence agencies around the world are also malicious actors, it could be argued that the PKS should be strengthened to defeat them. However, this would guarantee nothing being put in place at all.
The most obvious attack mechanism for agencies is identity stealing. Once a public key has been obtained (easy just request it!) then, if the EC can be broken, the private key can be found - one fully captured electronic identity.
Any user spotting unexplained activity would need to reset their public key parameters urgently and alert the network security supervisors of their problems.
If an EC break were verified, the next waiting elliptic curve would have to be installed urgently.
If the frequency of such breaks became too high, a move to a larger Field prime with a new set of EC's would be necessary.
On the plus side, such events would provide good feedback on system security.