Spiral (20 page)

Authors: Koji Suzuki

So he tucked the notebook under his arm and headed up to the third-floor reading room, where he took a seat by the window.

As a student playing at cipher-cracking with Ryuji, he'd had quite a collection of books on cryptography at home. But what with getting married and then getting divorced, he'd moved three times since then, not to mention the fact that he'd lost interest in the subject; all those books had disappeared somewhere along the line. There were certain types of codes that he couldn't hope to decipher without the help of character substitution charts and letter-frequency graphs of the kind found in specialist works, and he doubted he'd be able to get anywhere on this one without their help. And since it just seemed foolish to buy them all over again, he'd ended up at the library.

At one point he'd had a good grasp of the basics of constructing and unscrambling codes, but it had been ten years, so he first took a quick glance through a primer on the subject. He decided that his first step should be to determine just what class of code was contained in the smallpox-like virus's base sequence.

Codes can be generally divided into three types: substitution ciphers, in which the letters of the message are replaced by other letters, symbols, or numbers; transposition ciphers, in which the order of the words of the message is changed; and insertion ciphers, in which extraneous words are inserted between the words of the message. The numbers that popped out of Ryuji's belly after the autopsy, which Ando was able to link to the English word "ring", were a good example of a simple substitution cipher.

It didn't take him long to guess that the virus's code had to be of the substitution variety. What he had to work with was a group of four letters, ATGC, corresponding to the four bases, so it was most likely that the code consisted of assigning a particular character to a predetermined grouping of letters. That was most code-like.

Code-like. When the thought occurred to him, it made him sit up and think. The essential purpose of a code is to convey information from one party to another without any third party being able to figure it out. When they were students, codes had been nothing but a game to them, brain-teasers. But in, say, times of war, when a code's susceptibility to deciphering could sway the tide of a conflict, a "code-like" code would mean one which was, in effect, too dangerous to use. In other words, one way to keep the enemy from breaking your codes was to make sure they didn't look like codes at first glance. If you caught an enemy spy and found he was carrying a notebook filled with suspicious-looking strings of numbers, it would be a safe bet that it was top-secret information, encrypted.

Even allowing for the possibility of decoys, when a code is identified as such, the chances of it being broken rise significantly.

Ando tried to think logically. If the purpose of a code is to keep information from the hands of a third party, then a code should only seem "code-like" to the person for whom the information is intended. Staring at the forty-two letters interpolated into the base sequence of the virus, Ando found them extremely code-like. That had been his impression from the very first time he'd looked at the chart.

Now why would that be?

He tried to analyze the source of that impression. Why did it seem code-like to him? It wasn't as if there had never been puzzling repetitions found in the course of DNA sequencing. But in spite of that, this particular repetition seemed meaningful. It popped up everywhere they looked in the sequence, no matter where they sliced it. It was as if it was trying to call attention to itself, saying, I'm a code, dummy. The sequence of letters seemed particularly code-like to Ando in light of his experience with the numbers that had popped out of Ryuji's belly. In other words, maybe there had been two purposes to the word "ring" squeezing its way out just then: not only was it meant to alert Ando to the existence of the Ring report, but it was also a form of warning. It was as if Ryuji were telling him, I may use codes again as the situation warrants, so keep your eyes peeled and don't miss them. And maybe he'd used the simplest kind of substitution cipher as a hint, too.

Given that the mysterious string of bases had only been found in the virus drawn from Ryuji, it was safe to assume that he was the one sending the code. It was an undeniable fact, of course, that Ryuji had died and his body been reduced to ashes, but a sample of his tissue still remained in the lab. A countless number of instances of his DNA, the blueprint for the individual entity that was Ryuji, still remained in the cells in that tissue sample. What if that DNA had inherited Ryuji's will, and was trying to express something in words?

It was a nonsensical theory completely unworthy of an anatomist like Ando. But if he did succeed in making the string of letters yield words by means of substitution, then that would trump all other readings of the situation. Theoretically, it was possible to take DNA from Ryuji's blood sample and use it to make an individual exactly like Ryuji: a clone. This assemblage of DNA sharing the same will had exerted an influence over the virus that had entered its bloodstream, inserting a word or words. Ando could suddenly sense Ryuji's cunning and sheer genius behind this. Why had he inserted the message only into the virus, an invader, and not into his red blood cells? Because, with his medical background, Ryuji knew that there was no chance that DNA from the other cells would be sequenced. He'd known that he could only count on the virus responsible for the cluster of deaths being run through a sequencer, and so he'd concentrated his efforts on the virus's DNA. So that the words he sent would be received.

All of which finally led Ando to one conclusion. Since this code looked to him like a code, it was no longer functioning, in essence, as a code should. Rather, it was just that Ryuji's DNA had no other way to communicate with the outside. The DNA double helix was composed of four bases represented by the letters ATGC. Ando couldn't think of any other way for it to make its will known but by combining those four letters in various ways. It had chosen this way because there was no other available to it. It was the only means Ryuji had at his disposal.

Suddenly all the despair Ando had felt a few moments ago was gone, replaced by a buoying confidence.

Maybe I'll be able to decipher this after all.

He felt like shouting. If Ryuji's will, lingering in his DNA, was trying to speak to Ando, then it stood to reason that the words it used would be ones easy for Ando to decode. Why should they be more difficult than they needed to be? Ando went back and checked his line of reasoning to see if there were any holes in his deductions. If he started off on the wrong foot, he could wander around forever without finding the answer.

He no longer saw what he was doing as merely a way of killing time. Now that he felt that he would actually be able to decipher the message, he couldn't wait to find out what it said.

The rest of the morning, until lunchtime, Ando spent working on two approaches.

The sequence he had to work with was:

ATGGAAGAAGAATATCGTTATATTCCTCCTCCTCAACAACAA

The first question was how to divide the letters up. He tried dividing them up in twos and in threes. First, by twos:

AT GG AA GA AG AA TA TC GT TA TA TT CC TC CT CC TC AA CA AC AA

Taking a pair of letters as one unit, the four letters available yielded a possible sixteen different combinations. He wondered if each combination might represent one letter.

But this immediately led him to another problem: what language was this message written in?

It probably wasn't the Japanese syllabary. There were nearly fifty characters in that, far more than the sixteen allowed by the pair method. The English and French alphabets both had twenty-six letters, while Italian only used twenty. But he also knew he couldn't overlook the possibility that the message was in romanized Japanese. Identifying the language of a code is sometimes half the battle.

But this was a problem that had already been solved for Ando. The fact that he'd been able to replace the numerals 178136 with the word "ring" could probably be taken as a hint from Ryuji that the present code would also yield something in English. Ando was sure of this point. And so the question of language was as good as settled.

The forty-two base letters could be split into twenty-one pairs. But several pairs were identical: there were four AA's, three TA's, three TC's, and two CC's. There were only thirteen unique pairings. Ando jotted these numbers down on a piece of paper and then paged through a book on code-solving until he found a chart showing the frequency of appearance in English of different letters of the alphabet.
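
The tallies in this step can be checked mechanically. What follows is only a minimal sketch in Python, not anything from the text itself: the helper name `split_into_units` is a hypothetical one introduced for illustration, and the sequence is copied from the passage above.

```python
from collections import Counter

# The forty-two base letters quoted in the passage.
SEQUENCE = "ATGGAAGAAGAATATCGTTATATTCCTCCTCCTCAACAACAA"


def split_into_units(sequence, size):
    """Cut the sequence into consecutive, non-overlapping units of `size` letters.

    Hypothetical helper, named here only for illustration.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


pairs = split_into_units(SEQUENCE, 2)

# Count how often each pair occurs, then keep only those seen more than once.
pair_counts = Counter(pairs)
repeated = {unit: n for unit, n in pair_counts.items() if n > 1}

print(len(pairs))        # number of pairs
print(len(pair_counts))  # number of distinct pairs
print(repeated)          # pairs that occur more than once
```

Run against the sequence, this should reproduce the figures Ando jots down: twenty-one pairs, thirteen of them distinct, with AA, TA, TC, and CC as the only repeats.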

He knew that although the English alphabet contains twenty-six letters, not all of them occur in equal numbers in everyday use. E, T, and A, for example, are common, while Q and Z might appear only once or twice per page. Most handbooks on code-breaking will include various kinds of letter frequency charts in the back, among other statistical references. Using such tables and statistics made it easier to determine the language a coded message was in.

In this case, what the figures told him was that in an English phrase of twenty-one letters, the average number of different letters used was twelve. Ando clicked his heels. What he had was thirteen different letters, not far off the average at all. This told him that, statistically speaking, there was nothing wrong with him dividing the sequence into twenty-one pairs and assuming that each pair stood for a letter.

Putting that possibility on hold for a moment, Ando next tried dividing up the sequence into sets of three:

ATG GAA GAA GAA TAT CGT TAT ATT CCT CCT CCT CAA CAA CAA

This produced fourteen trios, or seven unique varieties: ATG, GAA, TAT, CGT, ATT, CCT, and CAA. The charts told him that an English phrase of fourteen letters contained an average of nine different letters. Not far off from the seven he had.

Ando immediately noticed that there was a lot of overlap produced by this system. GAA, CCT, and CAA each occurred three times, and TAT appeared twice. But what really bothered Ando was the fact that GAA, CCT, and CAA each appeared three times in a row. If he assigned each triplet a single letter of the alphabet, there were three separate cases in this short passage of the same letter being repeated three times. He knew enough English to know that double letters were not at all uncommon. But he couldn't think of any English words with triple letters. The only possibility he could think of was situations in which one word ended with a double letter and the next word began with the same letter, e.g., "too old" or "will link".

He picked up an English book he happened to spy nearby and started examining a page at random to see just how often the same letter occurred three times in succession. He'd gone through four or five pages before he found a single instance. The chances of it happening three times in one fourteen-letter sequence were basically nil, he concluded. By contrast, dividing up the forty-two letters into pairs produced just one double letter. As a result, he decided that statistically it made more sense to go with the first option and divide the bases into pairs of letters.
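
The comparison between the two groupings comes down to counting runs of identical adjacent units. The sketch below is one way to do that in Python, offered as an illustration under stated assumptions rather than as the method described in the text: the helper names `split_into_units`, `runs`, and `repeated_runs` are hypothetical.

```python
from itertools import groupby

# The forty-two base letters quoted in the passage.
SEQUENCE = "ATGGAAGAAGAATATCGTTATATTCCTCCTCCTCAACAACAA"


def split_into_units(sequence, size):
    """Cut the sequence into consecutive, non-overlapping units of `size` letters."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def runs(units):
    """Return (unit, run_length) for each stretch of identical adjacent units."""
    return [(unit, len(list(group))) for unit, group in groupby(units)]


def repeated_runs(units, min_length):
    """Keep only the runs that are at least `min_length` units long."""
    return [(unit, n) for unit, n in runs(units) if n >= min_length]


triplets = split_into_units(SEQUENCE, 3)
pairs = split_into_units(SEQUENCE, 2)

# Triple repeats under the three-letter grouping.
triple_runs = repeated_runs(triplets, 3)
# Doubled units under the two-letter grouping.
double_runs = repeated_runs(pairs, 2)

print(len(triplets), len(set(triplets)))
print(triple_runs)
print(double_runs)
```

Grouping by threes gives fourteen units of seven kinds, with GAA, CCT, and CAA each appearing three times in a row; grouping by twos gives a single doubled unit, TA, and no triple repeats.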

He'd narrowed down the possibilities. From here he could proceed through trial and error.

The AA pair appeared four times, which meant it must correspond to a letter used with great frequency. Consulting another chart, Ando confirmed that the most frequently used letter in English is, of course, E. So he hypothesized that AA stood for the letter E. The second most common pairs in his sequence were TA and TC, occurring three times each. He also noticed that AA was followed by TA once, while TC was followed by AA once. This might be important, since there were also statistics for various combinations of letters. He started trying out various possibilities for TA and TC, constantly referring to his charts.

As far as letters which often follow the letter E and which are also common in and of themselves, the letter A seemed like the best candidate, which meant that TA could stand for A. By the same logic, he thought that TC might correspond to the letter T. Further, by the way it combined with other letters, he guessed that CC might be N. Thus far the statistics seemed to be serving him well. At least, he hadn't run into any problems.
