What Mad Pursuit

Authors: Francis Crick

Eventually the code (see appendix B) was solved by experimental methods, not by theory. Major contributors were the groups of Marshall Nirenberg and of Gobind Khorana. The group of an earlier Nobel laureate, Severo Ochoa, also made important contributions. Even as the code was coming out, attempts were made to guess the whole from the part, but these were also largely unsuccessful. In some ways the code embodies the core of molecular biology, just as the periodic table of the elements embodies the core of chemistry, but there is a profound difference. The periodic table is probably true everywhere in the universe, and especially relevant in places that have about the same temperature and pressure as the Earth. If there is life on other worlds and even if that life also uses nucleic acids and proteins, which is far from certain, it seems very probable that the code there would be substantially different. There are even minor variants of it in some of the organisms we have here on the Earth. The genetic code, like life itself, is not one aspect of the eternal nature of things but is, at least in part, the product of accident.

9
Fingerprinting Proteins

IN THE LAST CHAPTER I discussed the various theoretical attempts to solve the coding problem. In this one I describe some experimental approaches. The problem was much the same as before: Do genes (DNA) control the synthesis of protein? And if so, how?

It seems obvious enough now that the amino acid sequence of a protein is determined genetically, and in particular by the base sequence of a stretch of DNA (or RNA), but this was not always so clear. After the double helix was discovered the idea seemed much more attractive, so much so that Jim and I began to take it for granted. The next step was to show that the gene and the protein it coded were co-linear. By this I mean that the sequence of bases in that stretch of nucleic acid was in step with the corresponding sequences of amino acids in the particular protein it coded, just as a stretch of Morse code is co-linear with the corresponding message in English.

In those days there seemed no hope of sequencing either DNA or RNA directly, but in favorable circumstances we thought it might be possible to order a set of mutants within one gene, using standard genetic methods. Since the genetic distances were likely to be rather small, the recombination rates involved were expected to be much less than those geneticists usually measured. This implied that many progeny would have to be examined, suggesting that it would be necessary to use some sort of microorganism, such as a bacterium or a virus.

Once the mutants had been put in order, the next step would be to pin down the amino acid change due to each mutant. Although sequencing a protein chain was then still laborious, Fred Sanger had shown that it could be done, and we expected that for a small protein it would not be impossibly difficult.

Some time in the summer of 1954 I was sitting on the grass at Woods Hole, explaining these ideas to the Russian-born geneticist Boris Ephrussi. Boris, by then working in Paris, had been particularly interested in genes in yeast that appeared to be outside the nucleus of the cell. We know now that such cytoplasmic genes are coded in the DNA of the cell’s mitochondria, but at that time all that was known was that they did not behave like nuclear genes. Boris was indignant. “How do you know,” he asked, “that the amino acid sequence is not determined by a cytoplasmic gene and that all the nuclear genes do is to fold up the protein correctly?”

I don’t think Boris necessarily believed this (and certainly I did not), but his question made me realize that we first needed to show that a single mutant in a nuclear gene altered the amino acid sequence of the protein for which it coded, probably changing just a single amino acid. On returning to Cambridge I decided that this was the next most important step to take.

It was not at all clear what organism to use nor what protein to study. A little later Vernon Ingram joined us at the Cavendish. His main task was to add heavy atoms to hemoglobin or myoglobin, to help the X-ray work, but he and I decided to have a go at the genetic problem. We realized that for the first step we need not map the gene in detail. All we needed was enough genetic information to show that a mutant was being inherited in a Mendelian way and was therefore likely to belong to a nuclear gene. Nor did we need to fix the changed amino acid in the sequence. It was only necessary to show that there had been a change in the sequence due to the mutant. We thought that this would make things easier, since we then only needed to study the amino acid composition of the proteins. If the protein were small enough we might, with luck, pick up a change as small as an alteration to just one amino acid.

In order to work with a protein that was easy to obtain, we chose the protein lysozyme. Lysozyme is a small, basic (meaning positively charged) enzyme originally characterized by Alexander Fleming, the discoverer of penicillin. Fleming had shown that it occurred in tears and that egg white was also a rich source. The enzyme lyses (breaks up) a certain class of bacteria, and in both contexts acts to counteract bacterial infection. One particular bacterium is especially sensitive to it, and this can be used as an assay for the enzyme.

Our main target was egg white but we also tried human tears. Each morning when I came into the laboratory the assistant took a small sample of my tears. Not being an actor, I did not find it easy to weep at will, so my assistant would hold a slice of raw onion underneath one eye. I would hold my head to one side, to make it less easy for the tear to escape down the tear duct, and she would catch the tears with a little Pasteur pipette as they dribbled out of the other side of my eye. Even so, it was difficult to produce more than one or two tears, though I found it helped to think sad thoughts. Curiously enough, I never cry spontaneously at sad or tragic events, but a happy ending makes me weep uncontrollably. Let the bride finally walk triumphantly down the aisle, with the organ playing in jubilation. The tears will stream down my face, in spite of my intense annoyance and embarrassment.

The effect of a single tear can be dramatic. A weak suspension of the bacteria we used looks appreciably cloudy, though not as dense as milk. Add a single tear, swirl the fluid in the test tube, and in a moment the suspension becomes completely clear. All the bacteria have been lysed, thus immediately reducing the scattering of light that caused the cloudiness. Of course we used a more quantitative assay, but the phenomenon was basically the same.

Because chick lysozyme has a strong positive charge, unlike all the other proteins in egg white, it is possible to crystallize it in the egg white, without any further purification. To a biochemist it is really surprising to see the crystals sitting in the rather concentrated, gooey egg white. For the same reason lysozyme was relatively easy to separate on the simple ion exchange columns that had just then been developed for fractionating proteins.

It would be nice to report that we found a mutant, but in fact we had no success at all. We tested the lysozyme rather crudely, checking, in effect, its charge and the way it absorbed ultraviolet light, yet we could easily show that chick lysozyme differed from guinea fowl lysozyme, and that they were both quite different from the lysozyme in my tears. Although we studied about a dozen strains of chickens, kindly supplied by the local chicken geneticist, testing about a hundred eggs in all, we never detected any difference. We tried the tears of half a dozen people around the lab, but these all seemed to be similar to each other. I wanted to test the tears of my younger daughter Jacqueline, then only two years old, but Odile would have none of it. What! Use her precious baby for an experiment! I was sternly forbidden to attempt it.

I expect we would have gone on, but at that stage there was a dramatic development. Max Perutz was working on hemoglobins, including human hemoglobin. Some years earlier Harvey Itano and Linus Pauling had shown that the hemoglobin from a person with sickle-cell anemia was electrophoretically different from normal hemoglobin. Pauling rightly dubbed it a molecular disease. A colleague of his at Cal Tech measured its amino acid composition and reported that there was no difference between normal and sickle-cell hemoglobin. This conclusion was badly worded. What he meant was that there was no difference in composition he could reliably detect, but since hemoglobin is a comparatively large protein, a single amino acid change could easily be missed using this rather crude measure.

Sanger had developed a method he called fingerprinting proteins. He digested the protein with an enzyme (trypsin) that cut the polypeptide chain only at special places. The limited number of peptide fragments thus produced were then run on a two-dimensional paper chromatographic system to sort them one from another, spreading the peptides out on the paper. Vernon realized that this was just the method he needed to pick up small alterations in a protein. Fortunately Max had been sent some sickle-cell hemoglobin, and he gave some to Vernon to test. To his delight, the fingerprints of sickle-cell hemoglobin and of normal hemoglobin differed in the position of a single peptide.

Vernon was able to isolate the altered peptide, determine its sequence, and show that indeed the difference was due to the change of a single amino acid. Valine had been substituted for glutamic acid. At one point, I recall, he thought that perhaps two amino acids might be changed. Jim and I were brasher then and refused to believe this. “Try it again, Vernon,” we said, “you’ll find there’s just a single change” and so it turned out to be.
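The logic of the fingerprint comparison can be sketched in a few lines of Python. This is only an illustration, not Ingram's procedure, which separated the peptides physically on paper. The sketch assumes a simplified trypsin rule (cut after every lysine, K, or arginine, R) and uses a short stretch from the start of the hemoglobin beta chain in one-letter code; that fragment is an assumption brought in for the example and does not appear in the text. The function names are hypothetical.

```python
# Illustrative sketch: compare the tryptic peptides of two protein variants.
# Assumption: trypsin cuts after every K (lysine) or R (arginine); real
# trypsin has exceptions that are ignored here.

def tryptic_digest(sequence: str) -> list[str]:
    """Split a one-letter amino acid sequence after each K or R."""
    peptides = []
    current = ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:  # trailing fragment with no K or R at its end
        peptides.append(current)
    return peptides


def differing_peptides(normal: str, mutant: str) -> tuple[list[str], list[str]]:
    """Return (peptides only in normal, peptides only in mutant)."""
    normal_set = set(tryptic_digest(normal))
    mutant_set = set(tryptic_digest(mutant))
    return sorted(normal_set - mutant_set), sorted(mutant_set - normal_set)


# Assumed illustrative fragments: glutamic acid (E) replaced by valine (V).
NORMAL = "VHLTPEEKSAVTALWGK"
SICKLE = "VHLTPVEKSAVTALWGK"

only_normal, only_sickle = differing_peptides(NORMAL, SICKLE)
print("Peptides unique to normal:", only_normal)
print("Peptides unique to sickle-cell:", only_sickle)
```

Every peptide except the one carrying the substitution is shared, which is why a single spot moves on the fingerprint while the rest of the pattern stays put.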

This result was surprising from two points of view. Sickle-cell anemia is a disease in which the altered hemoglobin forms a type of crystal inside the “red” cells of the blood when it gives up its oxygen in the veins. This often breaks the red cell open, so that patients have a chronic lack of hemoglobin in their blood and, in many cases, die in their teens. Yet this lethal effect is produced by a tiny alteration in just one of the organism’s many genes (we know now it is due to a single base change). Essentially just two molecules are defective, one inherited from the father and one from the mother. How can such a minute change possibly kill someone? The reason is the cascade of magnification. Each defective gene is copied many, many times, since each cell in the body has to have its own copy. Then, in the precursors of each red cell, each gene is copied many times onto messenger RNA, and each messenger RNA directs the synthesis of many defective protein molecules. The tiny atomic defect gets magnified and magnified till there is a considerable amount of the defective protein in the patient’s body, quite enough to kill him if the circumstances are unfavorable.

The other surprising aspect was the scientific one. Strange as it may seem, up to that point most geneticists and protein chemists had not seriously considered that their respective fields were related. Of course a few farsighted individuals, such as Hermann Muller and J. B. S. Haldane, were aware of the likely connection, but each field pursued its aims with very little awareness of the other. Ingram’s result produced a dramatic change of attitude. At about this time I ran into Fred Sanger, I think on a train to London. He said that he and his small group thought they ought to learn a little genetics, a subject about which, up to that point, they hardly knew anything at all except that it existed.

I arranged that we should have weekly evening meetings in my sitting room at the Golden Helix. Sydney Brenner and Seymour Benzer agreed to conduct these tutorials. I recall the first one rather vividly. Sydney came over a little while before the others. I asked him what he proposed to say. He said he thought he would start with Mendel and peas. I suggested that this was perhaps by now a little old-fashioned. Why not start with haploid organisms (which have only one copy of the genetic material), such as bacteria, rather than peas or mice or men, which are diploid (that is, with two copies in each cell) and thus more complicated? Sydney agreed. He gave a brilliant lecture, mainly on the difference between genotype and phenotype, illustrated with examples from bacteria and bacterial viruses. It was all the more striking since I knew it was improvised as he went along.

I think that there is a lesson here for those wanting to build a bridge between two distinct but obviously related fields (a possible modern example would be cognitive science and neurobiology). I am not sure that reasoned arguments, however well constructed, do much good. They may produce an awareness of a possible connection, but not much more. Most geneticists could not have been easily persuaded to learn protein chemistry, for example, just because a few clever people thought that was where genetics ought to go. They thought (as functionalists do today) that the logic of their subject did not depend on knowing all the biochemical details. The geneticist R. A. Fisher once told me that what we had to explain was why genes were arranged like beads on a string. I don’t think it ever occurred to him that the genes made up the string!

What makes people really appreciate the connection between two fields is some new and striking result that obviously connects them in a dramatic way. One good example is worth a ton of theoretical arguments. Given that, the bridge between the two fields is soon crowded with research workers eager to join in the new approach.

10
Theory in Molecular Biology

AS WE HAVE JUST SEEN, the genetic code was a problem that would not yield to purely theoretical approaches. This does not mean that some general theoretical framework could not be helpful, if only to guide the directions that experiments might take. It was the nature of the structure of DNA that gave life to such speculations. Otherwise they would have been too vague to be useful. In 1957 I was invited to give a paper to a symposium of the Society for Experimental Biology in London. This gave me the opportunity to sort out and write down my ideas, most of which had been formulated earlier.

What the structure of DNA suggested was that the sequence of bases in the DNA coded for the sequence of amino acids in the corresponding protein. In the paper I called this the sequence hypothesis. Rereading it, I see that I did not express myself very precisely, since I said “… it assumes that the specificity of a piece of nucleic acid is expressed solely by the sequence of its bases, and that this sequence is a (simple) code for the amino acid sequence of a particular protein.” This rather implies that all nucleic acid sequences must code for protein, which is certainly not what I meant. I should have said that the only way for a gene to code for an amino acid sequence of a protein is by means of its base sequence. This leaves open the possibility that parts of the base sequence can be used for other purposes, such as control mechanisms (to determine if that particular gene should be working and at what rate) or for producing RNA for purposes other than coding. However, I don’t believe anyone noticed my slip, so little harm was done.

The other theoretical idea I proposed was of a rather different character. I suggested that “once ‘information’ has passed into protein it cannot get out again,” adding that “Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein” (see appendix A).

I called this idea the central dogma, for two reasons, I suspect. I had already used the obvious word hypothesis in the sequence hypothesis, and in addition I wanted to suggest that this new assumption was more central and more powerful. I did remark that their speculative nature was emphasized by their names.

As it turned out, the use of the word dogma caused almost more trouble than it was worth. Many years later Jacques Monod pointed out to me that I did not appear to understand the correct use of the word dogma, which is a belief that cannot be doubted. I did apprehend this in a vague sort of way but since I thought that all religious beliefs were without any serious foundation, I used the word in the way I myself thought about it, not as most of the rest of the world does, and simply applied it to a grand hypothesis that, however plausible, had little direct experimental support.

What is the use of such general ideas? Obviously they are speculative and so may turn out to be wrong. Nevertheless, they help to organize more positive and explicit hypotheses. If well formulated, they can act as a guide through a tangled jumble of theories. Without such a guide, any theory seems possible. With it, many hypotheses fall away and one sees more clearly which ones to concentrate on. If such an approach still leaves one lost in the jungle, one tries again with a new dogma, to see if that fares any better. Fortunately in molecular biology the one first selected turned out to be correct.

I believe this is one of the most useful functions a theorist can perform in biology. In almost all cases it is virtually impossible for a theorist, by thought alone, to arrive at the correct solution to a set of biological problems. Because they have evolved by natural selection, the mechanisms involved are usually too accidental and too intricate. The best a theorist can hope to do is to point an experimentalist in the right direction, and this is often best done by suggesting what directions to avoid. If one has little hope of arriving, unaided, at the correct theory, then it is more useful to suggest which class of theories are unlikely to be true, using some general argument about what is known of the nature of the system.

Looking back, it can now be seen that “On Protein Synthesis” is a mixture of good and bad ideas, of insights and nonsense. Those insights that have proved correct are the ones based mainly on general arguments, using data established for some time. The incorrect ideas sprang mainly from the more recent experimental results, which in most cases have turned out to be either incomplete or misleading, if not completely wrong.

Even at this stage an erroneous idea had crept in. It is clear that I thought of the RNA in the cytoplasm—in the microsomal particles, as they were then called (the word ribosome had not yet come into general use)—as a “template”; that is, as having a rather rigid structure, comparable to the double helix of DNA though probably having only a single chain. It was only later that I realized that this was too restrictive an idea, and that “tape” might be nearer the truth. Just as a ticker tape has no rigid structure except momentarily when it is actually in the ticker machine, I eventually realized that the RNA directing the synthesis of a protein need not be rigid, but could be flexible, except for that part that coded the next amino acid to be incorporated. Another consequence of this idea was that the growing protein chain did not have to stay on the template but could start to fold itself up as synthesis proceeded, as indeed had been suggested earlier.

There was another more serious mistake in my thinking at that time. I will not spell out all the details (they are given in the paper), but in effect I was making mistakes because I was confusing the mechanism itself (of protein synthesis) with completely separate mechanisms that were controlling it. Thus, in brief, because some experiments suggested that free leucine (one of the amino acids) was needed for RNA synthesis, it was concluded that there were probably common intermediates for protein and RNA synthesis, which could be used to synthesize one or the other, as required. In fact it is the control mechanism that requires free leucine if RNA synthesis is to continue, presumably because new RNA is not needed if the cell is so starved that free leucine is not available. I believe one can easily fall into this mistake of mixing up effects due to the nature of a mechanism itself with effects due to its control when trying to unscramble a complex biological system.

Another mistake in this general category is worth noting at this point. This is to mistake a minor process, evolved to improve the performance of the major process, for the major process itself and hence draw false conclusions about the latter. Alternatively one can be ignorant of the minor process and hence conclude that a postulated mechanism for the major process could not work.

Consider, for example, the rate of making errors in DNA replication. It is not difficult to see that if an organism has a million significant base pairs, then the error rate per step of replication should not be as great as one in a million. (The exact formulation has been given rather elegantly by Manfred Eigen.) Human DNA has about three billion base pairs (per haploid set) and although we now know that only a fraction of these have to be replicated accurately, the error rate cannot be greater than about one in a hundred million (speaking very roughly) or the organism would be torpedoed in evolution by its own errors. Yet there is a natural rate for making replication errors [due to the tautomeric nature of the bases] that it would be difficult to reduce to below about one in ten thousand. Surely, then, DNA cannot be the genetic material since its replication would produce too many errors.

Fortunately we never took this argument seriously. The obvious way out is to assume that the cell has evolved error-correcting mechanisms. Because the double helix carries two (complementary) copies of the sequence information, it is easy to see how this might be done. The observed error rate (the mutation rate) would be due to the errors in the error-correcting mechanism and thus can be reduced to a very low value. Leslie Orgel and I actually wrote a private letter to Arthur Kornberg, pointing this out and predicting that the enzyme he was studying that replicated DNA in the test tube (the so-called Kornberg enzyme) should contain within itself an error-correcting device, as indeed it does. DNA is, in fact, so precious and so fragile that we now know that the cell has evolved a whole variety of repair mechanisms to protect its DNA from assaults by radiation, chemicals, and other hazards. This is exactly the sort of thing that the process of evolution by natural selection would lead us to expect.
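The arithmetic behind the error-rate argument can be made explicit with a short Python sketch. It uses only the rough figures quoted above (three billion base pairs, a raw error rate of about one in ten thousand, a tolerable rate of about one in a hundred million). Treating every base pair as significant is a simplifying assumption, since only a fraction must be copied accurately, so the counts are upper bounds. The function names are hypothetical.

```python
# Back-of-envelope sketch of the replication error argument.
# All figures are the rough ones quoted in the text, expressed as "one in N".

GENOME_BASE_PAIRS = 3_000_000_000   # human DNA, per haploid set
RAW_ONE_IN = 10_000                 # uncorrected: about one error in ten thousand
TOLERABLE_ONE_IN = 100_000_000      # ceiling: about one error in a hundred million


def expected_errors(base_pairs: int, one_in: int) -> float:
    """Expected miscopied bases in one round of replication."""
    return base_pairs / one_in


def required_correction_factor(raw_one_in: int, target_one_in: int) -> float:
    """How many-fold error correction must lower the raw error rate."""
    return target_one_in / raw_one_in


uncorrected = expected_errors(GENOME_BASE_PAIRS, RAW_ONE_IN)
tolerable = expected_errors(GENOME_BASE_PAIRS, TOLERABLE_ONE_IN)
factor = required_correction_factor(RAW_ONE_IN, TOLERABLE_ONE_IN)

print(f"Errors per replication without correction: {uncorrected:,.0f}")
print(f"Errors per replication at the tolerable rate: {tolerable:,.0f}")
print(f"Improvement needed from error correction: {factor:,.0f}-fold")
```

On these figures the gap between the raw chemistry and what an organism can tolerate is about four orders of magnitude, which is the work the error-correcting mechanisms have to do.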

There is perhaps one other type of mistake that is worth mentioning. One should not be too clever. Or, more precisely, it is important not to believe too strongly in one’s own arguments. This particularly applies to negative arguments, arguments that suggest that a particular approach should certainly not be tried since it is bound to fail.

Consider the following example. As far as I know this argument was never made but it could easily have been in, say, 1950. Rosalind Franklin had shown that fibers of DNA, especially when pulled carefully and mounted under conditions in which the humidity was controlled, could give an X-ray diffraction pattern of the so-called A form, which has many fairly sharp spots. Using the theory of Fourier Transforms, it can be seen immediately that these spots show the existence of a structure with a regular repeat. If DNA were the genetic material it could hardly have a regular repeat, since a perfectly regular structure could carry no information. Thus DNA cannot be the genetic material.

However, there is a counterargument to this. The X-ray spots do not extend to very small spacings. Why do the spots fall off in this way? It could either be that the structure is highly regular but is distorted in some random manner in the fiber, or it could be that part of the structure is regular and part is irregular. If so, why should not the irregular part carry the genetic information? If this is the case, then solving the regular part of the X-ray structure, using the spots that do exist, will never tell us what we want to know—the nature of the genetic information—so why bother to do it?

Knowing the answer, the fallacy in this negative argument can be seen. It is true indeed that the X-ray data on fibers can never tell us the intimate details of the base sequence. What the data did lead to was the model of the double helix with base pairing as its key feature. At the low resolution associated with these spots, one base pair looks rather like any of the other three, but what the model showed us, for the first time, was the existence of base pairs, and this turned out to be crucial for the rapid development of the subject.

What, then, was the proper argument that should have been used? Surely it is that the chemical nature of genes is a subject of overwhelming importance. Genes were known to occur on chromosomes, and that was where DNA is found. Thus anything to do with DNA should be pursued as far as it can be, since one can never be sure in advance what may turn up. While one should certainly try to think which lines are worth pursuing and which are not, it is wise to be very cautious about one’s own arguments, especially when the subject is an important one, since then the cost of missing a useful approach is high.