Arrival of the Fittest: Solving Evolution's Greatest Puzzle

12. See Cheng (1998) for a survey of the evolution of diverse antifreeze proteins. The ancestors of these proteins include enzymes and lectins, which have a variety of roles, including the promotion of cell adhesion.

13. The species in question are sculpins in the genus Myoxocephalus. See Cheng (1998).

14. Fletcher et al. (2001). Different antifreeze proteins within the same organism might not have a different origin.

15. The ability to protect against freezing may have arisen not abruptly but gradually, as successive amino acid changes each increased a protein's ability to protect against freezing to a small extent, until today's antifreeze proteins had formed.

16. The reactions in question are catalyzed by the enzymes HisA and TrpF. See Wierenga (2001).

17. The E. coli enzyme is called L-ribulose-5-phosphate 4-epimerase. After the mutation, it becomes an aldolase. See O'Brien and Herschlag (1999).

18. Several other changes occurred in its hemoglobin, but this one, a proline-to-alanine substitution, is especially important. See Liang et al. (2001), as well as Liu et al. (2001) and Golding and Dean (1998). A number of additional mechanisms facilitate high-altitude adaptation. See Liu et al. (2001), as well as Monge and León-Velarde (1991).

19. Duplications of the genes encoding opsins were also involved. Golding and Dean (1998) provide an overview of these and other adaptations.

20. The phenomenon is also known as the Red Queen effect, a term coined by the American biologist Leigh Van Valen.

21. These proteins do not appear out of nowhere. They are modifications of so-called ABC transporters, a very large and widespread class of proteins that transport all manner of molecules into and out of cells, in organisms ranging from bacteria to humans. See Putman, van Veen, and Konings (2000), as well as Gottesman et al. (1995). The modifications can affect the transporter's amino acid sequence or the amount of transporter protein a cell makes, for example by changing the number of genes encoding a transporter in the genome. See Mrozikiewicz et al. (2007), as well as Stein, Walther, and Wunderlich (1994). For the rapid spread of drug resistance see Tomasz (1997) and Le Hello et al. (2013).

22. One could say that these changes are not dramatic on the level of protein phenotypes, but they are dramatic for the (physiological) phenotypes of whole organisms. Thus whether a change constitutes an innovation depends on the level of organization at which one chooses to study the phenotype.

23. Strictly speaking, this genotype is the DNA sequence encoding the protein, but the two are equivalent for my purpose, because a single DNA string uniquely specifies an amino acid string.

24. As of this writing, experiments have determined the folds of more than seventy thousand proteins, and computational methods that infer the fold of one amino acid string from the experimentally determined fold of another, similar string can infer the shapes of millions more. A central public repository for information about protein fold and function is the Protein Data Bank (http://www.pdb.org).

25. See Maynard Smith (1970).

26. Many proteins are complexes of multiple polypeptides. Such complexes can be many times larger than any one polypeptide.

27. More precisely, this space is called a generalized hypercube. See Reidys, Stadler, and Schuster (1997). From each vertex of this hypercube, one can walk away in as many directions as the vertex has neighbors. For example, a protein of one hundred amino acids has nineteen hundred neighbors, because each of its one hundred positions can change to any of nineteen other amino acids, so nineteen hundred such directions exist.
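
The neighbor count follows from simple arithmetic: a sequence of length L written in an alphabet of A letters has L * (A - 1) neighbors that differ from it at exactly one position. A minimal Python sketch, assuming the 20 standard amino acids in one-letter code; the function names and the toy sequence are illustrative, not taken from the text:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def neighbor_count(length: int, alphabet_size: int = 20) -> int:
    """Number of sequences that differ from a given sequence at exactly one position."""
    # Each position can change to any of the (alphabet_size - 1) other letters.
    return length * (alphabet_size - 1)


def neighbors(seq: str, alphabet: str = AMINO_ACIDS):
    """Yield every single-substitution neighbor of seq (one step away in the hypercube)."""
    for i, current in enumerate(seq):
        for letter in alphabet:
            if letter != current:
                yield seq[:i] + letter + seq[i + 1:]


print(neighbor_count(100))  # 1900 for a protein of one hundred amino acids

toy = "MKV"  # hypothetical three-residue sequence
print(len(list(neighbors(toy))))  # 3 * 19 = 57
```

Enumerating neighbors explicitly is only practical for short toy sequences; for real proteins the count alone conveys how many directions a single mutational step can take.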

28. A variety of distance measures exists in sequence space. Several of them take into account that some amino acids are more similar in their chemical properties than others. How far a genotype network reaches through sequence space may vary somewhat with the distance measure used.
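
To illustrate how the choice of distance measure matters, the Python sketch below compares the plain Hamming distance (the number of differing positions) with a toy measure that charges less for substitutions between chemically similar amino acids. The four amino acid classes and the within-class cost of 0.5 are hypothetical simplifications chosen for illustration; published analyses typically rely on empirical substitution matrices instead.

```python
# Hypothetical coarse chemical classes; a real analysis would use an
# empirical substitution matrix instead of this toy grouping.
CLASSES = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def class_weighted(a: str, b: str, within_class_cost: float = 0.5) -> float:
    """Like Hamming distance, but substitutions within a chemical class cost less."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    total = 0.0
    for x, y in zip(a, b):
        if x != y:
            total += within_class_cost if CLASS_OF[x] == CLASS_OF[y] else 1.0
    return total


# Both pairs differ at two positions, but only the first swaps similar residues.
print(hamming("LKD", "IKE"), class_weighted("LKD", "IKE"))
print(hamming("LKD", "DKL"), class_weighted("LKD", "DKL"))
```

Under the Hamming distance the two pairs are equally far apart; under the weighted measure the first pair is closer, which is why the apparent reach of a genotype network can shift with the measure used.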

29. See Eco (1977) and Putnam (1975) for a discussion of basic semiotic concepts and ambiguities about the meaning of "meaning."

30. Estimates of the fraction of foldable proteins vary widely, from 0.01 to 10 percent. See Keefe and Szostak (2001), as well as Finkelstein (1994) and Davidson and Sauer (1994). For the purpose of this section, I equated meaningful proteins with foldable proteins, because most proteins must fold in order to function, with the caveat that some unstructured proteins may also perform useful functions.

31. I use the notion of "work" here in the physical sense.

32. See Keefe and Szostak (2001).

33. Bacteriophages that can go dormant like this are called lysogenic. Their DNA becomes part of the host genome until the host experiences severe stress, at which time the viral DNA starts to express its genes and viral particles are made. See Ptashne (1992).

34. See Reidhaar-Olson and Sauer (1990) and Taylor et al. (2001). Note that even though the total number of sequences adopting a given function may be large, the fraction of sequence space they occupy may be vanishingly small.

35. Although these solutions may differ in their amino acid sequence, they may have other commonalities, for example a particular spatial arrangement of specific amino acids that allows catalysis of a reaction.

36. Our genomes encode more than one globin. The hemoglobin protein itself consists of four globin polypeptides, two so-called alpha chains and two beta chains, which are encoded by different genes. Other globin genes in our genome include one that is mostly expressed during development in the womb, and another that is important for binding oxygen in muscles.

37. This is an estimated rate of mutation per human generation, not per round of DNA replication, which would be even lower. See, for example, Nachman and Crowell (2000).

38. Hemoglobin-related diseases are well studied and known as hemoglobinopathies. Sickle-cell anemia is one of them. Not all of these diseases are caused by alterations of single DNA letters; they can also be caused by deletions of DNA and other genetic changes. Moreover, some changes in the DNA letter sequence of a gene may not affect the amino acid sequence of the encoded protein at all, because the genetic code is redundant: several different nucleotide triplets can encode the same amino acid, a fact that I briefly mentioned in chapter 1.
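
The redundancy of the genetic code can be illustrated with a handful of codons. The sickle-cell mutation in the beta-globin gene, for instance, changes the codon GAG (glutamate) to GTG (valine), whereas changing GAG to GAA leaves glutamate in place. The Python sketch below hard-codes only the few codons it needs; the table and the function names are illustrative assumptions, not a complete implementation of the genetic code.

```python
# Only the codons needed for this example are listed; this is NOT the full
# genetic code, which has 64 codons.
PARTIAL_CODE = {
    "GAA": "Glu", "GAG": "Glu",                              # glutamate
    "GAT": "Asp", "GAC": "Asp",                              # aspartate
    "GTA": "Val", "GTC": "Val", "GTG": "Val", "GTT": "Val",  # valine
}


def point_mutation(codon: str, position: int, new_base: str) -> str:
    """Return codon with the base at 0-based position replaced by new_base."""
    return codon[:position] + new_base + codon[position + 1:]


def classify(codon: str, position: int, new_base: str) -> str:
    """Label a single-letter DNA change as synonymous or nonsynonymous."""
    mutant = point_mutation(codon, position, new_base)
    before, after = PARTIAL_CODE[codon], PARTIAL_CODE[mutant]
    if before == after:
        return "synonymous"
    return f"nonsynonymous ({before} -> {after})"


print(classify("GAG", 2, "A"))  # GAG -> GAA keeps glutamate
print(classify("GAG", 1, "T"))  # GAG -> GTG, the sickle-cell change
```

Changes at the third codon position are often synonymous, as in the first call, while the second call alters the encoded amino acid.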

39. They are taken from the beta chain of hemoglobin.

40. Assuming a human generation time of twenty-five years.

41. The estimates of times to most recent common ancestry I provide are approximate, as these times can only be estimated with substantial error. See, for example, Hedges and Kumar (2004), as well as Hedges and Kumar (2003).

42. Even globins from organisms as different as plants and animals probably were not independent inventions but derive from a common ancestor. See Hardison (1996).

43. A subtle philosophical question is what constitutes different solutions to the same problem. A chemist might argue that two proteins differing in their amino acid sequence but cleaving a small molecule with the same reaction mechanism are similar solutions, whereas two proteins that use a different reaction mechanism are different solutions. From an evolutionary perspective, however, it is sensible to view all genotypes that serve the same function as different solutions to the same problem, because each of these genotypes can, in principle, be discovered independently of other such genotypes.

44. See Kapp et al. (1995) and Goodman et al. (1988). To this day, proteins may keep diverging further and further from their common ancestor. See Povolotskaya and Kondrashov (2010).

45. I emphasize the role of globins in nitrogen fixation here, but globins can also help distribute oxygen in plants. See Hardison (1996).

46. See Rizzi et al. (1994).

47. See Wierenga (2001). Proteins with this fold can actually have different functions, but even proteins with this fold and the same function can be highly divergent. TIM barrels may have originated multiple times independently in the history of life.

48. The argument is analogous to the one from chapter 3 about the exploration of the metabolic library: A few nonillion organisms exploring a new protein every second since life's origins would yield only a vanishingly small fraction of all proteins. It would not even make a difference if this estimate were off by several orders of magnitude.
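
The arithmetic behind this argument can be checked in a few lines. The Python sketch below assumes, for illustration only, five nonillion (5 × 10^30) organisms, four billion years of exploration, and proteins one hundred amino acids long; these figures are placeholders, not the exact numbers used in the main text.

```python
import math

# Illustrative assumptions (placeholders, not figures from the text):
ORGANISMS = 5e30                      # "a few nonillion" organisms
YEARS = 4e9                           # approximate time since life's origins
SECONDS_PER_YEAR = 365.25 * 24 * 3600
PROTEIN_LENGTH = 100                  # amino acids per protein

# One new protein per organism per second, for the whole period.
proteins_explored = ORGANISMS * YEARS * SECONDS_PER_YEAR
# Exact integer count of all amino acid sequences of this length.
library_size = 20 ** PROTEIN_LENGTH

fraction = proteins_explored / library_size
print(f"explored: 10^{math.log10(proteins_explored):.1f}")
print(f"library:  10^{math.log10(library_size):.1f}")
print(f"fraction: 10^{math.log10(fraction):.1f}")

# Even an estimate that is off by ten orders of magnitude changes little.
print(f"fraction x 1e10: 10^{math.log10(fraction * 1e10):.1f}")
```

Working with base-10 logarithms keeps the numbers readable; the explored fraction stays astronomically small however the placeholder inputs are adjusted within plausible ranges.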

49. Other factors, such as gene duplication and phenotypic plasticity, can also facilitate innovation in proteins. For an overview of such factors see Wagner (2011).

50. Most of the protein pairs he analyzed were far apart in genotype space, but not so far apart that they could not have originated from a common ancestral protein, as opposed to having originated independently. See Ferrada and Wagner (2010).

51. RNA can also carry out other functions inside cells, such as regulating genes through a process called RNA interference. Here, RNA can have an advantage over proteins, because the principle of base complementarity allows it to bind other nucleic acids, such as parts of a messenger RNA transcribed from a gene, with high specificity. Among other functions of RNAs, their role in protein transport is especially noteworthy. It involves the signal recognition particle, an RNA-protein complex that helps proteins enter a part of the cell called the endoplasmic reticulum.

52. We know the folds of some very well studied molecules, such as the ribosomal RNA that catalyzes the key reaction of protein synthesis in the ribosome, but such information is lacking for many other RNAs.

53. Together with Manfred Eigen, Schuster showed theoretically how heterogeneous populations of RNA molecules that can catalyze each other's production can form self-sustaining systems they called hypercycles. See Eigen and Schuster (1979).
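
One way to get a feel for hypercycle dynamics is the elementary hypercycle equation, a common textbook formulation in which each member i is catalyzed by its predecessor i - 1 in a closed loop: dx_i/dt = x_i (k_i x_{i-1} - Phi), where Phi = sum over j of k_j x_j x_{j-1} acts as a dilution term that keeps the relative concentrations x_i summing to 1. The Python sketch below integrates this equation with simple Euler steps for three members with equal rate constants; the number of members, rate constants, step size, and starting values are all illustrative assumptions, not parameters from Eigen and Schuster (1979).

```python
def hypercycle_step(x, k, dt):
    """One Euler step of dx_i/dt = x_i * (k_i * x_{i-1} - phi)."""
    n = len(x)
    # x[i - 1] with i = 0 wraps around to x[n - 1], closing the catalytic cycle.
    growth = [k[i] * x[i] * x[i - 1] for i in range(n)]
    phi = sum(growth)  # dilution flux that keeps sum(x) equal to 1
    return [x[i] + dt * (growth[i] - x[i] * phi) for i in range(n)]


def simulate(x0, k, dt=0.01, steps=20_000):
    """Integrate the elementary hypercycle from relative concentrations x0."""
    x = list(x0)
    for _ in range(steps):
        x = hypercycle_step(x, k, dt)
    return x


# Three mutually supporting members with equal rate constants (hypothetical values).
final = simulate([0.5, 0.3, 0.2], [1.0, 1.0, 1.0])
print([round(v, 3) for v in final])  # all three members persist
```

Because each member's growth depends on its predecessor, no member outcompetes the others in this run; the three relative concentrations settle at equal values instead of one member taking over.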

54. These are described in multiple publications beginning with Hofacker et al. (1994).

55. The base pairs that can form in the secondary structure are A-U, C-G, and G-U. (RNA contains the base uracil, abbreviated by the letter U, instead of the base thymine of DNA.) One difference between the helices of proteins and those of RNA is that the helices of protein structures are formed by a contiguous amino acid strand, whereas the helices of RNA are formed by different, generally noncontiguous parts of the same molecule. Many RNA molecules also require interactions with metal ions to form stable tertiary structures.
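
The pairing rules above are easy to check mechanically. The Python sketch below assumes the widely used dot-bracket notation, in which matching parentheses mark paired positions and dots mark unpaired ones, and tests whether a sequence can form a given secondary structure using only A-U, C-G, and G-U pairs. It ignores loop-size constraints and energetics, so it is a compatibility check, not a folding algorithm; the function names are illustrative.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner in a dot-bracket string."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def compatible(sequence: str, structure: str) -> bool:
    """True if every base pair in structure is an A-U, C-G, or G-U pair in sequence."""
    if len(sequence) != len(structure):
        return False
    pairs = pair_table(structure)
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS for i, j in pairs.items() if i < j)


print(compatible("GGGAAACCC", "(((...)))"))  # three G-C pairs
print(compatible("GGGAAAUCC", "(((...)))"))  # innermost pair is G-U
print(compatible("GGGAAAACC", "(((...)))"))  # innermost pair is G-A, not allowed
```

The pairs (0, 8), (1, 7), and (2, 6) in these examples show the point made above: the two strands of an RNA helix come from separate, noncontiguous stretches of the same molecule.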

56. As reported by Schuster et al. (1994), the number $S$ of RNA secondary structures scales exponentially with sequence length $L$, as $S \propto L^{-1.5}(1.85)^{L}$.
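
To see what this scaling implies, the Python sketch below evaluates the formula on a logarithmic scale and compares it with the 4^L possible RNA sequences of length L. Because the relationship is only a proportionality, the unknown constant factor is set to 1 here (an assumption), so only growth rates and rough orders of magnitude are meaningful.

```python
import math


def log10_structures(length: int) -> float:
    """log10 of L**-1.5 * 1.85**L, i.e. the structure count up to an unknown constant."""
    return -1.5 * math.log10(length) + length * math.log10(1.85)


def log10_sequences(length: int) -> float:
    """log10 of 4**L, the number of RNA sequences of this length."""
    return length * math.log10(4)


for L in (20, 50, 100, 200):
    excess = log10_sequences(L) - log10_structures(L)
    print(f"L={L}: ~10^{log10_structures(L):.1f} structures, "
          f"~10^{excess:.1f} sequences per structure on average")

# The growth factor per added nucleotide approaches 1.85 for long sequences.
ratio = 10 ** (log10_structures(1001) - log10_structures(1000))
print(round(ratio, 3))
```

Since 4 grows faster than 1.85, the number of sequences outpaces the number of structures as L increases, so on average each structure is formed by an ever larger number of sequences.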

57. An important early paper from Schuster's research group is Schuster et al. (1994), and a broader range of later work is summarized in Schuster (2006). Although based only on secondary structure, this work provides the most comprehensive characterization of a genotype space to date. On a historical note, the first work that provided potential evidence for the existence of genotype networks predated Schuster's and used simple models of protein folding. See Lipman and Wilbur (1991), as well as Lau and Dill (1989). Like RNA secondary structure models, these models tell us little about the evolution of protein function and more about the evolution of structure. Schuster's group coined the term "neutral networks" for genotype networks. Although this term is widely used, "neutrality" has a specific meaning to most students of molecular evolution: it implies changes that do not affect fitness in any way. The changes that distinguish neighboring genotypes on a genotype network are not necessarily of this nature, as I discuss in Wagner (2011). Thus it is best to use this term sparingly, and here I avoid it altogether.

58. All these observations refer to typical shapes. There may be shapes formed by only a single RNA sequence, but these shapes would be very hard to find in a blind evolutionary search. The vast majority of RNA sequence space is filled with structures that are formed by many sequences. Moreover, the shapes of multiple biologically important RNA molecules are also formed by many sequences, as we were able to show in Jörg, Martin, and Wagner (2008).
