64 Codons

DNA, the genetic material, replicates and is transcribed to synthesize RNA molecules. Genes, which are defined regions of the DNA, code for RNA. Four nucleotides make up the double helix, and their bases are adenine, guanine, thymine, and cytosine. RNA contains uracil instead of thymine. RNA is mainly involved in synthesizing proteins. The primary through quaternary structures of a protein depend on its amino acids, so proteins have sequences just as DNA and RNA do: nucleic acids have nucleotide sequences, while proteins have amino acid sequences. Proteins cannot direct their own synthesis; they depend on RNA. The letters A, C, G, and U in an RNA molecule are not just letters but can be combined into long sentences: they are read as three-letter words called codons. In this way the four nucleotides specify an amino acid sequence.
The concept of the codon is simple. Take any three of the above nucleotides together, say AUG; AUG is a codon. A codon can code for an amino acid or a signal. AUG is known as the initiation codon because it is involved in the initiation of translation (protein synthesis). Since each of the three positions can hold any of the four bases, triplets give a total of 4 × 4 × 4 = 64 codons. Why a three-letter code? The code has to specify 20 different amino acids. A one-letter code could specify only 4 and a two-letter code only 16, so neither would be sufficient. A code with more than three bases would give far more combinations than are needed and could possibly yield undesirable products. Thus a three-letter code fits the picture perfectly.
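The counting argument can be checked with a short Python sketch. The variable names are illustrative only; the script simply enumerates every word of length one, two, and three over the four RNA bases.

```python
from itertools import product

BASES = "ACGU"          # the four RNA nucleotides
AMINO_ACID_COUNT = 20   # number of amino acids the code must specify

# How many distinct "words" does each code length provide?
for length in (1, 2, 3):
    words = len(BASES) ** length
    enough = "enough" if words >= AMINO_ACID_COUNT else "not enough"
    print(f"{length}-letter code: {words} words ({enough} for 20 amino acids)")

# Enumerate every triplet explicitly.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons))        # total number of codons
print("AUG" in codons)    # the initiation codon is one of them
```

Only the three-letter code provides at least 20 words, which is the reasoning given above.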


Image 1: Codon dictionary

The discovery of a triplet code:
Francis Crick and his colleagues worked on the T4 bacteriophage and discovered the triplet nature of the genetic code. T4 is a virulent phage: it produces its progeny in E. coli cells and releases them through cell lysis. In what is known as the Crick-Brenner experiment, they used mutant and wild-type strains of the phage; rII phages exhibit the mutant phenotype and r+ phages the wild-type phenotype. The mutagen was proflavin, which can insert or delete a base pair and so cause frameshift mutations. The experimenters tried adding and deleting base pairs in different combinations. Reversion of a mutant to wild type was possible with proflavin: for example, a base pair was added to reverse a deletion mutation. After many such attempts, they realized that gene function depends on groups of three base pairs, and they concluded that the genetic code uses a codon of three nucleotide bases.
Later experiments by Marshall Nirenberg and Har Gobind Khorana established the exact relationship between the 64 codons and the 20 amino acids. They used a cell-free protein-synthesizing system with purified components isolated from E. coli, consisting of ribosomes, tRNAs, and protein factors. The study was extensive, since the aim was to find out which codon specifies which amino acid. They prepared synthetic mRNAs with different base compositions, added them to the system, and in this way used different copolymers to decipher the genetic code.
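The frameshift logic behind the experiment can be illustrated with a toy Python sketch. The sequence and helper functions below are hypothetical and are not data from the original work; they only show how a single deletion shifts every downstream triplet and how a nearby insertion restores the reading frame.

```python
def read_triplets(sequence):
    """Split a sequence into successive, non-overlapping triplets."""
    usable = len(sequence) - len(sequence) % 3  # ignore an incomplete final triplet
    return [sequence[i:i + 3] for i in range(0, usable, 3)]

def delete_base(sequence, position):
    """Return the sequence with the base at `position` removed."""
    return sequence[:position] + sequence[position + 1:]

def insert_base(sequence, position, base):
    """Return the sequence with `base` inserted before `position`."""
    return sequence[:position] + base + sequence[position:]

wild_type = "CATCATCATCATCAT"               # hypothetical repeating message
deleted = delete_base(wild_type, 4)         # frameshift: one base lost
revertant = insert_base(deleted, 7, "G")    # a nearby insertion restores the frame

print(read_triplets(wild_type))   # ['CAT', 'CAT', 'CAT', 'CAT', 'CAT']
print(read_triplets(deleted))     # ['CAT', 'CTC', 'ATC', 'ATC']
print(read_triplets(revertant))   # ['CAT', 'CTC', 'AGT', 'CAT', 'CAT']
```

In the revertant, only the triplets between the two changes are altered; everything downstream is read in the original frame again.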

Characteristics of the genetic code:
1. The genetic code is a triplet code. It means there are three nucleotides in a codon.
2. The code is continuous. The mRNA is read three nucleotides at a time, without skipping any.
3. The triplet code does not overlap. The mRNA is read in successive groups of three nucleotides, and each nucleotide belongs to only one codon.
4. The code is almost universal. The genetic language shared by all organisms is almost the same.
5. The code exhibits degeneracy. It means that more than one codon can specify the same amino acid.
6. The code has start and stop signals. AUG starts translation, and UAA, UAG, and UGA stop it.
7. There is a wobble in the genetic code. The third base of a codon pairs less strictly with the anticodon than the first two.
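Several of these characteristics can be seen together in a minimal Python sketch of translation. The compact table construction (bases in the order U, C, A, G paired with a 64-letter string of one-letter amino acid codes, `*` for stop) is a common shorthand for the standard code; the `translate` function and the example mRNA are hypothetical and simplified, starting at the first AUG found.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Read non-overlapping triplets from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # stop signal
            break
        protein.append(amino_acid)
    return "".join(protein)

# Degeneracy: count how many codons specify each amino acid (or stop).
codons_per_amino_acid = Counter(CODON_TABLE.values())

print(translate("GGAUGCUCCUUUGGUAAGC"))   # MLLW
print(codons_per_amino_acid["L"])         # 6 codons for leucine
print(codons_per_amino_acid["*"])         # 3 stop codons
```

In the example, CUC and CUU both give leucine (L), which is degeneracy in action, and reading stops at UAA.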

INFO-BOX: Terminologies
  • Code: It is the set of rules by which a sequence of nucleotides specifies a sequence of amino acids.
  • Codon: It is a section of DNA consisting of three nucleotide pairs, or a section of RNA consisting of three nucleotides, that codes for a single amino acid or a signal.
  • Code dictionary: It is a listing of the 64 possible codons and their translational meanings.
  • Coding region: It is the part of a gene that encodes a protein and is read as an open reading frame. In eukaryotes the coding region lies within the exons.
  • Coding strand: It is the DNA strand that has the same sequence as the transcribed mRNA (with T in place of U), and so carries a linear array of codons. The codons of the mRNA interact with tRNA anticodons to give the primary sequence of a protein.
  • Codon preference: This concept deals with the disproportionate usage of codons. Preferred codons tend to correspond to abundant tRNAs.


Wobble Hypothesis:
As discussed earlier, there are 64 codons. Of these, 61 are sense codons, and the remaining three are known as nonsense codons or stop codons. Francis Crick proposed the wobble hypothesis. The hypothesis states that a single tRNA can read more than one codon, because the base at the 5’ end of the anticodon is not as constrained in its pairing as the other two bases. For example, consider two leucine codons, CUC and CUU. A single leucine tRNA can read both. It pairs normally with CUC, since its anticodon is GAG. When pairing with CUU, however, it follows wobble pairing, in which the G at the 5’ end of the anticodon pairs with U. When the modified purine inosine is present at the 5’ end of the anticodon, the tRNA can recognize three different codons.
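The pairing rules can be written down as a small lookup, sketched here in Python. The table follows the wobble rules as they are usually stated in textbooks (anticodon 5’ base C pairs with G; A with U; U with A or G; G with C or U; inosine, written I, with U, C, or A); the function name is hypothetical, and anticodons and codons are both written 5’ to 3’.

```python
# Standard Watson-Crick pairs for the two constrained anticodon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Wobble rules: base at the 5' end of the anticodon -> codon 3' bases it can pair with.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}

def codons_read_by(anticodon):
    """Return the codons (5'->3') that an anticodon (5'->3') can read."""
    wobble_base, middle_base, last_base = anticodon
    # The anticodon pairs antiparallel to the codon, so its 3' base
    # pairs with the first base of the codon.
    codon_first = WATSON_CRICK[last_base]
    codon_second = WATSON_CRICK[middle_base]
    return [codon_first + codon_second + third for third in WOBBLE[wobble_base]]

print(codons_read_by("GAG"))   # ['CUC', 'CUU']
print(codons_read_by("IAG"))   # ['CUU', 'CUC', 'CUA']
```

The GAG anticodon reads both leucine codons from the example, and replacing its 5’ base with inosine extends the set to three codons.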


Image 2: Wobble pairing

The genetic code may not necessarily be universal:
Originally the genetic code was considered to be universal, on the reasoning that an established code could not be changed. However, the genetic code is not strictly universal; there are deviations in some organisms. The mitochondrial genomes of some organisms use non-standard codes. For example, in the mitochondrial genome of mammals the UGA codon codes for tryptophan instead of acting as a stop codon. The same UGA codon codes for cysteine in Euplotes species. Various context-dependent codon reassignments are present in archaea. For example, the archaeal UGA codon can code for selenocysteine and UAG for pyrrolysine. Non-standard codes are also known for the nuclear genomes of lower eukaryotes and may involve reassignment of termination codons.
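One way to picture these deviations is as a small set of overrides applied on top of the standard assignments. The Python sketch below is deliberately incomplete: it covers only the three standard termination codons and the two UGA reassignments named above, and the dictionary and function names are hypothetical. Real variant codes differ from the standard code at other codons as well.

```python
# Standard meaning of the three termination codons.
STANDARD_STOPS = {"UAA": "Stop", "UAG": "Stop", "UGA": "Stop"}

# Only the reassignments mentioned in the text; not complete variant tables.
REASSIGNMENTS = {
    "standard": {},
    "mammalian mitochondrial": {"UGA": "Trp"},
    "Euplotes": {"UGA": "Cys"},
}

def meaning(codon, code="standard"):
    """Look up a termination codon under the chosen code variant."""
    table = {**STANDARD_STOPS, **REASSIGNMENTS[code]}  # overrides win
    return table[codon]

for variant in REASSIGNMENTS:
    print(variant, "->", meaning("UGA", variant))
```

The same codon, UGA, is read as a stop, as tryptophan, or as cysteine depending on which table the translation machinery uses.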

Codon usage bias:
Codon usage bias is the unequal frequency with which synonymous codons appear in coding DNA. A synonymous substitution replaces one base with another in a coding region without changing the encoded amino acid, and such substitutions accumulate over evolution. As a result, some codons for a given amino acid can be used much more often than others, as if the choice were biased. Codon bias is thought to reflect a balance between mutational biases and natural selection for translation optimization. Gene expression levels, G-C composition, strand-specific mutational bias, GC skew, and many other factors contribute to codon usage bias.
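A simple way to see codon usage bias is to count how often each synonymous codon appears in a coding sequence. The Python sketch below does this for the six leucine codons; the toy gene and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

LEUCINE_CODONS = ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"]

def codon_counts(coding_sequence):
    """Count codons in a sequence read in frame from its first base."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    return Counter(coding_sequence[i:i + 3] for i in range(0, usable, 3))

def synonymous_fractions(coding_sequence, synonymous_codons):
    """Fraction of each synonymous codon among all codons for that amino acid."""
    counts = codon_counts(coding_sequence)
    total = sum(counts[codon] for codon in synonymous_codons)
    if total == 0:
        return {}
    return {codon: counts[codon] / total for codon in synonymous_codons}

toy_gene = "AUGCUGCUGCUCCUGUUACUGUAA"   # hypothetical: Met, six Leu, stop
fractions = synonymous_fractions(toy_gene, LEUCINE_CODONS)
for codon, fraction in fractions.items():
    print(codon, round(fraction, 2))
```

With unbiased usage each leucine codon would appear about one time in six; in the toy gene CUG accounts for four of the six leucines.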



© Copyright, 2018 All Rights Reserved.
