After the primer is removed from the leading strand, how does DNA polymerase I add dNTPs without a 3'-OH?


I have a question about replication in prokaryotes. I learned in school that:

  1. DNA polymerase needs 3'-OH to add a dNTP.
  2. The chromosomes of prokaryotes are usually circular.
  3. The primer in the leading strand (of course the primers in the lagging strand too) is removed by the 5'→3' exonuclease activity of DNA polymerase I.

It makes sense that on the lagging strand the primers are removed and then replaced by new dNTPs using the 3'-OH of the previous Okazaki fragment. However, for the primer on the leading strand, there is no Okazaki fragment upstream and thus no 3'-OH that DNA polymerase can use for polymerisation.

How is the primer replaced by new dNTPs on the leading strand?

The leading strand post-initiation

First consider the DNA replication fork after initiation has occurred.

The 3'-OH primer on the leading strand is the 3'-end of the strand of DNA being synthesized in the 5' to 3' direction (circled in the diagram below, modified from Berg, Biochemistry). It is not removed because there is no need to do so. (The primer on the lagging strand is only removed because it is RNA.) So the problem posed in the question does not arise.

Remember that the reason for Okazaki fragments is that there is no DNA 3'-OH primer for the lagging strand, so a temporary RNA 3'-OH primer is generated instead. RNA polymerase, unlike DNA polymerase, is able to copy DNA without a primer. There is no such problem for the leading strand.

The leading strand at the origin of replication

As @canadianer points out in a comment, the remarks above do not apply to the initiation of DNA replication, the single instance where the leading strand must have an RNA primer. In the case of prokaryotes such as Escherichia coli, replication proceeds in both directions, so that a short time after initiation one has a replication bubble as shown in the figure below (adapted from Russell, iGenetics):

It can be seen that, at the origin, the 3'-OH of the DNA from the lagging strand of the other direction of replication (boxed in red) can act as a primer for the DNA synthesis function of DNA polymerase I, after its exonuclease activity has removed a nucleotide from the start of the RNA primer on the leading strand.


Berg et al. Ch.27

The Cell, Ch. 5

Sandwalk: Blog page


There was a typo in the original question which caused confusion about which strand the poster was concerned with. This answer assumes the leading strand.

Section Summary

Replication in prokaryotes starts from a sequence found on the chromosome called the origin of replication, the point at which the DNA opens up. Helicase opens up the DNA double helix, resulting in the formation of the replication fork. Single-strand binding proteins bind to the single-stranded DNA near the replication fork to keep the fork open. Primase synthesizes an RNA primer to initiate synthesis by DNA polymerase, which can add nucleotides only in the 5′ to 3′ direction. One strand is synthesized continuously in the direction of the replication fork; this is called the leading strand. The other strand is synthesized in a direction away from the replication fork, in short stretches of DNA known as Okazaki fragments. This strand is known as the lagging strand. Once replication is completed, the RNA primers are replaced by DNA nucleotides and the DNA is sealed with DNA ligase, which creates phosphodiester bonds between the 3′-OH of one end and the 5′ phosphate of the other strand.

Quantum Leaps in Biochemistry

Anil Day, Joanna Poulton, in Foundations of Modern Biochemistry, 1996

Mitochondria Import RNA

The discovery that the RNA primer used to initiate DNA synthesis in mammalian mitochondria was imported from the nucleus dispelled the belief that only proteins but not RNA could cross organelle membranes (Chang and Clayton, 1987). Evidence that tRNAs are imported from the cytosol into mitochondria is derived from elucidating the coding capacity of mitochondrial DNA and also characterizing tRNAs present in isolated mitochondria. Mammalian mitochondria use 22 unusual tRNAs to decode the 61 sense codons. The linear 15.8-kb genome of C. reinhardtii only encodes three tRNAs (Michaelis et al., 1990). Although liverwort mitochondrial DNA encodes 27 tRNA species, two species necessary to read leucine and threonine codons are absent. C. reinhardtii, plant, and trypanosome mitochondria appear to import nuclear-encoded tRNAs to make up a complete set for protein synthesis. Eleven of the 31 tRNA species present in potato mitochondria are encoded by nuclear DNA and are imported from the cytosol (Dietrich et al., 1992). The plastid genome of Epifagus virginiana, a nonphotosynthetic parasite of beech trees, lacks 13 tRNA genes found in green plastids. This suggests tRNA import can also occur in plastids (Wolfe et al., 1992).

DNA Replication

When the cell enters S (synthesis) phase in the cell cycle, all the chromosomal DNA must be replicated. DNA polymerases synthesize new strands by adding nucleotides to the 3'-OH group present on the previous nucleotide using the separated single strands of DNA as templates. This process generates two new double-stranded molecules, called sister chromatids, from one double helix. But how are the new and old strands distributed? The answer to this question was elucidated by classic experiments by Meselson and Stahl.


The Mechanisms of Replication

Like many molecular events we will study, replication can be divided into three stages: initiation, elongation, and termination.


Initiation

DNA polymerase does not begin at random locations on chromosomes. DNA replication in both prokaryotes and eukaryotes begins at origins of replication (Ori), which are specific sequences at specific positions on chromosomes. In E. coli, the OriC origin is about 245 bp in size. Chromosome replication begins with the binding of an initiator protein (DnaA) to an AT-rich sequence (for example TTATCCACA in E. coli) in the OriC, which melts (disrupts the hydrogen bonding between) the two strands. Helicase enzymes are then added to unwind additional DNA from this point.

In prokaryotes, with a small, simple, circular chromosome, only one origin of replication is needed to replicate the whole genome. For example, E. coli has a genome (chromosome) of about 4.5 Mb that can be duplicated in about 40 minutes, assuming a single origin, bi-directional replication, and a polymerase speed of about 1000 bases/second/fork. At least five prokaryotic DNA polymerases have been discovered to date, with DNA polymerase III (Pol III) being the primary polymerase for replication. Pol I is used primarily to fill in gaps created during lagging strand synthesis (described below) or through error-correcting mechanisms. DNA polymerases II, IV and V are used to synthesize DNA when certain types of repair are needed at other times in the cellular life cycle.
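The 40-minute figure can be checked with a quick back-of-the-envelope calculation. The sketch below is only an illustration: the function name and its parameters are made up for this example, and it assumes that every fork moves at a constant rate for the whole round of replication.

```python
def replication_time_minutes(genome_bp, fork_rate_bp_per_s, origins=1):
    """Estimate how long bidirectional replication of a chromosome takes.

    Assumes each origin launches two forks that all move at the same
    constant rate (an idealisation, not a measured value).
    """
    forks = 2 * origins
    seconds = genome_bp / (forks * fork_rate_bp_per_s)
    return seconds / 60


# E. coli numbers quoted in the text: 4.5 Mb, 1000 bases/second/fork, one origin
print(replication_time_minutes(4_500_000, 1000))  # 37.5
```

The result, 37.5 minutes, is consistent with the rounded value of about 40 minutes given above.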

Why would an initiator protein binding site in the OriC be an AT-rich region?

Hint: Think about the number of hydrogen bonds between AT and GC base pairs.

In eukaryotes, with multiple linear chromosomes, more than one origin of replication is required per chromosome to duplicate the whole chromosome set within the roughly 8 hours of S-phase of the cell cycle. For example, the human diploid genome has 46 chromosomes (6 × 10^9 base pairs). Even the shortest chromosomes are about 50 Mbp long and so could not possibly be replicated from one origin. Additionally, the rate of replication fork movement is slower than in prokaryotes, only about 100 bases/second. Thus, eukaryotes contain multiple origins of replication distributed over the length of each chromosome to enable the duplication of each chromosome within S-phase.
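To see why a single origin is not enough, one can estimate a lower bound on the number of origins a chromosome needs. This is a rough sketch under simplifying assumptions that are not in the text: all origins fire at the very start of S-phase, they are evenly spaced, and forks never stall. The function name is hypothetical.

```python
import math


def min_origins(chromosome_bp, fork_rate_bp_per_s, s_phase_hours):
    """Lower bound on the origins needed to finish within S-phase.

    Each origin is assumed to copy DNA in both directions for the
    entire S-phase at a constant fork rate.
    """
    seconds = s_phase_hours * 3600
    bp_per_origin = 2 * fork_rate_bp_per_s * seconds
    return math.ceil(chromosome_bp / bp_per_origin)


# Values from the text: a 50 Mbp chromosome, 100 bases/second, 8 hours
print(min_origins(50_000_000, 100, 8))  # 9
```

Under these idealised conditions one origin covers at most 5.76 Mbp in 8 hours, so even the shortest chromosome needs at least nine. Real cells use many more, because origins fire at different times during S-phase.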

Figure 1: Schematic of a eukaryote chromosome showing multiple origins (1, 2, 3) of replication, each defining a replicon (1, 2, 3). Replication may start at different times in S-phase. Here #1 and #2 begin first, then #3. As the replication forks proceed bidirectionally, they create "replication bubbles" that meet and form larger bubbles. The end result is two semi-conservatively replicated duplex DNA strands, with the parental strands in black and the newly-synthesized strands in red. (Original-Locke-CC:AN)


Elongation

As DNA polymerase proceeds along the template, the nucleotide that base pairs with each base on the template is covalently bonded to the 3' end of the growing strand. Note that the energy is provided by the nucleotide triphosphate itself: two phosphates are released and one phosphate remains as a part of the phosphodiester bond.


Termination

In prokaryotes, elongation proceeds bidirectionally until the replication forks meet. RNA primers are removed by a specialized DNA polymerase and then DNA is synthesized in their place. The resulting DNA fragments are then "sealed" together with DNA ligase, an enzyme that forms covalent phosphodiester bonds between two nucleotides.

In eukaryotes, replication also proceeds bidirectionally until adjacent forks meet, or the fork encounters the end of the chromosome.

Implications and limitations of the 5'-to-3' activity of DNA polymerase

Recall that enzymes are specific to their substrates. In the case of DNA polymerase, the structure only allows it to add nucleotides to the 3' end of existing DNA, which presents some questions:

1. If the enzyme can only add nucleotides to existing DNA, how will it get started?

2. Because DNA is double stranded, each strand needs to be used as a template, but these strands are antiparallel. How can one complex make new DNA in opposite directions?

3. How will the 3' end be replicated when there is no longer a place for a primer on the complementary strand?

Obstacle #1: How to begin

Once oriC has been opened and the helicases have attached to the two sides of the replication fork, the replication machine, also known as the replisome, can begin to form. However, before the DNA polymerases take positions, they need to be primed. DNA polymerases are unable to join two individual free nucleotides together to begin forming a nucleic acid; they can only add onto a pre-existing strand of at least two nucleotides. Therefore, a specialized RNA polymerase known as primase (RNAPs do not have this limitation) is a part of the replisome, and creates a short RNA strand termed the primer for the DNA polymerase to add onto. Although only a few nucleotides are needed, the prokaryotic primers may be as long as 60 nt depending on the species. The helicase will continue to travel in front of the fork to unwind new DNA and allow primase to add new primers as needed.

Obstacle #2: Make two strands in opposite directions at the same time

Because DNA is being unwound in the direction of fork movement, both strands need to be synthesized in the unwound region at the same time. The two subunits of DNA polymerase that are adding nucleotides are actually tethered together, so they cannot travel in opposite directions.

Helicase opens up the double stranded DNA and leads the rest of the replication machine along. Because nucleic acids are polymerized by adding the 5′ phosphate of a new nucleotide to the 3′ hydroxyl of the previous nucleotide, one of the strands, called the leading strand, is being synthesized in the same direction that the replication machine moves. No problem there.

The other strand is problematic: looked at linearly, the newly synthesized strand would be going 3′ to 5′ in the direction of helicase movement, but DNA polymerases cannot add nucleotides to the 5′ end (remember that it has the phosphate, not a hydroxyl). How do cells resolve this problem? In the current model, the replication machine consists of the helicase, primases, and two DNA polymerase III holoenzymes moving in the same physical direction (following the helicase). In fact, the Pol III complexes are physically linked through τ subunits (Figure 3).

Figure 3: DNA replication in prokaryotes. DNA polymerase III is a multi-subunit holoenzyme. The blue ring represents the β subunit (also known as the β clamp), a dimer of semicircular subunits that has a central hole through which the DNA is threaded. The core polymerase, via an α-β interaction, is attached to this β clamp so that it stays on the DNA longer, increasing the processivity of Pol III to over 5000 nt.

For the other strand to be replicated, the strand must be fed into the polymerase backwards, by looping the DNA around. Moving behind DNA helicase, primase quickly synthesizes a short primer before having to move forward with the replisome and synthesizing again, leaving intermittent primers in its wake. Thus, Pol III is forced to synthesize only short fragments of the chromosome at a time, called Okazaki fragments after their discoverers, Reiji and Tsuneko Okazaki. Pol III begins synthesizing by adding nucleotides onto the 3′ end of a primer and continues until it hits the 5′ end of the next primer. Pol III does not (and cannot) connect the strand it is synthesizing with the 5′ end of that RNA primer.

DNA replication is called a semi-discontinuous process because while the leading strand is being synthesized continuously, the lagging strand is synthesized in fragments. This leads to two major problems: first, there are little bits of RNA left behind in the newly made strands (just at the 5′ end for the leading strand, in many places for the lagging) and second, Pol III can only add free nucleotides to a fragment of single stranded DNA; it cannot connect another fragment. Therefore, the new "strand" is not whole, but riddled with missing phosphodiester bonds.

The first problem is resolved by DNA polymerase I. Unlike Pol III, Pol I is a monomeric protein and acts alone, without additional proteins. There are also 10-20 times as many Pol I molecules as there are Pol III molecules, because they are needed for so many Okazaki fragments. DNA Polymerase I has three activities:

  1. like Pol III, it can synthesize a DNA strand based on a DNA template,
  2. also like Pol III, it is a 3′-5′ proofreading exonuclease, but unlike Pol III,
  3. it is also a 5′-3′ exonuclease.

The 5′-3′ exonuclease activity is crucial in removing the RNA primer. The 5′-3′ exonuclease binds to double-stranded DNA that has a single-stranded break in the phosphodiester backbone, such as what happens after Okazaki fragments have been synthesized from one primer to the next but cannot be connected. This 5′-3′ exonuclease then removes the RNA primer. The polymerase activity then adds new DNA nucleotides to the upstream Okazaki fragment, filling in the gap created by the removal of the RNA primer. The proofreading exonuclease acts just like it does for Pol III, immediately removing a newly incorporated incorrect nucleotide. After proofreading, the overall error rate of nucleotide incorporation is approximately 1 in 10^7.

Even though the RNA has been replaced with DNA, this still leaves a fragmented strand. The last major player in the DNA replication story finally appears: DNA ligase. This enzyme has one simple but crucial task: it catalyzes the attack of the 3′-OH from one fragment on the 5′ phosphate of the next fragment, generating a phosphodiester bond.
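The division of labour between Pol I and ligase can be illustrated with a toy string model. This is only a sketch: fragments are listed 5′ to 3′, lowercase letters stand for primer ribonucleotides, uppercase letters for DNA, and the function names are invented for this example. The model also assumes that a primer can only be replaced when there is an upstream fragment whose 3′-OH Pol I can extend, which is why the 5′-most primer is left untouched here.

```python
# Map each primer ribonucleotide to the deoxyribonucleotide that replaces it
RNA_TO_DNA = {"a": "A", "c": "C", "g": "G", "u": "T"}


def replace_primers(fragments):
    """Mimic Pol I: swap RNA primers for DNA on all but the first fragment.

    The first (5'-most) fragment keeps its primer because this linear toy
    model has no upstream 3'-OH for Pol I to extend from.
    """
    if not fragments:
        return []
    repaired = [fragments[0]]
    for fragment in fragments[1:]:
        repaired.append("".join(RNA_TO_DNA.get(base, base) for base in fragment))
    return repaired


def ligate(fragments):
    """Mimic DNA ligase: join adjacent fragments into one strand."""
    return "".join(fragments)


okazaki = ["auGGCTA", "gcATTCG", "uaCCGAT"]
print(ligate(replace_primers(okazaki)))  # auGGCTAGCATTCGTACCGAT
```

The primer that survives at the 5′ end is the situation raised in the question at the top of this page; on a circular chromosome the 3′-OH from the fork moving in the other direction supplies the missing upstream end.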


Obstacle #3: How to copy the end?

The ends of linear chromosomes present a problem: at each end, one strand cannot be completely replicated because there is no primer to extend. Although the loss of such a small sequence in a single round would not be a problem, continued rounds of replication would result in continued loss of sequence from the chromosome end, to the point where essential gene sequences would begin to be lost. Thus, this DNA must be replicated. Most eukaryotes solve this problem with a specialized DNA polymerase called telomerase, in combination with a regular polymerase. Telomerases are RNA-directed DNA polymerases. They are ribonucleoproteins, as they are composed of both protein and RNA. These enzymes contain a small piece of RNA that serves as a portable and reusable template from which the complementary DNA is synthesized. The RNA in human telomerases uses the sequence 3'-AAUCCC-5' as the template, and thus our telomeric DNA has the complementary sequence 5'-TTAGGG-3' repeated thousands of times. After telomerase has made the first strand, a primase synthesizes an RNA primer and a DNA polymerase then makes a complement of the extended sequence.
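The relationship between the telomerase RNA template and the telomeric repeat can be verified with a few lines of code. This is a minimal sketch; the function name is made up, and the template is written 3′ to 5′ so that the DNA product reads 5′ to 3′ position by position.

```python
# Base-pairing rules for copying an RNA template into DNA
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "C": "G", "G": "C"}


def dna_from_rna_template(template_3_to_5):
    """Return the DNA (5'→3') synthesized against an RNA template (3'→5')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in template_3_to_5)


repeat = dna_from_rna_template("AAUCCC")
print(repeat)      # TTAGGG
print(repeat * 3)  # TTAGGGTTAGGGTTAGGG
```

Repeated rounds of copying the same short template are what produce the long tandem array of repeats at the chromosome end.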

Telomerase = non-coding RNA + reverse transcriptase protein

In humans, the gene for the telomerase RNA is named TERC (telomerase RNA component) and is found on chromosome 3. This gene therefore encodes a product (an RNA) but not a protein!

The gene for the protein is named TERT (telomerase reverse transcriptase) and is found on chromosome 5.

These independently-encoded and independently-produced gene products must assemble to perform their function.

Note that the number of repeats, and thus the size of the telomere, is not set, but may fluctuate after each round of the cell cycle. Because there are many repeats at the end, this fluctuation maintains a length buffer (sometimes it's longer, sometimes it's shorter), but the average length will be maintained over the generations of cell replication.

Figure 4: Telomere replication showing the completion of the leading strand and incomplete replication of the lagging strand. The gap is replicated by the extension of the 3′ end by telomerase and then filled in by extension from an RNA primer. (Original-Locke-CC:AN)

In the absence of telomerase, as is the case in human somatic cells, repeated cell division leads to telomere shortening. If a critical limit is reached, the cells enter a senescence phase of non-growth. The activation of telomerase expression, which often occurs in cancer cells, permits a cell and its descendants to become immortal because their telomeres will not reach the critical limit. Cultured cells that overexpress telomerase, such as HeLa cells, can be propagated essentially indefinitely. HeLa cells, which were originally isolated from an African-American cancer patient, Henrietta Lacks, without her consent, have been kept in culture since 1951.

DNA polymerase proofreading

DNA polymerase incorporates the correct nucleotides into each new strand based on the favorable hydrogen bonding that occurs between bases that pair according to the rules of base pairing (A-T and G-C). If an incorrect base is incorporated, the structure of the double helix will be distorted and will often be detected by the enzyme if it has not proceeded too far down the strand. DNA polymerase has 3' to 5' exonuclease activity to remove nucleotides back through the mismatch and then repeat synthesis of that region of the strand with the correct nucleotides. As a result of this proofreading ability, DNA polymerases have very low error rates, allowing organisms to successfully pass genetic information from cell to cell and from generation to generation.


RNA primers are used by living organisms to initiate the synthesis of a strand of DNA. A class of enzymes called primases add a complementary RNA primer to the template de novo on both the leading and lagging strands. Starting from the free 3'-OH of the primer, known as the primer terminus, a DNA polymerase can extend a newly synthesized strand. The leading strand in DNA replication is synthesized in one continuous piece moving with the replication fork, requiring only an initial RNA primer to begin synthesis. In the lagging strand, the template DNA runs in the 5′→3′ direction. Since DNA polymerase cannot add bases in the 3′→5′ direction complementary to the template strand, DNA is synthesized 'backward' in short fragments moving away from the replication fork, known as Okazaki fragments. Unlike in the leading strand, this method results in the repeated starting and stopping of DNA synthesis, requiring multiple RNA primers. Along the DNA template, primase intersperses RNA primers, which DNA polymerase then extends in the 5′→3′ direction. [1]

Another example of primers being used to enable DNA synthesis is reverse transcription. Reverse transcriptase is an enzyme that uses a template strand of RNA to synthesize a complementary strand of DNA. The DNA polymerase component of reverse transcriptase requires an existing 3' end to begin synthesis. [1]

Primer removal

After the insertion of Okazaki fragments, the RNA primers are removed (the mechanism of removal differs between prokaryotes and eukaryotes) and replaced with new deoxyribonucleotides that fill the gaps where the RNA was present. DNA ligase then joins the fragmented strands together, completing the synthesis of the lagging strand. [1]

In prokaryotes, once the growing Okazaki fragment reaches the previous RNA primer, DNA polymerase I acts simultaneously as a 5′→3′ exonuclease and a polymerase, removing primer ribonucleotides in front and adding deoxyribonucleotides behind until the region has been replaced by DNA. This leaves a nick in the DNA backbone between Okazaki fragments, which is sealed by DNA ligase.

In eukaryotic primer removal, DNA polymerase δ extends the Okazaki fragment in 5′→3′ direction, and upon encountering the RNA primer from the previous Okazaki fragment, it displaces the 5′ end of the primer into a single-stranded RNA flap, which is removed by nuclease cleavage. Cleavage of the RNA flaps involves either flap structure-specific endonuclease 1 (FEN1) cleavage of short flaps, or coating of long flaps by the single-stranded DNA binding protein replication protein A (RPA) and sequential cleavage by Dna2 nuclease and FEN1. [2]

Synthetic primers are chemically synthesized oligonucleotides, usually of DNA, which can be customized to anneal to a specific site on the template DNA. In solution, the primer spontaneously hybridizes with the template through Watson-Crick base pairing before being extended by DNA polymerase. The ability to create and customize synthetic primers has proven an invaluable tool necessary to a variety of molecular biological approaches involving the analysis of DNA. Both the Sanger chain termination method and the “Next-Gen” method of DNA sequencing require primers to initiate the reaction. [1]

PCR primer design

The polymerase chain reaction (PCR) uses a pair of custom primers to direct DNA elongation toward each other at opposite ends of the sequence being amplified. These primers are typically between 18 and 24 bases in length and must match only the specific upstream and downstream sites of the sequence being amplified. A primer that can bind to multiple regions along the DNA will amplify them all, defeating the purpose of PCR. [1]

A few criteria must be brought into consideration when designing a pair of PCR primers. Pairs of primers should have similar melting temperatures since annealing during PCR occurs for both strands simultaneously, and this shared melting temperature must not be either too much higher or lower than the reaction's annealing temperature. A primer with a Tm (melting temperature) too much higher than the reaction's annealing temperature may mishybridize and extend at an incorrect location along the DNA sequence. A Tm significantly lower than the annealing temperature may fail to anneal and extend at all.
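A rough way to compare the melting temperatures of two primers is the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) in °C. This rule is not part of the text above and is only a crude estimate intended for short oligonucleotides; the 5 °C tolerance and the example sequences below are arbitrary assumptions chosen for illustration.

```python
def wallace_tm(primer):
    """Crude melting temperature estimate (°C) using the Wallace rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_matched(primer_a, primer_b, max_diff=5):
    """True if the two estimates differ by at most max_diff degrees.

    max_diff is an illustrative tolerance, not a published threshold.
    """
    return abs(wallace_tm(primer_a) - wallace_tm(primer_b)) <= max_diff


# Hypothetical primers, made up for this example
forward = "ATGCATGCATGCATGCATGC"
reverse = "GGCCGGCCGGCCGGCCGG"
print(wallace_tm(forward), wallace_tm(reverse), tm_matched(forward, reverse))
```

Dedicated primer design tools use nearest-neighbour thermodynamic models, which are considerably more accurate than this estimate.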

Additionally, primer sequences need to be chosen to uniquely select for a region of DNA, avoiding the possibility of hybridization to a similar sequence nearby. A commonly used method for selecting a primer site is BLAST search, whereby all the possible regions to which a primer may bind can be seen. Both the nucleotide sequence as well as the primer itself can be BLAST searched. The free NCBI tool Primer-BLAST integrates primer design and BLAST search into one application, [3] as do commercial software products such as ePrime and Beacon Designer. Computer simulations of theoretical PCR results (Electronic PCR) may be performed to assist in primer design by giving melting and annealing temperatures, etc. [4]

As of 2014, many online tools are freely available for primer design, some of which focus on specific applications of PCR. Primers with high specificity for a subset of DNA templates in the presence of many similar variants can be designed using DECIPHER.

Selecting a specific region of DNA for primer binding requires some additional considerations. Regions high in mononucleotide and dinucleotide repeats should be avoided, as loop formation can occur and contribute to mishybridization. Primers should not easily anneal with other primers in the mixture; this phenomenon can lead to the production of 'primer dimer' products contaminating the end solution. Primers should also not anneal strongly to themselves, as internal hairpins and loops could hinder the annealing with the template DNA.
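Two of these checks are easy to sketch in code: flagging long mononucleotide runs, and flagging primer pairs whose 3′ ends are complementary to each other. The thresholds (runs longer than 4 bases, a 4-base 3′ overlap) and the function names are assumptions made for this illustration, not established cut-offs.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_long_run(primer, max_run=4):
    """True if any base repeats more than max_run times in a row."""
    pattern = r"(.)\1{%d,}" % max_run
    return re.search(pattern, primer.upper()) is not None


def three_prime_overlap(primer_a, primer_b, length=4):
    """True if the last `length` bases of the two primers can base pair.

    Complementary 3' ends let each primer prime on the other, which is
    one way primer dimers arise.
    """
    tail_a = primer_a.upper()[-length:]
    tail_b = primer_b.upper()[-length:]
    return tail_a == reverse_complement(tail_b)


print(has_long_run("ACGTTTTTTAGC"))                  # True
print(three_prime_overlap("GCGCAAGG", "ATATCCTT"))   # True
```

A fuller check would also score partial and internal complementarity, including hairpins within a single primer, which this sketch does not attempt.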

When designing primers, additional nucleotide bases can be added to the back ends of each primer, resulting in a customized cap sequence on each end of the amplified region. One application for this practice is for use in TA cloning, a special subcloning technique similar to PCR, where efficiency can be increased by adding AG tails to the 5′ and the 3′ ends. [5]

Degenerate primers

Some situations may call for the use of degenerate primers. These are mixtures of primers that are similar, but not identical. These may be convenient when amplifying the same gene from different organisms, as the sequences are probably similar but not identical. This technique is useful because the genetic code itself is degenerate, meaning several different codons can code for the same amino acid. This allows different organisms to have significantly different genetic sequences that code for a highly similar protein. For this reason, degenerate primers are also used when primer design is based on protein sequence, as the specific sequence of codons is not known. Therefore, the primer sequence corresponding to the amino acid isoleucine might be "ATH", where A stands for adenine, T for thymine, and H for adenine, thymine, or cytosine, according to the genetic code for each codon, using the IUPAC symbols for degenerate bases. Degenerate primers may not perfectly hybridize with a target sequence, which can greatly reduce the specificity of the PCR amplification.

Degenerate primers are widely used and extremely useful in the field of microbial ecology. They allow for the amplification of genes from thus far uncultivated microorganisms or allow the recovery of genes from organisms where genomic information is not available. Usually, degenerate primers are designed by aligning gene sequences found in GenBank. Differences among sequences are accounted for by using IUPAC degeneracies for individual bases. PCR primers are then synthesized as a mixture of primers corresponding to all permutations of the codon sequence.
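The mixture that a degenerate primer stands for can be enumerated by expanding each IUPAC symbol into the bases it represents. The sketch below uses the standard IUPAC nucleotide codes; the function name is made up for this example.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one represents
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[symbol] for symbol in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


# The isoleucine example from the text
print(expand_degenerate("ATH"))  # ['ATA', 'ATC', 'ATT']
```

The number of sequences grows multiplicatively with each degenerate position, which is one reason highly degenerate primers lose specificity.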

2. Size Separation by Gel Electrophoresis

In the second step, the chain-terminated oligonucleotides are separated by size via gel electrophoresis. In gel electrophoresis, DNA samples are loaded into one end of a gel matrix, and an electric current is applied. DNA is negatively charged, so the oligonucleotides will be pulled toward the positive electrode on the opposite side of the gel. Because all DNA fragments have the same charge per unit of mass, the speed at which the oligonucleotides move will be determined only by size. The smaller a fragment is, the less friction it will experience as it moves through the gel, and the faster it will move. As a result, the oligonucleotides will be arranged from smallest to largest, reading the gel from bottom to top.

In manual Sanger sequencing, the oligonucleotides from each of the four PCR reactions are run in four separate lanes of a gel. This allows the user to know which oligonucleotides correspond to each ddNTP.

In automated Sanger sequencing, all oligonucleotides are run in a single capillary gel electrophoresis within the sequencing machine.
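Reading a manual sequencing gel amounts to sorting the bands from all four lanes by fragment length and noting which lane each band is in. The sketch below assumes a simplified data layout invented for this example: a dictionary mapping each ddNTP base to the fragment lengths seen in its lane, counted in bases added after the primer.

```python
def read_gel(lanes):
    """Reconstruct the synthesized sequence from four ddNTP lanes.

    lanes maps a base ("A", "C", "G" or "T") to the list of fragment
    lengths observed in that lane.
    """
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    bands.sort()  # shortest fragments run farthest, so this reads bottom to top
    return "".join(base for _, base in bands)


# Hypothetical band positions for a six-base read
lanes = {"A": [2, 5], "C": [3], "G": [1, 6], "T": [4]}
print(read_gel(lanes))  # GACTAG
```

The result is the sequence of the newly synthesized strand in the 5′ to 3′ direction, which is complementary to the template being sequenced.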


By the end of this section, you will be able to do the following:

  • Explain the process of DNA replication in prokaryotes
  • Discuss the role of different enzymes and proteins in supporting this process

DNA replication has been well studied in prokaryotes primarily because of the small size of the genome and because of the large variety of mutants that are available. E. coli has 4.6 million base pairs in a single circular chromosome and all of it gets replicated in approximately 42 minutes, starting from a single site along the chromosome and proceeding around the circle in both directions. This means that approximately 1000 nucleotides are added per second. Thus, the process is quite rapid and occurs without many mistakes.

DNA replication employs a large number of structural proteins and enzymes, each of which plays a critical role during the process. One of the key players is the enzyme DNA polymerase, also known as DNA pol, which adds nucleotides one-by-one to the growing DNA chain that is complementary to the template strand. The addition of nucleotides requires energy; this energy is obtained from the deoxynucleoside triphosphates dATP, dGTP, dTTP and dCTP. Like ATP, these dNTPs are high-energy molecules that serve both as the source of DNA nucleotides and the source of energy to drive the polymerization. When the bond between the phosphates is "broken," the energy released is used to form the phosphodiester bond between the incoming nucleotide and the growing chain. In prokaryotes, three main types of polymerases are known: DNA pol I, DNA pol II, and DNA pol III. It is now known that DNA pol III is the enzyme required for DNA synthesis; DNA pol I is an important accessory enzyme in DNA replication and, along with DNA pol II, is primarily required for repair.

How does the replication machinery know where to begin? It turns out that there are specific nucleotide sequences called origins of replication where replication begins. In E. coli, which has a single origin of replication on its one chromosome (as do most prokaryotes), this origin of replication is approximately 245 base pairs long and is rich in AT sequences. The origin of replication is recognized by certain proteins that bind to this site. An enzyme called helicase unwinds the DNA by breaking the hydrogen bonds between the nitrogenous base pairs. ATP hydrolysis is required for this process. As the DNA opens up, Y-shaped structures called replication forks are formed. Two replication forks are formed at the origin of replication and these get extended bi-directionally as replication proceeds. Single-strand binding proteins coat the single strands of DNA near the replication fork to prevent the single-stranded DNA from winding back into a double helix.

DNA polymerase has two important restrictions. First, it is able to add nucleotides only in the 5′ to 3′ direction (a new DNA strand can be extended only in this direction). Second, it requires a free 3′-OH group to which it can add nucleotides by forming a phosphodiester bond between the 3′-OH end and the 5′ phosphate of the next nucleotide. This essentially means that it cannot add nucleotides if a free 3′-OH group is not available. Then how does it add the first nucleotide? The problem is solved with the help of a primer that provides the free 3′-OH end. Another enzyme, RNA primase, synthesizes an RNA segment that is about five to ten nucleotides long and complementary to the template DNA. Because this sequence primes the DNA synthesis, it is appropriately called the primer. DNA polymerase can now extend this RNA primer, adding nucleotides one by one that are complementary to the template strand (Figure 1).
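
The primer requirement can be illustrated with a small Python sketch. This is a toy model, not a description of the enzyme's mechanism: the function name `extend_from_primer`, the example template, and the three-nucleotide primer are all hypothetical, and the RNA primer is written in lowercase only to make it visible in the output.

```python
# Watson-Crick pairing used to pick the complementary DNA nucleotide.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_from_primer(template_3to5, primer):
    """Extend a primer 5'->3' along a template that is written 3'->5'.

    The primer is assumed to pair with the first len(primer) template
    bases; one DNA nucleotide at a time is then added to its 3' end.
    """
    if not primer:
        # No primer means no free 3'-OH, so extension cannot start.
        raise ValueError("DNA polymerase needs a primer with a free 3'-OH")
    new_strand = primer
    for base in template_3to5[len(primer):]:
        new_strand += DNA_COMPLEMENT[base]  # added at the growing 3' end
    return new_strand


template = "TACGGATCCA"  # hypothetical template, read 3'->5'
rna_primer = "aug"       # hypothetical RNA primer pairing with TAC
print(extend_from_primer(template, rna_primer))  # augCCTAGGT
```

Calling the function with an empty primer raises an error, which mirrors the restriction described above: without a free 3′-OH there is nothing to extend.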

Art Connection

Figure 1. A replication fork is formed when helicase separates the DNA strands at the origin of replication. The DNA tends to become more highly coiled ahead of the replication fork. Topoisomerase breaks and reforms DNA’s phosphate backbone ahead of the replication fork, thereby relieving the pressure that results from this “supercoiling.” Single-strand binding proteins bind to the single-stranded DNA to prevent the helix from re-forming. Primase synthesizes an RNA primer. DNA polymerase III uses this primer to synthesize the daughter DNA strand. On the leading strand, DNA is synthesized continuously, whereas on the lagging strand, DNA is synthesized in short stretches called Okazaki fragments. DNA polymerase I replaces the RNA primer with DNA. DNA ligase seals the gaps between the Okazaki fragments, joining the fragments into a single DNA molecule. (credit: modification of work by Mariana Ruiz Villareal)

Question: You isolate a cell strain in which the joining of Okazaki fragments is impaired and suspect that a mutation has occurred in an enzyme found at the replication fork. Which enzyme is most likely to be mutated?

The replication fork moves at the rate of 1000 nucleotides per second. Topoisomerase prevents the over-winding of the DNA double helix ahead of the replication fork as the DNA is opening up; it does so by causing temporary nicks in the DNA helix and then resealing them. Because DNA polymerase can only extend in the 5′ to 3′ direction, and because the DNA double helix is antiparallel, there is a slight problem at the replication fork. The two template DNA strands have opposing orientations: one strand is in the 5′ to 3′ direction and the other is oriented in the 3′ to 5′ direction. Only one new DNA strand, the one that is complementary to the 3′ to 5′ parental DNA strand, can be synthesized continuously towards the replication fork. This continuously synthesized strand is known as the leading strand. The other strand, complementary to the 5′ to 3′ parental DNA, is extended away from the replication fork in small fragments known as Okazaki fragments, each requiring a primer to start the synthesis. New primer segments are laid down in the direction of the replication fork, but each points away from it. The strand with the Okazaki fragments is known as the lagging strand. (Okazaki fragments are named after the Japanese scientist who first discovered them.)
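
As a rough back-of-the-envelope check on the fork rate quoted above, the following Python sketch estimates how long two forks moving in opposite directions would need to copy a circular chromosome. The genome size of 4.6 million base pairs is an assumption brought in for illustration (an approximate figure for E. coli); it does not come from the text, and the function name is hypothetical.

```python
def replication_time_seconds(genome_bp, fork_rate_nt_per_s=1000, n_forks=2):
    """Time to copy a chromosome when n_forks forks each copy
    fork_rate_nt_per_s nucleotides per second."""
    return genome_bp / (fork_rate_nt_per_s * n_forks)


GENOME_BP = 4_600_000  # assumed chromosome size, not taken from the text
seconds = replication_time_seconds(GENOME_BP)
print(f"{seconds:.0f} s (about {seconds / 60:.0f} min)")
```

Under these assumptions the estimate comes out at a little under 40 minutes; a different genome size or fork rate changes the result proportionally.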

The leading strand can be extended from a single primer, whereas the lagging strand needs a new primer for each of the short Okazaki fragments. The overall direction of the lagging strand will be 3′ to 5′, and that of the leading strand 5′ to 3′. A ring-shaped protein called the sliding clamp binds to the DNA and holds the DNA polymerase in place as it continues to add nucleotides. As synthesis proceeds, the RNA primers are replaced by DNA. The primers are removed by the exonuclease activity of DNA pol I, which uses the DNA behind the RNA as its own primer and fills in the gaps left by removal of the RNA nucleotides by adding DNA nucleotides. The nicks that remain between the newly synthesized DNA (that replaced the RNA primer) and the previously synthesized DNA are sealed by the enzyme DNA ligase, which catalyzes the formation of phosphodiester linkages between the 3′-OH end of one fragment and the 5′ phosphate end of the next.
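
The sequence of events on the lagging strand (priming, extension, primer replacement, ligation) can be sketched in Python as a toy string model. Everything here is illustrative: the function names, the fragment and primer lengths, and the template are hypothetical, RNA is shown in lowercase, and the order in which real fragments are made is ignored.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "u", "T": "a", "G": "c", "C": "g"}  # lowercase = RNA


def synthesize_fragment(piece_3to5, primer_len):
    """Primase lays down an RNA primer, then the polymerase extends it."""
    primer = "".join(RNA_COMPLEMENT[b] for b in piece_3to5[:primer_len])
    dna = "".join(DNA_COMPLEMENT[b] for b in piece_3to5[primer_len:])
    return primer + dna


def replace_primer(fragment, piece_3to5, primer_len):
    """Stand-in for DNA pol I: remove the RNA and fill the gap with DNA."""
    fill = "".join(DNA_COMPLEMENT[b] for b in piece_3to5[:primer_len])
    return fill + fragment[primer_len:]


def replicate_lagging(template_3to5, fragment_len, primer_len):
    """Return the primed fragments and the final ligated DNA strand."""
    pieces = [template_3to5[i:i + fragment_len]
              for i in range(0, len(template_3to5), fragment_len)]
    fragments = [synthesize_fragment(p, primer_len) for p in pieces]
    repaired = [replace_primer(f, p, primer_len)
                for f, p in zip(fragments, pieces)]
    return fragments, "".join(repaired)  # joining stands in for DNA ligase


fragments, strand = replicate_lagging("TACGGATCCAGT", fragment_len=6, primer_len=2)
print(fragments)  # ['auGCCT', 'agGTCA']
print(strand)     # ATGCCTAGGTCA
```

The final strand contains no lowercase letters, reflecting the point made above that all RNA primers are replaced by DNA before the nicks are sealed.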

Once the chromosome has been completely replicated, the two DNA copies move into two different cells during cell division.

The process of DNA replication can be summarized as follows:

  1. DNA unwinds at the origin of replication.
  2. Helicase opens up the DNA, forming replication forks; these are extended bidirectionally.
  3. Single-strand binding proteins coat the DNA around the replication fork to prevent rewinding of the DNA.
  4. Topoisomerase binds at the region ahead of the replication fork to prevent supercoiling.
  5. Primase synthesizes RNA primers complementary to the DNA strand.
  6. DNA polymerase III starts adding nucleotides to the 3′-OH end of the primer.
  7. Elongation of both the lagging and the leading strand continues.
  8. RNA primers are removed by exonuclease activity.
  9. Gaps are filled by DNA pol I by adding dNTPs.
  10. The gap between the two DNA fragments is sealed by DNA ligase, which helps in the formation of phosphodiester bonds.

The table below summarizes the enzymes involved in prokaryotic DNA replication and the functions of each.

Prokaryotic DNA Replication: Enzymes and Their Function

| Enzyme/protein | Specific function |
| --- | --- |
| DNA pol I | Removes RNA primer and replaces it with newly synthesized DNA |
| DNA pol III | Main enzyme that adds nucleotides in the 5′-3′ direction |
| Helicase | Opens the DNA helix by breaking hydrogen bonds between the nitrogenous bases |
| Ligase | Seals the gaps between the Okazaki fragments to create one continuous DNA strand |
| Primase | Synthesizes RNA primers needed to start replication |
| Sliding clamp | Helps to hold the DNA polymerase in place when nucleotides are being added |
| Topoisomerase | Helps relieve the strain on DNA when unwinding by causing breaks, and then resealing the DNA |
| Single-strand binding proteins (SSB) | Bind to single-stranded DNA to prevent the DNA from rewinding |

Section Summary

Replication in prokaryotes starts from a sequence found on the chromosome called the origin of replication, the point at which the DNA opens up. Helicase opens up the DNA double helix, resulting in the formation of the replication fork. Single-strand binding proteins bind to the single-stranded DNA near the replication fork to keep the fork open. Primase synthesizes an RNA primer to initiate synthesis by DNA polymerase, which can add nucleotides only to the 3′ end of a previously synthesized primer strand. Both new DNA strands grow according to their respective 5′-3′ directions. One strand is synthesized continuously in the direction of the replication fork; this is called the leading strand. The other strand is synthesized in a direction away from the replication fork, in short stretches of DNA known as Okazaki fragments. This strand is known as the lagging strand. Once replication is completed, the RNA primers are replaced by DNA nucleotides and the DNA is sealed with DNA ligase, which creates phosphodiester bonds between the 3′-OH of one end and the 5′ phosphate of the other strand.


The classical chain-termination method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and modified di-deoxynucleotide triphosphates (ddNTPs), the latter of which terminate DNA strand elongation. These chain-terminating nucleotides lack a 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, causing DNA polymerase to cease extension of DNA when a modified ddNTP is incorporated. The ddNTPs may be radioactively or fluorescently labelled for detection in automated sequencing machines.

The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP), so four separate reactions are needed to test all four ddNTPs. The deoxynucleotide concentration should be approximately 100-fold higher than that of the corresponding dideoxynucleotide (e.g. 0.5 mM dTTP : 0.005 mM ddTTP) to allow enough fragments to be produced while still extending across the complete sequence (but the concentration of ddNTP also depends on the desired length of sequence). [2] Following rounds of template DNA extension from the bound primer, the resulting DNA fragments are heat denatured and separated by size using gel electrophoresis. This is frequently performed using a denaturing polyacrylamide-urea gel with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands may then be visualized by autoradiography or UV light and the DNA sequence can be directly read off the X-ray film or gel image. In the original publication of 1977, [2] the formation of base-paired loops of ssDNA was a cause of serious difficulty in resolving bands at some locations.
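
To see why the dNTP:ddNTP ratio controls fragment length, the Python sketch below works through the arithmetic under a strong simplifying assumption: the polymerase is taken to incorporate a dNTP and the matching ddNTP with equal efficiency, so the chance of termination at each matching position equals the ddNTP share of the pool. Real polymerases do not necessarily behave this way, and the function names are hypothetical.

```python
def termination_probability(dntp_mM, ddntp_mM):
    """Chance that a matching position receives a ddNTP instead of a dNTP,
    assuming both are incorporated with equal efficiency."""
    return ddntp_mM / (dntp_mM + ddntp_mM)


def fraction_terminated_at(k, p):
    """Fraction of chains that stop at the k-th matching position:
    they must pass k - 1 positions and terminate at the k-th."""
    return (1 - p) ** (k - 1) * p


p = termination_probability(0.5, 0.005)  # concentrations from the example above
print(f"p = {p:.4f}")
for k in (1, 50, 200):
    print(k, f"{fraction_terminated_at(k, p):.5f}")
```

With a 100:1 ratio the per-position termination probability is about 1%, so chains ending at distant positions are still produced, only in smaller amounts. Raising the ddNTP share shifts the distribution toward shorter fragments, which is consistent with the remark that the ddNTP concentration depends on the desired read length.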

When X-ray film is exposed to the gel, the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among the four lanes, from bottom to top, are then used to read the DNA sequence.
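
Reading the gel from bottom to top can be sketched in Python as follows. This is an idealized model under stated assumptions: every possible termination product is present, band position depends only on fragment length, and the template, primer length, and function names are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_lanes(template_3to5, primer_len):
    """Fragment lengths expected in each of the four ddNTP reactions.

    A fragment of length n appears in lane X when the n-th nucleotide of
    the new strand is X, because ddX terminates the chain there.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, tbase in enumerate(template_3to5[primer_len:], start=primer_len + 1):
        lanes[DNA_COMPLEMENT[tbase]].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest (bottom of the gel) to longest (top)."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


lanes = sanger_lanes("TACGGATCCA", primer_len=3)
print(lanes)            # {'A': [7], 'C': [4, 5], 'G': [8, 9], 'T': [6, 10]}
print(read_gel(lanes))  # CCTAGGT
```

The sequence read this way is that of the newly synthesized strand downstream of the primer, which is complementary to the template.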

Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5' end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation. The later development by Leroy Hood and coworkers [4] [5] of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.

Chain-termination methods have greatly simplified DNA sequencing. For example, chain-termination-based kits are commercially available that contain the reagents needed for sequencing, pre-aliquoted and ready to use. Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.

Dye-terminator sequencing

Dye-terminator sequencing utilizes labelling of the chain terminator ddNTPs, which permits sequencing in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with fluorescent dyes, each of which emit light at different wavelengths.
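
Because each terminator carries its own dye, a single lane or capillary yields four fluorescence channels, and a base can be called at each peak from the strongest channel. The Python sketch below shows only that idea; the intensity values are invented, and real base callers also handle peak spacing, dye mobility shifts, and noise.

```python
def call_bases(trace):
    """Call one base per peak by picking the dye channel with the
    strongest signal. `trace` is a list of {base: intensity} dicts."""
    return "".join(max(peak, key=peak.get) for peak in trace)


# Hypothetical four-channel intensities for four consecutive peaks.
trace = [
    {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02},
    {"A": 0.10, "C": 0.70, "G": 0.15, "T": 0.05},
    {"A": 0.02, "C": 0.08, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.10, "T": 0.80},
]
print(call_bases(trace))  # ACGT
```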

Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay in automated sequencing. Its limitations include dye effects due to differences in the incorporation of the dye-labelled chain terminators into the DNA fragment, resulting in unequal peak heights and shapes in the electronic DNA sequence trace chromatogram after capillary electrophoresis.

This problem has been addressed with the use of modified DNA polymerase enzyme systems and dyes that minimize incorporation variability, as well as methods for eliminating "dye blobs". The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, was used for the vast majority of sequencing projects until the introduction of next generation sequencing.

Automation and sample preparation

Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch. Batch runs may occur up to 24 times a day. DNA sequencers separate strands by size (or length) using capillary electrophoresis, detect and record dye fluorescence, and output data as fluorescent peak trace chromatograms. Sequencing reactions (thermocycling and labelling), cleanup and re-suspension of samples in a buffer solution are performed separately, before loading samples onto the sequencer. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programs score the quality of each peak and remove low-quality base peaks (which are generally located at the ends of the sequence). The accuracy of such algorithms is inferior to visual examination by a human operator, but is adequate for automated processing of large sequence data sets.

Challenges

Common challenges of DNA sequencing with the Sanger method include poor quality in the first 15-40 bases of the sequence due to primer binding and deteriorating quality of sequencing traces after 700-900 bases. Base calling software such as Phred typically provides an estimate of quality to aid in trimming of low-quality regions of sequences. [6] [7]
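
Quality-based trimming of the kind described above can be sketched in Python. The relation between a Phred-style score Q and the estimated error probability, P = 10^(-Q/10), is standard; the trimming rule below (drop bases from each end until one meets a threshold) is a deliberately simple stand-in and is not the algorithm used by Phred or any particular package. The read, scores, and threshold are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred-style quality score to an error probability."""
    return 10 ** (-q / 10)


def trim_low_quality(seq, quals, min_q=20):
    """Strip bases from both ends until a base with quality >= min_q is met."""
    start = 0
    while start < len(seq) and quals[start] < min_q:
        start += 1
    end = len(seq)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


read = "ACGTACGT"                          # hypothetical read
quals = [5, 10, 30, 35, 40, 30, 12, 8]     # hypothetical per-base scores
print(trim_low_quality(read, quals))       # GTAC
print(phred_to_error(20))                  # 1 error in 100 calls
```

A threshold of Q20 corresponds to an estimated 1% error rate per base; stricter thresholds keep less of each read.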

In cases where DNA fragments are cloned before sequencing, the resulting sequence may contain parts of the cloning vector. In contrast, PCR-based cloning and next-generation sequencing technologies based on pyrosequencing often avoid using cloning vectors. Recently, one-step Sanger sequencing (combined amplification and sequencing) methods such as Ampliseq and SeqSharp have been developed that allow rapid sequencing of target genes without cloning or prior amplification. [8] [9]

Current methods can directly sequence only relatively short (300-1000 nucleotides long) DNA fragments in a single reaction. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide.

Microfluidic Sanger sequencing is a lab-on-a-chip application for DNA sequencing, in which the Sanger sequencing steps (thermal cycling, sample purification, and capillary electrophoresis) are integrated on a wafer-scale chip using nanoliter-scale sample volumes. This technology generates long and accurate sequence reads, while obviating many of the significant shortcomings of the conventional Sanger method (e.g. high consumption of expensive reagents, reliance on expensive equipment, personnel-intensive manipulations, etc.) by integrating and automating the Sanger sequencing steps.

In its modern inception, high-throughput genome sequencing involves fragmenting the genome into small single-stranded pieces, followed by amplification of the fragments by polymerase chain reaction (PCR). Adopting the Sanger method, each DNA fragment is irreversibly terminated with the incorporation of a fluorescently labeled dideoxy chain-terminating nucleotide, thereby producing a DNA “ladder” of fragments that each differ in length by one base and bear a base-specific fluorescent label at the terminal base. Amplified base ladders are then separated by capillary array electrophoresis (CAE) with automated, in situ “finish-line” detection of the fluorescently labeled ssDNA fragments, which provides an ordered sequence of the fragments. These sequence reads are then computer assembled into overlapping or contiguous sequences (termed "contigs") which resemble the full genomic sequence once fully assembled. [10]
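
The assembly of overlapping reads into contigs can be illustrated with a greedy overlap-merge sketch in Python. This is a teaching toy under stated assumptions: reads are error-free, all come from the same strand, and the best overlap is always the correct one. Production assemblers use considerably more robust methods, and the reads and function names here are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["ATGCCTAG", "CTAGGTCA", "GTCATTGA"]  # hypothetical reads
print(greedy_assemble(reads))  # ['ATGCCTAGGTCATTGA']
```

Reads that share no sufficiently long overlap are returned as separate contigs rather than being forced together.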

Sanger methods achieve read lengths of approximately 800 bp (typically 500-600 bp with non-enriched DNA). These longer read lengths offer significant advantages over other sequencing methods, especially for sequencing repetitive regions of the genome. Short-read sequence data are particularly challenging when sequencing new genomes (de novo) and highly rearranged genome segments, typically those seen in cancer genomes or in regions of chromosomes that exhibit structural variation. [11]

Applications of microfluidic sequencing technologies

Other useful applications of DNA sequencing include single nucleotide polymorphism (SNP) detection, single-strand conformation polymorphism (SSCP) heteroduplex analysis, and short tandem repeat (STR) analysis. Resolving DNA fragments according to differences in size and/or conformation is the most critical step in studying these features of the genome. [10]

Device design

The sequencing chip has a four-layer construction, consisting of three 100-mm-diameter glass wafers (on which device elements are microfabricated) and a polydimethylsiloxane (PDMS) membrane. Reaction chambers and capillary electrophoresis channels are etched between the top two glass wafers, which are thermally bonded. Three-dimensional channel interconnections and microvalves are formed by the PDMS and bottom manifold glass wafer.

The device consists of three functional units, each corresponding to the Sanger sequencing steps. The thermal cycling (TC) unit is a 250-nanoliter reaction chamber with integrated resistive temperature detector, microvalves, and a surface heater. Movement of reagent between the top all-glass layer and the lower glass-PDMS layer occurs through 500-μm-diameter via-holes. After thermal-cycling, the reaction mixture undergoes purification in the capture/purification chamber, and then is injected into the capillary electrophoresis (CE) chamber. The CE unit consists of a 30-cm capillary which is folded into a compact switchback pattern via 65-μm-wide turns.

Sequencing chemistry

Platforms

The Apollo 100 platform (Microchip Biotechnologies Inc., Dublin, CA) [12] integrates the first two Sanger sequencing steps (thermal cycling and purification) in a fully automated system. The manufacturer claims that samples are ready for capillary electrophoresis within three hours of the sample and reagents being loaded into the system. The Apollo 100 platform requires sub-microliter volumes of reagents.

Comparisons to other sequencing techniques

Performance values for genome sequencing technologies including Sanger methods and next-generation methods [11] [13] [14]
| Technology | Number of lanes | Injection volume (nL) | Analysis time | Average read length | Throughput (including analysis; Mb/h) | Gel pouring | Lane tracking |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Slab gel | 96 | 500–1000 | 6–8 hours | 700 bp | 0.0672 | Yes | Yes |
| Capillary array electrophoresis | 96 | 1–5 | 1–3 hours | 700 bp | 0.166 | No | No |
| Microchip | 96 | 0.1–0.5 | 6–30 minutes | 430 bp | 0.660 | No | No |
| 454/Roche FLX (2008) | | < 0.001 | 4 hours | 200–300 bp | 20–30 | | |
| Illumina/Solexa (2008) | | | 2–3 days | 30–100 bp | 20 | | |
| ABI/SOLiD (2008) | | | 8 days | 35 bp | 5–15 | | |
| Illumina MiSeq (2019) | | | 1–3 days | 2x75–2x300 bp | 170–250 | | |
| Illumina NovaSeq (2019) | | | 1–2 days | 2x50–2x150 bp | 22,000–67,000 | | |
| Ion Torrent Ion 530 (2019) | | | 2.5–4 hours | 200–600 bp | 110–920 | | |
| BGI MGISEQ-T7 (2019) | | | 1 day | 2x150 bp | 250,000 | | |
| Pacific Biosciences SMRT (2019) | | | 10–20 hours | 10–30 kb | 1,300 | | |
| Oxford Nanopore MinIon (2019) | | | 3 days | 13–20 kb [15] | 700 | | |

The ultimate goal of high-throughput sequencing is to develop systems that are low-cost and extremely efficient at obtaining extended (longer) read lengths. Longer read lengths from each single electrophoretic separation substantially reduce the cost associated with de novo DNA sequencing and the number of templates needed to sequence DNA contigs at a given redundancy. Microfluidics may allow for faster, cheaper and easier sequence assembly. [10]
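
To put the throughput column of the comparison table in perspective, the short Python sketch below converts a throughput in Mb/h into the instrument time needed to reach a given redundancy (coverage). The 4.6 Mb genome size and the 8-fold coverage are hypothetical inputs chosen for illustration; only the throughput values are taken from the table, and the function name is an assumption.

```python
def sequencing_hours(genome_mb, coverage, throughput_mb_per_h):
    """Instrument hours needed to generate genome_mb * coverage of sequence."""
    return genome_mb * coverage / throughput_mb_per_h


GENOME_MB = 4.6  # hypothetical genome size, not from the text
COVERAGE = 8     # hypothetical redundancy

# Throughput values (Mb/h) as listed in the comparison table.
for name, rate in [("Slab gel", 0.0672),
                   ("Capillary array electrophoresis", 0.166),
                   ("Microchip", 0.660)]:
    hours = sequencing_hours(GENOME_MB, COVERAGE, rate)
    print(f"{name}: {hours:.0f} h")
```

The calculation ignores sample preparation and assumes the quoted throughput is sustained, so it is best read as a relative comparison between platforms.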