What is a pseudopalindrome?


I stumbled upon this word in a webpage by bio.libretexts.org:

The dam-methylase of E. coli recognizes the tetranucleotide GATC in DNA and transfers a methyl group (from S‑adenosyl methionine) to the amino group at position 6 of the adenine in that sequence. Note that GATC is a pseudopalindrome, so both strands read the same for these four nucleotides in DNA.

What does it mean? How is it different from a palindromic sequence?

I've searched Google, NCBI and Google Books but couldn't find anything.


In DNA there are four nucleotides or "bases", each of which can be matched with a complementary base on the partner chain in the double helix. Thus:

Adenine (A) and Thymine (T) are complementary and Cytosine (C) and Guanine (G) are complementary.

So, a nucleotide sequence is said to be a palindrome if it has an even number of base pairs and is equal to the reverse of its complementary sequence.

For example, in a single strand of DNA the sequence of bases CCATTAATGG is palindromic because the sequence of bases in the complementary strand is GGTAATTACC, its reverse.

A pseudopalindrome is a DNA sequence with an odd number of base pairs yielding a symmetrical complement except at the central base-pair. For example, the DNA sequence ACCTGGT is pseudopalindromic, because its complement on the other strand is TGGACCA, which is its reverse except for the central element.

[There may be some confusion in the literature. The example cited in the question, GATC, is a palindrome, having CTAG as its complement, not a pseudopalindrome.]
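The two definitions above can be checked mechanically. The following is a minimal sketch in Python, assuming the strict definitions given in this answer (even length and equal to its reverse complement for a palindrome; odd length and symmetric everywhere except the central base for a pseudopalindrome). The function names are illustrative only.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the partner strand, read in the same 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def classify(seq: str) -> str:
    """Classify a sequence as 'palindrome', 'pseudopalindrome' or 'neither'."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    if len(seq) % 2 == 0:
        # Even length: every base must match its symmetry mate.
        return "palindrome" if seq == rc else "neither"
    # Odd length: the central base can never equal its own complement,
    # so it is excluded from the comparison.
    mid = len(seq) // 2
    flanks_match = seq[:mid] == rc[:mid] and seq[mid + 1:] == rc[mid + 1:]
    return "pseudopalindrome" if flanks_match else "neither"


for example in ("CCATTAATGG", "ACCTGGT", "GATC"):
    print(example, "->", classify(example))
```

Under these definitions the sketch labels CCATTAATGG and GATC as palindromes and ACCTGGT as a pseudopalindrome, in line with the examples in the text.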


Griffith (1928) was a microbiologist working with avirulent strains of Pneumococcus; infection of mice with such strains does not kill the mice. He showed that these avirulent strains could be transformed into virulent strains, that is, infection with the transformed bacteria kills mice (Fig. 2.1.A.). Smooth (S) strains produce a capsular polysaccharide on their surface, which allows the pneumococci to escape destruction by the mouse, and the infection proceeds, i.e. they are virulent. This polysaccharide can be type I, II, or III. Virulent S strains can be killed by heat (i.e., sterilization) and, of course, the dead bacteria can no longer infect the mouse.

The smooth strains can give rise to variants that do not produce the polysaccharide. Colonies of these bacteria have a rough (R) appearance, but more importantly they are not immune to the mouse's defenses, and cannot mount a lethal infection, i.e. they are avirulent.

When heat-killed S bacteria of type III are co-inoculated with live R (avirulent) bacteria derived from type II, the mouse dies from the productive infection. This shows that the live R bacteria had acquired something from the dead S bacteria that allowed the R bacteria to become virulent! The virulent bacteria recovered from the mixed infection now had a smooth phenotype, and made type III capsular polysaccharide. They had been transformed from rough to smooth, from type II to type III. Transformation simply means that a character had been changed by some treatment of the organism.

In 1944, Avery, McCarty and MacLeod showed that the transforming principle is DNA. Earlier work from Friedrich Miescher (around 1890 to 1900) showed that chromosomes are nucleic acid and protein. Avery, McCarty and MacLeod used biochemical fractionation of the bacteria to find out what chemical entity was capable of transforming avirulent R into virulent S bacteria, using the pneumococcus transformation assay of Griffith. Given the chromosomal theory of inheritance, it was thought most likely that it would be protein or nucleic acid. At this time, nucleic acids like DNA were thought to be short oligonucleotides (four or five nucleotides long), functioning primarily in phosphate storage. Thus proteins, with their greater complexity, were the favored candidate for the transforming entity, at least before the experiment was done.

Different biochemical fractions of the dead S bacteria were added to the live R bacteria before infection, testing to see which fraction transformed avirulent R into virulent S bacteria. The surprising result was that DNA, not protein, was capable of transforming the bacteria. The carbohydrate fraction did not transform, even though it is a polysaccharide that makes the bacteria smooth, or S. Neither did the protein fraction, even though most enzymes are proteins, and proteins are a major component of chromosomes. But the DNA fraction did transform, showing that it is the "transforming principle" or the chemical entity capable of changing the bacteria from rough to smooth.

Figure 2.1. DNA is the transforming principle, i.e. the chemical entity that can confer a new phenotype when introduced into bacteria. A. The transformation experiments of Griffith. B. The chemical fractionation and transformation experiments of Avery, McCarty and MacLeod.

At the time it was thought that DNA did not have sufficient complexity to be the genetic material. However, we now know that native DNA is a very long polymer and these earlier ideas about DNA being very short were derived from work with highly degraded samples.


Protein and Nucleic Acid Complexes

A LEUCINE ZIPPER

One form of DNA-binding motif is referred to as a leucine zipper. An example of a leucine zipper is a domain of protein GCN4. GCN4 is a 281-amino acid polypeptide, of which only the last 33 residues are required for dimerization. The C-terminal end of GCN4 binds to the major groove of B-DNA at a recognition site involved in the control of amino acid biosynthesis in yeast (Hinnebusch, 1984). The crystal structure of this C-terminal segment was first determined without any DNA present (O'Shea et al., 1991). Later, a crystal structure was determined with the protein motif bound to a fragment of B-DNA (Keller et al., 1995). The resemblance of the motif to a common zipper is striking and the motif is often referred to as bZip.

A stereodiagram of a leucine zipper protein–DNA complex is shown in Fig. 10.5. The bZip motif, like the helix–turn–helix motif, interacts with DNA as a homodimer with twofold rotational symmetry. The DNA recognition element is a 20-mer. This site is a pseudopalindrome of two 4-base pair half-sites that overlap at a central G–C base pair. Each of these half-sites is contacted by one protein monomer. The DNA conformation in the complex is straight B-form DNA. No deviations are seen in its conformation along the portion that binds protein.

Fig. 10.5. A leucine zipper with a recognition element. The coordinates have an accession code 1YSA and are derived from a crystallographic study.

The protein dimer is formed by interaction of the side chains in the coiled coil zipper. An N-terminal region has a high composition of basic residues that bind to the DNA in the major groove. The leucine zipper sequence is characterized by a crude heptad repeat, (abcdefg)n, with the occurrence of hydrophobic residues at positions a and d. Conserved leucines occur at position d over the length of the helix. In an imperfect fashion, the hydrophobic residues are the teeth of the zipper.
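The heptad repeat can be made concrete with a short Python sketch that labels each residue with its heptad position and reports what sits at positions a and d. The peptide used here is a made-up two-heptad example, not the GCN4 sequence, and the set of residues counted as hydrophobic is a simplifying assumption.

```python
HEPTAD = "abcdefg"
# Assumption: a simple, commonly used set of hydrophobic residues.
HYDROPHOBIC = set("AVILMFWC")


def heptad_positions(seq: str, start: str = "a"):
    """Pair every residue with its heptad letter, starting the register at `start`."""
    offset = HEPTAD.index(start)
    return [(res, HEPTAD[(i + offset) % 7]) for i, res in enumerate(seq.upper())]


def core_residues(seq: str, start: str = "a"):
    """Return the residues that fall on heptad positions a and d."""
    return [res for res, pos in heptad_positions(seq, start) if pos in "ad"]


def hydrophobic_core_fraction(seq: str, start: str = "a") -> float:
    """Fraction of a/d positions occupied by hydrophobic residues."""
    core = core_residues(seq, start)
    if not core:
        return 0.0
    return sum(res in HYDROPHOBIC for res in core) / len(core)


toy_peptide = "VKQLEDKVEELLSK"  # hypothetical sequence, two heptads long
print(core_residues(toy_peptide))
print(hydrophobic_core_fraction(toy_peptide))
```

For a real zipper the register (`start`) has to be chosen from the structure or a prediction, and the fraction will typically fall below 1.0, consistent with the "crude" and "imperfect" repeat described above.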

The protein segment shown in Fig. 10.5 contains 56 amino acids. Remember that this is only a small portion of GCN4. The dimeric form containing both the dimerization domain and the basic DNA-binding region without the cognate DNA is shown in the stereodiagram presented in Fig. 10.6. In the GCN4 peptide structure, the dimerization motif itself is a pair of coiled coil α helices. The amino acid sequence for this domain is as follows:

Fig. 10.6. A leucine zipper with side chains (see legend to Fig. 10.5).

Note the large number of basic residues in the N-terminal segment, from position 227 to 245. In fact, 6 of the 18 residues are basic amino acids. This has been referred to as the basic region, and as seen in Fig. 10.5, it is the portion of the protein in direct contact with the DNA.

The dimerization motif packs with hydrophobic residues near the center of the coiled coil, although such packing is not nearly as systematic as the name “leucine zipper” implies. In fact, as is visible in Fig. 10.6, one of the leucines is located on the surface of the coiled coil. Furthermore, the “teeth” of the zipper do not interleave but rather face each other across the twofold rotation axis of the coiled coil.

The open end of the zipper appears to end near residue a L5. In the α-helix coiled coil, there is a twist to the helical segment leading to a supercoil. In the complex with DNA, this coil twisting no longer occurs. This should be demonstrable with a molecular graphics program. Start with a relatively long line along the helical axis. By moving this line up and down through the coil, it should be possible to demonstrate where the supercoiling of a single α helix appears to end.


Results and discussion

Identification of the I-SspI active site and the PD-(D/E)-XK nuclease fold

Attempts to clone the toxic wild-type I-SspI ORF into a bacterial expression system were unsuccessful. In order to produce protein for crystallization trials (as well as locate active site residues), we decided to identify a catalytically inactive mutant which could be overexpressed in E. coli without compromising its stability or DNA-binding affinity.

A wild-type I-SspI reading frame was synthesized (Supplementary data) and subcloned in a promoterless pUC vector for plasmid amplification (Blue Heron Inc.). This gene was then subcloned into a pET15b expression vector (Novagen Inc.) under a combination of bacterial strains and media conditions that repress transcription. Transformation of this vector into expression strains, including high stringency hosts such as BL21(DE3)pLysS (Novagen), did not produce colonies. Therefore, we decided to use the results of such a transformation as the basis for a screen for inactivating mutations in the endonuclease active site.

We designed a mutagenic strategy to focus on the most ubiquitous known catalytic property of endonucleases: binding of divalent cations in the active site that are required during the reaction (Yang et al, 2006). Reasoning that metal ions are most often bound by aspartate (and less frequently by glutamate) residues, we designed an ‘Asp to Ala’ scan protocol. Primers were designed to mutate each aspartate in the I-SspI ORF (nine positions total). These primers were combined into a single ‘multichange’ mutagenesis PCR (Stratagene Inc.) and the resulting mixture of mutagenic products was directly transformed into the BL21(DE3) E. coli expression strain. The plasmid-encoded I-SspI ORFs from individual colonies were sequenced, and we determined that mutation of a single aspartate residue (D8A) permitted bacterial growth (Supplementary Table S1). We therefore reasoned that Asp 8 is very likely a catalytic residue, although a structural role could not be ruled out.
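The logic of such a scan can be sketched in a few lines of Python that enumerate every single aspartate-to-alanine substitution in a protein sequence, using the same naming style as D8A. This is only an illustration of the enumeration step, not the primer design or PCR protocol used in the study, and the input sequence is a made-up placeholder rather than the I-SspI ORF.

```python
def asp_to_ala_scan(protein: str):
    """Return (mutation_name, mutant_sequence) for every single D->A substitution.

    Residue numbering is 1-based, so an aspartate at index 7 is reported as 'D8A'.
    """
    protein = protein.upper()
    variants = []
    for index, residue in enumerate(protein):
        if residue == "D":
            name = f"D{index + 1}A"
            mutant = protein[:index] + "A" + protein[index + 1:]
            variants.append((name, mutant))
    return variants


# Hypothetical peptide used only to demonstrate the enumeration.
for name, mutant in asp_to_ala_scan("MKTDLEDAD"):
    print(name, mutant)
```

In the experiment described above, the corresponding primers were pooled in one reaction, and the surviving colonies were sequenced to see which substitution allowed growth.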

A subsequent sequence-based search by PSI-BLAST, as well as analyses with sequence/structure threading servers such as 3D-PSSM or Phyre (Kelley et al, 2000), failed to reveal any homologues with known function, although a free-standing hypothetical reading frame was detected in T7 phage (gene 5.3). However, a structure-based sequence comparison server (meta server, http://bioinfo.pl/meta/) indicated a weak match of the first 100 residues of I-SspI against PDB entry 1GEF, with overall 18% sequence identity and a Z-score of 31.5. This structure corresponds to a Holliday junction resolvase (Hjc) from the archaeon Pyrococcus furiosus that contains the ‘PD-(D/E)-XK’ core fold found in most type II restriction endonucleases (Nishino et al, 2001). A sequence alignment with archaeal resolvase enzymes (Figure 1) allowed us to create a homology model of the I-SspI N-terminal core fold. At the time that this study was initially submitted for publication, this structural fold prediction was also described by another group, using the same computational server (Orlowski et al, 2007); a comparison of the structure prediction with the crystal structure is described below.

Figure 1. Sequence alignment between the bacterial I-Ssp6803I homing endonuclease and archaeal Holliday junction resolvases. Only the first 110 residues of I-SspI, which align well, are shown with the homologous regions of the Hjc sequences. The final 40 residues of I-Ssp6803I, which are not shown, participate in structural elaborations on the PD-(D/E)-XK core fold that are unique to I-SspI. Secondary structure elements of the homing endonuclease are shown above the alignment; structural elements from Pyrococcus furiosus Hjc are shown below. All of these elements are conserved in I-SspI with the exception of α-helix 2 (α2), which instead is an extended loop (‘L1’ in the text and subsequent figures) that contacts the DNA target site. The blue stars above the alignment indicate conserved residues at active sites of the Hjc resolvase family. Residue labels in parentheses above the sequence alignment indicate mutations present in the crystal structure, as described in the text. Sequence alignments were carried out by ESPript (Gouet et al, 1999).

A sequence comparison of I-SspI with the Hjc resolvase family (Komori et al, 2000) revealed conservation of several active site residues: Glu 9 in Hjc-Pfu (Glu 11 in I-SspI), Asp 33 (Asp 36), Glu 47 (Gln 49) and Lys 49 (Lys 51). Although Asp 8 is not conserved in this alignment, it is only three residues away from the conserved Glu 11, on the same side of an α-helix in the active site. A mutant construct of I-SspI corresponding to Glu 11 to Gln (E11Q) was also successfully transformed and overexpressed in E. coli. This led us to hypothesize that I-SspI may contain a canonical PD-(D/E)-XK fold with Asp 8 and Glu 11 both involved in the structure and function of the endonuclease active site.

Structure determination

Although catalytically inactive point mutants (D8A or E11Q) are overexpressed in E. coli, the majority of the protein is insoluble. On the basis of our model of the enzyme core described above, a phenylalanine residue (F55) was predicted to be solvent-exposed and far removed from the DNA-binding surface (Supplementary Figure S1). Incorporation of a lysine residue at this position (to create an E11Q/F55K double mutant) allowed the preparation of milligram quantities of highly pure, easily concentrated material. In addition, a pair of leucine residues (L16 and L21) was changed to methionine to facilitate phasing.

The resulting protein construct displays nanomolar affinity for its DNA target site, as measured by isothermal titration calorimetry (Supplementary Figure S2), and was cocrystallized bound to a 27 bp DNA duplex containing an I-SspI target site. The structure of the complex was determined using the multiwavelength anomalous dispersion (MAD) method, with data collected at beamline 5.0.2 at the Advanced Light Source (ALS) synchrotron. The experimental electron density map was of excellent quality (Supplementary Figure S3). The structure was refined to 3.1 Å resolution with Rwork/Rfree = 0.266/0.313 (Table I).

Table I

| Crystal complex | Native (E11Q/F55K) | Se-Met_peak (E11Q/F55K/L16M/L21M) | Se-Met_remote (E11Q/F55K/L16M/L21M) |
|---|---|---|---|
| Crystallographic data | | | |
| Resolution (Å) | 4.0 | 3.1 | 3.3 |
| Wavelength (Å) | 1.5428 | 0.979 (peak) | 0.965 (remote) |
| Space group | I422 | I422 | I422 |
| Cell parameters (Å) | a=b=143.09, c=319.43, α=β=γ=90° | a=b=143.78, c=319.18, α=β=γ=90° | a=b=143.96, c=319.83, α=β=γ=90° |
| Completeness (%) | 99.9 (100.0) | 94.0 (61.4) | 100.0 (100.0) |
| Rsym (%) | 13.1 (45.1) | 8.3 (37.4) | 15.2 (63.0) |
| I/σ(I) | 21.8 (6.3) | 20.0 (3.5) | 16.2 (3.5) |
| Redundancy | 14.2 (14.6) | 7.6 (7.6) | 9.5 (9.4) |
| MAD phasing | | | |
| Resolution (Å) | | 3.3 | |
| Phasing power | | 1.035 | |
| Refinement | | | |
| Resolution (Å) | | 3.1 | |
| R-factor (%) | | 26.6 | |
| R-free (%) | | 31.3 | |
| R.m.s. bond length (Å) | | 0.008 | |
| R.m.s. bond angle (deg) | | 1.995 | |
| Ramachandran distribution | | 83.1% core | |
| | | 13.3% allowed | |
| | | 1.8% generous | |
| | | 1.8% disallowed | |
| Mean B-factor (Å²) | | 75.10 | |

Overall quaternary protein structure and stoichiometry of DNA binding

The structure of the I-SspI/DNA complex consists of one protein tetramer bound to a single DNA duplex; the crystallographic asymmetric unit contains one copy of this complex (Figure 2). We were able to model the entire chain of both DNA-bound monomers and the entire DNA molecule. The unbound monomers were also easily modeled, except for a short disordered surface loop region in each subunit (residues 71–[…] in monomer C and residues 68–[…] in monomer D) that is only ordered upon DNA binding.

Figure 2. Structure of I-SspI bound to its DNA target site. Protein subunits are each colored separately. The complex is shown in three mutually orthogonal orientations in (A–C). The buried surface area in each subunit interface is indicated. Two bound calcium ions are shown as red spheres. Loop L1 from monomers A and B is indicated by double black arrows. These loops are primarily associated with the central bases of the target site. These two loops do not interact with the bases in an identical manner, reflecting the asymmetry of this region of an otherwise symmetric target site. The same loops are disordered in subunits C and D, which are not bound to DNA and display minimal subunit contacts. The crystallization oligonucleotide construct is shown below panel A. The cleavage sites are indicated by cyan triangles. The base positions corresponding to the physiological homing site are shown in red and the central three bases (corresponding both to the 3′ overhangs produced by cleavage and to the anticodon triplet for fMet) are bold. The palindromic base pairs in the structure are underlined.

Several independent lines of evidence agree with the observed protein:DNA stoichiometry: (i) the protein runs as a tetramer on a size exclusion column; (ii) the crystals were grown in a three-fold excess of DNA relative to the protein tetramer (therefore, potential binding at the second site was not limited by the DNA concentration); and (iii) binding experiments using isothermal titration calorimetry clearly indicate a DNA:protein stoichiometry that agrees with the crystal structure (Supplementary Figure S2). As discussed below, binding of a single DNA duplex induces a rearrangement in the packing of the tetramer that prevents binding of a second site.

The I-SspI PD-(D/E)-XK fold

Each I-SspI monomer displays a topology containing four α-helices and nine β-strands (Figure 3A). The core catalytic region consists of one α-helix (α1) surrounded by five β-strands (β1, β2, β3, β7 and β8). Three of these elements (α1, β1 and β2) are involved in assembly of the protein tetramer. Two additional α helices (α3 and α4) pack against this core fold and comprise the C-terminal end of the monomer.

Figure 3. Structural comparison of protein subunits from the I-Ssp6803I homing endonuclease, the Hjc Holliday junction resolvase and the PvuII restriction endonuclease. (A) Structure and topology diagram of a single homing endonuclease subunit. The secondary structural elements are labeled and colored as follows: the PD-(D/E)-XK catalytic core region is pink and peripheral elaborations on that core are green. The N- and C-terminal residues of the secondary structural elements are indicated in the topology diagram. Catalytic residues are shown as sticks in the model on the left and labeled in red on the right. Regions involved in DNA recognition are indicated by dotted boxes and are numbered as shown in Figure 6 and described in the text. (B) The Hjc Holliday junction resolvase subunit. This structure has not been determined in the presence of DNA. (C) The PvuII restriction endonuclease subunit. Inlay: superposition of the I-SspI and Hjc catalytic cores (r.m.s.d. 1.9 Å).

Visual examination and analyses using the DALI structure comparison server (Holm and Sander, 1996) indicate that this domain corresponds to the canonical PD-(D/E)-XK nuclease fold, found in most restriction endonucleases. In this fold, the β-sheet is concave and markedly curved toward the α1-helix. All residues thought to be important for catalysis by I-SspI and its immediate structural homologues are positioned at the concave side of the five β-strands, at one end of each subunit.

In addition to restriction endonucleases, the PD-(D/E)-XK fold has also been observed in other enzymes involved in DNA rearrangements and modifications, including phage exonucleases, archaeal Holliday junction resolvases, phage T7 endonuclease I, transposase TnsA and certain DNA repair enzymes such as MutH and Vsr (Bujnicki et al, 2001). Structural comparisons with previously determined crystal structures using the DALI server reveal that the overall structure of the I-SspI monomer is most similar to the archaeal Holliday junction resolvases, typified by the Hjc enzyme from Pyrococcus furiosus, with a Z-score of 9.9 and r.m.s.d. for aligned Cα atoms of 2.4 Å (1.9 Å across the catalytic core) (Figure 3B). Whereas resolvase enzymes recognize a specific DNA backbone conformation without any strong sequence preference (Komori et al, 2000; Nishino et al, 2001), I-SspI recognizes a long DNA target sequence. This difference in binding activities results from unique structural elaborations on the core endonuclease fold as discussed below.

Of the type II restriction endonucleases that have been visualized to date, the closest structural homologue of I-SspI is PvuII (Figure 3C), with a DALI Z-score of 5.6 and an r.m.s.d. over the aligned Cα atoms of 3.3 Å. However, the I-SspI endonuclease (which recognizes a 23 bp target) is smaller (150 residues) than PvuII (157 residues), which recognizes a 6 bp target sequence. This suggests that the structural elaborations to a PD-(D/E)-XK domain required for recognition of a long DNA target with reduced fidelity can be achieved at least as economically as the alternative elaborations required for recognition of a short site with absolute fidelity. As approximately the same number of direct contacts to DNA base pairs are made by I-SspI and PvuII, we also suggest that restriction endonucleases require additional protein mass primarily to expand their surface complementarity to the phosphoribosyl backbone and/or induce significant DNA bending, as strategies to increase fidelity.

Assembly of the endonuclease tetramer

The protein tetramer measures approximately 80 × 80 × 40 Å and displays 222 (D2) symmetry that is broken by DNA binding across two of the four protein subunits (Figure 2). The catalytic cores of the four subunits show nearly identical structures, with an average r.m.s.d. value between subunits of 1 Å. Two of the protein monomers (A and B) interact with the DNA, which is uncleaved and slightly bent by […]° around its central base. Approximately 3500 Å² are buried in the binding interface between protein and DNA.

Two additional protein monomers (C and D) complete the tetramer and point in the opposite direction from the protein–DNA complex, in a back-to-back arrangement with a nearly 90° rotation (Figure 2). Although the core folds of the individual subunits are closely superimposable on each other, the relative orientation of the two DNA-bound subunits differs from that of the unbound subunits. Superposition of the DNA-bound subunits against their unbound counterparts (Figure 4) indicates that this difference consists of a rigid-body rotation of protein subunits by approximately 5°.

Figure 4. Superposition of DNA-bound and unbound subunits in I-SspI. The endonuclease subunits are colored as in Figure 2. The DNA-bound subunits (A and B) and their bound DNA ligand are superimposed on subunit (C) of the two DNA-free monomers. As discussed in the text, this analysis indicates that the unbound subunits display a rigid-body rotational difference in their relative orientations and packing, as compared to the DNA-bound subunits. This results in a difference of approximately 6 Å in the position of the DNA-binding surface of subunit D relative to subunit A. The observed structure and packing of the C/D subunits cannot be accommodated for DNA binding either by DNA straightening (because of steric crowding in the central minor groove) or by repacking of subunit D (which would destabilize the A/D dimer interface).

The tetrameric architecture of I-SspI is maintained by three pairs of unique packing arrangements between DNA-bound and DNA-unbound monomers, generating a dimer-of-dimers in which two dimers each bind to a DNA half-site, and the two DNA-bound subunits (and their symmetry mates) are in looser contact with one another (Figure 2). A total of approximately 3300 Å² of protein surface area is buried within the tetramer.

Monomers A and D (and B and C) are associated with one another through an antiparallel packing of their α1 helices from the PD-(D/E)-XK fold, creating two interfaces that each bury approximately 500 Å² (Figure 2B). This interaction is mediated by van der Waals interactions between small hydrophobic residues presented by one side of each helix. In contrast, this same helix is exposed to solvent in the homodimeric Hjc resolvase and is populated by highly polar and charged residues. This comparison demonstrates the structural differences that evolve as members of a protein fold family diverge from a common ancestor, resulting in different quaternary structures.

In addition, monomers A–C (and B–D) form two four-stranded β-sheets at their interfaces (Figure 2C), again using secondary structure from the nuclease core fold. Two β-strands from each monomer (β1 and β2) participate in these interfaces, which each bury approximately 650 Å². Between these two sets of dimer interactions, none of which involve the interface between the DNA-bound subunits, approximately 2300 Å² of protein surface area is buried.

The packing described above generates contacts between surface loops (L1 and L1′) from each of the DNA-bound protein subunits (Figure 2A). Approximately 400 Å² of surface area is buried between these two loops, which contact the major groove of the DNA target site underneath the central 3 bp (positions −1, 0 and +1) of the target site. In the DNA interface, one L1 loop is closely associated with the DNA target, whereas the other is more distant. This asymmetry is caused by the corresponding asymmetry of the target site's three central base pairs, which provide a superior binding target for the L1 loop in one direction (5′-CAT-3′) than in the opposing direction (5′-ATG-3′). In the opposing protein subunits (C and D), the same loops are largely disordered, leaving only a pair of side-chain contacts between the cores of those subunits.

Binding of DNA targets by endonuclease tetramers

Superposition of the DNA-bound subunits and their DNA target against the unbound subunits (Figure 4) indicates that neither the DNA nor the protein tetramer can be remodeled to allow binding of a second target site without either (i) imposing unreasonable steric crowding in the central minor groove, or (ii) destabilizing the protein tetramer. It therefore appears that binding of the first DNA duplex to protein subunits A and B breaks the perfect 222 symmetry of the unbound enzyme and induces a movement of subunits relative to one another that is incompatible with high-affinity binding of a second duplex.

A tetrameric enzyme assembly is also generated by many type II restriction endonucleases, and has been described in crystallographic structure analyses of DNA-bound complexes of SfiI (Vanamee et al, 2005) and NgoMIV (Deibert et al, 2000). Such quaternary structures appear to have often evolved in restriction endonucleases for the purpose of establishing a mechanism that requires the presence and binding of two cognate recognition sites for efficient cleavage, through positive cooperativity and allosteric activation (Gowers et al, 2004). Such behaviors can lead either to enhanced cleavage of one of the bound target sites (a type IIe restriction mechanism) or of both bound sites in a coordinated manner (a type IIf mechanism, displayed by SfiI and NgoMIV). This behavior may be important to avoid undesirable cleavage of spontaneously demethylated bacterial host sites.

The use of a tetrameric assembly by the homing endonuclease I-SspI appears to facilitate binding of a single long target site by allowing the core PD-(D/E)-XK domains to be far apart. In contrast, tetramer assembly by SfiI facilitates binding of two short DNA sites as described above, with the catalytic cores of each functional dimer packed more closely together. These different purposes for tetramer formation are reflected in very different arrangements of the endonuclease subunits for I-SspI and SfiI, each of which conforms to D2 (222) symmetry (Figure 5). Head-to-head stacking of the two DNA-bound dimers in the restriction endonuclease places the PD-(D/E)-XK domains, their DNA-binding surfaces and the active sites close together in the individual protein–DNA interfaces, a necessary property for recognition of a short target site. In contrast, ‘side-by-side’ packing of the enzyme subunits in I-SspI allows the DNA-bound nuclease cores to be much more loosely associated with one another and to then distribute additional DNA-binding surfaces (displayed as elaborations that project from the core PD-(D/E)-XK fold) toward the distal ends of the longer target site. The use of a tetrameric endonuclease assembly specifically to stabilize the functional dimer on the DNA target was described originally for the SfiI-DNA structure (Vanamee et al, 2005).

Figure 5. Topology of I-Ssp6803I tetramer assembly and comparison with the SfiI restriction endonuclease. (A) Active sites and overall tetrameric packing of the I-Ssp6803I homing endonuclease. (B) Active sites and overall tetrameric packing of the SfiI restriction endonuclease. These two endonucleases have the same core fold and similar cleavage patterns (producing complementary 3-base, 3′ overhangs, as a result of cleavage across the minor groove) as shown below the models. Active sites from different monomers are colored in cyan or green (only secondary structures carrying the active site residues are shown for clarity). The cleavage sites on the DNA are indicated by red spots. The general architecture of the tetrameric assembly is indicated by the cartoon block representation, colored according to the corresponding protein subunits. The ribbon diagrams are shown with the ‘B’ subunit from each structure in roughly similar orientations, to facilitate direct comparison of the tetrameric packing of the endonucleases.

DNA target recognition

The physiological DNA homing site is a pseudopalindrome (5′-TCGTCGGGCT CAT AACCCGAAGG-3′), with the sequence differing between DNA half-sites at four base pairs: ±11, ±9, ±3, and ±1. In addition, a single A:T base pair (position ‘0’) is located at the exact center of the target, and also breaks symmetry in the site and its protein-bound complex. Biochemical DNA protection assays indicated that bases +9 and +11 are bound more tightly than are their counterparts at −9 and −11 (leading to significant differences between these bases in footprinting experiments). Therefore, in the DNA construct used for crystallization the base pairs at positions −9 and −11 were converted to match their symmetry mates (Figure 2).
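The symmetry-breaking positions of the homing site can be recovered directly from the sequence quoted above. The following Python sketch numbers the central base as position 0 and compares each base at −k with the complement of the base at +k. The function name and the output format are illustrative choices, not part of the original analysis.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Physiological homing site as quoted in the text (spaces removed), 23 bp.
HOMING_SITE = "TCGTCGGGCTCATAACCCGAAGG"


def asymmetric_positions(site: str):
    """Return the distances k at which positions -k and +k are not symmetry mates.

    The central base (position 0) is skipped, because a single base pair
    cannot be its own complement.
    """
    site = site.upper()
    if len(site) % 2 == 0:
        raise ValueError("expected an odd-length site with a central base pair")
    center = len(site) // 2
    mismatches = []
    for k in range(1, center + 1):
        left, right = site[center - k], site[center + k]
        if COMPLEMENT[right] != left:
            mismatches.append(k)
    return mismatches


print(asymmetric_positions(HOMING_SITE))  # prints [1, 3, 9, 11]
```

A perfectly palindromic pair of half-sites would give an empty list; here four of the eleven symmetry-related pairs differ, which is why the site is described as a pseudopalindrome in a looser sense than a single central mismatch.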

Overall, the DNA displays a slightly bent B-form conformation that curves away from the protein. The minor groove at the center of the DNA is significantly broadened to […] Å. Four discontinuous elaborations that extend from the PD-(D/E)-XK protein fold are largely responsible for DNA recognition and binding, and make a variety of contacts across the entire length of the target. The protein regions involved in DNA contacts are numbered and labeled (1 through 4) in Figures 3 and 6 and correspond to the description below.

Figure 6. DNA-binding by I-SspI. (A) A single I-SspI monomer in complex with a DNA half-site. The regions in direct contact with bases are colored in green. Each distinct contact region on the protein is designated by numbers that correspond to Figure 3 and the text. These regions are magnified to show details in panel B. (B) Schematic diagram of DNA-binding and close-up views of the corresponding contacts. Only half of the DNA target is represented. Residues contacting DNA bases or backbone are labeled as follows: across the DNA, contacts in the minor groove are indicated on the left of each base while contacts in the major groove are indicated on their right. For the protein, residues observed making identical contacts in both monomers (A and B) are labeled in black, whereas residues observed making contacts in individual monomers A and B are labeled in green and blue, respectively. Contacts made by protein to DNA are colored as follows: blue lines indicate direct contacts between bases and protein side chains, blue dashed lines indicate direct contacts between bases and protein main chains, and red lines indicate nonspecific contacts to DNA backbone.

First, the N-terminal ends of helices α1 and α1′ contact the central region of the DNA (base pairs −2 to +2), where they insert into the minor groove and contribute two residues to each of the active sites. Second, the L1 surface loop from subunit ‘A' wraps around the DNA and contacts the opposite edge of the same base pairs, making contacts in the major groove at base pairs −1, 0 and +2. This interaction is asymmetric, as the same loop from subunit ‘B' is not in contact with the DNA: this difference is a result of the corresponding asymmetry across the center of the DNA target.

Third, two short antiparallel β-sheets (β5–β6 and β4–β9) are arranged end to end (in tandem) and provide additional contacts in the major groove of each half-site, from base pair positions 3–7. Fourth and finally, another protein surface loop (L2) extends from the end of strand β9 and makes contacts to the most distal ends of the DNA half-sites, within the minor groove at base pairs ±10 and 11.

Thus, a discontinuous pair of short antiparallel β-sheets (consisting of two strands each, arranged end to end), along with two surface loops, wraps around 23 contiguous base pairs of the DNA target site and establishes a mixture of contacts to the bases and to backbone atoms. The strategy of using β-strands for DNA target recognition is somewhat reminiscent of that used by other families of homing endonucleases such as the LAGLIDADG enzymes. However, in LAGLIDADG endonucleases the β-sheet DNA-binding platform is a single continuous structure in each protein domain that is an intimate part of the overall protein fold. In contrast, the constraints imposed by the use of discontinuous, surface-exposed elaborations on the PD-(D/E)-XK nuclease fold of I-SspI appear to reduce the density of contact side chains in the interface.

As is observed in other homing endonuclease–DNA cocrystal structures, the number of contacts to individual base pairs is variable and undersaturated (Figure 6). At least 12 direct hydrogen bond contacts are made between protein side chains and DNA bases in the major groove of each half-site, corresponding to contacts to approximately one-third of the available hydrogen bond contact points in the DNA major groove. In addition, the protein makes extensive contacts in the minor groove to the central A:T base pair, and a pair of additional minor groove contacts to individual N2 nitrogens of the cytosines at positions ±10 and 11 (at the distal ends of the target site).

The target site displays the highest density of protein contacts across its central 13 bp (−6 to +6), whereas the flanking positions (±7 through 11) exhibit fewer direct contacts. In the structure of the tRNA fMet , the bases encoded by the central 13 positions in the DNA target site correspond to the majority of the anticodon stem loop, whereas several bases encoded by the flanking sequence are unpaired in the tRNA. In particular, bases ±4 through 6 of the I-SspI target site encode a run of three consecutive G:C base pairs in the tRNA anticodon stem, which is a sequence feature that is diagnostic of tRNA fMet . Thus, the contacts made by the homing endonuclease (and its likely pattern of specificity) appear to mirror the sequence constraints of the host gene and its tRNA product.

Active-site architecture

Phosphodiester bond hydrolysis by a PD-(D/E)-XK fold follows a metal-dependent, in-line displacement mechanism. A general base is required to deprotonate the water nucleophile, a Lewis acid (usually one or more metal ions) stabilizes the phosphoanion transition state, and an acid protonates the 3′-oxyanion leaving group. The lysine residue in the PD-(D/E)-XK fold is often assigned to the role of a general base, although this role can also be assumed by a variety of other residues. The two acidic side chains of the motif (and occasionally a third acidic side chain) serve to ligate divalent metal ion cofactors, usually Mg 2+ . Occasionally, an amide-containing Gln or Asn residue can participate in metal binding (Yang et al, 2006).

This canonical active site architecture is recapitulated in I-SspI. A single bound calcium ion is observed in each active site in both ∣fo∣-∣fc∣ difference maps and in an anomalous difference Fourier map from the native data set collected on a home X-ray source (Figure 7A). This bound metal ion is coordinated by the scissile phosphate, by a backbone carbonyl oxygen, by Asp 36 from strand β2 and by Gln 49 from strand β3. An inner shell water molecule bound to this metal ion would be appropriately positioned to act as a nucleophile. Three additional residues (Asp 8 and Gln 11 from helix α1 and Lys 51 from strand β3) are also observed in the active site. The lysine residue is located appropriately to participate in general acid–base catalysis and deprotonate the water nucleophile.

The active site of I-Ssp6803I. (A) The active site of I-Ssp6803I is shown as a ball-and-stick representation. The observed calcium ion position is shown as a red sphere. The anomalous difference map calculated from a native data set collected on a rotating anode X-ray source (CuKα λ=1.54 Å) is shown in blue and contoured at 4.5σ. The predicted location of the water nucleophile and direction of its attack is indicated by the arrow; the scissile phosphodiester bond is indicated with a red star. (B) Superimposed active sites of I-SspI with EcoRV and Hjc. K51, D36 and E11 are conserved; Q49 is replaced by D and E, respectively, in EcoRV and Hjc.

It is possible that a second metal ion is also bound by the wild-type enzyme active site, using Asp 8 and/or Glu 11 as coordinating side chains. Modeling a second metal ion near these side chains would recapitulate the structure and mechanism observed for a number of restriction endonucleases ( Figure 7B ).

Fold prediction versus structure determination

At the time that this study was initially submitted for publication, the same structural fold prediction (of a PD-(D/E)-XK domain) was also described by another group, using the same computational server (Orlowski et al, 2007). Both predictions produce a reasonable model of the catalytic core region, with an r.m.s.d. across that core of approximately 2 Å as compared to the crystal structure. As expected for homology models, more significant differences are observed when they are compared to the actual structure of the entire monomer (Supplementary Figure S4), corresponding to an overall r.m.s.d. between models and structure of ∼3.5–3.7 Å.
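For readers unfamiliar with the r.m.s.d. values quoted here, the quantity is the root of the mean squared distance between equivalent atoms in two structures. The Python sketch below is only a minimal illustration: it assumes the two coordinate lists are already superimposed and paired atom by atom, and it is not the procedure or software used by the authors (the function name is hypothetical).

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    ASSUMPTION: the structures are already optimally superimposed and the
    i-th point of each list refers to the same atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom of the "model" is displaced by 2 Å along z.
structure = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(structure, model))  # 2.0
```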

Unfortunately, homology modeling of the DNA-bound I-SspI (Orlowski et al, 2007) breaks down significantly, owing to incorrect assignment of the endonuclease quaternary structure. The published attempt to model this complex, using reference models of existing dimeric PD-(D/E)-XK endonucleases (such as BglI), leads to a model of I-SspI in which the subunit packing and the corresponding DNA conformation are incorrectly constrained (Supplementary Figure S4). As a result, virtually none of the observed DNA-contacting regions in the I-SspI crystal structure are predicted in the model of the protein–DNA complex. In particular, the L1 and L1′ loops that are associated with the central DNA base pairs of the target site are modeled as extensions of the active site α-helix, and ‘region 4' (Figure 6) is not in proximity to DNA.

The homology modeling exercises reported previously (Orlowski et al, 2007), and also conducted in the early stages of this study, were very useful for identifying a previously unexpected evolutionary relationship between intron homing and the restriction fold family, and (in our case) for design of crystallizable protein constructs. However, the differences between the model and the crystal structure illustrate the difficulties of predicting both elaborations on core folds and long-range quaternary interactions and oligomery. Such additional aspects of structure are critical for the functional and mechanistic constraints placed on any molecular system, including the homing endonuclease described here.

Biological roles and evolutionary successes of nuclease fold families

Enzymatic catalysis of phosphoryl transfer reactions is a fundamental requirement for virtually all forms of nucleic acid modification (Yang et al, 2006). A relatively small number of core folds are found to encompass the vast majority of enzymes that make and break phosphodiester bonds. In particular, two unrelated protein folds, the PD-(D/E)-XK and HNH domains, are each found in enzymes involved in similar processes. However, these families enjoy different levels of representation within these processes: the PD-(D/E)-XK family dominates bacterial restriction (but is now shown to have ventured at least once into mobile introns and homing), whereas the HNH family dominates many lineages of mobile introns (and is also found in bacterial colicins) but is found only rarely in bacterial restriction endonucleases.

There are a variety of reasons that might explain the differential success of these protein folds. What is clear is that the PD-(D/E)-XK fold is used frequently, and with great success, to recognize short DNA sequences with absolute fidelity, whereas it is used in at least one limited case to recognize long DNA sequences with reduced fidelity. The comparison of how this fold operates under two separate biological contexts provides an excellent illustration of the balance of mechanistic and structural pressures that dictate the final success and use of such a motif.

Finally, it should be noted that the involvement of the PD-(D/E)-XK fold in the ‘dark side' of prokaryotic genetics (as a selfish agent capable of invasion of bacterial genomes) versus its usual and commonly accepted role as a guardian of the bacterial genome (as a restriction endonuclease) may not be as clear-cut as is implied in this study or in conventional views of molecular biology. Restriction–modification (R/M) systems, which were discovered on the basis of their ability to inhibit phage infections of host bacterial strains, appear to be capable of acting as invasive elements on their own (Kobayashi, 2001). They are often associated with mobile DNA vectors such as plasmids, viruses and transposons, and may engage in extensive horizontal transfer between bacterial genomes. A key factor in this behavior is the competitive advantage that such an element may exhibit upon incorporation and expression in a host: deletion of the R/M system may lead to death of host progeny as newly replicated (and ‘unprotected') DNA target sites are cleaved by residual endonuclease activity. At least one study has demonstrated that the EcoRI gene displays homing-type mobility when placed into the appropriate sequence context (Eddy and Gold, 1992). Thus, the use of the restriction nuclease fold in a bacterial homing endonuclease may not represent a sudden departure down a dark path of transposition and non-Mendelian inheritance, but rather a return to one of its fundamental genetic and biological properties.


Materials and methods

Screening for active-site residues

Six mutagenesis primers (Supplementary data) were designed to change nine aspartates (Asp 8, 31, 36, 40, 65, 85, 87, 105 and 120) to alanines. These primers were combined at equimolar concentration in one Stratagene Multi-Change ® mutagenesis reaction, following the manufacturer's reaction protocol. The reaction products were transformed into E. coli BL21 (DE3) RIL + cells and plated on LB containing 50 μg/ml ampicillin.

Fold recognition and comparative modeling

The 3D-Jury ( Ginalski et al, 2003 ) consensus method was used for fold recognition at the Meta server (http://bioinfo.pl/meta/). We used a K * Sync alignment method ( Chivian and Baker, 2006 ) at Robetta server ( Kim et al, 2004 ) (http://robetta.bakerlab.org/) to improve alignment between the I-SspI query and 1GEF.

Protein purification

Cultures were induced at 16°C for 18 h. Cells were harvested by centrifugation and lysed using a microfluidizer in 400 mM NaCl, 50 mM Tris, pH 7.5 and 10% glycerol. Cell debris was removed by centrifugation, and the cleared lysate was forced through a 0.2 μm syringe filter and applied to a heparin affinity column. Protein was concentrated and dialyzed against storage buffer (600 mM NaCl, 50 mM Tris pH 7.5 and 10% glycerol). Size-exclusion chromatography using a Superdex-200 column equilibrated against the same buffer was then performed, and the protein was concentrated to 3.5 mg/ml.

Isothermal titration calorimetry

Studies of DNA binding are described in the Supplementary data.

Crystallization, data collection and structural determination

The DNA oligonucleotides used for cocrystallization were purchased from Integrated DNA Technologies (1 μmol scale, HPLC-purified). The oligos were dissolved in H2O, and complementary DNA strands were annealed by incubating for 10 min at 95°C followed by slow cooling. The purified mutant protein was mixed with a three-fold excess of DNA duplex (relative to the concentration of enzyme tetramer) and 10 mM CaCl2.

The best DNA construct identified for cocrystallization consisted of two strands of sequence: 5′-GAGGCCTTCGGGCTCATAACCCGAAGGGA-3′ and its complement 5′-TCTCCCTTCGGGTTATGAGCCCGAAGGCC-3′. The construct forms a pseudo-palindromic, 27 bp duplex with 2-nucleotide cohesive overhangs on the 5′ ends. The base pairs at positions −11 and −9 (which do not display strong protection by bound endonuclease) were changed to match their counterparts at positions +9 and +11 (which are protected by the bound endonuclease), thereby promoting stable binding. The crystals were grown at 4°C by vapor diffusion against a reservoir containing 15–20% MPD (2-methyl-2,4-pentanediol) and 100 mM MES buffer at pH 6. Crystals grew in 1 week and were harvested into 100 mM MES (pH 6.0), 25% MPD, 600 mM NaCl, 20 mM CaCl2 and 10% glycerol and flash-frozen in liquid nitrogen. These crystals diffracted to 3.3 Å at beamline 5.0.2 of the ALS. The data were processed and scaled using the DENZO/SCALEPACK program package (Otwinowski and Minor, 1997).
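The geometry of this construct can be confirmed from the two strand sequences alone. The Python sketch below (variable and function names are illustrative, not taken from the study) checks that, once the first two nucleotides of each strand are set aside as 5′ overhangs, the remaining 27 nucleotides of the two strands are exact reverse complements, and that the two overhangs are complementary to each other.

```python
# Illustrative sketch; names are hypothetical, not from the paper.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


top = "GAGGCCTTCGGGCTCATAACCCGAAGGGA"     # first strand, 5'->3'
bottom = "TCTCCCTTCGGGTTATGAGCCCGAAGGCC"  # second strand, 5'->3'
OVERHANG = 2  # unpaired nucleotides at each 5' end

# The paired region of each strand is everything after its 5' overhang.
duplex_ok = reverse_complement(top[OVERHANG:]) == bottom[OVERHANG:]
# Cohesive ends: one overhang is the reverse complement of the other.
cohesive_ok = reverse_complement(top[:OVERHANG]) == bottom[:OVERHANG]

print(len(top[OVERHANG:]), duplex_ok, cohesive_ok)  # 27 True True
```

Complementary overhangs of this kind allow neighboring duplexes to pair end to end, which is a common consideration when designing oligonucleotides for protein–DNA cocrystals.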

To use the dispersive edge of selenium-substituted methionine (SeMet) for phasing, two residues (Leu 16 and Leu 21) were mutated to methionines. These positions were chosen from regions predicted to be distant from the catalytic active site and the DNA-binding interface, based on the homology model described above (Supplementary Figure S1). SeMet-derivatized I-SspI quadruple-mutant (E11Q/F55K/L16M/L21M) was expressed from the BL21(DE3) E. coli strain under growth and media conditions designed to promote selenomethionine incorporation ( Doublie, 1997 ). The purification, crystallization and data collection protocols were the same as described above, except for the addition of 1% H2O2 into the crystal harvesting buffer.

Because of radiation decay, two crystals were used to collect at the peak wavelength and the remote wavelength of selenium, respectively, at beamline 5.0.2 of the ALS. Data sets were processed using the DENZO/SCALEPACK software package (Otwinowski and Minor, 1997). Data statistics are summarized in Table I. CNS (Brunger et al, 1998) was used to perform a heavy-atom search. The selenium sites were then input into SHARP (La Fortelle, 1997) for phasing using the MAD method. Model building and initial refinement were performed with COOT (Emsley and Cowtan, 2004). The final model was refined to 3.1 Å resolution against the data set collected at the peak wavelength using REFMAC (Murshudov et al, 1997). The stereochemistry of the model was monitored with PROCHECK (Laskowski et al, 1993) (Table I). A total of 98.2% of the non-glycine residues from the enzyme tetramer are located in the allowed regions of the Ramachandran plot. A native data set collected with a rotating copper anode X-ray source was used to generate an anomalous difference map. Two major peaks were visualized in the map at a contour level of 4.5σ and assigned as calcium ions in the final model. All figures were generated with MacPymol (DeLano, 2002).


FUNDING

The National Institutes of Health (R01 GM49857 and RL1 CA133833 to B.L.S.); the Gates Foundation Grand Challenge Program (to B.L.S.); Grant-in-Aids for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan (Nos. 16780232, 16013222 and 14760216 to N.N.). Funding for open access charge: National Institutes of Health and Gates Foundation Grand Challenge Program.

Conflict of interest statement. None declared.


RESULTS AND DISCUSSION

Affinity, binding free energy and thermodynamic compensation across homing endonuclease families

The binding of 11 separate combinations of homing endonucleases and various DNA target sites, representing five major structural classes of these proteins, was analyzed by ITC. The DNA constructs used for these studies are shown in Table 1; the structure of each wild-type protein/DNA complex is shown in Figure 1. The affinities and thermodynamic signatures of these interactions are summarized in Table 2 and Figures 2 and 3.

Thermodynamic profiles of representative homing endonuclease binding events. Heat profiles of sequential injections of DNA against relevant endonucleases are shown. All endonucleases are wild-type except for a catalytically inactivated mutant of I-Ssp8603I; all DNA target sites are the physiological sequences from the corresponding biological hosts. Three of the five representative homing endonucleases (the monomeric and homodimeric LAGLIDADG enzymes and the HNH enzyme) display endothermic binding; the other two (the His-Cys box and PD-D/E-XK enzymes) display exothermic binding. All of the enzymes studied could be fit to standard saturation curves, except for I-HmuI (described in detail in the text), which displays complex multiphasic binding behavior.

Affinities and thermodynamic values of homing endonuclease–DNA-binding events

| Protein | Site | KD (nM) | ΔH (kcal/mol) | ΔS (cal/mol/deg) | −TΔS (kcal/mol) | ΔG (kcal/mol) |
|---|---|---|---|---|---|---|
| I-PpoI | WT | 17 ± 1.7 | −35.5 ± 0.29 | −81.6 ± 0.87 | 24.7 | −10.8 |
| I-PpoI | LL | 87 ± 8.8 | −35.1 ± 0.57 | −83 ± 3.0 | 25.3 | −9.8 |
| I-PpoI | RR | 23 ± 1.6 | −38.8 ± 0.26 | −93.1 ± 2.4 | 28.2 | −10.6 |
| I-AniI | WT | 96 ± 10 | 11.2 ± 1.1 | 69.3 ± 3 | −21.0 | −9.8 |
| I-AniI | WT-OPT | 8 ± 1 | 6.6 ± 0.7 | 59.0 ± 3 | −17.9 | −11.3 |
| I-MsoI | WT | 21 ± 5.0 | 12.7 ± 0.27 | 77 ± 1.1 | −23.3 | −10.6 |
| I-MsoI | LL | 6 ± 1.4 | 16.4 ± 0.20 | 91.7 ± 0.36 | −27.8 | −11.4 |
| I-MsoI | RR | 17 ± 6.0 | 14.4 ± 0.40 | 83.0 ± 0.64 | −25.2 | −10.8 |
| I-MsoI | MIS | 760 ± 53 | 21.2 ± 0.42 | 98 ± 2.7 | −29.7 | −8.4 |
| I-MsoI redesigned | MIS | 46 ± 6.5 | 34.2 ± 0.42 | 146.5 ± 0.7 | −44.4 | −10.2 |
| I-MsoI redesigned | WT | 243 ± 20 | 37.1 ± 4 | 152.7 ± 6 | −46.3 | −9.2 |
| I-SspI (E11Q) a | WT | 350 ± 30 | −23.3 ± 3 | −47.3 ± 4 | 14.3 | −9.0 |
| I-HmuI b | WT | (30) | Endothermic | | | |

a ‘I-SspI’ corresponds to a catalytically inactive mutant (E11Q) of the I-Ssp8603I homing endonuclease, containing an alteration of an active-site putative metal-binding residue. In addition, the enzyme contains a second mutation on its surface far from the DNA-binding interface (F55K) to improve its solubility. The resulting affinity and ΔG of binding therefore may not reflect the wild-type enzyme.

b I-HmuI displays complex behavior in ITC experiments that is not reliably modeled mathematically ( Figure 2 ). The reaction is strongly endothermic, and the Kd for that binding event has been estimated at 30 nM.

The dissociation constants of the wild-type homing endonuclease families against their wild-type, physiological target sites range from 17 nM (I-PpoI) to 96 nM (I-AniI). The precise affinity and thermodynamic values for one wild-type endonuclease (I-HmuI, of the phage HNH family) could not be estimated directly from ITC data, due to multiphasic behavior in its binding isotherm (described in detail subsequently). Gel shift experiments for that endonuclease give KD values of ∼30 nM to its cognate DNA targets (B.L.S., unpublished data). Finally, the dissociation constant of a catalytically inactive point mutant (E11Q) of I-Ssp8603I (a bacterial PD-D/E-XK homing endonuclease) was 350 nM; this probably represents a compromised binding affinity due to the loss of a bound metal ion in the protein–DNA interface. For the wild-type endonucleases that could be analyzed in detail, the range of binding free energies (ΔG) to their physiological DNA targets was −10.8 kcal/mol (I-PpoI) to −9.8 kcal/mol (I-AniI).
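The link between the dissociation constants and the binding free energies quoted above is the standard relation ΔG = RT ln KD (with KD expressed in mol/L). The following Python sketch illustrates the conversion. The temperature is an assumption: about 303 K (30°C) is used here because it is roughly the value implied by the ratio of the tabulated entropic term to ΔS in Table 2, and it is not stated explicitly in this text. The function name is illustrative.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 303.15   # ASSUMPTION: ~30 °C, inferred from Table 2, not stated in the text


def binding_free_energy(kd_molar, temperature=T_KELVIN):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


for name, kd_nm in [("I-PpoI", 17), ("I-AniI", 96), ("I-SspI (E11Q)", 350)]:
    dg = binding_free_energy(kd_nm * 1e-9)
    print(f"{name}: KD = {kd_nm} nM -> dG = {dg:.1f} kcal/mol")
```

Under this temperature assumption the computed values fall within about 0.1 kcal/mol of the tabulated ΔG values (−10.8, −9.8 and −9.0 kcal/mol); small differences are expected from rounding and from the uncertainty in the assumed temperature.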

With the exception of I-HmuI, the stoichiometries of protein–DNA interactions measured in these experiments were within 25% of 1:1 (protein functional assemblies:DNA target sites), in agreement with previously published biochemical studies and crystallographic analyses ( Figure 2 ). The lower stoichiometry observed for I-HmuI binding may reflect a higher percentage of aggregated or misfolded protein (that may also contribute to the unusual isotherm for that endonuclease).

For two endonucleases, their physiological binding site is not the optimal substrate for that protein. I-AniI (a LAGLIDADG monomer) binds a variant target site containing two basepair substitutions with 12-fold lower KD (8 nM) than its target in Aspergillus . Similarly, I-MsoI (a LAGLIDADG homodimer) binds a palindromic DNA target that consists of its two left half-sites with a 3-fold lower KD (6 nM) than its physiological target site. Both of these observations are discussed in more detail in the following sections.

The thermodynamic DNA-binding profiles of these proteins are widely distributed across the continuum of enthalpy/entropy compensation, and obey the same linear pattern of thermodynamic compensation noted previously for a wide variety of DNA-binding proteins ( 1 ) (Figures 2 and 3). Two of the endonucleases (the His-Cys box protein I-PpoI and the PD-D/E-XK protein I-Ssp8603I) display strongly exothermic DNA binding, whereas the monomeric and homodimeric LAGLIDADG endonucleases (I-AniI and I-MsoI) and the HNH endonuclease (I-HmuI) display strongly endothermic DNA binding (Figure 2). The ranges of values for ΔH (from −35 to +13 kcal/mol) and −TΔS (from −23 to +25 kcal/mol) that separate these individual protein–DNA interactions are as broad as previously observed for other DNA-binding proteins (Figure 3).
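The linear compensation described here follows from ΔG = ΔH − TΔS: because ΔG varies by only a few kcal/mol across these complexes while ΔH spans more than 70 kcal/mol, the entropic term must offset ΔH almost one for one. The Python sketch below illustrates this with the values from Table 2. It treats the tabulated entropic column as −TΔS (an interpretation based on the signs of the ΔS column), and the helper name is hypothetical.

```python
# (protein, site, dH, -T*dS, dG) in kcal/mol, copied from Table 2.
ROWS = [
    ("I-PpoI", "WT", -35.5, 24.7, -10.8),
    ("I-PpoI", "LL", -35.1, 25.3, -9.8),
    ("I-PpoI", "RR", -38.8, 28.2, -10.6),
    ("I-AniI", "WT", 11.2, -21.0, -9.8),
    ("I-AniI", "WT-OPT", 6.6, -17.9, -11.3),
    ("I-MsoI", "WT", 12.7, -23.3, -10.6),
    ("I-MsoI", "LL", 16.4, -27.8, -11.4),
    ("I-MsoI", "RR", 14.4, -25.2, -10.8),
    ("I-MsoI", "MIS", 21.2, -29.7, -8.4),
    ("I-MsoI redesigned", "MIS", 34.2, -44.4, -10.2),
    ("I-MsoI redesigned", "WT", 37.1, -46.3, -9.2),
    ("I-SspI (E11Q)", "WT", -23.3, 14.3, -9.0),
]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Internal consistency: dH + (-T*dS) should reproduce the tabulated dG.
residuals = [dh + minus_tds - dg for _, _, dh, minus_tds, dg in ROWS]

# Compensation: -T*dS plotted against dH should have a slope close to -1.
slope = least_squares_slope([r[2] for r in ROWS], [r[3] for r in ROWS])

print(max(abs(r) for r in residuals))
print(slope)
```

A slope near −1 is what near-constant ΔG implies, so the fit mainly restates that these enzymes reach similar affinities through very different enthalpic and entropic routes.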

Isothermal enthalpy–entropy compensation by homing endonucleases and other DNA-binding proteins. The enthalpic (Δ H ) and entropic (− T Δ S ) contributions to site-specific DNA recognition of different protein–DNA complexes, including several representative wild-type homing endonucleases (highlighted in red) are shown. The thermodynamic values for previously studied DNA-binding proteins are shown in blue, and are taken from previous analyses by Jen-Jacobson et al. ( 1 ) and references therein.

DNA bending and thermodynamic signatures

All available crystal structures of homing endonuclease/DNA complexes demonstrate significant DNA bending that is associated with the mechanism by which each accomplishes recognition of long target sequences ( 11 ) ( Figures 1 and 4 ). In the case of dimeric (LAGLIDADG and His-Cys-Box) and tetrameric (PD-D/E-XK) homing endonucleases, cleavage is executed across the minor groove of the target site (generating 3′ overhangs), thus allowing the protein to contact and ‘read out’ contiguous sets of nucleotide basepairs within the major groove of each DNA half-site, at positions that flank the cleavage sites. This strategy requires that the center of the DNA target sites be significantly distorted near the scissile phosphates, either by narrowing the minor groove and using two closely juxtaposed active sites (as seen for the LAGLIDADG enzymes) or by widening the minor groove and using two physically separated active sites (as seen for the His-Cys box and PD-D/E-XK enzymes). In contrast, the monomeric phage endonucleases (such as the HNH enzyme I-HmuI) bind even longer targets (>25 bp), and use a tandem series of protein domains to contact intermittent stretches of DNA bases within both the major and minor groove of the target site. For those complexes, DNA bending is again necessary in order to access DNA bases in the minor groove and to straddle the DNA backbone at various positions in the complex.

DNA distortion induced by homing endonuclease binding. DNA bend parameters were quantitated using the program ‘Readout’ ( 34 ), via a web-based server located at http://gibk26.bse.kyutech.ac.jp/jouhou/readout/. The cognate target sequences used in the individual crystal structures from which the bend parameters were calculated are shown; cleavage sites are indicated. All endonucleases in this study except for I-HmuI cleave both strands to produce 3′ overhangs; I-HmuI nicks the lower strand only (vertical line). The individual features of basepair step distortion shown in the graphs (roll, tilt, twist and rise) are illustrated relative to a standard coordinate frame for double-stranded DNA.

For previously examined DNA-binding proteins, unfavorable enthalpic changes upon DNA binding (corresponding to an endothermic reaction event) have been observed to correlate with increased distortion of the DNA target, which in turn corresponds to basepair unstacking and molecular strain ( 1 ). As discussed subsequently, a comparison of the binding thermodynamics of the representative homing endonuclease complexes ( Figure 2 , Table 2 ) to the distortion of their DNA target sites ( Figure 4 ) also demonstrates the dominant role of base unstacking in promoting unfavorable changes in enthalpy during DNA binding, while conversely also demonstrating that significant DNA bending can be accomplished while still maintaining base stacking and producing a strongly exothermic-binding event.

The three homing endonucleases that display strongly ‘endothermic’ DNA binding (the two LAGLIDADG endonucleases and the I-HmuI HNH endonuclease) all display basepair roll angles and unstacking at individual basepair steps that significantly depart from undistorted B-form DNA ( Figure 4 ). The LAGLIDADG enzymes both display a single significant negative roll angle at their central −1/+1 bp step of ∼−20°, corresponding to pronounced narrowing of the minor groove at the center of the target cleavage region. This DNA bend and unstacking is accompanied by overwinding of the central basepair step (‘twist’ rising to 50° and 60° for the I-MsoI homodimer and I-AniI monomer, respectively). In contrast, the HNH endonuclease I-HmuI displays strongly ‘positive’ roll angles at two separate basepair steps (+20° at step +9/+10 and then +37° at step +13/+14). At each position the protein ‘hurdles’ the phosphate backbone and inserts side chains into the minor groove. The latter of these bends is accompanied by widening of the minor groove and significant underwinding of the DNA (‘twist’ being reduced to 22°).

The two endonucleases that exhibit strongly ‘exothermic’ DNA binding (the His-Cys box I-PpoI endonuclease, and the PD-D/E-XK endonuclease I-SspI) also display significant DNA bending (overall bends of ∼75° and 25° across the central 8 bp of their target sites) and corresponding widening of the minor groove at the site of cleavage. However, both proteins accomplish binding and distortion of their DNA targets with roll angles that never exceed +/−12° at any single basepair step, and relatively small distortions of helical winding (‘twist’) values across their target sites ( Figure 4 ). In the case of I-PpoI, the severe bend imparted to the DNA target is accomplished through a mixture of smaller cumulative roll angles (which act in concert to widen the minor groove at the cleavage sites) and mutually opposing tilts in a single DNA basepair step in each half-site.

Multidomain DNA recognition: I-HmuI

Whereas the eukaryotic and bacterial homing endonucleases display relatively compact structures containing single DNA-binding domains arranged in various oligomeric symmetries, the phage endonucleases (HNH and GIY-YIG families) contain separate DNA interaction domains arranged in a sequential tandem array along a single peptide chain ( 15 , 16 , 35 ). These proteins are prone to significant exchange and shuffling of their domains during evolution, and the occasional insertion of additional structural elements such as zinc fingers ( 36 ).

I-HmuI displays such a multidomain structure ( Figure 1 ) ( 17 ). Its N-terminal, antiparallel β–sheet (contacting the 5′ end of the target site) is associated with the HNH nuclease core, which is then followed by two α-helices that intercalate in the minor groove in the center of the site, and finally a helix–turn–helix domain that generates the majority of base-specific contacts at the 3′ end of the target site. The DNA binding isotherm displayed by I-HmuI ( Figure 2 ) is unique among the measurements described here, in that it displays complex multiphasic behavior as the total concentration of protein increases with sequential injections. Each of the first three injections (generating DNA concentrations in the sample cell rising from ∼10 to 30 nM) results in a rapid absorption of heat (+Δ H ) followed by a slower, more modest release of heat (−Δ H ). The overall heat signature becomes stronger with each of these early injections. Beginning with the fourth injection (resulting in a DNA concentration of ∼40 nM), the isotherm demonstrates only a single feature corresponding to heat absorption. This signal then displays a hyperbolic reduction in strength with each ensuing injection as saturation of the DNA target sites is achieved. The transition from the multiphasic injection isotherms to single peaks and subsequent return to baseline occurs at a protein concentration near the previously measured KD for the I-HmuI–DNA interaction.

While the pattern of heat absorption and release demonstrated by I-HmuI described above and shown in Figure 2 is difficult to explain to complete satisfaction, there are two likely explanations. The first is that a small percentage of the protein is aggregated, a behavior that is eliminated by early injections of DNA (perhaps as free protein is titrated into DNA-bound complexes). The second is that the early injection isotherms might reflect a biphasic binding event consisting of a relatively fast endothermic association of one protein domain (possibly the C-terminal helix–turn–helix domain) followed by a slower exothermic step of intramolecular association. It is possible that the N-terminal nuclease domain, which displays fewer contacts with the DNA and is known to display reduced DNA specificity on its own ( 37 ), may require a slow step of DNA distortion at the center of the target site before docking to the remaining target sequence. Recent studies of GIY-YIG endonucleases, which possess similar structural organization, have also demonstrated DNA binding and cleavage behaviors that involve initial rapid association and subsequent slower conformational changes ( 38 ).

Symmetry recognizing asymmetry

Many multimeric homing endonucleases are encoded within introns that interrupt highly conserved rDNA and tRNA genes ( 39–43 ). The preponderance of base-paired stem–loop elements in folded RNA structures increases the frequency of palindromic sequences within their coding regions, encouraging the persistence of multimeric homing endonucleases, including both homodimers (such as the His-Cys box endonuclease I-PpoI and the LAGLIDADG endonuclease I-MsoI, encoded by introns within algal rDNA genes) and tetramers (such as I-Ssp8603I, encoded by an intron within a bacterial tRNA gene). While such homing endonucleases possess the advantage of being encoded by particularly short reading frames, which are tolerated well by their intron hosts, they are presented with the challenge of maintaining activity when target site symmetry is broken. In many cases, a homodimeric protein is found to display cleavage activity against a physiological DNA target site that displays surprisingly little palindromic symmetry between left and right half-sites. For example, the LAGLIDADG homodimeric endonuclease I-CeuI acts at a target site that displays only 36% sequence identity between left and right half-sites ( 21 ).

In such cases, one would anticipate that a homodimeric homing endonuclease would display binding and cleavage activity towards palindromic variants of its natural target site, and that many such proteins would display energetically superior (more favorable) contacts to one half-site over the other. This was examined for homodimeric endonucleases from both the His-Cys Box (I-PpoI) and LAGLIDADG (I-MsoI) families, by measuring their binding affinity and thermodynamic profiles against palindromic DNA sequences (‘Left-Left’, or ‘LL’ and ‘Right-Right’, or ‘RR’) derived from individual half-sites of their host target sites ( Table 1 ). The two palindromic sequences derived from the I-PpoI target site differ from one another at 4 bp out of 14 total. In contrast, the two palindromic sites derived from the I-MsoI target (which is substantially more asymmetric) differ at 12 out of 22 bp. Each of the palindromic sites differs from the wild-type target at half of those basepairs (all within a single half-site in each case).
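The construction of the 'LL' and 'RR' palindromes from the two half-sites can be sketched as follows. This is a minimal illustration only: the 14-bp `site` is an invented sequence (not the I-PpoI or I-MsoI target from Table 1), and the function names are hypothetical. It shows why each palindromic variant differs from the asymmetric site at exactly half of the positions at which the two palindromes differ from one another.

```python
# Hypothetical sketch: derive 'LL' and 'RR' palindromic variants from an
# asymmetric target site. The example sequence is invented for illustration.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def palindromic_variants(site):
    """Split an even-length site into half-sites and build LL and RR."""
    if len(site) % 2 != 0:
        raise ValueError("site must have an even number of base pairs")
    half = len(site) // 2
    left, right = site[:half], site[half:]
    ll = left + reverse_complement(left)    # left half-site mirrored
    rr = reverse_complement(right) + right  # right half-site mirrored
    return ll, rr


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


site = "ACGTTCATGTAGGT"  # invented 14-bp asymmetric site
ll, rr = palindromic_variants(site)
print("LL:", ll, "RR:", rr)
print("LL vs RR:", mismatches(ll, rr))
print("LL vs site:", mismatches(ll, site), "RR vs site:", mismatches(rr, site))
```

For this invented site the two palindromes differ at 4 of 14 positions and each differs from the asymmetric site at 2, mirroring the bookkeeping described in the paragraph above.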

I-PpoI and I-MsoI both display noticeable differences in affinity towards the two palindromic DNA sequences derived from their wild-type target ( Table 2 ). I-PpoI binds its ‘RR’ target 4-fold more tightly than its ‘LL’ target; I-MsoI binds its ‘LL’ target 3-fold more tightly than ‘RR’. The corresponding differences in Δ Gbinding between the palindromic targets for each enzyme (ΔΔ G ) are ∼0.6 and 0.8 kcal/mol, respectively. In both cases, these differences in binding affinity are attributable to more favorable Δ Hbinding for the superior palindromic target (−3.7 and −2.0 kcal/mol, respectively), which is partially offset by smaller, unfavorable changes in T Δ S . The differences in the binding affinities and Δ Hbinding to the LL and RR palindromes are correlated with the total number of contacts made to nucleotide bases in each unique half-site. For example, I-MsoI displays 20 contacts to atoms on DNA bases in the left half-site (10 water mediated, 10 direct to side chains) versus 15 similar contacts to DNA bases in the right half-site (again, half of the contacts are water mediated). The number of contacts to DNA backbone atoms is identical in both.

Interestingly, the two enzymes behave differently towards their wild-type asymmetric target sites from the host as compared to the palindromic variants. I-PpoI binds its asymmetric cognate target site with an affinity ( KD ∼17 nM) that is comparable to the more ‘tightly’ bound of the two palindromes ( KD ∼23 nM). In contrast, I-MsoI binds its asymmetric cognate site with an affinity ( KD ∼21 nM) that is comparable to the more ‘loosely’ bound palindromic target ( KD ∼17 nM). The ability of a homodimeric homing endonuclease to recognize an asymmetric target site with an affinity that rivals a palindromic repeat of one half-site, or to even prefer the asymmetric cognate target over either palindromic variant (rather than displaying an affinity that is obviously an intermediate between the two palindromes) has also been observed for the LAGLIDADG homing endonuclease I-CeuI ( 21 ).

The LAGLIDADG family: specificity and engineering

Homing endonucleases are under intense scrutiny as potential reagents for targeted genetic applications, in which genomes of target organisms or cells are manipulated in vivo , using site-specific recombination to alter or add desired traits. This field requires the development of highly specific DNA-binding proteins that can stimulate gene conversion events at unique sites within complex genomes. Over the past several years, two different approaches to create enzymes capable of inducing site-specific DNA double-strand breaks have been developed: zinc finger nucleases ( 44 ) and engineered homing endonucleases ( 12 , 45 ). LAGLIDADG homing endonucleases are particularly attractive systems for the development of gene-specific reagents: they are the most specific of all known homing endonucleases, possess relatively small and highly modularized structures, and have tightly coupled mechanisms of specific DNA recognition and cleavage. A variety of studies have demonstrated that the DNA-binding specificity of LAGLIDADG endonucleases can be altered in predictable ways, ranging from the creation of artificial chimeric enzymes ( 46 , 47 ) to endonucleases that harbor individual amino acid substitutions in the protein–DNA interface ( 48–55 ).

Unlike many DNA-binding proteins, engineering and selection of homing endonuclease constructs that display altered DNA-binding specificity (particularly the LAGLIDADG family) is facilitated by two properties: they display relatively modest reductions in affinity when individual contacts to nucleotide bases are eliminated, and they utilize highly modularized and separable contacts between individual amino acid side chains and DNA bases. While this strategy of DNA recognition does not reduce the rules for binding specificity down to the simplicity of a one-to-one ‘code’ between protein residues and DNA bases, it does greatly simplify the complexity of DNA recognition and its redesign.

In order to investigate the exact effect of deleterious single base substitutions on binding specificity and affinity of LAGLIDADG endonucleases, we characterized the differences in binding thermodynamics for two separate LAGLIDADG homing endonucleases (monomeric I-AniI and homodimeric I-MsoI) when complexed to optimal target sites, versus nonoptimal miscognate target sites. For each study, the effect of simultaneously altering two basepairs was measured (sequences of sites shown in Table 1 ). We then went on to further characterize the thermodynamic effects of a ‘redesign cycle’ for I-MsoI, where amino acid substitutions in the enzyme were engineered with the goal of reacquiring high affinity, specific recognition of the formerly miscognate target sequence.

In the case of I-AniI, the optimal target site for the endonuclease differs from the wild-type, physiologic target site at two basepair positions ( Table 1 ). Both mutations (at positions +1 and +8) consist of an inversion of an A:T basepair and were identified in an in vitro screen for hypercleavable target site variants ( 13 ). The endonuclease displays a 12-fold difference in affinity towards these two targets ( KD(wild-type) = 96 nM; KD(optimal) = 8 nM), corresponding to a difference in Δ Gbinding of −1.5 kcal/mol. The less favorable Δ Gbinding to the wild-type site is caused by a substantial increase in the heat absorbed during binding to that site (Δ H is increased by 4.6 kcal/mol), which is partially offset by a more favorable entropic change for the formation of the miscognate complex.

In contrast, the homodimeric I-MsoI displays near optimal binding affinity to its physiological cognate target site. Two simultaneous alterations of that target sequence, consisting of a substitution of −6 C:G to −6G:C in the ‘left’ DNA half-site, and a similar change from +6 T:A to +6 G:C in the symmetry-related ‘right’ half-site, result in significant reduction of cleavage activity under standard reaction protocols. These substitutions were chosen based on structure-based computational predictions of DNA mutations that would cause a significant reduction in binding affinity of the wild-type enzyme ( 51 ). At both of these DNA positions, Lys 28 is engaged in a hydrogen bond to the purine ring and Thr 83 makes a water-mediated contact to the pyrimidine ( Figure 5 A). Converting either base pair to a G:C was predicted to disrupt binding by the loss of the direct hydrogen-bonding interactions and by desolvation of Lys28.

Thermodynamic signature of a homing endonuclease redesign cycle. The wild-type I-MsoI endonuclease (top left A ) binds to its cognate target site with a Kd of 21 nM, driven by a favorable entropic change ( T Δ S ) upon binding. Simultaneous alteration of a base pair in each half site (position +/− 6) results in a 30-fold increase in Kd , correlated with an unfavorable increase of 2.1 kcal/mol in the free energy of binding (ΔΔG) (top right B ). This energetic penalty is caused by a large unfavorable increase in the enthalpy of binding. Redesign of the enzyme, via two point mutations in the protein/DNA interface (bottom right D ), almost entirely restores affinity and free energy of binding ( KD = 46 nM). Finally, analysis of the redesigned enzyme against the original target site (bottom left C ) indicates that the newly created LHE, while displaying a specificity switch, only displays about a 5-fold to 10-fold increase in KD .

I-MsoI displays a 36-fold increase in the dissociation constant K D against the miscognate target site relative to the wild-type target site (from 21 nM up to 760 nM) ( Figure 5 B). This reduction in affinity corresponds to a 2.1 kcal/mol loss of favorable Δ Gbinding (from −10.6 kcal/mol for the wild-type complex to −8.5 kcal/mol for the miscognate site). Similar to I-AniI, the change in Δ Gbinding to the miscognate site is caused by a significant increase in the heat absorbed during binding (Δ H rises from 12.7 to 21.2 kcal/mol), which is partially offset by a more favorable entropic change ( T Δ S decreases from −23.3 to −29.7 kcal/mol).
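The arithmetic linking these dissociation constants to the quoted free energies can be checked with a short sketch. It relies on the standard relations ΔG = RT ln(KD) (1 M standard state) and ΔΔG = RT ln(KD,2/KD,1). The measurement temperature is not given in this excerpt, so 298.15 K is an assumption, and the function names are hypothetical. The tabulated entropic term appears to be reported as −TΔS, since adding it to ΔH reproduces the quoted ΔG values; the sketch treats it that way.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L.

    temp_k defaults to 298.15 K, which is an assumption, not a value
    taken from the text.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def ddg_from_kds(kd_ref, kd_other, temp_k=298.15):
    """Change in binding free energy (kcal/mol) on going from kd_ref to kd_other."""
    return R_KCAL * temp_k * math.log(kd_other / kd_ref)


def delta_g_from_components(delta_h, minus_t_delta_s):
    """Combine enthalpy and the entropic term, assumed to be given as -T*dS."""
    return delta_h + minus_t_delta_s


kd_wild_type = 21e-9    # mol/L, cognate site (from the text)
kd_miscognate = 760e-9  # mol/L, miscognate site (from the text)

fold_change = kd_miscognate / kd_wild_type
ddg = ddg_from_kds(kd_wild_type, kd_miscognate)
print(f"fold change in KD: {fold_change:.1f}")
print(f"ddG at 298.15 K: {ddg:.2f} kcal/mol")
print(f"dG (wild-type, from KD): {delta_g_from_kd(kd_wild_type):.2f} kcal/mol")
print(f"dG (wild-type, from dH and -TdS): {delta_g_from_components(12.7, -23.3):.1f}")
print(f"dG (miscognate, from dH and -TdS): {delta_g_from_components(21.2, -29.7):.1f}")
```

Under the assumed temperature the KD-derived free energies come out slightly less negative than the quoted −10.6 and −8.5 kcal/mol, while the ΔΔG of about 2.1 kcal/mol is reproduced; a somewhat higher experimental temperature would close the small gap.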

For both enzymes, the deleterious basepair mismatches are presumed to lead to the loss of hydrogen bonds between protein side chains and DNA bases, and possibly the introduction of interatomic steric clashes in the interface. The total magnitude of unfavorable thermodynamic changes induced by these mismatches is reduced, however, by increased entropy in the miscognate complex: presumably the complexes between the endonucleases and their miscognate target sites display increased torsional and vibrational disorder in both the protein and the DNA at the site of nonoptimal contacts.

The I-MsoI endonuclease was subsequently engineered to compensate for the basepair substitutions at positions +/−6 in the DNA site, by redesign of the surrounding amino acids ( Figure 5 D). In these studies, a double point mutation in the enzyme, consisting of K28L and T83R, was predicted to reestablish energetically favorable interactions with the DNA bases ( 51 ). This computational redesign prediction was validated by crystallographic analyses, and by functional assays for cleavage activity against the novel target site. In the structure of the redesigned cognate enzyme/DNA complex, Leu28 makes a non-polar contact with the C5 of the cytosine rings at the altered basepair positions and Arg83 makes two hydrogen bonds to the corresponding guanine base of the same basepairs.

Armed with crystal structures of the WT:WT and Mutant:Mutant complexes of I-MsoI bound to DNA ( Figure 5 A and D), the thermodynamic profiles of the redesigned enzyme to its corresponding ‘cognate’ target and to the original wild-type DNA sequence were determined. As described above, the wild-type enzyme displays dissociation constants of 21 and 760 nM to its cognate and noncognate target sites (a 36-fold difference in affinity). In comparison, the redesigned enzyme displays a binding affinity to the mutated DNA target that is almost entirely restored as compared to the original wild-type complex ( KD = 46 nM) and a 5.3-fold higher dissociation constant for the original wild-type site ( KD = 243 nM). The amount of discrimination between cognate and noncognate sites displayed by the redesigned enzyme is therefore somewhat lower than the wild-type protein, corresponding to a ΔΔ Gbinding (cognate versus noncognate) of ∼1 kcal/mol.

The near wild-type affinity and Δ Gbinding between the redesigned I-MsoI and the altered DNA target site is driven entirely by a significant improvement in the already favorable entropic change upon binding. T ΔS binding is increased by over 20 kcal/mol as compared to the original wild-type binding event, compensating for a significant unfavorable increase in heat absorption (+Δ H ) upon binding. This dramatic alteration in the thermodynamic profile of the redesigned protein–DNA pair is probably a result of the introduction of a hydrophobic leucine side chain at residues 28 and 28′ in place of lysine. In both enzyme constructs, residue 28 is solvent-exposed in the unbound enzyme and buried in the DNA complex. In the case of the wild-type protein, the lysine side chain simply exchanges hydrogen bonds with solvent for similar interactions with the DNA. In contrast, binding of the redesigned endonuclease is presumably accompanied by significant desolvation of the solvent-exposed leucine side chain. When the original DNA target sequence is bound by the same redesigned enzyme construct, the less favorable enthalpy of binding is partially negated by the still strongly favorable entropic changes that accompany burial of the hydrophobic leucine side chains. Therefore, the enzyme does not display as large a difference in affinity between cognate and miscognate DNA targets as is observed for the wild-type enzyme. An important take-home message of this analysis is that the replacement of hydrogen-bond contacts with hydrophobic van der Waals contacts in the protein–DNA interface, during protein engineering, may lead to an undesirable loss of specificity and discrimination.


Introduction

Type II restriction endonucleases recognize short nucleotide sequences usually 4–8 bp in length and cleave DNA leaving blunt ends or 5′- or 3′-overhangs. Most Type IIP enzymes are active as dimers and many recognize target sequences in DNA that match the two-fold symmetry of the enzymes (Pingoud et al, 2005a). Strict matches require palindromic DNA sequences with an even number of base pairs, which are cleaved to generate either blunt ends or overhangs with an even number of nucleotides. Target sequences with an odd number of base pairs in the recognition sequence are necessarily pseudopalindromic, break the two-fold symmetry at the central base pair, and yield overhangs with an odd number of nucleotides upon cleavage ( Figure 1 ).
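The distinction drawn above between palindromic (even-length) and pseudopalindromic (odd-length) recognition sequences can be made concrete with a small sketch. The function names are hypothetical; the complement table uses the standard IUPAC nucleotide codes so that degenerate sites such as CCWGG or RGGNCCY can be tested, and a '/' marking the cleavage position is ignored.

```python
# Minimal sketch: classify a recognition sequence as palindrome,
# pseudopalindrome, or asymmetric. IUPAC codes: R=A/G, Y=C/T, W=A/T,
# S=C/G, K=G/T, M=A/C, N=any, and so on.
IUPAC_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "W": "W", "S": "S",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(seq):
    """Reverse complement of a sequence written with IUPAC codes."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq))


def classify_site(site):
    """Return 'palindrome', 'pseudopalindrome', or 'asymmetric'.

    Even length: the site must equal its reverse complement.
    Odd length: the flanks must match the reverse complement; the central
    position is ignored, because a single base pair cannot be its own
    complement in any concrete DNA sequence.
    """
    seq = site.upper().replace("/", "")  # drop the cleavage marker
    rc = reverse_complement(seq)
    if len(seq) % 2 == 0:
        return "palindrome" if seq == rc else "asymmetric"
    mid = len(seq) // 2
    flanks_match = seq[:mid] == rc[:mid] and seq[mid + 1:] == rc[mid + 1:]
    return "pseudopalindrome" if flanks_match else "asymmetric"


for name, site in [("dam site", "GATC"), ("EcoRI", "G/AATTC"),
                   ("EcoRII", "/CCWGG"), ("Ecl18kI", "/CCNGG"),
                   ("EcoO109I", "RG/GNCCY")]:
    print(name, site, classify_site(site))
```

By this definition GATC, with its even number of base pairs, is classified as a palindrome, whereas the 5-bp and 7-bp sites are pseudopalindromes.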

Oligonucleotides used for cocrystallization of Ecl18kI (this work) and NgoMIV (Deibert et al, 2000). The recognition sequence is shown in bold letters, boxes indicate the cleavage patterns with 5 nt and 4 nt 5′ overhangs, respectively.

Most Type IIP enzymes cut within the boundaries of the recognition sequence. Among those, palindrome cutters predominate over pseudopalindrome cutters, and 4 nt 5′-overhangs or blunt ends are the most common cleavage products (Roberts et al, 2005). Not surprisingly, a vast majority of mechanistic and structural studies of restriction endonucleases have focused on such enzymes (Pingoud et al, 2005a). Comparative structural analysis reveals that Type IIP restriction enzymes share a conserved core that harbors active site residues (Venclovas et al, 1994; Aggarwal, 1995; Kovall and Matthews, 1998). Different DNA cleavage patterns result from changes in the dimerization mode that alter the distance between active sites and hence the number of base pairs interspaced between scissile phosphates, as shown by a comparison of the EcoRI (G/AATTC) and EcoRV (GAT/ATC) crystal structures (Venclovas et al, 1994). Much less is known about pseudopalindrome sequence cutters, in part because very few enzymes in this group have been structurally characterized. Structures are available for apo-EcoRII (/CCWGG, W=A or T) (Zhou et al, 2004) and for EcoO109I (RG/GNCCY, R=A or G; Y=T or C) (Hashimoto et al, 2005), BglI (GCCNNNN/NGGC) and SfiI (GGCCNNNN/NGGCC) complexes with DNA (Newman et al, 1998; Vanamee et al, 2005). BglI and SfiI are very similar to the blunt end-cutter EcoRV in terms of monomer structure, but generate 3 nt 3′-overhangs rather than blunt ends, because the enzymes dimerize in very different ways (Newman et al, 1998; Vanamee et al, 2005). In contrast, EcoO109I (RG/GNCCY) and BsoBI (C/YCGRG) (van der Woerd et al, 2001) generate different cleavage patterns despite strikingly similar dimerization modes. In fact, EcoO109I partially unstacks the DNA within its recognition sequence due to the interaction between an indole ring of a tryptophan residue and cytosine, resulting in DNA stretching and a register shift in the cleavage pattern with respect to BsoBI (Hashimoto et al, 2005). Thus, local changes of DNA structure and not alterations of the dimerization mode explain the different cleavage patterns of EcoO109I and BsoBI.

It is unlikely that the DNA ‘stretching mechanism’ employed by EcoO109I can account for the cleavage patterns of Ecl18kI (/CCNGG) (Den'mukhametov et al, 1997), EcoRII (/CCWGG) (Bigger et al, 1973; Boyer et al, 1973) and PspGI (/CCWGG) (Morgan et al, 1998), because it would require 6 nt 5′-overhang cutters as ‘precursors’, which are currently unknown. Instead, amino-acid sequence similarities and extensive mutagenesis data (Pingoud et al, 2002, 2005b; Tamulaitis et al, 2002) argue for a close evolutionary link between Ecl18kI (/CCNGG) and the crystallographically characterized palindrome cutters NgoMIV (G/CCGGC) (Deibert et al, 2000), Cfr10I (R/CCGGY) (Bozic et al, 1996) and Bse634I (R/CCGGY) (Grazulis et al, 2002), which all generate 4 nt 5′-overhangs. How Ecl18kI, EcoRII and PspGI accommodate the extra base pair within their recognition sites and generate 5 nt overhangs is unclear.

To address these questions, we have determined the crystal structure of Ecl18kI restriction endonuclease from Enterobacter cloacae in complex with a 9 bp oligonucleotide duplex containing the recognition site. The amino-acid sequence of Ecl18kI is over 99% identical to the isoschizomeric restriction endonucleases SsoII (Karyagina et al, 1993), SenPI (Ibanez et al, 1997) and StyD4I (Miyahara et al, 1997).


FUNDING

A.V., S.M., T.M., L.V., C.L. were supported by CNRS (Mission pour l’interdisciplinarité, Agromics 2014–2016); this work was submitted to fulfill the requirements for a doctorate of biology at ED341-E2M2 from Université de Lyon, granted from the French Ministère de l’Education Nationale, de l’Enseignement Supérieur et de la Recherche (to T.M.); this work has benefited from the I2BC crystallization and protein–protein interactions platforms supported by FRISBI [ANR-10-INSB-05-01]; Cell and Tissue Imaging (PICT IBiSA), Institut Curie, member of the French National Research Infrastructure France-BioImaging [ANR10-INBS-04]. Funding for open access charge: CNRS.

Conflict of interest statement. None declared.


DISCUSSION

We previously reported that in the halotolerant obligate methanotroph M. alcaliphilum 20Z, genes encoding enzymes for biosynthesis of the compatible solute ectoine are organized into the ectABC-ask operon (32). The ectABC-ask operon is transcribed from two σ70-like promoters. Similar σ70 promoters drive the expression of ectoine gene clusters in a variety of halophilic species, such as Chromohalobacter salexigens, Bacillus pasteurii, and Marinococcus halophilus (6, 21, 24), and thus, such an organization of transcriptional machinery is not unique for the methanotrophic bacterium.

A gene (designated ectR1) encoding a transcriptional regulator belonging to the MarR family was identified upstream of the ectABC-ask operon in M. alcaliphilum 20Z. We showed that deletion of the ectR1 gene results in derepression of the ectoine biosynthesis genes at different salt concentrations in the growth medium, thus indicating that EctR1 acts as a negative regulator of the ectABC-ask operon. This finding was further supported by the DNase I footprinting assay data. We found that the EctR1 binding site overlaps with the putative −10 element of the ectAp1 region of the ect operon, resulting in complete blocking of the promoter and thus suggesting steric inhibition of RNA polymerase recruitment. However, very low DNA binding activity (160 pmol of protein per 1 pmol of DNA) of the recombinant EctR1 preparation was detected. We suppose that an unknown mechanism (modification or metabolic signal interactions) may regulate the DNA binding ability of EctR1 depending on the medium osmolarity, but the respective systems for posttranslational modification that are present in M. alcaliphilum 20Z may be absent in E. coli.

The salt-dependent activation of expression of the ect operon observed for the ectR1-impaired strain also indicated that M. alcaliphilum 20Z may possess a complex regulatory system that involves multiple layers of responses. At present, we may only speculate that regulation of the ect genes involves a dynamic balance between repression by the EctR1 regulator and, most likely, activation mediated by a specific, yet unknown component(s). Characterization of the additional regulatory components of the transcriptional control system of the ectABC-ask operon is currently under way.

The MarR family includes a diverse group of regulators that can be classified into three general categories in accordance with their physiological functions: (i) regulation of response to environmental stress, (ii) regulation of virulence factors, and (iii) regulation of aromatic catabolic pathways (48). To our knowledge, EctR1 is the first example of a MarR-like regulator that controls osmoresponse genes. Like other members of the MarR family, EctR1 is located in an orientation opposite that of the controlled gene cluster, has a conservative winged helix-turn-helix motif, and binds to DNA as a homodimer. The EctR1 binding site contains two imperfect inverted repeats with 2 bp separating the two halves of the pseudopalindrome. The centers of the palindrome half-sites are separated by 10 bp, thus indicating that the positioning of each subunit of the EctR1 homodimer occurs on the same face of the DNA helix. This mode of EctR1-DNA binding is similar to that for other members of the MarR family proteins, such as HucR and MexR (12, 47), but different from that for E. coli MarR, which binds to DNA on different faces of the double helix (25).
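The same-face argument can be illustrated with a short calculation. It assumes roughly 10.5 bp per helical turn for B-form DNA (a commonly used approximate value) and a half-site length of 8 bp; the latter is not stated in this excerpt and is chosen here only because, with the 2-bp spacer, it gives the reported 10-bp center-to-center distance. Function names and the 90° cut-off are hypothetical choices for illustration.

```python
BP_PER_TURN = 10.5  # approximate helical repeat of B-form DNA (assumption)


def center_spacing(half_site_len, spacer_len):
    """Distance in bp between the centers of two equal-length half-sites."""
    return half_site_len / 2 + spacer_len + half_site_len / 2


def angular_offset(spacing_bp, bp_per_turn=BP_PER_TURN):
    """Rotation around the helix axis between two positions, in degrees.

    The result is normalized to the range [-180, 180).
    """
    angle = spacing_bp * 360.0 / bp_per_turn
    return (angle + 180.0) % 360.0 - 180.0


def same_face(spacing_bp, tolerance_deg=90.0):
    """Treat two positions as lying on the same face if within the tolerance."""
    return abs(angular_offset(spacing_bp)) < tolerance_deg


spacing = center_spacing(half_site_len=8, spacer_len=2)  # 8 bp is assumed
print(f"center-to-center spacing: {spacing:.0f} bp")
print(f"angular offset: {angular_offset(spacing):.1f} degrees")
print("same face:", same_face(spacing))
print("same face at 5 bp spacing:", same_face(5))
```

A 10-bp spacing corresponds to nearly one full helical turn, so the two half-site centers are offset by only about 17° and face the same side of the duplex, whereas a spacing of about half a turn would place them on opposite faces.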

The levels of both ectABC-ask and ectR1 transcription correlate with the salinity of the growth medium. Since the EctR1 binding site is located between the ectR1 transcription and translation start sites, EctR1 may repress its own expression via inhibition of the elongation process. Autoregulation has been observed for many transcriptional regulators, such as MarR (25), CinR (9), EmrR (10, 23), and HucR (47). In particular, HucR represses the transcription of its own gene and that of genes involved in the catabolism of uric acid (47). This repression is relieved by the binding of uric acid to the repressor, reducing its DNA binding ability. In the case of M. alcaliphilum 20Z, it was demonstrated that cells grown in a medium without NaCl accumulate small concentrations of ectoine (20); thus, ectoine is always present in the cytoplasm and most likely does not alter the DNA binding ability of EctR1. Moreover, the addition of exogenous ectoine or glycine betaine to the growth medium did not affect the induction of enzymes involved in the ectoine biosynthesis pathway.

EctR1 orthologs are present in other halophilic bacteria. In the ectoine- and 5-hydroxyectoine-producing species Salibacillus salexigens, in the DNA segment following the ectD (ectoine hydroxylase) gene, a partial reading frame whose deduced product showed similarity to MarR-type regulators was found (5). However, this reading frame, which is cotranscribed with ectD, was disrupted by two stop codons. Our analysis of the DNA fragment containing the ectoine biosynthetic genes (NCBI accession no. EU315063) in the methanol-utilizing bacterium Methylophaga alcalica showed the presence of an open reading frame with high homology to the ectR1 gene of M. alcaliphilum 20Z (73% identity of translated amino acids). Moreover, a simple NCBI BLAST search revealed several ectR1-like genes located immediately upstream of the ectoine gene cluster in 17 halophilic bacterial species. Among them, the open reading frames of an Oceanospirillum sp. (EAR60187), Nitrosococcus oceani (ABA57535), Saccharophagus degradans (ABD80450), a Reinekea sp. (ZP_01114878), and an Oceanobacter sp. (EAT11341) showed the highest identities of translated amino acid sequences with M. alcaliphilum EctR1 (35.5, 42.2, 45.6, 51.7, and 55.1%, respectively). These results indicate the presence of an EctR-mediated regulatory system controlling ectoine biosynthesis at the transcriptional level in diverse halophilic/halotolerant bacteria.