What are the different ways an exon gets spliced?


Exons are produced by more than one mechanism, e.g. splicing out introns after transcription, if I remember correctly. Please list all mechanisms.

There are several ways splicing can occur, depending on the RNA molecule to be spliced and the catalyst that performs the splicing:

  1. mRNA splicing is carried out by the spliceosome, which consists of small nuclear RNAs and associated proteins. Sequences at the ends of the introns and at the branch site indicate the splice sites. The so-called lariat structure is formed when the 2'-OH group of an adenosine residue in the branch site attacks the 5' splice site.

  2. self-splicing of the ribosomal RNA precursor - it proceeds in the absence of a spliceosome.

  3. tRNA splicing - it requires three enzymes and ATP hydrolysis.


Biochemistry, L. Stryer, 5th edition

J. Abelson. tRNA Splicing

Strategies to Correct Nonsense Mutations

Hana Benhabiles, ... Fabrice Lejeune, in Nonsense Mutation Correction in Human Diseases, 2016

1.2 Examples

The exon-skipping strategy has already reached clinical trials for Duchenne muscular dystrophy (DMD). In this pathology, about 75% of patients could be treated by exon skipping (Aartsma-Rus et al., 2003) and, in particular, about 16% of patients could be targeted by an exon 51 skipping therapy. Exon 51 encodes part of the central domain of the dystrophin protein, called the rod domain, which spans exon 8 to exon 62 (Fig. 3.3). The rod domain is formed by 24 repeats similar to a peptide motif found in β-spectrin. Interestingly, about 60% of the rod domain can be deleted without severe consequences for dystrophin function (England et al., 1990).

Figure 3.3. Schematic representation of the protein domain organization of dystrophin.

The exons encoding the different domains are indicated at the top. Dystrophin can be divided into four domains named N-terminal domain (orange), the rod domain containing 24 repeats (purple) and four hinge domains (green), a cysteine rich domain (blue), and a C-terminal domain (pink).

Two strategies have been developed to induce the skipping of exon 51 or other exons in the rod domain. The first is to mask the 3′ splice site of an intron with an antisense oligonucleotide in order to induce skipping of the following exon(s). The second strategy inhibits the splicing of a particular exon by tethering a splicing inhibitor to this exon. To achieve this, a modified U7 snRNA bound by hnRNPA1 was designed to anneal with a specific sequence of the target exon (Goyenvalle et al., 2009). Both strategies produced very encouraging results, with the synthesis of internally truncated dystrophin protein in cell culture and animal models, such as mouse or dog (Aartsma-Rus et al., 2003; Barbash et al., 2013; Goyenvalle et al., 2012; Hoogaars et al., 2012; Vulin et al., 2012).

Based on the positive results ex vivo in cell culture, as well as in vivo in mouse and dog models, clinical trials were attempted (for the definition of the clinical trial phases, see Note 3.1).

The Clinical Trial Phases

Clinical trial phase I: This phase is the first study in humans and requires a small number of healthy people or patients (between 20 and 80). The toxicity and the tolerance of the drug are evaluated during this phase.

Clinical trial phase II: The aim of this second phase is to determine the minimal effective dose. The study is performed on 100–300 volunteer patients, who help to demonstrate some therapeutic benefit and to identify secondary effects.

Clinical trial phase III: This study measures the efficacy of the drug versus a placebo or a reference treatment. Several hundred to several thousand patients are recruited at this stage. It is the final step before the authorization to put the drug on the market.

Clinical trial phase IV: This phase starts after the introduction of the drug on the market and allows the identification of secondary effects or toxicity after a long period of use.

The number of patients participating in clinical trials can be much lower when treatments for rare diseases are being developed, due to the limited number of patients.

Several trials were planned, such as the one supported by GlaxoSmithKline (GSK) and Prosensa Therapeutics with the molecule Drisapersen, a drug using a 2′-O-methyl phosphorothioate antisense oligonucleotide to induce skipping of exon 51 of the dystrophin gene (Clinical trial NCT01803412). At the end of the clinical trial phase II, after 24 weeks of treatment, patients who received Drisapersen were able to walk 35.8 m more than patients who received the placebo in the 6-min walk test (6-MWT, see Note 3.2) (Butland et al., 1982). Unfortunately, this drug failed at the clinical trial phase III, since the rescue of dystrophin function was not significant in the 6-MWT. Another clinical trial, also using an antisense approach, has been completed up to clinical phase II by Sarepta Therapeutics, with a drug called Eteplirsen that also induces skipping of exon 51 of the dystrophin gene, using a phosphorodiamidate morpholino oligomer (PMO) (Clinical trial NCT01396239). Results of this clinical trial indicate that patients treated with Eteplirsen were capable of walking about 67 m more than the control group treated with a placebo (Mendell et al., 2013). At the start of the trial, patients were able to walk 200–400 m in the 6-MWT. Interestingly, patients treated with Eteplirsen maintained their original 6-min distance after 48 weeks of treatment, while the performance of patients treated with placebo decreased. This result suggests that the treatment was not able to induce an increase in muscle mass, but prevents the existing mass from decreasing, which is already a very encouraging result.

The 6-Min Walking Test (6-MWT)

This test measures the distance that a patient can cover in 6 min of walking without physical assistance. The patient should walk as fast as possible but is allowed to slow down or even to stop and rest for a while during the exercise. The test has to be done on flat ground without any obstacles, over a length of at least 25 m without turns.

For a healthy person, the expected distance can be estimated by the following formula:

d = 218 + (5.14 × height in centimeters) − (5.32 × age) − (1.8 × weight in kilograms) + [51.31 × gender (1 for men and 0 for women)].

As an example, a woman of 45 years of age, 157 cm tall and 50 kg weight, is expected to cover about 696 m.
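The formula and the worked example can be checked with a short script. This is only an illustrative sketch: the function and parameter names are made up here, and the length term is taken to be the person's height in centimetres, as in the example of a woman 157 cm tall.

```python
def expected_6mwt_distance_m(height_cm, age_years, weight_kg, is_male):
    """Expected 6-MWT distance (metres) for a healthy person, per the formula quoted in the text."""
    gender_term = 51.31 if is_male else 0.0  # gender = 1 for men, 0 for women
    return 218 + 5.14 * height_cm - 5.32 * age_years - 1.8 * weight_kg + gender_term


# Worked example from the text: a 45-year-old woman, 157 cm tall, weighing 50 kg.
example_distance = expected_6mwt_distance_m(
    height_cm=157, age_years=45, weight_kg=50, is_male=False
)
print(round(example_distance))  # about 696 m, matching the text
```
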

Exon 51 is not the only exon of the dystrophin gene eligible for the exon-skipping strategy. Indeed, exon 53 has similar characteristics to exon 51. A phase I clinical trial sponsored by Nippon Shinyaku Pharmaceuticals, which started in Jun. 2013 with the drug NS-065/NCNP-01 (NCT02081625) and was expected to be completed by Mar. 2015, targets the skipping of exon 53. NS-065/NCNP-01 is a morpholino antisense oligonucleotide.

Intron definition, exon definition and back-splicing revisited

Pre-mRNA splicing is performed by the sequential function of different spliceosome complexes. Spliceosome assembly depends on the presence — from the 5′ end to the 3′ end of introns — of conserved 5′ splice site (5′SS), branch point sequence (BPS) and 3′SS. Of all spliceosome complexes, the only one lacking a resolved structure is the first to assemble on the pre-mRNA, known as the E complex. It remains unclear how the spliceosome initially defines (recognizes and assembles across) introns or exons, and how canonical splicing is favoured over a non-canonical reaction known as back-splicing, which generates exonic circular RNAs (circRNAs). Li et al. now present the cryo-electron microscopy (cryo-EM) structure of the E complex of Saccharomyces cerevisiae, which suggests that intron definition, exon definition and back-splicing can be carried out by the same complexes.

In yeast, which typically contains short introns and long exons, intron definition seems to dominate. By contrast, in vertebrates, where introns are longer and exons are shorter, exon definition is thought to prevail. To study these mechanisms in detail, the authors assembled in vitro functional budding yeast E complexes on either the ACT1 pre-mRNA or the UBC4 pre-mRNA, and determined their cryo-EM structures.

Key to initiating splicing through intron definition is bringing together the 5′SS and the BPS. A striking feature found in the structure of the ACT1–E complex was a 25 bp double helix downstream of the 5′SS, which was not found in the UBC4–E complex. The 5′SS-to-BPS region (265 nt) of the ACT1 intron, but not the same region in UBC4 (58 nt), is predicted to form long stem-like structures, and mutations that abolish these structures resulted in substantial pre-mRNA accumulation (splicing inhibition). Thus, secondary structures may help to bring together essential intronic elements and facilitate spliceosome assembly.

The structure of the E complex, especially the relative positions of the 5′SS and the BPS, suggested that the same E complex can form across exons: instead of connecting a 5′SS with a downstream BPS across an intron, the 5′SS could connect with an upstream BPS across an exon. To test whether exon definition occurs in vivo in yeast, the authors truncated the DYN2 gene to contain only its middle exon and flanking introns (IEI construct) and mutated the splicing elements bordering either side of the DYN2 exon: a BPS mutation in intron 1 and a 5′SS mutation in intron 2. If splicing of DYN2 is governed solely by intron definition, retention of the intron in which a mutation resides would be expected, with minimal effect on the other intron; however, if splicing is governed by exon definition, each of the mutations would lead to the retention of both introns. The observed composition of splicing intermediates produced from the different IEI mutants suggested that both intron definition and exon definition occur in vivo and contribute to correct splicing of DYN2 and showed, for the first time, that exon definition occurs in yeast.

A prediction of this model is that formation of exon definition complexes across long exons leads to back-splicing — splicing of the 5′SS downstream with the 3′SS upstream of the exon — and the formation of exonic circRNAs, owing to the lack of steric hindrance. Shortening exons of circRNA-producing IEI constructs abolished circRNA formation from the constructs, supporting the preference for back-splicing across long (or multiple) exons and the occurrence of exon definition.

In summary, in yeast (and likely in all eukaryotes) the same E complexes are able to define both introns and exons, without the need for additional components or structural rearrangements. Furthermore, exon definition can cause back-splicing across long exons, suggesting that circRNAs are natural by-products of spliceosome-mediated splicing in all eukaryotes.

Results and discussion

Variation in the levels of alternative splicing in different human tissues

Alternative splicing events are commonly distinguished in terms of whether mRNA isoforms differ by inclusion or exclusion of an exon, in which case the exon involved is referred to as a 'skipped exon' (SE) or 'cassette exon', or whether isoforms differ in the usage of a 5' splice site or 3' splice site, giving rise to alternative 5' splice site exons (A5Es) or alternative 3' splice site exons (A3Es), respectively (depicted in Figure 1). These descriptions are not necessarily mutually exclusive; for example, an exon can have both an alternative 5' splice site and an alternative 3' splice site, or have an alternative 5' splice site or 3' splice site but be skipped in other isoforms. A fourth type of alternative splicing, 'intron retention', in which two isoforms differ by the presence of an unspliced intron in one transcript that is absent in the other, was not considered in this analysis because of the difficulty in distinguishing true intron retention events from contamination of the EST databases by pre-mRNA or genomic sequences. The presence of these and other artifacts in EST databases is an important caveat to any analysis of EST sequence data. Therefore, we imposed stringent filters on the quality of EST to genomic alignments used in this analysis, accepting only about one-fifth of all EST alignments obtained (see Materials and methods).

Levels of alternative splicing in 16 human tissues with moderate or high EST sequence coverage. Horizontal bars show the average fraction of alternatively spliced (AS) genes of each splicing type (and estimated standard deviation) for random samplings of 20 ESTs per gene from each gene with ≥ 20 aligned EST sequences derived from a given human tissue. The different splicing types are schematically illustrated in each subplot. (a) Fraction of AS genes containing skipped exons, alternative 3' splice site exons (A3Es) or 5' splice site exons (A5Es), (b) fraction of AS genes containing skipped exons, (c) fraction of AS genes containing A3Es, (d) fraction of AS genes containing A5Es.

To determine whether differences occur in the proportions of these three types of AS events across human tissues, we assessed the frequencies of genes containing skipped exons, alternative 3' splice site exons or alternative 5' splice site exons for 16 human tissues (see Figure 1 for the list of tissues) for which sufficiently large numbers of EST sequences were available. Because the availability of a larger number of ESTs derived from a gene increases the chance of observing alternative isoforms of that gene, the proportion of AS genes observed in a tissue will tend to increase with increasing EST coverage of genes [10, 31]. Since the number of EST sequences available differs quite substantially among human tissues (for example, the dbEST database contains about eight times more brain-derived ESTs than heart-derived ESTs), in order to compare the proportion of AS in different tissues in an unbiased way, we used a sampling strategy that ensured that all genes/tissues studied were represented by equal numbers of ESTs.

It is important to point out that our analysis does not make use of the concept of a canonical transcript for each gene because it is not clear that such a transcript could be chosen objectively or that this concept is biologically meaningful. Instead, AS events are defined only through pairwise comparison of ESTs.

Our objective was to control for differences in EST abundance across tissues while retaining sufficient power to detect a reasonable fraction of AS events. For each tissue we considered genes that had at least 20 aligned EST sequences derived from human cDNA libraries specific to that tissue ('tissue-derived' ESTs). For each such gene, a random sample of 20 of these ESTs was chosen (without replacement) to represent the splicing of the given gene in the given human tissue. For the gene and tissue combinations included in this analysis, the median number of EST sequences per gene was not dramatically different between tissues, ranging from 25 to 35 (see Additional data file 1). The sampled ESTs for each gene were then compared to each other to identify AS events occurring within the given tissue (see Materials and methods). The random sampling was repeated 20 times and the mean fraction of AS genes observed in these 20 trials was used to assess the fraction of AS genes for each tissue (Figure 1a). Different random subsets of a relatively large pool will have less overlap in the specific ESTs chosen (and therefore in the specific AS events detected) than for random subsets of a smaller pool of ESTs, and increased numbers of ESTs give greater coverage of exons. However, there is no reason that the expected number of AS events detected per randomly sampled subset should depend on the size of the pool the subset was chosen from. While the error (standard deviation) of the measured AS frequency per gene should be lower when restricting to genes with larger minimum pools of ESTs, such a restriction would not change the expected value. Unfortunately, the reduction in error of the estimated AS frequency per gene is offset by an increase in the expected error of the tissue-level AS frequency resulting from the use of fewer genes. The inclusion of all genes with at least 20 tissue-derived ESTs represents a reasonable trade-off between these factors.
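The sampling scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the study: `ests_by_gene` is a hypothetical mapping from gene name to the list of ESTs from one tissue, and `has_as_event` is a hypothetical callback standing in for the pairwise EST comparison described in Materials and methods.

```python
import random
from statistics import mean


def fraction_as_genes(ests_by_gene, has_as_event, sample_size=20, trials=20, seed=0):
    """Mean fraction of genes showing an AS event over repeated equal-size EST samples."""
    rng = random.Random(seed)
    # Keep only genes with at least `sample_size` tissue-derived ESTs.
    eligible = {gene: ests for gene, ests in ests_by_gene.items() if len(ests) >= sample_size}
    if not eligible:
        raise ValueError("no gene has enough ESTs for sampling")
    per_trial = []
    for _ in range(trials):
        # Sample without replacement, then ask the (hypothetical) detector about the sample.
        n_as = sum(
            1 for ests in eligible.values() if has_as_event(rng.sample(ests, sample_size))
        )
        per_trial.append(n_as / len(eligible))
    return mean(per_trial)


# Toy data: each EST is reduced to an isoform label (an assumption made for illustration).
toy_ests = {
    "geneA": ["iso1"] * 30,                 # one isoform only
    "geneB": ["iso1"] * 10 + ["iso2"] * 10, # two isoforms
    "geneC": ["iso1", "iso2"] * 2,          # too few ESTs, excluded
}
toy_fraction = fraction_as_genes(toy_ests, lambda sample: len(set(sample)) > 1)
print(toy_fraction)
```

Fixing the sample size per gene is what keeps tissues with very different EST coverage comparable; the number of trials only reduces the noise of the estimate.
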

The human brain had the highest fraction of AS genes in this analysis (Figure 1a), with more than 40% of genes exhibiting one or more AS events, followed by the liver and testis. Previous EST-based analyses have identified high proportions of splicing in human brain and testis tissues [29, 30, 32]. These studies did not specifically control for the highly unequal representation of ESTs from different human tissues. As larger numbers of ESTs increase the chance of observing a larger fraction of the expressed isoforms of a gene, the number of available ESTs has a direct impact on estimated proportions of AS, as seen previously in analyses comparing the levels of AS in different organisms [31]. Thus, the results obtained in this study confirm that the human brain and testis possess an unusually high level of AS, even in the absence of EST-abundance advantages over other tissues. We also observe a high level of AS in the human liver, a tissue with much lower EST coverage, where higher levels of AS have been previously reported in cancerous cells [33, 34]. Human muscle, uterus, breast, stomach and pancreas had the lowest levels of AS genes in this analysis (less than 25% of genes). Lowering the minimum EST count for inclusion in this analysis from 20 to 10 ESTs, and sampling 10 (out of 10 or more) ESTs to represent each gene in each tissue, did not alter the results qualitatively (data not shown).

Differences in the levels of exon skipping in different tissues

Alternatively spliced genes in this analysis exhibited on average between one and two distinct AS exons. Analyzing the different types of AS events separately, we found that the human brain and testis had the highest levels of skipped exons, with more than 20% of genes containing SEs (Figure 1b). The high level of skipped exons observed in the brain is consistent with previous analyses [29, 30, 32]. At the other extreme, the human ovary, muscle, uterus and liver had the lowest levels of skipped exons (about 10% of genes).

An example of a conserved exon-skipping event observed in human and mouse brain tissue is shown in Figure 2a for the human fragile X mental retardation syndrome-related (FXR1) gene [35, 36]. In this event, skipping of the exon alters the reading frame of the downstream exon, presumably leading to production of a protein with an altered and truncated carboxy terminus. The exon sequence is perfectly conserved between the human and mouse genomes, as are the 5' splice site and 3' splice site sequences (Figure 2a), suggesting that this AS event may have an important regulatory role [37–39].

Examples of tissue-specific AS events in human genes with evidence of splice conservation in orthologous mouse genes. (a) Human fragile X mental retardation syndrome-related (FXR1) gene splicing detected in brain-derived EST sequences. FXR1 exhibited two alternative mRNA isoforms differing by skipping/inclusion of exons E15 and E16. Exclusion of E16 creates a shift in the reading-frame, which is predicted to result in an altered and shorter carboxy terminus. The exon-skipping event is conserved in the mouse ortholog of the human FXR1 gene, and both isoforms were detected in mouse brain-derived ESTs. (b) Human betaine-homocysteine S-methyltransferase (BHMT) gene splicing detected in liver-derived ESTs. BHMT exhibited two alternative isoforms differing by alternative 5' splice site usage in exon E4. Sequence comparisons indicate that the exon and splice site sequences involved in both alternative 5' splice site exon events are conserved in the mouse ortholog of the human BHMT gene. (c) Human cytochrome P450 2C8 (CYP2C8) gene splicing. CYP2C8 exhibited two alternative mRNA isoforms differing in the 3' splice site usage for exon E4 (detected in ESTs derived from several tissues), where the exclusion of a 71-base sequence creates a premature termination codon in exon E4b. Exons and splice sites involved in the AS event are conserved in the mouse ortholog of CYP2C8.

Differences in the levels of alternative splice site usage in different tissues

Analyzing the proportions of AS events involving the usage of A5Es and A3Es revealed a very different pattern (Figure 1c,d). Notably, the fraction of genes containing A3Es was more than twice as high in the liver as in any other human tissue studied (Figure 1d), and the level of A5Es was also about 40-50% higher in the liver than in any other tissue (Figure 1c). The tissue with the second highest level of alternative usage for both 5' splice sites and 3' splice sites was the brain. Another group of human tissues including muscle, uterus, breast, pancreas and stomach - similar to the low SE frequency group above - had the lowest level of A5Es and A3Es (less than 5% of genes in each category). Thus, a picture emerges in which certain human tissues such as muscle, uterus, breast, pancreas and stomach, have low levels of AS of all types, whereas other tissues, such as the brain and testis, have relatively high levels of AS of all types and the liver has very high levels of A3Es and A5Es, but exhibits only a modest level of exon skipping. To our knowledge, this study represents the first systematic analysis of the proportions of different types of AS events occurring in different tissues. Repeating the analyses by removing ESTs from disease-associated tissue libraries, using available library classifications [40], gave qualitatively similar results (see Additional data files 2, 3, and 4). These data show that ESTs derived from diseased tissues show modestly higher frequencies of exon skipping, but the relative rankings of tissues remain similar. The fractions of genes containing A5Es and A3Es were not changed substantially when diseased-tissue ESTs were excluded.

From the set of genes with at least 20 human liver-derived ESTs, this analysis identified a total of 114 genes with alternative 5' splice site and/or 3' splice site usage in the liver. Those genes in this set that were named, annotated and for which the consensus sequences of the alternative splice sites were conserved in the orthologous mouse gene (see Materials and methods) are listed in Table 1. Of course, conservation of splice sites alone is necessary, but not sufficient by itself, to imply conservation of the AS event in the mouse. Many essential liver metabolic and detoxifying enzyme-coding genes appear on this list, including enzymes involved in sugar metabolism (for example, ALDOB, IDH1), protein and amino acid metabolism (for example, BHMT, CBP2, TDO2, PAH, GATM), detoxification or breakdown of drugs and toxins (for example, GSTA3, CYP3A4, CYP2C8).

Sequences and splicing patterns for two of these genes for which orthologous mouse exons/genes and transcripts could be identified - the genes BHMT and CYP2C8 - are shown in detail in Figure 2b,c. In the event depicted for BHMT, the exons involved are highly conserved between the human and mouse orthologs (Figure 2b), consistent with the possibility that the splicing event may have a (conserved) regulatory role. This AS event preserves the reading frame of downstream exons, so the two isoforms are both likely to produce functional proteins, differing by the insertion/deletion of 23 amino acids. In the event depicted for CYP2C8, usage of an alternative 3' splice site removes 71 nucleotides, shifting the reading frame and leading to a premature termination codon in the exon (Figure 2c). In this case, the shorter alternative transcript is a potential substrate for nonsense-mediated decay [41, 42] and the AS event may be used to regulate the level of functional mRNA/protein produced.
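The reading-frame reasoning for these two events comes down to whether the length difference between the isoforms is a multiple of three nucleotides. A small sketch, with a helper name chosen here purely for illustration:

```python
def preserves_reading_frame(length_change_nt):
    """True if inserting or deleting this many nucleotides keeps downstream codons in frame."""
    return length_change_nt % 3 == 0


bhmt_change_nt = 23 * 3  # BHMT: isoforms differ by 23 amino acids, i.e. 69 nucleotides
cyp2c8_change_nt = 71    # CYP2C8: the alternative 3' splice site removes 71 nucleotides

print(preserves_reading_frame(bhmt_change_nt))    # frame preserved
print(preserves_reading_frame(cyp2c8_change_nt))  # frame shifted
```
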

Differences in splicing factor expression between tissues

To explore the differences in splicing factor expression in different tissues, available mRNA expression data was obtained from two different DNA microarray studies [43–45]. For this trans-factor analysis, we obtained a list of 20 splicing factors of the SR, SR-related and hnRNP protein families from proteomic analyses of the human spliceosome [46–48] (see Materials and methods for the list of genes). The variation in splicing-factor expression between pairs of tissues was studied by computing the Pearson (product-moment) correlation coefficient (r) between the 20-dimensional vectors of splicing-factor expression values between all pairs of 26 human tissues. The DNA microarray studies analyzed 10 tissues in addition to the 16 previously studied (Figure 3). A low value of r between a pair of tissues indicates a low degree of concordance in the relative mRNA expression levels across this set of splicing factors, whereas a high value of r indicates strong concordance.
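The pairwise comparison described above can be sketched as follows. This is an assumption-laden illustration rather than the original analysis code: the tissue names and expression values are invented, and the toy vectors have four entries instead of the 20 splicing factors used in the study.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def pairwise_tissue_correlations(expression_by_tissue):
    """Correlation of splicing-factor expression vectors for every pair of tissues."""
    return {
        (t1, t2): pearson_r(expression_by_tissue[t1], expression_by_tissue[t2])
        for t1, t2 in combinations(sorted(expression_by_tissue), 2)
    }


# Invented expression values, one entry per splicing factor.
toy_expression = {
    "brain": [1.0, 2.0, 3.0, 4.0],
    "testis": [2.0, 4.0, 6.0, 8.0],  # same relative pattern as brain
    "liver": [4.0, 3.0, 2.0, 1.0],   # opposite relative pattern
}
toy_correlations = pairwise_tissue_correlations(toy_expression)
for pair, r in toy_correlations.items():
    print(pair, round(r, 3))
```

Because r depends only on the relative pattern across factors, two tissues with different overall expression levels but the same ranking of factors still correlate strongly.
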

Correlation of mRNA expression levels of 20 known splicing factors (see Materials and methods) across 26 human tissues (lower diagonal: data from Affymetrix HU-133A DNA microarray experiment [45]; upper diagonal: data from Affymetrix HU-95A DNA microarray experiment [43]). Small squares are colored to represent the extent of the correlation between the mRNA expression patterns of the 20 splicing factor genes in each pair of tissues (see scale at top of figure).

While most of the tissues examined showed a very high degree of correlation in the expression levels of the 20 splicing factors studied (typically with r > 0.75; Figure 3), the human adult liver was clearly an outlier, with low concordance in splicing-factor expression to most other tissues (typically r < 0.6, and often much lower). The unusual splicing-factor expression in the human liver was seen consistently in data from two independent DNA microarray studies using different probe sets (compare the two halves of Figure 3). The low correlation observed between liver and other tissues in splicing factor expression is statistically significant even relative to arbitrary collections of 20 genes (see Additional data file 8). Examining the relative levels of specific splicing factors in the human adult liver versus other tissues, the relative level of SRp30c message was consistently higher in the liver and the relative levels of SRp40, hnRNP A2/B2 and SRp54 messages were consistently lower. A well established paradigm in the field of RNA splicing is that usage of alternative splice sites is often controlled by the relative concentrations of specific SR proteins and hnRNP proteins [49–52]. This functional antagonism between particular SR and hnRNP proteins is often due to competition for binding of nearby sites on pre-mRNAs [49, 53, 54]. Therefore, it seems likely that the unusual patterns of expression seen in the human adult liver for these families of splicing factors may contribute to the high level of alternative splice site usage seen in this tissue. It is also interesting that splicing-factor expression in the human fetal liver is highly concordant with most other tissues, but has low concordance with the adult liver (Figure 3). This observation suggests that substantial changes in splicing-factor expression may occur during human liver development, presumably leading to a host of changes in the splicing patterns of genes expressed in human liver. Currently available EST data were insufficient to allow systematic analysis of the patterns of AS in fetal relative to adult liver.

An important caveat to these results is that the DNA microarray data used in this analysis measure mRNA expression levels rather than protein levels or activities. The relation between the amount of mRNA expressed from a gene and the concentration of the corresponding protein has been examined previously in several studies in yeast as well as in human and mouse liver tissues [55–58]. These studies have generally found that mRNA expression levels correlate positively with protein concentrations, but with fairly wide divergences for a significant fraction of genes.

Over-represented motifs in alternative exons in the human brain, testis and liver

The unusually high levels of alternative splicing seen in the human brain, testis and liver prompted us to search for candidate tissue-specific splicing regulatory motifs in AS exons in genes expressed in each of these tissues. Using a procedure similar to Brudno et al. [59], sequence motifs four to six bases long that were significantly enriched in exons skipped in AS genes expressed in the human brain relative to constitutive exons in genes expressed in the brain were identified. These sequences were then compared to each other and grouped into seven clusters, each of which shared one or two four-base motifs (Table 2). The motifs in cluster BR1 (CUCC, CCUC) resemble the consensus binding site for the polypyrimidine tract-binding protein (PTB), which acts as a repressor of splicing in many contexts [60–63]. A similar motif (CNCUCCUC) has been identified in exons expressed specifically in the human brain [29]. The motifs in cluster BR7 (containing UAGG) are similar to the high-affinity binding site UAGGG[A/U], identified for the splicing repressor protein hnRNP A1 by SELEX experiments [64]. The consensus sequences for the remaining clusters BR2 to BR6 (GGGU, UGGG, GGGA, CUCA, UAGC, respectively), as well as BR7, all resembled motifs identified in a screen for exonic splicing silencers (ESSs) in cultured human cells (Z. Wang and C.B.B., unpublished results), suggesting that most or all of the motifs BR1 to BR7 represent sequences directly involved in mediating exon skipping. In particular, G-rich elements, which are known to act as intronic splicing enhancers [65, 66], may function as silencers of splicing when present in an exonic context.

A comparison of human testis-derived skipped exons to exons constitutively included in genes expressed in the testis identified only a single cluster of sequences, TE1, which share the tetramer UAGG. Enrichment of this motif, common to the brain-specific cluster BR7, suggests a role for regulation of exon skipping by hnRNP A1 - or a trans-acting factor with similar binding preferences - in the testis.

Alternative splice site usage gives rise to two types of exon segments - the 'core' portion common to both splice forms and the 'extended' portion that is present only in the longer isoform. Two clusters of sequence motifs enriched in the core sequences of A5Es in genes expressed in the liver relative to the core segments of A5Es resulting from alignments of non-liver-derived ESTs were identified - LI1 and LI2. Both are adenosine-rich, with consensus tetramers AAAC and UAAA, respectively. The former motif matches a candidate ESE motif identified previously using the computational/experimental RESCUE-ESE approach (motif 3F with consensus [AG]AA[AG]C) [19]. The enrichment of a probable ESE motif in exons exhibiting alternative splice site usage in the liver is consistent with the model that such splicing events are often controlled by the relative levels of SR proteins (which bind many ESEs) and hnRNP proteins. Insufficient data were available for the analysis of motifs in the extended portions of liver A5Es (which tend to be significantly shorter than the core regions) or for the analysis of liver A3Es.

A measure of dissimilarity between mRNA isoforms

To quantify the differences in splicing patterns between mRNAs or ESTs derived from a gene locus, a new measure called the splice junction difference ratio (SJD) was developed. For any pair of mRNAs/ESTs that align to overlapping portions of the same genomic locus, the SJD is defined as the proportion of splice junctions present in both transcripts that differ between them, including only those splice junctions that occur in regions of overlap between the transcripts (Figure 4). The SJD varies between zero and one, with a value of zero for any pair of transcripts that have identical splice junctions in the overlapping region (for example, transcripts 2 and 5 in Figure 4, or for two identical transcripts), and has a value of 1.0 for two transcripts whose splice junctions are completely different in the regions where they overlap (for example, transcripts 1 and 2 in Figure 4). For instance, transcripts 2 and 3 in Figure 4 differ in the 3' splice site used in the second intron, yielding an SJD value of 2/4 = 0.5, whereas transcripts 2 and 4 differ by skipping/inclusion of an alternative exon, which affects a larger fraction of the introns in the two transcripts and therefore yields a higher SJD value of 3/5 = 0.6.

Computation of splice junction difference ratio (SJD). The SJD value for a pair of transcripts is computed as the number of splice junctions in each transcript that are not represented in the other transcript, divided by the total number of splice junctions in the two transcripts, in both cases considering only those splice junctions that occur in portions of the two transcripts that overlap (see Materials and methods for details). SJD value calculations for different combinations of the transcripts shown in the upper part of the figure are also shown.
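As a rough illustration of this definition, the pairwise SJD can be sketched in Python. The `Transcript` type, the function names and all coordinates below are hypothetical: they were chosen only so that the two comparisons reproduce the 2/4 and 3/5 values quoted in the text, and they are not taken from Figure 4 or from the authors' implementation.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

Junction = Tuple[int, int]  # (intron start, intron end); made-up genomic coordinates


class Transcript(NamedTuple):
    start: int                      # first aligned genomic position
    end: int                        # last aligned genomic position
    junctions: FrozenSet[Junction]  # splice junctions observed in the alignment


def junctions_in_overlap(a: Transcript, b: Transcript) -> Tuple[Set[Junction], Set[Junction]]:
    """Keep only the junctions that lie inside the region covered by both transcripts."""
    lo, hi = max(a.start, b.start), min(a.end, b.end)

    def inside(t: Transcript) -> Set[Junction]:
        return {j for j in t.junctions if lo <= j[0] and j[1] <= hi}

    return inside(a), inside(b)


def sjd(a: Transcript, b: Transcript) -> float:
    """Junctions found in only one transcript, divided by all junctions in both."""
    ja, jb = junctions_in_overlap(a, b)
    d = len(ja - jb) + len(jb - ja)   # junctions that differ
    t = len(ja) + len(jb)             # total junctions compared
    return d / t if t else 0.0


# Hypothetical transcripts spanning the same locus.
reference = Transcript(0, 500, frozenset({(100, 200), (300, 400)}))
alt_3ss = Transcript(0, 500, frozenset({(100, 200), (300, 380)}))               # other 3' splice site
exon_included = Transcript(0, 500, frozenset({(100, 200), (300, 340), (360, 400)}))
short_est = Transcript(250, 500, frozenset({(300, 400)}))                       # partial alignment

print(sjd(reference, alt_3ss))        # 2 differing junctions out of 4
print(sjd(reference, exon_included))  # 3 differing junctions out of 5
print(sjd(reference, short_est))      # identical within the overlap
```

The last comparison shows why the overlap restriction matters: the first junction of `reference` lies outside the region covered by `short_est`, so it is not counted as a difference.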

The SJD value can be generalized to compare splicing patterns between two sets of transcripts from a gene - for example, to compare the splicing patterns of the sets of ESTs derived from two different tissues. In this case, the SJD is defined by counting the number of splice junctions that differ between all pairs of transcripts (i, j), with transcript i coming from set 1 (for example, heart-derived ESTs), and transcript j coming from set 2 (for example, lung-derived ESTs), and dividing this number by the total number of splice junctions in all pairs of transcripts compared, again considering only those splice junctions that occur in regions of overlap between the transcript pairs considered. Note that this definition has the desirable property that pairs of transcripts that have larger numbers of overlapping splice junctions contribute more to the total than transcript pairs that overlap less. As an example of the splice junction difference between two sets of transcripts, consider the set S1, consisting of transcripts (1,2) from Figure 4, and set S2, consisting of transcripts (3,4) from Figure 4. Using the notation introduced in Figure 4, SJD(S1,S2) = d(S1,S2) / t(S1,S2) = [d(1,3) + d(1,4) + d(2,3) + d(2,4)] / [t(1,3) + t(1,4) + t(2,3) + t(2,4)] = [3 + 4 + 2 + 3] / [3 + 4 + 4 + 5] = 12/16 = 0.75, reflecting a high level of dissimilarity between the isoforms in these sets, whereas the SJD falls to 0.57 for the more similar sets S1 = transcripts (1,2) versus S3 = transcripts (2,3). Note that in cases where multiple similar/identical transcripts occur in a given set, the SJD measure effectively weights the isoforms by their abundance, reflecting an average dissimilarity when comparing randomly chosen pairs of transcripts from the two tissues. For example, the SJD computed for the set S4 = (1,2,2,2,2), that is, one transcript aligning as transcript 1 in Figure 4 and four transcripts aligning as transcript 2, and the set S5 = (2,2,2,2,3) is 23/95 = 0.24, substantially lower than the SJD value for sets S1 versus S3 above, reflecting the higher fraction of identically spliced transcripts between sets S4 and S5.
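The set-level calculation can be sketched as follows. This is a simplified illustration, assuming every pair of transcripts overlaps completely so that each transcript can be represented by its set of splice junctions alone. The junction coordinates for `T1` to `T4` are hypothetical stand-ins for transcripts 1 to 4 of Figure 4, chosen so that the pairwise d and t counts match the numbers given in the text.

```python
from itertools import product
from typing import FrozenSet, Sequence, Tuple

Junctions = FrozenSet[Tuple[int, int]]  # one transcript = its set of (start, end) junctions

# Hypothetical junction sets mimicking transcripts 1-4 (coordinates are made up).
T1: Junctions = frozenset({(150, 450)})
T2: Junctions = frozenset({(100, 200), (300, 400)})
T3: Junctions = frozenset({(100, 200), (300, 380)})
T4: Junctions = frozenset({(100, 200), (300, 340), (360, 400)})


def d(a: Junctions, b: Junctions) -> int:
    """Number of junctions present in exactly one of the two transcripts."""
    return len(a ^ b)


def t(a: Junctions, b: Junctions) -> int:
    """Total number of junctions in the two transcripts."""
    return len(a) + len(b)


def set_sjd(set1: Sequence[Junctions], set2: Sequence[Junctions]) -> float:
    """Sum d over all cross-set pairs, divided by the sum of t over the same pairs."""
    pairs = list(product(set1, set2))
    total_d = sum(d(a, b) for a, b in pairs)
    total_t = sum(t(a, b) for a, b in pairs)
    return total_d / total_t if total_t else 0.0


print(set_sjd([T1, T2], [T3, T4]))                          # 12/16
print(round(set_sjd([T1, T2], [T2, T3]), 2))                # 8/14, about 0.57
print(round(set_sjd([T1, T2, T2, T2, T2],
                    [T2, T2, T2, T2, T3]), 2))              # 23/95, about 0.24
```

Because repeated transcripts are kept as repeated list entries, abundant isoforms contribute more pairs, which is the abundance weighting described in the text.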

Global comparison of splicing patterns between tissues

To make a global comparison of patterns of splicing between two different human tissues, a tissue-level SJD value was computed by comparing the splicing patterns of ESTs from all genes for which at least one EST was available from cDNA libraries representing both tissues. The 'inter-tissue' SJD value is then defined as the ratio of the sum of d(SA,SB) values for all such genes, divided by the sum of t(SA,SB) values for all of these genes, where SA and SB refer to the set of ESTs for a gene derived from tissues A and B, respectively, and d(SA,SB) and t(SA,SB) are defined in terms of comparison of all pairs of ESTs from the two sets as described above. This analysis uses all available ESTs for each gene in each tissue (rather than samples of a fixed size). A large SJD value between a pair of tissues indicates that mRNA isoforms of genes expressed in the two tissues tend to be more dissimilar in their splicing patterns than is the case for two tissues with a smaller inter-tissue SJD value. This definition puts greater weight on those genes for which more ESTs are available.
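A minimal sketch of the inter-tissue aggregation is given below, again assuming fully overlapping transcripts represented as sets of junctions. The gene names, tissue dictionaries and coordinates are hypothetical and serve only to show how the per-gene d and t totals are summed before a single ratio is taken.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Junctions = FrozenSet[Tuple[int, int]]
TissueESTs = Dict[str, List[Junctions]]  # gene name -> ESTs observed in that tissue


def gene_counts(ests_a: List[Junctions], ests_b: List[Junctions]) -> Tuple[int, int]:
    """Return (d, t) summed over every EST pair drawn from the two tissues."""
    d_total = t_total = 0
    for a, b in product(ests_a, ests_b):
        d_total += len(a ^ b)          # junctions that differ within this pair
        t_total += len(a) + len(b)     # junctions compared within this pair
    return d_total, t_total


def inter_tissue_sjd(tissue_a: TissueESTs, tissue_b: TissueESTs) -> float:
    """Sum d and t over all genes seen in both tissues, then take one ratio."""
    d_sum = t_sum = 0
    for gene in tissue_a.keys() & tissue_b.keys():
        d_gene, t_gene = gene_counts(tissue_a[gene], tissue_b[gene])
        d_sum += d_gene
        t_sum += t_gene
    return d_sum / t_sum if t_sum else 0.0


# Hypothetical data: only GENE_A has ESTs in both tissues.
form_1 = frozenset({(100, 200), (300, 400)})
form_2 = frozenset({(100, 200), (300, 380)})
brain = {"GENE_A": [form_1, form_2], "GENE_B": [form_1]}
liver = {"GENE_A": [form_1], "GENE_C": [form_2]}

print(inter_tissue_sjd(brain, liver))  # (0 + 2) / (4 + 4)
```

Summing before dividing, rather than averaging per-gene ratios, is what gives genes with more ESTs a greater weight, as noted above.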

The SJD values were then used to globally assess tissue-level differences in alternative splicing. A set of 25 human tissues for which at least 20,000 genomically aligned ESTs were available was compiled for this comparison (see Materials and methods) and the SJD values were then computed between all pairs of tissues in this set (Figure 5a). A clustering of human tissues on the basis of their inter-tissue SJD values (Figure 5b) identified groups of tissues that cluster together very closely (for example, the ovary/thyroid/breast cluster, the heart/lymph cluster and the bone/B-cell cluster), while other tissues including the brain, pancreas, liver, peripheral nervous system (PNS) and placenta occur as outgroups. These results complement a previous clustering analysis based on data from microarrays designed to detect exon skipping [24]. Calculating the mean SJD value for a given tissue when compared to the remaining 24 tissues (Figure 5c) identified a set of human tissues including the ovary, thyroid, breast, heart, bone, B-cell, uterus, lymph and colon that have 'generic' splicing patterns which are more similar to most other tissues. As expected, many of these tissues with generic splicing patterns overlap with the set of tissues that have low levels of AS (Figure 1). On the other hand, another group of tissues including the human brain, pancreas, liver and peripheral nervous system, have highly 'distinctive' splicing patterns that differ from most other tissues (Figure 5c). Many of these tissues were identified as having high proportions of AS in Figure 1. Taken together, these observations suggest that specific human tissues such as the brain, testis and liver, make more extensive use of AS in gene regulation and that these tissues have also diverged most from other tissues in the set of spliced isoforms they express. 
Although we are not aware of reliable, quantitative data on the relative abundance of different cell types in these tissues, a greater diversity of cell types is likely to contribute to higher SJD values for many of these tissues.

Comparison of alternative mRNA isoforms across 25 human tissues. (a) Color-coded representation of SJD values between pairs of tissues (see Figure 4 and Materials and methods for definition of SJD). (b) Hierarchical clustering of SJD values using average-linkage clustering. Groups of tissues in clusters with short branch lengths (for example, thyroid/ovary, B-cell/bone) have highly similar patterns of AS. (c) Mean SJD values (versus other 24 tissues) for each tissue.

Difference between introns and exons


Introns: Introns are segments of DNA that do not encode any amino acid sequence in the coding region.

Exons: Exons are the segments of DNA that encode a part of an amino acid sequence of a complete protein.

Coding DNA

Introns: Introns belong to non-coding DNA.

Exons: Exons belong to coding DNA.


Introns: Introns are considered as the bases located between two exons.

Exons: Exons are the bases that encode an amino acid sequence of a protein.


Introns: Spliceosomal introns are found only in eukaryotes.

Exons: Exons are found in both prokaryotes and eukaryotes.

Movement in Nucleus

Introns: Introns remain in the nucleus, because they are spliced out of the primary mRNA transcript during mRNA processing within the nucleus.

Exons: Exons leave the nucleus towards the cytoplasm after the production of mature mRNA.

Sequence conservation

Introns: Sequences in introns are less conserved compared to exons.

Exons: The sequences in the exons are highly conserved.

Presence in the genome

Introns: Introns are found in DNA and in the primary mRNA transcript.

Exons: Exons are found in both DNA and mRNA.


Introns: The function of introns is not clearly known, but they make up a substantial fraction of the DNA.

Exons: The function of exons is to be translated into protein.


A gene is a segment of DNA that produces a functional product, either a polypeptide or an RNA. The intervening regions within a gene are composed of introns. In eukaryotes, the coding region of a gene is therefore divided into segments called exons, and introns are found between two exons. Introns belong to non-coding DNA.

All exons, together with the intervening introns, are transcribed by RNA polymerase into the primary mRNA transcript. Introns are removed from the primary transcript during mRNA processing. Therefore, a mature mRNA consists only of exons.

Exons can be alternatively spliced in eukaryotes, producing more than one type of mature mRNA from a single primary mRNA transcript. Introns make up a substantial fraction of the DNA in the genome, while exons encode proteins. Therefore, the main difference between introns and exons is their function in the genome.




Introns contain a number of sequences that take part in splicing, including spliceosome recognition sites. There is a clear sequence signal around the splice sites, and this signal is used by several programs that perform splice site prediction.

Prokaryotic versus Eukaryotic Gene Expression

To understand how gene expression is regulated, we must first understand how a gene becomes a functional protein in a cell. The process occurs in both prokaryotic and eukaryotic cells, just in slightly different fashions.

Because prokaryotic organisms lack a cell nucleus, the processes of transcription and translation occur almost simultaneously. When the protein is no longer needed, transcription stops. As a result, the primary method to control what type and how much protein is expressed in a prokaryotic cell is through the regulation of DNA transcription into RNA. All the subsequent steps happen automatically. When more protein is required, more transcription occurs. Therefore, in prokaryotic cells, the control of gene expression is almost entirely at the transcriptional level.

The first example of such control was discovered using E. coli in the 1950s and 1960s by French researchers and is called the lac operon. The lac operon is a stretch of DNA with three adjacent genes that code for proteins that participate in the absorption and metabolism of lactose, a food source for E. coli. When lactose is not present in the bacterium’s environment, the lac genes are transcribed only in small amounts. When lactose is present, the genes are transcribed and the bacterium is able to use the lactose as a food source. The operon also contains a promoter sequence to which the RNA polymerase binds to begin transcription; between the promoter and the three genes is a region called the operator. When there is no lactose present, a protein known as a repressor binds to the operator and prevents RNA polymerase from binding to the promoter, except in rare cases. Thus very little of the protein products of the three genes is made. When lactose is present, an end product of lactose metabolism binds to the repressor protein and prevents it from binding to the operator. This allows RNA polymerase to bind to the promoter and freely transcribe the three genes, allowing the organism to metabolize the lactose.

Eukaryotic cells, in contrast, have intracellular organelles and are much more complex. Recall that in eukaryotic cells, the DNA is contained inside the cell’s nucleus and it is transcribed into mRNA there. The newly synthesized mRNA is then transported out of the nucleus into the cytoplasm, where ribosomes translate the mRNA into protein. The processes of transcription and translation are physically separated by the nuclear membrane; transcription occurs only within the nucleus, and translation occurs only outside the nucleus, in the cytoplasm. The regulation of gene expression can occur at all stages of the process (Figure 1). Regulation may occur when the DNA is uncoiled and loosened from nucleosomes to bind transcription factors (epigenetic level), when the RNA is transcribed (transcriptional level), when RNA is processed and exported to the cytoplasm after it is transcribed (post-transcriptional level), when the RNA is translated into protein (translational level), or after the protein has been made (post-translational level).

Figure 1: Eukaryotic gene expression is regulated during transcription and RNA processing, which take place in the nucleus, as well as during protein translation, which takes place in the cytoplasm. Further regulation may occur through post-translational modifications of proteins.

The differences in the regulation of gene expression between prokaryotes and eukaryotes are summarized in Table 1.

  • RNA transcription occurs prior to protein translation, and it takes place in the nucleus. RNA translation to protein occurs in the cytoplasm.
  • RNA post-processing includes addition of a 5′ cap, poly-A tail, and excision of introns and splicing of exons.


This study analyzed RNA-Seq data of hippocampus brain tissues from 74 participants from the ACT study. Participants were diagnosed as cognitively normal elder controls (CN) or AD patients. As shown in Table 1, there were 24 AD and 50 CN participants, and the mean Braak stages of AD and CN were 4.4 and 2.78, respectively.

Differentially expressed exons in AD hippocampus tissue

Using the computational pipeline described in Fig. 1, normalized expression levels were calculated for each exon in a genome-wide manner, and a generalized linear regression model was used to identify AD-associated exon skipping events. After adjusting for multiple comparisons using the FDR method, we identified three exon skipping events in two genes, RELN and NOS1, as significantly associated with AD (FDR-corrected p-value < 0.05 and fold change > 1.5). Two exons in RELN, exons 24 and 37, showed significantly lower expression levels in AD patients compared to CN participants, suggesting that these exons tend to be skipped more often in AD (Fig. 2a). Exon 24 was predicted to encode an hEGF domain region (PF00008, Fig. 2b). Furthermore, for AD participants, we investigated whether the Braak stage is associated with the exon skipping event. Of the 24 AD participants, 19 were in Braak stages 4, 5 and 6. Among these 19 AD participants in higher Braak stages, 15 showed significantly lower expression levels of exon 24 compared to the other 4 AD participants (one-way chi-squared test, p-value = 0.01), suggesting that the exon tends to be skipped more often in AD participants at higher Braak stages. In NOS1, exon 23 had lower expression levels in AD patients compared to CN participants (Fig. 3a) and was a part of the NO_synthase domain (PF02898, Fig. 3b). Our results suggest that these exon skipping events may affect the function of the corresponding protein products.
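The selection step (FDR-corrected p-value < 0.05 and fold change > 1.5) can be illustrated with a short sketch. The text only says "the FDR method"; the Benjamini-Hochberg procedure used here is an assumption, as are the exon names, raw p-values and fold changes, which are invented for illustration and are not the study's data.

```python
from typing import List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_exons(
    exons: List[Tuple[str, float, float]],  # (name, raw p-value, fold change)
    fdr_cutoff: float = 0.05,
    fold_change_cutoff: float = 1.5,
) -> List[str]:
    """Keep exons passing both the adjusted p-value and the fold-change threshold."""
    adjusted = benjamini_hochberg([p for _, p, _ in exons])
    return [
        name
        for (name, _, fc), q in zip(exons, adjusted)
        if q < fdr_cutoff and fc > fold_change_cutoff
    ]


# Hypothetical exon-level results.
results = [
    ("exon_a", 0.010, 1.6),
    ("exon_b", 0.040, 1.2),
    ("exon_c", 0.030, 1.7),
    ("exon_d", 0.005, 1.4),
]
print(select_exons(results))  # exons passing both filters
```

In this toy input all four adjusted p-values fall below 0.05, so the fold-change threshold is what removes `exon_b` and `exon_d`; a genome-wide analysis would involve far more tests and a much stronger correction.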

Two AD-associated exon skipping events in RELN. a Comparison of the expression levels of each exon between AD and CN participants, indicating two significant exon skipping events (exons 24 and 37) in the AD. b Exon structure and the hEGF domain encoded by the skipped exons (UCSC genome browser)

One AD-associated exon skipping event in NOS1. a Comparison of the expression levels of each exon between AD and CN participants, indicating a significant exon skipping event (exon 23) in the AD. b Exon structure and the NO_synthase domain encoded by the skipped exon (UCSC genome browser)

Prediction of the effect of the identified exon skipping events on protein

To characterize the potential impact of the variants on the protein, each variant was analyzed using the UniProt web browser and RaptorX. Skipping any of the three exons in RELN and NOS1 produced an out-of-frame coding sequence, potentially generating an undesired protein product. As presented in Fig. 4, the first AD-associated skipped exon (exon 24) in RELN showed lower expression levels in AD compared to CN participants (Fig. 4b; FDR-corrected p-value = 0.034, fold change = 1.51). The adjacent exons 23 and 25 showed similar levels of difference, but these were not significant (FDR = 0.154 for both exons, fold change = 1.39 and 1.37, respectively). The transcript (transcript 1) retaining exon 24, which encodes the hEGF domain, likely results in a functional RELN gene product, while the transcript that lacks exon 24 loses the hEGF domain and, because the length of the skipped exon shifts the downstream reading frame, produces a truncated protein product (Fig. 4a). Structural analysis of the truncated version of the protein suggests that exon skipping may be implicated in functional changes of RELN in AD (Fig. 4c). The other exon (exon 37) in RELN is presented in Additional file 1: Figure S1. Additionally, exon 23 was identified as being skipped in NOS1 (Fig. 5a). AD participants had significantly lower expression levels of this exon compared to CN participants (Fig. 5b; FDR-corrected p-value = 0.043, fold change = 1.90). The transcript (transcript 1) retaining exon 23, which encodes a part of the NO_synthase domain, can be translated into a protein with the normal functions of NOS1. In contrast, the transcript in which exon 23 is skipped may not only lose part of the NO_synthase domain but also produce a truncated protein because of the frameshift. We also showed the partial loss of the protein structure due to the skipped exon using protein 3D structure analysis (Fig. 5c).
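Why an out-of-frame exon skip tends to truncate the protein can be shown with a toy example. The three exon sequences below are invented and have nothing to do with the real RELN or NOS1 exons; the only point is that removing an exon whose length is not a multiple of three shifts the reading frame downstream and can expose a premature stop codon.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> List[str]:
    """Split a coding sequence into complete codons, ignoring any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_stop_index(seq: str) -> Optional[int]:
    """Index of the first in-frame stop codon, or None if there is none."""
    for index, codon in enumerate(codons(seq)):
        if codon in STOP_CODONS:
            return index
    return None


# Hypothetical three-exon coding sequence; the middle exon is 4 nt long.
exons = ["ATGGCT", "GAAC", "TGACCTGGTAA"]

full_length = "".join(exons)            # all exons included
skipped = exons[0] + exons[2]           # middle exon skipped

print(len(exons[1]) % 3 == 0)           # False: skipping this exon shifts the frame
print(codons(full_length))              # stop codon only at the end
print(codons(skipped))                  # frame shifted, early TGA stop
print(first_stop_index(full_length), first_stop_index(skipped))
```

With the middle exon present, the stop codon is the seventh codon; without it, a stop appears at the third codon, so translation would end early.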

Functional impact of the AD-associated exon skipping event (exon 24) in RELN. a Schema of the potential functional implication of exon skipping and splicing-associated SNP. b Normalized expression levels for exon 24 between AD and CN participants. c Structure alignment of the pair of transcript1 retaining exon 24 (green) and transcript 2 with the exon skipping (red)

Functional impact of the AD-associated exon skipping event (exon 23) in NOS1. a Skipping of exon 23. b Normalized expression levels for exon 23 between AD and CN participants. c Structure alignment of the pair of transcript1 retaining exon 23 (green) and transcript 2 with the exon skipping (red)

Association of SNPs affecting exon skipping with AD-related neuroimaging phenotypes

Next, we performed an association analysis of SNPs affecting exon skipping events with a global cortical measure of amyloid-β deposition as an AD-related quantitative phenotype. Using the splicing decision model, we first identified 46 and 11 SNPs in RELN and NOS1, respectively, potentially affecting the three identified exon skipping events (MAF > 1%) from HRC-based imputed ADNI GWAS data. We identified one SNP (rs362771) in the intron adjacent to the skipped exon 24 in RELN as significantly associated with cortical amyloid-β levels (Fig. 6a; permutation-based corrected p-value < 0.05). Furthermore, we performed an unbiased whole-brain imaging association analysis using age, sex and years of education as covariates to assess the effect of rs362771 on whole-brain amyloid-β deposition, and identified significant associations after adjustment for multiple comparisons using a cluster-wide FDR procedure. The minor allele of rs362771 conferred decreases in cortical amyloid-β levels in the right temporal and bilateral parietal lobes (Fig. 6b). As shown in Fig. 4a, we found that the SNP (rs362771) is located within the ISE site (5th site of the hexameric sequence CCTTCC), suggesting that the SNP may affect the exon skipping and that the skipped exon may be associated with AD pathogenesis.

Regional (a, global cortical amyloid-β load) and voxel-wide (b) association of rs362771 in RELN, a SNP affecting exon skipping, with amyloid-β deposition

Creating families of proteins through differential nRNA splicing

The average vertebrate nRNA consists of relatively short exons (averaging about 140 bases) separated by introns that are usually much longer. Most mammalian nRNAs contain numerous exons. By splicing together different sets of exons, different cells can make different types of mRNAs, and hence, different proteins. Whether a sequence of RNA is recognized as an exon or as an intron is a crucial step in gene regulation. What is an intron in one cell's nucleus may be an exon in another cell's nucleus.

Alternative nRNA splicing is based on determining which sequences can be spliced out as introns. This can occur in several ways (Figure 5.28). Cells can differ in their ability to recognize the 5´ splice site (at the beginning of the intron) or the 3´ splice site (at the end of the intron). Or some cells could fail to recognize a sequence as an intron at all, retaining it within the message. The splicing of nRNA is mediated through a complex called a spliceosome, made up of small nuclear RNAs (snRNA) and proteins, that assembles at a splice site. Whether a spliceosome recognizes the splice sites depends on certain factors in the nucleus that can interact with those sites and compete or cooperate with the proteins that direct spliceosome formation. The 5´ splice site is normally recognized by small nuclear RNA U1 (U1 snRNA) and splicing factor 2 (SF2, also known as alternative splicing factor). The choice of alternative 3´ splice sites is often controlled by which splice site can best bind a protein called U2AF. The spliceosome forms when the 5´ and 3´ splice sites are brought together and the intervening RNA is cut out.

Figure 5.28

Schematic diagram of alternative nRNA splicing. Exons are represented as shaded boxes, alternatively spliced exons are represented by hatched boxes, and introns are represented by broad lines. By convention, the path of splicing is shown by fine V-shaped ...


5.14 The mechanism of differential nRNA splicing. Differential nRNA splicing depends on the assembly of the spliceosome and upon the ratio of certain proteins in the nucleus of the cell.

Differential RNA processing has been found to control the alternative forms of expression of genes encoding over 100 proteins. The deletion of certain potential exons in some cells but not in others enables one gene to create a family of closely related proteins. Instead of one gene-one polypeptide, one can have one gene-one family of proteins. For instance, alternative RNA splicing enables the α-tropomyosin gene to encode brain, liver, skeletal muscle, smooth muscle, and fibroblast forms of this protein (Figure 5.29; Breitbart et al. 1987). The nuclear RNA for α-tropomyosin contains 11 potential exons, but different sets of exons are used in different cells. Such different proteins encoded by the same gene are called splicing isoforms of the protein.

Figure 5.29

Alternative RNA splicing to form a family of rat α-tropomyosin proteins. The α-tropomyosin gene is represented on top. The numbers correspond to the amino acids encoded by the exons. The thin lines represent the sequences that become introns ...

In some instances, alternatively spliced RNAs yield proteins that play similar, yet distinguishable, roles in the same cell. The nuclear RNA for the Pax6 transcription factor is spliced to yield two types of Pax6 proteins. In about half the mRNAs, there is an added exon that interrupts one major DNA-binding site and enables another to be used (Epstein et al. 1994). The two forms of Pax6 appear to be made at similar rates in all the cells expressing the Pax6 gene. If the human gene for PAX6 is mutated such that the 5´ splicing site becomes more efficient, the Pax6 isoform containing the amino acids encoded by the alternatively spliced exon is made in excess. The eyes of people with this mutation have defects in their lenses, corneas, and pupils.

If you think that differential splicing means that certain genes with dozens of introns can create thousands of different related proteins, you are probably correct. Proteins derived from the neurexin genes, for example, are found on the cell surfaces of developing neurons, and they may be important in specifying the connections that these neurons make. These genes can be alternatively spliced at several different sites, creating hundreds of proteins from the same gene (Ullrich et al. 1995; Ichtchenko et al. 1995).
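The arithmetic behind this claim can be sketched under a deliberately simple assumption: each alternative (cassette) exon is included or skipped independently of the others, so k such exons give 2^k possible mRNAs. Real genes are more constrained (mutually exclusive exons, alternative splice sites, tissue-specific regulation), and the exon names below are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Tuple

Exon = Tuple[str, bool]  # (exon name, True if the exon is alternatively spliced)


def isoforms(exons: List[Exon]) -> Iterator[Tuple[str, ...]]:
    """Yield every exon combination, keeping constitutive exons in all of them."""
    choices = [(name, None) if alternative else (name,) for name, alternative in exons]
    for combination in product(*choices):
        yield tuple(name for name in combination if name is not None)


# Hypothetical gene with two constitutive exons and three cassette exons.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", True), ("E5", False)]

all_isoforms = list(isoforms(gene))
print(len(all_isoforms))   # 2 ** 3 combinations
print(all_isoforms[0])     # every exon included
print(all_isoforms[-1])    # all cassette exons skipped

# With this model, about a dozen independent cassette exons already give thousands of mRNAs.
print(2 ** 12)
```

The count grows exponentially with the number of independently regulated exons, which is why a gene with many alternatively spliced sites can plausibly encode hundreds or thousands of related proteins.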

Differential nRNA Processing and Drosophila Sex Determination.
