Top Qs
Timeline
Chat
Perspective
Shapiro–Senapathy algorithm
From Wikipedia, the free encyclopedia
Remove ads
The Shapiro—Senapathy algorithm (S&S) algorithm is one of the most influential computational methods for identifying splice sites in eukaryotic genes. The algorithm employs a Position Weight Matrix (PWM) scoring formula to predict donor and acceptor splice sites in any given gene. This methodology has been instrumental in discovering splice sites and disease-causing splice site mutations in the human genome, and has become a standard tool in clinical genomics.
![]() | This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these messages)
|

The S&S algorithm has been cited in over 6,000 peer-reviewed clinical studies, and has formed the basis of widely used software, including Human Splicing Finder[1], SROOGLE[2],and Alamut[3], which identify splice sites and splice site mutations that cause disease. The algorithm has uncovered splicing mutations in diseases ranging from cancers to inherited disorders, and predicted the deleterious effects of these mutations including exon skipping, intron retention, and cryptic splice site activation.
Remove ads
The algorithm
Summarize
Perspective
A splice site defines the boundary between a coding exon and a non-coding intron in eukaryotic genes. The S&S algorithm employs a sliding window, corresponding to the length of the splice site motif, to scan a gene sequence and detect potential splice sites. For each sliding window, the algorithm calculates a score by comparing the nucleotide sequence to a Position Weight Matrix (PWM) derived from known splice sites. This formula generates a percentile score, indicating the likelihood that a given sequence functions as a donor or acceptor splice site.
The majority of disease-causing mutations in the human genome are located in splice sites. Clinical genomics studies analyze the splice site scores generated by the S&S algorithm to predict the consequences of splice site mutations including exon skipping and intron retention. The algorithm’s sensitivity to single-nucleotide changes makes it a valuable tool for determining mutations that may impact RNA splicing and contribute to disease.
In addition to identifying real splice sites, the S&S algorithm has been used to discover cryptic splice sites — alternative splice sites activated by mutations — which may disrupt normal splicing. The algorithm detects mutations that lead to the activation of cryptic splice sites, which may be located proximal to real splice sites or deep within non-coding introns. It has thus played a central role in determining the causes of numerous diseases that are due to cryptic splicing.
Remove ads
Cancer gene discovery using S&S
Summarize
Perspective
Algorithm have been instrumental in identifying splice-site mutations linked to various cancers. For example, genes causing commonly occurring cancers including breast cancer,[4][5][6] ovarian cancer,[7][8][9] colorectal cancer,[10][11][12] leukemia,[13][14] head and neck cancers,[15][16] prostate cancer,[17][18] retinoblastoma,[19][20] squamous cell carcinoma,[21][22][23] gastrointestinal cancer,[24][25] melanoma,[26][27] liver cancer,[28][29] Lynch syndrome,[30][31][11] skin cancer,[21][32][33] and neurofibromatosis[34][35] have been found. In addition, splicing mutations in genes causing less commonly known cancers including gastric cancer,[36][37][24] gangliogliomas,[38][39] Li-Fraumeni syndrome, Loeys–Dietz syndrome, Osteochondromas (bone tumor), Nevoid basal cell carcinoma syndrome,[7] and Pheochromocytomas[9] have been identified.
Specific mutations in different splice sites in various genes causing breast cancer (e.g., BRCA1, PALB2), ovarian cancer (e.g., SLC9A3R1, COL7A1, HSD17B7), colon cancer (e.g., APC, MLH1, DPYD), colorectal cancer (e.g., COL3A1, APC, HLA-A), skin cancer (e.g., COL17A1, XPA, POLH), and Fanconi anemia (e.g., FANC, FANA) have been uncovered. The mutations in the donor and acceptor splice sites in different genes causing a variety of cancers that have been identified by S&S are shown in Table 1.
Table 1. Mutations in the donor and acceptor splice sites in different genes
Remove ads
Discovery of genes causing inherited disorders using S&S
Summarize
Perspective
Specific mutations in different splice sites in various genes that cause inherited disorders, including, for example, Type 1 diabetes (e.g., PTPN22, TCF1 (HCF-1A)), hypertension (e.g., LDL, LDLR, LPL), Marfan syndrome (e.g., FBN1, TGFBR2, FBN2), cardiac diseases (e.g., COL1A2, MYBPC3, ACTC1), eye disorders (e.g., EVC, VSX1) have been uncovered. A few example mutations in the donor and acceptor splice sites in different genes causing a variety of inherited disorders identified using S&S are shown in Table 2.
Table 2. Mutations in the donor and acceptor splice sites in different genes causing inherited disorders
Remove ads
Genes causing immune system disorders
Summarize
Perspective
More than 100 immune system disorders affect humans, including inflammatory bowel diseases, multiple sclerosis, systemic lupus erythematosus, bloom syndrome, familial cold autoinflammatory syndrome, and dyskeratosis congenita. The Shapiro–Senapathy algorithm has been used to discover genes and mutations involved in many immune disorder diseases, including Ataxia telangiectasia, B-cell defects, epidermolysis bullosa, and X-linked agammaglobulinemia.
Xeroderma pigmentosum, an autosomal recessive disorder is caused by faulty proteins formed due to new preferred splice donor site identified using S&S algorithm and resulted in defective nucleotide excision repair.[27]
Type I Bartter syndrome (BS) is caused by mutations in the gene SLC12A1. S&S algorithm helped in disclosing the presence of two novel heterozygous mutations c.724 + 4A > G in intron 5 and c.2095delG in intron 16 leading to complete exon 5 skipping.[28]
Mutations in the MYH gene, which is responsible for removing the oxidatively damaged DNA lesion are cancer-susceptible in the individuals. The IVS1+5C plays a causative role in the activation of a cryptic splice donor site and the alternative splicing in intron 1, S&S algorithm shows, guanine (G) at the position of IVS+5 is well conserved (at the frequency of 84%) among primates. This also supported the fact that the G/C SNP in the conserved splice junction of the MYH gene causes the alternative splicing of intron 1 of the β type transcript.[29]
Splice site scores were calculated according to S&S to find EBV infection in X-linked lymphoproliferative disease.[57] Identification of Familial tumoral calcinosis (FTC) is an autosomal recessive disorder characterized by ectopic calcifications and elevated serum phosphate levels and it is because of aberrant splicing.[58]
Remove ads
Application of S&S in hospitals for clinical practice and research
Summarize
Perspective
The Shapiro–Senapathy (S&S) algorithm has played a significant role in advancing the diagnosis and treatment of human diseases through its application in modern clinical genomics. With the widespread adoption of next-generation sequencing (NGS) technologies, the S&S algorithm is now routinely integrated into clinical practice by geneticists and diagnostic laboratories. It is implemented in various computational tools such as Human Splicing Finder (HSF),[1] Splice Site Finder (SSF),[59] and Alamut Visual,[3] which assist in interpreting the functional impact of genetic variants on RNA splicing.
The algorithm is particularly useful in identifying pathogenic splice site mutations in cases where the clinical presentation is unclear or where conventional diagnostic methods have failed to identify a causative gene. Its utility has been demonstrated across diverse patient cohorts, including individuals from different ethnic backgrounds with various cancers and inherited genetic disorders. The following are selected examples illustrating its application in clinical research.
Cancers
Inherited disorders
Remove ads
S&S - Algorithm for identifying splice sites, exons and split genes
Summarize
Perspective
The Shapiro–Senapathy algorithm (SSA) was developed to identify splice sites in uncharacterized genomic sequences, with early applications in the Human Genome Project.[94][95] The method introduced a Position Weight Matrix (PWM)-based approach to analyze splicing sequences across eukaryotic organisms, marking the first computational framework to systematically define splice sites using probabilistic scoring.
Key innovations of the algorithm included:
- Exon Detection – Exons were defined as sequences bounded by acceptor and donor splice sites with S&S scores above a threshold, requiring an open reading frame (ORF) for validation.
- Gene Prediction – The method enabled the identification of complete genes by assembling predicted exons, forming a basis for later gene-finding tools.
- Mutation Analysis – The algorithm distinguishes deleterious splice-site mutations (which disrupt protein function by lowering S&S scores) from neutral variations. This capability allowed researchers to study disease-linked cryptic splice sites in humans, animals, and plants.
SSA's PWM-based framework influenced subsequent computational methods, including machine learning and neural network approaches, for splice-site prediction and alternative splicing research. It remains a foundational tool in genomics and disease studies.
Remove ads
Discovering the mechanisms of aberrant splicing in diseases
Summarize
Perspective
The Shapiro–Senapathy algorithm has been used to determine the various aberrant splicing mechanisms in genes due to deleterious mutations in the splice sites, which cause numerous diseases. Deleterious splice site mutations impair the normal splicing of the gene transcripts, and thereby make the encoded protein defective. A mutant splice site can become “weak” compared to the original site, due to which the mutated splice junction becomes unrecognizable by the spliceosomal machinery. This can lead to the skipping of the exon in the splicing reaction, resulting in the loss of that exon in the spliced mRNA (exon-skipping). On the other hand, a partial or complete intron could be included in the mRNA due to a splice site mutation that makes it unrecognizable (intron inclusion). A partial exon-skipping or intron inclusion can lead to premature termination of the protein from the mRNA, which will become defective leading to diseases. The S&S has thus paved the way to determine the mechanisms by which a deleterious mutation could lead to a defective protein, resulting in different diseases depending on which gene is affected.
Examples of splicing aberrations
An example of splicing aberration (exon skipping) caused by a mutation in the donor splice site in the exon 8 of MLH1 gene that led to colorectal cancer is given below. This example shows that a mutation in a splice site within a gene can lead to a profound effect in the sequence and structure of the mRNA, and the sequence, structure and function of the encoded protein, leading to disease.

Remove ads
S&S in cryptic splice sites research and medical applications
Summarize
Perspective
The proper identification of splice sites has to be highly precise as the consensus splice sequences are very short and there are many other sequences similar to the authentic splice sites within gene sequences, which are known as cryptic, non-canonical, or pseudo splice sites. When an authentic or real splice site is mutated, any cryptic splice sites present close to the original real splice site could be erroneously used as authentic site, resulting in an aberrant mRNA. The erroneous mRNA may include a partial sequence from the neighboring intron or lose a partial exon, which may result in a premature stop codon. The result may be a truncated protein that would have lost its function completely.
Shapiro–Senapathy algorithm can identify the cryptic splice sites, in addition to the authentic splice sites. Cryptic sites can often be stronger than the authentic sites, with a higher S&S score. However, due to the lack of an accompanying complementary donor or acceptor site, this cryptic site will not be active or used in a splicing reaction. When a neighboring real site is mutated to become weaker than the cryptic site, then the cryptic site may be used instead of the real site, resulting in a cryptic exon and an aberrant transcript.
Numerous diseases have been caused by cryptic splice site mutations or usage of cryptic splice sites due to the mutations in authentic splice sites.[100][101][102][103][104]
Remove ads
S&S in animal and plant genomics research
Summarize
Perspective
S&S has also been used in RNA splicing research in many animals[105][106][107][108][109] and plants.[110][111][112][113][114]
The mRNA splicing plays a fundamental role in gene functional regulation. Very recently, it has been shown that A to G conversions at splice sites can lead to mRNA mis-splicing in Arabidopsis.[110] The splicing and exon–intron junction prediction coincided with the GT/AG rule (S&S) in the Molecular characterization and evolution of carnivorous sundew (Drosera rotundifolia L.) class V b-1,3-glucanase.[111] Unspliced (LSDH) and spliced (SSDH) transcripts of NAD+ dependent sorbitol dehydroge nase (NADSDH) of strawberry (Fragaria ananassa Duch., cv. Nyoho) were investigated for phytohormonal treatments.[112]
Ambra1 is a positive regulator of autophagy, a lysosome-mediated degradative process involved both in physiological and pathological conditions. Nowadays, this function of Ambra1 has been characterized only in mammals and zebrafish.[106] Diminution of rbm24a or rbm24b gene products by morpholino knockdown resulted in significant disruption of somite formation in mouse and zebrafish.[107] Dr.Senapathy algorithm used extensively to study intron-exon organization of fut8 genes. The intron-exon boundaries of Sf9 fut8 were in agreement with the consensus sequence for the splicing donor and acceptor sites concluded using S&S.[108]
References
Wikiwand - on
Seamless Wikipedia browsing. On steroids.
Remove ads