
LOC101928193

Protein-coding gene in the species Homo sapiens From Wikipedia, the free encyclopedia


LOC101928193 is a protein which in humans is encoded by the LOC101928193 gene. There are no known aliases for this gene or protein. Homologous genes in other species, called orthologs, are known to exist in several species across mammals, amphibians, fish, mollusks, cnidarians, fungi, and bacteria.[2] The human LOC101928193 gene is located on the long (q) arm of chromosome 9 at cytogenetic location 9q34.2.[3] The molecular location of the gene is from base pair 133,189,767 to base pair 133,192,979 on chromosome 9, giving an mRNA length of 3213 nucleotides.[4] Neither the gene nor the protein is well understood yet, but data are available on the gene's genetic makeup and expression. The LOC101928193 protein is targeted to the cytoplasm and has its highest levels of expression in the thyroid, ovary, skin, and testes in humans.[5]


Gene

Locus

Cytogenetic location of LOC101928193 at 9q34.2.[3] The gene is on the positive strand and spans base pairs 133,189,767 to 133,192,979.

In humans, LOC101928193 is located on the positive strand at cytogenetic location 9q34.2. The protein-encoding region of LOC101928193 spans base pairs 133,189,767 to 133,192,979. Within this region, there are 1 intron and 2 exons.[4]
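The quoted length of 3213 nucleotides follows directly from the two coordinates if they are read as a 1-based, inclusive interval. A minimal sketch of that arithmetic is shown here; the function name `span_length` is illustrative and not part of any cited tool.

```python
# Coordinates quoted in the text (assumed 1-based and inclusive, chromosome 9).
GENE_START = 133_189_767
GENE_END = 133_192_979


def span_length(start: int, end: int) -> int:
    """Return the number of bases in a 1-based, inclusive interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


gene_length = span_length(GENE_START, GENE_END)
print(gene_length)  # 3213
```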

Gene neighborhood

LOC101928193 is flanked by GBGT1 and OBP2B on chromosome 9.[5] GBGT1 encodes a member of the ABO gene family and plays a role in synthesizing glycolipids that are involved in pathogen tropism and binding.[6] OBP2B is a gene in the ABO gene region that has been associated with E-selectin levels.[7]


mRNA

In humans, the LOC101928193 gene produces 3 transcript variants, which encode 3 isoforms of the protein.[4] Isoform X1 is the longest at 406 codons.[8] Isoform X2 is 388 codons long and isoform X3 is 399 codons long.[9][10] All isoforms have 2 exons and their coding mRNA is 3213 nucleotides long.[4]

Protein


The molecular weight of LOC101928193 is 43.5 kilodaltons.[11] The isoelectric point (pI) is 9.[11]

LOC101928193 amino acid composition.[12] This is a glycine-, valine-, and serine-rich protein. It is also poor in methionine, asparagine, aspartic acid, glutamic acid, and lysine.

Composition

LOC101928193 predicted secondary structure.[13] The orange c's indicate predicted coils and the red e's indicate predicted beta sheets.

Compared to most human proteins, LOC101928193 contains more valine, glycine, serine, histidine, and phenylalanine residues.[12] It is poor in alanine, methionine, asparagine, aspartic acid, glutamic acid, and lysine. The abundance of all other amino acids is normal compared to other human proteins. The composition of LOC101928193 is highly conserved among mammals.[12]


LOC101928193 has an amino acid charge distribution of 0.7% negative, 4.9% positive, and 94.4% neutral. There are no charge runs, hydrophobic segments, or transmembrane domains.
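A charge distribution of this kind can be tallied directly from a protein sequence. The sketch below is only illustrative: it assumes that aspartate and glutamate count as negative, lysine and arginine as positive, and everything else (including histidine) as neutral, and it runs on a made-up toy sequence rather than the actual LOC101928193 sequence, which is not reproduced in this article.

```python
from collections import Counter

# Assumed classification; the tool behind the cited figures may differ.
NEGATIVE = set("DE")
POSITIVE = set("KR")


def charge_distribution(seq: str) -> dict:
    """Return the percentage of negative, positive, and neutral residues."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    neg = sum(counts[aa] for aa in NEGATIVE)
    pos = sum(counts[aa] for aa in POSITIVE)
    return {
        "negative": 100 * neg / n,
        "positive": 100 * pos / n,
        "neutral": 100 * (n - neg - pos) / n,
    }


# Hypothetical 10-residue sequence used only to exercise the function.
toy = "GVSGVRGSVD"
result = charge_distribution(toy)
print(result)
```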

LOC101928193 predicted tertiary structure. Image coloured by rainbow N → C terminus.[14]

Domains and motifs

There are two different motifs present in LOC101928193. Myristoylation sites are found in the protein sequence 17 times, and a zinc finger domain motif occurs once.[15] The presence of myristoylation sites indicates that LOC101928193 may function in membrane targeting, protein-protein interactions, and signal transduction pathways. Zinc finger domain motifs aid in gene transcription, cell adhesion, protein folding, and chromatin remodeling.[15]
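Motif counts like these are typically obtained by scanning the sequence for a consensus pattern. The sketch below assumes the PROSITE-style N-myristoylation consensus G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}; whether the cited analysis used exactly this pattern is an assumption, and the pattern should be checked against the PROSITE entry before reuse. The toy sequence is hypothetical.

```python
import re

# Assumed PROSITE-style consensus translated to a regular expression.
# The lookahead lets overlapping candidate sites be reported.
MYRISTOYLATION = re.compile(r"(?=G[^EDRKHPFYW].{2}[STAGCN][^P])")


def find_myristoylation_sites(seq: str) -> list:
    """Return 1-based start positions of candidate N-myristoylation sites."""
    return [m.start() + 1 for m in MYRISTOYLATION.finditer(seq.upper())]


# Hypothetical sequence, not the LOC101928193 protein.
toy = "MGASKSGAAA"
sites = find_myristoylation_sites(toy)
print(sites)  # [2]
```

Pattern matches of this kind only flag candidate sites; they do not show that a site is modified in the cell.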

Primary sequence

The LOC101928193 primary coding sequence mRNA is 3213 nucleotides long.[8] There are no upstream open-reading frames, Kozak consensus sequences, or transmembrane regions.

LOC101928193 conceptual translation with post-translational modifications and motifs.

Secondary structure

LOC101928193 has a predicted secondary structure of 56.40% random coils and 43.60% beta sheets.[13] No alpha helices are predicted to occur. Due to the lack of alpha helices in the protein, no coiled coils are predicted to occur in the LOC101928193 secondary structure.[16]
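Percentages like these can be derived from a per-residue prediction string. The following sketch assumes the convention described in the figure caption (c for coil, e for beta sheet) plus h for alpha helix, and uses a made-up prediction string rather than the real output for LOC101928193.

```python
def structure_fractions(states: str) -> dict:
    """Return the percentage of coil (c), sheet (e), and helix (h) states."""
    states = states.lower()
    n = len(states)
    if n == 0:
        raise ValueError("prediction string must not be empty")
    return {label: 100 * states.count(label) / n for label in "ceh"}


# Hypothetical per-residue prediction: 6 coil residues, 4 sheet residues.
toy_prediction = "cccceeeecc"
fractions = structure_fractions(toy_prediction)
print(fractions)  # {'c': 60.0, 'e': 40.0, 'h': 0.0}
```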

Tertiary structure

LOC101928193 is predicted to be an all-beta-sheet protein, as can be seen in its predicted tertiary structure. Both the N-terminus and the C-terminus lack beta sheets.

Post-translational modifications

O-GlcNAc

There are 13 predicted O-GlcNAc sites within the LOC101928193 protein.[17] O-GlcNAc is a unique form of protein glycosylation that occurs exclusively in the nuclear and cytoplasmic compartments of the cell.[18] O-GlcNAcylation is abundant on proteins involved in signaling pathways, stress responses, cytoskeletal assembly, and energy metabolism.

N-linked glycosylation

There are no N-linked glycosylation sites due to the absence of asparagine residues.
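N-linked glycosylation requires an asparagine in the sequon N-X-S/T, where X is any residue except proline, so a sequence with no asparagine cannot contain a candidate site. A minimal sketch of such a scan, using hypothetical toy sequences rather than the LOC101928193 protein:

```python
import re

# N-X-S/T sequon with X != proline; lookahead allows overlapping hits.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glyc_sequons(seq: str) -> list:
    """Return 1-based positions of asparagines in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Toy sequences: the first has one valid sequon (N2-G-S) and one
# proline-blocked site (N7-P-T); the second contains no asparagine.
with_asn = find_n_glyc_sequons("MNGSAANPTGVS")
without_asn = find_n_glyc_sequons("MGGSAAGVTGVS")
print(with_asn, without_asn)  # [2] []
```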

Phosphorylation

LOC101928193 has 33 predicted phosphorylation sites at serines, threonines, and tyrosines throughout its structure.[19] Phosphorylation at such sites can cause conformational changes and contributes to signaling pathways and regulation. The relative number of phosphorylation sites is highly conserved across orthologs of LOC101928193.[19]

Subcellular localization

LOC101928193 is targeted to the cytoplasm in Homo sapiens, rodents, amphibians, fish, and mollusks.[20] It is predicted to localize to the nucleus in cnidarians, fungi, and bacteria.[20]


Expression

Mean RPKM values of LOC101928193 in 27 human tissues from RNA sequencing.[4] Expression is highest in the thyroid, ovaries, skin, and testes.

LOC101928193 is not expressed ubiquitously; its expression is tissue specific, with low mRNA abundance compared to other human genes.[5] Expression is highest in the thyroid and is also high in the ovaries, skin, and testes.[5] Additionally, the gene is expressed in 23 other tissues at levels lower than 0.1 RPKM (Reads Per Kilobase of transcript per Million mapped reads) in humans. Other studies have found that tissue-specific circular RNA induction of LOC101928193 during human fetal development is highest in the heart, kidney, and stomach at 10 weeks of gestation.[4]
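RPKM normalizes a raw read count by both transcript length and sequencing depth. The sketch below implements the standard definition; the read counts in the usage example are invented for illustration and are not taken from the cited dataset.

```python
def rpkm(reads_mapped: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total reads must be positive")
    # reads / (length in kb) / (total reads in millions)
    return reads_mapped * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical numbers: 10 reads on a 2,000 bp transcript in a
# library of 50 million mapped reads.
value = rpkm(10, 2_000, 50_000_000)
print(value)  # 0.1
```

With these invented numbers the result lands exactly on the 0.1 RPKM threshold mentioned above, which gives a sense of how few reads such a value represents.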


Regulation of Expression

LOC101928193 promoter and isoforms.[4][21] There is one promoter and three isoforms.

Epigenetic

Epigenetic processes that control expression, such as DNA methylation and histone modification, have not been reported for LOC101928193.

Transcriptional

LOC101928193 5' UTR stem loops near AUG.[22]

Promoter

There is one promoter for the LOC101928193 gene (GXP_6058323). It is 1101 nucleotides long and lies on the positive strand from base pairs 133,188,767 to 133,189,867 on chromosome 9.[21] The transcription start site is at position 1001 of the promoter.[21]
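The promoter coordinates are consistent with the gene coordinates given earlier. Assuming 1-based, inclusive coordinates and a 1-based position within the promoter, the sketch below converts promoter position 1001 into a genomic coordinate; the variable names are illustrative.

```python
# Values quoted in the text (assumed 1-based, inclusive, positive strand).
PROMOTER_START = 133_188_767
PROMOTER_END = 133_189_867
TSS_POSITION_IN_PROMOTER = 1001  # 1-based position within the promoter

promoter_length = PROMOTER_END - PROMOTER_START + 1
tss_genomic = PROMOTER_START + TSS_POSITION_IN_PROMOTER - 1
upstream_bp = TSS_POSITION_IN_PROMOTER - 1
downstream_bp = promoter_length - TSS_POSITION_IN_PROMOTER

print(promoter_length)  # 1101
print(tss_genomic)      # 133189767
print(upstream_bp, downstream_bp)  # 1000 100
```

Under these assumptions the transcription start site coincides with the first base of the gene (133,189,767), with 1000 bp of promoter upstream of it and 100 bp downstream.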

Transcription factor binding sites

Several transcription factors are predicted to bind to the promoter sequence. Some examples include:[23]

Based on the functions of these transcription factors, LOC101928193 may be involved in gene repression, regulation of hematopoiesis, fetal development, inhibition, DNA binding, or limb development.

Translational and mRNA stability

Under conditions consistent with human body temperature, multiple stem loops are predicted in the 5' UTR, the coding region, and the 3' UTR. Stem loops direct RNA folding, protect the structural stability of mRNA, provide recognition sites for RNA-binding proteins, and serve as substrates for enzymatic reactions.[24] There are an interior loop and a stem loop in the 5' UTR near the AUG start codon.[22] Such structures are often bound by proteins or attenuate a transcript in order to regulate translation. These stem loops also aid mRNA stability; the predicted 5' UTR conformation has a free energy of -124.30 kcal/mol.[22] In the 3' UTR, 6 stem loops are predicted, with a free energy of -310.70 kcal/mol; the negative value indicates that folding is thermodynamically favorable.[22] There are no known microRNA targets in the 3' UTR.


Homology and Evolution

LOC101928193 Unrooted Phylogenetic Tree. Color coded by taxonomic group: Mammals (orange), amphibians (green), fish (blue), mollusks (yellow), cnidarians (teal), fungi (lime green), and bacteria (purple).[25]

Paralogs

There are no known paralogs of LOC101928193.

Orthologs

LOC101928193 rate of evolution in comparison to cytochrome c and fibrinogen.[3][26]
LOC101928193 conserved coding domain found from a multiple sequence alignment of orthologs.[27] A sequence logo provides a richer and more precise description of, for example, a binding site, than would a consensus sequence.

LOC101928193 has over 20 orthologs that are present in mammals, amphibians, fish, mollusks, cnidarians, fungi, and bacteria.[8] The most distant orthologs are found in bacteria that diverged from humans more than 4.29 billion years ago.[26] No orthologs for LOC101928193 have been discovered in close mammalian relatives of humans, including in primates. Below is a table of a range of organisms with orthologs related to the human LOC101928193 protein.

More information Species, Common name ...

Distant homologs

The most distant detectable homologs are found in several viral and bacterial species that diverged from humans over 4.29 billion years ago.[26]

Homologous domains

There is a conserved region of 28 amino acids that is repeated six times within the LOC101928193 protein and across its orthologs. Each repeat begins with a glycine, at amino acid positions 194, 222, 250, 278, 306, and 334 of LOC101928193. The domain is conserved across mammals, cnidarians, fish, bacteria, and amphibians, and is also found in proteins from some species within these taxonomic groups that are not orthologs but share the same domain. The sequence always begins with a glycine followed by a hydrophobic valine. There is also a conserved basic arginine in the middle of the sequence.
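The six start positions are spaced exactly one repeat length apart, which means the repeats sit back to back with no residues between them. A short sketch that regenerates the positions from the first start and the repeat length (both taken from the text; the helper name is illustrative):

```python
FIRST_START = 194    # position of the first glycine, from the text
REPEAT_LENGTH = 28   # amino acids per repeat, from the text
REPEAT_COUNT = 6


def repeat_intervals(first_start: int, length: int, count: int) -> list:
    """Return (start, end) pairs, 1-based inclusive, for tandem repeats."""
    return [
        (first_start + i * length, first_start + (i + 1) * length - 1)
        for i in range(count)
    ]


intervals = repeat_intervals(FIRST_START, REPEAT_LENGTH, REPEAT_COUNT)
starts = [start for start, _ in intervals]
print(starts)         # [194, 222, 250, 278, 306, 334]
print(intervals[-1])  # (334, 361)
```

If the repeats are contiguous as computed here, the repeat array covers residues 194 to 361, or 168 of the 406 residues of isoform X1.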

Phylogeny

No other species has LOC101928193 in the same form as in humans. Several species within mammals, amphibians, fish, mollusks, cnidarians, fungi, and bacteria have LOC101928193 in a slightly different form, with a similarity usually between 30 and 50%. Several taxonomic groups have no known genes or proteins similar to LOC101928193, including archaea, plants, and several animal species.
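Figures such as these come from pairwise alignments. The sketch below computes percent identity, a simpler and stricter measure than the similarity quoted above, over the columns of an alignment where neither sequence has a gap. The two aligned strings are invented for illustration, and the choice of denominator (ungapped columns) is an assumption; alignment tools differ on this point.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in pairs if a == b)
    return 100 * matches / len(pairs)


# Hypothetical alignment: 6 ungapped columns, 4 of them identical.
identity = percent_identity("GVSR-GV", "GVTRAGI")
print(round(identity, 2))  # 66.67
```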

Inheritance

LOC101928193 may not follow a normal inheritance pattern or occur regularly in the genome, as it has a scattered occurrence among evolutionarily related species.[2] Furthermore, the similarity between orthologs of LOC101928193 is roughly constant over evolutionary time: it is not higher in closely related taxonomic groups or lower in distantly related ones. It is possible that LOC101928193 is incorporated into the genomes of different species through viral pathways, as LOC101928193 has been found to have ligand-binding sites for molecules associated with cyanobacteria, such as chlorophyll a.[29] Orthologs of LOC101928193 have been found to contain UL36, a large tegument protein that functions in the viral cycle and is commonly found in human herpes simplex virus 1.[30][31]


References
