Overview

In this script, we annotate the genes whose expression varies most between the clusters we obtained. These genes are the markers most likely to be driving the separation between clusters. If a cluster is represented mainly in one tissue type, and the genes most strongly differentially expressed in that cluster have annotations related to that tissue, then we can say that the clustering makes biological sense.
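To make "a cluster which is mainly represented in a particular tissue type" concrete, the sketch below averages cluster memberships within each tissue and reports the tissue with the highest mean membership per cluster. It is only an illustration on toy data: the `omega` matrix (samples by clusters), the `tissue` labels and all numbers are hypothetical and are not taken from the GTEx fit used in this script.

```r
# Toy membership matrix: 6 samples x 2 clusters, each row sums to 1 (hypothetical values).
omega <- matrix(c(0.9, 0.1,
                  0.8, 0.2,
                  0.2, 0.8,
                  0.1, 0.9,
                  0.7, 0.3,
                  0.3, 0.7),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("cluster1", "cluster2")))

# Hypothetical tissue label for each sample.
tissue <- c("Brain", "Brain", "Liver", "Liver", "Brain", "Liver")

# Sum memberships within each tissue, then divide by the number of samples per tissue.
sums <- rowsum(omega, group = tissue)
mean_membership <- sums / as.vector(table(tissue)[rownames(sums)])

# For each cluster, the tissue with the highest mean membership.
dominant_tissue <- rownames(mean_membership)[apply(mean_membership, 2, which.max)]
names(dominant_tissue) <- colnames(mean_membership)
print(dominant_tissue)
```

With a real fit, the annotations of a cluster's top genes would then be compared against the tissue reported for that cluster.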

Extracting top driving genes

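library(CountClust)  # provides ExtractTopFeatures()

# Load the fitted K = 20 model; theta holds the gene-by-cluster weights.
GoM_output <- get(load("../external_data/GTEX_V6/gtexv6fit.k.20.master.rda"))
topics_theta <- GoM_output$theta

# For each cluster, the indices of the 100 genes that best distinguish it from the other clusters.
top_features <- ExtractTopFeatures(topics_theta, top_features=100, method="poisson", options="min")

# Read the gene names and keep the first 15 characters (the Ensembl gene ID, dropping any version suffix).
gene_names <- as.vector(as.matrix(read.table("../external_data/GTEX_V6/gene_names_GTEX_V6.txt")))
gene_names <- substring(gene_names,1,15)
xli  <-  gene_names

# One row of Ensembl gene IDs per cluster.
gene_list <- do.call(rbind, lapply(seq_len(nrow(top_features)), function(x) gene_names[top_features[x,]]))
write.table(gene_names, "../utilities/gene_names_all_gtex.txt", col.names = FALSE,
            row.names=FALSE, quote=FALSE)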
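The following is a minimal sketch of the kind of ranking that a Poisson-type divergence with a "min" rule produces, followed by the index-to-name lookup. It is an assumption about the general idea, not a copy of the CountClust implementation: the toy `theta`, the helper `poisson_kl` and the gene names are all hypothetical. For each cluster, a gene is scored by its smallest divergence from any other cluster, so a gene ranks highly only if it separates the cluster from every other cluster.

```r
# Toy theta: 5 genes x 3 clusters, each column sums to 1 (hypothetical values).
theta <- matrix(c(0.50, 0.10, 0.10,
                  0.10, 0.50, 0.10,
                  0.10, 0.10, 0.50,
                  0.25, 0.10, 0.20,
                  0.05, 0.20, 0.10),
                ncol = 3, byrow = TRUE)
toy_gene_names <- c("geneA", "geneB", "geneC", "geneD", "geneE")

# Poisson-type divergence between weights p (cluster of interest) and q (another cluster).
poisson_kl <- function(p, q) p * log(p / q) + q - p

top_n <- 2
K <- ncol(theta)

# Rows = clusters, columns = ranks; entries are gene indices.
top_idx <- t(sapply(seq_len(K), function(k) {
  others <- setdiff(seq_len(K), k)
  # Divergence of cluster k from each other cluster (genes x other clusters).
  div <- sapply(others, function(l) poisson_kl(theta[, k], theta[, l]))
  # "min" rule: score each gene by its smallest divergence across the other clusters.
  score <- apply(div, 1, min)
  order(score, decreasing = TRUE)[seq_len(top_n)]
}))

# Look up gene names while keeping the clusters-by-ranks shape.
top_genes <- matrix(toy_gene_names[top_idx], nrow = nrow(top_idx))
print(top_genes)
```

In this toy example the top-ranked gene of each cluster is the one with a large weight in that cluster only (geneA, geneB, geneC). Indexing the name vector by the whole index matrix is a vectorised alternative to building the result row by row.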

Cluster 1 Annotations

out <- mygene::queryMany(gene_list[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id summary query name notfound
NEAT1 283131 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. ENSG00000245532 nuclear paraspeckle assembly transcript 1 (non-protein coding) NA
IGFBP5 3488 NA ENSG00000115461 insulin like growth factor binding protein 5 NA
CCNL2 81669 The protein encoded by this gene belongs to the cyclin family. Through its interaction with several proteins, such as RNA polymerase II, splicing factors, and cyclin-dependent kinases, this protein functions as a regulator of the pre-mRNA splicing process, as well as in inducing apoptosis by modulating the expression of apoptotic and antiapoptotic proteins. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. ENSG00000221978 cyclin L2 NA
SRSF5 6430 The protein encoded by this gene is a member of the serine/arginine (SR)-rich family of pre-mRNA splicing factors, which constitute part of the spliceosome. Each of these factors contains an RNA recognition motif (RRM) for binding RNA and an RS domain for binding other proteins. The RS domain is rich in serine and arginine residues and facilitates interaction between different SR splicing factors. In addition to being critical for mRNA splicing, the SR proteins have also been shown to be involved in mRNA export from the nucleus and in translation. Alternative splicing results in multiple transcript variants. ENSG00000100650 serine/arginine-rich splicing factor 5 NA
PNISR 25957 NA ENSG00000132424 PNN-interacting serine/arginine-rich protein NA
SRRM2 23524 NA ENSG00000167978 serine/arginine repetitive matrix 2 NA
SNRNP70 6625 NA ENSG00000104852 small nuclear ribonucleoprotein U1 subunit 70 NA
MYO15B ENSG00000266714 NA ENSG00000266714 myosin XVB NA
RBM6 10180 NA ENSG00000004534 RNA binding motif protein 6 NA
CIRBP 1153 NA ENSG00000099622 cold inducible RNA binding protein NA
RBM39 9584 This gene encodes a member of the U2AF65 family of proteins. The encoded protein is found in the nucleus, where it co-localizes with core spliceosomal proteins. It has been shown to play a role in both steroid hormone receptor-mediated transcription and alternative splicing, and it is also a transcriptional coregulator of the viral oncoprotein v-Rel. Multiple transcript variants have been observed for this gene. A related pseudogene has been identified on chromosome X. ENSG00000131051 RNA binding motif protein 39 NA
ZNF83 55769 NA ENSG00000167766 zinc finger protein 83 NA
JUN 3725 This gene is the putative transforming gene of avian sarcoma virus 17. It encodes a protein which is highly similar to the viral protein, and which interacts directly with specific target DNA sequences to regulate gene expression. This gene is intronless and is mapped to 1p32-p31, a chromosomal region involved in both translocations and deletions in human malignancies. ENSG00000177606 jun proto-oncogene NA
NUMA1 4926 This gene encodes a large protein that forms a structural component of the nuclear matrix. The encoded protein interacts with microtubules and plays a role in the formation and organization of the mitotic spindle during cell division. Chromosomal translocation of this gene with the RARA (retinoic acid receptor, alpha) gene on chromosome 17 have been detected in patients with acute promyelocytic leukemia. Alternative splicing results in multiple transcript variants. ENSG00000137497 nuclear mitotic apparatus protein 1 NA
KAT2A 2648 KAT2A, or GCN5, is a histone acetyltransferase (HAT) that functions primarily as a transcriptional activator. It also functions as a repressor of NF-kappa-B (see MIM 164011) by promoting ubiquitination of the NF-kappa-B subunit RELA (MIM 164014) in a HAT-independent manner (Mao et al., 2009 [PubMed 19339690]). ENSG00000108773 lysine acetyltransferase 2A NA
TIA1 7072 The product encoded by this gene is a member of a RNA-binding protein family and possesses nucleolytic activity against cytotoxic lymphocyte (CTL) target cells. It has been suggested that this protein may be involved in the induction of apoptosis as it preferentially recognizes poly(A) homopolymers and induces DNA fragmentation in CTL targets. The major granule-associated species is a 15-kDa protein that is thought to be derived from the carboxyl terminus of the 40-kDa product by proteolytic processing. Alternative splicing resulting in different isoforms of this gene product has been described in the literature. ENSG00000116001 TIA1 cytotoxic granule-associated RNA binding protein NA
ATN1 1822 Dentatorubral pallidoluysian atrophy (DRPLA) is a rare neurodegenerative disorder characterized by cerebellar ataxia, myoclonic epilepsy, choreoathetosis, and dementia. The disorder is related to the expansion from 7-23 copies to 49-75 copies of a trinucleotide repeat (CAG/CAA) within this gene. The encoded protein includes a serine repeat and a region of alternating acidic and basic amino acids, as well as the variable glutamine repeat. Alternative splicing results in two transcripts variants that encode the same protein. ENSG00000111676 atrophin 1 NA
HP1BP3 50809 NA ENSG00000127483 heterochromatin protein 1 binding protein 3 NA
CLK1 1195 This gene encodes a member of the CDC2-like (or LAMMER) family of dual specificity protein kinases. In the nucleus, the encoded protein phosphorylates serine/arginine-rich proteins involved in pre-mRNA processing, releasing them into the nucleoplasm. The choice of splice sites during pre-mRNA processing may be regulated by the concentration of transacting factors, including serine/arginine rich proteins. Therefore, the encoded protein may play an indirect role in governing splice site selection. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000013441 CDC like kinase 1 NA
EEF1D 1936 This gene encodes a subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This subunit, delta, functions as guanine nucleotide exchange factor. It is reported that following HIV-1 infection, this subunit interacts with HIV-1 Tat. This interaction results in repression of translation of host cell proteins and enhanced translation of viral proteins. Several alternatively spliced transcript variants encoding multiple isoforms have been found for this gene. Related pseudogenes have been defined on chromosomes 1, 6, 7, 9, 11, 13, 17, 19. ENSG00000104529 eukaryotic translation elongation factor 1 delta NA
FAM160B2 64760 NA ENSG00000158863 family with sequence similarity 160 member B2 NA
GSTM2 2946 Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual’s susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. ENSG00000213366 glutathione S-transferase mu 2 (muscle) NA
PCED1A 64773 The protein encoded by this gene is a member of the GDSL/SGNH superfamily. Members of this family are hydrolytic enzymes with esterase and lipase activity and broad substrate specificity. This protein belongs to the Pmr5-Cas1p-esterase subfamily in that it contains the catalytic triad comprised of serine, aspartate and histidine and lacks two conserved regions (glycine after strand S2 and GxND motif). A pseudogene of this gene has been identified on the long arm of chromosome 2. Alternative splicing results in multiple transcript variants that encode different protein isoforms. ENSG00000132635 PC-esterase domain containing 1A NA
SULF2 55959 Heparan sulfate proteoglycans (HSPGs) act as coreceptors for numerous heparin-binding growth factors and cytokines and are involved in cell signaling. Heparan sulfate 6-O-endosulfatases, such as SULF2, selectively remove 6-O-sulfate groups from heparan sulfate. This activity modulates the effects of heparan sulfate by altering binding sites for signaling molecules (Dai et al., 2005 [PubMed 16192265]). ENSG00000196562 sulfatase 2 NA
ARRDC3 57561 NA ENSG00000113369 arrestin domain containing 3 NA
NFATC4 4776 This gene encodes a member of the nuclear factor of activated T cells (NFAT) protein family. The encoded protein is part of a DNA-binding transcription complex. This complex consists of at least two components: a preexisting cytosolic component that translocates to the nucleus upon T cell receptor stimulation and an inducible nuclear component. NFAT proteins are activated by the calmodulin-dependent phosphatase, calcineurin. The encoded protein plays a role in the inducible expression of cytokine genes in T cells, especially in the induction of interleukin-2 and interleukin-4. Alternative splicing results in multiple transcript variants. ENSG00000100968 nuclear factor of activated T-cells 4 NA
RSRP1 57035 NA ENSG00000117616 arginine/serine-rich protein 1 NA
AHSA2 130872 NA ENSG00000173209 AHA1, activator of heat shock 90kDa protein ATPase homolog 2 (yeast) NA
SF3B1 23451 This gene encodes subunit 1 of the splicing factor 3b protein complex. Splicing factor 3b, together with splicing factor 3a and a 12S RNA unit, forms the U2 small nuclear ribonucleoproteins complex (U2 snRNP). The splicing factor 3b/3a complex binds pre-mRNA upstream of the intron’s branch site in a sequence independent manner and may anchor the U2 snRNP to the pre-mRNA. Splicing factor 3b is also a component of the minor U12-type spliceosome. The carboxy-terminal two-thirds of subunit 1 have 22 non-identical, tandem HEAT repeats that form rod-like, helical structures. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000115524 splicing factor 3b subunit 1 NA
PNPLA7 375775 Human patatin-like phospholipases, such as PNPLA7, have been implicated in regulation of adipocyte differentiation and have been induced by metabolic stimuli (Wilson et al., 2006 [PubMed 16799181]). ENSG00000130653 patatin like phospholipase domain containing 7 NA
MTMR9LP ENSG00000220785 NA ENSG00000220785 myotubularin related protein 9-like, pseudogene NA
COL16A1 1307 This gene encodes the alpha chain of type XVI collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Members of this collagen family are found in association with fibril-forming collagens such as type I and II, and serve to maintain the integrity of the extracellular matrix. High levels of type XVI collagen have been found in fibroblasts and keratinocytes, and in smooth muscle and amnion. ENSG00000084636 collagen type XVI alpha 1 NA
HNRNPA1 3178 This gene encodes a member of a family of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs), which are RNA-binding proteins that associate with pre-mRNAs in the nucleus and influence pre-mRNA processing, as well as other aspects of mRNA metabolism and transport. The protein encoded by this gene is one of the most abundant core proteins of hnRNP complexes and plays a key role in the regulation of alternative splicing. Mutations in this gene have been observed in individuals with amyotrophic lateral sclerosis 20. Multiple alternatively spliced transcript variants have been found. There are numerous pseudogenes of this gene distributed throughout the genome. ENSG00000135486 heterogeneous nuclear ribonucleoprotein A1 NA
RBM5 10181 This gene is a candidate tumor suppressor gene which encodes a nuclear RNA binding protein that is a component of the spliceosome A complex. The encoded protein plays a role in the induction of cell cycle arrest and apoptosis through pre-mRNA splicing of multiple target genes including the tumor suppressor protein p53. This gene is located within the tumor suppressor region 3p21.3, and may play a role in the inhibition of tumor transformation and progression of several malignancies including lung cancer. ENSG00000003756 RNA binding motif protein 5 NA
ZMIZ1 57178 This gene encodes a member of the PIAS (protein inhibitor of activated STAT) family of proteins. The encoded protein regulates the activity of various transcription factors, including the androgen receptor, Smad3/4, and p53. The encoded protein may also play a role in sumoylation. A translocation between this locus on chromosome 10 and the protein tyrosine kinase ABL1 locus on chromosome 9 has been associated with acute lymphoblastic leukemia. ENSG00000108175 zinc finger MIZ-type containing 1 NA
WDR6 11180 This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. The encoded protein interacts with serine/threonine kinase 11, and is implicated in cell growth arrest. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000178252 WD repeat domain 6 NA
VEGFA 7422 This gene is a member of the PDGF/VEGF growth factor family. It encodes a heparin-binding protein, which exists as a disulfide-linked homodimer. This growth factor induces proliferation and migration of vascular endothelial cells, and is essential for both physiological and pathological angiogenesis. Disruption of this gene in mice resulted in abnormal embryonic blood vessel formation. This gene is upregulated in many known tumors and its expression is correlated with tumor stage and progression. Elevated levels of this protein are found in patients with POEMS syndrome, also known as Crow-Fukase syndrome. Allelic variants of this gene have been associated with microvascular complications of diabetes 1 (MVCD1) and atherosclerosis. Alternatively spliced transcript variants encoding different isoforms have been described. There is also evidence for alternative translation initiation from upstream non-AUG (CUG) codons resulting in additional isoforms. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is antiangiogenic. Expression of some isoforms derived from the AUG start codon is regulated by a small upstream open reading frame, which is located within an internal ribosome entry site. ENSG00000112715 vascular endothelial growth factor A NA
CREBZF 58487 NA ENSG00000137504 CREB/ATF bZIP transcription factor NA
FAM193B 54540 NA ENSG00000146067 family with sequence similarity 193 member B NA
MAN2C1 4123 NA ENSG00000140400 mannosidase alpha class 2C member 1 NA
D2HGDH 728294 This gene encodes D-2-hydroxyglutarate dehydrogenase, a mitochondrial enzyme belonging to the FAD-binding oxidoreductase/transferase type 4 family. This enzyme, which is most active in liver and kidney but also active in heart and brain, converts D-2-hydroxyglutarate to 2-ketoglutarate. Mutations in this gene are present in D-2-hydroxyglutaric aciduria, a rare recessive neurometabolic disorder causing developmental delay, epilepsy, hypotonia, and dysmorphic features. ENSG00000180902 D-2-hydroxyglutarate dehydrogenase NA
SNHG5 ENSG00000203875 NA ENSG00000203875 small nucleolar RNA host gene 5 NA
PSMA3-AS1 379025 NA ENSG00000257621 PSMA3 antisense RNA 1 NA
LUC7L 55692 The LUC7L gene may represent a mammalian heterochromatic gene, encoding a putative RNA-binding protein similar to the yeast Luc7p subunit of the U1 snRNP splicing complex that is normally required for 5-prime splice site selection (Tufarelli et al., 2001 [PubMed 11170747]). ENSG00000007392 LUC7 like NA
NUPR1 26471 NA ENSG00000176046 nuclear protein 1, transcriptional regulator NA
LUC7L3 51747 This gene encodes a protein with an N-terminal half that contains cysteine/histidine motifs and leucine zipper-like repeats, and the C-terminal half is rich in arginine and glutamate residues (RE domain) and arginine and serine residues (RS domain). This protein localizes with a speckled pattern in the nucleus, and could be involved in the formation of the spliceosome via the RE and RS domains. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. ENSG00000108848 LUC7 like 3 pre-mRNA splicing factor NA
SOX4 6659 This intronless gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins, such as syndecan binding protein (syntenin). The protein may function in the apoptosis pathway leading to cell death as well as to tumorigenesis and may mediate downstream effects of parathyroid hormone (PTH) and PTH-related protein (PTHrP) in bone development. The solution structure has been resolved for the HMG-box of a similar mouse protein. ENSG00000124766 SRY-box 4 NA
NA NA NA ENSG00000256586 NA TRUE
HNRNPH1 3187 This gene encodes a member of a subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA binding proteins that complex with heterogeneous nuclear RNA. These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. While all of the hnRNPs are present in the nucleus, some may shuttle between the nucleus and the cytoplasm. The hnRNP proteins have distinct nucleic acid binding properties. The protein encoded by this gene has three repeats of quasi-RRM domains that bind to RNA and is very similar to the family member HNRPF. This gene may be associated with hereditary lymphedema type I. Alternatively spliced transcript variants have been described ENSG00000169045 heterogeneous nuclear ribonucleoprotein H1 (H) NA
JUND 3727 The protein encoded by this intronless gene is a member of the JUN family, and a functional component of the AP1 transcription factor complex. This protein has been proposed to protect cells from p53-dependent senescence and apoptosis. Alternative translation initiation site usage results in the production of different isoforms (PMID:12105216). ENSG00000130522 jun D proto-oncogene NA
GOLGA8A 23015 The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. ENSG00000175265 golgin A8 family member A NA
SRSF6 6431 The protein encoded by this gene is involved in mRNA splicing and may play a role in the determination of alternative splicing. The encoded nuclear protein belongs to the splicing factor SR family and has been shown to bind with and modulate another member of the family, SFRS12. Alternative splicing results in multiple transcript variants. In addition, two pseudogenes, one on chromosome 17 and the other on the X chromosome, have been found for this gene. ENSG00000124193 serine/arginine-rich splicing factor 6 NA
CCL21 6366 This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. Similar to other chemokines the protein encoded by this gene inhibits hemopoiesis and stimulates chemotaxis. This protein is chemotactic in vitro for thymocytes and activated T cells, but not for B cells, macrophages, or neutrophils. The cytokine encoded by this gene may also play a role in mediating homing of lymphocytes to secondary lymphoid organs. It is a high affinity functional ligand for chemokine receptor 7 that is expressed on T and B lymphocytes and a known receptor for another member of the cytokine family (small inducible cytokine A19). ENSG00000137077 C-C motif chemokine ligand 21 NA
NKTR 4820 This gene encodes a membrane-anchored protein with a hydrophobic amino terminal domain and a cyclophilin-like PPIase domain. It is present on the surface of natural killer cells and facilitates their binding to targets. Its expression is regulated by IL2 activation of the cells. ENSG00000114857 natural killer cell triggering receptor NA
AC074212.5 ENSG00000259605 NA ENSG00000259605 NA NA
NXF1 10482 This gene is one member of a family of nuclear RNA export factor genes. Common domain features of this family are a noncanonical RNP-type RNA-binding domain (RBD), 4 leucine-rich repeats (LRRs), a nuclear transport factor 2 (NTF2)-like domain that allows heterodimerization with NTF2-related export protein-1 (NXT1), and a ubiquitin-associated domain that mediates interactions with nucleoporins. The LRRs and NTF2-like domains are required for export activity. Alternative splicing seems to be a common mechanism in this gene family. The encoded protein of this gene shuttles between the nucleus and the cytoplasm and binds in vivo to poly(A)+ RNA. It is the vertebrate homologue of the yeast protein Mex67p. The encoded protein overcomes the mRNA export block caused by the presence of saturating amounts of CTE (constitutive transport element) RNA of type D retroviruses. Alternative splicing results in multiple transcript variants. ENSG00000162231 nuclear RNA export factor 1 NA
UCKL1 54963 The protein encoded by this gene is a uridine kinase. Uridine kinases catalyze the phosphorylation of uridine to uridine monophosphate. This protein has been shown to bind to Epstein-Barr nuclear antigen 3 as well as natural killer lytic-associated molecule. Ubiquitination of this protein is enhanced by the presence of natural killer lytic-associated molecule. In addition, protein levels decrease in the presence of natural killer lytic-associated molecule, suggesting that association with natural killer lytic-associated molecule results in ubiquitination and subsequent degradation of this protein. Alternative splicing results in multiple transcript variants. ENSG00000198276 uridine-cytidine kinase 1 like 1 NA
PCGF3 10336 The protein encoded by this gene contains a C3HC4 type RING finger, which is a motif known to be involved in protein-protein interactions. The specific function of this protein has not yet been determined. ENSG00000185619 polycomb group ring finger 3 NA
IGFBP4 3487 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. ENSG00000141753 insulin like growth factor binding protein 4 NA
EGR1 1958 The protein encoded by this gene belongs to the EGR family of C2H2-type zinc-finger proteins. It is a nuclear protein and functions as a transcriptional regulator. The products of target genes it activates are required for differentiation and mitogenesis. Studies suggest this is a cancer suppressor gene. ENSG00000120738 early growth response 1 NA
FNBP4 23360 NA ENSG00000109920 formin binding protein 4 NA
MSANTD2 79684 NA ENSG00000120458 Myb/SANT DNA binding domain containing 2 NA
NSUN5P1 155400 This locus represents a transcribed pseudogene of a nearby locus on chromosome 7, which encodes a putative methyltransferase. There is also a third closely related pseudogene locus in this region. Alternative splicing results in multiple transcript variants of this gene. ENSG00000223705 NOP2/Sun RNA methyltransferase family member 5 pseudogene 1 NA
HNRNPH3 3189 This gene belongs to the subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA binding proteins and they complex with heterogeneous nuclear RNA (hnRNA). These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. While all of the hnRNPs are present in the nucleus, some seem to shuttle between the nucleus and the cytoplasm. The hnRNP proteins have distinct nucleic acid binding properties. The protein encoded by this gene has two repeats of quasi-RRM domains that bind to RNAs. It is localized in nuclear bodies of the nucleus. This protein is involved in the splicing process and it also participates in early heat shock-induced splicing arrest by transiently leaving the hnRNP complexes. Several alternatively spliced transcript variants have been noted for this gene, however, not all are fully characterized. ENSG00000096746 heterogeneous nuclear ribonucleoprotein H3 NA
USP36 57602 This gene encodes a member of the peptidase C19 or ubiquitin-specific protease family of cysteine proteases. Members of this family remove ubiquitin molecules from polyubiquitinated proteins. The encoded protein may deubiquitinate and stabilize the transcription factor c-Myc, also known as MYC, an important oncoprotein known to be upregulated in most human cancers. The encoded protease may also regulate the activation of autophagy. This gene exhibits elevated expression in some breast and lung cancers. ENSG00000055483 ubiquitin specific peptidase 36 NA
ZFP36L2 678 This gene is a member of the TIS11 family of early response genes. Family members are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. The encoded protein contains a distinguishing putative zinc finger domain with a repeating cys-his motif. This putative nuclear transcription factor most likely functions in regulating the response to growth factors. ENSG00000152518 ZFP36 ring finger protein-like 2 NA
SNX1 6642 This gene encodes a member of the sorting nexin family. Members of this family contain a phox (PX) domain, which is a phosphoinositide binding domain, and are involved in intracellular trafficking. This endosomal protein regulates the cell-surface expression of epidermal growth factor receptor. This protein also has a role in sorting protease-activated receptor-1 from early endosomes to lysosomes. This protein may form oligomeric complexes with family members. This gene results in three transcript variants encoding distinct isoforms. ENSG00000028528 sorting nexin 1 NA
ROBO3 64221 This gene is a member of the Roundabout (ROBO) gene family that controls neurite outgrowth, growth cone guidance, and axon fasciculation. ROBO proteins are a subfamily of the immunoglobulin transmembrane receptor superfamily. SLIT proteins 1-3, a family of secreted chemorepellants, are ligands for ROBO proteins and SLIT/ROBO interactions regulate myogenesis, leukocyte migration, kidney morphogenesis, angiogenesis, and vasculogenesis in addition to neurogenesis. This gene, ROBO3, has a putative extracellular domain with five immunoglobulin (Ig)-like loops and three fibronectin (Fn) type III motifs, a transmembrane segment, and a cytoplasmic tail with three conserved signaling motifs: CC0, CC2, and CC3 (CC for conserved cytoplasmic). Unlike other ROBO family members, ROBO3 lacks motif CC1. The ROBO3 gene regulates axonal navigation at the ventral midline of the neural tube. In mouse, loss of Robo3 results in a complete failure of commissural axons to cross the midline throughout the spinal cord and the hindbrain. Mutations ROBO3 result in horizontal gaze palsy with progressive scoliosis (HGPPS); an autosomal recessive disorder characterized by congenital absence of horizontal gaze, progressive scoliosis, and failure of the corticospinal and somatosensory axon tracts to cross the midline in the medulla. Alternative transcript variants have been described but have not been experimentally validated. ENSG00000154134 roundabout guidance receptor 3 NA
GATAD1 57798 The protein encoded by this gene contains a zinc finger at the N-terminus, and is thought to bind to a histone modification site that regulates gene expression. Mutations in this gene have been associated with autosomal recessive dilated cardiomyopathy. Alternatively spliced transcript variants have been found for this gene. ENSG00000157259 GATA zinc finger domain containing 1 NA
N4BP2L2 10443 NA ENSG00000244754 NEDD4 binding protein 2-like 2 NA
TTC17 55761 NA ENSG00000052841 tetratricopeptide repeat domain 17 NA
SH3BP5-AS1 100505696 NA ENSG00000224660 SH3BP5 antisense RNA 1 NA
KLF3-AS1 79667 NA ENSG00000231160 KLF3 antisense RNA 1 NA
CLK2 1196 This gene encodes a dual specificity protein kinase that phosphorylates serine/threonine and tyrosine-containing substrates. Activity of this protein regulates serine- and arginine-rich (SR) proteins of the spliceosomal complex, thereby influencing alternative transcript splicing. Chromosomal translocations have been characterized between this locus and the PAFAH1B3 (platelet-activating factor acetylhydrolase 1b, catalytic subunit 3 (29kDa)) gene on chromosome 19, resulting in the production of a fusion protein. Note that this gene is distinct from the TELO2 gene (GeneID:9894), which shares the CLK2 alias, but encodes a protein that is involved in telomere length regulation. There is a pseudogene for this gene on chromosome 7. Alternative splicing results in multiple transcript variants. ENSG00000176444 CDC like kinase 2 NA
LOC102724814 102724814 NA ENSG00000258727 uncharacterized LOC102724814 NA
SMAD3 4088 The protein encoded by this gene belongs to the SMAD, a family of proteins similar to the gene products of the Drosophila gene ‘mothers against decapentaplegic’ (Mad) and the C. elegans gene Sma. SMAD proteins are signal transducers and transcriptional modulators that mediate multiple signaling pathways. This protein functions as a transcriptional modulator activated by transforming growth factor-beta and is thought to play a role in the regulation of carcinogenesis. ENSG00000166949 SMAD family member 3 NA
PRPF3 9129 The removal of introns from nuclear pre-mRNAs occurs on complexes called spliceosomes, which are made up of 4 small nuclear ribonucleoprotein (snRNP) particles and an undefined number of transiently associated splicing factors. This gene product is one of several proteins that associate with U4 and U6 snRNPs. Mutations in this gene are associated with retinitis pigmentosa-18. ENSG00000117360 pre-mRNA processing factor 3 NA
TBL1XR1 79718 This gene is a member of the WD40 repeat-containing gene family and shares sequence similarity with transducin (beta)-like 1X-linked (TBL1X). The protein encoded by this gene is thought to be a component of both nuclear receptor corepressor (N-CoR) and histone deacetylase 3 (HDAC 3) complexes, and is required for transcriptional activation by a variety of transcription factors. Mutations in this gene have been associated with some autism spectrum disorders, and one finding suggests that haploinsufficiency of this gene may be a cause of intellectual disability with dysmorphism. Mutations in this gene as well as recurrent translocations involving this gene have also been observed in some tumors. ENSG00000177565 transducin (beta)-like 1 X-linked receptor 1 NA
ARGLU1 55082 NA ENSG00000134884 arginine and glutamate rich 1 NA
PATZ1 23598 The protein encoded by this gene contains an A-T hook DNA binding motif which usually binds to other DNA binding structures to play an important role in chromatin modeling and transcription regulation. Its POZ domain is thought to function as a site for protein-protein interaction and is required for transcriptional repression, and the zinc-fingers comprise the DNA binding domain. Since the encoded protein has typical features of a transcription factor, it is postulated to be a repressor of gene expression. In small round cell sarcoma, this gene is fused to EWS by a small inversion of 22q, then the hybrid is thought to be translocated (t(1;22)(p36.1;q12)). The rearrangement of chromosome 22 involves intron 8 of EWS and exon 1 of this gene creating a chimeric sequence containing the transactivation domain of EWS fused to the zinc finger domain of this protein. This is a distinct example of an intra-chromosomal rearrangement of chromosome 22. Four alternatively spliced transcript variants are described for this gene. ENSG00000100105 POZ/BTB and AT hook containing zinc finger 1 NA
LENG8 114823 NA ENSG00000167615 leukocyte receptor cluster (LRC) member 8 NA
CHD3 1107 This gene encodes a member of the CHD family of proteins which are characterized by the presence of chromo (chromatin organization modifier) domains and SNF2-related helicase/ATPase domains. This protein is one of the components of a histone deacetylase complex referred to as the Mi-2/NuRD complex which participates in the remodeling of chromatin by deacetylating histones. Chromatin remodeling is essential for many processes including transcription. Autoantibodies against this protein are found in a subset of patients with dermatomyositis. Three alternatively spliced transcripts encoding different isoforms have been described. ENSG00000170004 chromodomain helicase DNA binding protein 3 NA
SLC7A8 23428 NA ENSG00000092068 solute carrier family 7 member 8 NA
POGZ 23126 The protein encoded by this gene appears to be a zinc finger protein containing a transposase domain at the C-terminus. This protein was found to interact with the transcription factor SP1 in a yeast two-hybrid system. Alternatively spliced transcript variants encoding distinct isoforms have been observed. ENSG00000143442 pogo transposable element with ZNF domain NA
TAF1C 9013 Initiation of transcription by RNA polymerase I requires the formation of a complex composed of the TATA-binding protein (TBP) and three TBP-associated factors (TAFs) specific for RNA polymerase I. This complex, known as SL1, binds to the core promoter of ribosomal RNA genes to position the polymerase properly and acts as a channel for regulatory signals. This gene encodes the largest SL1-specific TAF. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000103168 TATA-box binding protein associated factor, RNA polymerase I subunit C NA
KIAA0907 22889 NA ENSG00000132680 KIAA0907 NA
SNHG7 84973 NA ENSG00000233016 small nucleolar RNA host gene 7 NA
NA NA NA ENSG00000215513 NA TRUE
ZBED5 58486 This gene is unusual in that its coding sequence is mostly derived from Charlie-like DNA transposon; however, it does not appear to be an active DNA transposon as it is not flanked by terminal inverted repeats. The encoded protein is conserved among the mammalian Laurasiatheria branch. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000236287 zinc finger BED-type containing 5 NA
LOC150776 150776 NA ENSG00000152117 sphingomyelin phosphodiesterase 4, neutral membrane (neutral sphingomyelinase-3) pseudogene NA
ZNF266 10781 This gene encodes a protein containing many tandem zinc-finger motifs. Zinc fingers are protein or nucleic acid-binding domains, and may be involved in a variety of functions, including regulation of transcription. This gene is located in a cluster of similar genes encoding zinc finger proteins on chromosome 19. Alternative splicing results in multiple transcript variants for this gene. ENSG00000174652 zinc finger protein 266 NA
AC007563.5 ENSG00000236886 NA ENSG00000236886 NA NA
SETD5 55209 The function of this gene has yet to be determined, but mutations in this gene have been associated with autosomal dominant mental retardation-23. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000168137 SET domain containing 5 NA
TP73-AS1 57212 NA ENSG00000227372 TP73 antisense RNA 1 NA
EIF3L 51386 NA ENSG00000100129 eukaryotic translation initiation factor 3 subunit L NA
HEXDC 284004 NA ENSG00000169660 hexosaminidase D NA
LINC01089 338799 NA ENSG00000212694 long intergenic non-protein coding RNA 1089 NA
AMT 275 This gene encodes one of four critical components of the glycine cleavage system. Mutations in this gene have been associated with glycine encephalopathy. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000145020 aminomethyltransferase NA
CSGALNACT1 55790 NA ENSG00000147408 chondroitin sulfate N-acetylgalactosaminyltransferase 1 NA
SIX5 147912 The protein encoded by this gene is a homeodomain-containing transcription factor that appears to function in the regulation of organogenesis. This gene is located downstream of the dystrophia myotonica-protein kinase gene. Mutations in this gene are a cause of branchiootorenal syndrome type 2. ENSG00000177045 SIX homeobox 5 NA
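The table above contains one row whose notfound flag is TRUE (ENSG00000215513) and several matched genes with an NA summary. As a rough sketch, assuming the query result converts to a data frame with the columns shown above, such rows could be set aside before reading biological meaning into a cluster. The data frame `annot` below is a hypothetical stand-in built from three rows of the table (one summary shortened); it is not part of the original analysis.

```r
# Hypothetical stand-in for as.data.frame(out): three rows copied from the table above.
annot <- data.frame(
  symbol   = c("KIAA0907", NA, "AMT"),
  X_id     = c("22889", NA, "275"),
  summary  = c(NA, NA,
               "This gene encodes one of four critical components of the glycine cleavage system."),
  query    = c("ENSG00000132680", "ENSG00000215513", "ENSG00000145020"),
  name     = c("KIAA0907", NA, "aminomethyltransferase"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Ensembl IDs for which the lookup returned nothing (notfound is TRUE).
# %in% treats NA as "not TRUE", so matched rows are excluded safely.
unmatched <- annot$query[annot$notfound %in% TRUE]

# Matched rows only, for display or downstream use.
matched <- annot[!(annot$notfound %in% TRUE), ]

# Matched genes that carry no summary text and need a manual look-up.
no_summary <- matched$symbol[is.na(matched$summary)]

print(unmatched)
print(no_summary)
```

Reporting the unmatched IDs and the genes without summaries separately makes it clearer how much of a cluster's top-gene list actually has annotation behind it.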
# Save the Ensembl IDs queried for cluster 1, one per line
write.table(out$query, file = paste0("../utilities/gene_names_clus_", 1, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
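A single Ensembl ID can return more than one hit (in the Cluster 2 table below, ENSG00000160014 is listed with both CALM3 and CALM2), so the query column may repeat an ID. The following is a minimal sketch, assuming each ID should be written only once; the vector `query_ids` is a toy stand-in for `out$query`, and the temporary output path is hypothetical rather than the `../utilities/` path used in this script.

```r
# Toy stand-in for out$query; the repeated IDs mimic one-to-many hits.
query_ids <- c("ENSG00000132639", "ENSG00000160014", "ENSG00000160014",
               "ENSG00000198668", "ENSG00000198668")

# Keep the first occurrence of each ID, preserving the original order.
ids <- unique(query_ids)

# Hypothetical output location; a temp file keeps the sketch side-effect free.
out_file <- file.path(tempdir(), "gene_names_clus_example.txt")
write.table(ids, file = out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)

# Read the file back to confirm one ID per line.
written <- readLines(out_file)
print(written)
```

Whether duplicates should be removed depends on how the saved list is used downstream; for a plain gene-list input, one entry per ID is usually what is wanted.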

Cluster 2 Annotations

out <- mygene::queryMany(gene_list[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol summary X_id name
ENSG00000132639 SNAP25 Synaptic vesicle membrane docking and fusion is mediated by SNAREs (soluble N-ethylmaleimide-sensitive factor attachment protein receptors) located on the vesicle membrane (v-SNAREs) and the target membrane (t-SNAREs). The assembled v-SNARE/t-SNARE complex consists of a bundle of four helices, one of which is supplied by v-SNARE and the other three by t-SNARE. For t-SNAREs on the plasma membrane, the protein syntaxin supplies one helix and the protein encoded by this gene contributes the other two. Therefore, this gene product is a presynaptic plasma membrane protein involved in the regulation of neurotransmitter release. Two alternative transcript variants encoding different protein isoforms have been described for this gene. 6616 synaptosome associated protein 25kDa
ENSG00000127585 FBXL16 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 146330 F-box and leucine-rich repeat protein 16
ENSG00000020129 NCDN This gene encodes a leucine-rich cytoplasmic protein, which is highly similar to a mouse protein that negatively regulates Ca/calmodulin-dependent protein kinase II phosphorylation and may be essential for spatial learning processes. Several alternatively spliced transcript variants of this gene have been described. 23154 neurochondrin
ENSG00000104888 SLC17A7 The protein encoded by this gene is a vesicle-bound, sodium-dependent phosphate transporter that is specifically expressed in the neuron-rich regions of the brain. It is preferentially associated with the membranes of synaptic vesicles and functions in glutamate transport. The protein shares 82% identity with the differentiation-associated Na-dependent inorganic phosphate cotransporter and they appear to form a distinct class within the Na+/Pi cotransporter family. 57030 solute carrier family 17 member 7
ENSG00000124507 PACSIN1 NA 29993 protein kinase C and casein kinase substrate in neurons 1
ENSG00000074317 SNCB This gene encodes a member of a small family of proteins that inhibit phospholipase D2 and may function in neuronal plasticity. The encoded protein is abundant in lesions of patients with Alzheimer disease. A mutation in this gene was found in individuals with dementia with Lewy bodies. Alternative splicing results in multiple transcript variants. 6620 synuclein beta
ENSG00000155980 KIF5A This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. 3798 kinesin family member 5A
ENSG00000160014 CALM3 NA 808 calmodulin 3 (phosphorylase kinase, delta)
ENSG00000160014 CALM2 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. 805 calmodulin 2 (phosphorylase kinase, delta)
ENSG00000106976 DNM1 This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. 1759 dynamin 1
ENSG00000198668 CALM1 This gene encodes a member of the EF-hand calcium-binding protein family. It is one of three genes which encode an identical calcium binding protein which is one of the four subunits of phosphorylase kinase. Two pseudogenes have been identified on chromosomes 7 and X. Multiple transcript variants encoding different isoforms have been found for this gene. 801 calmodulin 1 (phosphorylase kinase, delta)
ENSG00000198668 CALM2 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. 805 calmodulin 2 (phosphorylase kinase, delta)
ENSG00000128656 CHN1 This gene encodes GTPase-activating protein for ras-related p21-rac and a phorbol ester receptor. It is predominantly expressed in neurons, and plays an important role in neuronal signal-transduction mechanisms. Mutations in this gene are associated with Duane’s retraction syndrome 2 (DURS2). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 1123 chimerin 1
ENSG00000136854 STXBP1 This gene encodes a syntaxin-binding protein. The encoded protein appears to play a role in release of neurotransmitters via regulation of syntaxin, a transmembrane attachment protein receptor. Mutations in this gene have been associated with infantile epileptic encephalopathy-4. Alternatively spliced transcript variants have been described. 6812 syntaxin binding protein 1
ENSG00000105696 TMEM59L This gene encodes a predicted type-I membrane glycoprotein. The encoded protein may play a role in functioning of the central nervous system. 25789 transmembrane protein 59 like
ENSG00000111674 ENO2 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme, a homodimer, is found in mature neurons and cells of neuronal origin. A switch from alpha enolase to gamma enolase occurs in neural tissue during development in rats and primates. 2026 enolase 2 (gamma, neuronal)
ENSG00000139970 RTN1 This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. This gene is considered to be a specific marker for neurological diseases and cancer, and is a potential molecular target for therapy. Alternative splicing results in multiple transcript variants. 6252 reticulon 1
ENSG00000168490 PHYHIP NA 9796 phytanoyl-CoA 2-hydroxylase interacting protein
ENSG00000163032 VSNL1 This gene is a member of the visinin/recoverin subfamily of neuronal calcium sensor proteins. The encoded protein is strongly expressed in granule cells of the cerebellum where it associates with membranes in a calcium-dependent manner and modulates intracellular signaling pathways of the central nervous system by directly or indirectly regulating the activity of adenylyl cyclase. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. 7447 visinin like 1
ENSG00000099365 STX1B The protein encoded by this gene belongs to a family of proteins thought to play a role in the exocytosis of synaptic vesicles. Vesicle exocytosis releases vesicular contents and is important to various cellular functions. For instance, the secretion of transmitters from neurons plays an important role in synaptic transmission. After exocytosis, the membrane and proteins from the vesicle are retrieved from the plasma membrane through the process of endocytosis. Mutations in this gene have been identified as one cause of fever-associated epilepsy syndromes. A possible link between this gene and Parkinson’s disease has also been suggested. 112755 syntaxin 1B
ENSG00000125814 NAPB NA 63908 NSF attachment protein beta
ENSG00000154146 NRGN Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. The exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that the NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlay many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. 4900 neurogranin
ENSG00000104435 STMN2 This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. 11075 stathmin 2
ENSG00000132535 DLG4 This gene encodes a member of the membrane-associated guanylate kinase (MAGUK) family. It heteromultimerizes with another MAGUK protein, DLG2, and is recruited into NMDA receptor and potassium channel clusters. These two MAGUK proteins may interact at postsynaptic sites to form a multimeric scaffold for the clustering of receptors, ion channels, and associated signaling proteins. Multiple transcript variants encoding different isoforms have been found for this gene. 1742 discs large homolog 4
ENSG00000188191 PRKAR1B The protein encoded by this gene is a regulatory subunit of cyclic AMP-dependent protein kinase A (PKA), which is involved in the signaling pathway of the second messenger cAMP. Two regulatory and two catalytic subunits form the PKA holoenzyme, which disbands after cAMP binding. The holoenzyme is involved in many cellular events, including ion transport, metabolism, and transcription. Several transcript variants encoding the same protein have been found for this gene. 5575 protein kinase cAMP-dependent type I regulatory subunit beta
ENSG00000008735 MAPK8IP2 The protein encoded by this gene is closely related to MAPK8IP1/IB1/JIP-1, a scaffold protein that is involved in the c-Jun amino-terminal kinase signaling pathway. This protein is expressed in brain and pancreatic cells. It has been shown to interact with, and regulate the activity of MAPK8/JNK1, and MAP2K7/MKK7 kinases. This protein thus is thought to function as a regulator of signal transduction by protein kinase cascade in brain and pancreatic beta-cells. 23542 mitogen-activated protein kinase 8 interacting protein 2
ENSG00000100321 SYNGR1 This gene encodes an integral membrane protein associated with presynaptic vesicles in neuronal cells. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it functions in synaptic plasticity without being required for synaptic transmission. The gene product belongs to the synaptogyrin gene family. Three alternatively spliced variants encoding three different isoforms have been identified. 9145 synaptogyrin 1
ENSG00000063180 CA11 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA XI is likely a secreted protein, however, radical changes at active site residues completely conserved in CA isozymes with catalytic activity, make it unlikely that it has carbonic anhydrase activity. It shares properties in common with two other acatalytic CA isoforms, CA VIII and CA X. CA XI is most abundantly expressed in brain, and may play a general role in the central nervous system. 770 carbonic anhydrase 11
ENSG00000110076 NRXN2 This gene encodes a member of the neurexin gene family. The products of these genes function as cell adhesion molecules and receptors in the vertebrate nervous system. These genes utilize two promoters. The majority of transcripts are produced from the upstream promoter and encode alpha-neurexin isoforms while a smaller number of transcripts are produced from the downstream promoter and encode beta-neurexin isoforms. The alpha-neurexins contain epidermal growth factor-like (EGF-like) sequences and laminin G domains, and have been shown to interact with neurexophilins. The beta-neurexins lack EGF-like sequences and contain fewer laminin G domains than alpha-neurexins. Alternative splicing and the use of alternative promoters may generate thousands of transcript variants (PMID: 12036300, PMID: 11944992). 9379 neurexin 2
ENSG00000159164 SV2A NA 9900 synaptic vesicle glycoprotein 2A
ENSG00000100505 TRIM9 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein localizes to cytoplasmic bodies. Its function has not been identified. Alternate splicing of this gene generates two transcript variants encoding different isoforms. 114088 tripartite motif containing 9
ENSG00000198794 SCAMP5 NA 192683 secretory carrier membrane protein 5
ENSG00000138814 PPP3CA NA 5530 protein phosphatase 3 catalytic subunit alpha
ENSG00000171617 ENC1 This gene encodes a member of the kelch-related family of actin-binding proteins. The encoded protein plays a role in the oxidative stress response as a regulator of the transcription factor Nrf2, and expression of this gene may play a role in malignant transformation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 8507 ectodermal-neural cortex 1
ENSG00000197457 STMN3 This gene encodes a protein which is a member of the stathmin protein family. Members of this protein family form a complex with tubulins at a ratio of 2 tubulins for each stathmin protein. Microtubules require the ordered assembly of alpha- and beta-tubulins, and formation of a complex with stathmin disrupts microtubule formation and function. A pseudogene of this gene is located on chromosome 22. Alternative splicing results in multiple transcript variants. 50861 stathmin 3
ENSG00000088899 LZTS3 NA 9762 leucine zipper, putative tumor suppressor family member 3
ENSG00000105649 RAB3A NA 5864 RAB3A, member RAS oncogene family
ENSG00000092096 SLC22A17 NA 51310 solute carrier family 22 member 17
ENSG00000184524 CEND1 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. 51286 cell cycle exit and neuronal differentiation 1
ENSG00000168993 CPLX1 Proteins encoded by the complexin/synaphin gene family are cytosolic proteins that function in synaptic vesicle exocytosis. These proteins bind syntaxin, part of the SNAP receptor. The protein product of this gene binds to the SNAP receptor complex and disrupts it, allowing transmitter release. 10815 complexin 1
ENSG00000112139 MDGA1 NA 266727 MAM domain containing glycosylphosphatidylinositol anchor 1
ENSG00000154277 UCHL1 The protein encoded by this gene belongs to the peptidase C12 family. This enzyme is a thiol protease that hydrolyzes a peptide bond at the C-terminal glycine of ubiquitin. This gene is specifically expressed in the neurons and in cells of the diffuse neuroendocrine system. Mutations in this gene may be associated with Parkinson disease. 7345 ubiquitin C-terminal hydrolase L1
ENSG00000108309 RUNDC3A NA 10900 RUN domain containing 3A
ENSG00000143847 PPFIA4 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. 8497 PTPRF interacting protein alpha 4
ENSG00000107130 NCS1 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. 23413 neuronal calcium sensor 1
ENSG00000166963 MAP1A This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. 4130 microtubule associated protein 1A
ENSG00000058404 CAMK2B The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. 816 calcium/calmodulin dependent protein kinase II beta
ENSG00000117016 RIMS3 NA 9783 regulating synaptic membrane exocytosis 3
ENSG00000221890 NPTXR This gene encodes a protein similar to the rat neuronal pentraxin receptor. The rat pentraxin receptor is an integral membrane protein that is thought to mediate neuronal uptake of the snake venom toxin, taipoxin, and its transport into the synapses. Studies in rat indicate that translation of this mRNA initiates at a non-AUG (CUG) codon. This may also be true for mouse and human, based on strong sequence conservation amongst these species. 23467 neuronal pentraxin receptor
ENSG00000139200 PIANP This gene encodes a ligand for the paired immunoglobulin-like type 2 receptor alpha, and so may be involved in immune regulation. Alternate splicing results in multiple transcript variants encoding different proteins. 196500 PILR alpha associated neural protein
ENSG00000104833 TUBB4A This gene encodes a member of the beta tubulin family. Beta tubulins are one of two core protein families (alpha and beta tubulins) that heterodimerize and assemble to form microtubules. Mutations in this gene cause hypomyelinating leukodystrophy-6 and autosomal dominant torsion dystonia-4. Alternate splicing results in multiple transcript variants encoding different isoforms. A pseudogene of this gene is found on chromosome X. 10382 tubulin beta 4A class IVa
ENSG00000167371 PRRT2 This gene encodes a transmembrane protein containing a proline-rich domain in its N-terminal half. Studies in mice suggest that it is predominantly expressed in brain and spinal cord in embryonic and postnatal stages. Mutations in this gene are associated with episodic kinesigenic dyskinesia-1. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 112476 proline rich transmembrane protein 2
ENSG00000160469 BRSK1 NA 84446 BR serine/threonine kinase 1
ENSG00000059915 PSD This gene encodes a pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. 5662 pleckstrin and Sec7 domain containing
ENSG00000127561 SYNGR3 This gene encodes an integral membrane protein. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it is a synaptic vesicle protein that also interacts with the dopamine transporter. The gene product belongs to the synaptogyrin gene family. 9143 synaptogyrin 3
ENSG00000073969 NSF NA 4905 N-ethylmaleimide sensitive factor
ENSG00000131771 PPP1R1B This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. 84152 protein phosphatase 1 regulatory inhibitor subunit 1B
ENSG00000139899 CBLN3 Members of the precerebellin family, such as CBLN3, contain a cerebellin motif (see CBLN1; MIM 600432) and a C-terminal C1q signature domain (see MIM 120550) that mediates trimeric assembly of atypical collagen complexes. However, precerebellins do not contain a collagen motif, suggesting that they are not conventional components of the extracellular matrix (Pang et al., 2000 [PubMed 10964938]). 643866 cerebellin 3 precursor
ENSG00000145362 ANK2 This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. 287 ankyrin 2, neuronal
ENSG00000204681 GABBR1 This gene encodes a receptor for gamma-aminobutyric acid (GABA), which is the main inhibitory neurotransmitter in the mammalian central nervous system. This receptor functions as a heterodimer with GABA(B) receptor 2. Defects in this gene may underlie brain disorders such as schizophrenia and epilepsy. Alternative splicing generates multiple transcript variants, but the full-length nature of some of these variants has not been determined. 2550 gamma-aminobutyric acid type B receptor subunit 1
ENSG00000101298 SNPH Syntaxin-1, synaptobrevin/VAMP, and SNAP25 interact to form the SNARE complex, which is required for synaptic vesicle docking and fusion. The protein encoded by this gene is membrane-associated and inhibits SNARE complex formation by binding free syntaxin-1. Expression of this gene appears to be brain-specific. Alternative splicing results in multiple transcript variants encoding different isoforms. 9751 syntaphilin
ENSG00000132563 REEP2 This gene encodes a member of the receptor expression enhancing protein family. Studies of a related gene in mouse suggest that the encoded protein is found in the cell membrane and enhances the function of sweet taste receptors. Alternative splicing results in multiple transcript variants. 51308 receptor accessory protein 2
ENSG00000156011 PSD3 NA 23362 pleckstrin and Sec7 domain containing 3
ENSG00000152154 TMEM178A NA 130733 transmembrane protein 178A
ENSG00000160460 SPTBN4 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein localizes to the nuclear matrix, PML nuclear bodies, and cytoplasmic vesicles. A highly similar gene in the mouse is required for localization of specific membrane proteins in polarized regions of neurons. Multiple transcript variants encoding different isoforms have been found for this gene. 57731 spectrin beta, non-erythrocytic 4
ENSG00000008710 PKD1 This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. 5310 polycystin 1, transient receptor potential channel interacting
ENSG00000084731 KIF3C NA 3797 kinesin family member 3C
ENSG00000008277 ADAM22 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. Unlike other members of the ADAM protein family, the protein encoded by this gene lacks metalloprotease activity since it has no zinc-binding motif. This gene is highly expressed in the brain and may function as an integrin ligand in the brain. In mice, it has been shown to be essential for correct myelination in the peripheral nervous system. Alternative splicing results in several transcript variants. 53616 ADAM metallopeptidase domain 22
ENSG00000105270 CLIP3 This gene encodes a member of the cytoplasmic linker protein 170 family. Members of this protein family contain a cytoskeleton-associated protein glycine-rich domain and mediate the interaction of microtubules with cellular organelles. The encoded protein plays a role in T cell apoptosis by facilitating the association of tubulin and the lipid raft ganglioside GD3. The encoded protein also functions as a scaffold protein mediating membrane localization of phosphorylated protein kinase B. Alternatively spliced transcript variants have been observed for this gene. 25999 CAP-Gly domain containing linker protein 3
ENSG00000179456 ZBTB18 This gene encodes a C2H2-type zinc finger protein which acts as a transcriptional repressor of genes involved in neuronal development. The encoded protein recognizes a specific sequence motif and recruits components of chromatin to target genes. Alternative splicing results in multiple transcript variants. 10472 zinc finger and BTB domain containing 18
ENSG00000107742 SPOCK2 This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. 9806 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2
ENSG00000135709 KIAA0513 NA 9764 KIAA0513
ENSG00000197535 MYO5A This gene is one of three myosin V heavy-chain genes, belonging to the myosin gene superfamily. Myosin V is a class of actin-based motor proteins involved in cytoplasmic vesicle transport and anchorage, spindle-pole alignment and mRNA translocation. The protein encoded by this gene is abundant in melanocytes and nerve cells. Mutations in this gene cause Griscelli syndrome type-1 (GS1), Griscelli syndrome type-3 (GS3) and neuroectodermal melanolysosomal disease, or Elejalde disease. Multiple alternatively spliced transcript variants encoding different isoforms have been reported, but the full-length nature of some variants has not been determined. 4644 myosin VA
ENSG00000187189 TSPYL4 NA 23270 TSPY-like 4
ENSG00000109107 ALDOC This gene encodes a member of the class I fructose-biphosphate aldolase gene family. Expressed specifically in the hippocampus and Purkinje cells of the brain, the encoded protein is a glycolytic enzyme that catalyzes the reversible aldol cleavage of fructose-1,6-biphosphate and fructose 1-phosphate to dihydroxyacetone phosphate and either glyceraldehyde-3-phosphate or glyceraldehyde, respectively. 230 aldolase, fructose-bisphosphate C
ENSG00000247556 OIP5-AS1 NA ENSG00000247556 OIP5 antisense RNA 1
ENSG00000178531 CTXN1 NA 404217 cortexin 1
ENSG00000128482 RNF112 This gene encodes a member of the RING finger protein family of transcription factors. The protein is primarily expressed in brain. The gene is located within the Smith-Magenis syndrome region on chromosome 17. 7732 ring finger protein 112
ENSG00000129244 ATP1B2 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 2 subunit. Two transcript variants encoding different isoforms have been found for this gene. 482 ATPase Na+/K+ transporting subunit beta 2
ENSG00000125648 SLC25A23 NA 79085 solute carrier family 25 member 23
ENSG00000135439 AGAP2 The protein encoded by this gene belongs to the centaurin gamma-like family. It mediates anti-apoptotic effects of nerve growth factor by activating nuclear phosphoinositide 3-kinase. It is overexpressed in cancer cells, and promotes cancer cell invasion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 116986 ArfGAP with GTPase domain, ankyrin repeat and PH domain 2
ENSG00000227051 C14orf132 NA ENSG00000227051 chromosome 14 open reading frame 132
ENSG00000109472 CPE This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. 1363 carboxypeptidase E
ENSG00000171867 PRNP The protein encoded by this gene is a membrane glycosylphosphatidylinositol-anchored glycoprotein that tends to aggregate into rod-like structures. The encoded protein contains a highly unstable region of five tandem octapeptide repeats. This gene is found on chromosome 20, approximately 20 kbp upstream of a gene which encodes a biochemically and structurally similar protein to the one encoded by this gene. Mutations in the repeat region as well as elsewhere in this gene have been associated with Creutzfeldt-Jakob disease, fatal familial insomnia, Gerstmann-Straussler disease, Huntington disease-like 1, and kuru. An overlapping open reading frame has been found for this gene that encodes a smaller, structurally unrelated protein, AltPrp. Alternative splicing results in multiple transcript variants. 5621 prion protein
ENSG00000131584 ACAP3 NA 116983 ArfGAP with coiled-coil, ankyrin repeat and PH domains 3
ENSG00000184702 SEPT5 This gene is a member of the septin gene family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. This gene is mapped to 22q11, the region frequently deleted in DiGeorge and velocardiofacial syndromes. A translocation involving the MLL gene and this gene has also been reported in patients with acute myeloid leukemia. Alternative splicing results in multiple transcript variants. The presence of a non-consensus polyA signal (AACAAT) in this gene also results in read-through transcription into the downstream neighboring gene (GP1BB; platelet glycoprotein Ib), whereby larger, non-coding transcripts are produced. 5413 septin 5
ENSG00000108797 CNTNAP1 The gene product was initially identified as a 190-kD protein associated with the contactin-PTPRZ1 complex. The 1,384-amino acid protein, also designated p190 or CASPR for ‘contactin-associated protein,’ includes an extracellular domain with several putative protein-protein interaction domains, a putative transmembrane domain, and a 74-amino acid cytoplasmic domain. Northern blot analysis showed that the gene is transcribed predominantly in brain as a transcript of 6.2 kb, with weak expression in several other tissues tested. The architecture of its extracellular domain is similar to that of neurexins, and this protein may be the signaling subunit of contactin, enabling recruitment and activation of intracellular signaling pathways in neurons. 8506 contactin associated protein 1
ENSG00000165802 NSMF The protein encoded by this gene is involved in guidance of olfactory axon projections and migration of luteinizing hormone-releasing hormone neurons. Defects in this gene are a cause of idiopathic hypogonadotropic hypogonadism (IHH). Several transcript variants encoding different isoforms have been found for this gene. 26012 NMDA receptor synaptonuclear signaling and neuronal migration factor
ENSG00000130758 MAP3K10 The protein encoded by this gene is a member of the serine/threonine kinase family. This kinase has been shown to activate MAPK8/JNK and MKK4/SEK1, and this kinase itself can be phosphorylated, and thus activated by JNK kinases. This kinase functions preferentially on the JNK signaling pathway, and is reported to be involved in nerve growth factor (NGF) induced neuronal apoptosis. 4294 mitogen-activated protein kinase kinase kinase 10
ENSG00000139182 CLSTN3 NA 9746 calsyntenin 3
ENSG00000171130 ATP6V0E2 Multisubunit vacuolar-type proton pumps, or H(+)-ATPases, acidify various intracellular compartments, such as vacuoles, clathrin-coated and synaptic vesicles, endosomes, lysosomes, and chromaffin granules. H(+)-ATPases are also found in plasma membranes of specialized cells, where they play roles in urinary acidification, bone resorption, and sperm maturation. Multiple subunits form H(+)-ATPases, with proteins of the V1 class hydrolyzing ATP for energy to transport H+, and proteins of the V0 class forming an integral membrane domain through which H+ is transported. ATP6V0E2 encodes an isoform of the H(+)-ATPase V0 e subunit, an essential proton pump component (Blake-Palmer et al., 2007 [PubMed 17350184]). 155066 ATPase H+ transporting V0 subunit e2
ENSG00000105662 CRTC1 NA 23373 CREB regulated transcription coactivator 1
ENSG00000132879 FBXO44 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class. It is also a member of the NFB42 (neural F Box 42 kDa) family, similar to F-box only protein 2 and F-box only protein 6. Several alternatively spliced transcript variants encoding two distinct isoforms have been found for this gene. 93611 F-box protein 44
ENSG00000198825 INPP5F The protein encoded by this gene is an inositol 1,4,5-trisphosphate (InsP3) 5-phosphatase and contains a Sac domain. The activity of this protein is specific for phosphatidylinositol 4,5-bisphosphate and phosphatidylinositol 3,4,5-trisphosphate. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 22876 inositol polyphosphate-5-phosphatase F
ENSG00000137267 TUBB2A Microtubules, key participants in processes such as mitosis and intracellular transport, are composed of heterodimers of alpha- and beta-tubulins. The protein encoded by this gene is a beta-tubulin. Defects in this gene are associated with complex cortical dysplasia with other brain malformations-5. Two transcript variants encoding distinct isoforms have been found for this gene. 7280 tubulin beta 2A class IIa
ENSG00000072832 CRMP1 This gene encodes a member of a family of cytosolic phosphoproteins expressed exclusively in the nervous system. The encoded protein is thought to be a part of the semaphorin signal transduction pathway implicated in semaphorin-induced growth cone collapse during neural development. Alternative splicing results in multiple transcript variants. 1400 collapsin response mediator protein 1
ENSG00000174684 B4GAT1 This gene encodes a member of the beta-1,3-N-acetylglucosaminyltransferase family. This enzyme is a type II transmembrane protein. It is essential for the synthesis of poly-N-acetyllactosamine, a determinant for the blood group i antigen. 11041 beta-1,4-glucuronyltransferase 1
ENSG00000130294 KIF1A The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. 547 kinesin family member 1A
ENSG00000128245 YWHAH This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and bovine orthologs. This gene contains a 7 bp repeat sequence in its 5’ UTR, and changes in the number of this repeat have been associated with early-onset schizophrenia and psychotic bipolar disorder. 7533 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein eta
ENSG00000073670 ADAM11 This gene encodes a member of the ADAM (a disintegrin and metalloprotease) protein family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. The encoded preproprotein is proteolytically processed to generate the mature protease. This gene represents a candidate tumor suppressor gene for human breast cancer based on its location within a minimal region of chromosome 17q21 previously defined by tumor deletion mapping. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. 4185 ADAM metallopeptidase domain 11
ENSG00000250510 GPR162 This gene was identified upon genomic analysis of a gene-dense region at human chromosome 12p13. It appears to be mainly expressed in the brain; however, its function is not known. Alternatively spliced transcript variants encoding different isoforms have been identified. 27239 G protein-coupled receptor 162
ENSG00000162545 CAMK2N1 NA 55450 calcium/calmodulin dependent protein kinase II inhibitor 1
# Save the Ensembl IDs queried for cluster 2 (one per line) for downstream use.
write.table(out$query, file = "../utilities/gene_names_clus_2.txt",
            col.names = FALSE, row.names = FALSE, quote = FALSE)
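
To make the "does this cluster make biological sense" check a little less manual, one option is to count how many gene summaries mention tissue-related terms. The sketch below is only an illustration and is not part of the original analysis: the helper `mentions_tissue` and the keyword list are hypothetical, and the toy table holds three rows copied from the cluster 2 output above instead of the full `as.data.frame(out)`.

```r
# Toy annotation table: three rows copied from the cluster 2 output above.
# In the real analysis this would be as.data.frame(out).
annot <- data.frame(
  symbol  = c("GABBR1", "SNPH", "PSD3"),
  summary = c(
    "This gene encodes a receptor for gamma-aminobutyric acid (GABA), which is the main inhibitory neurotransmitter in the mammalian central nervous system.",
    "Expression of this gene appears to be brain-specific.",
    NA
  ),
  stringsAsFactors = FALSE
)

# Hypothetical keyword list for a brain-dominated cluster; adjust per tissue.
keywords <- c("brain", "neuro", "synap", "nervous")

# TRUE when a summary mentions any keyword (case-insensitive);
# genes without a summary (NA) are counted as FALSE.
mentions_tissue <- function(summary, keywords) {
  pattern <- paste(keywords, collapse = "|")
  hit <- grepl(pattern, summary, ignore.case = TRUE)
  hit & !is.na(summary)
}

annot$tissue_hit <- mentions_tissue(annot$summary, keywords)

# Fraction of top genes whose summary mentions the tissue keywords.
frac_hit <- mean(annot$tissue_hit)
print(annot[, c("symbol", "tissue_hit")])
```

A keyword match is a crude screen: genes with no summary are counted as misses, so the fraction is best read as a lower bound and as a prompt for reading the annotations, not as a formal enrichment test.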

Cluster 3 Annotations

# Annotate the top driving genes of cluster 3 (row 3 of gene_list) via mygene.
out <- mygene::queryMany(gene_list[3, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs roles include fatty acid uptake, transport, and metabolism. fatty acid binding protein 4 FABP4 ENSG00000170323 NA
5346 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. perilipin 1 PLIN1 ENSG00000166819 NA
2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase FASN ENSG00000169710 NA
2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3 GPX3 ENSG00000211445 NA
63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. cell death inducing DFFA like effector c CIDEC ENSG00000187288 NA
57104 This gene encodes an enzyme which catalyzes the first step in the hydrolysis of triglycerides in adipose tissue. Mutations in this gene are associated with neutral lipid storage disease with myopathy. patatin like phospholipase domain containing 2 PNPLA2 ENSG00000177666 NA
3991 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. lipase E, hormone sensitive type LIPE ENSG00000079435 NA
729359 Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). perilipin 4 PLIN4 ENSG00000167676 NA
2819 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotine adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. glycerol-3-phosphate dehydrogenase 1 GPD1 ENSG00000167588 NA
948 The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. CD36 molecule CD36 ENSG00000135218 NA
1675 This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. complement factor D (adipsin) CFD ENSG00000197766 NA
7079 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. The secreted, netrin domain-containing protein encoded by this gene is involved in regulation of platelet aggregation and recruitment and may play a role in hormonal regulation and endometrial tissue remodeling. TIMP metallopeptidase inhibitor 4 TIMP4 ENSG00000157150 NA
2934 The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. gelsolin GSN ENSG00000148180 NA
81575 APOLD1 is an endothelial cell early response protein that may play a role in regulation of endothelial cell signaling and vascular function (Regard et al., 2004 [PubMed 15102925]). apolipoprotein L domain containing 1 APOLD1 ENSG00000178878 NA
ENSG00000255108 NA NA AP006621.8 ENSG00000255108 NA
50486 NA G0/G1 switch 2 G0S2 ENSG00000123689 NA
57678 This gene encodes a mitochondrial enzyme which prefers saturated fatty acids as its substrate for the synthesis of glycerolipids. This metabolic pathway’s first step is catalyzed by the encoded enzyme. Two forms for this enzyme exist, one in the mitochondria and one in the endoplasmic reticulum. Two alternatively spliced transcript variants have been described for this gene. glycerol-3-phosphate acyltransferase, mitochondrial GPAM ENSG00000119927 NA
123 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. perilipin 2 PLIN2 ENSG00000147872 NA
32 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. acetyl-CoA carboxylase beta ACACB ENSG00000076555 NA
11067 The expression of this gene is induced by fasting as well as by progesterone. The protein encoded by this gene contains a t-synaptosome-associated protein receptor (SNARE) coiled-coil homology domain and a peroxisomal targeting signal. Production of the encoded protein leads to phosphorylation and activation of the transcription factor ELK1. chromosome 10 open reading frame 10 C10orf10 ENSG00000165507 NA
5577 cAMP is a signaling molecule important for a variety of cellular functions. cAMP exerts its effects by activating the cAMP-dependent protein kinase, which transduces the signal through phosphorylation of different target proteins. The inactive kinase holoenzyme is a tetramer composed of two regulatory and two catalytic subunits. cAMP causes the dissociation of the inactive holoenzyme into a dimer of regulatory subunits bound to four cAMP and two free monomeric catalytic subunits. Four different regulatory subunits and three catalytic subunits have been identified in humans. The protein encoded by this gene is one of the regulatory subunits. This subunit can be phosphorylated by the activated catalytic subunit. This subunit has been shown to interact with and suppress the transcriptional activity of the cAMP responsive element binding protein 1 (CREB1) in activated T cells. Knockout studies in mice suggest that this subunit may play an important role in regulating energy balance and adiposity. The studies also suggest that this subunit may mediate the gene induction and cataleptic behavior induced by haloperidol. protein kinase cAMP-dependent type II regulatory subunit beta PRKAR2B ENSG00000005249 NA
51129 This gene encodes a glycosylated, secreted protein containing a C-terminal fibrinogen domain. The encoded protein is induced by peroxisome proliferation activators and functions as a serum hormone that regulates glucose homeostasis, lipid metabolism, and insulin sensitivity. This protein can also act as an apoptosis survival factor for vascular endothelial cells and can prevent metastasis by inhibiting vascular growth and tumor cell invasion. The C-terminal domain may be proteolytically-cleaved from the full-length secreted protein. Decreased expression of this gene has been associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. This gene was previously referred to as ANGPTL2 but has been renamed ANGPTL4. angiopoietin like 4 ANGPTL4 ENSG00000167772 NA
5468 This gene encodes a member of the peroxisome proliferator-activated receptor (PPAR) subfamily of nuclear receptors. PPARs form heterodimers with retinoid X receptors (RXRs) and these heterodimers regulate transcription of various genes. Three subtypes of PPARs are known: PPAR-alpha, PPAR-delta, and PPAR-gamma. The protein encoded by this gene is PPAR-gamma and is a regulator of adipocyte differentiation. Additionally, PPAR-gamma has been implicated in the pathology of numerous diseases including obesity, diabetes, atherosclerosis and cancer. Alternatively spliced transcript variants that encode different isoforms have been described. peroxisome proliferator activated receptor gamma PPARG ENSG00000132170 NA
84293 NA family with sequence similarity 213 member A FAM213A ENSG00000122378 NA
116362 Due to its chemical instability and low solubility in aqueous solution, vitamin A requires cellular retinol-binding proteins (CRBPs), such as RBP7, for stability, internalization, intercellular transfer, homeostasis, and metabolism. retinol binding protein 7 RBP7 ENSG00000162444 NA
4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. lipoprotein lipase LPL ENSG00000175445 NA
125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. alcohol dehydrogenase 1B (class I), beta polypeptide ADH1B ENSG00000196616 NA
5360 The protein encoded by this gene is one of at least two lipid transfer proteins found in human plasma. The encoded protein transfers phospholipids from triglyceride-rich lipoproteins to high density lipoprotein (HDL). In addition to regulating the size of HDL particles, this protein may be involved in cholesterol metabolism. At least two transcript variants encoding different isoforms have been found for this gene. phospholipid transfer protein PLTP ENSG00000100979 NA
23452 Angiopoietins are members of the vascular endothelial growth factor family and the only known growth factors largely specific for vascular endothelium. Angiopoietin-1, angiopoietin-2, and angiopoietin-4 participate in the formation of blood vessels. ANGPTL2 protein is a secreted glycoprotein with homology to the angiopoietins and may exert a function on endothelial cells through autocrine or paracrine action. angiopoietin like 2 ANGPTL2 ENSG00000136859 NA
5176 The protein encoded by this gene is a member of the serpin family, although it does not display the serine protease inhibitory activity shown by many of the other serpin family members. The encoded protein is secreted and strongly inhibits angiogenesis. In addition, this protein is a neurotrophic factor involved in neuronal differentiation in retinoblastoma cells. serpin family F member 1 SERPINF1 ENSG00000132386 NA
2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase GLUL ENSG00000135821 NA
NA NA NA NA ENSG00000256545 TRUE
9590 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein is expressed in endothelial cells, cultured fibroblasts, and osteosarcoma cells. It associates with protein kinases A and C and phosphatase, and serves as a scaffold protein in signal transduction. This protein and RII PKA colocalize at the cell periphery. This protein is a cell growth-related protein. Antibodies to this protein can be produced by patients with myasthenia gravis. Alternative splicing of this gene results in two transcript variants encoding different isoforms. A-kinase anchoring protein 12 AKAP12 ENSG00000131016 NA
132720 NA chromosome 4 open reading frame 32 C4orf32 ENSG00000174749 NA
10252 NA sprouty RTK signaling antagonist 1 SPRY1 ENSG00000164056 NA
60481 This gene belongs to the ELO family. It is highly expressed in the adrenal gland and testis, and encodes a multi-pass membrane protein that is localized in the endoplasmic reticulum. This protein is involved in the elongation of long-chain polyunsaturated fatty acids. Mutations in this gene have been associated with spinocerebellar ataxia-38 (SCA38). Alternatively spliced transcript variants have been found for this gene. ELOVL fatty acid elongase 5 ELOVL5 ENSG00000012660 NA
154807 NA vitamin K epoxide reductase complex subunit 1 like 1 VKORC1L1 ENSG00000196715 NA
3479 The protein encoded by this gene is similar to insulin in function and structure and is a member of a family of proteins involved in mediating growth and development. The encoded protein is processed from a precursor, bound by a specific receptor, and secreted. Defects in this gene are a cause of insulin-like growth factor I deficiency. Alternative splicing results in multiple transcript variants encoding different isoforms that may undergo similar processing to generate mature protein. insulin like growth factor 1 IGF1 ENSG00000017427 NA
10555 This gene encodes a member of the 1-acylglycerol-3-phosphate O-acyltransferase family. The protein is located within the endoplasmic reticulum membrane and converts lysophosphatidic acid to phosphatidic acid, the second step in de novo phospholipid biosynthesis. Mutations in this gene have been associated with congenital generalized lipodystrophy (CGL), or Berardinelli-Seip syndrome, a disease characterized by a near absence of adipose tissue and severe insulin resistance. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 1-acylglycerol-3-phosphate O-acyltransferase 2 AGPAT2 ENSG00000169692 NA
8483 Major alterations in the composition of the cartilage extracellular matrix occur in joint disease, such as osteoarthrosis. This gene encodes the cartilage intermediate layer protein (CILP), which increases in early osteoarthrosis cartilage. The encoded protein was thought to encode a protein precursor for two different proteins; an N-terminal CILP and a C-terminal homolog of NTPPHase, however, later studies identified no nucleotide pyrophosphatase phosphodiesterase (NPP) activity. The full-length and the N-terminal domain of this protein was shown to function as an IGF-1 antagonist. An allelic variant of this gene has been associated with lumbar disc disease. cartilage intermediate layer protein CILP ENSG00000138615 NA
23344 NA extended synaptotagmin protein 1 ESYT1 ENSG00000139641 NA
1979 This gene encodes a member of the eukaryotic translation initiation factor 4E binding protein family. The gene products of this family bind eIF4E and inhibit translation initiation. However, insulin and other growth factors can release this inhibition via a phosphorylation-dependent disruption of their binding to eIF4E. Regulation of protein production through these gene products have been implicated in cell proliferation, cell differentiation and viral infection. eukaryotic translation initiation factor 4E binding protein 2 EIF4EBP2 ENSG00000148730 NA
79812 This gene encodes a protein belonging to the member of elastin microfibril interface-located (EMILIN) protein family. This family member is an extracellular matrix glycoprotein that can interfere with tumor angiogenesis and growth. It serves as a transforming growth factor beta antagonist and can interfere with the VEGF-A/VEGFR2 pathway. A related pseudogene has been identified on chromosome 6. multimerin 2 MMRN2 ENSG00000173269 NA
7049 This locus encodes the transforming growth factor (TGF)-beta type III receptor. The encoded receptor is a membrane proteoglycan that often functions as a co-receptor with other TGF-beta receptor superfamily members. Ectodomain shedding produces soluble TGFBR3, which may inhibit TGFB signaling. Decreased expression of this receptor has been observed in various cancers. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. transforming growth factor beta receptor III TGFBR3 ENSG00000069702 NA
NA NA NA NA ENSG00000117289 TRUE
2532 The protein encoded by this gene is a glycosylated membrane protein and a non-specific receptor for several chemokines. The encoded protein is the receptor for the human malarial parasites Plasmodium vivax and Plasmodium knowlesi. Polymorphisms in this gene are the basis of the Duffy blood group system. Two transcript variants encoding different isoforms have been found for this gene. atypical chemokine receptor 1 (Duffy blood group) ACKR1 ENSG00000213088 NA
665 This gene encodes a protein that belongs to the pro-apoptotic subfamily within the Bcl-2 family of proteins. The encoded protein binds to Bcl-2 and possesses the BH3 domain. The protein directly targets mitochondria and causes apoptotic changes, including loss of membrane potential and the release of cytochrome c. BCL2/adenovirus E1B 19kDa interacting protein 3-like BNIP3L ENSG00000104765 NA
947 The protein encoded by this gene may play a role in the attachment of stem cells to the bone marrow extracellular matrix or to stromal cells. This single-pass membrane protein is highly glycosylated and phosphorylated by protein kinase C. Two transcript variants encoding different isoforms have been found for this gene. CD34 molecule CD34 ENSG00000174059 NA
NA NA NA NA ENSG00000156750 TRUE
7048 This gene encodes a member of the Ser/Thr protein kinase family and the TGFB receptor subfamily. The encoded protein is a transmembrane protein that has a protein kinase domain, forms a heterodimeric complex with another receptor protein, and binds TGF-beta. This receptor/ligand complex phosphorylates proteins, which then enter the nucleus and regulate the transcription of a subset of genes related to cell proliferation. Mutations in this gene have been associated with Marfan Syndrome, Loeys-Deitz Aortic Aneurysm Syndrome, and the development of various types of tumors. Alternatively spliced transcript variants encoding different isoforms have been characterized. transforming growth factor beta receptor II TGFBR2 ENSG00000163513 NA
7450 This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. von Willebrand factor VWF ENSG00000110799 NA
84883 This gene encodes a flavoprotein oxidoreductase that binds single stranded DNA and is thought to contribute to apoptosis in the presence of bacterial and viral DNA. The expression of this gene is also found to be induced by tumor suppressor protein p53 in colon cancer cells. apoptosis inducing factor, mitochondria associated 2 AIFM2 ENSG00000042286 NA
1368 The protein encoded by this gene is a membrane-bound arginine/lysine carboxypeptidase. Its expression is associated with monocyte to macrophage differentiation. This encoded protein contains hydrophobic regions at the amino and carboxy termini and has 6 potential asparagine-linked glycosylation sites. The active site residues of carboxypeptidases A and B are conserved in this protein. Three alternatively spliced transcript variants encoding the same protein have been described for this gene. carboxypeptidase M CPM ENSG00000135678 NA
6776 The protein encoded by this gene is a member of the STAT family of transcription factors. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein is activated by, and mediates the responses of many cell ligands, such as IL2, IL3, IL7 GM-CSF, erythropoietin, thrombopoietin, and different growth hormones. Activation of this protein in myeloma and lymphoma associated with a TEL/JAK2 gene fusion is independent of cell stimulus and has been shown to be essential for tumorigenesis. The mouse counterpart of this gene is found to induce the expression of BCL2L1/BCL-X(L), which suggests the antiapoptotic function of this gene in cells. Alternatively spliced transcript variants have been found for this gene. signal transducer and activator of transcription 5A STAT5A ENSG00000126561 NA
23593 NA heme binding protein 2 HEBP2 ENSG00000051620 NA
7078 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. TIMP metallopeptidase inhibitor 3 TIMP3 ENSG00000100234 NA
11343 This gene encodes a serine hydrolase of the AB hydrolase superfamily that catalyzes the conversion of monoacylglycerides to free fatty acids and glycerol. The encoded protein plays a critical role in several physiological processes including pain and nociperception through hydrolysis of the endocannabinoid 2-arachidonoylglycerol. Expression of this gene may play a role in cancer tumorigenesis and metastasis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. monoglyceride lipase MGLL ENSG00000074416 NA
3486 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein forms a ternary complex with insulin-like growth factor acid-labile subunit (IGFALS) and either insulin-like growth factor (IGF) I or II. In this form, it circulates in the plasma, prolonging the half-life of IGFs and altering their interaction with cell surface receptors. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. insulin like growth factor binding protein 3 IGFBP3 ENSG00000146674 NA
4641 This gene encodes a member of the unconventional myosin protein family, which are actin-based molecular motors. The protein is found in the cytoplasm, and one isoform with a unique N-terminus is also found in the nucleus. The nuclear isoform associates with RNA polymerase I and II and functions in transcription initiation. The mouse ortholog of this protein also functions in intracellular vesicle transport to the plasma membrane. Multiple transcript variants encoding different isoforms have been found for this gene. The related gene myosin IE has been referred to as myosin IC in the literature, but it is a distinct locus on chromosome 19. myosin IC MYO1C ENSG00000197879 NA
80832 The protein encoded by this gene is a member of the apolipoprotein L family and may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Two transcript variants encoding two different isoforms have been found for this gene. Only one of the isoforms appears to be a secreted protein. apolipoprotein L4 APOL4 ENSG00000100336 NA
10544 The protein encoded by this gene is a receptor for activated protein C, a serine protease activated by and involved in the blood coagulation pathway. The encoded protein is an N-glycosylated type I membrane protein that enhances the activation of protein C. Mutations in this gene have been associated with venous thromboembolism and myocardial infarction, as well as with late fetal loss during pregnancy. The encoded protein may also play a role in malarial infection and has been associated with cancer. protein C receptor PROCR ENSG00000101000 NA
9945 NA glutamine-fructose-6-phosphate transaminase 2 GFPT2 ENSG00000131459 NA
23580 The product of this gene is a member of the CDC42-binding protein family. Members of this family interact with Rho family GTPases and regulate the organization of the actin cytoskeleton. This protein has been shown to bind both CDC42 and TC10 GTPases in a GTP-dependent manner. When overexpressed in fibroblasts, this protein was able to induce pseudopodia formation, which suggested a role in inducing actin filament assembly and cell shape control. CDC42 effector protein 4 CDC42EP4 ENSG00000179604 NA
2152 This gene encodes coagulation factor III which is a cell surface glycoprotein. This factor enables cells to initiate the blood coagulation cascades, and it functions as the high-affinity receptor for the coagulation factor VII. The resulting complex provides a catalytic event that is responsible for initiation of the coagulation protease cascades by specific limited proteolysis. Unlike the other cofactors of these protease cascades, which circulate as nonfunctional precursors, this factor is a potent initiator that is fully functional when expressed on cell surfaces. There are 3 distinct domains of this factor: extracellular, transmembrane, and cytoplasmic. This protein is the only one in the coagulation pathway for which a congenital deficiency has not been described. Alternate splicing results in multiple transcript variants. coagulation factor III, tissue factor F3 ENSG00000117525 NA
2687 This gene is a member of the gamma-glutamyl transpeptidase gene family, and some reports indicate that it is capable of cleaving the gamma-glutamyl moiety of glutathione. The protein encoded by this gene is synthesized as a single, catalytically-inactive polypeptide, that is processed post-transcriptionally to form a heavy and light subunit, with the catalytic activity contained within the small subunit. The encoded enzyme is able to convert leukotriene C4 to leukotriene D4, but appears to have distinct substrate specificity compared to gamma-glutamyl transpeptidase. Alternative splicing results in multiple transcript variants encoding different isoforms. gamma-glutamyltransferase 5 GGT5 ENSG00000099998 NA
7481 The WNT gene family consists of structurally related genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. This gene is a member of the WNT gene family. It encodes a protein which shows 97%, 85%, and 63% amino acid identity with mouse, chicken, and Xenopus Wnt11 protein, respectively. This gene may play roles in the development of skeleton, kidney and lung, and is considered to be a plausible candidate gene for High Bone Mass Syndrome. Wnt family member 11 WNT11 ENSG00000085741 NA
4489 NA metallothionein 1A MT1A ENSG00000205362 NA
23328 NA SAM and SH3 domain containing 1 SASH1 ENSG00000111961 NA
115330 NA G protein-coupled receptor 146 GPR146 ENSG00000164849 NA
56265 This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. carboxypeptidase X (M14 family), member 1 CPXM1 ENSG00000088882 NA
375061 NA family with sequence similarity 89 member A FAM89A ENSG00000182118 NA
5140 NA phosphodiesterase 3B PDE3B ENSG00000152270 NA
83636 This gene encodes a small transmembrane protein. Mutations in this gene are a cause of neurodegeneration with brain iron accumulation-4 (NBIA4), but the specific function of the encoded protein is unknown. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. chromosome 19 open reading frame 12 C19orf12 ENSG00000131943 NA
10114 NA homeodomain interacting protein kinase 3 HIPK3 ENSG00000110422 NA
6675 NA UDP-N-acetylglucosamine pyrophosphorylase 1 UAP1 ENSG00000117143 NA
9475 The protein encoded by this gene is a serine/threonine kinase that regulates cytokinesis, smooth muscle contraction, the formation of actin stress fibers and focal adhesions, and the activation of the c-fos serum response element. This protein, which is an isozyme of ROCK1 is a target for the small GTPase Rho. Rho associated coiled-coil containing protein kinase 2 ROCK2 ENSG00000134318 NA
9397 This gene encodes one of two N-myristoyltransferase proteins. N-terminal myristoylation is a lipid modification that is involved in regulating the function and localization of signaling proteins. The encoded protein catalyzes the addition of a myristoyl group to the N-terminal glycine residue of many signaling proteins, including the human immunodeficiency virus type 1 (HIV-1) proteins, Gag and Nef. Alternative splicing results in multiple transcript variants. N-myristoyltransferase 2 NMT2 ENSG00000152465 NA
9588 The protein encoded by this gene is a member of the thiol-specific antioxidant protein family. This protein is a bifunctional enzyme with two distinct active sites. It is involved in redox regulation of the cell; it can reduce H(2)O(2) and short chain organic, fatty acid, and phospholipid hydroperoxides. It may play a role in the regulation of phospholipid turnover as well as in protection against oxidative injury. peroxiredoxin 6 PRDX6 ENSG00000117592 NA
48 The protein encoded by this gene is a bifunctional, cytosolic protein that functions as an essential enzyme in the TCA cycle and interacts with mRNA to control the levels of iron inside cells. When cellular iron levels are high, this protein binds to a 4Fe-4S cluster and functions as an aconitase. Aconitases are iron-sulfur proteins that function to catalyze the conversion of citrate to isocitrate. When cellular iron levels are low, the protein binds to iron-responsive elements (IREs), which are stem-loop structures found in the 5’ UTR of ferritin mRNA, and in the 3’ UTR of transferrin receptor mRNA. When the protein binds to IRE, it results in repression of translation of ferritin mRNA, and inhibition of degradation of the otherwise rapidly degraded transferrin receptor mRNA. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alternative splicing results in multiple transcript variants aconitase 1 ACO1 ENSG00000122729 NA
84173 NA ELMO domain containing 3 ELMOD3 ENSG00000115459 NA
1282 This gene encodes a type IV collagen alpha protein. Type IV collagen proteins are integral components of basement membranes. This gene shares a bidirectional promoter with a paralogous gene on the opposite strand. The protein consists of an amino-terminal 7S domain, a triple-helix forming collagenous domain, and a carboxy-terminal non-collagenous domain. It functions as part of a heterotrimer and interacts with other extracellular matrix components such as perlecans, proteoglycans, and laminins. In addition, proteolytic cleavage of the non-collagenous carboxy-terminal domain results in a biologically active fragment known as arresten, which has anti-angiogenic and tumor suppressor properties. Mutations in this gene cause porencephaly, cerebrovascular disease, and renal and muscular defects. Alternative splicing results in multiple transcript variants. collagen type IV alpha 1 COL4A1 ENSG00000187498 NA
4232 This gene encodes a member of the alpha/beta hydrolase superfamily. It is imprinted, exhibiting preferential expression from the paternal allele in fetal tissues, and isoform-specific imprinting in lymphocytes. The loss of imprinting of this gene has been linked to certain types of cancer and may be due to promotor switching. The encoded protein may play a role in development. Alternatively spliced transcript variants encoding multiple isoforms have been identified for this gene. Pseudogenes of this gene are located on the short arm of chromosomes 3 and 4, and the long arm of chromosomes 6 and 15. mesoderm specific transcript MEST ENSG00000106484 NA
64757 NA mitochondrial amidoxime reducing component 1 MARC1 ENSG00000186205 NA
54884 NA retinol saturase (all-trans-retinol 13,14-reductase) RETSAT ENSG00000042445 NA
4828 This gene encodes a member of the bombesin-like family of neuropeptides, which negatively regulate eating behavior. The encoded protein may regulate colonic smooth muscle contraction through binding to its cognate receptor, the neuromedin B receptor (NMBR). Polymorphisms of this gene may be associated with hunger, weight gain and obesity. Alternative splicing results in multiple transcript variants. neuromedin B NMB ENSG00000197696 NA
51351 NA zinc finger protein 117 ZNF117 ENSG00000152926 NA
ENSG00000257607 NA NA RP11-449P15.1 ENSG00000257607 NA
1901 The protein encoded by this gene is structurally similar to G protein-coupled receptors and is highly expressed in endothelial cells. It binds the ligand sphingosine-1-phosphate with high affinity and high specificity, and suggested to be involved in the processes that regulate the differentiation of endothelial cells. Activation of this receptor induces cell-cell adhesion. Alternative splicing results in multiple transcript variants. sphingosine-1-phosphate receptor 1 S1PR1 ENSG00000170989 NA
80833 This gene is a member of the apolipoprotein L gene family, and it is present in a cluster with other family members on chromosome 22. The encoded protein is found in the cytoplasm, where it may affect the movement of lipids, including cholesterol, and/or allow the binding of lipids to organelles. In addition, expression of this gene is up-regulated by tumor necrosis factor-alpha in endothelial cells lining the normal and atherosclerotic iliac artery and aorta. Alternative splicing results in multiple transcript variants. apolipoprotein L3 APOL3 ENSG00000128284 NA
9270 The cytoplasmic domains of integrins are essential for cell adhesion. The protein encoded by this gene binds to the beta1 integrin cytoplasmic domain. The interaction between this protein and beta1 integrin is highly specific. Two isoforms of this protein are derived from alternatively spliced transcripts. The shorter form of this protein does not interact with the beta1 integrin cytoplasmic domain. The longer form is a phosphoprotein and the extent of its phosphorylation is regulated by the cell-matrix interaction, suggesting an important role of this protein during integrin-dependent cell adhesion. Several transcript variants, some protein-coding and some non-protein coding, have been found for this gene. integrin subunit beta 1 binding protein 1 ITGB1BP1 ENSG00000119185 NA
84230 NA leucine-rich repeat containing 8 family member C LRRC8C ENSG00000171488 NA
54941 This gene encodes a novel E3 ubiquitin ligase that contains a RING finger domain in the N-terminus and three zinc-binding and one ubiquitin-interacting motif in the C-terminus. As a result of myristoylation, this protein associates with membranes and is primarily localized to intracellular membrane systems. The encoded protein may function as a positive regulator in the T-cell receptor signaling pathway. ring finger protein 125, E3 ubiquitin protein ligase RNF125 ENSG00000101695 NA
7423 This gene encodes a member of the PDGF (platelet-derived growth factor)/VEGF (vascular endothelial growth factor) family. The VEGF family members regulate the formation of blood vessels and are involved in endothelial cell physiology. This member is a ligand for VEGFR-1 (vascular endothelial growth factor receptor 1) and NRP-1 (neuropilin-1). Studies in mice showed that this gene was co-expressed with nuclear-encoded mitochondrial genes and the encoded protein specifically controlled endothelial uptake of fatty acids. Alternatively spliced transcript variants encoding distinct isoforms have been identified. vascular endothelial growth factor B VEGFB ENSG00000173511 NA
125058 NA TBC1 domain family member 16 TBC1D16 ENSG00000167291 NA
2321 This gene encodes a member of the vascular endothelial growth factor receptor (VEGFR) family. VEGFR family members are receptor tyrosine kinases (RTKs) which contain an extracellular ligand-binding region with seven immunoglobulin (Ig)-like domains, a transmembrane segment, and a tyrosine kinase (TK) domain within the cytoplasmic domain. This protein binds to VEGFR-A, VEGFR-B and placental growth factor and plays an important role in angiogenesis and vasculogenesis. Expression of this receptor is found in vascular endothelial cells, placental trophoblast cells and peripheral blood monocytes. Multiple transcript variants encoding different isoforms have been found for this gene. Isoforms include a full-length transmembrane receptor isoform and shortened, soluble isoforms. The soluble isoforms are associated with the onset of pre-eclampsia. fms related tyrosine kinase 1 FLT1 ENSG00000102755 NA
1879 NA early B-cell factor 1 EBF1 ENSG00000164330 NA
2690 This gene encodes a member of the type I cytokine receptor family, which is a transmembrane receptor for growth hormone. Binding of growth hormone to the receptor leads to receptor dimerization and the activation of an intra- and intercellular signal transduction pathway leading to growth. Mutations in this gene have been associated with Laron syndrome, also known as the growth hormone insensitivity syndrome (GHIS), a disorder characterized by short stature. In humans and rabbits, but not rodents, growth hormone binding protein (GHBP) is generated by proteolytic cleavage of the extracellular ligand-binding domain from the mature growth hormone receptor protein. Multiple alternatively spliced transcript variants have been found for this gene. growth hormone receptor GHR ENSG00000112964 NA
220 This gene encodes an aldehyde dehydrogenase enzyme that uses retinal as a substrate. Mutations in this gene have been associated with microphthalmia, isolated 8, and expression changes have also been detected in tumor cells. Alternative splicing results in multiple transcript variants. aldehyde dehydrogenase 1 family member A3 ALDH1A3 ENSG00000184254 NA
NA NA NA NA ENSG00000229645 TRUE
4337 Molybdenum cofactor biosynthesis is a conserved pathway leading to the biological activation of molybdenum. The protein encoded by this gene is involved in this pathway. This gene was originally thought to produce a bicistronic mRNA with the potential to produce two proteins (MOCS1A and MOCS1B) from adjacent open reading frames. However, only the first open reading frame (MOCS1A) has been found to encode a protein from the putative bicistronic mRNA, whereas additional splice variants, whose full-length natures have yet to be determined, are likely to produce a fusion between the two open reading frames. This gene is defective in patients with molybdenum cofactor deficiency, type A. A related pseudogene has been identified on chromosome 16. molybdenum cofactor synthesis 1 MOCS1 ENSG00000124615 NA
# Save the Ensembl IDs queried for cluster 3 (this includes IDs that mygene did not find)
write.table(out$query, paste0("../utilities/gene_names_clus_", 3, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
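
The annotation tables in this report come back with their columns in different orders from one cluster to the next, and some queried Ensembl IDs are returned with notfound set to TRUE and no annotation. A minimal sketch of how the result could be put into one fixed column order, with the unmatched IDs set aside, is given below. The helper name tidy_annotations and the toy_out data frame are hypothetical and not part of the original script; the toy rows reuse three entries from the cluster 3 table, and only base R is assumed.

```r
# Hypothetical helper (not part of the original script): put the annotation
# columns in one fixed order and set aside the IDs that were not matched.
tidy_annotations <- function(out) {
  out <- as.data.frame(out)
  wanted <- c("query", "symbol", "name", "summary")
  # Add any missing column as NA so the output always has the same shape
  for (col in setdiff(wanted, names(out))) {
    out[[col]] <- rep(NA_character_, nrow(out))
  }
  # 'notfound' is TRUE for unmatched queries and NA otherwise;
  # assume every query matched if the column is absent
  if ("notfound" %in% names(out)) {
    missing_flag <- out$notfound %in% TRUE
  } else {
    missing_flag <- rep(FALSE, nrow(out))
  }
  list(annotated = out[!missing_flag, wanted, drop = FALSE],
       unmatched = as.character(out$query[missing_flag]))
}

# Toy input imitating three rows of the cluster 3 table
toy_out <- data.frame(
  X_id = c("4023", NA, "7450"),
  summary = c("LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue.",
              NA,
              "This gene encodes a glycoprotein involved in hemostasis."),
  name = c("lipoprotein lipase", NA, "von Willebrand factor"),
  symbol = c("LPL", NA, "VWF"),
  query = c("ENSG00000175445", "ENSG00000256545", "ENSG00000110799"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

res <- tidy_annotations(toy_out)
print(res$annotated[, c("query", "symbol", "name")])
print(res$unmatched)
```

In the script itself, the same call could be applied to each out object before passing it to kable, so that every cluster table shares one layout and the unmatched IDs are listed separately.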

Cluster 4 Annotations

out <- mygene::queryMany(gene_list[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name summary symbol notfound
ENSG00000163017 72 actin, gamma 2, smooth muscle, enteric Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ACTG2 NA
ENSG00000133392 4629 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. MYH11 NA
ENSG00000182253 23336 synemin The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. SYNM NA
ENSG00000065534 4638 myosin light chain kinase This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. MYLK NA
ENSG00000130176 1264 calponin 1 NA CNN1 NA
ENSG00000159176 1465 cysteine and glycine rich protein 1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. CSRP1 NA
ENSG00000183963 6525 smoothelin This gene encodes a structural protein that is found exclusively in contractile smooth muscle cells. It associates with stress fibers and constitutes part of the cytoskeleton. This gene is localized to chromosome 22q12.3, distal to the TUPLE1 locus and outside the DiGeorge syndrome deletion. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. SMTN NA
ENSG00000269936 ENSG00000269936 NA NA RP11-394O4.5 NA
ENSG00000075073 6865 tachykinin receptor 2 This gene belongs to a family of genes that function as receptors for tachykinins. Receptor affinities are specified by variations in the 5’-end of the sequence. The receptors belonging to this family are characterized by interactions with G proteins and 7 hydrophobic transmembrane regions. This gene encodes the receptor for the tachykinin neuropeptide substance K, also referred to as neurokinin A. TACR2 NA
ENSG00000101335 10398 myosin light chain 9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. MYL9 NA
ENSG00000259716 NA NA NA NA TRUE
ENSG00000129116 23022 palladin, cytoskeletal associated protein This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. PALLD NA
ENSG00000259627 ENSG00000259627 NA NA RP11-244F12.2 NA
ENSG00000263335 ENSG00000263335 NA NA AF001548.5 NA
ENSG00000154330 5239 phosphoglucomutase 5 Phosphoglucomutases (EC 5.4.2.2), such as PGM5, are phosphotransferases involved in interconversion of glucose-1-phosphate and glucose-6-phosphate. PGM activity is essential in formation of carbohydrates from glucose-6-phosphate and in formation of glucose-6-phosphate from galactose and glycogen (Edwards et al., 1995 [PubMed 8586438]). PGM5 NA
ENSG00000058668 493 ATPase plasma membrane Ca2+ transporting 4 The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. ATP2B4 NA
ENSG00000263065 ENSG00000263065 NA NA AF001548.6 NA
ENSG00000122786 800 caldesmon 1 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. CALD1 NA
ENSG00000111696 51559 5’-nucleotidase domain containing 3 NA NT5DC3 NA
ENSG00000075624 60 actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB NA
ENSG00000092841 4637 myosin light chain 6 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain that is expressed in smooth muscle and non-muscle tissues. Genomic sequences representing several pseudogenes have been described and two transcript variants encoding different isoforms have been identified for this gene. MYL6 NA
ENSG00000095637 10580 sorbin and SH3 domain containing 1 This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. SORBS1 NA
ENSG00000261054 ENSG00000261054 NA NA RP11-6O2.4 NA
ENSG00000023902 51177 pleckstrin homology domain containing O1 NA PLEKHO1 NA
ENSG00000197256 25959 KN motif and ankyrin repeat domains 2 NA KANK2 NA
ENSG00000106772 158471 prune homolog 2 (Drosophila) The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. PRUNE2 NA
ENSG00000163297 118429 anthrax toxin receptor 2 This gene encodes a receptor for anthrax toxin. The protein binds to collagen IV and laminin, suggesting that it may be involved in extracellular matrix adhesion. Mutations in this gene cause juvenile hyaline fibromatosis and infantile systemic hyalinosis. Multiple transcript variants encoding different isoforms have been found for this gene. ANTXR2 NA
ENSG00000072163 55679 LIM zinc finger domain containing 2 This gene encodes a member of a small family of focal adhesion proteins which interacts with ILK (integrin-linked kinase), a protein which effects protein-protein interactions with the extracellular matrix. The encoded protein has five LIM domains, each domain forming two zinc fingers, which permit interactions which regulate cell shape and migration. A pseudogene of this gene is located on chromosome 4. Multiple transcript variants encoding different isoforms have been found for this gene. LIMS2 NA
ENSG00000156113 3778 potassium calcium-activated channel subfamily M alpha 1 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. KCNMA1 NA
ENSG00000125503 54776 protein phosphatase 1 regulatory subunit 12C The gene encodes a subunit of myosin phosphatase. The encoded protein regulates the catalytic activity of protein phosphatase 1 delta and assembly of the actin cytoskeleton. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. PPP1R12C NA
ENSG00000113657 1809 dihydropyrimidinase like 3 NA DPYSL3 NA
ENSG00000100994 5834 phosphorylase, glycogen; brain The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. PYGB NA
ENSG00000007866 7005 TEA domain transcription factor 3 This gene product is a member of the transcriptional enhancer factor (TEF) family of transcription factors, which contain the TEA/ATTS DNA-binding domain. It is predominantly expressed in the placenta and is involved in the transactivation of the chorionic somatomammotropin-B gene enhancer. Translation of this protein is initiated at a non-AUG (AUA) start codon. TEAD3 NA
ENSG00000163681 7871 sarcolemma associated protein This gene encodes a component of a conserved striatin-interacting phosphatase and kinase complex. Striatin family complexes participate in a variety of cellular processes including signaling, cell cycle control, cell migration, Golgi assembly, and apoptosis. The protein encoded by this gene is a coiled-coil, tail-anchored membrane protein with a single C-terminal transmembrane domain that is posttranslationally inserted into membranes. Mutations in this gene are associated with Brugada syndrome, a cardiac channelopathy. Alternative splicing results in multiple transcript variants. SLMAP NA
ENSG00000065882 23216 TBC1 domain family member 1 TBC1D1 is the founding member of a family of proteins sharing a 180- to 200-amino acid TBC domain presumed to have a role in regulating cell growth and differentiation. These proteins share significant homology with TRE2 (USP6; MIM 604334), yeast Bub2, and CDC16 (MIM 603461) (White et al., 2000 [PubMed 10965142]). TBC1D1 NA
ENSG00000180672 NA NA NA NA TRUE
ENSG00000121440 23024 PDZ domain containing ring finger 3 This gene encodes a member of the LNX (Ligand of Numb Protein-X) family of RING-type ubiquitin E3 ligases. This protein may function in vascular morphogenesis and the differentiation of adipocytes, osteoblasts and myoblasts. This protein may be targeted for degradation by the human papilloma virus E6 protein. Alternative splicing results in multiple transcript variants. PDZRN3 NA
ENSG00000135269 26136 testin LIM domain protein Cancer-associated chromosomal changes often involve regions containing fragile sites. This gene maps to a common fragile site on chromosome 7q31.2 designated FRA7G. This gene is similar to mouse Testin, a testosterone-responsive gene encoding a Sertoli cell secretory protein containing three LIM domains. LIM domains are double zinc-finger motifs that mediate protein-protein interactions between transcription factors, cytoskeletal proteins and signaling proteins. This protein is a negative regulator of cell growth and may act as a tumor suppressor. This scaffold protein may also play a role in cell adhesion, cell spreading and in the reorganization of the actin cytoskeleton. Multiple protein isoforms are encoded by transcript variants of this gene. TES NA
ENSG00000058272 4659 protein phosphatase 1 regulatory subunit 12A Myosin phosphatase target subunit 1, which is also called the myosin-binding subunit of myosin phosphatase, is one of the subunits of myosin phosphatase. Myosin phosphatase regulates the interaction of actin and myosin downstream of the guanosine triphosphatase Rho. The small guanosine triphosphatase Rho is implicated in myosin light chain (MLC) phosphorylation, which results in contraction of smooth muscle and interaction of actin and myosin in nonmuscle cells. The guanosine triphosphate (GTP)-bound, active form of RhoA (GTP.RhoA) specifically interacted with the myosin-binding subunit (MBS) of myosin phosphatase, which regulates the extent of phosphorylation of MLC. Rho-associated kinase (Rho-kinase), which is activated by GTP.RhoA, phosphorylated MBS and consequently inactivated myosin phosphatase. Overexpression of RhoA or activated RhoA in NIH 3T3 cells increased phosphorylation of MBS and MLC. Thus, Rho appears to inhibit myosin phosphatase through the action of Rho-kinase. Several transcript variants encoding different isoforms have been found for this gene. PPP1R12A NA
ENSG00000261616 ENSG00000261616 NA NA RP11-6O2.3 NA
ENSG00000112658 6722 serum response factor This gene encodes a ubiquitous nuclear protein that stimulates both cell proliferation and differentiation. It is a member of the MADS (MCM1, Agamous, Deficiens, and SRF) box superfamily of transcription factors. This protein binds to the serum response element (SRE) in the promoter region of target genes. This protein regulates the activity of many immediate-early genes, for example c-fos, and thereby participates in cell cycle regulation, apoptosis, cell growth, and cell differentiation. This gene is the downstream target of many pathways; for example, the mitogen-activated protein kinase pathway (MAPK) that acts through the ternary complex factors (TCFs). Two transcript variants encoding different isoforms have been found for this gene. SRF NA
ENSG00000097007 25 ABL proto-oncogene 1, non-receptor tyrosine kinase This gene is a protooncogene that encodes a protein tyrosine kinase involved in a variety of cellular processes, including cell division, adhesion, differentiation, and response to stress. The activity of the protein is negatively regulated by its SH3 domain, whereby deletion of the region encoding this domain results in an oncogene. The ubiquitously expressed protein has DNA-binding activity that is regulated by CDC2-mediated phosphorylation, suggesting a cell cycle function. This gene has been found fused to a variety of translocation partner genes in various leukemias, most notably the t(9;22) translocation that results in a fusion with the 5’ end of the breakpoint cluster region gene (BCR; MIM:151410). Alternative splicing of this gene results in two transcript variants, which contain alternative first exons that are spliced to the remaining common exons. ABL1 NA
ENSG00000118496 84085 F-box protein 30 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and it is upregulated in nasopharyngeal carcinoma. FBXO30 NA
ENSG00000116473 5906 RAP1A, member of RAS oncogene family This gene encodes a member of the Ras family of small GTPases. The encoded protein undergoes a change in conformational state and activity, depending on whether it is bound to GTP or GDP. This protein is activated by several types of guanine nucleotide exchange factors (GEFs), and inactivated by two groups of GTPase-activating proteins (GAPs). The activation status of the encoded protein is therefore affected by the balance of intracellular levels of GEFs and GAPs. The encoded protein regulates signaling pathways that affect cell proliferation and adhesion, and may play a role in tumor malignancy. Pseudogenes of this gene have been defined on chromosomes 14 and 17. Alternative splicing results in multiple transcript variants. RAP1A NA
ENSG00000101447 81610 family with sequence similarity 83 member D NA FAM83D NA
ENSG00000121067 8405 speckle type BTB/POZ protein This gene encodes a protein that may modulate the transcriptional repression activities of death-associated protein 6 (DAXX), which interacts with histone deacetylase, core histones, and other histone-associated proteins. In mouse, the encoded protein binds to the putative leucine zipper domain of macroH2A1.2, a variant H2A histone that is enriched on inactivated X chromosomes. The BTB/POZ domain of this protein has been shown in other proteins to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor complexes. Alternative splicing of this gene results in multiple transcript variants encoding the same protein. SPOP NA
ENSG00000198624 26112 coiled-coil domain containing 69 NA CCDC69 NA
ENSG00000018408 25937 WW domain containing transcription regulator 1 NA WWTR1 NA
ENSG00000140682 7041 transforming growth factor beta 1 induced transcript 1 This gene encodes a coactivator of the androgen receptor, a transcription factor which is activated by androgen and has a key role in male sexual differentiation. The encoded protein is thought to regulate androgen receptor activity and may have a role to play in the treatment of prostate cancer. Multiple transcript variants encoding different isoforms have been found for this gene. TGFB1I1 NA
ENSG00000116729 79971 wntless Wnt ligand secretion mediator NA WLS NA
ENSG00000157110 11030 RNA binding protein with multiple splicing This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. RBPMS NA
ENSG00000064999 23294 ankyrin repeat and sterile alpha motif domain containing 1A NA ANKS1A NA
ENSG00000197894 128 alcohol dehydrogenase 5 (class III), chi polypeptide This gene encodes a member of the alcohol dehydrogenase family. Members of this family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. The encoded protein forms a homodimer. It has virtually no activity for ethanol oxidation, but exhibits high activity for oxidation of long-chain primary alcohols and for oxidation of S-hydroxymethyl-glutathione, a spontaneous adduct between formaldehyde and glutathione. This enzyme is an important component of cellular metabolism for the elimination of formaldehyde, a potent irritant and sensitizing agent that causes lacrymation, rhinitis, pharyngitis, and contact dermatitis. The human genome contains several non-transcribed pseudogenes related to this gene. ADH5 NA
ENSG00000139718 23067 SET domain containing 1B SET1B is a component of a histone methyltransferase complex that produces trimethylated histone H3 at Lys4 (Lee et al., 2007 [PubMed 17355966]). SETD1B NA
ENSG00000128272 468 activating transcription factor 4 This gene encodes a transcription factor that was originally identified as a widely expressed mammalian DNA binding protein that could bind a tax-responsive enhancer element in the LTR of HTLV-1. The encoded protein was also isolated and characterized as the cAMP-response element binding protein 2 (CREB-2). The protein encoded by this gene belongs to a family of DNA-binding proteins that includes the AP-1 family of transcription factors, cAMP-response element binding proteins (CREBs) and CREB-like proteins. These transcription factors share a leucine zipper region that is involved in protein-protein interactions, located C-terminal to a stretch of basic amino acids that functions as a DNA binding domain. Two alternative transcripts encoding the same protein have been described. Two pseudogenes are located on the X chromosome at q28 in a region containing a large inverted duplication. ATF4 NA
ENSG00000237886 ENSG00000237886 NOTCH1 associated lncRNA in T-cell acute lymphoblastic leukemia 1 NA NALT1 NA
ENSG00000213949 3672 integrin subunit alpha 1 This gene encodes the alpha 1 subunit of integrin receptors. This protein heterodimerizes with the beta 1 subunit to form a cell-surface receptor for collagen and laminin. The heterodimeric receptor is involved in cell-cell adhesion and may play a role in inflammation and fibrosis. The alpha 1 subunit contains an inserted (I) von Willebrand factor type I domain which is thought to be involved in collagen binding. ITGA1 NA
ENSG00000240771 115557 Rho guanine nucleotide exchange factor 25 Rho GTPases alternate between an inactive GDP-bound state and an active GTP-bound state, and GEFs facilitate GDP/GTP exchange. This gene encodes a guanine nucleotide exchange factor (GEF) which interacts with Rho GTPases involved in contraction of vascular smooth muscles, regulation of responses to angiotensin II and lens cell differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. ARHGEF25 NA
ENSG00000103202 4833 NME/NM23 nucleoside diphosphate kinase 4 The nucleoside diphosphate (NDP) kinases (EC 2.7.4.6) are ubiquitous enzymes that catalyze transfer of gamma-phosphates, via a phosphohistidine intermediate, between nucleoside and deoxynucleoside tri- and diphosphates. The enzymes are products of the nm23 gene family, which includes NME4 (Milon et al., 1997 [PubMed 9099850]). NME4 NA
ENSG00000101452 60625 DEAH-box helicase 35 DEAD box proteins characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of the DEAD box protein family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. The function of this gene product which is a member of this family, has not been determined. Alternatively spliced transcript variants have been found for this gene. DHX35 NA
ENSG00000174136 285704 repulsive guidance molecule family member b RGMB is a glycosylphosphatidylinositol (GPI)-anchored member of the repulsive guidance molecule family (see RGMA, MIM 607362) and contributes to the patterning of the developing nervous system (Samad et al., 2005 [PubMed 15671031]). RGMB NA
ENSG00000188549 388115 chromosome 15 open reading frame 52 NA C15orf52 NA
ENSG00000243244 11037 stonin 1 Endocytosis of cell surface proteins is mediated by a complex molecular machinery that assembles on the inner surface of the plasma membrane. This gene encodes one of two human homologs of the Drosophila melanogaster stoned B protein. This protein is related to components of the endocytic machinery and exhibits a modular structure consisting of an N-terminal proline-rich domain, a central region of homology specific to the human stoned B-like proteins, and a C-terminal region homologous to the mu subunits of adaptor protein (AP) complexes. Read-through transcription of this gene into the neighboring downstream gene, which encodes TFIIA-alpha/beta-like factor, generates a transcript (SALF), which encodes a fusion protein comprised of sequence sharing identity with each individual gene product. Alternative splicing results in multiple transcript variants. STON1 NA
ENSG00000117013 9132 potassium voltage-gated channel subfamily Q member 4 The protein encoded by this gene forms a potassium channel that is thought to play a critical role in the regulation of neuronal excitability, particularly in sensory cells of the cochlea. The current generated by this channel is inhibited by M1 muscarinic acetylcholine receptors and activated by retigabine, a novel anti-convulsant drug. The encoded protein can form a homomultimeric potassium channel or possibly a heteromultimeric channel in association with the protein encoded by the KCNQ3 gene. Defects in this gene are a cause of nonsyndromic sensorineural deafness type 2 (DFNA2), an autosomal dominant form of progressive hearing loss. Two transcript variants encoding different isoforms have been found for this gene. KCNQ4 NA
ENSG00000149596 57158 junctophilin 2 Junctional complexes between the plasma membrane and endoplasmic/sarcoplasmic reticulum are a common feature of all excitable cell types and mediate cross talk between cell surface and intracellular ion channels. The protein encoded by this gene is a component of junctional complexes and is composed of a C-terminal hydrophobic segment spanning the endoplasmic/sarcoplasmic reticulum membrane and a remaining cytoplasmic domain that shows specific affinity for the plasma membrane. This gene is a member of the junctophilin gene family. Alternative splicing has been observed at this locus and two variants encoding distinct isoforms are described. JPH2 NA
ENSG00000196923 9260 PDZ and LIM domain 7 The protein encoded by this gene is representative of a family of proteins composed of conserved PDZ and LIM domains. LIM domains are proposed to function in protein-protein recognition in a variety of contexts including gene transcription and development and in cytoskeletal interaction. The LIM domains of this protein bind to protein kinases, whereas the PDZ domain binds to actin filaments. The gene product is involved in the assembly of an actin filament-associated complex essential for transmission of ret/ptc2 mitogenic signaling. The biological function is likely to be that of an adapter, with the PDZ domain localizing the LIM-binding proteins to actin filaments of both skeletal muscle and nonmuscle tissues. Alternative splicing of this gene results in multiple transcript variants. PDLIM7 NA
ENSG00000103852 64927 tetratricopeptide repeat domain 23 NA TTC23 NA
ENSG00000163637 166336 prickle planar cell polarity protein 2 This gene encodes a homolog of Drosophila prickle. The exact function of this gene is not known, however, studies in mice suggest that it may be involved in seizure prevention. Mutations in this gene are associated with progressive myoclonic epilepsy type 5. PRICKLE2 NA
ENSG00000261490 ENSG00000261490 NA NA RP11-448G15.3 NA
ENSG00000182175 56963 repulsive guidance molecule family member a This gene encodes a member of the repulsive guidance molecule family. The encoded protein is a glycosylphosphatidylinositol-anchored glycoprotein that functions as an axon guidance protein in the developing and adult central nervous system. This protein may also function as a tumor suppressor in some cancers. Alternate splicing results in multiple transcript variants. RGMA NA
ENSG00000035403 7414 vinculin Vinculin is a cytoskeletal protein associated with cell-cell and cell-matrix junctions, where it is thought to function as one of several interacting proteins involved in anchoring F-actin to the membrane. Defects in VCL are the cause of cardiomyopathy dilated type 1W. Dilated cardiomyopathy is a disorder characterized by ventricular dilation and impaired systolic function, resulting in congestive heart failure and arrhythmia. Multiple alternatively spliced transcript variants have been found for this gene, but the biological validity of some variants has not been determined. VCL NA
ENSG00000116194 9068 angiopoietin like 1 Angiopoietins are members of the vascular endothelial growth factor family and the only known growth factors largely specific for vascular endothelium. Angiopoietin-1, angiopoietin-2, and angiopoietin-4 participate in the formation of blood vessels. The protein encoded by this gene is another member of the angiopoietin family that is widely expressed in adult tissues with mRNA levels highest in highly vascularized tissues. This protein was found to be a secretory protein that does not act as an endothelial cell mitogen in vitro. ANGPTL1 NA
ENSG00000065320 9423 netrin 1 Netrin is included in a family of laminin-related secreted proteins. The function of this gene has not yet been defined; however, netrin is thought to be involved in axon guidance and cell migration during development. Mutations and loss of expression of netrin suggest that variation in netrin may be involved in cancer development. NTN1 NA
ENSG00000155760 8324 frizzled class receptor 7 Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD7 protein contains an N-terminal signal sequence, 10 cysteine residues typical of the cysteine-rich extracellular domain of Fz family members, 7 putative transmembrane domains, and an intracellular C-terminal tail with a PDZ domain-binding motif. FZD7 gene expression may downregulate APC function and enhance beta-catenin-mediated signals in poorly differentiated human esophageal carcinomas. FZD7 NA
ENSG00000145012 4026 LIM domain containing preferred translocation partner in lipoma This gene encodes a member of a subfamily of LIM domain proteins that are characterized by an N-terminal proline-rich region and three C-terminal LIM domains. The encoded protein localizes to the cell periphery in focal adhesions and may be involved in cell-cell adhesion and cell motility. This protein also shuttles through the nucleus and may function as a transcriptional co-activator. This gene is located at the junction of certain disease-related chromosomal translocations, which result in the expression of chimeric proteins that may promote tumor growth. Alternative splicing results in multiple transcript variants. LPP NA
ENSG00000213160 151230 kelch like family member 23 NA KLHL23 NA
ENSG00000213160 100526832 PHOSPHO2-KLHL23 readthrough This locus represents naturally occurring read-through transcription between the neighboring PHOSPHO2 (phosphatase, orphan 2) and KLHL23 (kelch-like 23) genes on chromosome 2. The read-through transcript includes only non-coding PHOSPHO2 exons, and thus encodes the KLHL23 protein. PHOSPHO2-KLHL23 NA
ENSG00000072110 87 actinin alpha 1 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. ACTN1 NA
ENSG00000135931 80210 armadillo repeat containing 9 NA ARMC9 NA
ENSG00000114861 27086 forkhead box P1 This gene belongs to subfamily P of the forkhead box (FOX) transcription factor family. Forkhead box transcription factors play important roles in the regulation of tissue- and cell type-specific gene transcription during both development and adulthood. Forkhead box P1 protein contains both DNA-binding- and protein-protein binding-domains. This gene may act as a tumor suppressor as it is lost in several tumor types and maps to a chromosomal region (3p14.1) reported to contain a tumor suppressor gene(s). Alternative splicing results in multiple transcript variants encoding different isoforms. FOXP1 NA
ENSG00000173175 111 adenylate cyclase 5 This gene encodes a member of the membrane-bound adenylyl cyclase enzymes. Adenylyl cyclases mediate G protein-coupled receptor signaling through the synthesis of the second messenger cAMP. Activity of the encoded protein is stimulated by the Gs alpha subunit of G protein-coupled receptors and is inhibited by protein kinase A, calcium and Gi alpha subunits. Single nucleotide polymorphisms in this gene may be associated with low birth weight and type 2 diabetes. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. ADCY5 NA
ENSG00000071205 79658 Rho GTPase activating protein 10 NA ARHGAP10 NA
ENSG00000118257 8828 neuropilin 2 This gene encodes a member of the neuropilin family of receptor proteins. The encoded transmembrane protein binds to SEMA3C protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C} and SEMA3F protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F}, and interacts with vascular endothelial growth factor (VEGF). This protein may play a role in cardiovascular development, axon guidance, and tumorigenesis. Multiple transcript variants encoding distinct isoforms have been identified for this gene. NRP2 NA
ENSG00000182095 84629 trinucleotide repeat containing 18 NA TNRC18 NA
ENSG00000224713 ENSG00000224713 NA NA AC025165.8 NA
ENSG00000087448 57542 kelch like family member 42 NA KLHL42 NA
ENSG00000166444 6764 suppression of tumorigenicity 5 This gene was identified by its ability to suppress the tumorigenicity of Hela cells in nude mice. The protein encoded by this gene contains a C-terminal region that shares similarity with the Rab 3 family of small GTP binding proteins. This protein preferentially binds to the SH3 domain of c-Abl kinase, and acts as a regulator of MAPK1/ERK2 kinase, which may contribute to its ability to reduce the tumorigenic phenotype in cells. Three alternatively spliced transcript variants of this gene encoding distinct isoforms are identified. ST5 NA
ENSG00000010803 22955 sex comb on midleg homolog 1 (Drosophila) NA SCMH1 NA
ENSG00000151240 22982 disco interacting protein 2 homolog C This gene encodes a member of the disco-interacting protein homolog 2 family. The protein shares strong similarity with a Drosophila protein which interacts with the transcription factor disco and is expressed in the nervous system. DIP2C NA
ENSG00000166166 115708 tRNA methyltransferase 61A NA TRMT61A NA
ENSG00000138080 11117 elastin microfibril interfacer 1 This gene encodes an extracellular matrix glycoprotein that is characterized by an N-terminal microfibril interface domain, a coiled-coil alpha-helical domain, a collagenous domain and a C-terminal globular C1q domain. The encoded protein associates with elastic fibers at the interface between elastin and microfibrils and may play a role in the development of elastic tissues including large blood vessels, dermis, heart and lung. EMILIN1 NA
ENSG00000231346 ENSG00000231346 long intergenic non-protein coding RNA 1160 NA LINC01160 NA
ENSG00000165995 783 calcium voltage-gated channel auxiliary subunit beta 2 This gene encodes a subunit of a voltage-dependent calcium channel protein that is a member of the voltage-gated calcium channel superfamily. The gene product was originally identified as an antigen target in Lambert-Eaton myasthenic syndrome, an autoimmune disorder. Mutations in this gene are associated with Brugada syndrome. Alternatively spliced variants encoding different isoforms have been described. CACNB2 NA
ENSG00000117569 58155 polypyrimidine tract binding protein 2 The protein encoded by this gene binds to intronic polypyrimidine clusters in pre-mRNA molecules and is implicated in controlling the assembly of other splicing-regulatory proteins. This protein is very similar to the polypyrimidine tract binding protein (PTB) but most of its isoforms are expressed primarily in the brain. Alternative splicing results in multiple transcript variants. PTBP2 NA
ENSG00000166333 3611 integrin linked kinase This gene encodes a protein with a kinase-like domain and four ankyrin-like repeats. The encoded protein associates at the cell membrane with the cytoplasmic domain of beta integrins, where it regulates integrin-mediated signal transduction. Activity of this protein is important in the epithelial to mesenchymal transition, and over-expression of this gene is implicated in tumor growth and metastasis. Alternative splicing results in multiple transcript variants. ILK NA
ENSG00000163431 25802 leiomodin 1 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. LMOD1 NA
ENSG00000179954 284297 scavenger receptor cysteine rich family, 5 domains NA SSC5D NA
ENSG00000162341 219931 two pore segment channel 2 This gene encodes a putative cation-selective ion channel with two repeats of a six-transmembrane-domain. The protein localizes to lysosomal membranes and enables nicotinic acid adenine dinucleotide phosphate (NAADP) -induced calcium ion release from lysosome-related stores. This ubiquitously expressed gene has elevated expression in liver and kidney. Two common nonsynonymous SNPs in this gene strongly associate with blond versus brown hair pigmentation. TPCN2 NA
ENSG00000197361 283807 F-box and leucine-rich repeat protein 22 This gene encodes a member of the F-box protein family. This F-box protein interacts with S-phase kinase-associated protein 1A and cullin in order to form SCF complexes which function as ubiquitin ligases. FBXL22 NA
ENSG00000155858 134353 LSM11, U7 small nuclear RNA associated NA LSM11 NA
ENSG00000073712 10979 fermitin family member 2 NA FERMT2 NA
# Save the Ensembl IDs returned for cluster 4, one per line
write.table(as.factor(out$query), paste0("../utilities/gene_names_clus_",4,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
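
Note that `queryMany` can return more than one hit for a single query: in the table above, ENSG00000213160 appears twice (once for KLHL23 and once for the PHOSPHO2-KLHL23 readthrough), so `out$query` may contain repeated IDs. The following is a minimal sketch of deduplicating the IDs before writing them out. The helper `write_cluster_genes`, the toy vector `toy_query`, and the use of a temporary directory are assumptions made for illustration and are not part of the original script.

```r
# Hypothetical helper (not part of the original script): write the unique
# query IDs of one cluster to "<out_dir>/gene_names_clus_<cluster>.txt".
write_cluster_genes <- function(query_ids, cluster, out_dir) {
  ids <- unique(as.character(query_ids))   # drop repeated hits for the same query
  path <- file.path(out_dir, paste0("gene_names_clus_", cluster, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy stand-in for `out$query`; the first ID is repeated, as in the table above.
toy_query <- c("ENSG00000213160", "ENSG00000213160", "ENSG00000072110")
out_path <- write_cluster_genes(toy_query, cluster = 4, out_dir = tempdir())
written_ids <- readLines(out_path)
```

In the actual analysis one would presumably pass `out$query` and the `../utilities` directory instead of the toy inputs; whether duplicates should be dropped depends on how the gene list files are used downstream.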

Cluster 5 Annotations

out <- mygene::queryMany(gene_list[5,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
regulator of G-protein signaling 5 8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. RGS5 ENSG00000143248 NA
matrix Gla protein 4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. MGP ENSG00000111341 NA
AE binding protein 1 165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AEBP1 ENSG00000106624 NA
insulin like growth factor binding protein 7 3490 This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). IGFBP7 ENSG00000163453 NA
milk fat globule-EGF factor 8 protein 4240 This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. MFGE8 ENSG00000140545 NA
melanoma cell adhesion molecule 4162 NA MCAM ENSG00000076706 NA
integrin subunit alpha 8 8516 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. ITGA8 ENSG00000077943 NA
elastin 2006 This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. ELN ENSG00000049540 NA
actin, alpha 2, smooth muscle, aorta 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2 ENSG00000107796 NA
myosin, heavy chain 10, non-muscle 4628 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. MYH10 ENSG00000133026 NA
latent transforming growth factor beta binding protein 2 4053 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. LTBP2 ENSG00000119681 NA
proline/arginine-rich end leucine-rich repeat protein 5549 The protein encoded by this gene is a leucine-rich repeat protein present in connective tissue extracellular matrix. This protein functions as a molecule anchoring basement membranes to the underlying connective tissue. This protein has been shown to bind type I collagen to basement membranes and type II collagen to cartilage. It also binds the basement membrane heparan sulfate proteoglycan perlecan. This protein is suggested to be involved in the pathogenesis of Hutchinson-Gilford progeria (HGP), which is reported to lack the binding of collagen in basement membranes and cartilage. Alternatively spliced transcript variants encoding the same protein have been observed. PRELP ENSG00000188783 NA
myosin, heavy chain 9, non-muscle 4627 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. MYH9 ENSG00000100345 NA
frizzled-related protein 2487 The protein encoded by this gene is a secreted protein that is involved in the regulation of bone development. Defects in this gene are a cause of female-specific osteoarthritis (OA) susceptibility. FRZB ENSG00000162998 NA
osteoglycin 4969 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family of proteins. The encoded protein induces ectopic bone formation in conjunction with transforming growth factor beta and may regulate osteoblast differentiation. High expression of the encoded protein may be associated with elevated heart left ventricular mass. Alternative splicing results in multiple transcript variants. OGN ENSG00000106809 NA
fibromodulin 2331 Fibromodulin belongs to the family of small interstitial proteoglycans. The encoded protein possesses a central region containing leucine-rich repeats with 4 keratan sulfate chains, flanked by terminal domains containing disulphide bonds. Owing to the interaction with type I and type II collagen fibrils and in vitro inhibition of fibrillogenesis, the encoded protein may play a role in the assembly of extracellular matrix. It may also regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix. Sequence variations in this gene may be associated with the pathogenesis of high myopia. Alternative splicing results in multiple transcript variants. FMOD ENSG00000122176 NA
SPARC like 1 8404 NA SPARCL1 ENSG00000152583 NA
connective tissue growth factor 1490 The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. CTGF ENSG00000118523 NA
filamin binding LIM protein 1 54751 This gene encodes a protein with an N-terminal filamin-binding domain, a central proline-rich domain, and multiple C-terminal LIM domains. This protein localizes at cell junctions and may link cell adhesion structures to the actin cytoskeleton. This protein may be involved in the assembly and stabilization of actin-filaments and likely plays a role in modulating cell adhesion, cell morphology and cell motility. This protein also localizes to the nucleus and may affect cardiomyocyte differentiation after binding with the CSX/NKX2-5 transcription factor. Alternative splicing results in multiple transcript variants encoding different isoforms. FBLIM1 ENSG00000162458 NA
IGFBP7 antisense RNA 1 255130 NA IGFBP7-AS1 ENSG00000245067 NA
ACTA2 antisense RNA 1 ENSG00000180139 NA ACTA2-AS1 ENSG00000180139 NA
latent transforming growth factor beta binding protein 1 4052 The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. LTBP1 ENSG00000049323 NA
transglutaminase 2 7052 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. TGM2 ENSG00000198959 NA
WNT1 inducible signaling pathway protein 2 8839 This gene encodes a member of the WNT1 inducible signaling pathway (WISP) protein subfamily, which belongs to the connective tissue growth factor (CTGF) family. WNT1 is a member of a family of cysteine-rich, glycosylated signaling proteins that mediate diverse developmental processes. The CTGF family members are characterized by four conserved cysteine-rich domains: insulin-like growth factor-binding domain, von Willebrand factor type C module, thrombospondin domain and C-terminal cystine knot-like (CT) domain. The encoded protein lacks the CT domain which is implicated in dimerization and heparin binding. It is 72% identical to the mouse protein at the amino acid level. This gene may be downstream in the WNT1 signaling pathway that is relevant to malignant transformation. Its expression in colon tumors is reduced while the other two WISP members are overexpressed in colon tumors. It is expressed at high levels in bone tissue, and may play an important role in modulating bone turnover. WISP2 ENSG00000064205 NA
actinin alpha 4 81 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. ACTN4 ENSG00000130402 NA
chloride intracellular channel 4 25932 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 4 (CLIC4) protein, encoded by the CLIC4 gene, is a member of the p64 family; the gene is expressed in many tissues and exhibits an intracellular vesicular pattern in Panc-1 cells (pancreatic cancer cells). CLIC4 ENSG00000169504 NA
notch 3 4854 This gene encodes the third discovered human homologue of the Drosophila melanogaster type I membrane protein notch. In Drosophila, notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signalling pathway that plays a key role in neural development. Homologues of the notch-ligands have also been identified in human, but precise interactions between these ligands and the human notch homologues remain to be determined. Mutations in NOTCH3 have been identified as the underlying cause of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). NOTCH3 ENSG00000074181 NA
cytokine receptor-like factor 1 9244 This gene encodes a member of the cytokine type I receptor family. The protein forms a secreted complex with cardiotrophin-like cytokine factor 1 and acts on cells expressing ciliary neurotrophic factor receptors. The complex can promote survival of neuronal cells. Mutations in this gene result in Crisponi syndrome and cold-induced sweating syndrome. CRLF1 ENSG00000006016 NA
integrin subunit alpha 11 22801 This gene encodes an alpha integrin. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein contains an I domain, is expressed in muscle tissue, dimerizes with beta 1 integrin in vitro, and appears to bind collagen in this form. Therefore, the protein may be involved in attaching muscle tissue to the extracellular matrix. Alternative transcriptional splice variants have been found for this gene, but their biological validity is not determined. ITGA11 ENSG00000137809 NA
SPARC related modular calcium binding 2 64094 This gene encodes a member of the SPARC family (secreted protein acidic and rich in cysteine/osteonectin/BM-40), which are highly expressed during embryogenesis and wound healing. The gene product is a matricellular protein which promotes matrix assembly and can stimulate endothelial cell proliferation and migration, as well as angiogenic activity. Associated with pulmonary function, this secretory gene product contains a Kazal domain, two thymoglobulin type-1 domains, and two EF-hand calcium-binding domains. The encoded protein may serve as a target for controlling angiogenesis in tumor growth and myocardial ischemia. Alternative splicing results in multiple transcript variants. SMOC2 ENSG00000112562 NA
ras homolog family member B 388 NA RHOB ENSG00000143878 NA
thrombospondin 2 7058 The protein encoded by this gene belongs to the thrombospondin family. It is a disulfide-linked homotrimeric glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein has been shown to function as a potent inhibitor of tumor growth and angiogenesis. Studies of the mouse counterpart suggest that this protein may modulate the cell surface properties of mesenchymal cells and be involved in cell adhesion and migration. THBS2 ENSG00000186340 NA
anthrax toxin receptor 1 84168 This gene encodes a type I transmembrane protein and is a tumor-specific endothelial marker that has been implicated in colorectal cancer. The encoded protein has been shown to also be a docking protein or receptor for Bacillus anthracis toxin, the causative agent of the disease, anthrax. The binding of the protective antigen (PA) component of the tripartite anthrax toxin to this receptor protein mediates delivery of toxin components to the cytosol of cells. Once inside the cell, the other two components of anthrax toxin, edema factor (EF) and lethal factor (LF), disrupt normal cellular processes. Three alternatively spliced variants that encode different protein isoforms have been described. ANTXR1 ENSG00000169604 NA
EGF containing fibulin-like extracellular matrix protein 1 2202 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. EFEMP1 ENSG00000115380 NA
myosin ID 4642 NA MYO1D ENSG00000176658 NA
transmembrane protein 181 57583 The TMEM181 gene encodes a putative G protein-coupled receptor expressed on the cell surface (Carette et al., 2009 [PubMed 19965467]; Wollscheid et al., 2009 [PubMed 19349973]). TMEM181 ENSG00000146433 NA
prostate transmembrane protein, androgen induced 1 56937 This gene encodes a transmembrane protein that contains a Smad interacting motif (SIM). Expression of this gene is induced by androgens and transforming growth factor beta, and the encoded protein suppresses the androgen receptor and transforming growth factor beta signaling pathways through interactions with Smad proteins. Overexpression of this gene may play a role in multiple types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. PMEPA1 ENSG00000124225 NA
coiled-coil domain containing 3 83643 NA CCDC3 ENSG00000151468 NA
TIMP metallopeptidase inhibitor 2 7077 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. TIMP2 ENSG00000035862 NA
collagen type XVIII alpha 1 80781 This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. COL18A1 ENSG00000182871 NA
destrin, actin depolymerizing factor 11034 The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. DSTN ENSG00000125868 NA
integrin subunit alpha 10 8515 Integrins are integral transmembrane glycoproteins composed of noncovalently linked alpha and beta chains. They participate in cell adhesion as well as cell-surface mediated signalling. This gene encodes an integrin alpha chain and is expressed at high levels in chondrocytes, where it is transcriptionally regulated by AP-2epsilon and Ets-1. The protein encoded by this gene binds to collagen. Alternative splicing results in multiple transcript variants. ITGA10 ENSG00000143127 NA
forkhead box C1 2296 This gene belongs to the forkhead family of transcription factors which is characterized by a distinct DNA-binding forkhead domain. The specific function of this gene has not yet been determined; however, it has been shown to play a role in the regulation of embryonic and ocular development. Mutations in this gene cause various glaucoma phenotypes including primary congenital glaucoma, autosomal dominant iridogoniodysgenesis anomaly, and Axenfeld-Rieger anomaly. FOXC1 ENSG00000054598 NA
superoxide dismutase 3, extracellular 6649 This gene encodes a member of the superoxide dismutase (SOD) protein family. SODs are antioxidant enzymes that catalyze the conversion of superoxide radicals into hydrogen peroxide and oxygen, which may protect the brain, lungs, and other tissues from oxidative stress. Proteolytic processing of the encoded protein results in the formation of two distinct homotetramers that differ in their ability to interact with the extracellular matrix (ECM). Homotetramers consisting of the intact protein, or type C subunit, exhibit high affinity for heparin and are anchored to the ECM. Homotetramers consisting of a proteolytically cleaved form of the protein, or type A subunit, exhibit low affinity for heparin and do not interact with the ECM. A mutation in this gene may be associated with increased heart disease risk. SOD3 ENSG00000109610 NA
jun B proto-oncogene 3726 NA JUNB ENSG00000171223 NA
versican 1462 This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. VCAN ENSG00000038427 NA
protein phosphatase 1 regulatory inhibitor subunit 14A 94274 The protein encoded by this gene belongs to the protein phosphatase 1 (PP1) inhibitor family. This protein is an inhibitor of smooth muscle myosin phosphatase, and has higher inhibitory activity when phosphorylated. Inhibition of myosin phosphatase leads to increased myosin phosphorylation and enhanced smooth muscle contraction. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. PPP1R14A ENSG00000167641 NA
insulin like growth factor binding protein 2 3485 The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. IGFBP2 ENSG00000115457 NA
integrin subunit beta 5 3693 NA ITGB5 ENSG00000082781 NA
fibulin 5 10516 The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). FBLN5 ENSG00000140092 NA
tenascin C 3371 This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. TNC ENSG00000041982 NA
protein kinase, cGMP-dependent, type I 5592 Mammals have three different isoforms of cyclic GMP-dependent protein kinase (Ialpha, Ibeta, and II). These PRKG isoforms act as key mediators of the nitric oxide/cGMP signaling pathway and are important components of many signal transduction processes in diverse cell types. This PRKG1 gene on human chromosome 10 encodes the soluble Ialpha and Ibeta isoforms of PRKG by alternative transcript splicing. A separate gene on human chromosome 4, PRKG2, encodes the membrane-bound PRKG isoform II. The PRKG1 proteins play a central role in regulating cardiovascular and neuronal functions in addition to relaxing smooth muscle tone, preventing platelet aggregation, and modulating cell growth. This gene is most strongly expressed in all types of smooth muscle, platelets, cerebellar Purkinje cells, hippocampal neurons, and the lateral amygdala. Isoforms Ialpha and Ibeta have identical cGMP-binding and catalytic domains but differ in their leucine/isoleucine zipper and autoinhibitory sequences and therefore differ in their dimerization substrates and kinase enzyme activity. PRKG1 ENSG00000185532 NA
CD151 molecule (Raph blood group) 977 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins and other transmembrane 4 superfamily proteins. It is involved in cellular processes including cell adhesion and may regulate integrin trafficking and/or function. This protein enhances cell motility, invasion and metastasis of cancer cells. Multiple alternatively spliced transcript variants that encode the same protein have been described for this gene. CD151 ENSG00000177697 NA
filamin A interacting protein 1-like 11259 NA FILIP1L ENSG00000168386 NA
transgelin 6876 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. TAGLN ENSG00000149591 NA
serine/threonine kinase 38 like 23012 NA STK38L ENSG00000211455 NA
tumor necrosis factor receptor superfamily member 11b 4982 The protein encoded by this gene is a member of the TNF-receptor superfamily. This protein is an osteoblast-secreted decoy receptor that functions as a negative regulator of bone resorption. This protein specifically binds to its ligand, osteoprotegerin ligand, both of which are key extracellular regulators of osteoclast development. Studies of the mouse counterpart also suggest that this protein and its ligand play a role in lymph-node organogenesis and vascular calcification. Alternatively spliced transcript variants of this gene have been reported, but their full length nature has not been determined. TNFRSF11B ENSG00000164761 NA
cysteine and glycine rich protein 2 1466 CSRP2 is a member of the CSRP family of genes, encoding a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. CRP2 contains two copies of the cysteine-rich amino acid sequence motif (LIM) with putative zinc-binding activity, and may be involved in regulating ordered cell growth. Other genes in the family include CSRP1 and CSRP3. Alternative splicing results in multiple transcript variants. CSRP2 ENSG00000175183 NA
tubulointerstitial nephritis antigen like 1 64129 The protein encoded by this gene is similar in sequence to tubulointerstitial nephritis antigen, a secreted glycoprotein that is recognized by antibodies in some types of immune-related tubulointerstitial nephritis. Three transcript variants encoding different isoforms have been found for this gene. TINAGL1 ENSG00000142910 NA
polycystin 2, transient receptor potential cation channel 5311 This gene encodes a member of the polycystin protein family. The encoded protein is a multi-pass membrane protein that functions as a calcium permeable cation channel, and is involved in calcium transport and calcium signaling in renal epithelial cells. This protein interacts with polycystin 1, and they may be partners in a common signaling cascade involved in tubular morphogenesis. Mutations in this gene are associated with autosomal dominant polycystic kidney disease type 2. PKD2 ENSG00000118762 NA
Rho guanine nucleotide exchange factor 17 9828 NA ARHGEF17 ENSG00000110237 NA
nephroblastoma overexpressed 4856 The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. NOV ENSG00000136999 NA
adhesion molecule with Ig-like domain 2 347902 NA AMIGO2 ENSG00000139211 NA
carboxypeptidase X (M14 family), member 2 119587 NA CPXM2 ENSG00000121898 NA
chondroitin sulfate proteoglycan 4 1464 A human melanoma-associated chondroitin sulfate proteoglycan plays a role in stabilizing cell-substratum interactions during early events of melanoma cell spreading on endothelial basement membranes. CSPG4 represents an integral membrane chondroitin sulfate proteoglycan expressed by human malignant melanoma cells. CSPG4 ENSG00000173546 NA
cytochrome b5 reductase 3 1727 This gene encodes cytochrome b5 reductase, which includes a membrane-bound form in somatic cells (anchored in the endoplasmic reticulum, mitochondrial and other membranes) and a soluble form in erythrocytes. The membrane-bound form exists mainly on the cytoplasmic side of the endoplasmic reticulum and functions in desaturation and elongation of fatty acids, in cholesterol biosynthesis, and in drug metabolism. The erythrocyte form is located in a soluble fraction of circulating erythrocytes and is involved in methemoglobin reduction. The membrane-bound form has both membrane-binding and catalytic domains, while the soluble form has only the catalytic domain. Alternate splicing results in multiple transcript variants. Mutations in this gene cause methemoglobinemias. CYB5R3 ENSG00000100243 NA
SMAD family member 7 4092 The protein encoded by this gene is a nuclear protein that binds the E3 ubiquitin ligase SMURF2. Upon binding, this complex translocates to the cytoplasm, where it interacts with TGF-beta receptor type-1 (TGFBR1), leading to the degradation of both the encoded protein and TGFBR1. Expression of this gene is induced by TGFBR1. Variations in this gene are a cause of susceptibility to colorectal cancer type 3 (CRCS3). Several transcript variants encoding different isoforms have been found for this gene. SMAD7 ENSG00000101665 NA
tensin 1 7145 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. TNS1 ENSG00000079308 NA
vimentin 7431 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. VIM ENSG00000026025 NA
Yes associated protein 1 10413 This gene encodes a downstream nuclear effector of the Hippo signaling pathway which is involved in development, growth, repair, and homeostasis. This gene is known to play a role in the development and progression of multiple cancers as a transcriptional regulator of this signaling pathway and may function as a potential target for cancer treatment. Alternative splicing results in multiple transcript variants encoding different isoforms. YAP1 ENSG00000137693 NA
regulator of calcineurin 2 10231 This gene encodes a member of the regulator of calcineurin (RCAN) protein family. These proteins play a role in many physiological processes by binding to the catalytic domain of calcineurin A, inhibiting calcineurin-mediated nuclear translocation of the transcription factor NFATC1. Expression of this gene in skin fibroblasts is upregulated by thyroid hormone, and the encoded protein may also play a role in endothelial cell function and angiogenesis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. RCAN2 ENSG00000172348 NA
collagen type IV alpha 2 1284 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. COL4A2 ENSG00000134871 NA
hes related family bHLH transcription factor with YRPW motif 2 23493 This gene encodes a member of the hairy and enhancer of split-related (HESR) family of basic helix-loop-helix (bHLH)-type transcription factors. The encoded protein forms homo- or hetero-dimers that localize to the nucleus and interact with a histone deacetylase complex to repress transcription. Expression of this gene is induced by the Notch signal transduction pathway. Two similar and redundant genes in mouse are required for embryonic cardiovascular development, and are also implicated in neurogenesis and somitogenesis. Alternatively spliced transcript variants have been found, but their biological validity has not been determined. HEY2 ENSG00000135547 NA
thromboxane A2 receptor 6915 This gene encodes a member of the G protein-coupled receptor family. The protein interacts with thromboxane A2 to induce platelet aggregation and regulate hemostasis. A mutation in this gene results in a bleeding disorder. Multiple transcript variants encoding different isoforms have been found for this gene. TBXA2R ENSG00000006638 NA
integrin subunit alpha 5 3678 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. ITGA5 ENSG00000161638 NA
zinc finger homeobox 3 463 This gene encodes a transcription factor with multiple homeodomains and zinc finger motifs, and regulates myogenic and neuronal differentiation. The encoded protein suppresses expression of the alpha-fetoprotein gene by binding to an AT-rich enhancer motif. The protein has also been shown to negatively regulate c-Myb, and transactivate the cell cycle inhibitor cyclin-dependent kinase inhibitor 1A (also known as p21CIP1). This gene is reported to function as a tumor suppressor in several cancers, and sequence variants of this gene are also associated with atrial fibrillation. Multiple transcript variants expressed from alternate promoters and encoding different isoforms have been found for this gene. ZFHX3 ENSG00000140836 NA
secreted frizzled-related protein 2 6423 This gene encodes a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins. SFRPs act as soluble modulators of Wnt signaling. Methylation of this gene is a potential marker for the presence of colorectal cancer. SFRP2 ENSG00000145423 NA
inhibitor of DNA binding 2, HLH protein 3398 The protein encoded by this gene belongs to the inhibitor of DNA binding family, members of which are transcriptional regulators that contain a helix-loop-helix (HLH) domain but not a basic domain. Members of the inhibitor of DNA binding family inhibit the functions of basic helix-loop-helix transcription factors in a dominant-negative manner by suppressing their heterodimerization partners through the HLH domains. This protein may play a role in negatively regulating cell differentiation. A pseudogene of this gene is located on chromosome 3. ID2 ENSG00000115738 NA
dishevelled-binding antagonist of beta-catenin 3 147906 NA DACT3 ENSG00000197380 NA
jagged 1 182 The jagged 1 protein encoded by JAG1 is the human homolog of the Drosophila jagged protein. Human jagged 1 is the ligand for the receptor notch 1, the latter a human homolog of the Drosophila jagged receptor notch. Mutations that alter the jagged 1 protein cause Alagille syndrome. Jagged 1 signalling through notch 1 has also been shown to play a role in hematopoiesis. JAG1 ENSG00000101384 NA
NA ENSG00000232415 NA CTB-51J22.1 ENSG00000232415 NA
potassium channel tetramerization domain containing 10 83892 The protein encoded by this gene binds proliferating cell nuclear antigen (PCNA) and may be involved in DNA synthesis and cell proliferation. In addition, the encoded protein may be a tumor suppressor. Several protein-coding and non-protein coding transcript variants have been found for this gene. KCTD10 ENSG00000110906 NA
hes related family bHLH transcription factor with YRPW motif-like 26508 This gene encodes a member of the hairy and enhancer of split-related (HESR) family of basic helix-loop-helix (bHLH)-type transcription factors. The sequence of the encoded protein contains a conserved bHLH and orange domain, but its YRPW motif has diverged from other HESR family members. It is thought to be an effector of Notch signaling and a regulator of cell fate decisions. Alternatively spliced transcript variants have been found, but their biological validity has not been determined. HEYL ENSG00000163909 NA
hyaluronan and proteoglycan link protein 3 145864 This gene belongs to the hyaluronan and proteoglycan binding link protein gene family. The protein encoded by this gene may function in hyaluronic acid binding and cell adhesion. HAPLN3 ENSG00000140511 NA
muscleblind like splicing regulator 1 4154 This gene encodes a member of the muscleblind protein family which was initially described in Drosophila melanogaster. The encoded protein is a C3H-type zinc finger protein that modulates alternative splicing of pre-mRNAs. Muscleblind proteins bind specifically to expanded dsCUG RNA but not to normal size CUG repeats and may thereby play a role in the pathophysiology of myotonic dystrophy. Mice lacking this gene exhibited muscle abnormalities and cataracts. Several alternatively spliced transcript variants have been described but the full-length natures of only some have been determined. The different isoforms are thought to have different binding specificities and/or splicing activities. MBNL1 ENSG00000152601 NA
microfibrillar associated protein 4 4239 This gene encodes a protein with similarity to a bovine microfibril-associated protein. The protein has binding specificities for both collagen and carbohydrate. It is thought to be an extracellular matrix protein which is involved in cell adhesion or intercellular interactions. The gene is located within the Smith-Magenis syndrome region. Two transcript variants encoding different isoforms have been found for this gene. MFAP4 ENSG00000166482 NA
Wilms tumor 1 interacting protein 126374 NA WTIP ENSG00000142279 NA
platelet derived growth factor subunit A 5154 This gene encodes a member of the protein family comprised of both platelet-derived growth factors (PDGF) and vascular endothelial growth factors (VEGF). The encoded preproprotein is proteolytically processed to generate platelet-derived growth factor subunit A, which can homodimerize, or alternatively, heterodimerize with the related platelet-derived growth factor subunit B. These proteins bind and activate PDGF receptor tyrosine kinases, which play a role in a wide range of developmental processes. Alternative splicing results in multiple transcript variants. PDGFA ENSG00000197461 NA
protein kinase C delta binding protein 112464 The protein encoded by this gene was identified as a binding protein of the protein kinase C, delta (PRKCD). The expression of this gene in cultured cell lines is strongly induced by serum starvation. The expression of this protein was found to be down-regulated in various cancer cell lines, suggesting the possible tumor suppressor function of this protein. PRKCDBP ENSG00000170955 NA
NA NA NA NA ENSG00000255905 TRUE
growth arrest and DNA damage inducible beta 4616 This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The genes in this group respond to environmental stresses by mediating activation of the p38/JNK pathway. This activation is mediated via their proteins binding and activating MTK1/MEKK4 kinase, which is an upstream activator of both p38 and JNK MAPKs. The function of these genes or their protein products is involved in the regulation of growth and apoptosis. These genes are regulated by different mechanisms, but they are often coordinately expressed and can function cooperatively in inhibiting cell growth. GADD45B ENSG00000099860 NA
atlastin GTPase 3 25923 This gene encodes a member of a family of dynamin-like, integral membrane GTPases. The encoded protein is required for the proper formation of the network of interconnected tubules of the endoplasmic reticulum. Mutations in this gene may be associated with hereditary sensory neuropathy type IF. Alternatively spliced transcript variants that encode distinct isoforms have been described. ATL3 ENSG00000184743 NA
adipogenesis regulatory factor 10974 APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown. ADIRF ENSG00000148671 NA
Sad1 and UNC84 domain containing 2 25777 SUN1 (MIM 607723) and SUN2 are inner nuclear membrane (INM) proteins that play a major role in nuclear-cytoplasmic connection by formation of a ‘bridge’ across the nuclear envelope, known as the LINC complex, via interaction with the conserved luminal KASH domain of nesprins (e.g., SYNE1; MIM 608441) located in the outer nuclear membrane (ONM). The LINC complex provides a direct connection between the nuclear lamina and the cytoskeleton, which contributes to nuclear positioning and cellular rigidity (summary by Haque et al., 2010 [PubMed 19933576]). SUN2 ENSG00000100242 NA
murine retrovirus integration site 1 homolog 10335 This gene is similar to a putative mouse tumor suppressor gene (Mrvi1) that is frequently disrupted by mouse AIDS-related virus (MRV). The encoded protein, which is found in the membrane of the endoplasmic reticulum, is similar to Jaw1, a lymphoid-restricted protein whose expression is down-regulated during lymphoid differentiation. This protein is a substrate of cGMP-dependent kinase-1 (PKG1) that can function as a regulator of IP3-induced calcium release. Studies in mouse suggest that MRV integration at Mrvi1 induces myeloid leukemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, and thus this gene may function as a myeloid leukemia tumor suppressor gene. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, and alternative translation start sites, including a non-AUG (CUG) start site, are used. MRVI1 ENSG00000072952 NA
extracellular matrix protein 2 1842 ECM2 encodes extracellular matrix protein 2, so named because it shares extensive similarity with known extracellular matrix proteins. Three transcript variants encoding different isoforms have been found for this gene. ECM2 ENSG00000106823 NA
Janus kinase 2 3717 This gene product is a protein tyrosine kinase involved in a specific subset of cytokine receptor signaling pathways. It has been found to be constitutively associated with the prolactin receptor and is required for responses to gamma interferon. Mice that do not express an active protein for this gene exhibit embryonic lethality associated with the absence of definitive erythropoiesis. JAK2 ENSG00000096968 NA
VIM antisense RNA 1 100507347 NA VIM-AS1 ENSG00000229124 NA
latent transforming growth factor beta binding protein 3 4054 The protein encoded by this gene forms a complex with transforming growth factor beta (TGF-beta) proteins and may be involved in their subcellular localization. Activation of this complex requires removal of the encoded binding protein. This protein also may play a structural role in the extracellular matrix. Three transcript variants encoding different isoforms have been found for this gene. LTBP3 ENSG00000168056 NA
polymerase I and transcript release factor 284119 This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. PTRF ENSG00000177469 NA
# Save the Ensembl IDs queried for cluster 5, one per line.
write.table(out$query, paste0("../utilities/gene_names_clus_", 5, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
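
The export step above is written out by hand for each cluster. As a rough sketch only, the same step could be expressed as a loop over the rows of `gene_list`, assuming it is the character matrix built earlier with one row per cluster. The helper name `write_cluster_gene_ids`, its `out_dir` argument, and the toy matrix are hypothetical and exist only for illustration. The sketch writes the top-gene IDs directly instead of `out$query`, so the files would match the originals only if `queryMany` returns exactly one row per queried ID.

```r
# Hypothetical helper (not part of the original script): write one file of
# Ensembl gene IDs per cluster, where each row of gene_list is one cluster.
write_cluster_gene_ids <- function(gene_list, out_dir) {
  paths <- character(nrow(gene_list))
  for (k in seq_len(nrow(gene_list))) {
    paths[k] <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
    # Same write.table settings as the per-cluster calls in this script.
    write.table(gene_list[k, ], paths[k],
                col.names = FALSE, row.names = FALSE, quote = FALSE)
  }
  invisible(paths)
}

# Toy stand-in for gene_list: 2 rows x 3 genes. The IDs are borrowed from the
# tables in this document; the row numbers are NOT the real cluster numbers.
toy_gene_list <- rbind(
  c("ENSG00000143127", "ENSG00000054598", "ENSG00000109610"),
  c("ENSG00000186395", "ENSG00000167768", "ENSG00000172867")
)

# Write into a temporary directory so the sketch does not touch ../utilities.
out_dir <- tempdir()
paths <- write_cluster_gene_ids(toy_gene_list, out_dir)
readLines(paths[2])
```

To use this with the real data, `toy_gene_list` would be replaced by `gene_list` and `out_dir` by `"../utilities"`.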

Cluster 6 Annotations

out <- mygene::queryMany(gene_list[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query summary name X_id
KRT10 ENSG00000186395 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 3858
KRT1 ENSG00000167768 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 1 3848
KRT2 ENSG00000172867 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 2 3849
LOR ENSG00000203782 This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. loricrin 4014
KRT14 ENSG00000186847 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. keratin 14 3861
DMKN ENSG00000161249 This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. dermokine 93099
DCD ENSG00000161634 This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms. dermcidin 117159
KRTDAP ENSG00000188508 This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene. keratinocyte differentiation associated protein 388533
CALML5 ENSG00000178372 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. calmodulin like 5 51806
SBSN ENSG00000189001 NA suprabasin 374897
ASPRV1 ENSG00000244617 NA aspartic peptidase, retroviral-like 1 151516
DSP ENSG00000096696 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. desmoplakin 1832
TMEM45A ENSG00000181458 NA transmembrane protein 45A 55076
CDHR1 ENSG00000148600 This gene belongs to the cadherin superfamily of calcium-dependent cell adhesion molecules. The encoded protein is a photoreceptor-specific cadherin that plays a role in outer segment disc morphogenesis. Mutations in this gene are associated with inherited retinal dystrophies. Alternatively spliced transcript variants encoding different isoforms have been identified. cadherin related family member 1 92211
LY6G6C ENSG00000204421 LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]). lymphocyte antigen 6 complex, locus G6C 80740
PKP1 ENSG00000081277 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. plakophilin 1 5317
DEGS1 ENSG00000143753 This gene encodes a member of the membrane fatty acid desaturase family which is responsible for inserting double bonds into specific positions in fatty acids. This protein contains three His-containing consensus motifs that are characteristic of a group of membrane fatty acid desaturases. It is predicted to be a multiple membrane-spanning protein localized to the endoplasmic reticulum. Overexpression of this gene inhibited biosynthesis of the EGF receptor, suggesting a possible role of a fatty acid desaturase in regulating biosynthetic processing of the EGF receptor. delta(4)-desaturase, sphingolipid 1 8560
CST6 ENSG00000175315 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. This gene encodes a cystatin from the type 2 family, which is down-regulated in metastatic breast tumor cells as compared to primary tumor cells. Loss of expression is likely associated with the progression of a primary tumor to a metastatic phenotype. cystatin E/M 1474
PERP ENSG00000112378 NA PERP, TP53 apoptosis effector 64065
LGALS7B ENSG00000178934 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19. lectin, galactoside binding soluble 7B 653499
CLDN1 ENSG00000163347 Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. claudin 1 9076
SIK1 ENSG00000142178 NA salt inducible kinase 1 150094
AHNAK2 ENSG00000185567 NA AHNAK nucleoprotein 2 113146
MUCL1 ENSG00000172551 NA mucin like 1 118430
KLF4 ENSG00000136826 This gene encodes a protein that belongs to the Kruppel family of transcription factors. The encoded zinc finger protein is required for normal development of the barrier function of skin. The encoded protein is thought to control the G1-to-S transition of the cell cycle following DNA damage by mediating the tumor suppressor gene p53. Mice lacking this gene have a normal appearance but lose weight rapidly, and die shortly after birth due to fluid evaporation resulting from compromised epidermal barrier function. Alternative splicing results in multiple transcript variants encoding different isoforms. Kruppel-like factor 4 (gut) 9314
HOPX ENSG00000171476 The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. HOP homeobox 84525
CXCL14 ENSG00000145824 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. C-X-C motif chemokine ligand 14 9547
NR1D1 ENSG00000126368 This gene encodes a transcription factor that is a member of the nuclear receptor subfamily 1. The encoded protein is a ligand-sensitive transcription factor that negatively regulates the expression of core clock proteins. In particular this protein represses the circadian clock transcription factor aryl hydrocarbon receptor nuclear translocator-like protein 1 (ARNTL). This protein may also be involved in regulating genes that function in metabolic, inflammatory and cardiovascular processes. nuclear receptor subfamily 1 group D member 1 9572
CTNNBIP1 ENSG00000178585 The protein encoded by this gene binds CTNNB1 and prevents interaction between CTNNB1 and TCF family members. The encoded protein is a negative regulator of the Wnt signaling pathway. Two transcript variants encoding the same protein have been found for this gene. catenin beta interacting protein 1 56998
THEM5 ENSG00000196407 NA thioesterase superfamily member 5 284486
LGALSL ENSG00000119862 NA lectin, galactoside binding like 29094
COL7A1 ENSG00000114270 This gene encodes the alpha chain of type VII collagen. The type VII collagen fibril, composed of three identical alpha collagen chains, is restricted to the basement zone beneath stratified squamous epithelia. It functions as an anchoring fibril between the external epithelia and the underlying stroma. Mutations in this gene are associated with all forms of dystrophic epidermolysis bullosa. In the absence of mutations, however, an acquired form of this disease can result from an autoimmune response made to type VII collagen. collagen type VII alpha 1 1294
RORA ENSG00000069667 The protein encoded by this gene is a member of the NR1 subfamily of nuclear hormone receptors. It can bind as a monomer or as a homodimer to hormone response elements upstream of several genes to enhance the expression of those genes. The encoded protein has been shown to interact with NM23-2, a nucleoside diphosphate kinase involved in organogenesis and differentiation, as well as with NM23-1, the product of a tumor metastasis suppressor candidate gene. Also, it has been shown to aid in the transcriptional regulation of some genes involved in circadian rhythm. Four transcript variants encoding different isoforms have been described for this gene. RAR related orphan receptor A 6095
BLMH ENSG00000108578 Bleomycin hydrolase (BMH) is a cytoplasmic cysteine peptidase that is highly conserved through evolution; however, the only known activity of the enzyme is metabolic inactivation of the glycopeptide bleomycin (BLM), an essential component of combination chemotherapy regimens for cancer. The protein contains the signature active site residues of the cysteine protease papain superfamily. bleomycin hydrolase 642
FGFR3 ENSG00000068078 This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. fibroblast growth factor receptor 3 2261
LOC284023 ENSG00000179859 NA uncharacterized LOC284023 284023
TUFT1 ENSG00000143367 Tuftelin is an acidic protein that is thought to play a role in dental enamel mineralization and is implicated in caries susceptibility. It is also thought to be involved with adaptation to hypoxia, mesenchymal stem cell function, and neurotrophin nerve growth factor mediated neuronal differentiation. tuftelin 1 7286
CASZ1 ENSG00000130940 The protein encoded by this gene is a zinc finger transcription factor. The encoded protein may function as a tumor suppressor, and single nucleotide polymorphisms in this gene are associated with blood pressure variation. Alternative splicing results in multiple transcript variants that encode different protein isoforms. castor zinc finger 1 54897
SCGB1B2P ENSG00000268751 NA secretoglobin family 1B member 2, pseudogene 643719
RPLP1 ENSG00000137818 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P2. The P1 protein can interact with P0 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Two alternatively spliced transcript variants that encode different proteins have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein lateral stalk subunit P1 6176
TINCR ENSG00000223573 This gene produces a spliced long non-coding RNA that is required for normal epidermal differentiation. This transcript regulates the expression of genes involved in the differentiation of epidermal tissue. Mutations in some of the genes targeted by this transcript have been implicated in epidermal skin diseases. tissue differentiation-inducing non-protein coding RNA 257000
FOS ENSG00000170345 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. In some cases, expression of the FOS gene has also been associated with apoptotic cell death. FBJ murine osteosarcoma viral oncogene homolog 2353
LOC101930123 ENSG00000103319 NA eukaryotic elongation factor 2 kinase 101930123
EEF2K ENSG00000103319 This gene encodes a highly conserved protein kinase in the calmodulin-mediated signaling pathway that links activation of cell surface receptors to cell division. This kinase is involved in the regulation of protein synthesis. It phosphorylates eukaryotic elongation factor 2 (EEF2) and thus inhibits the EEF2 function. The activity of this kinase is increased in many cancers and may be a valid target for anti-cancer treatment. eukaryotic elongation factor 2 kinase 29904
CCL27 ENSG00000213927 This gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. The protein encoded by this gene is chemotactic for skin-associated memory T lymphocytes. This cytokine may also play a role in mediating homing of lymphocytes to cutaneous sites. It specifically binds to chemokine receptor 10 (CCR10). Studies of a similar murine protein indicate that these protein-receptor interactions have a pivotal role in T cell-mediated skin inflammation. C-C motif chemokine ligand 27 10850
EGFR ENSG00000146648 The protein encoded by this gene is a transmembrane glycoprotein that is a member of the protein kinase superfamily. This protein is a receptor for members of the epidermal growth factor family. EGFR is a cell surface protein that binds to epidermal growth factor. Binding of the protein to a ligand induces receptor dimerization and tyrosine autophosphorylation and leads to cell proliferation. Mutations in this gene are associated with lung cancer. Multiple alternatively spliced transcript variants that encode different protein isoforms have been found for this gene. epidermal growth factor receptor 1956
TRIM29 ENSG00000137699 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. tripartite motif containing 29 23650
JUP ENSG00000173801 This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. junction plakoglobin 3728
TNFRSF19 ENSG00000127863 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described. tumor necrosis factor receptor superfamily member 19 55504
KCNK7 ENSG00000173338 This gene encodes a member of the superfamily of potassium channel proteins containing two pore-forming P domains. The product of this gene has not been shown to be a functional channel; however, it may require other non-pore-forming proteins for activity. Multiple transcript variants encoding different isoforms have been found for this gene. potassium two pore domain channel subfamily K member 7 10089
GPNMB ENSG00000136235 The protein encoded by this gene is a type I transmembrane glycoprotein which shows homology to the pMEL17 precursor, a melanocyte-specific protein. GPNMB shows expression in the lowly metastatic human melanoma cell lines and xenografts but does not show expression in the highly metastatic cell lines. GPNMB may be involved in growth delay and reduction of metastatic potential. Two transcript variants encoding different isoforms have been found for this gene. glycoprotein nmb 10457
FAM57A ENSG00000167695 The protein encoded by this gene is a membrane-associated protein that promotes lung carcinogenesis. The encoded protein may be involved in amino acid transport and glutathione metabolism since it can interact with a solute carrier family member (SLC3A2) and an isoform of gamma-glutamyltranspeptidase-like 3. An alternatively spliced variant encoding a protein that lacks a 32 aa internal segment showed the opposite effect, inhibiting lung cancer cell growth. Knockdown of this gene also inhibited lung carcinogenesis and tumor cell growth. Several transcript variants encoding different isoforms have been found for this gene. family with sequence similarity 57 member A 79850
PPP1R13L ENSG00000104881 IASPP is one of the most evolutionarily conserved inhibitors of p53 (TP53; MIM 191170), whereas ASPP1 (MIM 606455) and ASPP2 (MIM 602143) are activators of p53. protein phosphatase 1 regulatory subunit 13 like 10848
APCDD1 ENSG00000154856 This locus encodes an inhibitor of the Wnt signaling pathway. Mutations at this locus have been associated with hereditary hypotrichosis simplex. Increased expression of this gene may also be associated with colorectal carcinogenesis. adenomatosis polyposis coli down-regulated 1 147495
LOC101927164 ENSG00000237101 NA uncharacterized LOC101927164 101927164
ZNF385A ENSG00000161642 Zinc finger proteins, such as ZNF385A, are regulatory proteins that act as transcription factors, bind single- or double-stranded RNA, or interact with other proteins (Sharma et al., 2004 [PubMed 15527981]). zinc finger protein 385A 25946
RAPGEFL1 ENSG00000108352 NA Rap guanine nucleotide exchange factor like 1 51195
EGR3 ENSG00000179388 This gene encodes a transcriptional regulator that belongs to the EGR family of C2H2-type zinc-finger proteins. It is an immediate-early growth response gene which is induced by mitogenic stimulation. The protein encoded by this gene participates in the transcriptional regulation of genes in controlling biological rhythm. It may also play a role in a wide variety of processes including muscle development, lymphocyte development, endothelial cell growth and migration, and neuronal development. Alternative splicing results in multiple transcript variants encoding distinct isoforms. early growth response 3 1960
EPHB6 ENSG00000106123 This gene encodes a member of a family of transmembrane proteins that function as receptors for ephrin-B family proteins. Unlike other members of this family, the encoded protein does not contain a functional kinase domain. Activity of this protein can influence cell adhesion and migration. Expression of this gene is downregulated during tumor progression, suggesting that the protein may suppress tumor invasion and metastasis. Alternative splicing results in multiple transcript variants. EPH receptor B6 2051
IDE ENSG00000119912 This gene encodes a zinc metallopeptidase that degrades intracellular insulin, and thereby terminates insulin’s activity, as well as participating in intercellular peptide signalling by degrading diverse peptides such as glucagon, amylin, bradykinin, and kallidin. The preferential affinity of this enzyme for insulin results in insulin-mediated inhibition of the degradation of other peptides such as beta-amyloid. Deficiencies in this protein’s function are associated with Alzheimer’s disease and type 2 diabetes mellitus but mutations in this gene have not been shown to be causative for these diseases. This protein localizes primarily to the cytoplasm but in some cell types localizes to the extracellular space, cell membrane, peroxisome, and mitochondrion. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Additional transcript variants have been described but have not been experimentally verified. insulin degrading enzyme 3416
ETV3 ENSG00000117036 NA ETS variant 3 2117
CA12 ENSG00000074410 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. This gene product is a type I membrane protein that is highly expressed in normal tissues, such as kidney, colon and pancreas, and has been found to be overexpressed in 10% of clear cell renal carcinomas. Three transcript variants encoding different isoforms have been identified for this gene. carbonic anhydrase 12 771
ZNF273 ENSG00000198039 This gene is a member of the krueppel C2H2-type zinc-finger protein family and encodes a protein with 13 C2H2-type zinc fingers and a KRAB domain. This nuclear protein is involved in transcriptional regulation. Alternative splicing results in multiple transcript variants. zinc finger protein 273 10793
LONRF1 ENSG00000154359 NA LON peptidase N-terminal domain and ring finger 1 91694
MYCL ENSG00000116990 NA v-myc avian myelocytomatosis viral oncogene lung carcinoma derived homolog 4610
ATP6V1C2 ENSG00000143882 This gene encodes a component of vacuolar ATPase (V-ATPase), a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, receptor-mediated endocytosis, and synaptic vesicle proton gradient generation. V-ATPase is composed of a cytosolic V1 domain and a transmembrane V0 domain. The V1 domain consists of three A, three B, and two G subunits, as well as a C, D, E, F, and H subunit. The V1 domain contains the ATP catalytic site. This gene encodes alternate transcriptional splice variants, encoding different V1 domain C subunit isoforms. ATPase H+ transporting V1 subunit C2 245973
LOC101929777 ENSG00000108379 NA uncharacterized LOC101929777 101929777
WNT3 ENSG00000108379 The WNT gene family consists of structurally related genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. This gene is a member of the WNT gene family. It encodes a protein which shows 98% amino acid identity to mouse Wnt3 protein, and 84% to human WNT3A protein, another WNT gene product. The mouse studies show the requirement of Wnt3 in primary axis formation in the mouse. Studies of the gene expression suggest that this gene may play a key role in some cases of human breast, rectal, lung, and gastric cancer through activation of the WNT-beta-catenin-TCF signaling pathway. This gene is clustered with WNT15, another family member, in the chromosome 17q21 region. Wnt family member 3 7473
BICD2 ENSG00000185963 This gene is one of two human homologs of Drosophila bicaudal-D and a member of the Bicoid family. It has been implicated in dynein-mediated, minus end-directed motility along microtubules. It has also been reported to be a phosphorylation target of NIMA related kinase 8. Two alternative splice variants have been described. BICD cargo adaptor 2 23299
IL20RB ENSG00000174564 IL20RB and IL20RA (MIM 605620) form a heterodimeric receptor for interleukin-20 (IL20; MIM 605619) (Blumberg et al., 2001 [PubMed 11163236]). interleukin 20 receptor subunit beta 53833
IRF6 ENSG00000117595 This gene encodes a member of the interferon regulatory transcription factor (IRF) family. Family members share a highly-conserved N-terminal helix-turn-helix DNA-binding domain and a less conserved C-terminal protein-binding domain. The encoded protein may be a transcriptional activator. Mutations in this gene can cause van der Woude syndrome and popliteal pterygium syndrome. Mutations in this gene are also associated with non-syndromic orofacial cleft type 6. Alternate splicing results in multiple transcript variants. interferon regulatory factor 6 3664
TRIM35 ENSG00000104228 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The function of this protein has not been identified. tripartite motif containing 35 23087
METRNL ENSG00000176845 NA meteorin, glial cell differentiation regulator-like 284207
VANGL2 ENSG00000162738 The protein encoded by this gene is a membrane protein involved in the regulation of planar cell polarity, especially in the stereociliary bundles of the cochlea. The encoded protein transmits directional signals to individual cells or groups of cells in epithelial sheets. This protein is also involved in the development of the neural plate. VANGL planar cell polarity protein 2 57216
ACVR1B ENSG00000135503 This gene encodes an activin A type IB receptor. Activins are dimeric growth and differentiation factors which belong to the transforming growth factor-beta (TGF-beta) superfamily of structurally related signaling proteins. Activins signal through a heteromeric complex of receptor serine kinases which include at least two type I and two type II receptors. This protein is a type I receptor which is essential for signaling. Mutations in this gene are associated with pituitary tumors. Alternate splicing results in multiple transcript variants. activin A receptor type 1B 91
GJA1 ENSG00000152661 This gene is a member of the connexin gene family. The encoded protein is a component of gap junctions, which are composed of arrays of intercellular channels that provide a route for the diffusion of low molecular weight materials from cell to cell. The encoded protein is the major protein of gap junctions in the heart that are thought to have a crucial role in the synchronized contraction of the heart and in embryonic development. A related intronless pseudogene has been mapped to chromosome 5. Mutations in this gene have been associated with oculodentodigital dysplasia, autosomal recessive craniometaphyseal dysplasia and heart malformations. gap junction protein alpha 1 2697
MAFB ENSG00000204103 The protein encoded by this gene is a basic leucine zipper (bZIP) transcription factor that plays an important role in the regulation of lineage-specific hematopoiesis. The encoded nuclear protein represses ETS1-mediated transcription of erythroid-specific genes in myeloid cells. This gene contains no introns. v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog B 9935
ACAD9 ENSG00000177646 This gene encodes a member of the acyl-CoA dehydrogenase family. Members of this family of proteins localize to the mitochondria and catalyze the rate-limiting step in the beta-oxidation of fatty acyl-CoA. The encoded protein is specifically active toward palmitoyl-CoA and long-chain unsaturated substrates. Mutations in this gene cause acyl-CoA dehydrogenase family member type 9 deficiency. Alternate splicing results in multiple transcript variants. acyl-CoA dehydrogenase family member 9 28976
DNASE1L2 ENSG00000167968 NA deoxyribonuclease I-like 2 1775
ELOVL4 ENSG00000118402 This gene encodes a membrane-bound protein which is a member of the ELO family, proteins which participate in the biosynthesis of fatty acids. Consistent with the expression of the encoded protein in photoreceptor cells of the retina, mutations and small deletions in this gene are associated with Stargardt-like macular dystrophy (STGD3) and autosomal dominant Stargardt-like macular dystrophy (ADMD), also referred to as autosomal dominant atrophic macular degeneration. ELOVL fatty acid elongase 4 6785
RP5-1126H10.2 ENSG00000272084 NA NA ENSG00000272084
RALBP1 ENSG00000017797 RALBP1 plays a role in receptor-mediated endocytosis and is a downstream effector of the small GTP-binding protein RAL (see RALA; MIM 179550). Small G proteins, such as RAL, have GDP-bound inactive and GTP-bound active forms, which shift from the inactive to the active state through the action of RALGDS (MIM 601619), which in turn is activated by RAS (see HRAS; MIM 190020) (summary by Feig, 2003 [PubMed 12888294]). ralA binding protein 1 10928
FAM110A ENSG00000125898 NA family with sequence similarity 110 member A 83541
RP11-84A14.5 ENSG00000223989 NA NA ENSG00000223989
HES1 ENSG00000114315 This protein belongs to the basic helix-loop-helix family of transcription factors. It is a transcriptional repressor of genes that require a bHLH protein for their transcription. The protein has a particular type of basic domain that contains a helix interrupting protein that binds to the N-box rather than the canonical E-box. hes family bHLH transcription factor 1 3280
SPPL3 ENSG00000157837 NA signal peptide peptidase like 3 121665
PLCH2 ENSG00000149527 PLCH2 is a member of the PLC-eta family of the phosphoinositide-specific phospholipase C (PLC) superfamily of enzymes that cleave PtdIns(4,5)P2 to generate second messengers inositol 1,4,5-trisphosphate and diacylglycerol (Zhou et al., 2005 [PubMed 16107206]). phospholipase C eta 2 9651
WEE1 ENSG00000166483 This gene encodes a nuclear protein, which is a tyrosine kinase belonging to the Ser/Thr family of protein kinases. This protein catalyzes the inhibitory tyrosine phosphorylation of CDC2/cyclin B kinase, and appears to coordinate the transition between DNA replication and mitosis by protecting the nucleus from cytoplasmically activated CDC2 kinase. WEE1 G2 checkpoint kinase 7465
RNH1 ENSG00000023191 Placental ribonuclease inhibitor (PRI) is a member of a family of proteinaceous cytoplasmic RNase inhibitors that occur in many tissues and bind to both intracellular and extracellular RNases (summarized by Lee et al., 1988 [PubMed 3219362]). In addition to control of intracellular RNases, the inhibitor may have a role in the regulation of angiogenin (MIM 105850). Ribonuclease inhibitor, of 50,000 Da, binds to ribonucleases and holds them in a latent form. Since neutral and alkaline ribonucleases probably play a critical role in the turnover of RNA in eukaryotic cells, RNH may be essential for control of mRNA turnover; the interaction of eukaryotic cells with ribonuclease may be reversible in vivo. ribonuclease/angiogenin inhibitor 1 6050
PARD6G ENSG00000178184 NA par-6 family cell polarity regulator gamma 84552
BNIPL ENSG00000163141 The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. BCL2/adenovirus E1B 19kD interacting protein like 149428
GAL3ST4 ENSG00000197093 This gene encodes a member of the galactose-3-O-sulfotransferase protein family. The product of this gene catalyzes sulfonation by transferring a sulfate to the C-3’ position of galactose residues in O-linked glycoproteins. This enzyme is highly specific for core 1 structures, with asialofetuin, Gal-beta-1,3-GalNAc and Gal-beta-1,3 (GlcNAc-beta-1,6)GalNAc being good substrates. galactose-3-O-sulfotransferase 4 79690
S100A3 ENSG00000188015 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein has the highest content of cysteines of all S100 proteins, has a high affinity for Zinc, and is highly expressed in human hair cuticle. The precise function of this protein is unknown. S100 calcium binding protein A3 6274
RAB40C ENSG00000197562 NA RAB40C, member RAS oncogene family 57799
SERPINB8 ENSG00000166401 The superfamily of high molecular weight serine proteinase inhibitors (serpins) regulate a diverse set of intracellular and extracellular processes such as complement activation, fibrinolysis, coagulation, cellular differentiation, tumor suppression, apoptosis, and cell migration. Serpins are characterized by a well-conserved tertiary structure that consists of 3 beta sheets and 8 or 9 alpha helices (Huber and Carrell, 1989 [PubMed 2690952]). A critical portion of the molecule, the reactive center loop, connects beta sheets A and C. Protease inhibitor-8 (PI8; SERPINB8) is a member of the ov-serpin subfamily, which, relative to the archetypal serpin PI1 (MIM 107400), is characterized by a high degree of homology to chicken ovalbumin, lack of N- and C-terminal extensions, absence of a signal peptide, and a serine rather than an asparagine residue at the penultimate position (summary by Bartuski et al., 1997 [PubMed 9268635]). serpin family B member 8 5271
RPS6KB2 ENSG00000175634 This gene encodes a member of the RSK (ribosomal S6 kinase) family of serine/threonine kinases. This kinase contains a kinase catalytic domain and phosphorylates the S6 ribosomal protein and eukaryotic translation initiation factor 4B (eIF4B). Phosphorylation of S6 leads to an increase in protein synthesis and cell proliferation. ribosomal protein S6 kinase B2 6199
ADAM15 ENSG00000143537 The protein encoded by this gene is a member of the ADAM (a disintegrin and metalloproteinase) protein family. ADAM family members are type I transmembrane glycoproteins known to be involved in cell adhesion and proteolytic ectodomain processing of cytokines and adhesion molecules. This protein contains multiple functional domains including a zinc-binding metalloprotease domain, a disintegrin-like domain, as well as a EGF-like domain. Through its disintegrin-like domain, this protein specifically interacts with the integrin beta chain, beta 3. It also interacts with Src family protein-tyrosine kinases in a phosphorylation-dependent manner, suggesting that this protein may function in cell-cell adhesion as well as in cellular signaling. Multiple alternatively spliced transcript variants encoding distinct isoforms have been observed. ADAM metallopeptidase domain 15 8751
FAM46B ENSG00000158246 NA family with sequence similarity 46 member B 115572
VPS13D ENSG00000048707 This gene encodes a protein belonging to the vacuolar-protein-sorting-13 gene family. In yeast, vacuolar-protein-sorting-13 proteins are involved in trafficking of membrane proteins between the trans-Golgi network and the prevacuolar compartment. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode distinct isoforms. vacuolar protein sorting 13 homolog D 55187
JAG2 ENSG00000184916 The Notch signaling pathway is an intercellular signaling mechanism that is essential for proper embryonic development. Members of the Notch gene family encode transmembrane receptors that are critical for various cell fate decisions. The protein encoded by this gene is one of several ligands that activate Notch and related receptors. Two transcript variants encoding different isoforms have been found for this gene. jagged 2 3714
DEGS2 ENSG00000168350 This gene encodes a bifunctional enzyme that is involved in the biosynthesis of phytosphingolipids in human skin and in other phytosphingolipid-containing tissues. This enzyme can act as a sphingolipid delta(4)-desaturase, and also as a sphingolipid C4-hydroxylase. delta(4)-desaturase, sphingolipid 2 123099
PLEKHG5 ENSG00000171680 This gene encodes a protein that activates the nuclear factor kappa B (NFKB1) signaling pathway. Mutations in this gene are associated with autosomal recessive distal spinal muscular atrophy. Multiple transcript variants encoding different isoforms have been found for this gene. pleckstrin homology and RhoGEF domain containing G5 57449
# Save the Ensembl IDs queried for cluster 6, one per line.
write.table(out$query, paste0("../utilities/gene_names_clus_", 6, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
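
The query results can contain more than one row for a single Ensembl ID; in the Cluster 7 table, for example, ENSG00000112096 is listed with both LOC100129518 and SOD2. The following is a minimal sketch, not part of the original script, of one way to collapse such rows before saving the IDs. The helper `collapse_hits`, the toy data frame, and the temporary output file are all hypothetical and use base R only; the real `out` object is assumed to have `query` and `symbol` columns, as in the tables shown here.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# collapse annotation rows so that each queried Ensembl ID appears once.
collapse_hits <- function(hits) {
  hits <- as.data.frame(hits, stringsAsFactors = FALSE)
  # Join all distinct symbols reported for the same query ID.
  symbols <- tapply(hits$symbol, hits$query,
                    function(s) paste(unique(s), collapse = ";"))
  ids <- unique(hits$query)  # keep the order in which IDs were queried
  data.frame(query = ids,
             symbol = as.vector(symbols[ids]),
             stringsAsFactors = FALSE)
}

# Toy input mimicking three rows of the Cluster 7 table, where one
# Ensembl ID is reported with two symbols.
toy <- data.frame(
  query  = c("ENSG00000183091", "ENSG00000112096", "ENSG00000112096"),
  symbol = c("NEB", "LOC100129518", "SOD2"),
  stringsAsFactors = FALSE
)
collapsed <- collapse_hits(toy)
print(collapsed)

# Write the unique IDs, one per line, in the same format the script uses
# for each cluster (a temporary file is used here for illustration).
out_file <- file.path(tempdir(), "gene_names_clus_example.txt")
write.table(collapsed$query, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether duplicate IDs should be collapsed depends on how the saved gene lists are used downstream; if each line is meant to be one annotation hit rather than one gene, the original uncollapsed output may be what is wanted.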

Cluster 7 Annotations

out <- mygene::queryMany(gene_list[7,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary
NEB 4703 nebulin ENSG00000183091 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy.
MYH1 4619 myosin, heavy chain 1, skeletal muscle, adult ENSG00000109061 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development.
MYH2 4620 myosin, heavy chain 2, skeletal muscle, adult ENSG00000125414 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified.
MYBPC1 4604 myosin binding protein C, slow type ENSG00000196091 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
ACTA1 58 actin, alpha 1, skeletal muscle ENSG00000143632 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects.
MYL1 4632 myosin light chain 1 ENSG00000168530 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene.
TNNC2 7125 troponin C2, fast skeletal type ENSG00000101470 Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit.
TNNT1 7138 troponin T1, slow skeletal type ENSG00000105048 This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene.
CKM 1158 creatine kinase, M-type ENSG00000104879 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family.
ATP2A1 487 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1 ENSG00000196296 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms.
PYGM 5837 phosphorylase, glycogen, muscle ENSG00000068976 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants.
TTN 7273 titin ENSG00000155657 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma.
MYBPC2 4606 myosin binding protein C, fast type ENSG00000086967 This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively.
MYLPF 29895 myosin light chain, phosphorylatable, fast skeletal muscle ENSG00000180209 NA
TNNT3 7140 troponin T3, fast skeletal type ENSG00000130595 The binding of Ca(2+) to the trimeric troponin complex initiates the process of muscle contraction. Increased Ca(2+) concentrations produce a conformational change in the troponin complex that is transmitted to tropomyosin dimers situated along actin filaments. The altered conformation permits increased interaction between a myosin head and an actin filament which, ultimately, produces a muscle contraction. The troponin complex has protein subunits C, I, and T. Subunit C binds Ca(2+) and subunit I binds to actin and inhibits actin-myosin interaction. Subunit T binds the troponin complex to the tropomyosin complex and is also required for Ca(2+)-mediated activation of actomyosin ATPase activity. There are 3 different troponin T genes that encode tissue-specific isoforms of subunit T for fast skeletal-, slow skeletal-, and cardiac-muscle. This gene encodes fast skeletal troponin T protein; also known as troponin T type 3. Alternative splicing results in multiple transcript variants encoding additional distinct troponin T type 3 isoforms. A developmentally regulated switch between fetal/neonatal and adult troponin T type 3 isoforms occurs. Additional splice variants have been described but their biological validity has not been established. Mutations in this gene may cause distal arthrogryposis multiplex congenita type 2B (DA2B).
TNNI1 7135 troponin I1, slow skeletal type ENSG00000159173 Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes.
RYR1 6261 ryanodine receptor 1 ENSG00000196218 This gene encodes a ryanodine receptor found in skeletal muscle. The encoded protein functions as a calcium release channel in the sarcoplasmic reticulum but also serves to connect the sarcoplasmic reticulum and transverse tubule. Mutations in this gene are associated with malignant hyperthermia susceptibility, central core disease, and minicore myopathy with external ophthalmoplegia. Alternatively spliced transcripts encoding different isoforms have been described.
CA3 761 carbonic anhydrase 3 ENSG00000164879 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscle dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns.
KLHL41 10324 kelch like family member 41 ENSG00000239474 This gene is a member of the kelch-like family. The encoded protein contains a BACK domain, a BTB/POZ domain, and 5 Kelch repeats. This protein is thought to function in skeletal muscle development and maintenance. Mutations in this gene have been associated with nemaline myopathy (NM), a rare congenital muscle disorder.
TNNI2 7136 troponin I2, fast skeletal type ENSG00000130598 This gene encodes a fast-twitch skeletal muscle protein, a member of the troponin I gene family, and a component of the troponin complex including troponin T, troponin C and troponin I subunits. The troponin complex, along with tropomyosin, is responsible for the calcium-dependent regulation of striated muscle contraction. Mouse studies show that this component is also present in vascular smooth muscle and may play a role in regulation of smooth muscle function. In addition to muscle tissues, this protein is found in corneal epithelium, cartilage where it is an inhibitor of angiogenesis to inhibit tumor growth and metastasis, and mammary gland where it functions as a co-activator of estrogen receptor-related receptor alpha. This protein also suppresses tumor growth in human ovarian carcinoma. Mutations in this gene cause myopathy and distal arthrogryposis type 2B. Alternatively spliced transcript variants have been found for this gene.
YBX3 8531 Y-box binding protein 3 ENSG00000060138 NA
MYOZ1 58529 myozenin 1 ENSG00000177791 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling.
LOC100129518 100129518 uncharacterized LOC100129518 ENSG00000112096 NA
SOD2 6648 superoxide dismutase 2, mitochondrial ENSG00000112096 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternate transcriptional splice variants, encoding different isoforms, have been characterized.
ENO3 2027 enolase 3 (beta, muscle) ENSG00000108515 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme is found in skeletal muscle cells in the adult where it may play a role in muscle development and regeneration. A switch from alpha enolase to beta enolase occurs in muscle tissue during development in rodents. Mutations in this gene have been associated with glycogen storage disease. Alternatively spliced transcript variants encoding different isoforms have been described.
STAC3 246329 SH3 and cysteine rich domain 3 ENSG00000185482 The protein encoded by this gene is a component of the excitation-contraction coupling machinery of muscles. This protein is a member of the Stac gene family and contains an N-terminal cysteine-rich domain and two SH3 domains. Mutations in this gene are a cause of Native American myopathy.
MYOT 9499 myotilin ENSG00000120729 This gene encodes a cytoskeletal protein which plays a significant role in the stability of thin filaments during muscle contraction. This protein binds F-actin, crosslinks actin filaments, and prevents latrunculin A-induced filament disassembly. Mutations in this gene have been associated with limb-girdle muscular dystrophy and myofibrillar myopathies. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined.
CASQ1 844 calsequestrin 1 ENSG00000143318 This gene encodes the skeletal muscle specific member of the calsequestrin protein family. Calsequestrin functions as a luminal sarcoplasmic reticulum calcium sensor in both cardiac and skeletal muscle cells. This protein, also known as calmitine, functions as a calcium regulator in the mitochondria of skeletal muscle. This protein is absent in patients with Duchenne and Becker types of muscular dystrophy.
TTN-AS1 100506866 TTN antisense RNA 1 ENSG00000237298 NA
LOC101927055 101927055 uncharacterized LOC101927055 ENSG00000237298 NA
CMYA5 202333 cardiomyopathy associated 5 ENSG00000164309 NA
OBSCN 84033 obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF ENSG00000154358 The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified.
TPM3 7170 tropomyosin 3 ENSG00000143549 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants.
ALDOA 226 aldolase, fructose-bisphosphate A ENSG00000149925 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10.
FHL3 2275 four and a half LIM domains 3 ENSG00000183386 The protein encoded by this gene is a member of a family of proteins containing a four-and-a-half LIM domain, which is a highly conserved double zinc finger motif. The encoded protein has been shown to interact with the cancer developmental regulators SMAD2, SMAD3, and SMAD4, the skeletal muscle myogenesis protein MyoD, and the high-affinity IgE beta chain regulator MZF-1. This protein may be involved in tumor suppression, repression of MyoD expression, and repression of IgE receptor expression. Two transcript variants encoding different isoforms have been found for this gene.
LOC100507537 100507537 uncharacterized LOC100507537 ENSG00000240045 NA
TRIM63 84676 tripartite motif containing 63 ENSG00000158022 This gene encodes a member of the RING zinc finger protein family found in striated muscle and iris. The product of this gene is an E3 ubiquitin ligase that localizes to the Z-line and M-line lattices of myofibrils. This protein plays an important role in the atrophy of skeletal and cardiac muscle and is required for the degradation of myosin heavy chain proteins, myosin light chain, myosin binding protein, and for muscle-type creatine kinase.
TMOD4 29765 tropomodulin 4 ENSG00000163157 NA
NRAP 4892 nebulin related anchoring protein ENSG00000197893 NA
PPP1R27 116729 protein phosphatase 1 regulatory subunit 27 ENSG00000182676 NA
PDK4 5166 pyruvate dehydrogenase kinase 4 ENSG00000004799 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin.
PDLIM3 27295 PDZ and LIM domain 3 ENSG00000154553 The protein encoded by this gene contains a PDZ domain and a LIM domain, indicating that it may be involved in cytoskeletal assembly. In support of this, the encoded protein has been shown to bind the spectrin-like repeats of alpha-actinin-2 and to colocalize with alpha-actinin-2 at the Z lines of skeletal muscle. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. Aberrant alternative splicing of this gene may play a role in myotonic dystrophy.
ADCK3 56997 aarF domain containing kinase 3 ENSG00000163050 This gene encodes a mitochondrial protein similar to yeast ABC1, which functions in an electron-transferring membrane protein complex in the respiratory chain. It is not related to the family of ABC transporter proteins. Expression of this gene is induced by the tumor suppressor p53 and in response to DNA damage, and inhibiting its expression partially suppresses p53-induced apoptosis. Alternatively spliced transcript variants have been found; however, their full-length nature has not been determined.
BIN1 274 bridging integrator 1 ENSG00000136717 This gene encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor. Isoforms that are expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. Isoforms that are expressed in muscle and ubiquitously expressed isoforms localize to the cytoplasm and nucleus and activate a caspase-independent apoptotic process. Studies in mouse suggest that this gene plays an important role in cardiac muscle development. Alternate splicing of the gene results in several transcript variants encoding different isoforms. Aberrant splice variants expressed in tumor cell lines have also been described.
OPTN 10133 optineurin ENSG00000123240 This gene encodes the coiled-coil containing protein optineurin. Optineurin may play a role in normal-tension glaucoma and adult-onset primary open angle glaucoma. Optineurin interacts with adenovirus E3-14.7K protein and may utilize tumor necrosis factor-alpha or Fas-ligand pathways to mediate apoptosis, inflammation or vasoconstriction. Optineurin may also function in cellular morphogenesis and membrane trafficking, vesicle trafficking, and transcription activation through its interactions with the RAB8, huntingtin, and transcription factor IIIA proteins. Alternative splicing results in multiple transcript variants encoding the same protein.
TPT1 7178 tumor protein, translationally-controlled 1 ENSG00000133112 NA
UCP3 7352 uncoupling protein 3 (mitochondrial, proton carrier) ENSG00000175564 Mitochondrial uncoupling proteins (UCP) are members of the larger family of mitochondrial anion carrier proteins (MACP). UCPs separate oxidative phosphorylation from ATP synthesis with energy dissipated as heat, also referred to as the mitochondrial proton leak. UCPs facilitate the transfer of anions from the inner to the outer mitochondrial membrane and the return transfer of protons from the outer to the inner mitochondrial membrane. They also reduce the mitochondrial membrane potential in mammalian cells. The different UCPs have tissue-specific expression; this gene is primarily expressed in skeletal muscle. This gene’s protein product is postulated to protect mitochondria against lipid-induced oxidative stress. Expression levels of this gene increase when fatty acid supplies to mitochondria exceed their oxidation capacity and the protein enables the export of fatty acids from mitochondria. UCPs contain the three solcar protein domains typically found in MACPs. Two splice variants have been found for this gene.
FBXO32 114907 F-box protein 32 ENSG00000156804 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and contains an F-box domain. This protein is highly expressed during muscle atrophy, whereas mice deficient in this gene were found to be resistant to atrophy. This protein is thus a potential drug target for the treatment of muscle atrophy. Alternative splicing results in multiple transcript variants encoding different isoforms.
SMTNL1 219537 smoothelin-like 1 ENSG00000214872 SMTNL1, which is a member of the smoothelin (SMTN; MIM 602127) family, regulates contraction and relaxation of skeletal and smooth muscle fibers and mediates vascular adaptation to exercise (Wooldridge et al., 2008 [PubMed 18310078]).
CA3-AS1 100996348 CA3 antisense RNA 1 ENSG00000253549 NA
FEM1A 55527 fem-1 homolog A ENSG00000141965 NA
PCNT 5116 pericentrin ENSG00000160299 The protein encoded by this gene binds to calmodulin and is expressed in the centrosome. It is an integral component of the pericentriolar material (PCM). The protein contains a series of coiled-coil domains and a highly conserved PCM targeting motif called the PACT domain near its C-terminus. The protein interacts with the microtubule nucleation component gamma-tubulin and is likely important to normal functioning of the centrosomes, cytoskeleton, and cell-cycle progression. Mutations in this gene cause Seckel syndrome-4 and microcephalic osteodysplastic primordial dwarfism type II. Two transcript variants encoding different isoforms have been found for this gene.
AC005523.2 ENSG00000269604 NA ENSG00000269604 NA
MKNK2 2872 MAP kinase interacting serine/threonine kinase 2 ENSG00000099875 This gene encodes a member of the calcium/calmodulin-dependent protein kinases (CAMK) Ser/Thr protein kinase family, which belongs to the protein kinase superfamily. This protein contains conserved DLG (asp-leu-gly) and ENIL (glu-asn-ile-leu) motifs, and an N-terminal polybasic region which binds importin A and the translation factor scaffold protein eukaryotic initiation factor 4G (eIF4G). This protein is one of the downstream kinases activated by mitogen-activated protein (MAP) kinases. It phosphorylates the eukaryotic initiation factor 4E (eIF4E), thus playing important roles in the initiation of mRNA translation, oncogenic transformation and malignant cell proliferation. In addition to eIF4E, this protein also interacts with von Hippel-Lindau tumor suppressor (VHL), ring-box 1 (Rbx1) and Cullin2 (Cul2), which are all components of the CBC(VHL) ubiquitin ligase E3 complex. Multiple alternatively spliced transcript variants have been found, but the full-length nature and biological activity of only two variants are determined. These two variants encode distinct isoforms which differ in activity and regulation, and in subcellular localization.
PFKM 5213 phosphofructokinase, muscle ENSG00000152556 Three phosphofructokinase isozymes exist in humans: muscle, liver and platelet. These isozymes function as subunits of the mammalian tetramer phosphofructokinase, which catalyzes the phosphorylation of fructose-6-phosphate to fructose-1,6-bisphosphate. Tetramer composition varies depending on tissue type. This gene encodes the muscle-type isozyme. Mutations in this gene have been associated with glycogen storage disease type VII, also known as Tarui disease. Alternatively spliced transcript variants have been described.
EGLN1 54583 egl-9 family hypoxia inducible factor 1 ENSG00000135766 The protein encoded by this gene catalyzes the post-translational formation of 4-hydroxyproline in hypoxia-inducible factor (HIF) alpha proteins. HIF is a transcriptional complex that plays a central role in mammalian oxygen homeostasis. This protein functions as a cellular oxygen sensor, and under normal oxygen concentration, modification by prolyl hydroxylation is a key regulatory event that targets HIF subunits for proteasomal destruction via the von Hippel-Lindau ubiquitylation complex. Mutations in this gene are associated with erythrocytosis familial type 3 (ECYT3).
DDIT4L 115265 DNA damage inducible transcript 4 like ENSG00000145358 NA
SLN 6588 sarcolipin ENSG00000170290 Sarcoplasmic reticulum Ca(2+)-ATPases are transmembrane proteins that catalyze the ATP-dependent transport of Ca(2+) from the cytosol into the lumen of the sarcoplasmic reticulum in muscle cells. This gene encodes a small proteolipid that regulates several sarcoplasmic reticulum Ca(2+)-ATPases. The transmembrane protein interacts with Ca(2+)-ATPases and reduces the accumulation of Ca(2+) in the sarcoplasmic reticulum without affecting the rate of ATP hydrolysis.
GAPDH 2597 glyceraldehyde-3-phosphate dehydrogenase ENSG00000111640 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants.
NFE2L1 4779 nuclear factor, erythroid 2 like 1 ENSG00000082641 This gene encodes a protein that is involved in globin gene expression in erythrocytes. Confusion has occurred in bibliographic databases due to the shared symbol of NRF1 for this gene, NFE2L1, and for ‘nuclear respiratory factor 1’ which has an official symbol of NRF1.
JPH1 56704 junctophilin 1 ENSG00000104369 Junctional complexes between the plasma membrane and endoplasmic/sarcoplasmic reticulum are a common feature of all excitable cell types and mediate cross talk between cell surface and intracellular ion channels. The protein encoded by this gene is a component of junctional complexes and is composed of a C-terminal hydrophobic segment spanning the endoplasmic/sarcoplasmic reticulum membrane and a remaining cytoplasmic domain that shows specific affinity for the plasma membrane. This gene is a member of the junctophilin gene family.
CSDE1 7812 cold shock domain containing E1 ENSG00000009307 NA
PGM1 5236 phosphoglucomutase 1 ENSG00000079739 The protein encoded by this gene is an isozyme of phosphoglucomutase (PGM) and belongs to the phosphohexose mutase family. There are several PGM isozymes, which are encoded by different genes and catalyze the transfer of phosphate between the 1 and 6 positions of glucose. In most cell types, this PGM isozyme is predominant, representing about 90% of total PGM activity. In red cells, PGM2 is a major isozyme. This gene is highly polymorphic. Mutations in this gene cause glycogen storage disease type 14. Alternatively spliced transcript variants encoding different isoforms have been identified in this gene.
RAD23A 5886 RAD23 homolog A, nucleotide excision repair protein ENSG00000179262 The protein encoded by this gene is one of two human homologs of Saccharomyces cerevisiae Rad23, a protein involved in nucleotide excision repair. Proteins in this family have a modular domain structure consisting of an ubiquitin-like domain (UbL), ubiquitin-associated domain 1 (UbA1), XPC-binding domain and UbA2. The protein encoded by this gene plays an important role in nucleotide excision repair and also in delivery of polyubiquitinated proteins to the proteasome. Alternative splicing results in multiple transcript variants encoding multiple isoforms.
SYPL2 284612 synaptophysin like 2 ENSG00000143028 NA
H19 283120 H19, imprinted maternally expressed transcript (non-protein coding) ENSG00000130600 This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants.
EEF1A2 1917 eukaryotic translation elongation factor 1 alpha 2 ENSG00000101210 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer.
PDE4DIP 9659 phosphodiesterase 4D interacting protein ENSG00000178104 The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene.
TMEM38A 79041 transmembrane protein 38A ENSG00000072954 NA
CHMP1B 57132 charged multivesicular body protein 1B ENSG00000255112 CHMP1B belongs to the chromatin-modifying protein/charged multivesicular body protein (CHMP) family. These proteins are components of ESCRT-III (endosomal sorting complex required for transport III), a complex involved in degradation of surface receptor proteins and formation of endocytic multivesicular bodies (MVBs). Some CHMPs have both nuclear and cytoplasmic/vesicular distributions, and one such CHMP, CHMP1A (MIM 164010), is required for both MVB formation and regulation of cell cycle progression (Tsang et al., 2006 [PubMed 16730941]).
FLNC 2318 filamin C ENSG00000128591 This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene.
KIAA0368 23392 KIAA0368 ENSG00000136813 NA
RP11-381K20.2 ENSG00000250159 NA ENSG00000250159 NA
USP13 8975 ubiquitin specific peptidase 13 (isopeptidase T-3) ENSG00000058056 NA
DCAF6 55827 DDB1 and CUL4 associated factor 6 ENSG00000143164 NA
ASB2 51676 ankyrin repeat and SOCS box containing 2 ENSG00000100628 This gene encodes a member of the ankyrin repeat and SOCS box-containing (ASB) protein family. These proteins play a role in protein degradation by coupling suppressor of cytokine signalling (SOCS) proteins with the elongin BC complex. The encoded protein is a subunit of a multimeric E3 ubiquitin ligase complex that mediates the degradation of actin-binding proteins. This gene plays a role in retinoic acid-induced growth inhibition and differentiation of myeloid leukemia cells. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
CORO6 84940 coronin 6 ENSG00000167549 NA
ADSSL1 122622 adenylosuccinate synthase like 1 ENSG00000185100 This gene encodes a member of the adenylosuccinate synthase family of proteins. The encoded muscle-specific enzyme plays a role in the purine nucleotide cycle by catalyzing the first step in the conversion of inosine monophosphate (IMP) to adenosine monophosphate (AMP). Mutations in this gene may cause adolescent onset distal myopathy. Alternative splicing results in multiple transcript variants.
ABLIM2 84448 actin binding LIM protein family member 2 ENSG00000163995 NA
RP11-290D2.6 ENSG00000273149 NA ENSG00000273149 NA
CLIP1 6249 CAP-Gly domain containing linker protein 1 ENSG00000130779 The protein encoded by this gene links endocytic vesicles to microtubules. This gene is highly expressed in Reed-Sternberg cells of Hodgkin disease. Several transcript variants encoding different isoforms have been found for this gene.
USO1 8615 USO1 vesicle transport factor ENSG00000138768 The protein encoded by this gene is a peripheral membrane protein which recycles between the cytosol and the Golgi apparatus during interphase. It is regulated by phosphorylation: dephosphorylated protein associates with the Golgi membrane and dissociates from the membrane upon phosphorylation. Ras-associated protein 1 recruits this protein to coat protein complex II (COPII) vesicles during budding from the endoplasmic reticulum, where it interacts with a set of COPII vesicle-associated SNAREs to form a cis-SNARE complex that promotes targeting to the Golgi apparatus. Alternative splicing results in multiple transcript variants.
ATP2A1-AS1 100289092 ATP2A1 antisense RNA 1 ENSG00000260442 NA
CAPN3 825 calpain 3 ENSG00000092529 Calpain, a heterodimer consisting of a large and a small subunit, is a major intracellular protease, although its function has not been well established. This gene encodes a muscle-specific member of the calpain large subunit family that specifically binds to titin. Mutations in this gene are associated with limb-girdle muscular dystrophies type 2A. Alternate promoters and alternative splicing result in multiple transcript variants encoding different isoforms and some variants are ubiquitously expressed.
CHRNB1 1140 cholinergic receptor nicotinic beta 1 subunit ENSG00000170175 The muscle acetylcholine receptor is composed of five subunits: two alpha subunits and one beta, one gamma, and one delta subunit. This gene encodes the beta subunit of the acetylcholine receptor. The acetylcholine receptor changes conformation upon acetylcholine binding leading to the opening of an ion-conducting channel across the plasma membrane. Mutations in this gene are associated with slow-channel congenital myasthenic syndrome.
RP11-164J13.1 ENSG00000258461 NA ENSG00000258461 NA
SVIL 6840 supervillin ENSG00000197321 This gene encodes a bipartite protein with distinct amino- and carboxy-terminal domains. The amino-terminus contains nuclear localization signals and the carboxy-terminus contains numerous consecutive sequences with extensive similarity to proteins in the gelsolin family of actin-binding proteins, which cap, nucleate, and/or sever actin filaments. The gene product is tightly associated with both actin filaments and plasma membranes, suggesting a role as a high-affinity link between the actin cytoskeleton and the membrane. The encoded protein appears to aid in both myosin II assembly during cell spreading and disassembly of focal adhesions. Two transcript variants encoding different isoforms of supervillin have been described.
CNBP 7555 CCHC-type zinc finger nucleic acid binding protein ENSG00000169714 This gene encodes a nucleic-acid binding protein with seven zinc-finger domains. The protein has a preference for binding single stranded DNA and RNA. The protein functions in cap-independent translation of ornithine decarboxylase mRNA, and may also function in sterol-mediated transcriptional regulation. A CCTG expansion in the first intron of this gene results in myotonic dystrophy type 2. Multiple transcript variants encoding different isoforms have been found for this gene.
HSPB6 126393 heat shock protein family B (small) member 6 ENSG00000004776 This locus encodes a heat shock protein. The encoded protein likely plays a role in smooth muscle relaxation.
ACTN2 88 actinin alpha 2 ENSG00000077522 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene.
CLTCL1 8218 clathrin heavy chain like 1 ENSG00000070371 This gene is a member of the clathrin heavy chain family and encodes a major protein of the polyhedral coat of coated pits and vesicles. Chromosomal aberrations involving this gene are associated with meningioma, DiGeorge syndrome, and velo-cardio-facial syndrome. Multiple transcript variants encoding different isoforms have been found for this gene.
MAPK12 6300 mitogen-activated protein kinase 12 ENSG00000188130 Activation of members of the mitogen-activated protein kinase family is a major mechanism for transduction of extracellular signals. Stress-activated protein kinases are one subclass of MAP kinases. The protein encoded by this gene functions as a signal transducer during differentiation of myoblasts to myotubes.
GLRX 2745 glutaredoxin ENSG00000173221 This gene encodes a member of the glutaredoxin family. The encoded protein is a cytoplasmic enzyme catalyzing the reversible reduction of glutathione-protein mixed disulfides. This enzyme highly contributes to the antioxidant defense system. It is crucial for several signalling pathways by controlling the S-glutathionylation status of signalling mediators. It is involved in beta-amyloid toxicity and Alzheimer’s disease. Multiple alternatively spliced transcript variants encoding the same protein have been identified.
RP5-940J5.9 ENSG00000269968 NA ENSG00000269968 NA
MSS51 118490 MSS51 mitochondrial translational activator ENSG00000166343 NA
FXR1 8087 FMR1 autosomal homolog 1 ENSG00000114416 The protein encoded by this gene is an RNA binding protein that interacts with the functionally-similar proteins FMR1 and FXR2. These proteins shuttle between the nucleus and cytoplasm and associate with polyribosomes, predominantly with the 60S ribosomal subunit. Three transcript variants encoding different isoforms have been found for this gene.
ASB8 140461 ankyrin repeat and SOCS box containing 8 ENSG00000177981 NA
BAG3 9531 BCL2 associated athanogene 3 ENSG00000151929 BAG proteins compete with Hip for binding to the Hsc70/Hsp70 ATPase domain and promote substrate release. All the BAG proteins have an approximately 45-amino acid BAG domain near the C terminus but differ markedly in their N-terminal regions. The protein encoded by this gene contains a WW domain in the N-terminal region and a BAG domain in the C-terminal region. The BAG domains of BAG1, BAG2, and BAG3 interact specifically with the Hsc70 ATPase domain in vitro and in mammalian cells. All 3 proteins bind with high affinity to the ATPase domain of Hsc70 and inhibit its chaperone activity in a Hip-repressible manner.
SYNPO 11346 synaptopodin ENSG00000171992 Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]).
CAND2 23066 cullin-associated and neddylation-dissociated 2 (putative) ENSG00000144712 NA
ANKRD23 200539 ankyrin repeat domain 23 ENSG00000163126 This gene is a member of the muscle ankyrin repeat protein (MARP) family and encodes a protein with four tandem ankyrin-like repeats. The protein is localized to the nucleus, functioning as a transcriptional regulator. Expression of this protein is induced during recovery following starvation.
MYO18A 399687 myosin XVIIIA ENSG00000196535 NA
# Write the Ensembl gene IDs queried for cluster 7 to a text file, one ID per line.
write.table(as.character(out$query), paste0("../utilities/gene_names_clus_",7,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
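
The same export step is repeated for every cluster, with only the cluster number changing. A minimal sketch of how it could be wrapped in a function is given below. The helper name `write_cluster_genes`, its `dir` argument, and the toy data frame `toy_out` are all hypothetical and not part of the original script; the toy table stands in for the annotation result so the example runs without a network query, and it writes to a temporary directory rather than `../utilities`.

```r
# Hypothetical helper (an assumption, not in the original script): write the
# query IDs of one cluster's annotation table to a text file, one ID per line.
write_cluster_genes <- function(out, cluster, dir = "../utilities") {
  path <- file.path(dir, paste0("gene_names_clus_", cluster, ".txt"))
  write.table(as.character(out$query), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the file path without printing it
}

# Toy stand-in for the annotation table; only the 'query' column is used.
toy_out <- data.frame(query  = c("ENSG00000160299", "ENSG00000152556"),
                      symbol = c("PCNT", "PFKM"),
                      stringsAsFactors = FALSE)

# Write to a temporary directory so the sketch does not touch project files.
path <- write_cluster_genes(toy_out, cluster = 7, dir = tempdir())
readLines(path)
```

In the script itself, a call such as `write_cluster_genes(out, 7)` after each query would write to the same `../utilities/gene_names_clus_<k>.txt` path as the explicit `write.table` calls.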

Cluster 8 Annotations

out <- mygene::queryMany(gene_list[8,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol summary X_id name notfound
ENSG00000115414 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 fibronectin 1 NA
ENSG00000108821 COL1A1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1277 collagen type I alpha 1 NA
ENSG00000164692 COL1A2 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1278 collagen type I alpha 2 NA
ENSG00000168542 COL3A1 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1281 collagen type III alpha 1 NA
ENSG00000163359 COL6A3 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. 1293 collagen type VI alpha 3 NA
ENSG00000120708 TGFBI This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. 7045 transforming growth factor beta induced NA
ENSG00000163661 PTX3 NA 5806 pentraxin 3 NA
ENSG00000196549 MME This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. 4311 membrane metallo-endopeptidase NA
ENSG00000137801 THBS1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. 7057 thrombospondin 1 NA
ENSG00000111799 COL12A1 This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. 1303 collagen type XII alpha 1 NA
ENSG00000107984 DKK1 This gene encodes a protein that is a member of the dickkopf family. It is a secreted protein with two cysteine rich regions and is involved in embryonic development through its inhibition of the WNT signaling pathway. Elevated levels of DKK1 in bone marrow plasma and peripheral blood is associated with the presence of osteolytic bone lesions in patients with multiple myeloma. 22943 dickkopf WNT signaling pathway inhibitor 1 NA
ENSG00000091986 CCDC80 NA 151887 coiled-coil domain containing 80 NA
ENSG00000166923 GREM1 This gene encodes a member of the BMP (bone morphogenic protein) antagonist family. Like BMPs, BMP antagonists contain cystine knots and typically form homo- and heterodimers. The CAN (cerberus and dan) subfamily of BMP antagonists, to which this gene belongs, is characterized by a C-terminal cystine knot with an eight-membered ring. The antagonistic effect of the secreted glycosylated protein encoded by this gene is likely due to its direct binding to BMP proteins. As an antagonist of BMP, this gene may play a role in regulating organogenesis, body patterning, and tissue differentiation. In mouse, this protein has been shown to relay the sonic hedgehog (SHH) signal from the polarizing region to the apical ectodermal ridge during limb bud outgrowth. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 26585 gremlin 1, DAN family BMP antagonist NA
ENSG00000106366 SERPINE1 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 5054 serpin family E member 1 NA
ENSG00000166147 FBN1 This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate a mature extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, and Shprintzen-Goldberg syndrome. 2200 fibrillin 1 NA
ENSG00000204262 COL5A2 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. 1290 collagen type V alpha 2 NA
ENSG00000077942 FBLN1 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. 2192 fibulin 1 NA
ENSG00000103888 CEMIP NA 57214 cell migration inducing protein, hyaluronan binding NA
ENSG00000149257 SERPINH1 This gene encodes a member of the serpin superfamily of serine proteinase inhibitors. The encoded protein is localized to the endoplasmic reticulum and plays a role in collagen biosynthesis as a collagen-specific molecular chaperone. Autoantibodies to the encoded protein have been found in patients with rheumatoid arthritis. Expression of this gene may be a marker for cancer, and nucleotide polymorphisms in this gene may be associated with preterm birth caused by preterm premature rupture of membranes. Alternatively spliced transcript variants have been observed for this gene, and a pseudogene of this gene is located on the short arm of chromosome 9. 871 serpin family H member 1 NA
ENSG00000065308 TRAM2 TRAM2 is a component of the translocon, a gated macromolecular channel that controls the posttranslational processing of nascent secretory and membrane proteins at the endoplasmic reticulum (ER) membrane. 9697 translocation associated membrane protein 2 NA
ENSG00000128595 CALU The product of this gene is a calcium-binding protein localized in the endoplasmic reticulum (ER) and it is involved in such ER functions as protein folding and sorting. This protein belongs to a family of multiple EF-hand proteins (CERC) that include reticulocalbin, ERC-55, and Cab45 and the product of this gene. Alternatively spliced transcript variants encoding different isoforms have been identified. 813 calumenin NA
ENSG00000113083 LOX This gene encodes a member of the lysyl oxidase family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a regulatory propeptide and the mature enzyme. The copper-dependent amine oxidase activity of this enzyme functions in the crosslinking of collagens and elastin, while the propeptide may play a role in tumor suppression. 4015 lysyl oxidase NA
ENSG00000106333 PCOLCE Fibrillar collagen types I-III are synthesized as precursor molecules known as procollagens. These precursors contain amino- and carboxyl-terminal peptide extensions known as N- and C-propeptides, respectively, which are cleaved, upon secretion of procollagen from the cell, to yield the mature triple helical, highly structured fibrils. This gene encodes a glycoprotein which binds and drives the enzymatic cleavage of type I procollagen and heightens C-proteinase activity. 5118 procollagen C-endopeptidase enhancer NA
ENSG00000113739 STC2 This gene encodes a secreted, homodimeric glycoprotein that is expressed in a wide variety of tissues and may have autocrine or paracrine functions. The encoded protein has 10 of its 15 cysteine residues conserved among stanniocalcin family members and is phosphorylated by casein kinase 2 exclusively on its serine residues. Its C-terminus contains a cluster of histidine residues which may interact with metal ions. The protein may play a role in the regulation of renal and intestinal calcium and phosphate transport, cell metabolism, or cellular calcium/phosphate homeostasis. Constitutive overexpression of human stanniocalcin 2 in mice resulted in pre- and postnatal growth restriction, reduced bone and skeletal muscle growth, and organomegaly. Expression of this gene is induced by estrogen and altered in some breast cancers. 8614 stanniocalcin 2 NA
ENSG00000135919 SERPINE2 This gene encodes a member of the serpin family of proteins, a group of proteins that inhibit serine proteases. Thrombin, urokinase, plasmin and trypsin are among the proteases that this family member can inhibit. This gene is a susceptibility gene for chronic obstructive pulmonary disease and for emphysema. Alternative splicing results in multiple transcript variants. 5270 serpin family E member 2 NA
ENSG00000130635 COL5A1 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. The encoded procollagen protein occurs commonly as the heterotrimer pro-alpha1(V)-pro-alpha1(V)-pro-alpha2(V). Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. Alternative splicing of this gene results in multiple transcript variants. 1289 collagen type V alpha 1 NA
ENSG00000163430 FSTL1 This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. 11167 follistatin like 1 NA
ENSG00000087245 MMP2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. 4313 matrix metallopeptidase 2 NA
ENSG00000049449 RCN1 Reticulocalbin 1 is a calcium-binding protein located in the lumen of the ER. The protein contains six conserved regions with similarity to a high affinity Ca(+2)-binding motif, the EF-hand. High conservation of amino acid residues outside of these motifs, in comparison to mouse reticulocalbin, is consistent with a possible biochemical function besides that of calcium binding. In human endothelial and prostate cancer cell lines this protein localizes to the plasma membrane. 5954 reticulocalbin 1 NA
ENSG00000049449 DKFZp686K1684 NA 440034 uncharacterized LOC440034 NA
ENSG00000152952 PLOD2 The protein encoded by this gene is a membrane-bound homodimeric enzyme that is localized to the cisternae of the rough endoplasmic reticulum. The enzyme (cofactors iron and ascorbate) catalyzes the hydroxylation of lysyl residues in collagen-like peptides. The resultant hydroxylysyl groups are attachment sites for carbohydrates in collagen and thus are critical for the stability of intermolecular crosslinks. Some patients with Ehlers-Danlos syndrome type VIB have deficiencies in lysyl hydroxylase activity. Mutations in the coding region of this gene are associated with Bruck syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. 5352 procollagen-lysine,2-oxoglutarate 5-dioxygenase 2 NA
ENSG00000122707 RECK The protein encoded by this gene is a cysteine-rich, extracellular protein with protease inhibitor-like domains whose expression is suppressed strongly in many tumors and cells transformed by various kinds of oncogenes. In normal cells, this membrane-anchored glycoprotein may serve as a negative regulator for matrix metalloproteinase-9, a key enzyme involved in tumor invasion and metastasis. Several transcript variants encoding different isoforms have been found for this gene. 8434 reversion inducing cysteine rich protein with kazal motifs NA
ENSG00000100097 LGALS1 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. This gene product may act as an autocrine negative growth factor that regulates cell proliferation. 3956 lectin, galactoside binding soluble 1 NA
ENSG00000164733 CTSB This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. 1508 cathepsin B NA
ENSG00000141756 FKBP10 The protein encoded by this gene belongs to the FKBP-type peptidyl-prolyl cis/trans isomerase (PPIase) family. This protein localizes to the endoplasmic reticulum and acts as a molecular chaperone. Alternatively spliced variants encoding different isoforms have been reported, but their biological validity has not been determined. 60681 FK506 binding protein 10 NA
ENSG00000157227 MMP14 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. However, the protein encoded by this gene is a member of the membrane-type MMP (MT-MMP) subfamily; each member of this subfamily contains a potential transmembrane domain suggesting that these proteins are expressed at the cell surface rather than secreted. This protein activates MMP2 protein, and this activity may be involved in tumor invasion. 4323 matrix metallopeptidase 14 NA
ENSG00000100644 HIF1A This gene encodes the alpha subunit of transcription factor hypoxia-inducible factor-1 (HIF-1), which is a heterodimer composed of an alpha and a beta subunit. HIF-1 functions as a master regulator of cellular and systemic homeostatic response to hypoxia by activating transcription of many genes, including those involved in energy metabolism, angiogenesis, apoptosis, and other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia. HIF-1 thus plays an essential role in embryonic vascularization, tumor angiogenesis and pathophysiology of ischemic disease. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. 3091 hypoxia inducible factor 1 alpha subunit NA
ENSG00000167460 TPM4 This gene encodes a member of the tropomyosin family of actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosins are dimers of coiled-coil proteins that polymerize end-to-end along the major groove in most actin filaments. They provide stability to the filaments and regulate access of other actin-binding proteins. In muscle cells, they regulate muscle contraction by controlling the binding of myosin heads to the actin filament. Multiple transcript variants encoding different isoforms have been found for this gene. 7171 tropomyosin 4 NA
ENSG00000164442 CITED2 The protein encoded by this gene inhibits transactivation of HIF1A-induced genes by competing with binding of hypoxia-inducible factor 1-alpha to p300-CH1. Mutations in this gene are a cause of cardiac septal defects. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 10370 Cbp/p300 interacting transactivator with Glu/Asp rich carboxy-terminal domain 2 NA
ENSG00000164932 CTHRC1 This locus encodes a protein that may play a role in the cellular response to arterial injury through involvement in vascular remodeling. Mutations at this locus have been associated with Barrett esophagus and esophageal adenocarcinoma. Alternatively spliced transcript variants have been described. 115908 collagen triple helix repeat containing 1 NA
ENSG00000139926 FRMD6 NA 122786 FERM domain containing 6 NA
ENSG00000156535 CD109 This gene encodes a glycosyl phosphatidylinositol (GPI)-linked glycoprotein that localizes to the surface of platelets, activated T-cells, and endothelial cells. The protein binds to and negatively regulates signalling by transforming growth factor beta (TGF-beta). Multiple transcript variants encoding different isoforms have been found for this gene. 135228 CD109 molecule NA
ENSG00000011028 MRC2 This gene encodes a member of the mannose receptor family of proteins that contain a fibronectin type II domain and multiple C-type lectin-like domains. The encoded protein plays a role in extracellular matrix remodeling by mediating the internalization and lysosomal degradation of collagen ligands. Expression of this gene may play a role in the tumorigenesis and metastasis of several malignancies including breast cancer, gliomas and metastatic bone disease. 9902 mannose receptor C type 2 NA
ENSG00000138119 MYOF Mutations in dysferlin, a protein associated with the plasma membrane, can cause muscle weakness that affects both proximal and distal muscles. The protein encoded by this gene is a type II membrane protein that is structurally similar to dysferlin. It is a member of the ferlin family and associates with both plasma and nuclear membranes. The protein contains C2 domains that play a role in calcium-mediated membrane fusion events, suggesting that it may be involved in membrane regeneration and repair. Two transcript variants encoding different isoforms have been found for this gene. Other possible variants have been detected, but their full-length nature has not been determined. 26509 myoferlin NA
ENSG00000116774 OLFML3 NA 56944 olfactomedin like 3 NA
ENSG00000134013 LOXL2 This gene encodes a member of the lysyl oxidase gene family. The prototypic member of the family is essential to the biogenesis of connective tissue, encoding an extracellular copper-dependent amine oxidase that catalyses the first step in the formation of crosslinks in collagens and elastin. A highly conserved amino acid sequence at the C-terminus end appears to be sufficient for amine oxidase activity, suggesting that each family member may retain this function. The N-terminus is poorly conserved and may impart additional roles in developmental regulation, senescence, tumor suppression, cell growth control, and chemotaxis to each member of the family. 4017 lysyl oxidase like 2 NA
ENSG00000123384 LRP1 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. 4035 LDL receptor related protein 1 NA
ENSG00000142552 RCN3 NA 57333 reticulocalbin 3 NA
ENSG00000067167 TRAM1 This gene encodes a multi-pass membrane protein that is part of the mammalian endoplasmic reticulum. The encoded protein influences glycosylation and facilitates the translocation of secretory proteins across the endoplasmic reticulum membrane by regulating which domains of the nascent polypeptide chain are visible to the cytosol during a translocational pause. 23471 translocation associated membrane protein 1 NA
ENSG00000135318 NT5E The protein encoded by this gene is a plasma membrane protein that catalyzes the conversion of extracellular nucleotides to membrane-permeable nucleosides. The encoded protein is used as a determinant of lymphocyte differentiation. Defects in this gene can lead to the calcification of joints and arteries. Two transcript variants encoding different isoforms have been found for this gene. 4907 5’-nucleotidase ecto NA
ENSG00000148926 ADM The protein encoded by this gene is a preprohormone which is cleaved to form two biologically active peptides, adrenomedullin and proadrenomedullin N-terminal 20 peptide. Adrenomedullin is a 52 aa peptide with several functions, including vasodilation, regulation of hormone secretion, promotion of angiogenesis, and antimicrobial activity. The antimicrobial activity is antibacterial, as the peptide has been shown to kill E. coli and S. aureus at low concentration. 133 adrenomedullin NA
ENSG00000272761 NA NA NA NA TRUE
ENSG00000168615 ADAM9 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. The protein encoded by this gene interacts with SH3 domain-containing proteins, binds mitotic arrest deficient 2 beta protein, and is also involved in TPA-induced ectodomain shedding of membrane-anchored heparin-binding EGF-like growth factor. Several alternatively spliced transcript variants have been identified for this gene. 8754 ADAM metallopeptidase domain 9 NA
ENSG00000083444 PLOD1 Lysyl hydroxylase is a membrane-bound homodimeric protein localized to the cisternae of the endoplasmic reticulum. The enzyme (cofactors iron and ascorbate) catalyzes the hydroxylation of lysyl residues in collagen-like peptides. The resultant hydroxylysyl groups are attachment sites for carbohydrates in collagen and thus are critical for the stability of intermolecular crosslinks. Some patients with Ehlers-Danlos syndrome type VI have deficiencies in lysyl hydroxylase activity. Two transcript variants encoding different isoforms have been found for this gene. 5351 procollagen-lysine,2-oxoglutarate 5-dioxygenase 1 NA
ENSG00000135862 LAMC1 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins, composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively), have a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the gamma chain isoform laminin, gamma 1. The gamma 1 chain, formerly thought to be a beta chain, contains structural domains similar to beta chains, however, lacks the short alpha region separating domains I and II. The structural organization of this gene also suggested that it had diverged considerably from the beta chain genes. Embryos of transgenic mice in which both alleles of the gamma 1 chain gene were inactivated by homologous recombination, lacked basement membranes, indicating that laminin, gamma 1 chain is necessary for laminin heterotrimer assembly. It has been inferred by analogy with the strikingly similar 3’ UTR sequence in mouse laminin gamma 1 cDNA, that multiple polyadenylation sites are utilized in human to generate the 2 different sized mRNAs (5.5 and 7.5 kb) seen on Northern analysis. 3915 laminin subunit gamma 1 NA
ENSG00000019549 SNAI2 This gene encodes a member of the Snail family of C2H2-type zinc finger transcription factors. The encoded protein acts as a transcriptional repressor that binds to E-box motifs and is also likely to repress E-cadherin transcription in breast carcinoma. This protein is involved in epithelial-mesenchymal transitions and has antiapoptotic activity. Mutations in this gene may be associated with sporadic cases of neural tube defects. 6591 snail family zinc finger 2 NA
ENSG00000183160 TMEM119 NA 338773 transmembrane protein 119 NA
ENSG00000105825 TFPI2 This gene encodes a member of the Kunitz-type serine proteinase inhibitor family. The protein can inhibit a variety of serine proteases including factor VIIa/tissue factor, factor Xa, plasmin, trypsin, chymotrypsin and plasma kallikrein. This gene has been identified as a tumor suppressor gene in several types of cancer. Alternative splicing results in multiple transcript variants. 7980 tissue factor pathway inhibitor 2 NA
ENSG00000138758 SEPT11 SEPT11 belongs to the conserved septin family of filament-forming cytoskeletal GTPases that are involved in a variety of cellular functions including cytokinesis and vesicle trafficking (Hanai et al., 2004 [PubMed 15196925]; Nagata et al., 2004 [PubMed 15485874]). 55752 septin 11 NA
ENSG00000143387 CTSK The protein encoded by this gene is a lysosomal cysteine proteinase involved in bone remodeling and resorption. This protein, which is a member of the peptidase C1 protein family, is predominantly expressed in osteoclasts. However, the encoded protein is also expressed in a significant fraction of human breast cancers, where it could contribute to tumor invasiveness. Mutations in this gene are the cause of pycnodysostosis, an autosomal recessive disease characterized by osteosclerosis and short stature. 1513 cathepsin K NA
ENSG00000224729 PCOLCE-AS1 NA 100129845 PCOLCE antisense RNA 1 NA
ENSG00000091136 LAMB1 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. 3912 laminin subunit beta 1 NA
ENSG00000198431 TXNRD1 This gene encodes a member of the family of pyridine nucleotide oxidoreductases. This protein reduces thioredoxins as well as other substrates, and plays a role in selenium metabolism and protection against oxidative stress. The functional enzyme is thought to be a homodimer which uses FAD as a cofactor. Each subunit contains a selenocysteine (Sec) residue which is required for catalytic activity. The selenocysteine is encoded by the UGA codon that normally signals translation termination. The 3’ UTR of selenocysteine-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), that is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. Alternative splicing results in several transcript variants encoding the same or different isoforms. 7296 thioredoxin reductase 1 NA
ENSG00000166073 GPR176 Members of the G protein-coupled receptor family, such as GPR176, are cell surface receptors involved in responses to hormones, growth factors, and neurotransmitters (Hata et al., 1995 [PubMed 7893747]). 11245 G protein-coupled receptor 176 NA
ENSG00000189184 PCDH18 This gene belongs to the protocadherin gene family, a subfamily of the cadherin superfamily. This gene encodes a protein which contains 6 extracellular cadherin domains, a transmembrane domain and a cytoplasmic tail differing from those of the classical cadherins. Although its specific function is undetermined, the cadherin-related neuronal receptor is thought to play a role in the establishment and function of specific cell-cell connections in the brain. 54510 protocadherin 18 NA
ENSG00000223380 NA NA NA NA TRUE
ENSG00000057019 DCBLD2 NA 131566 discoidin, CUB and LCCL domain containing 2 NA
ENSG00000164111 ANXA5 The protein encoded by this gene belongs to the annexin family of calcium-dependent phospholipid binding proteins some of which have been implicated in membrane-related events along exocytotic and endocytotic pathways. Annexin 5 is a phospholipase A2 and protein kinase C inhibitory protein with calcium channel activity and a potential role in cellular signal transduction, inflammation, growth and differentiation. Annexin 5 has also been described as placental anticoagulant protein I, vascular anticoagulant-alpha, endonexin II, lipocortin V, placental protein 4 and anchorin CII. The gene spans 29 kb containing 13 exons, and encodes a single transcript of approximately 1.6 kb and a protein product with a molecular weight of about 35 kDa. 308 annexin A5 NA
ENSG00000136010 ALDH1L2 This gene encodes a member of both the aldehyde dehydrogenase superfamily and the formyl transferase superfamily. This member is the mitochondrial form of 10-formyltetrahydrofolate dehydrogenase (FDH), which converts 10-formyltetrahydrofolate to tetrahydrofolate and CO2 in an NADP(+)-dependent reaction, and plays an essential role in the distribution of one-carbon groups between the cytosolic and mitochondrial compartments of the cell. Alternatively spliced transcript variants have been found for this gene. 160428 aldehyde dehydrogenase 1 family member L2 NA
ENSG00000089597 GANAB NA 23193 glucosidase II alpha subunit NA
ENSG00000105281 SLC1A5 The SLC1A5 gene encodes a sodium-dependent neutral amino acid transporter that can act as a receptor for RD114/type D retrovirus (Larriba et al., 2001 [PubMed 11781704]). 6510 solute carrier family 1 member 5 NA
ENSG00000164647 STEAP1 This gene is predominantly expressed in prostate tissue, and is found to be upregulated in multiple cancer cell lines. The gene product is predicted to be a six-transmembrane protein, and was shown to be a cell surface antigen significantly expressed at cell-cell junctions. 26872 six transmembrane epithelial antigen of the prostate 1 NA
ENSG00000087303 NID2 This gene encodes a member of the nidogen family of basement membrane proteins. This protein is a cell-adhesion protein that binds collagens I and IV and laminin and may be involved in maintaining the structure of the basement membrane. 22795 nidogen 2 NA
ENSG00000130508 PXDN This gene encodes a heme-containing peroxidase that is secreted into the extracellular matrix. It is involved in extracellular matrix formation, and may function in the physiological and pathological fibrogenic response in fibrotic kidney. Mutations in this gene cause corneal opacification and other ocular anomalies, and also microphthalmia and anterior segment dysgenesis. 7837 peroxidasin NA
ENSG00000100196 KDELR3 This gene encodes a member of the KDEL endoplasmic reticulum protein retention receptor family. Retention of resident soluble proteins in the lumen of the endoplasmic reticulum (ER) is achieved in both yeast and animal cells by their continual retrieval from the cis-Golgi, or a pre-Golgi compartment. Sorting of these proteins is dependent on a C-terminal tetrapeptide signal, usually lys-asp-glu-leu (KDEL) in animal cells, and his-asp-glu-leu (HDEL) in S. cerevisiae. This process is mediated by a receptor that recognizes, and binds the tetrapeptide-containing protein, and returns it to the ER. In yeast, the sorting receptor encoded by a single gene, ERD2, is a seven-transmembrane protein. Unlike yeast, several human homologs of the ERD2 gene, constituting the KDEL receptor gene family, have been described. KDELR3 was the third member of the family to be identified. Alternate splicing results in multiple transcript variants. 11015 KDEL endoplasmic reticulum protein retention receptor 3 NA
ENSG00000173530 TNFRSF10D The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor contains an extracellular TRAIL-binding domain, a transmembrane domain, and a truncated cytoplasmic death domain. This receptor does not induce apoptosis, and has been shown to play an inhibitory role in TRAIL-induced cell apoptosis. 8793 tumor necrosis factor receptor superfamily member 10d NA
ENSG00000182718 ANXA2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 302 annexin A2 NA
ENSG00000009413 REV3L NA 5980 REV3 like, DNA directed polymerase zeta catalytic subunit NA
ENSG00000166130 IKBIP NA 121457 IKBKB interacting protein NA
ENSG00000134294 SLC38A2 NA 54407 solute carrier family 38 member 2 NA
ENSG00000184575 XPOT This gene encodes a protein belonging to the RAN-GTPase exportin family that mediates export of tRNA from the nucleus to the cytoplasm. Translocation of tRNA to the cytoplasm occurs once exportin has bound both tRNA and GTP-bound RAN. 11260 exportin for tRNA NA
ENSG00000044574 HSPA5 The protein encoded by this gene is a member of the heat shock protein 70 (HSP70) family. It is localized in the lumen of the endoplasmic reticulum (ER), and is involved in the folding and assembly of proteins in the ER. As this protein interacts with many ER proteins, it may play a key role in monitoring protein transport through the cell. 3309 heat shock protein family A (Hsp70) member 5 NA
ENSG00000039560 RAI14 NA 26064 retinoic acid induced 14 NA
ENSG00000122642 FKBP9 NA 11328 FK506 binding protein 9 NA
ENSG00000134352 IL6ST The protein encoded by this gene is a signal transducer shared by many cytokines, including interleukin 6 (IL6), ciliary neurotrophic factor (CNTF), leukemia inhibitory factor (LIF), and oncostatin M (OSM). This protein functions as a part of the cytokine receptor complex. The activation of this protein is dependent upon the binding of cytokines to their receptors. vIL6, a protein related to IL6 and encoded by the Kaposi sarcoma-associated herpesvirus, can bypass the interleukin 6 receptor (IL6R) and directly activate this protein. Knockout studies in mice suggest that this gene plays a critical role in regulating myocyte apoptosis. Alternatively spliced transcript variants have been described. A related pseudogene has been identified on chromosome 17. 3572 interleukin 6 signal transducer NA
ENSG00000145817 YIPF5 NA 81555 Yip1 domain family member 5 NA
ENSG00000083312 TNPO1 This gene encodes the beta subunit of the karyopherin receptor complex which interacts with nuclear localization signals to target nuclear proteins to the nucleus. The karyopherin receptor complex is a heterodimer of an alpha subunit which recognizes the nuclear localization signal and a beta subunit which docks the complex at nucleoporins. Alternate splicing of this gene results in two transcript variants encoding different proteins. 3842 transportin 1 NA
ENSG00000259279 CTD-2033D15.1 NA ENSG00000259279 NA NA
ENSG00000106105 GARS This gene encodes glycyl-tRNA synthetase, one of the aminoacyl-tRNA synthetases that charge tRNAs with their cognate amino acids. The encoded enzyme is an (alpha)2 dimer which belongs to the class II family of tRNA synthetases. It has been shown to be a target of autoantibodies in the human autoimmune diseases, polymyositis or dermatomyositis. Two transcript variants encoding different isoforms have been found for this gene. 2617 glycyl-tRNA synthetase NA
ENSG00000160691 SHC1 This gene encodes three main isoforms that differ in activities and subcellular location. While all three are adapter proteins in signal transduction pathways, the longest (p66Shc) may be involved in regulating life span and the effects of reactive oxygen species. The other two isoforms, p52Shc and p46Shc, link activated receptor tyrosine kinases to the Ras pathway by recruitment of the GRB2/SOS complex. p66Shc is not involved in Ras activation. Unlike the other two isoforms, p46Shc is targeted to the mitochondrial matrix. Several transcript variants encoding different isoforms have been found for this gene. 6464 SHC (Src homology 2 domain containing) transforming protein 1 NA
ENSG00000166794 PPIB The protein encoded by this gene is a cyclosporine-binding protein and is mainly located within the endoplasmic reticulum. It is associated with the secretory pathway and released in biological fluids. This protein can bind to cells derived from T- and B-lymphocytes, and may regulate cyclosporine A-mediated immunosuppression. Variants have been identified in this protein that give rise to recessive forms of osteogenesis imperfecta. 5479 peptidylprolyl isomerase B NA
ENSG00000166250 CLMP This gene encodes a type I transmembrane protein that is localized to junctional complexes between endothelial and epithelial cells and may have a role in cell-cell adhesion. Expression of this gene in white adipose tissue is implicated in adipocyte maturation and development of obesity. This gene is also essential for normal intestinal development and mutations in the gene are associated with congenital short bowel syndrome. 79827 CXADR-like membrane protein NA
ENSG00000177311 ZBTB38 The protein encoded by this gene is a zinc finger transcriptional activator that binds methylated DNA. The encoded protein can form homodimers or heterodimers through the zinc finger domains. In mouse, inhibition of this protein has been associated with apoptosis in some cell types. 253461 zinc finger and BTB domain containing 38 NA
ENSG00000106080 FKBP14 The protein encoded by this gene is a member of the FK506-binding protein family of peptidyl-prolyl cis-trans isomerases. The encoded protein is found in the lumen of the endoplasmic reticulum, where it is thought to accelerate protein folding. Defects in this gene are a cause of a type of Ehlers-Danlos syndrome (EDS). Both a protein-coding variant and noncoding variants are transcribed from this gene. 55033 FK506 binding protein 14 NA
ENSG00000175592 FOSL1 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Several transcript variants encoding different isoforms have been found for this gene. 8061 FOS like antigen 1 NA
ENSG00000151348 EXT2 This gene encodes one of two glycosyltransferases involved in the chain elongation step of heparan sulfate biosynthesis. Mutations in this gene cause the type II form of multiple exostoses. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. 2132 exostosin glycosyltransferase 2 NA
ENSG00000168374 ARF4 This gene is a member of the human ARF gene family whose members encode small guanine nucleotide-binding proteins that stimulate the ADP-ribosyltransferase activity of cholera toxin and play a role in vesicular trafficking and as activators of phospholipase D. The gene products include 5 ARF proteins and 11 ARF-like proteins and constitute one family of the RAS superfamily. The ARF proteins are categorized as class I, class II and class III; this gene is a class II member. The members of each class share a common gene organization. The ARF4 gene spans approximately 12kb and contains six exons and five introns. This gene is the most divergent member of the human ARFs. Conflicting map positions at 3p14 or 3p21 have been reported for this gene. 378 ADP ribosylation factor 4 NA
ENSG00000142871 CYR61 The secreted protein encoded by this gene is growth factor-inducible and promotes the adhesion of endothelial cells. The encoded protein interacts with several integrins and with heparan sulfate proteoglycan. This protein also plays a role in cell proliferation, differentiation, angiogenesis, apoptosis, and extracellular matrix formation. 3491 cysteine rich angiogenic inducer 61 NA
ENSG00000168140 VASN NA 114990 vasorin NA
ENSG00000162616 DNAJB4 The protein encoded by this gene is a molecular chaperone, tumor suppressor, and member of the heat shock protein-40 family. The encoded protein binds the cell adhesion protein E-cadherin and targets it to the plasma membrane. This protein also binds incorrectly folded E-cadherin and targets it for endoplasmic reticulum-associated degradation. This gene is a strong tumor suppressor for colorectal carcinoma, and downregulation of it may serve as a good biomarker for predicting patient outcomes. Several transcript variants encoding different isoforms have been found for this gene. 11080 DnaJ heat shock protein family (Hsp40) member B4 NA
ENSG00000128829 EIF2AK4 This gene encodes a member of a family of kinases that phosphorylate the alpha subunit of eukaryotic translation initiation factor-2 (EIF2), resulting in the downregulation of protein synthesis. The encoded protein responds to amino acid deprivation by binding uncharged transfer RNAs. It may also be activated by glucose deprivation and viral infection. Mutations in this gene have been found in individuals suffering from autosomal recessive pulmonary venoocclusive-disease-2. 440275 eukaryotic translation initiation factor 2 alpha kinase 4 NA
# Save the Ensembl IDs queried for cluster 8, one ID per line
write.table(as.factor(out$query), paste0("../utilities/gene_names_clus_", 8, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
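
The same query-then-write step is repeated for every cluster, so it could be wrapped in a small helper. The following is only a sketch under stated assumptions: `write_cluster_genes` and its `out_dir` argument are hypothetical names that do not appear in the original script, and unlike the `write.table` call above it also drops repeated query IDs, which may or may not be wanted.

```r
# Hypothetical helper (not part of the original script): write the Ensembl IDs
# queried for cluster k to <out_dir>/gene_names_clus_<k>.txt, one ID per line.
write_cluster_genes <- function(ids, k, out_dir = "../utilities") {
  # Assumption: repeated query IDs are not wanted in the output file,
  # so keep only the first occurrence of each ID, in original order.
  ids <- unique(as.character(ids))
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  writeLines(ids, path)
  invisible(path)  # return the file path without printing it
}
```

With this in place, the call for a given cluster would reduce to something like `write_cluster_genes(out$query, 8)`.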

Cluster 9 Annotations

out <- mygene::queryMany(gene_list[9,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
X_id summary name symbol query
4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. myelin basic protein MBP ENSG00000197971
2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein GFAP ENSG00000131095
ENSG00000266844 NA NA RP11-862L9.3 ENSG00000266844
ENSG00000237973 NA MT-CO1 pseudogene 12 MTCO1P12 ENSG00000237973
222166 NA maturin, neural progenitor differentiation regulator homolog (Xenopus) MTURN ENSG00000180354
ENSG00000229344 NA MT-CO2 pseudogene 12 MTCO2P12 ENSG00000229344
ENSG00000225630 NA mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 MTND2P28 ENSG00000225630
79957 NA progestin and adipoQ receptor family member 6 PAQR6 ENSG00000160781
28996 This gene encodes a conserved serine/threonine kinase that is a member of the homeodomain-interacting protein kinase family. The encoded protein interacts with homeodomain transcription factors and many other transcription factors such as p53, and can function as both a corepressor and a coactivator depending on the transcription factor and its subcellular localization. Multiple transcript variants encoding different isoforms have been found for this gene. homeodomain interacting protein kinase 2 HIPK2 ENSG00000064393
58473 NA pleckstrin homology domain containing B1 PLEKHB1 ENSG00000021300
58476 NA tumor protein p53 inducible nuclear protein 2 TP53INP2 ENSG00000078804
ENSG00000225972 NA mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 1 pseudogene 23 MTND1P23 ENSG00000225972
56650 NA claudin domain containing 1 CLDND1 ENSG00000080822
ENSG00000271043 NA MT-RNR2-like 2 MTRNR2L2 ENSG00000271043
8502 Armadillo-like proteins are characterized by a series of armadillo repeats, first defined in the Drosophila ‘armadillo’ gene product, that are typically 42 to 45 amino acids in length. These proteins can be divided into subfamilies based on their number of repeats, their overall sequence similarity, and the dispersion of the repeats throughout their sequences. Members of the p120(ctn)/plakophilin subfamily of Armadillo-like proteins, including CTNND1, CTNND2, PKP1, PKP2, PKP4, and ARVCF. PKP4 may be a component of desmosomal plaque and other adhesion plaques and is thought to be involved in regulating junctional plaque organization and cadherin function. Multiple transcript variants encoding different isoforms have been found for this gene. plakophilin 4 PKP4 ENSG00000144283
51148 NA cerebral endothelial cell adhesion molecule CERCAM ENSG00000167123
57571 CARNS1 (EC 6.3.2.11), a member of the ATP-grasp family of ATPases, catalyzes the formation of carnosine (beta-alanyl-L-histidine) and homocarnosine (gamma-aminobutyryl-L-histidine), which are found mainly in skeletal muscle and the central nervous system, respectively (Drozak et al., 2010 [PubMed 20097752]). carnosine synthase 1 CARNS1 ENSG00000172508
57698 NA shootin 1 SHTN1 ENSG00000187164
8682 This gene encodes a death effector domain-containing protein that functions as a negative regulator of apoptosis. The encoded protein is an endogenous substrate for protein kinase C. This protein is also overexpressed in type 2 diabetes mellitus, where it may contribute to insulin resistance in glucose uptake. Alternative splicing results in multiple transcript variants. phosphoprotein enriched in astrocytes 15 PEA15 ENSG00000162734
ENSG00000251660 NA NA AC007036.5 ENSG00000251660
5860 This gene encodes the enzyme dihydropteridine reductase, which catalyzes the NADH-mediated reduction of quinonoid dihydrobiopterin. This enzyme is an essential component of the pterin-dependent aromatic amino acid hydroxylating systems. Mutations in this gene resulting in QDPR deficiency include aberrant splicing, amino acid substitutions, insertions, or premature terminations. Dihydropteridine reductase deficiency presents as atypical phenylketonuria due to insufficient production of biopterin, a cofactor for phenylalanine hydroxylase. quinoid dihydropteridine reductase QDPR ENSG00000151552
11170 NA family with sequence similarity 107 member A FAM107A ENSG00000168309
55314 NA transmembrane protein 144 TMEM144 ENSG00000164124
5414 This gene is a member of the septin family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse, and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. This gene is highly expressed in brain and heart. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. One of the isoforms (known as ARTS) is distinct; it is localized to the mitochondria, and has a role in apoptosis and cancer. septin 4 SEPT4 ENSG00000108387
400961 Most mRNAs, except for histones, contain a 3-prime poly(A) tail. Poly(A)-binding protein (PABP; see MIM 604679) enhances translation by circularizing mRNA through its interaction with the translation initiation factor EIF4G1 (MIM 600495) and the poly(A) tail. Various PABP-binding proteins regulate PABP activity, including PAIP1 (MIM 605184), a translational stimulator, and PAIP2A (MIM 605604) and PAIP2B, translational inhibitors (Derry et al., 2006 [PubMed 17381337]). poly(A) binding protein interacting protein 2B PAIP2B ENSG00000124374
6696 The protein encoded by this gene is involved in the attachment of osteoclasts to the mineralized bone matrix. The encoded protein is secreted and binds hydroxyapatite with high affinity. The osteoclast vitronectin receptor is found in the cell membrane and may be involved in the binding to this protein. This protein is also a cytokine that upregulates expression of interferon-gamma and interleukin-12. Several transcript variants encoding different isoforms have been found for this gene. secreted phosphoprotein 1 SPP1 ENSG00000118785
23176 This gene is a member of the septin family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse, and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. septin 8 SEPT8 ENSG00000164402
1783 Cytoplasmic dynein is a microtubule-associated motor protein (Hughes et al., 1995 [PubMed 7738094]). See DYNC1H1 (MIM 600112) for general information about dyneins. dynein cytoplasmic 1 light intermediate chain 2 DYNC1LI2 ENSG00000135720
91369 NA ankyrin repeat domain 40 ANKRD40 ENSG00000154945
5305 Phosphatidylinositol-5,4-bisphosphate, the precursor to second messengers of the phosphoinositide signal transduction pathways, is thought to be involved in the regulation of secretion, cell proliferation, differentiation, and motility. The protein encoded by this gene is one of a family of enzymes capable of catalyzing the phosphorylation of phosphatidylinositol-5-phosphate on the fourth hydroxyl of the myo-inositol ring to form phosphatidylinositol-5,4-bisphosphate. The amino acid sequence of this enzyme does not show homology to other kinases, but the recombinant protein does exhibit kinase activity. This gene is a member of the phosphatidylinositol-5-phosphate 4-kinase family. phosphatidylinositol-5-phosphate 4-kinase type 2 alpha PIP4K2A ENSG00000150867
54443 This gene encodes an actin-binding protein that plays a role in cell growth and migration, and in cytokinesis. The encoded protein is thought to regulate actin cytoskeletal dynamics in podocytes, components of the glomerulus. Mutations in this gene are associated with focal segmental glomerulosclerosis 8. Alternative splicing results in multiple transcript variants encoding different isoforms. anillin actin binding protein ANLN ENSG00000011426
9638 This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Expression of this gene in C. elegans unc-76 mutants can restore to the mutants partial locomotion and axonal fasciculation, suggesting that it also functions in axonal outgrowth. The N-terminal half of the gene product is highly acidic. Alternatively spliced transcript variants encoding different isoforms of this protein have been described. fasciculation and elongation protein zeta 1 FEZ1 ENSG00000149557
7846 Microtubules of the eukaryotic cytoskeleton perform essential and diverse functions and are composed of a heterodimer of alpha and beta tubulins. The genes encoding these microtubule constituents belong to the tubulin superfamily, which is composed of six distinct families. Genes from the alpha, beta and gamma tubulin families are found in all eukaryotes. The alpha and beta tubulins represent the major components of microtubules, while gamma tubulin plays a critical role in the nucleation of microtubule assembly. There are multiple alpha and beta tubulin genes, which are highly conserved among species. This gene encodes alpha tubulin and is highly similar to the mouse and rat Tuba1 genes. Northern blotting studies have shown that the gene expression is predominantly found in morphologically differentiated neurologic cells. This gene is one of three alpha-tubulin genes in a cluster on chromosome 12q. Mutations in this gene cause lissencephaly type 3 (LIS3) - a neurological condition characterized by microcephaly, mental retardation, and early-onset epilepsy and caused by defective neuronal migration. Alternative splicing results in multiple transcript variants encoding distinct isoforms. tubulin alpha 1a TUBA1A ENSG00000167552
83543 NA allograft inflammatory factor 1 like AIF1L ENSG00000126878
333 This gene encodes a member of the highly conserved amyloid precursor protein gene family. The encoded protein is a membrane-associated glycoprotein that is cleaved by secretases in a manner similar to amyloid beta A4 precursor protein cleavage. This cleavage liberates an intracellular cytoplasmic fragment that may act as a transcriptional activator. The encoded protein may also play a role in synaptic maturation during cortical development. Alternatively spliced transcript variants encoding different isoforms have been described. amyloid beta precursor like protein 1 APLP1 ENSG00000105290
65108 This gene encodes a member of the myristoylated alanine-rich C-kinase substrate (MARCKS) family. Members of this family play a role in cytoskeletal regulation, protein kinase C signaling and calmodulin signaling. The encoded protein affects the formation of adherens junction. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene are located on the long arm of chromosomes 6 and 10. MARCKS-like 1 MARCKSL1 ENSG00000175130
8871 The gene is a member of the inositol-polyphosphate 5-phosphatase family. The encoded protein interacts with the ras-related C3 botulinum toxin substrate 1, which causes translocation of the encoded protein to the plasma membrane where it inhibits clathrin-mediated endocytosis. Alternative splicing results in multiple transcript variants. synaptojanin 2 SYNJ2 ENSG00000078269
1902 The integral membrane protein encoded by this gene is a lysophosphatidic acid (LPA) receptor from a group known as EDG receptors. These receptors are members of the G protein-coupled receptor superfamily. Utilized by LPA for cell signaling, EDG receptors mediate diverse biologic functions, including proliferation, platelet aggregation, smooth muscle contraction, inhibition of neuroblastoma cell differentiation, chemotaxis, and tumor cell invasion. Two transcript variants encoding the same protein have been identified for this gene. lysophosphatidic acid receptor 1 LPAR1 ENSG00000198121
8073 The protein encoded by this gene belongs to a small class of the protein tyrosine phosphatase (PTP) family. PTPs are cell signaling molecules that play regulatory roles in a variety of cellular processes. PTPs in this class contain a protein tyrosine phosphatase catalytic domain and a characteristic C-terminal prenylation motif. This PTP has been shown to primarily associate with plasmic and endosomal membrane through its C-terminal prenylation. This PTP was found to interact with the beta-subunit of Rab geranylgeranyltransferase II (beta GGT II), and thus may function as a regulator of GGT II activity. Overexpression of this gene in mammalian cells conferred a transformed phenotype, which suggested its role in tumorigenesis. Alternatively spliced transcript variants have been described. Related pseudogenes exist on chromosomes 11, 12 and 17. protein tyrosine phosphatase type IVA, member 2 PTP4A2 ENSG00000184007
65125 This gene encodes a member of the WNK subfamily of serine/threonine protein kinases. The encoded protein may be a key regulator of blood pressure by controlling the transport of sodium and chloride ions. Mutations in this gene have been associated with pseudohypoaldosteronism type II and hereditary sensory neuropathy type II. Alternatively spliced transcript variants encoding different isoforms have been described but the full-length nature of all of them has yet to be determined. WNK lysine deficient protein kinase 1 WNK1 ENSG00000060237
4134 The protein encoded by this gene is a major non-neuronal microtubule-associated protein. This protein contains a domain similar to the microtubule-binding domains of neuronal microtubule-associated protein (MAP2) and microtubule-associated protein tau (MAPT/TAU). This protein promotes microtubule assembly, and has been shown to counteract destabilization of interphase microtubule catastrophe promotion. Cyclin B was found to interact with this protein, which targets cell division cycle 2 (CDC2) kinase to microtubules. The phosphorylation of this protein affects microtubule properties and cell cycle progression. Multiple transcript variants encoding different isoforms have been found for this gene. microtubule associated protein 4 MAP4 ENSG00000047849
6285 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in Neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. S100 calcium binding protein B S100B ENSG00000160307
100463486 NA MT-RNR2-like 8 MTRNR2L8 ENSG00000255823
23446 NA solute carrier family 44 member 1 SLC44A1 ENSG00000070214
66008 NA trafficking protein, kinesin binding 2 TRAK2 ENSG00000115993
57447 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NDRG family member 2 NDRG2 ENSG00000165795
9444 The protein encoded by this gene is an RNA-binding protein that regulates pre-mRNA splicing, export of mRNAs from the nucleus, protein translation, and mRNA stability. The encoded protein is involved in myelinization and oligodendrocyte differentiation and may play a role in schizophrenia. Multiple transcript variants encoding different isoforms have been found for this gene. QKI, KH domain containing, RNA binding QKI ENSG00000112531
20 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. ATP binding cassette subfamily A member 2 ABCA2 ENSG00000107331
3799 NA kinesin family member 5B KIF5B ENSG00000170759
1267 NA 2',3'-cyclic nucleotide 3' phosphodiesterase CNP ENSG00000173786
745 This gene encodes a transcription factor that is required for central nervous system myelination and may regulate oligodendrocyte differentiation. It is thought to act by increasing the expression of genes that effect myelin production but may also directly promote myelin gene expression. Loss of a similar gene in mouse models results in severe demyelination. Alternative splicing results in multiple transcript variants. myelin regulatory factor MYRF ENSG00000124920
9448 The protein encoded by this gene is a member of the serine/threonine protein kinase family. This kinase has been shown to specifically activate MAPK8/JNK. The activation of MAPK8 by this kinase is found to be inhibited by the dominant-negative mutants of MAP3K7/TAK1, MAP2K4/MKK4, and MAP2K7/MKK7, which suggests that this kinase may function through the MAP3K7-MAP2K4-MAP2K7 kinase cascade, and mediate the TNF-alpha signaling pathway. Alternatively spliced transcript variants encoding different isoforms have been identified. mitogen-activated protein kinase kinase kinase kinase 4 MAP4K4 ENSG00000071054
9839 The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. zinc finger E-box binding homeobox 2 ZEB2 ENSG00000169554
94015 This gene encodes a member of the tweety family of proteins. Members of this family function as chloride anion channels. The encoded protein functions as a calcium(2+)-activated large conductance chloride(-) channel, and may play a role in kidney tumorigenesis. Two transcript variants encoding distinct isoforms have been identified for this gene. tweety family member 2 TTYH2 ENSG00000141540
81628 TSC22D4 is a member of the TSC22 domain family of leucine zipper transcriptional regulators (see TSC22D3; MIM 300506) (Kester et al., 1999 [PubMed 10488076]; Fiorenza et al., 2001 [PubMed 11707329]). TSC22 domain family member 4 TSC22D4 ENSG00000166925
5129 NA cyclin-dependent kinase 18 CDK18 ENSG00000117266
85315 NA progestin and adipoQ receptor family member 8 PAQR8 ENSG00000170915
55914 This gene is a member of the leucine-rich repeat and PDZ domain (LAP) family. The encoded protein contains 17 leucine-rich repeats and one PDZ domain. It binds to the unphosphorylated form of the ERBB2 protein and regulates ERBB2 function and localization. It has also been shown to affect the Ras signaling pathway by disrupting Ras-Raf interaction. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. erbb2 interacting protein ERBIN ENSG00000112851
1059 This gene product is a highly conserved protein that facilitates centromere formation. It is a DNA-binding protein that is derived from transposases of the pogo DNA transposon family. It contains a helix-loop-helix DNA binding motif at the N-terminus, and a dimerization domain at the C-terminus. The DNA binding domain recognizes and binds a 17-bp sequence (CENP-B box) in the centromeric alpha satellite DNA. This protein is proposed to play an important role in the assembly of specific centromere structures in interphase nuclei and on mitotic chromosomes. It is also considered a major centromere autoantigen recognized by sera from patients with anti-centromere antibodies. centromere protein B CENPB ENSG00000125817
5781 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP contains two tandem Src homology-2 domains, which function as phospho-tyrosine binding domains and mediate the interaction of this PTP with its substrates. This PTP is widely expressed in most tissues and plays a regulatory role in various cell signaling events that are important for a diversity of cell functions, such as mitogenic activation, metabolic control, transcription regulation, and cell migration. Mutations in this gene are a cause of Noonan syndrome as well as acute myeloid leukemia. Two transcript variants encoding different isoforms have been found for this gene. protein tyrosine phosphatase, non-receptor type 11 PTPN11 ENSG00000179295
114793 This gene encodes a formin-related protein. Formin-related proteins have been implicated in morphogenesis, cytokinesis, and cell polarity. Alternatively spliced transcript variants encoding different isoforms have been described but their full-length nature has yet to be determined. formin like 2 FMNL2 ENSG00000157827
8099 The protein encoded by this gene is a cyclin-dependent kinase 2 (CDK2) -associated protein which is thought to negatively regulate CDK2 activity by sequestering monomeric CDK2, and targeting CDK2 for proteolysis. This protein was found to also interact with DNA polymerase alpha/primase and mediate the phosphorylation of the large p180 subunit, which suggests a regulatory role in DNA replication during the S-phase of the cell cycle. This protein also forms a core subunit of the nucleosome remodeling and histone deacetylation (NURD) complex that epigenetically regulates embryonic stem cell differentiation. This gene thus plays a role in both cell-cycle and epigenetic regulation. Alternative splicing results in multiple transcript variants encoding distinct isoforms. cyclin-dependent kinase 2 associated protein 1 CDK2AP1 ENSG00000111328
ENSG00000260465 NA NA RP11-63M22.2 ENSG00000260465
144348 NA zinc finger protein 664 ZNF664 ENSG00000179195
23500 NA dishevelled associated activator of morphogenesis 2 DAAM2 ENSG00000146122
22933 This gene encodes a member of the sirtuin family of proteins, homologs to the yeast Sir2 protein. Members of the sirtuin family are characterized by a sirtuin core domain and grouped into four classes. The functions of human sirtuins have not yet been determined; however, yeast sirtuin proteins are known to regulate epigenetic gene silencing and suppress recombination of rDNA. Studies suggest that the human sirtuins may function as intracellular regulatory proteins with mono-ADP-ribosyltransferase activity. The protein encoded by this gene is included in class I of the sirtuin family. Several transcript variants result from alternative splicing of this gene. sirtuin 2 SIRT2 ENSG00000068903
123606 This gene encodes a magnesium transporter that associates with early endosomes and the cell surface in a variety of neuronal and epithelial cells. This protein may play a role in nervous system development and maintenance. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene have been associated with autosomal dominant spastic paraplegia 6. non imprinted in Prader-Willi/Angelman syndrome 1 NIPA1 ENSG00000170113
989 This gene encodes a protein that is highly similar to the CDC10 protein of Saccharomyces cerevisiae. The protein also shares similarity with Diff 6 of Drosophila and with H5 of mouse. Each of these similar proteins, including the yeast CDC10, contains a GTP-binding motif. The yeast CDC10 protein is a structural component of the 10 nm filament which lies inside the cytoplasmic membrane and is essential for cytokinesis. This human protein functions in gliomagenesis and in the suppression of glioma cell growth, and it is required for the association of centromere-associated protein E with the kinetochore. Alternative splicing results in multiple transcript variants. Several related pseudogenes have been identified on chromosomes 5, 7, 9, 10, 11, 14, 17 and 19. septin 7 SEPT7 ENSG00000122545
58480 This gene encodes a member of the Rho family of GTPases. This protein can activate PAK1 and JNK1, and can induce filopodium formation and stress fiber dissolution. It may also mediate the effects of WNT1 signaling in the regulation of cell morphology, cytoskeletal organization, and cell proliferation. A non-coding transcript variant of this gene results from naturally occurring read-through transcription between this locus and the neighboring DUSP5P (dual specificity phosphatase 5 pseudogene) locus. ras homolog family member U RHOU ENSG00000116574
1471 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. The cystatin locus on chromosome 20 contains the majority of the type 2 cystatin genes and pseudogenes. This gene is located in the cystatin locus and encodes the most abundant extracellular inhibitor of cysteine proteases, which is found in high concentrations in biological fluids and is expressed in virtually all organs of the body. A mutation in this gene has been associated with amyloid angiopathy. Expression of this protein in vascular wall smooth muscle cells is severely reduced in both atherosclerotic and aneurysmal aortic lesions, establishing its role in vascular disease. In addition, this protein has been shown to have an antimicrobial function, inhibiting the replication of herpes simplex virus. Alternative splicing results in multiple transcript variants encoding a single protein. cystatin C CST3 ENSG00000101439
9771 Members of the RAS (see HRAS; MIM 190020) subfamily of GTPases function in signal transduction as GTP/GDP-regulated switches that cycle between inactive GDP- and active GTP-bound states. Guanine nucleotide exchange factors (GEFs), such as RAPGEF5, serve as RAS activators by promoting acquisition of GTP to maintain the active GTP-bound state and are the key link between cell surface receptors and RAS activation (Rebhun et al., 2000 [PubMed 10934204]). Rap guanine nucleotide exchange factor 5 RAPGEF5 ENSG00000136237
10284 Histone acetylation plays a key role in the regulation of eukaryotic gene expression. Histone acetylation and deacetylation are catalyzed by multisubunit complexes. The protein encoded by this gene is a component of the histone deacetylase complex, which includes SIN3, SAP30, HDAC1, HDAC2, RbAp46, RbAp48, and other polypeptides. This protein directly interacts with SIN3 and enhances SIN3-mediated transcriptional repression when tethered to the promoter. A pseudogene has been identified on chromosome 2. Sin3A associated protein 18kDa SAP18 ENSG00000150459
3707 The protein encoded by this gene regulates inositol phosphate metabolism by phosphorylation of second messenger inositol 1,4,5-trisphosphate to Ins(1,3,4,5)P4. The activity of this encoded protein is responsible for regulating the levels of a large number of inositol polyphosphates that are important in cellular signaling. Both calcium/calmodulin and protein phosphorylation mechanisms control its activity. inositol-trisphosphate 3-kinase B ITPKB ENSG00000143772
347733 The protein encoded by this gene is a beta isoform of tubulin, which binds GTP and is a major component of microtubules. This gene is highly similar to TUBB2A and TUBB2C. Defects in this gene are a cause of asymmetric polymicrogyria. tubulin beta 2B class IIb TUBB2B ENSG00000137285
3306 NA heat shock protein family A (Hsp70) member 2 HSPA2 ENSG00000126803
64077 NA phospholysine phosphohistidine inorganic pyrophosphate phosphatase LHPP ENSG00000107902
ENSG00000262967 NA NA RP11-294J22.6 ENSG00000262967
23313 NA KIAA0930 KIAA0930 ENSG00000100364
219654 NA zinc finger CCHC-type containing 24 ZCCHC24 ENSG00000165424
50862 The protein encoded by this gene contains a RING finger, a motif known to be involved in protein-DNA and protein-protein interactions. Abundant expression of this gene was found in the testicular tissue of fertile men, but was not detected in azoospermic patients. Studies of the mouse counterpart suggest that this gene may function as a testis specific transcription factor during spermatogenesis. ring finger protein 141 RNF141 ENSG00000110315
ENSG00000235423 NA NA RP11-282O18.3 ENSG00000235423
5168 The protein encoded by this gene functions as both a phosphodiesterase, which cleaves phosphodiester bonds at the 5’ end of oligonucleotides, and a phospholipase, which catalyzes production of lysophosphatidic acid (LPA) in extracellular fluids. LPA evokes growth factor-like responses including stimulation of cell proliferation and chemotaxis. This gene product stimulates the motility of tumor cells and has angiogenic properties, and its expression is upregulated in several kinds of carcinomas. The gene product is secreted and further processed to make the biologically active form. Several alternatively spliced transcript variants encoding different isoforms have been identified. ectonucleotide pyrophosphatase/phosphodiesterase 2 ENPP2 ENSG00000136960
56895 This gene encodes a member of the 1-acylglycerol-3-phosphate O-acyltransferase family. This integral membrane protein converts lysophosphatidic acid to phosphatidic acid, the second step in de novo phospholipid biosynthesis. 1-acylglycerol-3-phosphate O-acyltransferase 4 AGPAT4 ENSG00000026652
1298 This gene encodes one of the three alpha chains of type IX collagen, the major collagen component of hyaline cartilage. Type IX collagen, a heterotrimeric molecule, is usually found in tissues containing type II collagen, a fibrillar collagen. This chain is unusual in that, unlike the other two type IX alpha chains, it contains a covalently attached glycosaminoglycan side chain. Mutations in this gene are associated with multiple epiphyseal dysplasia. collagen type IX alpha 2 COL9A2 ENSG00000049089
51704 This gene encodes a member of the type 3 G protein-coupled receptor family. Members of this superfamily are characterized by a signature 7-transmembrane domain motif. The encoded protein may modulate insulin secretion and increased protein expression is associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. G protein-coupled receptor class C group 5 member B GPRC5B ENSG00000167191
23241 NA phosphofurin acidic cluster sorting protein 2 PACS2 ENSG00000179364
5010 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. The protein encoded by this gene is a major component of central nervous system (CNS) myelin and plays an important role in regulating proliferation and migration of oligodendrocytes. Mouse studies showed that the gene deficiency results in deafness and loss of the Sertoli cell epithelial phenotype in the testis. This protein is a tight junction protein at the human blood-testis barrier (BTB), and the BTB disruption is related to a dysfunction of this gene. Alternatively spliced transcript variants encoding different isoforms have been identified. claudin 11 CLDN11 ENSG00000013297
1191 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. clusterin CLU ENSG00000120885
2768 NA G protein subunit alpha 12 GNA12 ENSG00000146535
4504 NA metallothionein 3 MT3 ENSG00000087250
6468 This gene is a member of the F-box/WD-40 gene family, which recruit specific target proteins through their WD-40 protein-protein binding domains for ubiquitin mediated degradation. In mouse, a highly similar protein is thought to be responsible for maintaining the apical ectodermal ridge of developing limb buds; disruption of the mouse gene results in the absence of central digits, underdeveloped or absent metacarpal/metatarsal bones and syndactyly. This phenotype is remarkably similar to split hand-split foot malformation in humans, a clinically heterogeneous condition with a variety of modes of transmission. An autosomal recessive form has been mapped to the chromosomal region where this gene is located, and complex rearrangements involving duplications of this gene and others have been associated with the condition. A pseudogene of this locus has been mapped to one of the introns of the BCR gene on chromosome 22. F-box and WD repeat domain containing 4 FBXW4 ENSG00000107829
79966 Stearoyl-CoA desaturase (SCD; EC 1.14.99.5) is an integral membrane protein of the endoplasmic reticulum that catalyzes the formation of monounsaturated fatty acids from saturated fatty acids. SCD may be a key regulator of energy metabolism with a role in obesity and dyslipidemia. Four SCD isoforms, Scd1 through Scd4, have been identified in mouse. In contrast, only 2 SCD isoforms, SCD1 (MIM 604031) and SCD5, have been identified in human. SCD1 shares about 85% amino acid identity with all 4 mouse SCD isoforms, as well as with rat Scd1 and Scd2. In contrast, SCD5 shares limited homology with the rodent SCDs and appears to be unique to primates (Wang et al., 2005 [PubMed 15907797]). stearoyl-CoA desaturase 5 SCD5 ENSG00000145284
84722 This gene encodes a proline-rich protein that is a target for regulation by the tumor suppressor protein p53. The encoded protein plays an important role in mitosis by recruiting and regulating microtubule depolymerases that destabilize microtubules. Alternatively spliced transcript variants encoding different isoforms have been described. proline and serine rich coiled-coil 1 PSRC1 ENSG00000134222
51125 NA golgin A7 GOLGA7 ENSG00000147533
83641 NA family with sequence similarity 107 member B FAM107B ENSG00000065809
23232 NA TBC1 domain family member 12 TBC1D12 ENSG00000108239
3996 This gene encodes a protein that is similar to a tumor suppressor in Drosophila. The protein is part of a cytoskeletal network and is associated with nonmuscle myosin II heavy chain and a kinase that specifically phosphorylates this protein at serine residues. The gene is located within the Smith-Magenis syndrome region on chromosome 17. LLGL1, scribble cell polarity complex component LLGL1 ENSG00000131899
57165 This gene encodes a gap junction protein. Gap junction proteins are members of a large family of homologous connexins and comprise 4 transmembrane, 2 extracellular, and 3 cytoplasmic domains. This gene plays a key role in central myelination and is involved in peripheral myelination in humans. Defects in this gene are the cause of autosomal recessive Pelizaeus-Merzbacher-like disease-1. gap junction protein gamma 2 GJC2 ENSG00000198835
101927688 NA SEPT4 antisense RNA 1 SEPT4-AS1 ENSG00000264672
90933 This gene encodes a member of the tripartite motif (TRIM) family. The TRIM family is characterized by a signature motif composed of a RING finger, one or more B-box domains, and a coiled-coil region. This encoded protein may play a role in protein kinase C signaling. Multiple transcript variants encoding different isoforms have been found for this gene. tripartite motif containing 41 TRIM41 ENSG00000146063
# Save the Ensembl IDs queried for cluster 9, one per line.
write.table(out$query, file = paste0("../utilities/gene_names_clus_", 9, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
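
The per-cluster blocks in this script repeat the same write step with a hard-coded cluster index. A minimal sketch of a helper that builds the file name from the index, assuming the same `gene_names_clus_<k>.txt` naming; `write_cluster_genes` and its `out_dir` argument are hypothetical and not part of the original script.

```r
# Hypothetical helper (not in the original script): write the query IDs of one
# cluster to "<out_dir>/gene_names_clus_<cluster>.txt", one ID per line.
write_cluster_genes <- function(ids, cluster, out_dir = "../utilities") {
  path <- file.path(out_dir, paste0("gene_names_clus_", cluster, ".txt"))
  write.table(ids, file = path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the path so callers can check or reuse it
}
```

With this in place, the write step for any cluster would reduce to a call such as `write_cluster_genes(out$query, 9)`.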

Cluster 10 Annotations

# Annotate the top driving genes of cluster 10 via MyGene.info.
out <- mygene::queryMany(gene_list[10, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol X_id summary name notfound
ENSG00000148795 CYP17A1 1586 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. cytochrome P450 family 17 subfamily A member 1 NA
ENSG00000160882 CYP11B1 1584 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. cytochrome P450 family 11 subfamily B member 1 NA
ENSG00000211890 IGHA2 ENSG00000211890 NA immunoglobulin heavy constant alpha 2 (A2m marker) NA
ENSG00000169605 GKN1 56287 The protein encoded by this gene is found to be down-regulated in human gastric cancer tissue as compared to normal gastric mucosa. gastrokine 1 NA
ENSG00000162896 PIGR 5284 This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. polymeric immunoglobulin receptor NA
ENSG00000147465 STAR 6770 The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. steroidogenic acute regulatory protein NA
ENSG00000164816 DEFA5 1670 Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. Several of the alpha defensin genes appear to be clustered on chromosome 8. The protein encoded by this gene, defensin, alpha 5, is highly expressed in the secretory granules of Paneth cells of the ileum. defensin alpha 5 NA
ENSG00000171747 LGALS4 3960 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. lectin, galactoside binding soluble 4 NA
ENSG00000085662 AKR1B1 231 This gene encodes a member of the aldo/keto reductase superfamily, which consists of more than 40 known enzymes and proteins. This member catalyzes the reduction of a number of aldehydes, including the aldehyde form of glucose, and is thereby implicated in the development of diabetic complications by catalyzing the reduction of glucose to sorbitol. Multiple pseudogenes have been identified for this gene. The nomenclature system used by the HUGO Gene Nomenclature Committee to define human aldo-keto reductase family members is known to differ from that used by the Mouse Genome Informatics database. aldo-keto reductase family 1, member B1 (aldose reductase) NA
ENSG00000231852 CYP21A2 1589 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and hydroxylates steroids at the 21 position. Its activity is required for the synthesis of steroid hormones including cortisol and aldosterone. Mutations in this gene cause congenital adrenal hyperplasia. A related pseudogene is located near this gene; gene conversion events involving the functional gene and the pseudogene are thought to account for many cases of steroid 21-hydroxylase deficiency. Two transcript variants encoding different isoforms have been found for this gene. cytochrome P450 family 21 subfamily A member 2 NA
ENSG00000140459 CYP11A1 1583 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and catalyzes the conversion of cholesterol to pregnenolone, the first and rate-limiting step in the synthesis of the steroid hormones. Two transcript variants encoding different isoforms have been found for this gene. The cellular location of the smaller isoform is unclear since it lacks the mitochondrial-targeting transit peptide. cytochrome P450 family 11 subfamily A member 1 NA
ENSG00000170421 KRT8 3856 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. keratin 8 NA
ENSG00000211895 IGHA1 ENSG00000211895 NA immunoglobulin heavy constant alpha 1 NA
ENSG00000155850 SLC26A2 1836 The diastrophic dysplasia sulfate transporter is a transmembrane glycoprotein implicated in the pathogenesis of several human chondrodysplasias. It apparently is critical in cartilage for sulfation of proteoglycans and matrix organization. solute carrier family 26 member 2 NA
ENSG00000090920 NA NA NA NA TRUE
ENSG00000117472 TSPAN1 10103 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. tetraspanin 1 NA
ENSG00000073060 SCARB1 949 The protein encoded by this gene is a plasma membrane receptor for high density lipoprotein cholesterol (HDL). The encoded protein mediates cholesterol transfer to and from HDL. In addition, this protein is a receptor for hepatitis C virus glycoprotein E2. Two transcript variants encoding different isoforms have been found for this gene. scavenger receptor class B member 1 NA
ENSG00000137714 FDX1 2230 This gene encodes a small iron-sulfur protein that transfers electrons from NADPH through ferredoxin reductase to mitochondrial cytochrome P450, involved in steroid, vitamin D, and bile acid metabolism. Pseudogenes of this functional gene are found on chromosomes 20 and 21. ferredoxin 1 NA
ENSG00000116133 DHCR24 1718 This gene encodes a flavin adenine dinucleotide (FAD)-dependent oxidoreductase which catalyzes the reduction of the delta-24 double bond of sterol intermediates during cholesterol biosynthesis. The protein contains a leader sequence that directs it to the endoplasmic reticulum membrane. Missense mutations in this gene have been associated with desmosterolosis. Also, reduced expression of the gene occurs in the temporal cortex of Alzheimer disease patients and overexpression has been observed in adrenal gland cancer cells. 24-dehydrocholesterol reductase NA
ENSG00000163399 ATP1A1 476 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 1 subunit. Multiple transcript variants encoding different isoforms have been found for this gene. ATPase Na+/K+ transporting subunit alpha 1 NA
ENSG00000143416 SELENBP1 8991 This gene encodes a member of the selenium-binding protein family. Selenium is an essential nutrient that exhibits potent anticarcinogenic properties, and deficiency of selenium may cause certain neurologic diseases. The effects of selenium in preventing cancer and neurologic diseases may be mediated by selenium-binding proteins, and decreased expression of this gene may be associated with several types of cancer. The encoded protein may play a selenium-dependent role in ubiquitination/deubiquitination-mediated protein degradation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. selenium binding protein 1 NA
ENSG00000146205 ANO7 50636 This prostate-specific gene encodes a cytoplasmic protein, as well as a polytopic membrane protein which may serve as a target in prostate cancer diagnosis and immunotherapy. Alternative splicing results in multiple transcript variants encoding different isoforms. anoctamin 7 NA
ENSG00000176387 HSD11B2 3291 There are at least two isozymes of the corticosteroid 11-beta-dehydrogenase, a microsomal enzyme complex responsible for the interconversion of cortisol and cortisone. The type I isozyme has both 11-beta-dehydrogenase (cortisol to cortisone) and 11-oxoreductase (cortisone to cortisol) activities. The type II isozyme, encoded by this gene, has only 11-beta-dehydrogenase activity. In aldosterone-selective epithelial tissues such as the kidney, the type II isozyme catalyzes the glucocorticoid cortisol to the inactive metabolite cortisone, thus preventing illicit activation of the mineralocorticoid receptor. In tissues that do not express the mineralocorticoid receptor, such as the placenta and testis, it protects cells from the growth-inhibiting and/or pro-apoptotic effects of cortisol, particularly during embryonic development. Mutations in this gene cause the syndrome of apparent mineralocorticoid excess and hypertension. hydroxysteroid (11-beta) dehydrogenase 2 NA
ENSG00000158019 BRE 9577 NA brain and reproductive organ-expressed (TNFRSF1A modulator) NA
ENSG00000138449 SLC40A1 30061 The protein encoded by this gene is a cell membrane protein that may be involved in iron export from duodenal epithelial cells. Defects in this gene are a cause of hemochromatosis type 4 (HFE4). solute carrier family 40 member 1 NA
ENSG00000023330 ALAS1 211 This gene encodes the mitochondrial enzyme which catalyzes the rate-limiting step in heme (iron-protoporphyrin) biosynthesis. The enzyme encoded by this gene is the housekeeping enzyme; a separate gene encodes a form of the enzyme that is specific for erythroid tissue. The level of the mature encoded protein is regulated by heme: high levels of heme down-regulate the mature enzyme in mitochondria while low heme levels up-regulate. A pseudogene of this gene is located on chromosome 12. Alternative splicing results in multiple transcript variants encoding different isoforms. 5’-aminolevulinate synthase 1 NA
ENSG00000135052 GOLM1 51280 The Golgi complex plays a key role in the sorting and modification of proteins exported from the endoplasmic reticulum. The protein encoded by this gene is a type II Golgi transmembrane protein. It processes proteins synthesized in the rough endoplasmic reticulum and assists in the transport of protein cargo through the Golgi apparatus. The expression of this gene has been observed to be upregulated in response to viral infection. Alternatively spliced transcript variants encoding the same protein have been described for this gene. golgi membrane protein 1 NA
ENSG00000120756 PLS1 5357 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. The protein encoded by this gene is a third distinct plastin isoform, which is specifically expressed at high levels in the small intestine. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. A pseudogene of this gene is found on chromosome 11. plastin 1 NA
ENSG00000085563 ABCB1 5243 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MDR/TAP subfamily. Members of the MDR/TAP subfamily are involved in multidrug resistance. The protein encoded by this gene is an ATP-dependent drug efflux pump for xenobiotic compounds with broad substrate specificity. It is responsible for decreased drug accumulation in multidrug-resistant cells and often mediates the development of resistance to anticancer drugs. This protein also functions as a transporter in the blood-brain barrier. ATP binding cassette subfamily B member 1 NA
ENSG00000119888 EPCAM 4072 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. epithelial cell adhesion molecule NA
ENSG00000161513 FDXR 2232 This gene encodes a mitochondrial flavoprotein that initiates electron transport for cytochromes P450 receiving electrons from NADPH. Multiple alternatively spliced transcript variants have been found for this gene. ferredoxin reductase NA
ENSG00000185559 DLK1 8788 This gene encodes a transmembrane protein that contains multiple epidermal growth factor repeats that functions as a regulator of cell growth. The encoded protein is involved in the differentiation of several cell types including adipocytes. This gene is located in a region of chromosome 14 frequently showing uniparental disomy, and is imprinted and expressed from the paternal allele. A single nucleotide variant in this gene is associated with child and adolescent obesity and shows polar overdominance, where heterozygotes carrying an active paternal allele express the phenotype, while mutant homozygotes are normal. delta-like 1 homolog (Drosophila) NA
ENSG00000147804 SLC39A4 55630 This gene encodes a member of the zinc/iron-regulated transporter-like protein (ZIP) family. The encoded protein localizes to cell membranes and is required for zinc uptake in the intestine. Mutations in this gene result in acrodermatitis enteropathica. Multiple transcript variants encoding different isoforms have been found for this gene. solute carrier family 39 member 4 NA
ENSG00000132465 JCHAIN 3512 NA joining chain of multimeric IgA and IgM NA
ENSG00000019102 VSIG2 23584 NA V-set and immunoglobulin domain containing 2 NA
ENSG00000095932 SMIM24 284422 NA small integral membrane protein 24 NA
ENSG00000165449 SLC16A9 220963 NA solute carrier family 16 member 9 NA
ENSG00000104267 CA2 760 The protein encoded by this gene is one of several isozymes of carbonic anhydrase, which catalyzes reversible hydration of carbon dioxide. Defects in this enzyme are associated with osteopetrosis and renal tubular acidosis. Two transcript variants encoding different isoforms have been found for this gene. carbonic anhydrase 2 NA
ENSG00000088340 FER1L4 ENSG00000088340 NA fer-1-like family member 4, pseudogene (functional) NA
ENSG00000057252 SOAT1 6646 The protein encoded by this gene belongs to the acyltransferase family. It is located in the endoplasmic reticulum, and catalyzes the formation of fatty acid-cholesterol esters. This gene has been implicated in the formation of beta-amyloid and atherosclerotic plaques by controlling the equilibrium between free cholesterol and cytoplasmic cholesteryl esters. Alternatively spliced transcript variants have been found for this gene. sterol O-acyltransferase 1 NA
ENSG00000108846 ABCC3 8714 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. The specific function of this protein has not yet been determined; however, this protein may play a role in the transport of biliary and intestinal excretion of organic anions. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized. ATP binding cassette subfamily C member 3 NA
ENSG00000102837 OLFM4 10562 This gene was originally cloned from human myeloblasts and found to be selectively expressed in inflamed colonic epithelium. This gene encodes a member of the olfactomedin family. The encoded protein is an antiapoptotic factor that promotes tumor growth and is an extracellular matrix glycoprotein that facilitates cell adhesion. olfactomedin 4 NA
ENSG00000128298 BAIAP2L2 80115 The protein encoded by this gene binds phosphoinositides and promotes the formation of planar or curved membrane structures. The encoded protein is found in RAB13-positive vesicles and at intercellular contacts with the plasma membrane. BAI1 associated protein 2 like 2 NA
ENSG00000136059 VILL 50853 The protein encoded by this gene belongs to the villin/gelsolin family. It contains 6 gelsolin-like repeats and a headpiece domain. It may play a role in actin-bundling. villin-like NA
ENSG00000188707 ZBED6CL 113763 NA ZBED6 C-terminal like NA
ENSG00000185000 DGAT1 8694 This gene encodes a multipass transmembrane protein that functions as a key metabolic enzyme. The encoded protein catalyzes the conversion of diacylglycerol and fatty acyl CoA to triacylglycerol. This enzyme can also transfer acyl CoA to retinol. Activity of this protein may be associated with obesity and other metabolic diseases. diacylglycerol O-acyltransferase 1 NA
ENSG00000163694 RBM47 54502 NA RNA binding motif protein 47 NA
ENSG00000103018 CYB5B 80777 NA cytochrome b5 type B NA
ENSG00000108272 NA NA NA NA TRUE
ENSG00000014914 MTMR11 10903 NA myotubularin related protein 11 NA
ENSG00000162144 CYB561A3 220002 NA cytochrome b561 family member A3 NA
ENSG00000166920 C15orf48 84419 This gene was first identified in a study of human esophageal squamous cell carcinoma tissues. Levels of both the message and protein are reduced in carcinoma samples. In adult human tissues, this gene is expressed in the esophagus, stomach, small intestine, colon and placenta. Alternatively spliced transcript variants that encode the same protein have been identified. chromosome 15 open reading frame 48 NA
ENSG00000110917 MLEC 9761 This gene encodes the carbohydrate-binding protein malectin which is a Type I membrane-anchored endoplasmic reticulum protein. This protein has an affinity for Glc2Man9GlcNAc2 (G2M9) N-glycans and is involved in regulating glycosylation in the endoplasmic reticulum. This protein has also been shown to interact with ribophorin I and may be involved in directing the degradation of misfolded proteins. Alternate splicing results in multiple transcript variants. malectin NA
ENSG00000119431 HDHD3 81932 NA haloacid dehalogenase like hydrolase domain containing 3 NA
ENSG00000254667 AP000783.1 ENSG00000254667 NA NA NA
ENSG00000130529 TRPM4 54795 The protein encoded by this gene is a calcium-activated nonselective ion channel that mediates transport of monovalent cations across membranes, thereby depolarizing the membrane. The activity of the encoded protein increases with increasing intracellular calcium concentration, but this channel does not transport calcium. transient receptor potential cation channel subfamily M member 4 NA
ENSG00000197442 MAP3K5 4217 Mitogen-activated protein kinase (MAPK) signaling cascades include MAPK or extracellular signal-regulated kinase (ERK), MAPK kinase (MKK or MEK), and MAPK kinase kinase (MAPKKK or MEKK). MAPKK kinase/MEKK phosphorylates and activates its downstream protein kinase, MAPK kinase/MEK, which in turn activates MAPK. The kinases of these signaling cascades are highly conserved, and homologs exist in yeast, Drosophila, and mammalian cells. MAPKKK5 contains 1,374 amino acids with all 11 kinase subdomains. Northern blot analysis shows that MAPKKK5 transcript is abundantly expressed in human heart and pancreas. The MAPKKK5 protein phosphorylates and activates MKK4 (aliases SERK1, MAPKK4) in vitro, and activates c-Jun N-terminal kinase (JNK)/stress-activated protein kinase (SAPK) during transient expression in COS and 293 cells; MAPKKK5 does not activate MAPK/ERK. mitogen-activated protein kinase kinase kinase 5 NA
ENSG00000073350 LLGL2 3993 The lethal (2) giant larvae protein of Drosophila plays a role in asymmetric cell division, epithelial cell polarity, and cell migration. This human gene encodes a protein similar to lethal (2) giant larvae of Drosophila. In fly, the protein’s ability to localize cell fate determinants is regulated by the atypical protein kinase C (aPKC). In human, this protein interacts with aPKC-containing complexes and is cortically localized in mitotic cells. Alternative splicing results in multiple transcript variants encoding different isoforms. LLGL2, scribble cell polarity complex component NA
ENSG00000167986 DDB1 1642 The protein encoded by this gene is the large subunit (p127) of the heterodimeric DNA damage-binding (DDB) complex while another protein (p48) forms the small subunit. This protein complex functions in nucleotide-excision repair and binds to DNA following UV damage. Defective activity of this complex causes the repair defect in patients with xeroderma pigmentosum complementation group E (XPE) - an autosomal recessive disorder characterized by photosensitivity and early onset of carcinomas. However, it remains for mutation analysis to demonstrate whether the defect in XPE patients is in this gene or the gene encoding the small subunit. In addition, Best vitelliform macular dystrophy is mapped to the same region as this gene on 11q, but no sequence alterations of this gene are demonstrated in Best disease patients. The protein encoded by this gene also functions as an adaptor molecule for the cullin 4 (CUL4) ubiquitin E3 ligase complex by facilitating the binding of substrates to this complex and the ubiquitination of proteins. damage specific DNA binding protein 1 NA
ENSG00000072310 SREBF1 6720 This gene encodes a transcription factor that binds to the sterol regulatory element-1 (SRE1), which is a decamer flanking the low density lipoprotein receptor gene and some genes involved in sterol biosynthesis. The protein is synthesized as a precursor that is attached to the nuclear membrane and endoplasmic reticulum. Following cleavage, the mature protein translocates to the nucleus and activates transcription by binding to the SRE1. Sterols inhibit the cleavage of the precursor, and the mature nuclear form is rapidly catabolized, thereby reducing transcription. The protein is a member of the basic helix-loop-helix-leucine zipper (bHLH-Zip) transcription factor family. This gene is located within the Smith-Magenis syndrome region on chromosome 17. sterol regulatory element binding transcription factor 1 NA
ENSG00000196365 LONP1 9361 This gene encodes a mitochondrial matrix protein that belongs to the Lon family of ATP-dependent proteases. This protein mediates the selective degradation of misfolded, unassembled or oxidatively damaged polypeptides in the mitochondrial matrix. It may also have a chaperone function in the assembly of inner membrane protein complexes, and participate in the regulation of mitochondrial gene expression and maintenance of the integrity of the mitochondrial genome. Decreased expression of this gene has been noted in a patient with hereditary spastic paraplegia (PMID:18378094). Alternatively spliced transcript variants have been found for this gene. lon peptidase 1, mitochondrial NA
ENSG00000149809 TM7SF2 7108 NA transmembrane 7 superfamily member 2 NA
ENSG00000142959 BEST4 266675 This gene is a member of the bestrophin gene family of anion channels. Bestrophin genes share a similar gene structure with highly conserved exon-intron boundaries, but with distinct 3’ ends. Bestrophins are transmembrane proteins that contain a homologous region rich in aromatic residues, including an invariant arg-phe-pro motif. Mutation in one of the family members (bestrophin 1) is associated with vitelliform macular dystrophy. The bestrophin 4 gene is predominantly expressed in the colon. bestrophin 4 NA
ENSG00000168060 NAALADL1 10004 NA N-acetylated alpha-linked acidic dipeptidase-like 1 NA
ENSG00000167107 ACSF2 80221 NA acyl-CoA synthetase family member 2 NA
ENSG00000064601 CTSA 5476 This gene encodes a member of the peptidase S10 family of serine carboxypeptidases. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate two chains that comprise the heterodimeric active enzyme. This enzyme possesses deamidase, esterase and carboxypeptidase activities and acts as a scaffold in the lysosomal multienzyme complex. Mutations in this gene are associated with galactosialidosis. cathepsin A NA
ENSG00000143891 GALM 130589 This gene encodes an enzyme that catalyzes the epimerization of hexose sugars such as glucose and galactose. The encoded protein is expressed in the cytoplasm and has a preference for galactose. The encoded protein may be required for normal galactose metabolism by maintaining the equilibrium of alpha and beta anomers of galactose. galactose mutarotase (aldose 1-epimerase) NA
ENSG00000065154 OAT 4942 This gene encodes the mitochondrial enzyme ornithine aminotransferase, which is a key enzyme in the pathway that converts arginine and ornithine into the major excitatory and inhibitory neurotransmitters glutamate and GABA. Mutations that result in a deficiency of this enzyme cause the autosomal recessive eye disease Gyrate Atrophy. Alternatively spliced transcript variants encoding different isoforms have been described. Related pseudogenes have been defined on the X chromosome. ornithine aminotransferase NA
ENSG00000149260 CAPN5 726 Calpains are calcium-dependent cysteine proteases involved in signal transduction in a variety of cellular processes. A functional calpain protein consists of an invariant small subunit and 1 of a family of large subunits. CAPN5 is one of the large subunits. Unlike some of the calpains, CAPN5 and CAPN6 lack a calmodulin-like domain IV. Because of the significant similarity to Caenorhabditis elegans sex determination gene tra-3, CAPN5 is also called HTRA3. calpain 5 NA
ENSG00000171345 KRT19 3880 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. keratin 19 NA
ENSG00000138413 IDH1 3417 Isocitrate dehydrogenases catalyze the oxidative decarboxylation of isocitrate to 2-oxoglutarate. These enzymes belong to two distinct subclasses, one of which utilizes NAD(+) as the electron acceptor and the other NADP(+). Five isocitrate dehydrogenases have been reported: three NAD(+)-dependent isocitrate dehydrogenases, which localize to the mitochondrial matrix, and two NADP(+)-dependent isocitrate dehydrogenases, one of which is mitochondrial and the other predominantly cytosolic. Each NADP(+)-dependent isozyme is a homodimer. The protein encoded by this gene is the NADP(+)-dependent isocitrate dehydrogenase found in the cytoplasm and peroxisomes. It contains the PTS-1 peroxisomal targeting signal sequence. The presence of this enzyme in peroxisomes suggests roles in the regeneration of NADPH for intraperoxisomal reductions, such as the conversion of 2, 4-dienoyl-CoAs to 3-enoyl-CoAs, as well as in peroxisomal reactions that consume 2-oxoglutarate, namely the alpha-hydroxylation of phytanic acid. The cytoplasmic enzyme serves a significant role in cytoplasmic NADPH production. Alternatively spliced transcript variants encoding the same protein have been found for this gene. isocitrate dehydrogenase 1 (NADP+) NA
ENSG00000052344 PRSS8 5652 This gene encodes a member of the peptidase S1 or chymotrypsin family of serine proteases. The encoded preproprotein is proteolytically processed to generate light and heavy chains that associate via a disulfide bond to form the heterodimeric enzyme. This enzyme is highly expressed in prostate epithelia and is one of several proteolytic enzymes found in seminal fluid. This protease exhibits trypsin-like substrate specificity, cleaving protein substrates at the carboxyl terminus of lysine or arginine residues. The encoded protease partially mediates proteolytic activation of the epithelial sodium channel, a regulator of sodium balance, and may also play a role in epithelial barrier formation. protease, serine 8 NA
ENSG00000170899 GSTA4 2941 Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. These enzymes are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase belonging to the alpha class. The alpha class genes, which are located in a cluster on chromosome 6, are highly related and encode enzymes with glutathione peroxidase activity that function in the detoxification of lipid peroxidation products. Reactive electrophiles produced by oxidative metabolism have been linked to a number of degenerative diseases including Parkinson’s disease, Alzheimer’s disease, cataract formation, and atherosclerosis. glutathione S-transferase alpha 4 NA
ENSG00000166825 ANPEP 290 Aminopeptidase N is located in the small-intestinal and renal microvillar membrane, and also in other plasma membranes. In the small intestine aminopeptidase N plays a role in the final digestion of peptides generated from hydrolysis of proteins by gastric and pancreatic proteases. Its function in proximal tubular epithelial cells and other cell types is less clear. The large extracellular carboxyterminal domain contains a pentapeptide consensus sequence characteristic of members of the zinc-binding metalloproteinase superfamily. Sequence comparisons with known enzymes of this class showed that CD13 and aminopeptidase N are identical. The latter enzyme was thought to be involved in the metabolism of regulatory peptides by diverse cell types, including small intestinal and renal tubular epithelial cells, macrophages, granulocytes, and synaptic membranes from the CNS. Human aminopeptidase N is a receptor for one strain of human coronavirus that is an important cause of upper respiratory tract infections. Defects in this gene appear to be a cause of various types of leukemia or lymphoma. alanyl aminopeptidase, membrane NA
ENSG00000157617 C2CD2 25966 NA C2 calcium-dependent domain containing 2 NA
ENSG00000167608 TMC4 147798 NA transmembrane channel like 4 NA
ENSG00000197142 ACSL5 51703 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. This isozyme is highly expressed in uterus and spleen, and in trace amounts in normal brain, but has markedly increased levels in malignant gliomas. This gene functions in mediating fatty acid-induced glioma cell growth. Three transcript variants encoding two different isoforms have been found for this gene. acyl-CoA synthetase long-chain family member 5 NA
ENSG00000172831 CES2 8824 This gene encodes a member of the carboxylesterase large family. The family members are responsible for the hydrolysis or transesterification of various xenobiotics, such as cocaine and heroin, and endogenous substrates with ester, thioester, or amide bonds. They may participate in fatty acyl and cholesterol ester metabolism, and may play a role in the blood-brain barrier system. The protein encoded by this gene is the major intestinal enzyme and functions in intestine drug clearance. Alternatively spliced transcript variants have been found for this gene. carboxylesterase 2 NA
ENSG00000164125 FAM198B 51313 NA family with sequence similarity 198 member B NA
ENSG00000081923 ATP8B1 5205 This gene encodes a member of the P-type cation transport ATPase family, which belongs to the subfamily of aminophospholipid-transporting ATPases. The aminophospholipid translocases transport phosphatidylserine and phosphatidylethanolamine from one side of a bilayer to another. Mutations in this gene may result in progressive familial intrahepatic cholestasis type 1 and in benign recurrent intrahepatic cholestasis. ATPase phospholipid transporting 8B1 NA
ENSG00000066230 SLC9A3 6550 The protein encoded by this gene is an epithelial brush border Na/H exchanger that uses an inward sodium ion gradient to expel acids from the cell. Defects in this gene are a cause of congenital secretory sodium diarrhea. Pseudogenes of this gene exist on chromosomes 10 and 22. solute carrier family 9 member A3 NA
ENSG00000079385 CEACAM1 634 This gene encodes a member of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily. Two subgroups of the CEA family, the CEA cell adhesion molecules and the pregnancy-specific glycoproteins, are located within a 1.2 Mb cluster on the long arm of chromosome 19. Eleven pseudogenes of the CEA cell adhesion molecule subgroup are also found in the cluster. The encoded protein was originally described in bile ducts of liver as biliary glycoprotein. Subsequently, it was found to be a cell-cell adhesion molecule detected on leukocytes, epithelia, and endothelia. The encoded protein mediates cell adhesion via homophilic as well as heterophilic binding to other proteins of the subgroup. Multiple cellular activities have been attributed to the encoded protein, including roles in the differentiation and arrangement of tissue three-dimensional structure, angiogenesis, apoptosis, tumor suppression, metastasis, and the modulation of innate and adaptive immune responses. Multiple transcript variants encoding different isoforms have been reported, but the full-length nature of all variants has not been defined. carcinoembryonic antigen related cell adhesion molecule 1 NA
ENSG00000105755 ETHE1 23474 This gene encodes a member of the metallo beta-lactamase family of iron-containing proteins involved in the mitochondrial sulfide oxidation pathway. The encoded protein catalyzes the oxidation of a persulfide substrate to sulfite. Certain mutations in this gene cause ethylmalonic encephalopathy, an infantile metabolic disorder affecting the brain, gastrointestinal tract and peripheral vessels. Alternative splicing results in multiple transcript variants encoding different isoforms. ethylmalonic encephalopathy 1 NA
ENSG00000149418 ST14 6768 The protein encoded by this gene is an epithelial-derived, integral membrane serine protease. This protease forms a complex with the Kunitz-type serine protease inhibitor, HAI-1, and is found to be activated by sphingosine 1-phosphate. This protease has been shown to cleave and activate hepatocyte growth factor/scattering factor, and urokinase plasminogen activator, which suggest the function of this protease as an epithelial membrane activator for other proteases and latent growth factors. The expression of this protease has been associated with breast, colon, prostate, and ovarian tumors, which implicates its role in cancer invasion, and metastasis. suppression of tumorigenicity 14 NA
ENSG00000121900 TMEM54 113452 NA transmembrane protein 54 NA
ENSG00000160191 PDE9A 5152 The protein encoded by this gene catalyzes the hydrolysis of cAMP and cGMP to their corresponding monophosphates. The encoded protein plays a role in signal transduction by regulating the intracellular concentration of these cyclic nucleotides. Multiple transcript variants encoding several different isoforms have been found for this gene. phosphodiesterase 9A NA
ENSG00000095380 NANS 54187 This gene encodes an enzyme that functions in the biosynthetic pathways of sialic acids. In vitro, the encoded protein uses N-acetylmannosamine 6-phosphate and mannose 6-phosphate as substrates to generate phosphorylated forms of N-acetylneuraminic acid (Neu5Ac) and 2-keto-3-deoxy-D-glycero-D-galacto-nononic acid (KDN), respectively; however, it exhibits much higher activity toward the Neu5Ac phosphate product. In insect cells, expression of this gene results in Neu5Ac and KDN production. This gene is related to the E. coli sialic acid synthase gene neuB, and it can partially restore sialic acid synthase activity in an E. coli neuB-negative mutant. N-acetylneuraminate synthase NA
ENSG00000215187 FAM166B 730112 NA family with sequence similarity 166 member B NA
ENSG00000114650 SCAP 22937 This gene encodes a protein with a sterol sensing domain (SSD) and seven WD domains. In the presence of cholesterol, this protein binds to sterol regulatory element binding proteins (SREBPs) and mediates their transport from the ER to the Golgi. The SREBPs are then proteolytically cleaved and regulate sterol biosynthesis. Alternative splicing results in multiple transcript variants. SREBF chaperone NA
ENSG00000174236 REP15 387849 REP15 is a binding partner of the RAB GTPase family member RAB15 that facilitates transferrin receptor (TFRC; MIM 190010) recycling from the endocytic recycling compartment (Strick and Elferink, 2005 [PubMed 16195351]). RAB15 effector protein NA
ENSG00000108179 PPIF 10105 The protein encoded by this gene is a member of the peptidyl-prolyl cis-trans isomerase (PPIase) family. PPIases catalyze the cis-trans isomerization of proline imidic peptide bonds in oligopeptides and accelerate the folding of proteins. This protein is part of the mitochondrial permeability transition pore in the inner mitochondrial membrane. Activation of this pore is thought to be involved in the induction of apoptotic and necrotic cell death. peptidylprolyl isomerase F NA
ENSG00000144381 HSPD1 3329 This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. heat shock protein family D (Hsp60) member 1 NA
ENSG00000167600 CYP2S1 29785 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. In rodents, the homologous protein has been shown to metabolize certain carcinogens; however, the specific function of the human protein has not been determined. cytochrome P450 family 2 subfamily S member 1 NA
ENSG00000178623 GPR35 2859 NA G protein-coupled receptor 35 NA
ENSG00000108679 LGALS3BP 3959 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. LGALS3BP has been found elevated in the serum of patients with cancer and in those infected by the human immunodeficiency virus (HIV). It appears to be implicated in immune response associated with natural killer (NK) and lymphokine-activated killer (LAK) cell cytotoxicity. Using fluorescence in situ hybridization the full length 90K cDNA has been localized to chromosome 17q25. The native protein binds specifically to a human macrophage-associated lectin known as Mac-2 and also binds galectin 1. lectin, galactoside binding soluble 3 binding protein NA
ENSG00000158467 AHCYL2 23382 The protein encoded by this gene acts as a homotetramer and may be involved in the conversion of S-adenosyl-L-homocysteine to L-homocysteine and adenosine. Several transcript variants encoding different isoforms have been found for this gene. adenosylhomocysteinase like 2 NA
ENSG00000204344 STK19 8859 This gene encodes a serine/threonine kinase which localizes predominantly to the nucleus. Its specific function is unknown; it is possible that phosphorylation of this protein is involved in transcriptional regulation. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6 and expresses two transcript variants. serine/threonine kinase 19 NA
ENSG00000161013 MGAT4B 11282 This gene encodes a key glycosyltransferase that regulates the formation of tri- and multiantennary branching structures in the Golgi apparatus. The encoded protein, in addition to the related isoenzyme A, catalyzes the transfer of N-acetylglucosamine (GlcNAc) from UDP-GlcNAc in a beta-1,4 linkage to the Man-alpha-1,3-Man-beta-1,4-GlcNAc arm of R-Man-alpha-1,6(GlcNAc-beta-1,2-Man-alpha-1,3)Man-beta-1, 4-GlcNAc-beta-1,4-GlcNAc-beta-1-Asn. The encoded protein may play a role in regulating the availability of serum glycoproteins, oncogenesis, and differentiation. mannosyl (alpha-1,3-)-glycoprotein beta-1,4-N-acetylglucosaminyltransferase, isozyme B NA
ENSG00000134824 FADS2 9415 The protein encoded by this gene is a member of the fatty acid desaturase (FADS) gene family. Desaturase enzymes regulate unsaturation of fatty acids through the introduction of double bonds between defined carbons of the fatty acyl chain. FADS family members are considered fusion products composed of an N-terminal cytochrome b5-like domain and a C-terminal multiple membrane-spanning desaturase portion, both of which are characterized by conserved histidine motifs. This gene is clustered with family members at 11q12-q13.1; this cluster is thought to have arisen evolutionarily from gene duplication based on its similar exon/intron organization. Alternative splicing results in multiple transcript variants encoding different isoforms. fatty acid desaturase 2 NA
ENSG00000117620 SLC35A3 23443 This gene encodes a UDP-N-acetylglucosamine transporter found in the golgi apparatus membrane. In cattle, a missense mutation in this gene causes complex vertebral malformation. Alternative splicing results in multiple transcript variants. solute carrier family 35 member A3 NA
# Save the Ensembl IDs queried for cluster 10 (one per line) for downstream use.
write.table(out$query, "../utilities/gene_names_clus_10.txt", col.names = FALSE,
            row.names = FALSE, quote = FALSE)
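
The per-cluster export above is repeated by hand for each cluster. A minimal sketch of how it could be looped is given below. It assumes the IDs to save are simply the rows of `gene_list` (which matches `out$query` when each ID is returned once); `write_cluster_genes`, `toy_gene_list` and the temporary output directory are hypothetical names used only for illustration and are not part of the original script.

```r
# Hypothetical helper (not in the original script): write one file of
# Ensembl gene IDs per cluster, using the same file naming as above.
write_cluster_genes <- function(gene_list, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- vapply(seq_len(nrow(gene_list)), function(k) {
    path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
    # One gene ID per line, no quotes, no row or column names.
    writeLines(gene_list[k, ], path)
    path
  }, character(1))
  invisible(paths)
}

# Toy stand-in for the real gene_list matrix (clusters x top genes).
toy_gene_list <- rbind(
  c("ENSG00000185000", "ENSG00000163694"),
  c("ENSG00000158887", "ENSG00000189058")
)

# Written to a temporary directory here; the script itself uses "../utilities".
paths <- write_cluster_genes(toy_gene_list, file.path(tempdir(), "utilities"))
```

With the real data, `write_cluster_genes(gene_list, "../utilities")` would be the analogous call, though the annotation queries would still have to be run separately for each cluster.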

Cluster 11 Annotations

out <- mygene::queryMany(gene_list[11,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
myelin protein zero 4359 This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. MPZ ENSG00000158887 NA
apolipoprotein D 347 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. APOD ENSG00000189058 NA
peripheral myelin protein 22 5376 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. PMP22 ENSG00000109099 NA
periaxin 57716 This gene encodes a protein involved in peripheral nerve myelin upkeep. The encoded protein contains 2 PDZ domains which were named after PSD95 (post synaptic density protein), DlgA (Drosophila disc large tumor suppressor), and ZO1 (a mammalian tight junction protein). Two alternatively spliced transcript variants have been described for this gene which encode different protein isoforms and which are targeted differently in the Schwann cell. Mutations in this gene cause Charcot-Marie-Tooth neuropathy, type 4F and Dejerine-Sottas neuropathy. PRX ENSG00000105227 NA
AHNAK nucleoprotein 79026 NA AHNAK ENSG00000124942 NA
pleckstrin homology domain containing A4 57664 NA PLEKHA4 ENSG00000105559 NA
nerve growth factor receptor 4804 Nerve growth factor receptor contains an extracellular domain containing four 40-amino acid repeats with 6 cysteine residues at conserved positions followed by a serine/threonine-rich region, a single transmembrane domain, and a 155-amino acid cytoplasmic domain. The cysteine-rich region contains the nerve growth factor binding domain. NGFR ENSG00000064300 NA
integrin subunit alpha 6 3655 The gene encodes a member of the integrin alpha chain family of proteins. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 6 subunit. This subunit may associate with a beta 1 or beta 4 subunit to form an integrin that interacts with extracellular matrix proteins including members of the laminin family. The alpha 6 beta 4 integrin may promote tumorigenesis, while the alpha 6 beta 1 integrin may negatively regulate erbB2/HER2 signaling. Alternative splicing results in multiple transcript variants. ITGA6 ENSG00000091409 NA
ubiquitin specific peptidase 53 54532 NA USP53 ENSG00000145390 NA
apoptosis-associated tyrosine kinase 9625 The protein encoded by this gene contains a tyrosine kinase domain at the N-terminus and a proline-rich domain at the C-terminus. This gene is induced during apoptosis, and expression of this gene may be a necessary pre-requisite for the induction of growth arrest and/or apoptosis of myeloid precursor cells. This gene has been shown to produce neuronal differentiation in a neuroblastoma cell line. Two transcript variants encoding different isoforms have been found for this gene. AATK ENSG00000181409 NA
semaphorin 3B 7869 The protein encoded by this gene belongs to the class-3 semaphorin/collapsin family, whose members function in growth cone guidance during neuronal development. This family member inhibits axonal extension and has been shown to act as a tumor suppressor by inducing apoptosis. Alternative splicing of this gene results in multiple transcript variants. SEMA3B ENSG00000012171 NA
secreted frizzled related protein 4 6424 Secreted frizzled-related protein 4 (SFRP4) is a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins. SFRPs act as soluble modulators of Wnt signaling. The expression of SFRP4 in ventricular myocardium correlates with apoptosis related gene expression. SFRP4 ENSG00000106483 NA
insulin like growth factor binding protein 6 3489 NA IGFBP6 ENSG00000167779 NA
inverted formin, FH2 and WH2 domain containing 64423 This gene represents a member of the formin family of proteins. It is considered a diaphanous formin due to the presence of a diaphanous inhibitory domain located at the N-terminus of the encoded protein. Studies of a similar mouse protein indicate that the protein encoded by this locus may function in polymerization and depolymerization of actin filaments. Mutations at this locus have been associated with focal segmental glomerulosclerosis 5. INF2 ENSG00000203485 NA
Rho guanine nucleotide exchange factor 10 9639 This gene encodes a Rho guanine nucleotide exchange factor (GEF). Rho GEFs regulate the activity of small Rho GTPases by stimulating the exchange of guanine diphosphate (GDP) for guanine triphosphate (GTP) and may play a role in neural morphogenesis. Mutations in this gene are associated with slowed nerve conduction velocity (SNCV). Alternative splicing results in multiple transcript variants. ARHGEF10 ENSG00000104728 NA
bone morphogenetic protein 8b 656 The bone morphogenetic proteins (BMPs) are a family of secreted signaling molecules that can induce ectopic bone growth. Many BMPs are part of the transforming growth factor-beta (TGFB) superfamily. BMPs were originally identified by an ability of demineralized bone extract to induce endochondral osteogenesis in vivo in an extraskeletal site. Based on its expression early in embryogenesis, the BMP encoded by this gene has a proposed role in early development. In addition, the fact that this BMP is closely related to BMP5 and BMP7 has led to speculation of possible bone inductive activity. BMP8B ENSG00000116985 NA
solute carrier family 2 member 1 6513 This gene encodes a major glucose transporter in the mammalian blood-brain barrier. The encoded protein is found primarily in the cell membrane and on the cell surface, where it can also function as a receptor for human T-cell leukemia virus (HTLV) I and II. Mutations in this gene have been found in a family with paroxysmal exertion-induced dyskinesia. SLC2A1 ENSG00000117394 NA
NIMA related kinase 1 4750 The protein encoded by this gene is a serine/threonine kinase involved in cell cycle regulation. The encoded protein is found in a centrosomal complex with FEZ1, a neuronal protein that plays a role in axonal development. Defects in this gene are a cause of polycystic kidney disease (PKD). Several transcript variants encoding different isoforms have been found for this gene. NEK1 ENSG00000137601 NA
N-myc downstream regulated 1 10397 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein involved in stress responses, hormone responses, cell growth, and differentiation. The encoded protein is necessary for p53-mediated caspase activation and apoptosis. Mutations in this gene are a cause of Charcot-Marie-Tooth disease type 4D, and expression of this gene may be a prognostic indicator for several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NDRG1 ENSG00000104419 NA
integrin subunit beta 4 3691 Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. ITGB4 ENSG00000132470 NA
EH domain binding protein 1 23301 This gene encodes an Eps15 homology domain binding protein. The encoded protein may play a role in endocytic trafficking. A single nucleotide polymorphism in this gene is associated with an aggressive form of prostate cancer. Alternate splicing results in multiple transcript variants. EHBP1 ENSG00000115504 NA
spectrin beta, non-erythrocytic 5 51332 NA SPTBN5 ENSG00000137877 NA
heat shock protein family A (Hsp70) member 12A 259217 NA HSPA12A ENSG00000165868 NA
papilin, proteoglycan-like sulfated glycoprotein 89932 NA PAPLN ENSG00000100767 NA
semaphorin 4C 54910 NA SEMA4C ENSG00000168758 NA
thrombospondin 4 7060 The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. THBS4 ENSG00000113296 NA
aryl hydrocarbon receptor 196 The protein encoded by this gene is a ligand-activated helix-loop-helix transcription factor involved in the regulation of biological responses to planar aromatic hydrocarbons. This receptor has been shown to regulate xenobiotic-metabolizing enzymes such as cytochrome P450. Before ligand binding, the encoded protein is sequestered in the cytoplasm; upon ligand binding, this protein moves to the nucleus and stimulates transcription of target genes. AHR ENSG00000106546 NA
Rho GTPase activating protein 19 84986 Members of the ARHGAP family, such as ARHGAP19, encode negative regulators of Rho GTPases (see RHOA; MIM 165390), which are involved in cell migration, proliferation, and differentiation, actin remodeling, and G1 cell cycle progression (Lv et al., 2007 [PubMed 17454002]). ARHGAP19 ENSG00000213390 NA
erythrocyte membrane protein band 4.1 like 2 2037 NA EPB41L2 ENSG00000079819 NA
nuclear receptor binding protein 2 340371 NA NRBP2 ENSG00000185189 NA
decorin 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. DCN ENSG00000011465 NA
filamin B 2317 This gene encodes a member of the filamin family. The encoded protein interacts with glycoprotein Ib alpha as part of the process to repair vascular injuries. The platelet glycoprotein Ib complex includes glycoprotein Ib alpha, and it binds the actin cytoskeleton. Mutations in this gene have been found in several conditions: atelosteogenesis type 1 and type 3; boomerang dysplasia; autosomal dominant Larsen syndrome; and spondylocarpotarsal synostosis syndrome. Multiple alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. FLNB ENSG00000136068 NA
von Willebrand factor A domain containing 1 64856 VWA1 belongs to the von Willebrand factor (VWF; MIM 613160) A (VWFA) domain superfamily of extracellular matrix proteins and appears to play a role in cartilage structure and function (Fitzgerald et al., 2002 [PubMed 12062410]). VWA1 ENSG00000179403 NA
phosphoinositide-3-kinase regulatory subunit 1 5295 Phosphatidylinositol 3-kinase phosphorylates the inositol ring of phosphatidylinositol at the 3-prime position. The enzyme comprises a 110 kD catalytic subunit and a regulatory subunit of either 85, 55, or 50 kD. This gene encodes the 85 kD regulatory subunit. Phosphatidylinositol 3-kinase plays an important role in the metabolic actions of insulin, and a mutation in this gene has been associated with insulin resistance. Alternative splicing of this gene results in four transcript variants encoding different isoforms. PIK3R1 ENSG00000145675 NA
plectin 5339 Plectin is a prominent member of an important family of structurally and in part functionally related proteins, termed plakins or cytolinkers, that are capable of interlinking different elements of the cytoskeleton. Plakins, with their multi-domain structure and enormous size, not only play crucial roles in maintaining cell and tissue integrity and orchestrating dynamic changes in cytoarchitecture and cell shape, but also serve as scaffolding platforms for the assembly, positioning, and regulation of signaling complexes (reviewed in PMID: 9701547, 11854008, and 17499243). Plectin is expressed as several protein isoforms in a wide range of cell types and tissues from a single gene located on chromosome 8 in humans (PMID: 8633055, 8698233). Until 2010, this locus was named plectin 1 (symbol PLEC1 in human; Plec1 in mouse and rat) and the gene product had been referred to as ‘hemidesmosomal protein 1’ or ‘plectin 1, intermediate filament binding 500kDa’. These names were superseded by plectin. The plectin gene locus in mouse on chromosome 15 has been analyzed in detail (PMID: 10556294, 14559777), revealing a genomic exon-intron organization with well over 40 exons spanning over 62 kb and an unusual 5’ transcript complexity of plectin isoforms. Eleven exons (1-1j) have been identified that alternatively splice directly into a common exon 2 which is the first exon to encode plectin’s highly conserved actin binding domain (ABD). Three additional exons (-1, 0a, and 0) splice into an alternative first coding exon (1c), and two additional exons (2alpha and 3alpha) are optionally spliced within the exons encoding the actin binding domain (exons 2-8). Analysis of the human locus has identified eight of the eleven alternative 5’ exons found in mouse and rat (PMID: 14672974); exons 1i, 1j and 1h have not been confirmed in human. Furthermore, isoforms lacking the central rod domain encoded by exon 31 have been detected in mouse (PMID:10556294), rat (PMID: 9177781), and human (PMID: 11441066, 10780662, 20052759). The short alternative amino-terminal sequences encoded by the different first exons direct the targeting of the various isoforms to distinct subcellular locations (PMID: 14559777). As the expression of specific plectin isoforms was found to be dependent on cell type (tissue) and stage of development (PMID: 10556294, 12542521, 17389230) it appears that each cell type (tissue) contains a unique set (proportion and composition) of plectin isoforms, as if custom-made for specific requirements of the particular cells. Concordantly, individual isoforms were found to carry out distinct and specific functions (PMID: 14559777, 12542521, 18541706). In 1996, a number of groups reported that patients suffering from epidermolysis bullosa simplex with muscular dystrophy (EBS-MD) lacked plectin expression in skin and muscle tissues due to defects in the plectin gene (PMID: 8698233, 8941634, 8636409, 8894687, 8696340). Two other subtypes of plectin-related EBS have been described: EBS-pyloric atresia (PA) and EBS-Ogna. For reviews of plectin-related diseases see PMID: 15810881, 19945614. Mutations in the plectin gene related to human diseases should be named based on the position in NM_000445 (variant 1, isoform 1c), unless the mutation is located within one of the other alternative first exons, in which case the position in the respective Reference Sequence should be used. PLEC ENSG00000178209 NA
microfibrillar associated protein 5 8076 This gene encodes a 25-kD microfibril-associated glycoprotein which is a component of microfibrils of the extracellular matrix. The encoded protein promotes attachment of cells to microfibrils via alpha-V-beta-3 integrin. Deficiency of this gene in mice results in neutropenia. Alternate splicing results in multiple transcript variants encoding different isoforms. MFAP5 ENSG00000197614 NA
sperm specific antigen 2 6744 NA SSFA2 ENSG00000138434 NA
SH3 and PX domains 2A 9644 NA SH3PXD2A ENSG00000107957 NA
protein tyrosine phosphatase domain containing 1 138639 The protein encoded by this gene contains a characteristic motif of protein tyrosine phosphatases (PTPs). PTPs regulate activities of phosphoproteins through dephosphorylation. They are signaling molecules involved in the regulation of a wide variety of biological processes. The specific function of this protein has not yet been determined. Alternatively spliced transcript variants encoding distinct isoforms have been identified. PTPDC1 ENSG00000158079 NA
latent transforming growth factor beta binding protein 4 8425 The protein encoded by this gene binds transforming growth factor beta (TGFB) as it is secreted and targeted to the extracellular matrix. TGFB is biologically latent after secretion and insertion into the extracellular matrix, and sheds TGFB and other proteins upon activation. Defects in this gene may be a cause of cutis laxa and severe pulmonary, gastrointestinal, and urinary abnormalities. Three transcript variants encoding different isoforms have been found for this gene. LTBP4 ENSG00000090006 NA
actin filament associated protein 1 like 2 84632 NA AFAP1L2 ENSG00000169129 NA
aminoadipate-semialdehyde synthase 10157 This gene encodes a bifunctional enzyme that catalyzes the first two steps in the mammalian lysine degradation pathway. The N-terminal and the C-terminal portions of this enzyme contain lysine-ketoglutarate reductase and saccharopine dehydrogenase activity, respectively, resulting in the conversion of lysine to alpha-aminoadipic semialdehyde. Mutations in this gene are associated with familial hyperlysinemia. AASS ENSG00000008311 NA
myocilin 4653 MYOC encodes the protein myocilin, which is believed to have a role in cytoskeletal function. MYOC is expressed in many ocular tissues, including the trabecular meshwork, and was revealed to be the trabecular meshwork glucocorticoid-inducible response protein (TIGR). The trabecular meshwork is a specialized eye tissue essential in regulating intraocular pressure, and mutations in MYOC have been identified as the cause of hereditary juvenile-onset open-angle glaucoma. MYOC ENSG00000034971 NA
spectrin beta, non-erythrocytic 1 6711 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. SPTBN1 ENSG00000115306 NA
StAR related lipid transfer domain containing 13 90627 This gene encodes a protein which contains an N-terminal sterile alpha motif (SAM) for protein-protein interactions, followed by an ATP/GTP-binding motif, a GTPase-activating protein (GAP) domain, and a C-terminal STAR-related lipid transfer (START) domain. It may be involved in regulation of cytoskeletal reorganization, cell proliferation, and cell motility, and acts as a tumor suppressor in hepatoma cells. The gene is located in a region of chromosome 13 that is associated with loss of heterozygosity in hepatocellular carcinomas. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. STARD13 ENSG00000133121 NA
platelet derived growth factor receptor like 5157 This gene encodes a protein with significant sequence similarity to the ligand binding domain of platelet-derived growth factor receptor beta. Mutations in this gene, or deletion of a chromosomal segment containing this gene, are associated with sporadic hepatocellular carcinomas, colorectal cancers, and non-small cell lung cancers. This suggests this gene product may function as a tumor suppressor. PDGFRL ENSG00000104213 NA
zinc finger DHHC-type containing 8 29801 This gene encodes a four transmembrane protein that is a member of the zinc finger DHHC domain-containing protein family. The encoded protein may function as a palmitoyltransferase. Defects in this gene may be associated with a susceptibility to schizophrenia. Alternate splicing of this gene results in multiple transcript variants. A pseudogene of this gene is found on chromosome 22. ZDHHC8 ENSG00000099904 NA
SAM domain and HD domain 1 25939 This gene may play a role in regulation of the innate immune response. The encoded protein is upregulated in response to viral infection and may be involved in mediation of tumor necrosis factor-alpha proinflammatory responses. Mutations in this gene have been associated with Aicardi-Goutieres syndrome. SAMHD1 ENSG00000101347 NA
SECIS binding protein 2 like 9728 NA SECISBP2L ENSG00000138593 NA
somatomedin B and thrombospondin type 1 domain containing 157869 NA SBSPON ENSG00000164764 NA
patched 1 5727 This gene encodes a member of the patched gene family. The encoded protein is the receptor for sonic hedgehog, a secreted molecule implicated in the formation of embryonic structures and in tumorigenesis, as well as the desert hedgehog and indian hedgehog proteins. This gene functions as a tumor suppressor. Mutations of this gene have been associated with basal cell nevus syndrome, esophageal squamous cell carcinoma, trichoepitheliomas, transitional cell carcinomas of the bladder, as well as holoprosencephaly. Alternative splicing results in multiple transcript variants encoding different isoforms. Additional splice variants have been described, but their full length sequences and biological validity cannot be determined currently. PTCH1 ENSG00000185920 NA
early growth response 2 1959 The protein encoded by this gene is a transcription factor with three tandem C2H2-type zinc fingers. Defects in this gene are associated with Charcot-Marie-Tooth disease type 1D (CMT1D), Charcot-Marie-Tooth disease type 4E (CMT4E), and with Dejerine-Sottas syndrome (DSS). Multiple transcript variants encoding two different isoforms have been found for this gene. EGR2 ENSG00000122877 NA
rhomboid 5 homolog 1 (Drosophila) 64285 NA RHBDF1 ENSG00000007384 NA
ATP binding cassette subfamily A member 10 10349 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, and White). This encoded protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This gene is clustered among 4 other ABC1 family members on 17q24, but neither the substrate nor the function of this gene is known. ABCA10 ENSG00000154263 NA
podocan 127435 NA PODN ENSG00000174348 NA
cell adhesion molecule L1 like 10752 The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. This protein may also play a role in the growth of certain cancers. Alternate splicing results in both coding and non-coding variants. CHL1 ENSG00000134121 NA
tensin 3 64759 NA TNS3 ENSG00000136205 NA
GLI family zinc finger 1 2735 This gene encodes a member of the Kruppel family of zinc finger proteins. The encoded transcription factor is activated by the sonic hedgehog signal transduction cascade and regulates stem cell proliferation. The activity and nuclear localization of this protein is negatively regulated by p53 in an inhibitory loop. Multiple transcript variants encoding different isoforms have been found for this gene. GLI1 ENSG00000111087 NA
nuclear receptor subfamily 2 group F member 2 7026 This gene encodes a member of the steroid thyroid hormone superfamily of nuclear receptors. The encoded protein is a ligand inducible transcription factor that is involved in the regulation of many different genes. Alternate splicing results in multiple transcript variants. NR2F2 ENSG00000185551 NA
PDZ domain containing 2 23037 The protein encoded by this gene contains six PDZ domains and shares sequence similarity with pro-interleukin-16 (pro-IL-16). Like pro-IL-16, the encoded protein localizes to the endoplasmic reticulum and is thought to be cleaved by a caspase to produce a secreted peptide containing two PDZ domains. In addition, this gene is upregulated in primary prostate tumors and may be involved in the early stages of prostate tumorigenesis. PDZD2 ENSG00000133401 NA
secreted protein acidic and cysteine rich 6678 This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. SPARC ENSG00000113140 NA
transducin like enhancer of split 1 7088 NA TLE1 ENSG00000196781 NA
protein tyrosine phosphatase, receptor type U 10076 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP possesses an extracellular region, a single transmembrane region, and two tandem intracellular catalytic domains, and thus represents a receptor-type PTP. The extracellular region contains a meprin-A5 antigen-PTP (MAM) domain, Ig-like and fibronectin type III-like repeats. This PTP was thought to play roles in cell-cell recognition and adhesion. Studies of the similar gene in mice suggested the role of this PTP in early neural development. The expression of this gene was reported to be regulated by phorbol myristate acetate (PMA) or calcium ionophore in Jurkat T lymphoma cells. Alternatively spliced transcript variants have been reported. PTPRU ENSG00000060656 NA
adducin 3 120 Adducins are heteromeric proteins composed of different subunits referred to as adducin alpha, beta and gamma. The three subunits are encoded by distinct genes and belong to a family of membrane skeletal proteins involved in the assembly of spectrin-actin network in erythrocytes and at sites of cell-cell contact in epithelial tissues. While adducins alpha and gamma are ubiquitously expressed, the expression of adducin beta is restricted to brain and hematopoietic tissues. Adducin, originally purified from human erythrocytes, was found to be a heterodimer of adducins alpha and beta. Polymorphisms resulting in amino acid substitutions in these two subunits have been associated with the regulation of blood pressure in an animal model of hypertension. Heterodimers consisting of alpha and gamma subunits have also been described. Structurally, each subunit is comprised of two distinct domains. The amino-terminal region is protease resistant and globular in shape, while the carboxy-terminal region is protease sensitive. The latter contains multiple phosphorylation sites for protein kinase C, the binding site for calmodulin, and is required for association with spectrin and actin. Alternatively spliced adducin gamma transcripts encoding different isoforms have been described. The functions of the different isoforms are not known. ADD3 ENSG00000148700 NA
StAR related lipid transfer domain containing 9 57519 NA STARD9 ENSG00000159433 NA
retinol dehydrogenase 10 (all-trans) 157506 This gene encodes a retinol dehydrogenase, which converts all-trans-retinol to all-trans-retinal, with preference for NADP as a cofactor. Studies in mice suggest that this protein is essential for synthesis of embryonic retinoic acid and is required for limb, craniofacial, and organ development. RDH10 ENSG00000121039 NA
transmembrane protease, serine 5 80975 This gene encodes a protein that belongs to the serine protease family. Serine proteases are known to be involved in many physiological and pathological processes. Alternative splicing results in multiple transcript variants. TMPRSS5 ENSG00000166682 NA
ST6GALNAC2, alpha-2,6-sialyltransferase 2 10610 ST6GALNAC2 belongs to a family of sialyltransferases that add sialic acids to the nonreducing ends of glycoconjugates. At the cell surface, these modifications have roles in cell-cell and cell-substrate interactions, bacterial adhesion, and protein targeting (Samyn-Petit et al., 2000 [PubMed 10742600]). ST6GALNAC2 ENSG00000070731 NA
mitogen-activated protein kinase 8 interacting protein 1 9479 This gene encodes a regulator of the pancreatic beta-cell function. It is highly similar to JIP-1, a mouse protein known to be a regulator of c-Jun amino-terminal kinase (Mapk8). This protein has been shown to prevent MAPK8 mediated activation of transcription factors, and to decrease IL-1 beta and MAP kinase kinase 1 (MEKK1) induced apoptosis in pancreatic beta cells. This protein also functions as a DNA-binding transactivator of the glucose transporter GLUT2. RE1-silencing transcription factor (REST) is reported to repress the expression of this gene in insulin-secreting beta cells. This gene is found to be mutated in a type 2 diabetes family, and thus is thought to be a susceptibility gene for type 2 diabetes. MAPK8IP1 ENSG00000121653 NA
DNA damage inducible transcript 4 54541 NA DDIT4 ENSG00000168209 NA
FERM domain containing 8 83786 NA FRMD8 ENSG00000126391 NA
MICAL like 2 79778 NA MICALL2 ENSG00000164877 NA
dystroglycan 1 1605 This gene encodes dystroglycan, a central component of dystrophin-glycoprotein complex that links the extracellular matrix and the cytoskeleton in the skeletal muscle. The encoded preproprotein undergoes O- and N-glycosylation, and proteolytic processing to generate alpha and beta subunits. Certain mutations in this gene are known to cause distinct forms of muscular dystrophy. Alternative splicing results in multiple transcript variants, all encoding the same protein. DAG1 ENSG00000173402 NA
ATP binding cassette subfamily A member 9 10350 This gene is a member of the superfamily of ATP-binding cassette (ABC) transporters and the encoded protein contains two transmembrane domains and two nucleotide binding folds. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, and White). This gene is a member of the ABC1 subfamily and is clustered with four other ABC1 family members on chromosome 17q24. Transcriptional expression of this gene is induced during monocyte differentiation into macrophages and is suppressed by cholesterol import. ABCA9 ENSG00000154258 NA
Kruppel-like factor 9 687 The protein encoded by this gene is a transcription factor that binds to GC box elements located in the promoter. Binding of the encoded protein to a single GC box inhibits mRNA expression while binding to tandemly repeated GC box elements activates transcription. KLF9 ENSG00000119138 NA
sestrin 3 143686 This gene encodes a member of the sestrin family of stress-induced proteins. The encoded protein reduces the levels of intracellular reactive oxygen species induced by activated Ras downstream of RAC-alpha serine/threonine-protein kinase (Akt) and FoxO transcription factor. The protein is required for normal regulation of blood glucose, insulin resistance and plays a role in lipid storage in obesity. Alternative splicing results in multiple transcript variants. SESN3 ENSG00000149212 NA
Bardet-Biedl syndrome 2 583 This gene is a member of the Bardet-Biedl syndrome (BBS) gene family. Bardet-Biedl syndrome is an autosomal recessive disorder characterized by severe pigmentary retinopathy, obesity, polydactyly, renal malformation and mental retardation. The proteins encoded by BBS gene family members are structurally diverse and the similar phenotypes exhibited by mutations in BBS gene family members are likely due to their shared roles in cilia formation and function. Many BBS proteins localize to the basal bodies, ciliary axonemes, and pericentriolar regions of cells. BBS proteins may also be involved in intracellular trafficking via microtubule-related transport. The protein encoded by this gene forms a multiprotein BBSome complex with seven other BBS proteins. BBS2 ENSG00000125124 NA
uncharacterized LOC100132215 100132215 NA LOC100132215 ENSG00000231609 NA
uncharacterized LOC100288866 100288866 NA LOC100288866 ENSG00000249906 NA
leucine-rich repeat containing 8 family member B 23507 NA LRRC8B ENSG00000197147 NA
tight junction protein 1 7082 This gene encodes a protein located on a cytoplasmic membrane surface of intercellular tight junctions. The encoded protein may be involved in signal transduction at cell-cell junctions. Alternative splicing of this gene results in multiple transcript variants. TJP1 ENSG00000104067 NA
cytochrome P450 family 1 subfamily B member 1 1545 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The enzyme encoded by this gene localizes to the endoplasmic reticulum and metabolizes procarcinogens such as polycyclic aromatic hydrocarbons and 17beta-estradiol. Mutations in this gene have been associated with primary congenital glaucoma; therefore it is thought that the enzyme also metabolizes a signaling molecule involved in eye development, possibly a steroid. CYP1B1 ENSG00000138061 NA
cytochrome P450 family 2 subfamily U member 1 113612 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This enzyme is a hydroxylase that metabolizes arachidonic acid, docosahexaenoic acid, and other long chain fatty acids. CYP2U1 ENSG00000155016 NA
zinc finger AN1-type containing 5 7763 NA ZFAND5 ENSG00000107372 NA
SRY-box 13 9580 This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins. It has also been determined to be a type-1 diabetes autoantigen, also known as islet cell antibody 12. SOX13 ENSG00000143842 NA
NA ENSG00000238018 NA AC093110.3 ENSG00000238018 NA
exosome component 10 5394 NA EXOSC10 ENSG00000171824 NA
gap junction protein gamma 3 349149 This gene encodes a gap junction protein. The encoded protein, also known as a connexin, plays a role in formation of gap junctions, which provide direct connections between neighboring cells. Mutations in this gene have been reported to be associated with nonsyndromic hearing loss. GJC3 ENSG00000176402 NA
NA ENSG00000257225 NA RP11-328C8.4 ENSG00000257225 NA
uveal autoantigen with coiled-coil domains and ankyrin repeats 55075 NA UACA ENSG00000137831 NA
utrophin 7402 This gene shares both structural and functional similarities with the dystrophin gene. It contains an actin-binding N-terminus, a triple coiled-coil repeat central region, and a C-terminus that consists of protein-protein interaction motifs which interact with dystroglycan protein components. The protein encoded by this gene is located at the neuromuscular synapse and myotendinous junctions, where it participates in post-synaptic membrane maintenance and acetylcholine receptor clustering. Mouse studies suggest that this gene may serve as a functional substitute for the dystrophin gene and therefore, may serve as a potential therapeutic alternative to muscular dystrophy which is caused by mutations in the dystrophin gene. Alternative splicing of the utrophin gene has been described; however, the full-length nature of these variants has not yet been determined. UTRN ENSG00000152818 NA
NA ENSG00000232767 NA RP11-498B4.5 ENSG00000232767 NA
NA NA NA NA ENSG00000163113 TRUE
ral guanine nucleotide dissociation stimulator like 1 23179 NA RGL1 ENSG00000143344 NA
CASK interacting protein 2 57513 This gene encodes a large protein that contains six ankyrin repeats, as well as a Src homology 3 (SH3) domain and two sterile alpha motif (SAM) domains, which may be involved in protein-protein interactions. The C-terminal portion of this protein is proline-rich and contains a conserved region. A related protein interacts with calcium/calmodulin-dependent serine protein kinase (CASK). Alternative splicing results in multiple transcript variants. CASKIN2 ENSG00000177303 NA
sarcoglycan beta 6443 This gene encodes a member of the sarcoglycan family. Sarcoglycans are transmembrane components in the dystrophin-glycoprotein complex which help stabilize the muscle fiber membranes and link the muscle cytoskeleton to the extracellular matrix. Mutations in this gene have been associated with limb-girdle muscular dystrophy. SGCB ENSG00000163069 NA
NA ENSG00000269926 NA RP11-442H21.2 ENSG00000269926 NA
FXYD domain containing ion transport regulator 1 5348 This gene encodes a member of a family of small membrane proteins that share a 35-amino acid signature sequence domain, beginning with the sequence PFXYD and containing 7 invariant and 6 highly conserved amino acids. The approved human gene nomenclature for the family is FXYD-domain containing ion transport regulator. Mouse FXYD5 has been termed RIC (Related to Ion Channel). FXYD2, also known as the gamma subunit of the Na,K-ATPase, regulates the properties of that enzyme. FXYD1 (phospholemman), FXYD2 (gamma), FXYD3 (MAT-8), FXYD4 (CHIF), and FXYD5 (RIC) have been shown to induce channel activity in experimental expression systems. Transmembrane topology has been established for two family members (FXYD1 and FXYD2), with the N-terminus extracellular and the C-terminus on the cytoplasmic side of the membrane. The protein encoded by this gene is a plasma membrane substrate for several kinases, including protein kinase A, protein kinase C, NIMA kinase, and myotonic dystrophy kinase. It is thought to form an ion channel or regulate ion channel activity. Transcript variants with different 5’ UTR sequences have been described in the literature. FXYD1 ENSG00000266964 NA
interleukin 34 146433 Interleukin-34 is a cytokine that promotes the differentiation and viability of monocytes and macrophages through the colony-stimulating factor-1 receptor (CSF1R; MIM 164770) (Lin et al., 2008 [PubMed 18467591]). IL34 ENSG00000157368 NA
NA ENSG00000221857 NA CTD-2527I21.4 ENSG00000221857 NA
# Save the Ensembl IDs queried for cluster 11, one per line
write.table(out$query, paste0("../utilities/gene_names_clus_", 11, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
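
The query results in this document can contain more than one row for the same Ensembl ID, as well as rows flagged `notfound`, so the file written from `out$query` may list an ID twice or include IDs with no annotation. The following is a minimal sketch of one way to tidy such a result before writing it out. It uses base R only. The helper `clean_annotations` and the toy data frame `out_df` are hypothetical names introduced for illustration (they are not part of mygene), and the sketch assumes `notfound` is `NA` for hits and `TRUE` for misses, as in the printed tables.

```r
# Toy stand-in for as.data.frame(out): one unique hit, one ID returned
# twice, and one ID flagged as not found (illustrative rows only).
out_df <- data.frame(
  query    = c("ENSG00000211899", "ENSG00000187514",
               "ENSG00000187514", "ENSG00000260655"),
  symbol   = c("IGHM", "LOC728026", "PTMA", NA),
  notfound = c(NA, NA, NA, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows flagged notfound, then keep the first
# row for each query ID.
clean_annotations <- function(df) {
  if ("notfound" %in% names(df)) {
    # Assumption: notfound is NA for hits and TRUE for misses
    df <- df[is.na(df$notfound) | !df$notfound, , drop = FALSE]
  }
  df[!duplicated(df$query), , drop = FALSE]
}

cleaned <- clean_annotations(out_df)
print(cleaned$query)
```

Keeping the first row per ID is an arbitrary choice: in the toy data it retains `LOC728026` rather than `PTMA`, so a real analysis may want to pick the preferred symbol explicitly.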

Cluster 12 Annotations

out <- mygene::queryMany(gene_list[12,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol name query X_id summary notfound
IGHM immunoglobulin heavy constant mu ENSG00000211899 ENSG00000211899 NA NA
IGHG1 immunoglobulin heavy constant gamma 1 (G1m marker) ENSG00000211896 ENSG00000211896 NA NA
IGHG2 immunoglobulin heavy constant gamma 2 (G2m marker) ENSG00000211893 ENSG00000211893 NA NA
IGHG4 immunoglobulin heavy constant gamma 4 (G4m marker) ENSG00000211892 ENSG00000211892 NA NA
RP11-731F5.2 NA ENSG00000253364 ENSG00000253364 NA NA
CD74 CD74 molecule ENSG00000019582 972 The protein encoded by this gene associates with class II major histocompatibility complex (MHC) and is an important chaperone that regulates antigen presentation for immune response. It also serves as cell surface receptor for the cytokine macrophage migration inhibitory factor (MIF) which, when bound to the encoded protein, initiates survival pathways and cell proliferation. This protein also interacts with amyloid precursor protein (APP) and suppresses the production of amyloid beta (Abeta). Multiple alternatively spliced transcript variants encoding different isoforms have been identified. NA
MX1 MX dynamin like GTPase 1 ENSG00000157601 4599 This gene encodes a guanosine triphosphate (GTP)-metabolizing protein that participates in the cellular antiviral response. The encoded protein is induced by type I and type II interferons and antagonizes the replication process of several different RNA and DNA viruses. There is a related gene located adjacent to this gene on chromosome 21, and there are multiple pseudogenes located in a cluster on chromosome 4. Alternative splicing results in multiple transcript variants. NA
MIR155HG MIR155 host gene ENSG00000234883 114614 MicroRNAs (miRNAs), such as miRNA155, are endogenous noncoding RNAs of about 22 nucleotides that regulate mRNAs by targeting them for cleavage or translational repression. The primary miRNA transcript containing the mature miRNA155 sequence, pri-miRNA155, is also referred to as BIC (Kluiver et al., 2005 [PubMed 16041695]). NA
HLA-DRA major histocompatibility complex, class II, DR alpha ENSG00000204287 3122 HLA-DRA is one of the HLA class II alpha chain paralogues. This class II molecule is a heterodimer consisting of an alpha and a beta chain, both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The alpha chain is approximately 33-35 kDa and its gene contains 5 exons. Exon 1 encodes the leader peptide, exons 2 and 3 encode the two extracellular domains, and exon 4 encodes the transmembrane domain and the cytoplasmic tail. DRA does not have polymorphisms in the peptide binding part and acts as the sole alpha chain for DRB1, DRB3, DRB4 and DRB5. NA
CD79A CD79a molecule ENSG00000105369 973 The B lymphocyte antigen receptor is a multimeric complex that includes the antigen-specific component, surface immunoglobulin (Ig). Surface Ig non-covalently associates with two other proteins, Ig-alpha and Ig-beta, which are necessary for expression and function of the B-cell antigen receptor. This gene encodes the Ig-alpha protein of the B-cell antigen component. Alternatively spliced transcript variants encoding different isoforms have been described. NA
OAS2 2’-5’-oligoadenylate synthetase 2 ENSG00000111335 4939 This gene encodes a member of the 2-5A synthetase family, essential proteins involved in the innate immune response to viral infection. The encoded protein is induced by interferons and uses adenosine triphosphate in 2’-specific nucleotidyl transfer reactions to synthesize 2’,5’-oligoadenylates (2-5As). These molecules activate latent RNase L, which results in viral RNA degradation and the inhibition of viral replication. The three known members of this gene family are located in a cluster on chromosome 12. Alternatively spliced transcript variants encoding different isoforms have been described. NA
ENO1 enolase 1 ENSG00000074800 2023 This gene encodes alpha-enolase, one of three enolase isoenzymes found in mammals. Each isoenzyme is a homodimer composed of 2 alpha, 2 gamma, or 2 beta subunits, and functions as a glycolytic enzyme. Alpha-enolase in addition, functions as a structural lens protein (tau-crystallin) in the monomeric form. Alternative splicing of this gene results in a shorter isoform that has been shown to bind to the c-myc promoter and function as a tumor suppressor. Several pseudogenes have been identified, including one on the long arm of chromosome 1. Alpha-enolase has also been identified as an autoantigen in Hashimoto encephalopathy. NA
IFI44L interferon induced protein 44 like ENSG00000137959 10964 NA NA
EBI3 Epstein-Barr virus induced 3 ENSG00000105246 10148 This gene was identified by its induced expression in B lymphocytes in response to Epstein-Barr virus infection. It encodes a secreted glycoprotein belonging to the hematopoietin receptor family, and heterodimerizes with a 28 kDa protein to form interleukin 27 (IL-27). IL-27 regulates T cell and inflammatory responses, in part by activating the Jak/STAT pathway of CD4+ T cells. NA
IGHG3 immunoglobulin heavy constant gamma 3 (G3m marker) ENSG00000211897 ENSG00000211897 NA NA
OAS3 2’-5’-oligoadenylate synthetase 3 ENSG00000111331 4940 This gene encodes an enzyme included in the 2’, 5’ oligoadenylate synthase family. This enzyme is induced by interferons and catalyzes the 2’, 5’ oligomers of adenosine in order to bind and activate RNase L. This enzyme family plays a significant role in the inhibition of cellular protein synthesis and viral infection resistance. NA
RPS2 ribosomal protein S2 ENSG00000140988 6187 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S5P family of ribosomal proteins. It is located in the cytoplasm. This gene shares sequence similarity with mouse LLRep3. It is co-transcribed with the small nucleolar RNA gene U64, which is located in its third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. NA
LOC728026 prothymosin alpha-like ENSG00000187514 728026 NA NA
PTMA prothymosin, alpha ENSG00000187514 5757 NA NA
LTA lymphotoxin alpha ENSG00000226979 4049 The encoded protein, a member of the tumor necrosis factor family, is a cytokine produced by lymphocytes. The protein is highly inducible, secreted, and forms heterotrimers with lymphotoxin-beta which anchor lymphotoxin-alpha to the cell surface. This protein also mediates a large variety of inflammatory, immunostimulatory, and antiviral responses, is involved in the formation of secondary lymphoid organs during development and plays a role in apoptosis. Genetic variations in this gene are associated with susceptibility to leprosy type 4, myocardial infarction, non-Hodgkin’s lymphoma, and psoriatic arthritis. Alternatively spliced transcript variants have been observed for this gene. NA
BIRC3 baculoviral IAP repeat containing 3 ENSG00000023445 330 This gene encodes a member of the IAP family of proteins that inhibit apoptosis by binding to tumor necrosis factor receptor-associated factors TRAF1 and TRAF2, probably by interfering with activation of ICE-like proteases. The encoded protein inhibits apoptosis induced by serum deprivation but does not affect apoptosis resulting from exposure to menadione, a potent inducer of free radicals. It contains 3 baculovirus IAP repeats and a ring finger domain. Transcript variants encoding the same isoform have been identified. NA
CD226 CD226 molecule ENSG00000150637 10666 This gene encodes a glycoprotein expressed on the surface of NK cells, platelets, monocytes and a subset of T cells. It is a member of the Ig-superfamily containing 2 Ig-like domains of the V-set. The protein mediates cellular adhesion of platelets and megakaryocytic cells to vascular endothelial cells. The protein also plays a role in megakaryocytic cell maturation. Alternative splicing results in multiple transcript variants. NA
SEMA7A semaphorin 7A (John Milton Hagen blood group) ENSG00000138623 8482 This gene encodes a member of the semaphorin family of proteins. The encoded preproprotein is proteolytically processed to generate the mature glycosylphosphatidylinositol (GPI)-anchored membrane glycoprotein. The encoded protein is found on activated lymphocytes and erythrocytes and may be involved in immunomodulatory and neuronal processes. The encoded protein carries the John Milton Hagen (JMH) blood group antigens. Mutations in this gene may be associated with reduced bone mineral density (BMD). Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. NA
MAP4K1 mitogen-activated protein kinase kinase kinase kinase 1 ENSG00000104814 11184 NA NA
ISG15 ISG15 ubiquitin-like modifier ENSG00000187608 9636 The protein encoded by this gene is a ubiquitin-like protein that is conjugated to intracellular target proteins upon activation by interferon-alpha and interferon-beta. Several functions have been ascribed to the encoded protein, including chemotactic activity towards neutrophils, direction of ligated target proteins to intermediate filaments, cell-to-cell signaling, and antiviral activity during viral infections. While conjugates of this protein have been found to be noncovalently attached to intermediate filaments, this protein is sometimes secreted. NA
HLA-DPA1 major histocompatibility complex, class II, DP alpha 1 ENSG00000231389 3113 HLA-DPA1 belongs to the HLA class II alpha chain paralogues. This class II molecule is a heterodimer consisting of an alpha (DPA) and a beta (DPB) chain, both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The alpha chain is approximately 33-35 kDa and its gene contains 5 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the two extracellular domains, exon 4 encodes the transmembrane domain and the cytoplasmic tail. Within the DP molecule both the alpha chain and the beta chain contain the polymorphisms specifying the peptide binding specificities, resulting in up to 4 different molecules. NA
CD22 CD22 molecule ENSG00000012124 933 NA NA
WDFY4 WDFY family member 4 ENSG00000128815 57705 NA NA
STAT1 signal transducer and activator of transcription 1 ENSG00000115415 6772 The protein encoded by this gene is a member of the STAT protein family. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein can be activated by various ligands including interferon-alpha, interferon-gamma, EGF, PDGF and IL6. This protein mediates the expression of a variety of genes, which is thought to be important for cell viability in response to different cell stimuli and pathogens. Two alternatively spliced transcript variants encoding distinct isoforms have been described. NA
LINC00926 long intergenic non-protein coding RNA 926 ENSG00000247982 283663 NA NA
NFKBIE NFKB inhibitor epsilon ENSG00000146232 4794 The protein encoded by this gene binds to components of NF-kappa-B, trapping the complex in the cytoplasm and preventing it from activating genes in the nucleus. Phosphorylation of the encoded protein targets it for destruction by the ubiquitin pathway, which activates NF-kappa-B by making it available to translocate to the nucleus. NA
TMPO thymopoietin ENSG00000120802 7112 The protein encoded by this gene resides in the nucleus and may play a role in the assembly of the nuclear lamina, and thus help maintain the structural organization of the nuclear envelope. It may function as a receptor for the attachment of lamin filaments to the inner nuclear membrane. Mutations in this gene are associated with dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. NA
MCM5 minichromosome maintenance complex component 5 ENSG00000100297 4174 The protein encoded by this gene is structurally very similar to the CDC46 protein from S. cerevisiae, a protein involved in the initiation of DNA replication. The encoded protein is a member of the MCM family of chromatin-binding proteins and can interact with at least two other members of this family. The encoded protein is upregulated in the transition from the G0 to G1/S phase of the cell cycle and may actively participate in cell cycle regulation. NA
LPXN leupaxin ENSG00000110031 9404 The product encoded by this gene is preferentially expressed in hematopoietic cells and belongs to the paxillin protein family. Similar to other members of this focal-adhesion-associated adaptor-protein family, it has four leucine-rich LD-motifs in the N-terminus and four LIM domains in the C-terminus. It may function in cell type-specific signaling by associating with PYK2, a member of focal adhesion kinase family. As a substrate for a tyrosine kinase in lymphoid cells, this protein may also function in, and be regulated by, tyrosine kinase activity. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
IL21R interleukin 21 receptor ENSG00000103522 50615 The protein encoded by this gene is a cytokine receptor for interleukin 21 (IL21). It belongs to the type I cytokine receptors, and has been shown to form a heterodimeric receptor complex with the common gamma-chain, a receptor subunit also shared by the receptors for interleukin 2, 4, 7, 9, and 15. This receptor transduces the growth promoting signal of IL21, and is important for the proliferation and differentiation of T cells, B cells, and natural killer (NK) cells. The ligand binding of this receptor leads to the activation of multiple downstream signaling molecules, including JAK1, JAK3, STAT1, and STAT3. Knockout studies of a similar gene in mouse suggest a role for this gene in regulating immunoglobulin production. Three alternatively spliced transcript variants have been described. NA
CLEC2D C-type lectin domain family 2 member D ENSG00000069493 29121 This gene encodes a member of the natural killer cell receptor C-type lectin family. The encoded protein inhibits osteoclast formation and contains a transmembrane domain near the N-terminus as well as the C-type lectin-like extracellular domain. Several alternatively spliced transcript variants have been identified for this gene. NA
SYNGR2 synaptogyrin 2 ENSG00000108639 9144 This gene encodes an integral membrane protein containing four transmembrane regions and a C-terminal cytoplasmic tail that is tyrosine phosphorylated. The exact function of this protein is unclear, but studies of a similar rat protein suggest that it may play a role in regulating membrane traffic in non-neuronal cells. The gene belongs to the synaptogyrin gene family. Alternative splicing results in multiple transcript variants. NA
TRAF1 TNF receptor associated factor 1 ENSG00000056558 7185 The protein encoded by this gene is a member of the TNF receptor (TNFR) associated factor (TRAF) protein family. TRAF proteins associate with, and mediate the signal transduction from various receptors of the TNFR superfamily. This protein and TRAF2 form a heterodimeric complex, which is required for TNF-alpha-mediated activation of MAPK8/JNK and NF-kappaB. The protein complex formed by this protein and TRAF2 also interacts with inhibitor-of-apoptosis proteins (IAPs), and thus mediates the anti-apoptotic signals from TNF receptors. The expression of this protein can be induced by Epstein-Barr virus (EBV). EBV infection membrane protein 1 (LMP1) is found to interact with this and other TRAF proteins; this interaction is thought to link LMP1-mediated B lymphocyte transformation to the signal transduction from TNFR family receptors. Three transcript variants encoding two different isoforms have been found for this gene. NA
MCM3 minichromosome maintenance complex component 3 ENSG00000112118 4172 The protein encoded by this gene is one of the highly conserved mini-chromosome maintenance proteins (MCM) that are involved in the initiation of eukaryotic genome replication. The hexameric protein complex formed by MCM proteins is a key component of the pre-replication complex (pre_RC) and may be involved in the formation of replication forks and in the recruitment of other DNA replication related proteins. This protein is a subunit of the protein complex that consists of MCM2-7. It has been shown to interact directly with MCM5/CDC46. This protein also interacts with and is acetylated by MCM3AP, a chromatin-associated acetyltransferase. The acetylation of this protein inhibits the initiation of DNA replication and cell cycle progression. Two transcript variants encoding different isoforms have been found for this gene. NA
SEPT9 septin 9 ENSG00000184640 10801 This gene is a member of the septin family involved in cytokinesis and cell cycle control. This gene is a candidate for the ovarian tumor suppressor gene. Mutations in this gene cause hereditary neuralgic amyotrophy, also known as neuritis with brachial predilection. A chromosomal translocation involving this gene on chromosome 17 and the MLL gene on chromosome 11 results in acute myelomonocytic leukemia. Multiple alternatively spliced transcript variants encoding different isoforms have been described. NA
PKM pyruvate kinase, muscle ENSG00000067225 5315 This gene encodes a protein involved in glycolysis. The encoded protein is a pyruvate kinase that catalyzes the transfer of a phosphoryl group from phosphoenolpyruvate to ADP, generating ATP and pyruvate. This protein has been shown to interact with thyroid hormone and may mediate cellular metabolic effects induced by thyroid hormones. This protein has been found to bind Opa protein, a bacterial outer membrane protein involved in gonococcal adherence to and invasion of human cells, suggesting a role of this protein in bacterial pathogenesis. Several alternatively spliced transcript variants encoding a few distinct isoforms have been reported. NA
IGLC2 immunoglobulin lambda constant 2 (Kern-Oz- marker) ENSG00000211677 ENSG00000211677 NA NA
MCM2 minichromosome maintenance complex component 2 ENSG00000073111 4171 The protein encoded by this gene is one of the highly conserved mini-chromosome maintenance proteins (MCM) that are involved in the initiation of eukaryotic genome replication. The hexameric protein complex formed by MCM proteins is a key component of the pre-replication complex (pre_RC) and may be involved in the formation of replication forks and in the recruitment of other DNA replication related proteins. This protein forms a complex with MCM4, 6, and 7, and has been shown to regulate the helicase activity of the complex. This protein is phosphorylated, and thus regulated by, protein kinases CDC2 and CDC7. Multiple alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been defined. NA
CD79B CD79b molecule ENSG00000007312 974 The B lymphocyte antigen receptor is a multimeric complex that includes the antigen-specific component, surface immunoglobulin (Ig). Surface Ig non-covalently associates with two other proteins, Ig-alpha and Ig-beta, which are necessary for expression and function of the B-cell antigen receptor. This gene encodes the Ig-beta protein of the B-cell antigen component. Alternatively spliced transcript variants encoding different isoforms have been described. NA
CD48 CD48 molecule ENSG00000117091 962 This gene encodes a member of the CD2 subfamily of immunoglobulin-like receptors which includes SLAM (signaling lymphocyte activation molecules) proteins. The encoded protein is found on the surface of lymphocytes and other immune cells, dendritic cells and endothelial cells, and participates in activation and differentiation pathways in these cells. The encoded protein does not have a transmembrane domain, however, but is held at the cell surface by a GPI anchor via a C-terminal domain which may be cleaved to yield a soluble form of the receptor. Multiple transcript variants encoding different isoforms have been found for this gene. NA
IFIH1 interferon induced with helicase C domain 1 ENSG00000115267 64135 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein that is upregulated in response to treatment with beta-interferon and a protein kinase C-activating compound, mezerein. Irreversible reprogramming of melanomas can be achieved by treatment with both these agents; treatment with either agent alone only achieves reversible differentiation. Genetic variation in this gene is associated with diabetes mellitus insulin-dependent type 19. NA
TNF tumor necrosis factor ENSG00000232810 7124 This gene encodes a multifunctional proinflammatory cytokine that belongs to the tumor necrosis factor (TNF) superfamily. This cytokine is mainly secreted by macrophages. It can bind to, and thus functions through its receptors TNFRSF1A/TNFR1 and TNFRSF1B/TNFBR. This cytokine is involved in the regulation of a wide spectrum of biological processes including cell proliferation, differentiation, apoptosis, lipid metabolism, and coagulation. This cytokine has been implicated in a variety of diseases, including autoimmune diseases, insulin resistance, and cancer. Knockout studies in mice also suggested the neuroprotective function of this cytokine. NA
EEF1A1 eukaryotic translation elongation factor 1 alpha 1 ENSG00000156508 1915 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. NA
TMSB4XP8 thymosin beta 4, X-linked pseudogene 8 ENSG00000187653 ENSG00000187653 NA NA
NA NA ENSG00000260655 NA NA TRUE
CD40 CD40 molecule ENSG00000101017 958 This gene is a member of the TNF-receptor superfamily. The encoded protein is a receptor on antigen-presenting cells of the immune system and is essential for mediating a broad variety of immune and inflammatory responses including T cell-dependent immunoglobulin class switching, memory B cell development, and germinal center formation. AT-hook transcription factor AKNA is reported to coordinately regulate the expression of this receptor and its ligand, which may be important for homotypic cell interactions. Adaptor protein TNFR2 interacts with this receptor and serves as a mediator of the signal transduction. The interaction of this receptor and its ligand is found to be necessary for amyloid-beta-induced microglial activation, and thus is thought to be an early event in Alzheimer disease pathogenesis. Mutations affecting this gene are the cause of autosomal recessive hyper-IgM immunodeficiency type 3 (HIGM3). Multiple alternatively spliced transcript variants of this gene encoding distinct isoforms have been reported. NA
CCT8 chaperonin containing TCP1 subunit 8 ENSG00000156261 10694 This gene encodes the theta subunit of the CCT chaperonin, which is abundant in the eukaryotic cytosol and may be involved in the transport and assembly of newly synthesized proteins. Alternative splicing results in multiple transcript variants of this gene. A pseudogene related to this gene is located on chromosome 1. NA
TNFAIP3 TNF alpha induced protein 3 ENSG00000118503 7128 This gene was identified as a gene whose expression is rapidly induced by the tumor necrosis factor (TNF). The protein encoded by this gene is a zinc finger protein and ubiquitin-editing enzyme, and has been shown to inhibit NF-kappa B activation as well as TNF-mediated apoptosis. The encoded protein, which has both ubiquitin ligase and deubiquitinase activities, is involved in the cytokine-mediated immune and inflammatory responses. Several transcript variants encoding the same protein have been found for this gene. NA
HNRNPC heterogeneous nuclear ribonucleoprotein C (C1/C2) ENSG00000092199 3183 This gene belongs to the subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA binding proteins and they complex with heterogeneous nuclear RNA (hnRNA). These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. While all of the hnRNPs are present in the nucleus, some seem to shuttle between the nucleus and the cytoplasm. The hnRNP proteins have distinct nucleic acid binding properties. The protein encoded by this gene can act as a tetramer and is involved in the assembly of 40S hnRNP particles. Multiple transcript variants encoding at least two different isoforms have been described for this gene. NA
SP140 SP140 nuclear body protein ENSG00000079263 11262 NA NA
LY75 lymphocyte antigen 75 ENSG00000054219 4065 NA NA
NFKB2 nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 ENSG00000077150 4791 This gene encodes a subunit of the transcription factor complex nuclear factor-kappa-B (NFkB). The NFkB complex is expressed in numerous cell types and functions as a central activator of genes involved in inflammation and immune function. The protein encoded by this gene can function as both a transcriptional activator or repressor depending on its dimerization partner. The p100 full-length protein is co-translationally processed into a p52 active form. Chromosomal rearrangements and translocations of this locus have been observed in B cell lymphomas, some of which may result in the formation of fusion proteins. There is a pseudogene for this gene on chromosome 18. Alternative splicing results in multiple transcript variants. NA
IFIT3 interferon induced protein with tetratricopeptide repeats 3 ENSG00000119917 3437 NA NA
CD27 CD27 molecule ENSG00000139193 939 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is required for generation and long-term maintenance of T cell immunity. It binds to ligand CD70, and plays a key role in regulating B-cell activation and immunoglobulin synthesis. This receptor transduces signals that lead to the activation of NF-kappaB and MAPK8/JNK. Adaptor proteins TRAF2 and TRAF5 have been shown to mediate the signaling process of this receptor. CD27-binding protein (SIVA), a proapoptotic protein, can bind to this receptor and is thought to play an important role in the apoptosis induced by this receptor. NA
TFRC transferrin receptor ENSG00000072274 7037 This gene encodes a cell surface receptor necessary for cellular iron uptake by the process of receptor-mediated endocytosis. This receptor is required for erythropoiesis and neurologic development. Multiple alternatively spliced variants have been identified. NA
DUSP2 dual specificity phosphatase 2 ENSG00000158050 1844 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1 and ERK2, is predominantly expressed in hematopoietic tissues, and is localized in the nucleus. NA
NCAPD2 non-SMC condensin I complex subunit D2 ENSG00000010292 9918 NA NA
TUBB tubulin beta class I ENSG00000196230 203068 This gene encodes a beta tubulin protein. This protein forms a dimer with alpha tubulin and acts as a structural component of microtubules. Mutations in this gene cause cortical dysplasia, complex, with other brain malformations 6. Alternative splicing results in multiple splice variants. There are multiple pseudogenes for this gene on chromosomes 1, 6, 7, 8, 9, and 13. NA
RPSA ribosomal protein SA ENSG00000168028 3921 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Many of the effects of laminin are mediated through interactions with cell surface receptors. These receptors include members of the integrin family, as well as non-integrin laminin-binding proteins. This gene encodes a high-affinity, non-integrin family, laminin receptor 1. This receptor has been variously called 67 kD laminin receptor, 37 kD laminin receptor precursor (37LRP) and p40 ribosome-associated protein. The amino acid sequence of laminin receptor 1 is highly conserved through evolution, suggesting a key biological function. It has been observed that the level of the laminin receptor transcript is higher in colon carcinoma tissue and lung cancer cell line than their normal counterparts. Also, there is a correlation between the upregulation of this polypeptide in cancer cells and their invasive and metastatic phenotype. Multiple copies of this gene exist, however, most of them are pseudogenes thought to have arisen from retropositional events. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. NA
CDC42SE2 CDC42 small effector 2 ENSG00000158985 56990 NA NA
LDHA lactate dehydrogenase A ENSG00000134333 3939 The protein encoded by this gene catalyzes the conversion of L-lactate and NAD to pyruvate and NADH in the final step of anaerobic glycolysis. The protein is found predominantly in muscle tissue and belongs to the lactate dehydrogenase family. Mutations in this gene have been linked to exertional myoglobinuria. Multiple transcript variants encoding different isoforms have been found for this gene. The human genome contains several non-transcribed pseudogenes of this gene. NA
VOPP1 vesicular, overexpressed in cancer, prosurvival protein 1 ENSG00000154978 81552 NA NA
RAN RAN, member RAS oncogene family ENSG00000132341 5901 RAN (ras-related nuclear protein) is a small GTP binding protein belonging to the RAS superfamily that is essential for the translocation of RNA and proteins through the nuclear pore complex. The RAN protein is also involved in control of DNA synthesis and cell cycle progression. Nuclear localization of RAN requires the presence of regulator of chromosome condensation 1 (RCC1). Mutations in RAN disrupt DNA synthesis. Because of its many functions, it is likely that RAN interacts with several other proteins. RAN regulates formation and organization of the microtubule network independently of its role in the nucleus-cytosol exchange of macromolecules. RAN could be a key signaling molecule regulating microtubule polymerization during mitosis. RCC1 generates a high local concentration of RAN-GTP around chromatin which, in turn, induces the local nucleation of microtubules. RAN is an androgen receptor (AR) coactivator that binds differentially with different lengths of polyglutamine within the androgen receptor. Polyglutamine repeat expansion in the AR is linked to Kennedy’s disease (X-linked spinal and bulbar muscular atrophy). RAN coactivation of the AR diminishes with polyglutamine expansion within the AR, and this weak coactivation may lead to partial androgen insensitivity during the development of Kennedy’s disease. NA
PCNA proliferating cell nuclear antigen ENSG00000132646 5111 The protein encoded by this gene is found in the nucleus and is a cofactor of DNA polymerase delta. The encoded protein acts as a homotrimer and helps increase the processivity of leading strand synthesis during DNA replication. In response to DNA damage, this protein is ubiquitinated and is involved in the RAD6-dependent DNA repair pathway. Two transcript variants encoding the same protein have been found for this gene. Pseudogenes of this gene have been described on chromosome 4 and on the X chromosome. NA
PARP1 poly(ADP-ribose) polymerase 1 ENSG00000143799 142 This gene encodes a chromatin-associated enzyme, poly(ADP-ribosyl)transferase, which modifies various nuclear proteins by poly(ADP-ribosyl)ation. The modification is dependent on DNA and is involved in the regulation of various important cellular processes such as differentiation, proliferation, and tumor transformation and also in the regulation of the molecular events involved in the recovery of cell from DNA damage. In addition, this enzyme may be the site of mutation in Fanconi anemia, and may participate in the pathophysiology of type I diabetes. NA
TNFRSF13C tumor necrosis factor receptor superfamily member 13C ENSG00000159958 115650 B cell-activating factor (BAFF) enhances B-cell survival in vitro and is a regulator of the peripheral B-cell population. Overexpression of Baff in mice results in mature B-cell hyperplasia and symptoms of systemic lupus erythematosus (SLE). Also, some SLE patients have increased levels of BAFF in serum. Therefore, it has been proposed that abnormally high levels of BAFF may contribute to the pathogenesis of autoimmune diseases by enhancing the survival of autoreactive B cells. The protein encoded by this gene is a receptor for BAFF and is a type III transmembrane protein containing a single extracellular cysteine-rich domain. It is thought that this receptor is the principal receptor required for BAFF-mediated mature B-cell survival. NA
IRF5 interferon regulatory factor 5 ENSG00000128604 3663 This gene encodes a member of the interferon regulatory factor (IRF) family, a group of transcription factors with diverse roles, including virus-mediated activation of interferon, and modulation of cell growth, differentiation, apoptosis, and immune system activity. Members of the IRF family are characterized by a conserved N-terminal DNA-binding domain containing tryptophan (W) repeats. Multiple transcript variants encoding different isoforms have been found for this gene, and a 30-nt indel polymorphism (SNP rs60344245) can result in loss of a 10-aa segment. NA
OAS1 2’-5’-oligoadenylate synthetase 1 ENSG00000089127 4938 This gene is induced by interferons and encodes a protein that synthesizes 2’,5’-oligoadenylates (2-5As). This protein activates latent RNase L, which results in viral RNA degradation and the inhibition of viral replication. Alternative splicing results in multiple transcript variants with different enzymatic activities. Polymorphisms in this gene have been associated with susceptibility to viral infection and diabetes mellitus, type 1. A disease-associated allele in a splice acceptor site influences the production of the p46 splice isoform. This gene is located in a cluster of related genes on chromosome 12. NA
TSPAN33 tetraspanin 33 ENSG00000158457 340348 NA NA
NUSAP1 nucleolar and spindle associated protein 1 ENSG00000137804 51203 NUSAP1 is a nucleolar-spindle-associated protein that plays a role in spindle microtubule organization (Raemaekers et al., 2003 [PubMed 12963707]). NA
KIAA0922 KIAA0922 ENSG00000121210 23240 NA NA
H2AFZ H2A histone family member Z ENSG00000164032 3015 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene encodes a replication-independent member of the histone H2A family that is distinct from other members of the family. Studies in mice have shown that this particular histone is required for embryonic development and indicate that lack of functional histone H2A leads to embryonic lethality. NA
NPM1 nucleophosmin (nucleolar phosphoprotein B23, numatrin) ENSG00000181163 4869 This gene encodes a phosphoprotein which moves between the nucleus and the cytoplasm. The gene product is thought to be involved in several processes including regulation of the ARF/p53 pathway. A number of genes that are fusion partners have been characterized, in particular the anaplastic lymphoma kinase gene on chromosome 2. Mutations in this gene are associated with acute myeloid leukemia. More than a dozen pseudogenes of this gene have been identified. Alternative splicing results in multiple transcript variants. NA
RPL4 ribosomal protein L4 ENSG00000174444 6124 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L4E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. NA
TMC6 transmembrane channel like 6 ENSG00000141524 11322 Epidermodysplasia verruciformis (EV) is an autosomal recessive dermatosis characterized by abnormal susceptibility to human papillomaviruses (HPVs) and a high rate of progression to squamous cell carcinoma on sun-exposed skin. EV is caused by mutations in either of two adjacent genes located on chromosome 17q25.3. Both of these genes encode integral membrane proteins that localize to the endoplasmic reticulum and are predicted to form transmembrane channels. This gene encodes a transmembrane channel-like protein with 10 transmembrane domains and 2 leucine zipper motifs. NA
RPS6 ribosomal protein S6 ENSG00000137154 6194 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 40S subunit. The protein belongs to the S6E family of ribosomal proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C-terminal serine residues phosphorylated by different protein kinases. Phosphorylation is induced by a wide range of stimuli, including growth factors, tumor-promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The protein may contribute to the control of cell growth and proliferation through the selective translation of particular classes of mRNA. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. NA
RPS2P5 ribosomal protein S2 pseudogene 5 ENSG00000240342 ENSG00000240342 NA NA
RPS23 ribosomal protein S23 ENSG00000186468 6228 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S12P family of ribosomal proteins. It is located in the cytoplasm. The protein shares significant amino acid similarity with S. cerevisiae ribosomal protein S28. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. NA
SET SET nuclear proto-oncogene ENSG00000119335 6418 The protein encoded by this gene inhibits acetylation of nucleosomes, especially histone H4, by histone acetylases (HAT). This inhibition is most likely accomplished by masking histone lysines from being acetylated, and the consequence is to silence HAT-dependent transcription. The encoded protein is part of a complex localized to the endoplasmic reticulum but is found in the nucleus and inhibits apoptosis following attack by cytotoxic T lymphocytes. This protein can also enhance DNA replication of the adenovirus genome. Several transcript variants encoding different isoforms have been found for this gene. NA
APOBEC3G apolipoprotein B mRNA editing enzyme catalytic subunit 3G ENSG00000239713 60489 This gene is a member of the cytidine deaminase gene family. It is one of seven related genes or pseudogenes found in a cluster, thought to result from gene duplication, on chromosome 22. Members of the cluster encode proteins that are structurally and functionally related to the C to U RNA-editing cytidine deaminase APOBEC1. It is thought that the proteins may be RNA editing enzymes and have roles in growth or cell cycle control. The protein encoded by this gene has been found to be a specific inhibitor of human immunodeficiency virus-1 (HIV-1) infectivity. NA
ITGB7 integrin subunit beta 7 ENSG00000139626 3695 This gene encodes a protein that is a member of the integrin superfamily. Members of this family are adhesion receptors that function in signaling from the extracellular matrix to the cell. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. The encoded protein forms dimers with an alpha4 chain or an alphaE chain and plays a role in leukocyte adhesion. Dimerization with alpha4 forms a homing receptor for migration of lymphocytes to the intestinal mucosa and Peyer’s patches. Dimerization with alphaE permits binding to the ligand epithelial cadherin, a calcium-dependent adhesion molecule. Alternate splicing results in multiple transcript variants. Additional alternatively spliced transcript variants of this gene have been described, but their full-length nature is not known. NA
PPIA peptidylprolyl isomerase A ENSG00000196262 5478 This gene encodes a member of the peptidyl-prolyl cis-trans isomerase (PPIase) family. PPIases catalyze the cis-trans isomerization of proline imidic peptide bonds in oligopeptides and accelerate the folding of proteins. The encoded protein is a cyclosporin binding-protein and may play a role in cyclosporin A-mediated immunosuppression. The protein can also interact with several HIV proteins, including p55 gag, Vpr, and capsid protein, and has been shown to be necessary for the formation of infectious HIV virions. Multiple pseudogenes that map to different chromosomes have been reported. NA
UBR5 ubiquitin protein ligase E3 component n-recognin 5 ENSG00000104517 51366 This gene encodes a progestin-induced protein, which belongs to the HECT (homology to E6-AP carboxyl terminus) family. The HECT family proteins function as E3 ubiquitin-protein ligases, targeting specific proteins for ubiquitin-mediated proteolysis. This gene is localized to chromosome 8q22 which is disrupted in a variety of cancers. This gene potentially has a role in regulation of cell proliferation or differentiation. NA
TPI1 triosephosphate isomerase 1 ENSG00000111669 7167 This gene encodes an enzyme, consisting of two identical proteins, which catalyzes the isomerization of glyceraldehyde 3-phosphate (G3P) and dihydroxy-acetone phosphate (DHAP) in glycolysis and gluconeogenesis. Mutations in this gene are associated with triosephosphate isomerase deficiency. Pseudogenes have been identified on chromosomes 1, 4, 6 and 7. Alternative splicing results in multiple transcript variants. NA
ZC3H12D zinc finger CCCH-type containing 12D ENSG00000178199 340152 NA NA
POU2F2 POU class 2 homeobox 2 ENSG00000028277 5452 The protein encoded by this gene is a homeobox-containing transcription factor of the POU domain family. The encoded protein binds the octamer sequence 5’-ATTTGCAT-3’, a common transcription factor binding site in immunoglobulin gene promoters. Several transcript variants encoding different isoforms have been found for this gene. NA
IFI44 interferon induced protein 44 ENSG00000137965 10561 NA NA
B3GNT2 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2 ENSG00000170340 10678 This gene encodes a member of the beta-1,3-N-acetylglucosaminyltransferase family. This enzyme is a type II transmembrane protein. It prefers the substrate of lacto-N-neotetraose, and is involved in the biosynthesis of poly-N-acetyllactosamine chains. Two transcript variants encoding the same protein have been found for this gene. NA
RPL19 ribosomal protein L19 ENSG00000108298 6143 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L19E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. NA
NCOA3 nuclear receptor coactivator 3 ENSG00000124151 8202 The protein encoded by this gene is a nuclear receptor coactivator that interacts with nuclear hormone receptors to enhance their transcriptional activator functions. The encoded protein has histone acetyltransferase activity and recruits p300/CBP-associated factor and CREB binding protein as part of a multisubunit coactivation complex. This protein is initially found in the cytoplasm but is translocated into the nucleus upon phosphorylation. Several transcript variants encoding different isoforms have been found for this gene. In addition, a polymorphic repeat region is found in the C-terminus of the encoded protein. NA
RPS3 ribosomal protein S3 ENSG00000149273 6188 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit, where it forms part of the domain where translation is initiated. The protein belongs to the S3P family of ribosomal proteins. Studies of the mouse and rat proteins have demonstrated that the protein has an extraribosomal role as an endonuclease involved in the repair of UV-induced DNA damage. The protein appears to be located in both the cytoplasm and nucleus but not in the nucleolus. Higher levels of expression of this gene in colon adenocarcinomas and adenomatous polyps compared to adjacent normal colonic mucosa have been observed. This gene is co-transcribed with the small nucleolar RNA genes U15A and U15B, which are located in its first and fifth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
ITGA4 integrin subunit alpha 4 ENSG00000115232 3676 The gene encodes a member of the integrin alpha chain family of proteins. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 4 subunit. This subunit associates with a beta 1 or beta 7 subunit to form an integrin that may play a role in cell motility and migration. This integrin is a therapeutic target for the treatment of multiple sclerosis, Crohn’s disease and inflammatory bowel disease. Alternative splicing results in multiple transcript variants. NA
IGLC3 immunoglobulin lambda constant 3 (Kern-Oz+ marker) ENSG00000211679 ENSG00000211679 NA NA
BCAS4 breast carcinoma amplified sequence 4 ENSG00000124243 55653 NA NA
RPL23AP42 ribosomal protein L23a pseudogene 42 ENSG00000234851 ENSG00000234851 NA NA
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 ENSG00000109320 4790 This gene encodes a 105 kD protein which can undergo cotranslational processing by the 26S proteasome to produce a 50 kD protein. The 105 kD protein is a Rel protein-specific transcription inhibitor and the 50 kD protein is a DNA binding subunit of the NF-kappa-B (NFKB) protein complex. NFKB is a transcription regulator that is activated by various intra- and extra-cellular stimuli such as cytokines, oxidant-free radicals, ultraviolet irradiation, and bacterial or viral products. Activated NFKB translocates into the nucleus and stimulates the expression of genes involved in a wide variety of biological functions. Inappropriate activation of NFKB has been associated with a number of inflammatory diseases while persistent inhibition of NFKB leads to inappropriate immune cell development or delayed cell growth. Alternative splicing results in multiple transcript variants encoding different isoforms, at least one of which is proteolytically processed. NA
# Save the Ensembl IDs queried for cluster 12, one per line.
write.table(as.factor(out$query), paste0("../utilities/gene_names_clus_", 12, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
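
The annotation tables in this report do not always list their columns in the same order (for example, one table starts with symbol and another with summary), which makes clusters harder to compare side by side. The following is a minimal sketch of one way to fix the order before printing. The helper name order_annotation_columns and its preferred argument are hypothetical and not part of the original script; the toy row reuses the PRM1 values shown in the Cluster 13 table.

```r
# Hypothetical helper (not in the original script): place annotation columns
# in a fixed order so that every cluster's table reads the same way.
order_annotation_columns <- function(df,
                                     preferred = c("symbol", "name", "query",
                                                   "X_id", "summary", "notfound")) {
  # Preferred columns that are present come first, in the preferred order;
  # any other columns are kept and appended unchanged.
  first <- intersect(preferred, names(df))
  rest  <- setdiff(names(df), first)
  df[, c(first, rest), drop = FALSE]
}

# Toy one-row data frame using the column order of the Cluster 13 table.
toy <- data.frame(
  summary  = NA_character_,
  X_id     = "5619",
  symbol   = "PRM1",
  name     = "protamine 1",
  query    = "ENSG00000175646",
  notfound = NA,
  stringsAsFactors = FALSE
)

ordered <- order_annotation_columns(toy)
names(ordered)
```

Under the assumption that as.data.frame(out) yields a regular data frame, as the existing kable calls suggest, the helper could be applied as kable(order_annotation_columns(as.data.frame(out))).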

Cluster 13 Annotations

out <- mygene::queryMany(gene_list[13,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id symbol name query notfound
Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. 7038 TG thyroglobulin ENSG00000042832 NA
This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. 5617 PRL prolactin ENSG00000172179 NA
The protein encoded by this gene is a member of the somatotropin/prolactin family of hormones which play an important role in growth control. The gene, along with four other related genes, is located at the growth hormone locus on chromosome 17 where they are interspersed in the same transcriptional orientation; an arrangement which is thought to have evolved by a series of gene duplications. The five genes share a remarkably high degree of sequence identity. Alternative splicing generates additional isoforms of each of the five growth hormones, leading to further diversity and potential for specialization. This particular family member is expressed in the pituitary but not in placental tissue as is the case for the other four genes in the growth hormone locus. Mutations in or deletions of the gene lead to growth hormone deficiency and short stature. 2688 GH1 growth hormone 1 ENSG00000259384 NA
This gene encodes a preproprotein that undergoes extensive, tissue-specific, post-translational processing via cleavage by subtilisin-like enzymes known as prohormone convertases. There are eight potential cleavage sites within the preproprotein and, depending on tissue type and the available convertases, processing may yield as many as ten biologically active peptides involved in diverse cellular functions. The encoded protein is synthesized mainly in corticotroph cells of the anterior pituitary where four cleavage sites are used; adrenocorticotrophin, essential for normal steroidogenesis and the maintenance of normal adrenal weight, and lipotropin beta are the major end products. In other tissues, including the hypothalamus, placenta, and epithelium, all cleavage sites may be used, giving rise to peptides with roles in pain and energy homeostasis, melanocyte stimulation, and immune modulation. These include several distinct melanotropins, lipotropins, and endorphins that are contained within the adrenocorticotrophin and beta-lipotropin peptides. The antimicrobial melanotropin alpha peptide exhibits antibacterial and antifungal activity. Mutations in this gene have been associated with early onset obesity, adrenal insufficiency, and red hair pigmentation. Alternatively spliced transcript variants encoding the same protein have been described. 5443 POMC proopiomelanocortin ENSG00000115138 NA
Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. 5620 PRM2 protamine 2 ENSG00000122304 NA
NA 5619 PRM1 protamine 1 ENSG00000175646 NA
This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. 7173 TPO thyroid peroxidase ENSG00000115705 NA
The four human glycoprotein hormones chorionic gonadotropin (CG), luteinizing hormone (LH), follicle stimulating hormone (FSH), and thyroid stimulating hormone (TSH) are dimers consisting of alpha and beta subunits that are associated noncovalently. The alpha subunits of these hormones are identical, however, their beta chains are unique and confer biological specificity. The protein encoded by this gene is the alpha subunit and belongs to the glycoprotein hormones alpha chain family. Two transcript variants encoding different isoforms have been found for this gene. 1081 CGA glycoprotein hormones, alpha polypeptide ENSG00000135346 NA
This gene encodes a member of the paired box (PAX) family of transcription factors. Members of this gene family typically encode proteins that contain a paired box domain, an octapeptide, and a paired-type homeodomain. This nuclear protein is involved in thyroid follicular cell development and expression of thyroid-specific genes. Mutations in this gene have been associated with thyroid dysgenesis, thyroid follicular carcinomas and atypical follicular thyroid adenomas. Alternatively spliced transcript variants encoding different isoforms have been described. 7849 PAX8 paired box 8 ENSG00000125618 NA
Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. 51533 PHF7 PHD finger protein 7 ENSG00000010318 NA
NA 128229 TSACC TSSK6 activating co-chaperone ENSG00000163467 NA
NA 81691 LOC81691 exonuclease NEF-sp ENSG00000005189 NA
This gene encodes a protein belonging to the glyceraldehyde-3-phosphate dehydrogenase family of enzymes that play an important role in carbohydrate metabolism. Like its somatic cell counterpart, this sperm-specific enzyme functions in a nicotinamide adenine dinucleotide-dependent manner to remove hydrogen and add phosphate to glyceraldehyde 3-phosphate to form 1,3-diphosphoglycerate. During spermiogenesis, this enzyme may play an important role in regulating the switch between different energy-producing pathways, and it is required for sperm motility and male fertility. 26330 GAPDHS glyceraldehyde-3-phosphate dehydrogenase, spermatogenic ENSG00000105679 NA
NA ENSG00000219435 TEX40 testis expressed 40 ENSG00000219435 NA
NA 27124 INPP5J inositol polyphosphate-5-phosphatase J ENSG00000185133 NA
The protein encoded by this gene is similar to proacrosin binding protein sp32 precursor found in mouse, guinea pig, and pig. This protein is located in the sperm acrosome and is thought to function as a binding protein to proacrosin for packaging and condensation of the acrosin zymogen in the acrosomal matrix. This protein is a member of the cancer/testis family of antigens and it is found to be immunogenic. In normal tissues, this mRNA is expressed only in testis, whereas it is detected in a range of different tumor types such as bladder, breast, lung, liver, and colon. 84519 ACRBP acrosin binding protein ENSG00000111644 NA
The outer dense fibers are cytoskeletal structures that surround the axoneme in the middle piece and principal piece of the sperm tail. The fibers function in maintaining the elastic structure and recoil of the sperm tail as well as in protecting the tail from shear forces during epididymal transport and ejaculation. Defects in the outer dense fibers lead to abnormal sperm morphology and infertility. This gene encodes one of the major outer dense fiber proteins. Alternative splicing results in multiple transcript variants. The longer transcripts, also known as ‘Cenexins’, encode proteins with a C-terminal extension that are differentially targeted to somatic centrioles and thought to be crucial for the formation of microtubule organizing centers. 4957 ODF2 outer dense fiber of sperm tails 2 ENSG00000136811 NA
This gene belongs to the ATP-ases associated with diverse cellular activities (AAA+) superfamily. Members of this superfamily form ring-shaped homo-hexamers and have highly conserved ATPase domains that are involved in various processes including DNA replication, protein degradation and reactivation of misfolded proteins. All members of this family hydrolyze ATP through their AAA+ domains and use the energy generated through ATP hydrolysis to exert mechanical force on their substrates. In addition to an AAA+ domain, the protein encoded by this gene contains a C-terminal D2 domain, which is characteristic of the AAA+ subfamily of Caseinolytic peptidases to which this protein belongs. It cooperates with Hsp70 in the disaggregation of protein aggregates. Allelic variants of this gene are associated with 3-methylglutaconic aciduria, which causes cataracts and neutropenia. Alternative splicing results in multiple transcript variants. 81570 CLPB ClpB homolog, mitochondrial AAA ATPase chaperonin ENSG00000162129 NA
This gene encodes a protein containing a MYND-type zinc finger domain that likely functions in assembly of the dynein motor. Mutations in this gene can cause primary ciliary dyskinesia. This gene is also considered a tumor suppressor gene and is often mutated, deleted, or hypermethylated and silenced in cancer cells. Alternative splicing results in multiple transcript variants. 51364 ZMYND10 zinc finger MYND-type containing 10 ENSG00000004838 NA
To reach fertilization competence, spermatozoa undergo a series of morphological and molecular maturational processes, termed capacitation, involving protein tyrosine phosphorylation and increased intracellular calcium. The protein encoded by this gene localizes to the principal piece of the sperm flagellum in association with the fibrous sheath and exhibits calcium-binding when phosphorylated during capacitation. A pseudogene on chromosome 3 has been identified for this gene. Alternatively spliced transcript variants encoding distinct protein isoforms have been found for this gene. 26256 CABYR calcium binding tyrosine phosphorylation regulated ENSG00000154040 NA
This gene encodes a member of the subtilisin-like proprotein convertase family, which includes proteases that process protein and peptide precursors trafficking through regulated or constitutive branches of the secretory pathway. The encoded protein undergoes an initial autocatalytic processing event in the ER to generate a heterodimer which exits the ER and sorts to subcellular compartments where a second autocatalytic event takes place and the catalytic activity is acquired. This gene encodes one of the seven basic amino acid-specific members which cleave their substrates at single or paired basic residues. The protease is expressed only in the testis, placenta, and ovary. It plays a critical role in fertilization, fetoplacental growth, and embryonic development and processes multiple prohormones including pro-pituitary adenylate cyclase-activating protein and pro-insulin-like growth factor II. 54760 PCSK4 proprotein convertase subtilisin/kexin type 4 ENSG00000115257 NA
NA ENSG00000153363 LINC00467 long intergenic non-protein coding RNA 467 ENSG00000153363 NA
NA 60509 AGBL5 ATP/GTP binding protein-like 5 ENSG00000084693 NA
This gene encodes a member of A-kinase anchoring proteins (AKAPs), a family of functionally related proteins that target protein kinase A to discrete locations within the cell. The encoded protein is reported to participate in protein-protein interactions with the R-subunit of the protein kinase A as well as sperm-associated proteins. This protein is expressed in spermatozoa and localized to the acrosomal region of the sperm head as well as the length of the principal piece. It may function as a regulator of motility, capacitation, and the acrosome reaction. 10566 AKAP3 A-kinase anchoring protein 3 ENSG00000111254 NA
The protein encoded by this gene interacts with components of the origin recognition complex (ORC) and regulates the formation of the prereplicative complex. The encoded protein stabilizes the ORC and therefore aids in DNA replication. This protein is required for the G1/S phase transition of the cell cycle. In addition, the encoded protein binds to trimethylated histone H3 in heterochromatin and recruits the ORC and lysine methyltransferases, which help maintain the repressive heterochromatic state. Two transcript variants encoding different isoforms have been found for this gene. 222229 LRWD1 leucine-rich repeats and WD repeat domain containing 1 ENSG00000161036 NA
NA 147011 PROCA1 protein interacting with cyclin A1 ENSG00000167525 NA
The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. 9388 LIPG lipase G, endothelial type ENSG00000101670 NA
The protein encoded by this gene belongs to the ornithine decarboxylase antizyme family, which plays a role in cell growth and proliferation by regulating intracellular polyamine levels. Expression of antizymes requires +1 ribosomal frameshifting, which is enhanced by high levels of polyamines. Antizymes in turn bind to and inhibit ornithine decarboxylase (ODC), the key enzyme in polyamine biosynthesis; thus, completing the auto-regulatory circuit. This gene encodes antizyme 3, the third member of the antizyme family. Like antizymes 1 and 2, antizyme 3 inhibits ODC activity and polyamine uptake; however, it does not stimulate ODC degradation. Also, while antizymes 1 and 2 have broad tissue distribution, expression of antizyme 3 is restricted to haploid germ cells in testis, suggesting a distinct role for this antizyme in spermiogenesis. Antizyme 3 gene knockout studies showed that homozygous mutant male mice were infertile, and indicated the likely role of this antizyme in the formation of a rigid connection between the sperm head and tail during spermatogenesis. Alternatively spliced transcript variants encoding different isoforms, including one resulting from the use of non-AUG (CUG) translation initiation codon, have been found for this gene. 51686 OAZ3 ornithine decarboxylase antizyme 3 ENSG00000143450 NA
NA 113177 IZUMO4 IZUMO family member 4 ENSG00000099840 NA
NA 100506469 TMEM147-AS1 TMEM147 antisense RNA 1 ENSG00000236144 NA
This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors. 2778 GNAS GNAS complex locus ENSG00000087460 NA
This gene encodes an unconventional myosin. This protein differs from other myosins in that it has a long N-terminal extension preceding the conserved motor domain. Studies in mice suggest that this protein is necessary for actin organization in the hair cells of the cochlea. Mutations in this gene have been associated with profound, congenital, neurosensory, nonsyndromal deafness. This gene is located within the Smith-Magenis syndrome region on chromosome 17. Read-through transcripts containing an upstream gene and this gene have been identified, but they are not thought to encode a fusion protein. Several alternatively spliced transcript variants have been described, but their full length sequences have not been determined. 51168 MYO15A myosin XVA ENSG00000091536 NA
This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class. Multiple transcript variants encoding different isoforms have been found for this gene. 26261 FBXO24 F-box protein 24 ENSG00000106336 NA
NA 100133036 FAM95B1 family with sequence similarity 95 member B1 ENSG00000223839 NA
This gene encodes a member of the beta-transducin protein family. Most proteins of the beta-transducin family are involved in regulatory functions. This protein is possibly involved in some intracellular signaling pathway. This gene is deleted in Williams-Beuren syndrome, a developmental disorder caused by deletion of multiple genes at 7q11.23. 26608 TBL2 transducin (beta)-like 2 ENSG00000106638 NA
NA 79025 C20orf195 chromosome 20 open reading frame 195 ENSG00000125531 NA
NA 84266 ALKBH7 alkB homolog 7 ENSG00000125652 NA
NA 200172 SLFNL1 schlafen like 1 ENSG00000171790 NA
This gene encodes a member of the Golgi-localized, gamma adaptin ear-containing, ARF-binding (GGA) protein family. Members of this family are ubiquitous coat proteins that regulate the trafficking of proteins between the trans-Golgi network and the lysosome. These proteins share an amino-terminal VHS domain which mediates sorting of the mannose 6-phosphate receptors at the trans-Golgi network. They also contain a carboxy-terminal region with homology to the ear domain of gamma-adaptins. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 26088 GGA1 golgi-associated, gamma adaptin ear containing, ARF binding protein 1 ENSG00000100083 NA
This gene is proposed to play a role in cerebral cortical development. Mutations in this gene have been associated with microencephaly, cortical malformations, and mental retardation. Alternative splicing results in multiple transcript variants. 284403 WDR62 WD repeat domain 62 ENSG00000075702 NA
NA 64753 CCDC136 coiled-coil domain containing 136 ENSG00000128596 NA
NA 199223 TTC21A tetratricopeptide repeat domain 21A ENSG00000168026 NA
NA 54535 CCHCR1 coiled-coil alpha-helical rod protein 1 ENSG00000204536 NA
This gene encodes a molecular chaperone that is a member of the chaperonin-containing TCP1 complex (CCT), also known as the TCP1 ring complex (TRiC). This complex consists of two identical stacked rings, each containing eight different proteins. Unfolded polypeptides enter the central cavity of the complex and are folded in an ATP-dependent manner. The complex folds various proteins, including actin and tubulin. Alternative splicing results in multiple transcript variants. 10693 CCT6B chaperonin containing TCP1 subunit 6B ENSG00000132141 NA
This gene belongs to the chemokine-like factor gene superfamily, a novel family that links the chemokine and the transmembrane 4 superfamilies of signaling molecules. The protein encoded by this gene may play an important role in testicular development. 146225 CMTM2 CKLF like MARVEL transmembrane domain containing 2 ENSG00000140932 NA
CATSPERG is a subunit of the CATSPER (see CATSPER1; MIM 606389) sperm calcium channel, which is required for sperm hyperactivated motility and male fertility (Wang et al., 2009 [PubMed 19516020]). 57828 CATSPERG cation channel sperm associated auxiliary subunit gamma ENSG00000099338 NA
This gene encodes a member of the insulin-like hormone superfamily. The encoded protein is mainly produced in gonadal tissues. Studies of the mouse counterpart suggest that this gene may be involved in the development of urogenital tract and female fertility. This protein may also act as a hormone to regulate growth and differentiation of gubernaculum, and thus mediating intra-abdominal testicular descent. Mutations in this gene may lead to cryptorchidism. Alternate splicing results in multiple transcript variants. 3640 INSL3 insulin like 3 ENSG00000248099 NA
Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual’s susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. Mutations of this class mu gene have been linked with a slight increase in a number of cancers, likely due to exposure with environmental toxins. Alternative splicing results in multiple transcript variants. 2947 GSTM3 glutathione S-transferase mu 3 (brain) ENSG00000134202 NA
This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. 1114 CHGB chromogranin B ENSG00000089199 NA
This gene belongs to the CFAP53 family. It was found to be differentially expressed by the ciliated cells of frog epidermis and in skin fibroblasts from human. Mutations in this gene are associated with visceral heterotaxy-6, which implicates this gene in determination of left-right asymmetric patterning. 220136 CFAP53 cilia and flagella associated protein 53 ENSG00000172361 NA
This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene contains WD-40 domains, in addition to an F-box motif, so it belongs to the Fbw class. Alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene, however, they were found to be nonsense-mediated mRNA decay (NMD) candidates, hence not represented. 54461 FBXW5 F-box and WD repeat domain containing 5 ENSG00000159069 NA
This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. This cytoplasmic protein contains seven WD repeats and an AF-2 domain which function by recruiting coregulatory molecules and in transcriptional activation. Mutations in this gene cause cranioectodermal dysplasia-1. A related pseudogene is located on chromosome 3. Alternative splicing results in multiple transcript variants encoding different isoforms. 55764 IFT122 intraflagellar transport 122 ENSG00000163913 NA
The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of aminophospholipid-transporting ATPases. The aminophospholipid translocases transport phosphatidylserine and phosphatidylethanolamine from one side of a bilayer to the other. This gene encodes member 3 of phospholipid-transporting ATPase 8B; other members of this protein family are located on chromosomes 1, 15 and 18. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 148229 ATP8B3 ATPase phospholipid transporting 8B3 ENSG00000130270 NA
This gene encodes a protein that is highly similar to the mouse cGMP-dependent protein kinase anchoring protein 42kDa. The mouse protein has been found to localize with the Golgi and recruit cGMP-dependent protein kinase I alpha to the Golgi in mouse testes. It is thought to play a role in germ cell development. Transcript variants encoding different isoforms have been found for this gene. 80318 GKAP1 G kinase anchoring protein 1 ENSG00000165113 NA
NA 257169 C9orf43 chromosome 9 open reading frame 43 ENSG00000157653 NA
NA 146562 C16orf71 chromosome 16 open reading frame 71 ENSG00000166246 NA
This gene encodes a 70kDa heat shock protein. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which also encode isoforms of the 70kDa heat shock protein. 3305 HSPA1L heat shock protein family A (Hsp70) member 1 like ENSG00000204390 NA
This gene encodes an inositol-3-phosphate synthase enzyme. The encoded protein plays a critical role in the myo-inositol biosynthesis pathway by catalyzing the rate-limiting conversion of glucose 6-phosphate to myoinositol 1-phosphate. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and a pseudogene of this gene is located on the short arm of chromosome 4. 51477 ISYNA1 inositol-3-phosphate synthase 1 ENSG00000105655 NA
COL23A1 is a member of the transmembrane collagens, a subfamily of the nonfibrillar collagens that contain a single pass hydrophobic transmembrane domain (Banyard et al., 2003 [PubMed 12644459]). 91522 COL23A1 collagen type XXIII alpha 1 ENSG00000050767 NA
SLC6A16 shows structural characteristics of an Na(+)- and Cl(-)-dependent neurotransmitter transporter, including 12 transmembrane (TM) domains, intracellular N and C termini, and large extracellular loops containing multiple N-glycosylation sites. 28968 SLC6A16 solute carrier family 6 member 16 ENSG00000063127 NA
This gene encodes an RNA-binding phosphoprotein that is part of the MEX3 (muscle excess 3) family of translational regulators. The encoded protein contains N-terminal nuclear export and nuclear localization signals and is exported from the cytoplasm to the nucleus. The protein binds to RNA via two KH domains and also colocalizes with MEX3A, Dcp1A decapping factor and Argonaute proteins within P (processing) bodies. 84206 MEX3B mex-3 RNA binding family member B ENSG00000183496 NA
NA 54964 C1orf56 chromosome 1 open reading frame 56 ENSG00000143443 NA
The protein encoded by this gene is a member of the DNA polymerase type-B-like family. This enzyme synthesizes the 3’ poly(A) tail of mitochondrial transcripts and plays a role in replication-dependent histone mRNA degradation. 55149 MTPAP mitochondrial poly(A) polymerase ENSG00000107951 NA
NA 387338 NSUN4 NOP2/Sun RNA methyltransferase family member 4 ENSG00000117481 NA
The protein encoded by this gene is a member of the PP2C family of Ser/Thr protein phosphatases. PP2C family members are known to be negative regulators of cell stress response pathways. This phosphatase is found to be responsible for the dephosphorylation of Pre-mRNA splicing factors, which is important for the formation of functional spliceosome. Studies of a similar gene in mice suggested a role of this phosphatase in regulating cell cycle progression. 5496 PPM1G protein phosphatase, Mg2+/Mn2+ dependent 1G ENSG00000115241 NA
This gene encodes the second human homologue of the bacterial RuvB gene. Bacterial RuvB protein is a DNA helicase essential for homologous recombination and DNA double-strand break repair. Functional analysis showed that this gene product has both ATPase and DNA helicase activities. This gene is physically linked to the CGB/LHB gene cluster on chromosome 19q13.3, and is very close (55 nt) to the LHB gene, in the opposite orientation. 10856 RUVBL2 RuvB like AAA ATPase 2 ENSG00000183207 NA
This gene encodes an orphan nuclear receptor which is a member of the nuclear hormone receptor family. Its expression pattern suggests that it may be involved in neurogenesis and germ cell development. The protein can homodimerize and bind DNA, but in vivo targets have not been identified. Alternate splicing results in multiple transcript variants. 2649 NR6A1 nuclear receptor subfamily 6 group A member 1 ENSG00000148200 NA
NA 375189 PFN4 profilin family member 4 ENSG00000176732 NA
This p53-target gene encodes a brain-specific angiogenesis inhibitor. The protein is a seven-span transmembrane protein and a member of the secretin receptor family. It interacts with the cytoplasmic region of brain-specific angiogenesis inhibitor 1. This protein also contains two C2 domains, which are often found in proteins involved in signal transduction or membrane trafficking. Its expression pattern and similarity to other proteins suggest that it may be involved in synaptic functions. Several transcript variants encoding different isoforms have been found for this gene. 8938 BAIAP3 BAI1 associated protein 3 ENSG00000007516 NA
DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which has an ATPase activity and is a component of the survival of motor neurons (SMN) complex. This protein interacts directly with SMN, the spinal muscular atrophy gene product, and may play a catalytic role in the function of the SMN complex on RNPs. 11218 DDX20 DEAD-box helicase 20 ENSG00000064703 NA
NA 29122 PRSS50 protease, serine 50 ENSG00000206549 NA
NA NA NA NA ENSG00000174111 TRUE
NA 285429 DCAF4L1 DDB1 and CUL4 associated factor 4-like 1 ENSG00000182308 NA
NA 79696 ZC2HC1C zinc finger C2HC-type containing 1C ENSG00000119703 NA
This gene encodes a voltage-regulated, electrogenic sodium-coupled borate cotransporter that is essential for borate homeostasis, cell growth and cell proliferation. Mutations in this gene have been associated with a number of endothelial corneal dystrophies including recessive corneal endothelial dystrophy 2, corneal dystrophy and perceptive deafness, and Fuchs endothelial corneal dystrophy. Multiple transcript variants encoding different isoforms have been described. 83959 SLC4A11 solute carrier family 4 member 11 ENSG00000088836 NA
NA 93550 ZFAND4 zinc finger AN1-type containing 4 ENSG00000172671 NA
Fibrosin is a lymphokine secreted by activated lymphocytes that induces fibroblast proliferation (Prakash and Robbins, 1998 [PubMed 9809749]). 64319 FBRS fibrosin ENSG00000156860 NA
NA 84517 ACTRT3 actin related protein T3 ENSG00000184378 NA
NA 399664 MEX3D mex-3 RNA binding family member D ENSG00000181588 NA
This gene encodes an essential structural component of the synaptonemal complex. This complex is involved in synapsis, recombination and segregation of meiotic chromosomes. Mutations in this gene are associated with azoospermia in males and susceptibility to pregnancy loss in females. Alternate splicing results in multiple transcript variants that encode the same protein. 50511 SYCP3 synaptonemal complex protein 3 ENSG00000139351 NA
NA 80726 KIAA1683 KIAA1683 ENSG00000130518 NA
NA ENSG00000244219 GS1-259H13.2 NA ENSG00000244219 NA
NA 90417 KNSTRN kinetochore-localized astrin/SPAG5 binding protein ENSG00000128944 NA
NA 55063 ZCWPW1 zinc finger CW-type and PWWP domain containing 1 ENSG00000078487 NA
NA 80313 LRRC27 leucine rich repeat containing 27 ENSG00000148814 NA
This gene was first characterized as part of a cluster of genes located within the human major histocompatibility complex class III region. This gene encodes a nuclear protein that is cleaved by caspase 3 and is implicated in the control of apoptosis. In addition, the protein forms a complex with E1A binding protein p300 and is required for the acetylation of p53 in response to DNA damage. Multiple transcript variants encoding different isoforms have been found for this gene. 7917 BAG6 BCL2 associated athanogene 6 ENSG00000204463 NA
NA ENSG00000273142 RP11-458F8.4 NA ENSG00000273142 NA
The protein encoded by this gene is necessary for intercellular bridges in germ cells, which are required for spermatogenesis. Three transcript variants encoding different isoforms have been found for this gene. 56155 TEX14 testis expressed 14 ENSG00000121101 NA
This gene encodes a member of the tubulin tyrosine ligase like protein family. This protein interacts with two glucocorticoid receptor coactivators, transcriptional intermediary factor 2 and steroid receptor coactivator 1. This protein may function as a coregulator of glucocorticoid receptor mediated gene induction and repression. This protein may also function as an alpha tubulin polyglutamylase. 23093 TTLL5 tubulin tyrosine ligase like 5 ENSG00000119685 NA
NA 150291 MORC2-AS1 MORC2 antisense RNA 1 ENSG00000235989 NA
The protein encoded by this gene interacts with thyroid hormone receptor in a ligand-dependent manner and enhances thyroid hormone-dependent activation from thyroid response elements. This protein contains a bromodomain and is thought to be a nuclear receptor coactivator. Multiple alternatively spliced transcript variants that encode distinct isoforms have been identified. 10902 BRD8 bromodomain containing 8 ENSG00000112983 NA
This gene encodes a protein sharing high sequence similarity with ribosomal protein L39. Although the name of this gene has been referred to as ‘ribosomal protein L39’ in the public databases, its official name is ‘ribosomal protein L39-like’. It is not currently known whether the encoded protein is a functional ribosomal protein or whether it has evolved a function that is independent of the ribosome. 116832 RPL39L ribosomal protein L39 like ENSG00000163923 NA
NA 51233 DRICH1 aspartate-rich 1 ENSG00000189269 NA
NA 80705 TSGA10 testis specific 10 ENSG00000135951 NA
NA 150082 LCA5L Leber congenital amaurosis 5-like ENSG00000157578 NA
NA 83538 TTC25 tetratricopeptide repeat domain 25 ENSG00000204815 NA
NA 387644 LINC00202-1 long intergenic non-protein coding RNA 202-1 ENSG00000232224 NA
NA 56834 GPR137 G protein-coupled receptor 137 ENSG00000173264 NA
NA 100507588 TGFBR3L transforming growth factor beta receptor III like ENSG00000260001 NA
NA 54165 DCUN1D1 defective in cullin neddylation 1 domain containing 1 ENSG00000043093 NA
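Several rows in the table above have no summary, and one query (ENSG00000174111) was not found at all. The following is a minimal sketch of how such rows might be separated out before interpreting a cluster. It uses a toy data frame named `ann` as a stand-in for `as.data.frame(out)`; the name `ann`, the abbreviated summary text, and the three-row subset are assumptions made for illustration, and only the column names mirror the table.

```r
# Toy stand-in for as.data.frame(out); the real table comes from mygene::queryMany.
# Rows are taken from the cluster 13 table, with the summary text abbreviated.
ann <- data.frame(
  query    = c("ENSG00000154040", "ENSG00000084693", "ENSG00000174111"),
  symbol   = c("CABYR", "AGBL5", NA),
  summary  = c("To reach fertilization competence ...", NA, NA),
  notfound = c(NA, NA, TRUE),
  stringsAsFactors = FALSE
)

# Queries that could not be resolved (notfound is TRUE only for those rows).
unmatched <- ann$query[!is.na(ann$notfound) & ann$notfound]

# Resolved genes that nevertheless lack a summary.
no_summary <- ann$symbol[is.na(ann$summary) & !(ann$query %in% unmatched)]
```

Keeping the two groups apart makes it clear whether a missing annotation reflects an unresolved identifier or simply a gene without a curated summary.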
# Save the Ensembl IDs queried for cluster 13, one per line.
write.table(out$query, paste0("../utilities/gene_names_clus_", 13, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
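The same write step is repeated for every cluster with only the cluster number changed. A minimal sketch of how it could be factored into a helper is shown here; the function name `write_cluster_genes` and its `out_dir` argument are hypothetical and not part of the original script, and the example writes to a temporary directory rather than `../utilities`.

```r
# Hypothetical helper (not in the original script): write the query IDs of one
# cluster to "gene_names_clus_<cluster>.txt" in out_dir, one ID per line.
write_cluster_genes <- function(ids, cluster, out_dir = "../utilities") {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", cluster, ".txt"))
  write.table(as.character(ids), out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)  # return the path so callers can check or reuse it
}

# Example with two Ensembl IDs from the cluster 13 table, written to tempdir().
demo_file  <- write_cluster_genes(c("ENSG00000154040", "ENSG00000115257"),
                                  cluster = 13, out_dir = tempdir())
demo_lines <- readLines(demo_file)
```

In the script itself the call would presumably look like `write_cluster_genes(out$query, 13)`, which keeps the file naming consistent across clusters.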

Cluster 14 Annotations

# Look up name, summary and symbol for the top driving genes of cluster 14.
out <- mygene::queryMany(gene_list[14,], scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
name summary X_id query symbol
natriuretic peptide A The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. 4878 ENSG00000175206 NPPA
myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 ENSG00000197616 MYH6
myosin light chain 7 NA 58498 ENSG00000106631 MYL7
troponin T2, cardiac type The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. 7139 ENSG00000118194 TNNT2
actin, alpha, cardiac muscle 1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 ENSG00000159251 ACTC1
NPPA antisense RNA 1 NA ENSG00000242349 ENSG00000242349 NPPA-AS1
myosin binding protein C, cardiac MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. 4607 ENSG00000134571 MYBPC3
natriuretic peptide B This gene is a member of the natriuretic peptide family and encodes a secreted protein which functions as a cardiac hormone. The protein undergoes two cleavage events, one within the cell and a second after secretion into the blood. The protein’s biological actions include natriuresis, diuresis, vasorelaxation, inhibition of renin and aldosterone secretion, and a key role in cardiovascular homeostasis. A high concentration of this protein in the bloodstream is indicative of heart failure. The protein also acts as an antimicrobial peptide with antibacterial and antifungal activity. Mutations in this gene have been associated with postmenopausal osteoporosis. 4879 ENSG00000120937 NPPB
troponin I3, cardiac type Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM). 7137 ENSG00000129991 TNNI3
ankyrin repeat domain 1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 ENSG00000148677 ANKRD1
myosin light chain 4 Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. 4635 ENSG00000198336 MYL4
myosin light chain 3 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. 4634 ENSG00000160808 MYL3
heat shock protein family B (small) member 7 NA 27129 ENSG00000173641 HSPB7
crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. 1410 ENSG00000109846 CRYAB
nebulette This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. 10529 ENSG00000078114 NEBL
fatty acid binding protein 3 The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible in the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants. 2170 ENSG00000121769 FABP3
myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 ENSG00000092054 MYH7
calsequestrin 2 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. 845 ENSG00000118729 CASQ2
myozenin 2 The protein encoded by this gene belongs to a family of sarcomeric proteins that bind to calcineurin, a phosphatase involved in calcium-dependent signal transduction in diverse cell types. These family members tether calcineurin to alpha-actinin at the z-line of the sarcomere of cardiac and skeletal muscle cells, and thus they are important for calcineurin signaling. Mutations in this gene cause cardiomyopathy familial hypertrophic type 16, a hereditary heart disorder. 51778 ENSG00000172399 MYOZ2
nicotinamide riboside kinase 2 NA 27231 ENSG00000077009 NMRK2
tropomyosin 1 (alpha) This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. 7168 ENSG00000140416 TPM1
four and a half LIM domains 2 This gene encodes a member of the four-and-a-half-LIM-only protein family. Family members contain two highly conserved, tandemly arranged, zinc finger domains with four highly conserved cysteines binding a zinc atom in each zinc finger. This protein is thought to have a role in the assembly of extracellular membranes. Also, this gene is down-regulated during transformation of normal myoblasts to rhabdomyosarcoma cells and the encoded protein may function as a link between presenilin-2 and an intracellular signaling pathway. Multiple alternatively spliced variants encoding different isoforms have been identified. 2274 ENSG00000115641 FHL2
cysteine and glycine rich protein 3 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. 8048 ENSG00000129170 CSRP3
peptidylglycine alpha-amidating monooxygenase This gene encodes a multifunctional protein. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme includes two domains with distinct catalytic activities, a peptidylglycine alpha-hydroxylating monooxygenase (PHM) domain and a peptidyl-alpha-hydroxyglycine alpha-amidating lyase (PAL) domain. These catalytic domains work sequentially to catalyze the conversion of neuroendocrine peptides to active alpha-amidated products. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. 5066 ENSG00000145730 PAM
solute carrier family 25 member 4 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy. 291 ENSG00000151729 SLC25A4
ATP synthase, H+ transporting, mitochondrial F1 complex, beta polypeptide This gene encodes a subunit of mitochondrial ATP synthase. Mitochondrial ATP synthase catalyzes ATP synthesis, utilizing an electrochemical gradient of protons across the inner membrane during oxidative phosphorylation. ATP synthase is composed of two linked multi-subunit complexes: the soluble catalytic core, F1, and the membrane-spanning component, Fo, comprising the proton channel. The catalytic portion of mitochondrial ATP synthase consists of 5 different subunits (alpha, beta, gamma, delta, and epsilon) assembled with a stoichiometry of 3 alpha, 3 beta, and a single representative of the other 3. The proton channel consists of three main subunits (a, b, c). This gene encodes the beta subunit of the catalytic core. 506 ENSG00000110955 ATP5B
popeye domain containing 2 This gene encodes a member of the POP family of proteins which contain three putative transmembrane domains. This membrane associated protein is predominantly expressed in skeletal and cardiac muscle, and may have an important function in these tissues. 64091 ENSG00000121577 POPDC2
histidine rich calcium binding protein This gene encodes a luminal sarcoplasmic reticulum protein identified by its ability to bind low-density lipoprotein with high affinity. The protein interacts with the cytoplasmic domain of triadin, the main transmembrane protein of the junctional sarcoplasmic reticulum (SR) of skeletal muscle. The protein functions in the regulation of releasable calcium into the SR. 3270 ENSG00000130528 HRC
ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit 1, cardiac muscle This gene encodes a subunit of mitochondrial ATP synthase. Mitochondrial ATP synthase catalyzes ATP synthesis, using an electrochemical gradient of protons across the inner membrane during oxidative phosphorylation. ATP synthase is composed of two linked multi-subunit complexes: the soluble catalytic core, F1, and the membrane-spanning component, Fo, comprising the proton channel. The catalytic portion of mitochondrial ATP synthase consists of 5 different subunits (alpha, beta, gamma, delta, and epsilon) assembled with a stoichiometry of 3 alpha, 3 beta, and a single representative of the other 3. The proton channel consists of three main subunits (a, b, c). This gene encodes the alpha subunit of the catalytic core. Alternatively spliced transcript variants encoding the different isoforms have been identified. Pseudogenes of this gene are located on chromosomes 9, 2, and 16. 498 ENSG00000152234 ATP5A1
lactate dehydrogenase B This gene encodes the B subunit of lactate dehydrogenase enzyme, which catalyzes the interconversion of pyruvate and lactate with concomitant interconversion of NADH and NAD+ in a post-glycolysis process. Alternatively spliced transcript variants have been found for this gene. Recent studies have shown that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Mutations in this gene are associated with lactate dehydrogenase B deficiency. Pseudogenes have been identified on chromosomes X, 5 and 13. 3945 ENSG00000111716 LDHB
ubiquinol-cytochrome c reductase binding protein This gene encodes a subunit of the ubiquinol-cytochrome c oxidoreductase complex, which consists of one mitochondrial-encoded and 10 nuclear-encoded subunits. The protein encoded by this gene binds ubiquinone and participates in the transfer of electrons when ubiquinone is bound. This protein plays an important role in hypoxia-induced angiogenesis through mitochondrial reactive oxygen species-mediated signaling. Mutations in this gene are associated with mitochondrial complex III deficiency. Alternatively spliced transcript variants have been found for this gene. Related pseudogenes have been identified on chromosomes 1, 5 and X. 7381 ENSG00000156467 UQCRB
malate dehydrogenase 1 This gene encodes an enzyme that catalyzes the NAD/NADH-dependent, reversible oxidation of malate to oxaloacetate in many metabolic pathways, including the citric acid cycle. Two main isozymes are known to exist in eukaryotic cells: one is found in the mitochondrial matrix and the other in the cytoplasm. This gene encodes the cytosolic isozyme, which plays a key role in the malate-aspartate shuttle that allows malate to pass through the mitochondrial membrane to be transformed into oxaloacetate for further cellular processes. Alternatively spliced transcript variants have been found for this gene. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Pseudogenes have been identified on chromosomes X and 6. 4190 ENSG00000014641 MDH1
myosin light chain 2 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. 4633 ENSG00000111245 MYL2
dickkopf WNT signaling pathway inhibitor 3 This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. 27122 ENSG00000050165 DKK3
phospholamban The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure. 5350 ENSG00000198523 PLN
myosin, heavy chain 7B, cardiac muscle, beta The myosin II molecule is a multi-subunit complex consisting of two heavy chains and four light chains. This gene encodes a heavy chain of myosin II, which is a member of the motor-domain superfamily. The heavy chain includes a globular motor domain, which catalyzes ATP hydrolysis and interacts with actin, and a tail domain in which heptad repeat sequences promote dimerization by interacting to form a rod-like alpha-helical coiled coil. This heavy chain subunit is a slow-twitch myosin. Alternatively spliced transcript variants have been found, but the full-length nature of these variants is not determined. 57644 ENSG00000078814 MYH7B
plakophilin 2 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This gene product may regulate the signaling activity of beta-catenin. Two alternately spliced transcripts encoding two protein isoforms have been identified. A processed pseudogene with high similarity to this locus has been mapped to chromosome 12p13. 5318 ENSG00000057294 PKP2
solute carrier family 41 member 1 NA 254428 ENSG00000133065 SLC41A1
creatine kinase, mitochondrial 2 Mitochondrial creatine kinase (MtCK) is responsible for the transfer of high energy phosphate from mitochondria to the cytosolic carrier, creatine. It belongs to the creatine kinase isoenzyme family. It exists as two isoenzymes, sarcomeric MtCK and ubiquitous MtCK, encoded by separate genes. Mitochondrial creatine kinase occurs in two different oligomeric forms: dimers and octamers, in contrast to the exclusively dimeric cytosolic creatine kinase isoenzymes. Sarcomeric mitochondrial creatine kinase has 80% homology with the coding exons of ubiquitous mitochondrial creatine kinase. This gene contains sequences homologous to several motifs that are shared among some nuclear genes encoding mitochondrial proteins and thus may be essential for the coordinated activation of these genes during mitochondrial biogenesis. Three transcript variants encoding the same protein have been found for this gene. 1160 ENSG00000131730 CKMT2
myomesin 2 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD and 165 kD. The predicted MYOM2 protein contains 1,465 amino acids. Like MYOM1, MYOM2 has a unique N-terminal domain followed by 12 repeat domains with strong homology to either fibronectin type III or immunoglobulin C2 domains. Protein sequence comparisons suggested that the MYOM2 protein and bovine M protein are identical. 9172 ENSG00000036448 MYOM2
fat storage inducing transmembrane protein 2 FIT2 belongs to an evolutionarily conserved family of proteins involved in fat storage (Kadereit et al., 2008 [PubMed 18160536]). 128486 ENSG00000197296 FITM2
formin homology 2 domain containing 3 The protein encoded by this gene is a member of the diaphanous-related formins (DRF), and contains multiple domains, including GBD (GTPase-binding domain), DID (diaphanous inhibitory domain), FH1 (formin homology 1), FH2 (formin homology 2), and DAD (diaphanous auto-regulatory domain) domains. This protein is thought to play a role in actin filament polymerization in cardiomyocytes. Mutations in this gene have been associated with dilated cardiomyopathy (DCM), characterized by dilation of the ventricular chamber, leading to impairment of systolic pump function and subsequent heart failure. Increased levels of the protein encoded by this gene have been observed in individuals with hypertrophic cardiomyopathy (HCM). Alternative splicing results in multiple transcript variants encoding different isoforms. A muscle-specific isoform has been shown to possess a casein kinase 2 (CK2) phosphorylation site at the C-terminal end of the FH2 domain. Phosphorylation of this site alters its interaction with sequestosome 1 (SQSTM1), and targets this isoform to myofibrils, while other isoforms form cytoplasmic aggregates. 80206 ENSG00000134775 FHOD3
ATP synthase, H+ transporting, mitochondrial Fo complex subunit C3 (subunit 9) This gene encodes a subunit of mitochondrial ATP synthase. Mitochondrial ATP synthase catalyzes ATP synthesis, utilizing an electrochemical gradient of protons across the inner membrane during oxidative phosphorylation. ATP synthase is composed of two linked multi-subunit complexes: the soluble catalytic core, F1, and the membrane-spanning component, Fo, comprising the proton channel. The catalytic portion of mitochondrial ATP synthase consists of 5 different subunits (alpha, beta, gamma, delta, and epsilon) assembled with a stoichiometry of 3 alpha, 3 beta, and a single representative of the other 3. The proton channel seems to have nine subunits (a, b, c, d, e, f, g, F6 and 8). This gene is one of three genes that encode subunit c of the proton channel. Each of the three genes have distinct mitochondrial import sequences but encode the identical mature protein. Alternatively spliced transcript variants encoding different proteins have been identified. 518 ENSG00000154518 ATP5G3
cytochrome c, somatic This gene encodes a small heme protein that functions as a central component of the electron transport chain in mitochondria. The encoded protein associates with the inner membrane of the mitochondrion where it accepts electrons from cytochrome b and transfers them to the cytochrome oxidase complex. This protein is also involved in initiation of apoptosis. Mutations in this gene are associated with autosomal dominant nonsyndromic thrombocytopenia. Numerous processed pseudogenes of this gene are found throughout the human genome. 54205 ENSG00000172115 CYCS
NA NA ENSG00000258444 ENSG00000258444 CTD-2201G16.1
cadherin 2 This gene encodes a classical cadherin and member of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a calcium-dependent cell adhesion molecule and glycoprotein. This protein plays a role in the establishment of left-right asymmetry, development of the nervous system and the formation of cartilage and bone. 1000 ENSG00000170558 CDH2
cold shock domain containing C2 NA 27254 ENSG00000172346 CSDC2
NDUFA4, mitochondrial complex associated The protein encoded by this gene belongs to the complex I 9kDa subunit family. Mammalian complex I of mitochondrial respiratory chain is composed of 45 different subunits. This protein has NADH dehydrogenase activity and oxidoreductase activity. It transfers electrons from NADH to the respiratory chain. The immediate electron acceptor for the enzyme is believed to be ubiquinone. 4697 ENSG00000189043 NDUFA4
mitofusin 2 This gene encodes a mitochondrial membrane protein that participates in mitochondrial fusion and contributes to the maintenance and operation of the mitochondrial network. This protein is involved in the regulation of vascular smooth muscle cell proliferation, and it may play a role in the pathophysiology of obesity. Mutations in this gene cause Charcot-Marie-Tooth disease type 2A2, and hereditary motor and sensory neuropathy VI, which are both disorders of the peripheral nervous system. Defects in this gene have also been associated with early-onset stroke. Two transcript variants encoding the same protein have been identified. 9927 ENSG00000116688 MFN2
LIM domain binding 3 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. 11155 ENSG00000122367 LDB3
cytochrome c oxidase subunit 7C Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes subunit VIIc, which shares 87% and 85% amino acid sequence identity with mouse and bovine COX VIIc, respectively, and is found in all tissues. A pseudogene COX7CP1 has been found on chromosome 13. 1350 ENSG00000127184 COX7C
basigin (Ok blood group) The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. 682 ENSG00000172270 BSG
succinate dehydrogenase complex flavoprotein subunit A This gene encodes a major catalytic subunit of succinate-ubiquinone oxidoreductase, a complex of the mitochondrial respiratory chain. The complex is composed of four nuclear-encoded subunits and is localized in the mitochondrial inner membrane. Mutations in this gene have been associated with a form of mitochondrial respiratory chain deficiency known as Leigh Syndrome. A pseudogene has been identified on chromosome 3q29. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 6389 ENSG00000073578 SDHA
actin binding LIM protein 1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983 ENSG00000099204 ABLIM1
phospholipase A2 group V This gene is a member of the secretory phospholipase A2 family. It is located in a tightly-linked cluster of secretory phospholipase A2 genes on chromosome 1. The encoded enzyme catalyzes the hydrolysis of membrane phospholipids to generate lysophospholipids and free fatty acids including arachidonic acid. It preferentially hydrolyzes linoleoyl-containing phosphatidylcholine substrates. Secretion of this enzyme is thought to induce inflammatory responses in neighboring cells. Alternatively spliced transcript variants have been found, but their full-length nature has not been determined. 5322 ENSG00000127472 PLA2G5
lysosomal protein transmembrane 4 beta NA 55353 ENSG00000104341 LAPTM4B
cytochrome c oxidase subunit 4I1 Cytochrome c oxidase (COX) is the terminal enzyme of the mitochondrial respiratory chain. It is a multi-subunit enzyme complex that couples the transfer of electrons from cytochrome c to molecular oxygen and contributes to a proton electrochemical gradient across the inner mitochondrial membrane. The complex consists of 13 mitochondrial- and nuclear-encoded subunits. The mitochondrially-encoded subunits perform the electron transfer and proton pumping activities. The functions of the nuclear-encoded subunits are unknown but they may play a role in the regulation and assembly of the complex. This gene encodes the nuclear-encoded subunit IV isoform 1 of the human mitochondrial respiratory chain enzyme. It is located at the 3’ of the NOC4 (neighbor of COX4) gene in a head-to-head orientation, and shares a promoter with it. Pseudogenes related to this gene are located on chromosomes 13 and 14. Alternative splicing results in multiple transcript variants encoding different isoforms. 1327 ENSG00000131143 COX4I1
NADH:ubiquinone oxidoreductase subunit AB1 NA 4706 ENSG00000004779 NDUFAB1
solute carrier family 25 member 3 The protein encoded by this gene catalyzes the transport of phosphate into the mitochondrial matrix, either by proton cotransport or in exchange for hydroxyl ions. The protein contains three related segments arranged in tandem which are related to those found in other characterized members of the mitochondrial carrier family. Both the N-terminal and C-terminal regions of this protein protrude toward the cytosol. Multiple alternatively spliced transcript variants have been isolated. 5250 ENSG00000075415 SLC25A3
ADP-ribosylhydrolase like 1 ADP-ribosylation is a reversible posttranslational modification used to regulate protein function. ADP-ribosyltransferases (see ART1; MIM 601625) transfer ADP-ribose from NAD+ to the target protein, and ADP-ribosylhydrolases, such as ADPRHL1, reverse the reaction (Glowacki et al., 2002 [PubMed 12070318]). 113622 ENSG00000153531 ADPRHL1
glutamic-oxaloacetic transaminase 1 Glutamic-oxaloacetic transaminase is a pyridoxal phosphate-dependent enzyme which exists in cytoplasmic and mitochondrial forms, GOT1 and GOT2, respectively. GOT plays a role in amino acid metabolism and the urea and tricarboxylic acid cycles. The two enzymes are homodimeric and show close homology. 2805 ENSG00000120053 GOT1
myosin light chain 12A This gene encodes a nonsarcomeric myosin regulatory light chain. This protein is activated by phosphorylation and regulates smooth muscle and non-muscle cell contraction. This protein may also be involved in DNA damage repair by sequestering the transcriptional regulator apoptosis-antagonizing transcription factor (AATF)/Che-1 which functions as a repressor of p53-driven apoptosis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 8. 10627 ENSG00000101608 MYL12A
cytochrome c oxidase subunit 6B1 Cytochrome c oxidase (COX), the terminal enzyme of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. It is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may be involved in the regulation and assembly of the complex. This nuclear gene encodes subunit VIb. Mutations in this gene are associated with severe infantile encephalomyopathy. Three pseudogenes COX6BP-1, COX6BP-2 and COX6BP-3 have been found on chromosomes 7, 17 and 22q13.1-13.2, respectively. 1340 ENSG00000126267 COX6B1
chromosome 15 open reading frame 41 This gene encodes a protein with two predicted helix-turn-helix domains. Mutations in this gene were found in families with congenital dyserythropoietic anemia type Ib. Alternative splicing results in multiple transcript variants encoding different isoforms. 84529 ENSG00000186073 C15orf41
cytochrome c oxidase subunit 5B Cytochrome C oxidase (COX) is the terminal enzyme of the mitochondrial respiratory chain. It is a multi-subunit enzyme complex that couples the transfer of electrons from cytochrome c to molecular oxygen and contributes to a proton electrochemical gradient across the inner mitochondrial membrane. The complex consists of 13 mitochondrial- and nuclear-encoded subunits. The mitochondrially-encoded subunits perform the electron transfer and proton pumping activities. The functions of the nuclear-encoded subunits are unknown but they may play a role in the regulation and assembly of the complex. This gene encodes the nuclear-encoded subunit Vb of the human mitochondrial respiratory chain enzyme. 1329 ENSG00000135940 COX5B
ATP synthase, H+ transporting, mitochondrial Fo complex subunit F6 Mitochondrial ATP synthase catalyzes ATP synthesis, utilizing an electrochemical gradient of protons across the inner membrane during oxidative phosphorylation. It is composed of two linked multi-subunit complexes: the soluble catalytic core, F1, and the membrane-spanning component, Fo, which comprises the proton channel. The F1 complex consists of 5 different subunits (alpha, beta, gamma, delta, and epsilon) assembled in a ratio of 3 alpha, 3 beta, and a single representative of the other 3. The Fo complex has nine subunits (a, b, c, d, e, f, g, F6 and 8). This gene encodes the F6 subunit of the Fo complex. The F6 subunit is required for F1 and Fo interactions. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. This gene has 1 or more pseudogenes. 522 ENSG00000154723 ATP5J
hedgehog acyltransferase-like NA 57467 ENSG00000010282 HHATL
SET and MYND domain containing 2 SET domain-containing proteins, such as SMYD2, catalyze lysine methylation (Brown et al., 2006 [PubMed 16805913]). 56950 ENSG00000143499 SMYD2
protein kinase AMP-activated non-catalytic subunit gamma 2 AMP-activated protein kinase (AMPK) is a heterotrimeric protein composed of a catalytic alpha subunit, a noncatalytic beta subunit, and a noncatalytic regulatory gamma subunit. Various forms of each of these subunits exist, encoded by different genes. AMPK is an important energy-sensing enzyme that monitors cellular energy status and functions by inactivating key enzymes involved in regulating de novo biosynthesis of fatty acid and cholesterol. This gene is a member of the AMPK gamma subunit family. Mutations in this gene have been associated with Wolff-Parkinson-White syndrome, familial hypertrophic cardiomyopathy, and glycogen storage disease of the heart. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 51422 ENSG00000106617 PRKAG2
ubiquinol-cytochrome c reductase core protein I NA 7384 ENSG00000010256 UQCRC1
cytochrome c oxidase subunit 5A Cytochrome c oxidase (COX) is the terminal enzyme of the mitochondrial respiratory chain. It is a multi-subunit enzyme complex that couples the transfer of electrons from cytochrome c to molecular oxygen and contributes to a proton electrochemical gradient across the inner mitochondrial membrane. The complex consists of 13 mitochondrial- and nuclear-encoded subunits. The mitochondrially-encoded subunits perform the electron transfer and proton pumping activities. The functions of the nuclear-encoded subunits are unknown but they may play a role in the regulation and assembly of the complex. This gene encodes the nuclear-encoded subunit Va of the human mitochondrial respiratory chain enzyme. A pseudogene COX5AP1 has been found in chromosome 14q22. 9377 ENSG00000178741 COX5A
cysteine rich protein 2 This gene encodes a putative transcription factor with two LIM zinc-binding domains. The encoded protein may participate in the differentiation of smooth muscle tissue. Alternative splicing results in multiple transcript variants. 1397 ENSG00000182809 CRIP2
ATP synthase, H+ transporting, mitochondrial F1 complex, O subunit The protein encoded by this gene is a component of the F-type ATPase found in the mitochondrial matrix. F-type ATPases are composed of a catalytic core and a membrane proton channel. The encoded protein appears to be part of the connector linking these two components and may be involved in transmission of conformational changes or proton conductance. 539 ENSG00000241837 ATP5O
ATP synthase, H+ transporting, mitochondrial Fo complex subunit C1 (subunit 9) This gene encodes a subunit of mitochondrial ATP synthase. Mitochondrial ATP synthase catalyzes ATP synthesis, utilizing an electrochemical gradient of protons across the inner membrane during oxidative phosphorylation. ATP synthase is composed of two linked multi-subunit complexes: the soluble catalytic core, F1, and the membrane-spanning component, Fo, comprising the proton channel. The catalytic portion of mitochondrial ATP synthase consists of 5 different subunits (alpha, beta, gamma, delta, and epsilon) assembled with a stoichiometry of 3 alpha, 3 beta, and a single representative of the other 3. The proton channel seems to have nine subunits (a, b, c, d, e, f, g, F6 and 8). This gene is one of three genes that encode subunit c of the proton channel. Each of the three genes has a distinct mitochondrial import sequence but encodes the identical mature protein. Alternatively spliced transcript variants encoding the same protein have been identified. 516 ENSG00000159199 ATP5G1
nestin This gene encodes a member of the intermediate filament protein family and is expressed primarily in nerve cells. 10763 ENSG00000132688 NES
ubiquinol-cytochrome c reductase core protein II The protein encoded by this gene is located in the mitochondrion, where it is part of the ubiquinol-cytochrome c reductase complex (also known as complex III). This complex constitutes a part of the mitochondrial respiratory chain. Defects in this gene are a cause of mitochondrial complex III deficiency nuclear type 5. 7385 ENSG00000140740 UQCRC2
oxoglutarate dehydrogenase This gene encodes one subunit of the 2-oxoglutarate dehydrogenase complex. This complex catalyzes the overall conversion of 2-oxoglutarate (alpha-ketoglutarate) to succinyl-CoA and CO(2) during the Krebs cycle. The protein is located in the mitochondrial matrix and uses thiamine pyrophosphate as a cofactor. A congenital deficiency in 2-oxoglutarate dehydrogenase activity is believed to lead to hypotonia, metabolic acidosis, and hyperlactatemia. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 4967 ENSG00000105953 OGDH
NADH:ubiquinone oxidoreductase core subunit V1 The mitochondrial respiratory chain provides energy to cells via oxidative phosphorylation and consists of four membrane-bound electron-transporting protein complexes (I-IV) and an ATP synthase (complex V). This gene encodes a 51 kDa subunit of the NADH:ubiquinone oxidoreductase complex I; a large complex with at least 45 nuclear and mitochondrial encoded subunits that liberates electrons from NADH and channels them to ubiquinone. This subunit carries the NADH-binding site as well as flavin mononucleotide (FMN)- and Fe-S-binding sites. Defects in complex I are a common cause of mitochondrial dysfunction; a syndrome that occurs in approximately 1 in 10,000 live births. Mitochondrial complex I deficiency is linked to myopathies, encephalomyopathies, and neurodegenerative disorders such as Parkinson’s disease and Leigh syndrome. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 4723 ENSG00000167792 NDUFV1
eukaryotic translation initiation factor 1B NA 10289 ENSG00000114784 EIF1B
ATP synthase, H+ transporting, mitochondrial Fo complex subunit D Mitochondrial ATP synthase catalyzes ATP synthesis, utilizing an electrochemical gradient of protons across the inner membrane during oxidative phosphorylation. It is composed of two linked multi-subunit complexes: the soluble catalytic core, F1, and the membrane-spanning component, Fo, which comprises the proton channel. The F1 complex consists of 5 different subunits (alpha, beta, gamma, delta, and epsilon) assembled in a ratio of 3 alpha, 3 beta, and a single representative of the other 3. The Fo seems to have nine subunits (a, b, c, d, e, f, g, F6 and 8). This gene encodes the d subunit of the Fo complex. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. In addition, three pseudogenes are located on chromosomes 9, 12 and 15. 10476 ENSG00000167863 ATP5H
potassium voltage-gated channel interacting protein 2 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. 30819 ENSG00000120049 KCNIP2
nudix hydrolase 4 The protein encoded by this gene regulates the turnover of diphosphoinositol polyphosphates. The turnover of these high-energy diphosphoinositol polyphosphates represents a molecular switching activity with important regulatory consequences. Molecular switching by diphosphoinositol polyphosphates may contribute to regulating intracellular trafficking. Several alternatively spliced transcript variants have been described, but the full-length nature of some variants has not been determined. Isoforms DIPP2alpha and DIPP2beta are distinguishable from each other solely by DIPP2beta possessing one additional amino acid due to intron boundary skidding in alternate splicing. 11163 ENSG00000173598 NUDT4
protein kinase cAMP-dependent type I regulatory subunit alpha cAMP is a signaling molecule important for a variety of cellular functions. cAMP exerts its effects by activating the cAMP-dependent protein kinase, which transduces the signal through phosphorylation of different target proteins. The inactive kinase holoenzyme is a tetramer composed of two regulatory and two catalytic subunits. cAMP causes the dissociation of the inactive holoenzyme into a dimer of regulatory subunits bound to four cAMP and two free monomeric catalytic subunits. Four different regulatory subunits and three catalytic subunits have been identified in humans. This gene encodes one of the regulatory subunits. This protein was found to be a tissue-specific extinguisher that down-regulates the expression of seven liver genes in hepatoma x fibroblast hybrids. Mutations in this gene cause Carney complex (CNC). This gene can fuse to the RET protooncogene by gene rearrangement and form the thyroid tumor-specific chimeric oncogene known as PTC2. A nonconventional nuclear localization sequence (NLS) has been found for this protein which suggests a role in DNA replication via the protein serving as a nuclear transport protein for the second subunit of the Replication Factor C (RFC40). Several alternatively spliced transcript variants encoding two different isoforms have been observed. 5573 ENSG00000108946 PRKAR1A
NA NA ENSG00000272030 ENSG00000272030 RP1-178F15.4
transmembrane protein 182 NA 130827 ENSG00000170417 TMEM182
mitogen-activated protein kinase-activated protein kinase 3 This gene encodes a member of the Ser/Thr protein kinase family. This kinase functions as a mitogen-activated protein kinase (MAP kinase)-activated protein kinase. MAP kinases, also known as extracellular signal-regulated kinases (ERKs), act as an integration point for multiple biochemical signals. This kinase was shown to be activated by growth inducers and stress stimulation of cells. In vitro studies demonstrated that ERK, p38 MAP kinase and Jun N-terminal kinase were all able to phosphorylate and activate this kinase, which suggested the role of this kinase as an integrative element of signaling in both mitogen and stress responses. This kinase was reported to interact with, phosphorylate and repress the activity of E47, which is a basic helix-loop-helix transcription factor known to be involved in the regulation of tissue-specific gene expression and cell differentiation. Alternate splicing results in multiple transcript variants that encode the same protein. 7867 ENSG00000114738 MAPKAPK3
aconitase 2 The protein encoded by this gene belongs to the aconitase/IPM isomerase family. It is an enzyme that catalyzes the interconversion of citrate to isocitrate via cis-aconitate in the second step of the TCA cycle. This protein is encoded in the nucleus and functions in the mitochondrion. It was found to be one of the mitochondrial matrix proteins that are preferentially degraded by the serine protease 15 (PRSS15), also known as Lon protease, after oxidative modification. 50 ENSG00000100412 ACO2
cytochrome c oxidase subunit 6C Cytochrome c oxidase, the terminal enzyme of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. It is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may be involved in the regulation and assembly of the complex. This nuclear gene encodes subunit VIc, which has 77% amino acid sequence identity with mouse subunit VIc. This gene is up-regulated in prostate cancer cells. A pseudogene has been found on chromosome 16p12. 1345 ENSG00000164919 COX6C
hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit This gene encodes the beta subunit of the mitochondrial trifunctional protein, which catalyzes the last three steps of mitochondrial beta-oxidation of long chain fatty acids. The mitochondrial membrane-bound heterocomplex is composed of four alpha and four beta subunits, with the beta subunit catalyzing the 3-ketoacyl-CoA thiolase activity. The encoded protein can also bind RNA and decreases the stability of some mRNAs. The genes of the alpha and beta subunits of the mitochondrial trifunctional protein are located adjacent to each other in the human genome in a head-to-head orientation. Mutations in this gene result in trifunctional protein deficiency. Alternatively spliced transcript variants encoding different isoforms have been described. 3032 ENSG00000138029 HADHB
cytochrome c oxidase subunit 7A1 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 1 (muscle isoform) of subunit VIIa and the polypeptide 1 is present only in muscle tissues. Other polypeptides of subunit VIIa are present in both muscle and nonmuscle tissues, and are encoded by different genes. 1346 ENSG00000161281 COX7A1
NADH:ubiquinone oxidoreductase subunit S5 This gene is a member of the NADH dehydrogenase (ubiquinone) iron-sulfur protein family. The encoded protein is a subunit of the NADH:ubiquinone oxidoreductase (complex I), the first enzyme complex in the electron transport chain located in the inner mitochondrial membrane. Alternative splicing results in multiple transcript variants and pseudogenes have been identified on chromosomes 1, 4 and 17. 4725 ENSG00000168653 NDUFS5
protein phosphatase, Mg2+/Mn2+ dependent 1K This gene encodes a member of the PPM family of Mn2+/Mg2+-dependent protein phosphatases. The encoded protein, essential for cell survival and development, is targeted to the mitochondria where it plays a key role in regulation of the mitochondrial permeability transition pore. 152926 ENSG00000163644 PPM1K
ATP synthase, H+ transporting, mitochondrial F1 complex, gamma polypeptide 1 This gene encodes a subunit of mitochondrial ATP synthase. Mitochondrial ATP synthase catalyzes ATP synthesis, utilizing an electrochemical gradient of protons across the inner membrane during oxidative phosphorylation. ATP synthase is composed of two linked multi-subunit complexes: the soluble catalytic core, F1, and the membrane-spanning component, Fo, comprising the proton channel. The catalytic portion of mitochondrial ATP synthase consists of 5 different subunits (alpha, beta, gamma, delta, and epsilon) assembled with a stoichiometry of 3 alpha, 3 beta, and a single representative of the other 3. The proton channel consists of three main subunits (a, b, c). This gene encodes the gamma subunit of the catalytic core. Alternatively spliced transcript variants encoding different isoforms have been identified. This gene also has a pseudogene on chromosome 14. 509 ENSG00000165629 ATP5C1
NADH:ubiquinone oxidoreductase subunit B2 The protein encoded by this gene is a subunit of the multisubunit NADH:ubiquinone oxidoreductase (complex I). Mammalian complex I is composed of 45 different subunits. This protein has NADH dehydrogenase activity and oxidoreductase activity. It plays an important role in transferring electrons from NADH to the respiratory chain. The immediate electron acceptor for the enzyme is believed to be ubiquinone. Hydropathy analysis revealed that this subunit and 4 other subunits have an overall hydrophilic pattern, even though they are found within the hydrophobic protein (HP) fraction of complex I. 4708 ENSG00000090266 NDUFB2
NADH:ubiquinone oxidoreductase subunit A8 The protein encoded by this gene belongs to the complex I 19 kDa subunit family. Mammalian complex I is composed of 45 different subunits. This protein has NADH dehydrogenase activity and oxidoreductase activity. It plays an important role in transferring electrons from NADH to the respiratory chain. The immediate electron acceptor for the enzyme is believed to be ubiquinone. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. 4702 ENSG00000119421 NDUFA8
malonyl-CoA decarboxylase The product of this gene catalyzes the breakdown of malonyl-CoA to acetyl-CoA and carbon dioxide. Malonyl-CoA is an intermediate in fatty acid biosynthesis, and also inhibits the transport of fatty acyl CoAs into mitochondria. Consequently, the encoded protein acts to increase the rate of fatty acid oxidation. It is found in mitochondria, peroxisomes, and the cytoplasm. Mutations in this gene result in malonyl-CoA decarboxylase deficiency. 23417 ENSG00000103150 MLYCD
ubiquinol-cytochrome c reductase hinge protein NA 7388 ENSG00000173660 UQCRH
RNA binding protein with multiple splicing 2 NA 348093 ENSG00000166831 RBPMS2
uncharacterized LOC105370792 NA 105370792 ENSG00000174171 LOC105370792
nicotinamide nucleotide transhydrogenase This gene encodes an integral protein of the inner mitochondrial membrane. The enzyme couples hydride transfer between NAD(H) and NADP(+) to proton translocation across the inner mitochondrial membrane. Under most physiological conditions, the enzyme uses energy from the mitochondrial proton gradient to produce high concentrations of NADPH. The resulting NADPH is used for biosynthesis and in free radical detoxification. Two alternatively spliced variants, encoding the same protein, have been found for this gene. 23530 ENSG00000112992 NNT
write.table(as.factor(out$query), paste0("../utilities/gene_names_clus_",14,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
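
The same two steps (query the annotations for row k of `gene_list`, then write the queried Ensembl IDs to `gene_names_clus_k.txt`) are repeated for every cluster. A minimal sketch of a helper that wraps them is given here. The function name `annotate_cluster` and its arguments `out_dir` and `query_fun` are hypothetical and not part of the original analysis; the default query simply mirrors the `mygene::queryMany` call used in this script.

```r
# Hypothetical helper (not in the original script): wraps the per-cluster
# "query annotations, then write the queried IDs" steps.
annotate_cluster <- function(gene_list, k, out_dir = "../utilities",
                             query_fun = NULL) {
  # Fall back to the mygene call used in this script; a stub function can be
  # supplied instead, e.g. to run the helper without network access.
  if (is.null(query_fun)) {
    query_fun <- function(ids) {
      mygene::queryMany(ids, scopes = "ensembl.gene",
                        fields = c("name", "summary", "symbol"),
                        species = "human")
    }
  }
  # Annotate the top genes of cluster k (row k of gene_list).
  out <- as.data.frame(query_fun(gene_list[k, ]))
  # Write the queried IDs, one per line, using the file naming seen above.
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(out$query, out_file, col.names = FALSE,
              row.names = FALSE, quote = FALSE)
  invisible(out)
}
```

With such a helper, each cluster section would reduce to something like `out <- annotate_cluster(gene_list, 15)` followed by `kable(out)`.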

Cluster 15 Annotations

out <- mygene::queryMany(gene_list[15,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
X_id symbol summary query name
3860 KRT13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. ENSG00000171401 keratin 13
3851 KRT4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000170477 keratin 4
6707 SPRR3 NA ENSG00000163209 small proline rich protein 3
ENSG00000229732 AC019349.5 NA ENSG00000229732 NA
4589 MUC7 This gene encodes a small salivary mucin, which is thought to play a role in facilitating the clearance of bacteria in the oral cavity and to aid in mastication, speech, and swallowing. The central domain of this glycoprotein contains tandem repeats, each composed of 23 amino acids. This antimicrobial protein has antibacterial and antifungal activity. The most common allele contains 6 repeats, and some alleles may be associated with susceptibility to asthma. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. ENSG00000171195 mucin 7, secreted
49860 CRNN This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. ENSG00000143536 cornulin
51458 RHCG NA ENSG00000140519 Rh family C glycoprotein
3853 KRT6A The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000205420 keratin 6A
124220 ZG16B NA ENSG00000162078 zymogen granule protein 16B
1476 CSTB The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). ENSG00000160213 cystatin B
6700 SPRR2A NA ENSG00000241794 small proline rich protein 2A
301 ANXA1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. ENSG00000135046 annexin A1
7053 TGM3 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. ENSG00000125780 transglutaminase 3
6698 SPRR1A NA ENSG00000169474 small proline rich protein 1A
1893 ECM1 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. ENSG00000143369 extracellular matrix protein 1
1475 CSTA The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. ENSG00000121552 cystatin A
2012 EMP1 NA ENSG00000134531 epithelial membrane protein 1
3557 IL1RN The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. ENSG00000136689 interleukin 1 receptor antagonist
2706 GJB2 This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. ENSG00000165474 gap junction protein beta 2
11005 SPINK5 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. ENSG00000133710 serine peptidase inhibitor, Kazal type 5
4118 MAL The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids. The protein has been localized to the endoplasmic reticulum of T-cells and is a candidate linker protein in T-cell signal transduction. In addition, this proteolipid is localized in compact myelin of cells in the nervous system and has been implicated in myelin biogenesis and/or function. The protein plays a role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains. Down-regulation of this gene has been associated with a variety of human epithelial malignancies. Alternative splicing produces four transcript variants which vary from each other by the presence or absence of alternatively spliced exons 2 and 3. ENSG00000172005 mal T-cell differentiation protein
57402 S100A14 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). ENSG00000189334 S100 calcium binding protein A14
5493 PPL The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. ENSG00000118898 periplakin
5304 PIP NA ENSG00000159763 prolactin induced protein
7051 TGM1 The protein encoded by this gene is a membrane protein that catalyzes the addition of an alkyl group from an alkylamine to a glutamine residue of a protein, forming an alkylglutamine in the protein. This protein alkylation leads to crosslinking of proteins and catenation of polyamines to proteins. This gene contains either one or two copies of a 22 nt repeat unit in its 3’ UTR. Mutations in this gene have been associated with autosomal recessive lamellar ichthyosis (LI) and nonbullous congenital ichthyosiform erythroderma (NCIE). ENSG00000092295 transglutaminase 1
6273 S100A2 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may have a tumor suppressor function. Chromosomal rearrangements and altered expression of this gene have been implicated in breast cancer. ENSG00000196754 S100 calcium binding protein A2
140576 S100A16 NA ENSG00000188643 S100 calcium binding protein A16
83886 PRSS27 This gene is located within a large protease gene cluster on chromosome 16. It belongs to the group-1 subfamily of serine proteases. The encoded protein is a secreted tryptic serine protease and is expressed mainly in the pancreas. Alternative splicing results in multiple transcript variants. ENSG00000172382 protease, serine 27
6699 SPRR1B The protein encoded by this gene is an envelope protein of keratinocytes. The encoded protein is crosslinked to membrane proteins by transglutaminase, forming an insoluble layer under the plasma membrane. This protein is proline-rich and contains several tandem amino acid repeats. ENSG00000169469 small proline rich protein 1B
64855 FAM129B NA ENSG00000136830 family with sequence similarity 129 member B
1824 DSC2 This gene encodes a member of the desmocollin protein subfamily. Desmocollins, along with desmogleins, are cadherin-like transmembrane glycoproteins that are major components of the desmosome. Desmosomes are cell-cell junctions that help resist shearing forces and are found in high concentrations in cells subject to mechanical stress. This gene is found in a cluster with other desmocollin family members on chromosome 18. Mutations in this gene are associated with arrhythmogenic right ventricular dysplasia-11, and reduced protein expression has been described in several types of cancer. Alternative splicing results in multiple transcript variants. ENSG00000134755 desmocollin 2
ENSG00000234964 FABP5P7 NA ENSG00000234964 fatty acid binding protein 5 pseudogene 7
5266 PI3 This gene encodes an elastase-specific inhibitor that functions as an antimicrobial peptide against Gram-positive and Gram-negative bacteria, and fungal pathogens. The protein contains a WAP-type four-disulfide core (WFDC) domain, and is thus a member of the WFDC domain family. Most WFDC gene members are localized to chromosome 20q12-q13 in two clusters: centromeric and telomeric. This gene belongs to the centromeric cluster. Expression of this gene is upregulated by bacterial lipopolysaccharides and cytokines. ENSG00000124102 peptidase inhibitor 3
11272 PRR4 This gene encodes a member of the proline-rich protein family that lacks a conserved repetitive domain. This protein may play a role in protective functions in the eye. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the upstream PRH1 (proline-rich protein HaeIII subfamily 1) gene. ENSG00000111215 proline rich 4 (lacrimal)
84518 CNFN NA ENSG00000105427 cornifelin
1382 CRABP2 This gene encodes a member of the retinoic acid (RA, a form of vitamin A) binding protein family and lipocalin/cytosolic fatty-acid binding protein family. The protein is a cytosol-to-nuclear shuttling protein, which facilitates RA binding to its cognate receptor complex and transfer to the nucleus. It is involved in the retinoid signaling pathway, and is associated with increased circulating low-density lipoprotein cholesterol. Alternatively spliced transcript variants encoding the same protein have been found for this gene. ENSG00000143320 cellular retinoic acid binding protein 2
54544 CRCT1 NA ENSG00000169509 cysteine rich C-terminal 1
6279 S100A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000143546 S100 calcium binding protein A8
360 AQP3 This gene encodes the water channel protein aquaporin 3. Aquaporins are a family of small integral membrane proteins related to the major intrinsic protein, also known as aquaporin 0. Aquaporin 3 is localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that water channels can be functionally heterogeneous and possess water and solute permeation mechanisms. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. ENSG00000165272 aquaporin 3 (Gill blood group)
218 ALDH3A1 Aldehyde dehydrogenases oxidize various aldehydes to the corresponding acids. They are involved in the detoxification of alcohol-derived acetaldehyde and in the metabolism of corticosteroids, biogenic amines, neurotransmitters, and lipid peroxidation. The enzyme encoded by this gene forms a cytoplasmic homodimer that preferentially oxidizes aromatic and medium-chain (6 carbons or more) saturated and unsaturated aldehyde substrates. It is thought to promote resistance to UV and 4-hydroxy-2-nonenal-induced oxidative damage in the cornea. The gene is located within the Smith-Magenis syndrome region on chromosome 17. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000108602 aldehyde dehydrogenase 3 family member A1
30001 ERO1A NA ENSG00000197930 endoplasmic reticulum oxidoreductase alpha
2950 GSTP1 Glutathione S-transferases (GSTs) are a family of enzymes that play an important role in detoxification by catalyzing the conjugation of many hydrophobic and electrophilic compounds with reduced glutathione. Based on their biochemical, immunologic, and structural properties, the soluble GSTs are categorized into 4 main classes: alpha, mu, pi, and theta. This GST family member is a polymorphic gene encoding active, functionally different GSTP1 variant proteins that are thought to function in xenobiotic metabolism and play a role in susceptibility to cancer, and other diseases. ENSG00000084207 glutathione S-transferase pi 1
6590 SLPI This gene encodes a secreted inhibitor which protects epithelial tissues from serine proteases. It is found in various secretions including seminal plasma, cervical mucus, and bronchial secretions, and has affinity for trypsin, leukocyte elastase, and cathepsin G. Its inhibitory effect contributes to the immune response by protecting epithelial surfaces from attack by endogenous proteolytic enzymes. This antimicrobial protein has antibacterial, antifungal and antiviral activity. ENSG00000124107 secretory leukocyte peptidase inhibitor
3868 KRT16 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region of chromosome 17q12-q21. This keratin has been coexpressed with keratin 14 in a number of epithelial tissues, including esophagus, tongue, and hair follicles. Mutations in this gene are associated with type 1 pachyonychia congenita, non-epidermolytic palmoplantar keratoderma and unilateral palmoplantar verrucous nevus. ENSG00000186832 keratin 16
3866 KRT15 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. ENSG00000171346 keratin 15
810 CALML3 NA ENSG00000178363 calmodulin like 3
10205 MPZL2 Thymus development depends on a complex series of interactions between thymocytes and the stromal component of the organ. Epithelial V-like antigen (EVA) is expressed in thymus epithelium and strongly downregulated by thymocyte developmental progression. This gene is expressed in the thymus and in several epithelial structures early in embryogenesis. It is highly homologous to the myelin protein zero and, in thymus-derived epithelial cell lines, is poorly soluble in nonionic detergents, strongly suggesting an association to the cytoskeleton. Its capacity to mediate cell adhesion through a homophilic interaction and its selective regulation by T cell maturation might imply the participation of EVA in the earliest phases of thymus organogenesis. The protein bears a characteristic V-type domain and two potential N-glycosylation sites in the extracellular domain; a putative serine phosphorylation site for casein kinase 2 is also present in the cytoplasmic tail. Two transcript variants encoding the same protein have been found for this gene. ENSG00000149573 myelin protein zero like 2
4070 TACSTD2 This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. ENSG00000184292 tumor-associated calcium signal transducer 2
147645 VSIG10L NA ENSG00000186806 V-set and immunoglobulin domain containing 10 like
2171 FABP5 This gene encodes the fatty acid binding protein found in epidermal cells, and was first identified as being upregulated in psoriasis tissue. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. FABPs may play roles in fatty acid uptake, transport, and metabolism. Polymorphisms in this gene are associated with type 2 diabetes. The human genome contains many pseudogenes similar to this locus. ENSG00000164687 fatty acid binding protein 5
1969 EPHA2 This gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. This gene encodes a protein that binds ephrin-A ligands. Mutations in this gene are the cause of certain genetically-related cataract disorders. ENSG00000142627 EPH receptor A2
6278 S100A7 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein differs from the other S100 proteins of known structure in its lack of calcium binding ability in one EF-hand at the N-terminus. The protein is overexpressed in hyperproliferative skin diseases, exhibits antimicrobial activities against bacteria and induces immunomodulatory activities. ENSG00000143556 S100 calcium binding protein A7
10890 RAB10 RAB10 belongs to the RAS (see HRAS; MIM 190020) superfamily of small GTPases. RAB proteins localize to exocytic and endocytic compartments and regulate intracellular vesicle trafficking (Bao et al., 1998 [PubMed 9918381]). ENSG00000084733 RAB10, member RAS oncogene family
54055 CYP4F29P NA ENSG00000228314 cytochrome P450 family 4 subfamily F member 29, pseudogene
375791 CYSRT1 NA ENSG00000197191 cysteine rich tail 1
7295 TXN The protein encoded by this gene acts as a homodimer and is involved in many redox reactions. The encoded protein is active in the reversible S-nitrosylation of cysteines in certain proteins, which is part of the response to intracellular nitric oxide. This protein is found in the cytoplasm. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000136810 thioredoxin
23508 TTC9 This gene encodes a protein that contains three tetratricopeptide repeats. The gene has been shown to be hormonally regulated in breast cancer cells and may play a role in cancer cell invasion and metastasis. ENSG00000133985 tetratricopeptide repeat domain 9
64787 EPS8L2 This gene encodes a member of the EPS8 gene family. The encoded protein, like other members of the family, is thought to link growth factor stimulation to actin organization, generating functional redundancy in the pathways that regulate actin cytoskeletal remodeling. ENSG00000177106 EPS8 like 2
3315 HSPB1 The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN). ENSG00000106211 heat shock protein family B (small) member 1
3852 KRT5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000186081 keratin 5
2810 SFN NA ENSG00000175793 stratifin
823 CAPN1 The calpains, calcium-activated neutral proteases, are nonlysosomal, intracellular cysteine proteases. The mammalian calpains include ubiquitous, stomach-specific, and muscle-specific proteins. The ubiquitous enzymes consist of heterodimers with distinct large, catalytic subunits associated with a common small, regulatory subunit. This gene encodes the large subunit of the ubiquitous enzyme, calpain 1. Several transcript variants encoding two different isoforms have been found for this gene. ENSG00000014216 calpain 1
8581 LY6D NA ENSG00000167656 lymphocyte antigen 6 complex, locus D
6281 S100A10 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in exocytosis and endocytosis. ENSG00000197747 S100 calcium binding protein A10
202 AIM1 NA ENSG00000112297 absent in melanoma 1
ENSG00000256462 RP11-116G8.5 NA ENSG00000256462 NA
54869 EPS8L1 This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. At least two alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000131037 EPS8 like 1
10509 SEMA4B NA ENSG00000185033 semaphorin 4B
84790 TUBA1C NA ENSG00000167553 tubulin alpha 1c
928 CD9 This gene encodes a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Tetraspanins are cell surface glycoproteins with four transmembrane domains that form multimeric complexes with other cell surface proteins. The encoded protein functions in many cellular processes including differentiation, adhesion, and signal transduction, and expression of this gene plays a critical role in the suppression of cancer cell motility and metastasis. ENSG00000010278 CD9 molecule
29766 TMOD3 NA ENSG00000138594 tropomodulin 3
239 ALOX12 NA ENSG00000108839 arachidonate 12-lipoxygenase
10755 GIPC1 GIPC1 is a scaffolding protein that regulates cell surface receptor expression and trafficking (Lee et al., 2008 [PubMed 18775991]). ENSG00000123159 GIPC PDZ domain containing family member 1
23170 TTLL12 NA ENSG00000100304 tubulin tyrosine ligase like 12
22822 PHLDA1 This gene encodes an evolutionarily conserved proline-histidine rich nuclear protein. The encoded protein may play an important role in the anti-apoptotic effects of insulin-like growth factor-1. ENSG00000139289 pleckstrin homology like domain family A member 1
7534 YWHAZ This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and sheep orthologs. The encoded protein interacts with IRS1 protein, suggesting a role in regulating insulin sensitivity. Several transcript variants that differ in the 5’ UTR but that encode the same protein have been identified for this gene. ENSG00000164924 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta
6704 SPRR2E This gene encodes a member of a family of small proline-rich proteins clustered in the epidermal differentiation complex on chromosome 1q21. The encoded protein, along with other family members, is a component of the cornified cell envelope that forms beneath the plasma membrane in terminally differentiated stratified squamous epithelia. This envelope serves as a barrier against extracellular and environmental factors. The seven SPRR2 genes (A-G) appear to have been homogenized by gene conversion compared to others in the cluster that exhibit greater differences in protein structure. ENSG00000203785 small proline rich protein 2E
3038 HAS3 The protein encoded by this gene is involved in the synthesis of the unbranched glycosaminoglycan hyaluronan, or hyaluronic acid, which is a major constituent of the extracellular matrix. This gene is a member of the NODC/HAS gene family. Compared to the proteins encoded by other members of this gene family, this protein appears to be more of a regulator of hyaluronan synthesis. Alternative splicing results in multiple transcript variants. ENSG00000103044 hyaluronan synthase 3
ENSG00000258232 RP11-161H23.5 NA ENSG00000258232 NA
375449 MAST4 NA ENSG00000069020 microtubule associated serine/threonine kinase family member 4
9022 CLIC3 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 3 is a member of the p64 family and is predominantly localized in the nucleus and stimulates chloride ion channel activity. In addition, this protein may participate in cellular growth control, based on its association with ERK7, a member of the MAP kinase family. ENSG00000169583 chloride intracellular channel 3
9748 SLK NA ENSG00000065613 STE20 like kinase
1500 CTNND1 This gene encodes a member of the Armadillo protein family, which function in adhesion between cells and signal transduction. Multiple translation initiation codons and alternative splicing result in many different isoforms being translated. Not all of the full-length natures of the described transcript variants have been determined. Read-through transcription also exists between this gene and the neighboring upstream thioredoxin-related transmembrane protein 2 (TMX2) gene. ENSG00000198561 catenin delta 1
8553 BHLHE40 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. ENSG00000134107 basic helix-loop-helix family member e40
9525 VPS4B The protein encoded by this gene is a member of the AAA protein family (ATPases associated with diverse cellular activities), and is the homolog of the yeast Vps4 protein. In humans, two paralogs of the yeast protein have been identified. The former share a high degree of aa sequence similarity with each other, and also with yeast Vps4 and mouse Skd1 proteins. Mouse Skd1 (suppressor of K+ transport defect 1) has been shown to be a yeast Vps4 ortholog. Functional studies indicate that both human paralogs associate with the endosomal compartments, and are involved in intracellular protein trafficking, similar to Vps4 protein in yeast. The gene encoding this paralog has been mapped to chromosome 18; the gene for the other resides on chromosome 16. ENSG00000119541 vacuolar protein sorting 4 homolog B
3934 LCN2 This gene encodes a protein that belongs to the lipocalin family. Members of this family transport small hydrophobic molecules such as lipids, steroid hormones and retinoids. The protein encoded by this gene is a neutrophil gelatinase-associated lipocalin and plays a role in innate immunity by limiting bacterial growth as a result of sequestering iron-containing siderophores. The presence of this protein in blood and urine is an early biomarker of acute kidney injury. This protein is thought to be involved in multiple cellular processes, including maintenance of skin homeostasis, and suppression of invasiveness and metastasis. Mice lacking this gene are more susceptible to bacterial infection than wild type mice. ENSG00000148346 lipocalin 2
4780 NFE2L2 This gene encodes a transcription factor which is a member of a small family of basic leucine zipper (bZIP) proteins. The encoded transcription factor regulates genes which contain antioxidant response elements (ARE) in their promoters; many of these genes encode proteins involved in response to injury and inflammation which includes the production of free radicals. Multiple transcript variants encoding different isoforms have been characterized for this gene. ENSG00000116044 nuclear factor, erythroid 2 like 2
54809 SAMD9 This gene encodes a sterile alpha motif domain-containing protein. The encoded protein localizes to the cytoplasm and may play a role in regulating cell proliferation and apoptosis. Mutations in this gene are the cause of normophosphatemic familial tumoral calcinosis. Alternate splicing results in multiple transcript variants that encode the same protein. ENSG00000205413 sterile alpha motif domain containing 9
5875 RABGGTA NA ENSG00000100949 Rab geranylgeranyltransferase alpha subunit
29984 RHOD Ras homolog, or Rho, proteins interact with protein kinases and may serve as targets for activated GTPase. They play a critical role in muscle differentiation. The protein encoded by this gene binds GTP and is a member of the small GTPase superfamily. It is involved in endosome dynamics and reorganization of the actin cytoskeleton, and it may coordinate membrane transport with the function of the cytoskeleton. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000173156 ras homolog family member D
57111 RAB25 The protein encoded by this gene is a member of the RAS superfamily of small GTPases. The encoded protein is involved in membrane trafficking and cell survival. This gene has been found to be a tumor suppressor and an oncogene, depending on the context. Two variants, one protein-coding and the other not, have been found for this gene. ENSG00000132698 RAB25, member RAS oncogene family
8780 RIOK3 This gene was identified by the similarity of its product to the Aspergillus nidulans SUDD protein, an extragenic suppressor of the heat-sensitive bimD6 mutation that fails to attach properly to the spindle microtubules at a restrictive temperature. The specific function of this gene has not yet been determined. ENSG00000101782 RIO kinase 3
286077 FAM83H The protein encoded by this gene plays an important role in the structural development and calcification of tooth enamel. Defects in this gene are a cause of amelogenesis imperfecta type 3 (AI3). ENSG00000180921 family with sequence similarity 83 member H
150696 PROM2 This gene encodes a member of the prominin family of pentaspan membrane glycoproteins. The encoded protein localizes to basal epithelial cells and may be involved in the organization of plasma membrane microdomains. Alternative splicing results in multiple transcript variants. ENSG00000155066 prominin 2
134266 GRPEL2 NA ENSG00000164284 GrpE like 2, mitochondrial
639 PRDM1 This gene encodes a protein that acts as a repressor of beta-interferon gene expression. The protein binds specifically to the PRDI (positive regulatory domain I element) of the beta-IFN gene promoter. Transcription of this gene increases upon virus induction. Two alternatively spliced transcript variants that encode different isoforms have been reported. ENSG00000057657 PR domain 1
8895 CPNE3 Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene encodes a protein which contains two type II C2 domains in the amino-terminus and an A domain-like sequence in the carboxy-terminus. The A domain mediates interactions between integrins and extracellular ligands. ENSG00000085719 copine 3
9368 SLC9A3R1 This gene encodes a sodium/hydrogen exchanger regulatory cofactor. The protein interacts with and regulates various proteins including the cystic fibrosis transmembrane conductance regulator and G-protein coupled receptors such as the beta2-adrenergic receptor and the parathyroid hormone 1 receptor. The protein also interacts with proteins that function as linkers between integral membrane and cytoskeletal proteins. The protein localizes to actin-rich structures including membrane ruffles, microvilli, and filopodia. Mutations in this gene result in hypophosphatemic nephrolithiasis/osteoporosis type 2, and loss of heterozygosity of this gene is implicated in breast cancer. ENSG00000109062 SLC9A3 regulator 1
8140 SLC7A5 NA ENSG00000103257 solute carrier family 7 member 5
5338 PLD2 The protein encoded by this gene catalyzes the hydrolysis of phosphatidylcholine to phosphatidic acid and choline. The activity of the encoded enzyme is enhanced by phosphatidylinositol 4,5-bisphosphate and ADP-ribosylation factor-1. This protein localizes to the peripheral membrane and may be involved in cytoskeletal organization, cell cycle control, transcriptional regulation, and/or regulated secretion. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000129219 phospholipase D2
write.table(as.factor(out$query), paste0("../utilities/gene_names_clus_",15,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
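
The write step above is repeated once per cluster with only the cluster number changing. A minimal sketch of how it could be wrapped in a helper is given below; the function name write_cluster_ids and its out_dir argument are hypothetical and not part of the original script, and the output is assumed to match the write.table call above (one Ensembl ID per line, no header, no quotes).

```r
# Hypothetical helper (not part of the original script): write the queried
# Ensembl IDs for cluster `k` to <out_dir>/gene_names_clus_<k>.txt,
# one ID per line, without header, row names, or quotes.
write_cluster_ids <- function(ids, k, out_dir = "../utilities") {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(ids), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the file path so callers can check or reuse it
}

# Usage with the objects defined in this script (assumes `out` holds the
# mygene result for cluster 15):
# write_cluster_ids(out$query, 15)
```

Returning the path invisibly keeps the console output unchanged while still letting a later step read the file back.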

Cluster 16 Annotations

out <- mygene::queryMany(gene_list[16,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary query name X_id symbol notfound
This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. ENSG00000168878 surfactant protein B 6439 SFTPB NA
This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. ENSG00000122852 surfactant protein A1 653509 SFTPA1 NA
This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. ENSG00000185303 surfactant protein A2 729238 SFTPA2 NA
This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. ENSG00000168484 surfactant protein C 6440 SFTPC NA
Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. ENSG00000175899 alpha-2-macroglobulin 2 A2M NA
This gene encodes a member of the peptidase A1 family of aspartic proteases. The encoded preproprotein is proteolytically processed to generate an activation peptide and the mature protease. The activation peptides of aspartic proteinases function as inhibitors of the protease active site. These peptide segments, or pro-parts, are deemed important for correct folding, targeting, and control of the activation of aspartic proteinase zymogens. The encoded protease may play a role in the proteolytic processing of pulmonary surfactant protein B in the lung and may function in protein catabolism in the renal proximal tubules. This gene has been described as a marker for lung adenocarcinoma and renal cell carcinoma. ENSG00000131400 napsin A aspartic peptidase 9476 NAPSA NA
The advanced glycosylation end product (AGE) receptor encoded by this gene is a member of the immunoglobulin superfamily of cell surface receptors. It is a multiligand receptor, and besides AGE, interacts with other molecules implicated in homeostasis, development, and inflammation, and certain diseases, such as diabetes and Alzheimer’s disease. Many alternatively spliced transcript variants encoding different isoforms, as well as non-protein-coding variants, have been described for this gene (PMID:18089847). ENSG00000204305 advanced glycosylation end product-specific receptor 177 AGER NA
This gene encodes a member of the type 3 G protein-coupling receptor family, characterized by the signature 7-transmembrane domain motif. The encoded protein may be involved in interaction between retinoic acid and G protein signalling pathways. Retinoic acid plays a critical role in development, cellular growth, and differentiation. This gene may play a role in embryonic development and epithelial cell differentiation. ENSG00000013588 G protein-coupled receptor class C group 5 member A 9052 GPRC5A NA
This gene encodes a transcription factor involved in the induction of genes regulated by oxygen, which is induced as oxygen levels fall. The encoded protein contains a basic-helix-loop-helix domain protein dimerization domain as well as a domain found in proteins in signal transduction pathways which respond to oxygen levels. Mutations in this gene are associated with erythrocytosis familial type 4. ENSG00000116016 endothelial PAS domain protein 1 2034 EPAS1 NA
Ficolins are a group of proteins which consist of a collagen-like domain and a fibrinogen-like domain. In human serum, there are two types of ficolins, both of which have lectin activity. The protein encoded by this gene is a thermolabile beta-2-macroglycoprotein found in all human serum and is a member of the ficolin/opsonin p35 lectin family. The protein, which was initially identified based on its reactivity with sera from patients with systemic lupus erythematosus, has been shown to have a calcium-independent lectin activity. The protein can activate the complement pathway in association with MASPs and sMAP, thereby aiding in host defense through the activation of the lectin pathway. Alternative splicing occurs at this locus and two variants, each encoding a distinct isoform, have been identified. ENSG00000142748 ficolin 3 8547 FCN3 NA
This gene encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. It binds to integrins of type CD11a / CD18, or CD11b / CD18 and is also exploited by Rhinovirus as a receptor. ENSG00000090339 intercellular adhesion molecule 1 3383 ICAM1 NA
The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. The full transporter encoded by this gene may be involved in development of resistance to xenobiotics and engulfment during programmed cell death. ENSG00000167972 ATP binding cassette subfamily A member 3 21 ABCA3 NA
This gene encodes a prostaglandin transporter that is a member of the 12-membrane-spanning superfamily of transporters. The encoded protein may be involved in mediating the uptake and clearance of prostaglandins in numerous tissues. ENSG00000174640 solute carrier organic anion transporter family member 2A1 6578 SLCO2A1 NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. ENSG00000135480 keratin 7 3855 KRT7 NA
N-methylation of endogenous and xenobiotic compounds is a major method by which they are degraded. This gene encodes an enzyme that N-methylates indoles such as tryptamine. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the downstream FAM188B (family with sequence similarity 188, member B) gene. ENSG00000241644 indolethylamine N-methyltransferase 11185 INMT NA
NA ENSG00000166292 transmembrane protein 100 55273 TMEM100 NA
This gene encodes a tetraspan protein of the PMP22/EMP family. The encoded protein regulates cell membrane composition. It has been associated with various functions including endocytosis, cell signaling, cell proliferation, cell migration, cell adhesion, cell death, cholesterol homeostasis, urinary albumin excretion, and embryo implantation. It is known to negatively regulate caveolin-1, a scaffolding protein which is the main component of the caveolae plasma membrane invaginations found in most cell types. Through activation of PTK2 it positively regulates vascular endothelial growth factor A. It also modulates the function of specific integrin isomers in the plasma membrane. Up-regulation of this gene has been linked to cancer progression in multiple different tissues. Mutations in this gene have been associated with nephrotic syndrome type 10 (NPHS10). ENSG00000213853 epithelial membrane protein 2 2013 EMP2 NA
This gene is thought to regulate cell cycle progression. It is induced by p53 in response to DNA damage, or by sublytic levels of complement system proteins that result in activation of the cell cycle. The encoded protein localizes to the cytoplasm during interphase and to centrosomes during mitosis. The protein forms a complex with polo-like kinase 1. The protein also translocates to the nucleus in response to treatment with complement system proteins, and can associate with and increase the kinase activity of cell division cycle 2 protein. In different assays and cell types, overexpression of this protein has been shown to activate or suppress cell cycle progression. ENSG00000102760 regulator of cell cycle 28984 RGCC NA
This gene encodes a member of the SLC39 family of solute-carrier genes, which show structural characteristics of zinc transporters. The encoded protein is glycosylated and found in the plasma membrane and mitochondria, and functions in the cellular import of zinc at the onset of inflammation. It is also thought to be the primary transporter of the toxic cation cadmium, which is found in cigarette smoke. Multiple transcript variants encoding different isoforms have been found for this gene. Additional alternatively spliced transcript variants of this gene have been described, but their full-length nature is not known. ENSG00000138821 solute carrier family 39 member 8 64116 SLC39A8 NA
NA ENSG00000267607 NA ENSG00000267607 CTD-2369P2.8 NA
The protein encoded by this gene is a cytokine that controls the production, differentiation, and function of granulocytes. The active protein is found extracellularly. Alternatively spliced transcript variants have been described for this gene. ENSG00000108342 colony stimulating factor 3 1440 CSF3 NA
NA ENSG00000127603 KIAA0754 643314 KIAA0754 NA
This gene encodes a large protein containing numerous spectrin and leucine-rich repeat (LRR) domains. The encoded protein is a member of a family of proteins that form bridges between different cytoskeletal elements. This protein facilitates actin-microtubule interactions at the cell periphery and couples the microtubule network to cellular junctions. Alternative splicing results in multiple transcript variants, but the full-length nature of some of these variants has not been determined. ENSG00000127603 microtubule-actin crosslinking factor 1 23499 MACF1 NA
This gene encodes a member of the sialomucin protein family. The encoded protein was originally identified as an important component of glomerular podocytes. Podocytes are highly differentiated epithelial cells with interdigitating foot processes covering the outer aspect of the glomerular basement membrane. Other biological activities of the encoded protein include: binding in a membrane protein complex with Na+/H+ exchanger regulatory factor to intracellular cytoskeletal elements, playing a role in hematopoietic cell differentiation, and being expressed in vascular endothelium cells and binding to L-selectin. ENSG00000128567 podocalyxin like 5420 PODXL NA
NA ENSG00000135048 transmembrane protein 2 23670 TMEM2 NA
This gene encodes a type I cell-surface receptor for the TGF-beta superfamily of ligands. It shares with other type I receptors a high degree of similarity in serine-threonine kinase subdomains, a glycine- and serine-rich region (called the GS domain) preceding the kinase domain, and a short C-terminal tail. The encoded protein, sometimes termed ALK1, shares similar domain structures with other closely related ALK or activin receptor-like kinase proteins that form a subfamily of receptor serine/threonine kinases. Mutations in this gene are associated with hemorrhagic telangiectasia type 2, also known as Rendu-Osler-Weber syndrome 2. ENSG00000139567 activin A receptor like type 1 94 ACVRL1 NA
This gene encodes a protein containing a lipid recognition domain. The encoded protein may function in regulating the transport of cholesterol through the late endosomal/lysosomal system. Mutations in this gene have been associated with Niemann-Pick disease, type C2 and frontal lobe atrophy. ENSG00000119655 Niemann-Pick disease, type C2 10577 NPC2 NA
This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000145506 naked cuticle homolog 2 85409 NKD2 NA
The protein encoded by this gene is a member of the class A scavenger receptor family and is part of the innate antimicrobial immune system. The protein may bind both Gram-negative and Gram-positive bacteria via an extracellular, C-terminal, scavenger receptor cysteine-rich (SRCR) domain. In addition to short cytoplasmic and transmembrane domains, there is an extracellular spacer domain and a long, extracellular collagenous domain. The protein may form a trimeric molecule by the association of the collagenous domains of three identical polypeptide chains. ENSG00000019169 macrophage receptor with collagenous structure 8685 MARCO NA
This gene encodes a weak acid-active hyaluronidase. The encoded protein is similar in structure to other more active hyaluronidases. Hyaluronidases degrade hyaluronan, one of the major glycosaminoglycans of the extracellular matrix. Hyaluronan and fragments of hyaluronan are thought to be involved in cell proliferation, migration and differentiation. Although it was previously thought to be a lysosomal hyaluronidase that is active at a pH below 4, the encoded protein is likely a GPI-anchored cell surface protein. This hyaluronidase serves as a receptor for the oncogenic virus Jaagsiekte sheep retrovirus. The gene is one of several related genes in a region of chromosome 3p21.3 associated with tumor suppression. This gene encodes two alternatively spliced transcript variants which differ only in the 5’ UTR. ENSG00000068001 hyaluronoglucosaminidase 2 8692 HYAL2 NA
This gene encodes a member of the NF-kappa-B inhibitor family, which contain multiple ankyrin repeat domains. The encoded protein interacts with REL dimers to inhibit NF-kappa-B/REL complexes which are involved in inflammatory responses. The encoded protein moves between the cytoplasm and the nucleus via a nuclear localization signal and CRM1-mediated nuclear export. Mutations in this gene have been found in ectodermal dysplasia anhidrotic with T-cell immunodeficiency autosomal dominant disease. ENSG00000100906 NFKB inhibitor alpha 4792 NFKBIA NA
The protein encoded by this gene is a lysosomal cysteine proteinase important in the overall degradation of lysosomal proteins. It is composed of a dimer of disulfide-linked heavy and light chains, both produced from a single protein precursor. The encoded protein, which belongs to the peptidase C1 protein family, can act both as an aminopeptidase and as an endopeptidase. Increased expression of this gene has been correlated with malignant progression of prostate tumors. Alternate splicing of this gene results in multiple transcript variants encoding different isoforms. ENSG00000103811 cathepsin H 1512 CTSH NA
Prostaglandin-endoperoxide synthase (PTGS), also known as cyclooxygenase, is the key enzyme in prostaglandin biosynthesis, and acts both as a dioxygenase and as a peroxidase. There are two isozymes of PTGS: a constitutive PTGS1 and an inducible PTGS2, which differ in their regulation of expression and tissue distribution. This gene encodes the inducible isozyme. It is regulated by specific stimulatory events, suggesting that it is responsible for the prostanoid biosynthesis involved in inflammation and mitogenesis. ENSG00000073756 prostaglandin-endoperoxide synthase 2 5743 PTGS2 NA
NA ENSG00000161055 secretoglobin family 3A member 1 92304 SCGB3A1 NA
This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. ENSG00000139329 lumican 4060 LUM NA
The protein encoded by this gene is an integral membrane ATPase. The encoded protein is probably phosphorylated in its intermediate state and likely drives the transport of ions such as calcium across membranes. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000068650 ATPase phospholipid transporting 11A 23250 ATP11A NA
This gene encodes the class A macrophage scavenger receptors, which include three different types (1, 2, 3) generated by alternative splicing of this gene. These receptors or isoforms are macrophage-specific trimeric integral membrane glycoproteins and have been implicated in many macrophage-associated physiological and pathological processes including atherosclerosis, Alzheimer’s disease, and host defense. The isoforms type 1 and type 2 are functional receptors and are able to mediate the endocytosis of modified low density lipoproteins (LDLs). The isoform type 3 does not internalize modified LDL (acetyl-LDL) despite having the domain shown to mediate this function in the types 1 and 2 isoforms. It has an altered intracellular processing and is trapped within the endoplasmic reticulum, making it unable to perform endocytosis. The isoform type 3 can inhibit the function of isoforms type 1 and type 2 when co-expressed, indicating a dominant negative effect and suggesting a mechanism for regulation of scavenger receptor activity in macrophages. ENSG00000038945 macrophage scavenger receptor 1 4481 MSR1 NA
This gene is one of several cytokine genes clustered on the q-arm of chromosome 17. Chemokines are a superfamily of secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of N-terminal cysteine residues of the mature peptide. This chemokine is a member of the CC subfamily which is characterized by two adjacent cysteine residues. This cytokine displays chemotactic activity for monocytes and basophils but not for neutrophils or eosinophils. It has been implicated in the pathogenesis of diseases characterized by monocytic infiltrates, like psoriasis, rheumatoid arthritis and atherosclerosis. It binds to chemokine receptors CCR2 and CCR4. ENSG00000108691 C-C motif chemokine ligand 2 6347 CCL2 NA
Lysophosphatidylcholine (LPC) acyltransferase (LPCAT; EC 2.3.1.23) catalyzes the conversion of LPC to phosphatidylcholine (PC) in the remodeling pathway of PC biosynthesis (Nakanishi et al., 2006 [PubMed 16704971]). ENSG00000153395 lysophosphatidylcholine acyltransferase 1 79888 LPCAT1 NA
NA ENSG00000099994 sushi domain containing 2 56241 SUSD2 NA
The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP contains an extracellular domain, a single transmembrane segment and one intracytoplasmic catalytic domain, and thus belongs to the receptor type PTPs. The extracellular region of this PTP is composed of multiple fibronectin type III repeats, which were shown to interact with neuronal receptor and cell adhesion molecules, such as contactin and tenascin C. This protein was also found to interact with sodium channels, and thus may regulate sodium channels by altering tyrosine phosphorylation status. The functions of the interaction partners of this protein implicate the roles of this PTP in cell adhesion, neurite growth, and neuronal differentiation. Alternate transcript variants encoding different isoforms have been found for this gene. ENSG00000127329 protein tyrosine phosphatase, receptor type B 5787 PTPRB NA
This gene is expressed in the kidney cortical epithelial cells and is upregulated by hyperglycemia. The encoded protein shares a high level of similarity to the rat homolog, and contains 3 C2 domains and a diacylglycerol-binding C1 domain. Hyperglycemia increases the levels of diacylglycerol, which has been shown to induce apoptosis in cells transfected with this gene and thus contribute to the renal cell complications of hyperglycemia. Studies in other species also indicate a role for this protein in the priming step of synaptic vesicle exocytosis. ENSG00000198722 unc-13 homolog B (C. elegans) 10497 UNC13B NA
NA ENSG00000145860 ring finger protein 145 153830 RNF145 NA
NA ENSG00000064989 calcitonin receptor like receptor 10203 CALCRL NA
The protein encoded by this gene is a pleiotropic cytokine with roles in several different systems. It is involved in the induction of hematopoietic differentiation in normal and myeloid leukemia cells, induction of neuronal cell differentiation, regulator of mesenchymal to epithelial conversion during kidney development, and may also have a role in immune tolerance at the maternal-fetal interface. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000128342 leukemia inhibitory factor 3976 LIF NA
This gene encodes a protease inhibitor that regulates the tissue factor (TF)-dependent pathway of blood coagulation. The coagulation process initiates with the formation of a factor VIIa-TF complex, which proteolytically activates additional proteases (factors IX and X) and ultimately leads to the formation of a fibrin clot. The product of this gene inhibits the activated factor X and VIIa-TF proteases in an autoregulatory loop. The encoded protein is glycosylated and predominantly found in the vascular endothelium and plasma in both free forms and complexed with plasma lipoproteins. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been confirmed. ENSG00000003436 tissue factor pathway inhibitor 7035 TFPI NA
This gene encodes a low density lipoprotein receptor that belongs to the C-type lectin superfamily. This gene is regulated through the cyclic AMP signaling pathway. The encoded protein binds, internalizes and degrades oxidized low-density lipoprotein. This protein may be involved in the regulation of Fas-induced apoptosis. This protein may play a role as a scavenger receptor. Mutations of this gene have been associated with atherosclerosis, risk of myocardial infarction, and may modify the risk of Alzheimer’s disease. Alternate splicing results in multiple transcript variants. ENSG00000173391 oxidized low density lipoprotein receptor 1 4973 OLR1 NA
NA ENSG00000128016 ZFP36 ring finger protein 7538 ZFP36 NA
This gene encodes an iron containing glycoprotein which catalyzes the conversion of orthophosphoric monoester to alcohol and orthophosphate. It is the most basic of the acid phosphatases and is the only form not inhibited by L(+)-tartrate. ENSG00000102575 acid phosphatase 5, tartrate resistant 54 ACP5 NA
This gene encodes one of the Rab11-family interacting proteins (Rab11-FIPs), which play a role in the Rab-11 mediated recycling of vesicles. The encoded protein may be involved in endocytic sorting, trafficking of proteins including integrin subunits and epidermal growth factor receptor (EGFR), and transport between the recycling endosome and the trans-Golgi network. Alternative splicing results in multiple transcript variants. A pseudogene is described on the X chromosome. ENSG00000156675 RAB11 family interacting protein 1 (class I) 80223 RAB11FIP1 NA
This gene is regulated as part of the p53 tumor suppressor pathway. The gene encodes a lysosomal membrane protein that is required for the induction of autophagy by the pathway. Decreased transcriptional expression of this gene is associated with various tumors. This gene has a pseudogene on chromosome 4. ENSG00000136048 DNA damage regulated autophagy modulator 1 55332 DRAM1 NA
NA ENSG00000110328 polypeptide N-acetylgalactosaminyltransferase 18 374378 GALNT18 NA
This gene encodes a member of the NHERF family of PDZ scaffolding proteins. These proteins mediate many cellular processes by binding to and regulating the membrane expression and protein-protein interactions of membrane receptors and transport proteins. The encoded protein plays a role in intestinal sodium absorption by regulating the activity of the sodium/hydrogen exchanger 3, and may also regulate the cystic fibrosis transmembrane regulator (CFTR) ion channel. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000065054 SLC9A3 regulator 2 9351 SLC9A3R2 NA
This gene encodes a small, monomeric, predominantly unstructured protein that functions as a positive regulator of the Wnt/beta-catenin signaling pathway. This protein interacts with a repressor of beta-catenin mediated transcription at nuclear speckles. It is thought to competitively block interactions of the repressor with beta-catenin, resulting in up-regulation of beta-catenin target genes. The encoded protein may also play a role in the NF-kappaB and ERK1/2 signaling pathways. Expression of this gene may play a role in the proliferation of several types of cancer including thyroid cancer, breast cancer and hematological malignancies. ENSG00000176907 chromosome 8 open reading frame 4 56892 C8orf4 NA
This gene encodes a member of the dedicator of cytokinesis (DOCK) family of atypical guanine nucleotide exchange factors. Guanine nucleotide exchange factors interact with small GTPases and are components of intracellular signaling networks. The encoded protein is a group C DOCK protein and plays a role in actin cytoskeletal reorganization by activating the Rho GTPases Cdc42 and Rac1. Mutations in this gene are associated with Adams-Oliver syndrome 2. ENSG00000130158 dedicator of cytokinesis 6 57572 DOCK6 NA
NA ENSG00000138411 HECT, C2 and WW domain containing E3 ubiquitin protein ligase 2 57520 HECW2 NA
The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP3) protein, CRLR functions as an adrenomedullin receptor. ENSG00000122679 receptor (G protein-coupled) activity modifying protein 3 10268 RAMP3 NA
The protein encoded by this gene is a small Rho GTPase and a candidate tumor suppressor. The encoded protein interacts with the cullin-3 protein, a ubiquitin E3 ligase necessary for mitotic cell division. This protein inhibits the growth and spread of some types of breast cancer. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000008853 Rho related BTB domain containing 2 23221 RHOBTB2 NA
This gene encodes a member of the bone morphogenetic protein (BMP) receptor family of transmembrane serine/threonine kinases. The ligands of this receptor are BMPs, which are members of the TGF-beta superfamily. BMPs are involved in endochondral bone formation and embryogenesis. These proteins transduce their signals through the formation of heteromeric complexes of two different types of serine (threonine) kinase receptors: type I receptors of about 50-55 kD and type II receptors of about 70-80 kD. Type II receptors bind ligands in the absence of type I receptors, but they require their respective type I receptors for signaling, whereas type I receptors require their respective type II receptors for ligand binding. Mutations in this gene have been associated with primary pulmonary hypertension, both familial and fenfluramine-associated, and with pulmonary venoocclusive disease. ENSG00000204217 bone morphogenetic protein receptor type II 659 BMPR2 NA
This gene encodes a protein that functions as a molecular scaffold, linking receptors, including group 1 metabotropic glutamate receptors, to neuronal proteins. The encoded protein contains conserved domains, including a leucine zipper sequence, PDZ domain and a C-terminal PDZ-binding motif. Alternately spliced transcript variants have been observed for this gene. ENSG00000161835 GRP1 (general receptor for phosphoinositides 1)-associated scaffold protein 160622 GRASP NA
NA ENSG00000152268 NA NA NA TRUE
NA ENSG00000101187 solute carrier organic anion transporter family member 4A1 28231 SLCO4A1 NA
The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. ENSG00000137857 dual oxidase 1 53905 DUOX1 NA
NA ENSG00000256013 NA ENSG00000256013 RP11-27M24.1 NA
The protein encoded by this gene is a member of the ros/insulin receptor family of tyrosine kinases. Tyrosine-specific phosphorylation of proteins is a key to the control of diverse pathways leading to cell growth and differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000062524 leukocyte receptor tyrosine kinase 4058 LTK NA
This gene was identified as a gene whose expression can be induced by the tumor necrosis factor alpha (TNF) in umbilical vein endothelial cells. The expression of this gene was shown to be induced by retinoic acid in a cell line expressing an oncogenic version of the retinoic acid receptor alpha fusion protein, which suggested that this gene may be a retinoic acid target gene in acute promyelocytic leukemia. ENSG00000185215 TNF alpha induced protein 2 7127 TNFAIP2 NA
This gene encodes a cytokine receptor that specifically binds interleukin 15 (IL15) with high affinity. The receptors of IL15 and IL2 share two subunits, IL2R beta and IL2R gamma. This forms the basis of many overlapping biological activities of IL15 and IL2. The protein encoded by this gene is structurally related to IL2R alpha, an additional IL2-specific alpha subunit necessary for high affinity IL2 binding. Unlike IL2RA, IL15RA is capable of binding IL15 with high affinity independent of other subunits, which suggests distinct roles between IL15 and IL2. This receptor is reported to enhance cell proliferation and expression of apoptosis inhibitor BCL2L1/BCL2-XL and BCL2. Multiple alternatively spliced transcript variants of this gene have been reported. ENSG00000134470 interleukin 15 receptor subunit alpha 3601 IL15RA NA
NA ENSG00000149564 endothelial cell adhesion molecule 90952 ESAM NA
CDC25B is a member of the CDC25 family of phosphatases. CDC25B activates the cyclin dependent kinase CDC2 by removing two phosphate groups and it is required for entry into mitosis. CDC25B shuttles between the nucleus and the cytoplasm due to nuclear localization and nuclear export signals. The protein is nuclear in the M and G1 phases of the cell cycle and moves to the cytoplasm during S and G2. CDC25B has oncogenic properties, although its role in tumor formation has not been determined. Multiple transcript variants for this gene exist. ENSG00000101224 cell division cycle 25B 994 CDC25B NA
The protein encoded by this gene is a member of the fibroblast growth factor receptor family, where amino acid sequence is highly conserved between members and throughout evolution. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. The genomic organization of this gene, compared to members 1-3, encompasses 18 exons rather than 19 or 20. Although alternative splicing has been observed, there is no evidence that the C-terminal half of the IgIII domain of this protein varies between three alternate forms, as indicated for members 1-3. This particular family member preferentially binds acidic fibroblast growth factor and, although its specific function is unknown, it is overexpressed in gynecological tumor samples, suggesting a role in breast and ovarian tumorigenesis. ENSG00000160867 fibroblast growth factor receptor 4 2264 FGFR4 NA
NA ENSG00000213626 limb bud and heart development 81606 LBH NA
This gene encodes a protein that contains domains of thioredoxin and ERV1, members of two long-standing gene families. The gene expression is induced as fibroblasts begin to exit the proliferative cycle and enter quiescence, suggesting that this gene plays an important role in growth regulation. Two transcript variants encoding two different isoforms have been found for this gene. ENSG00000116260 quiescin sulfhydryl oxidase 1 5768 QSOX1 NA
NA ENSG00000006210 C-X3-C motif chemokine ligand 1 6376 CX3CL1 NA
The protein encoded by this gene is a member of a small family of proteins which bind single stranded DNA/RNA. These proteins are characterized by the presence of two sets of ribonucleoprotein consensus sequence (RNP-CS) that contain conserved motifs, RNP1 and RNP2, originally described in RNA binding proteins, and required for DNA binding. The RBMS proteins have been implicated in such diverse functions as DNA replication, gene transcription, cell cycle progression and apoptosis. This protein was isolated by phenotypic complementation of cdc2 and cdc13 mutants of yeast and is thought to suppress cdc2 and cdc13 mutants through the induction of translation of cdc2. ENSG00000076067 RNA binding motif single stranded interacting protein 2 5939 RBMS2 NA
The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. ENSG00000106537 tetraspanin 13 27075 TSPAN13 NA
C16ORF30 plays a role in cell adhesion and cellular permeability at adherens junctions (Kearsey et al., 2004 [PubMed 15206924]). ENSG00000131634 transmembrane protein 204 79652 TMEM204 NA
Guanylate-binding proteins, such as GBP4, are induced by interferon and hydrolyze GTP to both GDP and GMP (Vestal, 2005 [PubMed 16108726]). ENSG00000162654 guanylate binding protein 4 115361 GBP4 NA
This gene encodes a calcium-independent phospholipid-binding protein whose expression increases in serum-starved cells. This protein is a substrate for protein kinase C (PKC) phosphorylation and recruits polymerase I and transcript release factor (PTRF) to caveolae. Removal of this protein causes caveolae loss and its over-expression results in caveolae deformation and membrane tubulation. ENSG00000168497 serum deprivation response 8436 SDPR NA
NA ENSG00000245025 NA ENSG00000245025 RP11-875O11.1 NA
NA ENSG00000152527 pleckstrin homology, MyTH4 and FERM domain containing H2 130271 PLEKHH2 NA
NA ENSG00000165949 interferon alpha inducible protein 27 3429 IFI27 NA
This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. ENSG00000163734 C-X-C motif chemokine ligand 3 2921 CXCL3 NA
NA ENSG00000161940 B-cell CLL/lymphoma 6B 255877 BCL6B NA
Protein kinase C (PKC) is a family of serine- and threonine-specific protein kinases that can be activated by calcium and the second messenger diacylglycerol. PKC family members phosphorylate a wide variety of protein targets and are known to be involved in diverse cellular signaling pathways. PKC family members also serve as major receptors for phorbol esters, a class of tumor promoters. Each member of the PKC family has a specific expression profile and is believed to play a distinct role in cells. The protein encoded by this gene is one of the PKC family members. It is a calcium-independent and phospholipids-dependent protein kinase. It is predominantly expressed in epithelial tissues and has been shown to reside specifically in the cell nucleus. This protein kinase can regulate keratinocyte differentiation by activating the MAP kinase MAPK13 (p38delta)-activated protein kinase cascade that targets CCAAT/enhancer-binding protein alpha (CEBPA). It is also found to mediate the transcription activation of the transglutaminase 1 (TGM1) gene. Mutations in this gene are associated with susceptibility to cerebral infarction. ENSG00000027075 protein kinase C eta 5583 PRKCH NA
NA ENSG00000256261 NA NA NA TRUE
This gene is a member of the ankyrin-repeat family and is induced by lipopolysaccharide (LPS). The C-terminal portion of the encoded product which contains the ankyrin repeats, shares high sequence similarity with the I kappa B family of proteins. The latter are known to play a role in inflammatory responses to LPS by their interaction with NF-kappa B proteins through ankyrin-repeat domains. Studies in mouse indicate that this gene product is one of the nuclear I kappa B proteins and an activator of IL-6 production. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000144802 NFKB inhibitor zeta 64332 NFKBIZ NA
The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms containing c-terminal dbl oncogene homology (DH) and pleckstrin homology (PH) domains. The DH domain is associated with guanine nucleotide exchange activation for the Rho/Rac family of small GTP binding proteins, resulting in the conversion of the inactive GTPase to the active form capable of transducing signals. The PH domain has multiple functions. Therefore, these isoforms function as scaffolding proteins to coordinate a Rho signaling pathway, function as protein kinase A-anchoring proteins and, in addition, enhance ligand-dependent activity of estrogen receptors alpha and beta. ENSG00000170776 A-kinase anchoring protein 13 11214 AKAP13 NA
This gene encodes a protein that binds to the hepatocyte growth factor receptor to regulate cell growth, cell motility and morphogenesis in numerous cell and tissue types. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate alpha and beta chains, which form the mature heterodimer. This protein is secreted by mesenchymal cells and acts as a multi-functional cytokine on cells of mainly epithelial origin. This protein also plays a role in angiogenesis, tumorogenesis, and tissue regeneration. Although the encoded protein is a member of the peptidase S1 family of serine proteases, it lacks peptidase activity. Mutations in this gene are associated with nonsyndromic hearing loss. ENSG00000019991 hepatocyte growth factor 3082 HGF NA
The protein encoded by this gene is a member of the CRK-associated substrates family. Members of this family are adhesion docking molecules that mediate protein-protein interactions for signal transduction pathways. This protein is a focal adhesion protein that acts as a scaffold to regulate signaling complexes important in cell attachment, migration and invasion as well as apoptosis and the cell cycle. This protein has also been reported to have a role in cancer metastasis. Alternative splicing results in multiple transcript variants. ENSG00000111859 neural precursor cell expressed, developmentally down-regulated 9 4739 NEDD9 NA
This gene encodes a five transmembrane protein that functions as a major regulator of the innate immune response to viral and bacterial infections. The encoded protein is a pattern recognition receptor that detects cytosolic nucleic acids and transmits signals that activate type I interferon responses. The encoded protein has also been shown to play a role in apoptotic signaling by associating with type II major histocompatibility complex. Mutations in this gene are the cause of infantile-onset STING-associated vasculopathy. Alternate splicing results in multiple transcript variants. ENSG00000184584 transmembrane protein 173 340061 TMEM173 NA
This gene encodes a membrane glycoprotein of T lymphocytes that interacts with major histocompatibility complex class II antigens and is also a receptor for the human immunodeficiency virus. This gene is expressed not only in T lymphocytes, but also in B cells, macrophages, and granulocytes. It is also expressed in specific regions of the brain. The protein functions to initiate or augment the early phase of T-cell activation, and may function as an important mediator of indirect neuronal damage in infectious and immune-mediated diseases of the central nervous system. Multiple alternatively spliced transcript variants encoding different isoforms have been identified in this gene. ENSG00000010610 CD4 molecule 920 CD4 NA
Retinoids exert biologic effects such as potent growth inhibitory and cell differentiation activities and are used in the treatment of hyperproliferative dermatological diseases. These effects are mediated by specific nuclear receptor proteins that are members of the steroid and thyroid hormone receptor superfamily of transcriptional regulators. RARRES1, RARRES2, and RARRES3 are genes whose expression is upregulated by the synthetic retinoid tazarotene. RARRES3 is thought to act as a tumor suppressor or growth regulator. ENSG00000133321 retinoic acid receptor responder 3 5920 RARRES3 NA
NA ENSG00000204176 synaptotagmin-15 102724488 LOC102724488 NA
This gene encodes a member of the Synaptotagmin (Syt) family of membrane trafficking proteins. Members of this family contain a transmembrane region and a C-terminal-type tandem C2 domain. Unlike related family members, the encoded protein may be involved in membrane trafficking in non-neuronal tissues. Two transcript variants encoding distinct isoforms have been identified for this gene. ENSG00000204176 synaptotagmin 15 83849 SYT15 NA
NA ENSG00000148158 sorting nexin family member 30 401548 SNX30 NA
NA ENSG00000154065 ankyrin repeat domain 29 147463 ANKRD29 NA
NA ENSG00000039523 family with sequence similarity 65 member A 79567 FAM65A NA
NA ENSG00000159588 coiled-coil domain containing 17 149483 CCDC17 NA
This gene encodes a member of the transforming growth factor-beta (TGFB) superfamily. The encoded preproprotein is proteolytically processed to generate each subunit of the disulfide-linked homodimer, which induces bone and cartilage formation. ENSG00000125845 bone morphogenetic protein 2 650 BMP2 NA
NA ENSG00000105538 Ras interacting protein 1 54922 RASIP1 NA
NA ENSG00000088387 dedicator of cytokinesis 9 23348 DOCK9 NA
This gene encodes a member of the zinc-iron permease family. The encoded protein is localized to the cell membrane and acts as a zinc uptake transporter. This gene has been linked to prostate cancer, breast cancer, and Alzheimer’s disease. Alternative splicing results in multiple transcript variants. ENSG00000143570 solute carrier family 39 member 1 27173 SLC39A1 NA
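# Save the Ensembl IDs queried for cluster 16, one ID per line, for downstream use.
# The as.factor() wrapper is dropped: with quote=FALSE the written file is the same.
write.table(out$query, paste0("../utilities/gene_names_clus_", 16, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);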
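
The cluster 16 table above contains one Ensembl ID that mygene could not resolve (ENSG00000256261, notfound = TRUE) and one ID that maps to two records (ENSG00000204176: LOC102724488 and SYT15), so `out$query` can hold unresolved and repeated IDs. Below is a minimal sketch of one way to filter them before saving, assuming that is what the downstream step wants. The helper `clean_query_ids` and the toy data frame `out_df` are hypothetical and not part of the original script.

```r
# Toy stand-in for as.data.frame(out); the rows mirror entries from the table above.
out_df <- data.frame(
  query    = c("ENSG00000204176", "ENSG00000204176",
               "ENSG00000256261", "ENSG00000125845"),
  symbol   = c("LOC102724488", "SYT15", NA, "BMP2"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop IDs flagged as not found, then keep each ID once.
clean_query_ids <- function(df) {
  nf <- df[["notfound"]]
  if (is.null(nf)) {
    # No notfound column at all: treat every query as resolved.
    found <- rep(TRUE, nrow(df))
  } else {
    # notfound is NA for resolved queries and TRUE for unresolved ones.
    found <- is.na(nf) | !nf
  }
  unique(df[["query"]][found])
}

ids <- clean_query_ids(out_df)
print(ids)
```

The resulting vector could be passed to `write.table` in place of `out$query`. Whether duplicates and unresolved IDs should be removed depends on how the saved gene lists are used afterwards, so this is a design choice rather than a required fix.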

Cluster 17 Annotations

out <- mygene::queryMany(gene_list[17,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query summary name X_id notfound
CSF3R ENSG00000119535 The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. colony stimulating factor 3 receptor 1441 NA
IFITM2 ENSG00000185201 NA interferon induced transmembrane protein 2 10581 NA
MMP25 ENSG00000008516 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. However, the protein encoded by this gene is a member of the membrane-type MMP (MT-MMP) subfamily, attached to the plasma membrane via a glycosylphosphatidyl inositol anchor. In response to bacterial infection or inflammation, the encoded protein is thought to inactivate alpha-1 proteinase inhibitor, a major tissue protectant against proteolytic enzymes released by activated neutrophils, facilitating the transendothelial migration of neutrophils to inflammatory sites. The encoded protein may also play a role in tumor invasion and metastasis through activation of MMP2. The gene has previously been referred to as MMP20 but has been renamed MMP25. matrix metallopeptidase 25 64386 NA
IL1R2 ENSG00000115590 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This protein binds interleukin alpha (IL1A), interleukin beta (IL1B), and interleukin 1 receptor, type I(IL1R1/IL1RA), and acts as a decoy receptor that inhibits the activity of its ligands. Interleukin 4 (IL4) is reported to antagonize the activity of interleukin 1 by inducing the expression and release of this cytokine. This gene and three other genes form a cytokine receptor gene cluster on chromosome 2q12. Alternative splicing results in multiple transcript variants and protein isoforms. Alternative splicing produces both membrane-bound and soluble proteins. A soluble protein is also produced by proteolytic cleavage. interleukin 1 receptor type 2 7850 NA
C10orf54 ENSG00000107738 NA chromosome 10 open reading frame 54 64115 NA
SELL ENSG00000188404 This gene encodes a cell surface adhesion molecule that belongs to a family of adhesion/homing receptors. The encoded protein contains a C-type lectin-like domain, a calcium-binding epidermal growth factor-like domain, and two short complement-like repeats. The gene product is required for binding and subsequent rolling of leucocytes on endothelial cells, facilitating their migration into secondary lymphoid organs and inflammation sites. Single-nucleotide polymorphisms in this gene have been associated with various diseases including immunoglobulin A nephropathy. Alternatively spliced transcript variants have been found for this gene. selectin L 6402 NA
VNN2 ENSG00000112303 This gene product is a member of the Vanin family of proteins that share extensive sequence similarity with each other, and also with biotinidase. The family includes secreted and membrane-associated proteins, a few of which have been reported to participate in hematopoietic cell trafficking. No biotinidase activity has been demonstrated for any of the vanin proteins, however, they possess pantetheinase activity, which may play a role in oxidative-stress response. The encoded protein is a GPI-anchored cell surface molecule that plays a role in transendothelial migration of neutrophils. This gene lies in close proximity to, and in same transcriptional orientation as two other vanin genes on chromosome 6q23-q24. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. vanin 2 8875 NA
FPR1 ENSG00000171051 This gene encodes a G protein-coupled receptor of mammalian phagocytic cells that is a member of the G-protein coupled receptor 1 family. The protein mediates the response of phagocytic cells to invasion of the host by microorganisms and is important in host defense and inflammation. formyl peptide receptor 1 2357 NA
MNDA ENSG00000163563 The myeloid cell nuclear differentiation antigen (MNDA) is detected only in nuclei of cells of the granulocyte-monocyte lineage. A 200-amino acid region of human MNDA is strikingly similar to a region in the proteins encoded by a family of interferon-inducible mouse genes, designated Ifi-201, Ifi-202, and Ifi-203, that are not regulated in a cell- or tissue-specific fashion. The 1.8-kb MNDA mRNA, which contains an interferon-stimulated response element in the 5-prime untranslated region, was significantly upregulated in human monocytes exposed to interferon alpha. MNDA is located within 2,200 kb of FCER1A, APCS, CRP, and SPTA1. In its pattern of expression and/or regulation, MNDA resembles IFI16, suggesting that these genes participate in blood cell-specific responses to interferons. myeloid cell nuclear differentiation antigen 4332 NA
S100A9 ENSG00000163220 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100 calcium binding protein A9 6280 NA
ALPL ENSG00000162551 This gene encodes a member of the alkaline phosphatase family of proteins. There are at least four distinct but related alkaline phosphatases: intestinal, placental, placental-like, and liver/bone/kidney (tissue non-specific). The first three are located together on chromosome 2, while the tissue non-specific form is located on chromosome 1. The product of this gene is a membrane bound glycosylated enzyme that is not expressed in any particular tissue and is, therefore, referred to as the tissue-nonspecific form of the enzyme. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature enzyme. This enzyme may play a role in bone mineralization. Mutations in this gene have been linked to hypophosphatasia, a disorder that is characterized by hypercalcemia and skeletal defects. alkaline phosphatase, liver/bone/kidney 249 NA
FCGR2A ENSG00000143226 This gene encodes one member of a family of immunoglobulin Fc receptor genes found on the surface of many immune response cells. The protein encoded by this gene is a cell surface receptor found on phagocytic cells such as macrophages and neutrophils, and is involved in the process of phagocytosis and clearing of immune complexes. Alternative splicing results in multiple transcript variants. Fc fragment of IgG receptor IIa 2212 NA
FCGR2C ENSG00000143226 This gene encodes one of three members of a family of low-affinity immunoglobulin gamma Fc receptors found on the surface of many immune response cells. The encoded protein is a transmembrane glycoprotein and may be involved in phagocytosis and clearing of immune complexes. An allelic polymorphism in this gene results in both coding and non-coding variants. Fc fragment of IgG receptor IIc (gene/pseudogene) 9103 NA
FLOT2 ENSG00000132589 Caveolae are small domains on the inner cell membrane involved in vesicular trafficking and signal transduction. This gene encodes a caveolae-associated, integral membrane protein, which is thought to function in neuronal signaling. flotillin 2 2319 NA
HLA-C ENSG00000204525 HLA-C belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domain, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Over one hundred HLA-C alleles have been described. major histocompatibility complex, class I, C 3107 NA
S100A12 ENSG00000163221 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein is proposed to be involved in specific calcium-dependent signal transduction pathways and its regulatory effect on cytoskeletal components may modulate various neutrophil activities. The protein includes an antimicrobial peptide which has antibacterial activity. S100 calcium binding protein A12 6283 NA
FCGR3B ENSG00000162747 The protein encoded by this gene is a low affinity receptor for the Fc region of gamma immunoglobulins (IgG). The encoded protein acts as a monomer and can bind either monomeric or aggregated IgG. This gene may function to capture immune complexes in the peripheral circulation. Several transcript variants encoding different isoforms have been found for this gene. A highly-similar gene encoding a related protein is also found on chromosome 1. Fc fragment of IgG receptor IIIb 2215 NA
XPO6 ENSG00000169180 The protein encoded by this gene is a member of the importin-beta family. Members of this family are regulated by the GTPase Ran to mediate transport of cargo across the nuclear envelope. This protein has been shown to mediate nuclear export of profilin-actin complexes. A pseudogene of this gene is located on the long arm of chromosome 14. Alternative splicing results in multiple transcript variants that encode different protein isoforms. exportin 6 23214 NA
FGR ENSG00000000938 This gene is a member of the Src family of protein tyrosine kinases (PTKs). The encoded protein contains N-terminal sites for myristylation and palmitylation, a PTK domain, and SH2 and SH3 domains which are involved in mediating protein-protein interactions with phosphotyrosine-containing and proline-rich motifs, respectively. The protein localizes to plasma membrane ruffles, and functions as a negative regulator of cell migration and adhesion triggered by the beta-2 integrin signal transduction pathway. Infection with Epstein-Barr virus results in the overexpression of this gene. Multiple alternatively spliced variants, encoding the same protein, have been identified. FGR proto-oncogene, Src family tyrosine kinase 2268 NA
FCER1G ENSG00000158869 The high affinity IgE receptor is a key molecule involved in allergic reactions. It is a tetramer composed of 1 alpha, 1 beta, and 2 gamma chains. The gamma chains are also subunits of other Fc receptors. Fc fragment of IgE receptor Ig 2207 NA
NAMPT ENSG00000105835 This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. nicotinamide phosphoribosyltransferase 10135 NA
HCK ENSG00000101336 The protein encoded by this gene is a member of the Src family of tyrosine kinases. This protein is primarily hemopoietic, particularly in cells of the myeloid and B-lymphoid lineages. It may help couple the Fc receptor to the activation of the respiratory burst. In addition, it may play a role in neutrophil migration and in the degranulation of neutrophils. Multiple isoforms with different subcellular distributions are produced due to both alternative splicing and the use of alternative translation initiation codons, including a non-AUG (CUG) codon. HCK proto-oncogene, Src family tyrosine kinase 3055 NA
SMAP2 ENSG00000084070 NA small ArfGAP2 64744 NA
ARHGDIB ENSG00000111348 Members of the Rho (or ARH) protein family (see MIM 165390) and other Ras-related small GTP-binding proteins (see MIM 179520) are involved in diverse cellular events, including cell signaling, proliferation, cytoskeletal organization, and secretion. The GTP-binding proteins are active only in the GTP-bound state. At least 3 classes of proteins tightly regulate cycling between the GTP-bound and GDP-bound states: GTPase-activating proteins (GAPs), guanine nucleotide-releasing factors (GRFs), and GDP-dissociation inhibitors (GDIs). The GDIs, including ARHGDIB, decrease the rate of GDP dissociation from Ras-like GTPases (summary by Scherle et al., 1993 [PubMed 8356058]). Rho GDP dissociation inhibitor beta 397 NA
MYO1F ENSG00000142347 NA myosin IF 4542 NA
HK3 ENSG00000160883 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes hexokinase 3. Similar to hexokinases 1 and 2, this allosteric enzyme is inhibited by its product glucose-6-phosphate. hexokinase 3 3101 NA
ALOX5AP ENSG00000132965 This gene encodes a protein which, with 5-lipoxygenase, is required for leukotriene synthesis. Leukotrienes are arachidonic acid metabolites which have been implicated in various types of inflammatory responses, including asthma, arthritis and psoriasis. This protein localizes to the plasma membrane. Inhibitors of its function impede translocation of 5-lipoxygenase from the cytoplasm to the cell membrane and inhibit 5-lipoxygenase activation. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. arachidonate 5-lipoxygenase activating protein 241 NA
CXCR1 ENSG00000163464 The protein encoded by this gene is a member of the G-protein-coupled receptor family. This protein is a receptor for interleukin 8 (IL8). It binds to IL8 with high affinity, and transduces the signal through a G-protein activated second messenger system. Knockout studies in mice suggested that this protein inhibits embryonic oligodendrocyte precursor migration in developing spinal cord. This gene, IL8RB, a gene encoding another high affinity IL8 receptor, as well as IL8RBP, a pseudogene of IL8RB, form a gene cluster in a region mapped to chromosome 2q33-q36. C-X-C motif chemokine receptor 1 3577 NA
ITGB2 ENSG00000160255 This gene encodes an integrin beta chain, which combines with multiple different alpha chains to form different integrin heterodimers. Integrins are integral cell-surface proteins that participate in cell adhesion as well as cell-surface mediated signalling. The encoded protein plays an important role in immune response and defects in this gene cause leukocyte adhesion deficiency. Alternative splicing results in multiple transcript variants. integrin subunit beta 2 3689 NA
AQP9 ENSG00000103569 The aquaporins are a family of water-selective membrane channels. This gene encodes a member of a subset of aquaporins called the aquaglyceroporins. This protein allows passage of a broad range of noncharged solutes and also stimulates urea transport and osmotic water permeability. This protein may also facilitate the uptake of glycerol in hepatic tissue. The encoded protein may also play a role in specialized leukocyte functions such as immunological response and bactericidal activity. Alternate splicing results in multiple transcript variants. aquaporin 9 366 NA
LILRA5 ENSG00000187116 The protein encoded by this gene is a member of the leukocyte immunoglobulin-like receptor (LIR) family. LIR family members are known to have activating and inhibitory functions in leukocytes. Crosslink of this receptor protein on the surface of monocytes has been shown to induce calcium flux and secretion of several proinflammatory cytokines, which suggests the roles of this protein in triggering innate immune responses. This gene is one of the leukocyte receptor genes that form a gene cluster on the chromosomal region 19q13.4. Four alternatively spliced transcript variants encoding distinct isoforms have been described. leukocyte immunoglobulin like receptor A5 353514 NA
CD177 ENSG00000204936 This gene encodes a glycosyl-phosphatidylinositol (GPI)-linked cell surface glycoprotein that plays a role in neutrophil activation. The protein can bind platelet endothelial cell adhesion molecule-1 and function in neutrophil transmigration. Mutations in this gene are associated with myeloproliferative diseases. Over-expression of this gene has been found in patients with polycythemia rubra vera. Autoantibodies against the protein may result in pulmonary transfusion reactions, and it may be involved in Wegener’s granulomatosis. A related pseudogene, which is adjacent to this gene on chromosome 19, has been identified. CD177 molecule 57126 NA
GCA ENSG00000115271 This gene product, grancalcin, is a calcium-binding protein abundant in neutrophils and macrophages. It belongs to the penta-EF-hand subfamily of proteins which includes sorcin, calpain, and ALG-2. Grancalcin localization is dependent upon calcium and magnesium. In the absence of divalent cation, grancalcin localizes to the cytosolic fraction; with magnesium alone, it partitions with the granule fraction; and in the presence of magnesium and calcium, it associates with both the granule and membrane fractions, suggesting a role for grancalcin in granule-membrane fusion and degranulation. grancalcin 25801 NA
SELPLG ENSG00000110876 This gene encodes a glycoprotein that functions as a high affinity counter-receptor for the cell adhesion molecules P-, E- and L- selectin expressed on myeloid cells and stimulated T lymphocytes. As such, this protein plays a critical role in leukocyte trafficking during inflammation by tethering of leukocytes to activated platelets or endothelia expressing selectins. This protein requires two post-translational modifications, tyrosine sulfation and the addition of the sialyl Lewis x tetrasaccharide (sLex) to its O-linked glycans, for its high-affinity binding activity. Aberrant expression of this gene and polymorphisms in this gene are associated with defects in the innate and adaptive immune response. Alternate splicing results in multiple transcript variants. selectin P ligand 6404 NA
NCF2 ENSG00000116701 This gene encodes neutrophil cytosolic factor 2, the 67-kilodalton cytosolic subunit of the multi-protein NADPH oxidase complex found in neutrophils. This oxidase produces a burst of superoxide which is delivered to the lumen of the neutrophil phagosome. Mutations in this gene, as well as in other NADPH oxidase subunits, can result in chronic granulomatous disease, a disease that causes recurrent infections by catalase-positive organisms. Alternative splicing results in multiple transcript variants encoding different isoforms. neutrophil cytosolic factor 2 4688 NA
NCF4 ENSG00000100365 The protein encoded by this gene is a cytosolic regulatory component of the superoxide-producing phagocyte NADPH-oxidase, a multicomponent enzyme system important for host defense. This protein is preferentially expressed in cells of myeloid lineage. It interacts primarily with neutrophil cytosolic factor 2 (NCF2/p67-phox) to form a complex with neutrophil cytosolic factor 1 (NCF1/p47-phox), which further interacts with the small G protein RAC1 and translocates to the membrane upon cell stimulation. This complex then activates flavocytochrome b, the membrane-integrated catalytic core of the enzyme system. The PX domain of this protein can bind phospholipid products of the PI(3) kinase, which suggests its role in PI(3) kinase-mediated signaling events. The phosphorylation of this protein was found to negatively regulate the enzyme activity. Alternatively spliced transcript variants encoding distinct isoforms have been observed. neutrophil cytosolic factor 4 4689 NA
SRGN ENSG00000122862 This gene encodes a protein best known as a hematopoietic cell granule proteoglycan. Proteoglycans stored in the secretory granules of many hematopoietic cells also contain a protease-resistant peptide core, which may be important for neutralizing hydrolytic enzymes. This encoded protein was found to be associated with the macromolecular complex of granzymes and perforin, which may serve as a mediator of granule-mediated apoptosis. Two transcript variants, only one of them protein-coding, have been found for this gene. serglycin 5552 NA
LCP1 ENSG00000136167 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. Plastin 1 (otherwise known as Fimbrin) is a third distinct plastin isoform which is specifically expressed at high levels in the small intestine. The L isoform is expressed only in hemopoietic cell lineages, while the T isoform has been found in all other normal cells of solid tissues that have replicative potential (fibroblasts, endothelial cells, epithelial cells, melanocytes, etc.). However, L-plastin has been found in many types of malignant human cells of non-hemopoietic origin suggesting that its expression is induced accompanying tumorigenesis in solid tissues. lymphocyte cytosolic protein 1 3936 NA
SPI1 ENSG00000066336 This gene encodes an ETS-domain transcription factor that activates gene expression during myeloid and B-lymphoid cell development. The nuclear protein binds to a purine-rich sequence known as the PU-box found near the promoters of target genes, and regulates their expression in coordination with other transcription factors and cofactors. The protein can also regulate alternative splicing of target genes. Multiple transcript variants encoding different isoforms have been found for this gene. Spi-1 proto-oncogene 6688 NA
MMP9 ENSG00000100985 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. The enzyme encoded by this gene degrades type IV and V collagens. Studies in rhesus monkeys suggest that the enzyme is involved in IL-8-induced mobilization of hematopoietic progenitor cells from bone marrow, and murine studies suggest a role in tumor-associated tissue remodeling. matrix metallopeptidase 9 4318 NA
HCLS1 ENSG00000180353 NA hematopoietic cell-specific Lyn substrate 1 3059 NA
MMP25-AS1 ENSG00000261971 NA MMP25 antisense RNA 1 ENSG00000261971 NA
ITGAX ENSG00000140678 This gene encodes the integrin alpha X chain protein. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as inactivated-C3b (iC3b) receptor 4 (CR4). The alpha X beta 2 complex seems to overlap the properties of the alpha M beta 2 integrin in the adherence of neutrophils and monocytes to stimulated endothelium cells, and in the phagocytosis of complement coated particles. Two transcript variants encoding different isoforms have been found for this gene. integrin subunit alpha X 3687 NA
ABTB1 ENSG00000114626 This gene encodes a protein with an ankyrin repeat region and two BTB/POZ domains, which are thought to be involved in protein-protein interactions. Expression of this gene is activated by the phosphatase and tensin homolog, a tumor suppressor. Alternate splicing results in three transcript variants. ankyrin repeat and BTB domain containing 1 80325 NA
IL18RAP ENSG00000115607 The protein encoded by this gene is an accessory subunit of the heterodimeric receptor for interleukin 18 (IL18), a proinflammatory cytokine involved in inducing cell-mediated immunity. This protein enhances the IL18-binding activity of the IL18 receptor and plays a role in signaling by IL18. Mutations in this gene are associated with Crohn’s disease and inflammatory bowel disease, and susceptibility to celiac disease and leprosy. Alternatively spliced transcript variants of this gene have been described, but their full-length nature is not known. interleukin 18 receptor accessory protein 8807 NA
SLC11A1 ENSG00000018280 This gene is a member of the solute carrier family 11 (proton-coupled divalent metal ion transporters) family and encodes a multi-pass membrane protein. The protein functions as a divalent transition metal (iron and manganese) transporter involved in iron metabolism and host resistance to certain pathogens. Mutations in this gene have been associated with susceptibility to infectious diseases such as tuberculosis and leprosy, and inflammatory diseases such as rheumatoid arthritis and Crohn disease. Alternatively spliced variants that encode different protein isoforms have been described but the full-length nature of only one has been determined. solute carrier family 11 member 1 6556 NA
SHKBP1 ENSG00000160410 NA SH3KBP1 binding protein 1 92799 NA
TYROBP ENSG00000011600 This gene encodes a transmembrane signaling polypeptide which contains an immunoreceptor tyrosine-based activation motif (ITAM) in its cytoplasmic domain. The encoded protein may associate with the killer-cell inhibitory receptor (KIR) family of membrane glycoproteins and may act as an activating signal transduction element. This protein may bind zeta-chain (TCR) associated protein kinase 70kDa (ZAP-70) and spleen tyrosine kinase (SYK) and play a role in signal transduction, bone modeling, brain myelination, and inflammation. Mutations within this gene have been associated with polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy (PLOSL), also known as Nasu-Hakola disease. Its putative receptor, triggering receptor expressed on myeloid cells 2 (TREM2), also causes PLOSL. Multiple alternative transcript variants encoding distinct isoforms have been identified for this gene. TYRO protein tyrosine kinase binding protein 7305 NA
HLA-B ENSG00000234745 HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. major histocompatibility complex, class I, B 3106 NA
SLA ENSG00000155926 NA Src-like-adaptor 6503 NA
FAM65B ENSG00000111913 The protein encoded by this gene stimulates the formation of a non-mitotic multinucleate syncytium from proliferative cytotrophoblasts during trophoblast differentiation. Alternative splicing of this gene results in multiple transcript variants. family with sequence similarity 65 member B 9750 NA
DOK3 ENSG00000146094 NA docking protein 3 79930 NA
LITAF ENSG00000189067 Lipopolysaccharide is a potent stimulator of monocytes and macrophages, causing secretion of tumor necrosis factor-alpha (TNF-alpha) and other inflammatory mediators. This gene encodes lipopolysaccharide-induced TNF-alpha factor, which is a DNA-binding protein and can mediate the TNF-alpha expression by direct binding to the promoter region of the TNF-alpha gene. The transcription of this gene is induced by tumor suppressor p53 and has been implicated in the p53-induced apoptotic pathway. Mutations in this gene cause Charcot-Marie-Tooth disease type 1C (CMT1C) and may be involved in the carcinogenesis of extramammary Paget’s disease (EMPD). Multiple alternatively spliced transcript variants have been found for this gene. lipopolysaccharide induced TNF factor 9516 NA
PYGL ENSG00000100504 This gene encodes a homodimeric protein that catalyses the cleavage of alpha-1,4-glucosidic bonds to release glucose-1-phosphate from liver glycogen stores. This protein switches from inactive phosphorylase B to active phosphorylase A by phosphorylation of serine residue 15. Activity of this enzyme is further regulated by multiple allosteric effectors and hormonal controls. Humans have three glycogen phosphorylase genes that encode distinct isozymes that are primarily expressed in liver, brain and muscle, respectively. The liver isozyme serves the glycemic demands of the body in general while the brain and muscle isozymes supply just those tissues. In glycogen storage disease type VI, also known as Hers disease, mutations in liver glycogen phosphorylase inhibit the conversion of glycogen to glucose and result in moderate hypoglycemia, mild ketosis, growth retardation and hepatomegaly. Alternative splicing results in multiple transcript variants encoding different isoforms. phosphorylase, glycogen, liver 5836 NA
LAPTM5 ENSG00000162511 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. lysosomal protein transmembrane 5 7805 NA
HLA-E ENSG00000204592 HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. major histocompatibility complex, class I, E 3133 NA
COTL1 ENSG00000103187 This gene encodes one of the numerous actin-binding proteins which regulate the actin cytoskeleton. This protein binds F-actin, and also interacts with 5-lipoxygenase, which is the first committed enzyme in leukotriene biosynthesis. Although this gene has been reported to map to chromosome 17 in the Smith-Magenis syndrome region, the best alignments for this gene are to chromosome 16. The Smith-Magenis syndrome region is the site of two related pseudogenes. coactosin like F-actin binding protein 1 23406 NA
ITM2B ENSG00000136156 Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. integral membrane protein 2B 9445 NA
H3F3AP4 ENSG00000235655 NA H3 histone, family 3A, pseudogene 4 ENSG00000235655 NA
CST7 ENSG00000077984 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions. This gene encodes a glycosylated cysteine protease inhibitor with a putative role in immune regulation through inhibition of a unique target in the hematopoietic system. Expression of the protein has been observed in various human cancer cell lines established from malignant tumors. cystatin F 8530 NA
ICAM3 ENSG00000076662 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is constitutively and abundantly expressed by all leucocytes and may be the most important ligand for LFA-1 in the initiation of the immune response. It functions not only as an adhesion molecule, but also as a potent signalling molecule. Alternative splicing results in multiple transcript variants encoding different isoforms. intercellular adhesion molecule 3 3385 NA
ITGAM ENSG00000169896 This gene encodes the integrin alpha M chain. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This I-domain containing alpha integrin combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as macrophage receptor 1 (‘Mac-1’), or inactivated-C3b (iC3b) receptor 3 (‘CR3’). The alpha M beta 2 integrin is important in the adherence of neutrophils and monocytes to stimulated endothelium, and also in the phagocytosis of complement coated particles. Multiple transcript variants encoding different isoforms have been found for this gene. integrin subunit alpha M 3684 NA
ARRB2 ENSG00000141480 Members of arrestin/beta-arrestin protein family are thought to participate in agonist-mediated desensitization of G-protein-coupled receptors and cause specific dampening of cellular responses to stimuli such as hormones, neurotransmitters, or sensory signals. Arrestin beta 2, like arrestin beta 1, was shown to inhibit beta-adrenergic receptor function in vitro. It is expressed at high levels in the central nervous system and may play a role in the regulation of synaptic receptors. Besides the brain, a cDNA for arrestin beta 2 was isolated from thyroid gland, and thus it may also be involved in hormone-specific desensitization of TSH receptors. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. arrestin beta 2 409 NA
PLBD1 ENSG00000121316 NA phospholipase B domain containing 1 79887 NA
TNFRSF10C ENSG00000173535 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor contains an extracellular TRAIL-binding domain and a transmembrane domain, but no cytoplasmic death domain. This receptor is not capable of inducing apoptosis, and is thought to function as an antagonistic receptor that protects cells from TRAIL-induced apoptosis. This gene was found to be a p53-regulated DNA damage-inducible gene. The expression of this gene was detected in many normal tissues but not in most cancer cell lines, which may explain the specific sensitivity of cancer cells to the apoptosis-inducing activity of TRAIL. tumor necrosis factor receptor superfamily member 10c 8794 NA
TLR2 ENSG00000137462 The protein encoded by this gene is a member of the Toll-like receptor (TLR) family which plays a fundamental role in pathogen recognition and activation of innate immunity. TLRs are highly conserved from Drosophila to humans and share structural and functional similarities. This protein is a cell-surface protein that can form heterodimers with other TLR family members to recognize conserved molecules derived from microorganisms known as pathogen-associated molecular patterns (PAMPs). Activation of TLRs by PAMPs leads to an up-regulation of signaling pathways to modulate the host’s inflammatory response. This gene is also thought to promote apoptosis in response to bacterial lipoproteins. This gene has been implicated in the pathogenesis of several autoimmune diseases. Alternative splicing results in multiple transcript variants. toll like receptor 2 7097 NA
GPSM3 ENSG00000213654 NA G-protein signaling modulator 3 63940 NA
TKT ENSG00000163931 This gene encodes a thiamine-dependent enzyme which plays a role in the channeling of excess sugar phosphates to glycolysis in the pentose phosphate pathway. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. transketolase 7086 NA
MSRB1 ENSG00000198736 This gene encodes a selenoprotein, which contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon that normally signals translation termination. The 3’ UTR of selenoprotein genes have a common stem-loop structure, the sec insertion sequence (SECIS), that is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. This protein belongs to the methionine sulfoxide reductase (Msr) protein family which includes repair enzymes that reduce oxidized methionine residues in proteins. The protein encoded by this gene is expressed in a variety of adult and fetal tissues and localizes to the cell nucleus and cytosol. methionine sulfoxide reductase B1 51734 NA
CTSS ENSG00000163131 The protein encoded by this gene, a member of the peptidase C1 family, is a lysosomal cysteine proteinase that may participate in the degradation of antigenic proteins to peptides for presentation on MHC class II molecules. The encoded protein can function as an elastase over a broad pH range in alveolar macrophages. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. cathepsin S 1520 NA
IL17RA ENSG00000177663 Interleukin 17A (IL17A) is a proinflammatory cytokine secreted by activated T-lymphocytes. It is a potent inducer of the maturation of CD34-positive hematopoietic precursors into neutrophils. The transmembrane protein encoded by this gene (interleukin 17A receptor; IL17RA) is a ubiquitous type I membrane glycoprotein that binds with low affinity to interleukin 17A. Interleukin 17A and its receptor play a pathogenic role in many inflammatory and autoimmune diseases such as rheumatoid arthritis. Like other cytokine receptors, this receptor likely has a multimeric structure. Alternative splicing results in multiple transcript variants encoding different isoforms. interleukin 17 receptor A 23765 NA
CAP1 ENSG00000131236 The protein encoded by this gene is related to the S. cerevisiae CAP protein, which is involved in the cyclic AMP pathway. The human protein is able to interact with other molecules of the same protein, as well as with CAP2 and actin. Alternatively spliced transcript variants have been identified. CAP, adenylate cyclase-associated protein 1 (yeast) 10487 NA
TCIRG1 ENSG00000110719 Through alternate splicing, this gene encodes two proteins with similarity to subunits of the vacuolar ATPase (V-ATPase) but the encoded proteins seem to have different functions. V-ATPase is a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, and receptor-mediated endocytosis. V-ATPase is comprised of a cytosolic V1 domain and a transmembrane V0 domain. Mutations in this gene are associated with infantile malignant osteopetrosis. T-cell immune regulator 1, ATPase H+ transporting V0 subunit a3 10312 NA
THEMIS2 ENSG00000130775 NA thymocyte selection associated family member 2 9473 NA
LILRA6 ENSG00000244482 NA leukocyte immunoglobulin like receptor A6 79168 NA
LILRB3 ENSG00000244482 This gene is a member of the leukocyte immunoglobulin-like receptor (LIR) family, which is found in a gene cluster at chromosomal region 19q13.4. The encoded protein belongs to the subfamily B class of LIR receptors which contain two or four extracellular immunoglobulin domains, a transmembrane domain, and two to four cytoplasmic immunoreceptor tyrosine-based inhibitory motifs (ITIMs). The receptor is expressed on immune cells where it binds to MHC class I molecules on antigen-presenting cells and transduces a negative signal that inhibits stimulation of an immune response. It is thought to control inflammatory responses and cytotoxicity to help focus the immune response and limit autoreactivity. Multiple transcript variants encoding different isoforms have been found for this gene. leukocyte immunoglobulin like receptor B3 11025 NA
ATG16L2 ENSG00000168010 NA autophagy related 16 like 2 89849 NA
VASP ENSG00000125753 Vasodilator-stimulated phosphoprotein (VASP) is a member of the Ena-VASP protein family. Ena-VASP family members contain an EVH1 N-terminal domain that binds proteins containing E/DFPPPPXD/E motifs and targets Ena-VASP proteins to focal adhesions. In the mid-region of the protein, family members have a proline-rich domain that binds SH3 and WW domain-containing proteins. Their C-terminal EVH2 domain mediates tetramerization and binds both G and F actin. VASP is associated with filamentous actin formation and likely plays a widespread role in cell adhesion and motility. VASP may also be involved in the intracellular signaling pathways that regulate integrin-extracellular matrix interactions. VASP is regulated by the cyclic nucleotide-dependent kinases PKA and PKG. vasodilator-stimulated phosphoprotein 7408 NA
ACSL1 ENSG00000151726 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. acyl-CoA synthetase long-chain family member 1 2180 NA
EFHD2 ENSG00000142634 NA EF-hand domain family member D2 79180 NA
JAK3 ENSG00000105639 The protein encoded by this gene is a member of the Janus kinase (JAK) family of tyrosine kinases involved in cytokine receptor-mediated intracellular signal transduction. It is predominantly expressed in immune cells and transduces a signal in response to its activation via tyrosine phosphorylation by interleukin receptors. Mutations in this gene are associated with autosomal SCID (severe combined immunodeficiency disease). Janus kinase 3 3718 NA
RNF149 ENSG00000163162 NA ring finger protein 149 284996 NA
RHOG ENSG00000177105 This gene encodes a member of the Rho family of small GTPases, which cycle between inactive GDP-bound and active GTP-bound states and function as molecular switches in signal transduction cascades. Rho proteins promote reorganization of the actin cytoskeleton and regulate cell shape, attachment, and motility. The encoded protein facilitates translocation of a functional guanine nucleotide exchange factor (GEF) complex from the cytoplasm to the plasma membrane where ras-related C3 botulinum toxin substrate 1 is activated to promote lamellipodium formation and cell migration. Two related pseudogenes have been identified on chromosomes 20 and X. ras homolog family member G 391 NA
NA ENSG00000237683 NA NA NA TRUE
ADGRE5 ENSG00000123146 This gene encodes a member of the EGF-TM7 subfamily of adhesion G protein-coupled receptors, which mediate cell-cell interactions. These proteins are cleaved by self-catalytic proteolysis into a large extracellular subunit and seven-span transmembrane subunit, which associate at the cell surface as a receptor complex. The encoded protein may play a role in cell adhesion as well as leukocyte recruitment, activation and migration, and contains multiple extracellular EGF-like repeats which mediate binding to chondroitin sulfate and the cell surface complement regulatory protein CD55. Expression of this gene may play a role in the progression of several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms with 3 to 5 EGF-like repeats have been observed for this gene. This gene is found in a cluster with other EGF-TM7 genes on the short arm of chromosome 19. adhesion G protein-coupled receptor E5 976 NA
ARHGAP9 ENSG00000123329 This gene encodes a member of the Rho-GAP family of GTPase activating proteins. The protein has substantial GAP activity towards several Rho-family GTPases in vitro, converting them to an inactive GDP-bound state. It is implicated in regulating adhesion of hematopoietic cells to the extracellular matrix. Multiple transcript variants encoding different isoforms have been found for this gene. Rho GTPase activating protein 9 64333 NA
CD53 ENSG00000143119 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. It contributes to the transduction of CD2-generated signals in T cells and natural killer cells and has been suggested to play a role in growth regulation. Familial deficiency of this gene has been linked to an immunodeficiency associated with recurrent infectious diseases caused by bacteria, fungi and viruses. Alternative splicing results in multiple transcript variants. CD53 molecule 963 NA
ZDHHC18 ENSG00000204160 NA zinc finger DHHC-type containing 18 84243 NA
ADAM8 ENSG00000151651 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. The protein encoded by this gene may be involved in cell adhesion during neurodegeneration, and it is thought to be a target for allergic respiratory diseases, including asthma. Alternative splicing results in multiple transcript variants. ADAM metallopeptidase domain 8 101 NA
SORL1 ENSG00000137642 This gene encodes a mosaic protein that belongs to at least two families: the vacuolar protein sorting 10 (VPS10) domain-containing receptor family, and the low density lipoprotein receptor (LDLR) family. The encoded protein also contains fibronectin type III repeats and an epidermal growth factor repeat. The encoded preproprotein is proteolytically processed to generate the mature receptor, which likely plays roles in endocytosis and sorting. Mutations in this gene may be associated with Alzheimer’s disease. sortilin-related receptor, L(DLR class) A repeats containing 6653 NA
NAMPTP1 ENSG00000229644 NA nicotinamide phosphoribosyltransferase pseudogene 1 ENSG00000229644 NA
PHC2 ENSG00000134686 In Drosophila melanogaster, the ‘Polycomb’ group (PcG) of genes are part of a cellular memory system that is responsible for the stable inheritance of gene activity. PcG proteins form a large multimeric, chromatin-associated protein complex. The protein encoded by this gene has homology to the Drosophila PcG protein ‘polyhomeotic’ (Ph) and is known to heterodimerize with EDR1 and colocalize with BMI1 in interphase nuclei of human cells. The specific function in human cells has not yet been determined. Two transcript variants encoding different isoforms have been found for this gene. polyhomeotic homolog 2 1912 NA
STXBP2 ENSG00000076944 This gene encodes a member of the STXBP/unc-18/SEC1 family. The encoded protein is involved in intracellular trafficking, control of SNARE (soluble NSF attachment protein receptor) complex assembly, and the release of cytotoxic granules by natural killer cells. Mutations in this gene are associated with familial hemophagocytic lymphohistiocytosis. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. syntaxin binding protein 2 6813 NA
CYBA ENSG00000051523 Cytochrome b is comprised of a light chain (alpha) and a heavy chain (beta). This gene encodes the light, alpha subunit which has been proposed as a primary component of the microbicidal oxidase system of phagocytes. Mutations in this gene are associated with autosomal recessive chronic granulomatous disease (CGD), that is characterized by the failure of activated phagocytes to generate superoxide, which is important for the microbicidal activity of these cells. cytochrome b-245 alpha chain 1535 NA
C1orf162 ENSG00000143110 NA chromosome 1 open reading frame 162 128346 NA
SLC12A9 ENSG00000146828 NA solute carrier family 12 member 9 56996 NA
RPS6KA1 ENSG00000117676 This gene encodes a member of the RSK (ribosomal S6 kinase) family of serine/threonine kinases. This kinase contains 2 nonidentical kinase catalytic domains and phosphorylates various substrates, including members of the mitogen-activated kinase (MAPK) signalling pathway. The activity of this protein has been implicated in controlling cell growth and differentiation. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. ribosomal protein S6 kinase A1 6195 NA
SECTM1 ENSG00000141574 This gene encodes a transmembrane and secreted protein with characteristics of a type 1a transmembrane protein. It is found in a perinuclear Golgi-like pattern and thought to be involved in hematopoietic and/or immune system processes. secreted and transmembrane 1 6398 NA
SLC6A6 ENSG00000131389 This gene encodes a multi-pass membrane protein that is a member of a family of sodium and chloride-ion dependent transporters. The encoded protein transports taurine and beta-alanine. There is a pseudogene for this gene on chromosome 21. Alternative splicing results in multiple transcript variants. solute carrier family 6 member 6 6533 NA
NAIP ENSG00000249437 This gene is part of a 500 kb inverted duplication on chromosome 5q13. This duplicated region contains at least four genes and repetitive elements which make it prone to rearrangements and deletions. The repetitiveness and complexity of the sequence have also caused difficulty in determining the organization of this genomic region. This copy of the gene is full length; additional copies with truncations and internal deletions are also present in this region of chromosome 5q13. It is thought that this gene is a modifier of spinal muscular atrophy caused by mutations in a neighboring gene, SMN1. The protein encoded by this gene contains regions of homology to two baculovirus inhibitor of apoptosis proteins, and it is able to suppress apoptosis induced by various signals. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. NLR family, apoptosis inhibitory protein 4671 NA
TALDO1 ENSG00000177156 Transaldolase 1 is a key enzyme of the nonoxidative pentose phosphate pathway providing ribose-5-phosphate for nucleic acid synthesis and NADPH for lipid biosynthesis. This pathway can also maintain glutathione at a reduced state and thus protect sulfhydryl groups and cellular integrity from oxygen radicals. The functional gene of transaldolase 1 is located on chromosome 11 and a pseudogene is identified on chromosome 1 but there are conflicting map locations. The second and third exon of this gene were developed by insertion of a retrotransposable element. This gene is thought to be involved in multiple sclerosis. transaldolase 1 6888 NA
MYD88 ENSG00000172936 This gene encodes a cytosolic adapter protein that plays a central role in the innate and adaptive immune response. This protein functions as an essential signal transducer in the interleukin-1 and Toll-like receptor signaling pathways. These pathways regulate the activation of numerous proinflammatory genes. The encoded protein consists of an N-terminal death domain and a C-terminal Toll-interleukin1 receptor domain. Patients with defects in this gene have an increased susceptibility to pyogenic bacterial infections. Alternate splicing results in multiple transcript variants. myeloid differentiation primary response 88 4615 NA
# Save the queried Ensembl IDs for cluster 17, one per line, for later use.
write.table(out$query, "../utilities/gene_names_clus_17.txt", col.names = FALSE,
            row.names = FALSE, quote = FALSE);
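
The same query-and-save step is repeated for each cluster, and the saved list inherits any repeated or unmatched IDs from the query result. In the cluster 17 table, for example, ENSG00000244482 appears twice (for LILRA6 and LILRB3) and ENSG00000237683 is flagged as not found. A minimal sketch of a reusable helper follows. It assumes the annotation result has the `query` and `notfound` columns shown in the tables. The function name `save_cluster_ids` and its arguments are hypothetical and not part of the original script, and dropping duplicates is an illustrative choice rather than the author's method.

```r
# Hypothetical helper (not in the original script): write the Ensembl IDs
# returned for one cluster to "<out_dir>/gene_names_clus_<cluster>.txt".
save_cluster_ids <- function(annot, cluster, out_dir = "../utilities",
                             drop_notfound = FALSE) {
  annot <- as.data.frame(annot)
  ids <- as.character(annot$query)
  if (drop_notfound && "notfound" %in% names(annot)) {
    # notfound is TRUE for unmatched IDs and NA otherwise
    ids <- ids[!(annot$notfound %in% TRUE)]
  }
  # Assumption: each ID should be listed once even if it matched several genes
  ids <- unique(ids)
  path <- file.path(out_dir, paste0("gene_names_clus_", cluster, ".txt"))
  writeLines(ids, path)
  invisible(path)
}

# Toy stand-in for the queryMany() result, built from three rows of the
# cluster 17 table; written to a temporary directory, not to ../utilities.
toy <- data.frame(
  query    = c("ENSG00000244482", "ENSG00000244482", "ENSG00000237683"),
  symbol   = c("LILRA6", "LILRB3", NA),
  notfound = c(NA, NA, TRUE),
  stringsAsFactors = FALSE
)
toy_path <- save_cluster_ids(toy, cluster = 17, out_dir = tempdir(),
                             drop_notfound = TRUE)
readLines(toy_path)
```

With the real data this would be called as `save_cluster_ids(out, 17)` after each query. Leaving `drop_notfound = FALSE` keeps unmatched IDs in the file, as the original `write.table` call does.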

Cluster 18 Annotations

out <- mygene::queryMany(gene_list[18,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name query symbol summary X_id notfound
protease, serine 1 ENSG00000204983 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 NA
carboxypeptidase A1 ENSG00000091704 CPA1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 NA
pancreatic lipase ENSG00000175535 PNLIP This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406 NA
chymotrypsin like elastase family member 3A ENSG00000142789 CELA3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 NA
glycoprotein 2 ENSG00000169347 GP2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 NA
colipase ENSG00000137392 CLPS The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. 1208 NA
lipase F, gastric type ENSG00000182333 LIPF This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in humans. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. 8513 NA
chymotrypsinogen B2 ENSG00000168928 CTRB2 NA 440387 NA
carboxypeptidase B1 ENSG00000153002 CPB1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. 1360 NA
progastricsin (pepsinogen C) ENSG00000096088 PGC This gene encodes an aspartic proteinase that belongs to the peptidase family A1. The encoded protein is a digestive enzyme that is produced in the stomach and constitutes a major component of the gastric mucosa. This protein is also secreted into the serum. This protein is synthesized as an inactive zymogen that includes a highly basic prosegment. This enzyme is converted into its active mature form at low pH by sequential cleavage of the prosegment that is carried out by the enzyme itself. Polymorphisms in this gene are associated with susceptibility to gastric cancers. Serum levels of this enzyme are used as a biomarker for certain gastric diseases including Helicobacter pylori related gastritis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 1. 5225 NA
regenerating family member 1 alpha ENSG00000115386 REG1A This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 NA
chymotrypsinogen B1 ENSG00000168925 CTRB1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. 1504 NA
chymotrypsin like elastase family member 3B ENSG00000219073 CELA3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. 23436 NA
carboxyl ester lipase ENSG00000170835 CEL The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. 1056 NA
pepsinogen 3, group I (pepsinogen A) ENSG00000229859 PGA3 This gene encodes a protein precursor of the digestive enzyme pepsin, a member of the peptidase A1 family of endopeptidases. The encoded precursor is secreted by gastric chief cells and undergoes autocatalytic cleavage in acidic conditions to form the active enzyme, which functions in the digestion of dietary proteins. This gene is found in a cluster of related genes on chromosome 11, each of which encodes one of multiple pepsinogens. Pepsinogen levels in serum may serve as a biomarker for atrophic gastritis and gastric cancer. 643834 NA
carboxypeptidase A2 ENSG00000158516 CPA2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. 1358 NA
amylase, alpha 2A (pancreatic) ENSG00000243480 AMY2A This gene encodes a member of the alpha-amylase family of proteins. Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, catalyzing the first step in digestion of dietary starch and glycogen. This gene and several family members are present in a gene cluster on chromosome 1. This gene encodes an amylase isoenzyme produced by the pancreas. 279 NA
regenerating family member 1 beta ENSG00000172023 REG1B This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5968 NA
chymotrypsin C ENSG00000162438 CTRC This gene encodes a member of the peptidase S1 family. The encoded protein is a serum calcium-decreasing factor that has chymotrypsin-like protease activity. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. 11330 NA
chymotrypsin like elastase family member 2A ENSG00000142615 CELA2A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. 63036 NA
pancreatic lipase related protein 1 ENSG00000187021 PNLIPRP1 NA 5407 NA
phospholipase A2 group IB ENSG00000170890 PLA2G1B This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. 5319 NA
NA ENSG00000250606 NA NA NA TRUE
NA ENSG00000240338 RP11-331F4.4 NA ENSG00000240338 NA
NA ENSG00000165862 NA NA NA TRUE
CUB and zona pellucida like domains 1 ENSG00000138161 CUZD1 NA 50624 NA
syncollin ENSG00000179751 SYCN NA 342898 NA
serine peptidase inhibitor, Kazal type 1 ENSG00000164266 SPINK1 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. 6690 NA
amylase, alpha 2B (pancreatic) ENSG00000240038 AMY2B Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, and thus catalyze the first step in digestion of dietary starch and glycogen. The human genome has a cluster of several amylase genes that are expressed at high levels in either salivary gland or pancreas. This gene encodes an amylase isoenzyme produced by the pancreas. 280 NA
chymotrypsin like elastase family member 2B ENSG00000215704 CELA2B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2B is secreted from the pancreas as a zymogen. In other species, elastase 2B has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. 51032 NA
protease, serine 3 ENSG00000010438 PRSS3 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. 5646 NA
regenerating family member 3 alpha ENSG00000172016 REG3A This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. 5068 NA
amylase, alpha pseudogene 1 ENSG00000227408 AMYP1 NA ENSG00000227408 NA
chymotrypsin like ENSG00000141086 CTRL NA 1506 NA
protein disulfide isomerase family A member 2 ENSG00000185615 PDIA2 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). 64714 NA
UBE2R2 antisense RNA 1 ENSG00000235481 UBE2R2-AS1 NA ENSG00000235481 NA
SEL1L ERAD E3 ligase adaptor subunit ENSG00000071537 SEL1L The protein encoded by this gene is part of a protein complex required for the retrotranslocation or dislocation of misfolded proteins from the endoplasmic reticulum lumen to the cytosol, where they are degraded by the proteasome in a ubiquitin-dependent manner. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 6400 NA
endoplasmic reticulum oxidoreductase beta ENSG00000086619 ERO1B NA 56605 NA
X-box binding protein 1 ENSG00000100219 XBP1 This gene encodes a transcription factor that regulates MHC class II genes by binding to a promoter element referred to as an X box. This gene product is a bZIP protein, which was also identified as a cellular transcription factor that binds to an enhancer in the promoter of the T cell leukemia virus type 1 promoter. It may increase expression of viral proteins by acting as the DNA binding partner of a viral transactivator. It has been found that upon accumulation of unfolded proteins in the endoplasmic reticulum (ER), the mRNA of this gene is processed to an active form by an unconventional splicing mechanism that is mediated by the endonuclease inositol-requiring enzyme 1 (IRE1). The resulting loss of 26 nt from the spliced mRNA causes a frame-shift and an isoform XBP1(S), which is the functionally active transcription factor. The isoform encoded by the unspliced mRNA, XBP1(U), is constitutively expressed, and thought to function as a negative feedback regulator of XBP1(S), which shuts off transcription of target genes during the recovery phase of ER stress. A pseudogene of XBP1 has been identified and localized to chromosome 5. 7494 NA
ghrelin/obestatin prepropeptide ENSG00000157017 GHRL This gene encodes the ghrelin-obestatin preproprotein that is cleaved to yield two peptides, ghrelin and obestatin. Ghrelin is a powerful appetite stimulant and plays an important role in energy homeostasis. Its secretion is initiated when the stomach is empty, whereupon it binds to the growth hormone secretagogue receptor in the hypothalamus which results in the secretion of growth hormone (somatotropin). Ghrelin is thought to regulate multiple activities, including hunger, reward perception via the mesolimbic pathway, gastric acid secretion, gastrointestinal motility, and pancreatic glucose-stimulated insulin secretion. It was initially proposed that obestatin plays an opposing role to ghrelin by promoting satiety and thus decreasing food intake, but this action is still debated. Recent reports suggest multiple metabolic roles for obestatin, including regulating adipocyte function and glucose metabolism. Alternative splicing results in multiple transcript variants. In addition, antisense transcripts for this gene have been identified and may potentially regulate ghrelin-obestatin preproprotein expression. 51738 NA
tectonin beta-propeller repeat containing 1 ENSG00000205356 TECPR1 This gene encodes a tethering factor involved in autophagy. The encoded protein is found at autolysosomes, and is involved in targeting protein aggregates, damaged mitochondria, and bacterial pathogens for autophagy. 25851 NA
aldo-keto reductase family 7, member A3 (aflatoxin aldehyde reductase) ENSG00000162482 AKR7A3 Aldo-keto reductases, such as AKR7A3, are involved in the detoxification of aldehydes and ketones. 22977 NA
potassium voltage-gated channel subfamily E regulatory subunit 2 ENSG00000159197 KCNE2 Voltage-gated potassium (Kv) channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. This gene encodes a member of the potassium channel, voltage-gated, isk-related subfamily. This member is a small integral membrane subunit that assembles with the KCNH2 gene product, a pore-forming protein, to alter its function. This gene is expressed in heart and muscle and the gene mutations are associated with cardiac arrhythmia. 9992 NA
erythrocyte membrane protein band 4.1 like 4B ENSG00000095203 EPB41L4B NA 54566 NA
ribosome binding protein 1 ENSG00000125844 RRBP1 This gene encodes a ribosome-binding protein of the endoplasmic reticulum (ER) membrane. Studies suggest that this gene plays a role in ER proliferation, secretory pathways and secretory cell differentiation, and mediation of ER-microtubule interactions. Alternative splicing has been observed and protein isoforms are characterized by regions of N-terminal decapeptide and C-terminal heptad repeats. Splicing of the tandem repeats results in variations in ribosome-binding affinity and secretory function. The full-length nature of variants which differ in repeat length has not been determined. Pseudogenes of this gene have been identified on chromosomes 3 and 7, and RRBP1 has been excluded as a candidate gene in the cause of Alagille syndrome, the result of a mutation in a nearby gene on chromosome 20p12. 6238 NA
homer scaffolding protein 2 ENSG00000103942 HOMER2 This gene encodes a member of the homer family of dendritic proteins. Members of this family regulate group 1 metabotropic glutamate receptor function. The encoded protein is a postsynaptic density scaffolding protein. Alternative splicing results in multiple transcript variants. Two related pseudogenes have been identified on chromosome 14. 9455 NA
carbonic anhydrase 9 ENSG00000107159 CA9 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA IX is a transmembrane protein and is one of only two tumor-associated carbonic anhydrase isoenzymes known. It is expressed in all clear-cell renal cell carcinoma, but is not detected in normal kidney or most other normal tissues. It may be involved in cell proliferation and transformation. This gene was mapped to 17q21.2 by fluorescence in situ hybridization; however, radiation hybrid mapping localized it to 9p13-p12. 768 NA
annexin A4 ENSG00000196975 ANXA4 Annexin IV (ANX4) belongs to the annexin family of calcium-dependent phospholipid binding proteins. Although their functions are still not clearly defined, several members of the annexin family have been implicated in membrane-related events along exocytotic and endocytotic pathways. ANX4 has 45 to 59% identity with other members of its family and shares a similar size and exon-intron organization. Isolated from human placenta, ANX4 encodes a protein that has possible interactions with ATP, and has in vitro anticoagulant activity and also inhibits phospholipase A2 activity. ANX4 is almost exclusively expressed in epithelial cells. Several transcript variants encoding different isoforms have been found for this gene. 307 NA
solute carrier family 4 member 4 ENSG00000080493 SLC4A4 This gene encodes a sodium bicarbonate cotransporter (NBC) involved in the regulation of bicarbonate secretion and absorption and intracellular pH. Mutations in this gene are associated with proximal renal tubular acidosis. Multiple transcript variants encoding different isoforms have been found for this gene. 8671 NA
ATH1, acid trehalase-like 1 (yeast) ENSG00000142102 ATHL1 NA 80162 NA
transmembrane p24 trafficking protein 6 ENSG00000157315 TMED6 NA 146456 NA
programmed cell death 4 (neoplastic transformation inhibitor) ENSG00000150593 PDCD4 This gene is a tumor suppressor and encodes a protein that binds to the eukaryotic translation initiation factor 4A1 and inhibits its function by preventing RNA binding. Alternative splicing results in multiple transcript variants. 27250 NA
glycine N-methyltransferase ENSG00000124713 GNMT The protein encoded by this gene is an enzyme that catalyzes the conversion of S-adenosyl-L-methionine (along with glycine) to S-adenosyl-L-homocysteine and sarcosine. This protein is found in the cytoplasm and acts as a homotetramer. Defects in this gene are a cause of GNMT deficiency (hypermethioninemia). Alternative splicing results in multiple transcript variants. Naturally occurring readthrough transcription occurs between the upstream CNPY3 (canopy FGF signaling regulator 3) gene and this gene and is represented with GeneID:107080644. 27232 NA
prolyl 4-hydroxylase subunit beta ENSG00000185624 P4HB This gene encodes the beta subunit of prolyl 4-hydroxylase, a highly abundant multifunctional enzyme that belongs to the protein disulfide isomerase family. When present as a tetramer consisting of two alpha and two beta subunits, this enzyme is involved in hydroxylation of prolyl residues in preprocollagen. This enzyme is also a disulfide isomerase containing two thioredoxin domains that catalyze the formation, breakage and rearrangement of disulfide bonds. Other known functions include its ability to act as a chaperone that inhibits aggregation of misfolded proteins in a concentration-dependent manner, its ability to bind thyroid hormone, its role in both the influx and efflux of S-nitrosothiol-bound nitric oxide, and its function as a subunit of the microsomal triglyceride transfer protein complex. 5034 NA
BCL2/adenovirus E1B 19kDa interacting protein 3 ENSG00000176171 BNIP3 This gene encodes a mitochondrial protein that contains a BH3 domain and acts as a pro-apoptotic factor. The encoded protein interacts with anti-apoptotic proteins, including the E1B 19 kDa protein and Bcl2. This gene is silenced in tumors by DNA methylation. 664 NA
glycine amidinotransferase ENSG00000171766 GATM This gene encodes a mitochondrial enzyme that belongs to the amidinotransferase family. This enzyme is involved in creatine biosynthesis, whereby it catalyzes the transfer of a guanido group from L-arginine to glycine, resulting in guanidinoacetic acid, the immediate precursor of creatine. Mutations in this gene cause arginine:glycine amidinotransferase deficiency, an inborn error of creatine synthesis characterized by mental retardation, language impairment, and behavioral disorders. 2628 NA
NA ENSG00000225555 AP000320.6 NA ENSG00000225555 NA
solute carrier family 39 member 5 ENSG00000139540 SLC39A5 The protein encoded by this gene belongs to the ZIP family of zinc transporters that transport zinc into cells from outside, and play a crucial role in controlling intracellular zinc levels. Zinc is an essential cofactor for many enzymes and proteins involved in gene transcription, growth, development and differentiation. Mutations in this gene have been associated with autosomal dominant high myopia (MYP24). Alternatively spliced transcript variants have been found for this gene. 283375 NA
tyrosylprotein sulfotransferase 2 ENSG00000128294 TPST2 The protein encoded by this gene catalyzes the O-sulfation of tyrosine residues within acidic regions of proteins. The encoded protein is a type II integral membrane protein found in the Golgi body. Two transcript variants encoding the same protein have been found for this gene. 8459 NA
NA ENSG00000260065 NA NA NA TRUE
stress-associated endoplasmic reticulum protein 1 ENSG00000120742 SERP1 NA 27230 NA
long intergenic non-protein coding RNA 339 ENSG00000218510 LINC00339 NA 29092 NA
phospholipid phosphatase 5 ENSG00000147535 PLPP5 NA 84513 NA
SRP receptor alpha subunit ENSG00000182934 SRPRA The gene encodes a subunit of the endoplasmic reticulum signal recognition particle receptor that, in conjunction with the signal recognition particle, is involved in the targeting and translocation of signal sequence tagged secretory and membrane proteins across the endoplasmic reticulum. Alternative splicing results in multiple transcript variants. 6734 NA
solute carrier family 43 member 1 ENSG00000149150 SLC43A1 SLC43A1 belongs to the system L family of plasma membrane carrier proteins that transports large neutral amino acids (Babu et al., 2003 [PubMed 12930836]). 8501 NA
glutamic pyruvate transaminase (alanine aminotransferase) 2 ENSG00000166123 GPT2 This gene encodes a mitochondrial alanine transaminase, a pyridoxal enzyme that catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate. Alanine transaminases play roles in gluconeogenesis and amino acid metabolism in many tissues including skeletal muscle, kidney, and liver. Activating transcription factor 4 upregulates this gene under metabolic stress conditions in hepatocyte cell lines. A loss of function mutation in this gene has been associated with developmental encephalopathy. Alternative splicing results in multiple transcript variants. 84706 NA
FK506 binding protein 11 ENSG00000134285 FKBP11 FKBP11 belongs to the FKBP family of peptidyl-prolyl cis/trans isomerases, which catalyze the folding of proline-containing polypeptides. The peptidyl-prolyl isomerase activity of FKBP proteins is inhibited by the immunosuppressant compounds FK506 and rapamycin (Rulten et al., 2006 [PubMed 16596453]). 51303 NA
family with sequence similarity 174 member B ENSG00000185442 FAM174B NA 400451 NA
SID1 transmembrane family member 2 ENSG00000149577 SIDT2 NA 51092 NA
tandem C2 domains, nuclear ENSG00000165929 TC2N NA 123036 NA
eukaryotic translation initiation factor 4E binding protein 1 ENSG00000187840 EIF4EBP1 This gene encodes one member of a family of translation repressor proteins. The protein directly interacts with eukaryotic translation initiation factor 4E (eIF4E), which is a limiting component of the multisubunit complex that recruits 40S ribosomal subunits to the 5’ end of mRNAs. Interaction of this protein with eIF4E inhibits complex assembly and represses translation. This protein is phosphorylated in response to various signals including UV irradiation and insulin signaling, resulting in its dissociation from eIF4E and activation of mRNA translation. 1978 NA
taperin ENSG00000176058 TPRN This locus encodes a sensory epithelial protein. It was defined by linkage analysis in three Pakistani families to lie between D9S1818 (centromeric) and D9SH6 (telomeric). Mutations at this locus have been associated with autosomal recessive deafness. 286262 NA
BCL2/adenovirus E1B 19kDa interacting protein 3 pseudogene 1 ENSG00000197358 BNIP3P1 NA ENSG00000197358 NA
homocysteine inducible ER protein with ubiquitin like domain 1 ENSG00000051108 HERPUD1 The accumulation of unfolded proteins in the endoplasmic reticulum (ER) triggers the ER stress response. This response includes the inhibition of translation to prevent further accumulation of unfolded proteins, the increased expression of proteins involved in polypeptide folding, known as the unfolded protein response (UPR), and the destruction of misfolded proteins by the ER-associated protein degradation (ERAD) system. This gene may play a role in both UPR and ERAD. Its expression is induced by UPR and it has an ER stress response element in its promoter region while the encoded protein has an N-terminal ubiquitin-like domain which may interact with the ERAD system. This protein has been shown to interact with presenilin proteins and to increase the level of amyloid-beta protein following its overexpression. Alternative splicing of this gene produces multiple transcript variants encoding different isoforms. The full-length nature of all transcript variants has not been determined. 9709 NA
retinol binding protein 1 ENSG00000114115 RBP1 This gene encodes the carrier protein involved in the transport of retinol (vitamin A alcohol) from the liver storage site to peripheral tissue. Vitamin A is a fat-soluble vitamin necessary for growth, reproduction, differentiation of epithelial tissues, and vision. Multiple transcript variants encoding different isoforms have been found for this gene. 5947 NA
insulin receptor ENSG00000171105 INSR This gene encodes a member of the receptor tyrosine kinase family of proteins. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form a heterotetrameric receptor. Binding of insulin or other ligands to this receptor activates the insulin signaling pathway, which regulates glucose uptake and release, as well as the synthesis and storage of carbohydrates, lipids and protein. Mutations in this gene underlie the inherited severe insulin resistance syndromes including type A insulin resistance syndrome, Donohue syndrome and Rabson-Mendenhall syndrome. Alternative splicing results in multiple transcript variants. 3643 NA
SH3 and SYLF domain containing 1 ENSG00000035115 SH3YL1 NA 26751 NA
zinc finger protein 710 ENSG00000140548 ZNF710 NA 374655 NA
NA ENSG00000186275 NA NA NA TRUE
SRP receptor beta subunit ENSG00000144867 SRPRB The protein encoded by this gene has similarity to a mouse protein which is a subunit of the signal recognition particle receptor (SR). This subunit is a transmembrane GTPase belonging to the GTPase superfamily. It anchors the alpha subunit, a peripheral membrane GTPase, to the ER membrane. SR is required for the cotranslational targeting of both secretory and membrane proteins to the ER membrane. 58477 NA
transmembrane protein 97 ENSG00000109084 TMEM97 TMEM97 is a conserved integral membrane protein that plays a role in controlling cellular cholesterol levels (Bartz et al., 2009 [PubMed 19583955]). 27346 NA
pleckstrin homology, MyTH4 and FERM domain containing H3 ENSG00000068137 PLEKHH3 NA 79990 NA
NA ENSG00000259799 RP11-554A11.9 NA ENSG00000259799 NA
actin-related protein 2/3 complex inhibitor ENSG00000242498 ARPIN NA 348110 NA
kinase suppressor of ras 1 ENSG00000141068 KSR1 NA 8844 NA
prostate stem cell antigen ENSG00000167653 PSCA This gene encodes a glycosylphosphatidylinositol-anchored cell membrane glycoprotein. In addition to being highly expressed in the prostate it is also expressed in the bladder, placenta, colon, kidney, and stomach. This gene is up-regulated in a large proportion of prostate cancers and is also detected in cancers of the bladder and pancreas. This gene includes a polymorphism that results in an upstream start codon in some individuals; this polymorphism is thought to be associated with a risk for certain gastric and bladder cancers. Alternative splicing results in multiple transcript variants. 8000 NA
NA ENSG00000272894 RP5-1159O4.1 NA ENSG00000272894 NA
death-associated protein ENSG00000112977 DAP This gene encodes a basic, proline-rich, 15-kD protein. The protein acts as a positive mediator of programmed cell death that is induced by interferon-gamma. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. 1611 NA
uncharacterized LOC102723927 ENSG00000237940 LOC102723927 NA 102723927 NA
MPV17 mitochondrial inner membrane protein like ENSG00000156968 MPV17L NA 255027 NA
zinc finger protein 33B ENSG00000196693 ZNF33B This gene encodes a member of the zinc finger family of proteins. This gene shows decreased expression in cumulus cells derived from patients undergoing controlled ovarian stimulation. This gene is present in a gene cluster with several related zinc finger genes in the pericentromeric region of chromosome 10. Pseudogenes have been identified on chromosomes 7 and 10. Alternative splicing results in multiple transcript variants. 7582 NA
fucosyltransferase 1 (H blood group) ENSG00000174951 FUT1 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Mutations in this gene are a cause of the H-Bombay blood group. 2523 NA
proteasome 26S subunit, non-ATPase 6 ENSG00000163636 PSMD6 This gene encodes a member of the protease subunit S10 family. The encoded protein is a subunit of the 26S proteasome which colocalizes with DNA damage foci and is involved in the ATP-dependent degradation of ubiquitinated proteins. Alternative splicing results in multiple transcript variants. 9861 NA
interferon related developmental regulator 1 ENSG00000006652 IFRD1 This gene is an immediate early gene that encodes a protein related to interferon-gamma. This protein may function as a transcriptional co-activator/repressor that controls the growth and differentiation of specific cell types during embryonic development and tissue regeneration. Mutations in this gene are associated with sensory/motor neuropathy with ataxia. This gene may also be involved in modulating the pathogenesis of cystic fibrosis lung disease. Alternate splicing results in multiple transcript variants. 3475 NA
geminin, DNA replication inhibitor ENSG00000112312 GMNN This gene encodes a protein that plays a critical role in cell cycle regulation. The encoded protein inhibits DNA replication by binding to DNA replication factor Cdt1, preventing the incorporation of minichromosome maintenance proteins into the pre-replication complex. The encoded protein is expressed during the S and G2 phases of the cell cycle and is degraded by the anaphase-promoting complex during the metaphase-anaphase transition. Increased expression of this gene may play a role in several malignancies including colon, rectal and breast cancer. Alternatively spliced transcript variants have been observed for this gene, and two pseudogenes of this gene are located on the short arm of chromosome 16. 51053 NA
ALG5, dolichyl-phosphate beta-glucosyltransferase ENSG00000120697 ALG5 This gene encodes a member of the glycosyltransferase 2 family. The encoded protein participates in glucosylation of the oligomannose core in N-linked glycosylation of proteins. The addition of glucose residues to the oligomannose core is necessary to ensure substrate recognition, and therefore, effectual transfer of the oligomannose core to the nascent glycoproteins. Multiple transcript variants encoding different isoforms have been found for this gene. 29880 NA
long intergenic non-protein coding RNA 1237 ENSG00000233806 LINC01237 NA 101927289 NA
calcium release activated channel regulator 2B ENSG00000177685 CRACR2B NA 283229 NA
leucine-rich repeat containing G protein-coupled receptor 4 ENSG00000205213 LGR4 G protein-coupled receptors (GPCRs) play key roles in a variety of physiologic functions. Members of the leucine-rich GPCR (LGR) family, such as GPR48, have multiple N-terminal leucine-rich repeats (LRRs) and a 7-transmembrane domain (Weng et al., 2008 [PubMed 18424556]). 55366 NA
NODAL modulator 3 ENSG00000103226 NOMO3 This gene encodes a protein originally thought to be related to the collagenase gene family. This gene is one of three highly similar genes in a duplicated region on the short arm of chromosome 16. These three genes encode closely related proteins that may have the same function. The protein encoded by one of these genes has been identified as part of a protein complex that participates in the Nodal signaling pathway during vertebrate development. Mutations in ABCC6, which is located nearby, rather than mutations in this gene are associated with pseudoxanthoma elasticum. 408050 NA
# Save the queried Ensembl IDs for cluster 18, one per line (as.factor() is not needed for writing)
write.table(out$query, paste0("../utilities/gene_names_clus_", 18, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
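
The annotation tables in this report can contain Ensembl IDs that MyGene.info could not match (notfound is TRUE) and IDs that map to more than one gene record, so the same query appears on several rows. Below is a minimal sketch, in base R, of how such cases might be tallied before a gene list is written out. The helper summarise_hits and the toy data frame are hypothetical and only mimic the shape of `out`; they are not part of the original analysis.

```r
# Hypothetical helper: summarise a queryMany-style result table.
# Assumes a `query` column (the input ID) and, optionally, a `notfound`
# column that is TRUE for unmatched IDs and NA otherwise.
summarise_hits <- function(out) {
  out <- as.data.frame(out)
  if (is.null(out$notfound)) {
    # Assumption: no `notfound` column means every query was matched
    missing <- rep(FALSE, nrow(out))
  } else {
    missing <- !is.na(out$notfound) & out$notfound
  }
  # Count how many rows each matched query ID produced
  counts <- table(out$query[!missing])
  list(
    not_found  = unique(out$query[missing]),
    duplicated = names(counts)[as.vector(counts > 1)],
    unique_ids = unique(out$query)
  )
}

# Toy stand-in for `out`, shaped like the tables in this report
toy <- data.frame(
  symbol   = c("DEFA3", "DEFA1B", "DEFA1", NA, "HBB"),
  query    = c("ENSG00000239839", "ENSG00000239839", "ENSG00000239839",
               "ENSG00000161570", "ENSG00000244734"),
  notfound = c(NA, NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)
res <- summarise_hits(toy)
```

Writing `res$unique_ids` rather than the raw `query` column would keep each Ensembl ID once in the output file; whether that is preferable depends on how the downstream tools treat repeated IDs.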

Cluster 19 Annotations

# Annotate the top driving genes of cluster 19 via MyGene.info (queried by Ensembl gene ID)
out <- mygene::queryMany(gene_list[19,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary notfound
HBB 3043 hemoglobin subunit beta ENSG00000244734 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. NA
HBA2 3040 hemoglobin subunit alpha 2 ENSG00000188536 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. NA
HBA1 3039 hemoglobin subunit alpha 1 ENSG00000206172 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. NA
FKBP8 23770 FK506 binding protein 8 ENSG00000105701 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. Unlike the other members of the family, this encoded protein does not seem to have PPIase/rotamase activity. It may have a role in neurons associated with memory function. NA
HBD 3045 hemoglobin subunit delta ENSG00000223609 The delta (HBD) and beta (HBB) genes are normally expressed in the adult: two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin. Two alpha chains plus two delta chains constitute HbA-2, which with HbF comprises the remaining 3% of adult hemoglobin. Five beta-like globin genes are found within a 45 kb cluster on chromosome 11 in the following order: 5’-epsilon–Ggamma–Agamma–delta–beta-3’. Mutations in the delta-globin gene are associated with beta-thalassemia. NA
SLC25A39 51629 solute carrier family 25 member 39 ENSG00000013306 This gene encodes a member of the SLC25 transporter or mitochondrial carrier family of proteins. Members of this family are encoded by the nuclear genome while their protein products are usually embedded in the inner mitochondrial membrane and exhibit wide-ranging substrate specificity. Although the encoded protein is currently considered an orphan transporter, this protein is related to other carriers known to transport amino acids. This protein may play a role in iron homeostasis. NA
HBG2 3048 hemoglobin subunit gamma 2 ENSG00000196565 The gamma globin genes (HBG1 and HBG2) are normally expressed in the fetal liver, spleen and bone marrow. Two gamma chains together with two alpha chains constitute fetal hemoglobin (HbF) which is normally replaced by adult hemoglobin (HbA) at birth. In some beta-thalassemias and related conditions, gamma chain production continues into adulthood. The two types of gamma chains differ at residue 136 where glycine is found in the G-gamma product (HBG2) and alanine is found in the A-gamma product (HBG1). The former is predominant at birth. The order of the genes in the beta-globin cluster is: 5’- epsilon – gamma-G – gamma-A – delta – beta–3’. NA
DEFA3 1668 defensin alpha 3 ENSG00000239839 Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. The protein encoded by this gene, defensin, alpha 3, is found in the microbicidal granules of neutrophils and likely plays a role in phagocyte-mediated host defense. Several alpha defensin genes are clustered on chromosome 8. This gene differs from defensin, alpha 1 by only one amino acid. This gene and the gene encoding defensin, alpha 1 are both subject to copy number variation. NA
DEFA1B 728358 defensin alpha 1B ENSG00000239839 Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. The protein encoded by this gene, defensin, alpha 1, is found in the microbicidal granules of neutrophils and likely plays a role in phagocyte-mediated host defense. Several alpha defensin genes are clustered on chromosome 8. This gene differs from defensin, alpha 3 by only one amino acid. This gene and the gene encoding defensin, alpha 3 are both subject to copy number variation. Two transcript variants encoding different isoforms have been found for this gene. NA
DEFA1 1667 defensin alpha 1 ENSG00000239839 Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. The protein encoded by this gene, defensin, alpha 1, is found in the microbicidal granules of neutrophils and likely plays a role in phagocyte-mediated host defense. Several alpha defensin genes are clustered on chromosome 8. This gene differs from defensin, alpha 3 by only one amino acid. This gene and the gene encoding defensin, alpha 3 are both subject to copy number variation. NA
GYPC 2995 glycophorin C (Gerbich blood group) ENSG00000136732 Glycophorin C (GYPC) is an integral membrane glycoprotein. It is a minor species carried by human erythrocytes, but plays an important role in regulating the mechanical stability of red cells. A number of glycophorin C mutations have been described. The Gerbich and Yus phenotypes are due to deletion of exon 3 and 2, respectively. The Webb and Duch antigens, also known as glycophorin D, result from single point mutations of the glycophorin C gene. The glycophorin C protein has very little homology with glycophorins A and B. Alternate splicing results in multiple transcript variants. NA
HBG1 3047 hemoglobin subunit gamma 1 ENSG00000213934 The gamma globin genes (HBG1 and HBG2) are normally expressed in the fetal liver, spleen and bone marrow. Two gamma chains together with two alpha chains constitute fetal hemoglobin (HbF) which is normally replaced by adult hemoglobin (HbA) at birth. In some beta-thalassemias and related conditions, gamma chain production continues into adulthood. The two types of gamma chains differ at residue 136 where glycine is found in the G-gamma product (HBG2) and alanine is found in the A-gamma product (HBG1). The former is predominant at birth. The order of the genes in the beta-globin cluster is: 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. NA
RP11-20D14.6 ENSG00000249790 NA ENSG00000249790 NA NA
FBXO7 25793 F-box protein 7 ENSG00000100225 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and it may play a role in regulation of hematopoiesis. Alternatively spliced transcript variants of this gene have been identified with the full-length natures of only some variants being determined. NA
AZU1 566 azurocidin 1 ENSG00000172232 Azurophil granules, specialized lysosomes of the neutrophil, contain at least 10 proteins implicated in the killing of microorganisms. This gene encodes a preproprotein that is proteolytically processed to generate a mature azurophil granule antibiotic protein, with monocyte chemotactic and antimicrobial activity. It is also an important multifunctional inflammatory mediator. This encoded protein is a member of the serine protease gene family but it is not a serine proteinase, because the active site serine and histidine residues are replaced. The genes encoding this protein, neutrophil elastase 2, and proteinase 3 are in a cluster located at chromosome 19pter. All 3 genes are expressed coordinately and their protein products are packaged together into azurophil granules during neutrophil differentiation. NA
EPB42 2038 erythrocyte membrane protein band 4.2 ENSG00000166947 Erythrocyte membrane protein band 4.2 is an ATP-binding protein which may regulate the association of protein 3 with ankyrin. It probably has a role in erythrocyte shape and mechanical property regulation. Mutations in the EPB42 gene are associated with recessive spherocytic elliptocytosis and recessively transmitted hereditary hemolytic anemia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
NA NA NA ENSG00000161570 NA TRUE
GNLY 10578 granulysin ENSG00000115523 The product of this gene is a member of the saposin-like protein (SAPLIP) family and is located in the cytotoxic granules of T cells, which are released upon antigen stimulation. This protein is present in cytotoxic granules of cytotoxic T lymphocytes and natural killer cells, and it has antimicrobial activity against M. tuberculosis and other organisms. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
DNAJB1 3337 DnaJ heat shock protein family (Hsp40) member B1 ENSG00000132002 This gene encodes a member of the DnaJ or Hsp40 (heat shock protein 40 kD) family of proteins. DNAJ family members are characterized by a highly conserved amino acid stretch called the ‘J-domain’ and function as one of the two major classes of molecular chaperones involved in a wide range of cellular events, such as protein folding and oligomeric protein complex assembly. The encoded protein is a molecular chaperone that stimulates the ATPase activity of Hsp70 heat-shock proteins in order to promote protein folding and prevent misfolded protein aggregation. Alternative splicing results in multiple transcript variants. NA
SLC25A37 51312 solute carrier family 25 member 37 ENSG00000147454 SLC25A37 is a solute carrier localized in the mitochondrial inner membrane. It functions as an essential iron importer for the synthesis of mitochondrial heme and iron-sulfur clusters (summary by Chen et al., 2009 [PubMed 19805291]). NA
UBA52 7311 ubiquitin A-52 residue ribosomal protein fusion product 1 ENSG00000221983 Ubiquitin is a highly conserved nuclear and cytoplasmic protein that has a major role in targeting cellular proteins for degradation by the 26S proteosome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene encodes a fusion protein consisting of ubiquitin at the N terminus and ribosomal protein L40 at the C terminus, a C-terminal extension protein (CEP). Multiple processed pseudogenes derived from this gene are present in the genome. NA
GUK1 2987 guanylate kinase 1 ENSG00000143774 The protein encoded by this gene is an enzyme that catalyzes the transfer of a phosphate group from ATP to guanosine monophosphate (GMP) to form guanosine diphosphate (GDP). The encoded protein is thought to be a good target for cancer chemotherapy. Several transcript variants encoding different isoforms have been found for this gene. NA
BCL2L1 598 BCL2 like 1 ENSG00000171552 The protein encoded by this gene belongs to the BCL-2 protein family. BCL-2 family members form hetero- or homodimers and act as anti- or pro-apoptotic regulators that are involved in a wide variety of cellular activities. The proteins encoded by this gene are located at the outer mitochondrial membrane, and have been shown to regulate outer mitochondrial membrane channel (VDAC) opening. VDAC regulates mitochondrial membrane potential, and thus controls the production of reactive oxygen species and release of cytochrome C by mitochondria, both of which are the potent inducers of cell apoptosis. Alternative splicing results in multiple transcript variants encoding two different isoforms. The longer isoform acts as an apoptotic inhibitor and the shorter isoform acts as an apoptotic activator. NA
RBM38 55544 RNA binding motif protein 38 ENSG00000132819 NA NA
NKG7 4818 natural killer cell granule protein 7 ENSG00000105374 NA NA
ADIPOR1 51094 adiponectin receptor 1 ENSG00000159346 This gene encodes a protein which acts as a receptor for adiponectin, a hormone secreted by adipocytes which regulates fatty acid catabolism and glucose levels. Binding of adiponectin to the encoded protein results in activation of an AMP-activated kinase signaling pathway which affects levels of fatty acid oxidation and insulin sensitivity. A pseudogene of this gene is located on chromosome 14. Multiple alternatively spliced transcript variants have been found for this gene. NA
NPRL3 8131 NPR3 like, GATOR1 complex subunit ENSG00000103148 The function of the encoded protein is not known. NA
UBB 7314 ubiquitin B ENSG00000170315 This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteosome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. NA
BPGM 669 bisphosphoglycerate mutase ENSG00000172331 2,3-diphosphoglycerate (2,3-DPG) is a small molecule found at high concentrations in red blood cells where it binds to and decreases the oxygen affinity of hemoglobin. This gene encodes a multifunctional enzyme that catalyzes 2,3-DPG synthesis via its synthetase activity, and 2,3-DPG degradation via its phosphatase activity. The enzyme also has phosphoglycerate phosphomutase activity. Deficiency of this enzyme increases the affinity of cells for oxygen. Mutations in this gene result in hemolytic anemia. Multiple alternatively spliced variants, encoding the same protein, have been identified. NA
SH2D2A 9047 SH2 domain containing 2A ENSG00000027869 This gene encodes an adaptor protein thought to function in T-cell signal transduction. A related protein in mouse is responsible for the activation of lymphocyte-specific protein-tyrosine kinase and functions in downstream signaling. Alternative splicing results in multiple transcript variants. NA
FAM210B 116151 family with sequence similarity 210 member B ENSG00000124098 NA NA
GTPBP1 9567 GTP binding protein 1 ENSG00000100226 This gene is upregulated by interferon-gamma and encodes a protein that is a member of the AGP11/GTPBP1 family of GTP-binding proteins. A structurally similar protein has been found in mouse, where disruption of the gene for that protein had no observable phenotype. NA
STRADB 55437 STE20-related kinase adaptor beta ENSG00000082146 This gene encodes a protein that belongs to the serine/threonine protein kinase STE20 subfamily. One of the active site residues in the protein kinase domain of this protein is altered, and it is thus a pseudokinase. This protein is a component of a complex involved in the activation of serine/threonine kinase 11, a master kinase that regulates cell polarity and energy-generating metabolism. This complex regulates the relocation of this kinase from the nucleus to the cytoplasm, and it is essential for G1 cell cycle arrest mediated by this kinase. The protein encoded by this gene can also interact with the X chromosome-linked inhibitor of apoptosis protein, and this interaction enhances the anti-apoptotic activity of this protein via the JNK1 signal transduction pathway. Two pseudogenes, located on chromosomes 1 and 7, have been found for this gene. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
CTSW 1521 cathepsin W ENSG00000172543 The protein encoded by this gene, a member of the peptidase C1 family, is a cysteine proteinase that may have a specific function in the mechanism or regulation of T-cell cytolytic activity. The encoded protein is found associated with the membrane inside the endoplasmic reticulum of natural killer and cytotoxic T-cells. Expression of this gene is up-regulated by interleukin-2. NA
UBXN6 80700 UBX domain protein 6 ENSG00000167671 NA NA
DCAF12 25853 DDB1 and CUL4 associated factor 12 ENSG00000198876 This gene encodes a WD repeat-containing protein that interacts with the COP9 signalosome, a macromolecular complex that interacts with cullin-RING E3 ligases and regulates their activity by hydrolyzing cullin-Nedd8 conjugates. NA
CXCL8 3576 C-X-C motif chemokine ligand 8 ENSG00000169429 The protein encoded by this gene is a member of the CXC chemokine family. This chemokine is one of the major mediators of the inflammatory response. This chemokine is secreted by several cell types. It functions as a chemoattractant, and is also a potent angiogenic factor. This gene is believed to play a role in the pathogenesis of bronchiolitis, a common respiratory tract disease caused by viral infection. This gene and other ten members of the CXC chemokine gene family form a chemokine gene cluster in a region mapped to chromosome 4q. NA
PLAUR 5329 plasminogen activator, urokinase receptor ENSG00000011422 This gene encodes the receptor for urokinase plasminogen activator and, given its role in localizing and promoting plasmin formation, likely influences many normal and pathological processes related to cell-surface plasminogen activation and localized degradation of the extracellular matrix. It binds both the proprotein and mature forms of urokinase plasminogen activator and permits the activation of the receptor-bound pro-enzyme by plasmin. The protein lacks transmembrane or cytoplasmic domains and may be anchored to the plasma membrane by a glycosyl-phosphatidylinositol (GPI) moiety following cleavage of the nascent polypeptide near its carboxy-terminus. However, a soluble protein is also produced in some cell types. Alternative splicing results in multiple transcript variants encoding different isoforms. The proprotein experiences several post-translational cleavage reactions that have not yet been fully defined. NA
PHOSPHO1 162466 phosphoethanolamine/phosphocholine phosphatase ENSG00000173868 NA NA
BLVRB 645 biliverdin reductase B ENSG00000090013 The final step in heme metabolism in mammals is catalyzed by the cytosolic biliverdin reductase enzymes A and B (EC 1.3.1.24). NA
ASCC2 84164 activating signal cointegrator 1 complex subunit 2 ENSG00000100325 NA NA
TRBC2 ENSG00000211772 T cell receptor beta constant 2 ENSG00000211772 NA NA
HIST2H2BF 440689 histone cluster 2, H2bf ENSG00000203814 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. This structure consists of approximately 146 bp of DNA wrapped around a nucleosome, an octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene encodes a replication-dependent histone that is a member of the histone H2B family and is found in a histone cluster on chromosome 1. NA
RP11-138I1.4 ENSG00000265401 NA ENSG00000265401 NA NA
C9orf78 51759 chromosome 9 open reading frame 78 ENSG00000136819 NA NA
ARL4C 10123 ADP ribosylation factor like GTPase 4C ENSG00000188042 ADP-ribosylation factor-like 4C is a member of the ADP-ribosylation factor family of GTP-binding proteins. ARL4C is closely similar to ARL4A and ARL4D and each has a nuclear localization signal and an unusually high guanine nucleotide exchange rate. This protein may play a role in cholesterol transport. NA
SIAH2 6478 siah E3 ubiquitin protein ligase 2 ENSG00000181788 This gene encodes a protein that is a member of the seven in absentia homolog (SIAH) family. The protein is an E3 ligase and is involved in ubiquitination and proteasome-mediated degradation of specific proteins. The activity of this ubiquitin ligase has been implicated in regulating cellular response to hypoxia. NA
HMBS 3145 hydroxymethylbilane synthase ENSG00000256269 This gene encodes a member of the hydroxymethylbilane synthase superfamily. The encoded protein is the third enzyme of the heme biosynthetic pathway and catalyzes the head to tail condensation of four porphobilinogen molecules into the linear hydroxymethylbilane. Mutations in this gene are associated with the autosomal dominant disease acute intermittent porphyria. Alternatively spliced transcript variants encoding different isoforms have been described. NA
RP11-329A14.1 ENSG00000235105 NA ENSG00000235105 NA NA
NA NA NA ENSG00000257034 NA TRUE
GZMM 3004 granzyme M ENSG00000197540 Human natural killer (NK) cells and activated lymphocytes express and store a distinct subset of neutral serine proteases together with proteoglycans and other immune effector molecules in large cytoplasmic granules. These serine proteases are collectively termed granzymes and include 4 distinct gene products: granzyme A, granzyme B, granzyme H, and the protein encoded by this gene, granzyme M. Two transcript variants encoding different isoforms have been found for this gene. NA
LYL1 4066 lymphoblastic leukemia associated hematopoiesis regulator 1 ENSG00000104903 This gene represents a basic helix-loop-helix transcription factor. The encoded protein may play roles in blood vessel maturation and hematopoeisis. A translocation between this locus and the T cell receptor beta locus (GeneID 6957) on chromosome 7 has been associated with acute lymphoblastic leukemia. NA
ATG2A 23130 autophagy related 2A ENSG00000110046 NA NA
HIST1H1C 3006 histone cluster 1, H1c ENSG00000187837 Histones are basic nuclear proteins responsible for nucleosome structure of the chromosomal fiber in eukaryotes. Two molecules of each of the four core histones (H2A, H2B, H3, and H4) form an octamer, around which approximately 146 bp of DNA is wrapped in repeating units, called nucleosomes. The linker histone, H1, interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher order structures. This gene is intronless and encodes a replication-dependent histone that is a member of the histone H1 family. Transcripts from this gene lack polyA tails but instead contain a palindromic termination element. This gene is found in the large histone gene cluster on chromosome 6. NA
CD7 924 CD7 molecule ENSG00000173762 This gene encodes a transmembrane protein which is a member of the immunoglobulin superfamily. This protein is found on thymocytes and mature T cells. It plays an essential role in T-cell interactions and also in T-cell/B-cell interaction during early lymphoid development. NA
RP4-800J21.3 ENSG00000218018 NA ENSG00000218018 NA NA
FECH 2235 ferrochelatase ENSG00000066926 The protein encoded by this gene is localized to the mitochondrion, where it catalyzes the insertion of the ferrous form of iron into protoporphyrin IX in the heme synthesis pathway. Mutations in this gene are associated with erythropoietic protoporphyria. Two transcript variants encoding different isoforms have been found for this gene. A pseudogene of this gene is found on chromosome 3. NA
RHCE 6006 Rh blood group CcEe antigens ENSG00000188672 The Rh blood group system is the second most clinically significant of the blood groups, second only to ABO. It is also the most polymorphic of the blood groups, with variations due to deletions, gene conversions, and missense mutations. The Rh blood group includes this gene which encodes both the RhC and RhE antigens on a single polypeptide and a second gene which encodes the RhD protein. The classification of Rh-positive and Rh-negative individuals is determined by the presence or absence of the highly immunogenic RhD protein on the surface of erythrocytes. A mutation in this gene results in amorph-type Rh-null disease. Alternative splicing of this gene results in four transcript variants encoding four different isoforms. NA
HBE1 3046 hemoglobin subunit epsilon 1 ENSG00000213931 The epsilon globin gene (HBE) is normally expressed in the embryonic yolk sac: two epsilon chains together with two zeta chains (an alpha-like globin) constitute the embryonic hemoglobin Hb Gower I; two epsilon chains together with two alpha chains form the embryonic Hb Gower II. Both of these embryonic hemoglobins are normally supplanted by fetal, and later, adult hemoglobin. The five beta-like globin genes are found within a 45 kb cluster on chromosome 11 in the following order: 5’-epsilon - G-gamma - A-gamma - delta - beta-3’ NA
RHD 6007 Rh blood group D antigen ENSG00000187010 The Rh blood group system is the second most clinically significant of the blood groups, second only to ABO. It is also the most polymorphic of the blood groups, with variations due to deletions, gene conversions, and missense mutations. The Rh blood group includes this gene, which encodes the RhD protein, and a second gene that encodes both the RhC and RhE antigens on a single polypeptide. The two genes, and a third unrelated gene, are found in a cluster on chromosome 1. The classification of Rh-positive and Rh-negative individuals is determined by the presence or absence of the highly immunogenic RhD protein on the surface of erythrocytes. Multiple transcript variants encoding different isoforms have been found for this gene. NA
AGO2 27161 argonaute 2, RISC catalytic component ENSG00000123908 This gene encodes a member of the Argonaute family of proteins which play a role in RNA interference. The encoded protein is highly basic, and contains a PAZ domain and a PIWI domain. It may interact with dicer1 and play a role in short-interfering-RNA-mediated gene silencing. Multiple transcript variants encoding different isoforms have been found for this gene. NA
PRDX3P1 ENSG00000229598 peroxiredoxin 3 pseudogene 1 ENSG00000229598 NA NA
FTH1 2495 ferritin, heavy polypeptide 1 ENSG00000167996 This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. NA
JAZF1 221895 JAZF zinc finger 1 ENSG00000153814 This gene encodes a nuclear protein with three C2H2-type zinc fingers, and functions as a transcriptional repressor. Chromosomal aberrations involving this gene are associated with endometrial stromal tumors. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized NA
HIST2H2AC 8338 histone cluster 2, H2ac ENSG00000184260 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Two molecules of each of the four core histones (H2A, H2B, H3, and H4) form an octamer, around which approximately 146 bp of DNA is wrapped in repeating units, called nucleosomes. The linker histone, H1, interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher order structures. This gene is intronless and encodes a replication-dependent histone that is a member of the histone H2A family. NA
HIST1H1E 3008 histone cluster 1, H1e ENSG00000168298 Histones are basic nuclear proteins responsible for nucleosome structure of the chromosomal fiber in eukaryotes. Two molecules of each of the four core histones (H2A, H2B, H3, and H4) form an octamer, around which approximately 146 bp of DNA is wrapped in repeating units, called nucleosomes. The linker histone, H1, interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher order structures. This gene is intronless and encodes a replication-dependent histone that is a member of the histone H1 family. Transcripts from this gene lack polyA tails but instead contain a palindromic termination element. This gene is found in the large histone gene cluster on chromosome 6. NA
FTH1P7 ENSG00000232187 ferritin, heavy polypeptide 1 pseudogene 7 ENSG00000232187 NA NA
NA NA NA ENSG00000183558 NA TRUE
NA NA NA ENSG00000168274 NA TRUE
SCARNA17 ENSG00000251992 small Cajal body-specific RNA 17 ENSG00000251992 NA NA
SMIM1 388588 small integral membrane protein 1 (Vel blood group) ENSG00000235169 This gene encodes a small, conserved protein that participates in red blood cell formation. The encoded protein is localized to the cell membrane and is the antigen for the Vel blood group. Alternative splicing results in different transcript variants that encode the same protein. NA
SIAH2-AS1 ENSG00000244265 SIAH2 antisense RNA 1 ENSG00000244265 NA NA
CDC34 997 cell division cycle 34 ENSG00000099804 The protein encoded by this gene is a member of the ubiquitin-conjugating enzyme family. Ubiquitin-conjugating enzyme catalyzes the covalent attachment of ubiquitin to other proteins. This protein is a part of the large multiprotein complex, which is required for ubiquitin-mediated degradation of cell cycle G1 regulators, and for the initiation of DNA replication. NA
HILPDA 29923 hypoxia inducible lipid droplet associated ENSG00000135245 NA NA
MKRN1 23608 makorin ring finger protein 1 ENSG00000133606 This gene encodes a protein that belongs to a novel class of zinc finger proteins. The encoded protein functions as a transcriptional co-regulator, and as an E3 ubiquitin ligase that promotes the ubiquitination and proteasomal degradation of target proteins. The protein encoded by this gene is thought to regulate RNA polymerase II-catalyzed transcription. Substrates for this protein’s E3 ubiquitin ligase activity include the capsid protein of the West Nile virus and the catalytic subunit of the telomerase ribonucleoprotein. This protein controls cell cycle arrest and apoptosis by regulating p21, a cell cycle regulator, and the tumor suppressor protein p53. Pseudogenes of this gene are present on chromosomes 1, 3, 9, 12 and 20, and on the X chromosome. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
RP11-68I3.11 ENSG00000265625 NA ENSG00000265625 NA NA
FAM117A 81558 family with sequence similarity 117 member A ENSG00000121104 NA NA
CD160 11126 CD160 molecule ENSG00000117281 CD160 is an 27 kDa glycoprotein which was initially identified with the monoclonal antibody BY55. Its expression is tightly associated with peripheral blood NK cells and CD8 T lymphocytes with cytolytic effector activity. The cDNA sequence of CD160 predicts a cysteine-rich, glycosylphosphatidylinositol-anchored protein of 181 amino acids with a single Ig-like domain weakly homologous to KIR2DL4 molecule. CD160 is expressed at the cell surface as a tightly disulfide-linked multimer. RNA blot analysis revealed CD160 mRNAs of 1.5 and 1.6 kb whose expression was highly restricted to circulating NK and T cells, spleen and small intestine. Within NK cells CD160 is expressed by CD56dimCD16+ cells whereas among circulating T cells its expression is mainly restricted to TCRgd bearing cells and to TCRab+CD8brightCD95+CD56+CD28-CD27-cells. In tissues, CD160 is expressed on all intestinal intraepithelial lymphocytes. CD160 shows a broad specificity for binding to both classical and nonclassical MHC class I molecules. NA
FTH1P16 ENSG00000227376 ferritin, heavy polypeptide 1 pseudogene 16 ENSG00000227376 NA NA
RP11-155G14.6 ENSG00000240758 NA ENSG00000240758 NA NA
NA NA NA ENSG00000197697 NA TRUE
AC068580.5 ENSG00000229512 NA ENSG00000229512 NA NA
NA NA NA ENSG00000203583 NA TRUE
RP11-613C6.2 ENSG00000250751 NA ENSG00000250751 NA NA
FTH1P23 ENSG00000242960 ferritin, heavy polypeptide 1 pseudogene 23 ENSG00000242960 NA NA
RP11-22N19.2 ENSG00000273320 NA ENSG00000273320 NA NA
KLRG1 10219 killer cell lectin like receptor G1 ENSG00000139187 Natural killer (NK) cells are lymphocytes that can mediate lysis of certain tumor cells and virus-infected cells without previous activation. They can also regulate specific humoral and cell-mediated immunity. The protein encoded by this gene belongs to the killer cell lectin-like receptor (KLR) family, which is a group of transmembrane proteins preferentially expressed in NK cells. Studies in mice suggested that the expression of this gene may be regulated by MHC class I molecules. Alternatively spliced transcript variants have been reported, but their full-length natures have not yet been determined. NA
GLRX5 51218 glutaredoxin 5 ENSG00000182512 This gene encodes a mitochondrial protein, which is evolutionarily conserved. It is involved in the biogenesis of iron-sulfur clusters, which are required for normal iron homeostasis. Mutations in this gene are associated with autosomal recessive pyridoxine-refractory sideroblastic anemia. NA
FAM86B3P 286042 family with sequence similarity 86, member A pseudogene ENSG00000173295 NA NA
CTD-2139B15.2 ENSG00000248223 NA ENSG00000248223 NA NA
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
NA NA NA NA NA TRUE
# Save the Ensembl gene IDs queried for cluster 19, one per line.
write.table(as.factor(out$query), paste0("../utilities/gene_names_clus_", 19, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
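
The cluster 19 table above ends with several rows in which every field, including query, is NA, so the file written from out$query would contain literal NA lines. As a minimal sketch (not the original pipeline), the snippet below shows one way to drop those rows before writing and to list separately the IDs that mygene could not resolve. The data frame out_demo, the names valid_ids and unresolved_ids, and the temporary output path are hypothetical stand-ins; only the query and notfound columns seen in the table are assumed.

```r
# Hypothetical stand-in for the data frame returned by mygene::queryMany();
# only the `query`, `symbol` and `notfound` columns from the table are assumed.
out_demo <- data.frame(
  query    = c("ENSG00000182512", "ENSG00000183558", NA, NA),
  symbol   = c("GLRX5", NA, NA, NA),
  notfound = c(NA, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Keep every row that carries a real Ensembl ID, whether or not it was annotated.
valid_ids <- out_demo$query[!is.na(out_demo$query)]

# IDs that were queried but not resolved. `notfound` is NA for successful
# matches, so compare against TRUE explicitly instead of negating the column.
unresolved_ids <- out_demo$query[!is.na(out_demo$query) & out_demo$notfound %in% TRUE]

# Write one ID per line to an illustrative temporary file.
out_file <- file.path(tempdir(), "gene_names_clus_demo.txt")
write.table(valid_ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unresolved IDs should also be excluded depends on how the downstream tool treats unknown identifiers, so the sketch keeps them in the file and only reports them.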

Cluster 20 Annotations

# Look up name, summary and symbol for the top driving genes of cluster 20 by Ensembl gene ID.
out <- mygene::queryMany(gene_list[20,], scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
query symbol summary X_id name
ENSG00000163631 ALB Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. 213 albumin
ENSG00000257017 HP This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. 3240 haptoglobin
ENSG00000171564 FGB The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 2244 fibrinogen beta chain
ENSG00000171560 FGA This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. 2243 fibrinogen alpha chain
ENSG00000229314 ORM1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. 5004 orosomucoid 1
ENSG00000132693 CRP The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. 1401 C-reactive protein, pentraxin-related
ENSG00000171557 FGG The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. 2266 fibrinogen gamma chain
ENSG00000197249 SERPINA1 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. 5265 serpin family A member 1
ENSG00000134339 SAA2 NA 6289 serum amyloid A2
ENSG00000130649 CYP2E1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is induced by ethanol, the diabetic state, and starvation. The enzyme metabolizes both endogenous substrates, such as ethanol, acetone, and acetal, as well as exogenous substrates including benzene, carbon tetrachloride, ethylene glycol, and nitrosamines which are premutagens found in cigarette smoke. Due to its many substrates, this enzyme may be involved in such varied processes as gluconeogenesis, hepatic cirrhosis, diabetes, and cancer. 1571 cytochrome P450 family 2 subfamily E member 1
ENSG00000110169 HPX This gene encodes a plasma glycoprotein that binds heme with high affinity. The encoded protein is an acute phase protein that transports heme from the plasma to the liver and may be involved in protecting cells from oxidative stress. 3263 hemopexin
ENSG00000118137 APOA1 This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. 335 apolipoprotein A-I
ENSG00000173432 SAA1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. 6288 serum amyloid A1
ENSG00000091583 APOH Apolipoprotein H has been implicated in a variety of physiologic pathways including lipoprotein metabolism, coagulation, and the production of antiphospholipid autoantibodies. APOH may be a required cofactor for anionic phospholipid binding by the antiphospholipid autoantibodies found in sera of many patients with lupus and primary antiphospholipid syndrome, but it does not seem to be required for the reactivity of antiphospholipid autoantibodies associated with infections. 350 apolipoprotein H
ENSG00000110245 APOC3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-1 and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. 345 apolipoprotein C-III
ENSG00000255071 SAA2-SAA4 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. 100528017 SAA2-SAA4 readthrough
ENSG00000158874 APOA2 This gene encodes apolipoprotein (apo-) A-II, which is the second most abundant protein of the high density lipoprotein particles. The protein is found in plasma as a monomer, homodimer, or heterodimer with apolipoprotein D. Defects in this gene may result in apolipoprotein A-II deficiency or hypercholesterolemia. 336 apolipoprotein A-II
ENSG00000106927 AMBP This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. 259 alpha-1-microglobulin/bikunin precursor
ENSG00000125730 C3 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. 718 complement component 3
ENSG00000138207 RBP4 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. 5950 retinol binding protein 4
ENSG00000021826 CPS1 The mitochondrial enzyme encoded by this gene catalyzes synthesis of carbamoyl phosphate from ammonia and bicarbonate. This reaction is the first committed step of the urea cycle, which is important in the removal of excess urea from cells. The encoded protein may also represent a core mitochondrial nucleoid protein. Three transcript variants encoding different isoforms have been found for this gene. The shortest isoform may not be localized to the mitochondrion. Mutations in this gene have been associated with carbamoyl phosphate synthetase deficiency, susceptibility to persistent pulmonary hypertension, and susceptibility to venoocclusive disease after bone marrow transplantation. 1373 carbamoyl-phosphate synthase 1
ENSG00000109072 VTN The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. 7448 vitronectin
ENSG00000167711 SERPINF2 This gene encodes a member of the serpin family of serine protease inhibitors. The protein is a major inhibitor of plasmin, which degrades fibrin and various other proteins. Consequently, the proper function of this gene has a major role in regulating the blood clotting pathway. Mutations in this gene result in alpha-2-plasmin inhibitor deficiency, which is characterized by severe hemorrhagic diathesis. Multiple transcript variants encoding different isoforms have been found for this gene. 5345 serpin family F member 2
ENSG00000158104 HPD The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. 3242 4-hydroxyphenylpyruvate dioxygenase
ENSG00000121410 A1BG The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. 1 alpha-1-B glycoprotein
ENSG00000136872 ALDOB Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. 229 aldolase, fructose-bisphosphate B
ENSG00000138115 CYP2C8 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and its expression is induced by phenobarbital. The enzyme is known to metabolize many xenobiotics, including the anticonvulsive drug mephenytoin, benzo(a)pyrene, 7-ethyoxycoumarin, and the anti-cancer drug taxol. This gene is located within a cluster of cytochrome P450 genes on chromosome 10q24. Several transcript variants encoding a few different isoforms have been found for this gene. 1558 cytochrome P450 family 2 subfamily C member 8
ENSG00000130208 APOC1 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within a apolipoprotein gene cluster. 341 apolipoprotein C-I
ENSG00000169136 ATF5 NA 22809 activating transcription factor 5
ENSG00000118271 TTR This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. 7276 transthyretin
ENSG00000106327 TFR2 This gene encodes a single-pass type II membrane protein, which is a member of the transferrin receptor-like family. This protein mediates cellular uptake of transferrin-bound iron, and may be involved in iron metabolism, hepatocyte function and erythrocyte differentiation. Mutations in this gene have been associated with hereditary hemochromatosis type III. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 7036 transferrin receptor 2
ENSG00000268230 CTD-2619J13.8 NA ENSG00000268230 NA
ENSG00000162267 ITIH3 This gene encodes the heavy chain subunit of the pre-alpha-trypsin inhibitor complex. This complex may stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with increased risk for schizophrenia and major depressive disorder. This gene is present in an inter-alpha-trypsin inhibitor family gene cluster on chromosome 3. 3699 inter-alpha-trypsin inhibitor heavy chain 3
ENSG00000139547 RDH16 NA 8608 retinol dehydrogenase 16 (all-trans)
ENSG00000135744 AGT The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active enzyme angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease. 183 angiotensinogen
ENSG00000124253 PCK1 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. 5105 phosphoenolpyruvate carboxykinase 1
ENSG00000175003 SLC22A1 Polyspecific organic cation transporters in the liver, kidney, intestine, and other organs are critical for elimination of many endogenous small organic cations as well as a wide array of drugs and environmental toxins. This gene is one of three similar cation transporter genes located in a cluster on chromosome 6. The encoded protein contains twelve putative transmembrane domains and is a plasma integral membrane protein. Two transcript variants encoding two different isoforms have been found for this gene, but only the longer variant encodes a functional transporter. 6580 solute carrier family 22 member 1
ENSG00000141505 ASGR1 This gene encodes a subunit of the asialoglycoprotein receptor. This receptor is a transmembrane protein that plays a critical role in serum glycoprotein homeostasis by mediating the endocytosis and lysosomal degradation of glycoproteins with exposed terminal galactose or N-acetylgalactosamine residues. The asialoglycoprotein receptor may facilitate hepatic infection by multiple viruses including hepatitis B, and is also a target for liver-specific drug delivery. The asialoglycoprotein receptor is a hetero-oligomeric protein composed of major and minor subunits, which are encoded by different genes. The protein encoded by this gene is the more abundant major subunit. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 432 asialoglycoprotein receptor 1
ENSG00000160282 FTCD The protein encoded by this gene is a bifunctional enzyme that channels 1-carbon units from formiminoglutamate, a metabolite of the histidine degradation pathway, to the folate pool. Mutations in this gene are associated with glutamate formiminotransferase deficiency. Alternatively spliced transcript variants have been found for this gene. 10841 formimidoyltransferase cyclodeaminase
ENSG00000047457 CP The protein encoded by this gene is a metalloprotein that binds most of the copper in plasma and is involved in the peroxidation of Fe(II)transferrin to Fe(III) transferrin. Mutations in this gene cause aceruloplasminemia, which results in iron accumulation and tissue damage, and is associated with diabetes and neurologic abnormalities. Two transcript variants, one protein-coding and the other not protein-coding, have been found for this gene. 1356 ceruloplasmin (ferroxidase)
ENSG00000083807 SLC27A5 The protein encoded by this gene is an isozyme of very long-chain acyl-CoA synthetase (VLCS). It is capable of activating very long-chain fatty-acids containing 24- and 26-carbons. It is expressed in liver and associated with endoplasmic reticulum but not with peroxisomes. Its primary role is in fatty acid elongation or complex lipid synthesis rather than in degradation. This gene has a mouse ortholog. 10998 solute carrier family 27 member 5
ENSG00000135094 SDS This gene encodes one of three enzymes that are involved in metabolizing serine and glycine. L-serine dehydratase converts L-serine to pyruvate and ammonia and requires pyridoxal phosphate as a cofactor. The encoded protein can also metabolize threonine to NH4+ and 2-ketobutyrate. The encoded protein is found predominantly in the liver. 10993 serine dehydratase
ENSG00000235910 APOA1-AS NA 104326055 APOA1 antisense RNA
ENSG00000091513 TF This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. 7018 transferrin
ENSG00000106804 C5 This gene encodes a component of the complement system, a part of the innate immune system that plays an important role in inflammation, host homeostasis, and host defense against pathogens. The encoded preproprotein is proteolytically processed to generate multiple protein products, including the C5 alpha chain, C5 beta chain, C5a anaphylatoxin and C5b. The C5 protein is comprised of the C5 alpha and beta chains, which are linked by a disulfide bridge. Cleavage of the alpha chain by a convertase enzyme results in the formation of the C5a anaphylatoxin, which possesses potent spasmogenic and chemotactic activity, and the C5b macromolecular cleavage product, a subunit of the membrane attack complex (MAC). Mutations in this gene cause complement component 5 deficiency, a disease characterized by recurrent bacterial infections. Alternative splicing results in multiple transcript variants. 727 complement component 5
ENSG00000173531 MST1 The protein encoded by this gene contains four kringle domains and a serine protease domain, similar to that found in hepatic growth factor. Despite the presence of the serine protease domain, the encoded protein may not have any proteolytic activity. The receptor for this protein is RON tyrosine kinase, which upon activation stimulates ciliary motility of ciliated epithelial lung cells. This protein is secreted and cleaved to form an alpha chain and a beta chain bridged by disulfide bonds. 4485 macrophage stimulating 1
ENSG00000100197 CYP2D6 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is known to metabolize as many as 25% of commonly prescribed drugs. Its substrates include antidepressants, antipsychotics, analgesics and antitussives, beta adrenergic blocking agents, antiarrhythmics and antiemetics. The gene is highly polymorphic in the human population; certain alleles result in the poor metabolizer phenotype, characterized by a decreased ability to metabolize the enzyme’s substrates. Some individuals with the poor metabolizer phenotype have no functional protein since they carry 2 null alleles whereas in other individuals the gene is absent. This gene can vary in copy number and individuals with the ultrarapid metabolizer phenotype can have 3 or more active copies of the gene. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 1565 cytochrome P450 family 2 subfamily D member 6
ENSG00000025423 HSD17B6 The protein encoded by this gene has both oxidoreductase and epimerase activities and is involved in androgen catabolism. The oxidoreductase activity can convert 3 alpha-adiol to dihydrotestosterone, while the epimerase activity can convert androsterone to epi-androsterone. Both reactions use NAD+ as the preferred cofactor. This gene is a member of the retinol dehydrogenase family. 8630 hydroxysteroid (17-beta) dehydrogenase 6
ENSG00000169738 DCXR The protein encoded by this gene acts as a homotetramer to catalyze diacetyl reductase and L-xylulose reductase reactions. The encoded protein may play a role in the uronate cycle of glucose metabolism and in the cellular osmoregulation in the proximal renal tubules. Defects in this gene are a cause of pentosuria. Two transcript variants encoding different isoforms have been found for this gene. 51181 dicarbonyl/L-xylulose reductase
ENSG00000176919 C8G The protein encoded by this gene belongs to the lipocalin family. It is one of the three subunits that constitutes complement component 8 (C8), which is composed of a disulfide-linked C8 alpha-gamma heterodimer and a non-covalently associated C8 beta chain. C8 participates in the formation of the membrane attack complex (MAC) on bacterial cell membranes. While subunits alpha and beta play a role in complement-mediated bacterial killing, the gamma subunit is not required for the bactericidal activity. 733 complement component 8, gamma polypeptide
ENSG00000160862 AZGP1 NA 563 alpha-2-glycoprotein 1, zinc-binding
ENSG00000179761 PIPOX NA 51268 pipecolic acid and sarcosine oxidase
ENSG00000130707 ASS1 The protein encoded by this gene catalyzes the penultimate step of the arginine biosynthetic pathway. There are approximately 10 to 14 copies of this gene including the pseudogenes scattered across the human genome, among which the one located on chromosome 9 appears to be the only functional gene for argininosuccinate synthetase. Mutations in the chromosome 9 copy of this gene cause citrullinemia. Two transcript variants encoding the same protein have been found for this gene. 445 argininosuccinate synthase 1
ENSG00000130203 APOE The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. 348 apolipoprotein E
ENSG00000002933 TMEM176A NA 55365 transmembrane protein 176A
ENSG00000106565 TMEM176B NA 28959 transmembrane protein 176B
ENSG00000186301 MST1P2 NA ENSG00000186301 macrophage stimulating 1 (hepatocyte growth factor-like) pseudogene 2
ENSG00000023839 ABCC2 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This protein is expressed in the canalicular (apical) part of the hepatocyte and functions in biliary transport. Substrates include anticancer drugs such as vinblastine; therefore, this protein appears to contribute to drug resistance in mammalian cells. Several different mutations in this gene have been observed in patients with Dubin-Johnson syndrome (DJS), an autosomal recessive disorder characterized by conjugated hyperbilirubinemia. 1244 ATP binding cassette subfamily C member 2
ENSG00000125144 MT1G NA 4495 metallothionein 1G
ENSG00000159403 C1R NA 715 complement C1r subcomponent
ENSG00000171236 LRG1 The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). 116844 leucine-rich alpha-2-glycoprotein 1
ENSG00000198848 CES1 This gene encodes a member of the carboxylesterase large family. The family members are responsible for the hydrolysis or transesterification of various xenobiotics, such as cocaine and heroin, and endogenous substrates with ester, thioester, or amide bonds. They may participate in fatty acyl and cholesterol ester metabolism, and may play a role in the blood-brain barrier system. This enzyme is the major liver enzyme and functions in liver drug clearance. Mutations of this gene cause carboxylesterase 1 deficiency. Three transcript variants encoding three different isoforms have been found for this gene. 1066 carboxylesterase 1
ENSG00000113790 EHHADH The protein encoded by this gene is a bifunctional enzyme and is one of the four enzymes of the peroxisomal beta-oxidation pathway. The N-terminal region of the encoded protein contains enoyl-CoA hydratase activity while the C-terminal region contains 3-hydroxyacyl-CoA dehydrogenase activity. Defects in this gene are a cause of peroxisomal disorders such as Zellweger syndrome. Two transcript variants encoding different isoforms have been found for this gene. 1962 enoyl-CoA, hydratase/3-hydroxyacyl CoA dehydrogenase
ENSG00000009724 MASP2 This gene encodes a member of the peptidase S1 family of serine proteases. The encoded preproprotein is proteolytically processed to generate A and B chains that heterodimerize to form the mature protease. This protease cleaves complement components C2 and C4 in order to generate C3 convertase in the lectin pathway of the complement system. The encoded protease also plays a role in the coagulation cascade through cleavage of prothrombin to form thrombin. Myocardial infarction and acute stroke patients exhibit reduced serum concentrations of the encoded protein. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. 10747 mannan binding lectin serine peptidase 2
ENSG00000159423 ALDH4A1 This protein belongs to the aldehyde dehydrogenase family of proteins. This enzyme is a mitochondrial matrix NAD-dependent dehydrogenase which catalyzes the second step of the proline degradation pathway, converting pyrroline-5-carboxylate to glutamate. Deficiency of this enzyme is associated with type II hyperprolinemia, an autosomal recessive disorder characterized by accumulation of delta-1-pyrroline-5-carboxylate (P5C) and proline. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. 8659 aldehyde dehydrogenase 4 family member A1
ENSG00000118514 ALDH8A1 This protein belongs to the aldehyde dehydrogenases family of proteins. It plays a role in a pathway of 9-cis-retinoic acid biosynthesis in vivo. This enzyme converts 9-cis-retinal into the retinoid X receptor ligand 9-cis-retinoic acid, and has approximately 40-fold higher activity with 9-cis-retinal than with all-trans-retinal. Therefore, it is the first known aldehyde dehydrogenase to show a preference for 9-cis-retinal relative to all-trans-retinal. Three transcript variants encoding distinct protein isoforms have been identified for this gene. 64577 aldehyde dehydrogenase 8 family member A1
ENSG00000113924 HGD This gene encodes the enzyme homogentisate 1,2 dioxygenase. This enzyme is involved in the catabolism of the amino acids tyrosine and phenylalanine. Mutations in this gene are the cause of the autosomal recessive metabolism disorder alkaptonuria. 3081 homogentisate 1,2-dioxygenase
ENSG00000176974 SHMT1 This gene encodes the cytosolic form of serine hydroxymethyltransferase, a pyridoxal phosphate-containing enzyme that catalyzes the reversible conversion of serine and tetrahydrofolate to glycine and 5,10-methylene tetrahydrofolate. This reaction provides one-carbon units for synthesis of methionine, thymidylate, and purines in the cytoplasm. This gene is located within the Smith-Magenis syndrome region on chromosome 17. A pseudogene of this gene is located on the short arm of chromosome 1. Alternative splicing results in multiple transcript variants. 6470 serine hydroxymethyltransferase 1
ENSG00000168237 GLYCTK This locus encodes a member of the glycerate kinase type-2 family. The encoded enzyme catalyzes the phosphorylation of (R)-glycerate and may be involved in serine degradation and fructose metabolism. Decreased activity of the encoded enzyme may be associated with the disease D-glyceric aciduria. Alternatively spliced transcript variants have been described. 132158 glycerate kinase
ENSG00000182326 C1S This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. 716 complement component 1, s subcomponent
ENSG00000127884 ECHS1 The protein encoded by this gene functions in the second step of the mitochondrial fatty acid beta-oxidation pathway. It catalyzes the hydration of 2-trans-enoyl-coenzyme A (CoA) intermediates to L-3-hydroxyacyl-CoAs. The gene product is a member of the hydratase/isomerase superfamily. It localizes to the mitochondrial matrix. Transcript variants utilizing alternative transcription initiation sites have been described in the literature. 1892 enoyl-CoA hydratase, short chain, 1, mitochondrial
ENSG00000000971 CFH This gene is a member of the Regulator of Complement Activation (RCA) gene cluster and encodes a protein with twenty short consensus repeat (SCR) domains. This protein is secreted into the bloodstream and has an essential role in the regulation of complement activation, restricting this innate defense mechanism to microbial infections. Mutations in this gene have been associated with hemolytic-uremic syndrome (HUS) and chronic hypocomplementemic nephropathy. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 3075 complement factor H
ENSG00000196177 ACADSB Short/branched chain acyl-CoA dehydrogenase(ACADSB) is a member of the acyl-CoA dehydrogenase family of enzymes that catalyze the dehydrogenation of acyl-CoA derivatives in the metabolism of fatty acids or branch chained amino acids. Substrate specificity is the primary characteristic used to define members of this gene family. The ACADSB gene product has the greatest activity towards the short branched chain acyl-CoA derivative, (S)-2-methylbutyryl-CoA, but also reacts significantly with other 2-methyl branched chain substrates and with short straight chain acyl-CoAs. The cDNA encodes for a mitochondrial precursor protein which is cleaved upon mitochondrial import and predicted to yield a mature peptide of approximately 43.7-KDa. 36 acyl-CoA dehydrogenase, short/branched chain
ENSG00000239799 ITIH4-AS1 NA 100873993 ITIH4 antisense RNA 1
ENSG00000105697 HAMP The product encoded by this gene is involved in the maintenance of iron homeostasis, and it is necessary for the regulation of iron storage in macrophages, and for intestinal iron absorption. The preproprotein is post-translationally cleaved into mature peptides of 20, 22 and 25 amino acids, and these active peptides are rich in cysteines, which form intramolecular bonds that stabilize their beta-sheet structures. These peptides exhibit antimicrobial activity against bacteria and fungi. Mutations in this gene cause hemochromatosis type 2B, also known as juvenile hemochromatosis, a disease caused by severe iron overload that results in cardiomyopathy, cirrhosis, and endocrine failure. 57817 hepcidin antimicrobial peptide
ENSG00000135929 CYP27A1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This mitochondrial protein oxidizes cholesterol intermediates as part of the bile synthesis pathway. Since the conversion of cholesterol to bile acids is the major route for removing cholesterol from the body, this protein is important for overall cholesterol homeostasis. Mutations in this gene cause cerebrotendinous xanthomatosis, a rare autosomal recessive lipid storage disease. 1593 cytochrome P450 family 27 subfamily A member 1
ENSG00000186480 INSIG1 Oxysterols regulate cholesterol homeostasis through the liver X receptor (LXR)- and sterol regulatory element-binding protein (SREBP)-mediated signaling pathways. This gene is an insulin-induced gene. It encodes an endoplasmic reticulum (ER) membrane protein that plays a critical role in regulating cholesterol concentrations in cells. This protein binds to the sterol-sensing domains of SREBP cleavage-activating protein (SCAP) and HMG CoA reductase, and is essential for the sterol-mediated trafficking of the two proteins. Alternatively spliced transcript variants encoding distinct isoforms have been observed. 3638 insulin induced gene 1
ENSG00000104635 SLC39A14 Zinc is an essential cofactor for hundreds of enzymes. It is involved in protein, nucleic acid, carbohydrate, and lipid metabolism, as well as in the control of gene transcription, growth, development, and differentiation. SLC39A14 belongs to a subfamily of proteins that show structural characteristics of zinc transporters (Taylor and Nicholson, 2003 [PubMed 12659941]). 23516 solute carrier family 39 member 14
ENSG00000164406 LEAP2 This gene encodes a cysteine-rich cationic antimicrobial peptide that is expressed predominantly in the liver. The mature peptide has activity against gram-positive bacteria and yeasts. 116842 liver expressed antimicrobial peptide 2
ENSG00000116171 SCP2 This gene encodes two proteins: sterol carrier protein X (SCPx) and sterol carrier protein 2 (SCP2), as a result of transcription initiation from 2 independently regulated promoters. The transcript initiated from the proximal promoter encodes the longer SCPx protein, and the transcript initiated from the distal promoter encodes the shorter SCP2 protein, with the 2 proteins sharing a common C-terminus. Evidence suggests that the SCPx protein is a peroxisome-associated thiolase that is involved in the oxidation of branched chain fatty acids, while the SCP2 protein is thought to be an intracellular lipid transfer protein. This gene is highly expressed in organs involved in lipid metabolism, and may play a role in Zellweger syndrome, in which cells are deficient in peroxisomes and have impaired bile acid synthesis. Alternative splicing of this gene produces multiple transcript variants, some encoding different isoforms. 6342 sterol carrier protein 2
ENSG00000205702 CYP2D7 NA ENSG00000205702 cytochrome P450 family 2 subfamily D member 7 (gene/pseudogene)
ENSG00000149131 SERPING1 This gene encodes a highly glycosylated plasma protein involved in the regulation of the complement cascade. Its protein inhibits activated C1r and C1s of the first complement component and thus regulates complement activation. Deficiency of this protein is associated with hereditary angioneurotic oedema (HANE). Alternative splicing results in multiple transcript variants encoding the same isoform. 710 serpin family G member 1
ENSG00000117594 HSD11B1 The protein encoded by this gene is a microsomal enzyme that catalyzes the conversion of the stress hormone cortisol to the inactive metabolite cortisone. In addition, the encoded protein can catalyze the reverse reaction, the conversion of cortisone to cortisol. Too much cortisol can lead to central obesity, and a particular variation in this gene has been associated with obesity and insulin resistance in children. Mutations in this gene and H6PD (hexose-6-phosphate dehydrogenase (glucose 1-dehydrogenase)) are the cause of cortisone reductase deficiency. Alternate splicing results in multiple transcript variants encoding the same protein. 3290 hydroxysteroid (11-beta) dehydrogenase 1
ENSG00000138356 AOX1 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. 316 aldehyde oxidase 1
ENSG00000132837 DMGDH This gene encodes an enzyme involved in the catabolism of choline, catalyzing the oxidative demethylation of dimethylglycine to form sarcosine. The enzyme is found as a monomer in the mitochondrial matrix, and uses flavin adenine dinucleotide and folate as cofactors. Mutation in this gene causes dimethylglycine dehydrogenase deficiency, characterized by a fishlike body odor, chronic muscle fatigue, and elevated levels of the muscle form of creatine kinase in serum. Alternative splicing results in multiple transcript variants. 29958 dimethylglycine dehydrogenase
ENSG00000132541 RIDA NA 10247 reactive intermediate imine deaminase A homolog
ENSG00000111275 ALDH2 This protein belongs to the aldehyde dehydrogenase family of proteins. Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism. Two major liver isoforms of aldehyde dehydrogenase, cytosolic and mitochondrial, can be distinguished by their electrophoretic mobilities, kinetic properties, and subcellular localizations. Most Caucasians have two major isozymes, while approximately 50% of Orientals have the cytosolic isozyme but not the mitochondrial isozyme. A remarkably higher frequency of acute alcohol intoxication among Orientals than among Caucasians could be related to the absence of a catalytically active form of the mitochondrial isozyme. The increased exposure to acetaldehyde in individuals with the catalytically inactive form may also confer greater susceptibility to many types of cancer. This gene encodes a mitochondrial isoform, which has a low Km for acetaldehydes, and is localized in mitochondrial matrix. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 217 aldehyde dehydrogenase 2 family (mitochondrial)
ENSG00000204444 APOM The protein encoded by this gene is an apolipoprotein and member of the lipocalin protein family. It is found associated with high density lipoproteins and to a lesser extent with low density lipoproteins and triglyceride-rich lipoproteins. The encoded protein is secreted through the plasma membrane but remains membrane-bound, where it is involved in lipid transport. Alternate splicing results in both coding and non-coding variants of this gene. 55937 apolipoprotein M
ENSG00000170509 HSD17B13 NA 345275 hydroxysteroid (17-beta) dehydrogenase 13
ENSG00000073849 ST6GAL1 This gene encodes a member of glycosyltransferase family 29. The encoded protein is a type II membrane protein that catalyzes the transfer of sialic acid from CMP-sialic acid to galactose-containing substrates. The protein, which is normally found in the Golgi but can be proteolytically processed to a soluble form, is involved in the generation of the cell-surface carbohydrate determinants and differentiation antigens HB-6, CD75, and CD76. This gene has been incorrectly referred to as CD75. Three transcript variants encoding two different isoforms have been described. 6480 ST6 beta-galactosamide alpha-2,6-sialyltranferase 1
ENSG00000167315 ACAA2 The encoded protein catalyzes the last step of the mitochondrial fatty acid beta-oxidation spiral. Unlike most mitochondrial matrix proteins, it contains a non-cleavable amino-terminal targeting signal. 10449 acetyl-CoA acyltransferase 2
ENSG00000139344 AMDHD1 NA 144193 amidohydrolase domain containing 1
ENSG00000115107 STEAP3 This gene encodes a multipass membrane protein that functions as an iron transporter. The encoded protein can reduce both iron (Fe3+) and copper (Cu2+) cations. This protein may mediate downstream responses to p53, including promoting apoptosis. Deficiency in this gene can cause anemia. Alternative splicing results in multiple transcript variants. 55240 STEAP3 metalloreductase
ENSG00000137713 PPP2R1B This gene encodes a constant regulatory subunit of protein phosphatase 2. Protein phosphatase 2 is one of the four major Ser/Thr phosphatases, and it is implicated in the negative control of cell growth and division. It consists of a common heteromeric core enzyme, which is composed of a catalytic subunit and a constant regulatory subunit, that associates with a variety of regulatory subunits. The constant regulatory subunit A serves as a scaffolding molecule to coordinate the assembly of the catalytic subunit and a variable regulatory B subunit. This gene encodes a beta isoform of the constant regulatory subunit A. Mutations in this gene have been associated with some lung and colon cancers. Alternatively spliced transcript variants have been described. 5519 protein phosphatase 2 regulatory subunit A, beta
ENSG00000166741 NNMT N-methylation is one method by which drug and other xenobiotic compounds are metabolized by the liver. This gene encodes the protein responsible for this enzymatic activity which uses S-adenosyl methionine as the methyl donor. 4837 nicotinamide N-methyltransferase
ENSG00000166347 CYB5A The protein encoded by this gene is a membrane-bound cytochrome that reduces ferric hemoglobin (methemoglobin) to ferrous hemoglobin, which is required for stearyl-CoA-desaturase activity. Defects in this gene are a cause of type IV hereditary methemoglobinemia. Three transcript variants encoding different isoforms have been found for this gene. 1528 cytochrome b5 type A
ENSG00000179918 SEPHS2 This gene encodes an enzyme that synthesizes selenophosphate from selenide and ATP. Selenophosphate is the selenium donor used to synthesize selenocysteine, which is co-translationally incorporated into selenoproteins at in-frame UGA codons. Genes encoding selenocysteine contain a stem-loop secondary structure in their 3’ UTR called a selenocysteine insertion sequence (SECIS) element. The protein encoded by this gene contains a selenocysteine residue in its predicted active site. There is a pseudogene for this gene on chromosome 5. 22928 selenophosphate synthetase 2
ENSG00000225756 DBH-AS1 NA 138948 DBH antisense RNA 1
ENSG00000121310 ECHDC2 NA 55268 enoyl-CoA hydratase domain containing 2
ENSG00000139194 RBP5 NA 83758 retinol binding protein 5
# Save the Ensembl IDs queried for cluster 20, one per line
write.table(out$query, "../utilities/gene_names_clus_20.txt", col.names = FALSE,
            row.names = FALSE, quote = FALSE)
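
A query can come back with more than one row (in the brain cluster 1 table, ENSG00000112096 appears on two rows), so the saved ID file may contain repeats. The sketch below shows one way to write each ID only once. It is a minimal illustration on a made-up data frame; `out_toy`, its gene symbols, and the temporary file are assumptions and are not part of the original analysis.

```r
# Hypothetical stand-in for the data frame returned by the annotation query:
# the first ID matches two records, so it appears on two rows.
out_toy <- data.frame(
  query  = c("ENSG00000000001", "ENSG00000000001", "ENSG00000000002"),
  symbol = c("GENE_A1", "GENE_A2", "GENE_B"),
  stringsAsFactors = FALSE
)

# Keep each queried ID once, preserving the original order
ids <- unique(out_toy$query)

# Write to a temporary file here; the report writes under ../utilities/ instead
tmp <- tempfile(fileext = ".txt")
write.table(ids, tmp, col.names = FALSE, row.names = FALSE, quote = FALSE)

# Read the file back to check what was written
written <- readLines(tmp)
```

Whether duplicates matter depends on the downstream tool that reads the file, so treat the `unique()` step as optional.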

GTEx Brain cluster annotations

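# Load the grade-of-membership fit for the GTEx brain samples (k = 6, per the file name)
brain_gom <- get(load("../rdas/gtexv6brain.k6fit.rda"))
topics_theta_brain <- brain_gom$theta
# Extract the top 100 driving genes for each cluster
top_features <- ExtractTopFeatures(topics_theta_brain, top_features=100, method="poisson", options="min");

# Read the gene IDs and keep the first 15 characters (the Ensembl gene ID, presumably dropping a version suffix)
gene_names <- as.vector(as.matrix(read.table("../external_data/GTEX_V6/gene_names_GTEX_V6.txt")))
gene_names <- substring(gene_names,1,15);
xli  <-  gene_names;
# Build a matrix with one row of Ensembl IDs per cluster
gene_list_brain <- do.call(rbind, lapply(1:dim(top_features)[1], function(x) gene_names[top_features[x,]]))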
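
The index-to-name lookup above can be illustrated on toy data. This is a minimal sketch that assumes, as the code above does, that the top-features object is a matrix with one row per cluster whose entries are row indices into the gene-name vector. The IDs and indices below are invented for illustration.

```r
# Hypothetical versioned gene IDs (not taken from the GTEx file)
gene_names_toy <- c("ENSG00000000001.1", "ENSG00000000002.12",
                    "ENSG00000000003.4", "ENSG00000000004.7")

# Keep the first 15 characters, which drops the ".version" suffix here
gene_names_toy <- substring(gene_names_toy, 1, 15)

# Assumed shape: 2 clusters x 2 top genes, entries are gene indices
top_idx <- matrix(c(3, 1,
                    2, 4), nrow = 2, byrow = TRUE)

# For each cluster (row), look up the gene IDs at the stored indices,
# then stack the per-cluster vectors into a character matrix
gene_list_toy <- do.call(rbind, lapply(seq_len(nrow(top_idx)),
                                       function(k) gene_names_toy[top_idx[k, ]]))
```

Row `k` of `gene_list_toy` then holds the IDs for cluster `k`, which is the form passed to the annotation query in the sections that follow.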

Cluster 1 Annotations

out <- mygene::queryMany(gene_list_brain[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol X_id summary name
ENSG00000120885 CLU 1191 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. clusterin
ENSG00000101405 OXT 5020 This gene encodes a precursor protein that is processed to produce oxytocin and neurophysin I. Oxytocin is a posterior pituitary hormone which is synthesized as an inactive precursor in the hypothalamus along with its carrier protein neurophysin I. Together with neurophysin, it is packaged into neurosecretory vesicles and transported axonally to the nerve endings in the neurohypophysis, where it is either stored or secreted into the bloodstream. The precursor seems to be activated while it is being transported along the axon to the posterior pituitary. This hormone contracts smooth muscle during parturition and lactation. It is also involved in cognition, tolerance, adaptation and complex sexual and maternal behaviour, as well as in the regulation of water excretion and cardiovascular functions. oxytocin/neurophysin I prepropeptide
ENSG00000135821 GLUL 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase
ENSG00000165795 NDRG2 57447 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NDRG family member 2
ENSG00000101439 CST3 1471 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. The cystatin locus on chromosome 20 contains the majority of the type 2 cystatin genes and pseudogenes. This gene is located in the cystatin locus and encodes the most abundant extracellular inhibitor of cysteine proteases, which is found in high concentrations in biological fluids and is expressed in virtually all organs of the body. A mutation in this gene has been associated with amyloid angiopathy. Expression of this protein in vascular wall smooth muscle cells is severely reduced in both atherosclerotic and aneurysmal aortic lesions, establishing its role in vascular disease. In addition, this protein has been shown to have an antimicrobial function, inhibiting the replication of herpes simplex virus. Alternative splicing results in multiple transcript variants encoding a single protein. cystatin C
ENSG00000133048 CHI3L1 1116 Chitinases catalyze the hydrolysis of chitin, which is an abundant glycopolymer found in insect exoskeletons and fungal cell walls. The glycoside hydrolase 18 family of chitinases includes eight human family members. This gene encodes a glycoprotein member of the glycosyl hydrolase 18 family. The protein lacks chitinase activity and is secreted by activated macrophages, chondrocytes, neutrophils and synovial cells. The protein is thought to play a role in the process of inflammation and tissue remodeling. chitinase 3 like 1
ENSG00000125148 MT2A 4502 NA metallothionein 2A
ENSG00000079215 SLC1A3 6507 This gene encodes a member of a high affinity glutamate transporter family. This gene functions in the termination of excitatory neurotransmission in the central nervous system. Mutations are associated with episodic ataxia, Type 6. Alternative splicing results in multiple transcript variants. solute carrier family 1 member 3
ENSG00000106211 HSPB1 3315 The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN). heat shock protein family B (small) member 1
ENSG00000173110 HSPA6 3310 NA heat shock protein family A (Hsp70) member 6
ENSG00000087250 MT3 4504 NA metallothionein 3
ENSG00000152661 GJA1 2697 This gene is a member of the connexin gene family. The encoded protein is a component of gap junctions, which are composed of arrays of intercellular channels that provide a route for the diffusion of low molecular weight materials from cell to cell. The encoded protein is the major protein of gap junctions in the heart that are thought to have a crucial role in the synchronized contraction of the heart and in embryonic development. A related intronless pseudogene has been mapped to chromosome 5. Mutations in this gene have been associated with oculodentodigital dysplasia, autosomal recessive craniometaphyseal dysplasia and heart malformations. gap junction protein alpha 1
ENSG00000130203 APOE 348 The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. apolipoprotein E
ENSG00000168710 AHCYL1 10768 The protein encoded by this gene interacts with inositol 1,4,5-trisphosphate receptor, type 1 and may be involved in the conversion of S-adenosyl-L-homocysteine to L-homocysteine and adenosine. Several transcript variants encoding two different isoforms have been found for this gene. adenosylhomocysteinase like 1
ENSG00000137285 TUBB2B 347733 The protein encoded by this gene is a beta isoform of tubulin, which binds GTP and is a major component of microtubules. This gene is highly similar to TUBB2A and TUBB2C. Defects in this gene are a cause of asymmetric polymicrogyria. tubulin beta 2B class IIb
ENSG00000135916 ITM2C 81618 NA integral membrane protein 2C
ENSG00000112096 LOC100129518 100129518 NA uncharacterized LOC100129518
ENSG00000112096 SOD2 6648 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. superoxide dismutase 2, mitochondrial
ENSG00000110651 CD81 975 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. This protein appears to promote muscle cell fusion and support myotube maintenance. Also it may be involved in signal transduction. This gene is localized in the tumor-suppressor gene region and thus it is a candidate gene for malignancies. Two transcript variants encoding different isoforms have been found for this gene. CD81 molecule
ENSG00000109107 ALDOC 230 This gene encodes a member of the class I fructose-biphosphate aldolase gene family. Expressed specifically in the hippocampus and Purkinje cells of the brain, the encoded protein is a glycolytic enzyme that catalyzes the reversible aldol cleavage of fructose-1,6-biphosphate and fructose 1-phosphate to dihydroxyacetone phosphate and either glyceraldehyde-3-phosphate or glyceraldehyde, respectively. aldolase, fructose-bisphosphate C
ENSG00000135744 AGT 183 The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active enzyme angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease. angiotensinogen
ENSG00000205336 ADGRG1 9289 This gene encodes a member of the G protein-coupled receptor family and regulates brain cortical patterning. The encoded protein binds specifically to transglutaminase 2, a component of tissue and tumor stroma implicated as an inhibitor of tumor progression. Mutations in this gene are associated with a brain malformation known as bilateral frontoparietal polymicrogyria. Alternative splicing results in multiple transcript variants. adhesion G protein-coupled receptor G1
ENSG00000143772 ITPKB 3707 The protein encoded by this gene regulates inositol phosphate metabolism by phosphorylation of second messenger inositol 1,4,5-trisphosphate to Ins(1,3,4,5)P4. The activity of this encoded protein is responsible for regulating the levels of a large number of inositol polyphosphates that are important in cellular signaling. Both calcium/calmodulin and protein phosphorylation mechanisms control its activity. inositol-trisphosphate 3-kinase B
ENSG00000129244 ATP1B2 482 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 2 subunit. Two transcript variants encoding different isoforms have been found for this gene. ATPase Na+/K+ transporting subunit beta 2
ENSG00000144381 HSPD1 3329 This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. heat shock protein family D (Hsp60) member 1
ENSG00000087086 FTL 2512 This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. ferritin, light polypeptide
ENSG00000099860 GADD45B 4616 This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The genes in this group respond to environmental stresses by mediating activation of the p38/JNK pathway. This activation is mediated via their proteins binding and activating MTK1/MEKK4 kinase, which is an upstream activator of both p38 and JNK MAPKs. The function of these genes or their protein products is involved in the regulation of growth and apoptosis. These genes are regulated by different mechanisms, but they are often coordinately expressed and can function cooperatively in inhibiting cell growth. growth arrest and DNA damage inducible beta
ENSG00000169715 MT1E 4493 NA metallothionein 1E
ENSG00000106624 AEBP1 165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AE binding protein 1
ENSG00000134824 FADS2 9415 The protein encoded by this gene is a member of the fatty acid desaturase (FADS) gene family. Desaturase enzymes regulate unsaturation of fatty acids through the introduction of double bonds between defined carbons of the fatty acyl chain. FADS family members are considered fusion products composed of an N-terminal cytochrome b5-like domain and a C-terminal multiple membrane-spanning desaturase portion, both of which are characterized by conserved histidine motifs. This gene is clustered with family members at 11q12-q13.1; this cluster is thought to have arisen evolutionarily from gene duplication based on its similar exon/intron organization. Alternative splicing results in multiple transcript variants encoding different isoforms. fatty acid desaturase 2
ENSG00000150991 UBC 7316 This gene represents a ubiquitin gene, ubiquitin C. The encoded protein is a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers can lead to various effects within a cell, depending on the residues to which ubiquitin is conjugated. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. ubiquitin C
ENSG00000167772 ANGPTL4 51129 This gene encodes a glycosylated, secreted protein containing a C-terminal fibrinogen domain. The encoded protein is induced by peroxisome proliferation activators and functions as a serum hormone that regulates glucose homeostasis, lipid metabolism, and insulin sensitivity. This protein can also act as an apoptosis survival factor for vascular endothelial cells and can prevent metastasis by inhibiting vascular growth and tumor cell invasion. The C-terminal domain may be proteolytically-cleaved from the full-length secreted protein. Decreased expression of this gene has been associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. This gene was previously referred to as ANGPTL2 but has been renamed ANGPTL4. angiopoietin like 4
ENSG00000187193 MT1X 4501 NA metallothionein 1X
ENSG00000151929 BAG3 9531 BAG proteins compete with Hip for binding to the Hsc70/Hsp70 ATPase domain and promote substrate release. All the BAG proteins have an approximately 45-amino acid BAG domain near the C terminus but differ markedly in their N-terminal regions. The protein encoded by this gene contains a WW domain in the N-terminal region and a BAG domain in the C-terminal region. The BAG domains of BAG1, BAG2, and BAG3 interact specifically with the Hsc70 ATPase domain in vitro and in mammalian cells. All 3 proteins bind with high affinity to the ATPase domain of Hsc70 and inhibit its chaperone activity in a Hip-repressible manner. BCL2 associated athanogene 3
ENSG00000117519 CNN3 1266 This gene encodes a protein with a markedly acidic C terminus; the basic N-terminus is highly homologous to the N-terminus of a related gene, CNN1. Members of the CNN gene family all contain similar tandemly repeated motifs. This encoded protein is associated with the cytoskeleton but is not involved in contraction. calponin 3
ENSG00000152137 HSPB8 26353 The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. heat shock protein family B (small) member 8
ENSG00000159176 CSRP1 1465 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. cysteine and glycine rich protein 1
ENSG00000168309 FAM107A 11170 NA family with sequence similarity 107 member A
ENSG00000124762 CDKN1A 1026 This gene encodes a potent cyclin-dependent kinase inhibitor. The encoded protein binds to and inhibits the activity of cyclin-cyclin-dependent kinase2 or -cyclin-dependent kinase4 complexes, and thus functions as a regulator of cell cycle progression at G1. The expression of this gene is tightly controlled by the tumor suppressor protein p53, through which this protein mediates the p53-dependent cell cycle G1 phase arrest in response to a variety of stress stimuli. This protein can interact with proliferating cell nuclear antigen, a DNA polymerase accessory factor, and plays a regulatory role in S phase DNA replication and DNA damage repair. This protein was reported to be specifically cleaved by CASP3-like caspases, which thus leads to a dramatic activation of cyclin-dependent kinase2, and may be instrumental in the execution of apoptosis following caspase activation. Mice that lack this gene have the ability to regenerate damaged or missing tissue. Multiple alternatively spliced variants have been found for this gene. cyclin-dependent kinase inhibitor 1A
ENSG00000018625 ATP1A2 477 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. ATPase Na+/K+ transporting subunit alpha 2
ENSG00000134531 EMP1 2012 NA epithelial membrane protein 1
ENSG00000168461 RAB31 11031 Small GTP-binding proteins of the RAB family, such as RAB31, play essential roles in vesicle and granule targeting (Bao et al., 2002 [PubMed 11784320]). RAB31, member RAS oncogene family
ENSG00000185650 ZFP36L1 677 This gene is a member of the TIS11 family of early response genes, which are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. This gene is well conserved across species and has a promoter that contains motifs seen in other early-response genes. The encoded protein contains a distinguishing putative zinc finger domain with a repeating cys-his motif. This putative nuclear transcription factor most likely functions in regulating the response to growth factors. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ZFP36 ring finger protein-like 1
ENSG00000163346 PBXIP1 57326 The protein encoded by this gene interacts with the PBX1 homeodomain protein, inhibiting its transcriptional activation potential by preventing its binding to DNA. The encoded protein, which is primarily cytosolic but can shuttle to the nucleus, also can interact with estrogen receptors alpha and beta and promote the proliferation of breast cancer, brain tumors, and lung cancer. Several transcript variants encoding different isoforms have been found for this gene. More variants exist, but their full-length natures have yet to be determined. pre-B-cell leukemia homeobox interacting protein 1
ENSG00000111275 ALDH2 217 This protein belongs to the aldehyde dehydrogenase family of proteins. Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism. Two major liver isoforms of aldehyde dehydrogenase, cytosolic and mitochondrial, can be distinguished by their electrophoretic mobilities, kinetic properties, and subcellular localizations. Most Caucasians have two major isozymes, while approximately 50% of Orientals have the cytosolic isozyme but not the mitochondrial isozyme. A remarkably higher frequency of acute alcohol intoxication among Orientals than among Caucasians could be related to the absence of a catalytically active form of the mitochondrial isozyme. The increased exposure to acetaldehyde in individuals with the catalytically inactive form may also confer greater susceptibility to many types of cancer. This gene encodes a mitochondrial isoform, which has a low Km for acetaldehydes, and is localized in mitochondrial matrix. Alternative splicing results in multiple transcript variants encoding distinct isoforms. aldehyde dehydrogenase 2 family (mitochondrial)
ENSG00000135919 SERPINE2 5270 This gene encodes a member of the serpin family of proteins, a group of proteins that inhibit serine proteases. Thrombin, urokinase, plasmin and trypsin are among the proteases that this family member can inhibit. This gene is a susceptibility gene for chronic obstructive pulmonary disease and for emphysema. Alternative splicing results in multiple transcript variants. serpin family E member 2
ENSG00000132692 BCAN 63827 This gene encodes a member of the lectican family of chondroitin sulfate proteoglycans that is specifically expressed in the central nervous system. This protein is developmentally regulated and may function in the formation of the brain extracellular matrix. This protein is highly expressed in gliomas and may promote the growth and cell motility of brain tumor cells. Alternate splicing results in multiple transcript variants. brevican
ENSG00000135404 CD63 967 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. The encoded protein is a cell surface glycoprotein that is known to complex with integrins. It may function as a blood platelet activation marker. Deficiency of this protein is associated with Hermansky-Pudlak syndrome. Also this gene has been associated with tumor progression. Alternative splicing results in multiple transcript variants encoding different protein isoforms. CD63 molecule
ENSG00000143384 MCL1 4170 This gene encodes an anti-apoptotic protein, which is a member of the Bcl-2 family. Alternative splicing results in multiple transcript variants. The longest gene product (isoform 1) enhances cell survival by inhibiting apoptosis while the alternatively spliced shorter gene products (isoform 2 and isoform 3) promote apoptosis and are death-inducing. myeloid cell leukemia 1
ENSG00000182175 RGMA 56963 This gene encodes a member of the repulsive guidance molecule family. The encoded protein is a glycosylphosphatidylinositol-anchored glycoprotein that functions as an axon guidance protein in the developing and adult central nervous system. This protein may also function as a tumor suppressor in some cancers. Alternate splicing results in multiple transcript variants. repulsive guidance molecule family member a
ENSG00000132470 ITGB4 3691 Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. integrin subunit beta 4
ENSG00000204592 HLA-E 3133 HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. major histocompatibility complex, class I, E
ENSG00000164929 BAALC 79870 This gene was identified by gene expression studies in patients with acute myeloid leukemia (AML). The gene is conserved among mammals and is not found in lower organisms. Tissues that express this gene develop from the neuroectoderm. Multiple alternatively spliced transcript variants that encode different proteins have been described for this gene; however, some of the transcript variants are found only in AML cell lines. brain and acute leukemia, cytoplasmic
ENSG00000125398 SOX9 6662 The protein encoded by this gene recognizes the sequence CCTTGAG along with other members of the HMG-box class DNA-binding proteins. It acts during chondrocyte differentiation and, with steroidogenic factor 1, regulates transcription of the anti-Muellerian hormone (AMH) gene. Deficiencies lead to the skeletal malformation syndrome campomelic dysplasia, frequently with sex reversal. SRY-box 9
ENSG00000168003 SLC3A2 6520 This gene is a member of the solute carrier family and encodes a cell surface, transmembrane protein. The protein exists as the heavy chain of a heterodimer, covalently bound through di-sulfide bonds to one of several possible light chains. The encoded transporter plays a role in regulation of intracellular calcium levels and transports L-type amino acids. Alternatively spliced transcript variants, encoding different isoforms, have been characterized. solute carrier family 3 member 2
ENSG00000026025 VIM 7431 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. vimentin
ENSG00000168209 DDIT4 54541 NA DNA damage inducible transcript 4
ENSG00000205364 MT1M 4499 This gene encodes a member of the metallothionein superfamily, type 1 family. Metallothioneins have a high content of cysteine residues that bind various heavy metals. These genes are transcriptionally regulated by both heavy metals and glucocorticoids. metallothionein 1M
ENSG00000072952 MRVI1 10335 This gene is similar to a putative mouse tumor suppressor gene (Mrvi1) that is frequently disrupted by mouse AIDS-related virus (MRV). The encoded protein, which is found in the membrane of the endoplasmic reticulum, is similar to Jaw1, a lymphoid-restricted protein whose expression is down-regulated during lymphoid differentiation. This protein is a substrate of cGMP-dependent kinase-1 (PKG1) that can function as a regulator of IP3-induced calcium release. Studies in mouse suggest that MRV integration at Mrvi1 induces myeloid leukemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, and thus this gene may function as a myeloid leukemia tumor suppressor gene. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, and alternative translation start sites, including a non-AUG (CUG) start site, are used. murine retrovirus integration site 1 homolog
ENSG00000170525 PFKFB3 5209 The protein encoded by this gene belongs to a family of bifunctional proteins that are involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate (F2,6BP), and a fructose-2,6-biphosphatase activity that catalyzes the degradation of F2,6BP. This protein is required for cell cycle progression and prevention of apoptosis. It functions as a regulator of cyclin-dependent kinase 1, linking glucose metabolism to cell proliferation and survival in tumor cells. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3
ENSG00000136205 TNS3 64759 NA tensin 3
ENSG00000182902 SLC25A18 83733 NA solute carrier family 25 member 18
ENSG00000148672 GLUD1 2746 This gene encodes glutamate dehydrogenase, which is a mitochondrial matrix enzyme that catalyzes the oxidative deamination of glutamate to alpha-ketoglutarate and ammonia. This enzyme has an important role in regulating amino acid-induced insulin secretion. It is allosterically activated by ADP and inhibited by GTP and ATP. Activating mutations in this gene are a common cause of congenital hyperinsulinism. Alternative splicing of this gene results in multiple transcript variants. The related glutamate dehydrogenase 2 gene on the human X-chromosome originated from this gene via retrotransposition and encodes a soluble form of glutamate dehydrogenase. Related pseudogenes have been identified on chromosomes 10, 18 and X. glutamate dehydrogenase 1
ENSG00000100979 PLTP 5360 The protein encoded by this gene is one of at least two lipid transfer proteins found in human plasma. The encoded protein transfers phospholipids from triglyceride-rich lipoproteins to high density lipoprotein (HDL). In addition to regulating the size of HDL particles, this protein may be involved in cholesterol metabolism. At least two transcript variants encoding different isoforms have been found for this gene. phospholipid transfer protein
ENSG00000085063 CD59 966 This gene encodes a cell surface glycoprotein that regulates complement-mediated cell lysis, and it is involved in lymphocyte signal transduction. This protein is a potent inhibitor of the complement membrane attack complex, whereby it binds complement C8 and/or C9 during the assembly of this complex, thereby inhibiting the incorporation of multiple copies of C9 into the complex, which is necessary for osmolytic pore formation. This protein also plays a role in signal transduction pathways in the activation of T cells. Mutations in this gene cause CD59 deficiency, a disease resulting in hemolytic anemia and thrombosis, and which causes cerebral infarction. Multiple alternatively spliced transcript variants, which encode the same protein, have been identified for this gene. CD59 molecule
ENSG00000139644 TMBIM6 7009 NA transmembrane BAX inhibitor motif containing 6
ENSG00000107317 PTGDS 5730 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. prostaglandin D2 synthase
ENSG00000168913 ENHO 375704 NA energy homeostasis associated
ENSG00000008394 MGST1 4257 The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. microsomal glutathione S-transferase 1
ENSG00000100234 TIMP3 7078 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. TIMP metallopeptidase inhibitor 3
ENSG00000080824 HSP90AA1 3320 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. heat shock protein 90kDa alpha family class A member 1
ENSG00000124145 SDC4 6385 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan that functions as a receptor in intracellular signaling. The encoded protein is found as a homodimer and is a member of the syndecan proteoglycan family. This gene is found on chromosome 20, while a pseudogene has been found on chromosome 22. syndecan 4
ENSG00000188643 S100A16 140576 NA S100 calcium binding protein A16
ENSG00000088826 SMOX 54498 Polyamines are ubiquitous polycationic alkylamines which include spermine, spermidine, putrescine, and agmatine. These molecules participate in a broad range of cellular functions which include cell cycle modulation, scavenging reactive oxygen species, and the control of gene expression. These molecules also play important roles in neurotransmission through their regulation of cell-surface receptor activity, involvement in intracellular signalling pathways, and their putative roles as neurotransmitters. This gene encodes an FAD-containing enzyme that catalyzes the oxidation of spermine to spermidine and secondarily produces hydrogen peroxide. Multiple transcript variants encoding different isoenzymes have been identified for this gene, some of which have failed to demonstrate significant oxidase activity on natural polyamine substrates. The characterized isoenzymes have distinctive biochemical characteristics and substrate specificities, suggesting the existence of additional levels of complexity in polyamine catabolism. spermine oxidase
ENSG00000136802 LRRC8A 56262 This gene encodes a protein belonging to the leucine-rich repeat family of proteins, which are involved in diverse biological processes, including cell adhesion, cellular trafficking, and hormone-receptor interactions. This family member is a putative four-pass transmembrane protein that plays a role in B cell development. Defects in this gene cause autosomal dominant non-Bruton type agammaglobulinemia, an immunodeficiency disease resulting from defects in B cell maturation. Multiple alternatively spliced transcript variants, which encode the same protein, have been identified for this gene. leucine-rich repeat containing 8 family member A
ENSG00000225217 HSPA7 ENSG00000225217 NA heat shock protein family A (Hsp70) member 7
ENSG00000183255 PTTG1IP 754 This gene encodes a single-pass type I integral membrane protein, which binds to pituitary tumor-transforming 1 protein (PTTG1), and facilitates translocation of PTTG1 into the nucleus. Coexpression of this protein and PTTG1 induces transcriptional activation of basic fibroblast growth factor. Alternatively spliced transcript variants have been found for this gene. pituitary tumor-transforming 1 interacting protein
ENSG00000113657 DPYSL3 1809 NA dihydropyrimidinase like 3
ENSG00000149257 SERPINH1 871 This gene encodes a member of the serpin superfamily of serine proteinase inhibitors. The encoded protein is localized to the endoplasmic reticulum and plays a role in collagen biosynthesis as a collagen-specific molecular chaperone. Autoantibodies to the encoded protein have been found in patients with rheumatoid arthritis. Expression of this gene may be a marker for cancer, and nucleotide polymorphisms in this gene may be associated with preterm birth caused by preterm premature rupture of membranes. Alternatively spliced transcript variants have been observed for this gene, and a pseudogene of this gene is located on the short arm of chromosome 9. serpin family H member 1
ENSG00000124942 AHNAK 79026 NA AHNAK nucleoprotein
ENSG00000146535 GNA12 2768 NA G protein subunit alpha 12
ENSG00000184557 SOCS3 9021 This gene encodes a member of the STAT-induced STAT inhibitor (SSI), also known as suppressor of cytokine signaling (SOCS), family. SSI family members are cytokine-inducible negative regulators of cytokine signaling. The expression of this gene is induced by various cytokines, including IL6, IL10, and interferon (IFN)-gamma. The protein encoded by this gene can bind to JAK2 kinase, and inhibit the activity of JAK2 kinase. Studies of the mouse counterpart of this gene suggested the roles of this gene in the negative regulation of fetal liver hematopoiesis, and placental development. suppressor of cytokine signaling 3
ENSG00000172270 BSG 682 The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. basigin (Ok blood group)
ENSG00000135245 HILPDA 29923 NA hypoxia inducible lipid droplet associated
ENSG00000161642 ZNF385A 25946 Zinc finger proteins, such as ZNF385A, are regulatory proteins that act as transcription factors, bind single- or double-stranded RNA, or interact with other proteins (Sharma et al., 2004 [PubMed 15527981]). zinc finger protein 385A
ENSG00000184113 CLDN5 7122 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets. Mutations in this gene have been found in patients with velocardiofacial syndrome. Alternatively spliced transcript variants encoding the same protein have been found for this gene. claudin 5
ENSG00000135926 TMBIM1 64114 NA transmembrane BAX inhibitor motif containing 1
ENSG00000155366 RHOC 389 This gene encodes a member of the Rho family of small GTPases, which cycle between inactive GDP-bound and active GTP-bound states and function as molecular switches in signal transduction cascades. Rho proteins promote reorganization of the actin cytoskeleton and regulate cell shape, attachment, and motility. The protein encoded by this gene is prenylated at its C-terminus, and localizes to the cytoplasm and plasma membrane. It is thought to be important in cell locomotion. Overexpression of this gene is associated with tumor cell proliferation and metastasis. Multiple alternatively spliced variants, encoding the same protein, have been identified. ras homolog family member C
ENSG00000259827 RP11-343H19.2 ENSG00000259827 NA NA
ENSG00000142089 IFITM3 10410 The protein encoded by this gene is an interferon-induced membrane protein that helps confer immunity to influenza A H1N1 virus, West Nile virus, and dengue virus. Two transcript variants, only one of them protein-coding, have been found for this gene. Another variant encoding an N-terminally truncated isoform has been reported, but the full-length nature of this variant has not been determined. interferon induced transmembrane protein 3
ENSG00000117592 PRDX6 9588 The protein encoded by this gene is a member of the thiol-specific antioxidant protein family. This protein is a bifunctional enzyme with two distinct active sites. It is involved in redox regulation of the cell; it can reduce H(2)O(2) and short chain organic, fatty acid, and phospholipid hydroperoxides. It may play a role in the regulation of phospholipid turnover as well as in protection against oxidative injury. peroxiredoxin 6
ENSG00000266964 FXYD1 5348 This gene encodes a member of a family of small membrane proteins that share a 35-amino acid signature sequence domain, beginning with the sequence PFXYD and containing 7 invariant and 6 highly conserved amino acids. The approved human gene nomenclature for the family is FXYD-domain containing ion transport regulator. Mouse FXYD5 has been termed RIC (Related to Ion Channel). FXYD2, also known as the gamma subunit of the Na,K-ATPase, regulates the properties of that enzyme. FXYD1 (phospholemman), FXYD2 (gamma), FXYD3 (MAT-8), FXYD4 (CHIF), and FXYD5 (RIC) have been shown to induce channel activity in experimental expression systems. Transmembrane topology has been established for two family members (FXYD1 and FXYD2), with the N-terminus extracellular and the C-terminus on the cytoplasmic side of the membrane. The protein encoded by this gene is a plasma membrane substrate for several kinases, including protein kinase A, protein kinase C, NIMA kinase, and myotonic dystrophy kinase. It is thought to form an ion channel or regulate ion channel activity. Transcript variants with different 5’ UTR sequences have been described in the literature. FXYD domain containing ion transport regulator 1
ENSG00000206503 HLA-A 3105 HLA-A belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-A alleles have been described. major histocompatibility complex, class I, A
ENSG00000138029 HADHB 3032 This gene encodes the beta subunit of the mitochondrial trifunctional protein, which catalyzes the last three steps of mitochondrial beta-oxidation of long chain fatty acids. The mitochondrial membrane-bound heterocomplex is composed of four alpha and four beta subunits, with the beta subunit catalyzing the 3-ketoacyl-CoA thiolase activity. The encoded protein can also bind RNA and decreases the stability of some mRNAs. The genes of the alpha and beta subunits of the mitochondrial trifunctional protein are located adjacent to each other in the human genome in a head-to-head orientation. Mutations in this gene result in trifunctional protein deficiency. Alternatively spliced transcript variants encoding different isoforms have been described. hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit
ENSG00000132002 DNAJB1 3337 This gene encodes a member of the DnaJ or Hsp40 (heat shock protein 40 kD) family of proteins. DNAJ family members are characterized by a highly conserved amino acid stretch called the ‘J-domain’ and function as one of the two major classes of molecular chaperones involved in a wide range of cellular events, such as protein folding and oligomeric protein complex assembly. The encoded protein is a molecular chaperone that stimulates the ATPase activity of Hsp70 heat-shock proteins in order to promote protein folding and prevent misfolded protein aggregation. Alternative splicing results in multiple transcript variants. DnaJ heat shock protein family (Hsp40) member B1
ENSG00000128016 ZFP36 7538 NA ZFP36 ring finger protein
ENSG00000149485 FADS1 3992 The protein encoded by this gene is a member of the fatty acid desaturase (FADS) gene family. Desaturase enzymes regulate unsaturation of fatty acids through the introduction of double bonds between defined carbons of the fatty acyl chain. FADS family members are considered fusion products composed of an N-terminal cytochrome b5-like domain and a C-terminal multiple membrane-spanning desaturase portion, both of which are characterized by conserved histidine motifs. This gene is clustered with family members FADS1 and FADS2 at 11q12-q13.1; this cluster is thought to have arisen evolutionarily from gene duplication based on its similar exon/intron organization. fatty acid desaturase 1
ENSG00000159423 ALDH4A1 8659 This protein belongs to the aldehyde dehydrogenase family of proteins. This enzyme is a mitochondrial matrix NAD-dependent dehydrogenase which catalyzes the second step of the proline degradation pathway, converting pyrroline-5-carboxylate to glutamate. Deficiency of this enzyme is associated with type II hyperprolinemia, an autosomal recessive disorder characterized by accumulation of delta-1-pyrroline-5-carboxylate (P5C) and proline. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. aldehyde dehydrogenase 4 family member A1
ENSG00000170989 S1PR1 1901 The protein encoded by this gene is structurally similar to G protein-coupled receptors and is highly expressed in endothelial cells. It binds the ligand sphingosine-1-phosphate with high affinity and high specificity, and is suggested to be involved in the processes that regulate the differentiation of endothelial cells. Activation of this receptor induces cell-cell adhesion. Alternative splicing results in multiple transcript variants. sphingosine-1-phosphate receptor 1
ENSG00000173511 VEGFB 7423 This gene encodes a member of the PDGF (platelet-derived growth factor)/VEGF (vascular endothelial growth factor) family. The VEGF family members regulate the formation of blood vessels and are involved in endothelial cell physiology. This member is a ligand for VEGFR-1 (vascular endothelial growth factor receptor 1) and NRP-1 (neuropilin-1). Studies in mice showed that this gene was co-expressed with nuclear-encoded mitochondrial genes and the encoded protein specifically controlled endothelial uptake of fatty acids. Alternatively spliced transcript variants encoding distinct isoforms have been identified. vascular endothelial growth factor B
ENSG00000116717 GADD45A 1647 This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The protein encoded by this gene responds to environmental stresses by mediating activation of the p38/JNK pathway via MTK1/MEKK4 kinase. The DNA damage-induced transcription of this gene is mediated by both p53-dependent and -independent mechanisms. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. growth arrest and DNA damage inducible alpha
# Save the queried Ensembl IDs for cluster 1 to a text file
write.table(as.factor(out$query), paste0("../utilities/gene_names_brain_clus_",1,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
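
The query-and-save steps used for each cluster could be wrapped in a small helper. The following is a minimal sketch, not the script's actual code: the function name `annotate_cluster` and its arguments `out_dir` and `query_fun` are hypothetical, and it assumes `gene_list_brain` is a matrix with one row of Ensembl gene IDs per cluster. The query function is passed as a parameter so the helper can be tried offline with a stand-in; when it is left as `NULL`, the sketch falls back to the same `mygene::queryMany` call used in this script, which needs the mygene package and network access.

```r
# Hypothetical helper (not part of the original script): annotate the top genes
# of one cluster and save the queried Ensembl IDs to a text file.
annotate_cluster <- function(gene_ids, cluster, out_dir = "../utilities",
                             query_fun = NULL) {
  if (is.null(query_fun)) {
    # Fall back to the call used in this script.
    query_fun <- function(ids) {
      mygene::queryMany(ids, scopes = "ensembl.gene",
                        fields = c("name", "summary", "symbol"),
                        species = "human")
    }
  }
  out <- as.data.frame(query_fun(gene_ids))
  out_file <- file.path(out_dir,
                        paste0("gene_names_brain_clus_", cluster, ".txt"))
  # One Ensembl ID per line, no header, no quotes.
  write.table(out$query, out_file, col.names = FALSE,
              row.names = FALSE, quote = FALSE)
  out
}

# Possible usage, assuming gene_list_brain exists as described above:
# for (k in seq_len(nrow(gene_list_brain))) {
#   out <- annotate_cluster(gene_list_brain[k, ], k)
# }
```

Returning the annotation data frame keeps the helper usable with `kable()` in the same way as the per-cluster chunks.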

Cluster 2 Annotations

out <- mygene::queryMany(gene_list_brain[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary
ENC1 8507 ENSG00000171617 ectodermal-neural cortex 1 This gene encodes a member of the kelch-related family of actin-binding proteins. The encoded protein plays a role in the oxidative stress response as a regulator of the transcription factor Nrf2, and expression of this gene may play a role in malignant transformation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
TUBA1B 10376 ENSG00000123416 tubulin alpha 1b NA
NCALD 83988 ENSG00000104490 neurocalcin delta This gene encodes a member of the neuronal calcium sensor (NCS) family of calcium-binding proteins. The protein contains an N-terminal myristoylation signal and four EF-hand calcium binding loops. The protein is cytosolic at resting calcium levels; however, elevated intracellular calcium levels induce a conformational change that exposes the myristoyl group, resulting in protein association with membranes and partial co-localization with the perinuclear trans-golgi network. The protein is thought to be a regulator of G protein-coupled receptor signal transduction. Several alternatively spliced variants of this gene have been determined, all of which encode the same protein; additional variants may exist but their biological validity has not been determined.
YWHAH 7533 ENSG00000128245 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein eta This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and bovine orthologs. This gene contains a 7 bp repeat sequence in its 5’ UTR, and changes in the number of this repeat have been associated with early-onset schizophrenia and psychotic bipolar disorder.
RP11-386G11.10 ENSG00000258017 ENSG00000258017 NA NA
KIF5A 3798 ENSG00000155980 kinesin family member 5A This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10.
NPTXR 23467 ENSG00000221890 neuronal pentraxin receptor This gene encodes a protein similar to the rat neuronal pentraxin receptor. The rat pentraxin receptor is an integral membrane protein that is thought to mediate neuronal uptake of the snake venom toxin, taipoxin, and its transport into the synapses. Studies in rat indicate that translation of this mRNA initiates at a non-AUG (CUG) codon. This may also be true for mouse and human, based on strong sequence conservation amongst these species.
CAMK2N1 55450 ENSG00000162545 calcium/calmodulin dependent protein kinase II inhibitor 1 NA
BASP1 10409 ENSG00000176788 brain abundant membrane attached signal protein 1 This gene encodes a membrane bound protein with several transient phosphorylation sites and PEST motifs. Conservation of proteins with PEST sequences among different species supports their functional significance. PEST sequences typically occur in proteins with high turnover rates. Immunological characteristics of this protein are species specific. This protein also undergoes N-terminal myristoylation. Alternative splicing results in multiple transcript variants that encode the same protein.
PRKAR1B 5575 ENSG00000188191 protein kinase cAMP-dependent type I regulatory subunit beta The protein encoded by this gene is a regulatory subunit of cyclic AMP-dependent protein kinase A (PKA), which is involved in the signaling pathway of the second messenger cAMP. Two regulatory and two catalytic subunits form the PKA holoenzyme, which disbands after cAMP binding. The holoenzyme is involved in many cellular events, including ion transport, metabolism, and transcription. Several transcript variants encoding the same protein have been found for this gene.
CHGA 1113 ENSG00000100604 chromogranin A The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene.
MAP1A 4130 ENSG00000166963 microtubule associated protein 1A This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development.
MEF2C 4208 ENSG00000081189 myocyte enhancer factor 2C This locus encodes a member of the MADS box transcription enhancer factor 2 (MEF2) family of proteins, which play a role in myogenesis. The encoded protein, MEF2 polypeptide C, has both trans-activating and DNA binding activities. This protein may play a role in maintaining the differentiated state of muscle cells. Mutations and deletions at this locus have been associated with severe mental retardation, stereotypic movements, epilepsy, and cerebral malformation. Alternatively spliced transcript variants have been described.
UCHL1 7345 ENSG00000154277 ubiquitin C-terminal hydrolase L1 The protein encoded by this gene belongs to the peptidase C12 family. This enzyme is a thiol protease that hydrolyzes a peptide bond at the C-terminal glycine of ubiquitin. This gene is specifically expressed in the neurons and in cells of the diffuse neuroendocrine system. Mutations in this gene may be associated with Parkinson disease.
YWHAG 7532 ENSG00000170027 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein gamma This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 100% identical to the rat ortholog. It is induced by growth factors in human vascular smooth muscle cells, and is also highly expressed in skeletal and heart muscles, suggesting an important role for this protein in muscle tissue. It has been shown to interact with RAF1 and protein kinase C, proteins involved in various signal transduction pathways.
RTN3 10313 ENSG00000133318 reticulon 3 This gene belongs to the reticulon family of highly conserved genes that are preferentially expressed in neuroendocrine tissues. Proteins in this family interact with, and modulate the activity of, beta-amyloid converting enzyme 1 (BACE1) and the production of amyloid-beta. An increase in the expression of any reticulon protein substantially reduces the production of amyloid-beta, suggesting that reticulon proteins are negative modulators of BACE1 in cells. Alternatively spliced transcript variants encoding different isoforms have been found for this gene, and pseudogenes of this gene are located on chromosomes 4 and 12.
SLC17A7 57030 ENSG00000104888 solute carrier family 17 member 7 The protein encoded by this gene is a vesicle-bound, sodium-dependent phosphate transporter that is specifically expressed in the neuron-rich regions of the brain. It is preferentially associated with the membranes of synaptic vesicles and functions in glutamate transport. The protein shares 82% identity with the differentiation-associated Na-dependent inorganic phosphate cotransporter and they appear to form a distinct class within the Na+/Pi cotransporter family.
GNAS 2778 ENSG00000087460 GNAS complex locus This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors.
KIF1A 547 ENSG00000130294 kinesin family member 1A The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described.
LMO4 8543 ENSG00000143013 LIM domain only 4 This gene encodes a cysteine-rich protein that contains two LIM domains but lacks a DNA-binding homeodomain. The encoded protein may play a role as a transcriptional regulator or as an oncogene.
RIMS3 9783 ENSG00000117016 regulating synaptic membrane exocytosis 3 NA
VSNL1 7447 ENSG00000163032 visinin like 1 This gene is a member of the visinin/recoverin subfamily of neuronal calcium sensor proteins. The encoded protein is strongly expressed in granule cells of the cerebellum where it associates with membranes in a calcium-dependent manner and modulates intracellular signaling pathways of the central nervous system by directly or indirectly regulating the activity of adenylyl cyclase. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined.
PACSIN1 29993 ENSG00000124507 protein kinase C and casein kinase substrate in neurons 1 NA
NSMF 26012 ENSG00000165802 NMDA receptor synaptonuclear signaling and neuronal migration factor The protein encoded by this gene is involved in guidance of olfactory axon projections and migration of luteinizing hormone-releasing hormone neurons. Defects in this gene are a cause of idiopathic hypogonadotropic hypogonadism (IHH). Several transcript variants encoding different isoforms have been found for this gene.
ATP6V1B2 526 ENSG00000147416 ATPase H+ transporting V1 subunit B2 This gene encodes a component of vacuolar ATPase (V-ATPase), a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, receptor-mediated endocytosis, and synaptic vesicle proton gradient generation. V-ATPase is composed of a cytosolic V1 domain and a transmembrane V0 domain. The V1 domain consists of three A, three B, and two G subunits, as well as a C, D, E, F, and H subunit. The V1 domain contains the ATP catalytic site. The protein encoded by this gene is one of two V1 domain B subunit isoforms and is the only B isoform highly expressed in osteoclasts.
LIMK1 3984 ENSG00000106683 LIM domain kinase 1 There are approximately 40 known eukaryotic LIM proteins, so named for the LIM domains they contain. LIM domains are highly conserved cysteine-rich structures containing 2 zinc fingers. Although zinc fingers usually function by binding to DNA or RNA, the LIM motif probably mediates protein-protein interactions. LIM kinase-1 and LIM kinase-2 belong to a small subfamily with a unique combination of 2 N-terminal LIM motifs and a C-terminal protein kinase domain. LIMK1 is a serine/threonine kinase that regulates actin polymerization via phosphorylation and inactivation of the actin binding factor cofilin. This protein is ubiquitously expressed during development and plays a role in many cellular processes associated with cytoskeletal structure. This protein also stimulates axon growth and may play a role in brain development. LIMK1 hemizygosity is implicated in the impaired visuospatial constructive cognition of Williams syndrome. Alternative splicing results in multiple transcript variants encoding distinct isoforms.
NCS1 23413 ENSG00000107130 neuronal calcium sensor 1 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene.
VSTM2L 128434 ENSG00000132821 V-set and transmembrane domain containing 2 like NA
RAPGEF4 11069 ENSG00000091428 Rap guanine nucleotide exchange factor 4 NA
TUBB2A 7280 ENSG00000137267 tubulin beta 2A class IIa Microtubules, key participants in processes such as mitosis and intracellular transport, are composed of heterodimers of alpha- and beta-tubulins. The protein encoded by this gene is a beta-tubulin. Defects in this gene are associated with complex cortical dysplasia with other brain malformations-5. Two transcript variants encoding distinct isoforms have been found for this gene.
DGKZ 8525 ENSG00000149091 diacylglycerol kinase zeta The protein encoded by this gene belongs to the eukaryotic diacylglycerol kinase family. It may attenuate protein kinase C activity by regulating diacylglycerol levels in intracellular signaling cascade and signal transduction. Alternative splicing occurs at this locus and multiple transcript variants encoding distinct isoforms have been identified.
NAPB 63908 ENSG00000125814 NSF attachment protein beta NA
CHN1 1123 ENSG00000128656 chimerin 1 This gene encodes GTPase-activating protein for ras-related p21-rac and a phorbol ester receptor. It is predominantly expressed in neurons, and plays an important role in neuronal signal-transduction mechanisms. Mutations in this gene are associated with Duane’s retraction syndrome 2 (DURS2). Alternatively spliced transcript variants encoding different isoforms have been described for this gene.
PGM2L1 283209 ENSG00000165434 phosphoglucomutase 2-like 1 NA
FXYD6 53826 ENSG00000137726 FXYD domain containing ion transport regulator 6 This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus.
SNX10 29887 ENSG00000086300 sorting nexin 10 This gene encodes a member of the sorting nexin family. Members of this family contain a phox (PX) domain, which is a phosphoinositide binding domain, and are involved in intracellular trafficking. This protein does not contain a coiled coil region, like some family members. This gene may play a role in regulating endosome homeostasis. Alternative splicing results in multiple transcript variants.
KALRN 8997 ENSG00000160145 kalirin, RhoGEF kinase Huntington’s disease (HD), a neurodegenerative disorder characterized by loss of striatal neurons, is caused by an expansion of a polyglutamine tract in the HD protein huntingtin. This gene encodes a protein that interacts with the huntingtin-associated protein 1, which is a huntingtin binding protein that may function in vesicle trafficking. Alternatively spliced transcript variants encoding different isoforms have been described.
BAIAP3 8938 ENSG00000007516 BAI1 associated protein 3 This p53-target gene encodes a brain-specific angiogenesis inhibitor. The protein is a seven-span transmembrane protein and a member of the secretin receptor family. It interacts with the cytoplasmic region of brain-specific angiogenesis inhibitor 1. This protein also contains two C2 domains, which are often found in proteins involved in signal transduction or membrane trafficking. Its expression pattern and similarity to other proteins suggest that it may be involved in synaptic functions. Several transcript variants encoding different isoforms have been found for this gene.
PTER 9317 ENSG00000165983 phosphotriesterase related NA
SERPINI1 5274 ENSG00000163536 serpin family I member 1 This gene encodes a member of the serpin superfamily of serine proteinase inhibitors. The protein is primarily secreted by axons in the brain, and preferentially reacts with and inhibits tissue-type plasminogen activator. It is thought to play a role in the regulation of axonal growth and the development of synaptic plasticity. Mutations in this gene result in familial encephalopathy with neuroserpin inclusion bodies (FENIB), which is a dominantly inherited form of familial encephalopathy and epilepsy characterized by the accumulation of mutant neuroserpin polymers. Multiple alternatively spliced variants, encoding the same protein, have been identified.
ATXN7L3 56970 ENSG00000087152 ataxin 7 like 3 NA
RAB3A 5864 ENSG00000105649 RAB3A, member RAS oncogene family NA
RTN4RL2 349667 ENSG00000186907 reticulon 4 receptor-like 2 NA
EFHD2 79180 ENSG00000142634 EF-hand domain family member D2 NA
AP2M1 1173 ENSG00000161203 adaptor related protein complex 2 mu 1 subunit This gene encodes a subunit of the heterotetrameric coat assembly protein complex 2 (AP2), which belongs to the adaptor complexes medium subunits family. The encoded protein is required for the activity of a vacuolar ATPase, which is responsible for proton pumping occurring in the acidification of endosomes and lysosomes. The encoded protein may also play an important role in regulating the intracellular trafficking and function of CTLA-4 protein. Three transcript variants encoding different isoforms have been found for this gene.
SYTL2 54843 ENSG00000137501 synaptotagmin like 2 The protein encoded by this gene is a synaptotagmin-like protein (SLP) that belongs to a C2 domain-containing protein family. The SLP homology domain (SHD) of this protein has been shown to specifically bind the GTP-bound form of Ras-related protein Rab-27A (RAB27A). This protein plays a role in RAB27A-dependent vesicle trafficking and controls melanosome distribution in the cell periphery. Alternative splicing results in multiple transcript variants encoding distinct isoforms.
IQSEC1 9922 ENSG00000144711 IQ motif and Sec7 domain 1 NA
ALDOA 226 ENSG00000149925 aldolase, fructose-bisphosphate A The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10.
NRCAM 4897 ENSG00000091129 neuronal cell adhesion molecule Cell adhesion molecules (CAMs) are members of the immunoglobulin superfamily. This gene encodes a neuronal cell adhesion molecule with multiple immunoglobulin-like C2-type domains and fibronectin type-III domains. This ankyrin-binding protein is involved in neuron-neuron adhesion and promotes directional signaling during axonal cone growth. This gene is also expressed in non-neural tissues and may play a general role in cell-cell communication via signaling from its intracellular domain to the actin cytoskeleton during directional cell migration. Allelic variants of this gene have been associated with autism and addiction vulnerability. Alternative splicing results in multiple transcript variants encoding different isoforms.
C16orf45 89927 ENSG00000166780 chromosome 16 open reading frame 45 NA
NAP1L5 266812 ENSG00000177432 nucleosome assembly protein 1 like 5 This gene encodes a protein that shares sequence similarity to nucleosome assembly factors, but may be localized to the cytoplasm rather than the nucleus. Expression of this gene is downregulated in hepatocellular carcinomas. This gene is located within a differentially methylated region (DMR) and is imprinted and paternally expressed. There is a related pseudogene on chromosome 4.
KCNAB2 8514 ENSG00000069424 potassium voltage-gated channel subfamily A regulatory beta subunit 2 Voltage-gated potassium (Kv) channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. Four sequence-related potassium channel genes - shaker, shaw, shab, and shal - have been identified in Drosophila, and each has been shown to have human homolog(s). This gene encodes a member of the potassium channel, voltage-gated, shaker-related subfamily. This member is one of the beta subunits, which are auxiliary proteins associating with functional Kv-alpha subunits. This member alters functional properties of the KCNA4 gene product. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms.
KIF3C 3797 ENSG00000084731 kinesin family member 3C NA
SNCB 6620 ENSG00000074317 synuclein beta This gene encodes a member of a small family of proteins that inhibit phospholipase D2 and may function in neuronal plasticity. The encoded protein is abundant in lesions of patients with Alzheimer disease. A mutation in this gene was found in individuals with dementia with Lewy bodies. Alternative splicing results in multiple transcript variants.
PLD3 23646 ENSG00000105223 phospholipase D family member 3 This gene encodes a member of the phospholipase D (PLD) family of enzymes that catalyze the hydrolysis of membrane phospholipids. The encoded protein is a single-pass type II membrane protein and contains two PLD phosphodiesterase domains. This protein influences processing of amyloid-beta precursor protein. Mutations in this gene are associated with Alzheimer disease risk. Alternatively spliced transcript variants encoding the same protein have been found for this gene.
RUNDC3A 10900 ENSG00000108309 RUN domain containing 3A NA
BRSK1 84446 ENSG00000160469 BR serine/threonine kinase 1 NA
HSP90AB1 3326 ENSG00000096384 heat shock protein 90kDa alpha family class B member 1 This gene encodes a member of the heat shock protein 90 family; these proteins are involved in signal transduction, protein folding and degradation and morphological evolution. This gene encodes the constitutive form of the cytosolic 90 kDa heat-shock protein and is thought to play a role in gastric apoptosis and inflammation. Alternative splicing results in multiple transcript variants. Pseudogenes have been identified on multiple chromosomes.
CALM3 808 ENSG00000160014 calmodulin 3 (phosphorylase kinase, delta) NA
CALM2 805 ENSG00000160014 calmodulin 2 (phosphorylase kinase, delta) This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms.
COL5A2 1290 ENSG00000204262 collagen type V alpha 2 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II.
R3HDM1 23518 ENSG00000048991 R3H domain containing 1 NA
CBX6 23466 ENSG00000183741 chromobox 6 NA
NEFH 4744 ENSG00000100285 neurofilament, heavy polypeptide Neurofilaments are type IV intermediate filament heteropolymers composed of light, medium, and heavy chains. Neurofilaments comprise the axoskeleton and functionally maintain neuronal caliber. They may also play a role in intracellular transport to axons and dendrites. This gene encodes the heavy neurofilament protein. This protein is commonly used as a biomarker of neuronal damage and susceptibility to amyotrophic lateral sclerosis (ALS) has been associated with mutations in this gene.
EHD3 30845 ENSG00000013016 EH domain containing 3 NA
KIF5C 3800 ENSG00000168280 kinesin family member 5C The protein encoded by this gene is a kinesin heavy chain subunit involved in the transport of cargo within the central nervous system. The encoded protein, which acts as a tetramer by associating with another heavy chain and two light chains, interacts with protein kinase CK2. Mutations in this gene have been associated with complex cortical dysplasia with other brain malformations-2. Two transcript variants, one protein-coding and the other non-protein coding, have been found for this gene.
RTN1 6252 ENSG00000139970 reticulon 1 This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. This gene is considered to be a specific marker for neurological diseases and cancer, and is a potential molecular target for therapy. Alternative splicing results in multiple transcript variants.
SYNGR3 9143 ENSG00000127561 synaptogyrin 3 This gene encodes an integral membrane protein. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it is a synaptic vesicle protein that also interacts with the dopamine transporter. The gene product belongs to the synaptogyrin gene family.
NLK 51701 ENSG00000087095 nemo-like kinase NA
SNCA 6622 ENSG00000145335 synuclein alpha Alpha-synuclein is a member of the synuclein family, which also includes beta- and gamma-synuclein. Synucleins are abundantly expressed in the brain and alpha- and beta-synuclein inhibit phospholipase D2 selectively. SNCA may serve to integrate presynaptic signaling and membrane trafficking. Defects in SNCA have been implicated in the pathogenesis of Parkinson disease. SNCA peptides are a major component of amyloid plaques in the brains of patients with Alzheimer’s disease. Four alternatively spliced transcripts encoding two different isoforms have been identified for this gene.
FABP3 2170 ENSG00000121769 fatty acid binding protein 3 The intracellular fatty acid-binding proteins (FABPs) belongs to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible in the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants.
RP11-728F11.4 ENSG00000254528 ENSG00000254528 NA NA
PI4KA 5297 ENSG00000241973 phosphatidylinositol 4-kinase alpha This gene encodes a phosphatidylinositol (PI) 4-kinase which catalyzes the first committed step in the biosynthesis of phosphatidylinositol 4,5-bisphosphate. The mammalian PI 4-kinases have been classified into two types, II and III, based on their molecular mass, and modulation by detergent and adenosine. The protein encoded by this gene is a type III enzyme that is not inhibited by adenosine.
RNF157 114804 ENSG00000141576 ring finger protein 157 NA
ACOT7 11332 ENSG00000097021 acyl-CoA thioesterase 7 This gene encodes a member of the acyl coenzyme family. The encoded protein hydrolyzes the CoA thioester of palmitoyl-CoA and other long-chain fatty acids. Decreased expression of this gene may be associated with mesial temporal lobe epilepsy. Alternatively spliced transcript variants encoding distinct isoforms with different subcellular locations have been characterized.
DBNDD1 79007 ENSG00000003249 dysbindin (dystrobrevin binding protein 1) domain containing 1 NA
FBXO41 150726 ENSG00000163013 F-box protein 41 This gene encodes a member of the F-box protein family, which is characterized by an approximately 40 amino acid motif, the F-box. F-box proteins constitute one of the four subunits of the SCF ubiquitin protein ligase complex that plays a role in phosphorylation-dependent ubiquitination. F-box proteins are divided into three classes depending on the interaction substrate domain each contains in addition to the F-box motif: FBXW proteins contain WD-40 domains, FBXL proteins contain leucine-rich repeats, and FBXO proteins contain either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the FBXO class.
ATP6V1A 523 ENSG00000114573 ATPase H+ transporting V1 subunit A This gene encodes a component of vacuolar ATPase (V-ATPase), a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, receptor-mediated endocytosis, and synaptic vesicle proton gradient generation. V-ATPase is composed of a cytosolic V1 domain and a transmembrane V0 domain. The V1 domain consists of three A and three B subunits, two G subunits plus the C, D, E, F, and H subunits. The V1 domain contains the ATP catalytic site. The V0 domain consists of five different subunits: a, c, c’, c’, and d. Additional isoforms of many of the V1 and V0 subunit proteins are encoded by multiple genes or alternatively spliced transcript variants. This encoded protein is one of two V1 domain A subunit isoforms and is found in all tissues. Transcript variants derived from alternative polyadenylation exist.
SCAMP5 192683 ENSG00000198794 secretory carrier membrane protein 5 NA
PFN2 5217 ENSG00000070087 profilin 2 The protein encoded by this gene is a ubiquitous actin monomer-binding protein belonging to the profilin family. It is thought to regulate actin polymerization in response to extracellular signals. There are two alternatively spliced transcript variants encoding different isoforms described for this gene.
RB1CC1 9821 ENSG00000023287 RB1 inducible coiled-coil 1 The protein encoded by this gene interacts with signaling pathways to coordinately regulate cell growth, cell proliferation, apoptosis, autophagy, and cell migration. This tumor suppressor also enhances retinoblastoma 1 gene expression in cancer cells. Alternative splicing results in multiple transcript variants encoding distinct isoforms.
MOAP1 64112 ENSG00000165943 modulator of apoptosis 1 The protein encoded by this gene was identified by its interaction with apoptosis regulator BAX protein. This protein contains a Bcl-2 homology 3 (BH3)-like motif, which is required for the association with BAX. When overexpressed, this gene has been shown to mediate caspase-dependent apoptosis.
ATP6V1C1 528 ENSG00000155097 ATPase H+ transporting V1 subunit C1 This gene encodes a component of vacuolar ATPase (V-ATPase), a multisubunit enzyme that mediates acidification of intracellular compartments of eukaryotic cells. V-ATPase dependent acidification is necessary for such intracellular processes as protein sorting, zymogen activation, receptor-mediated endocytosis, and synaptic vesicle proton gradient generation. V-ATPase is composed of a cytosolic V1 domain and a transmembrane V0 domain. The V1 domain consists of three A and three B subunits, two G subunits plus the C, D, E, F, and H subunits. The V1 domain contains the ATP catalytic site. The V0 domain consists of five different subunits: a, c, c’, c’’, and d. Additional isoforms of many of the V1 and V0 subunit proteins are encoded by multiple genes or alternatively spliced transcript variants. This gene is one of two genes that encode the V1 domain C subunit proteins and is found ubiquitously. This C subunit is analogous but not homologous to gamma subunit of F-ATPases. Previously, this gene was designated ATP6D.
STX1A 6804 ENSG00000106089 syntaxin 1A This gene encodes a member of the syntaxin superfamily. Syntaxins are nervous system-specific proteins implicated in the docking of synaptic vesicles with the presynaptic plasma membrane. Syntaxins possess a single C-terminal transmembrane domain, a SNARE [Soluble NSF (N-ethylmaleimide-sensitive fusion protein)-Attachment protein REceptor] domain (known as H3), and an N-terminal regulatory domain (Habc). Syntaxins bind synaptotagmin in a calcium-dependent fashion and interact with voltage dependent calcium and potassium channels via the C-terminal H3 domain. This gene product is a key molecule in ion channel regulation and synaptic exocytosis. Alternatively spliced transcript variants encoding different isoforms have been found for this gene.
SYNPO 11346 ENSG00000171992 synaptopodin Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]).
NT5DC3 51559 ENSG00000111696 5’-nucleotidase domain containing 3 NA
GNB1 2782 ENSG00000078369 G protein subunit beta 1 Heterotrimeric guanine nucleotide-binding proteins (G proteins), which integrate signals between receptors and effector proteins, are composed of an alpha, a beta, and a gamma subunit. These subunits are encoded by families of related genes. This gene encodes a beta subunit. Beta subunits are important regulators of alpha subunits, as well as of certain signal transduction receptors and effectors. Alternative splicing results in multiple transcript variants.
EEF1A2 1917 ENSG00000101210 eukaryotic translation elongation factor 1 alpha 2 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer.
ATP13A2 23400 ENSG00000159363 ATPase 13A2 This gene encodes a member of the P5 subfamily of ATPases which transports inorganic cations as well as other substrates. Mutations in this gene are associated with Kufor-Rakeb syndrome (KRS), also referred to as Parkinson disease 9. Multiple transcript variants encoding different isoforms have been found for this gene.
C1orf216 127703 ENSG00000142686 chromosome 1 open reading frame 216 NA
KIF21B 23046 ENSG00000116852 kinesin family member 21B This gene encodes a member of the kinesin superfamily. Kinesins are ATP-dependent microtubule-based motor proteins that are involved in the intracellular transport of membranous organelles. Single nucleotide polymorphisms in this gene are associated with inflammatory bowel disease and multiple sclerosis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
PLXNA1 5361 ENSG00000114554 plexin A1 NA
PRDM8 56978 ENSG00000152784 PR domain 8 This gene encodes a protein that belongs to a conserved family of histone methyltransferases that predominantly act as negative regulators of transcription. The encoded protein contains an N-terminal Su(var)3-9, Enhancer-of-zeste, and Trithorax (SET) domain and a double zinc-finger domain. Knockout of this gene in mouse results in mistargeting by neurons of the dorsal telencephalon, abnormal itch-like behavior, and impaired differentiation of rod bipolar cells. In humans, the protein has been shown to interact with the phosphatase laforin and the ubiquitin ligase malin, which regulate glycogen construction in the cytoplasm. Alternative splicing results in multiple transcript variants.
STMN3 50861 ENSG00000197457 stathmin 3 This gene encodes a protein which is a member of the stathmin protein family. Members of this protein family form a complex with tubulins at a ratio of 2 tubulins for each stathmin protein. Microtubules require the ordered assembly of alpha- and beta-tubulins, and formation of a complex with stathmin disrupts microtubule formation and function. A pseudogene of this gene is located on chromosome 22. Alternative splicing results in multiple transcript variants.
SATB1 6304 ENSG00000182568 SATB homeobox 1 This gene encodes a matrix protein which binds nuclear matrix and scaffold-associating DNAs through a unique nuclear architecture. The protein recruits chromatin-remodeling factors in order to regulate chromatin structure and gene expression. Multiple transcript variants encoding different isoforms have been found for this gene.
STMN1 3925 ENSG00000117632 stathmin 1 This gene belongs to the stathmin family of genes. It encodes a ubiquitous cytosolic phosphoprotein proposed to function as an intracellular relay integrating regulatory signals of the cellular environment. The encoded protein is involved in the regulation of the microtubule filament system by destabilizing microtubules. It prevents assembly and promotes disassembly of microtubules. Multiple transcript variants encoding different isoforms have been found for this gene.
ATP1B1 481 ENSG00000143153 ATPase Na+/K+ transporting subunit beta 1 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 1 subunit. Alternatively spliced transcript variants encoding different isoforms have been described, but their biological validity is not known.
TMEM59L 25789 ENSG00000105696 transmembrane protein 59 like This gene encodes a predicted type-I membrane glycoprotein. The encoded protein may play a role in functioning of the central nervous system.
DGKZP1 ENSG00000179611 ENSG00000179611 diacylglycerol kinase, zeta pseudogene 1 NA
SMYD2 56950 ENSG00000143499 SET and MYND domain containing 2 SET domain-containing proteins, such as SMYD2, catalyze lysine methylation (Brown et al., 2006 [PubMed 16805913]).
ADRBK2 157 ENSG00000100077 adrenergic, beta, receptor kinase 2 The beta-adrenergic receptor kinase specifically phosphorylates the agonist-occupied form of the beta-adrenergic and related G protein-coupled receptors. Overall, the beta adrenergic receptor kinase 2 has 85% amino acid similarity with beta adrenergic receptor kinase 1, with the protein kinase catalytic domain having 95% similarity. These data suggest the existence of a family of receptor kinases which may serve broadly to regulate receptor function.
# Save the Ensembl IDs queried for brain cluster 2, one ID per line
write.table(out$query, paste0("../utilities/gene_names_brain_clus_", 2, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
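
The annotation tables in this report do not always list their columns in the same order (compare the Cluster 2 rows above with the Cluster 3 header below), and queries that fail to match appear only as a row with notfound set to TRUE. A minimal sketch of one way to normalise a result before printing or saving it follows. The helper name tidy_annotations and the toy data frame out_df are hypothetical and not part of the original script; the three toy rows reuse IDs from the Cluster 3 table, with summaries left out for brevity.

```r
# Toy stand-in for as.data.frame(out); no network call is made here.
# Rows reuse IDs from the Cluster 3 table; summaries are omitted (NA).
out_df <- data.frame(
  name = c("chromogranin B", NA, "nischarin"),
  X_id = c("1114", NA, "11188"),
  symbol = c("CHGB", NA, "NISCH"),
  query = c("ENSG00000089199", "ENSG00000163486", "ENSG00000010322"),
  summary = NA_character_,
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: put the columns in a fixed order and turn
# 'notfound' into a plain TRUE/FALSE flag (NA becomes FALSE).
tidy_annotations <- function(df,
                             cols = c("symbol", "X_id", "query", "name", "summary")) {
  # Some results may lack a 'notfound' column when every query matched.
  if (!"notfound" %in% names(df)) df$notfound <- NA
  # Add any expected column that is absent so the selection below is safe.
  for (col in setdiff(cols, names(df))) df[[col]] <- NA
  df$notfound <- !is.na(df$notfound) & df$notfound
  df[, c(cols, "notfound"), drop = FALSE]
}

tidy <- tidy_annotations(out_df)

# Ensembl IDs that returned no annotation
unmatched <- tidy$query[tidy$notfound]
```

With a real query result, the same helper could be applied to as.data.frame(out) before calling kable, so that every cluster table shares one column layout and the unmatched IDs can be reviewed separately.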

Cluster 3 Annotations

out <- mygene::queryMany(gene_list_brain[3,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id symbol query summary notfound
MAM domain containing glycosylphosphatidylinositol anchor 1 266727 MDGA1 ENSG00000112139 NA NA
polycystin 1, transient receptor potential channel interacting 5310 PKD1 ENSG00000008710 This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. NA
cerebellin 3 precursor 643866 CBLN3 ENSG00000139899 Members of the precerebellin family, such as CBLN3, contain a cerebellin motif (see CBLN1; MIM 600432) and a C-terminal C1q signature domain (see MIM 120550) that mediates trimeric assembly of atypical collagen complexes. However, precerebellins do not contain a collagen motif, suggesting that they are not conventional components of the extracellular matrix (Pang et al., 2000 [PubMed 10964938]). NA
collagen type XXVII alpha 1 85301 COL27A1 ENSG00000196739 This gene encodes a member of the fibrillar collagen family, and plays a role during the calcification of cartilage and the transition of cartilage to bone. The encoded protein product is a preproprotein. It includes an N-terminal signal peptide, which is followed by an N-terminal propetide, mature peptide and a C-terminal propeptide. The N-terminal propeptide contains thrombospondin N-terminal-like and laminin G-like domains. The mature peptide is a major triple-helical region. The C-terminal propeptide, also known as COLFI domain, plays crucial roles in tissue growth and repair. Mutations in this gene cause Steel syndrome. Alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been determined. NA
actin binding LIM protein 1 3983 ABLIM1 ENSG00000099204 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
transducin like enhancer of split 2 7089 TLE2 ENSG00000065717 NA NA
suppressor of glucose, autophagy associated 1 140710 SOGA1 ENSG00000149639 NA NA
T-cell lymphoma invasion and metastasis 1 7074 TIAM1 ENSG00000156299 NA NA
PTPRF interacting protein alpha 4 8497 PPFIA4 ENSG00000143847 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. NA
chromogranin B 1114 CHGB ENSG00000089199 This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. NA
tetraspanin 9 10867 TSPAN9 ENSG00000011105 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. Alternatively spliced transcripts encoding the same protein have been identified. NA
NA NA NA ENSG00000163486 NA TRUE
LUC7 like 3 pre-mRNA splicing factor 51747 LUC7L3 ENSG00000108848 This gene encodes a protein with an N-terminal half that contains cysteine/histidine motifs and leucine zipper-like repeats, and the C-terminal half is rich in arginine and glutamate residues (RE domain) and arginine and serine residues (RS domain). This protein localizes with a speckled pattern in the nucleus, and could be involved in the formation of spliceosome via the RE and RS domains. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. NA
coiled-coil domain containing 88B 283234 CCDC88B ENSG00000168071 This gene encodes a member of the hook-related protein family. Members of this family are characterized by an N-terminal potential microtubule binding domain, a central coiled-coiled and a C-terminal Hook-related domain. The encoded protein may be involved in linking organelles to microtubules. NA
ALS2, alsin Rho guanine nucleotide exchange factor 57679 ALS2 ENSG00000003393 The protein encoded by this gene contains an ATS1/RCC1-like domain, a RhoGEF domain, and a vacuolar protein sorting 9 (VPS9) domain, all of which are guanine-nucleotide exchange factors that activate members of the Ras superfamily of GTPases. The protein functions as a guanine nucleotide exchange factor for the small GTPase RAB5. The protein localizes with RAB5 on early endosomal compartments, and functions as a modulator for endosomal dynamics. Mutations in this gene result in several forms of juvenile lateral sclerosis and infantile-onset ascending spastic paralysis. Multiple transcript variants encoding different isoforms have been found for this gene. NA
protein kinase (cAMP-dependent, catalytic) inhibitor beta 5570 PKIB ENSG00000135549 This gene encodes a member of the cAMP-dependent protein kinase inhibitor family. The encoded protein may play a role in the protein kinase A (PKA) pathway by interacting with the catalytic subunit of PKA, and overexpression of this gene may play a role in prostate cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
calcium/calmodulin-dependent protein kinase kinase 2 10645 CAMKK2 ENSG00000110931 The product of this gene belongs to the Serine/Threonine protein kinase family, and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. The major isoform of this gene plays a role in the calcium/calmodulin-dependent (CaM) kinase cascade by phosphorylating the downstream kinases CaMK1 and CaMK4. Protein products of this gene also phosphorylate AMP-activated protein kinase (AMPK). This gene has its strongest expression in the brain and influences signalling cascades involved with learning and memory, neuronal differentiation and migration, neurite outgrowth, and synapse formation. Alternative splicing results in multiple transcript variants encoding distinct isoforms. The identified isoforms differ in their ability to undergo autophosphorylation and to phosphorylate downstream kinases. NA
polymerase (DNA) epsilon, catalytic subunit 5426 POLE ENSG00000177084 This gene encodes the catalytic subunit of DNA polymerase epsilon. The enzyme is involved in DNA repair and chromosomal DNA replication. Mutations in this gene have been associated with colorectal cancer 12 and facial dysmorphism, immunodeficiency, livedo, and short stature. NA
serine/arginine repetitive matrix 2 23524 SRRM2 ENSG00000167978 NA NA
CDC like kinase 4 57396 CLK4 ENSG00000113240 The protein encoded by this gene belongs to the CDC2-like protein kinase (CLK) family. This protein kinase can interact with and phosphorylate the serine- and arginine-rich (SR) proteins, which are known to play an important role in the formation of spliceosomes, and thus may be involved in the regulation of alternative splicing. Studies in the Israeli sand rat Psammomys obesus suggested that the ubiquitin-like 5 (UBL5/BEACON), a highly conserved ubiquitin-like protein, may interact with and regulate the activity of this kinase. Multiple alternatively spliced transcript variants have been observed, but the full-length natures of which have not yet been determined. NA
nischarin 11188 NISCH ENSG00000010322 This gene encodes a nonadrenergic imidazoline-1 receptor protein that localizes to the cytosol and anchors to the inner layer of the plasma membrane. The orthologous mouse protein has been shown to influence cytoskeletal organization and cell migration by binding to alpha-5-beta-1 integrin. In humans, this protein has been shown to bind to the adapter insulin receptor substrate 4 (IRS4) to mediate translocation of alpha-5 integrin from the cell membrane to endosomes. Expression of this protein was reduced in human breast cancers while its overexpression reduced tumor growth and metastasis; possibly by limiting the expression of alpha-5 integrin. In human cardiac tissue, this gene was found to affect cell growth and death while in neural tissue it affected neuronal growth and differentiation. Alternative splicing results in multiple transcript variants encoding different isoforms. Some isoforms lack the expected C-terminal domains of a functional imidazoline receptor. NA
zinc finger and BTB domain containing 18 10472 ZBTB18 ENSG00000179456 This gene encodes a C2H2-type zinc finger protein which acts a transcriptional repressor of genes involved in neuronal development. The encoded protein recognizes a specific sequence motif and recruits components of chromatin to target genes. Alternative splicing results in multiple transcript variants. NA
NA ENSG00000271795 CTC-251D13.1 ENSG00000271795 NA NA
MCF.2 cell line derived transforming sequence like 23263 MCF2L ENSG00000126217 This gene encodes a guanine nucleotide exchange factor that interacts specifically with the GTP-bound Rac1 and plays a role in the Rho/Rac signaling pathways. A variant in this gene was associated with osteoarthritis. Alternative splicing results in multiple transcript variants. NA
golgin A8 family member A 23015 GOLGA8A ENSG00000175265 The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. NA
small nuclear ribonucleoprotein U1 subunit 70 6625 SNRNP70 ENSG00000104852 NA NA
protein disulfide isomerase family A member 2 64714 PDIA2 ENSG00000185615 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
microtubule associated monooxygenase, calponin and LIM domain containing 2 9645 MICAL2 ENSG00000133816 NA NA
neuronal regeneration related protein 9315 NREP ENSG00000134986 NA NA
SKI proto-oncogene 6497 SKI ENSG00000157933 This gene encodes the nuclear protooncogene protein homolog of avian sarcoma viral (v-ski) oncogene. It functions as a repressor of TGF-beta signaling, and may play a role in neural tube development and muscle differentiation. NA
transmembrane protein 178A 130733 TMEM178A ENSG00000152154 NA NA
tubulin beta 4A class IVa 10382 TUBB4A ENSG00000104833 This gene encodes a member of the beta tubulin family. Beta tubulins are one of two core protein families (alpha and beta tubulins) that heterodimerize and assemble to form microtubules. Mutations in this gene cause hypomyelinating leukodystrophy-6 and autosomal dominant torsion dystonia-4. Alternate splicing results in multiple transcript variants encoding different isoforms. A pseudogene of this gene is found on chromosome X. NA
Ca2+ dependent secretion activator 2 93664 CADPS2 ENSG00000081803 This gene encodes a member of the calcium-dependent activator of secretion (CAPS) protein family, which are calcium binding proteins that regulate the exocytosis of synaptic and dense-core vesicles in neurons and neuroendocrine cells. Mutations in this gene may contribute to autism susceptibility. Multiple transcript variants encoding different isoforms have been found for this gene. NA
tumor necrosis factor receptor superfamily member 25 8718 TNFRSF25 ENSG00000215788 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is expressed preferentially in the tissues enriched in lymphocytes, and it may play a role in regulating lymphocyte homeostasis. This receptor has been shown to stimulate NF-kappa B activity and regulate cell apoptosis. The signal transduction of this receptor is mediated by various death domain containing adaptor proteins. Knockout studies in mice suggested the role of this gene in the removal of self-reactive T cells in the thymus. Multiple alternatively spliced transcript variants of this gene encoding distinct isoforms have been reported, most of which are potentially secreted molecules. The alternative splicing of this gene in B and T cells encounters a programmed change upon T-cell activation, which predominantly produces full-length, membrane bound isoforms, and is thought to be involved in controlling lymphocyte proliferation induced by T-cell activation. NA
RAB37, member RAS oncogene family 326624 RAB37 ENSG00000172794 Rab proteins are low molecular mass GTPases that are critical regulators of vesicle trafficking. For additional background information on Rab proteins, see MIM 179508. NA
ankyrin 1 286 ANK1 ENSG00000029534 Ankyrins are a family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton and play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Multiple isoforms of ankyrin with different affinities for various target proteins are expressed in a tissue-specific, developmentally regulated manner. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. Ankyrin 1, the prototype of this family, was first discovered in the erythrocytes, but since has also been found in brain and muscles. Mutations in erythrocytic ankyrin 1 have been associated in approximately half of all patients with hereditary spherocytosis. Complex patterns of alternative splicing in the regulatory domain, giving rise to different isoforms of ankyrin 1 have been described. Truncated muscle-specific isoforms of ankyrin 1 resulting from usage of an alternate promoter have also been identified. NA
mitogen-activated protein kinase binding protein 1 23005 MAPKBP1 ENSG00000137802 NA NA
laminin subunit alpha 5 3911 LAMA5 ENSG00000130702 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). NA
interleukin 16 3603 IL16 ENSG00000172349 The protein encoded by this gene is a pleiotropic cytokine that functions as a chemoattractant, a modulator of T cell activation, and an inhibitor of HIV replication. The signaling process of this cytokine is mediated by CD4. The product of this gene undergoes proteolytic processing, which is found to yield two functional proteins. The cytokine function is exclusively attributed to the secreted C-terminal peptide, while the N-terminal product may play a role in cell cycle control. Caspase 3 is reported to be involved in the proteolytic processing of this protein. Alternate splicing results in multiple transcript variants. NA
nuclear pore complex interacting protein family member A1 9284 NPIPA1 ENSG00000183426 NA NA
nuclear pore complex interacting protein family member A2 642799 NPIPA2 ENSG00000183426 NA NA
NPIP-like protein 1 102724993 LOC102724993 ENSG00000183426 NA NA
nuclear pore complex interacting protein family member A3 642778 NPIPA3 ENSG00000183426 NA NA
nuclear pore complex interacting protein family member A7 101059938 NPIPA7 ENSG00000183426 NA NA
nuclear pore complex interacting protein family member A8 101059953 NPIPA8 ENSG00000183426 NA NA
DAB2 interacting protein 153090 DAB2IP ENSG00000136848 DAB2IP is a Ras (MIM 190020) GTPase-activating protein (GAP) that acts as a tumor suppressor. The DAB2IP gene is inactivated by methylation in prostate and breast cancers (Yano et al., 2005 [PubMed 15386433]). NA
PATJ, crumbs cell polarity complex component 10207 PATJ ENSG00000132849 This gene encodes a protein with multiple PDZ domains. PDZ domains mediate protein-protein interactions, and proteins with multiple PDZ domains often organize multimeric complexes at the plasma membrane. This protein localizes to tight junctions and to the apical membrane of epithelial cells. A similar protein in Drosophila is a scaffolding protein which tethers several members of a multimeric signaling complex in photoreceptors. NA
collagen type VI alpha 1 1291 COL6A1 ENSG00000142156 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. NA
microtubule associated monooxygenase, calponin and LIM domain containing 3 57553 MICAL3 ENSG00000243156 NA NA
Rho guanine nucleotide exchange factor 10 like 55160 ARHGEF10L ENSG00000074964 ARHGEF10L is a member of the RhoGEF family of guanine nucleotide exchange factors (GEFs) that activate Rho GTPases (Winkler et al., 2005 [PubMed 16112081]). NA
potassium voltage-gated channel subfamily J member 12 3768 KCNJ12 ENSG00000184185 This gene encodes an inwardly rectifying K+ channel which may be blocked by divalent cations. This protein is thought to be one of multiple inwardly rectifying channels which contribute to the cardiac inward rectifier current (IK1). The gene is located within the Smith-Magenis syndrome region on chromosome 17. NA
NA ENSG00000272505 RP11-981G7.6 ENSG00000272505 NA NA
GRAM domain containing 1B 57476 GRAMD1B ENSG00000023171 NA NA
MAM domain containing 4 158056 MAMDC4 ENSG00000177943 NA NA
cyclin L2 81669 CCNL2 ENSG00000221978 The protein encoded by this gene belongs to the cyclin family. Through its interaction with several proteins, such as RNA polymerase II, splicing factors, and cyclin-dependent kinases, this protein functions as a regulator of the pre-mRNA splicing process, as well as in inducing apoptosis by modulating the expression of apoptotic and antiapoptotic proteins. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
plexin B2 23654 PLXNB2 ENSG00000196576 Members of the B class of plexins, such as PLXNB2 are transmembrane receptors that participate in axon guidance and cell migration in response to semaphorins (Perrot et al. (2002) [PubMed 12183458]). NA
pleckstrin homology and RhoGEF domain containing G5 57449 PLEKHG5 ENSG00000171680 This gene encodes a protein that activates the nuclear factor kappa B (NFKB1) signaling pathway. Mutations in this gene are associated with autosomal recessive distal spinal muscular atrophy. Multiple transcript variants encoding different isoforms have been found for this gene. NA
serine/threonine kinase 10 6793 STK10 ENSG00000072786 This gene encodes a member of the Ste20 family of serine/threonine protein kinases, and is similar to several known polo-like kinase kinases. The protein can associate with and phosphorylate polo-like kinase 1, and overexpression of a kinase-dead version of the protein interferes with normal cell cycle progression. The kinase can also negatively regulate interleukin 2 expression in T-cells via the mitogen activated protein kinase kinase 1 pathway. NA
LIM domain and actin binding 1 51474 LIMA1 ENSG00000050405 This gene encodes a cytoskeleton-associated protein that inhibits actin filament depolymerization and cross-links filaments in bundles. It is downregulated in some cancer cell lines. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and expression of some of the variants may be independently regulated. NA
PNN-interacting serine/arginine-rich protein 25957 PNISR ENSG00000132424 NA NA
paired immunoglobin-like type 2 receptor beta 29990 PILRB ENSG00000121716 The paired immunoglobin-like type 2 receptors consist of highly related activating and inhibitory receptors that are involved in the regulation of many aspects of the immune system. The paired immunoglobulin-like receptor genes are located in a tandem head-to-tail orientation on chromosome 7. This gene encodes the activating member of the receptor pair and contains a truncated cytoplasmic tail relative to its inhibitory counterpart (PILRA), that has a long cytoplasmic tail with immunoreceptor tyrosine-based inhibitory (ITIM) motifs. This gene is thought to have arisen from a duplication of the inhibitory PILRA gene and evolved to acquire its activating function. NA
diacylglycerol kinase delta 8527 DGKD ENSG00000077044 This gene encodes a cytoplasmic enzyme that phosphorylates diacylglycerol to produce phosphatidic acid. Diacylglycerol and phosphatidic acid are two lipids that act as second messengers in signaling cascades. Their cellular concentrations are regulated by the encoded protein, and so it is thought to play an important role in cellular signal transduction. Alternative splicing results in two transcript variants encoding different isoforms. NA
uncharacterized LOC105370792 105370792 LOC105370792 ENSG00000174171 NA NA
chromodomain helicase DNA binding protein 7 55636 CHD7 ENSG00000171316 This gene encodes a protein that contains several helicase family domains. Mutations in this gene have been found in some patients with the CHARGE syndrome. Two transcript variants encoding different isoforms have been found for this gene. NA
proline rich transmembrane protein 2 112476 PRRT2 ENSG00000167371 This gene encodes a transmembrane protein containing a proline-rich domain in its N-terminal half. Studies in mice suggest that it is predominantly expressed in brain and spinal cord in embryonic and postnatal stages. Mutations in this gene are associated with episodic kinesigenic dyskinesia-1. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
ArfGAP with coiled-coil, ankyrin repeat and PH domains 3 116983 ACAP3 ENSG00000131584 NA NA
armadillo repeat gene deleted in velocardiofacial syndrome 421 ARVCF ENSG00000099889 Armadillo Repeat gene deleted in Velo-Cardio-Facial syndrome (ARVCF) is a member of the catenin family. This family plays an important role in the formation of adherens junction complexes, which are thought to facilitate communication between the inside and outside environments of a cell. The ARVCF gene was isolated in the search for the genetic defect responsible for the autosomal dominant Velo-Cardio-Facial syndrome (VCFS), a relatively common human disorder with phenotypic features including cleft palate, conotruncal heart defects and facial dysmorphology. The ARVCF gene encodes a protein containing two motifs, a coiled coil domain in the N-terminus and a 10 armadillo repeat sequence in the midregion. Since these sequences can facilitate protein-protein interactions ARVCF is thought to function in a protein complex. In addition, ARVCF contains a predicted nuclear-targeting sequence suggesting that it may have a function as a nuclear protein. NA
tripartite motif containing 7 81786 TRIM7 ENSG00000146054 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1, a B-box type 2, and a coiled-coil region. The protein localizes to both the nucleus and the cytoplasm, and may represent a participant in the initiation of glycogen synthesis. Alternative splicing results in multiple transcript variants. NA
solute carrier family 36 member 1 206358 SLC36A1 ENSG00000123643 This gene encodes a member of the eukaryote-specific amino acid/auxin permease (AAAP) 1 transporter family. The encoded protein functions as a proton-dependent, small amino acid transporter. This gene is clustered with related family members on chromosome 5q33.1. Alternative splicing results in multiple transcript variants. NA
kelch like family member 3 26249 KLHL3 ENSG00000146021 This gene is ubiquitously expressed and encodes a full-length protein which has an N-terminal BTB domain followed by a BACK domain and six kelch-like repeats in the C-terminus. These kelch-like repeats promote substrate ubiquitination of bound proteins via interaction of the BTB domain with the CUL3 (cullin 3) component of a cullin-RING E3 ubiquitin ligase (CRL) complex. Mutations in this gene cause pseudohypoaldosteronism type IID (PHA2D); a rare Mendelian syndrome featuring hypertension, hyperkalaemia and metabolic acidosis. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
matrix metallopeptidase 24 10893 MMP24 ENSG00000125966 This gene encodes a member of the peptidase M10 family of matrix metalloproteinases (MMPs). Proteins in this family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. The encoded preproprotein is proteolytically processed to generate the mature protease. Unlike most MMPs, which are secreted, this protease is a member of the membrane-type MMP (MT-MMP) subfamily, contains a transmembrane domain and is expressed at the cell surface. Substrates of this protease include the proteins cadherin 2 and matrix metallopeptidase 2 (also known as 72 kDa type IV collagenase). NA
NA ENSG00000183458 RP11-958N24.1 ENSG00000183458 NA NA
phospholipase C eta 2 9651 PLCH2 ENSG00000149527 PLCH2 is a member of the PLC-eta family of the phosphoinositide-specific phospholipase C (PLC) superfamily of enzymes that cleave PtdIns(4,5) P2 to generate second messengers inositol 1,4,5-trisphosphate and diacylglycerol (Zhou et al., 2005 [PubMed 16107206]). NA
Src homology 2 domain containing F 90525 SHF ENSG00000138606 NA NA
SEL1L family member 3 23231 SEL1L3 ENSG00000091490 NA NA
synaptotagmin like 1 84958 SYTL1 ENSG00000142765 NA NA
transmembrane protein 266 123591 TMEM266 ENSG00000169758 NA NA
NPIP-like protein 1 102724993 LOC102724993 ENSG00000183889 NA NA
nuclear pore complex interacting protein family member A5 100288332 NPIPA5 ENSG00000183889 NA NA
neurexin 2 9379 NRXN2 ENSG00000110076 This gene encodes a member of the neurexin gene family. The products of these genes function as cell adhesion molecules and receptors in the vertebrate nervous system. These genes utilize two promoters. The majority of transcripts are produced from the upstream promoter and encode alpha-neurexin isoforms while a smaller number of transcripts are produced from the downstream promoter and encode beta-neurexin isoforms. The alpha-neurexins contain epidermal growth factor-like (EGF-like) sequences and laminin G domains, and have been shown to interact with neurexophilins. The beta-neurexins lack EGF-like sequences and contain fewer laminin G domains than alpha-neurexins. Alternative splicing and the use of alternative promoters may generate thousands of transcript variants (PMID: 12036300, PMID: 11944992). NA
serine/arginine-rich splicing factor 11 9295 SRSF11 ENSG00000116754 This gene encodes a 54-kD nuclear protein that contains an arginine/serine-rich region similar to segments found in pre-mRNA splicing factors. Although the function of this protein is not yet known, structure and immunolocalization data suggest that it may play a role in pre-mRNA processing. Alternative splicing results in multiple transcript variants encoding different proteins. In addition, a pseudogene of this gene has been found on chromosome 12. NA
ankyrin repeat and sterile alpha motif domain containing 6 203286 ANKS6 ENSG00000165138 This gene encodes a protein containing multiple ankyrin repeats and a SAM domain. It is thought that this protein may localize to the proximal region of the primary cilium, and may play a role in renal and cardiovascular development. Mutations in this gene have been shown to cause a form of nephronophthisis (NPHP16), a chronic tubulo-interstitial nephritis. NA
leucine-rich repeats and calponin homology (CH) domain containing 1 23143 LRCH1 ENSG00000136141 This gene encodes a protein with a leucine-rich repeat and a calponin homology domain. Polymorphism in this gene may be associated with susceptibility to knee osteoarthritis. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
polymerase (DNA) beta 5423 POLB ENSG00000070501 The protein encoded by this gene is a DNA polymerase involved in base excision and repair, also called gap-filling DNA synthesis. The encoded protein, acting as a monomer, is normally found in the cytoplasm, but it translocates to the nucleus upon DNA damage. Several transcript variants of this gene exist, but the full-length nature of only one has been described to date. NA
SMG1 phosphatidylinositol 3-kinase-related kinase 23049 SMG1 ENSG00000157106 This gene encodes a protein involved in nonsense-mediated mRNA decay (NMD) as part of the mRNA surveillance complex. The protein has kinase activity and is thought to function in NMD by phosphorylating the regulator of nonsense transcripts 1 protein. Alternatively spliced transcript variants have been described, but their full-length nature has yet to be determined. NA
jumonji domain containing 1C 221037 JMJD1C ENSG00000171988 The protein encoded by this gene interacts with thyroid hormone receptors and contains a jumonji domain. It is a candidate histone demethylase and is thought to be a coactivator for key transcription factors. It plays a role in the DNA-damage response pathway by demethylating the mediator of DNA damage checkpoint 1 (MDC1) protein, and is required for the survival of acute myeloid leukemia. Mutations in this gene are associated with Rett syndrome and intellectual disability. Alternative splicing results in multiple transcript variants. NA
F-box protein 31 79791 FBXO31 ENSG00000103264 This gene is a member of the F-box family. Members are classified into three classes according to the substrate interaction domain, FBW for WD40 repeats, FBL for leucine-rich repeats, and FBXO for other domains. This protein, classified into the last category because of the lack of a recognizable substrate binding domain, has been proposed to be a component of the SCF ubiquitination complex. It is thought to bind and recruit substrate for ubiquitination and degradation. This protein may have a role in regulating the cell cycle as well as dendrite growth and neuronal migration. Alternative splicing results in multiple transcript variants. NA
ADAM metallopeptidase domain 22 53616 ADAM22 ENSG00000008277 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. Unlike other members of the ADAM protein family, the protein encoded by this gene lacks metalloprotease activity since it has no zinc-binding motif. This gene is highly expressed in the brain and may function as an integrin ligand in the brain. In mice, it has been shown to be essential for correct myelination in the peripheral nervous system. Alternative splicing results in several transcript variants. NA
serine/arginine-rich splicing factor 5 6430 SRSF5 ENSG00000100650 The protein encoded by this gene is a member of the serine/arginine (SR)-rich family of pre-mRNA splicing factors, which constitute part of the spliceosome. Each of these factors contains an RNA recognition motif (RRM) for binding RNA and an RS domain for binding other proteins. The RS domain is rich in serine and arginine residues and facilitates interaction between different SR splicing factors. In addition to being critical for mRNA splicing, the SR proteins have also been shown to be involved in mRNA export from the nucleus and in translation. Alternative splicing results in multiple transcript variants. NA
HECT domain E3 ubiquitin protein ligase 4 283450 HECTD4 ENSG00000173064 NA NA
Sad1 and UNC84 domain containing 1 23353 SUN1 ENSG00000164828 This gene is a member of the unc-84 homolog family and encodes a nuclear envelope protein with an Unc84 (SUN) domain. The protein is involved in nuclear anchorage and migration. Alternatively spliced transcript variants have been described. NA
ciliary rootlet coiled-coil, rootletin pseudogene 2 ENSG00000215908 CROCCP2 ENSG00000215908 NA NA
glucuronic acid epimerase 26035 GLCE ENSG00000138604 Heparan sulfate (HS) is a negatively charged cell surface polysaccharide required for the biologic activities of circulating extracellular ligands. GLCE is responsible for epimerization of D-glucuronic acid (GlcA) to L-iduronic acid (IdoA) of HS, which endows the nascent polysaccharide chain with the ability to bind growth factors and cytokines (Ghiselli and Agrawal, 2005 [PubMed 15853773]). NA
tripartite motif containing 9 114088 TRIM9 ENSG00000100505 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein localizes to cytoplasmic bodies. Its function has not been identified. Alternate splicing of this gene generates two transcript variants encoding different isoforms. NA
spectrin beta, non-erythrocytic 5 51332 SPTBN5 ENSG00000137877 NA NA
aarF domain containing kinase 3 56997 ADCK3 ENSG00000163050 This gene encodes a mitochondrial protein similar to yeast ABC1, which functions in an electron-transferring membrane protein complex in the respiratory chain. It is not related to the family of ABC transporter proteins. Expression of this gene is induced by the tumor suppressor p53 and in response to DNA damage, and inhibiting its expression partially suppresses p53-induced apoptosis. Alternatively spliced transcript variants have been found; however, their full-length nature has not been determined. NA
ataxin 2 like 11273 ATXN2L ENSG00000168488 This gene encodes an ataxin type 2 related protein of unknown function. This protein is a member of the spinocerebellar ataxia (SCAs) family, which is associated with a complex group of neurodegenerative disorders. Several alternatively spliced transcripts encoding different isoforms have been found for this gene. NA
STARD4 antisense RNA 1 100505678 STARD4-AS1 ENSG00000246859 NA NA
ubiquitin associated and SH3 domain containing B 84959 UBASH3B ENSG00000154127 This gene encodes a protein that contains a ubiquitin associated domain at the N-terminus, an SH3 domain, and a C-terminal domain with similarities to the catalytic motif of phosphoglycerate mutase. The encoded protein was found to inhibit endocytosis of epidermal growth factor receptor (EGFR) and platelet-derived growth factor receptor. NA
leukocyte receptor cluster (LRC) member 8 114823 LENG8 ENSG00000167615 NA NA
glycerol-3-phosphate acyltransferase 4 137964 GPAT4 ENSG00000158669 Lysophosphatidic acid acyltransferases (EC 2.3.1.51) catalyze the conversion of lysophosphatidic acid (LPA) to phosphatidic acid (PA). LPA and PA are involved in signal transduction and lipid biosynthesis. NA
atypical chemokine receptor 1 (Duffy blood group) 2532 ACKR1 ENSG00000213088 The protein encoded by this gene is a glycosylated membrane protein and a non-specific receptor for several chemokines. The encoded protein is the receptor for the human malarial parasites Plasmodium vivax and Plasmodium knowlesi. Polymorphisms in this gene are the basis of the Duffy blood group system. Two transcript variants encoding different isoforms have been found for this gene. NA
dehydrodolichyl diphosphate synthase subunit 79947 DHDDS ENSG00000117682 The protein encoded by this gene catalyzes cis-prenyl chain elongation to produce the polyprenyl backbone of dolichol, a glycosyl carrier lipid required for the biosynthesis of several classes of glycoproteins. Mutations in this gene are associated with retinitis pigmentosa type 59. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
MAX dimerization protein 4 10608 MXD4 ENSG00000123933 This gene is a member of the MAD gene family. The MAD genes encode basic helix-loop-helix-leucine zipper proteins that heterodimerize with MAX protein, forming a transcriptional repression complex. The MAD proteins compete for MAX binding with MYC, which heterodimerizes with MAX forming a transcriptional activation complex. Studies in rodents suggest that the MAD genes are tumor suppressors and contribute to the regulation of cell growth in differentiating tissues. NA
semaphorin 6C 10500 SEMA6C ENSG00000143434 This gene encodes a member of the semaphorin family. Semaphorins represent important molecular signals controlling multiple aspects of the cellular response that follows CNS injury, and thus may play an important role in neural regeneration. NA
nuclear factor I X 4784 NFIX ENSG00000008441 The protein encoded by this gene is a transcription factor that binds the palindromic sequence 5’-TTGGCNNNNNGCCAA-3’ in viral and cellular promoters. The encoded protein can also stimulate adenovirus replication in vitro. Three transcript variants encoding different isoforms have been found for this gene. NA
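# Save the Ensembl IDs queried for brain cluster 3, one line per annotation row
# (an ID that matched several records therefore appears more than once).
write.table(as.factor(out$query), paste0("../utilities/gene_names_brain_clus_",3,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);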
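
Several Ensembl IDs in the table above return more than one annotation row (for example, ENSG00000183426 matches six NPIP family records), so out$query contains repeats and the written file lists those IDs more than once. If a downstream tool expects each gene exactly once, one option is to deduplicate before writing. The following is a minimal sketch under stated assumptions: out_demo is a hypothetical toy data frame standing in for the real out, and a temporary file stands in for the real output path.

```r
# Hypothetical stand-in for the data frame returned by mygene::queryMany();
# the rows are copied from the cluster 3 table above.
out_demo <- data.frame(
  query  = c("ENSG00000183426", "ENSG00000183426", "ENSG00000183889",
             "ENSG00000183889", "ENSG00000110076"),
  symbol = c("NPIPA1", "NPIPA2", "LOC102724993", "NPIPA5", "NRXN2"),
  stringsAsFactors = FALSE
)

# Queries that matched more than one annotation record
multi_hit <- unique(out_demo$query[duplicated(out_demo$query)])

# One entry per queried gene, in order of first appearance
query_ids <- unique(out_demo$query)

# Written to a temporary file here; the path is illustrative only
out_file <- tempfile(fileext = ".txt")
write.table(query_ids, out_file, col.names = FALSE,
            row.names = FALSE, quote = FALSE)
```

Inspecting multi_hit first shows which queries are ambiguous, which may matter when reading the annotations, since rows sharing a query ID describe different genes.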

Cluster 4 Annotations

# Annotate the top driving genes of brain cluster 4 through MyGene.info
out <- mygene::queryMany(gene_list_brain[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id query symbol name notfound
This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. 84152 ENSG00000131771 PPP1R1B protein phosphatase 1 regulatory inhibitor subunit 1B NA
NA 2788 ENSG00000176533 GNG7 G protein subunit gamma 7 NA
NA 5121 ENSG00000183036 PCP4 Purkinje cell protein 4 NA
This gene encodes a member of the regulator of G-protein signaling family. This protein contains one RGS domain, two Raf-like Ras-binding domains (RBDs), and one GoLoco domain. The protein attenuates the signaling activity of G-proteins by binding, through its GoLoco domain, to specific types of activated, GTP-bound G alpha subunits. Acting as a GTPase activating protein (GAP), the protein increases the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. Alternate transcriptional splice variants of this gene have been observed but have not been thoroughly characterized. 10636 ENSG00000169220 RGS14 regulator of G-protein signaling 14 NA
This gene encodes a leucine-rich cytoplasmic protein, which is highly similar to a mouse protein that negatively regulates Ca/calmodulin-dependent protein kinase II phosphorylation and may be essential for spatial learning processes. Several alternatively spliced transcript variants of this gene have been described. 23154 ENSG00000020129 NCDN neurochondrin NA
The protein encoded by this gene belongs to the cyclic nucleotide phosphodiesterase (PDE) family, and PDE1 subfamily. Members of the PDE1 family are calmodulin-dependent PDEs that are stimulated by a calcium-calmodulin complex. This PDE has dual-specificity for the second messengers, cAMP and cGMP, with a preference for cGMP as a substrate. cAMP and cGMP function as key regulators of many important physiological processes. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 5153 ENSG00000123360 PDE1B phosphodiesterase 1B NA
This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. 5909 ENSG00000076864 RAP1GAP RAP1 GTPase activating protein NA
The protein encoded by this gene is highly conserved in human, mouse, and chicken, showing 94% and 79% amino acid identity of human to mouse and chicken sequences, respectively. Hybridization to this gene was detected in spindle-shaped cells located along nerve fibers between the auditory ganglion and sensory epithelium. These cells accompany neurites at the habenula perforata, the opening through which neurites extend to innervate hair cells. This and the pattern of expression of this gene in chicken inner ear paralleled the histologic findings of acidophilic deposits, consistent with mucopolysaccharide ground substance, in temporal bones from DFNA9 (autosomal dominant nonsyndromic sensorineural deafness 9) patients. Mutations that cause DFNA9 have been reported in this gene. Alternative splicing results in multiple transcript variants encoding the same protein. Additional splice variants encoding distinct isoforms have been described but their biological validities have not been demonstrated. 1690 ENSG00000100473 COCH cochlin NA
This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. 30819 ENSG00000120049 KCNIP2 potassium voltage-gated channel interacting protein 2 NA
This gene encodes a member of the membrane-bound adenylyl cyclase enzymes. Adenylyl cyclases mediate G protein-coupled receptor signaling through the synthesis of the second messenger cAMP. Activity of the encoded protein is stimulated by the Gs alpha subunit of G protein-coupled receptors and is inhibited by protein kinase A, calcium and Gi alpha subunits. Single nucleotide polymorphisms in this gene may be associated with low birth weight and type 2 diabetes. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. 111 ENSG00000173175 ADCY5 adenylate cyclase 5 NA
The protein encoded by this gene is a proteolipid that may be involved in the regulation of ion channels during brain development. The encoded protein may also play a role in forming and maintaining the structure of the nervous system. This gene is found within an intron of the BLCAP gene, but on the opposite strand. This gene is imprinted and is expressed only from the paternal allele. Two transcript variants encoding two different isoforms have been found for this gene. 4826 ENSG00000053438 NNAT neuronatin NA
The 19-kD cAMP-regulated phosphoprotein plays a role in regulating mitosis by inhibiting protein phosphatase-2A (PP2A; see MIM 176915) (summary by Gharbi-Ayachi et al., 2010 [PubMed 21164014]). 10776 ENSG00000128989 ARPP19 cAMP regulated phosphoprotein 19kDa NA
Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 146330 ENSG00000127585 FBXL16 F-box and leucine-rich repeat protein 16 NA
NA 5502 ENSG00000135447 PPP1R1A protein phosphatase 1 regulatory inhibitor subunit 1A NA
NA 54976 ENSG00000101220 C20orf27 chromosome 20 open reading frame 27 NA
NA 654790 ENSG00000248485 PCP4L1 Purkinje cell protein 4 like 1 NA
NA 100506071 ENSG00000258525 LOC100506071 uncharacterized LOC100506071 NA
This gene encodes four products of the tachykinin peptide hormone family, substance P and neurokinin A, as well as the related peptides, neuropeptide K and neuropeptide gamma. These hormones are thought to function as neurotransmitters which interact with nerve receptors and smooth muscle cells. They are known to induce behavioral responses and function as vasodilators and secretagogues. Substance P is an antimicrobial peptide with antibacterial and antifungal properties. Multiple transcript variants encoding different isoforms have been found for this gene. 6863 ENSG00000006128 TAC1 tachykinin precursor 1 NA
Potassium channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. Four sequence-related potassium channel genes - shaker, shaw, shab, and shal - have been identified in Drosophila, and each has been shown to have human homolog(s). This gene encodes a member of the potassium channel, voltage-gated, shaker-related subfamily. This member includes distinct isoforms which are encoded by alternatively spliced transcript variants of this gene. Some of these isoforms are beta subunits, which form heteromultimeric complexes with alpha subunits and modulate the activity of the pore-forming alpha subunits. 7881 ENSG00000169282 KCNAB1 potassium voltage-gated channel subfamily A member regulatory beta subunit 1 NA
Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. This gene product is a type I membrane protein that is highly expressed in normal tissues, such as kidney, colon and pancreas, and has been found to be overexpressed in 10% of clear cell renal carcinomas. Three transcript variants encoding different isoforms have been identified for this gene. 771 ENSG00000074410 CA12 carbonic anhydrase 12 NA
LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. 4023 ENSG00000175445 LPL lipoprotein lipase NA
This gene encodes a member of the RGS family of GTPase activating proteins that function in various signaling pathways by accelerating the deactivation of G proteins. This protein is anchored to photoreceptor membranes in retinal cells and deactivates G proteins in the rod and cone phototransduction cascades. Mutations in this gene result in bradyopsia. Multiple transcript variants encoding different isoforms have been found for this gene. 8787 ENSG00000108370 RGS9 regulator of G-protein signaling 9 NA
NA 79085 ENSG00000125648 SLC25A23 solute carrier family 25 member 23 NA
NA 404217 ENSG00000178531 CTXN1 cortexin 1 NA
NA 5530 ENSG00000138814 PPP3CA protein phosphatase 3 catalytic subunit alpha NA
The protein encoded by this gene resides in the endoplasmic reticulum, and is involved in the maturation and transport of lipoprotein lipase through the secretory pathway. Mutations in this gene are associated with combined lipase deficiency. Alternatively spliced transcript variants have been found for this gene. 64788 ENSG00000260807 LMF1 lipase maturation factor 1 NA
This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. 27122 ENSG00000050165 DKK3 dickkopf WNT signaling pathway inhibitor 3 NA
The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. The protein is a cis-trans prolyl isomerase that binds the immunosuppressants FK506 and rapamycin. It interacts with several intracellular signal transduction proteins including type I TGF-beta receptor. It also interacts with multiple intracellular calcium release channels, and coordinates multi-protein complex formation of the tetrameric skeletal muscle ryanodine receptor. In mouse, deletion of this homologous gene causes congenital heart disorder known as noncompaction of left ventricular myocardium. Multiple alternatively spliced variants, encoding the same protein, have been identified. The human genome contains five pseudogenes related to this gene, at least one of which is transcribed. 2280 ENSG00000088832 FKBP1A FK506 binding protein 1A NA
NA 57464 ENSG00000128578 STRIP2 striatin interacting protein 2 NA
A reciprocal translocation between chromosomes 22 and 9 produces the Philadelphia chromosome, which is often found in patients with chronic myelogenous leukemia. The chromosome 22 breakpoint for this translocation is located within the BCR gene. The translocation produces a fusion protein which is encoded by sequence from both BCR and ABL, the gene at the chromosome 9 breakpoint. Although the BCR-ABL fusion protein has been extensively studied, the function of the normal BCR gene product is not clear. The protein has serine/threonine kinase activity and is a GTPase-activating protein for p21rac. Two transcript variants encoding different isoforms have been found for this gene. 613 ENSG00000186716 BCR breakpoint cluster region NA
This gene encodes a transmembrane protein that contains a Smad interacting motif (SIM). Expression of this gene is induced by androgens and transforming growth factor beta, and the encoded protein suppresses the androgen receptor and transforming growth factor beta signaling pathways through interactions with Smad proteins. Overexpression of this gene may play a role in multiple types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 56937 ENSG00000124225 PMEPA1 prostate transmembrane protein, androgen induced 1 NA
Constitutive activation of the Ras pathway triggers an irreversible proliferation arrest reminiscent of replicative senescence. Transcription of this gene is upregulated in response to activation of the Ras pathway, but not under other conditions that induce senescence. The encoded protein is similar to a rat cell surface receptor proposed to function in a neuronal survival pathway. An allelic polymorphism in this gene results in both functional and non-functional (frameshifted) alleles; the reference genome represents the functional allele. 25907 ENSG00000249992 TMEM158 transmembrane protein 158 (gene/pseudogene) NA
NA 439921 ENSG00000182534 MXRA7 matrix remodelling associated 7 NA
This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. 4311 ENSG00000196549 MME membrane metallo-endopeptidase NA
The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 1. Alternatively spliced transcript variants encoding different isoforms have been identified. 490 ENSG00000070961 ATP2B1 ATPase plasma membrane Ca2+ transporting 1 NA
Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene is one of several genes that encode a calcium-dependent protein containing two N-terminal type II C2 domains and an integrin A domain-like sequence in the C-terminus. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. More variants may exist, but their full-length natures could not be determined. 57699 ENSG00000124772 CPNE5 copine 5 NA
This is a paternally expressed imprinted gene that is thought to have been derived from the Ty3/Gypsy family of retrotransposons. It contains two overlapping open reading frames, RF1 and RF2, and expresses two proteins: a shorter, gag-like protein (with a CCHC-type zinc finger domain) from RF1; and a longer, gag/pol-like fusion protein (with an additional aspartic protease motif) from RF1/RF2 by -1 translational frameshifting (-1 FS). While -1 FS has been observed in RNA viruses and transposons in both prokaryotes and eukaryotes, this gene represents the first example of -1 FS in a eukaryotic cellular gene. This gene is highly conserved across mammalian species and retains the heptanucleotide (GGGAAAC) and pseudoknot elements required for -1 FS. It is expressed in adult and embryonic tissues (most notably in placenta) and reported to have a role in cell proliferation, differentiation and apoptosis. Overexpression of this gene has been associated with several malignancies, such as hepatocellular carcinoma and B-cell lymphocytic leukemia. Knockout mice lacking this gene showed early embryonic lethality with placental defects, indicating the importance of this gene in embryonic development. Additional isoforms resulting from alternatively spliced transcript variants, and use of upstream non-AUG (CUG) start codon have been reported for this gene. 23089 ENSG00000242265 PEG10 paternally expressed 10 NA
This gene encodes a protein that interacts with the low density lipoprotein (LDL) receptor-related protein and facilitates its proper folding and localization by preventing the binding of ligands. Mutations in this gene have been identified in individuals with myopia 23. Alternative splicing results in multiple transcript variants. 4043 ENSG00000163956 LRPAP1 LDL receptor related protein associated protein 1 NA
This protein is expressed by in vitro differentiated macrophages but not freshly isolated monocytes. Although sequence analysis identifies seven potential transmembrane domains, this protein has little homology to G-protein receptors and it has not been positively identified as a receptor. A suggested alternative function is that of an ion channel protein in maturing macrophages. 23531 ENSG00000108960 MMD monocyte to macrophage differentiation associated NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. 88 ENSG00000077522 ACTN2 actinin alpha 2 NA
NA 79734 ENSG00000100379 KCTD17 potassium channel tetramerization domain containing 17 NA
This gene encodes an import receptor of the outer mitochondrial membrane that is part of the translocase of the outer membrane complex. This protein is involved in the import of mitochondrial precursor proteins. 9868 ENSG00000154174 TOMM70 translocase of outer mitochondrial membrane 70 NA
This gene encodes a member of the MAP kinase family. MAP kinases, also known as extracellular signal-regulated kinases (ERKs), act as an integration point for multiple biochemical signals, and are involved in a wide variety of cellular processes such as proliferation, differentiation, transcription regulation and development. The activation of this kinase requires its phosphorylation by upstream kinases. Upon activation, this kinase translocates to the nucleus of the stimulated cells, where it phosphorylates nuclear targets. One study also suggests that this protein acts as a transcriptional repressor independent of its kinase activity. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Two alternatively spliced transcript variants encoding the same protein, but differing in the UTRs, have been reported for this gene. 5594 ENSG00000100030 MAPK1 mitogen-activated protein kinase 1 NA
This gene encodes a protein that reduces cell growth by stimulating apoptosis. Alternative splicing and the use of alternative promoters result in multiple transcript variants encoding the same protein. This gene is imprinted in brain where different transcript variants are expressed from each parental allele. Transcript variants initiating from the upstream promoter are expressed preferentially from the maternal allele, while transcript variants initiating downstream of the interspersed NNAT gene (GeneID:4826) are expressed from the paternal allele. Transcripts at this locus may also undergo A to I editing, resulting in amino acid changes at three positions in the N-terminus of the protein. 10904 ENSG00000166619 BLCAP bladder cancer associated protein NA
This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. 5662 ENSG00000059915 PSD pleckstrin and Sec7 domain containing NA
The protein encoded by this gene is a nuclear hormone receptor for triiodothyronine. It is one of the several receptors for thyroid hormone, and has been shown to mediate the biological activities of thyroid hormone. Knockout studies in mice suggest that the different receptors, while having certain extent of redundancy, may mediate different functions of thyroid hormone. Alternatively spliced transcript variants encoding distinct isoforms have been reported. 7067 ENSG00000126351 THRA thyroid hormone receptor, alpha NA
This gene belongs to the cadherin superfamily of calcium-dependent cell adhesion molecules. The encoded protein is a photoreceptor-specific cadherin that plays a role in outer segment disc morphogenesis. Mutations in this gene are associated with inherited retinal dystrophies. Alternatively spliced transcript variants encoding different isoforms have been identified. 92211 ENSG00000148600 CDHR1 cadherin related family member 1 NA
The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. 1152 ENSG00000166165 CKB creatine kinase B NA
The protein encoded by this gene is a brain-enriched nucleotide exchange factor that contains an N-terminal GEF domain, 2 tandem repeats of EF-hand calcium-binding motifs, and a C-terminal diacylglycerol/phorbol ester-binding domain. This protein can activate small GTPases, including RAS and RAP1/RAS3. The nucleotide exchange activity of this protein can be stimulated by calcium and diacylglycerol. Four alternatively spliced transcript variants encoding two different isoforms have been found for this gene. 10235 ENSG00000068831 RASGRP2 RAS guanyl releasing protein 2 NA
This gene encodes a member of the sestrin family of stress-induced proteins. The encoded protein reduces the levels of intracellular reactive oxygen species induced by activated Ras downstream of RAC-alpha serine/threonine-protein kinase (Akt) and FoxO transcription factor. The protein is required for normal regulation of blood glucose, insulin resistance and plays a role in lipid storage in obesity. Alternative splicing results in multiple transcript variants. 143686 ENSG00000149212 SESN3 sestrin 3 NA
NA 9747 ENSG00000198420 TCAF1 TRPM8 channel associated factor 1 NA
The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein binds to the RII-beta regulatory subunit of PKA, and also to protein kinase C and the phosphatase calcineurin. It is predominantly expressed in cerebral cortex and may anchor the PKA protein at postsynaptic densities (PSD) and be involved in the regulation of postsynaptic events. It is also expressed in T lymphocytes and may function to inhibit interleukin-2 transcription by disrupting calcineurin-dependent dephosphorylation of NFAT. 9495 ENSG00000179841 AKAP5 A-kinase anchoring protein 5 NA
This gene encodes a member of a family of proteins containing an N-terminal pleckstrin homology domain and a highly conserved C-terminal oxysterol-binding protein-like sterol-binding domain. It binds multiple lipid-containing molecules, including phosphatidylserine, phosphatidylinositol 4-phosphate (PI4P) and oxysterol, and promotes their exchange between the endoplasmic reticulum and the plasma membrane. Alternative splicing results in multiple transcript variants. 114882 ENSG00000091039 OSBPL8 oxysterol binding protein like 8 NA
The protein encoded by this gene has similarity to a yeast protein which suggests a role of the gene product in regulating secretory vesicles. 10966 ENSG00000141542 RAB40B RAB40B, member RAS oncogene family NA
This gene encodes a protein subunit that regulates the activity of the serine/threonine phosphatase, protein phosphatase-1. The encoded protein is required for completion of the mitotic cycle and for targeting protein phosphatase-1 to mitotic kinetochores. Alternate splicing results in multiple transcript variants. 5510 ENSG00000115685 PPP1R7 protein phosphatase 1 regulatory subunit 7 NA
The protein encoded by this gene belongs to the cdc2/cdkx subfamily of the ser/thr family of protein kinases. It has similarity to a rat protein that is thought to play a role in terminally differentiated neurons. Alternatively spliced transcript variants encoding different isoforms have been found. 5128 ENSG00000059758 CDK17 cyclin-dependent kinase 17 NA
This gene is a member of the dedicator of cytokinesis (DOCK) family and encodes a protein with a DHR-1 (CZH-1) domain, a DHR-2 (CZH-2) domain and an SH3 domain. This membrane-associated, cytoplasmic protein functions as a guanine nucleotide exchange factor and is involved in regulation of adherens junctions between cells. Mutations in this gene have been associated with ovarian, prostate, glioma, and colorectal cancers. Alternatively spliced variants which encode different protein isoforms have been described, but only one has been fully characterized. 9732 ENSG00000128512 DOCK4 dedicator of cytokinesis 4 NA
This gene encodes a receptor for gamma-aminobutyric acid (GABA), which is the main inhibitory neurotransmitter in the mammalian central nervous system. This receptor functions as a heterodimer with GABA(B) receptor 2. Defects in this gene may underlie brain disorders such as schizophrenia and epilepsy. Alternative splicing generates multiple transcript variants, but the full-length nature of some of these variants has not been determined. 2550 ENSG00000204681 GABBR1 gamma-aminobutyric acid type B receptor subunit 1 NA
Crystallins are separated into two classes: taxon-specific and ubiquitous. The former class is also called phylogenetically-restricted crystallins. The latter class constitutes the major proteins of vertebrate eye lens and maintains the transparency and refractive index of the lens. This gene encodes a taxon-specific crystallin protein that binds NADPH and has sequence similarity to bacterial ornithine cyclodeaminases. The encoded protein does not perform a structural role in lens tissue, and instead it binds thyroid hormone for possible regulatory or developmental roles. Mutations in this gene have been associated with autosomal dominant non-syndromic deafness. 1428 ENSG00000103316 CRYM crystallin mu NA
NA 26037 ENSG00000197555 SIPA1L1 signal induced proliferation associated 1 like 1 NA
The protein encoded by this gene was initially identified as a molecule linking syndecan-mediated signaling to the cytoskeleton. The syntenin protein contains tandemly repeated PDZ domains that bind the cytoplasmic, C-terminal domains of a variety of transmembrane proteins. This protein may also affect cytoskeletal-membrane organization, cell adhesion, protein trafficking, and the activation of transcription factors. The protein is primarily localized to membrane-associated adherens junctions and focal adhesions but is also found at the endoplasmic reticulum and nucleus. Alternative splicing results in multiple transcript variants encoding different isoforms. 6386 ENSG00000137575 SDCBP syndecan binding protein NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. 87 ENSG00000072110 ACTN1 actinin alpha 1 NA
The protein encoded by this gene is important in purine metabolism by converting AMP to IMP. The encoded protein, which acts as a homotetramer, is one of three AMP deaminases found in mammals. Several transcript variants encoding different isoforms have been found for this gene. 271 ENSG00000116337 AMPD2 adenosine monophosphate deaminase 2 NA
NA ENSG00000260244 ENSG00000260244 RP11-588K22.2 NA NA
NA 54055 ENSG00000228314 CYP4F29P cytochrome P450 family 4 subfamily F member 29, pseudogene NA
The protein encoded by this gene contains six PDZ domains and shares sequence similarity with pro-interleukin-16 (pro-IL-16). Like pro-IL-16, the encoded protein localizes to the endoplasmic reticulum and is thought to be cleaved by a caspase to produce a secreted peptide containing two PDZ domains. In addition, this gene is upregulated in primary prostate tumors and may be involved in the early stages of prostate tumorigenesis. 23037 ENSG00000133401 PDZD2 PDZ domain containing 2 NA
NA 55188 ENSG00000111785 RIC8B RIC8 guanine nucleotide exchange factor B NA
NA 114900 ENSG00000172247 C1QTNF4 C1q and tumor necrosis factor related protein 4 NA
This gene encodes a member of the serine/threonine kinase family that contains two PAS domains. Expression of this gene is regulated by glucose, and the encoded protein plays a role in the regulation of insulin gene expression. Downregulation of this gene may play a role in type 2 diabetes. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 23178 ENSG00000115687 PASK PAS domain containing serine/threonine kinase NA
NA 440145 ENSG00000204899 MZT1 mitotic spindle organizing protein 1 NA
This gene encodes a member of the Kruppel-like factor subfamily of zinc finger proteins. The encoded protein is a transcriptional activator that binds directly to a specific recognition motif in the promoters of target genes. This protein acts downstream of multiple different signaling pathways and is regulated by post-translational modification. It may participate in both promoting and suppressing cell proliferation. Expression of this gene may be changed in a variety of different cancers and in cardiovascular disease. Alternative splicing results in multiple transcript variants. 688 ENSG00000102554 KLF5 Kruppel-like factor 5 (intestinal) NA
This gene encodes a class I mammalian Golgi 1,2-mannosidase which is a type II transmembrane protein. This protein catalyzes the hydrolysis of three terminal mannose residues from peptide-bound Man(9)-GlcNAc(2) oligosaccharides and belongs to family 47 of glycosyl hydrolases. 4121 ENSG00000111885 MAN1A1 mannosidase alpha class 1A member 1 NA
This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known. 4008 ENSG00000136153 LMO7 LIM domain 7 NA
NA 221061 ENSG00000148468 FAM171A1 family with sequence similarity 171 member A1 NA
NA 83855 ENSG00000129911 KLF16 Kruppel-like factor 16 NA
This gene encodes a leucine-rich transmembrane glycoprotein that may be involved in cell adhesion. The encoded protein is an oncofetal antigen that is specific to trophoblast cells. In adults this protein is highly expressed in many tumor cells and is associated with poor clinical outcome in numerous cancers. Alternate splicing in the 5’ UTR results in multiple transcript variants that encode the same protein. 7162 ENSG00000146242 TPBG trophoblast glycoprotein NA
The protein encoded by this gene is involved in the import of precursor proteins into mitochondria. The encoded protein has a chaperone-like activity, binding the mature portion of unfolded proteins and aiding their import into mitochondria. This protein, which is found in the cytoplasm and sometimes associated with the outer mitochondrial membrane, has a weak ATPase activity and contains 6 TPR repeats. 10953 ENSG00000025772 TOMM34 translocase of outer mitochondrial membrane 34 NA
NA ENSG00000272379 ENSG00000272379 RP1-257A7.5 NA NA
NA 23392 ENSG00000136813 KIAA0368 KIAA0368 NA
The protein encoded by this gene is a beta-1,3-glucosyltransferase that transfers glucose to O-linked fucosylglycans on thrombospondin type-1 repeats (TSRs) of several proteins. The encoded protein is a type II membrane protein. Defects in this gene are a cause of Peters-plus syndrome (PPS). 145173 ENSG00000187676 B3GLCT beta 3-glucosyltransferase NA
This gene encodes a small GTP-binding protein of the RAS superfamily which functions as an ADP-ribosylation factor (ARF). The encoded protein is one of a functionally distinct group of ARF-like genes. 402 ENSG00000213465 ARL2 ADP ribosylation factor like GTPase 2 NA
NA 8408 ENSG00000177169 ULK1 unc-51 like autophagy activating kinase 1 NA
AMP-activated protein kinase (AMPK) is a heterotrimeric protein composed of a catalytic alpha subunit, a noncatalytic beta subunit, and a noncatalytic regulatory gamma subunit. Various forms of each of these subunits exist, encoded by different genes. AMPK is an important energy-sensing enzyme that monitors cellular energy status and functions by inactivating key enzymes involved in regulating de novo biosynthesis of fatty acid and cholesterol. This gene is a member of the AMPK gamma subunit family. Mutations in this gene have been associated with Wolff-Parkinson-White syndrome, familial hypertrophic cardiomyopathy, and glycogen storage disease of the heart. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 51422 ENSG00000106617 PRKAG2 protein kinase AMP-activated non-catalytic subunit gamma 2 NA
NA ENSG00000226009 ENSG00000226009 KCNIP2-AS1 KCNIP2 antisense RNA 1 NA
This gene encodes a member of the diacylglycerol kinase (DGK) enzyme family. Members of this family are involved in regulating intracellular concentrations of diacylglycerol and phosphatidic acid. Variation in this gene has been associated with bipolar disorder. Alternatively spliced transcript variants have been identified. 160851 ENSG00000102780 DGKH diacylglycerol kinase eta NA
NA 399665 ENSG00000167106 FAM102A family with sequence similarity 102 member A NA
The protein encoded by this gene is a member of the cAMP-dependent protein kinase (PKA) inhibitor family. This protein was demonstrated to interact with and inhibit the activities of both C alpha and C beta catalytic subunits of the PKA. Alternatively spliced transcript variants encoding the same protein have been reported. 5569 ENSG00000171033 PKIA protein kinase (cAMP-dependent, catalytic) inhibitor alpha NA
NA 170463 ENSG00000130511 SSBP4 single stranded DNA binding protein 4 NA
NA 23640 ENSG00000133265 HSPBP1 HSPA (heat shock 70kDa) binding protein, cytoplasmic cochaperone 1 NA
NA 83857 ENSG00000133687 TMTC1 transmembrane and tetratricopeptide repeat containing 1 NA
This gene encodes a member of a family of proteins that contain coiled-coil domains and may form hetero- or homomers. The encoded protein is involved in cell proliferation and calcium signaling. It also interacts with the mitogen-activated protein kinase kinase kinase 5 (MAP3K5/ASK1) and positively regulates MAP3K5-induced apoptosis. Multiple alternatively spliced transcript variants have been observed. 7164 ENSG00000111907 TPD52L1 tumor protein D52-like 1 NA
This gene encodes one of two enzymes which catalyzes the final reaction in the synthesis of triglycerides in which diacylglycerol is covalently bound to long chain fatty acyl-CoAs. The encoded protein catalyzes this reaction at low concentrations of magnesium chloride while the other enzyme has high activity at high concentrations of magnesium chloride. Multiple transcript variants encoding different isoforms have been found for this gene. 84649 ENSG00000062282 DGAT2 diacylglycerol O-acyltransferase 2 NA
NA NA ENSG00000229164 NA NA TRUE
NA 9796 ENSG00000168490 PHYHIP phytanoyl-CoA 2-hydroxylase interacting protein NA
NA 51108 ENSG00000197006 METTL9 methyltransferase like 9 NA
NA ENSG00000272678 ENSG00000272678 RP11-797D24.4 NA NA
Interleukin 17A (IL17A) is a proinflammatory cytokine secreted by activated T-lymphocytes. It is a potent inducer of the maturation of CD34-positive hematopoietic precursors into neutrophils. The transmembrane protein encoded by this gene (interleukin 17A receptor; IL17RA) is a ubiquitous type I membrane glycoprotein that binds with low affinity to interleukin 17A. Interleukin 17A and its receptor play a pathogenic role in many inflammatory and autoimmune diseases such as rheumatoid arthritis. Like other cytokine receptors, this receptor likely has a multimeric structure. Alternative splicing results in multiple transcript variants encoding different isoforms. 23765 ENSG00000177663 IL17RA interleukin 17 receptor A NA
Chondroitin sulfate (CS) is a glycosaminoglycan which is an important structural component of the extracellular matrix and which links to proteins to form proteoglycans. Chondroitin sulfate E (CS-E) is an isomer of chondroitin sulfate in which the C-4 and C-6 hydroxyl groups are sulfated. This gene encodes a type II transmembrane glycoprotein that acts as a sulfotransferase to transfer sulfate to the C-6 hydroxyl group of chondroitin sulfate. This gene has also been identified as being co-expressed with RAG1 in B-cells and as potentially acting as a B-cell surface signaling receptor. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 51363 ENSG00000182022 CHST15 carbohydrate (N-acetylgalactosamine 4-sulfate 6-O) sulfotransferase 15 NA
This gene encodes a member of the beta-1,3-N-acetylglucosaminyltransferase family. This enzyme is a type II transmembrane protein. It prefers the substrate of lacto-N-neotetraose, and is involved in the biosynthesis of poly-N-acetyllactosamine chains. Two transcript variants encoding the same protein have been found for this gene. 10678 ENSG00000170340 B3GNT2 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2 NA
This gene encodes a predicted transmembrane protein containing two extracellular CUB domains followed by a low-density lipoprotein class A (LDLa) domain. A similar gene in rats encodes a protein that modulates glutamate signaling in the brain by regulating kainate receptor function. Expression of this gene may be a biomarker for proliferating infantile hemangiomas. A pseudogene of this gene is located on the long arm of chromosome 8. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 81831 ENSG00000171208 NETO2 neuropilin and tolloid like 2 NA
write.table(as.factor(out$query), paste0("../utilities/gene_names_brain_clus_",4,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
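
Some queries in the table above come back without an annotation (ENSG00000229164, for example, has notfound set to TRUE), and many matched genes have no summary. The following is a minimal sketch for listing those cases before the cluster file is written. It assumes `out` carries the `query`, `summary`, and `notfound` columns shown in the tables. The helper name `summarize_annotation_gaps` and the toy data frame are hypothetical and only illustrate the idea; they are not part of the original analysis.

```r
# Hypothetical helper (not part of the original script): report queries that
# could not be matched and matched genes that lack a summary.
# Assumes the column names shown in the annotation tables.
summarize_annotation_gaps <- function(out) {
  out <- as.data.frame(out)
  # If every query was matched, the notfound column may be absent (assumption)
  if (is.null(out$notfound)) out$notfound <- NA
  # notfound is NA for matched queries and TRUE for unmatched ones
  unmatched <- out$query[!is.na(out$notfound) & out$notfound]
  matched <- is.na(out$notfound) | !out$notfound
  no_summary <- out$query[matched & is.na(out$summary)]
  list(unmatched = unmatched, no_summary = no_summary)
}

# Toy stand-in for `out`, built from three rows of the cluster 4 table
toy_out <- data.frame(
  query    = c("ENSG00000229164", "ENSG00000168490", "ENSG00000115687"),
  symbol   = c(NA, "PHYHIP", "PASK"),
  summary  = c(NA, NA, "(summary text omitted in this toy example)"),
  notfound = c(TRUE, NA, NA),
  stringsAsFactors = FALSE
)

gaps <- summarize_annotation_gaps(toy_out)
print(gaps$unmatched)   # queries with no match
print(gaps$no_summary)  # matched genes without a summary
```

Applied to the real `out`, the two vectors give a quick count of how many of the top genes in a cluster can actually contribute to the biological interpretation.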

Cluster 5 Annotations

out <- mygene::queryMany(gene_list_brain[5,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id symbol name query notfound
The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 MBP myelin basic protein ENSG00000197971 NA
NA ENSG00000266844 RP11-862L9.3 NA ENSG00000266844 NA
This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. 7018 TF transferrin ENSG00000091513 NA
NA 79957 PAQR6 progestin and adipoQ receptor family member 6 ENSG00000160781 NA
NA 222166 MTURN maturin, neural progenitor differentiation regulator homolog (Xenopus) ENSG00000180354 NA
NA 58476 TP53INP2 tumor protein p53 inducible nuclear protein 2 ENSG00000078804 NA
NA 56650 CLDND1 claudin domain containing 1 ENSG00000080822 NA
This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. 6319 SCD stearoyl-CoA desaturase (delta-9-desaturase) ENSG00000099194 NA
This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 2670 GFAP glial fibrillary acidic protein ENSG00000131095 NA
The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. 20 ABCA2 ATP binding cassette subfamily A member 2 ENSG00000107331 NA
NA 5129 CDK18 cyclin-dependent kinase 18 ENSG00000117266 NA
CARNS1 (EC 6.3.2.11), a member of the ATP-grasp family of ATPases, catalyzes the formation of carnosine (beta-alanyl-L-histidine) and homocarnosine (gamma-aminobutyryl-L-histidine), which are found mainly in skeletal muscle and the central nervous system, respectively (Drozak et al., 2010 [PubMed 20097752]). 57571 CARNS1 carnosine synthase 1 ENSG00000172508 NA
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. 6285 S100B S100 calcium binding protein B ENSG00000160307 NA
This gene encodes a protein that contains a Ras association domain. Similar to its cattle and sheep counterparts, this gene is located near the prion gene. Two alternatively spliced transcripts encoding the same isoform have been reported. 9770 RASSF2 Ras association domain family member 2 ENSG00000101265 NA
Armadillo-like proteins are characterized by a series of armadillo repeats, first defined in the Drosophila ‘armadillo’ gene product, that are typically 42 to 45 amino acids in length. These proteins can be divided into subfamilies based on their number of repeats, their overall sequence similarity, and the dispersion of the repeats throughout their sequences. Members of the p120(ctn)/plakophilin subfamily of Armadillo-like proteins include CTNND1, CTNND2, PKP1, PKP2, PKP4, and ARVCF. PKP4 may be a component of desmosomal plaque and other adhesion plaques and is thought to be involved in regulating junctional plaque organization and cadherin function. Multiple transcript variants encoding different isoforms have been found for this gene. 8502 PKP4 plakophilin 4 ENSG00000144283 NA
This gene encodes an actin-binding protein that plays a role in cell growth and migration, and in cytokinesis. The encoded protein is thought to regulate actin cytoskeletal dynamics in podocytes, components of the glomerulus. Mutations in this gene are associated with focal segmental glomerulosclerosis 8. Alternative splicing results in multiple transcript variants encoding different isoforms. 54443 ANLN anillin actin binding protein ENSG00000011426 NA
This gene is a member of the septin family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse, and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. This gene is highly expressed in brain and heart. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. One of the isoforms (known as ARTS) is distinct; it is localized to the mitochondria, and has a role in apoptosis and cancer. 5414 SEPT4 septin 4 ENSG00000108387 NA
The protein encoded by this gene is a major non-neuronal microtubule-associated protein. This protein contains a domain similar to the microtubule-binding domains of neuronal microtubule-associated protein (MAP2) and microtubule-associated protein tau (MAPT/TAU). This protein promotes microtubule assembly, and has been shown to counteract destabilization of interphase microtubule catastrophe promotion. Cyclin B was found to interact with this protein, which targets cell division cycle 2 (CDC2) kinase to microtubules. The phosphorylation of this protein affects microtubule properties and cell cycle progression. Multiple transcript variants encoding different isoforms have been found for this gene. 4134 MAP4 microtubule associated protein 4 ENSG00000047849 NA
NA 58473 PLEKHB1 pleckstrin homology domain containing B1 ENSG00000021300 NA
NA 10507 SEMA4D semaphorin 4D ENSG00000187764 NA
This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. The protein encoded by this gene is a major component of central nervous system (CNS) myelin and plays an important role in regulating proliferation and migration of oligodendrocytes. Mouse studies showed that the gene deficiency results in deafness and loss of the Sertoli cell epithelial phenotype in the testis. This protein is a tight junction protein at the human blood-testis barrier (BTB), and the BTB disruption is related to a dysfunction of this gene. Alternatively spliced transcript variants encoding different isoforms have been identified. 5010 CLDN11 claudin 11 ENSG00000013297 NA
This gene is a member of the septin family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse, and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 23176 SEPT8 septin 8 ENSG00000164402 NA
The protein encoded by this gene is a member of the kinesin-like protein family. The family members are microtubule-dependent molecular motors that transport organelles within cells and move chromosomes during cell division. Mutations in this gene are a cause of spastic ataxia 2, autosomal recessive. 10749 KIF1C kinesin family member 1C ENSG00000129250 NA
NA NA NA NA ENSG00000256545 TRUE
The protein encoded by this gene functions as both a phosphodiesterase, which cleaves phosphodiester bonds at the 5’ end of oligonucleotides, and a phospholipase, which catalyzes production of lysophosphatidic acid (LPA) in extracellular fluids. LPA evokes growth factor-like responses including stimulation of cell proliferation and chemotaxis. This gene product stimulates the motility of tumor cells and has angiogenic properties, and its expression is upregulated in several kinds of carcinomas. The gene product is secreted and further processed to make the biologically active form. Several alternatively spliced transcript variants encoding different isoforms have been identified. 5168 ENPP2 ectonucleotide pyrophosphatase/phosphodiesterase 2 ENSG00000136960 NA
This gene encodes a transcription factor that is required for central nervous system myelination and may regulate oligodendrocyte differentiation. It is thought to act by increasing the expression of genes that effect myelin production but may also directly promote myelin gene expression. Loss of a similar gene in mouse models results in severe demyelination. Alternative splicing results in multiple transcript variants. 745 MYRF myelin regulatory factor ENSG00000124920 NA
Microtubules of the eukaryotic cytoskeleton perform essential and diverse functions and are composed of a heterodimer of alpha and beta tubulins. The genes encoding these microtubule constituents belong to the tubulin superfamily, which is composed of six distinct families. Genes from the alpha, beta and gamma tubulin families are found in all eukaryotes. The alpha and beta tubulins represent the major components of microtubules, while gamma tubulin plays a critical role in the nucleation of microtubule assembly. There are multiple alpha and beta tubulin genes, which are highly conserved among species. This gene encodes alpha tubulin and is highly similar to the mouse and rat Tuba1 genes. Northern blotting studies have shown that the gene expression is predominantly found in morphologically differentiated neurologic cells. This gene is one of three alpha-tubulin genes in a cluster on chromosome 12q. Mutations in this gene cause lissencephaly type 3 (LIS3) - a neurological condition characterized by microcephaly, mental retardation, and early-onset epilepsy and caused by defective neuronal migration. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 7846 TUBA1A tubulin alpha 1a ENSG00000167552 NA
The gene is a member of the inositol-polyphosphate 5-phosphatase family. The encoded protein interacts with the ras-related C3 botulinum toxin substrate 1, which causes translocation of the encoded protein to the plasma membrane where it inhibits clathrin-mediated endocytosis. Alternative splicing results in multiple transcript variants. 8871 SYNJ2 synaptojanin 2 ENSG00000078269 NA
NA 3306 HSPA2 heat shock protein family A (Hsp70) member 2 ENSG00000126803 NA
This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein involved in stress responses, hormone responses, cell growth, and differentiation. The encoded protein is necessary for p53-mediated caspase activation and apoptosis. Mutations in this gene are a cause of Charcot-Marie-Tooth disease type 4D, and expression of this gene may be a prognostic indicator for several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 10397 NDRG1 N-myc downstream regulated 1 ENSG00000104419 NA
The protein encoded by this gene is involved in the attachment of osteoclasts to the mineralized bone matrix. The encoded protein is secreted and binds hydroxyapatite with high affinity. The osteoclast vitronectin receptor is found in the cell membrane and may be involved in the binding to this protein. This protein is also a cytokine that upregulates expression of interferon-gamma and interleukin-12. Several transcript variants encoding different isoforms have been found for this gene. 6696 SPP1 secreted phosphoprotein 1 ENSG00000118785 NA
Phosphatidylinositol-5,4-bisphosphate, the precursor to second messengers of the phosphoinositide signal transduction pathways, is thought to be involved in the regulation of secretion, cell proliferation, differentiation, and motility. The protein encoded by this gene is one of a family of enzymes capable of catalyzing the phosphorylation of phosphatidylinositol-5-phosphate on the fourth hydroxyl of the myo-inositol ring to form phosphatidylinositol-5,4-bisphosphate. The amino acid sequence of this enzyme does not show homology to other kinases, but the recombinant protein does exhibit kinase activity. This gene is a member of the phosphatidylinositol-5-phosphate 4-kinase family. 5305 PIP4K2A phosphatidylinositol-5-phosphate 4-kinase type 2 alpha ENSG00000150867 NA
This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Expression of this gene in C. elegans unc-76 mutants can restore to the mutants partial locomotion and axonal fasciculation, suggesting that it also functions in axonal outgrowth. The N-terminal half of the gene product is highly acidic. Alternatively spliced transcript variants encoding different isoforms of this protein have been described. 9638 FEZ1 fasciculation and elongation protein zeta 1 ENSG00000149557 NA
This gene encodes the enzyme dihydropteridine reductase, which catalyzes the NADH-mediated reduction of quinonoid dihydrobiopterin. This enzyme is an essential component of the pterin-dependent aromatic amino acid hydroxylating systems. Mutations in this gene resulting in QDPR deficiency include aberrant splicing, amino acid substitutions, insertions, or premature terminations. Dihydropteridine reductase deficiency presents as atypical phenylketonuria due to insufficient production of biopterin, a cofactor for phenylalanine hydroxylase. 5860 QDPR quinoid dihydropteridine reductase ENSG00000151552 NA
NA 64834 ELOVL1 ELOVL fatty acid elongase 1 ENSG00000066322 NA
NA 55314 TMEM144 transmembrane protein 144 ENSG00000164124 NA
NA 66008 TRAK2 trafficking protein, kinesin binding 2 ENSG00000115993 NA
NA 83543 AIF1L allograft inflammatory factor 1 like ENSG00000126878 NA
Cytoplasmic dynein is a microtubule-associated motor protein (Hughes et al., 1995 [PubMed 7738094]). See DYNC1H1 (MIM 600112) for general information about dyneins. 1783 DYNC1LI2 dynein cytoplasmic 1 light intermediate chain 2 ENSG00000135720 NA
This gene encodes a member of the subtilisin-like proprotein convertase family, which includes proteases that process protein and peptide precursors trafficking through regulated or constitutive branches of the secretory pathway. The encoded protein undergoes an initial autocatalytic processing event in the ER to generate a heterodimer which exits the ER and sorts to the trans-Golgi network where a second autocatalytic event takes place and the catalytic activity is acquired. The encoded protease is constitutively secreted into the extracellular matrix and expressed in many tissues, including neuroendocrine, liver, gut, and brain. This gene encodes one of the seven basic amino acid-specific members which cleave their substrates at single or paired basic residues. Some of its substrates include transforming growth factor beta related proteins, proalbumin, and von Willebrand factor. This gene is thought to play a role in tumor progression and left-right patterning. Alternatively spliced transcript variants encoding different isoforms have been identified. 5046 PCSK6 proprotein convertase subtilisin/kexin type 6 ENSG00000140479 NA
NA 23446 SLC44A1 solute carrier family 44 member 1 ENSG00000070214 NA
This gene encodes a conserved serine/threonine kinase that is a member of the homeodomain-interacting protein kinase family. The encoded protein interacts with homeodomain transcription factors and many other transcription factors such as p53, and can function as both a corepressor and a coactivator depending on the transcription factor and its subcellular localization. Multiple transcript variants encoding different isoforms have been found for this gene. 28996 HIPK2 homeodomain interacting protein kinase 2 ENSG00000064393 NA
The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. 9839 ZEB2 zinc finger E-box binding homeobox 2 ENSG00000169554 NA
The protein encoded by this gene belongs to a small class of the protein tyrosine phosphatase (PTP) family. PTPs are cell signaling molecules that play regulatory roles in a variety of cellular processes. PTPs in this class contain a protein tyrosine phosphatase catalytic domain and a characteristic C-terminal prenylation motif. This PTP has been shown to primarily associate with plasmic and endosomal membrane through its C-terminal prenylation. This PTP was found to interact with the beta-subunit of Rab geranylgeranyltransferase II (beta GGT II), and thus may function as a regulator of GGT II activity. Overexpression of this gene in mammalian cells conferred a transformed phenotype, which suggested its role in tumorigenesis. Alternatively spliced transcript variants have been described. Related pseudogenes exist on chromosomes 11, 12 and 17. 8073 PTP4A2 protein tyrosine phosphatase type IVA, member 2 ENSG00000184007 NA
NA 57698 SHTN1 shootin 1 ENSG00000187164 NA
NA 1267 CNP 2’,3’-cyclic nucleotide 3’ phosphodiesterase ENSG00000173786 NA
NA 51148 CERCAM cerebral endothelial cell adhesion molecule ENSG00000167123 NA
The integral membrane protein encoded by this gene is a lysophosphatidic acid (LPA) receptor from a group known as EDG receptors. These receptors are members of the G protein-coupled receptor superfamily. Utilized by LPA for cell signaling, EDG receptors mediate diverse biologic functions, including proliferation, platelet aggregation, smooth muscle contraction, inhibition of neuroblastoma cell differentiation, chemotaxis, and tumor cell invasion. Two transcript variants encoding the same protein have been identified for this gene. 1902 LPAR1 lysophosphatidic acid receptor 1 ENSG00000198121 NA
This gene encodes a flavin adenine dinucleotide (FAD)-dependent oxidoreductase which catalyzes the reduction of the delta-24 double bond of sterol intermediates during cholesterol biosynthesis. The protein contains a leader sequence that directs it to the endoplasmic reticulum membrane. Missense mutations in this gene have been associated with desmosterolosis. Also, reduced expression of the gene occurs in the temporal cortex of Alzheimer disease patients and overexpression has been observed in adrenal gland cancer cells. 1718 DHCR24 24-dehydrocholesterol reductase ENSG00000116133 NA
NA 83641 FAM107B family with sequence similarity 107 member B ENSG00000065809 NA
This gene encodes a member of the peptidyl arginine deiminase family of enzymes, which catalyze the post-translational deimination of proteins by converting arginine residues into citrullines in the presence of calcium ions. The family members have distinct substrate specificities and tissue-specific expression patterns. The type II enzyme is the most widely expressed family member. Known substrates for this enzyme include myelin basic protein in the central nervous system and vimentin in skeletal muscle and macrophages. This enzyme is thought to play a role in the onset and progression of neurodegenerative human disorders, including Alzheimer disease and multiple sclerosis, and it has also been implicated in glaucoma pathogenesis. This gene exists in a cluster with four other paralogous genes. 11240 PADI2 peptidyl arginine deiminase 2 ENSG00000117115 NA
This gene encodes a member of the myristoylated alanine-rich C-kinase substrate (MARCKS) family. Members of this family play a role in cytoskeletal regulation, protein kinase C signaling and calmodulin signaling. The encoded protein affects the formation of adherens junction. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene are located on the long arm of chromosomes 6 and 10. 65108 MARCKSL1 MARCKS-like 1 ENSG00000175130 NA
Members of the RAS (see HRAS; MIM 190020) subfamily of GTPases function in signal transduction as GTP/GDP-regulated switches that cycle between inactive GDP- and active GTP-bound states. Guanine nucleotide exchange factors (GEFs), such as RAPGEF5, serve as RAS activators by promoting acquisition of GTP to maintain the active GTP-bound state and are the key link between cell surface receptors and RAS activation (Rebhun et al., 2000 [PubMed 10934204]). 9771 RAPGEF5 Rap guanine nucleotide exchange factor 5 ENSG00000136237 NA
The protein encoded by this gene belongs to the BCL2 family, members of which form homo- or heterodimers, and act as anti- or proapoptotic regulators that are involved in a wide variety of cellular processes. Studies in rat show that this protein has restricted expression in reproductive tissues, interacts strongly with some antiapoptotic BCL2 proteins, not at all with proapoptotic BCL2 proteins, and induces apoptosis in transfected cells. Thus, this protein represents a proapoptotic member of the BCL2 family. 666 BOK BCL2-related ovarian killer ENSG00000176720 NA
NA 9725 TMEM63A transmembrane protein 63A ENSG00000196187 NA
This gene encodes a member of the sirtuin family of proteins, homologs to the yeast Sir2 protein. Members of the sirtuin family are characterized by a sirtuin core domain and grouped into four classes. The functions of human sirtuins have not yet been determined; however, yeast sirtuin proteins are known to regulate epigenetic gene silencing and suppress recombination of rDNA. Studies suggest that the human sirtuins may function as intracellular regulatory proteins with mono-ADP-ribosyltransferase activity. The protein encoded by this gene is included in class I of the sirtuin family. Several transcript variants result from alternative splicing of this gene. 22933 SIRT2 sirtuin 2 ENSG00000068903 NA
The protein encoded by this gene belongs to the protein phosphatase 1 (PP1) inhibitor family. This protein is an inhibitor of smooth muscle myosin phosphatase, and has higher inhibitory activity when phosphorylated. Inhibition of myosin phosphatase leads to increased myosin phosphorylation and enhanced smooth muscle contraction. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. 94274 PPP1R14A protein phosphatase 1 regulatory inhibitor subunit 14A ENSG00000167641 NA
Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. 1410 CRYAB crystallin alpha B ENSG00000109846 NA
This gene encodes a selenoprotein containing multiple selenocysteine (Sec) residues, which are encoded by the UGA codon that normally signals translation termination. The 3’ UTR of selenoprotein genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. This selenoprotein is an extracellular glycoprotein, and is unusual in that it contains 10 Sec residues per polypeptide. It is a heparin-binding protein that appears to be associated with endothelial cells, and has been implicated to function as an antioxidant in the extracellular space. Several transcript variants, encoding either the same or different isoform, have been found for this gene. 6414 SEPP1 selenoprotein P, plasma, 1 ENSG00000250722 NA
NA 11145 PLA2G16 phospholipase A2 group XVI ENSG00000176485 NA
This gene encodes a member of the highly conserved amyloid precursor protein gene family. The encoded protein is a membrane-associated glycoprotein that is cleaved by secretases in a manner similar to amyloid beta A4 precursor protein cleavage. This cleavage liberates an intracellular cytoplasmic fragment that may act as a transcriptional activator. The encoded protein may also play a role in synaptic maturation during cortical development. Alternatively spliced transcript variants encoding different isoforms have been described. 333 APLP1 amyloid beta precursor like protein 1 ENSG00000105290 NA
The protein encoded by this gene belongs to the class-3 semaphorin/collapsin family, whose members function in growth cone guidance during neuronal development. This family member inhibits axonal extension and has been shown to act as a tumor suppressor by inducing apoptosis. Alternative splicing of this gene results in multiple transcript variants. 7869 SEMA3B semaphorin 3B ENSG00000012171 NA
NA 64077 LHPP phospholysine phosphohistidine inorganic pyrophosphate phosphatase ENSG00000107902 NA
This gene encodes a formin-related protein. Formin-related proteins have been implicated in morphogenesis, cytokinesis, and cell polarity. Alternatively spliced transcript variants encoding different isoforms have been described but their full-length nature has yet to be determined. 114793 FMNL2 formin like 2 ENSG00000157827 NA
This gene is imprinted, with preferential expression of the maternal allele. The encoded protein is a tight-binding, strong inhibitor of several G1 cyclin/Cdk complexes and a negative regulator of cell proliferation. Mutations in this gene are implicated in sporadic cancers and Beckwith-Wiedemann syndrome, suggesting that this gene is a tumor suppressor candidate. Three transcript variants encoding two different isoforms have been found for this gene. 1028 CDKN1C cyclin-dependent kinase inhibitor 1C ENSG00000129757 NA
Most mRNAs, except for histones, contain a 3-prime poly(A) tail. Poly(A)-binding protein (PABP; see MIM 604679) enhances translation by circularizing mRNA through its interaction with the translation initiation factor EIF4G1 (MIM 600495) and the poly(A) tail. Various PABP-binding proteins regulate PABP activity, including PAIP1 (MIM 605184), a translational stimulator, and PAIP2A (MIM 605604) and PAIP2B, translational inhibitors (Derry et al., 2006 [PubMed 17381337]). 400961 PAIP2B poly(A) binding protein interacting protein 2B ENSG00000124374 NA
Calpain, a heterodimer consisting of a large and a small subunit, is a major intracellular protease, although its function has not been well established. This gene encodes a muscle-specific member of the calpain large subunit family that specifically binds to titin. Mutations in this gene are associated with limb-girdle muscular dystrophies type 2A. Alternate promoters and alternative splicing result in multiple transcript variants encoding different isoforms and some variants are ubiquitously expressed. 825 CAPN3 calpain 3 ENSG00000092529 NA
This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. 347 APOD apolipoprotein D ENSG00000189058 NA
This gene encodes a member of the WNK subfamily of serine/threonine protein kinases. The encoded protein may be a key regulator of blood pressure by controlling the transport of sodium and chloride ions. Mutations in this gene have been associated with pseudohypoaldosteronism type II and hereditary sensory neuropathy type II. Alternatively spliced transcript variants encoding different isoforms have been described but the full-length nature of all of them has yet to be determined. 65125 WNK1 WNK lysine deficient protein kinase 1 ENSG00000060237 NA
NA ENSG00000258461 RP11-164J13.1 NA ENSG00000258461 NA
This gene encodes a member of the tweety family of proteins. Members of this family function as chloride anion channels. The encoded protein functions as a calcium(2+)-activated large conductance chloride(-) channel, and may play a role in kidney tumorigenesis. Two transcript variants encoding distinct isoforms have been identified for this gene. 94015 TTYH2 tweety family member 2 ENSG00000141540 NA
NA ENSG00000251660 AC007036.5 NA ENSG00000251660 NA
Members of the RAS (see HRAS; MIM 190020) subfamily of GTPases function in signal transduction as GTP/GDP-regulated switches that cycle between inactive GDP- and active GTP-bound states. Guanine nucleotide exchange factors (GEFs), such as RASGRP3, serve as RAS activators by promoting acquisition of GTP to maintain the active GTP-bound state and are the key link between cell surface receptors and RAS activation (Rebhun et al., 2000 [PubMed 10934204]). 25780 RASGRP3 RAS guanyl releasing protein 3 ENSG00000152689 NA
NA 9728 SECISBP2L SECIS binding protein 2 like ENSG00000138593 NA
This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 2819 GPD1 glycerol-3-phosphate dehydrogenase 1 ENSG00000167588 NA
Epidermodysplasia verruciformis (EV) is an autosomal recessive dermatosis characterized by abnormal susceptibility to human papillomaviruses (HPVs) and a high rate of progression to squamous cell carcinoma on sun-exposed skin. EV is caused by mutations in either of two adjacent genes located on chromosome 17q25.3. Both of these genes encode integral membrane proteins that localize to the endoplasmic reticulum and are predicted to form transmembrane channels. This gene encodes a transmembrane channel-like protein with 10 transmembrane domains and 2 leucine zipper motifs. 11322 TMC6 transmembrane channel like 6 ENSG00000141524 NA
NA ENSG00000259172 RP11-299G20.2 NA ENSG00000259172 NA
NA 23500 DAAM2 dishevelled associated activator of morphogenesis 2 ENSG00000146122 NA
The protein encoded by this gene is similar to bovine and porcine proteins which accelerate transfer of certain glycosphingolipids and glyceroglycolipids between membranes. It is thought to be a cytoplasmic protein. 51228 GLTP glycolipid transfer protein ENSG00000139433 NA
This gene encodes one of the three alpha chains of type IX collagen, the major collagen component of hyaline cartilage. Type IX collagen, a heterotrimeric molecule, is usually found in tissues containing type II collagen, a fibrillar collagen. This chain is unusual in that, unlike the other two type IX alpha chains, it contains a covalently attached glycosaminoglycan side chain. Mutations in this gene are associated with multiple epiphyseal dysplasia. 1298 COL9A2 collagen type IX alpha 2 ENSG00000049089 NA
This gene encodes a protein that plays an important role in the organization of the actin cytoskeleton. The encoded protein binds to a region of Wiskott-Aldrich syndrome protein that is frequently mutated in Wiskott-Aldrich syndrome, an X-linked recessive disorder. Impairment of the interaction between these two proteins may contribute to the disease. Two transcript variants encoding the same protein have been identified for this gene. 7456 WIPF1 WAS/WASL interacting protein family member 1 ENSG00000115935 NA
This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. 7276 TTR transthyretin ENSG00000118271 NA
This gene encodes a member of the pancreatic-type of secretory ribonucleases, a subset of the ribonuclease A superfamily. The encoded endonuclease cleaves internal phosphodiester RNA bonds on the 3’-side of pyrimidine bases. It prefers poly(C) as a substrate and hydrolyzes 2’,3’-cyclic nucleotides, with a pH optimum near 8.0. The encoded protein is monomeric and more commonly acts to degrade ds-RNA over ss-RNA. Alternative splicing occurs at this locus and four transcript variants encoding the same protein have been identified. 6035 RNASE1 ribonuclease A family member 1, pancreatic ENSG00000129538 NA
NA 389337 ARHGEF37 Rho guanine nucleotide exchange factor 37 ENSG00000183111 NA
NA 91369 ANKRD40 ankyrin repeat domain 40 ENSG00000154945 NA
The protein encoded by this gene is a member of the serine/threonine protein kinase family. This kinase has been shown to specifically activate MAPK8/JNK. The activation of MAPK8 by this kinase is found to be inhibited by the dominant-negative mutants of MAP3K7/TAK1, MAP2K4/MKK4, and MAP2K7/MKK7, which suggests that this kinase may function through the MAP3K7-MAP2K4-MAP2K7 kinase cascade, and mediate the TNF-alpha signaling pathway. Alternatively spliced transcript variants encoding different isoforms have been identified. 9448 MAP4K4 mitogen-activated protein kinase kinase kinase kinase 4 ENSG00000071054 NA
This gene encodes a protein that has sequence similarity to yeast longevity assurance gene 1. Mutation or overexpression of the related gene in yeast has been shown to alter yeast lifespan. The human protein may play a role in the regulation of cell growth. Alternatively spliced transcript variants encoding the same protein have been described. 29956 CERS2 ceramide synthase 2 ENSG00000143418 NA
NA 85414 SLC45A3 solute carrier family 45 member 3 ENSG00000158715 NA
This gene encodes a member of the tripartite motif (TRIM) family. The TRIM family is characterized by a signature motif composed of a RING finger, one or more B-box domains, and a coiled-coil region. This encoded protein may play a role in protein kinase C signaling. Multiple transcript variants encoding different isoforms have been found for this gene. 90933 TRIM41 tripartite motif containing 41 ENSG00000146063 NA
NA 6856 SYPL1 synaptophysin like 1 ENSG00000008282 NA
NA 3799 KIF5B kinesin family member 5B ENSG00000170759 NA
NA ENSG00000260465 RP11-63M22.2 NA ENSG00000260465 NA
This gene encodes lipase A, the lysosomal acid lipase (also known as cholesterol ester hydrolase). This enzyme functions in the lysosome to catalyze the hydrolysis of cholesteryl esters and triglycerides. Mutations in this gene can result in Wolman disease and cholesteryl ester storage disease. Alternatively spliced transcript variants have been found for this gene. 3988 LIPA lipase A, lysosomal acid type ENSG00000107798 NA
NA 4642 MYO1D myosin ID ENSG00000176658 NA
NA 10541 ANP32B acidic nuclear phosphoprotein 32 family member B ENSG00000136938 NA
This gene encodes a member of the 1-acylglycerol-3-phosphate O-acyltransferase family. This integral membrane protein converts lysophosphatidic acid to phosphatidic acid, the second step in de novo phospholipid biosynthesis. 56895 AGPAT4 1-acylglycerol-3-phosphate O-acyltransferase 4 ENSG00000026652 NA
This gene encodes a death effector domain-containing protein that functions as a negative regulator of apoptosis. The encoded protein is an endogenous substrate for protein kinase C. This protein is also overexpressed in type 2 diabetes mellitus, where it may contribute to insulin resistance in glucose uptake. Alternative splicing results in multiple transcript variants. 8682 PEA15 phosphoprotein enriched in astrocytes 15 ENSG00000162734 NA
This gene encodes a magnesium transporter that associates with early endosomes and the cell surface in a variety of neuronal and epithelial cells. This protein may play a role in nervous system development and maintenance. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene have been associated with autosomal dominant spastic paraplegia 6. 123606 NIPA1 non imprinted in Prader-Willi/Angelman syndrome 1 ENSG00000170113 NA
NA 91947 ARRDC4 arrestin domain containing 4 ENSG00000140450 NA
The protein encoded by this gene contains a RING zinc finger, a motif known to be involved in protein-protein interactions. The specific function of this gene has not yet been determined. Alternatively spliced transcript variants that encode the same protein have been reported. A pseudogene, which is also located on chromosome 3, has been defined for this gene. 11342 RNF13 ring finger protein 13 ENSG00000082996 NA
# Save the Ensembl IDs queried for brain cluster 5, one per line.
write.table(as.factor(out$query), paste0("../utilities/gene_names_brain_clus_",5,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
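
The export step above writes every queried ID, including any that mygene could not resolve (rows with `notfound` set to TRUE). The following is a minimal sketch, not part of the original script, of how one might keep only resolved IDs before writing them out. The helper name `resolved_queries`, the toy data frame, and the temporary output path are all hypothetical; the IDs in the toy data are copied from the annotation tables in this report, and the column layout is assumed to match the `query` and `notfound` columns printed there.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# return the unique query IDs that were resolved by the annotation lookup.
resolved_queries <- function(out) {
  out <- as.data.frame(out)
  # Unresolved IDs carry notfound == TRUE; resolved rows show NA.
  # Guard in case the column is absent from the result.
  if ("notfound" %in% names(out)) {
    keep <- is.na(out$notfound) | !out$notfound
    out <- out[keep, , drop = FALSE]
  }
  unique(as.character(out$query))
}

# Toy input mimicking the printed columns: two resolved IDs, one unresolved.
toy <- data.frame(
  query    = c("ENSG00000140575", "ENSG00000258486", "ENSG00000175899"),
  symbol   = c("IQGAP1", NA, "A2M"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

ids <- resolved_queries(toy)

# Write one ID per line to a temporary file (the path is illustrative only).
out_file <- file.path(tempdir(), "gene_names_toy_cluster.txt")
writeLines(ids, out_file)
```

Whether to drop unresolved IDs depends on the downstream tool; if it accepts raw Ensembl IDs, writing `out$query` unfiltered, as the script does, may be preferable.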

Cluster 6 Annotations

# Look up name, summary and symbol for the top driving genes of brain cluster 6.
out <- mygene::queryMany(gene_list_brain[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id name symbol query summary notfound
ENSG00000237973 MT-CO1 pseudogene 12 MTCO1P12 ENSG00000237973 NA NA
ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 MTND2P28 ENSG00000225630 NA NA
ENSG00000229344 MT-CO2 pseudogene 12 MTCO2P12 ENSG00000229344 NA NA
ENSG00000225972 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 1 pseudogene 23 MTND1P23 ENSG00000225972 NA NA
ENSG00000271043 MT-RNR2-like 2 MTRNR2L2 ENSG00000271043 NA NA
100463486 MT-RNR2-like 8 MTRNR2L8 ENSG00000255823 NA NA
8826 IQ motif containing GTPase activating protein 1 IQGAP1 ENSG00000140575 This gene encodes a member of the IQGAP family. The protein contains four IQ domains, one calponin homology domain, one Ras-GAP domain and one WW domain. It interacts with components of the cytoskeleton, with cell adhesion molecules, and with several signaling molecules to regulate cell morphology and motility. Expression of the protein is upregulated by gene amplification in two gastric cancer cell lines. NA
ENSG00000249119 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 6 pseudogene 4 MTND6P4 ENSG00000249119 NA NA
NA NA NA ENSG00000258486 NA TRUE
2 alpha-2-macroglobulin A2M ENSG00000175899 Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. NA
718 complement component 3 C3 ENSG00000125730 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. NA
ENSG00000237550 ribosomal protein L9 pseudogene 9 RPL9P9 ENSG00000237550 NA NA
4625 myosin, heavy chain 7, cardiac muscle, beta MYH7 ENSG00000092054 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. NA
7038 thyroglobulin TG ENSG00000042832 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. NA
6439 surfactant protein B SFTPB ENSG00000168878 This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. NA
NA NA NA ENSG00000140181 NA TRUE
6711 spectrin beta, non-erythrocytic 1 SPTBN1 ENSG00000115306 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. NA
714 complement component 1, q subcomponent, C chain C1QC ENSG00000159189 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. A deficiency in C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N-terminus, and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the C-chain polypeptide of human complement subcomponent C1q. Alternatively spliced transcript variants that encode the same protein have been found for this gene. NA
4633 myosin light chain 2 MYL2 ENSG00000111245 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. NA
ENSG00000250182 eukaryotic translation elongation factor 1 alpha 1 pseudogene 13 EEF1A1P13 ENSG00000250182 NA NA
ENSG00000228502 eukaryotic translation elongation factor 1 alpha 1 pseudogene 11 EEF1A1P11 ENSG00000228502 NA NA
ENSG00000263740 RNA, 7SL, cytoplasmic 4, pseudogene RN7SL4P ENSG00000263740 NA NA
653509 surfactant protein A1 SFTPA1 ENSG00000122852 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. NA
115207 potassium channel tetramerization domain containing 12 KCTD12 ENSG00000178695 NA NA
ENSG00000249855 eukaryotic translation elongation factor 1 alpha 1 pseudogene 19 EEF1A1P19 ENSG00000249855 NA NA
643834 pepsinogen 3, group I (pepsinogen A) PGA3 ENSG00000229859 This gene encodes a protein precursor of the digestive enzyme pepsin, a member of the peptidase A1 family of endopeptidases. The encoded precursor is secreted by gastric chief cells and undergoes autocatalytic cleavage in acidic conditions to form the active enzyme, which functions in the digestion of dietary proteins. This gene is found in a cluster of related genes on chromosome 11, each of which encodes one of multiple pepsinogens. Pepsinogen levels in serum may serve as a biomarker for atrophic gastritis and gastric cancer. NA
1915 eukaryotic translation elongation factor 1 alpha 1 EEF1A1 ENSG00000156508 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. NA
2335 fibronectin 1 FN1 ENSG00000115414 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. NA
1158 creatine kinase, M-type CKM ENSG00000104879 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. NA
1808 dihydropyrimidinase like 2 DPYSL2 ENSG00000092964 This gene encodes a member of the collapsin response mediator protein family. Collapsin response mediator proteins form homo- and hetero-tetramers and facilitate neuron guidance, growth and polarity. The encoded protein promotes microtubule assembly and is required for Sema3A-mediated growth cone collapse, and also plays a role in synaptic signaling through interactions with calcium channels. This gene has been implicated in multiple neurological disorders, and hyperphosphorylation of the encoded protein may play a key role in the development of Alzheimer’s disease. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
5225 progastricsin (pepsinogen C) PGC ENSG00000096088 This gene encodes an aspartic proteinase that belongs to the peptidase family A1. The encoded protein is a digestive enzyme that is produced in the stomach and constitutes a major component of the gastric mucosa. This protein is also secreted into the serum. This protein is synthesized as an inactive zymogen that includes a highly basic prosegment. This enzyme is converted into its active mature form at low pH by sequential cleavage of the prosegment that is carried out by the enzyme itself. Polymorphisms in this gene are associated with susceptibility to gastric cancers. Serum levels of this enzyme are used as a biomarker for certain gastric diseases including Helicobacter pylori related gastritis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 1. NA
NA NA NA ENSG00000184779 NA TRUE
ENSG00000234975 ferritin, heavy polypeptide 1 pseudogene 2 FTH1P2 ENSG00000234975 NA NA
ENSG00000269930 NA RP11-932O9.9 ENSG00000269930 NA NA
713 complement component 1, q subcomponent, B chain C1QB ENSG00000173369 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the B-chain polypeptide of human complement subcomponent C1q. NA
ENSG00000214199 eukaryotic translation elongation factor 1 alpha 1 pseudogene 12 EEF1A1P12 ENSG00000214199 NA NA
1278 collagen type I alpha 2 COL1A2 ENSG00000164692 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. NA
ENSG00000136149 ribosomal protein L13a pseudogene 25 RPL13AP25 ENSG00000136149 NA NA
712 complement component 1, q subcomponent, A chain C1QA ENSG00000173372 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the A-chain polypeptide of human complement subcomponent C1q. NA
4619 myosin, heavy chain 1, skeletal muscle, adult MYH1 ENSG00000109061 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. NA
7048 transforming growth factor beta receptor II TGFBR2 ENSG00000163513 This gene encodes a member of the Ser/Thr protein kinase family and the TGFB receptor subfamily. The encoded protein is a transmembrane protein that has a protein kinase domain, forms a heterodimeric complex with another receptor protein, and binds TGF-beta. This receptor/ligand complex phosphorylates proteins, which then enter the nucleus and regulate the transcription of a subset of genes related to cell proliferation. Mutations in this gene have been associated with Marfan Syndrome, Loeys-Dietz Aortic Aneurysm Syndrome, and the development of various types of tumors. Alternatively spliced transcript variants encoding different isoforms have been characterized. NA
64397 zinc finger protein 106 ZNF106 ENSG00000103994 NA NA
ENSG00000213885 ribosomal protein L13a pseudogene 7 RPL13AP7 ENSG00000213885 NA NA
4629 myosin, heavy chain 11, smooth muscle MYH11 ENSG00000133392 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
ENSG00000242960 ferritin, heavy polypeptide 1 pseudogene 23 FTH1P23 ENSG00000242960 NA NA
ENSG00000222328 RNA, U2 small nuclear 2, pseudogene RNU2-2P ENSG00000222328 NA NA
58 actin, alpha 1, skeletal muscle ACTA1 ENSG00000143632 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. NA
143884 CWF19-like 2, cell cycle control (S. pombe) CWF19L2 ENSG00000152404 NA NA
7450 von Willebrand factor VWF ENSG00000110799 This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. NA
4151 myoglobin MB ENSG00000198125 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. NA
ENSG00000213453 ferritin, heavy polypeptide 1 pseudogene 3 FTH1P3 ENSG00000213453 NA NA
ENSG00000226221 ribosomal protein L26 pseudogene 19 RPL26P19 ENSG00000226221 NA NA
ENSG00000249264 eukaryotic translation elongation factor 1 alpha 1 pseudogene 9 EEF1A1P9 ENSG00000249264 NA NA
ENSG00000259001 NA RPPH1 ENSG00000259001 NA NA
6480 ST6 beta-galactosamide alpha-2,6-sialyltranferase 1 ST6GAL1 ENSG00000073849 This gene encodes a member of glycosyltransferase family 29. The encoded protein is a type II membrane protein that catalyzes the transfer of sialic acid from CMP-sialic acid to galactose-containing substrates. The protein, which is normally found in the Golgi but can be proteolytically processed to a soluble form, is involved in the generation of the cell-surface carbohydrate determinants and differentiation antigens HB-6, CD75, and CD76. This gene has been incorrectly referred to as CD75. Three transcript variants encoding two different isoforms have been described. NA
NA NA NA ENSG00000090920 NA TRUE
NA NA NA ENSG00000265150 NA TRUE
ENSG00000244363 ribosomal protein L7 pseudogene 23 RPL7P23 ENSG00000244363 NA NA
ENSG00000264281 NA CTD-2031P19.4 ENSG00000264281 NA NA
1674 desmin DES ENSG00000175084 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. NA
2321 fms related tyrosine kinase 1 FLT1 ENSG00000102755 This gene encodes a member of the vascular endothelial growth factor receptor (VEGFR) family. VEGFR family members are receptor tyrosine kinases (RTKs) which contain an extracellular ligand-binding region with seven immunoglobulin (Ig)-like domains, a transmembrane segment, and a tyrosine kinase (TK) domain within the cytoplasmic domain. This protein binds to VEGF-A, VEGF-B and placental growth factor and plays an important role in angiogenesis and vasculogenesis. Expression of this receptor is found in vascular endothelial cells, placental trophoblast cells and peripheral blood monocytes. Multiple transcript variants encoding different isoforms have been found for this gene. Isoforms include a full-length transmembrane receptor isoform and shortened, soluble isoforms. The soluble isoforms are associated with the onset of pre-eclampsia. NA
1281 collagen type III alpha 1 COL3A1 ENSG00000168542 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. NA
ENSG00000234648 NA AL162151.3 ENSG00000234648 NA NA
3075 complement factor H CFH ENSG00000000971 This gene is a member of the Regulator of Complement Activation (RCA) gene cluster and encodes a protein with twenty short consensus repeat (SCR) domains. This protein is secreted into the bloodstream and has an essential role in the regulation of complement activation, restricting this innate defense mechanism to microbial infections. Mutations in this gene have been associated with hemolytic-uremic syndrome (HUS) and chronic hypocomplementemic nephropathy. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. NA
7173 thyroid peroxidase TPO ENSG00000115705 This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. NA
4969 osteoglycin OGN ENSG00000106809 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family of proteins. The encoded protein induces ectopic bone formation in conjunction with transforming growth factor beta and may regulate osteoblast differentiation. High expression of the encoded protein may be associated with elevated heart left ventricular mass. Alternative splicing results in multiple transcript variants. NA
4607 myosin binding protein C, cardiac MYBPC3 ENSG00000134571 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. NA
ENSG00000263327 TAPT1 antisense RNA 1 (head to head) TAPT1-AS1 ENSG00000263327 NA NA
ENSG00000211893 immunoglobulin heavy constant gamma 2 (G2m marker) IGHG2 ENSG00000211893 NA NA
26986 poly(A) binding protein cytoplasmic 1 PABPC1 ENSG00000070756 This gene encodes a poly(A) binding protein. The protein shuttles between the nucleus and cytoplasm and binds to the 3’ poly(A) tail of eukaryotic messenger RNAs via RNA-recognition motifs. The binding of this protein to poly(A) promotes ribosome recruitment and translation initiation; it is also required for poly(A) shortening which is the first step in mRNA decay. The gene is part of a small gene family including three protein-coding genes and several pseudogenes. NA
ENSG00000250461 NA RP11-631M6.2 ENSG00000250461 NA NA
ENSG00000236439 NA RP11-175B9.3 ENSG00000236439 NA NA
ENSG00000249936 ras-related C3 botulinum toxin substrate 1 pseudogene 2 RAC1P2 ENSG00000249936 NA NA
100093631 general transcription factor IIi pseudogene 4 GTF2IP4 ENSG00000233369 NA NA
ENSG00000239470 NA RP11-16F15.2 ENSG00000239470 NA NA
ENSG00000224094 ribosomal protein S24 pseudogene 8 RPS24P8 ENSG00000224094 NA NA
5336 phospholipase C gamma 2 PLCG2 ENSG00000197943 The protein encoded by this gene is a transmembrane signaling enzyme that catalyzes the conversion of 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate to 1D-myo-inositol 1,4,5-trisphosphate (IP3) and diacylglycerol (DAG) using calcium as a cofactor. IP3 and DAG are second messenger molecules important for transmitting signals from growth factor receptors and immune system receptors across the cell membrane. Mutations in this gene have been found in autoinflammation, antibody deficiency, and immune dysregulation syndrome and familial cold autoinflammatory syndrome 3. NA
387841 ribosomal protein L13a pseudogene 20 RPL13AP20 ENSG00000234498 NA NA
2533 FYN binding protein FYB ENSG00000082074 The protein encoded by this gene is an adapter for the FYN protein and LCP2 signaling cascades in T-cells. The encoded protein is involved in platelet activation and controls the expression of interleukin-2. Three transcript variants encoding different isoforms have been found for this gene. NA
3454 interferon alpha and beta receptor subunit 1 IFNAR1 ENSG00000142166 The protein encoded by this gene is a type I membrane protein that forms one of the two chains of a receptor for interferons alpha and beta. Binding and activation of the receptor stimulates Janus protein kinases, which in turn phosphorylate several proteins, including STAT1 and STAT2. The encoded protein also functions as an antiviral factor. NA
ENSG00000175886 ribosomal protein L7a pseudogene 66 RPL7AP66 ENSG00000175886 NA NA
1356 ceruloplasmin (ferroxidase) CP ENSG00000047457 The protein encoded by this gene is a metalloprotein that binds most of the copper in plasma and is involved in the peroxidation of Fe(II)transferrin to Fe(III) transferrin. Mutations in this gene cause aceruloplasminemia, which results in iron accumulation and tissue damage, and is associated with diabetes and neurologic abnormalities. Two transcript variants, one protein-coding and the other not protein-coding, have been found for this gene. NA
8490 regulator of G-protein signaling 5 RGS5 ENSG00000143248 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. NA
27063 ankyrin repeat domain 1 ANKRD1 ENSG00000148677 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. NA
780851 small nucleolar RNA, C/D box 3A SNORD3A ENSG00000263934 U3 RNA, an abundant small nucleolar RNA (snoRNA), is thought to play a role in the processing of ribosomal RNA precursors (Bernstein et al., 1983 [PubMed 6186397]). NA
1436 colony stimulating factor 1 receptor CSF1R ENSG00000182578 The protein encoded by this gene is the receptor for colony stimulating factor 1, a cytokine which controls the production, differentiation, and function of macrophages. This receptor mediates most if not all of the biological effects of this cytokine. Ligand binding activates the receptor kinase through a process of oligomerization and transphosphorylation. The encoded protein is a tyrosine kinase transmembrane receptor and member of the CSF1/PDGF receptor family of tyrosine-protein kinases. Mutations in this gene have been associated with a predisposition to myeloid malignancy. The first intron of this gene contains a transcriptionally inactive ribosomal protein L7 processed pseudogene oriented in the opposite direction. Alternative splicing results in multiple transcript variants. NA
397 Rho GDP dissociation inhibitor beta ARHGDIB ENSG00000111348 Members of the Rho (or ARH) protein family (see MIM 165390) and other Ras-related small GTP-binding proteins (see MIM 179520) are involved in diverse cellular events, including cell signaling, proliferation, cytoskeletal organization, and secretion. The GTP-binding proteins are active only in the GTP-bound state. At least 3 classes of proteins tightly regulate cycling between the GTP-bound and GDP-bound states: GTPase-activating proteins (GAPs), guanine nucleotide-releasing factors (GRFs), and GDP-dissociation inhibitors (GDIs). The GDIs, including ARHGDIB, decrease the rate of GDP dissociation from Ras-like GTPases (summary by Scherle et al., 1993 [PubMed 8356058]). NA
10398 myosin light chain 9 MYL9 ENSG00000101335 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. NA
115361 guanylate binding protein 4 GBP4 ENSG00000162654 Guanylate-binding proteins, such as GBP4, are induced by interferon and hydrolyze GTP to both GDP and GMP (Vestal, 2005 [PubMed 16108726]). NA
ENSG00000214485 ribosomal protein L7 pseudogene 1 RPL7P1 ENSG00000214485 NA NA
ENSG00000231747 NA AC079922.2 ENSG00000231747 NA NA
2013 epithelial membrane protein 2 EMP2 ENSG00000213853 This gene encodes a tetraspan protein of the PMP22/EMP family. The encoded protein regulates cell membrane composition. It has been associated with various functions including endocytosis, cell signaling, cell proliferation, cell migration, cell adhesion, cell death, cholesterol homeostasis, urinary albumin excretion, and embryo implantation. It is known to negatively regulate caveolin-1, a scaffolding protein which is the main component of the caveolae plasma membrane invaginations found in most cell types. Through activation of PTK2 it positively regulates vascular endothelial growth factor A. It also modulates the function of specific integrin isomers in the plasma membrane. Up-regulation of this gene has been linked to cancer progression in multiple different tissues. Mutations in this gene have been associated with nephrotic syndrome type 10 (NPHS10). NA
ENSG00000220749 ribosomal protein L21 pseudogene 28 RPL21P28 ENSG00000220749 NA NA
ENSG00000213411 RNA binding motif protein 22 pseudogene 2 RBM22P2 ENSG00000213411 NA NA
ENSG00000244021 NA RP11-50D9.1 ENSG00000244021 NA NA
5787 protein tyrosine phosphatase, receptor type B PTPRB ENSG00000127329 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP contains an extracellular domain, a single transmembrane segment and one intracytoplasmic catalytic domain, thus belongs to receptor type PTP. The extracellular region of this PTP is composed of multiple fibronectin type_III repeats, which was shown to interact with neuronal receptor and cell adhesion molecules, such as contactin and tenascin C. This protein was also found to interact with sodium channels, and thus may regulate sodium channels by altering tyrosine phosphorylation status. The functions of the interaction partners of this protein implicate the roles of this PTP in cell adhesion, neurite growth, and neuronal differentiation. Alternate transcript variants encoding different isoforms have been found for this gene. NA
70 actin, alpha, cardiac muscle 1 ACTC1 ENSG00000159251 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). NA
ENSG00000231767 NA RP11-92K2.2 ENSG00000231767 NA NA
ENSG00000213598 NA RP11-112J1.1 ENSG00000213598 NA NA
NA NA NA ENSG00000259716 NA TRUE
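# Save the Ensembl IDs queried for this cluster (one ID per line, no header).
# The file index 6 is kept exactly as in the original call.
write.table(out$query, paste0("../utilities/gene_names_brain_clus_", 6, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);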
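
Several rows in the table above have notfound set to TRUE (for example ENSG00000184779), which means the query returned no annotation for that Ensembl ID. The following is a minimal sketch, not part of the original analysis, of how matched and unmatched IDs could be separated before the list is written out. The data frame `out_demo` is a hypothetical three-row stand-in for `as.data.frame(out)`, built from rows shown above, and the sketch assumes that `notfound` is TRUE for unmatched queries and NA otherwise, as it appears in the table.

```r
# Hypothetical stand-in for as.data.frame(out): three rows copied from the table above.
out_demo <- data.frame(
  query    = c("ENSG00000096088", "ENSG00000184779", "ENSG00000173369"),
  symbol   = c("PGC", NA, "C1QB"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Assumption: notfound is TRUE when the query matched nothing, and NA otherwise.
is_missing <- !is.na(out_demo$notfound) & out_demo$notfound

# Ensembl IDs that returned no annotation.
unmatched <- out_demo$query[is_missing]

# Annotated genes only, keeping the ID and the gene symbol.
matched <- out_demo[!is_missing, c("query", "symbol")]

print(unmatched)
print(matched)
```

Applied to the real `out`, the same logical index could be used to write only the annotated IDs, or to inspect the unmatched ones separately.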