Overview

In this script, we annotate the genes whose expression varies most between the clusters we obtain. These genes are likely the markers driving the different clusters. If a cluster is mainly represented in a particular tissue type, and the genes significantly differentially expressed in that cluster have annotations related to that tissue, then we can say that the clustering makes biological sense.

Extracting top driving genes

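library(CountClust)  # provides ExtractTopFeatures()

# Load the fitted GoM model and pull out theta (the fitted cluster expression profiles).
GoM_output <- get(load("../external_data/GTEX_V6/gtexv6fit.k.20.master.rda"))
topics_theta <- GoM_output$theta

# Indices of the top 100 driving genes for each cluster (one row per cluster).
top_features <- ExtractTopFeatures(topics_theta, top_features = 100, method = "poisson", options = "min")

# Read the gene names and keep the first 15 characters (the Ensembl gene ID without a version suffix).
gene_names <- as.vector(as.matrix(read.table("../external_data/GTEX_V6/gene_names_GTEX_V6.txt")))
gene_names <- substring(gene_names, 1, 15)
xli <- gene_names
# Map each cluster's top-feature indices to gene IDs, giving one row of IDs per cluster.
gene_list <- do.call(rbind, lapply(seq_len(nrow(top_features)), function(x) gene_names[top_features[x, ]]))
write.table(gene_names, "../utilities/gene_names_all_gtex.txt", col.names = FALSE,
            row.names = FALSE, quote = FALSE)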
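
The index-to-name mapping above can be illustrated on toy data. The sketch below is not the GTEx analysis itself: the version suffixes, the index matrix `top_idx`, and the names `gene_names_raw`, `gene_ids`, and `gene_list_toy` are made up for illustration. It assumes only that the top-feature object is a matrix of row indices with one row per cluster, as the code above uses it.

```r
# Toy stand-ins (hypothetical values, not read from the GTEx files).
gene_names_raw <- c("ENSG00000245532.4", "ENSG00000115461.4",
                    "ENSG00000221978.7", "ENSG00000100650.11")

# An Ensembl gene ID is 15 characters ("ENSG" plus 11 digits),
# so substring() drops the ".N" version suffix.
gene_ids <- substring(gene_names_raw, 1, 15)

# One row per cluster; each entry is an index into gene_ids.
top_idx <- matrix(c(1, 3,
                    4, 2), nrow = 2, byrow = TRUE)

# Look up the gene IDs row by row and stack the rows into a character matrix.
gene_list_toy <- do.call(rbind, lapply(seq_len(nrow(top_idx)),
                                       function(k) gene_ids[top_idx[k, ]]))
print(gene_list_toy)
```

Row `k` of the result holds the gene IDs for cluster `k`, which is the form passed to the annotation query in the next section.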

Cluster 1 Annotations

out <- mygene::queryMany(gene_list[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
knitr::kable(as.data.frame(out))
symbol X_id summary query name notfound
NEAT1 283131 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. ENSG00000245532 nuclear paraspeckle assembly transcript 1 (non-protein coding) NA
IGFBP5 3488 NA ENSG00000115461 insulin like growth factor binding protein 5 NA
CCNL2 81669 The protein encoded by this gene belongs to the cyclin family. Through its interaction with several proteins, such as RNA polymerase II, splicing factors, and cyclin-dependent kinases, this protein functions as a regulator of the pre-mRNA splicing process, as well as in inducing apoptosis by modulating the expression of apoptotic and antiapoptotic proteins. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. ENSG00000221978 cyclin L2 NA
SRSF5 6430 The protein encoded by this gene is a member of the serine/arginine (SR)-rich family of pre-mRNA splicing factors, which constitute part of the spliceosome. Each of these factors contains an RNA recognition motif (RRM) for binding RNA and an RS domain for binding other proteins. The RS domain is rich in serine and arginine residues and facilitates interaction between different SR splicing factors. In addition to being critical for mRNA splicing, the SR proteins have also been shown to be involved in mRNA export from the nucleus and in translation. Alternative splicing results in multiple transcript variants. ENSG00000100650 serine/arginine-rich splicing factor 5 NA
PNISR 25957 NA ENSG00000132424 PNN-interacting serine/arginine-rich protein NA
SRRM2 23524 NA ENSG00000167978 serine/arginine repetitive matrix 2 NA
SNRNP70 6625 NA ENSG00000104852 small nuclear ribonucleoprotein U1 subunit 70 NA
MYO15B ENSG00000266714 NA ENSG00000266714 myosin XVB NA
RBM6 10180 NA ENSG00000004534 RNA binding motif protein 6 NA
CIRBP 1153 NA ENSG00000099622 cold inducible RNA binding protein NA
RBM39 9584 This gene encodes a member of the U2AF65 family of proteins. The encoded protein is found in the nucleus, where it co-localizes with core spliceosomal proteins. It has been shown to play a role in both steroid hormone receptor-mediated transcription and alternative splicing, and it is also a transcriptional coregulator of the viral oncoprotein v-Rel. Multiple transcript variants have been observed for this gene. A related pseudogene has been identified on chromosome X. ENSG00000131051 RNA binding motif protein 39 NA
ZNF83 55769 NA ENSG00000167766 zinc finger protein 83 NA
JUN 3725 This gene is the putative transforming gene of avian sarcoma virus 17. It encodes a protein which is highly similar to the viral protein, and which interacts directly with specific target DNA sequences to regulate gene expression. This gene is intronless and is mapped to 1p32-p31, a chromosomal region involved in both translocations and deletions in human malignancies. ENSG00000177606 jun proto-oncogene NA
NUMA1 4926 This gene encodes a large protein that forms a structural component of the nuclear matrix. The encoded protein interacts with microtubules and plays a role in the formation and organization of the mitotic spindle during cell division. Chromosomal translocation of this gene with the RARA (retinoic acid receptor, alpha) gene on chromosome 17 have been detected in patients with acute promyelocytic leukemia. Alternative splicing results in multiple transcript variants. ENSG00000137497 nuclear mitotic apparatus protein 1 NA
KAT2A 2648 KAT2A, or GCN5, is a histone acetyltransferase (HAT) that functions primarily as a transcriptional activator. It also functions as a repressor of NF-kappa-B (see MIM 164011) by promoting ubiquitination of the NF-kappa-B subunit RELA (MIM 164014) in a HAT-independent manner (Mao et al., 2009 [PubMed 19339690]). ENSG00000108773 lysine acetyltransferase 2A NA
TIA1 7072 The product encoded by this gene is a member of a RNA-binding protein family and possesses nucleolytic activity against cytotoxic lymphocyte (CTL) target cells. It has been suggested that this protein may be involved in the induction of apoptosis as it preferentially recognizes poly(A) homopolymers and induces DNA fragmentation in CTL targets. The major granule-associated species is a 15-kDa protein that is thought to be derived from the carboxyl terminus of the 40-kDa product by proteolytic processing. Alternative splicing resulting in different isoforms of this gene product has been described in the literature. ENSG00000116001 TIA1 cytotoxic granule-associated RNA binding protein NA
ATN1 1822 Dentatorubral pallidoluysian atrophy (DRPLA) is a rare neurodegenerative disorder characterized by cerebellar ataxia, myoclonic epilepsy, choreoathetosis, and dementia. The disorder is related to the expansion from 7-23 copies to 49-75 copies of a trinucleotide repeat (CAG/CAA) within this gene. The encoded protein includes a serine repeat and a region of alternating acidic and basic amino acids, as well as the variable glutamine repeat. Alternative splicing results in two transcripts variants that encode the same protein. ENSG00000111676 atrophin 1 NA
HP1BP3 50809 NA ENSG00000127483 heterochromatin protein 1 binding protein 3 NA
CLK1 1195 This gene encodes a member of the CDC2-like (or LAMMER) family of dual specificity protein kinases. In the nucleus, the encoded protein phosphorylates serine/arginine-rich proteins involved in pre-mRNA processing, releasing them into the nucleoplasm. The choice of splice sites during pre-mRNA processing may be regulated by the concentration of transacting factors, including serine/arginine rich proteins. Therefore, the encoded protein may play an indirect role in governing splice site selection. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000013441 CDC like kinase 1 NA
EEF1D 1936 This gene encodes a subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This subunit, delta, functions as guanine nucleotide exchange factor. It is reported that following HIV-1 infection, this subunit interacts with HIV-1 Tat. This interaction results in repression of translation of host cell proteins and enhanced translation of viral proteins. Several alternatively spliced transcript variants encoding multiple isoforms have been found for this gene. Related pseudogenes have been defined on chromosomes 1, 6, 7, 9, 11, 13, 17, 19. ENSG00000104529 eukaryotic translation elongation factor 1 delta NA
FAM160B2 64760 NA ENSG00000158863 family with sequence similarity 160 member B2 NA
GSTM2 2946 Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual’s susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. ENSG00000213366 glutathione S-transferase mu 2 (muscle) NA
PCED1A 64773 The protein encoded by this gene is a member of the GDSL/SGNH superfamily. Members of this family are hydrolytic enzymes with esterase and lipase activity and broad substrate specificity. This protein belongs to the Pmr5-Cas1p-esterase subfamily in that it contains the catalytic triad comprised of serine, aspartate and histidine and lacks two conserved regions (glycine after strand S2 and GxND motif). A pseudogene of this gene has been identified on the long arm of chromosome 2. Alternative splicing results in multiple transcript variants that encode different protein isoforms. ENSG00000132635 PC-esterase domain containing 1A NA
SULF2 55959 Heparan sulfate proteoglycans (HSPGs) act as coreceptors for numerous heparin-binding growth factors and cytokines and are involved in cell signaling. Heparan sulfate 6-O-endosulfatases, such as SULF2, selectively remove 6-O-sulfate groups from heparan sulfate. This activity modulates the effects of heparan sulfate by altering binding sites for signaling molecules (Dai et al., 2005 [PubMed 16192265]). ENSG00000196562 sulfatase 2 NA
ARRDC3 57561 NA ENSG00000113369 arrestin domain containing 3 NA
NFATC4 4776 This gene encodes a member of the nuclear factor of activated T cells (NFAT) protein family. The encoded protein is part of a DNA-binding transcription complex. This complex consists of at least two components: a preexisting cytosolic component that translocates to the nucleus upon T cell receptor stimulation and an inducible nuclear component. NFAT proteins are activated by the calmodulin-dependent phosphatase, calcineurin. The encoded protein plays a role in the inducible expression of cytokine genes in T cells, especially in the induction of interleukin-2 and interleukin-4. Alternative splicing results in multiple transcript variants. ENSG00000100968 nuclear factor of activated T-cells 4 NA
RSRP1 57035 NA ENSG00000117616 arginine/serine-rich protein 1 NA
AHSA2 130872 NA ENSG00000173209 AHA1, activator of heat shock 90kDa protein ATPase homolog 2 (yeast) NA
SF3B1 23451 This gene encodes subunit 1 of the splicing factor 3b protein complex. Splicing factor 3b, together with splicing factor 3a and a 12S RNA unit, forms the U2 small nuclear ribonucleoproteins complex (U2 snRNP). The splicing factor 3b/3a complex binds pre-mRNA upstream of the intron’s branch site in a sequence independent manner and may anchor the U2 snRNP to the pre-mRNA. Splicing factor 3b is also a component of the minor U12-type spliceosome. The carboxy-terminal two-thirds of subunit 1 have 22 non-identical, tandem HEAT repeats that form rod-like, helical structures. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000115524 splicing factor 3b subunit 1 NA
PNPLA7 375775 Human patatin-like phospholipases, such as PNPLA7, have been implicated in regulation of adipocyte differentiation and have been induced by metabolic stimuli (Wilson et al., 2006 [PubMed 16799181]). ENSG00000130653 patatin like phospholipase domain containing 7 NA
MTMR9LP ENSG00000220785 NA ENSG00000220785 myotubularin related protein 9-like, pseudogene NA
COL16A1 1307 This gene encodes the alpha chain of type XVI collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Members of this collagen family are found in association with fibril-forming collagens such as type I and II, and serve to maintain the integrity of the extracellular matrix. High levels of type XVI collagen have been found in fibroblasts and keratinocytes, and in smooth muscle and amnion. ENSG00000084636 collagen type XVI alpha 1 NA
HNRNPA1 3178 This gene encodes a member of a family of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs), which are RNA-binding proteins that associate with pre-mRNAs in the nucleus and influence pre-mRNA processing, as well as other aspects of mRNA metabolism and transport. The protein encoded by this gene is one of the most abundant core proteins of hnRNP complexes and plays a key role in the regulation of alternative splicing. Mutations in this gene have been observed in individuals with amyotrophic lateral sclerosis 20. Multiple alternatively spliced transcript variants have been found. There are numerous pseudogenes of this gene distributed throughout the genome. ENSG00000135486 heterogeneous nuclear ribonucleoprotein A1 NA
RBM5 10181 This gene is a candidate tumor suppressor gene which encodes a nuclear RNA binding protein that is a component of the spliceosome A complex. The encoded protein plays a role in the induction of cell cycle arrest and apoptosis through pre-mRNA splicing of multiple target genes including the tumor suppressor protein p53. This gene is located within the tumor suppressor region 3p21.3, and may play a role in the inhibition of tumor transformation and progression of several malignancies including lung cancer. ENSG00000003756 RNA binding motif protein 5 NA
ZMIZ1 57178 This gene encodes a member of the PIAS (protein inhibitor of activated STAT) family of proteins. The encoded protein regulates the activity of various transcription factors, including the androgen receptor, Smad3/4, and p53. The encoded protein may also play a role in sumoylation. A translocation between this locus on chromosome 10 and the protein tyrosine kinase ABL1 locus on chromosome 9 has been associated with acute lymphoblastic leukemia. ENSG00000108175 zinc finger MIZ-type containing 1 NA
WDR6 11180 This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. The encoded protein interacts with serine/threonine kinase 11, and is implicated in cell growth arrest. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000178252 WD repeat domain 6 NA
VEGFA 7422 This gene is a member of the PDGF/VEGF growth factor family. It encodes a heparin-binding protein, which exists as a disulfide-linked homodimer. This growth factor induces proliferation and migration of vascular endothelial cells, and is essential for both physiological and pathological angiogenesis. Disruption of this gene in mice resulted in abnormal embryonic blood vessel formation. This gene is upregulated in many known tumors and its expression is correlated with tumor stage and progression. Elevated levels of this protein are found in patients with POEMS syndrome, also known as Crow-Fukase syndrome. Allelic variants of this gene have been associated with microvascular complications of diabetes 1 (MVCD1) and atherosclerosis. Alternatively spliced transcript variants encoding different isoforms have been described. There is also evidence for alternative translation initiation from upstream non-AUG (CUG) codons resulting in additional isoforms. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is antiangiogenic. Expression of some isoforms derived from the AUG start codon is regulated by a small upstream open reading frame, which is located within an internal ribosome entry site. ENSG00000112715 vascular endothelial growth factor A NA
CREBZF 58487 NA ENSG00000137504 CREB/ATF bZIP transcription factor NA
FAM193B 54540 NA ENSG00000146067 family with sequence similarity 193 member B NA
MAN2C1 4123 NA ENSG00000140400 mannosidase alpha class 2C member 1 NA
D2HGDH 728294 This gene encodes D-2hydroxyglutarate dehydrogenase, a mitochondrial enzyme belonging to the FAD-binding oxidoreductase/transferase type 4 family. This enzyme, which is most active in liver and kidney but also active in heart and brain, converts D-2-hydroxyglutarate to 2-ketoglutarate. Mutations in this gene are present in D-2-hydroxyglutaric aciduria, a rare recessive neurometabolic disorder causing developmental delay, epilepsy, hypotonia, and dysmorphic features. ENSG00000180902 D-2-hydroxyglutarate dehydrogenase NA
SNHG5 ENSG00000203875 NA ENSG00000203875 small nucleolar RNA host gene 5 NA
PSMA3-AS1 379025 NA ENSG00000257621 PSMA3 antisense RNA 1 NA
LUC7L 55692 The LUC7L gene may represent a mammalian heterochromatic gene, encoding a putative RNA-binding protein similar to the yeast Luc7p subunit of the U1 snRNP splicing complex that is normally required for 5-prime splice site selection (Tufarelli et al., 2001 [PubMed 11170747]). ENSG00000007392 LUC7 like NA
NUPR1 26471 NA ENSG00000176046 nuclear protein 1, transcriptional regulator NA
LUC7L3 51747 This gene encodes a protein with an N-terminal half that contains cysteine/histidine motifs and leucine zipper-like repeats, and the C-terminal half is rich in arginine and glutamate residues (RE domain) and arginine and serine residues (RS domain). This protein localizes with a speckled pattern in the nucleus, and could be involved in the formation of splicesome via the RE and RS domains. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. ENSG00000108848 LUC7 like 3 pre-mRNA splicing factor NA
SOX4 6659 This intronless gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins, such as syndecan binding protein (syntenin). The protein may function in the apoptosis pathway leading to cell death as well as to tumorigenesis and may mediate downstream effects of parathyroid hormone (PTH) and PTH-related protein (PTHrP) in bone development. The solution structure has been resolved for the HMG-box of a similar mouse protein. ENSG00000124766 SRY-box 4 NA
NA NA NA ENSG00000256586 NA TRUE
HNRNPH1 3187 This gene encodes a member of a subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA binding proteins that complex with heterogeneous nuclear RNA. These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. While all of the hnRNPs are present in the nucleus, some may shuttle between the nucleus and the cytoplasm. The hnRNP proteins have distinct nucleic acid binding properties. The protein encoded by this gene has three repeats of quasi-RRM domains that bind to RNA and is very similar to the family member HNRPF. This gene may be associated with hereditary lymphedema type I. Alternatively spliced transcript variants have been described ENSG00000169045 heterogeneous nuclear ribonucleoprotein H1 (H) NA
JUND 3727 The protein encoded by this intronless gene is a member of the JUN family, and a functional component of the AP1 transcription factor complex. This protein has been proposed to protect cells from p53-dependent senescence and apoptosis. Alternative translation initiation site usage results in the production of different isoforms (PMID:12105216). ENSG00000130522 jun D proto-oncogene NA
GOLGA8A 23015 The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. ENSG00000175265 golgin A8 family member A NA
SRSF6 6431 The protein encoded by this gene is involved in mRNA splicing and may play a role in the determination of alternative splicing. The encoded nuclear protein belongs to the splicing factor SR family and has been shown to bind with and modulate another member of the family, SFRS12. Alternative splicing results in multiple transcript variants. In addition, two pseudogenes, one on chromosome 17 and the other on the X chromosome, have been found for this gene. ENSG00000124193 serine/arginine-rich splicing factor 6 NA
CCL21 6366 This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. Similar to other chemokines the protein encoded by this gene inhibits hemopoiesis and stimulates chemotaxis. This protein is chemotactic in vitro for thymocytes and activated T cells, but not for B cells, macrophages, or neutrophils. The cytokine encoded by this gene may also play a role in mediating homing of lymphocytes to secondary lymphoid organs. It is a high affinity functional ligand for chemokine receptor 7 that is expressed on T and B lymphocytes and a known receptor for another member of the cytokine family (small inducible cytokine A19). ENSG00000137077 C-C motif chemokine ligand 21 NA
NKTR 4820 This gene encodes a membrane-anchored protein with a hydrophobic amino terminal domain and a cyclophilin-like PPIase domain. It is present on the surface of natural killer cells and facilitates their binding to targets. Its expression is regulated by IL2 activation of the cells. ENSG00000114857 natural killer cell triggering receptor NA
AC074212.5 ENSG00000259605 NA ENSG00000259605 NA NA
NXF1 10482 This gene is one member of a family of nuclear RNA export factor genes. Common domain features of this family are a noncanonical RNP-type RNA-binding domain (RBD), 4 leucine-rich repeats (LRRs), a nuclear transport factor 2 (NTF2)-like domain that allows heterodimerization with NTF2-related export protein-1 (NXT1), and a ubiquitin-associated domain that mediates interactions with nucleoporins. The LRRs and NTF2-like domains are required for export activity. Alternative splicing seems to be a common mechanism in this gene family. The encoded protein of this gene shuttles between the nucleus and the cytoplasm and binds in vivo to poly(A)+ RNA. It is the vertebrate homologue of the yeast protein Mex67p. The encoded protein overcomes the mRNA export block caused by the presence of saturating amounts of CTE (constitutive transport element) RNA of type D retroviruses. Alternative splicing results in multiple transcript variants. ENSG00000162231 nuclear RNA export factor 1 NA
UCKL1 54963 The protein encoded by this gene is a uridine kinase. Uridine kinases catalyze the phosphorylation of uridine to uridine monophosphate. This protein has been shown to bind to Epstein-Barr nuclear antigen 3 as well as natural killer lytic-associated molecule. Ubiquitination of this protein is enhanced by the presence of natural killer lytic-associated molecule. In addition, protein levels decrease in the presence of natural killer lytic-associated molecule, suggesting that association with natural killer lytic-associated molecule results in ubiquitination and subsequent degradation of this protein. Alternative splicing results in multiple transcript variants. ENSG00000198276 uridine-cytidine kinase 1 like 1 NA
PCGF3 10336 The protein encoded by this gene contains a C3HC4 type RING finger, which is a motif known to be involved in protein-protein interactions. The specific function of this protein has not yet been determined. ENSG00000185619 polycomb group ring finger 3 NA
IGFBP4 3487 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. ENSG00000141753 insulin like growth factor binding protein 4 NA
EGR1 1958 The protein encoded by this gene belongs to the EGR family of C2H2-type zinc-finger proteins. It is a nuclear protein and functions as a transcriptional regulator. The products of target genes it activates are required for differentitation and mitogenesis. Studies suggest this is a cancer suppressor gene. ENSG00000120738 early growth response 1 NA
FNBP4 23360 NA ENSG00000109920 formin binding protein 4 NA
MSANTD2 79684 NA ENSG00000120458 Myb/SANT DNA binding domain containing 2 NA
NSUN5P1 155400 This locus represents a transcribed pseudogene of a nearby locus on chromosome 7, which encodes a putative methyltransferase. There is also a third closely related pseudogene locus in this region. Alternative splicing results in multiple transcript variants of this gene. ENSG00000223705 NOP2/Sun RNA methyltransferase family member 5 pseudogene 1 NA
HNRNPH3 3189 This gene belongs to the subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA binding proteins and they complex with heterogeneous nuclear RNA (hnRNA). These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. While all of the hnRNPs are present in the nucleus, some seem to shuttle between the nucleus and the cytoplasm. The hnRNP proteins have distinct nucleic acid binding properties. The protein encoded by this gene has two repeats of quasi-RRM domains that bind to RNAs. It is localized in nuclear bodies of the nucleus. This protein is involved in the splicing process and it also participates in early heat shock-induced splicing arrest by transiently leaving the hnRNP complexes. Several alternatively spliced transcript variants have been noted for this gene, however, not all are fully characterized. ENSG00000096746 heterogeneous nuclear ribonucleoprotein H3 NA
USP36 57602 This gene encodes a member of the peptidase C19 or ubiquitin-specific protease family of cysteine proteases. Members of this family remove ubiquitin molecules from polyubiquitinated proteins. The encoded protein may deubiquitinate and stabilize the transcription factor c-Myc, also known as MYC, an important oncoprotein known to be upregulated in most human cancers. The encoded protease may also regulate the activation of autophagy. This gene exhibits elevated expression in some breast and lung cancers. ENSG00000055483 ubiquitin specific peptidase 36 NA
ZFP36L2 678 This gene is a member of the TIS11 family of early response genes. Family members are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. The encoded protein contains a distinguishing putative zinc finger domain with a repeating cys-his motif. This putative nuclear transcription factor most likely functions in regulating the response to growth factors. ENSG00000152518 ZFP36 ring finger protein-like 2 NA
SNX1 6642 This gene encodes a member of the sorting nexin family. Members of this family contain a phox (PX) domain, which is a phosphoinositide binding domain, and are involved in intracellular trafficking. This endosomal protein regulates the cell-surface expression of epidermal growth factor receptor. This protein also has a role in sorting protease-activated receptor-1 from early endosomes to lysosomes. This protein may form oligomeric complexes with family members. This gene results in three transcript variants encoding distinct isoforms. ENSG00000028528 sorting nexin 1 NA
ROBO3 64221 This gene is a member of the Roundabout (ROBO) gene family that controls neurite outgrowth, growth cone guidance, and axon fasciculation. ROBO proteins are a subfamily of the immunoglobulin transmembrane receptor superfamily. SLIT proteins 1-3, a family of secreted chemorepellants, are ligands for ROBO proteins and SLIT/ROBO interactions regulate myogenesis, leukocyte migration, kidney morphogenesis, angiogenesis, and vasculogenesis in addition to neurogenesis. This gene, ROBO3, has a putative extracellular domain with five immunoglobulin (Ig)-like loops and three fibronectin (Fn) type III motifs, a transmembrane segment, and a cytoplasmic tail with three conserved signaling motifs: CC0, CC2, and CC3 (CC for conserved cytoplasmic). Unlike other ROBO family members, ROBO3 lacks motif CC1. The ROBO3 gene regulates axonal navigation at the ventral midline of the neural tube. In mouse, loss of Robo3 results in a complete failure of commissural axons to cross the midline throughout the spinal cord and the hindbrain. Mutations ROBO3 result in horizontal gaze palsy with progressive scoliosis (HGPPS); an autosomal recessive disorder characterized by congenital absence of horizontal gaze, progressive scoliosis, and failure of the corticospinal and somatosensory axon tracts to cross the midline in the medulla. Alternative transcript variants have been described but have not been experimentally validated. ENSG00000154134 roundabout guidance receptor 3 NA
GATAD1 57798 The protein encoded by this gene contains a zinc finger at the N-terminus, and is thought to bind to a histone modification site that regulates gene expression. Mutations in this gene have been associated with autosomal recessive dilated cardiomyopathy. Alternatively spliced transcript variants have been found for this gene. ENSG00000157259 GATA zinc finger domain containing 1 NA
N4BP2L2 10443 NA ENSG00000244754 NEDD4 binding protein 2-like 2 NA
TTC17 55761 NA ENSG00000052841 tetratricopeptide repeat domain 17 NA
SH3BP5-AS1 100505696 NA ENSG00000224660 SH3BP5 antisense RNA 1 NA
KLF3-AS1 79667 NA ENSG00000231160 KLF3 antisense RNA 1 NA
CLK2 1196 This gene encodes a dual specificity protein kinase that phosphorylates serine/threonine and tyrosine-containing substrates. Activity of this protein regulates serine- and arginine-rich (SR) proteins of the spliceosomal complex, thereby influencing alternative transcript splicing. Chromosomal translocations have been characterized between this locus and the PAFAH1B3 (platelet-activating factor acetylhydrolase 1b, catalytic subunit 3 (29kDa)) gene on chromosome 19, resulting in the production of a fusion protein. Note that this gene is distinct from the TELO2 gene (GeneID:9894), which shares the CLK2 alias, but encodes a protein that is involved in telomere length regulation. There is a pseudogene for this gene on chromosome 7. Alternative splicing results in multiple transcript variants. ENSG00000176444 CDC like kinase 2 NA
LOC102724814 102724814 NA ENSG00000258727 uncharacterized LOC102724814 NA
SMAD3 4088 The protein encoded by this gene belongs to the SMAD, a family of proteins similar to the gene products of the Drosophila gene ‘mothers against decapentaplegic’ (Mad) and the C. elegans gene Sma. SMAD proteins are signal transducers and transcriptional modulators that mediate multiple signaling pathways. This protein functions as a transcriptional modulator activated by transforming growth factor-beta and is thought to play a role in the regulation of carcinogenesis. ENSG00000166949 SMAD family member 3 NA
PRPF3 9129 The removal of introns from nuclear pre-mRNAs occurs on complexes called spliceosomes, which are made up of 4 small nuclear ribonucleoprotein (snRNP) particles and an undefined number of transiently associated splicing factors. This gene product is one of several proteins that associate with U4 and U6 snRNPs. Mutations in this gene are associated with retinitis pigmentosa-18. ENSG00000117360 pre-mRNA processing factor 3 NA
TBL1XR1 79718 This gene is a member of the WD40 repeat-containing gene family and shares sequence similarity with transducin (beta)-like 1X-linked (TBL1X). The protein encoded by this gene is thought to be a component of both nuclear receptor corepressor (N-CoR) and histone deacetylase 3 (HDAC 3) complexes, and is required for transcriptional activation by a variety of transcription factors. Mutations in this gene have been associated with some autism spectrum disorders, and one finding suggests that haploinsufficiency of this gene may be a cause of intellectual disability with dysmorphism. Mutations in this gene as well as recurrent translocations involving this gene have also been observed in some tumors. ENSG00000177565 transducin (beta)-like 1 X-linked receptor 1 NA
ARGLU1 55082 NA ENSG00000134884 arginine and glutamate rich 1 NA
PATZ1 23598 The protein encoded by this gene contains an A-T hook DNA binding motif which usually binds to other DNA binding structures to play an important role in chromatin modeling and transcription regulation. Its POZ domain is thought to function as a site for protein-protein interaction and is required for transcriptional repression, and the zinc-fingers comprise the DNA binding domain. Since the encoded protein has typical features of a transcription factor, it is postulated to be a repressor of gene expression. In small round cell sarcoma, this gene is fused to EWS by a small inversion of 22q, then the hybrid is thought to be translocated (t(1;22)(p36.1;q12)). The rearrangement of chromosome 22 involves intron 8 of EWS and exon 1 of this gene creating a chimeric sequence containing the transactivation domain of EWS fused to zinc finger domain of this protein. This is a distinct example of an intra-chromosomal rearrangement of chromosome 22. Four alternatively spliced transcript variants are described for this gene. ENSG00000100105 POZ/BTB and AT hook containing zinc finger 1 NA
LENG8 114823 NA ENSG00000167615 leukocyte receptor cluster (LRC) member 8 NA
CHD3 1107 This gene encodes a member of the CHD family of proteins which are characterized by the presence of chromo (chromatin organization modifier) domains and SNF2-related helicase/ATPase domains. This protein is one of the components of a histone deacetylase complex referred to as the Mi-2/NuRD complex which participates in the remodeling of chromatin by deacetylating histones. Chromatin remodeling is essential for many processes including transcription. Autoantibodies against this protein are found in a subset of patients with dermatomyositis. Three alternatively spliced transcripts encoding different isoforms have been described. ENSG00000170004 chromodomain helicase DNA binding protein 3 NA
SLC7A8 23428 NA ENSG00000092068 solute carrier family 7 member 8 NA
POGZ 23126 The protein encoded by this gene appears to be a zinc finger protein containing a transposase domain at the C-terminus. This protein was found to interact with the transcription factor SP1 in a yeast two-hybrid system. Alternatively spliced transcript variants encoding distinct isoforms have been observed. ENSG00000143442 pogo transposable element with ZNF domain NA
TAF1C 9013 Initiation of transcription by RNA polymerase I requires the formation of a complex composed of the TATA-binding protein (TBP) and three TBP-associated factors (TAFs) specific for RNA polymerase I. This complex, known as SL1, binds to the core promoter of ribosomal RNA genes to position the polymerase properly and acts as a channel for regulatory signals. This gene encodes the largest SL1-specific TAF. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000103168 TATA-box binding protein associated factor, RNA polymerase I subunit C NA
KIAA0907 22889 NA ENSG00000132680 KIAA0907 NA
SNHG7 84973 NA ENSG00000233016 small nucleolar RNA host gene 7 NA
NA NA NA ENSG00000215513 NA TRUE
ZBED5 58486 This gene is unusual in that its coding sequence is mostly derived from Charlie-like DNA transposon; however, it does not appear to be an active DNA transposon as it is not flanked by terminal inverted repeats. The encoded protein is conserved among the mammalian Laurasiatheria branch. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000236287 zinc finger BED-type containing 5 NA
LOC150776 150776 NA ENSG00000152117 sphingomyelin phosphodiesterase 4, neutral membrane (neutral sphingomyelinase-3) pseudogene NA
ZNF266 10781 This gene encodes a protein containing many tandem zinc-finger motifs. Zinc fingers are protein or nucleic acid-binding domains, and may be involved in a variety of functions, including regulation of transcription. This gene is located in a cluster of similar genes encoding zinc finger proteins on chromosome 19. Alternative splicing results in multiple transcript variants for this gene. ENSG00000174652 zinc finger protein 266 NA
AC007563.5 ENSG00000236886 NA ENSG00000236886 NA NA
SETD5 55209 The function of this gene has yet to be determined, but mutations in this gene have been associated with autosomal dominant mental retardation-23. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000168137 SET domain containing 5 NA
TP73-AS1 57212 NA ENSG00000227372 TP73 antisense RNA 1 NA
EIF3L 51386 NA ENSG00000100129 eukaryotic translation initiation factor 3 subunit L NA
HEXDC 284004 NA ENSG00000169660 hexosaminidase D NA
LINC01089 338799 NA ENSG00000212694 long intergenic non-protein coding RNA 1089 NA
AMT 275 This gene encodes one of four critical components of the glycine cleavage system. Mutations in this gene have been associated with glycine encephalopathy. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000145020 aminomethyltransferase NA
CSGALNACT1 55790 NA ENSG00000147408 chondroitin sulfate N-acetylgalactosaminyltransferase 1 NA
SIX5 147912 The protein encoded by this gene is a homeodomain-containing transcription factor that appears to function in the regulation of organogenesis. This gene is located downstream of the dystrophia myotonica-protein kinase gene. Mutations in this gene are a cause of branchiootorenal syndrome type 2. ENSG00000177045 SIX homeobox 5 NA
# Save the Ensembl IDs queried for cluster 1, one per line, for downstream use.
write.table(as.factor(out$query), paste0("../utilities/gene_names_clus_", 1, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
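
The exported `out$query` column contains every queried ID, including IDs with no match (the `notfound` row for ENSG00000215513 above) and IDs that are repeated when one query returns several hits (for example ENSG00000160014 in the Cluster 2 table). The following is a minimal sketch, not part of the original analysis, of how such rows could be filtered before writing. The helper `clean_query_ids`, the toy table `out_toy`, and the temporary output file are hypothetical names used only for illustration. Whether unmatched or repeated IDs should actually be dropped depends on the downstream tool, so treat this as one option under that assumption.

```r
# Toy stand-in for the annotation table `out`; rows are copied from the
# Cluster 1 and Cluster 2 outputs purely for illustration.
out_toy <- data.frame(
  query    = c("ENSG00000132680", "ENSG00000215513",
               "ENSG00000160014", "ENSG00000160014"),
  symbol   = c("KIAA0907", NA, "CALM3", "CALM2"),
  notfound = c(NA, TRUE, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only matched query IDs, one entry per ID.
clean_query_ids <- function(df) {
  if ("notfound" %in% names(df)) {
    # A row counts as matched when `notfound` is NA or FALSE.
    matched <- is.na(df$notfound) | !df$notfound
  } else {
    # Some results carry no `notfound` column; treat every row as matched.
    matched <- rep(TRUE, nrow(df))
  }
  unique(df$query[matched])
}

ids <- clean_query_ids(out_toy)

# Write to a temporary file here so the sketch does not touch ../utilities.
out_file <- file.path(tempdir(), "gene_names_clus_toy.txt")
write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
```

In the real script, the same helper could be applied to `as.data.frame(out)` before the `write.table` call for each cluster.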

Cluster 2 Annotations

out <- mygene::queryMany(gene_list[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol summary X_id name
ENSG00000132639 SNAP25 Synaptic vesicle membrane docking and fusion is mediated by SNAREs (soluble N-ethylmaleimide-sensitive factor attachment protein receptors) located on the vesicle membrane (v-SNAREs) and the target membrane (t-SNAREs). The assembled v-SNARE/t-SNARE complex consists of a bundle of four helices, one of which is supplied by v-SNARE and the other three by t-SNARE. For t-SNAREs on the plasma membrane, the protein syntaxin supplies one helix and the protein encoded by this gene contributes the other two. Therefore, this gene product is a presynaptic plasma membrane protein involved in the regulation of neurotransmitter release. Two alternative transcript variants encoding different protein isoforms have been described for this gene. 6616 synaptosome associated protein 25kDa
ENSG00000127585 FBXL16 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 146330 F-box and leucine-rich repeat protein 16
ENSG00000020129 NCDN This gene encodes a leucine-rich cytoplasmic protein, which is highly similar to a mouse protein that negatively regulates Ca/calmodulin-dependent protein kinase II phosphorylation and may be essential for spatial learning processes. Several alternatively spliced transcript variants of this gene have been described. 23154 neurochondrin
ENSG00000104888 SLC17A7 The protein encoded by this gene is a vesicle-bound, sodium-dependent phosphate transporter that is specifically expressed in the neuron-rich regions of the brain. It is preferentially associated with the membranes of synaptic vesicles and functions in glutamate transport. The protein shares 82% identity with the differentiation-associated Na-dependent inorganic phosphate cotransporter and they appear to form a distinct class within the Na+/Pi cotransporter family. 57030 solute carrier family 17 member 7
ENSG00000124507 PACSIN1 NA 29993 protein kinase C and casein kinase substrate in neurons 1
ENSG00000074317 SNCB This gene encodes a member of a small family of proteins that inhibit phospholipase D2 and may function in neuronal plasticity. The encoded protein is abundant in lesions of patients with Alzheimer disease. A mutation in this gene was found in individuals with dementia with Lewy bodies. Alternative splicing results in multiple transcript variants. 6620 synuclein beta
ENSG00000155980 KIF5A This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. 3798 kinesin family member 5A
ENSG00000160014 CALM3 NA 808 calmodulin 3 (phosphorylase kinase, delta)
ENSG00000160014 CALM2 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. 805 calmodulin 2 (phosphorylase kinase, delta)
ENSG00000106976 DNM1 This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. 1759 dynamin 1
ENSG00000198668 CALM1 This gene encodes a member of the EF-hand calcium-binding protein family. It is one of three genes which encode an identical calcium binding protein which is one of the four subunits of phosphorylase kinase. Two pseudogenes have been identified on chromosome 7 and X. Multiple transcript variants encoding different isoforms have been found for this gene. 801 calmodulin 1 (phosphorylase kinase, delta)
ENSG00000198668 CALM2 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. 805 calmodulin 2 (phosphorylase kinase, delta)
ENSG00000128656 CHN1 This gene encodes GTPase-activating protein for ras-related p21-rac and a phorbol ester receptor. It is predominantly expressed in neurons, and plays an important role in neuronal signal-transduction mechanisms. Mutations in this gene are associated with Duane’s retraction syndrome 2 (DURS2). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 1123 chimerin 1
ENSG00000136854 STXBP1 This gene encodes a syntaxin-binding protein. The encoded protein appears to play a role in release of neurotransmitters via regulation of syntaxin, a transmembrane attachment protein receptor. Mutations in this gene have been associated with infantile epileptic encephalopathy-4. Alternatively spliced transcript variants have been described. 6812 syntaxin binding protein 1
ENSG00000105696 TMEM59L This gene encodes a predicted type-I membrane glycoprotein. The encoded protein may play a role in functioning of the central nervous system. 25789 transmembrane protein 59 like
ENSG00000111674 ENO2 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme, a homodimer, is found in mature neurons and cells of neuronal origin. A switch from alpha enolase to gamma enolase occurs in neural tissue during development in rats and primates. 2026 enolase 2 (gamma, neuronal)
ENSG00000139970 RTN1 This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. This gene is considered to be a specific marker for neurological diseases and cancer, and is a potential molecular target for therapy. Alternative splicing results in multiple transcript variants. 6252 reticulon 1
ENSG00000168490 PHYHIP NA 9796 phytanoyl-CoA 2-hydroxylase interacting protein
ENSG00000163032 VSNL1 This gene is a member of the visinin/recoverin subfamily of neuronal calcium sensor proteins. The encoded protein is strongly expressed in granule cells of the cerebellum where it associates with membranes in a calcium-dependent manner and modulates intracellular signaling pathways of the central nervous system by directly or indirectly regulating the activity of adenylyl cyclase. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. 7447 visinin like 1
ENSG00000099365 STX1B The protein encoded by this gene belongs to a family of proteins thought to play a role in the exocytosis of synaptic vesicles. Vesicle exocytosis releases vesicular contents and is important to various cellular functions. For instance, the secretion of transmitters from neurons plays an important role in synaptic transmission. After exocytosis, the membrane and proteins from the vesicle are retrieved from the plasma membrane through the process of endocytosis. Mutations in this gene have been identified as one cause of fever-associated epilepsy syndromes. A possible link between this gene and Parkinson’s disease has also been suggested. 112755 syntaxin 1B
ENSG00000125814 NAPB NA 63908 NSF attachment protein beta
ENSG00000154146 NRGN Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. Exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlie many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. 4900 neurogranin
ENSG00000104435 STMN2 This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. 11075 stathmin 2
ENSG00000132535 DLG4 This gene encodes a member of the membrane-associated guanylate kinase (MAGUK) family. It heteromultimerizes with another MAGUK protein, DLG2, and is recruited into NMDA receptor and potassium channel clusters. These two MAGUK proteins may interact at postsynaptic sites to form a multimeric scaffold for the clustering of receptors, ion channels, and associated signaling proteins. Multiple transcript variants encoding different isoforms have been found for this gene. 1742 discs large homolog 4
ENSG00000188191 PRKAR1B The protein encoded by this gene is a regulatory subunit of cyclic AMP-dependent protein kinase A (PKA), which is involved in the signaling pathway of the second messenger cAMP. Two regulatory and two catalytic subunits form the PKA holoenzyme, which disbands after cAMP binding. The holoenzyme is involved in many cellular events, including ion transport, metabolism, and transcription. Several transcript variants encoding the same protein have been found for this gene. 5575 protein kinase cAMP-dependent type I regulatory subunit beta
ENSG00000008735 MAPK8IP2 The protein encoded by this gene is closely related to MAPK8IP1/IB1/JIP-1, a scaffold protein that is involved in the c-Jun amino-terminal kinase signaling pathway. This protein is expressed in brain and pancreatic cells. It has been shown to interact with, and regulate the activity of MAPK8/JNK1, and MAP2K7/MKK7 kinases. This protein thus is thought to function as a regulator of signal transduction by protein kinase cascade in brain and pancreatic beta-cells. 23542 mitogen-activated protein kinase 8 interacting protein 2
ENSG00000100321 SYNGR1 This gene encodes an integral membrane protein associated with presynaptic vesicles in neuronal cells. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it functions in synaptic plasticity without being required for synaptic transmission. The gene product belongs to the synaptogyrin gene family. Three alternatively spliced variants encoding three different isoforms have been identified. 9145 synaptogyrin 1
ENSG00000063180 CA11 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA XI is likely a secreted protein, however, radical changes at active site residues completely conserved in CA isozymes with catalytic activity, make it unlikely that it has carbonic anhydrase activity. It shares properties in common with two other acatalytic CA isoforms, CA VIII and CA X. CA XI is most abundantly expressed in brain, and may play a general role in the central nervous system. 770 carbonic anhydrase 11
ENSG00000110076 NRXN2 This gene encodes a member of the neurexin gene family. The products of these genes function as cell adhesion molecules and receptors in the vertebrate nervous system. These genes utilize two promoters. The majority of transcripts are produced from the upstream promoter and encode alpha-neurexin isoforms while a smaller number of transcripts are produced from the downstream promoter and encode beta-neurexin isoforms. The alpha-neurexins contain epidermal growth factor-like (EGF-like) sequences and laminin G domains, and have been shown to interact with neurexophilins. The beta-neurexins lack EGF-like sequences and contain fewer laminin G domains than alpha-neurexins. Alternative splicing and the use of alternative promoters may generate thousands of transcript variants (PMID: 12036300, PMID: 11944992). 9379 neurexin 2
ENSG00000159164 SV2A NA 9900 synaptic vesicle glycoprotein 2A
ENSG00000100505 TRIM9 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein localizes to cytoplasmic bodies. Its function has not been identified. Alternate splicing of this gene generates two transcript variants encoding different isoforms. 114088 tripartite motif containing 9
ENSG00000198794 SCAMP5 NA 192683 secretory carrier membrane protein 5
ENSG00000138814 PPP3CA NA 5530 protein phosphatase 3 catalytic subunit alpha
ENSG00000171617 ENC1 This gene encodes a member of the kelch-related family of actin-binding proteins. The encoded protein plays a role in the oxidative stress response as a regulator of the transcription factor Nrf2, and expression of this gene may play a role in malignant transformation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 8507 ectodermal-neural cortex 1
ENSG00000197457 STMN3 This gene encodes a protein which is a member of the stathmin protein family. Members of this protein family form a complex with tubulins at a ratio of 2 tubulins for each stathmin protein. Microtubules require the ordered assembly of alpha- and beta-tubulins, and formation of a complex with stathmin disrupts microtubule formation and function. A pseudogene of this gene is located on chromosome 22. Alternative splicing results in multiple transcript variants. 50861 stathmin 3
ENSG00000088899 LZTS3 NA 9762 leucine zipper, putative tumor suppressor family member 3
ENSG00000105649 RAB3A NA 5864 RAB3A, member RAS oncogene family
ENSG00000092096 SLC22A17 NA 51310 solute carrier family 22 member 17
ENSG00000184524 CEND1 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. 51286 cell cycle exit and neuronal differentiation 1
ENSG00000168993 CPLX1 Proteins encoded by the complexin/synaphin gene family are cytosolic proteins that function in synaptic vesicle exocytosis. These proteins bind syntaxin, part of the SNAP receptor. The protein product of this gene binds to the SNAP receptor complex and disrupts it, allowing transmitter release. 10815 complexin 1
ENSG00000112139 MDGA1 NA 266727 MAM domain containing glycosylphosphatidylinositol anchor 1
ENSG00000154277 UCHL1 The protein encoded by this gene belongs to the peptidase C12 family. This enzyme is a thiol protease that hydrolyzes a peptide bond at the C-terminal glycine of ubiquitin. This gene is specifically expressed in the neurons and in cells of the diffuse neuroendocrine system. Mutations in this gene may be associated with Parkinson disease. 7345 ubiquitin C-terminal hydrolase L1
ENSG00000108309 RUNDC3A NA 10900 RUN domain containing 3A
ENSG00000143847 PPFIA4 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. 8497 PTPRF interacting protein alpha 4
ENSG00000107130 NCS1 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. 23413 neuronal calcium sensor 1
ENSG00000166963 MAP1A This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. 4130 microtubule associated protein 1A
ENSG00000058404 CAMK2B The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. 816 calcium/calmodulin dependent protein kinase II beta
ENSG00000117016 RIMS3 NA 9783 regulating synaptic membrane exocytosis 3
ENSG00000221890 NPTXR This gene encodes a protein similar to the rat neuronal pentraxin receptor. The rat pentraxin receptor is an integral membrane protein that is thought to mediate neuronal uptake of the snake venom toxin, taipoxin, and its transport into the synapses. Studies in rat indicate that translation of this mRNA initiates at a non-AUG (CUG) codon. This may also be true for mouse and human, based on strong sequence conservation amongst these species. 23467 neuronal pentraxin receptor
ENSG00000139200 PIANP This gene encodes a ligand for the paired immunoglobin-like type 2 receptor alpha, and so may be involved in immune regulation. Alternate splicing results in multiple transcript variants encoding different proteins. 196500 PILR alpha associated neural protein
ENSG00000104833 TUBB4A This gene encodes a member of the beta tubulin family. Beta tubulins are one of two core protein families (alpha and beta tubulins) that heterodimerize and assemble to form microtubules. Mutations in this gene cause hypomyelinating leukodystrophy-6 and autosomal dominant torsion dystonia-4. Alternate splicing results in multiple transcript variants encoding different isoforms. A pseudogene of this gene is found on chromosome X. 10382 tubulin beta 4A class IVa
ENSG00000167371 PRRT2 This gene encodes a transmembrane protein containing a proline-rich domain in its N-terminal half. Studies in mice suggest that it is predominantly expressed in brain and spinal cord in embryonic and postnatal stages. Mutations in this gene are associated with episodic kinesigenic dyskinesia-1. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 112476 proline rich transmembrane protein 2
ENSG00000160469 BRSK1 NA 84446 BR serine/threonine kinase 1
ENSG00000059915 PSD This gene encodes a pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. 5662 pleckstrin and Sec7 domain containing
ENSG00000127561 SYNGR3 This gene encodes an integral membrane protein. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it is a synaptic vesicle protein that also interacts with the dopamine transporter. The gene product belongs to the synaptogyrin gene family. 9143 synaptogyrin 3
ENSG00000073969 NSF NA 4905 N-ethylmaleimide sensitive factor
ENSG00000131771 PPP1R1B This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. 84152 protein phosphatase 1 regulatory inhibitor subunit 1B
ENSG00000139899 CBLN3 Members of the precerebellin family, such as CBLN3, contain a cerebellin motif (see CBLN1; MIM 600432) and a C-terminal C1q signature domain (see MIM 120550) that mediates trimeric assembly of atypical collagen complexes. However, precerebellins do not contain a collagen motif, suggesting that they are not conventional components of the extracellular matrix (Pang et al., 2000 [PubMed 10964938]). 643866 cerebellin 3 precursor
ENSG00000145362 ANK2 This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. 287 ankyrin 2, neuronal
ENSG00000204681 GABBR1 This gene encodes a receptor for gamma-aminobutyric acid (GABA), which is the main inhibitory neurotransmitter in the mammalian central nervous system. This receptor functions as a heterodimer with GABA(B) receptor 2. Defects in this gene may underlie brain disorders such as schizophrenia and epilepsy. Alternative splicing generates multiple transcript variants, but the full-length nature of some of these variants has not been determined. 2550 gamma-aminobutyric acid type B receptor subunit 1
ENSG00000101298 SNPH Syntaxin-1, synaptobrevin/VAMP, and SNAP25 interact to form the SNARE complex, which is required for synaptic vesicle docking and fusion. The protein encoded by this gene is membrane-associated and inhibits SNARE complex formation by binding free syntaxin-1. Expression of this gene appears to be brain-specific. Alternative splicing results in multiple transcript variants encoding different isoforms. 9751 syntaphilin
ENSG00000132563 REEP2 This gene encodes a member of the receptor expression enhancing protein family. Studies of a related gene in mouse suggest that the encoded protein is found in the cell membrane and enhances the function of sweet taste receptors. Alternative splicing results in multiple transcript variants. 51308 receptor accessory protein 2
ENSG00000156011 PSD3 NA 23362 pleckstrin and Sec7 domain containing 3
ENSG00000152154 TMEM178A NA 130733 transmembrane protein 178A
ENSG00000160460 SPTBN4 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein localizes to the nuclear matrix, PML nuclear bodies, and cytoplasmic vesicles. A highly similar gene in the mouse is required for localization of specific membrane proteins in polarized regions of neurons. Multiple transcript variants encoding different isoforms have been found for this gene. 57731 spectrin beta, non-erythrocytic 4
ENSG00000008710 PKD1 This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. 5310 polycystin 1, transient receptor potential channel interacting
ENSG00000084731 KIF3C NA 3797 kinesin family member 3C
ENSG00000008277 ADAM22 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. Unlike other members of the ADAM protein family, the protein encoded by this gene lacks metalloprotease activity since it has no zinc-binding motif. This gene is highly expressed in the brain and may function as an integrin ligand in the brain. In mice, it has been shown to be essential for correct myelination in the peripheral nervous system. Alternative splicing results in several transcript variants. 53616 ADAM metallopeptidase domain 22
ENSG00000105270 CLIP3 This gene encodes a member of the cytoplasmic linker protein 170 family. Members of this protein family contain a cytoskeleton-associated protein glycine-rich domain and mediate the interaction of microtubules with cellular organelles. The encoded protein plays a role in T cell apoptosis by facilitating the association of tubulin and the lipid raft ganglioside GD3. The encoded protein also functions as a scaffold protein mediating membrane localization of phosphorylated protein kinase B. Alternatively spliced transcript variants have been observed for this gene. 25999 CAP-Gly domain containing linker protein 3
ENSG00000179456 ZBTB18 This gene encodes a C2H2-type zinc finger protein which acts as a transcriptional repressor of genes involved in neuronal development. The encoded protein recognizes a specific sequence motif and recruits components of chromatin to target genes. Alternative splicing results in multiple transcript variants. 10472 zinc finger and BTB domain containing 18
ENSG00000107742 SPOCK2 This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. 9806 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2
ENSG00000135709 KIAA0513 NA 9764 KIAA0513
ENSG00000197535 MYO5A This gene is one of three myosin V heavy-chain genes, belonging to the myosin gene superfamily. Myosin V is a class of actin-based motor proteins involved in cytoplasmic vesicle transport and anchorage, spindle-pole alignment and mRNA translocation. The protein encoded by this gene is abundant in melanocytes and nerve cells. Mutations in this gene cause Griscelli syndrome type-1 (GS1), Griscelli syndrome type-3 (GS3) and neuroectodermal melanolysosomal disease, or Elejalde disease. Multiple alternatively spliced transcript variants encoding different isoforms have been reported, but the full-length nature of some variants has not been determined. 4644 myosin VA
ENSG00000187189 TSPYL4 NA 23270 TSPY-like 4
ENSG00000109107 ALDOC This gene encodes a member of the class I fructose-biphosphate aldolase gene family. Expressed specifically in the hippocampus and Purkinje cells of the brain, the encoded protein is a glycolytic enzyme that catalyzes the reversible aldol cleavage of fructose-1,6-biphosphate and fructose 1-phosphate to dihydroxyacetone phosphate and either glyceraldehyde-3-phosphate or glyceraldehyde, respectively. 230 aldolase, fructose-bisphosphate C
ENSG00000247556 OIP5-AS1 NA ENSG00000247556 OIP5 antisense RNA 1
ENSG00000178531 CTXN1 NA 404217 cortexin 1
ENSG00000128482 RNF112 This gene encodes a member of the RING finger protein family of transcription factors. The protein is primarily expressed in brain. The gene is located within the Smith-Magenis syndrome region on chromosome 17. 7732 ring finger protein 112
ENSG00000129244 ATP1B2 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 2 subunit. Two transcript variants encoding different isoforms have been found for this gene. 482 ATPase Na+/K+ transporting subunit beta 2
ENSG00000125648 SLC25A23 NA 79085 solute carrier family 25 member 23
ENSG00000135439 AGAP2 The protein encoded by this gene belongs to the centaurin gamma-like family. It mediates anti-apoptotic effects of nerve growth factor by activating nuclear phosphoinositide 3-kinase. It is overexpressed in cancer cells, and promotes cancer cell invasion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 116986 ArfGAP with GTPase domain, ankyrin repeat and PH domain 2
ENSG00000227051 C14orf132 NA ENSG00000227051 chromosome 14 open reading frame 132
ENSG00000109472 CPE This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. 1363 carboxypeptidase E
ENSG00000171867 PRNP The protein encoded by this gene is a membrane glycosylphosphatidylinositol-anchored glycoprotein that tends to aggregate into rod-like structures. The encoded protein contains a highly unstable region of five tandem octapeptide repeats. This gene is found on chromosome 20, approximately 20 kbp upstream of a gene which encodes a biochemically and structurally similar protein to the one encoded by this gene. Mutations in the repeat region as well as elsewhere in this gene have been associated with Creutzfeldt-Jakob disease, fatal familial insomnia, Gerstmann-Straussler disease, Huntington disease-like 1, and kuru. An overlapping open reading frame has been found for this gene that encodes a smaller, structurally unrelated protein, AltPrp. Alternative splicing results in multiple transcript variants. 5621 prion protein
ENSG00000131584 ACAP3 NA 116983 ArfGAP with coiled-coil, ankyrin repeat and PH domains 3
ENSG00000184702 SEPT5 This gene is a member of the septin gene family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. This gene is mapped to 22q11, the region frequently deleted in DiGeorge and velocardiofacial syndromes. A translocation involving the MLL gene and this gene has also been reported in patients with acute myeloid leukemia. Alternative splicing results in multiple transcript variants. The presence of a non-consensus polyA signal (AACAAT) in this gene also results in read-through transcription into the downstream neighboring gene (GP1BB; platelet glycoprotein Ib), whereby larger, non-coding transcripts are produced. 5413 septin 5
ENSG00000108797 CNTNAP1 The gene product was initially identified as a 190-kD protein associated with the contactin-PTPRZ1 complex. The 1,384-amino acid protein, also designated p190 or CASPR for ‘contactin-associated protein,’ includes an extracellular domain with several putative protein-protein interaction domains, a putative transmembrane domain, and a 74-amino acid cytoplasmic domain. Northern blot analysis showed that the gene is transcribed predominantly in brain as a transcript of 6.2 kb, with weak expression in several other tissues tested. The architecture of its extracellular domain is similar to that of neurexins, and this protein may be the signaling subunit of contactin, enabling recruitment and activation of intracellular signaling pathways in neurons. 8506 contactin associated protein 1
ENSG00000165802 NSMF The protein encoded by this gene is involved in guidance of olfactory axon projections and migration of luteinizing hormone-releasing hormone neurons. Defects in this gene are a cause of idiopathic hypogonadotropic hypogonadism (IHH). Several transcript variants encoding different isoforms have been found for this gene. 26012 NMDA receptor synaptonuclear signaling and neuronal migration factor
ENSG00000130758 MAP3K10 The protein encoded by this gene is a member of the serine/threonine kinase family. This kinase has been shown to activate MAPK8/JNK and MKK4/SEK1, and this kinase itself can be phosphorylated, and thus activated by JNK kinases. This kinase functions preferentially on the JNK signaling pathway, and is reported to be involved in nerve growth factor (NGF) induced neuronal apoptosis. 4294 mitogen-activated protein kinase kinase kinase 10
ENSG00000139182 CLSTN3 NA 9746 calsyntenin 3
ENSG00000171130 ATP6V0E2 Multisubunit vacuolar-type proton pumps, or H(+)-ATPases, acidify various intracellular compartments, such as vacuoles, clathrin-coated and synaptic vesicles, endosomes, lysosomes, and chromaffin granules. H(+)-ATPases are also found in plasma membranes of specialized cells, where they play roles in urinary acidification, bone resorption, and sperm maturation. Multiple subunits form H(+)-ATPases, with proteins of the V1 class hydrolyzing ATP for energy to transport H+, and proteins of the V0 class forming an integral membrane domain through which H+ is transported. ATP6V0E2 encodes an isoform of the H(+)-ATPase V0 e subunit, an essential proton pump component (Blake-Palmer et al., 2007 [PubMed 17350184]). 155066 ATPase H+ transporting V0 subunit e2
ENSG00000105662 CRTC1 NA 23373 CREB regulated transcription coactivator 1
ENSG00000132879 FBXO44 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class. It is also a member of the NFB42 (neural F Box 42 kDa) family, similar to F-box only protein 2 and F-box only protein 6. Several alternatively spliced transcript variants encoding two distinct isoforms have been found for this gene. 93611 F-box protein 44
ENSG00000198825 INPP5F The protein encoded by this gene is an inositol 1,4,5-trisphosphate (InsP3) 5-phosphatase and contains a Sac domain. The activity of this protein is specific for phosphatidylinositol 4,5-bisphosphate and phosphatidylinositol 3,4,5-trisphosphate. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 22876 inositol polyphosphate-5-phosphatase F
ENSG00000137267 TUBB2A Microtubules, key participants in processes such as mitosis and intracellular transport, are composed of heterodimers of alpha- and beta-tubulins. The protein encoded by this gene is a beta-tubulin. Defects in this gene are associated with complex cortical dysplasia with other brain malformations-5. Two transcript variants encoding distinct isoforms have been found for this gene. 7280 tubulin beta 2A class IIa
ENSG00000072832 CRMP1 This gene encodes a member of a family of cytosolic phosphoproteins expressed exclusively in the nervous system. The encoded protein is thought to be a part of the semaphorin signal transduction pathway implicated in semaphorin-induced growth cone collapse during neural development. Alternative splicing results in multiple transcript variants. 1400 collapsin response mediator protein 1
ENSG00000174684 B4GAT1 This gene encodes a member of the beta-1,3-N-acetylglucosaminyltransferase family. This enzyme is a type II transmembrane protein. It is essential for the synthesis of poly-N-acetyllactosamine, a determinant for the blood group i antigen. 11041 beta-1,4-glucuronyltransferase 1
ENSG00000130294 KIF1A The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. 547 kinesin family member 1A
ENSG00000128245 YWHAH This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and bovine orthologs. This gene contains a 7 bp repeat sequence in its 5’ UTR, and changes in the number of this repeat have been associated with early-onset schizophrenia and psychotic bipolar disorder. 7533 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein eta
ENSG00000073670 ADAM11 This gene encodes a member of the ADAM (a disintegrin and metalloprotease) protein family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. The encoded preproprotein is proteolytically processed to generate the mature protease. This gene represents a candidate tumor suppressor gene for human breast cancer based on its location within a minimal region of chromosome 17q21 previously defined by tumor deletion mapping. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. 4185 ADAM metallopeptidase domain 11
ENSG00000250510 GPR162 This gene was identified upon genomic analysis of a gene-dense region at human chromosome 12p13. It appears to be mainly expressed in the brain; however, its function is not known. Alternatively spliced transcript variants encoding different isoforms have been identified. 27239 G protein-coupled receptor 162
ENSG00000162545 CAMK2N1 NA 55450 calcium/calmodulin dependent protein kinase II inhibitor 1
# Save the Ensembl IDs returned for cluster 2, one per line.
write.table(out$query, paste0("../utilities/gene_names_clus_", 2, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
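
Many of the Cluster 2 summaries in the table above mention brain or neuronal expression. One rough way to quantify that impression is to count how many summaries match a tissue-related keyword. The sketch below assumes a data frame shaped like `as.data.frame(out)`; the toy rows are a handful of Cluster 2 genes from the table with their summaries shortened to a single sentence, and `keyword_hits()` is a hypothetical helper, not a function from mygene. Keyword matching is only a sanity check and is not a substitute for a formal enrichment analysis.

```r
# Toy stand-in for as.data.frame(out): a few Cluster 2 rows from the table above,
# with each summary shortened to one sentence (PSD3 has no summary).
ann <- data.frame(
  symbol = c("SNPH", "ADAM22", "RNF112", "GPR162", "PSD3", "PKD1"),
  summary = c(
    "Expression of this gene appears to be brain-specific.",
    "This gene is highly expressed in the brain.",
    "The protein is primarily expressed in brain.",
    "It appears to be mainly expressed in the brain.",
    NA,
    "It plays a role in renal tubular development."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for rows whose summary matches the regular
# expression `pattern` (case-insensitive); missing summaries count as misses.
keyword_hits <- function(ann, pattern) {
  !is.na(ann$summary) & grepl(pattern, ann$summary, ignore.case = TRUE)
}

brain_hits <- keyword_hits(ann, "brain|neuron|synap")
ann$symbol[brain_hits]  # genes whose summary mentions a brain-related term
mean(brain_hits)        # fraction of genes with a matching summary
```

On the real query result, the same call would be `keyword_hits(as.data.frame(out), "brain|neuron|synap")`. The pattern itself is an assumption and should be adapted to the tissue under consideration.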

Cluster 3 Annotations
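
The query for this cluster repeats the Cluster 2 call with a different row of `gene_list`. A minimal sketch of how the repeated query-and-save pattern could be wrapped in one function is given here. `annotate_cluster()` and its arguments are hypothetical names introduced for illustration; the `query_fun` argument is an assumption that lets the function be exercised offline with a stub in place of `mygene::queryMany`.

```r
# Hypothetical wrapper around the per-cluster calls used in this script.
# query_fun defaults to mygene::queryMany; the default is only evaluated
# when no replacement function is supplied.
annotate_cluster <- function(genes, k, out_dir = "../utilities",
                             query_fun = mygene::queryMany) {
  out <- query_fun(genes, scopes = "ensembl.gene",
                   fields = c("name", "summary", "symbol"), species = "human")
  # Write the queried Ensembl IDs, one per line, as done for each cluster.
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(out$query), out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(as.data.frame(out))
}

# Intended use with the objects defined earlier (needs network access):
# ann3 <- annotate_cluster(gene_list[3, ], k = 3)
# kable(ann3)
```

Looping `k` over `seq_len(nrow(gene_list))` would then produce one annotation table and one gene-name file per cluster.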

# Look up name, summary and symbol for the top driving genes of cluster 3.
out <- mygene::queryMany(gene_list[3, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs roles include fatty acid uptake, transport, and metabolism. fatty acid binding protein 4 FABP4 ENSG00000170323 NA
5346 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. perilipin 1 PLIN1 ENSG00000166819 NA
2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase FASN ENSG00000169710 NA
2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3 GPX3 ENSG00000211445 NA
63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. cell death inducing DFFA like effector c CIDEC ENSG00000187288 NA
57104 This gene encodes an enzyme which catalyzes the first step in the hydrolysis of triglycerides in adipose tissue. Mutations in this gene are associated with neutral lipid storage disease with myopathy. patatin like phospholipase domain containing 2 PNPLA2 ENSG00000177666 NA
3991 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. lipase E, hormone sensitive type LIPE ENSG00000079435 NA
729359 Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). perilipin 4 PLIN4 ENSG00000167676 NA
2819 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotine adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. glycerol-3-phosphate dehydrogenase 1 GPD1 ENSG00000167588 NA
948 The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. CD36 molecule CD36 ENSG00000135218 NA
1675 This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. complement factor D (adipsin) CFD ENSG00000197766 NA
7079 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. The secreted, netrin domain-containing protein encoded by this gene is involved in regulation of platelet aggregation and recruitment and may play a role in hormonal regulation and endometrial tissue remodeling. TIMP metallopeptidase inhibitor 4 TIMP4 ENSG00000157150 NA
2934 The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. gelsolin GSN ENSG00000148180 NA
81575 APOLD1 is an endothelial cell early response protein that may play a role in regulation of endothelial cell signaling and vascular function (Regard et al., 2004 [PubMed 15102925]). apolipoprotein L domain containing 1 APOLD1 ENSG00000178878 NA
ENSG00000255108 NA NA AP006621.8 ENSG00000255108 NA
50486 NA G0/G1 switch 2 G0S2 ENSG00000123689 NA
57678 This gene encodes a mitochondrial enzyme which prefers saturated fatty acids as its substrate for the synthesis of glycerolipids. This metabolic pathway’s first step is catalyzed by the encoded enzyme. Two forms for this enzyme exist, one in the mitochondria and one in the endoplasmic reticulum. Two alternatively spliced transcript variants have been described for this gene. glycerol-3-phosphate acyltransferase, mitochondrial GPAM ENSG00000119927 NA
123 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. perilipin 2 PLIN2 ENSG00000147872 NA
32 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. acetyl-CoA carboxylase beta ACACB ENSG00000076555 NA
11067 The expression of this gene is induced by fasting as well as by progesterone. The protein encoded by this gene contains a t-synaptosome-associated protein receptor (SNARE) coiled-coil homology domain and a peroxisomal targeting signal. Production of the encoded protein leads to phosphorylation and activation of the transcription factor ELK1. chromosome 10 open reading frame 10 C10orf10 ENSG00000165507 NA
5577 cAMP is a signaling molecule important for a variety of cellular functions. cAMP exerts its effects by activating the cAMP-dependent protein kinase, which transduces the signal through phosphorylation of different target proteins. The inactive kinase holoenzyme is a tetramer composed of two regulatory and two catalytic subunits. cAMP causes the dissociation of the inactive holoenzyme into a dimer of regulatory subunits bound to four cAMP and two free monomeric catalytic subunits. Four different regulatory subunits and three catalytic subunits have been identified in humans. The protein encoded by this gene is one of the regulatory subunits. This subunit can be phosphorylated by the activated catalytic subunit. This subunit has been shown to interact with and suppress the transcriptional activity of the cAMP responsive element binding protein 1 (CREB1) in activated T cells. Knockout studies in mice suggest that this subunit may play an important role in regulating energy balance and adiposity. The studies also suggest that this subunit may mediate the gene induction and cataleptic behavior induced by haloperidol. protein kinase cAMP-dependent type II regulatory subunit beta PRKAR2B ENSG00000005249 NA
51129 This gene encodes a glycosylated, secreted protein containing a C-terminal fibrinogen domain. The encoded protein is induced by peroxisome proliferation activators and functions as a serum hormone that regulates glucose homeostasis, lipid metabolism, and insulin sensitivity. This protein can also act as an apoptosis survival factor for vascular endothelial cells and can prevent metastasis by inhibiting vascular growth and tumor cell invasion. The C-terminal domain may be proteolytically-cleaved from the full-length secreted protein. Decreased expression of this gene has been associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. This gene was previously referred to as ANGPTL2 but has been renamed ANGPTL4. angiopoietin like 4 ANGPTL4 ENSG00000167772 NA
5468 This gene encodes a member of the peroxisome proliferator-activated receptor (PPAR) subfamily of nuclear receptors. PPARs form heterodimers with retinoid X receptors (RXRs) and these heterodimers regulate transcription of various genes. Three subtypes of PPARs are known: PPAR-alpha, PPAR-delta, and PPAR-gamma. The protein encoded by this gene is PPAR-gamma and is a regulator of adipocyte differentiation. Additionally, PPAR-gamma has been implicated in the pathology of numerous diseases including obesity, diabetes, atherosclerosis and cancer. Alternatively spliced transcript variants that encode different isoforms have been described. peroxisome proliferator activated receptor gamma PPARG ENSG00000132170 NA
84293 NA family with sequence similarity 213 member A FAM213A ENSG00000122378 NA
116362 Due to its chemical instability and low solubility in aqueous solution, vitamin A requires cellular retinol-binding proteins (CRBPs), such as RBP7, for stability, internalization, intercellular transfer, homeostasis, and metabolism. retinol binding protein 7 RBP7 ENSG00000162444 NA
4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. lipoprotein lipase LPL ENSG00000175445 NA
125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. alcohol dehydrogenase 1B (class I), beta polypeptide ADH1B ENSG00000196616 NA
5360 The protein encoded by this gene is one of at least two lipid transfer proteins found in human plasma. The encoded protein transfers phospholipids from triglyceride-rich lipoproteins to high density lipoprotein (HDL). In addition to regulating the size of HDL particles, this protein may be involved in cholesterol metabolism. At least two transcript variants encoding different isoforms have been found for this gene. phospholipid transfer protein PLTP ENSG00000100979 NA
23452 Angiopoietins are members of the vascular endothelial growth factor family and the only known growth factors largely specific for vascular endothelium. Angiopoietin-1, angiopoietin-2, and angiopoietin-4 participate in the formation of blood vessels. ANGPTL2 protein is a secreted glycoprotein with homology to the angiopoietins and may exert a function on endothelial cells through autocrine or paracrine action. angiopoietin like 2 ANGPTL2 ENSG00000136859 NA
5176 The protein encoded by this gene is a member of the serpin family, although it does not display the serine protease inhibitory activity shown by many of the other serpin family members. The encoded protein is secreted and strongly inhibits angiogenesis. In addition, this protein is a neurotrophic factor involved in neuronal differentiation in retinoblastoma cells. serpin family F member 1 SERPINF1 ENSG00000132386 NA
2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase GLUL ENSG00000135821 NA
NA NA NA NA ENSG00000256545 TRUE
9590 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein is expressed in endothelial cells, cultured fibroblasts, and osteosarcoma cells. It associates with protein kinases A and C and phosphatase, and serves as a scaffold protein in signal transduction. This protein and RII PKA colocalize at the cell periphery. This protein is a cell growth-related protein. Antibodies to this protein can be produced by patients with myasthenia gravis. Alternative splicing of this gene results in two transcript variants encoding different isoforms. A-kinase anchoring protein 12 AKAP12 ENSG00000131016 NA
132720 NA chromosome 4 open reading frame 32 C4orf32 ENSG00000174749 NA
10252 NA sprouty RTK signaling antagonist 1 SPRY1 ENSG00000164056 NA
60481 This gene belongs to the ELO family. It is highly expressed in the adrenal gland and testis, and encodes a multi-pass membrane protein that is localized in the endoplasmic reticulum. This protein is involved in the elongation of long-chain polyunsaturated fatty acids. Mutations in this gene have been associated with spinocerebellar ataxia-38 (SCA38). Alternatively spliced transcript variants have been found for this gene. ELOVL fatty acid elongase 5 ELOVL5 ENSG00000012660 NA
154807 NA vitamin K epoxide reductase complex subunit 1 like 1 VKORC1L1 ENSG00000196715 NA
3479 The protein encoded by this gene is similar to insulin in function and structure and is a member of a family of proteins involved in mediating growth and development. The encoded protein is processed from a precursor, bound by a specific receptor, and secreted. Defects in this gene are a cause of insulin-like growth factor I deficiency. Alternative splicing results in multiple transcript variants encoding different isoforms that may undergo similar processing to generate mature protein. insulin like growth factor 1 IGF1 ENSG00000017427 NA
10555 This gene encodes a member of the 1-acylglycerol-3-phosphate O-acyltransferase family. The protein is located within the endoplasmic reticulum membrane and converts lysophosphatidic acid to phosphatidic acid, the second step in de novo phospholipid biosynthesis. Mutations in this gene have been associated with congenital generalized lipodystrophy (CGL), or Berardinelli-Seip syndrome, a disease characterized by a near absence of adipose tissue and severe insulin resistance. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 1-acylglycerol-3-phosphate O-acyltransferase 2 AGPAT2 ENSG00000169692 NA
8483 Major alterations in the composition of the cartilage extracellular matrix occur in joint disease, such as osteoarthrosis. This gene encodes the cartilage intermediate layer protein (CILP), which increases in early osteoarthrosis cartilage. The encoded protein was thought to encode a protein precursor for two different proteins; an N-terminal CILP and a C-terminal homolog of NTPPHase, however, later studies identified no nucleotide pyrophosphatase phosphodiesterase (NPP) activity. The full-length and the N-terminal domain of this protein was shown to function as an IGF-1 antagonist. An allelic variant of this gene has been associated with lumbar disc disease. cartilage intermediate layer protein CILP ENSG00000138615 NA
23344 NA extended synaptotagmin protein 1 ESYT1 ENSG00000139641 NA
1979 This gene encodes a member of the eukaryotic translation initiation factor 4E binding protein family. The gene products of this family bind eIF4E and inhibit translation initiation. However, insulin and other growth factors can release this inhibition via a phosphorylation-dependent disruption of their binding to eIF4E. Regulation of protein production through these gene products have been implicated in cell proliferation, cell differentiation and viral infection. eukaryotic translation initiation factor 4E binding protein 2 EIF4EBP2 ENSG00000148730 NA
79812 This gene encodes a protein belonging to the member of elastin microfibril interface-located (EMILIN) protein family. This family member is an extracellular matrix glycoprotein that can interfere with tumor angiogenesis and growth. It serves as a transforming growth factor beta antagonist and can interfere with the VEGF-A/VEGFR2 pathway. A related pseudogene has been identified on chromosome 6. multimerin 2 MMRN2 ENSG00000173269 NA
7049 This locus encodes the transforming growth factor (TGF)-beta type III receptor. The encoded receptor is a membrane proteoglycan that often functions as a co-receptor with other TGF-beta receptor superfamily members. Ectodomain shedding produces soluble TGFBR3, which may inhibit TGFB signaling. Decreased expression of this receptor has been observed in various cancers. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. transforming growth factor beta receptor III TGFBR3 ENSG00000069702 NA
NA NA NA NA ENSG00000117289 TRUE
2532 The protein encoded by this gene is a glycosylated membrane protein and a non-specific receptor for several chemokines. The encoded protein is the receptor for the human malarial parasites Plasmodium vivax and Plasmodium knowlesi. Polymorphisms in this gene are the basis of the Duffy blood group system. Two transcript variants encoding different isoforms have been found for this gene. atypical chemokine receptor 1 (Duffy blood group) ACKR1 ENSG00000213088 NA
665 This gene encodes a protein that belongs to the pro-apoptotic subfamily within the Bcl-2 family of proteins. The encoded protein binds to Bcl-2 and possesses the BH3 domain. The protein directly targets mitochondria and causes apoptotic changes, including loss of membrane potential and the release of cytochrome c. BCL2/adenovirus E1B 19kDa interacting protein 3-like BNIP3L ENSG00000104765 NA
947 The protein encoded by this gene may play a role in the attachment of stem cells to the bone marrow extracellular matrix or to stromal cells. This single-pass membrane protein is highly glycosylated and phosphorylated by protein kinase C. Two transcript variants encoding different isoforms have been found for this gene. CD34 molecule CD34 ENSG00000174059 NA
NA NA NA NA ENSG00000156750 TRUE
7048 This gene encodes a member of the Ser/Thr protein kinase family and the TGFB receptor subfamily. The encoded protein is a transmembrane protein that has a protein kinase domain, forms a heterodimeric complex with another receptor protein, and binds TGF-beta. This receptor/ligand complex phosphorylates proteins, which then enter the nucleus and regulate the transcription of a subset of genes related to cell proliferation. Mutations in this gene have been associated with Marfan Syndrome, Loeys-Deitz Aortic Aneurysm Syndrome, and the development of various types of tumors. Alternatively spliced transcript variants encoding different isoforms have been characterized. transforming growth factor beta receptor II TGFBR2 ENSG00000163513 NA
7450 This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. von Willebrand factor VWF ENSG00000110799 NA
84883 This gene encodes a flavoprotein oxidoreductase that binds single stranded DNA and is thought to contribute to apoptosis in the presence of bacterial and viral DNA. The expression of this gene is also found to be induced by tumor suppressor protein p53 in colon cancer cells. apoptosis inducing factor, mitochondria associated 2 AIFM2 ENSG00000042286 NA
1368 The protein encoded by this gene is a membrane-bound arginine/lysine carboxypeptidase. Its expression is associated with monocyte to macrophage differentiation. This encoded protein contains hydrophobic regions at the amino and carboxy termini and has 6 potential asparagine-linked glycosylation sites. The active site residues of carboxypeptidases A and B are conserved in this protein. Three alternatively spliced transcript variants encoding the same protein have been described for this gene. carboxypeptidase M CPM ENSG00000135678 NA
6776 The protein encoded by this gene is a member of the STAT family of transcription factors. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein is activated by, and mediates the responses of many cell ligands, such as IL2, IL3, IL7 GM-CSF, erythropoietin, thrombopoietin, and different growth hormones. Activation of this protein in myeloma and lymphoma associated with a TEL/JAK2 gene fusion is independent of cell stimulus and has been shown to be essential for tumorigenesis. The mouse counterpart of this gene is found to induce the expression of BCL2L1/BCL-X(L), which suggests the antiapoptotic function of this gene in cells. Alternatively spliced transcript variants have been found for this gene. signal transducer and activator of transcription 5A STAT5A ENSG00000126561 NA
23593 NA heme binding protein 2 HEBP2 ENSG00000051620 NA
7078 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. TIMP metallopeptidase inhibitor 3 TIMP3 ENSG00000100234 NA
11343 This gene encodes a serine hydrolase of the AB hydrolase superfamily that catalyzes the conversion of monoacylglycerides to free fatty acids and glycerol. The encoded protein plays a critical role in several physiological processes including pain and nociperception through hydrolysis of the endocannabinoid 2-arachidonoylglycerol. Expression of this gene may play a role in cancer tumorigenesis and metastasis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. monoglyceride lipase MGLL ENSG00000074416 NA
3486 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein forms a ternary complex with insulin-like growth factor acid-labile subunit (IGFALS) and either insulin-like growth factor (IGF) I or II. In this form, it circulates in the plasma, prolonging the half-life of IGFs and altering their interaction with cell surface receptors. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. insulin like growth factor binding protein 3 IGFBP3 ENSG00000146674 NA
4641 This gene encodes a member of the unconventional myosin protein family, which are actin-based molecular motors. The protein is found in the cytoplasm, and one isoform with a unique N-terminus is also found in the nucleus. The nuclear isoform associates with RNA polymerase I and II and functions in transcription initiation. The mouse ortholog of this protein also functions in intracellular vesicle transport to the plasma membrane. Multiple transcript variants encoding different isoforms have been found for this gene. The related gene myosin IE has been referred to as myosin IC in the literature, but it is a distinct locus on chromosome 19. myosin IC MYO1C ENSG00000197879 NA
80832 The protein encoded by this gene is a member of the apolipoprotein L family and may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Two transcript variants encoding two different isoforms have been found for this gene. Only one of the isoforms appears to be a secreted protein. apolipoprotein L4 APOL4 ENSG00000100336 NA
10544 The protein encoded by this gene is a receptor for activated protein C, a serine protease activated by and involved in the blood coagulation pathway. The encoded protein is an N-glycosylated type I membrane protein that enhances the activation of protein C. Mutations in this gene have been associated with venous thromboembolism and myocardial infarction, as well as with late fetal loss during pregnancy. The encoded protein may also play a role in malarial infection and has been associated with cancer. protein C receptor PROCR ENSG00000101000 NA
9945 NA glutamine-fructose-6-phosphate transaminase 2 GFPT2 ENSG00000131459 NA
23580 The product of this gene is a member of the CDC42-binding protein family. Members of this family interact with Rho family GTPases and regulate the organization of the actin cytoskeleton. This protein has been shown to bind both CDC42 and TC10 GTPases in a GTP-dependent manner. When overexpressed in fibroblasts, this protein was able to induce pseudopodia formation, which suggested a role in inducing actin filament assembly and cell shape control. CDC42 effector protein 4 CDC42EP4 ENSG00000179604 NA
2152 This gene encodes coagulation factor III which is a cell surface glycoprotein. This factor enables cells to initiate the blood coagulation cascades, and it functions as the high-affinity receptor for the coagulation factor VII. The resulting complex provides a catalytic event that is responsible for initiation of the coagulation protease cascades by specific limited proteolysis. Unlike the other cofactors of these protease cascades, which circulate as nonfunctional precursors, this factor is a potent initiator that is fully functional when expressed on cell surfaces. There are 3 distinct domains of this factor: extracellular, transmembrane, and cytoplasmic. This protein is the only one in the coagulation pathway for which a congenital deficiency has not been described. Alternate splicing results in multiple transcript variants. coagulation factor III, tissue factor F3 ENSG00000117525 NA
2687 This gene is a member of the gamma-glutamyl transpeptidase gene family, and some reports indicate that it is capable of cleaving the gamma-glutamyl moiety of glutathione. The protein encoded by this gene is synthesized as a single, catalytically-inactive polypeptide, that is processed post-transcriptionally to form a heavy and light subunit, with the catalytic activity contained within the small subunit. The encoded enzyme is able to convert leukotriene C4 to leukotriene D4, but appears to have distinct substrate specificity compared to gamma-glutamyl transpeptidase. Alternative splicing results in multiple transcript variants encoding different isoforms. gamma-glutamyltransferase 5 GGT5 ENSG00000099998 NA
7481 The WNT gene family consists of structurally related genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. This gene is a member of the WNT gene family. It encodes a protein which shows 97%, 85%, and 63% amino acid identity with mouse, chicken, and Xenopus Wnt11 protein, respectively. This gene may play roles in the development of skeleton, kidney and lung, and is considered to be a plausible candidate gene for High Bone Mass Syndrome. Wnt family member 11 WNT11 ENSG00000085741 NA
4489 NA metallothionein 1A MT1A ENSG00000205362 NA
23328 NA SAM and SH3 domain containing 1 SASH1 ENSG00000111961 NA
115330 NA G protein-coupled receptor 146 GPR146 ENSG00000164849 NA
56265 This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. carboxypeptidase X (M14 family), member 1 CPXM1 ENSG00000088882 NA
375061 NA family with sequence similarity 89 member A FAM89A ENSG00000182118 NA
5140 NA phosphodiesterase 3B PDE3B ENSG00000152270 NA
83636 This gene encodes a small transmembrane protein. Mutations in this gene are a cause of neurodegeneration with brain iron accumulation-4 (NBIA4), but the specific function of the encoded protein is unknown. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. chromosome 19 open reading frame 12 C19orf12 ENSG00000131943 NA
10114 NA homeodomain interacting protein kinase 3 HIPK3 ENSG00000110422 NA
6675 NA UDP-N-acetylglucosamine pyrophosphorylase 1 UAP1 ENSG00000117143 NA
9475 The protein encoded by this gene is a serine/threonine kinase that regulates cytokinesis, smooth muscle contraction, the formation of actin stress fibers and focal adhesions, and the activation of the c-fos serum response element. This protein, which is an isozyme of ROCK1 is a target for the small GTPase Rho. Rho associated coiled-coil containing protein kinase 2 ROCK2 ENSG00000134318 NA
9397 This gene encodes one of two N-myristoyltransferase proteins. N-terminal myristoylation is a lipid modification that is involved in regulating the function and localization of signaling proteins. The encoded protein catalyzes the addition of a myristoyl group to the N-terminal glycine residue of many signaling proteins, including the human immunodeficiency virus type 1 (HIV-1) proteins, Gag and Nef. Alternative splicing results in multiple transcript variants. N-myristoyltransferase 2 NMT2 ENSG00000152465 NA
9588 The protein encoded by this gene is a member of the thiol-specific antioxidant protein family. This protein is a bifunctional enzyme with two distinct active sites. It is involved in redox regulation of the cell; it can reduce H(2)O(2) and short chain organic, fatty acid, and phospholipid hydroperoxides. It may play a role in the regulation of phospholipid turnover as well as in protection against oxidative injury. peroxiredoxin 6 PRDX6 ENSG00000117592 NA
48 The protein encoded by this gene is a bifunctional, cytosolic protein that functions as an essential enzyme in the TCA cycle and interacts with mRNA to control the levels of iron inside cells. When cellular iron levels are high, this protein binds to a 4Fe-4S cluster and functions as an aconitase. Aconitases are iron-sulfur proteins that function to catalyze the conversion of citrate to isocitrate. When cellular iron levels are low, the protein binds to iron-responsive elements (IREs), which are stem-loop structures found in the 5’ UTR of ferritin mRNA, and in the 3’ UTR of transferrin receptor mRNA. When the protein binds to IRE, it results in repression of translation of ferritin mRNA, and inhibition of degradation of the otherwise rapidly degraded transferrin receptor mRNA. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alternative splicing results in multiple transcript variants. aconitase 1 ACO1 ENSG00000122729 NA
84173 NA ELMO domain containing 3 ELMOD3 ENSG00000115459 NA
1282 This gene encodes a type IV collagen alpha protein. Type IV collagen proteins are integral components of basement membranes. This gene shares a bidirectional promoter with a paralogous gene on the opposite strand. The protein consists of an amino-terminal 7S domain, a triple-helix forming collagenous domain, and a carboxy-terminal non-collagenous domain. It functions as part of a heterotrimer and interacts with other extracellular matrix components such as perlecans, proteoglycans, and laminins. In addition, proteolytic cleavage of the non-collagenous carboxy-terminal domain results in a biologically active fragment known as arresten, which has anti-angiogenic and tumor suppressor properties. Mutations in this gene cause porencephaly, cerebrovascular disease, and renal and muscular defects. Alternative splicing results in multiple transcript variants. collagen type IV alpha 1 COL4A1 ENSG00000187498 NA
4232 This gene encodes a member of the alpha/beta hydrolase superfamily. It is imprinted, exhibiting preferential expression from the paternal allele in fetal tissues, and isoform-specific imprinting in lymphocytes. The loss of imprinting of this gene has been linked to certain types of cancer and may be due to promotor switching. The encoded protein may play a role in development. Alternatively spliced transcript variants encoding multiple isoforms have been identified for this gene. Pseudogenes of this gene are located on the short arm of chromosomes 3 and 4, and the long arm of chromosomes 6 and 15. mesoderm specific transcript MEST ENSG00000106484 NA
64757 NA mitochondrial amidoxime reducing component 1 MARC1 ENSG00000186205 NA
54884 NA retinol saturase (all-trans-retinol 13,14-reductase) RETSAT ENSG00000042445 NA
4828 This gene encodes a member of the bombesin-like family of neuropeptides, which negatively regulate eating behavior. The encoded protein may regulate colonic smooth muscle contraction through binding to its cognate receptor, the neuromedin B receptor (NMBR). Polymorphisms of this gene may be associated with hunger, weight gain and obesity. Alternative splicing results in multiple transcript variants. neuromedin B NMB ENSG00000197696 NA
51351 NA zinc finger protein 117 ZNF117 ENSG00000152926 NA
ENSG00000257607 NA NA RP11-449P15.1 ENSG00000257607 NA
1901 The protein encoded by this gene is structurally similar to G protein-coupled receptors and is highly expressed in endothelial cells. It binds the ligand sphingosine-1-phosphate with high affinity and high specificity, and suggested to be involved in the processes that regulate the differentiation of endothelial cells. Activation of this receptor induces cell-cell adhesion. Alternative splicing results in multiple transcript variants. sphingosine-1-phosphate receptor 1 S1PR1 ENSG00000170989 NA
80833 This gene is a member of the apolipoprotein L gene family, and it is present in a cluster with other family members on chromosome 22. The encoded protein is found in the cytoplasm, where it may affect the movement of lipids, including cholesterol, and/or allow the binding of lipids to organelles. In addition, expression of this gene is up-regulated by tumor necrosis factor-alpha in endothelial cells lining the normal and atherosclerotic iliac artery and aorta. Alternative splicing results in multiple transcript variants. apolipoprotein L3 APOL3 ENSG00000128284 NA
9270 The cytoplasmic domains of integrins are essential for cell adhesion. The protein encoded by this gene binds to the beta1 integrin cytoplasmic domain. The interaction between this protein and beta1 integrin is highly specific. Two isoforms of this protein are derived from alternatively spliced transcripts. The shorter form of this protein does not interact with the beta1 integrin cytoplasmic domain. The longer form is a phosphoprotein and the extent of its phosphorylation is regulated by the cell-matrix interaction, suggesting an important role of this protein during integrin-dependent cell adhesion. Several transcript variants, some protein-coding and some non-protein coding, have been found for this gene. integrin subunit beta 1 binding protein 1 ITGB1BP1 ENSG00000119185 NA
84230 NA leucine-rich repeat containing 8 family member C LRRC8C ENSG00000171488 NA
54941 This gene encodes a novel E3 ubiquitin ligase that contains a RING finger domain in the N-terminus and three zinc-binding and one ubiquitin-interacting motif in the C-terminus. As a result of myristoylation, this protein associates with membranes and is primarily localized to intracellular membrane systems. The encoded protein may function as a positive regulator in the T-cell receptor signaling pathway. ring finger protein 125, E3 ubiquitin protein ligase RNF125 ENSG00000101695 NA
7423 This gene encodes a member of the PDGF (platelet-derived growth factor)/VEGF (vascular endothelial growth factor) family. The VEGF family members regulate the formation of blood vessels and are involved in endothelial cell physiology. This member is a ligand for VEGFR-1 (vascular endothelial growth factor receptor 1) and NRP-1 (neuropilin-1). Studies in mice showed that this gene was co-expressed with nuclear-encoded mitochondrial genes and the encoded protein specifically controlled endothelial uptake of fatty acids. Alternatively spliced transcript variants encoding distinct isoforms have been identified. vascular endothelial growth factor B VEGFB ENSG00000173511 NA
125058 NA TBC1 domain family member 16 TBC1D16 ENSG00000167291 NA
2321 This gene encodes a member of the vascular endothelial growth factor receptor (VEGFR) family. VEGFR family members are receptor tyrosine kinases (RTKs) which contain an extracellular ligand-binding region with seven immunoglobulin (Ig)-like domains, a transmembrane segment, and a tyrosine kinase (TK) domain within the cytoplasmic domain. This protein binds to VEGFR-A, VEGFR-B and placental growth factor and plays an important role in angiogenesis and vasculogenesis. Expression of this receptor is found in vascular endothelial cells, placental trophoblast cells and peripheral blood monocytes. Multiple transcript variants encoding different isoforms have been found for this gene. Isoforms include a full-length transmembrane receptor isoform and shortened, soluble isoforms. The soluble isoforms are associated with the onset of pre-eclampsia. fms related tyrosine kinase 1 FLT1 ENSG00000102755 NA
1879 NA early B-cell factor 1 EBF1 ENSG00000164330 NA
2690 This gene encodes a member of the type I cytokine receptor family, which is a transmembrane receptor for growth hormone. Binding of growth hormone to the receptor leads to receptor dimerization and the activation of an intra- and intercellular signal transduction pathway leading to growth. Mutations in this gene have been associated with Laron syndrome, also known as the growth hormone insensitivity syndrome (GHIS), a disorder characterized by short stature. In humans and rabbits, but not rodents, growth hormone binding protein (GHBP) is generated by proteolytic cleavage of the extracellular ligand-binding domain from the mature growth hormone receptor protein. Multiple alternatively spliced transcript variants have been found for this gene. growth hormone receptor GHR ENSG00000112964 NA
220 This gene encodes an aldehyde dehydrogenase enzyme that uses retinal as a substrate. Mutations in this gene have been associated with microphthalmia, isolated 8, and expression changes have also been detected in tumor cells. Alternative splicing results in multiple transcript variants. aldehyde dehydrogenase 1 family member A3 ALDH1A3 ENSG00000184254 NA
NA NA NA NA ENSG00000229645 TRUE
4337 Molybdenum cofactor biosynthesis is a conserved pathway leading to the biological activation of molybdenum. The protein encoded by this gene is involved in this pathway. This gene was originally thought to produce a bicistronic mRNA with the potential to produce two proteins (MOCS1A and MOCS1B) from adjacent open reading frames. However, only the first open reading frame (MOCS1A) has been found to encode a protein from the putative bicistronic mRNA, whereas additional splice variants, whose full-length natures have yet to be determined, are likely to produce a fusion between the two open reading frames. This gene is defective in patients with molybdenum cofactor deficiency, type A. A related pseudogene has been identified on chromosome 16. molybdenum cofactor synthesis 1 MOCS1 ENSG00000124615 NA
# Save the Ensembl gene IDs queried for cluster 3, one ID per line.
write.table(out$query, paste0("../utilities/gene_names_clus_", 3, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
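
Several rows in the table above have notfound set to TRUE (for example ENSG00000256545), meaning the query returned no annotation for that Ensembl ID. It can help to list these IDs before judging whether a cluster's top genes match its tissue. The following is a minimal sketch in R, not part of the original analysis: the helper name unmatched_ids and the toy data frame annot are illustrative assumptions, and annot only mimics the column layout of as.data.frame(out).

```r
# Toy stand-in for as.data.frame(out). The IDs are copied from the table above;
# the column layout is assumed to match the queryMany() result.
annot <- data.frame(
  query    = c("ENSG00000132170", "ENSG00000256545", "ENSG00000117289"),
  symbol   = c("PPARG", NA, NA),
  notfound = c(NA, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the query IDs that came back without annotation.
unmatched_ids <- function(annot) {
  # If every ID was matched, the notfound column may be absent altogether.
  if (!"notfound" %in% names(annot)) return(character(0))
  # notfound is TRUE for misses and NA for hits; %in% treats NA as FALSE.
  annot$query[annot$notfound %in% TRUE]
}

missing_ids <- unmatched_ids(annot)
print(missing_ids)
```

In the real workflow the same helper could be applied to as.data.frame(out) after each queryMany() call, so that unannotated IDs are reported separately from the annotated genes.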

Cluster 4 Annotations

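# Query MyGene.info for the name, summary and symbol of each top driving gene
# of cluster 4 (row 4 of gene_list holds the Ensembl gene IDs).
out <- mygene::queryMany(gene_list[4,], scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");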
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name summary symbol notfound
ENSG00000163017 72 actin, gamma 2, smooth muscle, enteric Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ACTG2 NA
ENSG00000133392 4629 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. MYH11 NA
ENSG00000182253 23336 synemin The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. SYNM NA
ENSG00000065534 4638 myosin light chain kinase This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. MYLK NA
ENSG00000130176 1264 calponin 1 NA CNN1 NA
ENSG00000159176 1465 cysteine and glycine rich protein 1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. CSRP1 NA
ENSG00000183963 6525 smoothelin This gene encodes a structural protein that is found exclusively in contractile smooth muscle cells. It associates with stress fibers and constitutes part of the cytoskeleton. This gene is localized to chromosome 22q12.3, distal to the TUPLE1 locus and outside the DiGeorge syndrome deletion. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. SMTN NA
ENSG00000269936 ENSG00000269936 NA NA RP11-394O4.5 NA
ENSG00000075073 6865 tachykinin receptor 2 This gene belongs to a family of genes that function as receptors for tachykinins. Receptor affinities are specified by variations in the 5’-end of the sequence. The receptors belonging to this family are characterized by interactions with G proteins and 7 hydrophobic transmembrane regions. This gene encodes the receptor for the tachykinin neuropeptide substance K, also referred to as neurokinin A. TACR2 NA
ENSG00000101335 10398 myosin light chain 9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. MYL9 NA
ENSG00000259716 NA NA NA NA TRUE
ENSG00000129116 23022 palladin, cytoskeletal associated protein This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. PALLD NA
ENSG00000259627 ENSG00000259627 NA NA RP11-244F12.2 NA
ENSG00000263335 ENSG00000263335 NA NA AF001548.5 NA
ENSG00000154330 5239 phosphoglucomutase 5 Phosphoglucomutases (EC 5.4.2.2), such as PGM5, are phosphotransferases involved in interconversion of glucose-1-phosphate and glucose-6-phosphate. PGM activity is essential in formation of carbohydrates from glucose-6-phosphate and in formation of glucose-6-phosphate from galactose and glycogen (Edwards et al., 1995 [PubMed 8586438]). PGM5 NA
ENSG00000058668 493 ATPase plasma membrane Ca2+ transporting 4 The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. ATP2B4 NA
ENSG00000263065 ENSG00000263065 NA NA AF001548.6 NA
ENSG00000122786 800 caldesmon 1 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. CALD1 NA
ENSG00000111696 51559 5’-nucleotidase domain containing 3 NA NT5DC3 NA
ENSG00000075624 60 actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB NA
ENSG00000092841 4637 myosin light chain 6 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain that is expressed in smooth muscle and non-muscle tissues. Genomic sequences representing several pseudogenes have been described and two transcript variants encoding different isoforms have been identified for this gene. MYL6 NA
ENSG00000095637 10580 sorbin and SH3 domain containing 1 This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. SORBS1 NA
ENSG00000261054 ENSG00000261054 NA NA RP11-6O2.4 NA
ENSG00000023902 51177 pleckstrin homology domain containing O1 NA PLEKHO1 NA
ENSG00000197256 25959 KN motif and ankyrin repeat domains 2 NA KANK2 NA
ENSG00000106772 158471 prune homolog 2 (Drosophila) The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. PRUNE2 NA
ENSG00000163297 118429 anthrax toxin receptor 2 This gene encodes a receptor for anthrax toxin. The protein binds to collagen IV and laminin, suggesting that it may be involved in extracellular matrix adhesion. Mutations in this gene cause juvenile hyaline fibromatosis and infantile systemic hyalinosis. Multiple transcript variants encoding different isoforms have been found for this gene. ANTXR2 NA
ENSG00000072163 55679 LIM zinc finger domain containing 2 This gene encodes a member of a small family of focal adhesion proteins which interacts with ILK (integrin-linked kinase), a protein which effects protein-protein interactions with the extracellular matrix. The encoded protein has five LIM domains, each domain forming two zinc fingers, which permit interactions which regulate cell shape and migration. A pseudogene of this gene is located on chromosome 4. Multiple transcript variants encoding different isoforms have been found for this gene. LIMS2 NA
ENSG00000156113 3778 potassium calcium-activated channel subfamily M alpha 1 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. KCNMA1 NA
ENSG00000125503 54776 protein phosphatase 1 regulatory subunit 12C The gene encodes a subunit of myosin phosphatase. The encoded protein regulates the catalytic activity of protein phosphatase 1 delta and assembly of the actin cytoskeleton. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. PPP1R12C NA
ENSG00000113657 1809 dihydropyrimidinase like 3 NA DPYSL3 NA
ENSG00000100994 5834 phosphorylase, glycogen; brain The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. PYGB NA
ENSG00000007866 7005 TEA domain transcription factor 3 This gene product is a member of the transcriptional enhancer factor (TEF) family of transcription factors, which contain the TEA/ATTS DNA-binding domain. It is predominantly expressed in the placenta and is involved in the transactivation of the chorionic somatomammotropin-B gene enhancer. Translation of this protein is initiated at a non-AUG (AUA) start codon. TEAD3 NA
ENSG00000163681 7871 sarcolemma associated protein This gene encodes a component of a conserved striatin-interacting phosphatase and kinase complex. Striatin family complexes participate in a variety of cellular processes including signaling, cell cycle control, cell migration, Golgi assembly, and apoptosis. The protein encoded by this gene is a coiled-coil, tail-anchored membrane protein with a single C-terminal transmembrane domain that is posttranslationally inserted into membranes. Mutations in this gene are associated with Brugada syndrome, a cardiac channelopathy. Alternative splicing results in multiple transcript variants. SLMAP NA
ENSG00000065882 23216 TBC1 domain family member 1 TBC1D1 is the founding member of a family of proteins sharing a 180- to 200-amino acid TBC domain presumed to have a role in regulating cell growth and differentiation. These proteins share significant homology with TRE2 (USP6; MIM 604334), yeast Bub2, and CDC16 (MIM 603461) (White et al., 2000 [PubMed 10965142]). TBC1D1 NA
ENSG00000180672 NA NA NA NA TRUE
ENSG00000121440 23024 PDZ domain containing ring finger 3 This gene encodes a member of the LNX (Ligand of Numb Protein-X) family of RING-type ubiquitin E3 ligases. This protein may function in vascular morphogenesis and the differentiation of adipocytes, osteoblasts and myoblasts. This protein may be targeted for degradation by the human papilloma virus E6 protein. Alternative splicing results in multiple transcript variants. PDZRN3 NA
ENSG00000135269 26136 testin LIM domain protein Cancer-associated chromosomal changes often involve regions containing fragile sites. This gene maps to a common fragile site on chromosome 7q31.2 designated FRA7G. This gene is similar to mouse Testin, a testosterone-responsive gene encoding a Sertoli cell secretory protein containing three LIM domains. LIM domains are double zinc-finger motifs that mediate protein-protein interactions between transcription factors, cytoskeletal proteins and signaling proteins. This protein is a negative regulator of cell growth and may act as a tumor suppressor. This scaffold protein may also play a role in cell adhesion, cell spreading and in the reorganization of the actin cytoskeleton. Multiple protein isoforms are encoded by transcript variants of this gene. TES NA
ENSG00000058272 4659 protein phosphatase 1 regulatory subunit 12A Myosin phosphatase target subunit 1, which is also called the myosin-binding subunit of myosin phosphatase, is one of the subunits of myosin phosphatase. Myosin phosphatase regulates the interaction of actin and myosin downstream of the guanosine triphosphatase Rho. The small guanosine triphosphatase Rho is implicated in myosin light chain (MLC) phosphorylation, which results in contraction of smooth muscle and interaction of actin and myosin in nonmuscle cells. The guanosine triphosphate (GTP)-bound, active form of RhoA (GTP.RhoA) specifically interacted with the myosin-binding subunit (MBS) of myosin phosphatase, which regulates the extent of phosphorylation of MLC. Rho-associated kinase (Rho-kinase), which is activated by GTP.RhoA, phosphorylated MBS and consequently inactivated myosin phosphatase. Overexpression of RhoA or activated RhoA in NIH 3T3 cells increased phosphorylation of MBS and MLC. Thus, Rho appears to inhibit myosin phosphatase through the action of Rho-kinase. Several transcript variants encoding different isoforms have been found for this gene. PPP1R12A NA
ENSG00000261616 ENSG00000261616 NA NA RP11-6O2.3 NA
ENSG00000112658 6722 serum response factor This gene encodes a ubiquitous nuclear protein that stimulates both cell proliferation and differentiation. It is a member of the MADS (MCM1, Agamous, Deficiens, and SRF) box superfamily of transcription factors. This protein binds to the serum response element (SRE) in the promoter region of target genes. This protein regulates the activity of many immediate-early genes, for example c-fos, and thereby participates in cell cycle regulation, apoptosis, cell growth, and cell differentiation. This gene is the downstream target of many pathways; for example, the mitogen-activated protein kinase pathway (MAPK) that acts through the ternary complex factors (TCFs). Two transcript variants encoding different isoforms have been found for this gene. SRF NA
ENSG00000097007 25 ABL proto-oncogene 1, non-receptor tyrosine kinase This gene is a protooncogene that encodes a protein tyrosine kinase involved in a variety of cellular processes, including cell division, adhesion, differentiation, and response to stress. The activity of the protein is negatively regulated by its SH3 domain, whereby deletion of the region encoding this domain results in an oncogene. The ubiquitously expressed protein has DNA-binding activity that is regulated by CDC2-mediated phosphorylation, suggesting a cell cycle function. This gene has been found fused to a variety of translocation partner genes in various leukemias, most notably the t(9;22) translocation that results in a fusion with the 5’ end of the breakpoint cluster region gene (BCR; MIM:151410). Alternative splicing of this gene results in two transcript variants, which contain alternative first exons that are spliced to the remaining common exons. ABL1 NA
ENSG00000118496 84085 F-box protein 30 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and it is upregulated in nasopharyngeal carcinoma. FBXO30 NA
ENSG00000116473 5906 RAP1A, member of RAS oncogene family This gene encodes a member of the Ras family of small GTPases. The encoded protein undergoes a change in conformational state and activity, depending on whether it is bound to GTP or GDP. This protein is activated by several types of guanine nucleotide exchange factors (GEFs), and inactivated by two groups of GTPase-activating proteins (GAPs). The activation status of the encoded protein is therefore affected by the balance of intracellular levels of GEFs and GAPs. The encoded protein regulates signaling pathways that affect cell proliferation and adhesion, and may play a role in tumor malignancy. Pseudogenes of this gene have been defined on chromosomes 14 and 17. Alternative splicing results in multiple transcript variants. RAP1A NA
ENSG00000101447 81610 family with sequence similarity 83 member D NA FAM83D NA
ENSG00000121067 8405 speckle type BTB/POZ protein This gene encodes a protein that may modulate the transcriptional repression activities of death-associated protein 6 (DAXX), which interacts with histone deacetylase, core histones, and other histone-associated proteins. In mouse, the encoded protein binds to the putative leucine zipper domain of macroH2A1.2, a variant H2A histone that is enriched on inactivated X chromosomes. The BTB/POZ domain of this protein has been shown in other proteins to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor complexes. Alternative splicing of this gene results in multiple transcript variants encoding the same protein. SPOP NA
ENSG00000198624 26112 coiled-coil domain containing 69 NA CCDC69 NA
ENSG00000018408 25937 WW domain containing transcription regulator 1 NA WWTR1 NA
ENSG00000140682 7041 transforming growth factor beta 1 induced transcript 1 This gene encodes a coactivator of the androgen receptor, a transcription factor which is activated by androgen and has a key role in male sexual differentiation. The encoded protein is thought to regulate androgen receptor activity and may have a role to play in the treatment of prostate cancer. Multiple transcript variants encoding different isoforms have been found for this gene. TGFB1I1 NA
ENSG00000116729 79971 wntless Wnt ligand secretion mediator NA WLS NA
ENSG00000157110 11030 RNA binding protein with multiple splicing This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. RBPMS NA
ENSG00000064999 23294 ankyrin repeat and sterile alpha motif domain containing 1A NA ANKS1A NA
ENSG00000197894 128 alcohol dehydrogenase 5 (class III), chi polypeptide This gene encodes a member of the alcohol dehydrogenase family. Members of this family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. The encoded protein forms a homodimer. It has virtually no activity for ethanol oxidation, but exhibits high activity for oxidation of long-chain primary alcohols and for oxidation of S-hydroxymethyl-glutathione, a spontaneous adduct between formaldehyde and glutathione. This enzyme is an important component of cellular metabolism for the elimination of formaldehyde, a potent irritant and sensitizing agent that causes lacrymation, rhinitis, pharyngitis, and contact dermatitis. The human genome contains several non-transcribed pseudogenes related to this gene. ADH5 NA
ENSG00000139718 23067 SET domain containing 1B SET1B is a component of a histone methyltransferase complex that produces trimethylated histone H3 at Lys4 (Lee et al., 2007 [PubMed 17355966]). SETD1B NA
ENSG00000128272 468 activating transcription factor 4 This gene encodes a transcription factor that was originally identified as a widely expressed mammalian DNA binding protein that could bind a tax-responsive enhancer element in the LTR of HTLV-1. The encoded protein was also isolated and characterized as the cAMP-response element binding protein 2 (CREB-2). The protein encoded by this gene belongs to a family of DNA-binding proteins that includes the AP-1 family of transcription factors, cAMP-response element binding proteins (CREBs) and CREB-like proteins. These transcription factors share a leucine zipper region that is involved in protein-protein interactions, located C-terminal to a stretch of basic amino acids that functions as a DNA binding domain. Two alternative transcripts encoding the same protein have been described. Two pseudogenes are located on the X chromosome at q28 in a region containing a large inverted duplication. ATF4 NA
ENSG00000237886 ENSG00000237886 NOTCH1 associated lncRNA in T-cell acute lymphoblastic leukemia 1 NA NALT1 NA
ENSG00000213949 3672 integrin subunit alpha 1 This gene encodes the alpha 1 subunit of integrin receptors. This protein heterodimerizes with the beta 1 subunit to form a cell-surface receptor for collagen and laminin. The heterodimeric receptor is involved in cell-cell adhesion and may play a role in inflammation and fibrosis. The alpha 1 subunit contains an inserted (I) von Willebrand factor type I domain which is thought to be involved in collagen binding. ITGA1 NA
ENSG00000240771 115557 Rho guanine nucleotide exchange factor 25 Rho GTPases alternate between an inactive GDP-bound state and an active GTP-bound state, and GEFs facilitate GDP/GTP exchange. This gene encodes a guanine nucleotide exchange factor (GEF) which interacts with Rho GTPases involved in contraction of vascular smooth muscles, regulation of responses to angiotensin II and lens cell differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. ARHGEF25 NA
ENSG00000103202 4833 NME/NM23 nucleoside diphosphate kinase 4 The nucleoside diphosphate (NDP) kinases (EC 2.7.4.6) are ubiquitous enzymes that catalyze transfer of gamma-phosphates, via a phosphohistidine intermediate, between nucleoside and deoxynucleoside tri- and diphosphates. The enzymes are products of the nm23 gene family, which includes NME4 (Milon et al., 1997 [PubMed 9099850]). NME4 NA
ENSG00000101452 60625 DEAH-box helicase 35 DEAD box proteins characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of the DEAD box protein family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. The function of this gene product which is a member of this family, has not been determined. Alternatively spliced transcript variants have been found for this gene. DHX35 NA
ENSG00000174136 285704 repulsive guidance molecule family member b RGMB is a glycosylphosphatidylinositol (GPI)-anchored member of the repulsive guidance molecule family (see RGMA, MIM 607362) and contributes to the patterning of the developing nervous system (Samad et al., 2005 [PubMed 15671031]). RGMB NA
ENSG00000188549 388115 chromosome 15 open reading frame 52 NA C15orf52 NA
ENSG00000243244 11037 stonin 1 Endocytosis of cell surface proteins is mediated by a complex molecular machinery that assembles on the inner surface of the plasma membrane. This gene encodes one of two human homologs of the Drosophila melanogaster stoned B protein. This protein is related to components of the endocytic machinery and exhibits a modular structure consisting of an N-terminal proline-rich domain, a central region of homology specific to the human stoned B-like proteins, and a C-terminal region homologous to the mu subunits of adaptor protein (AP) complexes. Read-through transcription of this gene into the neighboring downstream gene, which encodes TFIIA-alpha/beta-like factor, generates a transcript (SALF), which encodes a fusion protein comprised of sequence sharing identity with each individual gene product. Alternative splicing results in multiple transcript variants. STON1 NA
ENSG00000117013 9132 potassium voltage-gated channel subfamily Q member 4 The protein encoded by this gene forms a potassium channel that is thought to play a critical role in the regulation of neuronal excitability, particularly in sensory cells of the cochlea. The current generated by this channel is inhibited by M1 muscarinic acetylcholine receptors and activated by retigabine, a novel anti-convulsant drug. The encoded protein can form a homomultimeric potassium channel or possibly a heteromultimeric channel in association with the protein encoded by the KCNQ3 gene. Defects in this gene are a cause of nonsyndromic sensorineural deafness type 2 (DFNA2), an autosomal dominant form of progressive hearing loss. Two transcript variants encoding different isoforms have been found for this gene. KCNQ4 NA
ENSG00000149596 57158 junctophilin 2 Junctional complexes between the plasma membrane and endoplasmic/sarcoplasmic reticulum are a common feature of all excitable cell types and mediate cross talk between cell surface and intracellular ion channels. The protein encoded by this gene is a component of junctional complexes and is composed of a C-terminal hydrophobic segment spanning the endoplasmic/sarcoplasmic reticulum membrane and a remaining cytoplasmic domain that shows specific affinity for the plasma membrane. This gene is a member of the junctophilin gene family. Alternative splicing has been observed at this locus and two variants encoding distinct isoforms are described. JPH2 NA
ENSG00000196923 9260 PDZ and LIM domain 7 The protein encoded by this gene is representative of a family of proteins composed of conserved PDZ and LIM domains. LIM domains are proposed to function in protein-protein recognition in a variety of contexts including gene transcription and development and in cytoskeletal interaction. The LIM domains of this protein bind to protein kinases, whereas the PDZ domain binds to actin filaments. The gene product is involved in the assembly of an actin filament-associated complex essential for transmission of ret/ptc2 mitogenic signaling. The biological function is likely to be that of an adapter, with the PDZ domain localizing the LIM-binding proteins to actin filaments of both skeletal muscle and nonmuscle tissues. Alternative splicing of this gene results in multiple transcript variants. PDLIM7 NA
ENSG00000103852 64927 tetratricopeptide repeat domain 23 NA TTC23 NA
ENSG00000163637 166336 prickle planar cell polarity protein 2 This gene encodes a homolog of Drosophila prickle. The exact function of this gene is not known, however, studies in mice suggest that it may be involved in seizure prevention. Mutations in this gene are associated with progressive myoclonic epilepsy type 5. PRICKLE2 NA
ENSG00000261490 ENSG00000261490 NA NA RP11-448G15.3 NA
ENSG00000182175 56963 repulsive guidance molecule family member a This gene encodes a member of the repulsive guidance molecule family. The encoded protein is a glycosylphosphatidylinositol-anchored glycoprotein that functions as an axon guidance protein in the developing and adult central nervous system. This protein may also function as a tumor suppressor in some cancers. Alternate splicing results in multiple transcript variants. RGMA NA
ENSG00000035403 7414 vinculin Vinculin is a cytoskeletal protein associated with cell-cell and cell-matrix junctions, where it is thought to function as one of several interacting proteins involved in anchoring F-actin to the membrane. Defects in VCL are the cause of cardiomyopathy dilated type 1W. Dilated cardiomyopathy is a disorder characterized by ventricular dilation and impaired systolic function, resulting in congestive heart failure and arrhythmia. Multiple alternatively spliced transcript variants have been found for this gene, but the biological validity of some variants has not been determined. VCL NA
ENSG00000116194 9068 angiopoietin like 1 Angiopoietins are members of the vascular endothelial growth factor family and the only known growth factors largely specific for vascular endothelium. Angiopoietin-1, angiopoietin-2, and angiopoietin-4 participate in the formation of blood vessels. The protein encoded by this gene is another member of the angiopoietin family that is widely expressed in adult tissues with mRNA levels highest in highly vascularized tissues. This protein was found to be a secretory protein that does not act as an endothelial cell mitogen in vitro. ANGPTL1 NA
ENSG00000065320 9423 netrin 1 Netrin is included in a family of laminin-related secreted proteins. The function of this gene has not yet been defined; however, netrin is thought to be involved in axon guidance and cell migration during development. Mutations and loss of expression of netrin suggest that variation in netrin may be involved in cancer development. NTN1 NA
ENSG00000155760 8324 frizzled class receptor 7 Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD7 protein contains an N-terminal signal sequence, 10 cysteine residues typical of the cysteine-rich extracellular domain of Fz family members, 7 putative transmembrane domains, and an intracellular C-terminal tail with a PDZ domain-binding motif. FZD7 gene expression may downregulate APC function and enhance beta-catenin-mediated signals in poorly differentiated human esophageal carcinomas. FZD7 NA
ENSG00000145012 4026 LIM domain containing preferred translocation partner in lipoma This gene encodes a member of a subfamily of LIM domain proteins that are characterized by an N-terminal proline-rich region and three C-terminal LIM domains. The encoded protein localizes to the cell periphery in focal adhesions and may be involved in cell-cell adhesion and cell motility. This protein also shuttles through the nucleus and may function as a transcriptional co-activator. This gene is located at the junction of certain disease-related chromosomal translocations, which result in the expression of chimeric proteins that may promote tumor growth. Alternative splicing results in multiple transcript variants. LPP NA
ENSG00000213160 151230 kelch like family member 23 NA KLHL23 NA
ENSG00000213160 100526832 PHOSPHO2-KLHL23 readthrough This locus represents naturally occurring read-through transcription between the neighboring PHOSPHO2 (phosphatase, orphan 2) and KLHL23 (kelch-like 23) genes on chromosome 2. The read-through transcript includes only non-coding PHOSPHO2 exons, and thus encodes the KLHL23 protein. PHOSPHO2-KLHL23 NA
ENSG00000072110 87 actinin alpha 1 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. ACTN1 NA
ENSG00000135931 80210 armadillo repeat containing 9 NA ARMC9 NA
ENSG00000114861 27086 forkhead box P1 This gene belongs to subfamily P of the forkhead box (FOX) transcription factor family. Forkhead box transcription factors play important roles in the regulation of tissue- and cell type-specific gene transcription during both development and adulthood. Forkhead box P1 protein contains both DNA-binding- and protein-protein binding-domains. This gene may act as a tumor suppressor as it is lost in several tumor types and maps to a chromosomal region (3p14.1) reported to contain a tumor suppressor gene(s). Alternative splicing results in multiple transcript variants encoding different isoforms. FOXP1 NA
ENSG00000173175 111 adenylate cyclase 5 This gene encodes a member of the membrane-bound adenylyl cyclase enzymes. Adenylyl cyclases mediate G protein-coupled receptor signaling through the synthesis of the second messenger cAMP. Activity of the encoded protein is stimulated by the Gs alpha subunit of G protein-coupled receptors and is inhibited by protein kinase A, calcium and Gi alpha subunits. Single nucleotide polymorphisms in this gene may be associated with low birth weight and type 2 diabetes. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. ADCY5 NA
ENSG00000071205 79658 Rho GTPase activating protein 10 NA ARHGAP10 NA
ENSG00000118257 8828 neuropilin 2 This gene encodes a member of the neuropilin family of receptor proteins. The encoded transmembrane protein binds to SEMA3C protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C} and SEMA3F protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F}, and interacts with vascular endothelial growth factor (VEGF). This protein may play a role in cardiovascular development, axon guidance, and tumorigenesis. Multiple transcript variants encoding distinct isoforms have been identified for this gene. NRP2 NA
ENSG00000182095 84629 trinucleotide repeat containing 18 NA TNRC18 NA
ENSG00000224713 ENSG00000224713 NA NA AC025165.8 NA
ENSG00000087448 57542 kelch like family member 42 NA KLHL42 NA
ENSG00000166444 6764 suppression of tumorigenicity 5 This gene was identified by its ability to suppress the tumorigenicity of Hela cells in nude mice. The protein encoded by this gene contains a C-terminal region that shares similarity with the Rab 3 family of small GTP binding proteins. This protein preferentially binds to the SH3 domain of c-Abl kinase, and acts as a regulator of MAPK1/ERK2 kinase, which may contribute to its ability to reduce the tumorigenic phenotype in cells. Three alternatively spliced transcript variants of this gene encoding distinct isoforms are identified. ST5 NA
ENSG00000010803 22955 sex comb on midleg homolog 1 (Drosophila) NA SCMH1 NA
ENSG00000151240 22982 disco interacting protein 2 homolog C This gene encodes a member of the disco-interacting protein homolog 2 family. The protein shares strong similarity with a Drosophila protein which interacts with the transcription factor disco and is expressed in the nervous system. DIP2C NA
ENSG00000166166 115708 tRNA methyltransferase 61A NA TRMT61A NA
ENSG00000138080 11117 elastin microfibril interfacer 1 This gene encodes an extracellular matrix glycoprotein that is characterized by an N-terminal microfibril interface domain, a coiled-coil alpha-helical domain, a collagenous domain and a C-terminal globular C1q domain. The encoded protein associates with elastic fibers at the interface between elastin and microfibrils and may play a role in the development of elastic tissues including large blood vessels, dermis, heart and lung. EMILIN1 NA
ENSG00000231346 ENSG00000231346 long intergenic non-protein coding RNA 1160 NA LINC01160 NA
ENSG00000165995 783 calcium voltage-gated channel auxiliary subunit beta 2 This gene encodes a subunit of a voltage-dependent calcium channel protein that is a member of the voltage-gated calcium channel superfamily. The gene product was originally identified as an antigen target in Lambert-Eaton myasthenic syndrome, an autoimmune disorder. Mutations in this gene are associated with Brugada syndrome. Alternatively spliced variants encoding different isoforms have been described. CACNB2 NA
ENSG00000117569 58155 polypyrimidine tract binding protein 2 The protein encoded by this gene binds to intronic polypyrimidine clusters in pre-mRNA molecules and is implicated in controlling the assembly of other splicing-regulatory proteins. This protein is very similar to the polypyrimidine tract binding protein (PTB) but most of its isoforms are expressed primarily in the brain. Alternative splicing results in multiple transcript variants. PTBP2 NA
ENSG00000166333 3611 integrin linked kinase This gene encodes a protein with a kinase-like domain and four ankyrin-like repeats. The encoded protein associates at the cell membrane with the cytoplasmic domain of beta integrins, where it regulates integrin-mediated signal transduction. Activity of this protein is important in the epithelial to mesenchymal transition, and over-expression of this gene is implicated in tumor growth and metastasis. Alternative splicing results in multiple transcript variants. ILK NA
ENSG00000163431 25802 leiomodin 1 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. LMOD1 NA
ENSG00000179954 284297 scavenger receptor cysteine rich family, 5 domains NA SSC5D NA
ENSG00000162341 219931 two pore segment channel 2 This gene encodes a putative cation-selective ion channel with two repeats of a six-transmembrane-domain. The protein localizes to lysosomal membranes and enables nicotinic acid adenine dinucleotide phosphate (NAADP) -induced calcium ion release from lysosome-related stores. This ubiquitously expressed gene has elevated expression in liver and kidney. Two common nonsynonymous SNPs in this gene strongly associate with blond versus brown hair pigmentation. TPCN2 NA
ENSG00000197361 283807 F-box and leucine-rich repeat protein 22 This gene encodes a member of the F-box protein family. This F-box protein interacts with S-phase kinase-associated protein 1A and cullin in order to form SCF complexes which function as ubiquitin ligases. FBXL22 NA
ENSG00000155858 134353 LSM11, U7 small nuclear RNA associated NA LSM11 NA
ENSG00000073712 10979 fermitin family member 2 NA FERMT2 NA
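# unique(): an Ensembl ID that maps to several records (e.g. ENSG00000213160 above) is written only once
write.table(unique(as.character(out$query)), paste0("../utilities/gene_names_clus_",4,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);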
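
The Cluster 4 table lists ENSG00000213160 twice, once as KLHL23 and once as the PHOSPHO2-KLHL23 readthrough, so the query column can contain repeated IDs. A minimal sketch of how such IDs could be spotted is given below. It runs on a toy data frame standing in for as.data.frame(out); the names hits, dup_ids and ids_to_write are illustrative and not part of the original script.

```r
# Toy stand-in for as.data.frame(out); only the columns needed here.
hits <- data.frame(
  query  = c("ENSG00000213160", "ENSG00000213160", "ENSG00000072110"),
  symbol = c("KLHL23", "PHOSPHO2-KLHL23", "ACTN1"),
  stringsAsFactors = FALSE
)

# Query IDs that returned more than one annotation record.
dup_ids <- unique(hits$query[duplicated(hits$query)])

# One entry per queried gene, in order of first appearance.
ids_to_write <- unique(hits$query)

print(dup_ids)
print(ids_to_write)
```

Inspecting dup_ids before writing the gene list shows which Ensembl IDs map to more than one record.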

Cluster 5 Annotations

out <- mygene::queryMany(gene_list[5,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
regulator of G-protein signaling 5 8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. RGS5 ENSG00000143248 NA
matrix Gla protein 4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. MGP ENSG00000111341 NA
AE binding protein 1 165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AEBP1 ENSG00000106624 NA
insulin like growth factor binding protein 7 3490 This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). IGFBP7 ENSG00000163453 NA
milk fat globule-EGF factor 8 protein 4240 This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. MFGE8 ENSG00000140545 NA
melanoma cell adhesion molecule 4162 NA MCAM ENSG00000076706 NA
integrin subunit alpha 8 8516 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. ITGA8 ENSG00000077943 NA
elastin 2006 This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. ELN ENSG00000049540 NA
actin, alpha 2, smooth muscle, aorta 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2 ENSG00000107796 NA
myosin, heavy chain 10, non-muscle 4628 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. MYH10 ENSG00000133026 NA
latent transforming growth factor beta binding protein 2 4053 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. LTBP2 ENSG00000119681 NA
proline/arginine-rich end leucine-rich repeat protein 5549 The protein encoded by this gene is a leucine-rich repeat protein present in connective tissue extracellular matrix. This protein functions as a molecule anchoring basement membranes to the underlying connective tissue. This protein has been shown to bind type I collagen to basement membranes and type II collagen to cartilage. It also binds the basement membrane heparan sulfate proteoglycan perlecan. This protein is suggested to be involved in the pathogenesis of Hutchinson-Gilford progeria (HGP), which is reported to lack the binding of collagen in basement membranes and cartilage. Alternatively spliced transcript variants encoding the same protein have been observed. PRELP ENSG00000188783 NA
myosin, heavy chain 9, non-muscle 4627 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. MYH9 ENSG00000100345 NA
frizzled-related protein 2487 The protein encoded by this gene is a secreted protein that is involved in the regulation of bone development. Defects in this gene are a cause of female-specific osteoarthritis (OA) susceptibility. FRZB ENSG00000162998 NA
osteoglycin 4969 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family of proteins. The encoded protein induces ectopic bone formation in conjunction with transforming growth factor beta and may regulate osteoblast differentiation. High expression of the encoded protein may be associated with elevated heart left ventricular mass. Alternative splicing results in multiple transcript variants. OGN ENSG00000106809 NA
fibromodulin 2331 Fibromodulin belongs to the family of small interstitial proteoglycans. The encoded protein possesses a central region containing leucine-rich repeats with 4 keratan sulfate chains, flanked by terminal domains containing disulphide bonds. Owing to the interaction with type I and type II collagen fibrils and in vitro inhibition of fibrillogenesis, the encoded protein may play a role in the assembly of extracellular matrix. It may also regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix. Sequence variations in this gene may be associated with the pathogenesis of high myopia. Alternative splicing results in multiple transcript variants. FMOD ENSG00000122176 NA
SPARC like 1 8404 NA SPARCL1 ENSG00000152583 NA
connective tissue growth factor 1490 The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. CTGF ENSG00000118523 NA
filamin binding LIM protein 1 54751 This gene encodes a protein with an N-terminal filamin-binding domain, a central proline-rich domain, and multiple C-terminal LIM domains. This protein localizes at cell junctions and may link cell adhesion structures to the actin cytoskeleton. This protein may be involved in the assembly and stabilization of actin-filaments and likely plays a role in modulating cell adhesion, cell morphology and cell motility. This protein also localizes to the nucleus and may affect cardiomyocyte differentiation after binding with the CSX/NKX2-5 transcription factor. Alternative splicing results in multiple transcript variants encoding different isoforms. FBLIM1 ENSG00000162458 NA
IGFBP7 antisense RNA 1 255130 NA IGFBP7-AS1 ENSG00000245067 NA
ACTA2 antisense RNA 1 ENSG00000180139 NA ACTA2-AS1 ENSG00000180139 NA
latent transforming growth factor beta binding protein 1 4052 The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. LTBP1 ENSG00000049323 NA
transglutaminase 2 7052 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. TGM2 ENSG00000198959 NA
WNT1 inducible signaling pathway protein 2 8839 This gene encodes a member of the WNT1 inducible signaling pathway (WISP) protein subfamily, which belongs to the connective tissue growth factor (CTGF) family. WNT1 is a member of a family of cysteine-rich, glycosylated signaling proteins that mediate diverse developmental processes. The CTGF family members are characterized by four conserved cysteine-rich domains: insulin-like growth factor-binding domain, von Willebrand factor type C module, thrombospondin domain and C-terminal cystine knot-like (CT) domain. The encoded protein lacks the CT domain which is implicated in dimerization and heparin binding. It is 72% identical to the mouse protein at the amino acid level. This gene may be downstream in the WNT1 signaling pathway that is relevant to malignant transformation. Its expression in colon tumors is reduced while the other two WISP members are overexpressed in colon tumors. It is expressed at high levels in bone tissue, and may play an important role in modulating bone turnover. WISP2 ENSG00000064205 NA
actinin alpha 4 81 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. ACTN4 ENSG00000130402 NA
chloride intracellular channel 4 25932 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 4 (CLIC4) protein, encoded by the CLIC4 gene, is a member of the p64 family; the gene is expressed in many tissues and exhibits an intracellular vesicular pattern in Panc-1 cells (pancreatic cancer cells). CLIC4 ENSG00000169504 NA
notch 3 4854 This gene encodes the third discovered human homologue of the Drosophila melanogaster type I membrane protein notch. In Drosophila, notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signalling pathway that plays a key role in neural development. Homologues of the notch-ligands have also been identified in human, but precise interactions between these ligands and the human notch homologues remain to be determined. Mutations in NOTCH3 have been identified as the underlying cause of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). NOTCH3 ENSG00000074181 NA
cytokine receptor-like factor 1 9244 This gene encodes a member of the cytokine type I receptor family. The protein forms a secreted complex with cardiotrophin-like cytokine factor 1 and acts on cells expressing ciliary neurotrophic factor receptors. The complex can promote survival of neuronal cells. Mutations in this gene result in Crisponi syndrome and cold-induced sweating syndrome. CRLF1 ENSG00000006016 NA
integrin subunit alpha 11 22801 This gene encodes an alpha integrin. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein contains an I domain, is expressed in muscle tissue, dimerizes with beta 1 integrin in vitro, and appears to bind collagen in this form. Therefore, the protein may be involved in attaching muscle tissue to the extracellular matrix. Alternative transcriptional splice variants have been found for this gene, but their biological validity is not determined. ITGA11 ENSG00000137809 NA
SPARC related modular calcium binding 2 64094 This gene encodes a member of the SPARC family (secreted protein acidic and rich in cysteine/osteonectin/BM-40), which are highly expressed during embryogenesis and wound healing. The gene product is a matricellular protein which promotes matrix assembly and can stimulate endothelial cell proliferation and migration, as well as angiogenic activity. Associated with pulmonary function, this secretory gene product contains a Kazal domain, two thymoglobulin type-1 domains, and two EF-hand calcium-binding domains. The encoded protein may serve as a target for controlling angiogenesis in tumor growth and myocardial ischemia. Alternative splicing results in multiple transcript variants. SMOC2 ENSG00000112562 NA
ras homolog family member B 388 NA RHOB ENSG00000143878 NA
thrombospondin 2 7058 The protein encoded by this gene belongs to the thrombospondin family. It is a disulfide-linked homotrimeric glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein has been shown to function as a potent inhibitor of tumor growth and angiogenesis. Studies of the mouse counterpart suggest that this protein may modulate the cell surface properties of mesenchymal cells and be involved in cell adhesion and migration. THBS2 ENSG00000186340 NA
anthrax toxin receptor 1 84168 This gene encodes a type I transmembrane protein and is a tumor-specific endothelial marker that has been implicated in colorectal cancer. The encoded protein has been shown to also be a docking protein or receptor for Bacillus anthracis toxin, the causative agent of the disease, anthrax. The binding of the protective antigen (PA) component, of the tripartite anthrax toxin, to this receptor protein mediates delivery of toxin components to the cytosol of cells. Once inside the cell, the other two components of anthrax toxin, edema factor (EF) and lethal factor (LF) disrupt normal cellular processes. Three alternatively spliced variants that encode different protein isoforms have been described. ANTXR1 ENSG00000169604 NA
EGF containing fibulin-like extracellular matrix protein 1 2202 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. EFEMP1 ENSG00000115380 NA
myosin ID 4642 NA MYO1D ENSG00000176658 NA
transmembrane protein 181 57583 The TMEM181 gene encodes a putative G protein-coupled receptor expressed on the cell surface (Carette et al., 2009 [PubMed 19965467]; Wollscheid et al., 2009 [PubMed 19349973]). TMEM181 ENSG00000146433 NA
prostate transmembrane protein, androgen induced 1 56937 This gene encodes a transmembrane protein that contains a Smad interacting motif (SIM). Expression of this gene is induced by androgens and transforming growth factor beta, and the encoded protein suppresses the androgen receptor and transforming growth factor beta signaling pathways through interactions with Smad proteins. Overexpression of this gene may play a role in multiple types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. PMEPA1 ENSG00000124225 NA
coiled-coil domain containing 3 83643 NA CCDC3 ENSG00000151468 NA
TIMP metallopeptidase inhibitor 2 7077 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. TIMP2 ENSG00000035862 NA
collagen type XVIII alpha 1 80781 This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. COL18A1 ENSG00000182871 NA
destrin, actin depolymerizing factor 11034 The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. DSTN ENSG00000125868 NA
integrin subunit alpha 10 8515 Integrins are integral transmembrane glycoproteins composed of noncovalently linked alpha and beta chains. They participate in cell adhesion as well as cell-surface mediated signalling. This gene encodes an integrin alpha chain and is expressed at high levels in chondrocytes, where it is transcriptionally regulated by AP-2epsilon and Ets-1. The protein encoded by this gene binds to collagen. Alternative splicing results in multiple transcript variants. ITGA10 ENSG00000143127 NA
forkhead box C1 2296 This gene belongs to the forkhead family of transcription factors which is characterized by a distinct DNA-binding forkhead domain. The specific function of this gene has not yet been determined; however, it has been shown to play a role in the regulation of embryonic and ocular development. Mutations in this gene cause various glaucoma phenotypes including primary congenital glaucoma, autosomal dominant iridogoniodysgenesis anomaly, and Axenfeld-Rieger anomaly. FOXC1 ENSG00000054598 NA
superoxide dismutase 3, extracellular 6649 This gene encodes a member of the superoxide dismutase (SOD) protein family. SODs are antioxidant enzymes that catalyze the conversion of superoxide radicals into hydrogen peroxide and oxygen, which may protect the brain, lungs, and other tissues from oxidative stress. Proteolytic processing of the encoded protein results in the formation of two distinct homotetramers that differ in their ability to interact with the extracellular matrix (ECM). Homotetramers consisting of the intact protein, or type C subunit, exhibit high affinity for heparin and are anchored to the ECM. Homotetramers consisting of a proteolytically cleaved form of the protein, or type A subunit, exhibit low affinity for heparin and do not interact with the ECM. A mutation in this gene may be associated with increased heart disease risk. SOD3 ENSG00000109610 NA
jun B proto-oncogene 3726 NA JUNB ENSG00000171223 NA
versican 1462 This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. VCAN ENSG00000038427 NA
protein phosphatase 1 regulatory inhibitor subunit 14A 94274 The protein encoded by this gene belongs to the protein phosphatase 1 (PP1) inhibitor family. This protein is an inhibitor of smooth muscle myosin phosphatase, and has higher inhibitory activity when phosphorylated. Inhibition of myosin phosphatase leads to increased myosin phosphorylation and enhanced smooth muscle contraction. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. PPP1R14A ENSG00000167641 NA
insulin like growth factor binding protein 2 3485 The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. IGFBP2 ENSG00000115457 NA
integrin subunit beta 5 3693 NA ITGB5 ENSG00000082781 NA
fibulin 5 10516 The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). FBLN5 ENSG00000140092 NA
tenascin C 3371 This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. TNC ENSG00000041982 NA
protein kinase, cGMP-dependent, type I 5592 Mammals have three different isoforms of cyclic GMP-dependent protein kinase (Ialpha, Ibeta, and II). These PRKG isoforms act as key mediators of the nitric oxide/cGMP signaling pathway and are important components of many signal transduction processes in diverse cell types. This PRKG1 gene on human chromosome 10 encodes the soluble Ialpha and Ibeta isoforms of PRKG by alternative transcript splicing. A separate gene on human chromosome 4, PRKG2, encodes the membrane-bound PRKG isoform II. The PRKG1 proteins play a central role in regulating cardiovascular and neuronal functions in addition to relaxing smooth muscle tone, preventing platelet aggregation, and modulating cell growth. This gene is most strongly expressed in all types of smooth muscle, platelets, cerebellar Purkinje cells, hippocampal neurons, and the lateral amygdala. Isoforms Ialpha and Ibeta have identical cGMP-binding and catalytic domains but differ in their leucine/isoleucine zipper and autoinhibitory sequences and therefore differ in their dimerization substrates and kinase enzyme activity. PRKG1 ENSG00000185532 NA
CD151 molecule (Raph blood group) 977 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins and other transmembrane 4 superfamily proteins. It is involved in cellular processes including cell adhesion and may regulate integrin trafficking and/or function. This protein enhances cell motility, invasion and metastasis of cancer cells. Multiple alternatively spliced transcript variants that encode the same protein have been described for this gene. CD151 ENSG00000177697 NA
filamin A interacting protein 1-like 11259 NA FILIP1L ENSG00000168386 NA
transgelin 6876 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. TAGLN ENSG00000149591 NA
serine/threonine kinase 38 like 23012 NA STK38L ENSG00000211455 NA
tumor necrosis factor receptor superfamily member 11b 4982 The protein encoded by this gene is a member of the TNF-receptor superfamily. This protein is an osteoblast-secreted decoy receptor that functions as a negative regulator of bone resorption. This protein specifically binds to its ligand, osteoprotegerin ligand, both of which are key extracellular regulators of osteoclast development. Studies of the mouse counterpart also suggest that this protein and its ligand play a role in lymph-node organogenesis and vascular calcification. Alternatively spliced transcript variants of this gene have been reported, but their full length nature has not been determined. TNFRSF11B ENSG00000164761 NA
cysteine and glycine rich protein 2 1466 CSRP2 is a member of the CSRP family of genes, encoding a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. CRP2 contains two copies of the cysteine-rich amino acid sequence motif (LIM) with putative zinc-binding activity, and may be involved in regulating ordered cell growth. Other genes in the family include CSRP1 and CSRP3. Alternative splicing results in multiple transcript variants. CSRP2 ENSG00000175183 NA
tubulointerstitial nephritis antigen like 1 64129 The protein encoded by this gene is similar in sequence to tubulointerstitial nephritis antigen, a secreted glycoprotein that is recognized by antibodies in some types of immune-related tubulointerstitial nephritis. Three transcript variants encoding different isoforms have been found for this gene. TINAGL1 ENSG00000142910 NA
polycystin 2, transient receptor potential cation channel 5311 This gene encodes a member of the polycystin protein family. The encoded protein is a multi-pass membrane protein that functions as a calcium permeable cation channel, and is involved in calcium transport and calcium signaling in renal epithelial cells. This protein interacts with polycystin 1, and they may be partners in a common signaling cascade involved in tubular morphogenesis. Mutations in this gene are associated with autosomal dominant polycystic kidney disease type 2. PKD2 ENSG00000118762 NA
Rho guanine nucleotide exchange factor 17 9828 NA ARHGEF17 ENSG00000110237 NA
nephroblastoma overexpressed 4856 The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. NOV ENSG00000136999 NA
adhesion molecule with Ig-like domain 2 347902 NA AMIGO2 ENSG00000139211 NA
carboxypeptidase X (M14 family), member 2 119587 NA CPXM2 ENSG00000121898 NA
chondroitin sulfate proteoglycan 4 1464 A human melanoma-associated chondroitin sulfate proteoglycan plays a role in stabilizing cell-substratum interactions during early events of melanoma cell spreading on endothelial basement membranes. CSPG4 represents an integral membrane chondroitin sulfate proteoglycan expressed by human malignant melanoma cells. CSPG4 ENSG00000173546 NA
cytochrome b5 reductase 3 1727 This gene encodes cytochrome b5 reductase, which includes a membrane-bound form in somatic cells (anchored in the endoplasmic reticulum, mitochondrial and other membranes) and a soluble form in erythrocytes. The membrane-bound form exists mainly on the cytoplasmic side of the endoplasmic reticulum and functions in desaturation and elongation of fatty acids, in cholesterol biosynthesis, and in drug metabolism. The erythrocyte form is located in a soluble fraction of circulating erythrocytes and is involved in methemoglobin reduction. The membrane-bound form has both membrane-binding and catalytic domains, while the soluble form has only the catalytic domain. Alternate splicing results in multiple transcript variants. Mutations in this gene cause methemoglobinemias. CYB5R3 ENSG00000100243 NA
SMAD family member 7 4092 The protein encoded by this gene is a nuclear protein that binds the E3 ubiquitin ligase SMURF2. Upon binding, this complex translocates to the cytoplasm, where it interacts with TGF-beta receptor type-1 (TGFBR1), leading to the degradation of both the encoded protein and TGFBR1. Expression of this gene is induced by TGFBR1. Variations in this gene are a cause of susceptibility to colorectal cancer type 3 (CRCS3). Several transcript variants encoding different isoforms have been found for this gene. SMAD7 ENSG00000101665 NA
tensin 1 7145 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. TNS1 ENSG00000079308 NA
vimentin 7431 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. VIM ENSG00000026025 NA
Yes associated protein 1 10413 This gene encodes a downstream nuclear effector of the Hippo signaling pathway which is involved in development, growth, repair, and homeostasis. This gene is known to play a role in the development and progression of multiple cancers as a transcriptional regulator of this signaling pathway and may function as a potential target for cancer treatment. Alternative splicing results in multiple transcript variants encoding different isoforms. YAP1 ENSG00000137693 NA
regulator of calcineurin 2 10231 This gene encodes a member of the regulator of calcineurin (RCAN) protein family. These proteins play a role in many physiological processes by binding to the catalytic domain of calcineurin A, inhibiting calcineurin-mediated nuclear translocation of the transcription factor NFATC1. Expression of this gene in skin fibroblasts is upregulated by thyroid hormone, and the encoded protein may also play a role in endothelial cell function and angiogenesis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. RCAN2 ENSG00000172348 NA
collagen type IV alpha 2 1284 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. COL4A2 ENSG00000134871 NA
hes related family bHLH transcription factor with YRPW motif 2 23493 This gene encodes a member of the hairy and enhancer of split-related (HESR) family of basic helix-loop-helix (bHLH)-type transcription factors. The encoded protein forms homo- or hetero-dimers that localize to the nucleus and interact with a histone deacetylase complex to repress transcription. Expression of this gene is induced by the Notch signal transduction pathway. Two similar and redundant genes in mouse are required for embryonic cardiovascular development, and are also implicated in neurogenesis and somitogenesis. Alternatively spliced transcript variants have been found, but their biological validity has not been determined. HEY2 ENSG00000135547 NA
thromboxane A2 receptor 6915 This gene encodes a member of the G protein-coupled receptor family. The protein interacts with thromboxane A2 to induce platelet aggregation and regulate hemostasis. A mutation in this gene results in a bleeding disorder. Multiple transcript variants encoding different isoforms have been found for this gene. TBXA2R ENSG00000006638 NA
integrin subunit alpha 5 3678 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. ITGA5 ENSG00000161638 NA
zinc finger homeobox 3 463 This gene encodes a transcription factor with multiple homeodomains and zinc finger motifs, and regulates myogenic and neuronal differentiation. The encoded protein suppresses expression of the alpha-fetoprotein gene by binding to an AT-rich enhancer motif. The protein has also been shown to negatively regulate c-Myb, and transactivate the cell cycle inhibitor cyclin-dependent kinase inhibitor 1A (also known as p21CIP1). This gene is reported to function as a tumor suppressor in several cancers, and sequence variants of this gene are also associated with atrial fibrillation. Multiple transcript variants expressed from alternate promoters and encoding different isoforms have been found for this gene. ZFHX3 ENSG00000140836 NA
secreted frizzled-related protein 2 6423 This gene encodes a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins. SFRPs act as soluble modulators of Wnt signaling. Methylation of this gene is a potential marker for the presence of colorectal cancer. SFRP2 ENSG00000145423 NA
inhibitor of DNA binding 2, HLH protein 3398 The protein encoded by this gene belongs to the inhibitor of DNA binding family, members of which are transcriptional regulators that contain a helix-loop-helix (HLH) domain but not a basic domain. Members of the inhibitor of DNA binding family inhibit the functions of basic helix-loop-helix transcription factors in a dominant-negative manner by suppressing their heterodimerization partners through the HLH domains. This protein may play a role in negatively regulating cell differentiation. A pseudogene of this gene is located on chromosome 3. ID2 ENSG00000115738 NA
dishevelled-binding antagonist of beta-catenin 3 147906 NA DACT3 ENSG00000197380 NA
jagged 1 182 The jagged 1 protein encoded by JAG1 is the human homolog of the Drosophila jagged protein. Human jagged 1 is the ligand for the receptor notch 1, the latter a human homolog of the Drosophila jagged receptor notch. Mutations that alter the jagged 1 protein cause Alagille syndrome. Jagged 1 signalling through notch 1 has also been shown to play a role in hematopoiesis. JAG1 ENSG00000101384 NA
NA ENSG00000232415 NA CTB-51J22.1 ENSG00000232415 NA
potassium channel tetramerization domain containing 10 83892 The protein encoded by this gene binds proliferating cell nuclear antigen (PCNA) and may be involved in DNA synthesis and cell proliferation. In addition, the encoded protein may be a tumor suppressor. Several protein-coding and non-protein coding transcript variants have been found for this gene. KCTD10 ENSG00000110906 NA
hes related family bHLH transcription factor with YRPW motif-like 26508 This gene encodes a member of the hairy and enhancer of split-related (HESR) family of basic helix-loop-helix (bHLH)-type transcription factors. The sequence of the encoded protein contains a conserved bHLH and orange domain, but its YRPW motif has diverged from other HESR family members. It is thought to be an effector of Notch signaling and a regulator of cell fate decisions. Alternatively spliced transcript variants have been found, but their biological validity has not been determined. HEYL ENSG00000163909 NA
hyaluronan and proteoglycan link protein 3 145864 This gene belongs to the hyaluronan and proteoglycan binding link protein gene family. The protein encoded by this gene may function in hyaluronic acid binding and cell adhesion. HAPLN3 ENSG00000140511 NA
muscleblind like splicing regulator 1 4154 This gene encodes a member of the muscleblind protein family which was initially described in Drosophila melanogaster. The encoded protein is a C3H-type zinc finger protein that modulates alternative splicing of pre-mRNAs. Muscleblind proteins bind specifically to expanded dsCUG RNA but not to normal size CUG repeats and may thereby play a role in the pathophysiology of myotonic dystrophy. Mice lacking this gene exhibited muscle abnormalities and cataracts. Several alternatively spliced transcript variants have been described but the full-length natures of only some have been determined. The different isoforms are thought to have different binding specificities and/or splicing activities. MBNL1 ENSG00000152601 NA
microfibrillar associated protein 4 4239 This gene encodes a protein with similarity to a bovine microfibril-associated protein. The protein has binding specificities for both collagen and carbohydrate. It is thought to be an extracellular matrix protein which is involved in cell adhesion or intercellular interactions. The gene is located within the Smith-Magenis syndrome region. Two transcript variants encoding different isoforms have been found for this gene. MFAP4 ENSG00000166482 NA
Wilms tumor 1 interacting protein 126374 NA WTIP ENSG00000142279 NA
platelet derived growth factor subunit A 5154 This gene encodes a member of the protein family comprised of both platelet-derived growth factors (PDGF) and vascular endothelial growth factors (VEGF). The encoded preproprotein is proteolytically processed to generate platelet-derived growth factor subunit A, which can homodimerize, or alternatively, heterodimerize with the related platelet-derived growth factor subunit B. These proteins bind and activate PDGF receptor tyrosine kinases, which play a role in a wide range of developmental processes. Alternative splicing results in multiple transcript variants. PDGFA ENSG00000197461 NA
protein kinase C delta binding protein 112464 The protein encoded by this gene was identified as a binding protein of the protein kinase C, delta (PRKCD). The expression of this gene in cultured cell lines is strongly induced by serum starvation. The expression of this protein was found to be down-regulated in various cancer cell lines, suggesting the possible tumor suppressor function of this protein. PRKCDBP ENSG00000170955 NA
NA NA NA NA ENSG00000255905 TRUE
growth arrest and DNA damage inducible beta 4616 This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The genes in this group respond to environmental stresses by mediating activation of the p38/JNK pathway. This activation is mediated via their proteins binding and activating MTK1/MEKK4 kinase, which is an upstream activator of both p38 and JNK MAPKs. The function of these genes or their protein products is involved in the regulation of growth and apoptosis. These genes are regulated by different mechanisms, but they are often coordinately expressed and can function cooperatively in inhibiting cell growth. GADD45B ENSG00000099860 NA
atlastin GTPase 3 25923 This gene encodes a member of a family of dynamin-like, integral membrane GTPases. The encoded protein is required for the proper formation of the network of interconnected tubules of the endoplasmic reticulum. Mutations in this gene may be associated with hereditary sensory neuropathy type IF. Alternatively spliced transcript variants that encode distinct isoforms have been described. ATL3 ENSG00000184743 NA
adipogenesis regulatory factor 10974 APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown. ADIRF ENSG00000148671 NA
Sad1 and UNC84 domain containing 2 25777 SUN1 (MIM 607723) and SUN2 are inner nuclear membrane (INM) proteins that play a major role in nuclear-cytoplasmic connection by formation of a ‘bridge’ across the nuclear envelope, known as the LINC complex, via interaction with the conserved luminal KASH domain of nesprins (e.g., SYNE1; MIM 608441) located in the outer nuclear membrane (ONM). The LINC complex provides a direct connection between the nuclear lamina and the cytoskeleton, which contributes to nuclear positioning and cellular rigidity (summary by Haque et al., 2010 [PubMed 19933576]). SUN2 ENSG00000100242 NA
murine retrovirus integration site 1 homolog 10335 This gene is similar to a putative mouse tumor suppressor gene (Mrvi1) that is frequently disrupted by mouse AIDS-related virus (MRV). The encoded protein, which is found in the membrane of the endoplasmic reticulum, is similar to Jaw1, a lymphoid-restricted protein whose expression is down-regulated during lymphoid differentiation. This protein is a substrate of cGMP-dependent kinase-1 (PKG1) that can function as a regulator of IP3-induced calcium release. Studies in mouse suggest that MRV integration at Mrvi1 induces myeloid leukemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, and thus this gene may function as a myeloid leukemia tumor suppressor gene. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, and alternative translation start sites, including a non-AUG (CUG) start site, are used. MRVI1 ENSG00000072952 NA
extracellular matrix protein 2 1842 ECM2 encodes extracellular matrix protein 2, so named because it shares extensive similarity with known extracellular matrix proteins. Three transcript variants encoding different isoforms have been found for this gene. ECM2 ENSG00000106823 NA
Janus kinase 2 3717 This gene product is a protein tyrosine kinase involved in a specific subset of cytokine receptor signaling pathways. It has been found to be constitutively associated with the prolactin receptor and is required for responses to gamma interferon. Mice that do not express an active protein for this gene exhibit embryonic lethality associated with the absence of definitive erythropoiesis. JAK2 ENSG00000096968 NA
VIM antisense RNA 1 100507347 NA VIM-AS1 ENSG00000229124 NA
latent transforming growth factor beta binding protein 3 4054 The protein encoded by this gene forms a complex with transforming growth factor beta (TGF-beta) proteins and may be involved in their subcellular localization. Activation of this complex requires removal of the encoded binding protein. This protein also may play a structural role in the extracellular matrix. Three transcript variants encoding different isoforms have been found for this gene. LTBP3 ENSG00000168056 NA
polymerase I and transcript release factor 284119 This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. PTRF ENSG00000177469 NA
write.table(as.factor(out$query), paste0("../utilities/gene_names_clus_",5,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
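
The rows above suggest two things that may be worth handling before the IDs are saved: some queries come back unmatched (ENSG00000255905 has notfound set to TRUE), and the column order of the annotation table is not the same from one cluster to the next. The following is a minimal sketch of one way to deal with both, using base R only. The helpers `tidy_annotations` and `write_cluster_genes` are hypothetical names introduced here and are not part of mygene; the toy data frame stands in for `as.data.frame(out)` and reuses three rows from the table above. Dropping unmatched IDs is an assumption about what is wanted downstream, not what the original call does (it writes every query).

```r
# Hypothetical helper: drop unmatched queries and fix the column order.
tidy_annotations <- function(out_df,
                             cols = c("symbol", "query", "name", "X_id", "summary")) {
  # The notfound column is only present when at least one query had no match.
  if ("notfound" %in% names(out_df)) {
    keep <- is.na(out_df$notfound) | !out_df$notfound
    out_df <- out_df[keep, , drop = FALSE]
  }
  # Keep the requested columns that exist, always in the same order.
  out_df[, intersect(cols, names(out_df)), drop = FALSE]
}

# Hypothetical helper: write one Ensembl ID per line for a given cluster.
# The default directory is a temporary one (an assumption for this sketch).
write_cluster_genes <- function(out_df, cluster, dir = tempdir()) {
  path <- file.path(dir, paste0("gene_names_clus_", cluster, ".txt"))
  writeLines(as.character(out_df$query), path)
  invisible(path)
}

# Toy stand-in for as.data.frame(out), built from rows shown above.
toy <- data.frame(
  name     = c("transgelin", NA, "vimentin"),
  X_id     = c("6876", NA, "7431"),
  symbol   = c("TAGLN", NA, "VIM"),
  query    = c("ENSG00000149591", "ENSG00000255905", "ENSG00000026025"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

tidy <- tidy_annotations(toy)
path <- write_cluster_genes(tidy, cluster = 5)
```

Passing `dir = "../utilities"` would mirror the location used in the script. Keeping the unmatched IDs instead only requires skipping `tidy_annotations` before writing.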

Cluster 6 Annotations

out <- mygene::queryMany(gene_list[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query summary name X_id
KRT10 ENSG00000186395 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 3858
KRT1 ENSG00000167768 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 1 3848
KRT2 ENSG00000172867 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 2 3849
LOR ENSG00000203782 This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. loricrin 4014
KRT14 ENSG00000186847 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. keratin 14 3861
DMKN ENSG00000161249 This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. dermokine 93099
DCD ENSG00000161634 This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms. dermcidin 117159
KRTDAP ENSG00000188508 This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene. keratinocyte differentiation associated protein 388533
CALML5 ENSG00000178372 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. calmodulin like 5 51806
SBSN ENSG00000189001 NA suprabasin 374897
ASPRV1 ENSG00000244617 NA aspartic peptidase, retroviral-like 1 151516
DSP ENSG00000096696 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. desmoplakin 1832
TMEM45A ENSG00000181458 NA transmembrane protein 45A 55076
CDHR1 ENSG00000148600 This gene belongs to the cadherin superfamily of calcium-dependent cell adhesion molecules. The encoded protein is a photoreceptor-specific cadherin that plays a role in outer segment disc morphogenesis. Mutations in this gene are associated with inherited retinal dystrophies. Alternatively spliced transcript variants encoding different isoforms have been identified. cadherin related family member 1 92211
LY6G6C ENSG00000204421 LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]). lymphocyte antigen 6 complex, locus G6C 80740
PKP1 ENSG00000081277 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. plakophilin 1 5317
DEGS1 ENSG00000143753 This gene encodes a member of the membrane fatty acid desaturase family which is responsible for inserting double bonds into specific positions in fatty acids. This protein contains three His-containing consensus motifs that are characteristic of a group of membrane fatty acid desaturases. It is predicted to be a multiple membrane-spanning protein localized to the endoplasmic reticulum. Overexpression of this gene inhibited biosynthesis of the EGF receptor, suggesting a possible role of a fatty acid desaturase in regulating biosynthetic processing of the EGF receptor. delta(4)-desaturase, sphingolipid 1 8560
CST6 ENSG00000175315 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. This gene encodes a cystatin from the type 2 family, which is down-regulated in metastatic breast tumor cells as compared to primary tumor cells. Loss of expression is likely associated with the progression of a primary tumor to a metastatic phenotype. cystatin E/M 1474
PERP ENSG00000112378 NA PERP, TP53 apoptosis effector 64065
LGALS7B ENSG00000178934 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19. lectin, galactoside binding soluble 7B 653499
CLDN1 ENSG00000163347 Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. claudin 1 9076
SIK1 ENSG00000142178 NA salt inducible kinase 1 150094
AHNAK2 ENSG00000185567 NA AHNAK nucleoprotein 2 113146
MUCL1 ENSG00000172551 NA mucin like 1 118430
KLF4 ENSG00000136826 This gene encodes a protein that belongs to the Kruppel family of transcription factors. The encoded zinc finger protein is required for normal development of the barrier function of skin. The encoded protein is thought to control the G1-to-S transition of the cell cycle following DNA damage by mediating the tumor suppressor gene p53. Mice lacking this gene have a normal appearance but lose weight rapidly, and die shortly after birth due to fluid evaporation resulting from compromised epidermal barrier function. Alternative splicing results in multiple transcript variants encoding different isoforms. Kruppel-like factor 4 (gut) 9314
HOPX ENSG00000171476 The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. HOP homeobox 84525
CXCL14 ENSG00000145824 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. C-X-C motif chemokine ligand 14 9547
NR1D1 ENSG00000126368 This gene encodes a transcription factor that is a member of the nuclear receptor subfamily 1. The encoded protein is a ligand-sensitive transcription factor that negatively regulates the expression of core clock proteins. In particular this protein represses the circadian clock transcription factor aryl hydrocarbon receptor nuclear translocator-like protein 1 (ARNTL). This protein may also be involved in regulating genes that function in metabolic, inflammatory and cardiovascular processes. nuclear receptor subfamily 1 group D member 1 9572
CTNNBIP1 ENSG00000178585 The protein encoded by this gene binds CTNNB1 and prevents interaction between CTNNB1 and TCF family members. The encoded protein is a negative regulator of the Wnt signaling pathway. Two transcript variants encoding the same protein have been found for this gene. catenin beta interacting protein 1 56998
THEM5 ENSG00000196407 NA thioesterase superfamily member 5 284486
LGALSL ENSG00000119862 NA lectin, galactoside binding like 29094
COL7A1 ENSG00000114270 This gene encodes the alpha chain of type VII collagen. The type VII collagen fibril, composed of three identical alpha collagen chains, is restricted to the basement zone beneath stratified squamous epithelia. It functions as an anchoring fibril between the external epithelia and the underlying stroma. Mutations in this gene are associated with all forms of dystrophic epidermolysis bullosa. In the absence of mutations, however, an acquired form of this disease can result from an autoimmune response made to type VII collagen. collagen type VII alpha 1 1294
RORA ENSG00000069667 The protein encoded by this gene is a member of the NR1 subfamily of nuclear hormone receptors. It can bind as a monomer or as a homodimer to hormone response elements upstream of several genes to enhance the expression of those genes. The encoded protein has been shown to interact with NM23-2, a nucleoside diphosphate kinase involved in organogenesis and differentiation, as well as with NM23-1, the product of a tumor metastasis suppressor candidate gene. Also, it has been shown to aid in the transcriptional regulation of some genes involved in circadian rhythm. Four transcript variants encoding different isoforms have been described for this gene. RAR related orphan receptor A 6095
BLMH ENSG00000108578 Bleomycin hydrolase (BMH) is a cytoplasmic cysteine peptidase that is highly conserved through evolution; however, the only known activity of the enzyme is metabolic inactivation of the glycopeptide bleomycin (BLM), an essential component of combination chemotherapy regimens for cancer. The protein contains the signature active site residues of the cysteine protease papain superfamily. bleomycin hydrolase 642
FGFR3 ENSG00000068078 This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. fibroblast growth factor receptor 3 2261
LOC284023 ENSG00000179859 NA uncharacterized LOC284023 284023
TUFT1 ENSG00000143367 Tuftelin is an acidic protein that is thought to play a role in dental enamel mineralization and is implicated in caries susceptibility. It is also thought to be involved with adaptation to hypoxia, mesenchymal stem cell function, and neurotrophin nerve growth factor mediated neuronal differentiation. tuftelin 1 7286
CASZ1 ENSG00000130940 The protein encoded by this gene is a zinc finger transcription factor. The encoded protein may function as a tumor suppressor, and single nucleotide polymorphisms in this gene are associated with blood pressure variation. Alternative splicing results in multiple transcript variants that encode different protein isoforms. castor zinc finger 1 54897
SCGB1B2P ENSG00000268751 NA secretoglobin family 1B member 2, pseudogene 643719
RPLP1 ENSG00000137818 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P2. The P1 protein can interact with P0 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Two alternatively spliced transcript variants that encode different proteins have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein lateral stalk subunit P1 6176
TINCR ENSG00000223573 This gene produces a spliced long non-coding RNA that is required for normal epidermal differentiation. This transcript regulates the expression of genes involved in the differentiation of epidermal tissue. Mutations in some of the genes targeted by this transcript have been implicated in epidermal skin diseases. tissue differentiation-inducing non-protein coding RNA 257000
FOS ENSG00000170345 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. In some cases, expression of the FOS gene has also been associated with apoptotic cell death. FBJ murine osteosarcoma viral oncogene homolog 2353
LOC101930123 ENSG00000103319 NA eukaryotic elongation factor 2 kinase 101930123
EEF2K ENSG00000103319 This gene encodes a highly conserved protein kinase in the calmodulin-mediated signaling pathway that links activation of cell surface receptors to cell division. This kinase is involved in the regulation of protein synthesis. It phosphorylates eukaryotic elongation factor 2 (EEF2) and thus inhibits the EEF2 function. The activity of this kinase is increased in many cancers and may be a valid target for anti-cancer treatment. eukaryotic elongation factor 2 kinase 29904
CCL27 ENSG00000213927 This gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. The protein encoded by this gene is chemotactic for skin-associated memory T lymphocytes. This cytokine may also play a role in mediating homing of lymphocytes to cutaneous sites. It specifically binds to chemokine receptor 10 (CCR10). Studies of a similar murine protein indicate that these protein-receptor interactions have a pivotal role in T cell-mediated skin inflammation. C-C motif chemokine ligand 27 10850
EGFR ENSG00000146648 The protein encoded by this gene is a transmembrane glycoprotein that is a member of the protein kinase superfamily. This protein is a receptor for members of the epidermal growth factor family. EGFR is a cell surface protein that binds to epidermal growth factor. Binding of the protein to a ligand induces receptor dimerization and tyrosine autophosphorylation and leads to cell proliferation. Mutations in this gene are associated with lung cancer. Multiple alternatively spliced transcript variants that encode different protein isoforms have been found for this gene. epidermal growth factor receptor 1956
TRIM29 ENSG00000137699 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. tripartite motif containing 29 23650
JUP ENSG00000173801 This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. junction plakoglobin 3728
TNFRSF19 ENSG00000127863 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described. tumor necrosis factor receptor superfamily member 19 55504
KCNK7 ENSG00000173338 This gene encodes a member of the superfamily of potassium channel proteins containing two pore-forming P domains. The product of this gene has not been shown to be a functional channel; however, it may require other non-pore-forming proteins for activity. Multiple transcript variants encoding different isoforms have been found for this gene. potassium two pore domain channel subfamily K member 7 10089
GPNMB ENSG00000136235 The protein encoded by this gene is a type I transmembrane glycoprotein which shows homology to the pMEL17 precursor, a melanocyte-specific protein. GPNMB shows expression in the lowly metastatic human melanoma cell lines and xenografts but does not show expression in the highly metastatic cell lines. GPNMB may be involved in growth delay and reduction of metastatic potential. Two transcript variants encoding different isoforms have been found for this gene. glycoprotein nmb 10457
FAM57A ENSG00000167695 The protein encoded by this gene is a membrane-associated protein that promotes lung carcinogenesis. The encoded protein may be involved in amino acid transport and glutathione metabolism since it can interact with a solute carrier family member (SLC3A2) and an isoform of gamma-glutamyltranspeptidase-like 3. An alternatively spliced variant encoding a protein that lacks a 32 aa internal segment showed the opposite effect, inhibiting lung cancer cell growth. Knockdown of this gene also inhibited lung carcinogenesis and tumor cell growth. Several transcript variants encoding different isoforms have been found for this gene. family with sequence similarity 57 member A 79850
PPP1R13L ENSG00000104881 IASPP is one of the most evolutionarily conserved inhibitors of p53 (TP53; MIM 191170), whereas ASPP1 (MIM 606455) and ASPP2 (MIM 602143) are activators of p53. protein phosphatase 1 regulatory subunit 13 like 10848
APCDD1 ENSG00000154856 This locus encodes an inhibitor of the Wnt signaling pathway. Mutations at this locus have been associated with hereditary hypotrichosis simplex. Increased expression of this gene may also be associated with colorectal carcinogenesis. adenomatosis polyposis coli down-regulated 1 147495
LOC101927164 ENSG00000237101 NA uncharacterized LOC101927164 101927164
ZNF385A ENSG00000161642 Zinc finger proteins, such as ZNF385A, are regulatory proteins that act as transcription factors, bind single- or double-stranded RNA, or interact with other proteins (Sharma et al., 2004 [PubMed 15527981]). zinc finger protein 385A 25946
RAPGEFL1 ENSG00000108352 NA Rap guanine nucleotide exchange factor like 1 51195
EGR3 ENSG00000179388 This gene encodes a transcriptional regulator that belongs to the EGR family of C2H2-type zinc-finger proteins. It is an immediate-early growth response gene which is induced by mitogenic stimulation. The protein encoded by this gene participates in the transcriptional regulation of genes in controlling biological rhythm. It may also play a role in a wide variety of processes including muscle development, lymphocyte development, endothelial cell growth and migration, and neuronal development. Alternative splicing results in multiple transcript variants encoding distinct isoforms. early growth response 3 1960
EPHB6 ENSG00000106123 This gene encodes a member of a family of transmembrane proteins that function as receptors for ephrin-B family proteins. Unlike other members of this family, the encoded protein does not contain a functional kinase domain. Activity of this protein can influence cell adhesion and migration. Expression of this gene is downregulated during tumor progression, suggesting that the protein may suppress tumor invasion and metastasis. Alternative splicing results in multiple transcript variants. EPH receptor B6 2051
IDE ENSG00000119912 This gene encodes a zinc metallopeptidase that degrades intracellular insulin, and thereby terminates insulin's activity, as well as participating in intercellular peptide signalling by degrading diverse peptides such as glucagon, amylin, bradykinin, and kallidin. The preferential affinity of this enzyme for insulin results in insulin-mediated inhibition of the degradation of other peptides such as beta-amyloid. Deficiencies in this protein’s function are associated with Alzheimer’s disease and type 2 diabetes mellitus but mutations in this gene have not been shown to be causative for these diseases. This protein localizes primarily to the cytoplasm but in some cell types localizes to the extracellular space, cell membrane, peroxisome, and mitochondrion. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Additional transcript variants have been described but have not been experimentally verified. insulin degrading enzyme 3416
ETV3 ENSG00000117036 NA ETS variant 3 2117
CA12 ENSG00000074410 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. This gene product is a type I membrane protein that is highly expressed in normal tissues, such as kidney, colon and pancreas, and has been found to be overexpressed in 10% of clear cell renal carcinomas. Three transcript variants encoding different isoforms have been identified for this gene. carbonic anhydrase 12 771
ZNF273 ENSG00000198039 This gene is a member of the krueppel C2H2-type zinc-finger protein family and encodes a protein with 13 C2H2-type zinc fingers and a KRAB domain. This nuclear protein is involved in transcriptional regulation. Alternative splicing results in multiple transcript variants. zinc finger protein 273 10793
LONRF1 ENSG00000154359 NA LON peptidase N-terminal domain and ring finger 1 91694
MYCL ENSG00000116990 NA v-myc avian myelocytomatosis viral oncogene lung carcinoma derived homolog 4610
ATP6V1C2 ENSG00000143882 This gene encodes a component of vacuolar ATPase (V-ATPase), a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, receptor-mediated endocytosis, and synaptic vesicle proton gradient generation. V-ATPase is composed of a cytosolic V1 domain and a transmembrane V0 domain. The V1 domain consists of three A, three B, and two G subunits, as well as a C, D, E, F, and H subunit. The V1 domain contains the ATP catalytic site. This gene encodes alternate transcriptional splice variants, encoding different V1 domain C subunit isoforms. ATPase H+ transporting V1 subunit C2 245973
LOC101929777 ENSG00000108379 NA uncharacterized LOC101929777 101929777
WNT3 ENSG00000108379 The WNT gene family consists of structurally related genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. This gene is a member of the WNT gene family. It encodes a protein which shows 98% amino acid identity to mouse Wnt3 protein, and 84% to human WNT3A protein, another WNT gene product. The mouse studies show the requirement of Wnt3 in primary axis formation in the mouse. Studies of the gene expression suggest that this gene may play a key role in some cases of human breast, rectal, lung, and gastric cancer through activation of the WNT-beta-catenin-TCF signaling pathway. This gene is clustered with WNT15, another family member, in the chromosome 17q21 region. Wnt family member 3 7473
BICD2 ENSG00000185963 This gene is one of two human homologs of Drosophila bicaudal-D and a member of the Bicoid family. It has been implicated in dynein-mediated, minus end-directed motility along microtubules. It has also been reported to be a phosphorylation target of NIMA related kinase 8. Two alternative splice variants have been described. BICD cargo adaptor 2 23299
IL20RB ENSG00000174564 IL20RB and IL20RA (MIM 605620) form a heterodimeric receptor for interleukin-20 (IL20; MIM 605619) (Blumberg et al., 2001 [PubMed 11163236]). interleukin 20 receptor subunit beta 53833
IRF6 ENSG00000117595 This gene encodes a member of the interferon regulatory transcription factor (IRF) family. Family members share a highly-conserved N-terminal helix-turn-helix DNA-binding domain and a less conserved C-terminal protein-binding domain. The encoded protein may be a transcriptional activator. Mutations in this gene can cause van der Woude syndrome and popliteal pterygium syndrome. Mutations in this gene are also associated with non-syndromic orofacial cleft type 6. Alternate splicing results in multiple transcript variants. interferon regulatory factor 6 3664
TRIM35 ENSG00000104228 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The function of this protein has not been identified. tripartite motif containing 35 23087
METRNL ENSG00000176845 NA meteorin, glial cell differentiation regulator-like 284207
VANGL2 ENSG00000162738 The protein encoded by this gene is a membrane protein involved in the regulation of planar cell polarity, especially in the stereociliary bundles of the cochlea. The encoded protein transmits directional signals to individual cells or groups of cells in epithelial sheets. This protein is also involved in the development of the neural plate. VANGL planar cell polarity protein 2 57216
ACVR1B ENSG00000135503 This gene encodes an activin A type IB receptor. Activins are dimeric growth and differentiation factors which belong to the transforming growth factor-beta (TGF-beta) superfamily of structurally related signaling proteins. Activins signal through a heteromeric complex of receptor serine kinases which include at least two type I and two type II receptors. This protein is a type I receptor which is essential for signaling. Mutations in this gene are associated with pituitary tumors. Alternate splicing results in multiple transcript variants. activin A receptor type 1B 91
GJA1 ENSG00000152661 This gene is a member of the connexin gene family. The encoded protein is a component of gap junctions, which are composed of arrays of intercellular channels that provide a route for the diffusion of low molecular weight materials from cell to cell. The encoded protein is the major protein of gap junctions in the heart that are thought to have a crucial role in the synchronized contraction of the heart and in embryonic development. A related intronless pseudogene has been mapped to chromosome 5. Mutations in this gene have been associated with oculodentodigital dysplasia, autosomal recessive craniometaphyseal dysplasia and heart malformations. gap junction protein alpha 1 2697
MAFB ENSG00000204103 The protein encoded by this gene is a basic leucine zipper (bZIP) transcription factor that plays an important role in the regulation of lineage-specific hematopoiesis. The encoded nuclear protein represses ETS1-mediated transcription of erythroid-specific genes in myeloid cells. This gene contains no introns. v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog B 9935
ACAD9 ENSG00000177646 This gene encodes a member of the acyl-CoA dehydrogenase family. Members of this family of proteins localize to the mitochondria and catalyze the rate-limiting step in the beta-oxidation of fatty acyl-CoA. The encoded protein is specifically active toward palmitoyl-CoA and long-chain unsaturated substrates. Mutations in this gene cause acyl-CoA dehydrogenase family member type 9 deficiency. Alternate splicing results in multiple transcript variants. acyl-CoA dehydrogenase family member 9 28976
DNASE1L2 ENSG00000167968 NA deoxyribonuclease I-like 2 1775
ELOVL4 ENSG00000118402 This gene encodes a membrane-bound protein which is a member of the ELO family, proteins which participate in the biosynthesis of fatty acids. Consistent with the expression of the encoded protein in photoreceptor cells of the retina, mutations and small deletions in this gene are associated with Stargardt-like macular dystrophy (STGD3) and autosomal dominant Stargardt-like macular dystrophy (ADMD), also referred to as autosomal dominant atrophic macular degeneration. ELOVL fatty acid elongase 4 6785
RP5-1126H10.2 ENSG00000272084 NA NA ENSG00000272084
RALBP1 ENSG00000017797 RALBP1 plays a role in receptor-mediated endocytosis and is a downstream effector of the small GTP-binding protein RAL (see RALA; MIM 179550). Small G proteins, such as RAL, have GDP-bound inactive and GTP-bound active forms, which shift from the inactive to the active state through the action of RALGDS (MIM 601619), which in turn is activated by RAS (see HRAS; MIM 190020) (summary by Feig, 2003 [PubMed 12888294]). ralA binding protein 1 10928
FAM110A ENSG00000125898 NA family with sequence similarity 110 member A 83541
RP11-84A14.5 ENSG00000223989 NA NA ENSG00000223989
HES1 ENSG00000114315 This protein belongs to the basic helix-loop-helix family of transcription factors. It is a transcriptional repressor of genes that require a bHLH protein for their transcription. The protein has a particular type of basic domain that contains a helix interrupting protein that binds to the N-box rather than the canonical E-box. hes family bHLH transcription factor 1 3280
SPPL3 ENSG00000157837 NA signal peptide peptidase like 3 121665
PLCH2 ENSG00000149527 PLCH2 is a member of the PLC-eta family of the phosphoinositide-specific phospholipase C (PLC) superfamily of enzymes that cleave PtdIns(4,5)P2 to generate second messengers inositol 1,4,5-trisphosphate and diacylglycerol (Zhou et al., 2005 [PubMed 16107206]). phospholipase C eta 2 9651
WEE1 ENSG00000166483 This gene encodes a nuclear protein, which is a tyrosine kinase belonging to the Ser/Thr family of protein kinases. This protein catalyzes the inhibitory tyrosine phosphorylation of CDC2/cyclin B kinase, and appears to coordinate the transition between DNA replication and mitosis by protecting the nucleus from cytoplasmically activated CDC2 kinase. WEE1 G2 checkpoint kinase 7465
RNH1 ENSG00000023191 Placental ribonuclease inhibitor (PRI) is a member of a family of proteinaceous cytoplasmic RNase inhibitors that occur in many tissues and bind to both intracellular and extracellular RNases (summarized by Lee et al., 1988 [PubMed 3219362]). In addition to control of intracellular RNases, the inhibitor may have a role in the regulation of angiogenin (MIM 105850). Ribonuclease inhibitor, of 50,000 Da, binds to ribonucleases and holds them in a latent form. Since neutral and alkaline ribonucleases probably play a critical role in the turnover of RNA in eukaryotic cells, RNH may be essential for control of mRNA turnover; the interaction of eukaryotic cells with ribonuclease may be reversible in vivo. ribonuclease/angiogenin inhibitor 1 6050
PARD6G ENSG00000178184 NA par-6 family cell polarity regulator gamma 84552
BNIPL ENSG00000163141 The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. BCL2/adenovirus E1B 19kD interacting protein like 149428
GAL3ST4 ENSG00000197093 This gene encodes a member of the galactose-3-O-sulfotransferase protein family. The product of this gene catalyzes sulfonation by transferring a sulfate to the C-3’ position of galactose residues in O-linked glycoproteins. This enzyme is highly specific for core 1 structures, with asialofetuin, Gal-beta-1,3-GalNAc and Gal-beta-1,3 (GlcNAc-beta-1,6)GalNAc being good substrates. galactose-3-O-sulfotransferase 4 79690
S100A3 ENSG00000188015 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein has the highest content of cysteines of all S100 proteins, has a high affinity for Zinc, and is highly expressed in human hair cuticle. The precise function of this protein is unknown. S100 calcium binding protein A3 6274
RAB40C ENSG00000197562 NA RAB40C, member RAS oncogene family 57799
SERPINB8 ENSG00000166401 The superfamily of high molecular weight serine proteinase inhibitors (serpins) regulate a diverse set of intracellular and extracellular processes such as complement activation, fibrinolysis, coagulation, cellular differentiation, tumor suppression, apoptosis, and cell migration. Serpins are characterized by a well-conserved tertiary structure that consists of 3 beta sheets and 8 or 9 alpha helices (Huber and Carrell, 1989 [PubMed 2690952]). A critical portion of the molecule, the reactive center loop, connects beta sheets A and C. Protease inhibitor-8 (PI8; SERPINB8) is a member of the ov-serpin subfamily, which, relative to the archetypal serpin PI1 (MIM 107400), is characterized by a high degree of homology to chicken ovalbumin, lack of N- and C-terminal extensions, absence of a signal peptide, and a serine rather than an asparagine residue at the penultimate position (summary by Bartuski et al., 1997 [PubMed 9268635]). serpin family B member 8 5271
RPS6KB2 ENSG00000175634 This gene encodes a member of the RSK (ribosomal S6 kinase) family of serine/threonine kinases. This kinase contains a kinase catalytic domain and phosphorylates the S6 ribosomal protein and eukaryotic translation initiation factor 4B (eIF4B). Phosphorylation of S6 leads to an increase in protein synthesis and cell proliferation. ribosomal protein S6 kinase B2 6199
ADAM15 ENSG00000143537 The protein encoded by this gene is a member of the ADAM (a disintegrin and metalloproteinase) protein family. ADAM family members are type I transmembrane glycoproteins known to be involved in cell adhesion and proteolytic ectodomain processing of cytokines and adhesion molecules. This protein contains multiple functional domains including a zinc-binding metalloprotease domain, a disintegrin-like domain, as well as an EGF-like domain. Through its disintegrin-like domain, this protein specifically interacts with the integrin beta chain, beta 3. It also interacts with Src family protein-tyrosine kinases in a phosphorylation-dependent manner, suggesting that this protein may function in cell-cell adhesion as well as in cellular signaling. Multiple alternatively spliced transcript variants encoding distinct isoforms have been observed. ADAM metallopeptidase domain 15 8751
FAM46B ENSG00000158246 NA family with sequence similarity 46 member B 115572
VPS13D ENSG00000048707 This gene encodes a protein belonging to the vacuolar-protein-sorting-13 gene family. In yeast, vacuolar-protein-sorting-13 proteins are involved in trafficking of membrane proteins between the trans-Golgi network and the prevacuolar compartment. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode distinct isoforms. vacuolar protein sorting 13 homolog D 55187
JAG2 ENSG00000184916 The Notch signaling pathway is an intercellular signaling mechanism that is essential for proper embryonic development. Members of the Notch gene family encode transmembrane receptors that are critical for various cell fate decisions. The protein encoded by this gene is one of several ligands that activate Notch and related receptors. Two transcript variants encoding different isoforms have been found for this gene. jagged 2 3714
DEGS2 ENSG00000168350 This gene encodes a bifunctional enzyme that is involved in the biosynthesis of phytosphingolipids in human skin and in other phytosphingolipid-containing tissues. This enzyme can act as a sphingolipid delta(4)-desaturase, and also as a sphingolipid C4-hydroxylase. delta(4)-desaturase, sphingolipid 2 123099
PLEKHG5 ENSG00000171680 This gene encodes a protein that activates the nuclear factor kappa B (NFKB1) signaling pathway. Mutations in this gene are associated with autosomal recessive distal spinal muscular atrophy. Multiple transcript variants encoding different isoforms have been found for this gene. pleckstrin homology and RhoGEF domain containing G5 57449
write.table(as.factor(out$query), paste0("../utilities/gene_names_clus_",6,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
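
The per-cluster export above repeats in every section with only the cluster index changing. A minimal sketch of a helper that writes one ID file per cluster could look like the following. The function `write_cluster_genes`, its arguments, and the toy matrix are hypothetical names introduced here for illustration and are not part of the original script. The sketch assumes `gene_list` is a character matrix with one row per cluster, as built earlier, and it writes the Ensembl IDs directly rather than the `out$query` column returned by mygene.

```r
# Hypothetical helper (not in the original script): write the gene IDs of
# every cluster to its own text file, one ID per line.
write_cluster_genes <- function(gene_list, out_dir, prefix = "gene_names_clus_") {
  stopifnot(is.matrix(gene_list), dir.exists(out_dir))
  paths <- character(nrow(gene_list))
  for (k in seq_len(nrow(gene_list))) {
    paths[k] <- file.path(out_dir, paste0(prefix, k, ".txt"))
    # writeLines gives the same one-ID-per-line layout as write.table with
    # col.names = FALSE, row.names = FALSE, quote = FALSE.
    writeLines(gene_list[k, ], paths[k])
  }
  invisible(paths)
}

# Toy stand-in for gene_list: 2 clusters x 3 genes (IDs taken from the tables above).
toy_gene_list <- rbind(
  c("ENSG00000197093", "ENSG00000188015", "ENSG00000197562"),
  c("ENSG00000183091", "ENSG00000109061", "ENSG00000125414")
)
out_dir <- tempfile("gene_lists_")
dir.create(out_dir)
paths <- write_cluster_genes(toy_gene_list, out_dir)
```

In the script itself this would presumably be called as `write_cluster_genes(gene_list, "../utilities")`. Note that the files then record the submitted IDs, which may differ from `out$query` if mygene returns several hits for one ID.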

Cluster 7 Annotations

out <- mygene::queryMany(gene_list[7,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary
NEB 4703 nebulin ENSG00000183091 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy.
MYH1 4619 myosin, heavy chain 1, skeletal muscle, adult ENSG00000109061 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development.
MYH2 4620 myosin, heavy chain 2, skeletal muscle, adult ENSG00000125414 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified.
MYBPC1 4604 myosin binding protein C, slow type ENSG00000196091 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
ACTA1 58 actin, alpha 1, skeletal muscle ENSG00000143632 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects.
MYL1 4632 myosin light chain 1 ENSG00000168530 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene.
TNNC2 7125 troponin C2, fast skeletal type ENSG00000101470 Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit.
TNNT1 7138 troponin T1, slow skeletal type ENSG00000105048 This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene.
CKM 1158 creatine kinase, M-type ENSG00000104879 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family.
ATP2A1 487 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1 ENSG00000196296 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms.
PYGM 5837 phosphorylase, glycogen, muscle ENSG00000068976 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants.
TTN 7273 titin ENSG00000155657 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma.
MYBPC2 4606 myosin binding protein C, fast type ENSG00000086967 This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively.
MYLPF 29895 myosin light chain, phosphorylatable, fast skeletal muscle ENSG00000180209 NA
TNNT3 7140