
Short Title: Identification and functional annotation of causal non-coding variants involved in complex diseases, taking Alzheimer disease as a model disease

Rationale of the study


Genome-wide association studies (GWAS), which scan single-nucleotide polymorphisms (SNPs) across the genome, have been one major approach to studying complex diseases. GWAS interrogate the genetic determinants of complex diseases treated as a dichotomous phenotype (disease vs. no disease). GWAS of many complex diseases have shown that only a minor fraction of risk-associated loci harbor SNPs that affect coding sequences, suggesting that most causal variants do not alter protein sequences. Instead, most associated SNPs lie in non-coding regions of the genome, largely intronic or intergenic, where inferring the functional consequences of sequence variants has been challenging 1. Recent work has shown that disease-associated variants preferentially reside in these non-coding regulatory regions and can affect gene regulation. Using data from the ENCODE and Roadmap Epigenomics projects 2, studies have shown that most GWAS-associated SNPs, or variants highly correlated with them, overlap enhancers 3.

Introduction

In this research proposal, I will combine computational methods with disease modelling in human induced pluripotent stem cells (iPSCs), and exploit extensive publicly available genomic and epigenomic resources, to comprehensively annotate variants associated with a neuron-related complex disease and test them for gene regulatory function. I will use Alzheimer disease as a model to identify and functionally annotate causal non-coding variants involved in the disease. Around 54 GWAS have been published for Alzheimer disease 4. Despite this remarkable progress, the GWAS approach has substantial limitations. Although statistically significant associations have been identified, for the most part the causal genetic variants underlying susceptibility to Alzheimer disease have yet to be identified. Each risk-associated locus is defined by a lead SNP (a SNP shown to be significantly associated with Alzheimer disease in the 54 previously published GWAS) together with the additional variants in linkage disequilibrium (LD) with it. Any of these risk-associated variants could be causal, and extensive experimental and computational analyses are required to pinpoint the variant(s) underlying the association. Importantly, for the risk-associated loci of complex diseases, identification of the actual, or most likely, causal variant(s) together with their functional annotation is still largely lacking 5. Thus, one of the major challenges of the post-GWAS era is to identify and functionally annotate causal non-coding variants involved in complex diseases; here we will address this challenge using Alzheimer disease as a model disease.

Methodology

The proposed project workflow is as follows:

Aim 1: Identify a comprehensive set of Alzheimer disease candidate causal variants through systematic analysis of genomic and epigenomic datasets

We will identify candidate regulatory variants within and neighboring Alzheimer GWAS risk-associated loci that potentially influence disease risk. It has recently been shown that GWAS loci can harbor multiple independent rare variants (not in LD with the lead SNPs) that contribute to disease 6. Additionally, previous studies have shown that regulatory variants can act over long distances (~1 Mb) to affect gene expression 7-10.

Considering these findings, we will prioritize variants based on putative regulatory function rather than on LD structure. We will select the Alzheimer-associated common variants (minor allele frequency (MAF) > 1%) reported as lead SNPs in previously published GWAS. We will then identify nearly all SNPs with MAF > 1% within and surrounding the Alzheimer disease risk-associated loci (+/- 1 Mb of each lead SNP) by analyzing ~4,500 whole-genome sequences (~2,500 genomes from 1000 Genomes Phase 3 (http://www.1000genomes.org) and 2,000 genome sequences from the TCGA data set (https://tcga-data.nci.nih.gov/tcga)). Regulatory regions in the previously known risk-associated loci and the extended intervals (+/- 1 Mb) will be identified by examining ENCODE and Roadmap Epigenomics data, which include brain-relevant tissues and cell types. Variants with MAF > 1%, as well as rare variants, will be annotated using intersectBed to identify those that overlap enhancer regions or open chromatin domains.
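The selection steps above (keep variants within +/- 1 Mb of a lead SNP, split them by MAF, and keep those overlapping enhancers or open chromatin) can be sketched in a few lines. This is a minimal illustration, not the project's pipeline: the `Variant` record, the function names and all coordinates and counts are hypothetical, and the linear scans stand in for the intersectBed step named in the text, which is the practical choice at genome scale. The one detail worth getting right is coordinates: VCF positions are 1-based while BED intervals are 0-based and half-open.

```python
from dataclasses import dataclass

WINDOW_BP = 1_000_000  # +/- 1 Mb around each lead SNP, as in the text
MIN_MAF = 0.01         # common-variant cutoff (MAF > 1%)


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int        # 1-based position, as in VCF
    alt_count: int  # alternate alleles observed across all genomes
    n_alleles: int  # total called alleles (2 per diploid genome)

    @property
    def maf(self) -> float:
        af = self.alt_count / self.n_alleles
        return min(af, 1.0 - af)


def in_window(v, leads, window=WINDOW_BP):
    """True if v lies within +/- window bp of any lead SNP on the same chromosome."""
    return any(v.chrom == chrom and abs(v.pos - pos) <= window for chrom, pos in leads)


def overlaps(v, regions):
    """True if v falls inside any BED interval (chrom, start, end).

    BED intervals are 0-based and half-open, so the 1-based position pos
    corresponds to BED coordinate pos - 1.
    """
    return any(v.chrom == chrom and start <= v.pos - 1 < end for chrom, start, end in regions)


def candidate_variants(variants, leads, regions, min_maf=MIN_MAF):
    """Split variants near lead SNPs and inside regulatory regions into common and rare sets."""
    common, rare = [], []
    for v in variants:
        if v.maf == 0 or not in_window(v, leads) or not overlaps(v, regions):
            continue  # monomorphic, too far from any lead SNP, or outside enhancers/open chromatin
        (common if v.maf > min_maf else rare).append(v)
    return common, rare


# Hypothetical coordinates and counts (4,500 genomes -> 9,000 alleles), for illustration only.
leads = [("chr19", 45_000_000)]
enhancers = [("chr19", 45_100_000, 45_101_000)]
variants = [
    Variant("chr19", 45_100_500, 300, 9_000),  # common, inside the enhancer
    Variant("chr19", 45_100_800, 20, 9_000),   # rare, inside the enhancer
    Variant("chr19", 47_000_000, 500, 9_000),  # 2 Mb from the lead SNP
    Variant("chr19", 45_200_000, 500, 9_000),  # near the lead SNP but outside the enhancer
]
common, rare = candidate_variants(variants, leads, enhancers)
```

Here the first variant ends up in the common set and the second in the rare set; the other two are dropped by the distance and overlap filters respectively.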

For a pilot-scale analysis, variants in high LD (r² ≥ 0.80) with the lead SNPs will be selected for further characterization.
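For reference, the r² used in this filter is the standard LD measure r² = D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA·pB, computed from phased haplotypes such as those released by 1000 Genomes Phase 3. The sketch below is a minimal illustration; the function names and the toy haplotypes are hypothetical, and for unphased genotypes the squared correlation of genotype dosages is the usual approximation instead.

```python
from typing import Dict, List, Sequence

R2_THRESHOLD = 0.80  # pilot-scale cutoff from the text


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """LD r^2 between two biallelic variants from phased haplotypes.

    hap_a[i] and hap_b[i] are the alleles (0 = reference, 1 = alternate)
    carried by haplotype i at the two variants.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("expected two non-empty haplotype vectors of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                 # alt-allele frequency, variant A
    p_b = sum(hap_b) / n                                 # alt-allele frequency, variant B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n  # frequency of alt-alt haplotypes
    d = p_ab - p_a * p_b                                 # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # r^2 is undefined for a monomorphic variant; treated here as "not in LD"
    return d * d / denom


def high_ld_partners(lead_hap: Sequence[int],
                     candidates: Dict[str, Sequence[int]],
                     threshold: float = R2_THRESHOLD) -> List[str]:
    """IDs of candidate variants with r^2 >= threshold to the lead SNP."""
    return [vid for vid, hap in candidates.items() if ld_r2(lead_hap, hap) >= threshold]


# Hypothetical haplotypes (8 chromosomes) for one lead SNP and three nearby variants.
lead = [0, 0, 1, 1, 0, 1, 0, 1]
candidates = {
    "snp_same": [0, 0, 1, 1, 0, 1, 0, 1],   # same pattern as the lead SNP
    "snp_flip": [1, 1, 0, 0, 1, 0, 1, 0],   # perfectly anti-correlated
    "snp_indep": [1, 1, 1, 0, 0, 1, 0, 0],  # independent of the lead SNP (D = 0)
}
print(high_ld_partners(lead, candidates))  # ['snp_same', 'snp_flip']
```

Because r² is symmetric in allele coding, a perfectly anti-correlated variant is in complete LD just like an identical one.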

Aim 2: Screening of Alzheimer-associated enhancer elements in vitro

Computational analysis of existing genomic and epigenomic datasets in Aim 1 will likely yield candidate regulatory variants in risk-associated loci.

We will carry out high-throughput molecular assays to determine which of these non-coding variants impair transcriptional regulation in iPSC-derived neuronal cells or Alzheimer-specific cell lines. We will first perform massively parallel reporter assays (MPRA) 11 to test whether the genomic sequences surrounding these variants can act as transcriptional enhancers and, if so, whether enhancer activity is affected by the candidate variants.
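One common way to read out an MPRA is to compare, for each barcode, the reporter RNA reads with the plasmid DNA reads of the same construct; the log2(RNA/DNA) ratio measures enhancer activity, and the difference between the alternate and reference constructs measures the allelic effect. The sketch below only illustrates that idea under stated assumptions: the function names, the pseudocount and the counts are hypothetical, activity is summarized as a median over barcodes, and a real analysis would add replicates and a proper statistical test across barcodes.

```python
import math
from statistics import median

PSEUDOCOUNT = 1.0  # hypothetical choice; avoids log2(0) for barcodes with no RNA reads


def barcode_activity(rna, dna, rna_total, dna_total):
    """Per-barcode log2(RNA/DNA) after scaling each library by its total read count."""
    return [
        math.log2((r + PSEUDOCOUNT) / rna_total) - math.log2((d + PSEUDOCOUNT) / dna_total)
        for r, d in zip(rna, dna)
    ]


def allelic_skew(ref, alt, rna_total, dna_total):
    """Compare reporter activity of the reference and alternate constructs.

    ref and alt map "rna" and "dna" to per-barcode read counts. Activity is the
    median barcode log2(RNA/DNA); the skew is alt minus ref (negative means the
    alternate allele reduces enhancer activity).
    """
    ref_act = median(barcode_activity(ref["rna"], ref["dna"], rna_total, dna_total))
    alt_act = median(barcode_activity(alt["rna"], alt["dna"], rna_total, dna_total))
    return ref_act, alt_act, alt_act - ref_act


# Hypothetical barcode counts for one candidate SNP (three barcodes per allele).
ref = {"rna": [120, 95, 140], "dna": [60, 50, 70]}
alt = {"rna": [30, 25, 41], "dna": [62, 48, 70]}
ref_act, alt_act, skew = allelic_skew(ref, alt, rna_total=2_000_000, dna_total=1_500_000)
print(f"ref={ref_act:.2f} alt={alt_act:.2f} skew={skew:.2f}")
```

Summarizing with the median rather than the mean keeps a single over- or under-sampled barcode from dominating the estimate.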

We will use computational strategies to examine the potential effect of each variant on transcription factor (TF) binding. We will first explore existing TF binding specificity databases to determine whether a variant could disrupt TF binding motifs, scanning the ~1,500 currently known TF motifs in the TRANSFAC 12, JASPAR 13, UniPROBE 14 and hPDI 15 databases for their presence at or near the SNPs. Several ongoing projects aim to identify the DNA motifs recognized by the ~1,500 TFs encoded in the human genome, and we will include new motifs as they become available. We will use the position weight matrix (PWM) of each TF to calculate a binding score for each allele and determine whether a variant could result in significantly reduced binding affinity. To functionally test the effect of enhancer regions on gene expression, we will first prioritize candidate regions using the following criteria:

1) Candidate SNPs have MAF > 1% and are in high LD (r² ≥ 0.80) with a lead SNP

2) The variants lie in putative enhancer regions

3) The variants affect TF binding

4) The affected TFs have roles in neuron-related processes
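The PWM scoring and the four prioritization criteria above can be combined into one small filter. This is a minimal sketch under several assumptions: the PWMs are built as log2-odds against a uniform background, the "binding score" for an allele is taken as the best motif score over all windows on either strand that cover the SNP, and the cutoff `max_delta = -2.0` (in log2 units) is a hypothetical stand-in for the significance test described in the text. All function names, the toy motif and the SNP record fields are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pwm_from_counts(counts, pseudocount=0.25, background=0.25):
    """Convert a position frequency matrix (one {base: count} dict per position)
    into log2-odds weights against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def best_score_over(seq, pos, pwm):
    """Best PWM score over all windows, on either strand, that cover seq[pos]."""
    k = len(pwm)
    best = -math.inf
    for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
        window = seq[start:start + k]
        for strand in (window, window.translate(COMPLEMENT)[::-1]):
            best = max(best, sum(col[b] for col, b in zip(pwm, strand)))
    return best


def motif_delta(seq, pos, alt, pwm):
    """Alt-allele minus ref-allele best score; negative means weaker predicted binding.
    seq carries the reference allele at pos and needs >= len(pwm) - 1 flanking bases."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_score_over(alt_seq, pos, pwm) - best_score_over(seq, pos, pwm)


def prioritize(snps, pwms, neuronal_tfs, max_delta=-2.0):
    """Apply criteria 1-4; returns (SNP id, disrupted neuron-related TFs) pairs."""
    selected = []
    for s in snps:
        if not (s["maf"] > 0.01 and s["r2"] >= 0.80):  # 1) common and in high LD
            continue
        if not s["in_enhancer"]:                         # 2) inside a putative enhancer
            continue
        disrupted = sorted(                              # 3) alt allele weakens the motif
            tf for tf, pwm in pwms.items()
            if motif_delta(s["seq"], s["pos"], s["alt"], pwm) <= max_delta
        )
        neuronal = [tf for tf in disrupted if tf in neuronal_tfs]  # 4) TF has a neuronal role
        if neuronal:
            selected.append((s["id"], neuronal))
    return selected


# Hypothetical 3-bp motif "GAT" and a hypothetical A>C SNP at index 3 of "TTGATTT".
gat = pwm_from_counts([
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
])
snp = {"id": "snp1", "maf": 0.05, "r2": 0.9, "in_enhancer": True,
       "seq": "TTGATTT", "pos": 3, "alt": "C"}
print(prioritize([snp], {"TF_X": gat}, neuronal_tfs={"TF_X"}))  # [('snp1', ['TF_X'])]
```

Scanning every window that covers the SNP, on both strands, matters because the variant can sit at any position of a motif instance, and either strand may carry it.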

After prioritizing these variants, we will test whether the selected SNP-containing enhancers regulate important neuron-related genes by introducing a 1 kb deletion of each enhancer into the genome of iPSC-derived neuronal cells using CRISPR/Cas9-mediated homologous recombination.

We will characterize the effect of each enhancer deletion using a live/dead (cell viability) screen.

Aim 3: Molecular characterization of candidate regulatory elements using multiple high-throughput assays

Enhancers can be located as far as 1 Mb from their target genes, making it a great challenge to infer those targets 16, 17. It is now generally agreed that distal enhancers regulate gene expression by looping to their target promoters 16. Consequently, identifying long-range chromatin interactions can help determine enhancer-promoter target relationships.

After functionally validating the enhancer regions, we will use the 4C method to identify the long-range interactions of these characterized enhancer regions, in particular with the promoters of their target genes.
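In a 4C experiment each validated enhancer serves as the viewpoint, and the reads captured on distal restriction fragments indicate which regions contact it. The sketch below shows, under simplifying assumptions, how such counts could be turned into candidate target genes: fragments are normalized to reads per million, a hypothetical `min_rpm` cutoff marks enriched contacts, and genes whose promoter (here assumed to be TSS +/- 2 kb) overlaps an enriched fragment within 1 Mb of the enhancer are reported, nearest first. Dedicated 4C pipelines also exclude viewpoint-proximal fragments and model how contact frequency decays with distance; the names, cutoffs and data here are illustrative only.

```python
from dataclasses import dataclass

MAX_DISTANCE = 1_000_000  # enhancers may act up to ~1 Mb from their targets (see text)
PROMOTER_FLANK = 2_000    # hypothetical: TSS +/- 2 kb is treated as the promoter


@dataclass(frozen=True)
class Fragment:
    chrom: str
    start: int  # 0-based, half-open, as in BED
    end: int
    reads: int  # 4C reads captured from the enhancer viewpoint


def enriched_fragments(fragments, min_rpm):
    """Fragments whose reads per million reach min_rpm (a hypothetical cutoff)."""
    total = sum(f.reads for f in fragments)
    if total == 0:
        return []
    return [f for f in fragments if f.reads / total * 1e6 >= min_rpm]


def candidate_targets(viewpoint, fragments, tss, min_rpm=100.0):
    """Genes whose promoter overlaps an enriched 4C fragment within MAX_DISTANCE
    of the enhancer viewpoint, ordered by distance from the viewpoint.

    viewpoint: (chrom, pos) of the enhancer; tss: {gene: (chrom, tss_pos)}.
    """
    vchrom, vpos = viewpoint
    hits = enriched_fragments([f for f in fragments if f.chrom == vchrom], min_rpm)
    targets = []
    for gene, (chrom, pos) in tss.items():
        if chrom != vchrom or abs(pos - vpos) > MAX_DISTANCE:
            continue
        lo, hi = pos - PROMOTER_FLANK, pos + PROMOTER_FLANK
        if any(f.start < hi and lo < f.end for f in hits):  # half-open interval overlap
            targets.append(gene)
    return sorted(targets, key=lambda g: abs(tss[g][1] - vpos))


# Hypothetical 4C counts and TSS positions, for illustration only.
frags = [
    Fragment("chr1", 500_000, 505_000, 900),     # strong contact near geneA
    Fragment("chr1", 800_000, 805_000, 10),      # weak contact near geneB
    Fragment("chr1", 3_000_000, 3_005_000, 90),  # contact more than 1 Mb away (geneC)
]
tss = {"geneA": ("chr1", 502_000), "geneB": ("chr1", 801_000), "geneC": ("chr1", 3_001_000)}
print(candidate_targets(("chr1", 1_000_000), frags, tss, min_rpm=50_000))  # ['geneA']
```

geneB is rejected because its contact is weak, and geneC because it lies beyond the 1 Mb range considered in the text, even though its fragment is enriched.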

Impact and concluding remarks

This project aims to add new layers to our understanding of complex disease biology by characterizing genetic variation in the human genome associated with complex disease, here using Alzheimer disease as the model disease. This goal will be achieved by 1) integrative analysis of multidimensional datasets, 2) deciphering the role of non-coding elements in complex disease, 3) using computational approaches to prioritize the identified genetic variants for follow-up functional validation, and 4) directly assaying enhancer activities.

References

1. Edwards SL, Beesley J, French JD, Dunning M: Beyond GWASs: illuminating the dark road from association to function. Am J Hum Genet 2013:779–797.

2. Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M: An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489:57–74.

3. Grundberg E, Meduri E, Sandling JK, Hedman ÅK, Keildson S, Buil A, Busche S, Yuan W, Nisbet J, Sekowska M, Wilk A, Barrett A, Small KS, Ge B, Caron M, Shin SY, Lathrop M, Dermitzakis ET, McCarthy MI, Spector TD, Bell JT, Deloukas P: Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements. Am J Hum Genet 2013, 93:876–890.

4. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Klemm A, Flicek P, Manolio T, Hindorff L, Parkinson H: The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 2014, 42.

5. Frazer KA, Murray SS, Schork NJ, Topol EJ: Human genetic variation and its contribution to complex traits. Nat Rev Genet 2009, 10:241–251.

6. Balzola F, Bernstein C, Ho GT, Russell RK: Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease: Commentary. Inflammatory Bowel Disease Monitor 2012:126–127.

7. Schierding W, Cutfield WS, O’Sullivan JM: The missing story behind Genome Wide Association Studies: single nucleotide polymorphisms in gene deserts have a story to tell. Front Genet 2014.

8. Smemo S, Tena JJ, Kim K-H, Gamazon ER, Sakabe NJ, Gómez-Marín C, Aneas I, Credidio FL, Sobreira DR, Wasserman NF, Lee JH, Puviindran V, Tam D, Shen M, Son JE, Vakili NA, Sung H-K, Naranjo S, Acemel RD, Manzanares M, Nagy A, Cox NJ, Hui C-C, Gomez-Skarmeta JL, Nóbrega MA: Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature 2014, 507:371–375.

9. Sandhu KS, Li G, Poh HM, Quek YLK, Sia YY, Peh SQ, Mulawadi FH, Lim J, Sikic M, Menghi F, Thalamuthu A, Sung WK, Ruan X, Fullwood MJ, Liu E, Csermely P, Ruan Y: Large-scale functional organization of long-range chromatin interaction networks. Cell Rep 2012, 2:1207–1219.

10. Guenther CA, Tasic B, Luo L, Bedell MA, Kingsley DM: A molecular basis for classic blond hair color in Europeans. Nat Genet 2014, 46:748–752.

11. Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, Feizi S, Gnirke A, Callan CG, Kinney JB, Kellis M, Lander ES, Mikkelsen TS: Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol 2012:271–277.

12. Matys V, Fricke E, Geffers R, Gößling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Münch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC®: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003:374–378.

13. Sandelin A, Alkema W, Engström P, Wasserman WW, Lenhard B: JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 2004, 32(Database issue):D91–D94.

14. Newburger DE, Bulyk ML: UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res 2009, 37(Suppl 1).

15. Xie Z, Hu S, Blackshaw S, Zhu H, Qian J: hPDI: a database of experimental human protein-DNA interactions. Bioinformatics 2010, 26:287–289.

16. Miele A, Dekker J: Long-range chromosomal interactions and gene regulation. Mol Biosyst 2008, 4:1046–1057.

17. Lettice LA, Horikoshi T, Heaney SJH, van Baren MJ, van der Linde HC, Breedveld GJ, Joosse M, Akarsu N, Oostra BA, Endo N, Shibata M, Suzuki M, Takahashi E, Shinka T, Nakahori Y, Ayusawa D, Nakabayashi K, Scherer SW, Heutink P, Hill RE, Noji S: Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly. Proc Natl Acad Sci U S A 2002, 99:7548–7553.