SNPs (single nucleotide polymorphisms)

Comparative genomics
Human genome polymorphism
Vasily Evgenievich Ramensky,
Institute of Molecular Biology, Russian Academy of Sciences
FBB, 4th year
People are different…
…and so are their genomes
…caccagctcctgtgGggggaggccctgct…
…caccagctcctgtgGggggaggccctgct…
…caccagctcctgtgGggggaggccctgct…
…caccagctcctgtgCggggaggccctgct…
…caccagctcctgtgCggggaggccctgct…
Definition
SNP (single nucleotide polymorphism): the existence in a population,
at one and the same position of genomic DNA, of two nucleotide
variants, with the frequency of the rarer variant (allele) ≥1%
5’---------------A---------------3’
|||||||||||||||||||||||||||||||
3’---------------T---------------5’
Na
5’---------------G---------------3’
|||||||||||||||||||||||||||||||
3’---------------C---------------5’
Ng
Na+Ng = N, Na/N ≥0.01, Ng/N ≥0.01
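The frequency condition above can be checked mechanically. The following is a minimal sketch: the function name `is_snp` and the example counts are hypothetical, and only the 1% threshold comes from the definition.

```python
def is_snp(n_a: int, n_g: int, threshold: float = 0.01) -> bool:
    """Return True if both variants reach the frequency threshold.

    n_a, n_g: numbers of chromosomes carrying each variant (Na, Ng).
    """
    n = n_a + n_g                      # N = Na + Ng
    if n == 0:
        raise ValueError("no chromosomes observed")
    minor = min(n_a, n_g) / n          # frequency of the rarer allele
    return minor >= threshold          # the commoner allele then passes too


# Hypothetical counts: 3 of 200 chromosomes carry the rarer variant (1.5%).
print(is_snp(197, 3))    # True
# Hypothetical counts: 5 of 2000 chromosomes (0.25%), below the threshold.
print(is_snp(1995, 5))   # False
```

Checking only the rarer allele is enough, because the commoner one is then necessarily above the threshold as well.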
Comments on the definition
• the definition concerns a comparison of sequences within a single biological species
• the word "polymorphism" («полиморфизм») has no plural form
in Russian (N. Lyapunova, personal communication)
• in everyday speech, "polymorphism" most often refers to
the nucleotide itself (i.e. it is used as a
synonym for "mutation")
• the definition presupposes a reliable measurement of frequencies in
the population(s), which is still rare in current practice
Types of polymorphism in the genome
* single nucleotide (SNP)
* short insertion/deletion
* microsatellite repeat of variable length (VNTR,
variable number tandem repeat)
* insertion of an element
* multiple nucleotide (MNP)
Some properties of SNPs
• Comprise ~90% of human genetic variation
• Occur with an average density of ~1/1000 bp
• The transition C↔T (G↔A) accounts for ~2/3 of all cases; the three
transversions C↔A (G↔T), C↔G (G↔C), T↔A (A↔T) together account for
the remaining ~1/3
• Most of them (~85%) are common to all populations
(with differing allele frequencies)
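The transition/transversion distinction in the list above depends only on whether the two alleles belong to the same chemical class (purine or pyrimidine). A minimal sketch follows; the function name `substitution_class` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(allele1: str, allele2: str) -> str:
    """Classify a pair of SNP alleles as a transition or a transversion."""
    pair = {allele1.upper(), allele2.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different nucleotides from A, C, G, T")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for a, b in [("C", "T"), ("G", "A"), ("C", "A"), ("C", "G"), ("T", "A")]:
    print(a, b, substitution_class(a, b))
```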
Why are SNPs important?
• Convenient genetic markers
• Responsible for existence of various phenotypes,
with primary interest in disease ones
• Pharmacogenomics: individual response to drugs
• Clues to understand human evolution
SNPs in the human genome
dbSNP build statistics
Build   Date       # rs's, x10^6
10?     Feb. 01     1.42
106     Aug. 02     2.81
110     Jan. 03     3.05
119     Jan. 04     7.23
124     Jan. 05    10.0
Estimates of SNP density in the human genome
• Li and Sadler (1991), Genetics: ~1/1000 bp
• Zhao et al. (2003), Gene: ~1/1200 bp
• dbSNP, build 124 (2005): ~1/300 bp (?)
Classification of SNPs by position in the genome
1. genes
1.1 UTR
1.2 exons (cSNP)
1.2.1 synonymous (sSNP)
1.2.2 non-synonymous (nsSNP)
1.3 introns
1.4 splice sites
2. regulatory regions of genes (rSNP)
3. intergenic regions
Synonymous vs. non-synonymous SNPs:
Example: Lysosomal alpha-glucosidase precursor (SwissProt P10253)
Hypothetical SNP: C → T
HGVBase ID: SNP000003023 G → C
…CAC CAG CTC CTG TGG GGG GAG GCC CTG CT…
… H   Q   L   L   W   G   E   A   L  …
…CAC CAG CTC CTG TGC GGG GAG GCT CTG CT…
… H   Q   L   L   C   G   E   A   L  …
nsSNP Trp746Cys
sSNP Ala749Ala
Summary of annotation on human genome build 33
dbSNP build 124:
Function class code   SNP count   Gene count   Functional classification
1                        338787        26210   Locus region
3                         39214        14342   Allele synonymous to contig nucleotide
4                         50772        15710   Allele nonsynonymous to contig nucleotide
5                        546965        17898   Untranslated region
6                       2925773        19332   Intron
7                           832          769   Splice site
8                         89554        18655   Allele is same as contig nucleotide
9                          7111         1006   Coding: synonymy unknown
Exercise
One database contains ~11,000 nsSNPs in ~6,000 proteins. Another database
contains ~47,000 protein sequences with a total length of
~19.5x10^6 residues. Estimate
(a) the average protein length
(b) the average number of nsSNPs per protein
(c) the average number of nsSNPs per unit of protein length
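One way to work through the arithmetic of this exercise is sketched below. It assumes (this is not stated in the exercise) that the proteins of the first database have the same average length as those of the second; the variable names are illustrative.

```python
# Figures quoted in the exercise.
n_nssnps = 11_000            # nsSNPs in the first database
n_snp_proteins = 6_000       # proteins carrying those nsSNPs
n_sequences = 47_000         # protein sequences in the second database
total_residues = 19.5e6      # their total length, in residues

# (a) average protein length
avg_length = total_residues / n_sequences
# (b) average number of nsSNPs per protein
snps_per_protein = n_nssnps / n_snp_proteins
# (c) nsSNPs per residue, ASSUMING both databases share the same mean length
snps_per_residue = snps_per_protein / avg_length

print(f"(a) {avg_length:.0f} residues")
print(f"(b) {snps_per_protein:.2f} nsSNPs per protein")
print(f"(c) {snps_per_residue * 1000:.1f} nsSNPs per 1000 residues")
```

Under that assumption the estimates come out at roughly 415 residues per protein, about 1.8 nsSNPs per protein, and about 4.4 nsSNPs per 1000 residues.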
Life cycle of a SNP (after Miller & Kwok, 2001)
I. Appearance of a new allelic variant by mutation
(~100 mutations per individual)
II. "Survival" until homozygotes for this allele
appear
III. Slow increase of frequency in the population
IV. Fixation of the new allele (0 vs. 100%), turning it into a
between-species difference
Exercise
The SNP life cycle described above takes ~0.3 million
years. Assuming that human and chimpanzee diverged
~5 million years ago, and that H. sapiens left Africa and
separated into distinct populations ~0.1-0.2 million years ago,
argue for the possible existence of (a) identical
SNPs in human and other species, (b) "private" SNPs, i.e.
SNPs confined to a single human population
Why are polymorphisms maintained
in the population?
• Selectionists: because heterozygotes have
higher fitness
• Neutralists: because all observed
polymorphisms are selectively neutral
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Reality is always somewhat more complicated
nsSNPs vs. disease mutations
• Disease mutations are rare (<<1%) and usually cause
monogenic diseases (e.g., cystic fibrosis)
• nsSNPs are frequent (>1%) and can modify the risk of
major common (multigenic, complex) diseases (e.g.,
cancer, cardiovascular disease, mental illness,
autoimmune states, diabetes)
In some cases, however, it is difficult to draw the distinction
Some common nsSNPs are known to affect
critical structural features
Frequency of the haemochromatosis allelic variant of
HLA-H protein Cys260Tyr (with destroyed disulphide
bond) is up to 6% in Northern Europe
Identifying SNPs responsible for
specific phenotypes
• whole genome scan – hypothesis-free approach;
extraordinary number of candidate SNPs
• candidate gene studies – require a priori models;
nevertheless, large numbers of candidate SNPs to be
tested
Both methods, however, require huge amounts of
expensive experimental data and are statistically
unreliable. Therefore, in silico expertise is required
Methods for predicting the effect of nsSNPs
* Sequence-based methods: analysis of multiple
alignment with homologs (Ng, Henikoff [2002])
* Structure-based methods: analysis of various
structural parameters (Wang, Moult [2001]; Chasman, Adams [2001])
* Combined methods: sequence and structure analysis
(Sunyaev, Ramensky, Bork [2000, 2001, 2002])
PolyPhen: prediction of amino acid
substitution effect on protein function
Data sources:
1. Sequence annotation of the query protein
2. PSIC profile matrix values derived from multiple
alignment with homologous proteins
3. Structural parameters and contacts of the query protein
structure or of a homolog with >50% sequence identity
Prediction: benign (neutral), damaging (deleterious)
PolyPhen query processing flowchart
INPUT:
• Sequence: …IMAGLQQTNSE…
• Position: 133
• Var1: Q
• Var2: P
• ACC/ID (if known protein): DMD_HUMAN
STAGES:
• sequence annotation
• PSIC profile scores for two amino acid variants
• structural parameters and contacts
• prediction rules
PREDICTION:
• damaging
• benign
• unknown
I. Sequence annotation
Hereditary hemochromatosis protein
precursor (HLA-H, Q30201)
Features checked:
* bond: DISULFID, THIOLEST, THIOETH
* site: BINDING, ACT_SITE, LIPID, METAL, SITE,
MOD_RES, SE_CYS
* region: TRANSMEM, SIGNAL, PROPEP
II. PSIC: profile analysis of
homologous sequences
1. Align with homologous proteins with sequence identity of 30..94%
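2. Calculate the profile matrix with the PSIC algorithm
Profile matrix: $S_{a,j} = \ln\left[ p_{a,j} / q_a \right]$, $a = 1, \dots, 20$, $j = 1, \dots, N$, where $N$ is the
alignment length
[Figure: profile matrix; labelled entries $S_{Asn,4}$ and $S_{Cys,4}$]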
3. Analyse the difference between the profile scores of the two amino acid
variants:
Asn→Cys: $\Delta = | S_{Asn,4} - S_{Cys,4} | = 1.591$
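The last two steps reduce to a log-ratio and an absolute difference, sketched below. This is not the PSIC algorithm itself: PSIC's sequence weighting, which produces the position-specific values p, is not reproduced here, and the numbers passed in are hypothetical. Reading p as the position-specific likelihood of an amino acid and q as its background frequency is an assumption taken from the form of the formula.

```python
import math


def profile_score(p: float, q: float) -> float:
    """Profile matrix entry S = ln(p / q).

    p: position-specific likelihood of the amino acid (assumed given),
    q: background frequency of the same amino acid (assumed given).
    """
    return math.log(p / q)


def score_difference(s_var1: float, s_var2: float) -> float:
    """Absolute difference between the profile scores of two variants."""
    return abs(s_var1 - s_var2)


# Hypothetical values for two variants at one alignment column.
s_asn = profile_score(p=0.20, q=0.05)
s_cys = profile_score(p=0.01, q=0.02)
delta = score_difference(s_asn, s_cys)
print(f"S_Asn = {s_asn:.3f}, S_Cys = {s_cys:.3f}, delta = {delta:.3f}")
```

A large difference means that one variant fits the alignment column much better than the other.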
III. 3D structure analysis
1. Residues that are in spatial contact with a
ligand or other "critical" residues
[Figure: Bos taurus trypsin (PDB ID: 1ql7); ligand Zen 999 and the residues in 5Å contact with it]
2. Residues that form the hydrophobic core of
the protein (buried residues)
[Figure: Bos taurus trypsin (PDB ID: 1ql7); surface residues vs. buried residues]
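The first criterion, residues in spatial contact with a ligand, amounts to a distance cutoff between atoms. A minimal sketch is given below; the function name, the data layout and all coordinates and residue labels are hypothetical, and a real analysis would read atom records from a PDB file instead.

```python
import math


def residues_in_contact(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted IDs of residues with any atom within `cutoff` Å of the ligand.

    protein_atoms: list of (residue_id, (x, y, z)) tuples
    ligand_atoms:  list of (x, y, z) tuples
    """
    contacts = set()
    for residue_id, xyz in protein_atoms:
        # One atom within the cutoff is enough to count the whole residue.
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            contacts.add(residue_id)
    return sorted(contacts)


# Hypothetical coordinates (in Å), for illustration only.
ligand = [(0.0, 0.0, 0.0)]
protein = [
    ("Ser 195", (3.0, 0.0, 0.0)),   # 3 Å away: in contact
    ("Ser 195", (8.0, 0.0, 0.0)),   # another atom of the same residue
    ("Gly 216", (0.0, 4.0, 3.0)),   # exactly 5 Å away: in contact
    ("Leu 99", (6.0, 0.0, 0.0)),    # 6 Å away: not in contact
]
print(residues_in_contact(protein, ligand))  # ['Gly 216', 'Ser 195']
```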
Structural parameters and contacts
• Secondary structure
• Phi-psi dihedral angles
• Solvent accessible surface area, normed s.a.s.a.
• Change in accessible surface propensity
• Change in residue side chain volume
• Contacts with heteroatoms
• Interchain contacts
• Contacts with functional sites (BINDING,
ACT_SITE, LIPID, and METAL)
• Region of the phi-psi map (Ramachandran map)
• Normalised B-factor (temperature factor)
RULES (connected with logical AND) and PREDICTION
PSIC score difference Δ | Substitution site properties | Substitution type properties | Prediction
arbitrary | annotated as a functional* or bond formation** site | not considered | probably damaging
arbitrary | in a region annotated or predicted as transmembrane | PHAT matrix difference resulting from substitution is negative | possibly damaging
Δ ≤ 0.5 | arbitrary | arbitrary | benign
Δ > 1.0 | atoms are closer than 3.0Å to atoms of a ligand or residue annotated as BINDING, ACT_SITE, LIPID, METAL | arbitrary | probably damaging
0.5 < Δ ≤ 1.5 | normed accessibility ACC ≤ 15% | absolute change of accessible surface propensity is ≥ 0.75 or absolute change of side chain volume is ≥ 60 | possibly damaging
0.5 < Δ ≤ 1.5 | normed accessibility ACC ≤ 5% | absolute change of accessible surface propensity is ≥ 1.0 or absolute change of side chain volume is ≥ 80 | probably damaging
1.5 < Δ ≤ 2.0 | arbitrary | arbitrary | possibly damaging
Δ > 2.0 | arbitrary | arbitrary | probably damaging
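The rule table can be read as a cascade of checks. The sketch below is one possible reading, not the PolyPhen implementation: the field names are hypothetical, the comparison signs and the order of evaluation are assumptions, and the fallback for the interval 0.5 < Δ ≤ 1.5 when no structural rule fires is a guess, since the slide does not list that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Substitution:
    delta: float                            # PSIC score difference
    functional_or_bond_site: bool = False   # from sequence annotation
    transmembrane: bool = False             # annotated or predicted
    phat_diff: Optional[float] = None       # PHAT matrix difference
    ligand_contact: bool = False            # atoms closer than 3.0 Å
    accessibility: Optional[float] = None   # normed accessibility, %
    propensity_change: float = 0.0          # absolute change
    volume_change: float = 0.0              # absolute change


def predict(s: Substitution) -> str:
    if s.functional_or_bond_site:
        return "probably damaging"
    if s.transmembrane and s.phat_diff is not None and s.phat_diff < 0:
        return "possibly damaging"
    if s.delta <= 0.5:
        return "benign"
    if s.delta > 1.0 and s.ligand_contact:
        return "probably damaging"
    if s.delta <= 1.5:
        if s.accessibility is not None:
            # Test the stricter (more buried) rule first so it is not shadowed.
            if s.accessibility <= 5 and (
                s.propensity_change >= 1.0 or s.volume_change >= 80
            ):
                return "probably damaging"
            if s.accessibility <= 15 and (
                s.propensity_change >= 0.75 or s.volume_change >= 60
            ):
                return "possibly damaging"
        return "benign"   # ASSUMPTION: this case is not listed on the slide
    if s.delta <= 2.0:
        return "possibly damaging"
    return "probably damaging"


print(predict(Substitution(delta=1.591)))
print(predict(Substitution(delta=1.2, accessibility=3.0, volume_change=90.0)))
```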
Control sets
                                   all     dam   unknown   dam/(dam+ben)
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Disease mutations
  Strict set                       444     366         3           82.9%
  Total                          2,782   2,047        70           75.4%
Between species substitutions
  Total                            671      58         5            8.7%
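The last column can be recomputed from the counts. The sketch below assumes that the number of benign predictions is what remains after subtracting the damaging and unknown counts from the total; that breakdown is not given explicitly on the slide. The function name is hypothetical.

```python
def damaging_fraction(total: int, damaging: int, unknown: int) -> float:
    """dam / (dam + ben), with ben = total - damaging - unknown (assumed)."""
    benign = total - damaging - unknown
    return damaging / (damaging + benign)


control_sets = {
    "Disease mutations, strict set": (444, 366, 3),
    "Disease mutations, total": (2782, 2047, 70),
    "Between species substitutions, total": (671, 58, 5),
}
for name, counts in control_sets.items():
    print(f"{name}: {damaging_fraction(*counts):.1%}")
```

With this reading the recomputed fractions agree with the percentages on the slide to within rounding.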
PolyPhen: predictions for nsSNPs
All SNPs from HGVBase, rel. 12     983,589
  synonymous                         9,310 (5,378 proteins)
  non-synonymous                    11,152 (6,124 proteins)
Predictions for nsSNPs:
  unknown                            1,987
  benign                             6,317
  possibly damaging                  1,591
  probably damaging                  1,257
Prediction basis:
  multiple alignment                 2,654
  sequence annotation                  118
  structure                             76
PolyPhen predictions for dbSNP b.121
[ Ivan Adzhubei, 2004 ]
All:
  unknown               9,502
  benign               27,991   67.6%
  possibly damaging     7,905   19.1%
  probably damaging     5,521   13.3%
  total                50,919   (44,005 unique rs's)
With structure:
  unknown                  42
  benign                2,142   57.1%
  possibly damaging       531   14.2%
  probably damaging     1,076   28.7%
  total                 3,791   (,167 unique rs's)
PolyPhen predictions for dbSNP b.121
[ Ivan Adzhubei, 2004 ]
Filtered: ≥5 seq. in multiple alignment
All:
  benign               16,813   64.2%
  possibly damaging     5,195   19.8%
  probably damaging     4,168   15.9%
  total                26,176   (21,677 unique rs's)
With structure:
  benign                2,021   56.6%
  possibly damaging       499   14.0%
  probably damaging     1,050   29.4%
  total                 3,570   (2,983 unique rs's)
Hydrophobic core stability parameters
are the best predictors
Ramensky et al., Nucleic Acids Res. (2002) 30:3894-3900
PolyPhen http://www.bork.embl.de/PolyPhen
PolyPhen input:
• Protein identifier OR sequence
• Substitution position
• Substitution type
PolyPhen:
nsSNPs data collection
DAMAGING nsSNPs
Transthyretin (PDB: 1tyr, SNP000012365)
Thr118 → Asn occurs at the ligand (REA) binding site
[Figure labels: Thr 118, REA 130]
DAMAGING nsSNPs
Trypsin (PDB: 1trn, SNP000012965)
Ser142Phe results in a strong side chain volume change
at a buried position
[Figure label: Ser 142]
PolyPhen: the child of seven nannies
THE CYCLOPS POLYPHEMUS WAS
A UNIQUE SUBSPECIES OF DWARF ELEPHANT
Izvestia-Nauka, 18 November 2003
Driving a sharpened log into the single eye of the ferocious
cyclops Polyphemus, the legendary Odysseus was exterminating
a unique species of dwarf elephant that lived on the island of
Sicily. The ancient myth of one-eyed humanoid
giants was dispelled by Italian palaeontologists at the scientific
exhibition "Polyphemus in Modena".
The exhibition displays skulls found by researchers in Sicily that have
a single frontal eye socket. At first glance it closely resembles an eye in the
forehead. The bones found next to the skulls do belong to a sizeable
mammal, about as large as a big bear. The owner of these
remains was not a cyclops but a dwarf elephant. The "eye" in the forehead is the opening for
the airways, that is, for the trunk.
Polyphenism: the ability of a single genome to produce two or more
alternative morphologies within a single population in response to an
environmental cue (such as temperature, photoperiod, or nutrition).
[Dr. Ehab Abouheif, McGill University, Montréal Québec]
The seasonal morphs of the buckeye butterfly, Precis coenia (Nymphalidae). The
ventral surfaces are shown. The Summer morph ("linea") is on the left; the Fall morph
("rosa") is on the right. [Scott F.Gilbert, A Companion to Developmental Biology.
Chapter 22, Seasonal Polyphenism in Butterfly Wings]
Damaging nsSNPs
• We estimate that ~20% of non-synonymous cSNPs
from databases are damaging
• The average allele frequency of non-synonymous cSNPs
predicted to be damaging is half that of benign
non-synonymous cSNPs
• We propose to use these predictions for prioritisation of
candidates for association studies