Petersburg Theodosius Dobzhansky Center for Genome Bioinformatics

Theodosius Dobzhansky Center for
Genome Bioinformatics
May 25, 2015
St. Petersburg State University
Founded 1724
"Nothing in Biology makes
any sense except in light of evolution"
Theodosius Dobzhansky
Genomic
Medicine
GWATCH - A new frontier in
Genome Wide Association Analyses
and Data Release
• GWAS
GWATCH
A suite of genome tools and
programs to discover, view and
assess hits from GWAS and WGS
• Genome
• Wide
• Association
• Tracks
• Chromosome
• Highway
Display Everything
beyond Manhattan
GWATCH
2D and 3D Snapshots
Genetic association tests × SNPs
2D Snapshot - Chromosome 7 CFTR Region
A heat plot of p-values for 131 ARG association tests
200+ SNPs in each region
~26,000 SNP-test combinations
in
PROX1 region chr 1
PROX1 2D Snapshot
PROX1 polarized
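The 2D snapshot is described as a heat plot of p-values over 131 association tests and 200+ SNPs. A minimal sketch of the underlying grid follows, assuming the plotted intensity is -log10(p); the function name `neg_log10_grid` and the random placeholder p-values are hypothetical and are not taken from GWATCH itself.

```python
import math
import random

def neg_log10_grid(pvalues):
    """Convert a tests x SNPs grid of p-values into -log10(p) intensities."""
    return [[-math.log10(p) for p in row] for row in pvalues]

N_TESTS = 131  # association tests, as stated on the slide
N_SNPS = 200   # SNPs per region (the slide says "200+")

random.seed(0)
# Placeholder p-values only; real values would come from the association tests.
pvalues = [[random.uniform(1e-6, 1.0) for _ in range(N_SNPS)]
           for _ in range(N_TESTS)]

grid = neg_log10_grid(pvalues)
n_cells = sum(len(row) for row in grid)
print(n_cells)  # 26200 cells, consistent with the ~26,000 quoted on the slide
```

Each cell of `grid` corresponds to one SNP-test pair, so a region with more than 200 SNPs simply widens the rows.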
Promise of GWATCH
• Automate analyses of GWAS and WGS
• Improve display
• Instant replication of hits without Bonferroni penalty
• Open data release without violating patient privacy
• Let’s take a ride along a chromosome
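For scale, the Bonferroni penalty mentioned above can be worked out for a single region. This is only a sketch of the standard correction, not of how GWATCH handles replication; the significance level of 0.05 and the function name `bonferroni_threshold` are assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

ALPHA = 0.05              # assumed family-wise significance level
N_COMPARISONS = 131 * 200  # tests x SNPs in one region, from the slides

threshold = bonferroni_threshold(ALPHA, N_COMPARISONS)
print(f"{threshold:.2e}")  # 1.91e-06
```

With about 26,000 comparisons, a single hit would need a p-value below roughly 2 in a million to survive the correction.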
GWATCH
Genetic Diversity of the
Russian Mycobacterium
tuberculosis strains
GMTV Database
Ekaterina Chernyaeva
St. Petersburg State University
11.06.2015
MTBC History
• MTBC emerged about 70,000 years
ago, accompanied migrations of
anatomically modern humans out of
Africa and expanded as a consequence
of increases in human population
density during the Neolithic period
M. tuberculosis H37Rv genome
SIZE: 4,411,532 bp
GENES: 3993 protein coding, 50 RNA coding
GC CONTENT: 65%
INSERTION ELEMENTS: 56 copies of IS-elements
(Families IS3, IS5, IS21, IS30, IS110, IS256, ISL3, IS1535)
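GC content is the fraction of G and C bases in a sequence. A minimal sketch of the calculation, using a toy sequence rather than the real H37Rv genome; the function name `gc_content` is hypothetical.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)

# Toy sequence for illustration only (8 of 10 bases are G or C).
print(gc_content("GGCCGGCCAT"))  # 0.8

# Rough count of G+C bases implied by the figures on the slide.
GENOME_SIZE_BP = 4411532
approx_gc_bases = round(GENOME_SIZE_BP * 0.65)
print(approx_gc_bases)
```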
MTBC genetic diversity
• MTBC is characterized by 99.9% similarity of the 16S rRNA gene, but its members differ widely in host tropism, phenotype and pathogenicity
• 14 “Regions of Difference” which discriminate all MTBC members
GMTV Database
Genome-wide Mycobacterium tuberculosis Variation Database
http://mtb.dobzhanskycenter.ru
Includes 1084 M. tuberculosis genomes collected across Russia
over 69,000 SNP or Indel variants
Drug resistance information
Clinical Data (diagnosis, HIV-status)
Geographical information
Gender
Year of isolation
TB Centers of surveillance
Countries
UK, USA, China, Canada, Portugal, Germany,
Georgia, Uzbekistan, Netherlands, Malawi,
Uganda, South Africa, Global collection (Ethiopia,
Vietnam, Mexico, South Korea, Pakistan,
Senegal, Cambodia, Gambia, Malaysia, Sri
Lanka, Nepal, India, Ghana, Sierra Leone,
Tanzania, Iran, Afghanistan, Turkey, Singapore,
Burkina Faso, Turkmenistan, Colombia, Puerto
Rico, Nicaragua, Mongolia, Indonesia, Thailand,
Burma, Laos, The Philippines, Guatemala,
Salvador, Somalia)
Clinical Data in GMTV
• HIV status:
- 11 isolates obtained from HIV-infected patients
- 19 from HIV-negative patients
• Clinical outcome:
- 12 patients with extrapulmonary TB
- 15 patients with pulmonary TB
- 3 patients with pulmonary and extrapulmonary localizations
• Drug resistance:
- 478 with multidrug resistance (MDR)
- 60 with extensive drug resistance (XDR)
Comparative
Genomics
10,000
vertebrate genomes
Coming Now
Sequencing 10,000 Vertebrate Species
Of circa 60,000 species
• 3000 of 5,000 mammals
• 2000 of 10,000 birds
• 1300 of 8,500 reptiles
• 700 of 6,500 amphibians
• 3000 of 30,000 fish
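The per-class targets can be checked against the headline figures of 10,000 planned genomes out of circa 60,000 species. A small sketch of that arithmetic, using only the numbers on the slide; the dictionary name `targets` is hypothetical.

```python
# (planned genomes, approximate known species) per vertebrate class, from the slide
targets = {
    "mammals": (3000, 5000),
    "birds": (2000, 10000),
    "reptiles": (1300, 8500),
    "amphibians": (700, 6500),
    "fish": (3000, 30000),
}

planned_total = sum(planned for planned, _ in targets.values())
species_total = sum(known for _, known in targets.values())
print(planned_total, species_total)  # 10000 60000

# Share of each class that the project plans to sequence.
for name, (planned, known) in targets.items():
    print(f"{name}: {planned / known:.1%}")
```

The totals match the slide, and the shares show that mammals are sampled most densely (60%) and fish most sparsely (10%).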
Role of G10K Community
of Scientists
• Gather Voucher specimens
• Identify species’ biological communities
• Set Standards for Genome:
– Assembly
– Annotation
– Release on Browser
• Monitor progress
• Rapid data release
• Raise funds
• Spawn Offspring…..
Felidae Genome Consortium
20,343
O'Brien, S. J., Wildt, D. E., Goldman, D., Merril, C. R., and Bush, M.:
The Cheetah genome - Statistics
• Namibian cheetah Chewbacca - 75X coverage, Illumina HiSeq, BGI
• Six cheetahs sequenced at ~6x coverage
– Tanzania - 3 - A. jubatus raineyi
– Namibia - 3 - A. jubatus jubatus
– 8 mate pair libraries
• N50
– Contigs 28.2 kbp
– Scaffolds 3.1 Mbp
• Estimated genome size
– 25x raw reads 2.395 Gbp
– Assembly 2.375 Gbp
• Assembly
• SOAPdenovo
• Assisted assembly with FCA 6.0
– RH map 3000 markers
– SNP array LM 60,000 markers
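The N50 values quoted above summarize assembly contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of the standard calculation on toy lengths (not the cheetah contigs); the function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("no lengths given")

# Toy example: total length 32, so the two contigs of length 8 already cover half.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```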
SNV frequency in different mammal genomes (bar chart; y-axis from 0 to 0.0025)
Feline
Genome
Project
May 25 2015
FELIDAE GENOMES sequenced or promised
Species | Common name | Where done | Publication status / plans
Felis catus | Domestic cat | Wash U | GigaScience 2014; PNAS 2014; Gen Res 2009
Felis silvestris | Wildcat | NIAAA | Analysis
Acinonyx jubatus | Cheetah | BGI | Submitted
Panthera leo | Lion | Korea | Nature Comm 2014
Panthera leo | Af & Asian lion | BGI | Writing up now
Panthera tigris | Tiger | Korea | Nature Comm 2014
Panthera uncia | Snow leopard | Korea | Nature Comm 2014
Panthera pardus | Leopard | Hudson Alpha; Moleculo Illumina | Just started
Panthera onca | Jaguar | Univ Porto Alegre | Analyses
Lynx pardinus | Iberian lynx | Spain | Writing up now
Puma concolor | Florida panther & | BGI | Just started
Puma concolor | Western puma | UCSC | Just started
Neofelis nebulosa | Clouded leopard | Smithsonian | Gathering samples
Neofelis diardi | Sunda cl. leopard | Smithsonian | Gathering samples
Prionailurus bengalensis | Leopard cat | Natural History Museum of Denmark | Gathering samples
Prionailurus viverrina | Fishing cat | Natural History Museum of Denmark | Gathering samples
Caracal caracal | Caracal | Natural History Museum of Denmark | Gathering samples
Prionailurus rubiginosus | Rusty-spotted cat | Natural History Museum of Denmark | Gathering samples
Crocuta crocuta | Hyena | BGI | Pending; reads done at BGI
Canis familiaris | Dog | Broad | Nature
Manis pentadactyla | Chinese pangolin | Wash U | Analyses
Manis javanica | Malayan pangolin | U Malaysia | Analyses
Conservation
Genetics
Shujin Luo & Jae-Heup Kim
G10K Offspring
1000 fungal genomes
Highlights of Dobzhansky Center 2012-2015
• Center staffed to ~30 employees and occupied labs on Sredniy Prospekt Oct. 2012
• Hosted 6 lab retreats and 30+ science visitors; chaired 6 conferences
• Hosted international web sites
• Led and coordinated Genome10K project
• Published > 140 peer reviewed papers in high ranking journals
• Initiated Genome Russia Project 2015-2018
• Education in courses, Coursera Online, ConGen and G10K
• Accessed and housed sequence and genotype data from 30 species genomes and 12,000 human study participants in disease gene association studies
1000 genomes
Studies of human genomic variation have great potential to identify
genes that may underlie differences in disease resistance (e.g., MHC
region) or drug metabolism
Goals and objectives:
Genome Russia Mission
• 1. Create a biobank of tissue and DNA samples from representatives of the major ethnic groups living in Russia.
• 2. Compile a catalog of genomic variation within ethnic groups.
• 3. Document the structure and distribution of genes for various diseases inherited by the population of Russia.
• 4. Create a Russian HapMap, a project that will help researchers find genes associated with human disease and response to drugs.
• 5. Isolate and characterize specific changes in genomic DNA found only in the population of Russia.
• 6. Study the routes of ancient geographic migrations of the ancestors of the modern Russian peoples.
• 7. Develop innovative bioinformatics algorithms and approaches applicable to the analysis of diseases caused by genes characterized during the project.
International HapMap Project
A partnership of scientists and funding agencies from Canada, China, Japan,
Nigeria, the United Kingdom and the United States to develop a public resource
that will help researchers find genes associated with human disease and
response to drugs.
The Russian project will contribute to this international effort by focusing
on the distinctive features and diversity of the Russian population.
Russian HapMap
Genome Russia Consortium
21 groups from 14 cities
1000 RU Genomes
And now I really would like to say "Thanks" to…
GWATCH
Russian Mycobacterium tuberculosis GMTV Database
Theodosius Dobzhansky Center for Genome Bioinformatics
St. Petersburg State University
Founded 1724