1
|
|
2
|
- Genome maps can be divided into two types
- -Genetic maps
- -Abstract maps that place the relative location of genes on
chromosomes based on recombination frequency
- -Physical maps
- -Use landmarks within DNA sequences, ranging from restriction sites
to the actual DNA sequence
|
3
|
- Distances between “landmarks” are measured in base pairs
- -1000 base pairs (bp) = 1 kilobase (kb)
- Knowledge of DNA sequence is not necessary
- There are three main types of physical maps
- -Restriction maps
- -Cytological maps
- -Radiation hybrid maps
|
4
|
- Restriction maps
- -The first physical maps
- -Based on distances between restriction sites
- -Overlap between smaller segments can be used to assemble them into a
contig
- -Continuous segment of the genome
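The idea of measuring distances between restriction sites can be sketched in a few lines of Python. This is only an illustration: the function name `restriction_fragments` and the short test sequence are made up for this example. The one biological fact used is that EcoRI recognizes GAATTC and cuts after the G.

```python
def restriction_fragments(seq, site, cut_offset):
    """Return cut positions and fragment lengths for a linear DNA sequence.

    cut_offset is the number of bases into the recognition site at which
    the enzyme cuts (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment lengths are the gaps between consecutive cut points,
    # including both ends of the linear molecule.
    bounds = [0] + cuts + [len(seq)]
    lengths = [b - a for a, b in zip(bounds, bounds[1:])]
    return cuts, lengths


# Hypothetical 22 bp sequence containing two EcoRI sites
seq = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
cuts, lengths = restriction_fragments(seq, "GAATTC", 1)
print(cuts, lengths)
```

In the lab the fragment lengths are read from a gel rather than computed, and the map is inferred from them; the sketch runs the logic in the opposite direction.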
|
5
|
|
6
|
- Cytological maps
- -Employ stains that generate reproducible patterns of bands on the
chromosomes
- -Divide chromosomes into subregions
- -Provide a map of the whole genome, but at low resolution
- -Cloned DNA is correlated with map using fluorescent in situ
hybridization (FISH)
|
7
|
|
8
|
- Radiation hybrid maps
- -Use radiation to fragment chromosomes randomly
- -Fragments are then recovered by fusing irradiated cell to another cell
- -Usually a rodent cell
- -Fragments can be identified based on banding patterns or FISH
|
9
|
- Sequence-tagged sites
- -An STS is a small stretch of DNA that is unique in the genome
- -Only 200-500 bp
- -Boundary is defined by PCR primers
- -Identified using any DNA as a template
- -STSs essentially provide a scaffold for assembling genome sequences
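As a rough illustration of how a primer pair defines an STS, the sketch below searches a template for the forward primer and for the site where the reverse primer would anneal, then reports the product sizes. The function names, primers, and template are hypothetical, and only exact matches on one strand are considered. Real primer design also involves melting temperature and mismatch tolerance, which are ignored here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sts_products(template, fwd, rev):
    """Return the length of each product the primer pair could amplify.

    Only the top strand is searched: the forward primer must match
    directly, and the reverse primer must match as its reverse complement
    somewhere downstream.
    """
    rev_site = reverse_complement(rev)
    products = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        if end != -1:
            products.append(end + len(rev_site) - start)
        start = template.find(fwd, start + 1)
    return products


# Hypothetical primers and template
fwd = "ACGGTC"
rev = "GGAATC"  # anneals where the top strand reads GATTCC
template = "TTTT" + fwd + "A" * 10 + "GATTCC" + "TTTT"
products = sts_products(template, fwd, rev)
print(products)
```

A primer pair behaves like an STS only if exactly one product is found across the whole genome. A real STS would be 200-500 bp, not the 22 bp used here.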
|
10
|
|
11
|
- Genetic maps are measured in centimorgans
- -1 cM = 1% recombination frequency
- Linkage mapping can be done without knowing the DNA sequence of a gene
- -Limitations:
- 1. Genetic distance does not directly correspond to actual physical
distance
- 2. Not all genes have obvious phenotypes
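The relation 1 cM = 1% recombination frequency can be written as a one-line calculation. The function name and the offspring counts below are hypothetical. The direct conversion is most reliable for closely linked genes, because multiple crossovers go uncounted over longer distances.

```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Convert a recombination frequency into map units (centimorgans)."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical testcross: 18 recombinant offspring out of 200 scored
distance = map_distance_cm(18, 200)
print(distance)
```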
|
12
|
- The most common markers are short repeat sequences called short tandem
repeats, or STR loci
- -Differ in repeat length between individuals
- -Thirteen STR loci form the basis of modern DNA fingerprinting developed by the FBI
- -Cataloged in the CODIS database to identify criminal offenders
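To make "differ in repeat length" concrete, the sketch below counts how many times a repeat unit occurs in its longest uninterrupted run. The function name, the GATA repeat unit, and the two alleles are all hypothetical examples, not real CODIS data.

```python
import re


def longest_repeat_run(seq, unit):
    """Return the number of units in the longest tandem run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles at the same STR locus
allele_1 = "CC" + "GATA" * 7 + "TT"
allele_2 = "CC" + "GATA" * 10 + "TT"
genotype = (longest_repeat_run(allele_1, "GATA"),
            longest_repeat_run(allele_2, "GATA"))
print(genotype)
```

A DNA profile is essentially a list of such repeat-count pairs, one pair per locus.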
|
13
|
- Genetic and physical maps can be correlated
- -Any cloned gene can be placed within the genome and can also be mapped
genetically
|
14
|
- All of these different kinds of maps are stored in databases
- -The National Center for Biotechnology Information (NCBI) serves as the
US repository for these data and more
- -Similar databases exist in Europe and Japan
|
15
|
- The ultimate physical map is the base-pair sequence of the entire genome
|
16
|
- Sequencers provide accurate sequences for DNA segments up to 800 bp long
- -To reduce errors, 5-10 copies of a genome are sequenced and compared
- Vectors used to clone large pieces of DNA:
- -Yeast artificial chromosomes (YACs)
- -Bacterial artificial chromosomes (BACs)
- -Human artificial chromosomes (HACs)
- -HACs are circular at present
|
17
|
- Clone-by-clone sequencing
- -Overlapping regions between BAC clones are identified by restriction
mapping or STS analysis
- Shotgun sequencing
- -DNA is randomly cut into smaller fragments, cloned and then sequenced
- -Computers put together the overlaps
- -Sequence is not tied to other information
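"Computers put together the overlaps" can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest suffix-to-prefix overlap. This is a simplified sketch and not the algorithm any real assembler uses. The function names, the `min_len` parameter, and the reads are hypothetical, and sequencing errors, reverse strands, and repeats are all ignored.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # remaining reads do not overlap: separate contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)
    return reads


# Hypothetical reads drawn from the 14 bp sequence ATGCGTACGTTAGC
contigs = greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"])
print(contigs)
```

Repeated sequences are what make real genomes hard for this approach: identical overlaps can arise from different places, which is one reason clone-by-clone maps are still useful.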
|
18
|
|
19
|
- The Human Genome Project was launched in 1990 by the International
Human Genome Sequencing Consortium
- Craig Venter formed a private company and entered the “race” in May
1998
- In 2001, both groups published a draft sequence
- -Contained numerous gaps
|
20
|
- In 2004, the “finished” sequence was published as the reference sequence
(REF-SEQ) in databases
- -3.2 gigabasepairs
- -1 Gb = 1 billion basepairs
- -Contains a 400-fold reduction in gaps
- -99% of euchromatic sequence
- -Error rate = 1 per 100,000 bases
|
21
|
- The Human Genome Project found fewer genes than expected
- -Initial estimate was 100,000 genes
- -Number now appears to be about 25,000!
- In general, eukaryotic genomes are larger and have more genes than those
of prokaryotes
- -However, the complexity of an organism is not necessarily related to
its gene number
|
22
|
|
23
|
- Genes are identified by open reading frames
- -An ORF begins with a start codon and contains no stop codon for a
distance long enough to encode a protein
- Sequence annotation
- -The addition of information, such as ORFs, to the basic sequence
information
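The ORF definition above translates directly into a scan of the three forward reading frames. The sketch below is a minimal version under stated assumptions: it looks only at the given strand, uses the standard stop codons TAA, TAG, and TGA, and uses a very small `min_codons` threshold so the hypothetical example stays short. Real annotation programs use much longer thresholds and additional evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs for ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons from the start codon up to, but not including, the stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Hypothetical sequence with one short ORF
seq = "CC" + "ATG" + "GCTGCAAAA" + "TAA" + "GG"
orfs = find_orfs(seq)
print(orfs, [seq[s:e] for s, e in orfs])
```

Recording the returned coordinates alongside the sequence is a simple form of the annotation described above.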
|
24
|
- BLAST
- -An algorithm used to search NCBI databases for homologous
sequences
- -Permits researchers to infer functions for isolated molecular clones
- Bioinformatics
- -Use of computer programs to search for genes, and to assemble and
compare genomes
|
25
|
- Genomes consist of two main regions
- -Coding DNA
- -Contains genes that encode proteins
- -Noncoding DNA
- -Regions that do not encode proteins
|
26
|
- Four different classes of protein-encoding genes are found:
- -Single-copy genes : Includes most genes
- -Segmental duplications : Blocks of genes copied from one chromosome to
another
- -Multigene families : Groups of related but distinctly different genes
- -Tandem clusters : Identical copies of genes occurring together in
clusters
- -Also include rRNA genes
|
27
|
- Each cell in our bodies has about 6 feet of DNA stuffed into it
- -However, less than one inch is devoted to genes!
- Six major types of noncoding human DNA have been described
|
28
|
- Noncoding DNA within genes
- -Protein-encoding exons are embedded within much larger noncoding introns
- Structural DNA
- -Called constitutive heterochromatin
- -Localized to centromeres and telomeres
- Simple sequence repeats (SSRs)
- -One- to six-nucleotide sequences repeated thousands of times
|
29
|
- Segmental duplications
- -Consist of 10,000 to 300,000 bp that have duplicated and moved
- Pseudogenes
- -Inactive genes
|
30
|
- Transposable elements (transposons)
- -Mobile genetic elements
- -Four types:
- -Long interspersed elements (LINEs)
- -Short interspersed elements (SINEs)
- -Long terminal repeats (LTRs)
- -Dead transposons
|
31
|
|
32
|
- ESTs can identify genes that are expressed
- -They are generated by sequencing the ends of randomly selected cDNAs
- ESTs have identified 87,000 cDNAs in different human tissues
- -But how can 25,000 human genes encode three to four times as many
proteins?
- -Alternative splicing yields different proteins with different
functions
|
33
|
|
34
|
- Single-nucleotide polymorphisms (SNPs) are sites where individuals
differ by only one nucleotide
- -Must be found in at least 1% of population
- Haplotypes are regions of the chromosome that are not exchanged by
recombination
- -Tendency for genes not to be randomized is called linkage
disequilibrium
- -Can be used to map genes
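Both ideas on this slide can be expressed numerically. The sketch below reads the 1% rule as a cutoff on the frequency of the less common allele. It measures linkage disequilibrium with the coefficient D = p(AB) - p(A)p(B), which is one common measure and is not named in the slide. All function names and counts are hypothetical.

```python
def minor_allele_frequency(allele_counts):
    """Frequency of the second most common allele at a site."""
    counts = sorted(allele_counts.values(), reverse=True)
    if len(counts) < 2:
        return 0.0
    return counts[1] / sum(counts)


def is_snp(allele_counts, cutoff=0.01):
    """Apply the 1% rule to the less common allele."""
    return minor_allele_frequency(allele_counts) >= cutoff


def linkage_disequilibrium(haplotype_counts):
    """D = p(AB) - p(A) * p(B) for two sites with alleles A/a and B/b."""
    get = haplotype_counts.get
    total = sum(haplotype_counts.values())
    p_ab = get("AB", 0) / total
    p_a = (get("AB", 0) + get("Ab", 0)) / total
    p_b = (get("AB", 0) + get("aB", 0)) / total
    return p_ab - p_a * p_b


# Hypothetical counts from sampled chromosomes
print(is_snp({"A": 980, "G": 20}))
print(linkage_disequilibrium({"AB": 40, "Ab": 10, "aB": 10, "ab": 40}))
```

A value of D near zero means the two sites are inherited independently. A value far from zero means certain allele combinations travel together, which is what makes a haplotype useful for mapping.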
|
35
|
|
36
|
- Comparative genomics, the study of whole genome maps of organisms, has
revealed similarities among them
- -For example, over half of Drosophila genes have human counterparts
- Synteny refers to the conserved arrangements of DNA segments in related
genomes
- -Allows comparisons of unsequenced genomes
|
37
|
|
38
|
|
39
|
- Organellar genomes
- -Mitochondria and chloroplasts are descendants of ancient endosymbiotic
bacterial cells
- -Over time, their genomes exchanged genes with the nuclear genome
- -Both organelles contain polypeptides encoded by the nucleus
|
40
|
- Functional genomics is the study of the function of genes and their
products
- DNA microarrays (“gene chips”) enable the analysis of gene expression at
the whole-genome level
- -DNA fragments are deposited on a slide
- -Probed with labeled mRNA from different sources
- -Active/inactive genes are identified
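One simple way to compare microarray signals from two mRNA sources is the log2 ratio of their spot intensities. The sketch below assumes a two-sample comparison and a twofold-change cutoff (`threshold=1.0` in log2 units). The gene names, intensities, and cutoff are all hypothetical, and real analyses also normalize the data and test for statistical significance.

```python
import math


def classify_expression(sample, reference, threshold=1.0):
    """Label each gene by the log2 ratio of sample to reference signal."""
    calls = {}
    for gene, signal in sample.items():
        ratio = math.log2(signal / reference[gene])
        if ratio >= threshold:
            calls[gene] = "up"
        elif ratio <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical spot intensities for three genes
sample = {"geneA": 800.0, "geneB": 150.0, "geneC": 500.0}
reference = {"geneA": 200.0, "geneB": 600.0, "geneC": 480.0}
calls = classify_expression(sample, reference)
print(calls)
```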
|
41
|
|
42
|
- Transgenics is the creation of organisms containing genes from other
species (transgenic organisms)
- -Can be used to determine whether:
- -A gene identified by an annotation program is really functional in
vivo
- -Homologous genes from different species have the same function
|
43
|
|
44
|
- Proteomics is the study of the proteome
- -All the proteins encoded by the genome
- The transcriptome consists of all the RNA that is present in a cell or
tissue
|
45
|
- Proteins are much more difficult to study than DNA because of:
- -Post-translational modifications
- -Alternative splicing
- However, databases containing the known protein structural motifs exist
- -These can be searched to predict the structure and function of gene
sequences
|
46
|
|
47
|
- Protein microarrays are being used to study large numbers of proteins
simultaneously
- -Can be probed using:
- -Antibodies to specific proteins
- -Specific proteins
- -Small molecules
- The yeast two-hybrid system has generated large-scale maps of
interacting proteins
|
48
|
- The genomics revolution will have a lasting effect on how we think about
living systems
- The immediate impact of genomics is being seen in diagnostics
- -Identifying genetic abnormalities
- -Identifying victims by their remains
- -Distinguishing between naturally occurring and intentional outbreaks
of infections
|
49
|
|
50
|
- Genomics has also helped in agriculture
|
51
|
- Genome science is also a source of ethical challenges and dilemmas
- -Gene patents
- -Should the sequence/use of genes be freely available or can it be
patented?
- -Privacy concerns
- -Could one be discriminated against because their SNP profile
indicates susceptibility to a disease?
|