Plant Bioinformatics Training
Plant Genomic Databases and Useful Sites for Protein Information
In this module we'll be exploring several plant databases including Ensembl Plants,
Gramene, PLAZA, SUBA, TAIR and Araport.
The information in these databases allows us to easily identify functional regions within gene products,
view subcellular localization, find homologs in other species,
and even explore pre-computed gene trees to see if our gene of interest
has undergone a gene duplication event in another species, all at the click of a mouse!
Expression Analysis
Vast databases of gene expression and nifty visualization tools allow
us to explore where and when a gene is expressed. This information
can often guide the search for a phenotype when a gene mutant shows none
under "normal" growth conditions.
We explore several tools for Arabidopsis data (eFP Browser, Genevestigator,
TraVA DB, Araport) along with NCBI's Genome Data Viewer for RNA-seq data for other plant species.
We also examine the MPSS database of small RNAs and degradation products
to see if our example gene has any potential microRNA target sites.
Coexpression Tools
Being able to group genes by similar patterns of expression across expression
data sets using algorithms like WGCNA is a very useful way of organizing the data.
Clusters of genes with similar patterns of expression can then be subject
to Gene Ontology term enrichment analysis (see Module 5) or examined to see if they are part
of the same pathway. What's even more powerful is being able
to identify genes with similar patterns of expression without doing a single expression profiling experiment,
by mining gene expression databases! There are several tools that allow you to do this in many plant species simply
by entering a query gene identifier. The genes that are returned are often in the same biological process as the query gene,
and thus this "guilt-by-association" paradigm is an excellent tool for hypothesis generation.
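The "guilt-by-association" idea can be sketched in a few lines: rank every gene by how strongly its expression profile correlates with that of the query gene. This is only a minimal illustration using Pearson correlation; the gene names and expression values are hypothetical toy data, and the coexpression tools covered in the module may use other similarity measures and far larger compendia.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rank_coexpressed(query, profiles):
    """Return (gene, r) pairs for all other genes, most similar to the query first."""
    query_profile = profiles[query]
    scores = [(gene, pearson(query_profile, profile))
              for gene, profile in profiles.items() if gene != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical expression values across four samples (not real data).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.2, 7.9],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # mirror image of geneA
    "geneD": [1.0, 3.0, 2.0, 2.5],   # loosely related
}

ranked = rank_coexpressed("geneA", profiles)
for gene, r in ranked:
    print(f"{gene}\t{r:.3f}")
```

Genes at the top of such a list are candidates for sharing a biological process with the query, which is a hypothesis to test rather than a conclusion.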
Sectional Quiz 1
Promoter Analysis
The regulation of gene expression
is one of the main ways by which a plant can control the abundance
of a gene product (post-translational modifications and protein degradation
are some others). When and where a gene is expressed is controlled to a large extent
by the presence of short sequence motifs, called cis-elements, present in the promoter
of the gene. These in turn are bound by transcription factors that may be induced in response
to environmental stresses or during specific developmental programs.
Thus understanding which transcription factors can bind to which promoters can help us understand the role
the downstream genes might be playing in a biological system.
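As a rough illustration of what looking for cis-elements involves, the sketch below scans a promoter sequence for exact matches to a short motif on both strands. The promoter sequence is a made-up toy string; the two motifs are the G-box (CACGTG) and the W-box core (TTGAC). Dedicated promoter analysis tools generally go further, for example by using position weight matrices rather than exact string matches.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_motif(promoter, motif):
    """Return sorted (position, strand) hits; positions are 0-based on the given strand."""
    hits = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = promoter.find(pattern)
        while start != -1:
            hits.add((start, strand))
            # Step by one so overlapping matches are also reported.
            start = promoter.find(pattern, start + 1)
    return sorted(hits)

# Hypothetical promoter fragment (not a real sequence).
promoter = "AATTGACCGGCACGTGTTAAGTCAAT"

w_box_hits = find_motif(promoter, "TTGAC")    # W-box core
g_box_hits = find_motif(promoter, "CACGTG")   # G-box, a palindromic motif

print("W-box:", w_box_hits)
print("G-box:", g_box_hits)
```

Because the G-box is palindromic, each occurrence is reported once per strand at the same position.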
Functional Classification and Pathway Visualization
Often the results of 'omics experiments are large lists of genes,
such as those that are differentially expressed. We can use a "cherry picking" approach to explore individual genes
in those lists but it's nice to be able to have an automated way of analyzing them.
Here tools for performing Gene Ontology enrichment analysis are invaluable and can tell
you if any particular biological processes or molecular functions are over-represented in your gene list.
We'll explore AgriGO, AmiGO, tools at TAIR and the BAR, and g:Profiler, which all allow you to do such analyses.
Another useful analysis is to map your gene lists (along with associated data, e.g.
expression values) onto pathway representations, and we'll use AraCyc and MapMan to do this. In this way
it is easy to see if certain biosynthetic reactions are upregulated, which can help you interpret your 'omics data!
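To make "over-represented" concrete, one common way to test a single Gene Ontology term is a one-sided hypergeometric test: given how many genes in the background carry the term, how surprising is the number seen in your list? The sketch below is an assumption about the general approach, not a description of what any particular tool named above does, and the counts are invented for illustration. Real analyses also correct for testing many terms at once.

```python
from math import comb

def enrichment_p_value(background_size, annotated_in_background,
                       list_size, annotated_in_list):
    """One-sided hypergeometric p-value: P(X >= annotated_in_list)."""
    total_draws = comb(background_size, list_size)
    max_possible = min(annotated_in_background, list_size)
    favourable = sum(
        comb(annotated_in_background, i)
        * comb(background_size - annotated_in_background, list_size - i)
        for i in range(annotated_in_list, max_possible + 1)
    )
    return favourable / total_draws

# Hypothetical counts: 20 background genes, 5 of them carry the GO term,
# and 3 of the 4 genes in our list carry it.
p_value = enrichment_p_value(20, 5, 4, 3)
print(f"p = {p_value:.4f}")
```

A small p-value suggests the term appears in the list more often than expected by chance for a list of that size.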
Network Exploration (PPIs, PDIs, GRNs)
Molecules inside the cell rarely operate in isolation.
Proteins act together to form complexes, or are part of signal transduction cascades.
Transcription factors bind to cis-elements in promoters or elsewhere and can act as activators or repressors of transcription.
MicroRNAs add another layer of regulation, typically acting on transcripts after transcription. One of the main themes to have emerged
in the past two decades in biology is that of networks. In terms of protein-protein interaction networks,
often proteins that are highly connected with others are crucial for biological function – when these “hubs” are perturbed,
we see large phenotypic effects. The way that transcription factors interact with downstream promoters,
some driving the expression of other transcription factors that in turn regulate genes combinatorially
with upstream transcription factors, can have an important biological effect in terms of modulating
the kind of output achieved. The tools described in this lab can help us to explore molecular interactions
in a network context, perhaps with the eventual goal of modeling
the behaviour of a given system.
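The notion of a "hub" can be illustrated by counting each protein's interaction partners in an edge list. This is a minimal sketch with hypothetical protein names and interactions; degree is only one of several ways to measure how central a node is in a network.

```python
from collections import defaultdict

def degree_table(edges):
    """Map each protein to its number of distinct interaction partners."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this simple sketch
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(others) for protein, others in partners.items()}

def top_hubs(edges, n=1):
    """Return the n most connected proteins as (protein, degree), ties broken by name."""
    degrees = degree_table(edges)
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:n]

# Hypothetical protein-protein interactions (not real data).
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"),
    ("P2", "P1"),  # duplicate of P1-P2 in the other direction, counted once
]

degrees = degree_table(edges)
print(top_hubs(edges, n=2))
```

Storing partners in sets means an interaction reported twice, or in both directions, does not inflate a protein's degree.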
Sectional Quiz 2 and Final Assignment