Visualizing Differential Expression with SeqMonk — Step-by-Step Guide
This guide walks through preparing data, running differential expression (DE) analysis, and visualizing the results in SeqMonk. It assumes basic familiarity with mapped read files and gene annotations. The steps use reasonable defaults so you can follow along without extra setup.
1. Prepare input files
- Mapped reads: Use BAM files (sorted, indexed). One BAM per sample.
- Annotation: GTF/GFF or a gene list compatible with SeqMonk.
- Experimental design: Decide sample groups (e.g., control vs treated) and ensure consistent naming.
2. Create a new project and import data
- Open SeqMonk → File → New Project.
- Import your annotation: Data → Import annotation → load GTF/GFF.
- Import BAM files: Data → Import mapped data → select all BAMs. SeqMonk will index and register them.
3. Define features (probes)
SeqMonk needs probes to quantify reads. For gene-level DE use annotated exons or generate custom probes.
- Option A — Use annotation probes:
- Data → Features → Build features from annotation → choose “Genes” (or exons) → OK.
- Option B — Create probes from genome windows:
- Data → Features → Create probes → set window size (e.g., 1 kb) → OK.
- Option C — Import a custom probe list (if you have a specific gene list).
4. Quantify reads across probes
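- Data → Quantitation → Annotated probes → choose “Read counts”. Keep raw counts if you plan to run a count-based test such as DESeq; RPKM/TPM values are better suited to visualization.
- Keep the default strand settings unless your library is strand-specific, in which case select the matching strand option.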
- SeqMonk calculates counts per probe per sample and stores them in the project.
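As a rough illustration of what depth- and length-scaled quantitation values represent, the sketch below applies the standard CPM and RPKM formulas to a single probe. The function names and example numbers are made up for illustration; this is not SeqMonk's own code, and its quantitation options may differ in detail.

```python
def cpm(count, library_size):
    """Counts per million mapped reads for one probe in one sample."""
    return count * 1_000_000 / library_size


def rpkm(count, probe_length_bp, library_size):
    """Reads per kilobase of probe per million mapped reads."""
    return count * 1_000_000_000 / (probe_length_bp * library_size)


# Hypothetical example: 500 reads on a 2 kb gene, 20 million mapped reads.
example_cpm = cpm(500, 20_000_000)
example_rpkm = rpkm(500, 2_000, 20_000_000)
print(example_cpm, example_rpkm)
```

Both values scale with sequencing depth, and RPKM additionally scales with probe length, which is why they help when comparing probes in plots but should not replace raw counts in count-based tests.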
5. Normalize counts
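Normalization makes samples with different sequencing depths comparable, which matters for plots and for tests run on continuous values. Count-based tests such as DESeq2 estimate their own size factors and expect raw counts as input.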
- Data → Transformations → Read counts → choose “Depth correction” (or “Counts per million”).
- For between-sample normalization suitable for DE, use:
- Data → Normalization → DESeq/DESeq2 or TMM (if available in your SeqMonk version) — pick DESeq if unsure.
- Apply the normalization; SeqMonk will store normalized values.
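DESeq-style normalization is based on the median-of-ratios method. The sketch below shows the idea in plain Python, assuming raw counts held in a dictionary keyed by hypothetical sample names. It illustrates the calculation only and is not SeqMonk's or DESeq2's actual implementation.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts,
            with probes in the same order for every sample.
    """
    samples = list(counts)
    n_probes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for i in range(n_probes):
        values = [counts[s][i] for s in samples]
        if min(values) <= 0:
            continue  # probes with a zero in any sample are skipped
        # log of the geometric mean across samples for this probe
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s in samples:
            log_ratios[s].append(math.log(counts[s][i]) - log_geo_mean)
    # raises statistics.StatisticsError if every probe was skipped
    return {s: math.exp(median(log_ratios[s])) for s in samples}


# Hypothetical data: sample "B" was sequenced about twice as deeply as "A".
raw = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
factors = size_factors(raw)
normalized = {s: [c / factors[s] for c in raw[s]] for s in raw}
print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale; in this toy example the factor for "B" comes out twice that of "A".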
6. Set up groups and replicates
- Data → Sample grouping → Create group from sample name pattern OR manually assign samples to groups (e.g., Control, Treated).
- Ensure replicates are correctly assigned and balanced where possible.
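Consistent sample names make pattern-based grouping reliable. As a minimal sketch of that idea outside SeqMonk, the snippet below groups names with a regular expression and reports any that do not fit. The `Group_repN` naming scheme and the sample names are assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def group_samples(sample_names, pattern=r"^(?P<group>[A-Za-z]+)_rep\d+$"):
    """Split sample names into groups; return (groups, unmatched)."""
    groups = defaultdict(list)
    unmatched = []
    for name in sample_names:
        match = re.match(pattern, name)
        if match:
            groups[match.group("group")].append(name)
        else:
            unmatched.append(name)  # inconsistent name, fix before grouping
    return dict(groups), unmatched


# Hypothetical sample names; the last one breaks the naming scheme.
names = ["Control_rep1", "Control_rep2", "Treated_rep1", "Treated_rep2", "treated-3"]
groups, unmatched = group_samples(names)
print(groups)
print(unmatched)
```

Checking the unmatched list before assigning groups is a cheap way to catch a sample that would otherwise be left out of a comparison.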
7. Run differential expression analysis
- Data → Statistical analysis → Differential expression.
- Select the quantitation to test: raw read counts for DESeq/DESeq2, normalized values for a t-test or Mann–Whitney test.
- Choose statistical test:
- Use DESeq (or DESeq2) for count data with replicates — default recommendation.
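- Use a t-test only when its parametric assumptions hold (roughly normally distributed values, e.g. after log transformation). Mann–Whitney is a non-parametric alternative, but it has very little power with few replicates.
- Set the significance threshold (p-value cutoff 0.05 by default) and apply Benjamini–Hochberg FDR correction for multiple testing.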
- Run analysis. SeqMonk produces a results table with log fold changes, p-values, and adjusted p-values.
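To make the adjusted p-values less of a black box, here is a small sketch of the Benjamini–Hochberg procedure in plain Python. The function name and the example p-values are illustrative; statistical packages apply the same adjustment internally, so this is for understanding rather than for replacing the built-in correction.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.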
8. Inspect and filter results
- Open the DE results table: sort by adjusted p-value or log fold change.
- Filter to genes meeting thresholds, e.g.:
- Adjusted p-value < 0.05 and |log2 fold change| ≥ 1 (at least a twofold change in either direction).
- Export filtered lists: Data → Export → export selections as CSV/TSV for downstream use.
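The same thresholds can be re-applied to an exported table outside SeqMonk. The sketch below assumes a tab-separated export with columns named `gene`, `log2FoldChange`, and `padj`; these header names are hypothetical, so adjust them to match your actual export.

```python
import csv
import io


def filter_hits(rows, padj_max=0.05, lfc_min=1.0):
    """Return gene names passing both the FDR and fold-change cutoffs."""
    hits = []
    for row in rows:
        try:
            padj = float(row["padj"])
            lfc = float(row["log2FoldChange"])
        except ValueError:
            continue  # skip non-numeric entries such as "NA"
        if padj < padj_max and abs(lfc) >= lfc_min:
            hits.append(row["gene"])
    return hits


# Hypothetical export; in practice, open the exported TSV file instead.
table = "\n".join([
    "gene\tlog2FoldChange\tpadj",
    "GeneA\t2.1\t0.001",
    "GeneB\t-1.5\t0.03",
    "GeneC\t0.4\t0.0001",
    "GeneD\t3.0\tNA",
    "GeneE\t-1.0\t0.2",
])
hits = filter_hits(csv.DictReader(io.StringIO(table), delimiter="\t"))
print(hits)
```

Using the absolute value of the fold change keeps both up- and down-regulated genes in the list.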
9. Visualize differential expression
SeqMonk offers several visualization modes:
- Scatter plots (MA plots):
- View → Plot → MA plot or Scatter plot.
- X-axis: mean expression; Y-axis: log fold change. Color significant points for clarity.
- Volcano plots:
- View → Plot → Volcano plot (if available) or produce a scatter of log2FC vs -log10(p-value).
- Highlight significant genes with a different color or label top hits.
- Heatmaps:
- Data → Heatmap → choose normalized quantitation and select genes of interest (e.g., top 50 DE genes).
- Configure clustering (hierarchical) and scaling (row z-score) to reveal patterns.
- Genome browser tracks:
- Double-click a gene to open the probe view and inspect per-sample coverage tracks.
- Useful to validate DE candidates visually for consistent coverage differences.
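If you rebuild these plots outside SeqMonk from exported values, the underlying coordinates are simple to compute. The sketch below shows the MA-plot axes, the volcano-plot axes, and row z-score scaling for heatmaps. The function names, the pseudocount of 1, and the example values are assumptions for illustration, and the actual drawing is left to whichever plotting library you prefer.

```python
import math
from statistics import mean, stdev


def ma_point(control_mean, treated_mean, pseudocount=1.0):
    """Return (A, M): mean log2 expression and log2 fold change."""
    log_control = math.log2(control_mean + pseudocount)
    log_treated = math.log2(treated_mean + pseudocount)
    return (log_control + log_treated) / 2, log_treated - log_control


def volcano_point(log2_fc, p_value, floor=1e-300):
    """Return (x, y) for a volcano plot; the floor avoids log10(0)."""
    return log2_fc, -math.log10(max(p_value, floor))


def row_zscores(values):
    """Scale one gene's values across samples to mean 0 and sample SD 1."""
    mu = mean(values)
    sd = stdev(values)  # needs at least two samples
    if sd == 0:
        return [0.0 for _ in values]  # flat row: no variation to scale
    return [(v - mu) / sd for v in values]


# Hypothetical values for a single gene.
a_value, m_value = ma_point(control_mean=3, treated_mean=15)
x, y = volcano_point(log2_fc=m_value, p_value=0.001)
z = row_zscores([2, 4, 6])
print(a_value, m_value, x, y, z)
```

Row z-scores remove differences in absolute expression between genes, so the heatmap colors reflect each gene's pattern across samples rather than its overall level.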
10. Annotate and export figures
- Add gene labels to plots for top hits.
- Export plots/images: File → Export view or use the PNG/PDF export options for high-resolution figures.
- Export the full results table or selected gene lists for pathway analysis.
11. Quick troubleshooting
- Low replicate number: DE tests may lack power; report effect sizes and avoid overinterpreting marginal p-values.
- Batch effects: If detected, include batch as a covariate in the design if supported or correct externally (e.g., limma/voom).
- Zero inflation / low counts: Filter out probes with very low counts across all samples before running DE.
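The low-count filter is easy to express in code if you prefer to pre-filter an exported count table. This is a minimal sketch with hypothetical probe names; the cutoff of 10 total reads matches the example workflow in this guide.

```python
def filter_low_counts(counts_by_probe, min_total=10):
    """Keep probes whose counts summed over all samples reach min_total."""
    return {
        probe: counts
        for probe, counts in counts_by_probe.items()
        if sum(counts) >= min_total
    }


# Hypothetical raw counts: one list per probe, one value per sample.
counts = {
    "GeneA": [120, 98, 240, 210],
    "GeneB": [0, 1, 2, 0],
    "GeneC": [3, 4, 2, 1],
}
kept = filter_low_counts(counts)
print(sorted(kept))
```

Removing such probes before testing reduces the multiple-testing burden without discarding genes that could realistically reach significance.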
12. Example recommended workflow (default assumptions)
- Import BAMs + GTF.
- Build gene probes.
- Quantify read counts.
- Filter out probes whose counts sum to less than 10 across all samples.
- Normalize with DESeq.
- Group samples (Control vs Treated).
- Run DESeq differential expression.
- Filter by adj. p-value < 0.05 and |log2FC| ≥ 1.
- Generate volcano plot and heatmap of top 50 genes.
- Export figures and gene list for enrichment analysis.