4/29/2025

RNA-Seq Experimental Design Guidelines

This document provides a comprehensive guide for designing RNA-Seq experiments, synthesizing recommendations from Melbourne Bioinformatics, ENCODE, Conesa et al. (2016), Sims et al. (2014), and Illumina guidelines. It also covers library preparation method selection and the impact of DNA contamination, with the aim of ensuring robust, reproducible, and statistically powerful RNA-Seq experiments.

1. Importance of Experimental Design

A well-designed RNA-Seq experiment minimizes bias, enhances data quality, and ensures reproducibility. Key considerations include biological and technical variability, statistical power, and downstream analysis requirements.

Source: Melbourne Bioinformatics Tutorial (https://www.melbournebioinformatics.org.au/tutorials/tutorials/rna_seq_exp_design/rna_seq_experimental_design/)

2. Key Principles of Experimental Design

2.1 Replication

Biological Replicates: Independent samples (e.g., different individuals or separately grown cultures) within each condition, used to capture biological variability.
Recommended: 3–5 biological replicates per condition to ensure sufficient statistical power.
Technical Replicates: Repeated sequencing of the same sample. Generally unnecessary in modern RNA-Seq due to low technical variability.
Source: Melbourne Bioinformatics Tutorial; Conesa et al., 2016 (Genome Biology, DOI: 10.1186/s13059-016-0881-8)

2.2 Randomization

Randomly assign samples to experimental conditions or sequencing lanes to reduce systematic bias.
Source: Melbourne Bioinformatics Tutorial
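
As an illustration of randomized lane assignment, the following minimal Python sketch shuffles a list of samples and deals them round-robin across sequencing lanes. The sample names, lane count, and the function name assign_lanes are hypothetical; the fixed seed is only there so that the allocation can be recorded and reproduced.

```python
import random


def assign_lanes(sample_ids, n_lanes, seed=0):
    """Shuffle samples, then deal them round-robin across lanes.

    Hypothetical helper: a fixed seed makes the allocation reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    # Sample i in the shuffled order goes to lane (i mod n_lanes) + 1.
    return {sample: (i % n_lanes) + 1 for i, sample in enumerate(shuffled)}


if __name__ == "__main__":
    samples = [f"treated_{i}" for i in range(1, 5)] + [f"control_{i}" for i in range(1, 5)]
    allocation = assign_lanes(samples, n_lanes=2, seed=42)
    for sample, lane in sorted(allocation.items()):
        print(sample, "-> lane", lane)
```

Pure shuffling keeps lane sizes equal but does not guarantee that every lane receives every condition. With small sample numbers it is worth checking the resulting allocation for balance before committing to it.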

2.3 Controls

Include appropriate controls (e.g., untreated or mock-treated samples) to enable comparison with experimental conditions.
Source: Melbourne Bioinformatics Tutorial

2.4 Blocking/Paired Design

Use paired or blocked designs to control for known confounding factors (e.g., batch effects or individual variability).
Example: Pre- and post-treatment samples from the same subject.
Source: Melbourne Bioinformatics Tutorial
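
To make the benefit of pairing concrete, here is a small Python sketch with made-up expression values for a single gene. It computes a log2 ratio within each subject, so that large differences in baseline expression between subjects cancel out. All numbers and names are illustrative assumptions, not data from any study.

```python
import math
from statistics import mean, stdev

# Made-up normalized expression values for one gene (illustrative only).
pre = {"subj1": 100.0, "subj2": 400.0, "subj3": 50.0, "subj4": 200.0}
post = {"subj1": 200.0, "subj2": 800.0, "subj3": 100.0, "subj4": 400.0}


def paired_log2_ratios(before, after):
    """Return per-subject log2(after / before) for subjects present in both."""
    shared = sorted(set(before) & set(after))
    return {s: math.log2(after[s] / before[s]) for s in shared}


ratios = paired_log2_ratios(pre, post)
# Between-subject spread of baseline values is large...
baseline_sd = stdev(math.log2(v) for v in pre.values())
# ...but it cancels out of the within-subject ratios.
ratio_sd = stdev(ratios.values())

print("mean log2 ratio:", mean(ratios.values()))
print("baseline SD (log2):", round(baseline_sd, 3))
print("paired-ratio SD:", round(ratio_sd, 3))
```

In a real analysis the same idea is usually expressed by including the subject as a blocking term in the statistical model rather than by computing ratios by hand.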

3. Sample Size and Sequencing Depth

3.1 Sample Size

Prioritize more biological replicates over increased sequencing depth, as biological variability is the primary source of variation.
Use statistical tools (e.g., Scotty, RNASeqPower) to estimate required sample size based on expected effect sizes and variability.
Source: Melbourne Bioinformatics Tutorial; Conesa et al., 2016
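
The trade-off between replicates and depth can be sketched with a widely used normal approximation for count data, in which the per-sample variance on the log scale is roughly 1/depth (counting noise) plus CV² (biological noise). The Python function below is a minimal sketch under that assumption. It is not the implementation of Scotty or RNASeqPower, and the function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(depth, cv, effect, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-group comparison.

    depth  : average read count for the gene of interest (per sample)
    cv     : biological coefficient of variation within a group
    effect : fold change to detect (e.g. 2.0 for a two-fold change)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    # Counting noise (1/depth) plus biological noise (cv^2), on the log scale.
    variance = 1.0 / depth + cv ** 2
    return 2 * (z_alpha + z_beta) ** 2 * variance / math.log(effect) ** 2


if __name__ == "__main__":
    for cv in (0.2, 0.4, 0.6):
        n = samples_per_group(depth=20, cv=cv, effect=2.0)
        print(f"CV={cv}: about {math.ceil(n)} replicates per group")
```

Under this approximation, once the 1/depth term is small compared with CV², additional depth barely changes the required number of replicates, which is consistent with the advice to prioritize biological replicates.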

3.2 Sequencing Depth

Differential Expression Analysis:

10–30 million reads per sample are generally sufficient for detecting moderately to highly expressed genes in human transcriptomes.
Source: Conesa et al., 2016; Melbourne Bioinformatics Tutorial

Lowly Expressed Genes or Transcriptome Assembly:

Higher sequencing depth is required to ensure sufficient coverage of low-abundance transcripts or to reconstruct complex transcriptomes.
Recommended: 50–100 million reads per sample, though some experiments may require up to 200 million reads depending on genome complexity and analysis goals.
Source: Sims et al., 2014 (Nature Reviews Genetics, DOI: 10.1038/nrg3642); Illumina RNA-Seq Guidelines (https://www.illumina.com/content/dam/illumina-marketing/documents/products/illumina_sequencing_introduction.pdf)

General Range: 5–200 million reads per sample, depending on organism complexity, transcriptome size, and project aims.

Source: Illumina RNA-Seq Guidelines

Use tools like Scotty or RNASeqPower to estimate optimal depth based on pilot data.

Source: Melbourne Bioinformatics Tutorial
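
The depth targets above translate into a simple budgeting calculation. The sketch below estimates how many sequencing lanes a run needs for a given per-sample read target. The lane yield and the fraction of usable reads are assumptions that must be replaced with figures for the actual platform and library type, and the function name is hypothetical.

```python
import math


def lanes_needed(n_samples, target_reads_m, lane_yield_m, usable_fraction=1.0):
    """Estimate how many lanes are needed for a run.

    target_reads_m  : usable reads wanted per sample, in millions
    lane_yield_m    : reads produced per lane, in millions (platform-specific)
    usable_fraction : share of reads expected to be informative
                      (e.g. after losses to residual rRNA or low quality)
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    raw_reads_per_sample = target_reads_m / usable_fraction
    total_raw_reads = n_samples * raw_reads_per_sample
    return math.ceil(total_raw_reads / lane_yield_m)


if __name__ == "__main__":
    # Hypothetical numbers: 12 samples, 30 M usable reads each,
    # 400 M reads per lane, 80% of reads expected to be usable.
    print(lanes_needed(12, 30, 400, usable_fraction=0.8))
```

The usable_fraction parameter is where library type enters the calculation: a library with a high residual rRNA content needs more raw reads to reach the same number of informative reads.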

4. Library Preparation Method Selection

Library preparation is a critical step in RNA-Seq experiments, directly impacting data quality and analysis outcomes. The choice of library preparation method depends on experimental goals, sample type, RNA quality, sequencing platform, and DNA contamination.

4.1 Common Library Preparation Methods

Poly-A Enrichment:

Applications: Detecting coding mRNA expression (e.g., differential expression analysis).
Advantages: Enriches mature mRNA, reduces ribosomal RNA (rRNA) interference, suitable for high-quality RNA samples.
Limitations: Misses non-polyadenylated RNAs (e.g., some long non-coding RNAs and some pre-mRNAs).
Recommendation: Ideal for standard RNA-Seq experiments, especially for human or mammalian samples.

rRNA Depletion:

Applications: Comprehensive transcriptome analysis, including non-coding RNAs, pre-mRNAs, or degraded RNA samples.
Advantages: Captures a broader range of RNA types, suitable for non-polyadenylated transcripts or microbial transcriptomes.
Limitations: rRNA depletion may be incomplete, higher cost.
Recommendation: Suitable for transcriptome assembly or complex transcriptome studies (e.g., plants, bacteria).

Total RNA Sequencing:

Applications: Sequencing all RNA without enrichment or depletion.
Advantages: Simple, suitable for low-input RNA or specialized samples.
Limitations: High rRNA content requires greater sequencing depth to cover target RNAs.
Recommendation: Used for specific experiments or when RNA enrichment is not needed.

Small RNA Sequencing:

Applications: Studying miRNAs, siRNAs, or other small RNAs.
Advantages: Focused on small RNA molecules, ideal for regulatory network studies.
Limitations: Requires specialized kits, not suitable for long RNAs.
Recommendation: Used for small RNA-specific studies.

Single-Cell RNA-Seq:

Applications: Analyzing single cells or low-input RNA.
Advantages: Reveals cellular heterogeneity, ideal for rare cell studies.
Limitations: Technically complex, high cost, requires specialized kits (e.g., 10x Genomics).
Recommendation: Used for single-cell transcriptomics.

4.2 Considerations for Choosing a Library Preparation Method

Experimental Goals:

Differential expression: Poly-A enrichment is typically sufficient.
Transcriptome assembly or non-coding RNA studies: rRNA depletion or total RNA sequencing.
Small RNA studies: Small RNA library preparation.

RNA Quality:

High-quality RNA (RIN ≥ 7): Poly-A enrichment or rRNA depletion.
Degraded RNA (e.g., FFPE samples): rRNA depletion or total RNA sequencing.

Starting RNA Amount:

Standard input (>100 ng): Most methods are applicable.
Low input (<10 ng): Single-cell or low-input kits.

Sequencing Platform:

Illumina: Compatible with most library preparation methods.
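Long-read sequencing (e.g., PacBio, Nanopore): Confirm the RNA requirements of the chosen kit. Many long-read cDNA and direct RNA protocols prime from the poly-A tail, so rRNA depletion or total RNA approaches may require additional steps (e.g., in vitro polyadenylation).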

DNA Contamination:

Poly-A enrichment is less affected by DNA contamination because capture targets poly-A tails, which removes most genomic DNA.
rRNA depletion or total RNA sequencing is more susceptible to DNA contamination, requiring stringent DNase treatment.
Recommendation: Perform DNase treatment after RNA extraction and check for residual DNA (e.g., by qPCR or Bioanalyzer).

Cost and Efficiency:

Poly-A enrichment is cost-effective for standard experiments.
rRNA depletion or single-cell RNA-Seq is more expensive, requiring budget consideration.
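
The considerations in this section can be summarized as a rough decision helper. The Python sketch below encodes only the rules of thumb listed above (experimental goal, RIN ≥ 7, input below 10 ng). The goal labels and the function name are hypothetical, and cost, platform, and DNA contamination are deliberately left out, so treat it as a starting point rather than a definitive selector.

```python
def suggest_library_prep(goal, rin=None, input_ng=None):
    """Return a suggested library preparation method as a string.

    goal     : "differential_expression", "assembly", "noncoding",
               "small_rna", or "single_cell" (hypothetical labels)
    rin      : RNA Integrity Number, if measured
    input_ng : starting RNA amount in nanograms, if known
    """
    if goal == "small_rna":
        return "small RNA library preparation"
    if goal == "single_cell":
        return "single-cell RNA-Seq kit"
    if input_ng is not None and input_ng < 10:
        return "single-cell or low-input kit"
    # Degraded RNA (e.g. FFPE) is poorly suited to poly-A capture.
    degraded = rin is not None and rin < 7
    if degraded:
        return "rRNA depletion or total RNA sequencing"
    if goal == "differential_expression":
        return "poly-A enrichment"
    if goal in ("assembly", "noncoding"):
        return "rRNA depletion or total RNA sequencing"
    raise ValueError(f"unknown goal: {goal!r}")


if __name__ == "__main__":
    print(suggest_library_prep("differential_expression", rin=8.5, input_ng=500))
    print(suggest_library_prep("differential_expression", rin=3.0, input_ng=500))
```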

4.3 Practical Recommendations

Use commercial kits (e.g., Illumina TruSeq, NEBNext) for consistency and reproducibility.
Perform pilot library preparation to assess library quality (e.g., fragment size distribution).
Document library preparation parameters (e.g., adapter sequences, PCR cycles) for downstream analysis.
Source: Illumina RNA-Seq Guidelines; ENCODE RNA-Seq Guidelines; Conesa et al., 2016; Kukurba & Montgomery, 2015 (Cold Spring Harbor Protocols, DOI: 10.1101/pdb.top084970)

5. Common Experimental Design Types

Simple Design: Single-factor comparison (e.g., treated vs. control).
Multi-factor Design: Multiple variables (e.g., treatment, time points, genotypes).
Time-series Design: Analyze gene expression changes over time.
Paired Design: Control for individual variability (e.g., pre- and post-treatment samples from the same subject).
Source: Melbourne Bioinformatics Tutorial

6. Managing Batch Effects

Definition: Variability introduced by sequencing batches, reagents, or operators, which can obscure biological signals.
Strategies:

Randomize samples across batches.
Sequence all samples in a single batch if feasible.
Include batch as a covariate in statistical models (e.g., DESeq2, limma).

Source: Melbourne Bioinformatics Tutorial; Conesa et al., 2016
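
When samples cannot all be processed in one batch, randomization works best if each condition is spread across every batch, so that batch and condition are not confounded. The following Python sketch shows one way to do this, together with a simple confounding check. The function names and sample labels are hypothetical, and the approach is a plain stratified shuffle rather than a method prescribed by the cited sources.

```python
import random
from collections import defaultdict


def assign_batches(sample_conditions, n_batches, seed=0):
    """Spread each condition evenly across batches.

    sample_conditions : dict mapping sample ID -> condition label
    Returns a dict mapping sample ID -> batch number (1-based).
    """
    rng = random.Random(seed)
    by_condition = defaultdict(list)
    for sample, condition in sorted(sample_conditions.items()):
        by_condition[condition].append(sample)

    batches = {}
    for condition in sorted(by_condition):
        members = by_condition[condition]
        rng.shuffle(members)  # random order within the condition
        offset = rng.randrange(n_batches)  # avoid always starting at batch 1
        for i, sample in enumerate(members):
            batches[sample] = (i + offset) % n_batches + 1
    return batches


def is_confounded(sample_conditions, batches):
    """True if any batch is missing at least one condition."""
    all_conditions = set(sample_conditions.values())
    seen = defaultdict(set)
    for sample, batch in batches.items():
        seen[batch].add(sample_conditions[sample])
    return any(conds != all_conditions for conds in seen.values())


if __name__ == "__main__":
    design = {f"treated_{i}": "treated" for i in range(1, 5)}
    design.update({f"control_{i}": "control" for i in range(1, 5)})
    plan = assign_batches(design, n_batches=2, seed=7)
    print(plan)
    print("confounded:", is_confounded(design, plan))
```

A design that passes this check still needs batch recorded in the metadata so that it can be included as a covariate during analysis.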

7. Additional Considerations

7.1 Sample Quality

Ensure high RNA integrity, with a recommended RNA Integrity Number (RIN) ≥ 7 to avoid biases in sequencing data. For more stringent applications, RIN ≥ 8 is preferred.
Source: ENCODE RNA-Seq Guidelines; Illumina RNA-Seq Guidelines

7.2 Metadata

Record detailed sample information (e.g., treatment conditions, collection time) to facilitate downstream analysis.
Source: Melbourne Bioinformatics Tutorial
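
One lightweight way to keep metadata consistent is a fixed per-sample record written to a CSV sample sheet. The Python sketch below illustrates the idea; the field names are illustrative assumptions rather than a required standard, and the RIN check simply reuses the RIN ≥ 7 guideline from Section 7.1.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SampleRecord:
    # Field names are illustrative, not a required standard.
    sample_id: str
    condition: str
    replicate: int
    collection_date: str  # ISO 8601, e.g. "2025-04-29"
    rin: float
    library_prep: str
    batch: str


def write_sample_sheet(records, handle):
    """Write records as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(SampleRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def low_rin_samples(records, threshold=7.0):
    """Return IDs of samples whose RIN falls below the threshold."""
    return [r.sample_id for r in records if r.rin < threshold]


if __name__ == "__main__":
    records = [
        SampleRecord("ctrl_1", "control", 1, "2025-04-29", 8.9, "poly-A", "B1"),
        SampleRecord("ctrl_2", "control", 2, "2025-04-29", 6.2, "poly-A", "B2"),
    ]
    buffer = io.StringIO()
    write_sample_sheet(records, buffer)
    print(buffer.getvalue())
    print("below RIN threshold:", low_rin_samples(records))
```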

7.3 Pilot Studies

Conduct small-scale pilot experiments to optimize design parameters (e.g., sample size, sequencing depth, library preparation method).
Source: Melbourne Bioinformatics Tutorial

7.4 Bioinformatics Analysis Plan

Plan data analysis in advance, selecting appropriate tools (e.g., DESeq2, edgeR) and statistical methods.
Use simulated or public datasets (e.g., GEO, ArrayExpress) to test design feasibility.
Source: Melbourne Bioinformatics Tutorial; Conesa et al., 2016

8. Practical Recommendations

Collaborate with bioinformaticians or statisticians to ensure the design meets analysis requirements.
Refer to public resources (e.g., GEO, ArrayExpress) for design inspiration from similar experiments.
Adopt ENCODE data standards for quality control and pipeline development.
Source: Melbourne Bioinformatics Tutorial; ENCODE RNA-Seq Guidelines

9. Summary

A robust RNA-Seq experiment requires careful consideration of replication, randomization, controls, sequencing depth, and library preparation. Prioritize 3–5 biological replicates per condition, use 10–30 million reads for differential expression, and 50–100 million reads for lowly expressed genes or transcriptome assembly. Select library preparation methods based on experimental goals (e.g., Poly-A enrichment for differential expression, rRNA depletion for transcriptome assembly). Poly-A enrichment is less affected by DNA contamination, but DNase treatment is recommended to ensure data quality. Control batch effects through randomization and statistical modeling, and ensure high sample quality (RIN ≥ 7) and detailed metadata. Tools like Scotty and RNASeqPower can guide sample size and depth estimation.

References

1. Melbourne Bioinformatics RNA-Seq Experimental Design Tutorial: https://www.melbournebioinformatics.org.au/tutorials/tutorials/rna_seq_exp_design/rna_seq_experimental_design/

2. Conesa, A., et al. (2016). A survey of best practices for RNA-seq data analysis. Genome Biology, 17, 13. DOI: 10.1186/s13059-016-0881-8

3. Sims, D., et al. (2014). Sequencing depth and coverage: key considerations in genomic analyses. Nature Reviews Genetics, 15, 121–132. DOI: 10.1038/nrg3642

4. Illumina RNA-Seq Guidelines: https://www.illumina.com/content/dam/illumina-marketing/documents/products/illumina_sequencing_introduction.pdf

5. ENCODE RNA-Seq Guidelines: https://www.encodeproject.org/rna-seq/

6. Kukurba, K. R., & Montgomery, S. B. (2015). RNA sequencing and analysis. Cold Spring Harbor Protocols, 2015(11), pdb.top084970. DOI: 10.1101/pdb.top084970

7. Levin, J. Z., et al. (2010). Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nature Methods, 7(9), 709–715. DOI: 10.1038/nmeth.f.303