
Learning Iso-Seq

2016-02-27 00:01

SMRT Portal installation guide:
http://www.pacb.com/wp-content/uploads/2015/09/SMRT-Analysis-Software-Installation-v2.3.0.pdf
Iso-Seq data location:

A01 and B01 under /share/backups/pacbio/20160222_68.

The <1 kb fraction yielded 1.28 G of data and the >1 kb fraction yielded 2.8 G.

SMRT Portal address:

http://59.79.232.10:8080/smrtportal/#/Design-Job/

Software installation root directory:
/share/workplace/software/PACBIO

Reference dropbox directory:

/share/workplace/software/PACBIO/userdata/references_dropbox

username: pbuser
password: pacbio-one2three

Goal: collect the results for these two cells (number of reads, number of full-length reads, number of isoforms). The SMRT Portal reports contain all of these.
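The three counts can also be cross-checked outside the portal by counting FASTA records. This is a minimal sketch, assuming the reads, the full-length reads, and the consensus isoforms have each been exported as a FASTA file. The file names below are hypothetical placeholders, not names taken from the SMRT Portal job directory.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_cell(fasta_by_label):
    """Map each label to its record count; missing files are reported as None."""
    summary = {}
    for label, path in fasta_by_label.items():
        summary[label] = count_fasta_records(path) if Path(path).is_file() else None
    return summary


if __name__ == "__main__":
    # Hypothetical file names: substitute the files exported from your own job.
    files = {
        "reads": "reads_of_insert.fasta",
        "full_length_reads": "isoseq_flnc.fasta",
        "isoforms": "polished_isoforms.fasta",
    }
    for label, count in summarize_cell(files).items():
        print(label, count)
```

Running it once per cell gives a small table that can be compared with the numbers shown in the portal report.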

Aligning Iso-Seq data to a reference genome

For the written tutorial, see:

https://github.com/PacificBiosciences/cDNA_primer/wiki

Video tutorial:
http://www.pacb.com/training/IsoformSequencingIsoSeqOverview/story.html

THE CHALLENGE OF ISOFORM RECONSTRUCTION

Simply put, second-generation (short-read) sequencing cannot reliably tell apart the different isoforms of the same gene.

In eukaryotic organisms, the majority of genes are alternatively spliced to produce multiple transcript isoforms, dramatically increasing the protein-coding potential of a genome.

Alternatively spliced isoforms from the same gene can have significantly different, even antagonistic, effects. To study gene expression, researchers have looked at fragments of an organism’s genes utilizing next-generation sequencing methods, commonly referred to as RNA sequencing (RNA-seq). However, short-read RNA-seq cannot span full-length transcripts, making it difficult to accurately characterize the diverse landscape of isoforms.

Produce full-length transcripts without assembly

Simply put, third-generation sequencing can read straight through a whole transcript isoform in a single read. That is Iso-Seq.

The isoform sequencing (Iso-Seq) application generates full-length cDNA sequences — from the 5’ end of transcripts to the poly-A tail — eliminating the need for transcriptome reconstruction using isoform-inference algorithms. The Iso-Seq method generates accurate information about alternatively spliced exons and transcriptional start sites. It also delivers information about poly-adenylation sites for transcripts up to 10 kb in length across the full complement of isoforms within targeted genes or the entire transcriptome.

The goal of Iso-Seq is to understand transcriptome complexity using accurate, unassembled, full-length long reads.

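Directory structure of the data produced by the lab's sequencing run:

Files under Analysis_Results:

The correct data structure is as follows:

Note the metadata.xml file and the bax.h5 files in the subdirectory.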
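Before importing, the layout can be checked with a short script. This is a sketch under two assumptions that should be verified against your own run folder: the `*.metadata.xml` file sits directly in each SMRT Cell directory, and the `*.bax.h5` files sit in its `Analysis_Results` subdirectory. The function name `check_smrt_cell_dir` is made up for this example.

```python
from pathlib import Path


def check_smrt_cell_dir(cell_dir):
    """Return a list of problems found in one SMRT Cell directory (empty if none)."""
    cell = Path(cell_dir)
    if not cell.is_dir():
        return ["not a directory: %s" % cell]
    problems = []
    # Assumption: the metadata file sits directly in the cell directory.
    if not list(cell.glob("*.metadata.xml")):
        problems.append("no *.metadata.xml in %s" % cell)
    # Assumption: the bax.h5 files sit in the Analysis_Results subdirectory.
    results = cell / "Analysis_Results"
    if not results.is_dir():
        problems.append("missing subdirectory %s" % results)
    elif not list(results.glob("*.bax.h5")):
        problems.append("no *.bax.h5 files in %s" % results)
    return problems


if __name__ == "__main__":
    for name in ("A01_1", "B01_1"):
        cell_dir = Path("/share/backups/pacbio/20160222_68") / name
        issues = check_smrt_cell_dir(cell_dir)
        print(cell_dir, "OK" if not issues else "; ".join(issues))
```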

There are three ways to process the data: the RS_IsoSeq protocol in SMRT Portal, the GitHub code, and the RS_IsoSeq command line. The main differences are as follows:

The differences between the GitHub code and the RS_IsoSeq code are:

GitHub code requires you to set up a virtual environment and install all libraries on your own

GitHub code is more step-by-step and allows more flexibility

GitHub code is updated faster

GitHub code is all source code - you can modify the code as needed

The difference between the SMRT Portal version and the command-line version (pbtranscript.py) is that the command-line version additionally allows you to:

Use more CPUs than default

Directly start from the isoform-level clustering (ICE) part of RS_IsoSeq. Since v2.3.0, we have added additional entry points to the ICE/Quiver pipeline.

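If you analyze the data with SMRT Portal, the steps are as follows:

1. Getting FL reads

First import your raw data, then select the RS_IsoSeq protocol (SMRT Portal must be v2.3.0 or later).

For the detailed procedure, see my earlier blog post (http://www.cnblogs.com/freemao/p/3783475.html).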

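Iso-Seq library preparation workflow:

A few basic concepts:

Reads of insert and FL (full-length) reads:

Library preparation can produce artificial chimeras, of two kinds:

The first kind is caused by a low adapter concentration:

The second kind arises during PCR amplification:

So the final data:

Next step:

Why the steps above are needed: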

The overall Iso-Seq bioinformatics workflow looks roughly like this:

It has two main parts: (1) classify and (2) cluster.

classify identifies FL reads.

cluster performs isoform-level clustering and outputs Quiver-polished, high-quality consensus full-length transcript sequences.
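The idea behind classify can be illustrated with a toy sketch. It assumes the usual working definition that a read counts as full-length when the 5' primer, a poly-A tail, and the 3' primer are all seen, and that a primer found inside the read body marks an artificial chimera. The primer sequences, the exact-match search, the thresholds, and the single-strand handling are all made up for illustration; this is not how pbtranscript.py implements the step.

```python
# Hypothetical primer sequences, for illustration only.
FIVE_PRIME = "AAGCAGTGGT"
THREE_PRIME = "GTACTCTGCG"
MIN_POLYA = 8  # assumed minimum poly-A length
EDGE = 5       # assumed slack allowed at either end of the read


def classify_read(seq):
    """Return 'full-length', 'non-full-length', or 'chimeric' for one read.

    Toy version: exact matching, forward strand only.
    """
    seq = seq.upper()
    # 5' primer must start within EDGE bases of the read start.
    pos5 = seq.find(FIVE_PRIME)
    has_5p = 0 <= pos5 <= EDGE
    # 3' primer must end within EDGE bases of the read end.
    pos3 = seq.rfind(THREE_PRIME)
    has_3p = pos3 != -1 and len(seq) - (pos3 + len(THREE_PRIME)) <= EDGE
    # The body is whatever lies between the terminal primers.
    body_start = pos5 + len(FIVE_PRIME) if has_5p else 0
    body_end = pos3 if has_3p else len(seq)
    body = seq[body_start:body_end]
    # A primer inside the body suggests two molecules joined together.
    if FIVE_PRIME in body or THREE_PRIME in body:
        return "chimeric"
    has_polya = has_3p and body.endswith("A" * MIN_POLYA)
    if has_5p and has_3p and has_polya:
        return "full-length"
    return "non-full-length"


if __name__ == "__main__":
    insert = "CGTTAGCCATGGCTA" * 4  # made-up transcript body
    full = FIVE_PRIME + insert + "A" * 20 + THREE_PRIME
    examples = [
        ("complete", full),
        ("no 5' primer", full[len(FIVE_PRIME):]),
        ("two copies joined", full + full),
    ]
    for label, read in examples:
        print(label, "->", classify_read(read))
```

Exact string matching keeps the sketch short. Real reads contain sequencing errors and arrive in either orientation, so a real classifier needs tolerant matching on both strands.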

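The whole process needs no reference genome. If a reference genome is available, the polished transcripts can be mapped to it, which makes it possible to:

① Remove redundancy (Iso-Seq cluster output can be redundant), as in the figure below:

An example of redundancy removal:

② Discover novel genes or isoforms.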
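Redundancy removal after mapping can be sketched as grouping transcripts that share the same chromosome, strand, and intron chain, then keeping the longest member of each group. This is a simplified stand-in under stated assumptions, not the actual procedure used by the PacBio tools. The `Transcript` record and the coordinates in the usage example are hypothetical.

```python
from collections import namedtuple

# Hypothetical record: exons is a list of (start, end) pairs sorted by start.
Transcript = namedtuple("Transcript", "name chrom strand exons")


def intron_chain(tx):
    """Introns are the gaps between consecutive exons."""
    return tuple((left[1], right[0]) for left, right in zip(tx.exons, tx.exons[1:]))


def transcript_length(tx):
    """Total exonic length of a transcript."""
    return sum(end - start for start, end in tx.exons)


def collapse(transcripts):
    """Keep the longest transcript for each (chrom, strand, intron chain)."""
    best = {}
    for tx in transcripts:
        chain = intron_chain(tx)
        if not chain:
            # Assumption: single-exon transcripts have no junctions to
            # compare, so this toy version never merges them.
            chain = ("single-exon", tx.name)
        key = (tx.chrom, tx.strand, chain)
        if key not in best or transcript_length(tx) > transcript_length(best[key]):
            best[key] = tx
    return list(best.values())


if __name__ == "__main__":
    mapped = [
        Transcript("t1", "chr1", "+", [(100, 200), (300, 400)]),
        Transcript("t2", "chr1", "+", [(150, 200), (300, 400)]),  # shorter 5' end, same junction
        Transcript("t3", "chr1", "+", [(100, 200), (350, 400)]),  # different junction
    ]
    for tx in collapse(mapped):
        print(tx.name, intron_chain(tx))
```

Transcripts that survive the collapse but match no annotated intron chain are the natural candidates for novel isoforms.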

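A comparison of classify and cluster:

classify and cluster can be run either in SMRT Portal or entirely from the command line (pbtranscript.py, ToFU). Usage help is at https://github.com/PacificBiosciences/cDNA_primer/wiki

The final isoforms can be viewed in the UCSC browser; they should look considerably better than what second-generation data produce.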

Applications of Iso-Seq:

1. Transcript identification and annotation

2. Identification of alternatively spliced isoforms

3. Targeted sequencing

4. Normalization, which reduces the representation of highly expressed genes

Possible downstream analyses (depending on your own project):

See the 2015 webinar document for details.

Learning resources:

•Iso-Seq Website (general information):
–http://www.pacb.com/isoseq
•Iso-Seq Analysis Information:
–https://github.com/PacificBiosciences/cDNA_primer/wiki
•Protocols:
–http://www.pacb.com/support/pubmap/documentation.html

•Available Datasets:
–MCF-7 Cancer Cell Line
−http://blog.pacificbiosciences.com/2013/12/data-release-human-mcf-7-transcriptome.html
–Human Normal Tissues (Brain, Heart, Liver)
−http://blog.pacificbiosciences.com/2014/10/data-release-whole-human-transcriptome.html

Library and Sequencing Evaluation steps:

The results table is as follows:

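Job procedure:
http://59.79.232.10:8080/smrtportal/#/Design-Job/
Import and Manage
Import SMRT Cells
Add...
/share/backups/pacbio/20160222_68/A01_1
Scan... OK
/share/backups/pacbio/20160222_68/B01_1
Scan... OK
Design Job
Create New
Tick every box in the Analysis dialog
Next
Enter a Job Name
Under Protocols, select RS_IsoSeq.1
Add the two samples YM1-30pM and YM2-30pM. If you are not sure which entries are your data, check the Uri column, which shows where the data are located.
Save
Start
The job now starts running.
To check the job status, run qstat -a on melon, or use Monitor directly on the web page.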
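The status check can be wrapped in a small helper for repeated use. This is a sketch that assumes it runs on the machine where `qstat` is installed. The function name `job_status_text` is made up, and the output is returned unparsed because its layout depends on the scheduler.

```python
import shutil
import subprocess


def job_status_text(command=("qstat", "-a")):
    """Run the scheduler status command and return its text, or None if unavailable."""
    if shutil.which(command[0]) is None:
        return None  # the command is not installed on this machine
    result = subprocess.run(list(command), capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    text = job_status_text()
    print(text if text is not None else "qstat not found; use Monitor on the web page instead")
```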

freemao
FAFU
miaochenyong@163.com
