Learning Iso-Seq
2016-02-27 00:01
SMRT Portal installation guide:
http://www.pacb.com/wp-content/uploads/2015/09/SMRT-Analysis-Software-Installation-v2.3.0.pdf
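Iso-Seq data location:
A01 and B01 under /share/backups/pacbio/20160222_68.
The <1 kb size fraction yielded 1.28 G of data, and the >1 kb fraction yielded 2.8 G.
SMRT Portal address:
http://59.79.232.10:8080/smrtportal/#/Design-Job/
Software installation root directory: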
/share/workplace/software/PACBIO
reference_droplist:
/share/workplace/software/PACBIO/userdata/references_dropbox
username: pbuser
password: pacbio-one2three
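Goal: collect the results for these two SMRT cells (number of reads, number of full-length reads, number of isoforms). The SMRT Portal report contains all of these.
Then align the Iso-Seq data to the reference genome.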
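The portal report already lists these numbers. As an independent cross-check, the following is a minimal sketch that counts the records in a FASTA file and splits them at a length cutoff. It assumes the reads have been exported as FASTA; the function names and the 1000 bp default are illustrative choices, not part of SMRT Analysis.

```python
def fasta_lengths(path):
    """Yield the sequence length of every record in a FASTA file."""
    length = None  # None means no header line has been seen yet
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line)
    if length is not None:
        yield length


def summarize(path, cutoff=1000):
    """Count all records, and those below / at or above the cutoff (in bp)."""
    lengths = list(fasta_lengths(path))
    short = sum(1 for n in lengths if n < cutoff)
    return {
        "reads": len(lengths),
        "below_cutoff": short,
        "at_or_above_cutoff": len(lengths) - short,
    }
```

Calling `summarize` on the FASTA of all reads and again on the FASTA of full-length reads gives two of the three numbers wanted here.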
Text tutorial:
https://github.com/PacificBiosciences/cDNA_primer/wiki
Video tutorial:
http://www.pacb.com/training/IsoformSequencingIsoSeqOverview/story.html
THE CHALLENGE OF ISOFORM RECONSTRUCTION
Simply put, second-generation (short-read) sequencing cannot reliably tell apart the different isoforms of the same gene.
In eukaryotic organisms, the majority of genes are alternatively spliced to produce multiple transcript isoforms, dramatically increasing the protein-coding potential of a genome.
Alternatively spliced isoforms from the same gene can have significantly different, even antagonistic, effects. To study gene expression, researchers have looked at fragments of an organism’s genes utilizing next-generation sequencing methods, commonly referred to as RNA sequencing (RNA-seq). However, short-read RNA-seq cannot span full-length transcripts, making it difficult to accurately characterize the diverse landscape of isoforms.
Produce full-length transcripts without assembly
Simply put, third-generation sequencing can read straight through an entire isoform in a single read. That is Iso-Seq.
The isoform sequencing (Iso-Seq) application generates full-length cDNA sequences (from the 5’ end of transcripts to the poly-A tail), eliminating the need for transcriptome reconstruction using isoform-inference algorithms. The Iso-Seq method generates accurate information about alternatively spliced exons and transcriptional start sites. It also delivers information about poly-adenylation sites for transcripts up to 10 kb in length across the full complement of isoforms within targeted genes or the entire transcriptome.
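The goal of Iso-Seq is to understand transcriptome complexity using accurate, unassembled, full-length long reads.
Directory structure of the data produced by the lab's sequencing run:
Files under Analysis_Results:
The correct data layout is as follows:
Note the metadata.xml file and the bax.h5 files in the subdirectory.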
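Before importing a cell, the layout can be checked with a few lines of Python. This is a minimal sketch that assumes the metadata.xml file sits directly in the cell directory and the bax.h5 files sit in its Analysis_Results subdirectory; the function name `check_smrt_cell` is made up for illustration.

```python
from pathlib import Path


def check_smrt_cell(cell_dir):
    """Return a list of layout problems for one SMRT cell directory (empty if none)."""
    cell = Path(cell_dir)
    results = cell / "Analysis_Results"
    problems = []
    # Assumption: the metadata.xml file lives directly in the cell directory.
    if not list(cell.glob("*.metadata.xml")):
        problems.append("no *.metadata.xml in {}".format(cell))
    # Assumption: the bax.h5 files live in the Analysis_Results subdirectory.
    if not list(results.glob("*.bax.h5")):
        problems.append("no *.bax.h5 in {}".format(results))
    return problems
```

For example, `check_smrt_cell("/share/backups/pacbio/20160222_68/A01_1")` should return an empty list if that cell is laid out as described above.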
There are three ways to process the data: the RS_IsoSeq protocol in SMRT Portal, the GitHub code, and the RS_IsoSeq command line. The main differences are:
The differences between the GitHub code and the RS_IsoSeq code are:
GitHub code requires you to set up a virtual environment and install all libraries on your own
GitHub code is more step-by-step and allows more flexibility
GitHub code is updated faster
GitHub code is all source code - you can modify the code as needed
The difference between the SMRT Portal version and the command-line version (pbtranscript.py) is that the command-line version additionally allows you to:
Use more CPUs than default
Directly start from the isoform-level clustering (ICE) part of RS_IsoSeq. Since v2.3.0, we have added additional entry points to the ICE/Quiver pipeline.
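To analyze the data with SMRT Portal, the steps are:
1. Getting FL reads
First import your raw data, then select the RS_IsoSeq protocol (SMRT Portal must be v2.3.0 or later).
For the detailed procedure, see my earlier blog post (http://www.cnblogs.com/freemao/p/3783475.html).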
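Iso-Seq library preparation workflow:
A few basic concepts:
Reads of insert and FL reads:
Library preparation can produce artificial chimeras, of two kinds:
The first is caused by a low adapter concentration:
The second arises during PCR amplification:
So the final data:
Next step:
Why the steps above are needed:
The overall Iso-Seq bioinformatics workflow looks roughly like this:
It has two main parts: (1) classify and (2) cluster.
classify identifies the FL reads.
cluster performs isoform-level clustering and outputs Quiver-polished, high-quality consensus full-length transcript sequences.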
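To make the idea behind classify concrete: in Iso-Seq a read is generally treated as full-length when both cDNA primers and the poly-A tail are seen. The following is a toy sketch of that test, not the pbtranscript.py implementation. The primer sequences passed in are hypothetical, `primer3` is given as it appears at the 3' end of a sense-strand read, and exact string matching is a simplification because real reads contain sequencing errors.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def is_full_length(read, primer5, primer3, min_polya=8):
    """True if the read, in either orientation, looks like
    primer5 + transcript + poly-A tail + primer3."""
    for seq in (read, revcomp(read)):
        if not (seq.startswith(primer5) and seq.endswith(primer3)):
            continue
        # Strip both primers, then require a poly-A run at the end of the insert.
        body = seq[len(primer5):len(seq) - len(primer3)]
        if body.endswith("A" * min_polya):
            return True
    return False
```

A read missing either primer or the poly-A tail is reported as not full-length by this sketch.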
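The whole process does not require a reference genome. If one is available, the polished transcripts can be mapped to it, which makes it possible to:
① remove redundancy (Iso-Seq cluster output can be redundant), as in the figure below:
An example of redundancy removal:
② discover new genes or isoforms.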
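One simple way to think about redundancy removal after mapping is to group transcripts that share the same chain of splice junctions. The following is a minimal sketch of that idea, not the algorithm used by the PacBio tools. The input format (a dict of transcript id to chromosome, strand and exon intervals) and the function names are assumptions for illustration.

```python
from collections import defaultdict


def junction_chain(exons):
    """Return the introns of a transcript as (exon_end, next_exon_start) pairs."""
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def collapse(transcripts):
    """Group transcript ids that share chromosome, strand and junction chain.

    transcripts maps id -> (chrom, strand, [(exon_start, exon_end), ...]).
    Single-exon transcripts have no junctions, so they are grouped only
    when their exon coordinates are identical.
    """
    groups = defaultdict(list)
    for tid, (chrom, strand, exons) in transcripts.items():
        chain = junction_chain(exons)
        if not chain:
            chain = ("single_exon", tuple(sorted(exons)))
        groups[(chrom, strand, chain)].append(tid)
    return sorted(sorted(ids) for ids in groups.values())
```

Two transcripts that differ only in where their first exon starts or their last exon ends fall into one group, while an exon-skipping variant has a different junction chain and stays separate.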
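A comparison of classify and cluster:
classify and cluster can be run either in SMRT Portal or entirely from the command line (pbtranscript.py, ToFU). Usage help is at https://github.com/PacificBiosciences/cDNA_primer/wiki
The final isoforms can be inspected in the UCSC browser; the full-length isoform models should be much clearer than those reconstructed from second-generation data.
Applications of Iso-Seq:
1. Transcript identification and annotation
2. Identification of alternatively spliced isoforms
3. Targeted sequencing
4. Normalization, which reduces the representation of highly expressed genes
Possible downstream analyses (depending on your own project):
See the 2015 webinar document for details.
Learning resources: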
• Iso-Seq Website (general information):
  – http://www.pacb.com/isoseq
• Iso-Seq Analysis Information:
  – https://github.com/PacificBiosciences/cDNA_primer/wiki
• Protocols:
  – http://www.pacb.com/support/pubmap/documentation.html
• Available Datasets:
  – MCF-7 Cancer Cell Line
    − http://blog.pacificbiosciences.com/2013/12/data-release-human-mcf-7-transcriptome.html
  – Human Normal Tissues (Brain, Heart, Liver)
    − http://blog.pacificbiosciences.com/2014/10/data-release-whole-human-transcriptome.html
Library and Sequencing Evaluation steps:
The results table is as follows:
Job procedure: http://59.79.232.10:8080/smrtportal/#/Design-Job/
Import and Manage
import SMRT cells
add...
/share/backups/pacbio/20160222_68/A01_1
scan...OK
/share/backups/pacbio/20160222_68/B01_1
scan...OK
Design Job
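Create New
Check all boxes in the Analysis dialog
Next
Fill in the Job Name
Under Protocols, select RS_IsoSeq.1
Import the two samples YM1-30pM and YM2-30pM. If you are not sure which entries are your data, check the Uri column, which shows where the data is located.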
save
start
The job then starts running.
You can check the job status by running qstat -a on melon, or watch it directly on the web page under Monitor.
freemao
FAFU
miaochenyong@163.com