Genomic Data Processing 11: The SAM File Format
2016-03-13 16:44
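SAM stands for Sequence Alignment/Map format. BAM is the binary form of SAM (the B comes from "binary"). Each alignment line in a SAM file consists of 11 mandatory tab-separated fields, optionally followed by aligner-specific tags: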
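To make the "binary form" point concrete, here is a minimal sketch that tells a BAM file from a plain-text SAM file. It relies on two facts from the SAM/BAM specification: a BAM file is stored in BGZF, which standard gzip readers can decompress, and the decompressed stream starts with the four magic bytes `BAM\1`. The function name `looks_like_bam` is made up for this illustration.

```python
import gzip


def looks_like_bam(path):
    """Return True if the file decompresses to data starting with the BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            # The decompressed BAM stream begins with the bytes "BAM" + 0x01.
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip/BGZF data (for example a plain-text SAM file) or truncated.
        return False
```

This only inspects the first few bytes; it does not validate the rest of the file.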
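1. QNAME: the read (query) name
2. FLAG: the bitwise SAM flag (paired, unmapped, reverse strand, and so on)
3. RNAME: the reference sequence name, e.g. the chromosome
4. POS: the 1-based leftmost mapped position on the reference (for a read aligned to the reverse strand this is the read's 3′ end, not its 5′ end)
5. MAPQ: mapping quality, Phred-scaled as −10·log10 of the probability that the mapping position is wrong; the larger the value, the more unique the placement, and 255 means the value is unavailable
6. CIGAR string: records aligned bases (M, which covers both matches and mismatches), insertions (I), deletions (D), soft clipping (S), and skipped reference regions such as splice junctions (N)
7. RNEXT: the reference name of the mate; `=` means the same as RNAME and `*` means unavailable
8. PNEXT: the position of the mate; 0 when unavailable
9. TLEN: the template length; 0 when unavailable
10. SEQ: the read sequence
11. QUAL: the read base qualities (ASCII-encoded, Phred+33)
12. Optional fields written by the aligner, in TAG:TYPE:VALUE form (e.g. NM:i:1)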
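The field list above can be sketched as a small parser. This is an illustration rather than a replacement for a real library: the names `SamRecord`, `parse_sam_line`, `decode_flag`, and `FLAG_BITS` are invented here, the code assumes tab-separated fields as the SAM specification requires, and the alignment line used in the usage note is made up for brevity.

```python
from typing import Dict, List, NamedTuple

# Bit meanings of the FLAG field, as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: Dict[str, object]


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line into the 11 mandatory fields plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    tags: Dict[str, object] = {}
    for item in fields[11:]:
        # Optional fields look like TAG:TYPE:VALUE; the value may contain ':'.
        tag, typ, value = item.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


def decode_flag(flag: int) -> List[str]:
    """List the names of the bits that are set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]
```

For a made-up line such as `"read1\t16\tchr1\t100\t60\t4S6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:1"`, `parse_sam_line` yields `pos == 100` and `tags["NM"] == 1`, and `decode_flag(16)` returns `["reverse_strand"]`. A FLAG of 0 sets no bits, which describes an unpaired read mapped to the forward strand.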
Example:
hadoop@Master:~/cloud/adam/xubo/data/test20160310$ samtools view SRR003161h20.sam
SRR003161.1 0 chr1 143217889 0 4S35M85S * 0 0 TCAGATGCAATCATCGAATGGTCTCGAATGGAATCNTCTANAGAGATGGAATGTATCNCTCGCCANACGACACNCGAACAGGGNAAGGCAAGCAGNAGGNAGNNNANNNNNNNNNNNNNNNNNN AAAAAAAAAAAAAAAA:::BAAFAABAAB?>>=44!39=<!:866699888220862!08:8002!0200000!022200800!20660000600!000!06!!!6!!!!!!!!!!!!!!!!!! NM:i:1 MD:Z:31A3 AS:i:3XS:i:33 XA:Z:chr10,+42092546,4S35M85S,1;chr1,+143217421,4S35M85S,1;chr1,+143239587,4S35M85S,1;chr1,-143252938,85S35M4S,1;chr1,+143220601,4S35M85S,1;chr1,+143219665,4S35M85S,1;chr1,-143210830,85S35M4S,1;chr10,+42075371,4S35M85S,1;chr10,+42101425,4S35M85S,1;chr1,+143272381,4S35M85S,1;chr1,-143204112,85S35M4S,1;chr1,+143189975,4S35M85S,1;chr10,+42080829,4S35M85S,1;chr10,+42067652,4S35M85S,1;chr1,+143236600,4S35M85S,1;chr10,+42071261,4S35M85S,1;chr1,+143202568,4S35M85S,1;chr1,+143262016,4S35M85S,1;chr10,+42094445,4S35M85S,1;chr1,+143229991,4S35M85S,1;chr1,+143194906,4S35M85S,1;chr10,+42098197,4S35M85S,1;chr1,+143229325,4S35M85S,1;chr1,+143273144,4S35M85S,1;chr1,+143236132,4S35M85S,1;chr3,-196898795,85S35M4S,1;chr1,-125173710,85S35M4S,1;chr10,+42074903,4S35M85S,1;chr1,+143193143,4S35M85S,1;chr1,+143190443,4S35M85S,1;chr10,+42085796,4S35M85S,1;chr1,+143224622,4S35M85S,1;chr1,+143267943,4S35M85S,1;chr10,+42103854,4S35M85S,1;chr1,+143225093,4S35M85S,1;chr1,-143249828,85S35M4S,1;chr1,+143231300,4S35M85S,1;chr1,-143256486,85S35M4S,1;chr1,-143209440,85S35M4S,1;chr1,+143228021,4S35M85S,1;chr1,+143185063,4S35M85S,1;chr10,-41852367,85S35M4S,1;chr1,-143251629,85S35M4S,1;chr1,+143233540,4S35M85S,1;chr10,+42093977,4S35M85S,1;chr1,+143200517,4S35M85S,1;chr1,+143194441,4S35M85S,1;chr10,+42070793,4S35M85S,1;chr1,+143206914,4S35M85S,1;chr1,+143237811,4S35M85S,1;chr1,+143227553,4S35M85S,1;chr1,-143255189,85S35M4S,1;chr1,+143231768,4S35M85S,1;chr1,+143271341,4S35M85S,1;chr10,+42080361,4S35M85S,1;chr1,+143213870,4S35M85S,1;chr10,+42074435,4S35M85S,1;chr1,+143263324,4S35M85S,1;chr10,+42097745,4S35M85S,1;chr10,+42090276,4S35M85S,1;chr1,-125180284,85S35M4S,1;chr1,+143240055,4S35M
85S,1;chr1,+143265756,4S35M85S,1;chr1,+143216113,4S35M85S,1;chr1,-125169985,85S35M4S,1;chr1,+143219197,4S35M85S,1;chr1,+143192675,4S35M85S,1;chr10,+42095848,4S35M85S,1;chr1,+143195374,4S35M85S,1;chr1,+143214338,4S35M85S,1;chr1,+143270772,4S35M85S,1;chr1,-125166285,85S35M4S,1;chr1,+143275099,4S35M85S,1;chr1,+143226451,4S35M85S,1;chr10,+42104319,4S35M85S,1;chr1,+143232233,4S35M85S,1;chr1,+143211626,4S35M85S,1;chr1,+143220133,4S35M85S,1;chr1,+143215645,4S35M85S,1;chr10,+42100036,4S35M85S,1;chr10,-41846998,85S35M4S,1;chr1,-125168084,85S35M4S,1;chr1,-125179816,85S35M4S,1;chr1,+143240523,4S35M85S,1;chr1,+143264771,4S35M85S,1;chr1,+143212094,4S34M86S,1;chr10,-41845898,86S34M4S,1;chr1,+143191375,4S31M89S,0;chr1,-125182919,89S31M4S,0;chr1,+143221908,4S31M89S,0;chr1,+143190911,4S31M89S,0;chr10,-41843753,89S31M4S,0;chr10_KI270824v1_alt,+1080,4S35M85S,1;chr10_KI270824v1_alt,+615,4S35M85S,1;
SRR003161.2 0 chr7 41381016 60 4S153M1D132M1D5M1D28M1D73M3I12M1I40M54S * 0 0 TCAGTTTGAGATGGAGTTTCATTCTTGTTGCCCAGGCTGGAGTGCAATGGCGCAATCTCAGCTCACAGCAACCTCCGCCTCCCGGGTTCAAGCGATTCTCCTGCCTCAGCCTCTCGAGTAGCTGGGATTACAGGCATGCACCATCACGCCCAGCTAATTTGCATTTTTTATTAGAGATGGGGTTTCTCCACATTGGTCAGGCTGATCTCGAACTCCTGACCTCAGGTGATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCTGAGCCCAACCTATTTACTTTCAATCCATCTTTTCAATAACTTAAATACAAGTGTCAATATATACAATCTTTTCCTCCCTGGTTATCAAGCTTTCTAATATATATGGATGTATCTTCCAAGGTTTTTGATCCCATTTTACTTTACAGGCTCACTGCTGTGGAACCCAGAGAGCAGTCTCTTTTCAAGGNGGGCTGAGACNCGCAACAGGGGATTAGGCCAAGGCNCAGG CCCCCCCCCCCCCCCC@@@CCCFEEEFEEG888EEEFFEEEEFGGGGGGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCA<777@@CCCBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAAACCCCCCCCCCCCCCCCCCCCCCC:93339@A>77//39AC666666C22CAAAA93333///7-0017>9999>>A???ACCCCCCC2239322>9977<?????CCCCCCCCC877777777111111::::5555:555:::::::::;:555:;;::::0040-----***--467::::;;;;;;:::511155555:555:::;::::::7777744-------///245::;;;::::::;;;;;;;;:55554774----------44-----064---------6---522451115247644255-----,4---24464422---------!,,,4464224!11:::7:::111111--7777---!---- NM:i:1MD:Z:153^T40T91^T5^T28^G73G23C0G26 AS:i:379 XS:i:88
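The CIGAR strings in the two records above can be unpacked with a short sketch like the following. It assumes the operation set from the SAM specification (M, I, D, N, S, H, P, =, X); the helper names `parse_cigar` and `cigar_lengths` are made up for this illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")

# Operations that advance along the read and along the reference, respectively.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")


def parse_cigar(cigar):
    """Turn a CIGAR string into a list of (length, operation) pairs."""
    if cigar == "*":
        return []  # CIGAR unavailable
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def cigar_lengths(cigar):
    """Return (bases consumed on the read, bases spanned on the reference)."""
    ops = parse_cigar(cigar)
    query_len = sum(n for n, op in ops if op in CONSUMES_QUERY)
    ref_len = sum(n for n, op in ops if op in CONSUMES_REFERENCE)
    return query_len, ref_len


print(cigar_lengths("4S35M85S"))  # (124, 35)
print(cigar_lengths("4S153M1D132M1D5M1D28M1D73M3I12M1I40M54S"))  # (505, 447)
```

For the first read, only 35 of 124 bases are aligned; the rest are soft-clipped. The second alignment spans 447 reference bases, so with POS 41381016 it ends at 41381016 + 447 − 1 = 41381462 on chr7.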
Reference:
http://www.bbioo.com/lifesciences/40-113338-1.html