去除测序reads中的接头:adaptor
2015-05-29 13:11
246 查看
之前用c写过一个程序,查找reads中是否包含了adaptor,如果检测到的话就过滤掉含有adaptor的reads,这次在过滤完数据之后发现接头序列比较多,为了提升组装效果,又不能很大地影响数据量,需要对接头进行截断处理,并过滤过短的reads,用python写了一个简短的程序,指定超过3个错配以内的匹配都认为匹配到,并且长度小于50bp的reads过滤,在以下程序基础上添加传入参数,可以适用比较多的情况(单端、双端、含有single等):
import sys import re from Bio import SeqIO def rmPE(read1,read2,adaptor1,adaptor2,min_length): res_1 = rmSE(read1,adaptor1,min_length) res_2 = rmSE(read2,adaptor2,min_length) if res_1 and res_2: return res_1,res_2 else: return False def rmSE(read,adaptor,min_length): seq = read.seq seed_len = 6 a_len = len(adaptor) seq_len = len(seq) for i in range(a_len - seed_len): seed = adaptor[i:i+seed_len] pos = 0 while(pos < seq_len): find_pos = seq.find(seed,pos) if find_pos > 0: mistaken_count = 0 _b = find_pos _e = find_pos + seed_len while(_b >= 0 and i >= find_pos - _b): if adaptor[i - find_pos + _b] != seq[_b]: mistaken_count += 1 if mistaken_count > 3: break _b -= 1 else : while(_e < seq_len and i - find_pos + _e < a_len): if adaptor[ i - find_pos + _e ] != seq[_e]: mistaken_count += 1 if mistaken_count > 3: break _e += 1 else: if find_pos - i > min_length: return read[:find_pos-i] else : return False pos = find_pos + 1 else: break return read def rmAdaptor(argv): argv.pop(0) type = argv.pop(0) if type=='PE': read1_file,read2_file,adaptor1,adaptor2,out_prefix,min_length = argv read2_records = SeqIO.parse(open(read2_file),'fastq') read1_out = open( '%s.1.fq'%out_prefix,'w' ) read2_out = open( '%s.2.fq'%out_prefix,'w' ) for read1 in SeqIO.parse(open(read1_file),'fastq'): read2 = read2_records.next() rmPE_res = rmPE(read1,read2,adaptor1,adaptor2,min_length) if rmPE_res: read1_out.write(rmPE_res[0].format('fastq')) read2_out.write(rmPE_res[1].format('fastq')) elif type=='SE': reads_file,adaptor,out_prefix,min_length = argv reads_out = open( '%s.single.fq'%out_prefix,'w' ) for reads in SeqIO.parse(open(reads_file),'fastq'): rmSE_res = False if re.search('[\s\/](\d)',reads.id).group(1) == '1': rmSE_res = rmSE(reads,adaptor1,min_length) elif re.search('[\s\/](\d)',reads.id).group(1) == '2': rmSE_res = rmSE(reads,adaptor2,min_length) if rmSE_res: reads_out.write(rmSE_res.format('fastq')) if __name__ == '__main__': rmAdaptor(sys.argv)
相关文章推荐
- MySQL实现SQLServer ROW_NUMBER() OVER ORDER BY
- Web服务器之虚拟目录
- 股票基金看哪些书
- 想让应用拥有material风格?
- Winform——计算器
- FZU2082 树链剖分(单点更新区间求值)
- BeagleBone Black下Debian文件更新问题
- UML类图
- 如何在ppt中剔除图片的背景精美ppt模板下载
- 如何在Ubuntu QML应用中进行语言录音
- Yahoo的mysql性能监控snmp服务设定
- Android-framework
- android布局分析工具
- java值传递思考
- 03.windows系统重新分配ip的cmd命令
- 利用kaptcha生成验证码的详细教程
- Android必知必会--使用shape制作drawable素材
- Django笔记(1)
- 加加减减有副作用, C/C++er 请小心
- Android必知必会--使用shape制作drawable素材