
GATK errors and how to resolve them (continuously updated)

2014-06-11 08:29
1, MESSAGE: Input files reads and reference have incompatible contigs: Relative ordering of overlapping contigs differs, which is unsafe.
##### ERROR reads contigs = [Chr1, Chr10, Chr11, Chr12, Chr2, Chr3, Chr4, Chr5, Chr6, Chr7, Chr8, Chr9, ChrSy, ChrUn]
##### ERROR reference contigs = [Chr1, Chr2, Chr3, Chr4, Chr5, Chr6, Chr7, Chr8, Chr9, Chr10, Chr11, Chr12, ChrUn, ChrSy]

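RESOLVE: This error occurs because the contigs in your BAM file and in the reference sequence are not listed in the same order. Reorder the contigs in the BAM file so that they match the reference; the ReorderSam tool from Picard tools can do this: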

java -jar /share/Public/cmiao/picard-tools-1.112/ReorderSam.jar I=L1-2_ATCACG_L003_R_tophat_accepted_hits.sorted.rmp.rg.bam O=order.bam REFERENCE=Osativa_204.fa
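
To see why GATK rejects this pair of files, it can help to compare the two contig lists directly. The following is a minimal sketch in plain Python, not GATK's own check: the helper names `shared_order` and `same_relative_order` are made up for illustration, and the two lists are copied from the error message above.

```python
def shared_order(contigs, other):
    """Return the contigs that also appear in `other`, keeping their order."""
    other_set = set(other)
    return [name for name in contigs if name in other_set]


def same_relative_order(reads_contigs, reference_contigs):
    """True when the contigs common to both lists appear in the same order."""
    return (shared_order(reads_contigs, reference_contigs)
            == shared_order(reference_contigs, reads_contigs))


# Contig lists copied from the error message above.
reads = ["Chr1", "Chr10", "Chr11", "Chr12", "Chr2", "Chr3", "Chr4", "Chr5",
         "Chr6", "Chr7", "Chr8", "Chr9", "ChrSy", "ChrUn"]
reference = ["Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "Chr6", "Chr7", "Chr8",
             "Chr9", "Chr10", "Chr11", "Chr12", "ChrUn", "ChrSy"]

print(same_relative_order(reads, reference))  # False
```

Here the BAM header is sorted alphabetically (Chr1, Chr10, Chr11, ...) while the reference is in numeric order, so the comparison fails until the BAM file is reordered.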

2, MESSAGE: Unsupported CIGAR operator N in read HWI-D00258:28:D2EU3ACXX:3:1106:20678:47827 at Chr1:3160. Perhaps you are trying to use RNA-Seq data? While we are currently actively working to support this data type unfortunately the GATK cannot be used with this data in its current form. You have the option of either filtering out all reads with operator N in their CIGAR string (please add --filter_reads_with_N_cigar to your command line) or assume the risk of processing those reads as they are including the pertinent unsafe flag (please add -U ALLOW_N_CIGAR_READS to your command line). Notice however that if you were to choose the latter, an unspecified subset of the analytical outputs of an unspecified subset of the tools will become unpredictable. Consequently the GATK team might well not be able to provide you with the usual support with any issue regarding any output.

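RESOLVE: If you are using RNA-seq data aligned against the whole genome, splicing means that different parts of a read may align to different regions of the reference. The CIGAR string of such a read then contains an N (see the SAM format specification for its meaning). To go on calling SNPs, these reads have to be filtered out, which only requires adding --filter_reads_with_N_cigar: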

java -jar /share/Public/cmiao/GATK_tools/GenomeAnalysisTK.jar -nct 30 -T HaplotypeCaller -R Osativa_204.fa -I order.bam --filter_reads_with_N_cigar -o gatk.order.vcf
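
As a rough illustration of what this filter looks for, the sketch below parses a CIGAR string and reports whether it contains an N operator. It is a plain-Python approximation, not GATK code; `parse_cigar` and `has_n_operator` are hypothetical helper names, and the example CIGAR strings are invented.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operator) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def has_n_operator(cigar):
    """True when the alignment skips a reference region (operator N)."""
    return any(op == "N" for _, op in parse_cigar(cigar))


print(has_n_operator("101M"))         # False: contiguous alignment
print(has_n_operator("60M1200N41M"))  # True: spliced alignment
```

Reads for which this returns True are the ones the filter would discard.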

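This solution is not perfect, however: it simply throws away every read that contains an N, which loses a lot of useful data and hurts SNP calling. At present the best approach is to run the splitNtrim step during BAM processing, after rmp and before adding read groups. The command is as follows: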

java -jar /share/Public/cmiao/GATK_tools/GenomeAnalysisTK.jar -T SplitNCigarReads -I in.bam -U ALLOW_N_CIGAR_READS -o out.bam -R Osativa_204.fa

Note that SplitNCigarReads does not support GATK's multithreading option -nct, and the input BAM file must be indexed.
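
The idea behind splitting at N can be sketched as follows. This is a simplified illustration only and is not how SplitNCigarReads is implemented (the real tool also handles clipping and the read sequence itself); `split_at_n` is a hypothetical helper, the start position is taken from the error message above, and the CIGAR string is invented.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operators that advance along the reference


def split_at_n(pos, cigar):
    """Split an alignment at each N operator.

    `pos` is the 1-based leftmost reference position of the read.
    Returns one (reference_start, cigar_segment) pair per aligned block.
    """
    segments = []
    seg_start, seg_ops = pos, []
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # Close the current block and jump over the skipped region.
            if seg_ops:
                segments.append((seg_start, "".join(seg_ops)))
            ref_pos += length
            seg_start, seg_ops = ref_pos, []
            continue
        seg_ops.append(f"{length}{op}")
        if op in REF_CONSUMING:
            ref_pos += length
    if seg_ops:
        segments.append((seg_start, "".join(seg_ops)))
    return segments


print(split_at_n(3160, "60M1200N41M"))  # [(3160, '60M'), (4420, '41M')]
```

Each resulting block has no N in its CIGAR, so no aligned bases are thrown away, unlike with the filtering approach.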

3, ERROR MESSAGE: SAM/BAM file SAMFileReader{<file.bam>} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 61; please see the GATK --help documentation for options related to this error

This error came up today while I was helping serenity call SNPs.

The fix was to add -allowPotentiallyMisencodedQuals.

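The quality encoding of the raw data should be correct. The error may have been introduced when the data were processed with GATK SplitNCigarReads.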
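
To check what a quality string actually encodes, the scores can be decoded by hand. The sketch below assumes the standard Phred+33 encoding used in SAM/BAM; the cutoff `max_reasonable=60` is an assumption inferred from the wording of the message above (a score of 61 was rejected), not a value taken from the GATK documentation, and both function names are hypothetical.

```python
def phred33_scores(quality_string):
    """Decode a quality string as Phred+33 (ASCII code minus 33)."""
    return [ord(ch) - 33 for ch in quality_string]


def looks_misencoded(quality_string, max_reasonable=60):
    """True when any decoded score exceeds the assumed cutoff."""
    return max(phred33_scores(quality_string), default=0) > max_reasonable


print(phred33_scores("II^"))     # [40, 40, 61]
print(looks_misencoded("IIII"))  # False
print(looks_misencoded("II^"))   # True
```

A character such as `^` decodes to 61 under Phred+33 but to 30 under the older Phred+64 encoding, which is why a very high score usually hints at an encoding mismatch. If the raw data really are Phred+33, as suspected here, occasional high values would have to come from a later processing step.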

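4, error message: The reference sequence is not in standard form: the number of bases per line may differ from line to line (other than the last line of a sequence). GATK reports an error in this case too; it really is very picky about its input. The solution is to normalize your FASTA file with NormalizeFasta.jar from Picard tools.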

Command: java -jar /share/Public/cmiao/picard-tools-1.112/NormalizeFasta.jar I=OLD.fa O=NEW.fa
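
Before normalizing, it may be useful to find out which sequences are affected. The following is a minimal sketch, not part of Picard or GATK: `irregular_records` is a hypothetical helper that flags any FASTA record whose sequence lines are not all the same length (the last line of a record may be shorter), and the example records are invented.

```python
def irregular_records(lines):
    """Return names of FASTA records with inconsistent line lengths."""
    records = {}   # record name -> lengths of its sequence lines
    current = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            current = line[1:].strip()
            records[current] = []
        elif line and current is not None:
            records[current].append(len(line))

    bad = []
    for name, lengths in records.items():
        body, last = lengths[:-1], lengths[-1:]
        # All lines but the last must match; the last must not be longer.
        if len(set(body)) > 1 or (body and last[0] > body[0]):
            bad.append(name)
    return bad


example = [">Chr1", "ACGT", "ACGT", "AC",
           ">Chr2", "ACGT", "ACG", "ACGT"]
print(irregular_records(example))  # ['Chr2']
```

For a real file, the open file handle can be passed in directly, for example `with open("OLD.fa") as handle: print(irregular_records(handle))`.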