Extracting info from VCF files
2015-06-08 21:28
R, Bioconductor
filterVcf: Extract Variants of Interest from a Large VCF File (Paul Shannon)
We demonstrate three methods: filtering by genomic region, filtering on attributes of
each specific variant call, and intersecting with known regions of interest (exons, splice
sites, regulatory regions, etc.).
http://www.bioconductor.org/packages/release/bioc/vignettes/VariantAnnotation/inst/doc/filterVcf.pdf
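The first two ideas in that description, filtering by genomic region and filtering on per-variant attributes, can be illustrated without R. The following is only a rough shell sketch of the concept, not the filterVcf API: the file names, the demo records, and the helper `filter_region` are all made up for illustration, and it assumes an uncompressed, tab-delimited VCF.

```shell
# Tiny illustrative VCF (hypothetical data, sites only, no sample columns).
printf '##fileformat=VCFv4.2\n' > region_demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> region_demo.vcf
printf '1\t100\t.\tA\tG\t12\tPASS\t.\n' >> region_demo.vcf
printf '1\t250\t.\tC\tT\t60\tPASS\t.\n' >> region_demo.vcf
printf '1\t900\t.\tG\tA\t80\tPASS\t.\n' >> region_demo.vcf
printf '2\t250\t.\tT\tC\t70\tq10\t.\n' >> region_demo.vcf

# filter_region FILE CHROM START END MINQUAL  (hypothetical helper)
# Keeps all header lines, plus records that lie on CHROM within
# [START, END], have QUAL >= MINQUAL, and have FILTER equal to PASS.
filter_region() {
  awk -F'\t' -v c="$2" -v s="$3" -v e="$4" -v q="$5" '
    /^#/ { print; next }
    $1 == c && $2 >= s && $2 <= e && $6 >= q && $7 == "PASS"
  ' "$1"
}

filter_region region_demo.vcf 1 200 1000 50 > region_demo.filtered.vcf
```

For large, compressed, indexed files a dedicated tool (filterVcf, bcftools, GATK) is the better choice; this sketch only shows what the two filters do.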
Java
SelectVariants -- Select a subset of variants from a larger callset ( GATK SelectVariants )
Often, a VCF containing many samples and/or variants will need to be subset in order to facilitate certain analyses (e.g. comparing and contrasting cases vs. controls; extracting variant or non-variant loci that meet certain requirements, displaying just a few samples in a browser like IGV, etc.). SelectVariants can be used for this purpose.
https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php
Biostars
Question: How To Split Multiple Samples In Vcf File Generated By Gatk?
I did variant calling using BWA + Picard + GATK and have just got the filtered VCF files from GATK. While running GATK, I used a list of inputs (11 samples), and most steps produced only one output file. Now I have two VCF files (one for SNPs and the other for indels), each of which contains 11 samples. I can see the names of the 11 samples in the header of the VCF files, and each sample seems to have one column of data. So I am wondering how to split each VCF file into individual per-sample VCF files?
https://www.biostars.org/p/78929/
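As the question observes, each sample occupies one column. In the VCF layout the eight fixed columns and FORMAT come first, so sample names start at field 10 of the `#CHROM` header line. A minimal sketch with standard tools only, assuming an uncompressed VCF; the file `demo.vcf`, the samples S1/S2, and the helpers `list_samples` and `extract_sample` are invented for illustration:

```shell
# Build a tiny two-sample VCF for illustration (hypothetical data).
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> demo.vcf
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n' >> demo.vcf
printf '1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t1/1\n' >> demo.vcf

# List sample names: fields 10 onward of the #CHROM header line.
list_samples() {
  grep -m1 '^#CHROM' "$1" | cut -f10- | tr '\t' '\n'
}

# Write one sample to stdout as its own VCF: keep the ## meta lines,
# the 9 leading columns, and the column whose header equals the sample name.
extract_sample() {
  awk -F'\t' -v OFS='\t' -v s="$2" '
    /^##/     { print; next }
    /^#CHROM/ { for (i = 10; i <= NF; i++) if ($i == s) col = i }
    col       { print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col }
  ' "$1"
}

list_samples demo.vcf
extract_sample demo.vcf S2 > demo.S2.vcf
```

Unlike the dedicated tools listed below, this sketch keeps sites where the chosen sample is homozygous reference and does not update INFO fields such as AC/AN, so treat it as an explanation of the file layout rather than a replacement for bcftools or vcf-subset.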
bcftools
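# Split every VCF in the current directory into one bgzipped VCF per sample (bash).
# -s selects the sample; -c1 keeps only sites with at least one non-reference allele in it.
for file in *.vcf*; do
  for sample in $(bcftools view -h "$file" | grep "^#CHROM" | cut -f10-); do
    bcftools view -c1 -Oz -s "$sample" -o "${file/.vcf*/.$sample.vcf.gz}" "$file"
  done
done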
https://www.biostars.org/p/12535/#115691
vcf-subset
vcf-subset -c S1 bigfile.vcf > S1.vcf
https://www.biostars.org/p/78929/
http://campagnelab.org/software/goby/reference-documentation/modes/vcf-subset/
REF:
http://samtools.github.io/hts-specs/VCFv4.2.pdf