FASTQ files: each read record has four lines:

    @HWI-EAS255_4_FC2010Y_1_43_110_790
    TTAATCTACAGAATAGATAGCTAGCATATATTT
    +
    IIIIIIIIIIIIIIIIAIIIIIIIII&;II&,I

This format uses four lines per sequence.
- line 1: begins with '@' followed by the sequence name
- line 2: the raw sequence letters
- line 3: begins with '+', optionally followed by a description
- line 4: contains the quality scores in ASCII code

Software URL: http://hannonlab.cshl.edu/fastx_toolkit/

Workflow: FASTQ files -> (alignment to the genome) -> BAM files -> (variant calling) -> VCF files -> (variant annotation)
https://www.broadinstitute.org/gatk/

Mapping reads to the genome

Definition: align the large number of reads produced by sequencing to the reference genome sequence, allowing a certain number of mismatches.

Alignment methods: BLAST, BLAT?

The reads are short and numerous, and the genome is very long, so building an index is the key!
The index can be built on the read set, on the genome, or on both.

Two main approaches to indexing:
- Algorithms based on hash tables: MAQ, SOAP, ELAND, SeqMap, RMAP, ZOOM, SHRiMP…
- Algorithms based on suffix trees: Bowtie, BWA, SOAP2…

Introduction to BWA

Build the genome index:

    bwa index ref.fa

BWA-MEM (read length >= 70 bp; suitable for Illumina, 454, Ion Torrent and Sanger reads, among other platforms):

    bwa mem ref.fa reads.fq > aln-se.sam
    bwa mem ref.fa read1.fq read2.fq > aln-pe.sam

BWA-backtrack (read length < 70 bp):

    bwa aln ref.fa short_read.fq > aln_sa.sai
    bwa samse ref.fa aln_sa.sai short_read.fq > aln-se.sam
    bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln-pe.sam

BWA-SW (read length > 100 bp, with frequent gaps):

    bwa bwasw ref.fa long_read.fq > aln.sam

Read mapping workflow for genome resequencing

    $ bwa index hg19_genome.fa
    $ bwa mem hg19_genome.fa reads1.fq reads2.fq | samtools view -bS - > bwa.bam
    $ samtools sort bwa.bam > bwa.sort.bam
    $ samtools rmdup bwa.sort.bam bwa.sort.rmd.bam

Software URLs:
BWA: http://bio-bwa.sourceforge.net/bwa.shtml
Samtools: http://samtools.sourceforge.net/samtools.shtml

Correcting read mapping results in genome resequencing

    $ java -Xmx8g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19_genome.fa -known dbSNP.vcf -o BAM.intervals
    $ java -Xmx8g -jar GenomeAnalysisTK.jar -T IndelRealigner -R hg19_genome.fa -I lane-level.bam -known dbSNP.vcf -targetIntervals BAM.intervals -o realignedBam.bam --consensusDeterminationModel KNOWNS_ONLY

SAM record:

    1:497:R:-272+13M17D24M  113  1  497  37  37M  15  100338662  0  CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG  0;==-==9;>>>>>=>>...
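The four-line FASTQ layout described at the start of these notes can be illustrated with a short Python sketch. It reads records four lines at a time and converts the ASCII quality string into numeric scores. The names `FastqRecord`, `read_fastq` and `phred_scores` are hypothetical helpers introduced here for illustration, and the quality offset of 33 (Sanger / Phred+33 encoding) is an assumption that the notes do not state; older Illumina pipelines used an offset of 64.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ record (hypothetical container, not part of any library)."""
    name: str         # line 1 without the leading '@'
    sequence: str     # line 2: raw sequence letters
    description: str  # line 3 without the leading '+', may be empty
    quality: str      # line 4: ASCII-encoded quality scores


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming exactly four lines each."""
    while True:
        header = handle.readline()
        if not header:  # end of input
            return
        header = header.rstrip("\n")
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"line 1 must begin with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"line 3 must begin with '+': {plus!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality characters; offset=33 is an assumption (Phred+33)."""
    return [ord(char) - offset for char in quality]


# The example record from the notes, used as in-memory input.
example = (
    "@HWI-EAS255_4_FC2010Y_1_43_110_790\n"
    "TTAATCTACAGAATAGATAGCTAGCATATATTT\n"
    "+\n"
    "IIIIIIIIIIIIIIIIAIIIIIIIII&;II&,I\n"
)

record = next(read_fastq(io.StringIO(example)))
scores = phred_scores(record.quality)
print(record.name, len(record.sequence), min(scores), max(scores))
```

Under the Phred+33 assumption, the character 'I' decodes to a score of 40 and '&' to 5, so the low-quality positions near the end of this read are easy to spot. Real FASTQ files may wrap sequences across several lines or be gzip-compressed; this sketch handles neither case.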