2.1 Primary Analysis - Sequence processing
After sequencing, reads are mapped to the human reference genome (GRCh37). Several filtering steps then reduce the number of reads before variant calling. Here, we describe how this reduction occurs in each pipeline.
BIER's pipeline
In this pipeline, the number of reads decreases at four stages: mapping, filtering by mapping quality, duplicate removal, and interval realignment. This table shows the reads remaining after each of these stages.
N_reads_forward and N_reads_reverse: initial number of forward and reverse reads obtained in the exome sequencing process
N_mapped_read_pairs: number of read pairs mapped to the human genome reference
%_mapped_read_pairs: percentage of initial read pairs mapped to the human genome reference
N_mapped_reads_mapq>10: number of mapped reads whose mapping quality (mapq) is higher than 10
%_mapped_reads_mapq>10: percentage of initial mapped reads whose mapping quality (mapq) is higher than 10
N_reads_single_hit: number of reads uniquely mapped to the human genome reference
%_reads_single_hit: percentage of initial reads uniquely mapped to the human genome reference
N_reads_single_hit_realigned: number of reads located in the exome capture kit targets that have been realigned
%_reads_single_hit_realigned: percentage of initial reads located in the exome capture kit targets that have been realigned
CNAG's pipeline
In contrast with BIER's pipeline, the number of reads decreases at only two stages: mapping and duplicate removal.
N_reads_forward and N_reads_reverse: initial number of forward and reverse reads obtained in the exome sequencing process.
N_mapped_read_pairs: number of read pairs mapped to the human genome reference.
%_mapped_read_pairs: percentage of initial read pairs mapped to the human genome reference.
N_read_pairs_single_hit: number of read pairs uniquely mapped to the human genome reference.
%_read_pairs_single_hit: percentage of initial read pairs uniquely mapped to the human genome reference.
| Sample | N_reads_forward | N_reads_reverse | N_mapped_read_pairs | %_mapped_read_pairs | N_read_pairs_single_hit | %_read_pairs_single_hit |
| SGT038 | 31471997 | 31471997 | 27098380 | 86.10 | 26533415 | 84.31 |
| SGT077 | 27308034 | 27308034 | 23450991 | 85.88 | 22904031 | 83.87 |
| SGT161 | 27566691 | 27566691 | 23668265 | 85.86 | 23170780 | 84.05 |
| SGT187 | 29730857 | 29730857 | 25554609 | 85.95 | 24894597 | 83.73 |
| SGT230 | 30415770 | 30415770 | 26257368 | 86.33 | 25639411 | 84.30 |
| SGT238 | 29472514 | 29472514 | 25386147 | 86.13 | 24773292 | 84.06 |
| SGT241 | 29513223 | 29513223 | 25365246 | 85.95 | 24539542 | 83.15 |
| SGT274 | 30394832 | 30394832 | 26242677 | 86.34 | 25693406 | 84.53 |
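The percentage columns in the CNAG table can be recomputed from the counts. The following is a minimal sketch, assuming that the initial number of read pairs equals N_reads_forward (which matches N_reads_reverse in every row) and that percentages are rounded to two decimals. The names `CNAG_COUNTS`, `percent_of_initial` and `summarize` are illustrative and are not part of either pipeline.

```python
# Counts copied from the CNAG table above:
# sample -> (N_reads_forward, N_mapped_read_pairs, N_read_pairs_single_hit)
CNAG_COUNTS = {
    "SGT038": (31471997, 27098380, 26533415),
    "SGT077": (27308034, 23450991, 22904031),
    "SGT161": (27566691, 23668265, 23170780),
    "SGT187": (29730857, 25554609, 24894597),
    "SGT230": (30415770, 26257368, 25639411),
    "SGT238": (29472514, 25386147, 24773292),
    "SGT241": (29513223, 25365246, 24539542),
    "SGT274": (30394832, 26242677, 25693406),
}


def percent_of_initial(count, initial_pairs):
    """Express count as a percentage of the initial read pairs (2 decimals)."""
    return round(100.0 * count / initial_pairs, 2)


def summarize(counts):
    """Return sample -> (%_mapped_read_pairs, %_read_pairs_single_hit).

    Assumption: the number of initial read pairs equals N_reads_forward.
    """
    summary = {}
    for sample, (n_pairs, n_mapped, n_single_hit) in counts.items():
        summary[sample] = (
            percent_of_initial(n_mapped, n_pairs),
            percent_of_initial(n_single_hit, n_pairs),
        )
    return summary


if __name__ == "__main__":
    for sample, (pct_mapped, pct_single) in summarize(CNAG_COUNTS).items():
        print(f"{sample}\t{pct_mapped:.2f}\t{pct_single:.2f}")
```

Under these assumptions, the difference between the two percentages for a sample is the share of initial read pairs lost at the duplicate removal stage.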