4. Input files

4.1. Alignment file

BamSnap requires sorted and indexed bam or cram files. For each alignment file, the index file (.bam.bai, .bai, .cram.crai, or .crai) should be located in the same directory.
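
As a pre-flight check, the index requirement above can be verified before running BamSnap. The following is a minimal sketch, not BamSnap's own lookup logic: the function name find_alignment_index is hypothetical, and it assumes the four index names listed above are formed either by appending the index extension to the full file name or by replacing the .bam/.cram extension.

```shell
# Hypothetical helper (not part of BamSnap): print the index file found
# next to an alignment file. Returns 1 if no index exists, 2 if the file
# is neither .bam nor .cram.
find_alignment_index() {
  aln_path=$1
  case "$aln_path" in
    *.bam)  idx_ext=bai;  aln_stem=${aln_path%.bam} ;;
    *.cram) idx_ext=crai; aln_stem=${aln_path%.cram} ;;
    *) return 2 ;;
  esac
  # Try <file>.bam.bai / <file>.cram.crai first, then <file>.bai / <file>.crai.
  for idx_candidate in "$aln_path.$idx_ext" "$aln_stem.$idx_ext"; do
    if [ -f "$idx_candidate" ]; then
      printf '%s\n' "$idx_candidate"
      return 0
    fi
  done
  return 1
}

# Usage:
#   find_alignment_index ./data/NA12878.bam || echo "no index found" >&2
```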

4.1.1. Input files (-bam for BAM or CRAM format)

Input files are specified with the -bam argument, which accepts a single file or a list of files. CRAM files are passed with the same -bam argument.

$ bamsnap -bam ./data/NA12878.bam
$ bamsnap -bam ./data/NA12878.bam ./data/NA12877.bam ./data/NA12879.bam
$ bamsnap -bam ./data/NA12878.cram
$ bamsnap -bam ./data/NA12878.cram ./data/NA12877.bam ./data/NA12879.bam

Note

BamSnap supports both the indexed BAM and the indexed CRAM format for the alignment files.

4.1.1.1. Title of alignment file(s) (-title)

A label can be assigned to each of the bam files using the -title argument. Labels are matched to files by position, and each label is used as the title of the corresponding plot.

$ bamsnap -bam ./data/NA12879.bam -title NA12879
$ bamsnap -bam ./data/NA12879.bam -title "NA12879 (Daughter)"
$ bamsnap -bam ./data/NA12878.bam ./data/NA12877.bam ./data/NA12879.bam \
  -title "NA12878 (Mother)" "NA12877 (Father)" "NA12879 (Daughter)"
_images/pic_title1.png

If no label is specified, the file name will be used as title by default.

_images/pic_title2.png

To remove the title entirely, use the -no_title option.

$ bamsnap -bam ./data/NA12879.bam -no_title
_images/pic_title3.png

Note

By default, the title font size is 18. It is possible to change the font size with -title_fontsize (e.g. -title_fontsize 10).

4.1.2. BAM list file (-bamlist)

$ bamsnap -bamlist ./data/NATRIO_bamlist.txt

A single file listing all input alignment files can be provided instead. The expected format is a tab-separated text file. The first column is mandatory and must contain the file paths; the second column is optional and assigns a label to each file. Both .bam and .cram files can be listed.

# example of bamlist file with labels
./data/NA12878.bam    NA12878 (F)
./data/NA12877.cram   NA12877 (M)
./data/NA12879.bam    NA12879 (D)
# example of bamlist file
./data/NA12878.bam
./data/NA12877.cram
./data/NA12879.bam
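
Because the columns must be separated by tabs rather than spaces, it can be safer to generate the list than to type it by hand. The sketch below is one way to do that with printf; the function name write_bamlist and its argument order are assumptions made for illustration, not part of BamSnap.

```shell
# Hypothetical helper (not part of BamSnap): write a two-column,
# tab-separated bamlist from alternating path/label arguments.
#   write_bamlist OUT PATH LABEL [PATH LABEL ...]
write_bamlist() {
  bamlist_out=$1
  shift
  # Require complete path/label pairs so no row ends with an empty label.
  [ $(( $# % 2 )) -eq 0 ] || return 2
  # printf reuses the format string for each successive pair of arguments.
  printf '%s\t%s\n' "$@" > "$bamlist_out"
}

# Usage:
#   write_bamlist ./NATRIO_bamlist.txt \
#     ./data/NA12878.bam  'NA12878 (F)' \
#     ./data/NA12877.cram 'NA12877 (M)' \
#     ./data/NA12879.bam  'NA12879 (D)'
```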

4.2. Genomic position

4.2.1. Genomic position (-pos)

Genomic positions to plot can be specified with the -pos option. It is possible to specify a single position, a list of positions, or a region given as a start-end range.

$ bamsnap -bam ./data/NA12878.bam -pos chr1:7364529
$ bamsnap -bam ./data/NA12878.bam -pos chr1:7364529 chr3:7364529 chr1:7364529
$ bamsnap -bam ./data/NA12878.bam -pos chr1:7364509-7364559

Note

Chromosome names in the specified positions must match those used in the bam files. For example, if the bam files do not use the ‘chr’ prefix in chromosome names, the prefix should also be omitted from the positions (e.g. 1:7364529).
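
When the same list of positions is used against files with different naming schemes, the prefix can be added or removed with plain shell string handling. This is only a sketch: with_chr_prefix and without_chr_prefix are hypothetical helper names, and they handle only the ‘chr’ prefix itself (other naming differences, such as chrM versus MT, are not covered).

```shell
# Hypothetical helpers (not part of BamSnap) for matching position
# strings to the chromosome naming used in the alignment files.

# Add the "chr" prefix unless it is already present.
with_chr_prefix() {
  case "$1" in
    chr*) printf '%s\n' "$1" ;;
    *)    printf 'chr%s\n' "$1" ;;
  esac
}

# Remove a leading "chr" prefix if present.
without_chr_prefix() {
  printf '%s\n' "${1#chr}"
}

# Usage:
#   bamsnap -bam ./data/NA12878.bam -pos "$(without_chr_prefix chr1:7364529)"
```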

4.2.2. VCF file (-vcf)

The program can read uncompressed .vcf files and compressed .vcf.gz files (gzip or bgzip).

$ bamsnap \
  -bam ./data/NA12878.bam \
  -vcf ./data/multiple_variants.vcf.gz \
  -out ./out/multiple_variants_NA12878

4.2.3. BED file (-bed)

$ bamsnap \
  -bam ./data/NA12878.bam \
  -bed ./data/multiple_regions.bed \
  -out ./out/multiple_regions_NA12878
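
A BED file for -bed can be built from the same kind of position strings used with -pos. The sketch below assumes the standard BED convention (tab-separated chromosome, 0-based start, exclusive end) and assumes that -pos coordinates are 1-based and inclusive; neither assumption is stated in this section, and region_to_bed is a hypothetical helper name.

```shell
# Hypothetical helper (not part of BamSnap): convert "chrom:pos" or
# "chrom:start-end" (assumed 1-based, inclusive) into a standard
# three-column BED line (0-based start, exclusive end).
region_to_bed() {
  region_chrom=${1%%:*}
  region_range=${1#*:}
  region_start=${region_range%-*}   # for a single position, start == end
  region_end=${region_range#*-}
  printf '%s\t%s\t%s\n' "$region_chrom" "$((region_start - 1))" "$region_end"
}

# Usage:
#   for r in chr1:7364509-7364559 chr3:7364529; do
#     region_to_bed "$r"
#   done > ./multiple_regions.bed
```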

4.3. Reference sequence file

Users can provide a fasta file to be used as the reference with the -ref option. Alternatively, a reference version can be specified with -refversion, in which case the program automatically obtains the corresponding sequence from the UCSC database. The current default for -refversion is hg38; -refversion hg19 forces the use of the hg19 release.

4.3.1. FASTA file (-ref)

$ bamsnap \
  -bam ./data/NA12879.bam_chr10_117542947.bam \
  -ref ./fasta/GRCh38_full_analysis_set_plus_decoy_hla.fa

Note

If a fasta file is specified, the program checks for its index file (.fai). If the index file does not exist, it is created automatically. If the index file exists but is older than the fasta file, the index can be rebuilt with the -ref_index_rebuild option.
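
The three index states described in the note can be checked up front by comparing file modification times. This is a rough sketch rather than BamSnap's own check: fai_status is a hypothetical helper, and it assumes the index is named by appending .fai to the fasta file name.

```shell
# Hypothetical helper (not part of BamSnap): report whether the .fai
# index of a fasta file is missing, older than the fasta file, or fine.
fai_status() {
  fasta_path=$1
  if [ ! -f "$fasta_path.fai" ]; then
    echo missing          # would be created automatically
  elif [ "$fasta_path.fai" -ot "$fasta_path" ]; then
    echo stale            # candidate for -ref_index_rebuild
  else
    echo ok
  fi
}

# Usage:
#   fai_status ./fasta/GRCh38_full_analysis_set_plus_decoy_hla.fa
```

If the result is "stale", adding -ref_index_rebuild to the bamsnap command is the option the note above describes.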