Configuration
Bam_files_info.dic
BamQuery stores information about each queried BAM/CRAM file in a Python dictionary. The key for each bam/cram file is derived from the path where the file is located, so store bam/cram files under informative folder names that differentiate them. For example, all GTEx healthy tissue cram files should be organized as follows:
GTEx
├── adipose_subcutaneous
│ ├── SRR1333352
│ ├── SRR1338301
│ ├── SRR1338627
│ ├── SRR1339740
│ ├
With this file structure, the path to a GTEx cram file should look like this: /home/GTEX/brain_amygdala/SRR1333352/SRR1333352.cram
In this example, BamQuery creates the key brain_amygdala_SRR1333352 to save information related to this sample.
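The relationship between path and key can be illustrated with a minimal sketch. It assumes the key is simply the tissue folder name joined to the sample folder name with an underscore, as in the example above; `sample_key` is a hypothetical helper written for illustration, not a BamQuery function.

```python
from pathlib import PurePosixPath


def sample_key(cram_path: str) -> str:
    """Hypothetical helper: build '<tissue folder>_<sample folder>' from a path.

    Assumes the layout <root>/<tissue>/<sample>/<file>, as in the GTEx example.
    """
    parts = PurePosixPath(cram_path).parts
    # parts[-1] is the file name, parts[-2] the sample folder,
    # parts[-3] the tissue folder.
    return f"{parts[-3]}_{parts[-2]}"


key = sample_key("/home/GTEX/brain_amygdala/SRR1333352/SRR1333352.cram")
print(key)  # brain_amygdala_SRR1333352
```

Under this assumption, two files stored under identically named folders would map to the same key, which is why distinct, informative folder names matter.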
This information is organized in a list as follows:
0: /home/GTEX/brain_amygdala/SRR1333352/SRR1333352.cram --> Full path to the bam/cram file
1: 80302110 --> Total primary read count in the bam/cram file
2: brain_amygdala --> Tissue
3: Brain --> Tissue type
4: no --> Shortlist
5: NA --> Sequencing
6: NA --> Library
7: User_1 --> The user who added the bam/cram file information (the first user querying a given bam file)
The Tissue, Tissue type and Shortlist fields must be provided by the first user who queries the given bam/cram file. This is done only once (see instructions below).
The sequencing and library fields are inferred directly by BamQuery from the bam/cram file. This is also done only once, when a user configures BamQuery to query the file taking its strandedness into account.
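As an illustration, the entry described above can be written as a plain Python dictionary mapping the sample key to an eight-element list. This is only a sketch of the in-memory layout: the `FIELDS` labels and the `describe` helper are hypothetical names introduced here, and nothing is assumed about how BamQuery serializes the dictionary on disk.

```python
# Hypothetical labels for the eight list positions described above.
FIELDS = (
    "path",
    "total_primary_reads",
    "tissue",
    "tissue_type",
    "shortlist",
    "sequencing",
    "library",
    "user",
)

# One entry, using the values from the GTEx example.
bam_files_info = {
    "brain_amygdala_SRR1333352": [
        "/home/GTEX/brain_amygdala/SRR1333352/SRR1333352.cram",
        80302110,
        "brain_amygdala",
        "Brain",
        "no",
        "NA",
        "NA",
        "User_1",
    ]
}


def describe(key: str) -> dict:
    """Pair each list position with its (hypothetical) field label."""
    return dict(zip(FIELDS, bam_files_info[key]))


entry = describe("brain_amygdala_SRR1333352")
print(entry["tissue_type"])  # Brain
```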
Provide details for each BAM file
The first time a BAM file is queried, you need to provide some information about the origin of the file.
This is why the following exception will appear when running BamQuery:
Before to continue you must provide the tissue type for the bam files annotated in the file : …/output/res/AUX_files/bam_files_tissues.csv. Please enter for each sample : tissue, tissue_type, shortlist.
To resolve this, you must fill in the bam_files_tissues.csv file with the requested information.
BamQuery stores the information, so this is a one-time operation for each BAM file.
Columns in bam_files_tissues.csv:
For each BAM file, you must provide tissue, tissue_type, shortlist.
BamQuery uses this classification to build the heatmaps. See heat_maps.
tissue: Refers to the tissue of the sample. For example: prostate
tissue_type: Refers to a specific feature of the tissue. For example, prostate tissue can be classified as a SexSpecific tissue type.
shortlist: Yes or No. This sets the BAM file as part of a selected group of samples within a tissue type to calculate the average level of transcript expression.
Once the file bam_files_tissues.csv has been filled in, you can relaunch BamQuery.
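Before relaunching, it can help to check that no row was left incomplete. The sketch below is not part of BamQuery. It assumes each row holds the sample name followed by tissue, tissue_type and shortlist, with no header line; verify this against the file BamQuery actually generates. The sample name `prostate_sample_1` and the helper `incomplete_rows` are hypothetical.

```python
import csv
import io


def incomplete_rows(csv_text: str) -> list:
    """Return sample names whose tissue, tissue_type or shortlist is missing.

    Assumed row layout (not confirmed by the BamQuery documentation):
    sample, tissue, tissue_type, shortlist
    """
    bad = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        cells = [cell.strip() for cell in row]
        sample, rest = cells[0], cells[1:4]
        # All three fields must be present, and shortlist must be yes or no.
        complete = (
            len(rest) == 3
            and all(rest)
            and rest[2].lower() in {"yes", "no"}
        )
        if not complete:
            bad.append(sample)
    return bad


example = (
    "brain_amygdala_SRR1333352,brain_amygdala,Brain,no\n"
    "prostate_sample_1,prostate,,\n"
)
print(incomplete_rows(example))  # ['prostate_sample_1']
```

To check a real file, read its contents with `open(path).read()` and pass the text to `incomplete_rows`.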