Frequently Asked Questions
Table of contents
- General
- Can I use my consumer genetic testing data (23andMe / Ancestry.com / etc.) with PharmCAT?
- How can I get updates about PharmCAT?
- When does PharmCAT release new versions?
- Does PharmCAT transmit input data over the network or otherwise share input data?
- Does PharmCAT read genotype dosage data in a VCF?
- What VCF fields does PharmCAT use?
- Running PharmCAT
- Can PharmCAT treat missing positions as reference?
- VCF parsing errors
- Can I modify the definitions of alleles and phenotypes in PharmCAT?
- What happens if I provide an outside diplotype or phenotype for a gene also found in the VCF file?
- Why doesn’t the Preprocessor normalize consecutive homozygous reference positions?
- Output-related
- What are the meanings of unassigned function, uncertain function, unknown function for allele function? And N/A, no call, indeterminate for phenotype?
- How to understand "Reference" or "*1"?
- How to render PharmCAT outputs into a tabular-formatted file
- Create one's own PDF report based on the PharmCAT .JSON file
- Why does PharmCAT output have genetic variants that are not listed in the pharmcat_position.vcf?
- Gene-specific
General
Can I use my consumer genetic testing data (23andMe / Ancestry.com / etc.) with PharmCAT?
PharmCAT requires input genomic data to be in VCF format. If you can transform your data into valid VCF that meets the requirements outlined in PharmCAT's documentation, then you can use it. However, if you're not familiar with genomic data tools, then this may be an extremely challenging task.
PharmCAT does not test against datasets generated by consumer testing companies, so we make no claim about how well they work. Many consumer testing datasets have limited overlap with most of the gene definitions used by PharmCAT, which can leave very few callable alleles and produce reports of limited use. PharmCAT works best with data derived from whole genome sequencing (WGS) datasets that have good coverage.
Regarding 23andMe in particular, some things to consider:
- 23andMe does genotyping and not sequencing. You will not have full coverage of the positions used by PharmCAT. This means you will have to make assumptions about those missing positions.
- As of May 2021, 23andMe uses the GRCh37 assembly. This means you will have to re-align your data to the GRCh38 assembly that PharmCAT uses.
How can I get updates about PharmCAT?
The PharmCAT project is managed on GitHub, which has many features for people who want to stay aware of changes happening with PharmCAT:
- Sign up for a free GitHub account
- Go to the PharmCAT repository
- Click the watch button
- Configure notifications in a way that works for you
When does PharmCAT release new versions?
PharmCAT releases new versions when substantial updates are ready to be released and not on a time-based schedule. For more information, see our Versioning documentation.
Does PharmCAT transmit input data over the network or otherwise share input data?
No, PharmCAT does not copy or transmit any user-input data (i.e. input VCFs or outside call data) off of the system that it's being run on.
Does PharmCAT read genotype dosage data in a VCF?
No, PharmCAT only considers the information as stated in the VCF Requirements section. Genotype dosage refers to the posterior probability of allele counts as a result of imputation. Genotype dosage can take any value between 0 and 2, such as 1.05. It is often stored in a separate genotype field other than GT and requires the user to pick a numeric threshold to determine the specific allele counts at a position.
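To make the thresholding step concrete, here is a minimal Python sketch of how a user might hard-call a GT value from a dosage before giving the VCF to PharmCAT. This is not part of PharmCAT. The function name `dosage_to_gt` and the default `tolerance` of 0.1 are illustrative assumptions, and the right threshold depends on your imputation quality.

```python
def dosage_to_gt(dosage: float, tolerance: float = 0.1) -> str:
    """Convert an imputed ALT-allele dosage (0 to 2) to an unphased GT string.

    Hypothetical helper, not a PharmCAT function. Returns "./." (no call)
    when the dosage is not within `tolerance` of 0, 1, or 2.
    """
    if not 0.0 <= dosage <= 2.0:
        raise ValueError(f"dosage out of range: {dosage}")
    nearest = round(dosage)  # nearest ALT allele count: 0, 1, or 2
    if abs(dosage - nearest) > tolerance:
        return "./."  # too uncertain to hard-call at this threshold
    return {0: "0/0", 1: "0/1", 2: "1/1"}[nearest]


if __name__ == "__main__":
    for d in (0.0, 1.05, 1.4, 1.95):
        print(d, dosage_to_gt(d))
```

A stricter tolerance yields more no-calls, which PharmCAT will then treat as missing data.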
What VCF fields does PharmCAT use?
Please see the VCF requirements for the specific VCF fields used by PharmCAT.
When an optional FORMAT/AD field is present in a VCF file, PharmCAT will perform a simple validation check on whether FORMAT/GT and FORMAT/AD agree with each other. The check was added to PharmCAT to address a reported issue where FORMAT/GT and FORMAT/AD were contradicting each other and confused PharmCAT.
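The exact rule PharmCAT applies is not described here, so the following Python sketch shows only one plausible form of a GT/AD consistency check: it flags any allele that is called in GT but has zero supporting reads in AD. The function name `gt_ad_conflicts` and the zero-read criterion are assumptions for illustration, not PharmCAT's implementation.

```python
import re
from typing import List


def gt_ad_conflicts(gt: str, ad: str) -> List[int]:
    """Return allele indices called in GT that have no read support in AD.

    Hypothetical check: an allele conflicts if its AD entry is 0 or if AD
    has no entry for it. Missing values (".") are skipped.
    """
    called = [int(a) for a in re.split(r"[/|]", gt) if a != "."]
    depths = [None if d == "." else int(d) for d in ad.split(",")]
    conflicts = []
    for allele in sorted(set(called)):
        if allele >= len(depths):
            conflicts.append(allele)  # GT names an allele AD does not cover
        elif depths[allele] == 0:
            conflicts.append(allele)  # allele called without any reads
    return conflicts


if __name__ == "__main__":
    print(gt_ad_conflicts("0/1", "12,0"))  # ALT called but has no reads
    print(gt_ad_conflicts("0/1", "10,9"))  # consistent
```

Running a check like this before PharmCAT can help locate the records that trigger the validation warning.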
The VCF Preprocessor uses the INFO/END field to recognize gVCF, a file format that PharmCAT does not yet support. As the VCF v4.4 specification states:
END: End reference position (1-based), indicating the variant spans positions POS–END on reference/contig CHROM. Normally this is the position of the last base in the REF allele and no END INFO field is necessary. However, when symbolic alleles are used, e.g. in gVCF or structural variants, an explicit END INFO field provides variant span information that is otherwise unknown.
To address a reported issue, the PharmCAT VCF Preprocessor now has a feature to bypass the gVCF check. See Running the VCF Preprocessor for details.
This feature should only be used when you are sure your file is not a gVCF file. Alternatively, you can strip out the INFO/END or other fields that PharmCAT does not require.
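As an illustration of the stripping alternative, the Python sketch below removes the END key from the INFO column of a single VCF data line. It is a hypothetical helper (`strip_info_end` is not a PharmCAT or Preprocessor function), it leaves header lines untouched, and it assumes you have already confirmed the file is not a gVCF. Established tools for editing VCF annotations may be a better fit for large files.

```python
def strip_info_end(vcf_line: str) -> str:
    """Remove the END key from the INFO column (8th field) of a VCF data line.

    Header lines (starting with '#') and lines with fewer than 8 columns
    are returned unchanged. An INFO column left empty becomes '.'.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return vcf_line
    kept = [
        entry
        for entry in fields[7].split(";")
        if entry != "END" and not entry.startswith("END=")
    ]
    fields[7] = ";".join(kept) or "."
    return "\t".join(fields) + newline


if __name__ == "__main__":
    line = "chr1\t100\t.\tA\tG\t50\tPASS\tEND=100;DP=10\tGT\t0/1\n"
    print(strip_info_end(line), end="")
```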
Running PharmCAT
Can PharmCAT treat missing positions as reference?
PharmCAT will not be supporting this.
We want people to be 100% clear on how PharmCAT works and what happens with the data you provide to it. It does not accept arbitrary VCF for many reasons (see VCF Requirements for the full list of requirements), but the main one is that we will not make any assumptions on the input you provide. We have already encountered users making assumptions on how PharmCAT works or should work, which has led to confusion down the line.
For more reasons behind this decision, please see VCF Requirements.
VCF parsing errors
PharmCAT and the VCF Preprocessor are designed not to alter any info in the input VCF file. Please make sure your VCF file follows PharmCAT's VCF requirements.
One example is a VCF file where the QUAL column has entries other than the allowed numeric values or the missing value (.). In this case, PharmCAT will complain about the VCF file format, but the root cause is that the input VCF deviates from the VCF file specifications. If this happens or you see other parsing errors, please check whether your VCF file follows the VCF file specifications, and if necessary, contact your bioinformatics team for a proper solution.
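To find offending records before running PharmCAT, a quick pre-check of the QUAL column can help. The Python sketch below is a hypothetical helper (`invalid_qual_lines` is not part of PharmCAT) that reports data lines whose QUAL value is neither `.` nor parseable as a number. Note that Python's `float()` is slightly more permissive than the VCF specification, so treat this as a first-pass filter only.

```python
from typing import Iterable, Iterator, Tuple


def invalid_qual_lines(vcf_lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, QUAL value) for data lines with a bad QUAL.

    QUAL is the 6th tab-separated column. A value is accepted if it is '.'
    or if float() can parse it. Header and blank lines are skipped.
    """
    for number, line in enumerate(vcf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5] if len(fields) > 5 else ""  # "" marks a missing column
        if qual == ".":
            continue
        try:
            float(qual)
        except ValueError:
            yield number, qual


if __name__ == "__main__":
    sample = [
        "##fileformat=VCFv4.2\n",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
        "chr1\t300\t.\tA\tG\thigh\tPASS\t.\n",
    ]
    for number, qual in invalid_qual_lines(sample):
        print(f"line {number}: invalid QUAL {qual!r}")
```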
Can I modify the definitions of alleles and phenotypes in PharmCAT?
PharmCAT is open source and can be modified to satisfy your own needs.
We do not, however, endorse modifying the allele or phenotype definitions to give different allele matching or phenotype results for genes already covered by PharmCAT. A goal of PharmCAT is to create transparent reports about what alleles or genetic positions are used to determine genotype and phenotype, and to promote consistent and robust results.
In general, we frequently get this question when there is a problem with the genotyping data, for example, when not all positions PharmCAT requires are available. The instinct is to remove those positions from PharmCAT's named allele definitions. But when a genetic position is removed, PharmCAT will not "see" that position, which will likely cause the sample/study individual to be inaccurately reported as reference when in fact they are not, or to be assigned incorrect genotypes or, even worse, incorrect phenotypes.
If you have no information about some genetic positions in your dataset and want to ignore them or assume reference at those positions, there is an option in the PharmCAT VCF Preprocessor to set the missing positions to reference. We suggest using this option for research purposes only. We recommend against using this option for reporting results or implementation.
If instead you are interested in customizing PharmCAT to add support for additional genes and PGx recommendations, this is possible but currently undocumented. Just adding the required JSON files to PharmCAT will only get you part of the way there. We are currently unable to support anyone looking to do this because we are focused on providing actionable prescribing recommendations from the authorities CPIC, DPWG, and the FDA.
What happens if I provide an outside diplotype or phenotype for a gene also found in the VCF file?
Outside calls provided by the user will override the results from the VCF file. Details about the relative priority of outside calls can be found on the Outside Call Format page.
Why doesn’t the Preprocessor normalize consecutive homozygous reference positions?
The Preprocessor does not consider consecutive homozygous reference genotypes as evidence for homozygous reference INDELs. For the reason behind this decision, please see VCF Requirements.
Output-related
What are the meanings of unassigned function, uncertain function, unknown function for allele function? And N/A, no call, indeterminate for phenotype?
Uncertain function and unknown function are standardized CPIC allele function terms. Alleles with uncertain function are alleles that have been reviewed by CPIC experts, but there has not been enough evidence to sufficiently draw a conclusion about the allele's clinical functional status to inform prescribing actionability. On the other hand, unknown function suggests that there is no literature describing the function.
Unassigned function is a PharmCAT term that describes a known allele which has not been assigned an allele function by CPIC. New alleles defined by, e.g., PharmVar or the TPMT nomenclature committee will be included in the corresponding CPIC gene allele definition files based on the SOP and thus subsequently become part of PharmCAT. Nonetheless, allele function is generally only assigned when there is a new guideline or a guideline update that involves that gene. Until then, these newer alleles are included and reported in PharmCAT as unassigned function since they have not been assigned an allele function term by CPIC.
For genotype, no calls (shown as empty phenotype field or N/A in PharmCAT JSON outputs) indicate that a genotype cannot be determined based on the input VCF file and the allele definitions. Similarly, for phenotype, these suggest that 1) these genes do not have a diplotype-phenotype translation table (diplotypes are interpreted as is), or 2) a phenotype cannot be determined based on the genotype calls or diplotype-phenotype translation table due to, e.g., an allele with unassigned function that is yet to be reviewed.
Indeterminate is a standardized CPIC phenotype term assigned to genotypes containing uncertain function or unknown function alleles.
Please review the latest CPIC SOP for assigning allele function for further details or any updates on the definitions.
How to understand "Reference" or "*1"?
You will see "Reference" (for genes like DPYD, RYR1, CACNA1S, CFTR, etc.) or "*1" (for genes like CYP2B6, CYP2C9, etc.). "Reference" or "*1" indicate an absence of alternative genetic alleles at the PharmCAT interrogated genetic positions. They are assigned by default when no alternative variants are found at the queried positions. They do not suggest a lack of genetic variation at every position in the gene and should not be mistaken to mean an exact match to the entire reference sequence for the gene.
For the gene CFTR, "Reference" in the PharmCAT report corresponds to "ivacaftor non-responsive CFTR sequence" in the CPIC guideline.
How to render PharmCAT outputs into a tabular-formatted file
PharmCAT is designed to take a single-sample VCF file and generate an individual PGx report in JSON or HTML formats. To support data analysis, we provide scripts and examples that render PharmCAT JSON outputs to tabular-formatted files. You can follow the instructions on this PharmCAT multi-sample analysis page for how to convert PharmCAT's JSON files into TSV or CSV files.
Create one's own PDF report based on the PharmCAT .JSON file
You can create your own PDF report based on PharmCAT's .JSON files. If you do so, please refer to the PharmCAT website at https://pharmcat.org/ and cite our methods paper: K Sangkuhl & M Whirl-Carrillo, et al. Pharmacogenomics Clinical Annotation Tool (PharmCAT). Clinical Pharmacology & Therapeutics (2020) 107(1):203-210.
Please note that PharmCAT is a research tool and review our disclaimers.
Note that PharmCAT is being actively developed, so there will be ongoing improvements and bug fixes. PharmCAT is also continually being updated to stay current with alleles defined by PharmVar and recommendations from CPIC, DPWG and the FDA. If the latest version of PharmCAT is not being used at any given time, it may not be the most accurate or complete version.
Why does PharmCAT output have genetic variants that are not listed in the pharmcat_position.vcf?
Some PGx allele-defining positions are multiallelic and can harbor other genetic variants. PharmCAT and the VCF Preprocessor are designed not to alter any info in the input VCF file. As a result, they retain all genetic variants at PGx allele-defining positions represented in an input file. This, however, will not affect the appropriate PGx calls.
Gene-specific
Can PharmCAT call CYP2D6?
If you have access to whole genome sequencing (WGS) CRAM/BAM files, we strongly discourage calling CYP2D6 using PharmCAT. Please refer to our documentation about calling CYP2D6.
Starting with v2.0, PharmCAT provides a research mode for calling CYP2D6. PharmCAT is designed to take VCF as input, which is NOT a desirable file format for calling CYP2D6 alleles. This research mode calls CYP2D6 alleles using ONLY the SNPs and INDELs that are available in VCF files. Please note that this approach has many caveats. VCF can't handle structural variation (SV) and copy number variation (CNV), which are essential for calling CYP2D6 alleles, especially CNVs for ultrarapid metabolizers. The VCF format cannot correctly represent a whole gene deletion (*5); this leads to erroneous calls and is beyond the capability of PharmCAT. CYP2D6 calls made from VCFs should not be used for clinical purposes. This research mode should be used at your own risk.
G6PD for samples with only one chrX
While PharmCAT supports hemizygotes for genes such as G6PD, you need to pay attention to how the G6PD genotypes are represented in your VCF, especially for male samples or samples with only one X chromosome. Samples with only one copy of the X chromosome are hemizygotes. Nonetheless, many variant callers and bioinformatics pipelines do not consider the hemizygosity of the X chromosome in these samples and will represent them as homozygotes.
Based on the VCF file format specifications, chrX should be represented as haploid (GT field = 0) in a male with a single X chromosome and as diploid (GT field = 0/0) in a female with two X chromosomes.
In reality, for many variant calling pipelines, you will find that all samples are diploid on the X chromosome regardless of the number of X chromosomes a sample actually has. Male samples (or a sample with a single X chromosome) appear to be homozygous across all chrX positions, while female samples (or samples with more than one X chromosome) tend to be heterozygous at some positions.
You won't be able to tell whether a sample is a male or a female, or whether the sample has one or more X chromosomes if you only know that this sample is homozygous on the X chromosomes. While this sample can indeed be a homozygous diploid, it can also be a haploid that has been accidentally represented as a homozygous diploid.
If you run PharmCAT on male samples or samples with only one chrX, be aware of this issue and, for reporting purposes, use only the haploid representation for those samples. Nonetheless, the drug prescribing recommendations should be the same for these samples regardless of whether they are represented correctly as a hemizygote or as a diploid for the X chromosome.
We plan to add support for hemizygotes on the X chromosome to the PharmCAT VCF Preprocessor in the future.
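Until then, a user who knows from other evidence that a sample has a single X chromosome could rewrite homozygous diploid chrX genotypes as haploid before running PharmCAT. The Python sketch below shows the GT conversion only. The function name `to_haploid_gt` is hypothetical, it is not a PharmCAT feature, and it deliberately refuses heterozygous calls, which would indicate either more than one X chromosome or a genotyping problem.

```python
import re


def to_haploid_gt(gt: str) -> str:
    """Collapse a homozygous diploid GT (e.g. '1/1') to haploid ('1').

    Intended only for chrX positions in samples known to have one X
    chromosome. Already-haploid values pass through unchanged, and a
    diploid no-call ('./.') becomes '.'.
    """
    alleles = re.split(r"[/|]", gt)
    if len(alleles) == 1:
        return gt  # already haploid
    if len(alleles) == 2 and alleles[0] == alleles[1]:
        return alleles[0]
    raise ValueError(f"cannot represent GT {gt!r} as haploid")


if __name__ == "__main__":
    for gt in ("0/0", "1|1", "./.", "1"):
        print(gt, "->", to_haploid_gt(gt))
```

Raising an error on heterozygous input, rather than picking an allele, keeps the conversion from silently hiding a sample whose X chromosome count was assumed incorrectly.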
Why is the CYP2C Cluster variant, rs12777823, in the CYP2C9 section of PharmCAT's JSON output?
PharmCAT includes an intergenic single nucleotide variation (SNV), rs12777823, based on the CPIC warfarin guideline. This SNV is in the CYP2C cluster on chromosome 10 but independent of the CYP2C9 gene, and is listed independently in the PharmCAT HTML report. However, the PharmCAT JSON output is gene-dependent. For this reason, rs12777823 is nested under CYP2C9 as a CYP2C POI (position of interest) in the JSON even though the SNV is not located within the CYP2C9 gene boundary and does not affect CYP2C9 genotype or phenotype.