Frequently Asked Questions

General
Running PharmCAT
Output-related
Gene-specific

General

Can I use my consumer genetic testing data (23andMe / Ancestry.org / etc.) with PharmCAT?

PharmCAT requires input genomic data to be in VCF format. If you can transform your data into valid VCF that meets the requirements outlined in PharmCAT documentation then you can run it. However, if you're not familiar with genomic data tools then this may be an extremely difficult task.

PharmCAT does not test against datasets generated by consumer testing companies, so we can make no claims about how well they work. Many consumer testing datasets have limited overlap with most of the gene definitions used by PharmCAT, which could yield very few callable alleles and, thus, not very useful reports. PharmCAT works best with data derived from whole genome sequencing (WGS) datasets that have good coverage.

Regarding 23andMe in particular, some things to consider:

23andMe does genotyping and not sequencing. You will not have full coverage of the positions used by PharmCAT. This means you will have to make assumptions about those missing positions.
As of May 2021, 23andMe uses the GRCh37 assembly. This means you will have to re-align your data to the GRCh38 assembly that PharmCAT uses.

How can I get updates about PharmCAT?

The PharmCAT project is managed on GitHub which has many features for people who want to stay aware of changes happening with PharmCAT.

First, sign up for a free GitHub account.

Second, go to the PharmCAT repository, click the watch button, and configure notifications in a way that works for you.

When does PharmCAT release new versions?

PharmCAT releases new versions when substantial updates are ready to be released and not on a time-based schedule. For more information see our Versioning documentation.

Does PharmCAT copy or transmit my data?

No, PharmCAT does not copy or transmit any user-input data (i.e. input VCFs or outside call data) off of the system that it's being run on.

Does PharmCAT read genotype dosage data in a VCF?

No, PharmCAT only considers the information as stated in the VCF Requirements section. Genotype dosage is the expected allele count derived from the posterior genotype probabilities produced by imputation. It can take any value between 0 and 2, such as 1.05. It is often stored in a separate genotype field other than GT and requires the user to pick a numeric threshold to determine the specific allele counts at a position.
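To illustrate what "picking a numeric threshold" can look like, here is a minimal Python sketch that turns a dosage value into a hard GT call before the data is given to PharmCAT. This is not part of PharmCAT. The function name `dosage_to_gt` and the `tolerance` value of 0.1 are assumptions made for this example only, and the right threshold depends on your imputation quality.

```python
def dosage_to_gt(dosage, tolerance=0.1):
    """Convert an ALT-allele dosage (0 to 2) into a VCF GT string.

    Hypothetical helper: a dosage further than `tolerance` from the
    nearest whole allele count is treated as a no-call ("./.").
    """
    if not 0.0 <= dosage <= 2.0:
        raise ValueError(f"dosage out of range: {dosage}")
    nearest = round(dosage)  # nearest whole allele count: 0, 1 or 2
    if abs(dosage - nearest) > tolerance:
        return "./."  # too ambiguous to call under this threshold
    return {0: "0/0", 1: "0/1", 2: "1/1"}[nearest]


if __name__ == "__main__":
    for value in (0.02, 1.05, 1.5, 1.95):
        print(value, dosage_to_gt(value))
```

A stricter tolerance produces more no-calls, which PharmCAT will then treat as missing data rather than as a confident genotype.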

What VCF fields does PharmCAT use?

Please see the VCF requirements for the specific VCF fields used by PharmCAT.

When the optional FORMAT/AD field is present in a VCF file, PharmCAT performs a quality assurance check on whether FORMAT/GT and FORMAT/AD agree with each other. The check was added to address a reported issue where FORMAT/GT and FORMAT/AD were discrepant and confused PharmCAT.
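The exact rule PharmCAT applies is not described here. As a rough illustration of what a disagreement between the genotype and the allelic depths can look like, the Python sketch below assumes one plausible rule: every allele called in GT should have at least one supporting read in AD. The function name `gt_ad_agree` and the rule itself are assumptions for this example, not PharmCAT's implementation.

```python
import re


def gt_ad_agree(gt, ad):
    """Return True if each allele called in GT has read support in AD.

    Assumed rule for illustration only. `gt` is a GT string such as
    "0/1"; `ad` is a comma-separated AD string such as "12,9".
    Missing alleles (".") in GT are skipped.
    """
    alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
    depths = [0 if d == "." else int(d) for d in ad.split(",")]
    return all(int(a) < len(depths) and depths[int(a)] > 0 for a in alleles)


if __name__ == "__main__":
    print(gt_ad_agree("0/1", "12,9"))  # both called alleles have reads
    print(gt_ad_agree("0/1", "21,0"))  # ALT is called but has no reads
```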

The VCF Preprocessor uses the INFO/END field to recognize gVCF, a file format that is not yet supported. As the VCF v4.4 specification states: "END: End reference position (1-based), indicating the variant spans positions POS–END on reference/contig CHROM. Normally this is the position of the last base in the REF allele… and no END INFO field is needed. However when symbolic alleles are used, e.g. in gVCF or structural variants, an explicit END INFO field provides variant span information that is otherwise unknown." If your file is indeed a VCF, you can strip out INFO/END or any other fields that PharmCAT does not require.
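If your file is a regular VCF that merely carries an INFO/END annotation, removing that key is a small text transformation. The Python sketch below shows one way to do it for a single tab-delimited data line. The function name `strip_info_end` is made up for this example, and it assumes well-formed lines with at least eight columns. It does not turn a genuine gVCF into a VCF.

```python
def strip_info_end(line):
    """Remove the END key from the INFO column of one VCF line.

    Header lines (starting with "#") are returned unchanged.
    Hypothetical helper; assumes tab-delimited, well-formed input.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # INFO is the 8th column: semicolon-separated KEY=VALUE pairs or flags
    kept = [kv for kv in fields[7].split(";") if kv.split("=", 1)[0] != "END"]
    fields[7] = ";".join(kept) or "."  # "." marks an empty INFO column
    return "\t".join(fields) + newline


if __name__ == "__main__":
    record = "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10;END=100\tGT\t0/1\n"
    print(strip_info_end(record), end="")
```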

Running PharmCAT

Can PharmCAT treat missing positions as reference?

PharmCAT will not be supporting this.

We want people to be 100% clear on how PharmCAT works and what happens with the data you provide to it. It does not accept arbitrary VCF for many reasons (see VCF Requirements for the full list of requirements), but the main one is that we will not make any assumptions about the input you provide. We have already encountered users making assumptions about how PharmCAT works or should work, which has led to confusion down the line.

For one thing, we do not know what "reference" is because it can vary based on your reference sequence. Did you convert it from GRCh37 to GRCh38? If so, the "reference" from the two could have changed and your VCF would not provide any indications that this is the case. Secondly, a missing entry can mean that the reference base was detected OR it can mean the base was not assayed or has no call. We cannot distinguish between uncalled positions and reference in a VCF file. So we ask that you declare each required position for PharmCAT to be clear about the input.

You have to decide how accurate the data you provide to PharmCAT should be, especially if you're making any clinical decisions based on PharmCAT's results. If you wish to make assumptions about your data, you are welcome to do so. Instructions on how to do this can be found here.

VCF parsing errors

PharmCAT and the VCF Preprocessor are designed not to alter any information in the input VCF file. Please make sure your VCF file follows the VCF file specification, version 4.2 or later.

One example is a VCF file where the QUAL column has entries other than a numeric value or the missing value ("."). In this case, PharmCAT will complain about the VCF file format. If this happens or you see other parsing errors, please check whether your VCF file follows the VCF file specifications and, if necessary, contact the bioinformatics tool team for a proper solution.
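To locate offending QUAL entries before running PharmCAT, a quick scan of the sixth column is usually enough. The Python sketch below is a rough check, not a full VCF validator. The function names `qual_is_valid` and `find_bad_qual` are made up for this example, and Python's `float()` is slightly more permissive than the VCF specification.

```python
def qual_is_valid(qual):
    """Return True if `qual` is "." (missing) or parses as a number."""
    if qual == ".":
        return True
    try:
        float(qual)
    except ValueError:
        return False
    return True


def find_bad_qual(lines):
    """Return 1-based line numbers of data lines with an invalid QUAL.

    Header lines and blank lines are skipped. Lines with fewer than
    six tab-delimited columns are reported as well.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or not qual_is_valid(fields[5]):
            bad.append(number)
    return bad


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
        "chr1\t200\t.\tC\tT\tPASS\t.\t.\n",  # QUAL holds a non-numeric value
    ]
    print(find_bad_qual(example))
```

To scan a real file, pass an open file handle, for example `find_bad_qual(open("sample.vcf"))`, where `sample.vcf` is a placeholder name.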

Can I modify the definitions of alleles and phenotypes in PharmCAT?

PharmCAT is open source and can be modified to satisfy your own needs.

We do not, however, endorse modifying the allele or phenotype definitions to give different allele matching or phenotype results for genes already covered by PharmCAT. A goal of PharmCAT is to create transparent reports about what alleles or genetic positions are used to determine genotype and phenotype, and to promote consistent and robust results.

In general, we frequently get this question when there is a problem with the genotyping data, for example, when not all positions PharmCAT requires are available. The instinct is to remove those positions from PharmCAT's named allele definitions. But when a genetic position is removed, PharmCAT will not "see" it, which will likely cause the sample/study individual to be inaccurately reported as reference when in fact they are not, or to be assigned incorrect genotypes or, even worse, phenotypes.

If you have no information about some genetic positions in your dataset, and want to ignore them or assume reference at those positions, there is an option in the PharmCAT VCF Preprocessor to set the missing positions to reference. We suggest using this option for research purposes only. We recommend against using this option for reporting results or implementation.

If instead you are interested in customizing PharmCAT to add support for additional genes and PGx recommendations, this is possible, but currently undocumented. Just adding the required JSON files to PharmCAT will only get you part of the way there. We are unable to support anyone looking to do this at this time because we are focused on providing actionable prescribing recommendations from authorities like CPIC and DPWG.

What happens if I provide an outside diplotype or phenotype for a gene also found in the VCF file?

Outside calls provided by the user will override the results from the VCF file. Details about the relative priority of outside calls can be found on the Outside Call Format page.

What are the meanings of unassigned function, uncertain function, unknown function for allele function? And N/A, no call, indeterminate for phenotype?

Uncertain function and unknown function are standardized CPIC allele function terms. Alleles with uncertain function are alleles that have been reviewed by CPIC experts but there has not been enough evidence to sufficiently draw a conclusion about the allele's clinical functional status to inform prescribing actionability. On the other hand, unknown function suggests that there is no literature describing the function.

Unassigned function is a PharmCAT term that describes a known allele which has not been assigned an allele function by CPIC. New alleles defined by, e.g., PharmVar or the TPMT nomenclature committee will be included in the corresponding CPIC gene allele definition files based on the SOP and thus subsequently become part of PharmCAT. Nonetheless, allele function is generally only assigned when there is a new guideline or a guideline update that involves that gene. In this case, these newer alleles are included and reported in PharmCAT as unassigned function since they have not been assigned an allele function term by CPIC.

For genotype, no calls (shown as empty phenotype field or N/A in PharmCAT JSON outputs) indicate that a genotype cannot be determined based on the input VCF file and the allele definitions. Similarly, for phenotype, these suggest that 1) these genes do not have a diplotype-phenotype translation table (diplotypes are interpreted as is), or 2) a phenotype cannot be determined based on the genotype calls or diplotype-phenotype translation table due to, e.g., an allele with unassigned function that is yet to be reviewed.

Indeterminate is a standardized CPIC phenotype term assigned to genotypes containing uncertain function or unknown function alleles.

Please review the latest CPIC SOP for assigning allele function for further details or any updates on the definitions.

How to understand "Reference" or "*1"?

You will see "Reference" (for genes like DPYD, RYR1, CACNA1S, CFTR, etc.) or "*1" (for genes like CYP2B6, CYP2C9, etc.). "Reference" or "*1" indicate an absence of alternative genetic alleles at the PharmCAT interrogated genetic positions. They are assigned by default when no alternative variants are found at the queried positions. They do not suggest a lack of genetic variation at every position in the gene and should not be mistaken to mean an exact match to the entire reference sequence for the gene.

For the gene CFTR, "Reference" in the PharmCAT report corresponds to "ivacaftor non-responsive CFTR sequence" in the CPIC guideline.

How to render PharmCAT outputs into a tabular-formatted file

PharmCAT is designed to take a single-sample VCF file and generate an individual PGx report in JSON or HTML formats. To support data analysis, we provide scripts and examples that render PharmCAT JSON outputs to tabular-formatted files. You can follow the instructions on this PharmCAT multi-sample analysis page for how to convert PharmCAT JSONs into TSV or CSV files.

Create one's own PDF report based on the PharmCAT .JSON file

You can create your own PDF report based on PharmCAT's .JSON file. If you do so, please refer to the PharmCAT website at https://pharmcat.org/ and cite our methods paper: K Sangkuhl & M Whirl-Carrillo, et al. Pharmacogenomics Clinical Annotation Tool (PharmCAT). Clinical Pharmacology & Therapeutics (2020) 107(1):203-210.

Please note that PharmCAT is a research tool and note our disclaimers: https://pharmcat.org/Disclaimers/.

Note that PharmCAT is being actively developed, so there will be ongoing content updates and bug fixes. Additionally, in order for PharmCAT to stay current with alleles defined by PharmVar and recommendations from CPIC and DPWG, PharmCAT is continually being released with updates. If the most current version of PharmCAT is not being used at any given time, it may not be the most accurate or complete version.

Why does PharmCAT output have genetic variants that are not listed in the pharmcat_position.vcf?

Some PGx allele-defining positions are multiallelic and can harbor other genetic variants. PharmCAT and the VCF Preprocessor are designed not to alter any information in the input VCF file. As a result, they retain all genetic variants at PGx allele-defining positions represented in an input file. This, however, will not affect the appropriate PGx calls.

Gene-specific

Can PharmCAT call CYP2D6?

If you have access to whole genome sequencing (WGS) CRAM/BAM files, we strongly discourage calling CYP2D6 using PharmCAT. Please refer to our documentation about calling CYP2D6.

Starting with v2.0, PharmCAT provides a research mode for calling CYP2D6. PharmCAT is designed to take VCF as input, which is NOT a desirable file format for calling CYP2D6 alleles. This research mode calls CYP2D6 alleles using ONLY the SNPs and INDELs that are available in VCF files. Please note that this approach has many caveats. VCF can't handle structural variation (SV) and copy number variation (CNV), which are essential for calling CYP2D6 alleles, especially CNVs for ultrarapid metabolizers. The VCF format cannot correctly represent whole gene deletion (*5), which will lead to erroneous calls and is beyond the capability of PharmCAT. CYP2D6 calls made from VCFs should not be used for clinical purposes. This research mode should be used at your own risk.

G6PD for samples with only one chrX

While PharmCAT supports hemizygotes for genes such as G6PD, you need to pay attention to how the G6PD genotypes are represented in your VCF, especially for male samples or samples with only one X chromosome. Some samples only have one copy of the X chromosome, a.k.a. hemizygotes. Nonetheless, many variant callers and bioinformatics pipelines do not consider the hemizygosity of the X chromosome in these samples and will represent them as homozygotes.

Based on the VCF file format specifications, chrX should be represented as haploid (GT field = 0) in a male with a single X chromosome and as diploid (GT field = 0/0) in a female with two X chromosomes.

In reality, for many variant calling pipelines, you will find that all samples are diploid on the X chromosome regardless of the number of X chromosomes a sample actually has. Male samples (or a sample with a single X chromosome) appear to be homozygous across all chrX positions, while female samples (or samples with more than one X chromosome) tend to be heterozygous at some positions.

You won't be able to tell whether a sample is a male or a female, or whether the sample has one or more X chromosomes, if you only know that this sample is homozygous on the X chromosomes. This sample can be a haploid that is accidentally represented as homozygous diploid, or this sample can be indeed a homozygous diploid.
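To see how your own pipeline represents chrX genotypes, you can classify the GT values by ploidy. The Python sketch below is a minimal illustration. The function name `classify_chrx_gt` and its return labels are made up for this example. As explained above, a "homozygous diploid" result is ambiguous and cannot by itself tell you how many X chromosomes the sample has.

```python
import re


def classify_chrx_gt(gt):
    """Classify a chrX GT string by ploidy and zygosity.

    Hypothetical helper. Returns "missing", "haploid",
    "homozygous diploid" or "heterozygous diploid".
    """
    alleles = re.split(r"[/|]", gt)  # "/" is unphased, "|" is phased
    if "." in alleles:
        return "missing"
    if len(alleles) == 1:
        return "haploid"
    if len(set(alleles)) == 1:
        return "homozygous diploid"  # could be a mis-represented hemizygote
    return "heterozygous diploid"


if __name__ == "__main__":
    for gt in ("0", "1", "0/0", "1|1", "0/1", "./."):
        print(gt, "->", classify_chrx_gt(gt))
```

A sample whose chrX calls are all "homozygous diploid" is a candidate for review against known sample sex or karyotype information, if you have it.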

If you run PharmCAT on male samples or samples with only one chrX, be aware of this issue and, for reporting purposes, use only the haploid representation for these samples. Nonetheless, the drug prescribing recommendations should be the same for these samples regardless of whether they are represented correctly as hemizygous or as diploid on the X chromosome.

We will add support for hemizygotes on the X chromosome to the PharmCAT VCF Preprocessor in the future.

Why is the CYP2C Cluster variant, rs12777823, in the CYP2C9 section of PharmCAT's JSON output?

PharmCAT includes an intergenic single nucleotide variation (SNV), rs12777823, based on the CPIC warfarin guideline. This SNV is in the CYP2C cluster on chromosome 10 but independent of the CYP2C9 gene, and is listed independently in the PharmCAT HTML report. However, the PharmCAT JSON output is gene-dependent. For this reason, rs12777823 is nested under CYP2C9 as a CYP2C POI (position of interest) in the JSON even though the SNV is not located within the CYP2C9 gene boundary and does not affect CYP2C9 genotype or phenotype.