Calling CYP2D6

PharmCAT supports CYP2D6 data with the following options, which are discussed in more detail below:

  • use a different tool to determine CYP2D6 diplotypes and use PharmCAT to provide the translation into phenotypes and recommendations
  • run PharmCAT in research mode to call CYP2D6 from VCF files

CYP2D6 complications

Problems with calling CYP2D6 from VCF

While PharmCAT supports CYP2D6, we do NOT recommend calling CYP2D6 from VCF due to the large influence of structural variation (SV) and copy number variation (CNV) on inferring CYP2D6 phenotype, which is beyond the scope of what can be called from SNPs or INDELs in a VCF file.

CYP2D6 phenotype prediction is sensitive to SV and CNV. Two or more copies of CYP2D6 on one chromosome have been reported for normal function (e.g. *1, *2), decreased function (e.g. *10, *17) and no function (e.g. *4, *36) alleles. The CYP2D6 ultrarapid metabolizer phenotype (UM, activity score >2.25) is predicted based on duplications of normal function alleles and the absence of alleles with an activity value of 0 or 0.25 (e.g. CYP2D6*2x2/*1, CYP2D6*2x2/*17) or multiplications of normal function alleles (e.g. CYP2D6*1x3/*4). As such, CYP2D6 UMs cannot be called using only SNPs and INDELs in a VCF file. Additionally, missing the copy number of a normal or decreased function allele can misrepresent a normal metabolizer (NM) (e.g. *2x2/*4, *10/*17x2) as an intermediate metabolizer (IM) (e.g. *2/*4, *10/*17). Moreover, SVs, such as CYP2D6 and CYP2D7 hybrid alleles (e.g. *13, *68) and the whole gene deletion (*5), are no function alleles. At the moment, it is still difficult to have accurate representation of these alleles defined by complex SV in a VCF. Omission or misrepresentation of these alleles, depending on other alleles, will lead to potential carrier being mistakenly reported as IM or NM.

In the specific case where a sample has the whole gene deletion (*5) on one CYP2D6 allele and presents variants on the other CYP2D6 allele, these hemizygous variants will be falsely presented as homozygous in the VCF, e.g. *5/*29 will be detected as *29/*29 due to this misrepresentation in a VCF file.

Problems with whole-exome sequencing/low coverage whole-genome sequencing

We strongly discourage using whole-exome sequencing or low coverage whole-genome sequencing data to call CYP2D6 based on our comparison below.

For demonstration purposes, we are using StellarPGx, which supports calling CYP2D6 with structural variants and CNV from whole genome sequencing (WGS) CRAM/BAM files. StellarPGx calls have been compared against the CDC's GeT-RM dataset which provides consensus reference diplotypes for selected samples. According to the StellarPGx paper, StellarPGx produces results with a higher concordance to GeT-RM than other available CYP2D6 callers. However, this benchmark was performed using GeT-RM's short-read 30x PCR-free WGS dataset. In addition to high-coverage WGS, GeT-RM also provides access to low-coverage WGS and whole exome sequencing (WES) data for some samples. We selected 6 random Get-RM samples and ran StellarPGx on the high-coverage WGS, low-coverage WGS, and WES and compared the results to the GeT-RM ground truth calls.

Sample WES Low-Cov WGS 30x WGS GeT-RM
HG00436 No_call No_call *71/*2x2 *2x2/*71
HG01086 *1/*1x2 No_call *1/*31 *1/*31
HG01190 *1/*1 *5/*1 *5/*68+*4 *68+*4/*5
NA07048 No_call *1/*1 *139/*4 *1/*4
NA18545 *34/*34x7 *1/*1x5 *36+*10/*36+*10 *5/*36x2+*10x2
NA21105 No_call *34/*34 *111/*3 *3/*111

There is drastic variation in the calls produced by the three sequencing technologies. In many cases, the WES or low-coverage WGS input resulted in a "no call" result. In some samples, improbably high copy numbers and calls that varied greatly from the GeT-RM calls were produced. The 30x WGS produced identical or equivalent results most of the time, and when it deviated the differences were relatively minor. In conclusion, we do not recommend the use WES or low-coverage WGS for calling CYP2D6 for both research and clinical purposes.

Working with CYP2D6 in PharmCAT

Using outside calls

PharmCAT supports pulling in results from other tools using what we call "outside calls". This allows you to select your preferred CYP2D6 caller and use PharmCAT for the phenotype translation and to include CYP2D6 results in a PharmCAT report.

To include outside calls you need to use the -po flag in PharmCAT to specify the outside calls file. For example:

# java -jar pharmcat.jar -vcf patient_001.vcf -po patient_001_cyp2d6.txt

where patient_001_cyp2d6.txt takes the format of:

CYP2D6	*1x2/[*2 + *6]

For more information:

Formatting StellarPGx output for PharmCAT

As mentioned above, if you have whole genome sequencing (WGS) CRAM/BAM files, your best option is a tool like StellarPGx. Please note we are not affiliated with the StellarPGx team and offer no guarantees about its performance. Any questions or concerns on StellarPGx should be directed to the StellarPGx maintainer at [email protected]. Usage instructions for StellarPGx can be found on their GitHub repo.

While this tutorial is StellarPGx specific, it should illustrate how to integrate genotype calls from other tools.

After running StellarPGx with CYP2D6 as the target gene, it should produce a <run_name>_summary.txt file which looks like:

HG00436	*71/*2x2
HG01086	[*1/*31]	Possible novel allele or suballele present: interpret with caution; experimental validation and expert review through PharmVar is recommended
HG01190	*5/*68+*4
NA07048	*139/*4
NA18545	*36+*10/*36+*10
NA21105	*111/*3

Column 1 is the sample name and column 2 is the CYP2D6 call. In order to convert this into a PharmCAT-readable format you must split this into a separate file for each sample.

The following is a simple python script that parses StellarPGx's summary.txt file and outputs PharmCAT-ready outside calls files for use with the -a flag.

import sys
import os

input_path = sys.argv[1] # summary.txt file from StellarPGx
output_dir = sys.argv[2] # output directory

with open(input_path, "r") as infile:
    entries = infile.readlines() # read all the StellarPGx file lines into a list

for entry in entries: # for each sample in the StellarPGx file
    split = entry.split("\t") # split the lines into columns
    sample = split[0] # extract the sample name
    call = split[1].strip("[] \n") # extract the sample call and strip excess spaces and brackets

    if not os.path.exists(output_dir): # create the output directory if it doesn't exist yet
        os.makedir(output_dir)

    with open("%s/%s_cyp2d6.txt" % (output_dir,sample), "w") as outfile:
        # write the sample to a text file in the specified output directory
        outfile.write("CYP2D6\t%s" % call)

Usage example:

# python3 stellarPGx_to_PharmCAT.py summary.txt pharmcat_inputs

Where summary.txt is the StellarPGx output and pharmcat_inputs is the directory where the call files should be placed. After running this, calling ls pharmcat_inputs to list the directory contents should show:

# ls pharmcat_inputs
HG00436_cyp2d6.txt  HG01086_cyp2d6.txt  HG01190_cyp2d6.txt  NA07048_cyp2d6.txt  NA18545_cyp2d6.txt  NA21105_cyp2d6.txt

and HG00436_cyp2d6.txt would look like

CYP2D6  *71/*2x2

We urge you to manually verify that the calls in the PharmCAT-ready files correspond to the StellarPGx output.

Calling CYP2D6 with PharmCAT's research mode

As stated above, we do NOT recommend calling CYP2D6 from VCF due to the large influence of SV and CNV on phenotype prediction.

PharmCAT can, however, call CYP2D6 star alleles that are defined based on SNPs and/or INDELs. For more information, read the Research Mode page.


PharmCAT is managed at Stanford University & University of Pennsylvania (NHGRI U24HG013077).