PharmCAT supports CYP2D6 data with the following options, which are discussed in more detail below:
- use a different tool to determine CYP2D6 diplotypes and use PharmCAT to provide the translation into phenotypes and recommendations
- run PharmCAT in research mode to call CYP2D6 from VCF files
While PharmCAT supports CYP2D6, we do NOT recommend calling CYP2D6 from VCF due to the large influence of structural variation (SV) and copy number variation (CNV) on inferring CYP2D6 phenotype, which is beyond the scope of what can be called from SNPs or INDELs in a VCF file.
CYP2D6 phenotype prediction is sensitive to SV and CNV. Two or more copies of CYP2D6 on one chromosome have been reported for normal function (e.g.
*2), decreased function (e.g.
*17) and no function (e.g.
*36) alleles. The CYP2D6 ultrarapid metabolizer phenotype (UM, activity score >2.25) is predicted based on duplications of normal function alleles and the absence of alleles with an activity value of 0 or 0.25 (e.g.
CYP2D6*2x2/*17) or multiplications of normal function alleles (e.g.
CYP2D6*1x3/*4). As such, CYP2D6 UMs cannot be called using only SNPs and INDELs in a VCF file. Additionally, missing the copy number of a normal or decreased function allele can misrepresent a normal metabolizer (NM) (e.g.
*10/*17x2) as an intermediate metabolizer (IM) (e.g.
*10/*17). Moreover, SVs, such as CYP2D6 and CYP2D7 hybrid alleles (e.g.
*68) and the whole gene deletion (
*5), are no function alleles. At the moment, it is still difficult to have accurate representation of these alleles defined by complex SV in a VCF. Omission or misrepresentation of these alleles, depending on other alleles, will lead to potential carrier being mistakenly reported as IM or NM.
In the specific case where a sample has the whole gene deletion (
*5) on one CYP2D6 allele and presents variants on the other CYP2D6 allele, these hemizygous variants will be falsely presented as homozygous in the VCF, e.g.
*5/*29 will be detected as
*29/*29 due to this misrepresentation in a VCF file.
We strongly discourage using whole-exome sequencing or low coverage whole-genome sequencing data to call CYP2D6 based on our comparison below.
For demonstration purposes, we are using StellarPGx, which supports calling CYP2D6 with structural variants and CNV from whole genome sequencing (WGS) CRAM/BAM files. StellarPGx calls have been compared against the CDC's GeT-RM dataset which provides consensus reference diplotypes for selected samples. According to the StellarPGx paper, StellarPGx produces results with a higher concordance to GeT-RM than other available CYP2D6 callers. However, this benchmark was performed using GeT-RM's short-read 30x PCR-free WGS dataset. In addition to high-coverage WGS, GeT-RM also provides access to low-coverage WGS and whole exome sequencing (WES) data for some samples. We selected 6 random Get-RM samples and ran StellarPGx on the high-coverage WGS, low-coverage WGS, and WES and compared the results to the GeT-RM ground truth calls.
|Sample||WES||Low-Cov WGS||30x WGS||GeT-RM|
There is drastic variation in the calls produced by the three sequencing technologies. In many cases, the WES or low-coverage WGS input resulted in a "no call" result. In some samples, improbably high copy numbers and calls that varied greatly from the GeT-RM calls were produced. The 30x WGS produced identical or equivalent results most of the time, and when it deviated the differences were relatively minor. In conclusion, we do not recommend the use WES or low-coverage WGS for calling CYP2D6 for both research and clinical purposes.
PharmCAT supports pulling in results from other tools using what we call "outside calls". This allows you to select your preferred CYP2D6 caller and use PharmCAT for the phenotype translation and to include CYP2D6 results in a PharmCAT report.
To include outside calls you need to use the
-po flag in PharmCAT to specify the outside calls file. For example:
# java -jar pharmcat.jar -vcf patient_001.vcf -po patient_001_cyp2d6.txt
patient_001_cyp2d6.txt takes the format of:
CYP2D6 *1x2/[*2 + *6]
For more information:
- see Running PharmCAT for details on the
- see Outside Call Format for details on the outside call file
As mentioned above, if you have whole genome sequencing (WGS) CRAM/BAM files, your best option is a tool like StellarPGx. Please note we are not affiliated with the StellarPGx team and offer no guarantees about its performance. Any questions or concerns on StellarPGx should be directed to the StellarPGx maintainer at [email protected]. Usage instructions for StellarPGx can be found on their GitHub repo.
While this tutorial is StellarPGx specific, it should illustrate how to integrate genotype calls from other tools.
After running StellarPGx with CYP2D6 as the target gene, it should produce a
<run_name>_summary.txt file which looks like:
HG00436 *71/*2x2 HG01086 [*1/*31] Possible novel allele or suballele present: interpret with caution; experimental validation and expert review through PharmVar is recommended HG01190 *5/*68+*4 NA07048 *139/*4 NA18545 *36+*10/*36+*10 NA21105 *111/*3
Column 1 is the sample name and column 2 is the CYP2D6 call. In order to convert this into a PharmCAT-readable format you must split this into a separate file for each sample.
The following is a simple python script that parses StellarPGx's
summary.txt file and outputs PharmCAT-ready outside calls files for use with the
import sys import os input_path = sys.argv # summary.txt file from StellarPGx output_dir = sys.argv # output directory with open(input_path, "r") as infile: entries = infile.readlines() # read all the StellarPGx file lines into a list for entry in entries: # for each sample in the StellarPGx file split = entry.split("\t") # split the lines into columns sample = split # extract the sample name call = split.strip(" \n") # extract the sample call and strip excess spaces and brackets if not os.path.exists(output_dir): # create the output directory if it doesn't exist yet os.makedir(output_dir) with open("%s/%s_cyp2d6.txt" % (output_dir,sample), "w") as outfile: # write the sample to a text file in the specified output directory outfile.write("CYP2D6\t%s" % call)
# python3 stellarPGx_to_PharmCAT.py summary.txt pharmcat_inputs
summary.txt is the StellarPGx output and
pharmcat_inputs is the directory where the call files should be placed. After running this, calling
ls pharmcat_inputs to list the directory contents should show:
# ls pharmcat_inputs HG00436_cyp2d6.txt HG01086_cyp2d6.txt HG01190_cyp2d6.txt NA07048_cyp2d6.txt NA18545_cyp2d6.txt NA21105_cyp2d6.txt
HG00436_cyp2d6.txt would look like
We urge you to manually verify that the calls in the PharmCAT-ready files correspond to the StellarPGx output.
As stated above, we do NOT recommend calling CYP2D6 from VCF due to the large influence of SV and CNV on phenotype prediction.
PharmCAT can, however, call CYP2D6 star alleles that are defined based on SNPs and/or INDELs. If you are willing to accept all these caveats, this functionality is gated behind the
To get PharmCAT to call CYP2D6, use
# java -jar pharmcat.jar -vcf patient_001.vcf -research cyp2d6