Named Allele Matcher 201
This document covers more advanced details of the Named Allele Matcher.
This is really only applicable if you are defining your own named allele definitions.
Scoring
The score for a given named allele is determined by the number of positions at which alleles have been provided.
Example:
| | rs1 | rs2 | rs3 | rs4 | rs5 | score |
|---|---|---|---|---|---|---|
| *1 | C | C | T | G | A | 5 |
| *2 | T | T | | A | | 3 |
The default named allele definitions are designed to assume that missing alleles are the same as the reference named allele, which is defined as the first named allele in the definition file.
It is, however, possible to increase the score of a named allele by explicitly specifying the reference allele at positions it would otherwise leave blank. For example, this gene definition table is effectively identical to the one above, but *2 has a different score.
| | rs1 | rs2 | rs3 | rs4 | rs5 | score |
|---|---|---|---|---|---|---|
| *1 | C | C | T | G | A | 5 |
| *2 | T | T | T | A | A | 5 |
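The scoring rule can be illustrated with a minimal sketch. The in-memory layout below (lists with `None` for unspecified positions) and the function names are assumptions made for illustration; they are not PharmCAT's actual data model.

```python
# Hypothetical in-memory form of the two example tables, in the order
# rs1..rs5. None marks a position where the named allele specifies nothing.
implicit_reference = {
    "*1": ["C", "C", "T", "G", "A"],
    "*2": ["T", "T", None, "A", None],
}
explicit_reference = {
    "*1": ["C", "C", "T", "G", "A"],
    "*2": ["T", "T", "T", "A", "A"],
}


def score(alleles):
    """Count the positions at which an allele has been provided."""
    return sum(1 for allele in alleles if allele is not None)


def fill_from_reference(alleles, reference):
    """Return the sequence implied once blanks take the reference allele."""
    return [ref if allele is None else allele
            for allele, ref in zip(alleles, reference)]


for name in ("*1", "*2"):
    print(name, score(implicit_reference[name]), score(explicit_reference[name]))
```

Both tables describe the same *2 sequence once blanks are filled from *1, but only the explicit form counts all five positions toward the score.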
Exemptions
The exemptions file, src/main/resources/org/pharmgkb/pharmcat/definition/alleles/exemptions.json, gives you a way to modify the behavior of the Named Allele Matcher.
Ignoring Named Alleles
If you are designing your own named allele definitions, you might need to define a named allele but not want it to be considered by the Named Allele Matcher.
You can add an exemption for this using ignoredAlleles and ignoredAllelesLc (the latter is just a lower-cased collection of the former).
{
  "gene": "XXX",
  "ignoredAlleles": [
    "*1S"
  ],
  "ignoredAllelesLc": [
    "*1s"
  ]
}
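Because ignoredAllelesLc is just the lower-cased form of ignoredAlleles, it can be derived rather than typed by hand. The sketch below shows one way to do that; `make_exemption` and `is_ignored` are hypothetical helpers, and the case-insensitive lookup is an assumption about how the lower-cased list is used.

```python
import json


def make_exemption(gene, ignored_alleles):
    """Build an exemption entry, deriving ignoredAllelesLc from ignoredAlleles."""
    return {
        "gene": gene,
        "ignoredAlleles": list(ignored_alleles),
        "ignoredAllelesLc": [name.lower() for name in ignored_alleles],
    }


def is_ignored(exemption, allele_name):
    """Assumed lookup: compare the lower-cased name against ignoredAllelesLc."""
    return allele_name.lower() in exemption["ignoredAllelesLc"]


exemption = make_exemption("XXX", ["*1S"])
print(json.dumps(exemption, indent=2))
```

Generating both fields from one list keeps them from drifting apart when alleles are added later.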
Combinations and Partial Alleles
Calling combination and partial alleles is intended for research use only.
A combination allele is called when a sample matches a combination of 2 or more defined alleles, for example [*6 + *14] in the CYP2B6 [*6 + *14]/*13 diplotype output.
PharmCAT's syntax for combination calls uses square brackets to reflect that it is a variation on one gene copy and to distinguish it from gene duplications (e.g. tandem arrangements like CYP2D6 *36+*10).
A partial allele is called when a sample matches all the (core) variants of a defined allele but also has additional variants, for example CYP2C19 *2/[*17 + g.94781859G>A]. When a partial call occurs off the reference allele, only the positions are listed (e.g. *2/g.94781859G>A).
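The naming conventions above can be summarized in a short sketch. The function names and parameters are hypothetical, and bracketing several variants off the reference allele is an assumption, since the text only shows the single-variant case.

```python
def format_haplotype(named_alleles, extra_variants=(), off_reference=False):
    """Render one gene copy in the bracket syntax described above.

    named_alleles:  defined alleles matched on this copy, e.g. ["*6", "*14"]
    extra_variants: additional variants beyond the matched alleles (partial)
    off_reference:  True when the partial sits on the reference allele, in
                    which case only the variants are listed
    """
    if off_reference:
        parts = list(extra_variants)
    else:
        parts = list(named_alleles) + list(extra_variants)
    if len(parts) == 1:
        return parts[0]
    # Assumption: more than one component is always wrapped in brackets.
    return "[" + " + ".join(parts) + "]"


def format_diplotype(gene, copy1, copy2):
    """Join two gene copies into a diplotype string."""
    return f"{gene} {copy1}/{copy2}"


print(format_diplotype("CYP2B6", format_haplotype(["*6", "*14"]), "*13"))
print(format_diplotype("CYP2C19", "*2",
                       format_haplotype(["*17"], ["g.94781859G>A"])))
```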
When asked to find combination and partial alleles, the Named Allele Matcher will only attempt to do so if no viable call can otherwise be made.
The Named Allele Matcher will only look for variant combinations not catalogued by PharmVar or other nomenclature sites. It does not consider novel variants; it only considers variants that are already part of existing allele definitions but occur in novel combinations.
Phased vs. Unphased Data
When dealing with unphased data, note that the Named Allele Matcher will never produce a combination/partial allele call if there is a viable non-combination/partial call. For example, if the sample would be called CYP2B6 *1/[*8 + *9] with phased data, the Named Allele Matcher will only call *8/*9 if the data is unphased because that is a viable call. It will not attempt to look for potential combination/partial alleles.
In addition, to limit the potential search space, a partial off the reference allele will only be called if the data is phased or the unphased data only has 2 possible sequence combinations.
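The *1/[*8 + *9] versus *8/*9 example can be reproduced with a toy model. Everything below is a simplified sketch: the two-position gene, its alleles, and all function names are invented for illustration and do not reflect real CYP2B6 definitions or PharmCAT's actual algorithm.

```python
from itertools import product

# Invented two-position toy gene; these alleles are not real definitions.
REFERENCE = ("A", "G")
DEFINITIONS = {"*1": ("A", "G"), "*8": ("T", "G"), "*9": ("A", "C")}


def name_sequence(seq):
    """Name one gene copy; returns (name, is_combination)."""
    for name, definition in DEFINITIONS.items():
        if seq == definition:
            return name, False
    # Otherwise list every variant allele whose non-reference positions match.
    parts = [
        name
        for name, definition in DEFINITIONS.items()
        if definition != REFERENCE
        and all(d == r or d == s for d, r, s in zip(definition, REFERENCE, seq))
    ]
    return "[" + " + ".join(parts) + "]", True


def possible_pairs(genotypes, phased):
    """Yield the pairs of sequences consistent with per-position genotypes."""
    if phased:
        # Phased: the first allele of every genotype sits on the same strand.
        yield tuple(g[0] for g in genotypes), tuple(g[1] for g in genotypes)
        return
    seen = set()
    for strand1 in product(*genotypes):
        # The second strand carries whichever allele the first did not take.
        strand2 = tuple(b if s == a else a
                        for s, (a, b) in zip(strand1, genotypes))
        key = frozenset([strand1, strand2])
        if key not in seen:
            seen.add(key)
            yield strand1, strand2


def call(genotypes, phased):
    """Return diplotype calls; combination calls are only a fallback."""
    plain, combos = [], []
    for strand1, strand2 in possible_pairs(genotypes, phased):
        name1, combo1 = name_sequence(strand1)
        name2, combo2 = name_sequence(strand2)
        diplotype = "/".join(sorted([name1, name2]))
        (combos if combo1 or combo2 else plain).append(diplotype)
    return plain or combos


sample = [("A", "T"), ("G", "C")]
print(call(sample, phased=True))
print(call(sample, phased=False))
```

With phased input only one pairing exists, so the combination call is the only option. With unphased input the alternative pairing yields a plain call, which takes precedence.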
Scoring
Because PharmCAT scores on the number of matched positions in the definitions, the reference named allele (usually *1) will get the highest score. As such, scoring is biased towards grouping combinations together. For example, CYP2B6 *1/[*5 + *9 + *23] will be the call with the highest score but permutations such as *5/[*9 + *23], *9/[*5 + *23], *23/[*5 + *9] are also possible.
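The bias towards grouping can be seen in a small sketch. The scores are hypothetical (a reference allele specifying 10 positions, each variant allele specifying 1), and the rule that a combination scores the sum of its components is an assumption made for illustration, not a statement of how PharmCAT computes it.

```python
from itertools import combinations

# Hypothetical scores: the reference specifies every position in a toy gene,
# while each variant allele specifies a single position.
SCORES = {"*1": 10, "*5": 1, "*9": 1, "*23": 1}
VARIANTS = ["*5", "*9", "*23"]


def copy_name(alleles):
    """Name one gene copy; an empty copy is the reference allele."""
    if not alleles:
        return "*1"
    if len(alleles) == 1:
        return alleles[0]
    return "[" + " + ".join(alleles) + "]"


def copy_score(alleles):
    """Assumed rule: a combination scores the sum of its component alleles."""
    return sum(SCORES[a] for a in alleles) if alleles else SCORES["*1"]


def ranked_diplotypes():
    """Score every way of splitting the variants across the two gene copies."""
    results = set()
    for size in range(len(VARIANTS) + 1):
        for first in combinations(VARIANTS, size):
            second = tuple(v for v in VARIANTS if v not in first)
            small, large = sorted([first, second], key=len)
            name = f"{copy_name(small)}/{copy_name(large)}"
            results.add((copy_score(small) + copy_score(large), name))
    return sorted(results, reverse=True)


for total, name in ranked_diplotypes():
    print(total, name)
```

Under these assumptions, the call that keeps one copy at the reference outranks every split that spreads the variants across both copies.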
Undocumented Variations
By default, only genetic variations that are defined in the allele definitions can be mapped to genotypes by the Named Allele Matcher. If the sample includes a variant call that is located at an allele-defining position but is not itself included in the allele definitions, the Named Allele Matcher produces a "Not called" for the affected gene since the sample matches neither the reference nor any defined variant.
A "Not called" output cannot be connected to guideline recommendations, even if the sample has other defined, actionable variants. To still provide recommendation guidance, these undocumented variants are set to reference for genes in which the defined variants affect drug toxicity. This applies to:
- CACNA1S
- DPYD
- G6PD
- NUDT15
- RYR1
- TPMT
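The handling described above can be sketched as follows. The gene list comes from this document; the function name, its parameters, and the per-position structure are hypothetical and only illustrate the decision, not PharmCAT's implementation.

```python
# Genes for which undocumented variants are set to reference (from the list above).
TREAT_UNDOCUMENTED_AS_REFERENCE = {
    "CACNA1S", "DPYD", "G6PD", "NUDT15", "RYR1", "TPMT",
}


def resolve_alleles(gene, sample_alleles, documented_alleles, reference_allele):
    """Handle the sample alleles at one allele-defining position.

    Returns the alleles to pass on to matching, or None when the gene
    would be reported as "Not called".
    """
    resolved = []
    for allele in sample_alleles:
        if allele in documented_alleles:
            resolved.append(allele)
        elif gene in TREAT_UNDOCUMENTED_AS_REFERENCE:
            # Undocumented variant: treat it as the reference allele.
            resolved.append(reference_allele)
        else:
            return None
    return resolved


print(resolve_alleles("TPMT", ["C", "G"], {"C", "T"}, "C"))
print(resolve_alleles("CYP2C19", ["C", "G"], {"C", "T"}, "C"))
```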