Therapeutic Product Immunogenicity Community

  • 1.  Analytical Variability of Reactive ADA Samples

    Posted 12-05-2025 18:46

    I wanted to post this here because, as the paradigm for ADA assays shifts, something has been gnawing at me. I'm fully on board with moving away from the three-tier approach, and nothing I'm raising here is "fixed" by keeping it. But if we're now using language like "ADA are a biomarker," then I think there are analytical implications to that shift that I haven't seen well explained. Thank you to Robert for the recent presentation; it pushed me to try to make my thoughts more coherent.

    Before ADA appears, the only analytical variance we can characterize is assay noise, because we have no relevant surrogate with which to measure the analytical variability of an ADA response. The mixture is too heterogeneous to represent meaningfully until it exists in a study sample. But once ADA appears, the assay is measuring an analyte; a mixture, yes (IgM, IgG, low affinity, high affinity), but that mixture is the analyte.

    At that point the meaningful question becomes: how consistently can we measure that analyte?

    I've heard the argument that because the analyte mixture may shift from day to day or timepoint to timepoint, that variability should be treated as biological. But that doesn't sit well with me if we are treating ADA assays like biomarker assays. With other biomarkers, we do not assume precision; we measure it using endogenous samples because they contain the analyte in the true matrix. ADA assays seem to be the only place where we go straight from "the signal is above assay noise" to "this might be biologically meaningful," without ever asking whether the measurement itself is stable enough to make that determination.

    Consider a simple hypothetical. Take one ADA positive sample, split it into identical aliquots, freeze them, and run them on different days. You might get S:N values of 5, 7, 10, 6, 9. The biology is unchanged, same antibodies, same complexes, yet the measured ADA response swings twofold simply because the assay was run on a different day. To me, that level of variation is potentially meaningful. If modelers receive a data transfer with an S:N of 5 vs 10, they may arrive at different conclusions as to whether the response is clinically meaningful. It is also important to point out that those divergent interpretations come entirely from analytical variation, repeated measures of the same sample.
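
    To make the aliquot thought experiment concrete, here is a toy simulation; the lognormal run-to-run noise model and the 25% CV are my assumptions, and the numbers are purely illustrative, not from any real assay:

```python
import random
import statistics

random.seed(7)

TRUE_SN = 7.0   # hypothetical "true" S/N of the aliquoted ADA-positive sample
RUN_SIGMA = 0.25  # assumed run-to-run analytical noise (SD of the log signal)

def run_assay(true_sn: float, sigma: float) -> float:
    """One assay run: true S/N perturbed by multiplicative lognormal noise."""
    return true_sn * random.lognormvariate(0.0, sigma)

# Same frozen aliquot, five different assay days
results = [run_assay(TRUE_SN, RUN_SIGMA) for _ in range(5)]
print("S/N across days:", [round(r, 1) for r in results])
print(f"spread: {max(results) / min(results):.1f}-fold, "
      f"CV: {100 * statistics.stdev(results) / statistics.mean(results):.0f}%")
```

    Even with the biology held perfectly constant, a modest analytical CV can produce day-to-day S:N values far enough apart to change an interpretation.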

    So if ADA is a biomarker and S:N is the quantity we intend to report, the three-tier approach and cut points are gone, but analytical rigor doesn't go away; it shifts. It is worth asking whether we are now responsible for understanding the uncertainty around the S:N measurement itself. If a sample is above assay noise, do we need to re-analyze it multiple times to understand the analytical variation around that sample before drawing PK, PD, or clinical conclusions? That is exactly how other ligand binding biomarker assays are handled: precision is established against the real analyte (endogenous) in the real matrix. Why would ADA be the exception? Because the analyte mixture changes from time point to time point? Because it's harder?

    If we don't want to go to the level of reporting a CV or an SD from n=6 replicates of every sample above the assay noise, I think the median of an n=3 would be a far more accurate S:N measurement, on average, than simply going with the S:N we received in the first test.
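
    A quick way to sanity-check the median-of-three idea is a simulation; the lognormal noise model and the CV below are assumptions, not validated figures:

```python
import random
import statistics

random.seed(1)

TRUE_SN = 8.0     # hypothetical true S/N of a reactive sample
RUN_SIGMA = 0.25  # assumed run-to-run analytical noise (SD of the log signal)
TRIALS = 10_000

def one_run() -> float:
    return TRUE_SN * random.lognormvariate(0.0, RUN_SIGMA)

# Mean absolute error of a single measurement vs. the median of three
err_single = statistics.mean(abs(one_run() - TRUE_SN) for _ in range(TRIALS))
err_median3 = statistics.mean(
    abs(statistics.median([one_run() for _ in range(3)]) - TRUE_SN)
    for _ in range(TRIALS)
)
print(f"mean |error|, single run : {err_single:.2f}")
print(f"mean |error|, median n=3 : {err_median3:.2f}")
```

    In this toy model the median of three lands noticeably closer to the true S:N, on average, than a single measurement, which is the intuition behind the proposal.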

    And just to be clear, the goal here is not to do extra work, but logically that seems to be the direction it is heading. Thoughts?



    ------------------------------
    Jason Delcarpini
    Director
    Moderna
    Cambridge MA
    [email protected]

    Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.
    ------------------------------


  • 2.  RE: Analytical Variability of Reactive ADA Samples

    Community Leadership
    Posted 12-09-2025 11:32

    Hi Jason,

    Thanks for raising a number of questions that I'm sure are on people's minds.  I have long advocated the concept that immunogenicity is a biomarker, so have thought a fair bit about these points.  

    I agree that we have no good way to measure the analytical variability of the true analyte in pre-study validation.  We can, however, analytically characterize assay performance with regard to the PC, imperfect as that is. It at least tells you the analytical variability associated with measuring the same sample multiple times.  That is a starting place.  The power of using all the S/N data from all time points (not invoking a cut point) is that full subject/patient profiles provide a much more comprehensive view of immunogenicity development over time.  Additionally, placebo 'profiles' provide insight into the biological variability, cross-sectionally and longitudinally, in the study population over the duration of the study, allowing for a better understanding of what magnitude of responses exceed the analytical and biological variability seen in the untreated population.

    Your proposed hypothetical situation does not align with the real data sets I have evaluated.  When the assay is analytically robust, as determined with PCs, I have observed similar analytical precision with study samples.  Most S/N repeat measure analyses with study samples that I have undertaken report CVs <10%.  In pre-study validation, you can use your PC precision as a starting place, and then in-study you can use your Pbo samples as a guide to incorporating biological variability.  At the end of the day, what we are looking for is clinically relevant immunogenicity, and in every post-hoc case I've examined to date, complete S/N profiles from the screening assay have provided a clearer, more granular view of the development of ADA and differentiated meaningful responses from biological variability as well as from real, but low-level, clinically irrelevant responses.  Additionally, unlike other biomarkers, clinically relevant immunogenicity appears, increases, and persists; these profiles are easily discerned from transient profiles and biological noise.

    I'd approach each assay considering context of use, demonstrate pre-study analytical precision to a degree that will meet that context, and then carefully evaluate my in-study data to assure myself the assay is meeting the needs of its intended use.  The 3-tiered paradigm already uses S/N to declare a sample 'positive', and then we are happy to report titers that are inherently imprecise, which many believe is precise enough for most immunogenicity analyses.  The good news here is that even as currently designed, the screening assays perform well for S/N, and if we take the opportunity to look at the full data sets they offer, we already gain more insight.  This is likely because we already develop these assays to meet S/N precision requirements for the PC at multiple levels.



    ------------------------------
    Lauren Stevenson Ph.D.
    Chief Scientific Officer
    Immunologix Laboratories
    Tampa FL
    [email protected]

    Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.
    ------------------------------



  • 3.  RE: Analytical Variability of Reactive ADA Samples

    Posted 12-09-2025 12:03
    Jason,
    A few thoughts on your discussion points:
    1.  ADA assays are not biomarker assays, nor are they treated that way for validation.  For biomarkers you actually have relevant reference material, albeit recombinant; this is one of the purposes of the parallelism experiment, to see whether the endogenous material behaves the same in the assay as your standard (an experiment not done in ADA validation).  As you point out clearly, ADAs have only a surrogate positive control (not referred to as "reference" material), and subject ADA does not remain constant over time anyway.
    2. One performance metric for ADA assays is sample reproducibility.  That is tested by plotting repeat runs of screening cut-point datasets against each other; a nice regression fit, particularly the same rank order of subject responses, tells you that you'll get the same responses on repeat testing.  This is important because typically a sample is run once, and it's nice to know you can rely on that data.  If your repeat sample run is off by 2x, I'd be worried about the assay.
    3. Now I confess that I have yet to read Robert's paper, but I think I understand the concern (please forgive if I got this wrong):  a cutpoint (CP) based on population variability will have the following issue (assuming the distribution is representative of the study population):  samples with low backgrounds will require relatively more ADA to go above the CP than those with higher backgrounds.  This is something I suspect most folks who have familiarity with these assays understand.  The question is whether it really matters in terms of detecting impactful ADA.  I suspect not because low ADA levels are rarely impactful (the same point can be made for ADA below the CP, which surely are there but not detectable).
    4. But let's assume it's important to even up the playing field, so to speak.  S:N seems a reasonable approach (again, I don't know the specifics of Robert's proposal, so the following may be included therein), but it brings in another source of variability apart from run-to-run of the same sample, i.e., longitudinal variability of sample background; in other words, multiple baseline samples from the same subject.  I've had occasion to assess this for a molecule that had a huge prevalence of pre-existing ADA (that's another story), and it's not trivial variability.  This value is the "N", which surely must be known as accurately as possible to get a reliable S:N value upon which to make a +/- decision.  So, how many of those samples over what stretch of time are needed?  How operationally difficult will that be?  Will disease state affect that?  Will response to drug (efficacy) affect that?
    Bottom line:  I think the nature of ADA and our approach to measuring them has inherent difficulties and the question of whether we are missing clinically relevant ADA using the standard CP approach or a S:N approach will in some sense remain.  In my experience, assay responses for clinically impactful ADA are generally not ambiguous.
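
    The reproducibility check described in point 2 can be sketched as follows; the two runs of S/N values are made up for illustration, and ties are ignored for brevity:

```python
def spearman_no_ties(x, y):
    """Spearman rank correlation via the d-squared formula (assumes no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for pos, idx in enumerate(order, start=1):
            r[idx] = pos
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical S/N values from two runs of the same cut-point panel
run1 = [0.8, 1.1, 0.9, 1.4, 2.5, 1.0, 3.2, 1.2]
run2 = [0.9, 1.0, 0.85, 1.5, 2.3, 1.1, 3.0, 1.3]

rho = spearman_no_ties(run1, run2)
worst_fold = max(max(a, b) / min(a, b) for a, b in zip(run1, run2))
print(f"Spearman rho = {rho:.2f}")  # close to 1 => same rank order on repeat
print(f"worst run-to-run fold difference = {worst_fold:.2f}")
```

    A rho near 1 and no sample off by anything like 2x between runs is the picture you'd want before relying on single-run results.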






  • 4.  RE: Analytical Variability of Reactive ADA Samples

    Posted 12-10-2025 12:43

    I'm very glad that we are having this discussion, and I hope more people join and that it spills to other fora. There is so much to unpack here that I don't know where to start and if I can address everything.

    • Does the current tiered strategy result in neglecting any clinically relevant ADA responses? No, it does not. Does it generate results that have no correlation with clinical observations? Definitively yes, and we expend a lot of resources to generate these meaningless results. We should strive to get better data with less work. Call me lazy, but I think it's smart. I often hear the argument that we need a highly conservative tiered approach for high-risk molecules. I don't think "Look busy! It's a high-risk molecule!" is a good bioanalytical strategy. For high-risk drugs we need to do the right experiments, not just more of them.
    • We have been collecting inter-assay precision data for actual ADA from real patients for many years now; for every ADA-positive sample we have three results: screening S/N, confirmatory (no drug) S/N, and S/N for the undiluted sample in the titer assay. I encourage people to look at and share these results; this is the best characterization of inter-assay precision at different levels of the "real" analyte we can ever hope for.
    • Whenever I looked at these sets of three results, the precision was either very good or, when it wasn't, very bad. If the assay couldn't reproduce signal between the screening, confirmatory, and titer tiers, it was typically related to the ruthenylated drug and could be fixed by preparing a new batch of reagents.
    • Note that only ADA-positive samples are tested three times to generate a positive classification. A negative classification is generated from a single measurement which could be an additional source of false negatives. Even when the assay is running with good precision, there is a non-zero probability that a high ADA sample is going to fall below the cut point occasionally. The answer to this problem is not to test every sample three times, but to follow all samples without discarding any in multiple testing tiers.
    • Typical assay acceptance criteria are designed to avoid false negatives but are not well suited for monitoring assay precision. It's normal for signal to bounce up and down, and we can define the acceptable limits of these changes. We sort of do this for PK assays when we perform ISR: ≥ 66.7% of samples should have a %difference within ±30%. Something similar could be applied to responses of positive controls across multiple assays.
    • We should be able to define a minimum level of change between subsequent time points that is larger than normal analytical noise. For S/N, the minimum statistically significant change should come out at 2-fold or less, even for assays that are not very precise but still acceptable. (BTW, I obtained a similar result using the MSR approach, but I find MSR a pain to work with for an uneducated statistician like me.) Once we identify these statistically significant changes, we can start interpreting what they mean from the immunogenicity standpoint.
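
    The 2-fold figure can be checked against positive-control repeats with the minimum significant ratio, MSR = 10^(2·√2·s), where s is the SD of the log10 signals; the repeat S/N values below are made up for illustration:

```python
import math
import statistics

# Hypothetical repeat S/N measurements of the same positive control
pc_sn = [6.1, 5.4, 7.0, 5.8, 6.6, 5.2, 6.9, 6.3]

s = statistics.stdev(math.log10(v) for v in pc_sn)  # SD of log10(S/N)

# Smallest fold-change between two single measurements that is
# statistically significant at roughly the 95% level
msr = 10 ** (2 * math.sqrt(2) * s)
print(f"log10 SD = {s:.3f}")
print(f"MSR = {msr:.2f}-fold")
print("2-fold change exceeds analytical noise?", 2.0 > msr)
```

    With these illustrative numbers the MSR comes out well under 2, consistent with the point that a 2-fold change between time points should be distinguishable from analytical noise for a reasonably precise assay.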

    Anyway, lots to think about but I'm glad we are thinking about it.

    Cheers!



    ------------------------------
    Robert J. Kubiak, PhD
    Director, Head of Bioanalytical Science
    Third Arc Bio
    [email protected]

    Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.
    ------------------------------