Our goal with this project is to develop a high-throughput screening method for finding the features that matter most when building response-predictive biomarkers. More specifically, we consider a massive set of pairs of univariate models: for each biomolecular feature, we separately estimate a non-linear relationship with outcome under treatment and under control. We are interested in features that can define patient sub-populations with better expected outcome on treatment than on control (i.e., features for which the treatment-response curve lies above the control-response curve at some point). For the vast majority of treatments, most patients do not benefit (and the treatments are toxic), so for all of our features we would expect the treatment-response curve to lie mostly below the control-response curve. Thus we can reframe what we are searching for: features for which the treatment-response and control-response curves cross.

This crossing behaviour is known as a qualitative interaction. If one curve instead lies everywhere above the other, then the feature is useless for choosing between treatments (everyone should get the better one). The search for treatment-informing features can therefore be thought of as a search for qualitative interactions. Most commonly used methods instead look for general interactions: shape differences (rather than just intercept differences) between the treatment-response and control-response curves. In particular, most assume linearity and look for a slope difference. Having an interaction is a prerequisite for having a qualitative interaction; however, shape differences that do not result in crossing generally cannot be used to inform treatment decisions.
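
The distinction between the three cases can be made concrete with a small sketch. The function name `classify_interaction`, the tolerance, and the straight-line example curves are all hypothetical and chosen only for illustration; higher outcome is taken to be better, as in the text above.

```python
def classify_interaction(control, treatment, grid, tol=1e-9):
    """Label a pair of response curves evaluated on a common grid of feature values."""
    diff = [treatment(x) - control(x) for x in grid]
    if max(diff) - min(diff) <= tol:
        # The curves differ by at most a constant shift (intercept only).
        return "no interaction"
    if min(diff) < -tol and max(diff) > tol:
        # Treatment is better for some feature values and worse for others.
        return "qualitative interaction"
    # The shapes differ, but one arm is better over the whole range.
    return "non-crossing interaction"


def control(x):
    return 1.0


grid = [i / 200 for i in range(201)]  # feature values in [0, 1]

shift_only = classify_interaction(control, lambda x: 0.5, grid)
slope_only = classify_interaction(control, lambda x: 0.2 + 0.5 * x, grid)
crossing = classify_interaction(control, lambda x: 0.2 + 1.5 * x, grid)
print(shift_only, slope_only, crossing, sep=" | ")
```

Only the third pair of curves would identify a sub-population (here, patients with large feature values) that should receive the treatment; the second pair has a slope difference that a conventional interaction test could detect but that does not change the treatment decision.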

Our proposal combines convex optimization and statistics. We translate our search into a set of convex optimization problems that data-adaptively estimate the control and treatment groups' underlying response trends over the range of each candidate feature. We then compare the goodness-of-fit when qualitative interactions are allowed to the goodness-of-fit under the null hypothesis, in which the treatment-response curve is restricted to lie everywhere below the control-response curve. We assess significance using permutations.
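
A deliberately simplified sketch of this comparison is given below. It is not the estimator proposed here: it assumes piecewise-constant curves on equal-width bins of the feature, for which the constrained least-squares problem separates by bin and has a closed-form solution (if the treated mean exceeds the control mean, both levels are set to the pooled mean). The function names, the bin count, and the use of treatment-label permutation are all assumptions made for illustration; the text above does not specify the permutation scheme.

```python
import random
from statistics import fmean


def sse(values, level):
    """Sum of squared errors of a constant fit."""
    return sum((v - level) ** 2 for v in values)


def bin_gain(y_c, y_t):
    """Null-constrained SSE (treatment <= control) minus unrestricted SSE, for one bin."""
    m_c, m_t = fmean(y_c), fmean(y_t)
    if m_t <= m_c:
        return 0.0  # constraint inactive: the two fits coincide
    pooled = fmean(y_c + y_t)  # constraint active: both arms share one level
    return sse(y_c, pooled) + sse(y_t, pooled) - sse(y_c, m_c) - sse(y_t, m_t)


def crossing_statistic(x, y, treated, n_bins=5):
    """Loss in fit from forcing the treatment curve to lie at or below control."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    arms = [([], []) for _ in range(n_bins)]  # per bin: (control, treated) outcomes
    for xi, yi, ti in zip(x, y, treated):
        b = min(int((xi - lo) / width), n_bins - 1)
        arms[b][1 if ti else 0].append(yi)
    # Bins that lack one of the two arms carry no information and are skipped.
    return sum(bin_gain(y_c, y_t) for y_c, y_t in arms if y_c and y_t)


def permutation_p_value(x, y, treated, n_perm=200, seed=0):
    """Assumed scheme: shuffle treatment labels and recompute the statistic."""
    rng = random.Random(seed)
    observed = crossing_statistic(x, y, treated)
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        exceed += crossing_statistic(x, y, labels) >= observed
    return (1 + exceed) / (1 + n_perm)


# Hypothetical data: control is flat at 0; treatment follows 4x - 3,
# which is below control for x < 0.75 and above it for x > 0.75.
rng = random.Random(1)
x = [rng.random() for _ in range(400)]
treated = [rng.random() < 0.5 for _ in x]
y = [(4 * xi - 3 if ti else 0.0) + rng.gauss(0, 0.3) for xi, ti in zip(x, treated)]

print(crossing_statistic(x, y, treated), permutation_p_value(x, y, treated))
```

The statistic is zero whenever the unrestricted fit already satisfies the null constraint, and grows with the amount by which the treatment curve must be pulled down to satisfy it. Shuffling labels targets the boundary case in which the two arms are identical, so a large main effect of treatment inflates the permutation distribution; a more careful scheme may be needed in practice.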

The plots below show example solutions to our optimization problems. We simulated noisy data from two crossing sinusoidal functions, representing a true qualitative interaction between the control (blue) and treatment (orange) groups. The true curves are shown with dashed lines and the fitted values with solid lines.
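
Data of this kind could be simulated along the following lines. The specific sinusoids, noise level, sample size, and function names are assumptions for illustration; the exact functions behind the plots are not stated here.

```python
import math
import random


def control_curve(x):
    # Hypothetical true control-response curve.
    return 0.5 * math.sin(2 * math.pi * x)


def treatment_curve(x):
    # Hypothetical true treatment-response curve; it crosses the control curve.
    return 0.5 * math.cos(2 * math.pi * x) - 0.3


def simulate(n=200, noise_sd=0.3, seed=0):
    """Return n rows of (feature value, treated flag, noisy outcome)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()              # feature value, uniform on [0, 1)
        treated = rng.random() < 0.5  # randomized 1:1 assignment
        mean = treatment_curve(x) if treated else control_curve(x)
        rows.append((x, treated, mean + rng.gauss(0.0, noise_sd)))
    return rows


rows = simulate()

# Confirm that the true curves cross somewhere on the feature range.
grid = [i / 100 for i in range(101)]
diffs = [treatment_curve(g) - control_curve(g) for g in grid]
truly_crossing = min(diffs) < 0 < max(diffs)
print(len(rows), truly_crossing)
```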

One major strength of this approach is that we do not need to assume linearity or any other simple parametric form for our response curves; such forms likely do not reflect reality. We can instead flexibly estimate these curves from our data. Some possibilities are shown in the following table.