We have implemented our method in Python. The Amazon Machine Image (AMI) ami-5b753a6b contains all of the packages needed to run it.
To evaluate the power and Type I error of our test for qualitative interactions, we conducted a simulation study using Amazon EC2. For each of the 12 truths shown below, and for a range of signal-to-noise ratios, we simulated 1,000 data realizations and applied our test to each one. No qualitative interaction is present in the first two rows, so rejecting the null hypothesis under these scenarios would be a Type I error. In the third row a qualitative interaction is present, so rejecting the null hypothesis is the correct decision.
The plots below show the proportion of simulations with a p-value below 0.05 using our method (blue) or using a procedure based on linear regression (orange). In the first two rows this proportion estimates the Type I error rate; in the third row it estimates power. These results suggest that our method appropriately controls Type I error while offering a substantial improvement in power over conventional tests for qualitative interactions based on linear regression.
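The rejection proportions described above can be estimated with a simple Monte Carlo loop. The sketch below is only an illustration of that bookkeeping, not our test: it substitutes a two-sided z-test on a mean as a placeholder, and the names `z_test_p_value` and `rejection_rate`, the sample size, and the effect sizes are all assumptions chosen for the example.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample):
    """Placeholder test (NOT the method described here): two-sided z-test
    of H0: mean = 0, assuming the data have known unit variance."""
    z = fmean(sample) * math.sqrt(len(sample))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(effect, n_obs=50, n_sims=1000, alpha=0.05, seed=0):
    """Proportion of simulated data sets whose p-value falls below alpha.

    With effect == 0 the null is true, so this estimates Type I error;
    with effect != 0 the null is false, so it estimates power.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # One simulated data realization (hypothetical data-generating model).
        sample = [rng.gauss(effect, 1.0) for _ in range(n_obs)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_sims


type_i_error = rejection_rate(effect=0.0)  # null scenario
power = rejection_rate(effect=0.5)         # alternative scenario
print(f"Estimated Type I error: {type_i_error:.3f}")
print(f"Estimated power: {power:.3f}")
```

With 1,000 realizations per scenario, the Monte Carlo standard error of an estimated rejection rate near 0.05 is roughly 0.007, so small deviations from the nominal level are expected.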