Reflections on Publishing a Negative Study
William Hersh, MD, Professor and Chair, OHSU
Blog: Informatics Professor
Twitter: @williamhersh
Although scientists are supposed to be disinterested observers of the results of their work, the reality is that none of us really want to see a negative result from our research. This is especially the case in fields like biomedical and health informatics, where we aim to develop tools that help patients and/or clinicians with their health or with the delivery of care. Nonetheless, it is critically important to publish such outcomes, especially when the research was methodologically sound, and all the more so for currently hyped technologies such as machine learning.
One of the challenges with negative studies is the myriad of potential reasons they may have failed, which can make the true reason hard to pinpoint. Was it poor methods, inadequate data, or misapplication of the technology? Or more than one of these? But especially for studies involving human subjects who may be exposing themselves to risk, we must always remember the words of noted physician-scientist Iain Chalmers, who stated that failure to publish studies with patients is a form of scientific misconduct.(1)
A couple of years ago, my colleague Aaron Cohen and I received funding from a pharmaceutical company that had developed a highly effective new treatment for a rare disease, acute intermittent porphyria (AIP).(2) The new treatment, although expensive, significantly reduces the severe and sometimes disabling manifestations of AIP. As such, tools that identify patients who might have the disease could significantly improve their lives. Our approach used patient data from the electronic health record (EHR).
We were thrilled when our machine learning model identified a group of patients with the classic presentation of AIP for whom the diagnosis had never been considered.(3) AIP is known to be one of those rare diseases that often goes undiagnosed for long periods of time, and diagnosis of such rare diseases is considered a major use case for machine learning in healthcare. From a collection of over 200,000 patient records in the research data warehouse at our institution, we manually reviewed the top 100 patients ranked by the model. Twenty-two of these patients were determined to have the manifestations of AIP but with no occurrence of the string “porph” anywhere in their record, i.e., not in lab tests, notes, or diagnoses. Because the test to diagnose AIP is a relatively simple and inexpensive urine porphobilinogen test, we developed a clinical protocol to invite such patients for testing.
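To make the screening step concrete, here is a minimal sketch of how model scores can be combined with a simple text exclusion to surface candidate patients whose records never mention porphyria. This is not our actual pipeline; all field names, data structures, and the scoring function are hypothetical placeholders, and the real study used a trained classifier over an institutional research data warehouse.

```python
# Minimal, hypothetical sketch of the candidate-screening step described above.
# The field names and the scoring function are placeholders, not the study's code.

def record_text(record: dict) -> str:
    """Concatenate the free-text and coded portions of a patient record."""
    parts = record.get("notes", []) + record.get("labs", []) + record.get("diagnoses", [])
    return " ".join(parts).lower()

def screen_candidates(records: list, score_fn, top_n: int = 100) -> list:
    """Rank patients by model score, keep the top_n, and drop any patient whose
    record already contains the string 'porph' (i.e., porphyria was considered)."""
    ranked = sorted(records, key=score_fn, reverse=True)
    return [r for r in ranked[:top_n] if "porph" not in record_text(r)]

if __name__ == "__main__":
    # Toy example: a placeholder score stands in for a trained model's output.
    toy_records = [
        {"id": 1, "notes": ["recurrent abdominal pain"], "labs": [], "diagnoses": ["hyponatremia"]},
        {"id": 2, "notes": ["urine porphobilinogen elevated"], "labs": [], "diagnoses": ["porphyria"]},
    ]
    toy_score = lambda r: len(record_text(r))  # placeholder for a model score
    print([r["id"] for r in screen_candidates(toy_records, toy_score, top_n=2)])
```

In this toy run, patient 2 is excluded because the record already mentions porphyria; the actual study's manual review of the top 100 ranked patients served the same purpose of finding patients in whom the diagnosis had never been raised.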
We encountered a number of challenges in developing and implementing the clinical protocol. For example, what is the role of the patient’s primary care or other provider in the decision to offer testing? Our institutional review board (IRB) determined that the process should begin with a decision by the provider, who would then allow us to contact the patient to offer testing. Once the provider gave approval, we then had to convince the patients, many of them skeptical of a healthcare system that had long been unable to diagnose their symptoms, to come in for testing. Another challenge was that the protocol took place during the COVID-19 pandemic, when many people wanted to avoid healthcare settings, although by the time we were recruiting patients, COVID-19 vaccines had started to become available.
As a result, we could convince only 7 of the 22 patients to undergo the simple and free test we were offering them. And as noted in the paper, none of them were positive.(4) Because these patients did submit to a clinical protocol, and because we believe that any attempt to clinically validate machine learning is important, we decided to submit our results to a journal as a Brief Communication, i.e., not a definitive study but one clearly worth reporting. The report has now been published.
Clearly there are many possible reasons for our failure to diagnose any new cases of AIP. Not the least of these is the fact that AIP is a rare disease, and there may have been few if any new cases to be found. Certainly, with more resources or time, we could have extended testing invitations beyond the first 100 cases we evaluated to additional patients who might also have had the classic presentation without the diagnosis ever being considered. It is also possible that our machine learning model could have been better at identifying true cases of the disease. And there may have been confounding in that the patients who actually came in for testing were not those most likely to have the diagnosis.
Could this study have benefited from proactive genotyping? There may be a role for gene sequencing in this situation, but AIP is a condition of incomplete penetrance, i.e., having the pathogenic genotype does not mean one will develop the manifestations of the disease. In other words, genotyping alone would not be enough.
My main hope is that this contribution shows that we should be pursuing studies that aim to apply machine learning in real-world clinical settings. As one who has been critical that machine learning has not been “translational” enough, I hope there will be a role for larger and more comprehensive studies of the many applications that have shown benefit in model-building studies.
For cited references in this article, see original source.