We have very good results from two multicentre case series. Why are we getting pushback from payers about the evidence?

A case series tells you what did happen to the patients who were treated, but says nothing about what would have happened to them if they had been treated with something else. A payer is interested in how the benefits of your proposition compare with what happens at present to similar patients: this describes the net benefit of introducing a pathway including your product. A payer is also interested in how the costs of your proposition compare to the current costs of similar patients: this will determine the budget impact.

In presenting the case for your product, you will need to generate figures for current benefits and costs. When there is evidence from a comparative trial, in which the course of disease or recovery or health service usage is measured in two (similar) groups, these figures come from the study. Patients recruited to the study are typically assigned either to the new intervention or to a control group which receives treatment representing the baseline against which the new intervention is being compared: this can include ‘do nothing’, a sham treatment (‘do nothing’ which looks just like the new treatment, so that the patient is not influenced by subjective preconceptions about the intervention), an alternative product in a head-to-head trial, current best practice, or current typical treatment. For reimbursement, the alternative chosen should be the one of most interest to the payer: in most situations, this is current typical treatment or care.

The way patients are assigned to a study group should aim to make the groups as comparable as possible: in other words, if both groups were given the same treatment—whatever that might be—the outcomes would be the same, apart from differences due to random variation. Random allocation, assignment on the basis of a roll of the dice, is the most effective way of achieving this. There are numerous examples of non-random or even pseudo-random allocation which have inadvertently led to dissimilarities between groups. This usually leads to errors which favour the new treatment, but can lead to errors in the opposite direction. For example, the investigator may favour a new treatment for patients whose symptoms are more severe or more resistant to current treatment, and whose disease is therefore more difficult to treat than that of the patients in the control group. Random allocation not only protects the results from being contaminated by investigator bias; it also helps to distribute equally between the groups any unknown factors which may affect outcome. Any form of non-random allocation is weaker and, more to the point, this weakness can be exploited by a payer who is looking for reasons to reject your proposition.

Once similar patients have been assigned to the study groups, they must be assessed in the same way to avoid biases in the results. For example, if the group being treated with your product is followed up more often than the control group, they may unconsciously adhere to the treatment protocol more closely than the control group, and this may lead to better results; conversely, more frequent follow-up may detect more problems in the group being treated with your product than in the controls, which may spuriously suggest that the results are poorer. Frequency and length of follow-up, what is measured, and how it is measured should be the same in both study groups.

The way doctors and other health professionals interact with patients may subtly (and sometimes not so subtly) influence patients’ subjective experience of their treatment or disease, or cause them not to respond accurately to questions (‘I didn’t want to upset the doctor—he really thinks this new treatment is going to work’). Patients’ beliefs, too, can influence how they feel, whether they consider a side-effect to be severe, and what they report. ‘Blinding’ guards against this. Ideally, both patients and investigators are unaware of the group to which each patient was assigned: such a trial is ‘double blind’. Sometimes this is difficult: if a device is being compared with pharmacological therapy, blinding the patients would require sham treatment with the device in one group and sham pharmacological therapy in the other. Sometimes it is impossible: consider a new treatment for improving arterial patency in the leg compared with lower limb amputation. Where the patient is not blinded, outcome measures should be as objective as possible. Where the investigator is not blinded, assessments (of tumour size, of cardiac function, etc.) should as far as possible be done by someone who is blinded to the group to which the patient was assigned.

If you want to use study results to underpin a reimbursement case, sponsoring non-comparative trials will not normally give you value for money, although they are, other things being equal, cheaper to run than comparative trials.

What if, as in this case, you have to work with evidence from non-comparative trials? Modelling is sometimes used to predict what would have happened to these patients if they had not been treated with your product. In most studies, data on many variables are recorded: age, sex, disease severity, co-morbidities, previous treatment, etc. For some diseases, one can attempt to reconstruct likely outcomes for patients with the recorded characteristics. However, this is problematic. The literature often does not provide the information required, or does not provide it in the form needed. More importantly, perhaps, other factors may influence the outcome in unknown ways. Payers will worry that the patients studied were not in fact similar to those assumed in your model, and there is no way of proving that this is not the case.

Another tactic is to use ‘historical controls’: data on apparently similar patients treated in the past at the same or another institution. Patients are ‘matched’ on what are considered to be the key variables, and outcomes are compared. This approach is open to criticism: treatments in the past may have been less effective than current treatments, personnel and facilities change, recording may have been incomplete in the past, or recording of the current study treatment may be biased. Patients from other institutions, whether concurrent or historical, may be different (for example, because referral practices differ), and are treated in different facilities by different people. Again, the problem is that matching on a limited number of known variables may help to show that the groups are not obviously dissimilar, but cannot show that they are similar.

It is seldom worth saving money by sponsoring clinical trials which are methodologically weak. It is a risk which is increasingly unlikely to pay off. As health service expenditure comes under relentless pressure, why give payers ways to use the evidence-based medicine rules of engagement to avoid grasping the budgetary nettle of paying more for more benefit?