The science and practice of selecting outcome measures
A behavioural scientist’s aim is to change behaviour. As such, we need to be clear about which behaviour we are changing and then find a way of measuring it, not least so that we can assess the impact of our intervention. And while outcome measurement may seem a little dry as a topic, it sits at the heart of any behavioural science practice.
It is certainly an area with many conundrums. What if the behaviour is hard to measure? What if it is a very private behaviour? Or one that occurs only at very specific moments in time, or at locations that are hard to identify? There are a variety of ways in which things can come unstuck when measuring outcomes, so the issue needs a little thinking through.
Let’s take the example of fitness. We may wish to encourage people to take more exercise, using a series of interventions such as campaign messaging. The obvious solution to assessing impact is surely to measure the change in actual fitness activity. The first consideration is that there is a range of fitness activities: we need to transform our aspiration (e.g. get fit, eat more healthily, take more exercise) into tangible behaviours (e.g. walk more, eat less meat, go swimming twice a week). For the purposes of illustration, let’s take the simple example of increased walking.
The means we use to capture the behaviour (in this case walking) always energises discussion, as there is an understandable desire to make good use of digital data capture methods (collecting actual behaviour rather than self-reports). But even capturing something as simple as walking is more complex than it might first seem. Our trial participants can be equipped with tracking devices, but the cost of doing so is high (not everyone has a smartphone that records this accurately), a record of baseline behaviour is required, and the number of participants recruited needs to allow for drop-outs, and so on.
Second, what counts as behaviour change? We all inevitably vary in how much we walk - some weeks we happen to walk more, other weeks less. This variability needs to be addressed through the number of people recruited and the timespans covered. Otherwise we are in danger of confounding the inevitable natural variation we observe with the effect of the intervention.
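The link between natural variation and recruitment numbers can be made concrete with a standard two-sample power calculation. The sketch below uses only the Python standard library; the effect size, power threshold and drop-out rate are illustrative assumptions, not figures from the text:

```python
from math import ceil
from statistics import NormalDist

def participants_per_group(effect_size, alpha=0.05, power=0.8, dropout=0.0):
    """Approximate per-group sample size for a two-sample comparison
    (normal approximation), inflated to allow for expected drop-out."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power threshold
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return ceil(n / (1 - dropout))                 # recruit extra to cover attrition

# A small standardised effect (d = 0.2), plausible for a messaging
# intervention (an assumption), with 20% expected drop-out:
print(participants_per_group(0.2, dropout=0.2))
```

The point of the sketch is simply that small effects against noisy week-to-week variation demand surprisingly large samples, and that attrition inflates the recruitment target further still.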
Thought also needs to be given to whether we are interested in measuring the behaviour being started (adoption) or being maintained. There is little value in measuring adoption alone if people do not then maintain the behaviour. This means a longer-term assessment of the walking activity is needed (with all the logistical considerations that come with that).
There can also be an assumption that interventions will immediately result in the desired behaviour change. But as we have seen previously, behaviour change often operates as a process of incremental change rather than blunt, binary shifts. The expectation of observing an actual change in behaviour may therefore, at times, be overly ambitious, and we need to measure changes earlier in the process. In other words, when we are aiming for behaviour change in walking, it may be legitimate to measure preparedness and propensity as well, rather than simply the number of steps taken.
Further consideration needs to be given to the mechanisms that help us understand whether the intervention actually worked (or not, as the case may be). For example, we may have identified social norms as important in encouraging walking, and as such these were used in the design of the intervention. But if we do not measure the degree to which these were impacted by the intervention (alongside walking itself), then we are left not knowing whether they were actually responsible for the increase in walking. We want not just a blunt measure of change in walking but an understanding of the behavioural mechanisms that produced that change.
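This mediation logic can be sketched with simulated data: we generate an intervention that shifts perceived social norms, which in turn shift steps walked, then recover the indirect (mediated) effect with two regressions. All variable names and effect sizes below are illustrative assumptions, not a prescribed analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
treated = rng.integers(0, 2, n)              # 1 = received the intervention
norms = 0.5 * treated + rng.normal(0, 1, n)  # intervention shifts norms ("a" path)
steps = 0.8 * norms + rng.normal(0, 1, n)    # norms shift walking ("b" path)

def coefs(y, regressors):
    """OLS coefficients via least squares; intercept first."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = coefs(norms, [treated])[1]           # effect of intervention on norms
b = coefs(steps, [norms, treated])[1]    # effect of norms on steps, holding treatment
total = coefs(steps, [treated])[1]       # total effect of intervention on steps

print(f"indirect (a*b) = {a * b:.2f}, total = {total:.2f}")
```

If the indirect effect accounts for most of the total effect, the social-norms mechanism is a plausible driver of the change; if walking changed but norms did not, the intervention worked through some other route.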
We can see that the question of outcome measurement quickly becomes one requiring careful thought. The reality of applied behavioural science is that pragmatic decisions will need to be made and the trade-offs understood.
The role of surveys for outcome measurement
Briefs for research projects frequently express a desire to move away from self-reported measures of behaviour towards passive data. This is understandable – in a sense, why wouldn’t we? If the data is available then it’s a great resource. But data-based measures of behaviour are often not straightforward to capture, and it is then necessary to use self-report. In these instances, it is critical to follow best practice on how to ask questions about behaviour so that measurement is accurate (not least to manage social desirability).
However, there are other benefits to using surveys for outcome measurement, even when they are used alongside available data. These benefits include:
Low cost: Surveys do not require additional data capture tools, which can be logistically complex and resource-intensive (and for some behaviours, capturing data passively is simply not possible)
Flexible: They can capture the nuance and context of behaviour (what else was going on, what triggered the behaviour, different patterns of behaviour such as walking around the house more often)
Follow-up: They can more easily be used as a follow-up after 3 or 6 months to assess the degree to which the behaviour is being maintained
Holistic: We can measure not only behaviour but also awareness, planning, and propensity to undertake the behaviour
Diagnostic: It is possible to measure the different dimensions responsible for mediating the change in behaviour, so we can see why the intervention worked (i.e. whether the mechanisms we thought were responsible are the ones that actually resulted in change)
In summary, we need to blend our measurement approaches, as different tools offer different strengths. The most important requirement is to think carefully about what each brings, and about its downsides, in order to design effective measurement of the impact of interventions.