The importance of measurement in improving the quality and safety of healthcare is now little in dispute. Measurement enables problems that were previously obscured to become visible: it facilitates assessment of the size and scope of quality issues, identification of targets for action, and monitoring of change. It also enables exceptionally good performance to be detected, so that templates for organising and delivering care can be derived and emulated by others. Yet the measurement of quality and safety remains the subject of an enduring and often conflict-ridden debate about whether measurement for improvement should be distinguished, conceptually and practically, from measurement for accountability.
Measurement for improvement is challenging, not least because of the influence of blame dynamics on data collection and reporting practices. Inequity aversion – a dislike of unfair outcomes – is also critical. Whether it is possible to design and operate measurement systems for quality improvement in healthcare that escape the negative effects of blame remains little researched. Despite calls for attention to the science of quality measurement to better understand the strengths and limitations of quality measures and to curtail the possibility of unintended adverse consequences, field studies have remained rare.
We describe responses to a data collection system that explicitly sought to promote the use of data about patient safety for improvement, exploring how it intervened in and was influenced by the dynamics of blame. We focus on the NHS Safety Thermometer, a large-scale exercise which introduced an innovative measure of patient safety across multiple clinical settings (Harmfreecare.org 2016; Power et al. 2014). The NHS Safety Thermometer is an improvement tool measuring four common harms: pressure ulcers, falls in care settings, urinary tract infections (UTIs) in patients with a catheter, and venous thromboembolism (VTE). It represents the first attempt to measure harms at scale across diverse health settings and as such is a noteworthy example of an instrument developed specifically to drive improvement across an entire healthcare system.
The study was ethnographic, involving observations, interviews and documentary analysis. Data collection took place between February 2013 and August 2014. We recruited 19 NHS organisations in England selected using a purposive sampling strategy to ensure diversity of type and size of organisation as well as different levels of reported harm. The sample comprised ten acute hospitals, two specialist hospitals, five community healthcare organisations and two integrated healthcare organisations.
We completed approximately 115 hours of observation of Safety Thermometer-associated activity. Observations focused on how NHS Safety Thermometer data collection was undertaken and whether and how the data were used locally to drive improvement. Observers made brief notes during their visits, which were later transcribed in full as field-notes. Two audio-recorded team de-briefing sessions also took place and were transcribed.
Observations in the participating sites were complemented by interviews with 38 senior (executive/managerial) staff (mostly with director-level responsibility for nursing, patient safety, or quality improvement) and with 52 frontline staff (including ward nurses and community nurses).
Additional interviews included 27 experts in the four harms, four regional and national NHS leaders and five individuals who were involved in the design and/or national implementation of the NHS Safety Thermometer. All 126 interviews were audio-recorded and transcribed verbatim.
A range of both national and local documentation related to the NHS Safety Thermometer was also collected and analysed.
Data analysis was iterative with data collection and based on the constant comparison method.
Despite its developers' intentions that this should not be the case, those being asked to use the NHS Safety Thermometer focused heavily on its potential latent function as a means of performance management rather than its manifest function as an improvement tool. Perhaps unsurprisingly, frontline staff expressed most concern about the level of unfair blame to which they believed the NHS Safety Thermometer might expose them. These staff were at the sharp end of attempting to interpret and operationalise definitions and were therefore most aware of the limitations of the data, while also feeling most at risk of blame for the harms.
Despite the Safety Thermometer consistently being presented as a tool to drive local improvement, rather than to generate data through which organisations' performance could be compared, many working within the NHS, both at frontline and senior levels, were concerned about the potential for any data collected to be appropriated for performance management purposes. This often negatively influenced their engagement with data collection and use. There were also very strong views that not all of the harms being recorded were preventable. Concerns about fairness, especially in relation to different care settings, were frequently voiced.
Our data consistently pointed to the prominence of blame dynamics as a latent feature of the NHS Safety Thermometer. Though the instrument was promoted as a benign tool of measurement by its designers, those charged with using it expressed doubts about the neutrality of the instrumentation and about the commensurability of data collected from different sources. Drawing also on a history of performance management and deep familiarity with the dynamics of blame, participants largely saw the tool not as a way of taking the temperature of their organisations and using it to improve care, but as a way of distributing heat: the potential for blame.
The resistance encountered here is not unique – similar responses have been identified in relation to infection control (Brewster et al. 2016) and national audit (Taylor et al. 2016), and alternatives for measuring harm, such as incident reporting, may be similarly problematic – and is likely to be a feature of any attempt to measure at scale.
At the root of many concerns was the fact that data were publicly available. The NHS Safety Thermometer team had no choice but to comply with transparency requirements, but making performance data publicly available is fraught with difficulty and is recognised as exacerbating the tension between measurement for improvement and measurement for accountability (Hood and Rothstein 2001; Solberg et al. 1997).
The NHS Safety Thermometer's developers aimed to create an instrument to measure harm-free care across the health economy that could be used with a minimum of difficulty by healthcare staff in a variety of clinical settings. Care was taken in its design to avoid features that had caused problems in other NHS performance metrics, such as onerous microbiological or diagnostic criteria and complex weighting of indicators. Nonetheless, our study found that the NHS Safety Thermometer process gave rise to widespread concerns about data quality and fairness, leading many staff to question its validity and limiting its use as a tool for improvement (Bradley et al. 2004; Pronovost et al. 2007).
It is important that efforts to measure for improvement take into account the serious concerns about fairness and effectiveness that were identified by this study. Staff wished to be able to attribute causes of harms, not only to ensure that their unit or organisation was not unfairly blamed, but also to be able to identify effective interventions. There is a clearly felt tension between measurement for local improvement and making the resulting data publicly accessible, but not making such data available sits uncomfortably with transparency.