Extended opening hours in primary care: helpful for patients and--or--a distraction for health professionals?

WHO regards access to primary care as a priority for all health systems, because of the benefits for population health and because of the changing nature of populations (more older people with chronic conditions) and the growing expectations of the public.1 In most developed countries, progress has been made in enabling people to use primary care services during routine office hours, and policymakers have begun to ask "how much access is enough"?

The two main drivers for extending access to general practices beyond traditional office hours are the possibility that longer opening hours would lead to reduced pressure on hospital services and the need for policy to respond to the pressure from patients for appointments with their primary care providers.

Root-cause analysis: swatting at mosquitoes versus draining the swamp

Many healthcare systems recommend root-cause analysis (RCA) as a key method for investigating critical incidents and developing recommendations for preventing future events. In practice, however, RCAs vary widely in terms of their conduct and the utility of the recommendations they produce.1 2 RCAs often fail to explore deep system problems that contributed to safety events3 due to the limited methods used, constrained time and meagre financial/human resources to conduct RCAs.4 Furthermore, healthcare organisations often lack the mandate and authority required to develop and implement sophisticated and effective corrective actions.4 Consequently, corrective actions primarily aim at changing human behaviour rather than system-based changes.5 6

Clinical summaries for hospitalised patients: time for higher standards

The average person remembers less than half of the information provided by healthcare professionals during a medical visit.1 The situation is arguably most challenging for patients leaving the hospital, where acute illness, sleep deprivation and delirium add to the challenge of learning and memory.2 3 Indeed, research has shown that after hospital discharge, only 59.6% of patients are able to accurately describe their discharge diagnoses, and 43.9% can accurately recall follow-up appointments.4 Approximately one-third of patients have difficulty understanding their discharge medication regimen.5

Responding to the challenge of look-alike, sound-alike drug names

Despite significant advances in medication safety, errors related to confusion between drug names are a cause of preventable adverse events and serious harm,1 and remain a patient safety priority.2 3 Although drug name confusion is recognised as a factor contributing to error, its minimisation or elimination is a prevailing challenge.4 5 In this issue, Schroeder et al6 postulate that despite industry's efforts to follow regulators' guidance7 on how to review drug names, more objective evidence, in a standardised format, is needed to improve decision-making about the acceptability of a name. To address this concern, the authors assessed the association between error rates in laboratory-based tests of drug name memory and perception and rates of real-world errors related to drug name confusion.

Extended opening hours and patient experience of general practice in England: multilevel regression analysis of a national patient survey


The UK government plans to extend the opening hours of general practices in England. The ‘extended hours access scheme’ pays practices for providing appointments outside core times (08:00 to 18:30, Monday to Friday) for at least 30 min per 1000 registered patients each week.
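The scheme's minimum requirement is simple arithmetic. A minimal sketch, assuming the 30 min per 1000 patients threshold scales proportionally with list size (the function name and the example practice size are hypothetical, not taken from the scheme's specification):

```python
def required_extended_minutes(registered_patients: int,
                              minutes_per_1000: float = 30.0) -> float:
    """Minimum weekly minutes of extended-hours appointments.

    Assumes the requirement is pro rata to the registered list size.
    """
    if registered_patients < 0:
        raise ValueError("registered_patients must be non-negative")
    return registered_patients / 1000 * minutes_per_1000


# Hypothetical practice with 8000 registered patients
print(required_extended_minutes(8000))  # 240.0 minutes, i.e. 4 hours per week
```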


To determine the association between extended hours access scheme participation and patient experience.


Retrospective analysis of a national cross-sectional survey completed by questionnaire (General Practice Patient Survey 2013–2014); 903 357 survey respondents aged ≥18 years old and registered to 8005 general practices formed the study population. Outcome measures were satisfaction with opening hours, experience of making an appointment and overall experience (on five-level interval scales from 0 to 100). Mean differences between scheme participation groups were estimated using multilevel random-effects regression, propensity score matching and instrumental variable analysis.


Most patients were very (37.2%) or fairly satisfied (42.7%) with the opening hours of their general practices; results were similar for experience of making an appointment and overall experience. Most general practices participated in the extended hours access scheme (73.9%). Mean differences in outcome measures between scheme participants and non-participants were positive but small across estimation methods (mean differences ≤1.79). For example, scheme participation was associated with a 1.25 (95% CI 0.96 to 1.55) increase in satisfaction with opening hours using multilevel regression; this association was slightly greater when patients could not take time off work to see a general practitioner (2.08, 95% CI 1.53 to 2.63).


Participation in the extended hours access scheme has a limited association with three patient experience measures. This questions expected impacts of current plans to extend opening hours on patient experience.

Opportunities to improve clinical summaries for patients at hospital discharge


Clinical summaries are electronic health record (EHR)-generated documents given to hospitalised patients during the discharge process to review their hospital stays and inform postdischarge care. Presently, it is unclear whether clinical summaries include relevant content or whether healthcare organisations configure their EHRs to generate content in a way that promotes patient self-management after hospital discharge. We assessed clinical summaries in three relevant domains: (1) content; (2) organisation; and (3) readability, understandability and actionability.


Two authors performed independent retrospective chart reviews of 100 clinical summaries generated at two Michigan hospitals using different EHR vendors for patients discharged 1 April to 30 June 2014. We developed an audit tool based on the Meaningful Use view-download-transmit objective and the Society of Hospital Medicine Discharge Checklist (content); the Institute of Medicine recommendations for distributing easy-to-understand print material (organisation); and five readability formulas and the Patient Education Materials Assessment Tool (readability, understandability and actionability).


Clinical summaries averaged six pages (range 3–12). Several content elements were universally auto-populated into clinical summaries (eg, medication lists); others were not (eg, care team). Eighty-five per cent of clinical summaries contained discharge instructions, more often generated from third-party sources than manually entered by clinicians. Clinical summaries contained an average of 14 unique messages, including non-clinical elements irrelevant to postdischarge care. Medication list organisation reflected reconciliation mandates, and dosing charts, when present, did not carry column headings over to subsequent pages. Summaries were written at the 8th–12th grade reading level and scored poorly on assessments of understandability and actionability. Inter-rater reliability was strong for most elements in our audit tool.
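Grade-level results such as these come from readability formulas. As an illustration only, the sketch below implements the widely used Flesch-Kincaid grade level; whether it was one of the five formulas used in the study is an assumption. The syllable counter is a crude vowel-run heuristic and the sample text is invented.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per run of vowels (a, e, i, o, u, y)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def grade_of_text(text: str) -> float:
    """Estimate the grade level of a passage using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), len(sentences), syllables)


# Invented discharge instruction, for illustration only
sample = "Take one tablet by mouth every morning. Call your doctor if you feel dizzy."
print(round(grade_of_text(sample), 1))
```

Because the syllable heuristic overcounts silent vowels, the estimate should be read as approximate; validated tools use dictionaries or more careful rules.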


Our study highlights opportunities to improve clinical summaries for guiding patients' postdischarge care.

Our current approach to root cause analysis: is it contributing to our failure to improve patient safety?


Despite over a decade of efforts to reduce the adverse event rate in healthcare, the rate has remained relatively unchanged. Root cause analysis (RCA) is a process used by hospitals in an attempt to reduce adverse event rates; however, the outputs of this process have not been well studied in healthcare. This study aimed to examine the types of solutions proposed in RCAs over an 8-year period at a major academic medical institution.


All state-reportable adverse events were gathered, and those for which an RCA was performed were analysed. A consensus rating process was used to determine a severity rating for each case. A qualitative approach was used to categorise the types of solutions proposed by the RCA team in each case and descriptive statistics were calculated.


302 RCAs were reviewed. The most common event types involved a procedure complication, followed by cardiopulmonary arrest, neurological deficit and retained foreign body. In 106 RCAs, solutions were proposed. A large proportion (38.7%) of RCAs with solutions proposed involved a patient death. Of the 731 proposed solutions, the most common solution types were training (20%), process change (19.6%) and policy reinforcement (15.2%). We found that multiple event types were repeated in the study period, despite repeated RCAs.


This study found that the most commonly proposed solutions were weaker actions, which were less likely to decrease event recurrence. These findings support recent attempts to improve the RCA process and to develop guidance for the creation of effective and sustainable solutions to be used by RCA teams.

Six ways not to improve patient flow: a qualitative study


Although well-established principles exist for improving the timeliness and efficiency of care, many organisations struggle to achieve more than small-scale, localised gains. Where care processes are complex and include segments under different groups' control, the elegant solutions promised by improvement methodologies remain elusive. This study sought to identify common design flaws that limit the impact of flow initiatives.


This qualitative study was conducted within an explanatory case study of a Canadian regional health system in which multitudinous flow initiatives had yielded no overall improvement in system performance. Interviews with 62 senior, middle and departmental managers, supplemented by ~700 documents on flow initiatives, were analysed using the constant comparative method.


Findings suggested that smooth flow depends on linking a defined population to appropriate capacity by means of an efficient process; flawed initiatives reflected failure to consider one or more of these essential elements. Many initiatives focused narrowly on process, failing to consider that the intended population was poorly defined or the needed capacity inaccessible; some introduced capacity for an intended population, but offered no process to link the two. Moreover, interveners were unable to respond effectively when a bottleneck moved to another part of the system. Errors of population, capacity and process, in different combinations, generated six ‘formulae for failure’.


Typically, flawed initiatives focused on too small a segment of the patient journey to properly address the impediments to flow. The proliferation of narrowly focused initiatives, in turn, reflected a decentralised system in which responsibility for flow improvement was fragmented. Thus, initiatives' specific design flaws may have their roots in a deeper problem: the lack of a coherent system-level strategy.

Cognitive tests predict real-world errors: the relationship between drug name confusion rates in laboratory-based memory and perception tests and corresponding error rates in large pharmacy chains


Drug name confusion is a common type of medication error and a persistent threat to patient safety. In the USA, roughly one per thousand prescriptions results in the wrong drug being filled, and most of these errors involve drug names that look or sound alike. Prior to approval, drug names undergo a variety of tests to assess their potential for confusability, but none of these preapproval tests has been shown to predict real-world error rates.


We conducted a study to assess the association between error rates in laboratory-based tests of drug name memory and perception and real-world drug name confusion error rates.


Eighty participants, comprising doctors, nurses, pharmacists, technicians and lay people, completed a battery of laboratory tests assessing visual perception, auditory perception and short-term memory of look-alike and sound-alike drug name pairs (eg, hydroxyzine/hydralazine).


Laboratory test error rates (and other metrics) significantly predicted real-world error rates obtained from a large, outpatient pharmacy chain, with the best-fitting model accounting for 37% of the variance in real-world error rates. Cross-validation analyses confirmed these results, showing that the laboratory tests also predicted errors from a second pharmacy chain, with 45% of the variance being explained by the laboratory test data.
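The idea of fitting a model on one pharmacy chain and checking the variance explained on a second can be sketched as follows. This is a simplified illustration with a single predictor and invented numbers; the study's best-fitting model used laboratory error rates together with other metrics, and none of the values below come from it.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: laboratory error rate per drug name pair (x) and
# real-world error rate per 10 000 prescriptions (y) in two chains.
lab_errors = [0.05, 0.10, 0.15, 0.20, 0.30]
chain_a = [0.8, 1.1, 2.0, 2.2, 3.5]
chain_b = [0.6, 1.4, 1.7, 2.6, 3.1]

slope, intercept = fit_line(lab_errors, chain_a)
print("R^2 in fitting chain:", round(r_squared(lab_errors, chain_a, slope, intercept), 2))
# Cross-validation: reuse the chain A coefficients unchanged on chain B.
print("R^2 in held-out chain:", round(r_squared(lab_errors, chain_b, slope, intercept), 2))
```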


Across two distinct pharmacy chains, there is a strong and significant association between drug name confusion error rates observed in the real world and those observed in laboratory-based tests of memory and perception. Regulators and drug companies seeking a validated preapproval method for identifying confusing drug names ought to consider using these simple tests. By using a standard battery of memory and perception tests, it should be possible to reduce the number of confusing look-alike and sound-alike drug name pairs that reach the market, which will help protect patients from potentially harmful medication errors.

Reviewing deaths in British and US hospitals: a study of two scales for assessing preventability


Standardised mortality ratios do not provide accurate measures of preventable mortality. This has generated interest in using case notes to assess the preventable component of mortality. However, different methods of measurement have not been compared. We compared the reliability of two scales for assessing preventability and the correspondence between them.


Medical specialists reviewed case notes of patients who had died in hospital, using two instruments: a five-point Likert scale and a continuous (0–100) scale of preventability. To enhance generalisability, we used two different hospital datasets with different types of acute medical patients across different epochs, and in two jurisdictions (UK and USA). We investigated the reliability of measurement and correspondence of preventability estimates across the two scales. Ordinal mixed effects regression methods were used to analyse the Likert scale and to calibrate it against the continuous scale. We report the estimates of the probability a death could have been prevented, accounting for reviewer inconsistency.


Correspondence between the two scales was strong; the Likert categories explained most of the variation (76% UK, 73% USA) in the continuous scale. Measurement reliability was low, but similar across the two instruments in each dataset (intraclass correlation: 0.27, UK; 0.23, USA). Adjusting for the inconsistency of reviewer judgements reduced the proportion of cases with high preventability, such that the proportion of all deaths judged probably or definitely preventable on the balance of probability was less than 1%.


The correspondence is high between a Likert and a continuous scale, although the low reliability of both would suggest careful measurement design would be needed to use either scale. Few to no cases are above the threshold when using a balance of probability approach to determining a preventable death, and in any case, there is little evidence supporting anything more than an ordinal correspondence between these reviewer estimates of probability and the true probability. Thus, it would be more defensible to use them as an ordinal measure of the quality of care received by patients who died in the hospital.
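One way to see what careful measurement design might involve is the Spearman-Brown formula, which estimates how averaging several independent reviews raises reliability. The sketch below is illustrative only: it assumes the reported intraclass correlations are single-reviewer reliabilities and that reviewers are interchangeable, and the target reliability of 0.8 is an arbitrary choice, not one made in the study.

```python
import math


def spearman_brown(single_rater_icc: float, n_raters: int) -> float:
    """Reliability of the mean of n_raters independent reviews."""
    return n_raters * single_rater_icc / (1 + (n_raters - 1) * single_rater_icc)


def raters_needed(single_rater_icc: float, target: float) -> int:
    """Smallest number of reviewers whose mean rating reaches the target."""
    k = target * (1 - single_rater_icc) / (single_rater_icc * (1 - target))
    return math.ceil(k)


for label, icc in (("UK", 0.27), ("USA", 0.23)):
    n = raters_needed(icc, target=0.8)
    print(label, n, round(spearman_brown(icc, n), 3))
```

Under these assumptions, roughly 11 (UK) and 14 (USA) reviews per case would be needed to reach a reliability of 0.8.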

The problem with root cause analysis


Attempts to learn from high-risk industries such as aviation and nuclear power have been a prominent feature of the patient safety movement since the late 1990s. One noteworthy practice adopted from such industries, endorsed by healthcare systems worldwide for the investigation of serious incidents,1–3 is root cause analysis (RCA). Broadly understood as a method of structured risk identification and management in the aftermath of adverse events,1 RCA is not a single technique. Rather, it describes a range of approaches and tools drawn from fields including human factors and safety science4 5 that are used to establish how and why an incident occurred in an attempt to identify how it, and similar problems, might be prevented from happening again.6 In this article, we propose that RCA does have potential value in healthcare, but it...

Estimating deaths due to medical error: the ongoing controversy and why it matters

One important reason for the widespread attention given to the 1999 US Institute of Medicine (IOM) report To Err Is Human1 lies in its estimate that medical error was to blame for 44 000–98 000 deaths each year in US hospitals. This striking claim established patient safety as a public concern, strengthened the case for improving the science underlying safety and motivated providers, policymakers, payers and regulators to take safety seriously. Some did express disquiet about the validity of the figures cited,2 including one of the principal investigators of the two studies that provided the data for these estimates.3

A decade and a half later, Makary and Daniel4 attribute an even higher toll to medical error: 251 454 deaths in US hospitals per year, making, they say, medical error the third-leading cause of death in the USA. Unsurprisingly, this claim generated widespread coverage in...

Erratum: Computerised prescribing for safer medication ordering: still a work in progress

The original article is missing several acknowledgements. The amended acknowledgements statement is as below:

The authors would like to acknowledge contributions of FDA staff: Carol Holquist, RPh, Office of Regulatory Operations, Office of Generic Drugs; Kendra Worthy, PharmD, Division of Medication Error Prevention and Analysis (DMEPA), Office of Medication Error Prevention and Risk Management (OMEPRM), Office of Surveillance and Epidemiology (OSE), Center for Drug Evaluation and Research (CDER); and Vicky Borders-Hemphill, PharmD, DMEPA, OMEPRM, OSE, CDER, for their scientific support. In addition, the authors would like to especially thank FDA staff: Kellie Taylor, PharmD, MPH, DMEPA, OMEPRM, OSE, CDER, Irene Chan, PharmD, BCPS, DMEPA, OMEPRM, OSE, CDER, and Colleen Brennan, RPh, DMEPA, OMEPRM, OSE, CDER for their thoughtful contributions to the science and manuscript development,...



The importance of measurement to improve the quality and safety of healthcare is now little in dispute. Measurement enables problems that were previously occluded to become visible: it facilitates the assessment of size and scope of quality issues as well as identification of targets for action and monitoring of change. It also enables exceptionally good performance to be detected, so that templates for the organisation and delivery of care that can be emulated by others can be derived. Yet the measurement of quality and safety also remains the subject of an enduring and often conflict-ridden debate about whether measurement for improvement should be distinguished, conceptually and practically, from measurement for accountability.

Measurement for improvement is challenging, not least because of the influence of blame dynamics on data collection and reporting practices. Inequity aversion – a dislike of unfair outcomes – is also critical. Whether it is possible to design and operate measurement systems for quality improvement in healthcare that escape the negative effects of blame remains little researched. Despite calls for attention to the science of quality measurement to better understand the strengths and limitations of quality measures and to curtail the possibility of unintended adverse consequences, field studies have remained rare.

We describe responses to a data collection system that explicitly sought to promote the use of data about patient safety for improvement, exploring how it intervened in and was influenced by the dynamics of blame. We focus on the NHS Safety Thermometer, a large-scale exercise which introduced an innovative measure of patient safety across multiple clinical settings (Harmfreecare.org 2016; Power et al. 2014). The NHS Safety Thermometer is an improvement tool measuring four common harms: pressure ulcers, falls in care settings, urinary tract infections (UTIs) in patients with a catheter, and venous thromboembolism (VTE). It represents the first attempt to measure harms at scale across diverse health settings and as such is a noteworthy example of an instrument developed specifically to drive improvement across an entire healthcare system.


The study was ethnographic, involving observations, interviews and documentary analysis. Data collection took place between February 2013 and August 2014. We recruited 19 NHS organisations in England selected using a purposive sampling strategy to ensure diversity of type and size of organisation as well as different levels of reported harm. The sample comprised ten acute hospitals, two specialist hospitals, five community healthcare organisations and two integrated healthcare organisations.

We completed ~115 hours of observation of Safety Thermometer-associated activity. Observations focused on how NHS Safety Thermometer data collection was undertaken and whether and how it was used locally to drive improvement. Observers made brief notes during their visits which were later transcribed in full as field-notes. Two audio-recorded team de-briefing sessions also took place and were transcribed.

Observations in the participating sites were complemented by interviews with 38 senior (executive/managerial) staff (mostly with director-level responsibility for nursing, patient safety, or quality improvement) and with 52 frontline staff (including ward nurses and community nurses).

Additional interviews included 27 experts in the four harms, four regional and national NHS leaders and five individuals who were involved in the design and/or national implementation of the NHS Safety Thermometer. All 126 interviews were audio-recorded and transcribed verbatim.

A range of national and local documentation related to the NHS Safety Thermometer was also collected and analysed.

Data analysis was iterative with data collection and based on the constant comparison method.


Despite its developers' intentions that this should not be the case, those being asked to use the NHS Safety Thermometer focused heavily on its potential latent function as a means of performance management rather than its manifest function as an improvement tool. Perhaps unsurprisingly, frontline staff expressed most concern about the level of unfair blame to which they believed the NHS Safety Thermometer might expose them. These staff were at the sharp end of attempting to interpret and operationalise definitions and were therefore most aware of the limitations of the data while also feeling most at risk of blame for the harms.

Despite the Safety Thermometer consistently being presented as a tool to drive local improvement, rather than to generate data through which organisations' performance could be compared, many working within the NHS, both at frontline and senior levels, were concerned about the potential for any data collected to be appropriated for performance management purposes. This often negatively influenced their engagement with data collection and use. There were also very strong views that not all of the harms being recorded were preventable. Concerns about fairness, especially in relation to different care settings, were frequently voiced.

Our data consistently pointed to the prominence of blame dynamics as a latent feature of the NHS Safety Thermometer. Though the instrument was promoted as a benign tool of measurement by its designers, those charged with using it expressed doubts about the neutrality of the instrumentation and about the commensurability of data collected from different sources. Drawing also on a history of performance management and deep familiarity with the dynamics of blame, participants largely saw the tool not as a way of taking the temperature of their organisations and using it to improve care, but as a way of distributing heat – the potential for blame.


The resistance encountered here is not unique – similar responses have been identified in relation to infection control (Brewster et al. 2016) and national audit (Taylor et al. 2016), and alternatives for measuring harm, such as incident reporting, may be similarly problematic – and is likely to be a feature of any attempt to measure at scale.

At the root of many concerns was the fact that data were publicly available. The NHS Safety Thermometer team had no choice but to comply with transparency requirements, but making performance data publicly available is riven with issues and recognised as exacerbating the tension between measurement for improvement and measurement for accountability (Hood and Rothstein 2001; Solberg et al. 1997).

The NHS Safety Thermometer's developers aimed to create an instrument to measure harm-free care across the health economy that could be used with a minimum of difficulty by healthcare staff in a variety of clinical settings. Care was taken in its design to avoid issues that caused problems in other NHS performance metrics, such as onerous microbiological or diagnostic criteria and complex weighting of indicators. Nonetheless, our study found that the NHS Safety Thermometer process gave rise to concerns among many about data quality and fairness which led them to question its validity and limited its use as a tool for improvement (Bradley et al. 2004; Pronovost et al. 2007).

It is important that efforts to measure for improvement take into account the serious concerns about fairness and effectiveness that were identified by this study. Staff wished to be able to attribute causes of harms, not only to ensure that their unit or organisation was not unfairly blamed, but also to be able to identify effective interventions. There is a clearly felt tension between measurement for local improvement and making the resulting data publicly accessible, but not making such data available sits uncomfortably with transparency.

Response to: 'Supporting adherence for people starting a new medication for a long-term condition through community pharmacies: a pragmatic randomised controlled trial of the New Medicine Service' by Elliott et al

The literature concerning the effectiveness of community pharmacy-based interventions is notable for its lack of high quality randomised studies. The publication by Elliott et al1 of a randomised controlled trial (RCT) examining the effectiveness of the New Medicine Service (NMS—a service designed to improve adherence to newly prescribed medications for long-term conditions) is therefore welcome.

The paper states that ‘the study is reported according to Consolidated Standards of Reporting Trials (CONSORT) criteria’. The CONSORT statement is ‘an evidence-based, minimum set of recommendations for reporting randomized trials’.2 Regarding outcome measures, the CONSORT criteria specify that RCTs should have ‘completely defined pre-specified primary and secondary outcome measures, including how and when they were assessed’ and that ‘for each primary and secondary outcome, results for each group, and the estimated effect size and its precision (such as 95% confidence interval)’ should be reported.3

Opening up to Open Notes and adding the patient to the team

This issue of BMJ Quality & Safety features a paper by Bell et al1 that follows up on the original ‘Open Notes’ experiment with providing patients electronic access to their primary care providers' notes. In the first report,2 the intervention was well received by the patients and did not provoke the anticipated adverse impacts feared by physicians. The current paper explores the effect of Open Notes on trust within the doctor/patient relationship, again finding generally positive responses from patients and doctors. Most patients indicated that they accessed notes to better understand and learn more about their health; and reading notes either did not change their feelings or made them feel better about their doctors. Most physicians thought that patient satisfaction had improved.

Triggering safer general practice care

We know for sure that healthcare does good things for a lot of people and that for some, it harms—sometimes concurrently with providing benefits. Quite likely doctors have known this for millennia, inspiring the caution in the Hippocratic oath to ‘first do no harm’. Clinically grounded medical researchers know that medicine is complicated and that things can sometimes go wrong despite the best efforts of conscientious and well-intentioned clinicians. To minimise people's exposure to healthcare that harms them, while maximising their exposure to healthcare that helps, a diverse armamentarium has developed that includes tools such as alarms and alerts built into equipment, postmarketing medicines surveillance, incident reporting systems, check lists, fish plots, run charts and many other things.

When doctors share visit notes with patients: a study of patient and doctor perceptions of documentation errors, safety opportunities and the patient-doctor relationship


Patient advocates and safety experts encourage adoption of transparent health records, but sceptics worry that shared notes may offend patients, erode trust or promote defensive medicine. As electronic health records disseminate, such disparate views fuel policy debates about risks and benefits of sharing visit notes with patients through portals.


Presurveys and postsurveys from 99 volunteer doctors at three US sites who participated in OpenNotes and postsurveys from 4592 patients who read at least one note and submitted a survey.


Patients read notes to be better informed and because they were curious; about a third read them to check accuracy. In total, 7% (331) of patients reported contacting their doctor's office about their note. Of these, 29% perceived an error, and 85% were satisfied with its resolution. Nearly all patients reported feeling better (37%) or the same (62%) about their doctor. Patients who were older (>63), male, non-white, had fair/poor self-reported health or had less formal education were more likely to report feeling better about their doctor. Among doctors, 26% anticipated documentation errors, and 44% thought patients would disagree with notes. After a year, 53% believed patient satisfaction increased, and 51% thought patients trusted them more. None reported ordering more tests or referrals.


Despite concerns about errors, offending language or defensive practice, transparent notes overall did not harm the patient–doctor relationship. Rather, doctors and patients perceived relational benefits. Traditionally more vulnerable populations—non-white, those with poorer self-reported health and those with fewer years of formal education—may be particularly likely to feel better about their doctor after reading their notes. Further informing debate about OpenNotes, the findings suggest transparent records may improve patient satisfaction, trust and safety.

Towards optimising local reviews of severe incidents in maternity care: messages from a comparison of local and external reviews


Detailed local case review is commonly used as a strategy to improve care. However, recent reports have highlighted concerns over quality of local reviews in maternity care. The aim of this project was to describe the methods used for conducting local reviews of care of women with severe maternal morbidity, and to compare lessons identified for future care through external and local reviews.


Thirty-three anonymised clinical records from women with severe maternal morbidities were obtained, together with the report of the local review of their care. The methodology used for the local reviews was described, including specific tools used, team members involved, their disciplines, report format and whether an action plan with recommendations for audit was produced. Multidisciplinary external reviewers considered the records using a standard confidential enquiry approach. A thematic analysis of lessons learned from the two approaches was undertaken.


A formal report of the local review was produced for 11/33 cases; 4 of these used root cause analysis. A further 12 local reviews consisted of a group discussion with output noted in a spreadsheet; 5 consisted of a timeline with good practice points and 5 had no formal review. Patients were involved in five local reviews; only one was multidisciplinary. Action plans were recorded in 14 local reviews; 3 of these included a recommendation to audit the proposed changes. External reviews identified additional messages for care and highlighted aspects of good care in every case, whereas only 55% (n=18) of local reviews identified good care (p<0.0005).


The quality of local reviews can clearly be improved. Very few of the reviews involved patients. Local reviews should be multidisciplinary, generate an action plan, and the implementation of recommendations should be audited. Improvements in local reviews may be achieved by standardised training or development of national protocols.

How does audit and feedback influence intentions of health professionals to improve practice? A laboratory experiment and field study in cardiac rehabilitation


To identify factors that influence the intentions of health professionals to improve their practice when confronted with clinical performance feedback, which is an essential first step in the audit and feedback mechanism.


We conducted a theory-driven laboratory experiment with 41 individual professionals, and a field study in 18 centres in the context of a cluster-randomised trial of electronic audit and feedback in cardiac rehabilitation. Feedback reports were provided through a web-based application, and included performance scores and benchmark comparisons (high, intermediate or low performance) for a set of process and outcome indicators. From each report participants selected indicators for improvement into their action plan. Our unit of observation was an indicator presented in a feedback report (selected yes/no); we considered selecting an indicator to reflect an intention to improve.


We analysed 767 observations in the laboratory experiment and 614 in the field study. Each 10% decrease in performance score increased the odds of an indicator being selected by 54% (OR 1.54; 95% CI 1.29 to 1.83) in the laboratory experiment, and by 25% (OR 1.25; 95% CI 1.13 to 1.39) in the field study. Performance being benchmarked as low or intermediate also increased the likelihood of selection in the laboratory setting. Still, participants ignored the benchmarks in 34% (laboratory experiment) and 48% (field study) of their selections.
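An odds ratio scales the odds rather than the probability, so the change in selection probability it implies depends on the starting point. A minimal sketch, assuming hypothetical baseline probabilities (baseline selection rates are not reported here):

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds by an odds ratio."""
    if not 0 < baseline_prob < 1:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical baselines; 1.54 and 1.25 are the reported odds ratios.
for baseline in (0.2, 0.5):
    for label, odds_ratio in (("laboratory", 1.54), ("field", 1.25)):
        after = apply_odds_ratio(baseline, odds_ratio)
        print(f"{label}: baseline {baseline:.2f} -> {after:.3f}")
```

With a baseline of 0.20, for example, an odds ratio of 1.54 moves the probability to about 0.28, a relative rise of roughly 39% rather than 54%.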


When confronted with clinical performance feedback, performance scores and benchmark comparisons influenced health professionals' intentions to improve practice. However, there was substantial variation in these intentions, because professionals disagreed with benchmarks, deemed improvement unfeasible or did not consider the indicator an essential aspect of care quality. These phenomena impede intentions to improve practice, and are thus likely to dilute the effects of audit and feedback interventions.

Trial registration number

NTR3251, pre-results.