Quality and Safety in Health Care Journal

Understanding the evidence for artificial intelligence in healthcare

Scientific studies of artificial intelligence (AI) solutions in healthcare have been the subject of intense criticism—both in research publications and in the media.1–3 Early validations of predictive algorithms are criticised for not having meaningful clinical impact, and AI tools that make mistakes or fail to show immediate improvement in health outcomes are heralded as the first snowflakes in the next AI winter (a period of decreased interest in AI research and development). Scientific evidence is the language of trust in healthcare, and peer-reviewed studies evaluating AI solutions are key to fostering adoption. There are over two dozen reporting guidelines for AI in medicine,4 and many other consensus statements and standards that offer recommendations for the publication of research about medical AI.5 Despite such guidance, the average frontline clinician still struggles in interpreting the results of an AI study to...

Workforce well-being is workforce readiness: it is time to advance from describing the problem to solving it

‘We need bold, fundamental change that gets at the roots of the burnout crisis.’ - US Surgeon General Vivek H. Murthy, MD, MBA.

Well-being was brought into clearer focus during the COVID-19 pandemic, during which the prevalence of healthcare worker (HCW) emotional exhaustion increased from 27%1 to 39%.2 Currently, there is not a coordinated effort to ensure HCW well-being interventions meet minimum standards of feasibility, accessibility and methodological rigour. In this issue of BMJ Quality and Safety, Melvin et al assessed perceptions of physician well-being programmes by interviewing physicians and people involved in these programmes.3 As is often the case with any real-world application of science, there are substantial gaps between the programmes as intended and the programmes in practice. The authors conclude that the ‘persistence of poor well-being outcomes suggests that current support initiatives are suboptimal’.

The key is understanding what is suboptimal....

We will take some team resilience, please: Evidence-based recommendations for supporting diagnostic teamwork

In this issue of BMJ Quality and Safety, Black and colleagues present a qualitative study of healthcare teams working to uncover diagnoses in patients experiencing non-specific cancer symptoms.1 The study highlights the criticality of teams in helping support or derail diagnostic pathways. Overall, Black et al1 present unique insights that highlight the challenges clinical teams face when caring for patients with non-specific symptoms.

Unfortunately, we know that diagnostic processes such as those studied by Black et al are frequently unsafe. Diagnostic errors are ‘the single largest source of deaths across all (healthcare) settings,’ with cancer-related mistakes estimated at around 11.1%.2 A key challenge to making diagnoses in patients with non-specific symptoms is the presence of uncertainty throughout the diagnostic process.

As Black et al point out,1 uncertainty in the diagnostic process is felt by both patients and clinicians. It...

Large-scale observational study of AI-based patient and surgical material verification system in ophthalmology: real-world evaluation in 37 529 cases

Background

Surgical errors in ophthalmology can have devastating consequences. We developed an artificial intelligence (AI)-based surgical safety system to prevent errors in patient identification, surgical laterality and intraocular lens (IOL) selection. This study aimed to evaluate its effectiveness in real-world ophthalmic surgical settings.

Methods

In this retrospective observational before-and-after implementation study, we analysed 37 529 ophthalmic surgeries (18 767 pre-implementation, 18 762 post implementation) performed at Tsukazaki Hospital, Japan, between 1 March 2019 and 31 March 2024. The AI system, integrated with the WHO surgical safety checklist, was implemented for patient identification, surgical laterality verification and IOL authentication.

Results

Post implementation, five medical errors (0.027%) occurred, with four in non-authenticated cases (where the AI system was not fully implemented or properly used), compared with one (0.0053%) pre-implementation (p=0.125). Of the four non-authenticated errors, two were laterality errors during the initial implementation period and two were IOL implantation errors involving unlearned IOLs (7.3% of cases) due to delayed AI updates. The AI system identified 30 near misses (0.16%) post implementation, vs 9 (0.048%) pre-implementation (p=0.00067). Surgical laterality errors/near misses occurred at 0.039% (7/18 762) and IOL recognition errors/near misses at 0.29% (28/9713). The system achieved >99% implementation after 3 months. Authentication performance metrics showed high efficiency: facial recognition (1.13 attempts, 11.8 s), surgical laterality (1.05 attempts, 3.10 s) and IOL recognition (1.15 attempts, 8.57 s). Cost–benefit analysis revealed potential benefits ranging from US$181 946.94 to US$2 769 129.12 in conservative and intermediate scenarios, respectively.

Conclusions

The AI-based surgical safety system significantly increased near miss detection and showed potential economic benefits. However, errors in non-authenticated cases underscore the importance of consistent system use and integration with existing safety protocols. These findings emphasise that while AI can enhance surgical safety, its effectiveness depends on proper implementation and continuous refinement.
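The percentages quoted in the Results can be checked against the reported counts. The following is a minimal Python sketch, not part of the study; the function name `rate_percent` and the dictionary keys are illustrative assumptions, and only counts stated in the abstract are used.

```python
# Case counts reported in the abstract (pre- and post-implementation).
PRE_CASES = 18_767
POST_CASES = 18_762
IOL_CASES = 9_713  # denominator reported for IOL recognition


def rate_percent(events: int, cases: int) -> float:
    """Return the event rate as a percentage of cases."""
    return 100.0 * events / cases


# Event counts are taken from the abstract; the key names are illustrative.
rates = {
    "errors_pre": rate_percent(1, PRE_CASES),
    "errors_post": rate_percent(5, POST_CASES),
    "near_miss_pre": rate_percent(9, PRE_CASES),
    "near_miss_post": rate_percent(30, POST_CASES),
    "iol_recognition": rate_percent(28, IOL_CASES),
}

for name, value in rates.items():
    print(f"{name}: {value:.4f}%")
```

The sketch only reproduces the descriptive rates; it does not attempt to reproduce the reported p values, because the abstract does not name the statistical test used.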

Support for hospital doctors’ workplace well-being in England: the Care Under Pressure 3 realist evaluation

Introduction

The vital role of medical workforce well-being for improving patient experience and population health while assuring safety and reducing costs is recognised internationally. Yet the persistence of poor well-being outcomes suggests that current support initiatives are suboptimal. The aim of this research study was to work with, and learn from, diverse hospital settings to understand how to optimise strategies to improve doctors’ well-being and reduce negative impacts on the workforce and patient care.

Methods

Realist evaluation consistent with the Realist And Meta-narrative Evidence Synthesis: Evolving Standards (RAMESES) II quality standards. Realist interviews (n=124) with doctors, well-being intervention implementers/practitioners and leaders in eight hospital settings (England) were analysed using realist logic.

Results

There were four key findings, underpinned by 21 context-mechanism-outcome configurations: (1) solutions needed to align with problems, to support doctor well-being and avoid harm to doctors; (2) doctors needed to be involved in creating solutions to their well-being problems; (3) doctors often did not know what support was available to help them with well-being problems and (4) there were physical and psychological barriers to accessing well-being support.

Discussion and conclusion

Doctors are mandated to ‘first, do no harm’ to their patients, and the same consideration should be extended to doctors themselves. Since doctors can be harmed by poorly designed or implemented well-being interventions, new approaches need careful planning and evaluation. Our research identified many ineffective or harmful interventions that could be stopped. The findings are likely transferable to other settings and countries, given the realist approach leading to principles and causal explanations.

Doing 'detective work' to find a cancer: how are non-specific symptom pathways for cancer investigation organised, and what are the implications for safety and quality of care? A multisite qualitative approach

Background

Over the past two decades, the UK has actively developed policies to enhance early cancer diagnosis, particularly for individuals with non-specific cancer symptoms. Non-specific symptom (NSS) pathways were piloted and then implemented in 2015 to address delays in referral and diagnosis. The aim of this study was to outline the functions that enable NSS teams to investigate cancer and other diagnoses for patients with NSSs.

Methods

The analysis was derived from a multisite ethnographic study conducted between 2020 and 2023 across four major National Health Service (NHS) trusts. Data collection encompassed observations, patient shadowing, interviews with clinicians and patients (n=54) and gathered documents. We used principles of the functional resonance analysis method to identify the functions of the NSS pathway and analyse their relevance to patient safety.

Results

Our analysis produced 29 distinct functions within NSS pathways, organised into two clusters: pretesting assessment and information gathering, and post-testing interpretation and management. Safety-critical functions encompassed assessing the reason for referral, deciding on a plan of investigation and estimating the remaining cancer risk. We also identified ways that teams build and maintain safety across all functions, for example, by cultivating generalist-specialist expertise within the team and creating continuity through patient navigation. Variation in practice across sites revealed targets for an NSS pathway blueprint that would foster local development and quality improvement.

Conclusions

Our findings suggest that national and local improvement plans could differentiate specific policies to reduce unwarranted variation and support adaptive variation that facilitates the delivery of safe care within the local context. Enhancing multidisciplinary teams with additional consultants and deploying patient navigators with clinical backgrounds could improve safety within NSS pathways. Future research should investigate different models of generalist-specialist team composition.

Quantifying the cost savings and health impacts of improving colonoscopy quality: an economic evaluation

Objective

To estimate and quantify the cost implications and health impacts of improving the performance of English endoscopy services to the optimum quality as defined by postcolonoscopy colorectal cancer (PCCRC) rates.

Design

A semi-Markov state-transition model was constructed, following the logical treatment pathway of individuals who could potentially undergo a diagnostic colonoscopy. The model consisted of three identical arms, each representing a high, middle or low-performing trust’s endoscopy service, defined by PCCRC rates. A cohort of 40-year-old individuals was simulated in each arm of the model. The model’s time horizon was when the cohort reached 90 years of age and the total costs and quality-adjusted life-years (QALYs) were calculated for all trusts. Scenario and sensitivity analyses were also conducted.

Results

A 40-year-old individual gains 0.0006 QALYs and savings of £6.75 over the model lifetime by attending a high-performing trust compared with attending a middle-performing trust and gains 0.0012 QALYs and savings of £14.64 compared with attending a low-performing trust. For the population of England aged between 40 and 86, if all low and middle-performing trusts were improved to the level of a high-performing trust, QALY gains of 14 044 and cost savings of £249 311 295 are possible. Higher quality trusts dominated lower quality trusts; any improvement in the PCCRC rate was cost-effective.

Conclusion

Improving the quality of endoscopy services would lead to QALY gains among the population, in addition to cost savings to the healthcare provider. If all middle and low-performing trusts were improved to the level of a high-performing trust, our results estimate that the English National Health Service would save approximately £5 million per year.
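The relationship between the headline figures can be illustrated with simple arithmetic. The Python sketch below is a hedged reading, not the authors' model: it assumes the annual saving is the total population saving spread evenly over the 50-year horizon (age 40 to 90), which the abstract does not state explicitly. The helper names `annualised` and `dominates` are illustrative.

```python
# Figures reported in the abstract.
TOTAL_SAVINGS_GBP = 249_311_295
# Assumption: the horizon runs from age 40 to age 90 and savings are spread evenly.
HORIZON_YEARS = 90 - 40


def annualised(total: float, years: int) -> float:
    """Spread a total evenly across the given number of years."""
    return total / years


def dominates(delta_qalys: float, delta_cost: float) -> bool:
    """An option dominates if it yields more QALYs at lower cost."""
    return delta_qalys > 0 and delta_cost < 0


annual_saving = annualised(TOTAL_SAVINGS_GBP, HORIZON_YEARS)
print(f"Approximate annual saving: £{annual_saving:,.0f}")

# Per-person differences for a high-performing trust versus the comparators,
# as reported: savings are expressed here as negative incremental costs.
high_vs_middle = dominates(delta_qalys=0.0006, delta_cost=-6.75)
high_vs_low = dominates(delta_qalys=0.0012, delta_cost=-14.64)
print(high_vs_middle, high_vs_low)
```

Under this even-spread assumption the total works out close to the £5 million per year quoted in the Conclusion, and both comparisons satisfy the dominance condition described in the Results.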

Improving weaning and liberation from mechanical ventilation for tracheostomy patients: a quality improvement initiative

For patients in the intensive care unit (ICU), prolonged mechanical ventilation is associated with poor outcomes. A quality improvement (QI) initiative with the aim of reducing median time on the ventilator for tracheostomy patients was undertaken at a tertiary care ICU in Toronto, Canada. A QI team was formed, and using QI methodology, a deep understanding of our local process was achieved. Based on this information and on the latest evidence on weaning, a standard tracheostomy weaning protocol was designed. The protocol was refined through three developmental and two testing plan–do–study–act cycles. This study was a prospective time series showing the effect of the implementation of our intervention on tracheostomy patients’ time on the ventilator. The baseline median number of days on the ventilator after tracheostomy insertion was 17. Within 12 months of the introduction of the intervention, a shift in the data showing a reduction in the median time on the ventilator to 10.6 days had developed. Length of stay in the ICU was reduced by 4.3 days. Adherence and compliance to the protocol also improved over time. A standard tracheostomy weaning protocol was successfully developed, tested and implemented in a tertiary care ICU. Using strategies such as frequent communication with key stakeholders and incorporating a tracheostomy weaning progress sheet to document and track tracheostomy patients and their outcomes, this QI intervention has become engrained in the local culture at our centre. This weaning protocol has successfully reduced the median time on the ventilator for tracheostomy patients by over 6 days.

Testing and cancer diagnosis in general practice

Healthcare systems worldwide have for decades sought to prioritise prompt diagnosis of cancer as a means to improve outcomes. The gatekeeping role of general practitioners (GPs) that restricts access to testing and referral,1 along with their relatively lower propensity to use diagnostic tests,2 have been offered as partial explanations for the UK’s consistently poor performance in cancer compared with other high-income countries.3

In this issue of BMJ Quality & Safety, Akter and colleagues examined primary care investigations prior to a cancer diagnosis using data on 53 252 patients and 1868 general practices from the 2018 English National Cancer Diagnostic Audit.4 Grouping tests into four categories (any investigation, blood tests, imaging and endoscopy), the study demonstrated large variation in use of tests in general practice prior to diagnosis with cancer. Recorded characteristics of practices accounted for only a small proportion of this variation,...

Just how many diagnostic errors and harms are out there, really? It depends on how you count

The significant adverse consequences of diagnostic errors are well established.1 2 Across clinical settings and study methods, diagnostic adverse events often lead to serious permanent disability or death and are frequently deemed preventable.3–5 In malpractice claims, diagnostic adverse events consistently account for more total serious harms than any other individual type of medical error,5 6 a finding supported by large, population-based estimates of total serious misdiagnosis-related harms.2 Despite this, they generally go unrecognised, unmeasured and unmonitored, causing the US National Academy of Medicine to label diagnostic errors as ‘a blind spot’ for healthcare delivery systems.1

Diagnostic errors have been described as ‘the bottom of the iceberg’ of patient safety. This analogy is intended to connote both their enormous impact and their unmeasured, hidden nature relative to more visible errors such as...

Learning from an allied health perspective on quality and safety

In this issue of the journal, the article ‘Developing the Allied Health Professionals workforce within mental health, learning disability, and autism inpatient services: Rapid review of learning from quality and safety incidents’ by Wilson and colleagues1 reviews materials on safety incidents in England published between 2014 and 2024, with a focus on the contribution of allied health professionals. In the context of this study, NHS England’s definition of ‘allied health professionals’ (AHPs) was used, namely the 14 registerable professions of art therapists (art/music/drama), chiropodists/podiatrists, dietitians, occupational therapists, operating department practitioners, orthoptists, osteopaths, paramedics, physiotherapists, prosthetists/orthotists, radiographers and speech and language therapists.1 The review largely considers more extreme forms of harm, such as death (including homicide and suicide), abuse by staff and self-harm.

In this editorial, we take a reflective stance informed by critical discourse analysis. Critical discourse analysis concerns itself with the use of language...

Increasing surgical volumes in resource-limited healthcare systems: team-based quality improvement as a novel approach to quantity improvement

Quality improvement (QI) in the context of extremely limited healthcare access presents unique challenges, as the primary focus is often on increasing service quantity to meet needs. Access and quality in such situations can be at odds, as is the case with surgical care in resource-limited healthcare systems around the world. However, volumes and quality must advance in tandem to prevent inadvertent harm. In many healthcare systems, patients abandon treatment due to poor quality care despite reaching the hospital.1 These challenges are further magnified in very low-resource settings, where public hospitals serve populations in the lowest economic strata. Such realities underscore the vital importance of QI in such settings to build trust of communities in their healthcare system and providers.

An important contribution to the sparse body of literature in this space is the study by Barker et al in this issue of BMJ Quality & Safety.

Variation in the use of primary care-led investigations prior to a cancer diagnosis: analysis of the National Cancer Diagnosis Audit

Introduction

Use of investigations can help support the diagnostic process of patients with cancer in primary care, but the size of variation between patient groups and between practices is unclear.

Methods

We analysed data on 53 252 patients from 1868 general practices included in the National Cancer Diagnosis Audit 2018 using a sequence of logistic regression models to quantify and explain practice-level variation in investigation use, accounting for patient-level case-mix and practice characteristics. Four types of investigations were considered: any investigation, blood tests, imaging and endoscopy.

Results

Large variation in practice use was observed (OR for 97.5th to 2.5th centile being 4.02, 4.33 and 3.12, respectively for any investigation, blood test and imaging). After accounting for patient case-mix, the spread of practice variation increased further to 5.61, 6.30 and 3.60 denoting that patients with characteristics associated with higher use (ie, certain cancer sites) are over-represented among practices with lower than the national average use of such investigation. Practice characteristics explained very little of observed variation, except for rurality (rural practices having lower use of any investigation) and concentration of older age patients (practices with older patients being more likely to use all types of investigations).

Conclusion

There is very large variation between practices in use of investigation in patients with cancer as part of the diagnostic process. It is conceivable that the diagnostic process can be improved if investigation use were to be increased in lower use practices, although it is also possible that there is overtesting in practices with very high use of investigations, and in fact both undertesting and overtesting may co-exist.

Adverse diagnostic events in hospitalised patients: a single-centre, retrospective cohort study

Background

Adverse event surveillance approaches underestimate the prevalence of harmful diagnostic errors (DEs) related to hospital care.

Methods

We conducted a single-centre, retrospective cohort study of a stratified sample of patients hospitalised on general medicine using four criteria: transfer to intensive care unit (ICU), death within 90 days, complex clinical events, and none of the aforementioned high-risk criteria. Cases in higher-risk subgroups were over-sampled in predefined percentages. Each case was reviewed by two adjudicators trained to judge the likelihood of DE using the Safer Dx instrument; characterise harm, preventability and severity; and identify associated process failures using the Diagnostic Error Evaluation and Research Taxonomy modified for acute care. Cases with discrepancies or uncertainty about DE or impact were reviewed by an expert panel. We used descriptive statistics to report population estimates of harmful, preventable and severely harmful DEs by demographic variables based on the weighted sample, and characteristics of harmful DEs. Multivariable models were used to adjust association of process failures with harmful DEs.

Results

Of 9147 eligible cases, 675 were randomly sampled within each subgroup: 100% of ICU transfers, 38.5% of deaths within 90 days, 7% of cases with complex clinical events and 2.4% of cases without high-risk criteria. Based on the weighted sample, the population estimates of harmful, preventable and severely harmful DEs were 7.2% (95% CI 4.66 to 9.80), 6.1% (95% CI 3.79 to 8.50) and 1.1% (95% CI 0.55 to 1.68), respectively. Harmful DEs were frequently characterised as delays (61.9%). Severely harmful DEs were frequent in high-risk cases (55.1%). In multivariable models, process failures in assessment, diagnostic testing, subspecialty consultation, patient experience, and history were significantly associated with harmful DEs.

Conclusions

We estimate that a harmful DE occurred in 1 of every 14 patients hospitalised on general medicine, the majority of which were preventable. Our findings underscore the need for novel approaches for adverse DE surveillance.
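Because higher-risk subgroups were over-sampled, population estimates have to be weighted back by the sampling fractions. The Python sketch below illustrates one common approach, inverse-probability weighting, under stated assumptions: the sampling fractions are those reported in the Results, but the per-stratum sample sizes and event counts are hypothetical toy values invented for illustration and are not study data. The abstract does not specify the authors' exact weighting procedure.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    sampling_fraction: float  # reported in the abstract
    sampled: int              # HYPOTHETICAL toy value, not study data
    harmful_de: int           # HYPOTHETICAL toy value, not study data


# Sampling fractions come from the Results; all counts are made up.
strata = [
    Stratum("icu_transfer", 1.000, sampled=50, harmful_de=10),
    Stratum("death_within_90_days", 0.385, sampled=50, harmful_de=8),
    Stratum("complex_clinical_event", 0.070, sampled=50, harmful_de=5),
    Stratum("no_high_risk_criteria", 0.024, sampled=50, harmful_de=2),
]


def weighted_prevalence(groups: list[Stratum]) -> float:
    """Inverse-probability-weighted prevalence across strata."""
    weighted_events = sum(g.harmful_de / g.sampling_fraction for g in groups)
    weighted_cases = sum(g.sampled / g.sampling_fraction for g in groups)
    return weighted_events / weighted_cases


def unweighted_prevalence(groups: list[Stratum]) -> float:
    """Naive prevalence that ignores the over-sampling of high-risk strata."""
    return sum(g.harmful_de for g in groups) / sum(g.sampled for g in groups)


weighted = weighted_prevalence(strata)
naive = unweighted_prevalence(strata)
print(f"weighted: {weighted:.3f}, unweighted: {naive:.3f}")

# The reported 7.2% corresponds to roughly 1 in 14 patients.
one_in_n = round(1 / 0.072)
print(f"1 in {one_in_n}")
```

With these toy counts the naive estimate is roughly double the weighted one, which shows why over-sampled high-risk strata must be down-weighted before a population figure is reported.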

Developing the allied health professionals workforce within mental health, learning disability and autism inpatient services: rapid review of learning from quality and safety incidents

Background

Allied health professionals (AHPs) in inpatient mental health, learning disability and autism services work in cultures dominated by other professions who often poorly understand their roles. Furthermore, identified learning from safety incidents often lacks focus on AHPs and research is needed to understand how AHPs contribute to safe care in these services.

Methods

A rapid literature review was conducted on material published from February 2014 to February 2024, reporting safety incidents within adult inpatient mental health, learning disability and autism services in England, with identifiable learning for AHPs. 115 reports/publications were included, predominantly consisting of independent investigations by NHS England, prevent future deaths reports and Care Quality Commission reports.

Findings

Misunderstanding of AHP roles, from senior leadership to frontline staff, led to AHPs being disempowered and excluded from conversations/decisions, and patients not getting sufficient access to AHPs, contributing to safety incidents. A central thread ‘organisational culture’ ran through five subthemes: (1) (lack of) effective multidisciplinary team (MDT) working, evidenced by poor communication, siloed working, marginalisation of AHPs and a lack of psychological safety; (2) (lack of) AHP involvement in patient care including care and discharge planning, and risk assessment/management. Some MDTs had no AHPs, some recommendations by AHPs were not actioned and referrals to AHPs were not always made when indicated; (3) training needs were identified for AHPs and other professions; (4) staffing issues included understaffing of AHPs and (5) senior management and leadership were found to not value/understand AHP roles, and instil a blame culture. A need for cohesive, well-led and nurturing MDTs was emphasised.

Conclusion

Understanding and recognition of AHP roles is lacking at all levels of healthcare organisations. AHPs can be marginalised in MDTs, presenting risks to patients and missed opportunities for quality improvement. Raising awareness of the essential roles of AHPs is critical for improving quality and safety in inpatient mental health, learning disability and autism services.

Quality improvement collaborative to increase access to caesarean sections: lessons from Bihar, India

Background

Countries with resource-poor health systems have struggled to improve access to and the quality of caesarean section (C-section; CS) for women seeking care in public health facilities. Access to C-section in Bihar State remains very low, while access has increased in many other contexts.

Methods

We used quality improvement (QI) combined with targeted resource management to test and implement changes that were designed to increase C-section delivery. We compared C-section delivery percentages after the interventions across eight intervened (QI) hospitals and between QI hospitals and the remaining 22 non-intervened (non-QI) hospitals with baseline CS <10%. We linked patterns of improvement and sustainability to theoretical drivers of improvement and timing of interventions.

Results

In QI hospitals, C-section percentage increased from 2.9% at baseline to 5.9% in the intervention phase and 4.6% in the post intervention phase. In non-QI hospitals, we observed a small change (2.6–3.3%) during the same time period of the interventions in the QI hospitals. Addition of skilled personnel resulted in increased C-section percentage in QI hospitals (3.6–5.9%) but not non-QI hospitals (3.4–3.2%).

Conclusions

C-section availability increased for a population of women giving birth following initiation of a QI BTS collaborative in a low-income country public sector setting that has historically struggled to provide this service. Addition of obstetric and operating room resources alone, without interventions to support system changes, may not result in additional increase in C-section delivery. The adaptive implementation model may contribute to efforts to provide more access to C-sections in other very resource-limited settings.

Systems analysis of clinical incidents: development of a new edition of the London Protocol

The investigation of incidents and accidents, together with subsequent reflection and action, is an essential component of safety management in every safety-critical industry, including healthcare. A number of formal methods of incident analysis were developed in the early days of risk management and patient safety, including the London Protocol which was published in 2004. In this paper, we describe the development of a new edition of the London Protocol. We explain the need for a revised and expanded version of the London Protocol, addressing both the changes in healthcare in the last two decades and what has been learnt from the experience of incident analysis across the world. We describe a systematic process of development of the new edition drawing on the findings of a narrative review of incident analysis methods. The principal changes in the new edition are as follows: increased emphasis and guidance on the engagement of patients and families as partners in the investigation; giving more attention to the support of patients, families and staff in the aftermath of an incident; emphasising the value of a small number of in-depth analyses combined with thematic reviews of wider problems; including proposals and guidance for the examination of much longer time periods; emphasising the need to highlight good care as well as problems; adding guidance on direct observation of the work environment; providing a more structured and wide-ranging approach to recommendations and including more guidance on how to write safety incident reports. Finally, we offer some proposals to place research on incident analysis on a firmer foundation and make suggestions for the practice and implementation of incident investigation within safety management systems.

Diagnostic delay: lessons learnt from marginalised voices

Diagnostic delay, a type of diagnostic error, is the failure to establish an accurate and timely diagnosis; diagnostic delay remains a significant source of error in healthcare.1 As in other areas of medicine, there are racial and ethnic disparities in the risk of diagnostic delay; increased risk has been found among marginalised populations in a wide range of conditions, including breast cancer, acute coronary syndrome and even appendicitis in children.2–4 In issue 34:3 of BMJQS, Elena et al present the results of their systematic review of the perspectives of minoritised patients on the causes of diagnostic delay.5 They further map their findings onto an adapted Model of Pathways to Treatment, a conceptual model widely used to describe the diagnostic process.6 Through their work, the authors add voices from marginalised groups to a field of study where patient...

Audit and feedback to improve antibiotic prescribing in primary care: the time is now

Antimicrobial resistance (AMR) has quietly become a global health crisis, claiming 1.1 million lives annually as of 2021. If left unchecked, the death toll is forecasted to climb to 1.9 million per year by 2050.1 Despite the mounting volume of data on the burden of AMR, the global response has been sluggish with limited progress.

Global leaders agree that multi-sectorial and multi-faceted approaches are needed to limit the emergence and spread of AMR. Antimicrobial use is a key driver of AMR, where as much as 50% of use is unnecessary.2 3 In humans, the vast majority of antimicrobial use occurs outside of hospitals, making this setting crucial for antimicrobial stewardship efforts. With the estimated number of global outpatient treatment courses of antimicrobials in the billions,4 curtailing inappropriate prescribing is a daunting task. However, audit and feedback has a robust evidence base and...

Co-production in maternal health services: creating culturally safe spaces, respecting difference and supporting collaborative solutions

Structural and social barriers to healthcare contribute significantly to the poorer health outcomes observed among minoritised ethnic people around the world.1 2 Globally, women who are members of an ethnic group that is a minority in their country of residence have been reported to receive suboptimal maternity care. This can include access challenges, poorer quality of care and support, as well as discrimination.3 4 This global pattern is mirrored in UK maternity services, where black, Asian and minoritised ethnic groups are at greater risk of severe morbidity and death during pregnancy, childbirth and postnatally than their white counterparts.5 Poor maternal outcomes have been attributed to intersecting factors, including social circumstances, cross-cultural communication barriers and organisational factors, which combine to delay help-seeking, reduce access and negatively impact experiences of care.6 7 Poor communication is a persistent...
