In Conversation With… Karl Bilimoria, MD, MS

August 1, 2017

Editor's note: Dr. Bilimoria is the Director of the Surgical Outcomes and Quality Improvement Center of Northwestern University, which focuses on national, regional, and local quality improvement research and practical initiatives. He is also the Director of the Illinois Surgical Quality Improvement Collaborative and a Faculty Scholar at the American College of Surgeons. In the second part of a two-part interview (the earlier one concerned residency duty hours), we spoke with him about quality and safety in surgery.

Dr. Robert M. Wachter: Surgeons have always cared deeply about their quality and safety and their outcomes, but they have tended to embrace the traditional, "It's all about the individual performance and virtuosity" mantra. How has life changed over the last 10 or 15 years in terms of systems thinking, and how do you and surgeons think about improvement now?

Dr. Karl Bilimoria: It has evolved from the individual blame game to thinking about how to improve systems of care to achieve safe care and good outcomes. The M&M [morbidity and mortality] conferences across the country have changed from blaming the presenter to thinking about how do we prevent this from ever happening again. It has certainly moved to a nonpunitive environment. The best evidence for how our culture has changed in surgery has been that whenever we now put forth public reporting initiatives, there is almost no pushback. Surgeons are used to this. They realize that it's better when we do it ourselves rather than having it pushed upon us. We definitely have evolved in both the safety and quality areas and hopefully to some degree are perceived as leaders in the area.

RW: I think you are. As people accept the idea of measurement, do they buy that the methodology is good enough—both in the sense that you're measuring the right things and that you've captured the frequent perception that "my patients are older and sicker"?

KB: I think it depends on the data source. Our confidence in claims data continues to be very poor. They are fairly terrible for measuring postoperative complications. And most of today's claims data are focused on the inpatient stay, but length of stay has decreased a lot. For complex operations, we discharge people after a day or two, so most of their complications are occurring in the outpatient setting.

But when you move into registry data, like the National Surgical Quality Improvement Program (NSQIP) data, we have much more confidence. They're collected in a standardized fashion across 600 hospitals in the country. They collect a number of demographics and comorbidities about the patient and what was done in the actual surgery, which allows us to faithfully risk adjust and level the playing field among hospitals and surgeons. So that is where our future is.

RW: What are the other megatrends, as electronic health record (EHR) adoption has gone from 10% a decade ago to 90% now? How does the work of registry data collection and analysis change, and where is that going over the next 5 or 10 years as the systems get better?

KB: We have perfected manual data abstraction—it's as good as it's going to get for now, using standardized definitions, audits, and whatnot. But it's clearly not sustainable. It only gets us a sample of the cases at each hospital because we don't have the manpower to collect every case. At this point, we are not very good at pulling data out of the EHR in an accurate, reliable way. We can get the easy variables: the demographics and their lab values and even some of their comorbidities. But what really becomes challenging has been to identify postoperative complications from the EHR using standard definitions. We've had projects with various EHR vendors putting their best and brightest to work trying to figure this out and that really hasn't panned out. I have faith that we can do it. Especially as systems move to unified EHRs, we'll see more and more standardization. We clearly have to do it. Otherwise, we will not succeed in having high-quality data for all of our patients.

RW: If you had a family member who needed surgery in a distant town (and you were not a doctor who knew who to call), what would you do to figure out what the best place was for him or her to go and who the best surgeon would be?

KB: The publicly available data for patients are still fairly weak. Most of it is based on that flawed administrative data and nonstandard, unaudited data. So that is challenging. I don't think there are great data sources out there for assessing empirically where you should receive care. To be honest, the thing you would do is to talk to your referring doctors in the local market and talk to family and friends and hear about who they have had good experiences with to get some of that information.

The other place that seems to have good face validity, because it collects that data for you through a somewhat systematic survey, is the U.S. News & World Report rankings. Through their reputation survey, they essentially do that work for you of asking people, "Where do you think I should go for this type of care?" A lot of us fault the reputation survey and dislike it for a number of reasons, but it is filling a void right now. It is probably reflecting things like the services and specialists available, the trials, the research, and a variety of things that are done at those specialty centers.

It's embarrassing to say as somebody in the measurement field: I wish we could provide better information, but we don't have that type of information available yet. We're starting to make progress in publishing registry data, but not all of the hospitals in the country participate in these registries. The Society of Thoracic Surgeons National Database is a great example of where most hospitals participate, but not all of them are willing to publicly report yet. That trend is starting to change. As we can provide better data to the patients, I think they'll use that data over the recommendation of family, friends, and their referring doctors.

RW: There is an interesting tension there because of NSQIP and other efforts. We've gotten better at internally knowing who's doing well and who's not and learning from it. But we collectively have decided not to open that data spigot for patients. Grappling with the ethics of that and also the practicality of that—would there be less engagement with these systems if you knew it was going to be public—are really complicated questions.

KB: Yes.

RW: Talk about the Birkmeyer study from a few years ago and the evidence for differences in technical skills among surgeons as they relate to meaningful outcomes. What do you take from that in terms of what we need to do about training and certification of surgeons?

KB: John Birkmeyer's study was brilliant. The fact that technical skills are associated with outcomes comes as no surprise to any surgeon, and probably no surprise to anybody else. Certainly, the technical maneuvers in the operating room are important, particularly in a highly complex procedure like laparoscopic bariatric surgery. The skill set for that is pretty complex. Birkmeyer set us up well to think about how we get surgeons to study their own skill when they are out in practice, and improve upon it. There are several initiatives looking at video coaching. I lead a statewide collaborative of 56 hospitals here in Illinois. We have implemented a program where surgeons record their laparoscopic colectomy; they score each other so they get a score sheet with comments and standardized scores about their procedure. Then they get together, pair up, and review their peer's operation. I watch yours and you watch mine, and we talk about details of the case. It's fantastic to watch. They flip over a piece of paper, and they're drawing and trying to explain their point of how they would do this differently or some other trick they learned.

The feedback has been heartwarming. We have surgeons who have said that it has been a transformational, life-changing experience to finally have somebody to give them feedback about their technical skills once they're in practice. I mean nobody watches us operate except maybe the residents, or if you're out in the community, some nurses and maybe your partner. It then spills over into nontechnical issues, which is really interesting. Like how do you get set up for the operation? What do you make sure you have in the room? How do you talk about which experts you need on backup? How do you deal with your scrub nurse who you've never worked with before and make sure she can help you through this complicated operation? The goal now is to figure out how we move that forward. How is it scalable? It is complicated to get two busy surgeons together to watch videos. We're looking at trying to do this more broadly for many more procedures across the state of Illinois and working potentially with the American Board of Surgery to try to understand whether this could have potential for a more meaningful maintenance of certification approach.

RW: Back to the question before about transparency for patients versus creating a safe environment for improvement. When I read Birkmeyer's study, I thought: if I needed a surgeon for my dad I would want to know his or her video score. I imagine if that were made generally available from the work you're doing or Birkmeyer did, your participation rates would plummet. What's the balance between this work and the patient's right to know whether their surgeon has technical skills?

KB: It's a fair point. I don't think we're there yet. We're still pretty early in this journey of trying to evaluate technical performance. We have no risk adjustment for this sort of measurement right now. Right now, it's purely an improvement activity. The bigger thing to keep in mind is that this is one piece, and it shouldn't overshadow other aspects of surgical care. I would argue that even more important, and more nuanced, than the technical maneuvers in the operating room are the decisions about whom to operate on. The aftercare is important, too. You could do a technically beautiful operation, but if you did not give them optimal VTE [venous thromboembolism] prophylaxis and surgical site infection prevention approaches, or if it's a cancer operation and you don't give them the right adjuvant therapy afterward or recommend that they go meet with the medical oncologist… There are so many other aspects beyond technical skills that may be more important. What I'd like to see is a bundled approach where we have valid, reliable measures of decision-making, best practice adherence, and technical scores, with patient experience measures added to that. Then we have a more robust, comprehensive package by which to evaluate surgeon performance.

RW: In terms of trying to make this work scalable, is anybody working on analyzing the videotape through machine learning? I went to golf camp a few years ago and my swing got compared to Tiger Woods and Ernie Els. I didn't do very well, but they've devised an automated way to look at the mechanics of my swing and overlay them against a gold standard. All to tell me how bad my swing is!

KB: I have not seen that. Some groups have started to put biosensors on surgeons to be able to track, for example, how much they use their nondominant hand. How do you get that nondominant hand more involved in the operation to make you more efficient and safe? Anything more sophisticated than that with machine learning may be happening, but not that I know of.

RW: It's probably still true that when someone applies for a surgical residency, nothing in the process of interviewing and applying tests whether they have the aptitude to have the technical skills to be a surgeon. When you compare that to the way people interview for a job to fly planes or other complex technical skills, does that seem right?

KB: It's true. We still don't do any technical skills assessment. There is probably some technical skills assessment that happens as you're a medical student. If there really is some phenomenal failure of your potential to be a surgeon, it would come out in your letters. But beyond that, it's also likely that we can train most people to be a competent surgeon. Some surgeries require really gifted technical skills, and a large number of other surgeries are much less complicated. I think people self-select once they're in residency. If you don't have the skills, you're not going to do hepatobiliary surgery. You're going to go do something more straightforward.

RW: Sometimes when I speak, I'll show Birkmeyer's videos of the good example and the bad example. The question is always: Can that bad example person be trained to be at least good enough?

KB: Yes, again we are pretty early in this, but we believe that we can get those people to improve—to be competent enough to do those surgeries safely.

RW: What does the training of surgeons and the measurement of surgical quality look like 10 years from now?

KB: I hope that we can have a transformative moment where we are able to get great data out of the EHR in a standardized fashion across all hospitals in the country and be able to provide really high-quality data. And then to give it back to ourselves to be able to analyze performance and identify opportunities for improvement. Give it to the public so they can assess quality in a way that we all believe in and has good face validity. I would love to see individual surgeon performance measures be available to the public as well as transparency around all aspects of hospital care. Patients want detailed data on every type of condition and every type of operation and want to know how their hospital and surgeon do at that specific condition. I would hope that in 10 years we've made considerable progress to getting standardized data to be able to provide that. Right now, it's simply a data abstraction and an analysis issue. If we can get that data in a more systematic fashion out of the EHRs, we will be able to provide good information.

The Evolution of Patient Safety in Surgery

December 1, 2017

Perspective

In 1979, 20 years before the Institute of Medicine's To Err Is Human report (1) catalyzed the creation of the patient safety field, University of Pennsylvania sociologist Charles Bosk published a book entitled Forgive and Remember: Managing Medical Failure.(2) The book, which became a classic, was based on 18 months of ethnographic observations and interviews with surgeons and their trainees at an unnamed academic medical center.

Describing a macho and unforgiving culture, one in which the notion of a systems approach to errors was largely unknown, Bosk ushered readers into the secret world of surgery. His description of one Morbidity and Mortality (M&M) conference captures the zeitgeist. After the case of a Mr. Will was presented, Bosk wrote that Dr. Arthur, the surgery attending, "sprang from his chair." The attending then gravely addressed the assembled surgical attendings and residents:

I think that this case represents all the things that are wrong with the hierarchy of a teaching hospital…. The first in the comedy of errors made on this man was made by the medical service. The decision by them not to dilate his abdomen was tantamount to gross neglect.… I'm now going to turn to the errors we made in treating this man. First, I made a fundamental error this early in the training year in allowing the chief resident to operate solo in this emergency. We should have learned from experience never to do this.… The [other] guilty party is the chief resident involved. By not calling for help when he ran into trouble, the resident took undue risk with the patient's life.… In this case, we have clearly committed surgical immorality.

Viewed against today's sensibilities, Dr. Arthur comes off as blunt and authoritative, a typical "surgical personality," more General Patton than Lucian Leape. His root cause analysis (though the term was not used in health care at that time) is an example of the kind of finger pointing that our modern safety movement eschews.

In fact, judged against modern patient safety standards, it is easy to reject Dr. Arthur's approach as anachronistic, even counterproductive. However, it is worth appreciating some of its virtues. First, he is as hard on himself as on others, a nice example of personal accountability. Second, while his focus is on the acts of individuals (particularly his own and those of the surgical chief resident), we now can appreciate that he is also arguing for system changes. Unfortunately, in 1979, the surgical field did not have access to today's terminology, nor to many of our effective tools and approaches.

For example, take the question of when residents should be able to operate independently. Today, this question would be viewed through the lens of competency assessment, accompanied by a discussion of milestones and Entrustable Professional Activities (EPAs).(3) The minimalist trainee accreditation environment of 1979 has been replaced by a far more rigorous set of standards, one that demands that training programs demonstrate a culture of safety, an appropriate service-to-education ratio, and appropriate availability of, and oversight by, attending physicians.

Similarly, the chief resident's failure to call for help would not be automatically viewed as an individual defect. Rather, it would lead to a hard look at the program's culture and structure.(4) Within a strong safety culture and structure, trainees (and others) should understand that calling for help when one is in over one's head is a strength, not a weakness. Attendings should accept such calls with gratitude, rather than with words and body language that implicitly or explicitly criticize the caller for admitting his or her limitations. The attending back-up schedule should be appropriately staffed and accessible to everyone.

In the accompanying interview this month, surgeon and health services researcher Karl Bilimoria discusses advances in surgical safety, including improvements in the kind of hierarchy and culture that compromised the care of Mr. Will. Just the fact that leaders like Bilimoria are looking at surgical safety and quality with rigor, blending qualitative observations pioneered by Bosk with new methods of data collection and analysis (e.g., registry data, electronic health record data, video analysis), is evidence of significant progress. The surgical field has embraced the implementation of a variety of safety-related tools, including checklists and time-outs, that have been associated with improved outcomes.(5) An increased awareness of the importance of teamwork has, in some institutions, led to the implementation of team training and simulation programs, some of which have also been associated with improved outcomes.(6,7) Bilimoria and his team also led a national study to bring evidence to bear on the controversial issue of residency duty hours. The resulting FIRST trial, discussed by Bilimoria in another recent PSNet interview, is a wonderful example of using evidence to determine how best to structure system changes.(8)

Improvements in the processes and culture in the operating room (OR) have been accompanied by an increased awareness that surgical outcomes are often determined by events outside the OR, including whether a patient has received appropriate prophylaxis against surgical site infections or deep venous thrombosis, and whether a hospital has implemented systems to prevent failure to rescue when things go wrong. And, of course, some of the gains in surgical outcomes can be traced to major improvements in anesthesia safety.(9)

Even with all of this progress, there is room for further improvement. Just as diagnostic errors have, until recently, been a relatively neglected part of the patient safety field, so too has surgical decision-making—particularly the often complex decision regarding whether to operate. A perfectly executed but inappropriate operation is still an example of preventable harm.

Moreover, the use of simulators, both for training and assessment, remains sporadic. Birkmeyer's landmark 2013 study, which showed that patient outcomes were tightly correlated with surgical technique (as judged by peer reviews of intraoperative videos), has led to much discussion in surgical circles but still relatively little action.(10) What should be done in the wake of these provocative results? In my judgment, we should consider assessing hand–eye coordination when students apply to surgical residencies, similar to assessments of aspiring commercial pilots and military aviators. Furthermore, emerging evidence supports the use of video recording and review of surgical procedures, akin to the black box review after aviation crashes and near misses.(11)

We should also develop and implement methods to promote ongoing assessment of procedural skills, along with easy access to peer coaching.(12) There should be consequences for surgeons who cannot meet a baseline standard of procedural competency. Procedural competency assessments should be part of maintenance of certification programs in procedural fields. Finally, in an era of transparency of meaningful safety and quality data, it is hard to argue against providing patients with validated data regarding procedural competency. Such data will prompt improvement and help patients make more informed choices about which surgeon to see.

None of these changes are easy, but we will need to make these kinds of choices to take surgical safety to the next level. Surgery has always had the two most important ingredients for any safety program: an intense commitment to excellence and a powerful sense of accountability. Over the past 20 years, these personal strengths have been augmented by meaningful system and culture changes that have led to improvements in surgical outcomes. Continued progress will require that some of the harder questions—particularly those that involve clinician decision-making and procedural competency—be addressed in equally innovative ways.

Robert M. Wachter, MD
Professor and Chair, Department of Medicine
Marc and Lynne Benioff Endowed Chair
University of California, San Francisco

References

1. Kohn L, Corrigan J, Donaldson M, eds. To Err Is Human: Building a Safer Health System. Washington, DC: Committee on Quality of Health Care in America, Institute of Medicine. National Academies Press; 1999. ISBN: 9780309068376.

2. Bosk CL. Forgive and Remember: Managing Medical Failure. 2nd ed. Chicago, IL: University of Chicago Press; 2003. ISBN: 0226066789.

3. ten Cate O, Hart D, Ankel F, et al; International Competency-Based Medical Education Collaborators. Entrustment decision making in clinical training. Acad Med. 2016;91:191-198.

4. Lyndon A, Sexton JB, Simpson KR, Rosenstein A, Lee KA, Wachter RM. Predictors of likelihood of speaking up about safety concerns in labour and delivery. BMJ Qual Saf. 2012;21:791-799.

5. Haynes AB, Weiser TG, Berry WR, et al; Safe Surgery Saves Lives Study Group. A surgical safety checklist to reduce morbidity and mortality in a global population. N Engl J Med. 2009;360:491-499.

6. Bruppacher HR, Alam SK, LeBlanc VR, et al. Simulation-based training improves physicians' performance in patient care in high-stakes clinical setting of cardiac surgery. Anesthesiology. 2010;112:985-992.

7. Neily J, Mills PD, Young-Xu Y, et al. Association between implementation of a medical team training program and surgical mortality. JAMA. 2010;304:1693-1700.

8. Bilimoria KY, Chung JW, Hedges LV, et al. National cluster-randomized trial of duty-hour flexibility in surgical training. N Engl J Med. 2016;374:713-727.

9. Bainbridge D, Martin J, Arango M, Cheng D; Evidence-based Peri-operative Clinical Outcomes Research (EPiCOR) Group. Perioperative and anaesthetic-related mortality in developed and developing countries: a systematic review and meta-analysis. Lancet. 2012;380:1075-1081.

10. Birkmeyer JD, Finks JF, O'Reilly A, et al; Michigan Bariatric Surgery Collaborative. Surgical skill and complication rates after bariatric surgery. N Engl J Med. 2013;369:1434-1442.

11. Goldenberg MG, Jung J, Grantcharov TP. Using data to enhance performance and improve quality and safety in surgery. JAMA Surg. 2017;152:972-973.

12. Gawande A. Personal best. New Yorker. October 3, 2011.

This project was funded under contract number 75Q80119C00004 from the Agency for Healthcare Research and Quality (AHRQ), U.S. Department of Health and Human Services. The authors are solely responsible for this report’s contents, findings, and conclusions, which do not necessarily represent the views of AHRQ. Readers should not interpret any statement in this report as an official position of AHRQ or of the U.S. Department of Health and Human Services. None of the authors has any affiliation or financial involvement that conflicts with the material presented in this report.