In Conversation With… Paul G. Shekelle, MD, MPH, PhD
Editor's note: Dr. Shekelle is director of the Southern California Evidence-Based Practice Center at RAND Corporation, director of the Quality Assessment and Quality Improvement Program at RAND Health, and professor of medicine at the UCLA School of Medicine. A practicing general internist and an international leader in evidence-based medicine and quality improvement, he led an AHRQ-funded effort to better define the role of context in patient safety.
Dr. Robert Wachter, Editor, AHRQ WebM&M: In your career you've done a lot of work in quality, quality assessment, quality improvement, and appropriateness. What got you interested in taking a harder look at patient safety?
Dr. Paul G. Shekelle: Well, a couple of things got me more interested in patient safety. The first was, we were asked to review for the government the information on the costs and benefits of health information technology (HIT). I became aware that we were not going to be able to summarize this area the way we typically summarize the evidence for the other kinds of topics that the government would give us, things like "what's the role of beta-blockers in heart failure?" or "what's the effect of antibiotics on acute otitis media?" We tend to summarize those kinds of things with a single number, such as "the effect of beta-blockers is an X% improvement in mortality in patients with heart failure." As we were looking at these HIT data, it became apparent that there was no way to summarize this in a simple way that was going to be meaningful to all users, because the information was going to be so highly dependent on the individual circumstance. People might be interested in looking at the evidence in different ways. One way might be whether you were in a hospital as compared to an ambulatory care site; another would be whether you're trying to improve quality as opposed to trying to reduce cost; another might be whether you were in a large integrated system like Kaiser versus a free-standing or single institution.
The contextual milieu in which a study has been done really came to the fore for me, and so we ended up doing a couple of different things in this report. First, instead of creating an overall summary analysis of the results, we created a database of all 500 or 600 articles that we found, and tagged each one with a number of different variables, including the organizational setting, what kind of HIT system it was, what kind of outcome they were hoping to influence, what kind of study design it was, and other variables. This was so that users could try to customize the evidence to their own particular clinical situation. While there was some agreement in the community about the kinds of contextual issues that might be important, there was not a lot of thinking about how these would be measured and how they should be reported. For many of the aspects that people think are important when deciding to implement an HIT system, the published articles don't really give information about the situation where it was implemented, and that led me further to try to understand more about context and particularly its role in patient safety.
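Editor's note: To make the tagging idea concrete, here is a minimal, hypothetical sketch in Python of how an evidence database of this kind might be represented and then filtered to a user's own circumstances. The field names and example values are invented for illustration and are not drawn from the actual RAND/AHRQ report.

    from dataclasses import dataclass

    @dataclass
    class EvidenceArticle:
        title: str
        setting: str        # e.g., "hospital" or "ambulatory"
        hit_system: str     # e.g., "CPOE", "decision support"
        outcome: str        # e.g., "quality", "cost"
        study_design: str   # e.g., "RCT", "before-after"

    def matching_evidence(articles, **criteria):
        """Return only the articles whose tags match every supplied criterion."""
        return [a for a in articles
                if all(getattr(a, field) == value for field, value in criteria.items())]

    # Hypothetical usage: an ambulatory-care manager interested in quality outcomes.
    articles = [
        EvidenceArticle("CPOE in a teaching hospital", "hospital", "CPOE", "quality", "before-after"),
        EvidenceArticle("Decision support in primary care", "ambulatory", "decision support", "quality", "RCT"),
    ]
    relevant = matching_evidence(articles, setting="ambulatory", outcome="quality")
    print([a.title for a in relevant])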
RW: I was very interested in your analogy to clinical medicine. You were describing a study of whether beta-blockers work in a given patient, and your ability to articulate that they lead, let's say, to a 10% benefit. It strikes me that even in this case, you might want to know context—does the patient have diabetes, is the patient poor or wealthy, does the patient live in the United States or in Europe? In this situation, though, we don't think of these as context but rather as confounders. What is the fundamental difference between patient safety research and clinical research in this regard?
PS: When you open up a journal article describing a clinical trial of an intervention for heart failure, you will usually see a table of demographics of the people enrolled. It will tell you something about the age, gender, proportion of people who smoke, New York Heart Association classification, proportion who had diabetes or hypertension, and other factors that might influence the outcome, such as mortality or hospitalizations. Those things are all built on a large body of information about the physiology of the condition and about the kinds of things medical doctors think about patients. In addition to the main result in the journal article, you'll frequently see another table or figure presenting the effect of the intervention by those variables: how it worked in men versus women, how it worked in patients with diabetes versus those without, etc. You can visually scan these results and get a sense of whether this intervention is approximately equally efficacious or whether it's particularly effective or ineffective in certain subgroups.
When we think about how to implement something in an organization, most of the analogous variables aren't known. Everyone agrees that there has to be some analogy to age, gender, New York Heart Association classification, presence of diabetes, etc., when thinking about an organization as the site of the implementation of an intervention. But what is reported in studies is usually just bed size, academic status, and whether it's rural or urban. Yet there are a variety of other factors that may be important, including organizational complexity, level of teamwork and leadership, and prior experience with patient safety initiatives. Even within categories like organizational characteristics, we don't necessarily understand which factors matter and what should be reported. What differs in patient safety practice, as opposed to clinical practice, is that there are a whole host of implementation issues that people believe are important in allowing the intervention to be delivered as intended. Yet mostly these kinds of implementation issues aren't described in the patient safety literature, leaving people to ask the question at the end of an unsuccessful or partly successful study: was it an underlying problem with the theory of the intervention or a problem in the implementation?
RW: And are they not described because we don't really know how to describe them?
PS: I think we know more than we're taking advantage of right now. Do we know how to describe them the same way we know how to describe age and gender? Of course not. But I think that a lot of measures out there could be used for these purposes, and the field would be moved forward more quickly if people started using those measures and reporting on them so that people got a sense of how they worked. I believe it was Arnold Milstein who once said that what makes good measures is using them. I think that's really true for these so-called softer measures as well. Just think about what's gone on in the measurement of quality of life, which started out 20-some years ago as an effort to measure health outcomes that weren't death or heart attacks or hospitalizations. Of course it began in relatively rudimentary form, yet look where we are today. Not only do we have generic measures of quality of life, but we have all kinds of disease-specific measures of quality of life, and they're used routinely in clinical trials of interventions where that's an important outcome dimension. They didn't just spring forth from the earth fully developed; they started out as rough ideas and improved with time. I think that these same kinds of so-called softer measures are going to be improved most quickly if they get out there, get used, get reported on, and people can start comparing the results with them across different studies and decide how to make them better.
RW: I was struck, again in your clinical analogy, that as you measure all of these potential confounders you can really do two very different things with them. One is adjust for them so that your main findings aren't unduly influenced by differences in some of these important predictors, and the second is stratify by them so you can say, "This medicine is less likely to work in this particular patient who is older and has diabetes." As you think about the analogy in patient safety, let's fast forward 5 years from now and assume there are now terrific measures of these contextual factors. Do they get used to adjust away these differences? Or do they get used to tell us what place or what kinds of context are likely to predict an effective intervention?
PS: They certainly could be used for both. Let's say I was the manager of patient safety in the VA and I was reading about some new great thing that had been tested and found to be beneficial; I would want to know where this was tested. Does it look like the kind of system that I deal with in the VA? In the VA, we have a centralized electronic health record, for example, and we have a centralized organizational structure, with some decentralized variations. We have mostly male patients but a growing female population, and there are also certain requirements in order to be eligible for care here. The places where this was tested and found to work: do they look like my system? Alternatively, it may have been tested in numerous different kinds of health care settings and it would be very helpful to know if it worked about equivalently well in all of those settings. Statins, for example, seem to work pretty much in any kind of population, but beta-blockers seem to vary quite a bit depending on the population that you're dealing with and the type of beta-blocker that you're using. If the latter is the case for the patient safety practice, I'd want to know whether this thing looked like it would work well in my system and, if it didn't, are there things in my system that I can change to make it work better? So for example, you might find that it worked best in places that had strong unit-level leadership support for it. Is there a way in my system to get the leadership behind it, which may be a crucial factor for getting this thing implemented?
RW: The patient safety field has much more active accreditation and regulatory standards than the field of clinical medicine. In clinical medicine, it would be inconceivable that an article would come out saying that beta-blockers work in heart failure patients and that you would then be required, from an accreditation or regulatory standpoint, to use them. A standard of care might emerge, and the data about whether you do it or not might be publicly reported, but it would never (at least in present thinking) become an accreditation or regulatory requirement. Whereas an analogous study showing that a given practice works in patient safety might lead to it becoming a Joint Commission requirement or a regulatory requirement. Does that change the importance of this measurement and focus on context?
PS: I think it does actually, and this is not in any way to critique things that have gone on in the past. We're basically trying to build the airplane as we're flying it here. There's a lot of obvious concern and desire to try to get the health care system as safe as possible as quickly as possible. But some of these contextual things may not have been well recognized at the time. If something is highly context-sensitive, then it becomes a challenge to regulate that it be done in all 6300 health care facilities if it turns out that it can't be well achieved at some of those sites. There are, in a sense, two horses running in the same direction, and we have to do the best we can to keep the context horse in front of the regulatory horse, so that when the regulations come out they make the most sense possible.
RW: You just finished a large project for AHRQ examining these issues of context, with a world-class advisory and expert panel: people from different countries and backgrounds, from clinical medicine, social science, and other walks of life. What did you learn from that group? What were some of the different perspectives that people brought to the table, and how did they influence your thinking about these issues?
PS: I learned a lot in this project. It really was an extraordinary group of individuals who brought a wide-ranging set of perspectives. One of the main take-home messages was how similar their points of view actually were once you got into them in sufficient detail to allow people to discuss them. Although people may have sounded like they were coming at this from different angles initially, as we got further into it, it turned out we were all coming at this more or less from the same perspective. The end result of this project was a strong set of recommendations for evaluating patient safety practices, in terms of trying to build an evidence base to reach conclusions about effectiveness and about the sensitivity of effectiveness to context.
RW: What were the main findings of the project?
PS: Context doesn't have a precise definition. For some people, context is a particular set of variables that can be described and measured; for others, context extends all the way out to everything that contributes to the success or lack of success of a particular intervention and that hasn't already been measured or taken account of in the analysis. Bridging that whole span was not something we were able to achieve. But we were able to reach agreement on some contextual domains that are high priority, meaning that there's enough theory and evidence to think these are going to be important ones, at least to describe so that readers understand what kind of situation the patient safety evaluation was done in, and in some cases probably also important in terms of effectiveness. Those domains would be things like structural organizational characteristics: not only size and academic status, but also financial status or organizational complexity. Exactly which variables matter may differ from project to project, but something should be reported about the structural organization of the place where this is being implemented. These are mostly not changeable by the organization. A second important domain of context is external factors: what's in the external milieu, is this a regulated practice, is it being reported on as part of a public reporting system either nationally or in the local environment, or has there been some high-profile lay media sentinel event where something disastrous happened that gives an extra stimulus to this? From the individual health care manager's perspective, these are mostly not changeable; you're living in the environment where you are. A third important domain of context is the collection of teamwork, leadership, and patient safety culture. It's not clear exactly how independent these three are, but all of them in some form or another are thought to be important to the effectiveness of the complex organizational interventions that constitute many of the patient safety practices. The fourth high-priority contextual domain is what we're calling management and implementation tools, and this is one that can be influenced by somebody seeking to implement a patient safety practice. For example, what kinds of training resources did you have, how were they deployed, did you hire external consultants, did you have local champions, did you use internal audit and feedback, were there other internal organizational incentives? All these are the kinds of tools and tricks that people can try to use in order to get something implemented. So measuring something in those four domains and reporting on it was thought to be important, both for describing the situation in which the patient safety evaluation took place and then later, if the practice was implemented across multiple sites, as potentially important modifying variables for its effectiveness.
RW: One of your co-investigators on the project was Peter Pronovost, who has written about the checklist work he did first in Michigan and then later disseminated throughout the rest of the country. It's often framed as just a checklist, and he pointed out in a Lancet article that it's much more than that: leadership had to be involved; there were efforts to influence culture change. Is that a good example of what you're trying to achieve?
PS: Clearly it's not just the checklist. It's all the things surrounding the checklist. What was very interesting at the expert panel meetings where Peter was involved is that some of the experts in the group then asked him about additional contextual factors: how did you ultimately get people to do such and such? Then he explained it, and those were things that hadn't yet made it into print either. So there were additional contextual factors. It's a fabulous example of exactly what we're talking about. It's also a fabulous example of how what starts out as context, in terms of trying to understand issues around implementation and where it's more or less effective, can then get incorporated into the mechanism of the intervention. So as you learn something about how to make it more effective, you then incorporate that as part of the intervention in sort of a second generation or a third generation as it gets rolled out.
RW: What advice do you then give to the junior researcher who has been pushed by his or her mentors to isolate a single intervention and make it as pure as possible? At the same time, you want to try to influence these contexts in a direction that makes the intervention most likely to succeed. It seems like there's a real tension there.
PS: For many patient safety practices, we need to move away from the idea that we can isolate a single intervention and hold it static throughout the period of the evaluation. That situation is both artificial and probably not even helpful. What most distinguishes patient safety practice research and quality improvement research from their clinical research counterparts is that, in clinical research, we don't expect the intervention to change as a result of the patient. So when we give somebody an ACE inhibitor, statin, or beta-blocker, we don't expect the statin to undergo a metamorphosis during the clinical trial as a result of the patient himself or herself. Whereas, in patient safety practices and in quality improvement in general, we expect that to happen, and we expect it to be necessary in order to achieve the intervention's maximal effectiveness. The junior investigator you just posited would be wise to recognize that the dynamic nature of the intervention and of its target is something both desirable and to be promoted. However, it has to be described fully and transparently and in the context of the theory or logic model for why it should work, so that you can then document the changes that were necessary to have it achieve its maximal success. Because what's really going to be most interesting to people who are reading about it is knowing how you made it work.