# PT on ICE Daily Show – Thursday, November 9th, 2023 – Clinically relevant statistics: the forest plot

In today’s episode of the PT on ICE Daily Show, ICE faculty member Christina Prevett emphasizes the crucial role of understanding statistics in making clinically relevant decisions. While staying up to date with the literature and being evidence-based are often emphasized in healthcare, Christina points out that it is not enough if one lacks the ability to comprehend the meaning of statistics and their application in a clinical setting.

Christina acknowledges that interpreting statistics can be challenging, even for individuals with a PhD and experience in the field. This understanding leads the host to empathize with clinicians who may find statistics intimidating. It is recognized that being evidence-informed and evidence-based requires clinicians to possess the skills to understand and interpret the data they encounter.

To make statistics more clinically relevant, Christina suggests utilizing systematic reviews and meta-analyses as tools for interpretation. Specifically, she delves into the interpretation of a forest plot, which graphically represents the results of a meta-analysis. By understanding how to interpret and analyze the data presented in systematic reviews and meta-analyses, clinicians can determine if the findings are significant enough to drive changes in their practice.

Christina also highlights the importance of considering clinical relevance when interpreting statistical findings. The concept of the minimum clinically important difference (MCID) is introduced, which refers to the smallest change in an outcome measure that is considered clinically meaningful. An example is given of a statistically significant improvement in a timed up-and-go (TUG) test, but it is explained that it may not be clinically relevant if it does not meet the MCID for the TUG.


Take a listen to the podcast episode or read the full transcription below.

If you’re looking to learn more about courses designed to start your own practice, check out our Brick by Brick practice management course or our online physical therapy courses. Check out our entire list of continuing education courses for physical therapy, including our physical therapy certifications, by visiting our website. Don’t forget about all of our FREE eBooks, prebuilt workshops, free CEUs, and other physical therapy continuing education on our Resources tab.


Hey everyone, this is Alan, Chief Operating Officer here at ICE. Before we get started with today’s episode, I want to talk to you about VersaLifts. Today’s episode is brought to you by VersaLifts. Best known for their heel lift shoe inserts, VersaLifts has been a leading innovator in bringing simple but highly effective rehab tools to the market. If you have clients with stiff ankles, Achilles tendinopathy, or skeletal structure limitations keeping them from squatting with proper form and good depth, a little heel lift can make a huge difference. VersaLifts heel lifts are available in three different sizes, and all of them add an additional half inch of heel drop to any training shoe, helping athletes squat deeper with better form. Visit www.vlifts.com/icephysio or click the link in today’s show notes to get your VersaLifts today.

Good morning everybody and welcome to the PT on ICE daily show. My name is Christina Prevett. I am one of the lead faculty in our geriatric and pelvic health divisions. So usually you’re seeing me on Monday and Wednesday, but today I’m putting on my PhD research hat to talk a little bit about statistics, which I know sounds really boring, but I promise I’m gonna make it really exciting. But before we do that, we have a couple of courses that are coming up across our divisions. So MMOA is in Wappingers Falls, NY this weekend. Extremity Management is on the road in Woodstock, Georgia. And Cervical Spine is heading to Bridgewater, Massachusetts. And so if you are looking to get in some Con Ed before the end of the year, we still have a couple of opportunities across all of our different divisions, and I encourage you to go to ptonice.com and take a look at some of those opportunities. Okay, so a little bit about my hat outside of working with ICE is that I recently finished my PhD at McMaster University at the end of this year. I just announced that I’m doing a part-time postdoctoral fellowship at the University of Alberta looking at resistance training and its interaction with pregnancy and pelvic floor function.

What that means is that I am bumping into statistics all the time. And I’m going to start this off by saying that I’ve been asked to do some webinars and things around statistics for the ICE crew for a while. And to be honest, it’s been really intimidating for me to do that, despite the fact that I have a PhD and I’m interacting with this stuff all the time. Statistics is hard, and discussing statistics in a way that makes sense is also challenging. And when I reflect on the fact that I sometimes feel uncomfortable with interpretation, even though I did a part-time PhD for seven years and I’m in a postdoctoral position, I recognize how challenging it can be for clinicians. We get told all the time to stay evidence-informed, that it’s important to be evidence-based, that it’s important to stay up to date with the literature. But your ability to stay up to date with the literature is only as good as your capacity to understand what it is trying to tell you. And I mean that in the best way possible: it is so tough for us to translate what the statistics mean into what is clinically relevant for us to understand and be able to bring into our clinics. So today I’m trying to take our statistics and make them clinically relevant to you.

One of the first ways that I want to do that, and if you like this type of podcast please let me know and I’ll do more, is around the systematic review and meta-analysis, and then trying to deep dive into interpreting a forest plot. So when we’re thinking about a systematic review, this is the highest level of evidence when we have a systematic review of intervention or prospective studies. When we do a systematic review, we ask a very specific question. And I’m going to use an example: I’m working on a systematic review right now on resistance training and pregnancy, and I’m going to take some of that to make this relevant to how this happens. This is where we’re trying to get an idea of the state of the literature. So we use a PICO format. The P is the population that we’re trying to look at; in this case, it’s individuals who are pregnant. The intervention is the exposure you are trying to evaluate for a positive or negative effect, and that for me is resistance training. The comparison group is usual obstetrical care. And then the outcomes: we are looking at fetal, delivery, pregnancy, and pelvic floor-related outcomes. So we’re looking at the influence of resistance training on the incidence of gestational hypertension and preeclampsia, gestational diabetes, rates of cesarean section, and the size of babies: are babies more likely to be too big or too small? What does their birth weight look like? How long are they pregnant? And then are they at increased risk for things like urinary incontinence, pelvic organ prolapse, diastasis recti, or pelvic girdle pain? So that’s the format of a systematic review: we’re trying to answer a very specific question. From there, we go to the literature and we want to make sure that we encompass as much literature as we can in our search strategy.
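Stepping back to the PICO format for a moment, the question described above could be sketched as structured data. This is purely illustrative; the class is not part of any review methodology, and the field values come from the episode’s example:

```python
from dataclasses import dataclass

# Illustrative only: a PICO question represented as a simple data structure.
@dataclass
class PICOQuestion:
    population: str
    intervention: str
    comparison: str
    outcomes: list

question = PICOQuestion(
    population="pregnant individuals",
    intervention="resistance training",
    comparison="usual obstetrical care",
    outcomes=["gestational hypertension", "preeclampsia", "gestational diabetes",
              "cesarean section", "birth weight", "urinary incontinence",
              "pelvic organ prolapse", "diastasis recti", "pelvic girdle pain"],
)
```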
So that is usually why you’ll see a list of PubMed and OVID, CINAHL, SPORTDiscus, these types of big searching platforms that are looked at. And then you’re going to get a PRISMA flow diagram in the first figure, and that describes a person’s search strategy. How many hits were given when this search was done? How many were excluded because of duplicates? How many were excluded at the title and abstract stage because they were done in rats instead of in humans, or because they were looking at an acute effect of resistance training versus a resistance training program? You’re going to have a lot of those that are excluded. And then you’re going to have what is included in your systematic review, and then what is included in your meta-analysis, if a meta-analysis is indicated or possible. When we’re looking at a systematic review, we’re looking at a qualitative synthesis. And what we mean by that is that we’re trying to figure out where the state of the literature is. When I’m reporting on the systematic review portion of a paper, you’re seeing things like: how many studies were done in resistance training in pregnancy? How long were those interventions? Were they done in the same cohort of individuals? How many of them were statistically significant? What was the dosage of that intervention? Those are things that come under the systematic review umbrella. But I would say really now the emphasis is being placed on the meta-analysis, and that is the quantitative combination of these studies, and that is what gives us this forest plot. So when we are going through and doing a meta-analysis, there are a couple of things that we need to make decisions on very early on. The first thing is on a random or a fixed effects model.
This is kind of getting into the weeds, but almost all papers are going to be a random effects model, which means that we’re going to expect some variability in the population that we are working with, and we’re going to account for that variability in the calculations that we’re using for our forest plot.
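As a sketch of what a random-effects model is doing under the hood, here is a minimal DerSimonian-Laird pooling in Python. This is one standard estimator, not necessarily the exact software any given review used, and the per-study effects and variances below are made up for illustration:

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling (a common, simplified sketch)."""
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    pooled_fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (y - pooled_fixed) ** 2 for wi, y in zip(w, effects))
    k = len(effects)
    # Between-study variance tau^2, floored at zero
    tau2 = max(0.0, (q - (k - 1)) / (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))
    # Random-effects weights include both within- and between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical per-study mean differences (e.g. seconds) and their variances
pooled, ci = random_effects_pool([-2.0, -3.5, -1.0], [0.5, 0.8, 0.4])
```

Because tau-squared widens the weights' denominators, the random-effects confidence interval is generally wider than a fixed-effect one, which is exactly the extra caution the model buys us when studies vary.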

The second thing that we are looking at is a priori subgroup analysis. And so I’m going to use my research study to describe this. Before going into this meta-analysis and putting this forest plot together, we have to brainstorm around where possible sources of skew or bias would come into a forest plot. For example, in the resistance training intervention, it would be very different when we have resistance training in isolation versus resistance training as a component of a multi-component program. And so one of the a priori subgroup analyses we discussed was that we were going to subgroup studies that were only resistance training, compared to our big meta-analysis, which included resistance training in isolation or as part of a multi-pronged program. Another example in our systematic review is that some of our studies were on individuals at low risk at inception into the papers, versus those that were brought into the study because they were diagnosed with a complication like gestational diabetes. We could think that the influence of resistance training on a person who has not been diagnosed with gestational diabetes versus those who have could be different. And so we did a secondary subgroup analysis where we looked at the differences between studies that looked at only individuals with gestational diabetes versus those that didn’t. And so when you are looking at a forest plot, you will see the big analysis at the top, including all of the different studies. And then after that, you will see different subgroups, where there’s a repeat of what was in the main group, but it’s a subsection of the included studies. And then we try to see: is resistance training in isolation positively associated with a benefit versus multi-component programs, or is there no difference? That gives us a lot of information too. So that’s that subgroup analysis.
Then you go into the results of the paper and there is a forest plot, and this forest plot has a bunch of different names of studies. It has the total number of incidences and the weight. It has a confidence interval with a number around it. And then on the right-hand side, there are dots with lots of lines, and then a big thick dot at the bottom. I’m trying to explain this to our podcast listeners so that you can picture it, and I hope you’re thinking of a study in your mind that you have seen in the past. But we’re going to explain each of these different things. Okay, so what we are trying to find is going to depend on whether we are looking at a dichotomous variable, like did gestational hypertension get diagnosed or not. If it is a dichotomous variable, what we’re looking at is an odds ratio with a 95% confidence interval. So if no difference between usual care and resistance training is one, then a reduction in risk for gestational hypertension with resistance training would be an odds ratio that is less than one. When it is less than one, it becomes statistically significant when the 95% confidence interval encompasses only numbers less than one. Say, for example, our odds ratio is 0.8: we can say that there is a 20% reduction in risk (one minus 0.8) of getting gestational hypertension with resistance training. I’m making these numbers up. But that is only statistically significant if the confidence interval is, say, 0.7 to 0.9. Then we can say there’s a statistically significant reduction in risk for gestational hypertension with resistance training in this systematic review and meta-analysis. Where we cannot say it’s statistically significant is if the odds ratio is 0.8 and the 95% confidence interval is 0.6 to 1.2.
That crossing of one means that there is a higher likelihood that the variation is because of chance and not because of a true difference. And so what you see is that the odds ratios from the individual studies are then pooled in that bolded line at the bottom of the forest plot, to give us the confidence that we have, based on all of the studies combined, that there is a true effect of resistance training (in this example) on gestational hypertension.
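To make the odds-ratio arithmetic concrete, here is a minimal sketch in Python using the standard Wald 95% confidence interval from a 2x2 table. The event counts are invented for illustration, not taken from any real study:

```python
import math

def odds_ratio_ci(events_tx, n_tx, events_ctrl, n_ctrl):
    """Odds ratio with a Wald 95% CI from 2x2 counts (standard textbook formula)."""
    a, b = events_tx, n_tx - events_tx        # treatment arm: events / non-events
    c, d = events_ctrl, n_ctrl - events_ctrl  # control arm: events / non-events
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of log odds ratio
    lo = math.exp(math.log(or_) - 1.96 * se_log)
    hi = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lo, hi

# Hypothetical: 15/200 events with resistance training vs 30/200 with usual care
or_, lo, hi = odds_ratio_ci(15, 200, 30, 200)
significant = not (lo <= 1.0 <= hi)  # significant only if the CI excludes 1
```

With these made-up counts the interval sits entirely below one, so the reduction in risk would read as statistically significant; widen the interval until it crosses one and the same point estimate stops being significant.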

The other kind of statistic that we’re looking at is the I-squared statistic, or the amount of heterogeneity. So when you’re looking at that forest plot and you’re seeing all the dots and those lines, the heterogeneity is basically saying: how close are those dots? How much spread is there in those dots? And so if the heterogeneity is low, we can say that not only did we have a statistically significant result, but across all of the studies, we tended to see a trend in the same direction. So it allows us to have more strength and confidence in the results that we are getting. If we see a high amount of heterogeneity, so there are some studies that are really favoring control and saying that resistance training is bad for gestational hypertension, and then some showing really positive effects of resistance training on gestational hypertension, that I-squared statistic would be high, and then we would probably have to do more evaluation. That’s where we would rely really heavily on the subgroups and say: well, are there certain subpopulations of this group that are skewing the data one way or the other, where their results may be different than the results of other individuals? And so that gives us a bit more information. So the odds ratio is when we’re looking at the presence of an event, and it’s a binary variable of yes, this outcome occurred, or no, it didn’t. When we are looking at continuous variables, like a time on an outcome measure, such as the timed up-and-go (TUG), we are looking at a mean difference score between resistance training and a control. So the mean difference is going to be in the units of the outcome measure that we are looking at. So for the TUG it would be seconds.
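Circling back to heterogeneity for a moment: the I-squared statistic can be computed from Cochran’s Q, as the share of total variation across studies that is due to heterogeneity rather than chance. A minimal sketch with made-up study effects, contrasting a tight cluster of dots with studies pointing in opposite directions:

```python
def i_squared(effects, variances):
    """I^2 from Cochran's Q (Higgins & Thompson): percentage of variation
    across studies due to heterogeneity rather than chance."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Tight cluster of (made-up) study effects -> low I^2
low = i_squared([-2.0, -2.1, -1.9], [0.5, 0.5, 0.5])
# Studies pointing in opposite directions -> high I^2
high = i_squared([-3.0, 0.5, 2.5], [0.2, 0.2, 0.2])
```

A common rule of thumb reads roughly 0-40% as possibly unimportant heterogeneity and 75%+ as considerable, which is when the subgroup analyses described above become especially important.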
So then from the pooled result, a mean difference of minus six seconds would be in favor of resistance training, meaning your TUG score is six seconds less in a resistance training arm than a control arm; if it goes against resistance training, it would be plus six. And again, we’re looking at that 95% confidence interval. That average, that mean difference, is also something that we would weigh against our clinically relevant difference. So we may see something that’s statistically significant at a two-second improvement, but we know that the MCID for the TUG is four seconds. So while yes, it’s statistically significant, it may not be a clinically relevant finding. That’s where we build in clinical relevance. And then again, we look at that 95% confidence interval, see what that spread looks like, and look at that I-squared statistic. Where it gets a little bit more complicated is when we have studies that are measuring the same thing, but measuring it in a different way. An example in the systematic review that I did on resistance training and lower extremity strength is that there are a lot of different ways for us to measure lower extremity strength. Some people may use an estimated one rep max. Some people may use a five-time sit-to-stand as a proxy for functional strength. Some people may use a dynamometer for knee extensor strength. There are a lot of different ways for us to do that. We can still do a meta-analysis on this, but what we have to do is transform all of those variables into one type of measure. And that’s when we would see something called a standardized mean difference, an SMD. In that SMD, we’re essentially taking all these different types of measurements that are telling us the same information and putting them into an effect size.
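The statistical-versus-clinical filter from the TUG example can be sketched as a simple check. The 4-second MCID mirrors the episode’s example; the other numbers are illustrative:

```python
def interpret_mean_difference(md, ci_low, ci_high, mcid):
    """Separate statistical significance (the 95% CI excludes 0) from
    clinical relevance (the effect is at least as large as the MCID)."""
    statistically_significant = not (ci_low <= 0.0 <= ci_high)
    clinically_relevant = statistically_significant and abs(md) >= mcid
    return statistically_significant, clinically_relevant

# A 2-second TUG improvement with CI (-3, -1): significant, but below the
# 4-second MCID used in the example above
stat, clin = interpret_mean_difference(-2.0, -3.0, -1.0, mcid=4.0)
```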
And so the effect size gives us the amount of confidence that we can have in the influence of the intervention, resistance training, on the outcome of lower extremity strength. So for an effect size using Cohen’s d statistic, less than 0.2 is no effect, 0.2 to 0.5 is a small effect, 0.5 to 0.8 is a moderate effect, and 0.8 and above is a large effect. And so in my systematic review on lower extremity strength and resistance training in individuals with mobility disability, we saw a standardized mean difference of 3, which means that we can be really confident there was a large influence of resistance training on the development of lower extremity strength. So, pulling this all together, I know I threw a lot at you. When you are looking at the forest plot, you are looking at trends in the data that pool all of the different intervention studies looking at the same construct and the same outcome. When we are looking at the odds ratio, this is a binary variable. There’s going to be a 95% confidence interval, and the pooled odds ratio that we look at with respect to making decisions is that bolded number at the bottom. Our I-squared statistic gives us an idea of the spread of the data and the results that we see. When we are looking at continuous variables, you’re going to see either a mean difference or a standardized mean difference. The mean difference is reported in the units of the outcome measure that we’re talking about, so it could be seconds, it could be points. A standardized mean difference is an effect size where we are transforming multiple different outcome measures into one output that pools these things together, in a standardized metric that looks at the magnitude of the effect on that outcome. So how do we think about this clinically? Well, the first thing is that we need to understand where these effect sizes are and if they are significant.
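A standardized mean difference like Cohen’s d is just the between-group difference divided by a pooled standard deviation. A minimal sketch with hypothetical strength data (note that many meta-analyses report the small-sample-corrected Hedges’ g instead):

```python
import math

def cohens_d(mean_tx, sd_tx, n_tx, mean_ctrl, sd_ctrl, n_ctrl):
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_sd = math.sqrt(((n_tx - 1) * sd_tx**2 + (n_ctrl - 1) * sd_ctrl**2)
                          / (n_tx + n_ctrl - 2))
    return (mean_tx - mean_ctrl) / pooled_sd

def label(d):
    """Conventional Cohen (1988) benchmarks for effect-size magnitude."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "moderate"
    return "large"

# Hypothetical leg-strength outcomes (kg): resistance training vs control
d = cohens_d(52.0, 8.0, 30, 45.0, 8.0, 30)
```

Because d is unitless, one-rep-max, sit-to-stand, and dynamometry studies can all be expressed on this same scale before pooling, which is what makes the SMD approach work.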
And then we have to put it through the filter of: is this clinically relevant? When we have something that isn’t statistically significant, the next thing to do is go into the methods and ask: was this dose appropriate? Was this done in the way that I would do it? And can I be confident that the alignment between what I would do in the clinic and what was done in these studies is strong enough for me to drive changes in my practice? All right, I hope you found that helpful. I’m at 18 minutes; I knew I would run long. But if you have any other questions about statistics and how to interpret them, please let me know. It’s really important that we know how to understand the data that we’re being presented with, because that’s how we’re gonna change our clinical decisions based on what we are seeing. All right, have a wonderful afternoon, everyone. I hope I didn’t stress your brain out by talking about math too much, and hopefully this was helpful and we can do it again sometime.

Hey, thanks for tuning in to the PT on Ice daily show. If you enjoyed this content, head on over to iTunes and leave us a review, and be sure to check us out on Facebook and Instagram at the Institute of Clinical Excellence. If you’re interested in getting plugged into more ice content on a weekly basis while earning CEUs from home, check out our virtual ice online mentorship program at ptonice.com. While you’re there, sign up for our Hump Day Hustling newsletter for a free email every Wednesday morning with our top five research articles and social media posts that we think are worth reading. Head over to ptonice.com and scroll to the bottom of the page to sign up.