A "type aloud" as I review a meta-analysis of digital game-based learning

For this, my first post on digital learning research, I thought I’d do a “type aloud” (a term I just made up…I think). In my research I often ask participants to “think aloud” as they work on something, in the hope of revealing not only what they are doing but what they are thinking as they do it. It can be a useful way to surface thinking that would otherwise be difficult to observe, such as “I was reading that paragraph and thought I understood it, but then I realized I was confused, so I started over.” That realization would be tough to observe or infer just from seeing someone start a paragraph over again (if you could even spot that just by watching them). So I’m hoping that thinking aloud while reading an article (i.e., “typing aloud”) might be a useful way to model how I understand and analyze research. :)

So, I’m going to jot down some thoughts as I read Barz et al. (in press) “The Effect of Digital Game-Based Learning Interventions on Cognitive, Metacognitive, and Affective-Motivational Learning Outcomes in School: A Meta-Analysis.”

  • Okay, let’s open up the pdf. Ah, it’s a meta-analysis. Meta-analyses can be really great because they synthesize a bunch of empirical work and give you the “big picture” story. But, they are susceptible to the “garbage in, garbage out” (GIGO) rule - if the empirical work on a topic is lousy or if there isn’t enough of it yet, then the synthesis of it will be lousy, too. I gotta watch for that.
  • Where was this published? Oh, it’s in Review of Educational Research. That’s one of the best journals in education, with really rigorous review processes. So, I’m a little less concerned about GIGO, but still want to watch for it.
  • In the abstract, the authors wrote that it would be useful to meta-analyze only studies published more recently, rather than all the ones ever published, and to look at just those conducted in school contexts rather than other contexts, because the results would be more likely to transfer. I’m not sure I buy that - including all the studies and adding moderators for year of publication and context would seem to give a better sense of what to expect when applying the findings to new research or practice. (Moderators are variables that we hypothesize might change how effective the intervention is. A simple, silly example would be a learning intervention that is really effective for tall people but not at all effective for short people. In that case, the average effect size would be somewhat misleading, because that average isn’t accurate for anyone, really - there’s a moderator [height] that changes how the intervention affects people. There’s a quick numerical sketch of this idea just after this list.)
  • Digital game interventions often suffer from the lack of a good comparison group (i.e., a digital game compared to what alternative kind of instruction?). Digital games are good for some topics and some learning outcomes, but not others. Likewise, other kinds of instruction (face-to-face, simulations, etc.) are good for some topics and outcomes but not others. So, if you choose a topic or outcome that is best addressed via one kind of instruction (e.g., teaching the control-of-variables strategy using a digital game vs. face-to-face instruction) and not another, then your comparison is confounded: it’s not necessarily that digital games are better than face-to-face instruction, but rather that, for that topic, digital games might be better. So the comparison groups are another thing to watch for.
  • Okay, in the introduction the authors argued that digital games have really changed in quality since 2015, justifying their focus. I suppose that’s reasonable, but again, I feel like a moderator analysis with year of publication (even if it’s just pre- and post-2015) would be a better approach.
  • The authors did identify a number of potentially confounding variables to include as moderators, like the age of the participants (e.g., maybe digital games work better for older students), type of digital game (e.g., serious game vs. interactive simulation), number of sessions with the game, learning domain (verbal vs. scientific vs. mathematical), and many others including, surprisingly, year of publication (but just 2015 on). There were a bunch of moderators in there that I was less interested in (e.g., did the game include an avatar, which in theory might increase students’ identification with the game?). I was just less convinced those factors matter much for learning outcomes, and I’m not aware of a strong theory that would support them as being important.
  • The authors looked at learning outcomes, cognitive outcomes, metacognitive outcomes, and affective-motivational outcomes. It’s funny, but those are ordered by increasing difficulty of measurement: it’s tough to measure learning well, tougher to measure cognition, tougher still to measure metacognition, and toughest of all to measure affect and motivation (maybe I’d flip those last two in terms of difficulty, but they’re both really tough). Measurement validity is a huge issue for meta-analysis - again, GIGO. If the outcomes aren’t measured well, the whole synthesis falls apart.
  • Ah, the authors mentioned GIGO when justifying their decision to exclude studies that were not quasi-experimental (pre-post control design) or experimental in nature. They also excluded “pseudo-treatments” that they thought were bogus, like reading a website. Okay, that’s good. I would argue non-experimental work has its place, but I agree it can be tough to include in a meta-analysis like this.
  • Looks like they searched a lot of high-quality databases, which is good.
  • They had at least two people independently code each study. They had decent but not great interrater reliability for coding each study on the moderators. For example, coders often disagreed on the moderator “visual realism,” and when they did, a third coder was brought in to break the tie. That got the reliability up, but it’s still a concern. Again, if the coding of the studies isn’t consistent or clear, then you have another GIGO factor. (For a sense of what interrater agreement numbers can look like, there’s a small sketch after this list.)
  • The stats look decent, and they are sufficiently complex that I won’t write about them here. I didn’t see whether any studies contributed multiple effect sizes, which requires modern techniques (e.g., multilevel models or robust variance estimation) to handle properly. (Later, in the Discussion, the authors noted as a limitation that they used the mean of the effect sizes when a study reported two or more. Simply averaging dependent effect sizes isn’t in line with modern approaches to this issue, so that was a concern.)
  • The average effect size for digital game interventions was “medium” sized by most rule-of-thumb metrics, but those rules of thumb are often way off when considered in context. The same value that would be a small effect size for diagnosing cancer, relative to other methods, might be a huge effect size for improving students’ critical thinking scores, relative to other methods. Context matters. But, in terms of education interventions, I think “medium” is a decent description of the average effect size they found.
  • There’s lots of variance around that average effect size, and it doesn’t seem to be just noise - it looks systematic. So, some of those moderators likely matter. (There’s a sketch after this list of how meta-analysts quantify that kind of heterogeneity.)
  • The metacognitive outcomes meta-analysis only had 5 studies in it. I doubt that’s enough to have confidence in the findings. I like to see 20 or more studies per meta-analysis. (The authors said much the same thing in their limitations section, later.)
  • As expected, digital game interventions had the largest effects on learning and cognitive outcomes, rather than metacognition or affective-motivation outcomes.
  • Very surprisingly, the authors found no evidence that their moderators explained the variance in effect sizes. But the high level of variance means SOMETHING is causing the effect sizes to vary, which makes me less confident that the average effect size is useful at the moment - there’s a lot of variance around it that we don’t understand yet. That’s not the authors’ fault: they looked at a bunch of moderators to try to explain that variance and didn’t find anything useful. Maybe that means we need more studies, or maybe we haven’t yet identified the relevant sources of variance. More theory needed, perhaps?
  • I think the authors went a bit too far in their Implications and Future Directions section. They wrote “the current study strengthens the assumption that DGBL interventions are also effective for pupils’ learning, especially for cognitive and affective-motivational outcomes” (p. 27). I’d be more conservative given the unexplained variance in effect sizes, the lack of detail regarding the quality of measurement of the outcomes, and the lack of info on the kinds of comparison groups the studies used. I think it’s fair to say “digital games show some promise” but “are also effective” is too far for me, personally.
  • The authors were appropriately circumspect in other ways, though, acknowledging limitations regarding why digital games are effective, for whom, what kinds of personalization are needed, etc.
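
As promised above, here’s a quick numerical sketch of the moderator point, using the silly height example from the abstract bullet. Everything below is made up purely for illustration - none of these numbers come from Barz et al.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up scenario: the intervention helps "tall" students a lot (true effect ~0.8 SD)
# and "short" students not at all (true effect ~0.0 SD). Height is the moderator.
n = 200
tall_gains = rng.normal(loc=0.8, scale=1.0, size=n)
short_gains = rng.normal(loc=0.0, scale=1.0, size=n)

everyone = np.concatenate([tall_gains, short_gains])

print(f"Tall students' mean gain:  {tall_gains.mean():.2f}")
print(f"Short students' mean gain: {short_gains.mean():.2f}")
print(f"Overall average gain:      {everyone.mean():.2f}")
# The overall average (~0.4) describes neither group well, which is exactly why
# ignoring a real moderator can make the "average effect size" misleading.
```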
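
On the interrater reliability bullet: I don’t know which agreement statistic the authors actually used, but Cohen’s kappa is a common choice for two coders, so here’s a minimal sketch (with codes I invented) of what “decent but not great” agreement can look like.

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning categorical codes to the same items."""
    rater1, rater2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(rater1, rater2)

    # Observed agreement: proportion of items where the two coders match.
    p_observed = np.mean(rater1 == rater2)

    # Agreement expected by chance, from each coder's marginal proportions.
    p_expected = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)

    return (p_observed - p_expected) / (1 - p_expected)

# Invented codes for a "visual realism"-style moderator (low / medium / high).
coder_a = ["low", "low", "high", "medium", "high", "low", "medium", "high", "low", "medium"]
coder_b = ["low", "medium", "high", "medium", "high", "low", "low", "high", "low", "high"]

print(f"Cohen's kappa: {cohens_kappa(coder_a, coder_b):.2f}")  # ~0.55, i.e., "moderate" agreement
```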
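
And here’s the heterogeneity sketch. This is the standard DerSimonian-Laird random-effects calculation run on effect sizes I invented (again, not the paper’s data): the Q statistic, tau², and I² are how meta-analysts check whether the spread around the pooled average looks like more than sampling noise. Note that it assumes one independent effect per study - properly handling several effect sizes from the same study is exactly the multilevel / robust-variance issue I flagged in the stats bullet above.

```python
import numpy as np

# Invented per-study standardized mean differences (d) and their sampling
# variances (v) -- NOT the paper's data, purely to illustrate the calculation.
d = np.array([0.15, 0.80, 0.45, 1.10, 0.05, 0.60, 0.35, 0.90])
v = np.array([0.04, 0.06, 0.03, 0.08, 0.05, 0.04, 0.03, 0.07])

k = len(d)
w = 1 / v                                # fixed-effect (inverse-variance) weights
theta_fe = np.sum(w * d) / np.sum(w)     # fixed-effect pooled estimate

# Cochran's Q: is the spread around the pooled estimate more than sampling noise?
Q = np.sum(w * (d - theta_fe) ** 2)
df = k - 1

# DerSimonian-Laird estimate of tau^2 (between-study variance) and I^2.
C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)
I2 = max(0.0, (Q - df) / Q) * 100

# Random-effects pooling: re-weight by total (within-study + between-study) variance.
w_re = 1 / (v + tau2)
theta_re = np.sum(w_re * d) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))

print(f"Pooled d (random effects): {theta_re:.2f} (SE {se_re:.2f})")
print(f"Q = {Q:.1f} on {df} df, tau^2 = {tau2:.3f}, I^2 = {I2:.0f}%")
# A large I^2 means most of the observed spread is real between-study heterogeneity,
# i.e., the single "average effect" hides systematic variation (moderators we haven't found).
```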

So, in sum, I’d say this meta-analysis was pretty good but limited in its implications, due mostly to the lack of sufficient empirical research to meta-analyze. It certainly supports continuing to study digital games for learning. But I wouldn’t make strong claims about the efficacy of digital games based on this meta-analysis. We need more empirical research on digital game interventions. That’s one thing I worry about nowadays - there are A LOT of meta-analyses being published, which is good as long as there is a sufficiently large and comprehensive empirical research base to be synthesized. But, for many areas of scholarship, I think authors’ time would be better spent conducting empirical studies rather than meta-analyzing the small number of studies that exist.

Music played while I wrote this: https://music.apple.com/us/album/1459320797