Dr. Bob's Cog Blog: Forgetting


Tuesday, April 20, 2021

Minding the Gap: Connecting teachers and students to learning science

Editorial Note. We are in for a real treat! We have a guest post by Josh Ling, the CEO and founder of Podsie. I'm a fan of Podsie because it is one of the rare ed-tech companies that takes learning science seriously and attempts to fix a rather significant problem in learning and teaching. Take it away, Josh!

Learning By Doing

Let’s start by doing a challenge. (Full disclosure: if you end up not finding this exercise challenging, it’s because I typically do this with middle-school students!)

Here we go:

Google a map of North Africa. Then, study the country that’s to the west of Egypt for 10 seconds. Next, study the country that’s south of Egypt for 10 seconds.
Close that tab!
Count to 20.
Now, test yourself. Are you able to recall the names of the countries to the west and to the south of Egypt?

Given the short duration of time that passed between you learning the names and then being quizzed, you were likely successful in getting them both right! However, what if I asked you to retrieve that same information again in 3 hours? How about tomorrow? Or in a week? Or in a month?

“Hoping to find some long forgotten words…” (Toto, “Africa”)

At this point, you might have realized that we’re revisiting some topics that have previously been discussed on this blog: forgetting and how one might combat it.

In those posts, Dr. Bob explained:

“Forgetting is non-linear, meaning it decays quickly and eventually slows down.”

He also plotted a forgetting curve showing how quickly we forget newly learned information.

The good news is that we can combat this forgetting through retrieval practice, where recalling that information from memory strengthens the stickiness of that information and slows the rate of forgetting.

To take it one step further, those blog posts referenced an article from Duolingo’s blog that expanded on the optimal cadence for retrieval practice. Duolingo applies the spacing effect [1] and uses its vast amount of learner data to model when each student should review a given vocabulary word, based on that student’s prior performance [2].

Research shows that the optimal time to retrieve information is right when you’re about to forget it [3]. Because each retrieval practice should increase the durability of that memory, the retrieval practice is spaced out with longer and longer lag times in between each session. Practically, this has the positive effect of increasing studying efficiency because at any given moment, you can focus only on the subset of content you’re most about to forget.
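To make the expanding-gap idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that recall decays exponentially with a "stability" measured in days, that a review is scheduled at the moment predicted recall would fall to a chosen threshold, and that each successful review multiplies stability by a fixed growth factor. The function name `review_days` and every parameter value are hypothetical; none of them come from Landauer and Bjork or from Duolingo.

```python
import math


def review_days(stability: float = 1.0, growth: float = 2.5,
                threshold: float = 0.8, n_reviews: int = 5) -> list[float]:
    """Return the (rounded) day on which each review should happen.

    Illustrative assumptions, not taken from the cited studies:
    - recall decays exponentially: p(t) = exp(-t / stability)
    - a review is scheduled when p(t) would drop to `threshold`
    - each successful review multiplies stability by `growth`
    """
    day = 0.0
    schedule = []
    for _ in range(n_reviews):
        # Solve exp(-gap / stability) = threshold for gap.
        gap = -stability * math.log(threshold)
        day += gap
        schedule.append(round(day, 1))
        stability *= growth  # each retrieval makes the memory more durable
    return schedule


print(review_days())
```

With these defaults the gap between consecutive reviews grows every session, which is the expanding schedule described above. Raising `threshold` makes reviews happen earlier and more often; lowering it stretches the gaps at the cost of more forgetting between sessions.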

The Classroom Connection

I taught 8th-grade math for two years, from 2013 to 2015.

As a first-year teacher, I was overly focused on just making it through the large amount of content mandated by our state curriculum. I gave little time or thought to review, and overall, my classroom looked a lot like the one that Dr. Bob described:

“The traditional method of teaching is to introduce a topic, solve a few illustrative problems that relate to that topic in class, assign some homework problems, and then give a test a few days or weeks later to see if the students retained the material. For highly important topics, the same items might make a reappearance on the final exam.”

Dr. Bob also describes the problem with this traditional approach:

“...if a topic hasn't been discussed in several weeks, then it is likely the memory system is going to treat that memory as unimportant, and it will find itself on the fast side of the forgetting curve. Second, if too much time elapses between the presentation and evaluation, then the probability of successful recall is going to be very low.”

I had never heard of the forgetting curve, but I saw it working in full force with my students.

In my second year, I resolved to provide more review opportunities. Overall, however, I was still completely ignorant of the basic principles of learning science. I didn’t know that retrieval was the most effective way to review. When I myself was a student, I was a serial crammer, so the spacing effect was basically foreign to me.

As a result, review in my classroom was often suboptimal. For example, I occasionally asked my students to just reread their notes. I would sometimes put questions covering older topics on students’ homework or quizzes, but certainly not with enough consistency to make it stick. To make matters worse, it was a massive challenge to figure out which topics most needed review. Some of my students needed to review fractions at certain points of the school year, while others really needed to review integer operations.

That year, my students performed better than the previous year’s class, but they continued to struggle to retain what they had learned.

A few years later

A couple of years later, I made a career change and became a software developer. Around that time, I read a book called Make It Stick by Roediger, McDaniel, and Brown [4], and I was blown away.

This book was the first time that I learned about the basics of learning science. The book went through the nuances of how we learn and retain information, and it provided definitive recommendations on research-backed practices that educators should be using in their classrooms, like retrieval and spacing.

My mind immediately flashed back to my classroom, where these best practices could have made a substantial difference in how my students learned. I also realized that this issue went beyond my own classroom. By that point, I had sat through countless hours of teacher professional development, and I also had a master’s degree in education. However, not once did we cover those basic cognitive science principles on how students learn.

Now

Those experiences were the inspiration for Podsie, a nonprofit edtech organization I co-founded that focuses on improving student learning and empowering teachers by making the science of learning more accessible.

At Podsie, we’ve built an online web app for teachers and students that makes best practices like spacing, retrieval, and interleaving easy to implement in the classroom. With Podsie, a teacher creates an assignment that assesses the content that students learned. When students complete a question on an assignment, the question goes into that student's personal deck, which essentially represents the entire body of knowledge that a student should know for that class.

Each student's personal deck is powered by a spacing algorithm that determines when the student should review a question again, similar to how Duolingo prompts students to review a vocab word when they are just about to forget it. Overall, this ensures that students have a personalized review experience that allows them to focus on concepts they most need to review.
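The post does not describe Podsie’s actual spacing algorithm, so what follows is only a hypothetical sketch of how a personal deck might choose which questions to serve. It borrows the half-life form from Duolingo’s half-life regression model (Settles & Meeder, 2016), in which predicted recall equals 2 raised to the power of minus the elapsed time divided by the item’s half-life, and it serves the questions with the lowest predicted recall. The `Card` class, the `pick_reviews` function, the question IDs, and the half-life values are all invented for illustration. In a real system, each half-life would have to be estimated from the student’s answer history.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One question in a student's personal deck (hypothetical structure)."""
    question_id: str
    last_review_day: float
    half_life_days: float  # time for predicted recall to fall by half

    def recall_probability(self, today: float) -> float:
        # Half-life form used in half-life regression: p = 2 ** (-elapsed / half_life)
        elapsed = today - self.last_review_day
        return 2.0 ** (-elapsed / self.half_life_days)


def pick_reviews(deck: list[Card], today: float, k: int) -> list[str]:
    """Return the IDs of the k questions the student is most likely to have forgotten."""
    ranked = sorted(deck, key=lambda card: card.recall_probability(today))
    return [card.question_id for card in ranked[:k]]


deck = [
    Card("fractions-1", last_review_day=0, half_life_days=2),
    Card("integers-3", last_review_day=5, half_life_days=10),
    Card("slope-2", last_review_day=8, half_life_days=1),
]
print(pick_reviews(deck, today=10, k=2))  # ['fractions-1', 'slope-2']
```

On day 10, the fractions question (predicted recall about 3%) and the slope question (25%) outrank the integer question (about 71%), so the student reviews only what they are closest to forgetting.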

All in all, we are incredibly excited to make it easier for teachers and students to utilize and learn more about learning science best practices. On the way, we’ve had the privilege of learning from and working with cognitive scientists like Dr. Bob who are on the same journey to ensure students and educators can be the best they can be.

We launched a beta trial of our app in August of 2020, and we’re preparing to launch in June of 2021. Our app is free for teachers and students, and if you’re interested, you can sign up on www.podsie.org to be notified as soon as we’re live!

Going Beyond the Information Given

[1] Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354.

[2] Figure 3 is taken from https://blog.duolingo.com/how-we-learn-how-you-learn/, which was originally published in their academic paper:

Settles, B., & Meeder, B. (2016, August). A trainable spaced repetition model for language learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1848-1858).

[3] Landauer, T. K., & Bjork, R. A. (1978). Optimal rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625-632). London: Academic Press.

[4] Brown, P. C., Roediger III, H. L., & McDaniel, M. A. (2014). Make it stick. Harvard University Press.

Tuesday, February 25, 2020

Fight the Power!: Retrieval Practice

Learning By Doing

Let's start with a handful of questions. Without looking back at any of the previous posts, try to answer the following questions:

What are the three processes involved in memory?
What is the shape of the forgetting curve?
How many items can be held in working memory at the same time?
What is the capacity of long-term memory?
Are there memories that we never forget?

The answers can be found at the end of this post [1].

Wait...what was I going to say?

Do you remember sliding down the memory curve? If not, it's okay. It’s been a while. Forgetting is a normal (and adaptive!) part of memory. Forgetting is non-linear, meaning it decays quickly and eventually slows down. If you plot it on a graph, then it might look something like this (see Fig. 1). The y-axis is the probability of successfully recalling a memory, and the x-axis is the amount of time that has elapsed since the last time you tried to recall that same memory.

Figure 1. An idealized forgetting graph.

Notice the shape of the graph. It resembles a power function. In fact, most mathematical models of forgetting follow a power function, $P = a t^{-b}$, where $P$ is the probability of accurately recalling an item, $t$ represents time, $a$ is a scaling constant (the recall probability at $t = 1$), and $b$ is the forgetting rate [2].
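A short Python sketch makes the decelerating shape of $P = a t^{-b}$ easy to see. The values a = 0.9 and b = 0.3 are arbitrary illustrative choices, not fitted to any data, and the curve is only evaluated for t >= 1 because $t^{-b}$ grows without bound as t approaches 0.

```python
def power_recall(t: float, a: float = 0.9, b: float = 0.3) -> float:
    """Power-law forgetting curve P = a * t**(-b).

    a: recall probability at t = 1 (illustrative value)
    b: forgetting rate (illustrative value)
    """
    if t < 1:
        raise ValueError("this sketch only evaluates the curve for t >= 1")
    return a * t ** (-b)


# Probability lost over a single time step, early versus late on the curve.
early_drop = power_recall(1) - power_recall(2)
late_drop = power_recall(10) - power_recall(11)
print(f"drop from t=1 to t=2:   {early_drop:.3f}")
print(f"drop from t=10 to t=11: {late_drop:.3f}")
```

The same one-step delay costs far more recall early on than it does later, which is exactly the "decays quickly and eventually slows down" shape described above. A larger b makes the early drop steeper.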

In another past post, we tried to address the question of why the forgetting curve looks like this. John Anderson and his colleague Lael Schooler put forth the argument that memory is adapted to our informational environment. We forget because the environment does not demand that we remember. Put another way, memory, and therefore forgetting, is a reflection of the environment. That's an interesting argument because it means we can structure the environment in such a way that guards against forgetting.

Inoculating Against Forgetting

If Anderson and Schooler's argument is accurate, what can we do to improve our memory? Burr Settles, Research Director at Duolingo, has an excellent suggestion. In his blog post, he suggests that we treat forgetting by administering little booster shots over time [3]. If you remember a vocabulary word accurately, then the system waits a longer time span than if you forget. If you forget, then the system asks you to recall that word more frequently. It's pretty ingenious, and it's an excellent example of using technology to solve a tricky educational problem.
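Settles’s booster-shot idea can be illustrated with a deliberately simple rule. This is not Duolingo’s actual model, which Settles and Meeder (2016) describe as a trainable half-life regression; it is a hypothetical multiply-or-shrink rule chosen only to show the asymmetry. A correct answer lengthens the wait before the next review, and a miss shortens it. The function name `next_interval` and the factors 2.0 and 0.5 are illustrative assumptions.

```python
def next_interval(current_days: float, recalled: bool,
                  grow: float = 2.0, shrink: float = 0.5,
                  min_days: float = 1.0) -> float:
    """Return how many days to wait before the next booster shot.

    Hypothetical rule: a successful recall multiplies the interval by `grow`;
    a failed recall multiplies it by `shrink`, so the item returns sooner.
    The result never drops below `min_days`.
    """
    factor = grow if recalled else shrink
    return max(min_days, current_days * factor)


# Simulate one vocabulary word over four review sessions.
interval = 1.0
history = []
for recalled in [True, True, False, True]:
    interval = next_interval(interval, recalled)
    history.append(interval)
print(history)  # [2.0, 4.0, 2.0, 4.0]
```

The third session is a miss, so the word comes back after 2 days instead of 8. The floor of `min_days` keeps a long run of misses from scheduling reviews more than once a day.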

The concept behind the recommendation is called retrieval practice. In other words, you give your students an opportunity to retrieve a word, concept, or fact from long-term memory. Merely attempting to recall an item ends up helping to boost that item's strength in memory. The critical component is that you try. If you fail, however, then you are going to need feedback (i.e., you need to see the item you were trying to recall). Retrieval practice has been shown to be more effective than rereading or reviewing the same material [4].

It seems weird, but that's how memory works. The very act of trying to recall something signals to the memory system that the item is important and needs to be remembered next time.

The S.T.E.M. Connection

How do we harness Dr. Settles's suggestion in a classroom environment, where specific items (such as words) are not tracked by a computer for each individual student? Is there a way to help teachers administer those memory booster shots to their students? 💉

The traditional method of teaching is to introduce a topic, solve a few illustrative problems that relate to that topic in class, assign some homework problems, and then give a test a few days or weeks later to see if the students retained the material. For highly important topics, the same items might make a reappearance on the final exam. Wouldn't the unit test and final exam count as a booster?

Depending on the time series, probably not. There are two potential problems. First, if a topic hasn't been discussed in several weeks, then it is likely the memory system is going to treat that memory as unimportant, and it will find itself on the fast side of the forgetting curve. Second, if too much time elapses between the presentation and evaluation, then the probability of successful recall is going to be very low.

There are a couple of ways to combat this situation. First, if you are an educator, and you are in complete control over the homework items assigned to your students, then you can "sneak" an old item into the current problem set. The problem, of course, is that if you do this too often, then your inoculation graph might look like this:

Figure 2. Spaced practice for multiple items with different decay rates.

As you can see, this can get really messy, really fast. One way to deal with that complexity is to schedule homework assignments where all of the problems are review items.

Second, if your domain has facts or skills that build on older ideas, then students will automatically receive practice on the foundational material. Math is a great example. Learning about ratios can help students understand slope, which then leads into solving linear equations. By exercising the more complex skills, such as solving linear equations, students receive practice on ratio reasoning.

I understand that implementing these suggestions is difficult because there are a lot of factors at play in the classroom, but I hope it is helpful to think about forgetting in terms of multiple, overlapping power functions. With that image in mind, we can keep giving doses of anti-forgetting shots [5].
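The picture of multiple, overlapping power functions can be turned into a small planning aid for a review-only homework set. The sketch below gives each topic its own hypothetical power curve, predicts current recall from the days since the topic was last practiced, and flags every topic that has fallen below a cutoff. The topic parameters, the 0.5 cutoff, and the function names are invented for illustration; a teacher would not normally have fitted curves for a class, so treat this as a way of thinking about the problem rather than a ready-made tool.

```python
# Hypothetical per-topic parameters: (a, b) = (recall after 1 day, forgetting rate).
TOPICS = {
    "ratios": (0.9, 0.2),
    "slope": (0.85, 0.4),
    "linear equations": (0.8, 0.6),
}


def recall(a: float, b: float, days: float) -> float:
    """Power-law forgetting curve P = a * days**(-b)."""
    return a * days ** (-b)


def review_set(days_since_practice: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the topics whose predicted recall has fallen below `threshold`."""
    due = []
    for topic, (a, b) in TOPICS.items():
        # Clamp to 1 day so the power function stays in its usable range.
        days = max(1.0, days_since_practice[topic])
        if recall(a, b, days) < threshold:
            due.append(topic)
    return due


print(review_set({"ratios": 5, "slope": 10, "linear equations": 1}))  # ['slope']
```

Because each topic has its own b, two topics practiced on the same day can cross the cutoff on very different days, which is why the curves in Figure 2 tangle so quickly.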

Share and Enjoy!

Dr. Bob

Going Beyond the Information Given

[1] Answers are: 1) encoding, storage, and retrieval; 2) it's a decelerating power curve; 3) between five and nine items; 4) extremely large; 5) there is evidence that we have permanent memories for some items.

[2] Of course, there is some debate about that. Two of my undergraduate professors argue that the empirically observed power law might be an artifact of averaging over multiple exponential functions. I know. Your mind is blown, right? Mine was too when I first heard their argument. All of the gory details can be found in: Anderson, R. B., & Tweney, R. D. (1997). Artifactual power curves in forgetting. Memory & Cognition, 25, 724–730.

[3] Settles, B. (2016, December 14). How we learn how you learn. Retrieved from https://making.duolingo.com/how-we-learn-how-you-learn.

[4] Roediger III, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20-27.

[5] If you've been following this blog, you might notice that booster shots show up every so often. This post is an attempt to boost your memory of the forgetting curve and the environmental factors that influence memory!

Thursday, August 6, 2015

Mirror, Mirror: Memory As a Reflection of the Environment

Take a few minutes to reflect on these questions:

Why is there a distinction between short-term memory and long-term memory?
Why does forgetting happen?
What are the environmental demands on my memory?
Why does a lot of forgetting happen initially, but then it tapers off?
What is the optimal amount of time that I need to spend studying to remember something?

Why do we forget?

In a previous post, we graphed the forgetting curve of Hermann Ebbinghaus's study of his own memory. For very short delays, his memory was very good. But as the delays got longer, his memory for his list of trigrams (e.g., "LEK") dropped precipitously. Then the accuracy for remembering the list leveled off at around 20%.

We also drew a forgetting curve for the recall of Spanish vocabulary words across an entire life span. The shape of the graph was surprisingly similar to the forgetting curve for the nonsense words used in Ebbinghaus's study. There was a large amount of forgetting initially, but then the percentage of recalled words leveled off at around 60%.

These two graphs are interesting in their own right, but you might be asking yourself the following questions: Why are these graphs shaped this way? Why are forgetting curves steep at first and then asymptote at some value? There must be a reason why memory works this way. Let's look to finches, moths, and ants for some answers.

Finches, moths, and ants...oh my!

During his trip to the Galápagos Islands, Charles Darwin noticed something peculiar about a particular family of birds. Although they were of the same family, there were several different species of finches, each with a distinctive beak. Some of the birds had a wide, stout beak, whereas other finches had a sharp, pointed beak. It turned out that the different shapes aided the birds in consuming food for their different diets. The wide-beaked finches ate nuts and berries, while the sharp-beaked finches ate insects. In effect, the shape of each finch's beak was optimized to its environment, including its dietary requirements [1].

But what happens if the environment changes? Can an organism's features evolve in response? It's hard to conduct a controlled laboratory experiment to answer this question; fortunately, a natural experiment occurred in nineteenth-century Britain. During the rise of the Industrial Revolution, the amount of pollution escalated rapidly. Soot from the factories coated trees in the surrounding region, and trees that once had light-colored bark turned dark gray. Resting on the bark of these trees was a species of moth called the peppered moth. Most peppered moths had light bodies that matched the original color of the bark. Once the trees darkened, birds could easily spot the light moths against the gray background. Due to natural variation in pigmentation, some moths were born a darker color, which was much more difficult for predators to see. As the lighter-colored moths were eaten, the ratio of darker moths to lighter moths tipped in favor of the dark-bodied peppered moths [2].

The study of finches demonstrated that species are optimized to their environment, and the study of peppered moths showed that the range of variation within a species can tilt toward whichever traits better support survival. So far, however, this conversation has been about the outward appearance of an organism. What about an animal's behavior? Herbert A. Simon wrote this about the complex behavior of the ant:

Imagine watching an ant on the beach. Its path looks complicated. It zigs and zags to avoid rocks and twigs. Very reminiscent of complex behavior — what an intelligent ant!

Except an ant is just a simple machine. It wants to return to its nest, so it starts moving in a straight line. When it encounters an object, it zigs to avoid it. Repeat until the destination is reached.

Trying to simulate the path itself would be difficult, but simulating the ant is easy. It’s maybe a half-dozen rules.

The point of this parable is to illustrate the interaction between the environment and perceived complexity. Lots of complex looking things are really the result of the territory, the shape of the beach, and not the agent, in this case, an ant.

But, of course, with this metaphor, I’m not really talking about ants. I’m talking about people. How much of the complexity of human behavior is really the product of the environment? [3]

Memory, as we have seen, is highly complex. There appear to be at least two different storage mechanisms (i.e., short- vs. long-term memory) and several different classifications of memory types (e.g., semantic vs. episodic; procedural vs. declarative). Can we explain the complexity of memory by looking at the environment?

"All the news that's fit to print" (in memory).

To answer that question, we need a model of the environment, and then we need to check whether it matches (more or less) the models of memory that we currently have (i.e., the forgetting curves). How would you construct a model of the information in your everyday environment? That seems like a tall order. Since we live in an information-rich environment, it makes sense to narrow it down.

That's precisely what two cognitive scientists did when attempting to construct their own model of the informational environment [4]. They looked at all of the words that appeared in New York Times headlines over a two-year period (i.e., the 730 days between Jan. 1, 1986 and Dec. 31, 1987). They tracked two variables. The first was the days on which a word appeared. For example, the word Challenger occurred on days 29, 31, 34, 36, 40, 44, and 99. The second was how many times that word appeared in a 100-day window (n = 7 for Challenger). These two variables allowed them to construct a retention function: the probability that a word will appear on the 101st day, given the number of days since its last occurrence.

To make that a little clearer, let's look at some hypothetical data (see Fig. 1). Suppose we want to know: what is the probability that a word will appear on the 101st day, given that it has been 20 days since the last time I saw it? According to the hypothetical retention function, that word has about an 11% chance of appearing. If it has been 100 days since I last saw it, however, the probability drops to around 3%. In other words, the more time passes since a word last appeared, the less likely it is to appear in my environment again. I think this makes intuitive sense. It's unlikely that we will read about Muammar Gaddafi today, given we haven't heard anything about him in several years.

Figure 1: A hypothetical retention function for words appearing on the 101st day.
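The retention function described above can be estimated with a few lines of Python. This sketch assumes a single 100-day window, measures the lag as the number of days between a word's last appearance in the window and day 101, and reports, for each lag, the fraction of words that appeared on day 101. Anderson and Schooler's exact windowing and binning are not reproduced here, and the toy words in the example are invented test data, not headline counts.

```python
from collections import defaultdict


def retention_function(occurrences: dict[str, set[int]], window: int = 100) -> dict[int, float]:
    """Estimate P(word appears on day window+1 | days since its last appearance).

    `occurrences` maps each word to the set of days (1-based) on which it
    appeared in a headline. Assumed lag convention: (window + 1) - last day seen.
    """
    target_day = window + 1
    seen = defaultdict(int)        # number of words observed at each lag
    reappeared = defaultdict(int)  # of those, how many appeared on target_day
    for days in occurrences.values():
        in_window = [d for d in days if 1 <= d <= window]
        if not in_window:
            continue  # the word never appeared in the window, so it has no lag
        lag = target_day - max(in_window)
        seen[lag] += 1
        if target_day in days:
            reappeared[lag] += 1
    return {lag: reappeared[lag] / seen[lag] for lag in sorted(seen)}


# Invented toy data: two words at each of lags 2 and 20, one word at lag 100.
toy = {
    "alpha": {99, 101},
    "beta": {99},
    "gamma": {81, 101},
    "delta": {81},
    "epsilon": {1},
}
print(retention_function(toy))  # {2: 0.5, 20: 0.5, 100: 0.0}
```

Under this lag convention, Challenger's days (29, 31, 34, 36, 40, 44, and 99) give a frequency of 7 within the window and a lag of 2 days. With real headline data, plotting the returned probabilities against lag should produce a declining curve like the one in Fig. 1.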

To bring this full circle, it seems that our memory is like a finch's beak, a peppered moth's coloration, or an ant's path on the beach. Memory is optimized to the environment in which it operates; thus, memory is a reflection of the environment. The forgetting curves show that memory is solid over short delays, just as the New York Times headlines model shows that a particular word is highly likely to show up again after a short delay [5]. But as time goes on, that word becomes less likely to show up. So why bother remembering a piece of information if it is unlikely to appear again in one's informational world?

The STEM Connection

If memory is in fact a reflection of our environment, then what does that mean for the way we structure the informational environment in our classrooms? First of all, the forgetting curves reinforce the adage: Use it or lose it. If the informational environment does not demand that I remember something, then guess what? I'm probably not going to remember it.

However, we can systematically and intentionally structure the informational environment so that important declarative chunks or procedural memories are needed and exercised periodically. If something is important enough to remember (e.g., the slope-intercept form of a linear equation), then keep bringing it up. Keep using important information. As the demands in the informational environment escalate, students will rise to the occasion. If the material is learned well, it might even make it into the part of the curve that never goes to zero (i.e., permastore).

Share and Enjoy!

Dr. Bob

For More Information

[1] Darwin's finches

[2] Peppered moth evolution

[3] Simon, H. A. (1996). The sciences of the artificial (3rd ed.). MIT Press.

[4] Anderson, J. R., & Schooler, L. J. (1991). Reflections of the environment in memory. Psychological Science, 2(6), 396-408.

[5] Some might argue that newspaper and magazine editors are sensitive to our ability to remember and therefore might decide not to write about something that occurred long ago. While that might be the case, Anderson and Schooler (1991) also used two other databases to construct their argument: a database of children's speech (CHILDES) and the second author's email inbox.