
Most training fails within a week. Cognitive science explains exactly why, and what to do about it. Here are the 10 design principles that turn drills into durable knowledge.
In 1885, German psychologist Hermann Ebbinghaus plotted what he called the forgetting curve. He memorized lists of nonsense syllables and tested himself at different intervals. The results were stark: within 24 hours, he had forgotten roughly 60% of what he had learned. Within a week, closer to 80%.
Ebbinghaus thought the solution was repetition. He was half right. Cognitive science has spent 140 years refining that insight, and the conclusion is more specific: what matters is not how many times you expose yourself to material, but how many times you retrieve it. Practice that demands active recall produces memory traces that survive real work conditions. Passive re-reading does not.
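The forgetting curve is often approximated as exponential decay, R = e^(−t/s), where s is a "stability" parameter that grows with each successful retrieval. A minimal sketch of that toy model (the stability value here is illustrative, not fitted to Ebbinghaus's data):

```python
import math

def retention(hours: float, stability: float = 26.0) -> float:
    """Fraction of material still retrievable after `hours`, in the
    simplified model R = exp(-t/s). Retrieval practice works by
    increasing `stability`, which flattens the curve."""
    return math.exp(-hours / stability)

# With this illustrative stability, roughly 40% survives the first day;
# without further retrievals, retention keeps collapsing.
print(f"after 24 hours: {retention(24):.0%}")
print(f"after 1 week:   {retention(168):.0%}")
```

Note that a single fixed stability decays too fast to match real long-term data; the whole point of spaced retrieval is that each successful recall raises s.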
This is the gap that well-designed drills exist to close. The 10 principles below are not style preferences or common sense dressed up as frameworks. Each one is grounded in peer-reviewed research in cognitive psychology and instructional science. Understanding the evidence behind them helps you design practice that actually works, not just practice that feels productive in the moment.
The human working memory system has firm limits. Miller's classic 1956 paper established that people can hold roughly seven items (give or take two) in working memory at once. Later research by Nelson Cowan (2001) refined this to around four chunks, where a chunk is any cluster of information already organized in long-term memory. The key word is chunks: familiar patterns compress into fewer slots, but genuinely novel content does not.
When a drill presents a dense block covering a full chapter, a procedure, and its exceptions simultaneously, learners cannot process it effectively. John Sweller's cognitive load theory, developed in the late 1980s, identifies three types of mental load: intrinsic load (the inherent complexity of the content), extraneous load (unnecessary complexity from poor design), and germane load (the effort that actually builds knowledge). Piling multiple concepts into one question inflates both intrinsic and extraneous load, leaving little capacity for the learning that matters.
Segmenting material into small, focused units solves this. One concept per drill item keeps load manageable and makes results interpretable: when someone answers incorrectly, you know which concept failed, not a bundle of three ideas tangled into one question.
Not every sentence in a training document deserves drill time. Retrieval practice is most powerful when it targets actionable knowledge: the decisions, thresholds, warning signs, and first steps that people must execute without looking things up.
The evidence for selectivity comes from the testing effect, one of the most replicated findings in cognitive psychology. Roediger and Karpicke (2006) showed that students who took practice tests on a passage retained roughly 50% more after one week than students who re-read the passage the same number of times. Follow-up research by Karpicke and Blunt (2011) found that retrieval practice outperformed even concept mapping, a strategy generally considered superior to passive re-reading.
The implication for drill design is selective focus. Identifying which knowledge items carry real decision weight or risk, then drilling those specifically, concentrates practice where it produces the highest return. Everything else can stay in the reference document where it belongs.
Knowing something in a test and applying it in context are not the same skill. Cognitive scientists call this the transfer problem, and it has been studied for over a century.
The key insight comes from encoding specificity, a principle developed by Tulving and Thomson (1973). Their research showed that memory retrieval is most reliable when the cues available at recall match the cues present during encoding. If someone learns a safety rule while reading a manual, they encode it in a "reading" context. When the rule needs to be applied during a live incident, the retrieval cues are completely different, and recall can fail precisely when it is needed most.
A vivid demonstration came from Godden and Baddeley (1975), who asked scuba divers to memorize word lists either on land or underwater. Recall was significantly better when tested in the same environment where learning had occurred. Context shapes retrieval in ways that feel counterintuitive but are remarkably consistent.
For drill design, this means wrapping knowledge in scenarios, cases, and short stories that mirror the situations where the knowledge will actually be needed. When someone practices a decision rule through a realistic incident description, the memory is encoded with workplace cues rather than classroom cues. The transfer gap narrows. Our article on storytelling as a learning tool explores this further.
A question that bundles multiple concepts forces the learner to manage several things at once and makes the outcome nearly uninterpretable. Did the person hesitate over the terminology? The exception? The sequence? Scoring reveals a failure but not its cause.
Sweller's cognitive load theory explains why this matters beyond mere interpretation. When a question demands simultaneous processing of several interrelated elements, what researchers call element interactivity, cognitive load can exceed working memory capacity. The person guesses or gives up, and no genuine learning event occurs.
Single-focus questions reduce element interactivity to a manageable level. They also enable adaptive systems to route practice precisely. If someone consistently struggles with one specific concept while handling related concepts well, the system can schedule reinforcement for that exact element rather than a broad topic cluster that may include things already fully mastered.
The way answer options are written affects what gets encoded. This principle draws from research on the generation effect, first described by Slamecka and Graf (1978): information that is generated by the learner is remembered significantly better than information that is merely read.
When the correct answer option restates the principle or rule in clear, specific language, the person who selects it rehearses the actual reasoning. The person who chose incorrectly reads the correction as a richer piece of information than a simple "wrong" label. In both cases, what is encoded is the content of the correct answer, not just a label.
This distinction is not trivial. Answer options that say "Option B: the correct approach" teach learners to recognize a label. Answer options that say "Option B: first confirm the system is in standby, then apply the lockout procedure" teach the actual sequence. The memory trace from the second formulation is meaningfully stronger, and the feedback phase becomes a genuine learning event rather than a scorecard.
The timing and quality of feedback are not minor design details. A large-scale review by Hattie and Timperley (2007) found that elaborative feedback, explaining why an answer is correct and what reasoning should guide future decisions, produces significantly stronger outcomes than evaluative feedback (right/wrong signals alone). Research by Butler and Roediger (2008) further showed that immediate feedback, given while the retrieval attempt is still active in working memory, is more effective at correcting errors than feedback given after a delay.
For drill design, this translates directly: feedback that says "Incorrect. The correct approach is to isolate the circuit before testing voltage, because live testing creates arc flash risk" does something very different from feedback that says only "Incorrect." The first gives the learner a corrected mental model. The second gives them a score. Our article on the positive effects of good feedback digs into this distinction in more detail.
When every practice item uses the same format, learners adapt to the format rather than the content. This is what cognitive scientists call surface feature reliance: recognizing a correct answer pattern without genuinely engaging with the underlying idea.
The antidote is interleaving, studied by Kornell and Bjork (2008). In their research, subjects who studied paintings by different artists in interleaved order (artist A, then B, then C, repeating) outperformed those who studied in blocked order (all of artist A, then all of B) on style recognition tests, even though interleaved practice felt harder and less productive to the learners themselves. The difficulty was desirable: it forced deeper processing each time. See our article on why interleaving matters for a fuller treatment.
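The difference between the two orderings is easy to make concrete. A minimal sketch (the bucket structure and item names are illustrative, not any particular system's data model):

```python
from itertools import chain, zip_longest

def blocked(buckets):
    """All items of concept A, then all of B, then all of C."""
    return list(chain.from_iterable(buckets))

def interleaved(buckets):
    """Round-robin across concepts: A1, B1, C1, A2, B2, C2, ..."""
    return [item for round_ in zip_longest(*buckets)
            for item in round_ if item is not None]

buckets = [["A1", "A2"], ["B1", "B2"], ["C1", "C2"]]
print(blocked(buckets))      # ['A1', 'A2', 'B1', 'B2', 'C1', 'C2']
print(interleaved(buckets))  # ['A1', 'B1', 'C1', 'A2', 'B2', 'C2']
```

The interleaved sequence forces the learner to re-identify which concept applies on every item, which is exactly the extra discrimination work the blocked sequence lets them skip.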
Different formats also tap different memory systems. A sequence question builds procedural ordering. An image question builds recognition memory. An open-response question forces pure recall without cues. A scenario question builds contextual judgment. Together, they create multiple retrieval routes to the same knowledge, which means more ways to access it when it is needed in real conditions.
One strong drill item tests one retrieval path. Multiple variants of the same concept, asked through different cues, wordings, and scenarios, build a richer and more flexible knowledge structure.
The scientific backing comes from two related bodies of work. First, the spacing effect: Cepeda and colleagues (2006) analyzed 317 studies involving over 16,000 participants and confirmed that distributing practice across time consistently outperforms massed practice for long-term retention. Returning to the same material in varied forms across multiple sessions produces the distributed exposure that spacing requires.
Second, variability of practice: Schmidt and Bjork (1992) showed that exercising the same underlying concept under changing conditions produces better transfer to new situations than constant practice of a fixed form. Asking the same safety rule through a scenario involving a new employee, then a night-shift scenario, then a maintenance window, is not mere repetition. It is variability. And variability is what transfers to real work conditions.
The relationship between difficulty and learning is not linear. Practice that is too easy produces little retrieval effort and, consequently, little durable learning. Practice that is too hard produces failure, frustration, and disengagement. The productive zone is in between: what Robert Bjork famously called desirable difficulties.
Bjork and Bjork (2011) summarized the evidence: making retrieval effortful through spacing, interleaving, and reduced cuing strengthens long-term retention even when it appears to slow immediate performance. The sensation of difficulty is often a signal that something useful is happening in memory. When practice feels easy, it usually means retrieval is not actually occurring.
Currency matters for a separate reason. When content becomes outdated, practice can actively harm performance by reinforcing wrong behaviour or an obsolete rule. In high-stakes domains such as aviation, healthcare, and financial services, a procedural change that has not reached the drill layer is a compliance or safety risk, not merely a training gap. Our article on why people forget what they just learned covers the forgetting curve mechanics in more detail.
Adaptivity closes the loop. Systems that track per-item performance and adjust presentation timing can deliver each item at the moment when retrieval is most effortful but still achievable. This is the moment where memory traces are most strengthened, a principle drawn from spaced repetition research that traces back to Ebbinghaus and has been refined across decades of experimental work.
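The core scheduling idea can be sketched as a simplified Leitner-style loop: each correct retrieval promotes the item to a longer interval, each miss resets it to short spacing. This is a hedged illustration of the general technique, not Drillster's actual algorithm; the interval ladder and `DrillItem` class are invented for the example.

```python
from datetime import datetime, timedelta

# Illustrative interval ladder; real systems tune these per item and per learner.
INTERVALS = [timedelta(hours=4), timedelta(days=1), timedelta(days=3),
             timedelta(days=7), timedelta(days=21)]

class DrillItem:
    def __init__(self, concept: str):
        self.concept = concept
        self.level = 0  # index into INTERVALS

    def review(self, correct: bool, now: datetime) -> datetime:
        """Record one retrieval attempt and return when the item should resurface."""
        if correct:
            self.level = min(self.level + 1, len(INTERVALS) - 1)
        else:
            self.level = 0  # a miss sends the item back to tight spacing
        return now + INTERVALS[self.level]

item = DrillItem("lockout procedure")
now = datetime(2024, 1, 1)
print(item.review(True, now))   # promoted: resurfaces one day later
print(item.review(False, now))  # missed: back to four hours
```

Per-item tracking like this is what lets a system reinforce the one concept a learner actually struggles with, rather than re-serving an entire topic.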
No cognitive principle operates in isolation from motivation. A learner who does not understand why they are practicing, or who cannot see progress, will disengage before the retrieval events that drive learning can accumulate.
The evidence comes from self-determination theory, developed by Ryan and Deci (2000). Their framework identifies three psychological needs that sustain motivation: autonomy (a sense of ownership over the activity), competence (feeling effective and making progress), and relatedness (seeing the purpose in relation to real work and other people). Drills that connect content to real consequences address all three simultaneously.
Amabile and Kramer's (2011) research on inner work life adds a complementary angle: the single strongest driver of positive engagement is making visible progress on meaningful work. Even small forward steps sustain motivation when people can see them. In drill design, this translates to progress indicators, visible competence trajectories, and language in the scenarios and feedback that connects practice directly to the job.
The "why" must not be buried in a course description most learners have already forgotten. It belongs in the drill itself.
The research behind these 10 principles spans cognitive load theory, retrieval practice, encoding specificity, interleaving, spacing effects, desirable difficulties, feedback science, and motivational psychology. That breadth is not accidental. Effective drills sit at the intersection of all of these fields.
The practical consequence is that drill design is a discipline, not a shortcut. A well-designed drill item is not a quiz question. It is a precisely targeted retrieval event: built to strengthen one memory trace, correctly contextualized, clearly worded, followed by feedback that builds a better mental model, and scheduled to return at the moment when forgetting is most likely.
Design drills with these 10 principles in mind, and you give learners a meaningfully better chance of retrieving the right knowledge at the right time. You also give your organization a way to verify that knowledge and competences are genuinely reliable, not just completed.
To see how Drillster turns source material into adaptive drills built around these principles, explore Drillster Question Crafter.