JLPT, HSK, TOPIK, CEFR: A Practical Guide for Vocabulary Flashcards


Why Grading Systems Matter When You're Building a Vocabulary

If you've ever opened a vocabulary flashcard app and seen a row of opaque labels (N5, HSK 3, B1, TOPIK 4), you've run into the proficiency frameworks. They look bureaucratic, but they're doing something useful: each level maps, at least approximately, to a vocabulary list of a known size.

For a flashcard learner, that's gold. A grading system gives you three things at once:

  • A target. Not "I'd like to know more words" but "I want to know the ~3,700 words on the JLPT N3 list by August."
  • A diagnostic. If you can already recall 90% of the HSK 2 vocabulary, you don't need to start at HSK 1 — and a frequency-ranked deck tagged by HSK level will tell you exactly where to begin.
  • A pacing constraint. Each step up roughly doubles the vocabulary load. Knowing that ahead of time prevents the classic "I plateaued" surprise around the intermediate-to-advanced jump.
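The target and pacing points above come down to simple arithmetic. Here is a minimal sketch in Python; the function name and the dates are illustrative assumptions, and it counts only new cards, not the reviews an SRS schedules on top of them.

```python
from datetime import date
from math import ceil


def new_words_per_day(target_words: int, known_words: int,
                      start: date, deadline: date) -> int:
    """Daily quota of new cards needed to reach target_words by deadline.

    Hypothetical helper for illustration; it ignores review load entirely.
    """
    days = (deadline - start).days
    if days <= 0:
        raise ValueError("deadline must be after start")
    remaining = max(target_words - known_words, 0)
    return ceil(remaining / days)


# From the approximate N4 figure (~1,500) to the N3 figure (~3,700)
# in the 200 days between 13 January and 1 August.
quota = new_words_per_day(3700, 1500, date(2025, 1, 13), date(2025, 8, 1))
print(quota)  # 11
```

If the quota comes out higher than you can sustain, that is the signal to move the date rather than the level.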

This article walks through the four frameworks Vocabcraft uses to tag its decks — JLPT for Japanese, HSK for Chinese, TOPIK for Korean, and CEFR for Spanish, French, German, and Danish — explains the vocabulary thresholds at each level, and shows how we map a single frequency-ranked deck to all of them.

JLPT — Japanese Language Proficiency Test

The Japanese Language Proficiency Test is run twice a year by the Japan Foundation and JEES. It has five levels, from N5 (beginner) to N1 (advanced), and tests only reading and listening — there's no speaking or writing section. Despite that limitation, JLPT N1 is the de facto standard employers in Japan use to gate Japanese-speaking jobs.

Approximate vocabulary thresholds:

JLPT vocabulary by level
Level   Words      What it covers
N5      ~800       Basic survival Japanese, hiragana/katakana, ~100 kanji
N4      ~1,500     Daily conversation, ~300 kanji
N3      ~3,700     The bridge level, where most learners get stuck
N2      ~6,000     Newspapers, work email, most TV with context
N1      ~10,000+   Native-adjacent reading, abstract / literary text

The JLPT has not published official word lists since its 2010 revision, so the numbers above are the rough community consensus. A useful sanity check: N5 → N4 → N3 each roughly doubles the vocabulary load. N3 → N2 → N1 grows by a smaller ratio (about 1.6× to 1.7× per step), but the new words are increasingly rare, which is why the SRS interval matters more at the upper levels.
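The step-to-step growth is easy to check from the approximate figures in the table. A short sketch, assuming those community-consensus numbers (with N1 taken as a flat 10,000):

```python
# Approximate cumulative vocabulary per level, copied from the table above.
JLPT_WORDS = {"N5": 800, "N4": 1500, "N3": 3700, "N2": 6000, "N1": 10000}


def growth_ratios(levels: dict[str, int]) -> dict[str, float]:
    """Ratio of each level's word count to the level below it."""
    names = list(levels)  # dicts keep insertion order: easiest level first
    return {
        f"{lower}->{upper}": round(levels[upper] / levels[lower], 1)
        for lower, upper in zip(names, names[1:])
    }


ratios = growth_ratios(JLPT_WORDS)
print(ratios)
# {'N5->N4': 1.9, 'N4->N3': 2.5, 'N3->N2': 1.6, 'N2->N1': 1.7}
```

The ratios shrink after N3, but the absolute number of new words per step keeps growing (2,200, then 2,300, then 4,000).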

If you're studying for the JLPT with flashcards, you want a deck that's frequency-ranked within each level. Sorting alphabetically or by chapter wastes effort on rare items before common ones.

HSK — Hanyu Shuiping Kaoshi (Chinese)

The HSK is China's official Chinese-proficiency test. There are two versions in circulation right now and this is a common source of confusion:

  • HSK 2.0 (the standard from 2010–2021) has six levels, HSK 1 → HSK 6, with about 5,000 words at the top.
  • HSK 3.0 (introduced 2021, still being phased in) has nine levels: the six familiar levels plus a new advanced band, HSK 7-9, with about 11,000 words in total by the top of that band.

Most exam-prep material, third-party apps, and learner discussion still refers to HSK 2.0. Vocabcraft tags decks with the HSK 2.0 numbering (1 → 6) and treats anything above HSK 6 as a high-frequency "advanced" pool.

Approximate vocabulary thresholds (HSK 2.0):

HSK vocabulary by level
Level   Words    What it covers
HSK 1   ~150     Pinyin, tones, greeting phrases
HSK 2   ~300     Basic everyday conversation
HSK 3   ~600     Travel, simple work topics
HSK 4   ~1,200   News headlines, common topics
HSK 5   ~2,500   Most newspapers, casual reading
HSK 6   ~5,000   Native-adjacent, formal writing

A practical note for HSK flashcard study: the official HSK lists are not frequency-ordered within a level. They're organized loosely by topic. So if you study them in published order, you can spend the first week of HSK 4 on relatively rare words. A frequency-ranked deck tagged to HSK 4 fixes this — you see the most common HSK 4 words first.
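Reordering a level by frequency takes only a filter and a sort. The sketch below assumes a hypothetical `Card` record with an `hsk_level` tag and a `freq_rank` field; the words and ranks are placeholders, not real corpus data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    word: str
    hsk_level: int
    freq_rank: int  # 1 = most common; hypothetical corpus rank


def study_order(cards: list[Card], level: int) -> list[Card]:
    """Cards tagged with `level`, most common first."""
    return sorted(
        (c for c in cards if c.hsk_level == level),
        key=lambda c: c.freq_rank,
    )


# Placeholder deck in "published" (topic) order.
published = [
    Card("word_rare", 4, 9120),
    Card("word_common", 4, 310),
    Card("word_other_level", 3, 150),
    Card("word_mid", 4, 2044),
]

print([c.word for c in study_order(published, 4)])
# ['word_common', 'word_mid', 'word_rare']
```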

TOPIK — Test of Proficiency in Korean

TOPIK is administered by the National Institute for International Education (NIIED), under the South Korean Ministry of Education. It tests reading and listening, plus writing on TOPIK II. It's structured a little differently from JLPT and HSK:

  • TOPIK I is one exam that scores you at Level 1 or Level 2 (beginner).
  • TOPIK II is a separate, harder exam that scores you at Level 3, 4, 5, or 6 (intermediate → advanced).

You don't pick a level when you sign up; your score on the exam decides which level you're certified at. Vocabulary thresholds:

TOPIK vocabulary by level
Level     Words      What it covers
TOPIK 1   ~800       Hangul, basic survival Korean
TOPIK 2   ~2,000     Everyday conversation, simple text
TOPIK 3   ~3,000     Common workplace and social topics
TOPIK 4   ~5,000     News articles, business email
TOPIK 5   ~8,000     Academic and professional text
TOPIK 6   ~10,000+   Native-adjacent reading and listening

TOPIK doesn't publish an official vocabulary list the way HSK does, so the numbers above are derived from analysis of past papers and the National Institute of Korean Language's frequency studies. They're directional, not definitive.

CEFR — The Common European Framework

CEFR isn't a test — it's a framework, used as the rubric by which actual tests (DELE for Spanish, DELF/DALF for French, Goethe-Zertifikat for German, Prøve i Dansk for Danish, and many others) are scored. Six levels:

  • A1 — Beginner. Introduce yourself, order food, basic past tense.
  • A2 — Elementary. Hold simple conversations, write short emails.
  • B1 — Intermediate. Independent user. Travel comfortably, follow most TV.
  • B2 — Upper intermediate. Fluent in familiar contexts, can argue a point.
  • C1 — Advanced. Effective use in professional and academic settings.
  • C2 — Mastery. Near-native nuance and idiomatic command.

Vocabulary thresholds vary significantly between languages (Spanish A2 ≠ German A2 in raw word count, because German morphology shifts what "a word" even means), but the consensus ranges look like:

CEFR vocabulary by level (typical)
Level   Words      What it covers
A1      ~500       Survival vocabulary, present tense
A2      ~1,000     Daily routines, past and future tenses
B1      ~2,000     Independent traveler, follow most media
B2      ~4,000     Comfortable in work / academic settings
C1      ~8,000     Nuance, idiom, register switching
C2      ~16,000+   Near-native reading and writing

A useful rule of thumb across all four frameworks: the gap between A1 and A2 is small. The gap between B1 and B2 is enormous. The gap between C1 and C2 is the size of a career. Plan study time accordingly.

Comparing the Four Frameworks Side by Side

The four systems are calibrated to different traditions, so the levels don't line up cleanly. The most defensible comparison is by vocabulary count rather than by name:

Approximate vocabulary parity
Words          JLPT   HSK     TOPIK     CEFR
~500–1,000     N5     HSK 2   TOPIK 1   A1 → A2
~1,500–2,500   N4     HSK 4   TOPIK 2   A2 → B1
~3,000–4,000   N3     HSK 5   TOPIK 3   B1 → B2
~5,000–6,000   N2     HSK 6   TOPIK 4   B2
~8,000+        N1     -       TOPIK 5   C1
~10,000+       N1+    -       TOPIK 6   C2

Approximate only. Rows are aligned by total active vocabulary, not by test difficulty.

This is the table our decks are built around. The framework labels are signposts; the underlying study unit is the word.
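Read as data, the parity table is a lookup from a word count to one label per framework. The sketch below is one way to encode it, under two assumptions of ours: each row is keyed by its lower bound, and counts that fall in the gaps between rows map to the row below. `None` marks the rows where HSK 2.0 has no matching level.

```python
from bisect import bisect_right

# (lower bound in words, labels) for each row of the parity table.
PARITY = [
    (500,   {"JLPT": "N5",  "HSK": "HSK 2", "TOPIK": "TOPIK 1", "CEFR": "A1 → A2"}),
    (1500,  {"JLPT": "N4",  "HSK": "HSK 4", "TOPIK": "TOPIK 2", "CEFR": "A2 → B1"}),
    (3000,  {"JLPT": "N3",  "HSK": "HSK 5", "TOPIK": "TOPIK 3", "CEFR": "B1 → B2"}),
    (5000,  {"JLPT": "N2",  "HSK": "HSK 6", "TOPIK": "TOPIK 4", "CEFR": "B2"}),
    (8000,  {"JLPT": "N1",  "HSK": None,    "TOPIK": "TOPIK 5", "CEFR": "C1"}),
    (10000, {"JLPT": "N1+", "HSK": None,    "TOPIK": "TOPIK 6", "CEFR": "C2"}),
]
LOWER_BOUNDS = [bound for bound, _ in PARITY]


def parity_row(known_words: int):
    """Labels for the highest row whose lower bound is <= known_words.

    Returns None below the first row (fewer than 500 words).
    """
    i = bisect_right(LOWER_BOUNDS, known_words) - 1
    return PARITY[i][1] if i >= 0 else None


print(parity_row(3700)["JLPT"])   # N3
print(parity_row(3700)["CEFR"])   # B1 → B2
```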

How Vocabcraft Uses These Frameworks

We build every deck the same way: take the largest reliable frequency list for the language, rank it by real-world usage, then tag each card with the proficiency level it falls into (JLPT for Japanese, HSK for Chinese, TOPIK for Korean, CEFR for the rest).
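As a rough illustration of that pipeline (not Vocabcraft's actual code), tagging can be a single pass over the ranked list. The sketch assumes a hypothetical word-to-level mapping taken from a published level list, sends words missing from it to an "advanced" pool, and uses placeholder words.

```python
def tag_deck(ranked_words, level_of, default="advanced"):
    """Attach a frequency rank and a level tag to every word.

    ranked_words: words ordered most common first.
    level_of:     hypothetical mapping of word -> level label.
    default:      tag for words the level list does not contain.
    """
    return [
        {"word": word, "rank": rank, "level": level_of.get(word, default)}
        for rank, word in enumerate(ranked_words, start=1)
    ]


ranked = ["word_a", "word_b", "word_c", "word_d"]  # placeholder frequency list
level_of = {"word_a": "HSK 1", "word_b": "HSK 2", "word_c": "HSK 4"}

deck = tag_deck(ranked, level_of)
print(deck[3])  # {'word': 'word_d', 'rank': 4, 'level': 'advanced'}
```

The rank is assigned before the tag, so the level never changes a card's position in the deck.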

That tagging gives the deck two superpowers:

  1. Filter by certification range. Studying for JLPT N3? Toggle N3 in the deck filter and you see only N3 cards — still in frequency order, so the most common N3 words come first. The 80/20 rule does most of the work for you.
  2. Pick up where you left off. If you've already passed HSK 3, you can mark everything ≤ HSK 3 as known in one tap and start with HSK 4. No need to grind through ~600 already-known words just because the deck starts at 1.
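The "pick up where you left off" step can be sketched as two small functions. This assumes each card is a dict with a numeric `level` (HSK 2.0 numbering) and a `rank`; the function names and the sample deck are hypothetical.

```python
def mark_known_up_to(deck, passed_level):
    """Words at or below the level the learner has already passed."""
    return {card["word"] for card in deck if card["level"] <= passed_level}


def next_new_cards(deck, known, limit=10):
    """The most common cards not yet marked as known."""
    fresh = [card for card in deck if card["word"] not in known]
    return sorted(fresh, key=lambda card: card["rank"])[:limit]


# Placeholder deck; ranks are illustrative, not real corpus data.
sample_deck = [
    {"word": "word_a", "rank": 12,   "level": 1},
    {"word": "word_b", "rank": 480,  "level": 3},
    {"word": "word_c", "rank": 2100, "level": 4},
    {"word": "word_d", "rank": 950,  "level": 4},
    {"word": "word_e", "rank": 3300, "level": 5},
]

known = mark_known_up_to(sample_deck, passed_level=3)
queue = next_new_cards(sample_deck, known, limit=2)
print([card["word"] for card in queue])  # ['word_d', 'word_c']
```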

Crucially, the frequency ranking comes first. The proficiency tags are a view on top of that ranking, not a replacement for it. This matters because the published JLPT/HSK/TOPIK word lists, official or community-compiled, are not frequency-ordered; they're grouped by topic or chapter. Studying them in published order means you'll see "to embroider" before "to want", which is exactly backwards.

If you want to skip the explanation and just start studying, every supported language has a tagged deck ready.

Which Level Should You Start At?

Two failure modes are worth naming:

Starting too low. If you can already recognize 80%+ of the words at a given level, studying them again is mostly a waste. Use the filter to jump to the next level up. If you later forget one of the skipped words, the SRS can pull it back into rotation, and in the meantime your study time goes into new ground.

Starting too high. A flashcard deck where you fail most of the cards isn't a study session, it's a frustration generator. If you're getting under 30% recall on a level, drop back one and let the easier vocabulary anchor before climbing.

A practical heuristic: start where you can recall ~50% on first sight. The SRS will turn the other 50% into easy cards within two or three sessions, and you'll feel the progress, which matters more for adherence than people give it credit for.
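The thresholds in this section (skip a level at 80%+ recall, drop back under 30%) can be turned into a small picker. This is a sketch under our own assumptions: recall is measured per level on a first-sight sample, levels are listed easiest first, and the function name and sample figures are hypothetical.

```python
def pick_start_level(recall_by_level, skip_at=0.80, too_hard_below=0.30):
    """Choose a starting level from first-sight recall rates.

    recall_by_level: dict ordered easiest level first, values in 0..1.
    """
    levels = list(recall_by_level)
    if not levels:
        raise ValueError("need at least one level")
    for i, level in enumerate(levels):
        recall = recall_by_level[level]
        if recall >= skip_at:
            continue  # mostly known already: skip ahead
        if recall < too_hard_below and i > 0:
            return levels[i - 1]  # too hard: drop back one level
        return level
    return levels[-1]  # every level is mostly known: start at the top


sample = {"HSK 1": 0.95, "HSK 2": 0.90, "HSK 3": 0.55, "HSK 4": 0.20}
print(pick_start_level(sample))  # HSK 3
```

In the sample, HSK 3 sits close to the ~50% mark, which is where the heuristic says study time pays off fastest.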

A Closing Thought on Frameworks

The four grading systems above are real and useful, but they're also somewhat arbitrary. Native speakers don't think in N3 or B2 — they think in topics and contexts. The frameworks exist because language learning needs measurable milestones, and "I want to be fluent" isn't measurable. JLPT N3 is.

The right way to use them is as scaffolding for habit, not as goals in themselves. Pick a level, study the vocabulary on a frequency-ranked flashcard deck, take the test if it motivates you, and move on. The vocabulary you learn along the way is what actually does the work.

If you're ready to start, our language decks are tagged by exactly these levels.