📚 Weekly AI Paper Digest

Period: 2026-02-23 ~ 2026-02-28 | Selection: the five papers that drew the most attention this week


🏆 This Week's Top 5

Rank | Paper | ⬆️ | Deep Dive
🥇 | A Very Big Video Reasoning Suite | 491 | DD-031
🥈 | Does Your Reasoning Model Implicitly Kno… | 246 | DD-032
🥉 | VESPO: Variational Sequence-Level Soft P… | 215 | DD-033
4. | The Trinity of Consistency as a Defining… | 185 | DD-034
5. | From Blind Spots to Gains: Diagnostic-Dr… | 143 | DD-035

🔍 This Week's Trends

Key Keywords

  • Video Reasoning: Research that goes beyond visual quality to understand the spatiotemporal structure and causal relationships inside video came to the fore.
  • Reasoning Efficiency: Papers questioned the inefficiency of long chains of thought (CoT) and discussed ways for models to stop or optimize their own reasoning.
  • Training Stability: Algorithmic advances stood out for addressing the instability that arises during reinforcement learning (RL) of LLMs and multimodal models.
  • World Models: Principles and a guiding philosophy were proposed for building general-purpose world models that obey physical laws and stay spatiotemporally consistent.
  • Diagnostic Training: Dynamic training schemes that diagnose a model's weaknesses (blind spots) and compensate for them, rather than relying on static data, drew attention.

Common Themes

This week's papers look past sheer model size and generation quality and focus on **"how efficiently and stably a model reasons."** In particular, they pursue physical understanding (world models) in complex settings such as video rather than text alone, and they share an interest in technical methods that make reinforcement learning more stable and efficient so that this understanding can be realized.

Points of Note

The most interesting thread is research on **"how to stop thinking."** The method that raises efficiency by reasoning only as much as needed, rather than thinking at length indiscriminately (paper 2), and the view that defines the core of a world model as spatiotemporal and causal "consistency" (paper 4) both reflect an effort to make AI as efficient and physically grounded as humans. In addition, diagnostic training that identifies a model's defects and corrects them in real time (paper 5) looks like a new paradigm that overcomes the limits of static training.

Practical Implications

Developers and researchers should watch techniques for optimizing long reasoning chains, which can drive up cost. To improve the speed and cost efficiency of inference services, methods that let a model end its own thinking, or that steer it toward only the essential reasoning, will matter more in practice. Likewise, when building complex multimodal models or agents, teams need a strategy that adopts stable RL algorithms (such as VESPO) to prevent training collapse, securing both performance and reliability.
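As a rough illustration of what "letting the model end its own thinking" can look like at the serving layer, here is a minimal sketch of a reasoning loop with two stop conditions: a confidence threshold and a hard step budget. It is not the SAGE method from paper 2. The function name `reason_with_budget`, the `(step_text, confidence)` input format, and both default parameters are hypothetical choices made for this example.

```python
from typing import Iterable, List, Tuple


def reason_with_budget(
    steps: Iterable[Tuple[str, float]],
    confidence_threshold: float = 0.9,  # hypothetical default
    max_steps: int = 8,                 # hypothetical default
) -> Tuple[List[str], str]:
    """Consume (step_text, confidence) pairs until a stop condition fires.

    Returns the reasoning steps that were kept and the reason for stopping:
    "confident", "budget", or "exhausted".
    """
    kept: List[str] = []
    for text, confidence in steps:
        # Hard cap: never keep more than max_steps reasoning steps.
        if len(kept) >= max_steps:
            return kept, "budget"
        kept.append(text)
        # Early exit: stop as soon as a step reports enough confidence.
        if confidence >= confidence_threshold:
            return kept, "confident"
    # The step source ran out before either condition fired.
    return kept, "exhausted"


if __name__ == "__main__":
    trace = [("restate problem", 0.30), ("derive answer", 0.95), ("re-check", 0.99)]
    print(reason_with_budget(trace))
```

In a real service the confidence signal would have to come from the model or a verifier; where it comes from is exactly the part this sketch leaves open.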


📑 Paper Summaries

🥇 1. A Very Big Video Reasoning Suite

arXiv: 2602.20159 | ⬆️ 491 | → View Deep Dive
Tags: video-reasoning large-scale-dataset scaling-law cognitive-architecture computer-vision evaluation-benchmark ai-research multimodal

This paper matters because it breaks with a research trend skewed toward improving the visual quality of video models: it builds VBVR, a large-scale video reasoning dataset with more than one million data items, and is the first to demonstrate that reasoning ability can emerge as model scale grows.

📖 Detailed analysis: see the in-depth write-up via → View Deep Dive.


🥈 2. Does Your Reasoning Model Implicitly Know When to Stop Thinking?

arXiv: 2602.08354 | ⬆️ 246 | → View Deep Dive
Tags: reasoning-models chain-of-thought efficient-inference sage sampling-paradigm test-time-scaling llm-efficiency

This paper matters because it points out that large reasoning models (LRMs) go through unnecessarily long thinking when solving complex problems, shows that the model already implicitly knows when it should stop thinking, and uses that finding to propose SAGE, a new sampling paradigm that improves both efficiency and accuracy.

📖 Detailed analysis: see the in-depth write-up via → View Deep Dive.


🥉 3. VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

arXiv: 2602.10693 | ⬆️ 215 | → View Deep Dive
Tags: llm rlhf vespo off-policy reinforcement-learning optimization stability math-reasoning

This paper matters because it proposes VESPO, a new optimization method that is theoretically rigorous yet reduces variance efficiently, so that reinforcement learning of large language models does not collapse even in off-policy situations, where the training data no longer matches the latest model.

📖 Detailed analysis: see the in-depth write-up via → View Deep Dive.
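To make the off-policy variance problem concrete, the sketch below computes a sequence-level importance weight from per-token log-probabilities and caps it. This is plain truncated importance sampling, used here only as a generic illustration; it is not the VESPO objective, which this digest does not spell out. The function name and the `clip_max` default are assumptions for the example.

```python
import math
from typing import Sequence


def sequence_importance_weight(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    clip_max: float = 2.0,  # hypothetical cap, not taken from the paper
) -> float:
    """Return the truncated sequence-level importance weight.

    new_logprobs: per-token log-probs under the current policy.
    old_logprobs: per-token log-probs under the policy that sampled the data.
    """
    if len(new_logprobs) != len(old_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Per-token ratios multiply, so their logs add up over the sequence.
    log_ratio = sum(n - o for n, o in zip(new_logprobs, old_logprobs))
    # Without the cap, a small per-token drift grows exponentially with length.
    return min(math.exp(log_ratio), clip_max)


if __name__ == "__main__":
    # Ten tokens, each only slightly more likely under the new policy.
    print(sequence_importance_weight([-1.0] * 10, [-1.5] * 10))
```

A per-token drift of 0.5 nats over ten tokens already gives an uncapped weight of about 148, which is why stale data can destabilize training and why some form of variance control is needed.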


4. The Trinity of Consistency as a Defining Principle for General World Models

arXiv: 2602.23152 | ⬆️ 185 | → View Deep Dive
Tags: world-model consistency trinity multimodal causal-inference physics-simulation agi generative-model

This paper matters because, to address the "naive physicist" problem of existing generative models, it establishes a "Trinity of Consistency" (structural, temporal, and causal consistency) as a theoretical framework, laying down design principles for a General World Model that goes beyond generating pixels and understands actual physical laws.

📖 Detailed analysis: see the in-depth write-up via → View Deep Dive.


5. From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

arXiv: 2602.22859 | ⬆️ 143 | → View Deep Dive
Tags: lmm diagnostic-driven self-evolution reinforcement-learning data-generation multi-agent iterative-training

This paper matters because it moves past existing approaches that depend on fixed data and heuristics (rules of thumb) and proposes a diagnosis-based, incremental evolution framework: a loop that precisely diagnoses a model's weaknesses, generates data targeted at them, and applies reinforcement learning, improving large multimodal models efficiently.

📖 Detailed analysis: see the in-depth write-up via → View Deep Dive.
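One simple way to picture the "diagnose, then generate targeted data" step is to measure per-category error rates on an evaluation set and split the next round's data budget in proportion to them. The sketch below does only that. It is an assumption-laden illustration, not the paper's pipeline: the function name `allocate_budget`, the `(category, correct)` result format, and the proportional rule are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allocate_budget(
    results: Iterable[Tuple[str, bool]],
    budget: int,
) -> Dict[str, int]:
    """Split a data-generation budget across categories by error rate.

    results: (category, was_correct) pairs from an evaluation run.
    budget: total number of new training samples to generate.
    """
    totals: Counter = Counter()
    failures: Counter = Counter()
    for category, correct in results:
        totals[category] += 1
        if not correct:
            failures[category] += 1

    # Diagnose: error rate per category.
    error_rates = {c: failures[c] / totals[c] for c in totals}
    total_error = sum(error_rates.values())
    if total_error == 0:
        # No weaknesses found, so nothing to target this round.
        return {c: 0 for c in totals}

    # Allocate proportionally; rounding means the sum may differ slightly
    # from the requested budget.
    return {c: round(budget * r / total_error) for c, r in error_rates.items()}


if __name__ == "__main__":
    evals = [("ocr", False), ("ocr", False), ("chart", True), ("chart", False)]
    print(allocate_budget(evals, budget=300))
```

Repeating this after every training round gives the iterative flavor described above: categories that stop failing receive less new data, and persistent blind spots keep receiving more.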


📅 Generated: 2026-03-01 | 🤖 GLM-4.7 Weekly Digest