📚 Weekly AI Paper Digest

Period: 2026-02-09 ~ 2026-02-14 | Selection: the five papers that drew the most attention this week


🏆 This Week's Top 5

| Rank | Paper | ⬆️ | Deep Dive |
| --- | --- | --- | --- |
| 🥇 | OPUS: Towards Efficient and Principled D… | 308 | DD-021 |
| 🥈 | Weak-Driven Learning: How Weak Agents ma… | 251 | DD-022 |
| 🥉 | TermiGen: High-Fidelity Environment and … | 195 | DD-023 |
| 4. | Code2World: A GUI World Model via Render… | 186 | DD-024 |
| 5. | The Devil Behind Moltbook: Anthropic Saf… | 182 | DD-025 |

🔍 This Week's Trends

Key Keywords

  • Data Efficiency: dynamic data selection and synthetic environment generation as answers to the shortage of high-quality data (the "data wall")
  • Agent World Model: attempts to build code-based simulated environments that improve the reasoning of GUI agents
  • Optimization Dynamics: a new training paradigm that uses a model's past or weak states to overcome training saturation
  • Self-Evolution Safety: a warning that safety alignment can be neutralized inside autonomously evolving multi-agent systems

Common Themes

This week's papers pursue model performance through **careful data management and more efficient training** rather than through plain scale-up. Two lines of work dominate: techniques that dynamically select or synthesize data to counter the depletion of high-quality training data (OPUS, TermiGen), and world models that help agents understand and act in their environment (Code2World). Alongside these, two papers analyze the dynamics that arise during optimization (Weak-Driven, Safety) and propose ways to build stronger and safer AI.

Notable Points

"Weak-Driven Learning" takes a paradoxical approach: when training stalls, it uses the model's own past "weak state" as a supervision signal to further strengthen the "strong state". "Code2World" builds a world model for GUI agents from "renderable code" rather than text or pixels, aiming for high visual fidelity and structural control at the same time. The finding of "The Devil Behind Moltbook", that safeguards are quickly neutralized in a self-evolving AI society, is a reminder of how complex the safety problem will be for future AGI development.

Practical Implications

In LLM pre-training, adopting an approach like OPUS, which selects data dynamically using feedback from the optimizer, could substantially raise training efficiency even with limited data. When building GUI or terminal agents, teams should rely less on real environments and make active use of learnable synthetic environments or simulations, as TermiGen and Code2World do, to cut cost and improve performance together. Finally, any system that contains a self-improvement loop should include a mechanism that continuously monitors whether safety alignment degrades as the model evolves.
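The monitoring recommendation above can be illustrated with a small sketch. It is not taken from any of the papers: the refusal-rate metric, the `"cannot help"` marker, the `SafetyMonitor` class, and the 0.9 threshold are all assumptions made for illustration. The idea is only that a fixed probe set is re-evaluated after every improvement round and evolution stops at the last model that still passed.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: a "model" is any callable that maps a prompt to a reply.
Model = Callable[[str], str]


def refusal_rate(model: Model, probes: List[str], marker: str = "cannot help") -> float:
    """Fraction of unsafe probe prompts the model refuses (an assumed proxy metric)."""
    if not probes:
        raise ValueError("probe set must not be empty")
    refused = sum(1 for p in probes if marker in model(p).lower())
    return refused / len(probes)


@dataclass
class SafetyMonitor:
    probes: List[str]
    min_rate: float = 0.9  # assumed threshold; tune per deployment
    history: List[float] = field(default_factory=list)

    def check(self, model: Model) -> bool:
        """Record the current refusal rate and report whether it is still acceptable."""
        rate = refusal_rate(model, self.probes)
        self.history.append(rate)
        return rate >= self.min_rate


def self_improve(model: Model, step: Callable[[Model], Model],
                 monitor: SafetyMonitor, rounds: int) -> Model:
    """Run up to `rounds` improvement steps, keeping the last model that passed the check."""
    if not monitor.check(model):
        raise RuntimeError("initial model fails the safety check")
    for _ in range(rounds):
        candidate = step(model)
        if not monitor.check(candidate):
            break  # stop evolving and keep the last safe model
        model = candidate
    return model
```

In practice the probe set and the metric would need to be far richer than a substring match; the point of the sketch is the placement of the check inside the loop, not the metric itself.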


📑 Paper Summaries

🥇 1. OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

arXiv: 2602.05400 | ⬆️ 308 → View Deep Dive | Tags: llm data-selection pre-training optimizer efficiency opus machine-learning

This paper matters because, in the "data wall" era when high-quality text data is running out, it proposes OPUS, a framework that selects the most useful data at token granularity and in real time by accounting for the dynamics of the optimizer actually used in training.
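To make the idea of optimizer-aware selection concrete, here is a generic sketch. It is not OPUS's actual scoring rule, which this digest does not describe. It assumes an Adam-style second-moment estimate, a reference gradient (for example from held-out data), and a simple dot-product score; `preconditioned` and `select_batch` are hypothetical names.

```python
import math
from typing import List, Sequence


def preconditioned(grad: Sequence[float], second_moment: Sequence[float],
                   eps: float = 1e-8) -> List[float]:
    """Scale a gradient the way an Adam-style optimizer would: g / (sqrt(v) + eps)."""
    return [g / (math.sqrt(v) + eps) for g, v in zip(grad, second_moment)]


def select_batch(candidate_grads: Sequence[Sequence[float]],
                 reference_grad: Sequence[float],
                 second_moment: Sequence[float],
                 k: int) -> List[int]:
    """Return indices of the k candidates whose optimizer-shaped update
    aligns best with the reference gradient (assumed selection criterion)."""
    scores = []
    for i, g in enumerate(candidate_grads):
        update = preconditioned(g, second_moment)
        score = sum(u * r for u, r in zip(update, reference_grad))
        scores.append((score, i))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [i for _, i in scores[:k]]
```

With a uniform optimizer state, the ranking reduces to plain gradient alignment. Once the second-moment estimate differs across coordinates, the same candidates can be ranked differently, which is the intuition behind letting the optimizer inform selection.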

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


🥈 2. Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

arXiv: 2602.08222 | ⬆️ 251 → View Deep Dive | Tags: weak-driven-learning post-training llm optimization knowledge-distillation math-reasoning entropy fine-tuning

This paper matters because it inverts the usual logic of knowledge distillation: it suggests that the uncertainty signal hidden in a weak, early-training checkpoint can be used to break a strong model out of training saturation and push its performance higher.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


🥉 3. TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents

arXiv: 2602.07274 | ⬆️ 195 → View Deep Dive | Tags: terminal-agent llm data-synthesis error-correction devops robustness fine-tuning generative-ai

This paper matters because it addresses two problems that open-weight language models face on terminal tasks, the lack of executable environments and the absence of error-recovery ability, and in doing so substantially narrows the performance gap with proprietary closed models.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


4. Code2World: A GUI World Model via Renderable Code Generation

arXiv: 2602.09856 | ⬆️ 186 → View Deep Dive | Tags: world-model gui-agent code-generation simulation autonomous-agent multimodal-model

This paper matters because, to give GUI agents a human-like ability to anticipate outcomes, it proposes a world model that simulates the next screen by generating renderable code instead of predicting pixels.
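A toy sketch can show why predicting code rather than pixels gives structural control. It assumes HTML as the renderable code, which this digest does not specify. `predict_next_screen` is a hard-coded stand-in for a learned model, not the paper's system. Because the predicted screen is code, it can be parsed and inspected element by element.

```python
from html.parser import HTMLParser
from typing import List, Tuple


def predict_next_screen(current_html: str, action: str) -> str:
    """Hypothetical stand-in for a learned world model: returns code for the next screen.
    A real system would call a trained model here; this stub hard-codes one transition."""
    if action == "click:#login":
        return ('<div id="form"><input id="user"><input id="pass">'
                '<button id="submit">Sign in</button></div>')
    return current_html  # unknown actions leave the screen unchanged


class ElementCollector(HTMLParser):
    """Collects (tag, id) pairs so a predicted screen can be inspected structurally."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs).get("id", "")))


def screen_elements(html: str) -> List[Tuple[str, str]]:
    """Parse screen code and list its elements in document order."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements
```

An agent could use `screen_elements` on the predicted screen to check, before acting, whether the element it needs next (for example a submit button) would actually exist after the action.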

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


5. The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies

arXiv: 2602.09877 | ⬆️ 182 → View Deep Dive | Tags: multi-agent-systems ai-safety self-evolution information-theory entropy llm alignment closed-loop

This paper matters because it demonstrates, through both theory and experiments, the "Self-Evolution Trilemma": when a multi-agent system evolves on its own in complete isolation, it cannot maintain human values and safety.

📖 Detailed analysis: see → View Deep Dive for the in-depth analysis.


📅 Generated: 2026-02-15 | 🤖 GLM-4.7 Weekly Digest