
DD-005 AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

arXiv: 2601.18491 | Institution: AI45Research | Upvotes: 120 | Comments: 8 | Rank: Top 5 this week

Figure 1


[Paper Review] AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Paper information

  • arXiv ID: 2601.18491
  • Key contribution: proposes a new 3D taxonomy for diagnosing the safety of AI agents, a benchmark (ATBench), and a guardrail framework (AgentDoG)

1. Why does this paper matter?

Existing guardrail models (LlamaGuard and others) inspect only the final response in the chat window, so they cannot detect an agent that internally uses a dangerous tool or contaminates its environment along the way. This paper monitors the **entire course of the agent's behavior (the trajectory)**. Instead of splitting risk into just "unsafe/safe", it diagnoses each case along three dimensions: "where (source), how (failure mode), and what (consequence)". It is presented as the first work to bring this kind of transparency to agent safety.


2. Understanding the core idea

🏢 Everyday analogy: "the security team at the bank counter" vs "the detective in a movie"

  • Existing guardrails (the bank teller): they listen only to the customer's last words on leaving the counter ("I have withdrawn all my money") and check that nothing sounds wrong. They never see the customer dealing with someone in front of the counter or passing counterfeit bills.
  • AgentDoG (the detective watching CCTV): it tracks **every action (the CCTV footage)** from the moment someone enters the bank until they leave.
    1. Who approached? (Source: is this a fraudster posing as a bank employee?)
    2. How did they act? (Mode: did they sketch the layout to work out the vault passcode?)
    3. What was the outcome? (Harm: did money actually leave the bank, or was it only an attempt?)

So instead of merely warning that "this is a bad person", it issues a precise diagnosis: **"Using a voice-phishing script (WHERE), they attempted manipulation over the phone (HOW) and committed financial fraud (WHAT)."**

⚙️ How it works, step by step

  1. Defining the 3D safety taxonomy:

    • Risks are not listed flatly; they are defined along three orthogonal axes.
    • Source (where the risk originates): user input (prompt injection), tool use (malicious tool), environment feedback, and so on.
    • Failure Mode (how things go wrong): privilege abuse, flawed planning, information leakage, and so on.
    • Consequence (the resulting harm): system destruction, financial loss, privacy violation, and so on.
  2. Trajectory collection and analysis:

    • The model takes as input the full record of the agent's problem-solving process (thoughts, tool calls, results).
    • Even when the final answer looks normal, it checks whether a suspicious tool call such as delete_file occurred along the way.
  3. Diagnosis:

    • Given the trajectory, the model outputs two things.
    • Binary verdict: Safe vs Unsafe
    • Fine-grained diagnosis: a (Source, Mode, Harm) label, for example (User_Induced, Prompt_Injection, Information_Leak)
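
The inputs and outputs described in the steps above can be sketched as plain data structures. This is only an illustrative sketch: the class names `Step` and `Diagnosis` and the helper `format_report` are hypothetical and do not come from the paper; only the example label is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One trajectory step: what the agent did and what came back."""
    action: str        # e.g. a tool call rendered as text
    observation: str   # the environment's response to that action


@dataclass(frozen=True)
class Diagnosis:
    """Guardrail output: binary verdict plus an optional 3-axis label."""
    unsafe: bool
    label: Optional[Tuple[str, str, str]] = None  # (source, mode, harm)


def format_report(diagnosis: Diagnosis) -> str:
    """Render a diagnosis as a one-line, human-readable report."""
    if not diagnosis.unsafe:
        return "Safe"
    if diagnosis.label is None:
        return "Unsafe (no fine-grained label)"
    source, mode, harm = diagnosis.label
    return f"Unsafe: source={source}, mode={mode}, harm={harm}"


# A trajectory is simply an ordered list of steps (toy example).
trajectory: List[Step] = [
    Step("read_file('notes.txt')", "ok"),
    Step("delete_file('notes.txt')", "deleted"),
]
report = format_report(
    Diagnosis(True, ("User_Induced", "Prompt_Injection", "Information_Leak"))
)
print(report)
```

Keeping the binary verdict and the three-axis label in one object mirrors the two outputs listed above; a downstream system can act on `unsafe` alone and log the label for auditing.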

🧮 Key formulas

The paper defines a trajectory $\mathcal{T}$ as follows.

$$ \mathcal{T} = \{t_1, \dots, t_n\}, \quad t_i = (a_i, o_i) $$

Here $t_i$ is the $i$-th step, $a_i$ is the agent's action (a tool call, for example), and $o_i$ is the observation returned by the environment.

The safety verdict is defined as follows: if any single step in the process is unsafe, the whole trajectory is treated as unsafe.

$$ y = \text{unsafe} \iff \exists i \in \{1, \dots, n\}, \textsf{Unsafe}(t_i) = \text{True} $$

The fine-grained diagnosis $y_{\text{fine}}$ is then output as a three-dimensional label (risk source, failure mode, harm).

$$ y_{\text{fine}} = (\ell^{\text{risk}}, \ell^{\text{mode}}, \ell^{\text{harm}}) $$
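
The "any step unsafe" rule above maps directly onto a short aggregation function. The sketch below only illustrates the definition. AgentDoG itself is a learned model, and `toy_step_check` is a hypothetical keyword rule standing in for $\textsf{Unsafe}(t_i)$, not the paper's method.

```python
from typing import Callable, List, Tuple

TrajectoryStep = Tuple[str, str]  # (action a_i, observation o_i)


def judge_trajectory(
    trajectory: List[TrajectoryStep],
    step_is_unsafe: Callable[[TrajectoryStep], bool],
) -> str:
    """Return "unsafe" if any step is flagged, otherwise "safe"."""
    return "unsafe" if any(step_is_unsafe(t) for t in trajectory) else "safe"


def toy_step_check(step: TrajectoryStep) -> bool:
    """Hypothetical stand-in for Unsafe(t_i): flags delete_file calls."""
    action, _observation = step
    return action.startswith("delete_file")


trajectory = [
    ("search('quarterly report')", "3 results"),
    ("delete_file('/data/report.csv')", "deleted"),
    ("reply('Here is your summary.')", "sent"),
]
print(judge_trajectory(trajectory, toy_step_check))
```

The toy trajectory is judged unsafe because of its middle step, even though the final reply is harmless. A guardrail that reads only the last message would miss exactly this case.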


3. Analysis of experimental results

📊 Benchmark: ATBench (Agent Trajectory Safety and Security Benchmark)

A new evaluation dataset that the authors built themselves to fill the gaps in existing benchmarks.

  • Scale: 500 agent trajectories in total (250 safe, 250 unsafe)
  • Complexity: long dialogues and interactions averaging 8.97 turns (existing benchmarks usually have fewer than 5)
  • Diversity: 1,575 unique tools appear

🏆 Performance comparison (against the existing SOTA)

AgentDoG was compared with existing models (LlamaGuard 3, Qwen 2.5 Guard, and others) on ATBench.

  1. Detection performance (Detection Accuracy):

    • Existing guardrails cannot see the agent's "intermediate actions", so they often misjudge unsafe trajectories as safe (false negatives).
    • Because AgentDoG looks at the whole trajectory, it catches risks hidden in the middle (prompt injection, malicious tool use) far more accurately.
  2. Diagnostic accuracy (Diagnostic Accuracy):

    • Beyond simply classifying a risk as "bad", AgentDoG performed strongly at mapping it to the correct cause (Source) and consequence (Harm).
    • This is a very important metric in real production environments, where the system has to explain "why this is unsafe".
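
The two kinds of evaluation described above can be sketched as simple metrics. The function names and the toy labels below are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
from typing import Dict, List, Tuple

Label = Tuple[str, str, str]  # (source, mode, harm)


def false_negative_rate(gold: List[str], pred: List[str]) -> float:
    """Share of truly unsafe trajectories that were predicted safe."""
    unsafe_total = sum(1 for g in gold if g == "unsafe")
    missed = sum(
        1 for g, p in zip(gold, pred) if g == "unsafe" and p == "safe"
    )
    return missed / unsafe_total if unsafe_total else 0.0


def per_axis_accuracy(gold: List[Label], pred: List[Label]) -> Dict[str, float]:
    """Accuracy computed separately for each of the three label axes."""
    axes = ("source", "mode", "harm")
    n = len(gold)
    if n == 0:
        return {axis: 0.0 for axis in axes}
    return {
        axis: sum(g[i] == p[i] for g, p in zip(gold, pred)) / n
        for i, axis in enumerate(axes)
    }


# Toy data, invented for this example only.
gold_verdicts = ["unsafe", "unsafe", "safe", "unsafe"]
pred_verdicts = ["unsafe", "safe", "safe", "unsafe"]
gold_labels = [("user", "injection", "leak"), ("tool", "misuse", "loss")]
pred_labels = [("user", "planning", "leak"), ("tool", "misuse", "loss")]

fnr = false_negative_rate(gold_verdicts, pred_verdicts)
axis_acc = per_axis_accuracy(gold_labels, pred_labels)
```

Reporting accuracy per axis rather than as a single exact-match score shows which part of the diagnosis (source, mode, or harm) a guardrail gets wrong.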

4. Limitations and future research directions

🚫 Limitations (as acknowledged by the authors)

  • Static-data limitation: for now the framework mainly performs post-hoc analysis on trajectories that have already been collected. Interrupting an agent's behavior in real time is either out of scope or needs further research.
  • Completeness of the taxonomy: the 3D taxonomy is highly systematic, but fully covering constantly evolving new attack types (multimodal attacks, for example) requires continual updates.

🚀 Future research directions

  • Real-time intervention: extending the system so that it detects an unsafe action just before it executes and halts it immediately.
  • Self-correction: feeding the diagnosis back to the agent so that it corrects its own faulty behavior.
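
As a rough picture of what real-time intervention could look like, the sketch below checks a candidate action before running it. Everything here is an assumption for illustration: `guarded_execute`, `BlockedAction`, and the callback signatures are hypothetical, and the paper lists this direction as future work rather than providing such an API.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, Optional[str]]]  # (action, observation or None)


class BlockedAction(Exception):
    """Raised when the guardrail vetoes an action before it runs."""


def guarded_execute(
    action: str,
    history: History,
    execute: Callable[[str], str],
    is_unsafe: Callable[[History], bool],
) -> str:
    """Run `action` only if the trajectory including it is judged safe."""
    # Judge the trajectory as it would look if the action were taken.
    candidate = history + [(action, None)]
    if is_unsafe(candidate):
        raise BlockedAction(action)
    observation = execute(action)
    history.append((action, observation))
    return observation
```

The check runs on the history plus the pending action, so the guardrail still sees the whole trajectory. The difference from post-hoc analysis is that the verdict arrives before the side effect happens.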

5. Practical applicability

🛠️ Areas where it can be applied right away

  • Enterprise AI coding agents: block the action immediately when an AI assisting developers is about to unintentionally delete important code or expose secret keys.
  • Finance and anti-voice-phishing chatbots: defend against cases where an AI agent that looks up customer information goes off-policy or is manipulated by a malicious prompt.
  • Automation / RPA (Robotic Process Automation): preemptively block attempts by complex workflow-automation bots to access unauthorized servers.

💾 Required resources

  • GPU: the framework uses a mid-sized LLM (Llama-3-8B or a guardrail model of similar size, for example), so it can run comfortably on a typical consumer GPU (such as an RTX 4090) or on a single cloud instance.
  • Data: to apply it to in-house data, you need to collect trajectory data that matches your own tools and environment, then either fine-tune on it or build few-shot examples from it.
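
A quick back-of-the-envelope check of the GPU claim, assuming "8B" means exactly 8 billion parameters and counting model weights only. The KV cache and activations need additional memory, and long trajectories make the KV cache larger, so this is a lower bound rather than a sizing guide.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for model weights only, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 8e9   # assumption: "8B" taken as exactly 8 billion parameters
VRAM_GB = 24     # memory size of an RTX 4090

for name, nbytes in [("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    need = weight_memory_gb(N_PARAMS, nbytes)
    fits = need < VRAM_GB
    print(f"{name}: {need:.1f} GB of weights, fits in {VRAM_GB} GB: {fits}")
```

At 16-bit precision the weights take about 16 GB, which leaves roughly 8 GB of a 24 GB card for everything else; quantized variants leave more headroom for long trajectories.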

6. Background knowledge for understanding this paper

  1. LLM (Large Language Model): the basic concept of large language models that understand and generate text.
  2. AI Agent: an autonomous system in which an LLM plans on its own and uses tools to achieve a goal.
  3. Tool Use / Function Calling: the ability of an LLM to call external APIs, calculators, search engines, and the like, and receive their results.
  4. Trajectory: the complete log of states, actions, and observations from the start of an agent run to its end.
  5. Guardrail: a safety mechanism that prevents an AI from producing dangerous or unwanted responses.
  6. Prompt Injection: an attack in which the attacker supplies crafted instructions to extract the AI's system prompt or to induce unwanted behavior.
  7. Taxonomy: a framework that systematically classifies and organizes a complex subject.


📅 Generated: 2026-02-02 | 🤖 GLM-4.7 Deep Dive