๐Ÿ“ฑย ****ํ”„๋กœ์ ํŠธ ํ•œ ์ค„ ์†Œ๊ฐœ


<aside> 💡

๊ฐ•ํ™” ํ•™์Šต์„ ํ†ตํ•ด CPU ๋ฐ GPU ์ „๋ ฅ์„ ๋™์ ์œผ๋กœ ์กฐ์ •ํ•ด ํŠน์ • On-Device LLM Inference ์‹œ๋‚˜๋ฆฌ์˜ค์— ๋งž๊ฒŒ ์—๋„ˆ์ง€ ํšจ์œจ์„ ๋†’์ž…๋‹ˆ๋‹ค.

→ 27.3% improvement over the target performance, and 61.04% longer battery life than the existing power modes

</aside>


๐Ÿ“ย ๋ฌธ์ œ ์ •์˜

  1. Cloud vs On-device

    | Cloud-based inference | On-device inference |
    | --- | --- |
    | Latency ⬆️ | Latency ⬇️ |
    | Raises privacy concerns | Resolves privacy concerns |
    | Requires a network connection | Reduced network dependency |
  2. Differences from prior work


๐Ÿ“ถย ๋™๊ธฐ

CPU ๋ฐ GPU ์ฃผํŒŒ์ˆ˜์— ๋”ฐ๋ฅธ LLM Throughput ๋ณ€ํ™”

CPU ๋ฐ GPU ์ฃผํŒŒ์ˆ˜์— ๋”ฐ๋ฅธ LLM Throughput ๋ณ€ํ™”

CPU ๋ฐ GPU ์ฃผํŒŒ์ˆ˜์— ๋”ฐ๋ฅธ ์—๋„ˆ์ง€ ์†Œ๋น„๋Ÿ‰ ๋ณ€ํ™”

CPU ๋ฐ GPU ์ฃผํŒŒ์ˆ˜์— ๋”ฐ๋ฅธ ์—๋„ˆ์ง€ ์†Œ๋น„๋Ÿ‰ ๋ณ€ํ™”

<aside> <img src="/icons/light-bulb_gray.svg" alt="/icons/light-bulb_gray.svg" width="40px" />

Table of Contents


</aside>

<aside> <img src="/icons/arrow-northeast_gray.svg" alt="/icons/arrow-northeast_gray.svg" width="40px" />

Quick Links


Paper (ICCE 2025)

</aside>


💫 Reinforcement Learning Structure

$\alpha = 100$, $\beta = 3$
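As a sketch of how $\alpha$ and $\beta$ could enter the agent's reward, the function below assumes a simple shape: a heavy penalty (weighted by $\alpha$) for falling short of the target throughput, plus an energy penalty (weighted by $\beta$). The actual reward formulation is in the paper; only the values $\alpha = 100$, $\beta = 3$ come from this page, and the reward shape itself is an assumption for illustration.

```python
# Hypothetical reward for the CPU/GPU frequency-scaling agent.
# ASSUMPTION: the reward shape is illustrative; only alpha=100 and
# beta=3 are taken from the project summary above.
ALPHA = 100  # weight on missing the target throughput
BETA = 3     # weight on energy consumption

def reward(throughput_tps, target_tps, energy_j):
    """Penalize throughput below target heavily; always penalize energy."""
    shortfall = max(0.0, target_tps - throughput_tps)
    return -ALPHA * shortfall - BETA * energy_j

print(reward(10, 8, 2))  # → -6.0  (target met, only energy cost)
print(reward(5, 8, 2))   # → -306.0 (target missed by 3 tokens/s)
```

With this shape, the agent only downclocks as far as the target throughput allows, which matches the project's goal of meeting the target while extending battery life.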


๐Ÿ—’๏ธย ํ”„๋กœ์ ํŠธ ๊ฐœ๋ฐœ ๋…ธํŠธ

<aside> 📌

Scenario Design + Data Collection

</aside>

<aside> 📌

Environment + Agent

</aside>