PKC AI Project

Building a Multimodal Chatbot on an Entry-Level Graphics Card with AI


RTX 2060 SUPER + Llama 3B VS 8B Korean Real-World Benchmark: Speed, VRAM, and Analysis

AI Orchestrator 2025. 10. 22. 11:53

🎥 'PKC Project' Video Summary: LLM Bllossom 3B vs 8B Benchmark

This post contains real-world LLM benchmarks
on a consumer 8GB GPU.

Topic: RTX 2060 SUPER-Based Llama 8B Performance and Real-World Korean Usability Comparison

Tested Models: Q4 Quantized Public Models (Bllossom-3B, Bllossom-8B)

Test Environment: RTX 2060 SUPER 8GB

📊 Bllossom-3B vs Bllossom-8B Detailed Comparison

| Metric | Bllossom-3B | Bllossom-8B |
|---|---|---|
| Model Status | ⚠️ Quality degradation indicated in some prompts | ✅ All prompts successful |
| Load Time | Approx. 2.2 sec | Approx. 2.6 sec |
| VRAM Usage (Avg.) | 4.19~4.24GB | 6.91~6.96GB |
| Power Consumption | 160~170W | 172~173W |
| Temperature | 67~68℃ | 67~70℃ |
| CPU Utilization | 28%~41% | 24%~46% |
| TTFT (Time To First Token) | Approx. 20~36ms | Approx. 38~51ms |
| TPS (Tokens Per Second) | 79~80 tok/s | 46 tok/s |
| Inference Time | 0.0000 (potential labeling error) | 0.0000 (potential labeling error) |
| Output Characteristics | Mixed-language output (Korean, English, Chinese, Spanish, etc.); repetition of the same response; poor conversational context retention | Stable, Korean-focused output; includes sentiment tagging (e.g., Expectation 80%, Anxiety 40%); specific and structured responses |
| Output Example | Repeated English phrases like “really big thing happened”; mixed Chinese/Spanish like 乐观, Friendship, realmente | Natural Korean responses fitting the prompt; situational advice (time management, stress relief, etc.); added sentiment classification (expectation, anxiety, fear, etc.) |
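For readers who want to reproduce the TTFT and TPS rows, the sketch below shows one common way to derive both from a streamed response. It is not the script used for this benchmark: `measure_stream` and `fake_stream` are hypothetical names, the stream is simulated with sleeps, and TPS is computed over the decode phase only (tokens after the first one), which is an assumption about how the numbers above were defined.

```python
import time
from typing import Callable, Iterable


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume a token stream and report TTFT (ms) and decode-phase TPS."""
    start = clock()              # treat this as the moment the request is sent
    first_at = last_at = None
    count = 0
    for _ in tokens:
        last_at = clock()        # timestamp of the token that just arrived
        if first_at is None:
            first_at = last_at
        count += 1
    if first_at is None:
        raise ValueError("stream produced no tokens")
    decode_s = last_at - first_at
    # Exclude the first token so prompt processing does not skew throughput.
    tps = (count - 1) / decode_s if decode_s > 0 else 0.0
    return {
        "tokens": count,
        "ttft_ms": (first_at - start) * 1000.0,
        "tps": tps,
    }


def fake_stream(n: int = 20, prefill_s: float = 0.03, per_token_s: float = 0.02):
    """Stand-in for a real model stream: one prefill delay, then steady tokens."""
    time.sleep(prefill_s)
    for i in range(n):
        if i:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream()))
```

In practice `fake_stream()` would be replaced by whatever token iterator the inference runtime exposes. Passing the clock as a parameter keeps the measurement logic testable without real delays.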

✨ Summary and Conclusion

Bllossom-3B: Lightweight, with lower VRAM use (approx. 4.2GB), fast loading (2.2s), and the lowest TTFT (20~36ms). However, mixed-language output and repeated responses make it unreliable in practice. It is acceptable for debugging and experimentation but too unstable for a production service.

Bllossom-8B: It consumes more VRAM (approx. 6.9GB) and slightly more power than the 3B model, but it provides significantly more stable Korean output, better conversational context retention, and useful sentiment tagging. Its TPS is lower (46 vs. 79~80 tok/s) and its TTFT is somewhat higher than the 3B model's, yet TTFT stays under 51ms, which makes it the better fit for real-world conversation and service environments.
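The stability judgment above rests on two observable symptoms: output that drifts out of Korean, and responses that repeat. A rough automated check for both might look like the sketch below. This is only a heuristic under stated assumptions, not how the outputs in this post were evaluated: the function names are hypothetical, the 0.8 Hangul-share threshold is an arbitrary starting point, and scripts are bucketed by Unicode character name, so Spanish and English both count as "latin".

```python
import re
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Bucket one character by script using its Unicode name."""
    if not ch.isalpha():
        return "other"          # digits, punctuation, whitespace
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "hangul"
    if name.startswith("CJK"):
        return "han"
    if name.startswith("LATIN"):
        return "latin"
    return "other"


def script_shares(text: str) -> dict:
    """Fraction of letters per script, ignoring non-letters."""
    counts = Counter(script_of(ch) for ch in text)
    counts.pop("other", None)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}


def looks_mixed(text: str, expected: str = "hangul",
                min_share: float = 0.8) -> bool:
    """True when the expected script falls below the (assumed) threshold."""
    return script_shares(text).get(expected, 0.0) < min_share


def repeated_sentences(text: str) -> list:
    """Sentences that occur more than once, compared case-insensitively."""
    parts = [p.strip().lower() for p in re.split(r"[.!?\n]+", text)]
    counts = Counter(p for p in parts if p)
    return [s for s, n in counts.items() if n > 1]


if __name__ == "__main__":
    sample = "오늘은 really big thing happened. really big thing happened. 乐观 realmente"
    print(script_shares(sample))
    print(looks_mixed(sample), repeated_sentences(sample))
```

Legitimate Korean answers that quote product names or code will also lower the Hangul share, so the threshold would need tuning against real prompts before using a check like this as a pass/fail gate.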

 

 

Unlike most benchmarks using A100 or RTX 4090,
this test was conducted on a mid-range consumer GPU.
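On an 8GB card, the practical question is how much memory is left once the model is loaded. The short calculation below works only from the figures in the comparison table. It assumes the card's nominal 8GB as the ceiling and treats the reported "GB" values at face value; the usable amount reported by the driver is typically a little lower, so the result is an upper bound.

```python
CARD_VRAM_GB = 8.0  # nominal capacity of the RTX 2060 SUPER (assumed ceiling)

# Ranges copied from the comparison table in this post.
measured = {
    "Bllossom-3B": {"vram_gb": (4.19, 4.24), "tps": (79.0, 80.0)},
    "Bllossom-8B": {"vram_gb": (6.91, 6.96), "tps": (46.0, 46.0)},
}


def midpoint(lo: float, hi: float) -> float:
    return (lo + hi) / 2.0


def headroom_gb(model: str) -> float:
    """Free VRAM at peak reported usage, relative to nominal capacity."""
    return CARD_VRAM_GB - max(measured[model]["vram_gb"])


def tps_ratio(model: str, baseline: str) -> float:
    """Throughput of `model` as a fraction of `baseline` (range midpoints)."""
    return midpoint(*measured[model]["tps"]) / midpoint(*measured[baseline]["tps"])


if __name__ == "__main__":
    for name in measured:
        print(f"{name}: {headroom_gb(name):.2f} GB headroom")
    ratio = tps_ratio("Bllossom-8B", "Bllossom-3B")
    print(f"8B decodes at {ratio:.0%} of the 3B speed")
```

By this estimate the 8B model leaves roughly 1GB free versus about 3.8GB for the 3B model, and it generates tokens at a little under 60% of the 3B rate.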