RTX 2060 SUPER + Llama 3B VS 8B Korean Real-World Benchmark: Speed, VRAM, and Analysis

AI Orchestrator 2025. 10. 22. 11:53

🎥 'PKC Project' Video Summary: LLM Bllossom 3B vs 8B Benchmark

This post contains real-world LLM benchmarks
on a consumer 8GB GPU.

Topic: RTX 2060 SUPER-Based Llama 8B Performance and Real-World Korean Usability Comparison

Tested Models: Q4 Quantized Public Models (Bllossom-3B, Bllossom-8B)

Test Environment: RTX 2060 SUPER 8GB

📊 Bllossom-3B vs Bllossom-8B Detailed Comparison

Metric	Bllossom-3B	Bllossom-8B
Model Status	⚠️ Quality degradation on some prompts	✅ All prompts successful
Load Time	Approx. 2.2 s	Approx. 2.6 s
VRAM Usage	Avg. 4.19~4.24 GB	Avg. 6.91~6.96 GB
Power Consumption	160~170 W	172~173 W
GPU Temperature	67~68 ℃	67~70 ℃
CPU Utilization	28~41%	24~46%
TTFT (Time to First Token)	Approx. 20~36 ms	Approx. 38~51 ms
TPS (Tokens per Second)	79~80 tok/s	46 tok/s
Inference Time	0.0000 (potential labeling error)	0.0000 (potential labeling error)
Output Characteristics	Mixed-language output (Korean, English, Chinese, Spanish, etc.); the same response is repeated; poor conversational context retention	Stable, Korean-focused output; includes sentiment tagging (e.g., Expectation 80%, Anxiety 40%); specific and structured responses
Output Example	Repeated English phrases such as “really big thing happened”; stray Chinese, English, and Spanish words such as 乐观, Friendship, realmente	Natural Korean responses that fit the prompt; situational advice (time management, stress relief, etc.); added sentiment classification (expectation, anxiety, fear, etc.)
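This post does not state how the benchmark tool defines TTFT and TPS. As a minimal sketch, assuming one common convention (TTFT is the delay from sending the request to receiving the first token, and TPS is the decode rate after the first token), the two figures can be derived from token arrival timestamps. The names `GenerationStats` and `summarize_generation` are hypothetical and are not part of the PKC benchmark tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GenerationStats:
    ttft_ms: float            # time to first token, in milliseconds
    tokens_per_second: float  # decode throughput after the first token


def summarize_generation(request_time: float,
                         token_times: Sequence[float]) -> GenerationStats:
    """Compute TTFT and TPS from timestamps given in seconds.

    request_time: when the prompt was submitted (e.g. time.perf_counter()).
    token_times:  arrival time of each generated token, in order.
    Assumed convention, not necessarily the one used by the tool in this post.
    """
    if not token_times:
        raise ValueError("no tokens were generated")

    ttft_ms = (token_times[0] - request_time) * 1000.0

    # Exclude the first token so prompt processing does not skew throughput.
    decode_tokens = len(token_times) - 1
    decode_seconds = token_times[-1] - token_times[0]
    tps = decode_tokens / decode_seconds if decode_seconds > 0 else 0.0

    return GenerationStats(ttft_ms=ttft_ms, tokens_per_second=tps)
```

Under this convention, a run whose first token arrives 40 ms after the request and whose following tokens arrive every 20 ms would report a TTFT of about 40 ms and about 50 tok/s.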

✨ Summary and Conclusion

Bllossom-3B: It is lightweight, uses less VRAM (approx. 4.2 GB), loads quickly (approx. 2.2 s), and has a low TTFT (approx. 20~36 ms). However, its mixed-language output and repeated responses make it unreliable in practice. It is acceptable for debugging and experimentation but too unstable for an actual service.

Bllossom-8B: It uses more VRAM (approx. 6.9 GB) and slightly more power than the 3B model, and its throughput is lower (46 tok/s vs. 79~80 tok/s). In return, it produces significantly more stable Korean output, retains conversational context better, and adds sentiment tagging to its responses. Its TTFT is higher than the 3B model's but stays at or below about 51 ms, so responses still begin promptly. Overall, it is the better fit for real-world conversation and service environments.
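To put the TTFT and TPS figures from the table in perspective, the sketch below estimates how long a complete reply would take on each model. It uses the rough approximation total time ≈ TTFT + tokens / TPS with the worst-case TTFT and lowest TPS from the table. The 200-token reply length is an assumption chosen for illustration, not a measured value, and `estimated_response_seconds` is a hypothetical helper.

```python
def estimated_response_seconds(ttft_ms: float, tps: float, n_tokens: int) -> float:
    """Rough estimate: time to first token plus steady-state decode time."""
    return ttft_ms / 1000.0 + n_tokens / tps


REPLY_TOKENS = 200  # assumed reply length, for illustration only

# (model, worst-case TTFT in ms, lowest TPS), taken from the comparison table.
measurements = [
    ("Bllossom-3B", 36, 79),
    ("Bllossom-8B", 51, 46),
]

for name, ttft_ms, tps in measurements:
    seconds = estimated_response_seconds(ttft_ms, tps, REPLY_TOKENS)
    print(f"{name}: {seconds:.2f} s for {REPLY_TOKENS} tokens")
```

Under these assumptions the estimate is about 2.57 s for Bllossom-3B and about 4.40 s for Bllossom-8B. The TTFT difference of roughly 15 ms is negligible next to the decode time, so for longer replies the TPS gap is what the user actually notices.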

Unlike most benchmarks using A100 or RTX 4090,
this test was conducted on a mid-range consumer GPU.
