🎥 'PKC Project' Video Summary: LLM Bllossom 3B vs 8B Benchmark
This post contains real-world LLM benchmarks on a consumer 8GB GPU.
Topic: RTX 2060 SUPER-Based Llama 8B Performance and Real-World Korean Usability Comparison
Tested Models: Q4 Quantized Public Models (Bllossom-3B, Bllossom-8B)
Test Environment: RTX 2060 SUPER 8GB
Bllossom-3B vs Bllossom-8B Detailed Comparison
| Metric | Bllossom-3B | Bllossom-8B |
|---|---|---|
| Model Status | ⚠️ Quality degradation on some prompts | ✅ All prompts successful |
| Load Time | Approx. 2.2 s | Approx. 2.6 s |
| VRAM Usage | Avg. 4.19~4.24 GB | Avg. 6.91~6.96 GB |
| Power Consumption | 160~170 W | 172~173 W |
| Temperature | 67~68 °C | 67~70 °C |
| CPU Utilization | 28%~41% | 24%~46% |
| TTFT (Time To First Token) | Approx. 20~36 ms | Approx. 38~51 ms |
| TPS (Tokens Per Second) | 79~80 tok/s | 46 tok/s |
| Inference Time | 0.0000 (likely a labeling error) | 0.0000 (likely a labeling error) |
| Output Characteristics | Mixed-language output (Korean, English, Chinese, Spanish, etc.); repeats the same response; poor retention of conversational context | Stable, Korean-focused output; includes sentiment tagging (e.g., Expectation 80%, Anxiety 40%); specific, structured responses |
| Output Example | Repeated English phrases such as "really big thing happened"; stray Chinese characters and foreign words such as "Friendship" and "realmente" (Spanish) | Natural Korean responses that fit the prompt; situational advice (time management, stress relief, etc.); added sentiment classification (expectation, anxiety, fear, etc.) |
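The TTFT and TPS rows can be derived from per-token arrival timestamps. The sketch below is a minimal illustration, assuming the runtime streams tokens and that TPS is measured over the decode phase only (first to last token); the video does not state which definition its tool uses. `GenerationTiming`, `ttft_ms`, and `tokens_per_second` are hypothetical names, not part of any benchmarking tool.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    """Timestamps (in seconds) collected while streaming one response."""
    request_sent: float
    token_times: list[float]  # arrival time of each generated token, in order


def ttft_ms(t: GenerationTiming) -> float:
    """Time To First Token: delay between sending the prompt and receiving token 1."""
    if not t.token_times:
        raise ValueError("no tokens were generated")
    return (t.token_times[0] - t.request_sent) * 1000.0


def tokens_per_second(t: GenerationTiming) -> float:
    """Decode throughput from the first to the last token.

    Assumption: the TTFT period is excluded, so this reflects steady-state speed.
    """
    n = len(t.token_times)
    if n < 2:
        raise ValueError("need at least two tokens to measure throughput")
    span = t.token_times[-1] - t.token_times[0]
    return (n - 1) / span
```

For example, with synthetic timings (not measured data) where the first token arrives 40 ms after the request and the next 80 tokens arrive every 20 ms, this gives a TTFT of 40 ms and a TPS of 50.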
✨ Summary and Conclusion
Bllossom-3B: It is lightweight, uses less VRAM (approx. 4.2 GB), and offers faster loading (approx. 2.2 s), lower TTFT, and higher TPS (79~80 tok/s). However, mixed-language output and repeated responses make it too unstable for practical use. It is acceptable for debugging and experimentation but not reliable enough for a real service.
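The mixed-language and repetition problems were judged by reading the outputs. If you want a rough number to track them across prompts, one option is a simple heuristic like the sketch below. It is an assumption on our part, not the method used in the video: it classifies letters by Unicode block (Hangul, CJK ideographs, Latin) and counts how many word trigrams occur more than once. `script_ratios` and `repeated_ngram_ratio` are hypothetical helpers.

```python
import re
from collections import Counter

# Rough script classes by Unicode block (assumption: adequate for a quick check).
HANGUL = re.compile(r"[\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f]")
CJK = re.compile(r"[\u4e00-\u9fff]")
LATIN = re.compile(r"[A-Za-z\u00c0-\u00ff]")  # includes accented Latin letters


def script_ratios(text: str) -> dict[str, float]:
    """Share of Hangul, CJK, and Latin characters among all classified characters."""
    counts = {
        "hangul": len(HANGUL.findall(text)),
        "cjk": len(CJK.findall(text)),
        "latin": len(LATIN.findall(text)),
    }
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that occur more than once (0.0 means no repetition)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)
```

For a Korean assistant, a low Hangul share or a high repeated-trigram ratio would flag responses like the 3B examples in the table; the thresholds are up to you.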
Bllossom-8B: It consumes more VRAM (approx. 6.9 GB) and slightly more power than the 3B model, but it produces much more stable Korean output, retains conversational context better, and adds useful sentiment tagging. Its TPS is lower (46 vs. 79~80 tok/s) and its TTFT slightly higher (approx. 38~51 ms vs. 20~36 ms), yet a first token in under about 51 ms is still fast enough for interactive use, which makes it well suited to real-world conversation and service environments.
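To see how TTFT and TPS combine into the time a user actually waits, here is a back-of-the-envelope sketch. It assumes a 256-token reply (a hypothetical length, not from the video) and a constant decode rate, and it plugs in the upper TTFT values and the TPS figures from the comparison table (80 tok/s for 3B, 46 tok/s for 8B). `estimated_response_seconds` is a hypothetical helper.

```python
def estimated_response_seconds(ttft_ms: float, tps: float, n_tokens: int) -> float:
    """Rough wall-clock time for one response: first-token delay plus decode time.

    Assumes a constant decode speed, which real runs only approximate.
    """
    return ttft_ms / 1000.0 + n_tokens / tps


# Upper-bound TTFT (ms) and TPS taken from the comparison table.
for name, ttft, tps in [("Bllossom-3B", 36.0, 80.0), ("Bllossom-8B", 51.0, 46.0)]:
    print(f"{name}: ~{estimated_response_seconds(ttft, tps, 256):.2f} s for 256 tokens")
```

This prints about 3.24 s for the 3B model and 5.62 s for the 8B model. For replies of this length the TTFT difference (tens of milliseconds) is dwarfed by the TPS difference, so the case for 8B rests on output quality rather than speed.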
Unlike most benchmarks, which use an A100 or RTX 4090, this test was conducted on a mid-range consumer GPU.