PKC Benchmark Tool MARK - Unified LLM Performance Measurement Beta Version

AI Orchestrator 2025. 11. 6. 19:20

PKC Benchmark Tool MARK

A New Beta Version for Unified LLM Performance Measurement

As AI models continue to evolve rapidly, there is a growing need for a practical way to evaluate and compare the performance of local Large Language Models (LLMs).
This article introduces the PKC Benchmark Tool MARK (Beta) and walks through its architecture, benchmarking pipeline, and test logic.

⬇️ ⬇️ ⬇️ Please read this section carefully before proceeding. ⬇️ ⬇️ ⬇️

⬇️ Download Links:

PKC_Benchmark_Tool_MARK.zip

0.04MB

🚀 Download from this blog 📥

💾 Download from Google Drive 📥

🧠 Download from GitHub 📥

1️⃣ Project Overview

PKC Benchmark Tool MARK (Beta) is a web-based benchmarking system that measures and visualizes the actual inference performance of local AI models.
It automatically detects models from the Llama, Diffusers, and Transformers families and benchmarks them with a shared set of prompts and standardized metrics.

Supported Model Families:

Llama (Language Model)
Diffusers (Image Generation Model)
Transformers / GGUF models

Main Metrics:

VRAM usage (GB)
Model load time (s)
Token generation speed (TPS)
Time to First Token (TTFT, ms)
GPU power / temperature / CPU utilization

2️⃣ Key Features in the Beta Version

The beta release introduces several key improvements and new visualization features.

🔹 Real-Time Pipeline Visualization

Each benchmarking stage is logged and displayed per prompt, so users can follow the end-to-end flow between analysis models and LLMs.

🔹 Automatic Model Scanning (via config.json)

Set your model folder path in the models_scan_path field.
The server will automatically scan and categorize models when launched.

Example:

{
    "results_dir": "results",
    "models_scan_path": "C:/MyModels"
}
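As a rough illustration of what scanning and categorizing could look like, here is a minimal sketch. It is not the tool's actual scan_models_directory code. The function names and the detection rules are assumptions: a .gguf file is treated as a GGUF model, a folder containing model_index.json as a Diffusers pipeline, and a folder containing config.json as a Transformers checkpoint.

```python
import json
from pathlib import Path


def load_config(path):
    """Read config.json and return (results_dir, models_scan_path)."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    return cfg.get("results_dir", "results"), cfg["models_scan_path"]


def classify_model(entry):
    """Guess a model family from the on-disk layout (assumed heuristics)."""
    entry = Path(entry)
    if entry.is_file():
        return "gguf" if entry.suffix.lower() == ".gguf" else None
    if (entry / "model_index.json").is_file():
        return "diffusers"      # Diffusers pipelines ship a model_index.json
    if (entry / "config.json").is_file():
        return "transformers"   # Hugging Face checkpoints ship a config.json
    if any(entry.glob("*.gguf")):
        return "gguf"           # folder that only wraps a GGUF file
    return None


def scan_models(scan_path):
    """Return {entry name: family} for every recognizable entry in scan_path."""
    found = {}
    for entry in sorted(Path(scan_path).iterdir()):
        family = classify_model(entry)
        if family is not None:
            found[entry.name] = family
    return found
```

Calling scan_models on the folder from models_scan_path returns a plain dictionary, which is easy to send to a web UI as JSON.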

🔹 Integrated Web UI (benchmark_canvas.html)

TailwindCSS-based dark mode interface
Real-time logs, charts, and comparison tabs
Results automatically stored in browser LocalStorage

🔹 Pipeline Integration Testing

Analysis model results (e.g., emotion analysis) are automatically injected into LLM prompts, enabling end-to-end pipeline testing and quality tracking.
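The injection step can be pictured with a small sketch. The prompt template and the names build_prompt, run_pipeline, analyze, and generate are hypothetical and only stand in for whatever the tool does internally.

```python
def build_prompt(user_text, analysis):
    """Prepend analysis-model output to the user's text as context for the LLM."""
    context = ", ".join(f"{key}={value}" for key, value in analysis.items())
    return f"[analysis: {context}]\nUser: {user_text}\nAssistant:"


def run_pipeline(user_text, analyze, generate):
    """Run the analysis model, inject its result into the prompt, then call the LLM.

    `analyze` and `generate` are placeholder callables supplied by the caller.
    """
    analysis = analyze(user_text)
    prompt = build_prompt(user_text, analysis)
    return {"analysis": analysis, "prompt": prompt, "output": generate(prompt)}
```

Returning the analysis, the final prompt, and the output together makes it possible to log every stage of the chain for a single prompt.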

3️⃣ Server and Execution Structure

benchmark_server.py runs on FastAPI and handles the following operations:

Model scanning and loading (scan_models_directory)
GPU metrics monitoring (VRAM, Power, Temp via pynvml)
Real-time pipeline event streaming (/api/stream)
Benchmark execution (/api/run-benchmark)
Result saving (JSON + HTML)
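For the GPU monitoring item, a minimal pynvml sketch might look like the following. The function name read_gpu_metrics and the returned keys are assumptions rather than the tool's own code. The function returns None when NVML or an NVIDIA GPU is not available.

```python
def read_gpu_metrics(index=0):
    """Return VRAM, power and temperature for one GPU, or None if NVML is unavailable."""
    try:
        import pynvml
        pynvml.nvmlInit()
    except Exception:
        return None  # pynvml not installed, or no NVIDIA driver present
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # sizes in bytes
        power_mw = pynvml.nvmlDeviceGetPowerUsage(handle)     # milliwatts
        temp_c = pynvml.nvmlDeviceGetTemperature(
            handle, pynvml.NVML_TEMPERATURE_GPU
        )
        return {
            "vram_usage_gb": round(mem.used / 1024**3, 2),
            "power_w": round(power_mw / 1000, 1),
            "temperature_c": temp_c,
        }
    except pynvml.NVMLError:
        return None  # e.g. bad index, or the GPU does not report power draw
    finally:
        pynvml.nvmlShutdown()
```

Sampling this before and after a model load gives a simple estimate of how much VRAM the model itself occupies.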

The server runs under uvicorn. On Windows, it can be started directly with start_server_windows.bat.

4️⃣ Benchmark Execution & Metrics Collection

Models can be tested in Sequential or Parallel mode:

Sequential: Loads and tests models one by one (VRAM-efficient)
Parallel: Keeps models cached for repeated runs (faster throughput)
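The trade-off between the two modes can be sketched as follows. This is only an illustration under assumptions: load and run are hypothetical callables, run_cached mirrors the caching behavior described for Parallel mode, and whether the real tool also executes models concurrently is not covered here.

```python
def run_sequential(model_names, prompts, load, run):
    """Load one model at a time and release it before loading the next."""
    results = {}
    for name in model_names:
        model = load(name)
        results[name] = [run(model, p) for p in prompts]
        del model  # drop the reference so memory can be reclaimed before the next load
    return results


def run_cached(model_names, prompts, load, run, cache):
    """Keep loaded models in `cache` so repeated runs skip the load step."""
    results = {}
    for name in model_names:
        if name not in cache:
            cache[name] = load(name)
        results[name] = [run(cache[name], p) for p in prompts]
    return results
```

With the cached variant, a second benchmark pass over the same models pays no load time, at the cost of holding every model in memory at once.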

Key metrics recorded:

load_time_s — model loading time
vram_usage_gb — GPU memory usage
ttft_ms — time to first token
tokens_per_second — inference speed
inference_time_s — total generation time
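To make the timing metrics concrete, here is one way they could be derived from a stream of generated tokens. The helper measure_generation is hypothetical, and it assumes tokens_per_second is computed over the whole generation time including the wait for the first token. Some tools exclude that wait, so the tool's real numbers may be defined differently.

```python
import time


def measure_generation(token_stream, clock=time.perf_counter):
    """Consume an iterable of tokens and derive timing metrics from it."""
    start = clock()
    first_token_at = None
    count = 0
    for _ in token_stream:
        now = clock()
        if first_token_at is None:
            first_token_at = now  # moment the first token arrived
        count += 1
    total = clock() - start
    return {
        "ttft_ms": None if first_token_at is None else (first_token_at - start) * 1000.0,
        "tokens_per_second": count / total if total > 0 else 0.0,
        "inference_time_s": total,
        "tokens": count,
    }
```

Any generator that yields tokens as they are produced can be passed in. The clock parameter exists only so the timing can be replaced with a fake clock in tests.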

5️⃣ Results and Analysis

All test results are automatically saved in both JSON and HTML formats under the results directory.

Each HTML report includes:

Model-wise metrics for each prompt
Output text and status indicators (✅ / ⚠️ / ❌)
llama.cpp acceleration details (AVX, CUBLAS, etc.)
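Saving a run in both formats can be sketched like this. The file naming scheme, the record fields (model, tokens_per_second, status), and the bare HTML table are assumptions for illustration. The real reports contain more detail.

```python
import html
import json
from datetime import datetime
from pathlib import Path


def save_results(results, results_dir="results", stamp=None):
    """Write one JSON file and one HTML table for a list of result records."""
    stamp = stamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / f"benchmark_{stamp}.json"
    json_path.write_text(
        json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # Escape every cell so model output cannot break the report markup.
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(r["model"])),
            html.escape(str(r["tokens_per_second"])),
            html.escape(str(r["status"])),
        )
        for r in results
    )
    html_path = out_dir / f"benchmark_{stamp}.html"
    html_path.write_text(
        "<table>\n<tr><th>Model</th><th>TPS</th><th>Status</th></tr>\n"
        + rows
        + "\n</table>\n",
        encoding="utf-8",
    )
    return json_path, html_path
```

The results_dir argument corresponds to the results_dir field in config.json.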

Additionally, the web UI can load LocalStorage results into Comparison Mode to visualize performance differences between benchmark sessions.
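The comparison itself happens in the browser, but the underlying idea is a per-model difference between two sessions. A small Python sketch of that calculation, with a hypothetical compare_sessions helper and an assumed session layout of {model name: metrics}:

```python
def compare_sessions(baseline, candidate, metric="tokens_per_second"):
    """Return {model: percent change} for models present in both sessions."""
    changes = {}
    for model, base in baseline.items():
        # Skip models missing from the other session or with a zero/absent baseline.
        if model in candidate and base.get(metric):
            new = candidate[model][metric]
            changes[model] = (new - base[metric]) / base[metric] * 100.0
    return changes
```

A positive value means the metric went up in the newer session. For metrics such as ttft_ms, where lower is better, the sign should be read the other way around.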

6️⃣ Conclusion

The PKC Benchmark Tool MARK (Beta) provides a practical way to quantify LLM performance and visualize pipeline-based interactions.
Its integrated architecture enables real-world evaluation of AI model chains, from analysis to response generation.

This beta release focuses on model scanning, cache management, and VRAM auto-retry stability.
Future updates will introduce distributed benchmarking and auto-report generation features.

For questions, bug reports, or suggestions, feel free to leave a comment or contact the author.

Keywords:
AI, LLM, Llama, Transformers, Diffusers, Benchmark, PKC MARK, AI Benchmark, Model Test, Local LLM, Pipeline, FastAPI, VRAM, Inference, Token Speed, TTFT, GPU Benchmark, Machine Learning Performance, Open Source AI, Beta Version, PKC AI, AI Tool, AI Development, AI Engineering, AI Analysis, Python AI, LlamaCpp, CUBLAS, GGUF, AI Lab, AI Research, AI Project, AI Pipeline, Model Performance, AI Testing, Deep Learning, Neural Network, AI Framework, AI Visualization, AI Metrics, Local AI, Offline AI, AI Technology, ML Benchmark, PKC Tech

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
