PKC Benchmark Tool MARK
A New Beta Version for Unified LLM Performance Measurement
As AI models continue to evolve rapidly, there is a growing need for a practical way to evaluate and compare the performance of local Large Language Models (LLMs).
This article introduces PKC Benchmark Tool MARK (Beta) and takes an in-depth look at its architecture, benchmarking pipeline, and test logic.
⬇️ Download Links:
🚀 Download from this blog 📥
💾 Download from Google Drive 📥
1️⃣ Project Overview
PKC Benchmark Tool MARK (Beta) is a web-based benchmarking system that measures and visualizes the actual inference performance of local AI models.
It automatically detects Llama, Diffusers, and Transformers model families and measures performance using unified prompts and standardized metrics.
Supported Model Families:
- Llama (Language Model)
- Diffusers (Image Generation Model)
- Transformers / GGUF models
Main Metrics:
- VRAM usage (GB)
- Model load time (s)
- Token generation speed (tokens per second, TPS)
- Time to First Token (TTFT, ms)
- GPU power draw, GPU temperature, and CPU utilization
2️⃣ Key Features in the Beta Version
The beta release introduces several key improvements and new visualization features.
🔹 Real-Time Pipeline Visualization
Each benchmarking stage is logged and displayed per prompt, so users can trace the end-to-end flow from the analysis models to the LLMs.
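To make the per-stage log concrete, here is a minimal sketch of one way a stage event could be recorded and streamed to the browser. It assumes the /api/stream endpoint uses Server-Sent Events (the article does not state the wire format), and the StageEvent fields and stage names are hypothetical, not the tool's actual schema.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class StageEvent:
    """One logged pipeline stage for one prompt (hypothetical schema)."""
    prompt_id: int
    stage: str    # e.g. "analysis" or "llm"
    model: str
    status: str   # e.g. "started", "done", "error"
    timestamp: float = field(default_factory=time.time)


def to_sse(event: StageEvent) -> str:
    """Serialize an event as a single Server-Sent Events message."""
    # An SSE message is a "data: <payload>" line terminated by a blank line.
    return f"data: {json.dumps(asdict(event))}\n\n"


if __name__ == "__main__":
    print(to_sse(StageEvent(1, "analysis", "emotion-model", "done")), end="")
```

In a FastAPI app, an async generator that yields these strings can be returned as `StreamingResponse(generator, media_type="text/event-stream")`, which the browser can consume with `EventSource`.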
🔹 Automatic Model Scanning (via config.json)
Set your model folder path in the models_scan_path field of config.json.
At startup, the server automatically scans that folder and categorizes the models it finds.
Example:
{
  "results_dir": "results",
  "models_scan_path": "C:/MyModels"
}
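The scanning step can be approximated as below. This is a hypothetical sketch, not the server's scan_models_directory: it reads the two config keys shown above and sorts entries by file layout, assuming .gguf files are llama.cpp weights, a folder containing model_index.json is a Diffusers pipeline, and a folder containing config.json is a Transformers model. The family keys and function names are illustrative; the tool may group models differently.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Read the two config keys shown above, falling back to defaults."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    return {
        "results_dir": cfg.get("results_dir", "results"),
        "models_scan_path": cfg.get("models_scan_path", "."),
    }


def classify_models(scan_path):
    """Group top-level entries of scan_path by model family (heuristic)."""
    found = {"gguf": [], "diffusers": [], "transformers": []}
    root = Path(scan_path)
    if not root.is_dir():
        return found  # missing folder: report nothing instead of failing
    for entry in sorted(root.iterdir()):
        if entry.is_file() and entry.suffix.lower() == ".gguf":
            found["gguf"].append(entry.name)          # llama.cpp weights
        elif entry.is_dir() and (entry / "model_index.json").exists():
            found["diffusers"].append(entry.name)     # Diffusers pipeline folder
        elif entry.is_dir() and (entry / "config.json").exists():
            found["transformers"].append(entry.name)  # Transformers model folder
    return found
```

The Diffusers check runs before the Transformers check because a pipeline folder's subcomponents carry their own config.json files, while model_index.json only appears at the pipeline root.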
🔹 Integrated Web UI (benchmark_canvas.html)
- TailwindCSS-based dark mode interface
- Real-time logs, charts, and comparison tabs
- Results automatically stored in browser LocalStorage
🔹 Pipeline Integration Testing
Analysis model results (e.g., emotion analysis) are automatically injected into LLM prompts, enabling end-to-end pipeline testing and quality tracking.
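A minimal sketch of the injection step, assuming the analysis model returns a label and a score in the shape of one item from a Hugging Face text-classification pipeline (for example {"label": "joy", "score": 0.93}). The template wording and the build_llm_prompt name are hypothetical; the tool's real prompts are not documented here.

```python
# Hypothetical template: the analysis result is placed ahead of the user message.
PROMPT_TEMPLATE = (
    "The user's message was classified as: {label} (confidence {score:.2f}).\n"
    "Respond to the message with that emotional context in mind.\n\n"
    "User: {message}"
)


def build_llm_prompt(message, analysis):
    """Inject an analysis-model result (label + score) into the LLM prompt."""
    return PROMPT_TEMPLATE.format(
        label=analysis["label"], score=analysis["score"], message=message
    )


if __name__ == "__main__":
    print(build_llm_prompt("I passed the exam!", {"label": "joy", "score": 0.93}))
```

Keeping the template fixed across models means any difference in the LLM's output can be attributed to the model or to the analysis result, not to prompt wording.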
3️⃣ Server and Execution Structure
benchmark_server.py is a FastAPI application that handles the following operations:
- Model scanning and loading (scan_models_directory)
- GPU metrics monitoring (VRAM, power, and temperature via pynvml)
- Real-time pipeline event streaming (/api/stream)
- Benchmark execution (/api/run-benchmark)
- Result saving (JSON + HTML)
The server runs on uvicorn; on Windows, it can be started directly with start_server_windows.bat.
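GPU readings like these can be taken with pynvml, the Python binding for NVIDIA's NVML library (installed via the nvidia-ml-py package). The sketch below illustrates the relevant NVML calls and is not the server's code; read_gpu_metrics and bytes_to_gb are hypothetical names, and the function returns None when no NVIDIA driver or GPU is available or a query is unsupported.

```python
def bytes_to_gb(n_bytes):
    """Convert bytes to GiB (1024**3), the unit most GPU tools label "GB"."""
    return n_bytes / 1024**3


def read_gpu_metrics(index=0):
    """Return VRAM used (GB), power (W) and temperature (C) for one GPU, or None."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None  # no NVIDIA driver or library available
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)        # bytes
        power_mw = pynvml.nvmlDeviceGetPowerUsage(handle)   # milliwatts
        temp_c = pynvml.nvmlDeviceGetTemperature(
            handle, pynvml.NVML_TEMPERATURE_GPU
        )
        return {
            "vram_used_gb": round(bytes_to_gb(mem.used), 2),
            "power_w": power_mw / 1000.0,
            "temp_c": temp_c,
        }
    except pynvml.NVMLError:
        return None  # e.g. the GPU does not support one of the queries
    finally:
        pynvml.nvmlShutdown()
```

Sampling this function on a timer during a run, rather than once, gives the peak VRAM and power figures that matter most when comparing models.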
4️⃣ Benchmark Execution & Metrics Collection
Models can be tested in Sequential or Parallel mode:
- Sequential: Loads and tests models one by one (VRAM-efficient)
- Parallel: Keeps models cached in memory across repeated runs (higher throughput, more VRAM)
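The difference between the two modes comes down to whether loaded models are kept. In the sketch below, load_fn and bench_fn are hypothetical callbacks: run_sequential loads and releases each model in turn, while ModelCache keeps every model loaded so repeated runs skip the load step. The article describes "Parallel" mode as caching, so this sketch models reuse rather than concurrent execution.

```python
def run_sequential(names, load_fn, bench_fn):
    """Load, benchmark, and release one model at a time (lowest VRAM)."""
    results = {}
    for name in names:
        model = load_fn(name)
        try:
            results[name] = bench_fn(model)
        finally:
            # Drop our reference; memory is freed only if nothing else holds it.
            del model
    return results


class ModelCache:
    """Keep loaded models in memory so repeated runs skip the load step."""

    def __init__(self, load_fn):
        self._load_fn = load_fn
        self._models = {}

    def get(self, name):
        if name not in self._models:
            self._models[name] = self._load_fn(name)
        return self._models[name]

    def clear(self):
        self._models.clear()


def run_cached(names, cache, bench_fn, repeats=1):
    """Benchmark every model `repeats` times, loading each one only once."""
    results = {name: [] for name in names}
    for _ in range(repeats):
        for name in names:
            results[name].append(bench_fn(cache.get(name)))
    return results
```

In real GPU code, releasing a model usually also requires clearing the framework's allocator cache (for PyTorch, torch.cuda.empty_cache()) before the freed VRAM shows up in monitoring tools.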
Key metrics recorded:
- load_time_s — model loading time
- vram_usage_gb — GPU memory usage
- ttft_ms — time to first token
- tokens_per_second — inference speed
- inference_time_s — total generation time
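The timing metrics can be derived from a streaming token generator, as sketched below. This assumes tokens_per_second is computed over the whole generation, including the time to first token; some tools exclude that prefill phase, so figures from different tools may not be directly comparable. measure_generation and timed_load are hypothetical helper names.

```python
import time


def timed_load(load_fn, name):
    """Return (model, load_time_s) for a hypothetical load_fn."""
    t0 = time.perf_counter()
    model = load_fn(name)
    return model, time.perf_counter() - t0


def measure_generation(token_stream):
    """Consume a lazy token iterator and derive TTFT, TPS, and total time."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in token_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token has arrived
        n_tokens += 1
    total_s = time.perf_counter() - start
    return {
        "ttft_ms": (first_token_at - start) * 1000 if first_token_at is not None else None,
        "tokens_per_second": n_tokens / total_s if total_s > 0 else 0.0,
        "inference_time_s": total_s,
        "n_tokens": n_tokens,
    }
```

The stream must be lazy (a generator), so that the model's work happens inside the timed loop rather than before start is recorded.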
5️⃣ Results and Analysis
All test results are automatically saved in both JSON and HTML formats under the results directory (set by results_dir in config.json).
Each HTML report includes:
- Per-model metrics for each prompt
- Output text and status indicators (✅ / ⚠️ / ❌)
- llama.cpp acceleration details (AVX, cuBLAS, etc.)
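A sketch of writing one session to both formats, assuming results are a list of per-model, per-prompt dicts. The column names, the status codes (ok, warning, error), and save_results are hypothetical, and the llama.cpp acceleration details that the reports include are omitted here.

```python
import html
import json
from pathlib import Path

# Hypothetical status codes mapped to the report's indicators.
STATUS_ICONS = {"ok": "✅", "warning": "⚠️", "error": "❌"}
COLUMNS = ["model", "prompt", "tokens_per_second", "ttft_ms", "output"]


def save_results(results, results_dir="results", name="benchmark"):
    """Write results as JSON plus a minimal HTML table; return both paths."""
    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)

    json_path = out / f"{name}.json"
    json_path.write_text(
        json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    header = "".join(f"<th>{c}</th>" for c in COLUMNS + ["status"])
    rows = []
    for r in results:
        # Escape every cell: model output is untrusted text.
        cells = "".join(
            f"<td>{html.escape(str(r.get(c, '')))}</td>" for c in COLUMNS
        )
        icon = STATUS_ICONS.get(r.get("status"), "?")
        rows.append(f"<tr>{cells}<td>{icon}</td></tr>")

    html_path = out / f"{name}.html"
    html_path.write_text(
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>'
        f"<table><tr>{header}</tr>{''.join(rows)}</table></body></html>",
        encoding="utf-8",
    )
    return json_path, html_path
```

Writing the JSON file first keeps a machine-readable copy even if HTML rendering changes later; the HTML is purely a view over the same data.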
The web UI can also load results saved in LocalStorage into Comparison Mode to visualize performance differences between benchmark sessions.
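Comparison Mode runs in the browser, but the underlying calculation is simple enough to sketch in Python. This assumes each session is a mapping from model name to its metrics and reports the percentage change for each metric present in both sessions; compare_sessions is a hypothetical name, not the web UI's code.

```python
def compare_sessions(
    baseline,
    candidate,
    metrics=("tokens_per_second", "ttft_ms", "vram_usage_gb"),
):
    """Percentage change per model and metric, from baseline to candidate."""
    diff = {}
    for model, base in baseline.items():
        if model not in candidate:
            continue  # model was not run in both sessions
        diff[model] = {}
        for m in metrics:
            b, c = base.get(m), candidate[model].get(m)
            if b is None or c is None or b == 0:
                continue  # metric missing, or no baseline to divide by
            diff[model][m] = round((c - b) / b * 100, 1)
    return diff


if __name__ == "__main__":
    before = {"m1": {"tokens_per_second": 40, "ttft_ms": 200}}
    after = {"m1": {"tokens_per_second": 50, "ttft_ms": 150}}
    print(compare_sessions(before, after))
```

For tokens_per_second a positive change is an improvement, while for ttft_ms and vram_usage_gb a negative change is.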
6️⃣ Conclusion
The PKC Benchmark Tool MARK (Beta) provides a practical way to quantify LLM performance and visualize pipeline-based interactions.
Its integrated architecture enables real-world evaluation of AI model chains, from analysis to response generation.
This beta release focuses on model scanning, cache management, and VRAM auto-retry stability.
Future updates will introduce distributed benchmarking and auto-report generation features.
For questions, bug reports, or suggestions, feel free to leave a comment or contact the author.