
LLM Llama Benchmark Tool - Public Work Report

AI Orchestrator 2025. 9. 27. 14:49


PKC Benchmark Tool MARK, Here's How We've Built It So Far! (Reporter: chatgpt)

This document summarizes the ideas and features we've added from the very beginning of the project until now, so you can see everything at a glance. It's written for everyone who uses our benchmark tool, and we hope you find it an easy and enjoyable read!

Performance Test Video (Korean): [Link Here]

How Far We've Come! (Milestone Summary)

■ v3: Building the Foundation

Perfect for desktops!
Clean results! Instead of CSV files, results are shown in neat tables and can be saved directly as HTML files.
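Here is a minimal Python sketch of what "save as an HTML file instead of CSV" can look like: result rows become one standalone HTML page with a table. The function names `results_to_html` and `save_results_html` and the row format (one dict per benchmark run) are hypothetical; the post doesn't show the tool's actual export code.

```python
import html
from pathlib import Path


def results_to_html(rows: list, title: str = "Benchmark Results") -> str:
    """Render result rows (hypothetical format: one dict per run) as a standalone HTML page."""
    # Column order follows the keys of the first row; missing cells are left blank.
    columns = list(rows[0].keys()) if rows else []
    header = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(row.get(c, '')))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{safe_title}</title></head><body><h1>{safe_title}</h1>"
        f"<table border=\"1\"><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"
        "</body></html>\n"
    )


def save_results_html(rows: list, path: str) -> Path:
    """Write the page as UTF-8 so Korean and other non-ASCII text survives."""
    out = Path(path)
    out.write_text(results_to_html(rows), encoding="utf-8")
    return out
```

Escaping every cell matters here because prompts and model names are user-supplied text and could contain characters like `<` or `&`.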

■ v4: Solid to the Core!

(Smart Engine) It automatically detects an available GPU backend (CUDA on NVIDIA GPUs, MPS on Apple Silicon) and uses it, falling back to the CPU otherwise.
(Safety First) We hardened the program so it doesn't crash even when reading GPU information (power, temperature) fails.
(Automation King) It now finds local model files more intelligently, and we added a filter to the Hugging Face model search for extra convenience!
(User Experience Up!) Entering the server address is easier, there's guidance for offline use, and the tooltips are friendlier.
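The "Smart Engine" and "Safety First" items can be sketched together in Python. This assumes PyTorch for backend detection and NVIDIA's `pynvml` bindings (nvidia-ml-py) for power and temperature; the post doesn't say which libraries the tool actually uses, and `pick_device` / `read_gpu_stats` are hypothetical names. Every failure path returns a safe default instead of raising.

```python
# Both libraries are treated as optional (an assumption): if either is missing, we degrade gracefully.
try:
    import torch
except ImportError:
    torch = None

try:
    import pynvml
except ImportError:
    pynvml = None


def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu', preferring a GPU backend when one is available."""
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # older PyTorch builds have no MPS backend
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def read_gpu_stats(index: int = 0) -> dict:
    """Read power (W) and temperature (C) of one NVIDIA GPU; any value that fails stays None."""
    stats = {"power_w": None, "temp_c": None}
    if pynvml is None:
        return stats
    try:
        pynvml.nvmlInit()
    except Exception:  # no driver, no NVIDIA GPU, or insufficient permissions
        return stats
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        try:
            # NVML reports power in milliwatts.
            stats["power_w"] = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
        except Exception:
            pass  # some GPUs/drivers don't expose power readings
        try:
            stats["temp_c"] = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        except Exception:
            pass
    except Exception:
        pass  # e.g. an invalid GPU index
    finally:
        try:
            pynvml.nvmlShutdown()
        except Exception:
            pass
    return stats
```

Reading each metric in its own `try` means a GPU that reports temperature but not power still yields partial data rather than nothing.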

■ v5: Now with One-Click!

It now always creates a dedicated virtual environment (.pkc-venv) to prevent conflicts with other programs.
On Windows, macOS, or Linux, just one click on a single file handles everything from installation to execution!
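A minimal sketch of the "dedicated .pkc-venv" idea using Python's standard `venv` module. Only the folder name `.pkc-venv` and `requirements.txt` come from the post; the helper names are hypothetical, and the real one-click scripts may do this differently (for example in shell or batch files).

```python
import os
import subprocess
import venv
from pathlib import Path

VENV_NAME = ".pkc-venv"  # folder name from the post


def venv_python(env_dir: Path) -> Path:
    """Interpreter inside a venv: Scripts\\python.exe on Windows, bin/python elsewhere."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def ensure_venv(project_root: str, with_pip: bool = True) -> Path:
    """Create <project_root>/.pkc-venv if it doesn't exist yet and return its interpreter."""
    env_dir = Path(project_root) / VENV_NAME
    py = venv_python(env_dir)
    if not py.exists():
        venv.create(env_dir, with_pip=with_pip)
    return py


def install_requirements(py: Path, requirements: str = "requirements.txt") -> None:
    """Install dependencies with the venv's own pip, never into the system Python."""
    subprocess.run([str(py), "-m", "pip", "install", "-r", requirements], check=True)
```

A launcher would call `ensure_venv(".")`, then `install_requirements(...)`, then start the server with the returned interpreter, so nothing touches the system-wide Python packages.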

■ v5.3 ~ v5.4: A Prettier Interface

We slimmed down the input fields and aligned them to the right for a much cleaner look.
We increased the number of prompt input fields to three, and the screen now shows how many models are currently running!

■ v5.5c: Language Policy Update

The interface now switches between Korean and English automatically based on your browser settings, with no toggle needed. For other languages, please use your browser's translation feature!

■ v5.5c + F1: Major Feature Integration!

We combined Hugging Face model search and download, real-time progress tracking, and local model discovery into one place.
We kept our original clean charts and convenient prompt input method!
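As an illustration of "local model finding", here is a Python sketch that recursively scans folders for two common layouts: single `.gguf` files, and Hugging Face style folders that hold a `config.json` next to `.safetensors` weights. Which formats the tool actually recognizes is not stated in the post; those two layouts and the name `find_local_models` are assumptions.

```python
from pathlib import Path


def find_local_models(*roots: str) -> list:
    """Scan folders for GGUF files and Hugging Face style model folders (assumed layouts)."""
    models = {}
    for root in roots:
        base = Path(root).expanduser()
        if not base.is_dir():
            continue  # a missing folder is skipped rather than crashing the scan
        for path in base.rglob("*"):
            if path.is_file() and path.suffix.lower() == ".gguf":
                models[str(path)] = {"name": path.stem, "path": str(path), "format": "gguf"}
            elif path.name == "config.json" and any(path.parent.glob("*.safetensors")):
                # One entry per folder, even if the weights are sharded across many files.
                folder = path.parent
                models[str(folder)] = {"name": folder.name, "path": str(folder), "format": "safetensors"}
    return sorted(models.values(), key=lambda m: m["name"].lower())
```

Keying the results by path de-duplicates folders that are reachable from more than one of the given roots.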

■ v5.6: Now Multilingual!

We officially support Korean, English, Japanese, and Chinese (Simplified/Traditional).
It displays automatically based on the browser's language, and you can also specify the language directly in the address bar, like ?lang=en.
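The selection rule described here (an explicit `?lang=` in the address bar wins, otherwise follow the browser's language) can be sketched in Python using the browser's `Accept-Language` header. This is an illustration only: the real tool may do it in the page's JavaScript, and the fallback to English, the language codes, and the Traditional Chinese mapping (TW/HK/MO or the Hant script) are assumptions.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse

SUPPORTED = {"ko", "en", "ja"}  # plus zh-CN / zh-TW, handled separately below
DEFAULT_LANG = "en"  # assumed fallback
TRADITIONAL_HINTS = {"tw", "hk", "mo", "hant"}


def normalize_tag(tag: str) -> Optional[str]:
    """Map a tag such as 'ko-KR' or 'zh-Hant-TW' to a UI language, or None."""
    parts = tag.strip().lower().replace("_", "-").split("-")
    if parts[0] == "zh":
        # Traditional for TW/HK/MO or an explicit Hant script; Simplified otherwise.
        return "zh-TW" if TRADITIONAL_HINTS & set(parts[1:]) else "zh-CN"
    return parts[0] if parts[0] in SUPPORTED else None


def parse_accept_language(header: str) -> List[str]:
    """Return the tags in an Accept-Language header, highest q-value first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        tag, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if tag and tag != "*" and q > 0:
            ranked.append((-q, position, tag))  # position keeps ties in header order
    return [tag for _, _, tag in sorted(ranked)]


def pick_language(url: str, accept_language: str = "") -> str:
    """?lang= in the URL wins; otherwise the browser's preference; otherwise the default."""
    for value in parse_qs(urlparse(url).query).get("lang", []):
        lang = normalize_tag(value)
        if lang:
            return lang
    for tag in parse_accept_language(accept_language):
        lang = normalize_tag(tag)
        if lang:
            return lang
    return DEFAULT_LANG
```

For example, a browser sending `fr-FR,fr;q=0.9,zh-TW;q=0.8` skips the two unsupported French entries and lands on Traditional Chinese.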

■ v5.6.1: Urgent Layout Hotfix!

We moved the prompt input area to the left and the local model selection area to the right to make it more user-friendly.
We also widened the spacing in the number input fields to make them easier to click!

So, what can it do now? (Current Features)

One-Click Execution: Unzip and click one file to run it immediately. Simple, right?
Model Management: You can search and download models directly from Hugging Face, as well as use models on your own computer.
Run Benchmark: Enter your desired text in the three prompt fields (blanks are skipped automatically!) and run it. The results will appear in real-time in the Summary, Log, and Chart tabs!
Save Results: You can save the current screen as an HTML file to use in reports.
Multilingual Support: Automatically displays in Korean, English, Japanese, or Chinese.
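The "blanks are skipped automatically" rule for the three prompt fields is simple, but a sketch makes the behavior concrete. The helper names are hypothetical, and expanding the selected models against the prompts into one job per pair is an assumption about how runs are scheduled.

```python
MAX_PROMPTS = 3  # the UI has three prompt fields


def collect_prompts(fields: list) -> list:
    """Keep the prompt fields that contain non-whitespace text, in their original order."""
    return [text for text in fields[:MAX_PROMPTS] if text and text.strip()]


def plan_runs(models: list, fields: list) -> list:
    """Pair every selected model with every non-blank prompt (hypothetical scheduling)."""
    prompts = collect_prompts(fields)
    return [(model, prompt) for model in models for prompt in prompts]
```

The prompt text itself is left untouched; whitespace is only used to decide whether a field counts as blank.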

Actually, it has these advantages too! (Internal Quality)

Smart Device Selection: It automatically uses a good GPU if available, and smoothly switches to the CPU if not.
Real-time Streaming: Intermediate results and logs appear on the screen immediately without delay. No more frustrating waits!
Meticulous Performance Measurement: It records all the essential metrics, including speed (TTFT, time to first token; TPS, tokens per second), memory, power, and temperature.
Robust Stability: The program won't crash, even if drivers are missing or permissions are insufficient.
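TTFT and TPS can both be derived from timestamps taken while a token stream is consumed. The Python sketch below uses one common convention (TPS over the decode phase, from the first to the last token); the post doesn't define exactly how the tool computes TPS, and `measure_stream` is a hypothetical name. The `tokens` argument stands in for whatever streaming iterator the inference backend provides; if it yields multi-token text chunks, the count is chunks rather than true tokens.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(tokens: Iterable[str], clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume a token stream and compute TTFT and decode-phase TPS."""
    start = clock()
    first: Optional[float] = None
    last: Optional[float] = None
    count = 0
    for _ in tokens:
        now = clock()  # timestamp each token as it arrives
        if first is None:
            first = now
        last = now
        count += 1
    end = clock()
    ttft = first - start if first is not None else None
    # Decode speed: tokens after the first, over the time between the first and last token.
    decode = (last - first) if first is not None else 0.0
    tps = (count - 1) / decode if count > 1 and decode > 0 else None
    return {"ttft_s": ttft, "tokens": count, "tps": tps, "total_s": end - start}
```

Separating TTFT from TPS keeps prompt processing time (which dominates TTFT) from dragging down the generation speed figure.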

These are the recently changed files (Latest)

benchmark_server.py — Contains the core server functions, including Hugging Face integration and real-time log transmission.
benchmark_canvas.html — The screen we see! All design elements like multilingual support and layout are in here.
install_wizard.py — The wizard file that helps with the one-click installation.
requirements.txt — A list of the required libraries.
setup. / start_server. / OneClick_*** — Script files that enable one-click execution.
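The post says benchmark_server.py sends logs to the page in real time but doesn't name the transport. Assuming Server-Sent Events (WebSockets would work just as well), a minimal Python sketch looks like this: the benchmark worker pushes log records into a queue, and a generator turns them into `text/event-stream` messages. The helper names are hypothetical.

```python
import json
import queue
from typing import Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events message for a text/event-stream response."""
    head = f"event: {event}\n" if event else ""
    # json.dumps without indent never emits raw newlines, so one data: line is enough.
    return f"{head}data: {json.dumps(data, ensure_ascii=False)}\n\n"


def log_stream(q: queue.Queue) -> Iterator[str]:
    """Yield queued log records as SSE messages until a None sentinel arrives."""
    while True:
        record = q.get()  # blocks until the benchmark worker pushes something
        if record is None:
            yield sse_event({"status": "done"}, event="end")
            return
        yield sse_event(record, event="log")
```

In a web framework, the generator would be returned as a streaming response with the `text/event-stream` media type, and the page could listen with the browser's `EventSource` API.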

There are still a few things to fix (Known Issues)

Layout Glitch: Depending on the screen size, the order of the cards can get slightly mixed up. We will definitely fix this in the next version (v5.6.2)!
Finding Missing Translations: If you see any untranslated parts, please let us know! We'll add them right away.
More Powerful Search Filters: We plan to add features to sort models or filter by file size in the future.

Using it is really simple!

Unzip the archive and run the file whose name starts with OneClick. Your browser will open automatically.
Feel free to enter any text you want to test in the three prompt fields.
Press the Run Benchmark button, and you're done! Check the results in real-time on the chart, and when it's finished, press the Save as HTML button.

If you have any questions, feel free to ask!

PKC Blog: https://pkc0412.tistory.com/
Email: pkc0412@gmail.com

by: chatgpt

The public version of the benchmark tool will be released on the blog once it is complete.
