Model Arena

Grade D · 54/100 · completed
Web App
containerized / python · tiny
16 Files
1,468 LOC
1 Framework
7 Languages

Pipeline State

Status: completed
Run ID: #369762
Phase: done
Progress: 1%
Started:
Finished: 2026-04-13 01:31:02
LLM tokens: 0

Pipeline Metadata

Stage: Skipped
Decision: skip_scaffold_dup
Novelty: 35.23
Framework unique
Isolation
Last stage change: 2026-04-16 18:15:42
Deduplication group: #52266
Member of a group with 1 similar repo; canonical: #89897
Top concepts (2)
Project Description · Web Backend
Repobility · code-quality intelligence platform · https://repobility.com

AI Prompt

Create a self-hosted web tool for blind AI model comparison and ranking, similar to Chatbot Arena. The tool should allow users to enter a prompt and select a category, then have two different models respond simultaneously via real-time streaming. After viewing the side-by-side responses, users should be able to vote ("A Wins", "Tie", or "B Wins"). Key features to include are an ELO leaderboard that tracks rankings, support for multiple OpenAI-compatible APIs (like OpenAI, Anthropic, or Ollama), and the ability to configure models via a YAML file without changing code. The frontend should be simple, using vanilla JavaScript, and the deployment should be easy with Docker Compose.
python fastapi web-tool ai-comparison elo-ranking docker streaming javascript yaml
Generated by gemma4:latest
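The prompt above calls for configuring models through a YAML file without code changes. A plausible shape for such a file is sketched below; every key name is hypothetical rather than taken from the repository, and the category list follows the four categories mentioned elsewhere in this report (general, coding, reasoning, creative):

```yaml
# models.yaml: hypothetical layout for an OpenAI-compatible provider list
models:
  - name: gpt-4o-mini
    provider: openai
    base_url: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY          # read from the environment, never stored
  - name: llama3
    provider: ollama
    base_url: http://localhost:11434/v1  # Ollama's OpenAI-compatible endpoint
categories: [general, coding, reasoning, creative]
```

Keeping only an environment-variable name (rather than the key itself) in the file is what makes it safe to commit alongside the Docker Compose setup.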

Catalog Information

A self-hosted web tool that lets teams compare AI models blind and rank them using ELO.

Description

Model Arena is a lightweight, self-hosted web application that enables teams to compare two AI models side‑by‑side on the same prompt without revealing their identities. It streams responses in real time, allowing users to vote on which model performs better. The platform tracks performance with an ELO leaderboard, supports multiple OpenAI‑compatible providers, and estimates cost per response. Users can configure models via a simple YAML file and run the service with a single Docker command. Model Arena is ideal for internal model evaluation, budgeting, and unbiased benchmarking.
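The catalog repeatedly mentions an ELO leaderboard but shows no formula. A standard logistic ELO update is sketched below; the K-factor of 32 is an assumption, not a value from the repository, and the 'A'/'tie'/'B' outcomes mirror the three vote buttons described above:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Win probability of A over B under the logistic ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, outcome: str, k: float = 32.0):
    """Apply one vote. outcome is 'A', 'B', or 'tie' (the three vote buttons)."""
    score_a = {"A": 1.0, "B": 0.0, "tie": 0.5}[outcome]
    expected_a = expected_score(rating_a, rating_b)
    # B's actual and expected scores are the complements of A's.
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b
```

With both models at 1000, an "A Wins" vote moves the pair to 1016/984, while a tie leaves both ratings unchanged.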

Description (translated from Arabic)

Model Arena is a lightweight web platform that can be deployed locally, letting teams compare AI models blind on the same prompt. The tool shows each model's reply side by side with real-time streaming of results, allowing users to vote for the better answer without knowing the model's identity. Performance rankings are computed with the ELO system, and results can be filtered by category (general, coding, reasoning, creative). The application supports multiple OpenAI-compatible providers and estimates the cost of each response based on provider pricing. Models can be configured through a simple YAML file, and the application runs with a single Docker command. The system keeps prompts confidential and records every vote together with a full history of ELO changes. It is an ideal solution for internal model evaluation, budget planning, and unbiased performance analysis.

Novelty

7/10

Tags

ai-model-comparison blind-evaluation elo-ranking real-time-streaming cost-estimation multi-provider-support prompt-privacy leaderboard-analytics

Technologies

fastapi openai uvicorn
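The stack above (fastapi, openai, uvicorn) implies an async fan-out of one prompt to two providers so both responses stream simultaneously. A minimal, provider-agnostic sketch using asyncio.gather follows; run_battle and fake_complete are hypothetical names, and a real implementation would pass a wrapper around an OpenAI-compatible chat endpoint as the complete callable:

```python
import asyncio

async def run_battle(prompt, model_a, model_b, complete):
    """Send the same prompt to both models concurrently and collect replies.

    `complete(model, prompt)` is an async callable, e.g. a wrapper around an
    OpenAI-compatible chat completion call (hypothetical here).
    """
    reply_a, reply_b = await asyncio.gather(
        complete(model_a, prompt), complete(model_b, prompt)
    )
    # Model identities stay hidden from the voter: only anonymised slots leave.
    return {"A": reply_a, "B": reply_b}

# Stub backend standing in for a real provider call.
async def fake_complete(model, prompt):
    await asyncio.sleep(0)  # simulate network latency
    return f"{model}: {prompt.upper()}"

result = asyncio.run(run_battle("hello", "m1", "m2", fake_complete))
```

In the real app each slot would be a token stream (e.g. FastAPI's StreamingResponse) rather than a completed string, but the blind A/B pairing works the same way.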

Claude Models

claude-opus-4.6

Quality Score

D
53.5/100
Structure: 44
Code Quality: 75
Documentation: 39
Testing: 0
Practices: 72
Security: 92
Dependencies: 60

Strengths

  • Consistent naming conventions (snake_case)
  • Good security practices: no major issues detected
  • Containerized deployment (Docker)

Weaknesses

  • No LICENSE file: legal ambiguity for contributors
  • No tests found: high risk of regressions
  • No CI/CD configuration: manual testing and deployment

Recommendations

  • Add a test suite: start with critical-path integration tests
  • Set up CI/CD (GitHub Actions recommended) to automate testing and deployment
  • Add a linter configuration to enforce code style consistency
  • Add a LICENSE file (MIT recommended for open source)

Security & Health

Tech Debt: 4.1h (grade D)
OWASP: A (100%)
Quality Gate: PASS
Risk: A (7)
License: Unknown
Duplication: 0.7%
Methodology: Repobility · https://repobility.com/research/state-of-ai-code-2026/

Languages

python: 36.8%
css: 27.8%
javascript: 19.5%
html: 8.8%
markdown: 5.8%
yaml: 0.9%
text: 0.4%

Frameworks

FastAPI

Concepts (2)

Source: Repobility analyzer (https://repobility.com)
Category | Name | Description | Confidence
auto_description | Project Description | A self-hosted blind AI model comparison tool with ELO rankings. Inspired by Chatbot Arena (LMSYS); a lightweight, self-hosted alternative for internal/private model evaluation. | 80%
auto_category | Web Backend | web-backend | 70%

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/93966.svg)