Agentbench Live
B+ 87 completed
Other
cli / python · small
Files: 114
LOC: 7,211
Frameworks: 4
Languages: 7
Pipeline State
completed
Run ID: #347670
Phase: done
Progress: 1%
Started:
Finished: 2026-04-13 01:31:02
LLM tokens: 0
Pipeline Metadata
Stage: Cataloged
Decision: proceed
Novelty: 61.64
Framework unique: —
Isolation: —
Last stage change: 2026-05-10 03:35:02
Deduplication group #66075
Member of a group with 1 similar repo(s); this repo is canonical.
Top concepts (2)
Project Description
Web Backend
Repobility analyzer · published findings · https://repobility.com
AI Prompt
Create a command-line benchmark system, similar to AgentBench-Live, for evaluating AI agents on real-world tasks. I need it to support various domains like Code, Data Analysis, Multi-step workflows, Research, and Tool Use. The system should be built using Python and incorporate frameworks like Django or Flask for structure, and use SQLAlchemy for potential data persistence. It should allow for running tests and generating leaderboards based on task execution results.
python cli benchmark ai-agent django flask pytest sqlalchemy testing
Generated by gemma4:latest
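The leaderboard step described in the prompt above can be sketched as follows. This is a minimal illustration only, not code from the analyzed repository: the `TaskResult` type, the `build_leaderboard` function, the domain identifiers, and the sample agents are all hypothetical names chosen for this example, and ranking by pass rate is an assumed scoring rule.

```python
from collections import defaultdict
from dataclasses import dataclass

# Domain identifiers are assumptions derived from the domains named in the prompt.
DOMAINS = ("code", "data-analysis", "multi-step", "research", "tool-use")


@dataclass(frozen=True)
class TaskResult:
    """One task execution outcome (hypothetical record shape)."""
    agent: str
    domain: str
    task_id: str
    passed: bool


def build_leaderboard(results):
    """Return (agent, pass_rate, tasks_run) rows, highest pass rate first."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for result in results:
        if result.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {result.domain}")
        total[result.agent] += 1
        passed[result.agent] += int(result.passed)
    rows = [(agent, passed[agent] / total[agent], total[agent]) for agent in total]
    # Sort by pass rate (descending), then by agent name so ties are deterministic.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Made-up placeholder results, purely to exercise the function.
SAMPLE_RESULTS = [
    TaskResult("agent-a", "code", "t1", True),
    TaskResult("agent-a", "research", "t2", False),
    TaskResult("agent-b", "code", "t1", True),
    TaskResult("agent-b", "tool-use", "t3", True),
]

if __name__ == "__main__":
    for agent, rate, count in build_leaderboard(SAMPLE_RESULTS):
        print(f"{agent}: {rate:.0%} over {count} task(s)")
```

A real benchmark would likely weight tasks by difficulty or report per-domain scores; a flat pass rate is used here only to keep the sketch short.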
Catalog Information
Description
The open benchmark for AI agent task execution.
Novelty
3/10
Tags
python cli benchmark ai-agent django flask pytest sqlalchemy testing
Technologies
anthropic openai pydantic
Claude Models
claude-opus-4-6
Quality Score
B+
86.8/100
Structure
95
Code Quality
89
Documentation
83
Testing
85
Practices
74
Security
92
Dependencies
60
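As a quick arithmetic check on the figures above, the sketch below compares the reported overall score with a plain average of the seven category scores. The two differ, so the overall score is presumably weighted; the weights are not shown on this page, and none are assumed here.

```python
# Category scores and overall score exactly as listed in this report.
SUBSCORES = {
    "Structure": 95,
    "Code Quality": 89,
    "Documentation": 83,
    "Testing": 85,
    "Practices": 74,
    "Security": 92,
    "Dependencies": 60,
}
REPORTED_OVERALL = 86.8

# Plain (unweighted) mean of the seven categories.
unweighted_mean = sum(SUBSCORES.values()) / len(SUBSCORES)
gap = REPORTED_OVERALL - unweighted_mean

print(f"unweighted mean: {unweighted_mean:.1f}")  # 82.6
print(f"reported overall minus mean: {gap:.1f}")  # 4.2
```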
Strengths
- CI/CD pipeline configured (github_actions)
- Good test coverage (76% test-to-source ratio)
- Code linting configured (possibly ruff)
- Consistent naming conventions (snake_case)
- Good security practices: no major issues detected
- Properly licensed project
Security & Health
Tech Debt (B): 4.6h
OWASP (100%): A
Quality Gate: PASS
Risk (2): A
License: MIT
Duplication: 0.0%
Languages
Frameworks
Django Flask pytest SQLAlchemy
Concepts (2)
| Category | Name | Description | Confidence |
|---|---|---|---|
| auto_description | Project Description | | 80% |
| auto_category | Web Backend | web-backend | 70% |
