Terminal Bench Science

Name: Aljefra Mapper analysis
Creator: Repobility
License: https://repobility.com/legal/terms/

B 82 completed

Framework

containerized / yaml · small

Files: 250

LOC: 5,963

Frameworks

Languages


Pipeline State

completed

Run ID

#344607

Phase

done

Progress

Started

Finished

2026-04-13 01:31:02

LLM tokens

Pipeline Metadata

Stage

Cataloged

Decision

proceed

Novelty

61.76

Framework unique

—

Isolation

—

Last stage change

2026-05-10 03:35:34

Deduplication group #54120

Member of a group with 1 similar repo; this repo is the canonical one.

Top concepts (1)

Documentation


AI Prompt

Create a framework for evaluating AI agents on complex scientific workflows that are executed entirely in the terminal. The system should support defining tasks, which can be structured using YAML, Python, or shell scripts. I need to be able to manage task proposals and review processes, potentially using TOML for rubrics. Please structure the project to handle these different file types and provide documentation on how to run the benchmarks.

yaml python shell scientific ai-agent workflow terminal framework evaluation

Generated by gemma4:latest

Catalog Information

A framework for evaluating AI agents on complex scientific workflows executed in the terminal.

Description

This framework provides a collection of intricate scientific workflows that run in a terminal environment, designed to test the capabilities of AI agents in realistic settings. Users can execute sequential tasks that require precise terminal command control while measuring execution time and result accuracy. It includes a command‑line interface that simplifies test setup and generates comprehensive performance reports. The framework targets AI researchers and developers building autonomous agents who need a reliable benchmark for their solutions. It addresses the lack of robust evaluation tools for AI in complex scientific contexts, offering a repeatable and scalable testing environment.
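The description mentions running sequential terminal tasks while measuring execution time and result accuracy. A minimal sketch of that idea follows. It is not the framework's actual API: `StepResult`, `run_steps`, and `accuracy` are hypothetical names introduced here for illustration, and comparing exact stdout is an assumed way of scoring a step.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class StepResult:
    """Outcome of one terminal step (hypothetical structure)."""
    command: str
    seconds: float
    passed: bool


def run_steps(steps):
    """Run (command, expected_stdout) pairs in order, timing each one.

    Stops at the first failing step, on the assumption that later
    steps in a sequential workflow depend on earlier ones.
    """
    results = []
    for command, expected in steps:
        start = time.perf_counter()
        proc = subprocess.run(command, shell=True, capture_output=True, text=True)
        elapsed = time.perf_counter() - start
        # A step passes if it exits cleanly and prints the expected text.
        passed = proc.returncode == 0 and proc.stdout.strip() == expected
        results.append(StepResult(command, elapsed, passed))
        if not passed:
            break
    return results


def accuracy(results, total_steps):
    """Fraction of all planned steps that passed (skipped steps count as failed)."""
    if total_steps == 0:
        return 0.0
    return sum(r.passed for r in results) / total_steps
```

Calling `run_steps` with a list of `(command, expected_stdout)` pairs returns one `StepResult` per executed step. Passing the planned step count to `accuracy` means steps skipped after a failure still lower the score.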

Description (translated from Arabic)

This framework provides a collection of complex scientific workflows that run in the terminal, designed specifically to test the capabilities of AI agents in a realistic environment. It lets users run sequential tasks that require precise control over terminal commands, with the ability to measure execution time and result accuracy. The framework includes a command-line interface that simplifies test setup and generates comprehensive performance reports. It targets AI researchers and agent developers who need a reliable benchmark for evaluating their solutions. It addresses the shortage of comprehensive measurement tools for AI in complex scientific contexts while providing a reproducible environment. It distinguishes itself by focusing on real-world scenarios rather than simplified ones, which ensures closer alignment with practical applications.

Novelty

8/10

Claude Models

claude-opus-4.6

Quality Score

81.9/100

Structure

Code Quality

Documentation

Testing

Practices

Security

100

Dependencies

Strengths

CI/CD pipeline configured (github_actions)
Good test coverage (135% test-to-source ratio)
Consistent naming conventions (snake_case)
Good security practices: no major issues detected
Containerized deployment (Docker)
Properly licensed project

Weaknesses

1 'god file' with >500 LOC needs decomposition

Recommendations

Add a linter configuration to enforce code style consistency

Security & Health

6.1h

Tech Debt (C)

OWASP (100%)

PASS

Quality Gate

Risk (2)


Apache-2.0

License

8.0%

Duplication


Languages

Language	Share
yaml	25.9%
python	25.4%
shell	18.9%
markdown	17.9%
toml	11.7%
json	0.2%
text	0.1%

Frameworks

None detected

Concepts (1)

Category	Name	Description	Confidence
auto_category	Documentation	docs	70%

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/68666.svg)

