dmt-eval

Grade: C+ (76) · Status: completed
Type: Library (unknown / python · small)
Files: 78
LOC: 14,148
Frameworks: 1
Languages: 4

Pipeline State

Status: completed
Run ID: #359099
Phase: done
Progress: 1%
Started: (not recorded)
Finished: 2026-04-13 01:31:02
LLM tokens: 0

Pipeline Metadata

Stage: Skipped
Decision: skip_scaffold_dup
Novelty: 42.67
Framework: unique
Isolation
Last stage change: 2026-04-16 18:15:42
Deduplication group: #47778 (member of a group with 1 similar repo; canonical: #22814)
Top concepts (2): Project Description, Testing

AI Prompt

Create a universal validation framework in Python called dmt-eval. I need it to assess data quality, model performance, and test coverage for various computational models. The framework should support model-agnostic adapters to evaluate different types of models, generate structured narrative reports (like LabReports), and allow for parameterized measurement sweeps. The core functionality should allow evaluating a model against a dataset using metrics like accuracy and latency, and finally generating a structured Markdown report. Please ensure it uses pytest for testing.
Tags: python pytest ai-evaluation llm framework data-validation scientific-computing testing
Generated by: gemma4:latest
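
As a rough illustration of the API shape the prompt asks for, here is a minimal Python sketch. Every name in it (CallableAdapter, evaluate, to_markdown) is a hypothetical stand-in, not the repository's actual interface:

    import time

    class CallableAdapter:
        """Model-agnostic adapter: wraps anything callable as a model."""
        def __init__(self, fn):
            self.fn = fn
        def predict(self, x):
            return self.fn(x)

    def evaluate(adapter, dataset, metrics=("accuracy", "latency")):
        """Run the adapter over (input, expected) pairs and score it."""
        correct, latencies = 0, []
        for x, expected in dataset:
            start = time.perf_counter()
            got = adapter.predict(x)
            latencies.append(time.perf_counter() - start)
            correct += int(got == expected)
        results = {}
        if "accuracy" in metrics:
            results["accuracy"] = correct / len(dataset)
        if "latency" in metrics:
            results["latency_ms"] = 1000 * sum(latencies) / len(latencies)
        return results

    def to_markdown(name, results):
        """Render results as the structured Markdown report the prompt describes."""
        lines = [f"# LabReport: {name}", "", "| Metric | Value |", "|---|---|"]
        lines += [f"| {k} | {v:.4f} |" for k, v in results.items()]
        return "\n".join(lines)

    dataset = [(1, 1), (2, 4), (3, 9)]
    report = to_markdown("square-model", evaluate(CallableAdapter(lambda x: x * x), dataset))
    print(report)

The adapter layer is what would make such an engine model-agnostic: anything that can be wrapped behind predict(x) can be scored by the same metric loop.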

Catalog Information

A universal validation framework that uses large language models to assess data quality, model performance, and test coverage.

Description

This framework provides a unified approach to validate data sets, evaluate machine‑learning models, and analyze test coverage using large language models. It offers a modular test‑case engine that can be scripted in Python or invoked from the command line, and it integrates with pandas for data manipulation. By leveraging OpenAI and Anthropic APIs, it generates detailed reports highlighting strengths, weaknesses, and actionable insights. The tool is designed for data scientists, ML engineers, and QA teams who need automated, repeatable validation across the entire development pipeline. It addresses common pain points such as inconsistent data, hidden model biases, and incomplete test suites, helping teams deliver higher‑quality products faster.
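
To make the "modular test-case engine" idea concrete, the Python-scripting side of a pandas-backed data-quality suite might look like the following; run_suite and the check names are illustrative assumptions, not the project's real API:

    import pandas as pd

    def no_nulls(df: pd.DataFrame) -> bool:
        """Fail if any cell in the frame is missing."""
        return not df.isnull().values.any()

    def unique_ids(df: pd.DataFrame) -> bool:
        """Fail if the id column contains duplicates."""
        return df["id"].is_unique

    def run_suite(df: pd.DataFrame, checks) -> dict:
        """Apply each named check to the frame and collect pass/fail results."""
        return {name: check(df) for name, check in checks.items()}

    df = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})
    results = run_suite(df, {"no_nulls": no_nulls, "unique_ids": unique_ids})
    print(results)  # {'no_nulls': True, 'unique_ids': True}

In the workflow the description outlines, the resulting pass/fail map would then be handed to an OpenAI or Anthropic model to produce the narrative report; the LLM call is omitted here to keep the sketch self-contained.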

Description (translated from Arabic)

This framework offers a comprehensive solution for assessing data quality, model performance, and test coverage using large language models. It lets users build custom test suites that are applied to datasets or machine-learning models, with the option of integrating advanced analysis via the pandas library. It relies on AI APIs such as OpenAI and Anthropic to generate detailed reports that surface strengths and weaknesses at every stage. It integrates easily with CI/CD pipelines, ensuring an automatic check before any update is deployed. It targets engineers building models or data systems, helping them reduce errors and improve product reliability. It stands apart from traditional solutions in its ability to combine manual verification and AI in a single unified framework.
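
The CI/CD integration mentioned above could reduce to a small gate script that fails the pipeline whenever a check fails; this is a sketch under that assumption, reusing the hypothetical run_suite() results from the previous example:

    import sys

    def gate(results: dict) -> int:
        """Return a nonzero exit code if any validation check failed."""
        failed = [name for name, ok in results.items() if not ok]
        for name in failed:
            print(f"FAIL: {name}", file=sys.stderr)
        return 1 if failed else 0

    if __name__ == "__main__":
        # A CI step would run this script and block the deploy on failure.
        sys.exit(gate({"no_nulls": True, "unique_ids": True}))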

Novelty

8/10

Tags

data-validation model-evaluation test-coverage llm-powered-assessment automation quality-assurance universal-framework

Technologies

anthropic openai pandas

Claude Models

claude-opus-4.6

Quality Score

Grade: C+ (75.9/100)
Structure: 76
Code Quality: 94
Documentation: 64
Testing: 60
Practices: 63
Security: 90
Dependencies: 60

Strengths

  • Good test coverage (44% test-to-source ratio)
  • Code linting configured (likely ruff)
  • Consistent naming conventions (snake_case)
  • Good security practices: no major issues detected

Weaknesses

  • No LICENSE file: legal ambiguity for contributors
  • No CI/CD configuration: manual testing and deployment
  • Potential hardcoded secrets in 1 file
  • 328 duplicate lines detected: consider DRY refactoring

Recommendations

  • Set up CI/CD (GitHub Actions recommended) to automate testing and deployment
  • Add a LICENSE file (MIT recommended for open source)
  • Move hardcoded secrets to environment variables or a secrets manager (see the sketch below)
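
For the secrets recommendation, a minimal sketch of the fix, assuming the standard OPENAI_API_KEY / ANTHROPIC_API_KEY variable names those SDKs conventionally read (the require_env helper is illustrative):

    import os

    def require_env(name: str) -> str:
        """Fail fast with a clear message if a secret is not configured."""
        value = os.environ.get(name)
        if not value:
            raise RuntimeError(f"Set the {name} environment variable")
        return value

    # Read keys from the environment instead of hardcoding them in source.
    OPENAI_API_KEY = require_env("OPENAI_API_KEY")
    ANTHROPIC_API_KEY = require_env("ANTHROPIC_API_KEY")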

Security & Health

Tech Debt: 5.1h (A)
OWASP: A (100%)
Quality Gate: PASS
Risk: A (1)
License: MIT
Duplication: 4.1%
Generated by Repobility's multi-pass static-analysis pipeline (https://repobility.com)

Languages

python: 94.6%
markdown: 3.3%
toml: 1.1%
shell: 1.1%

Frameworks

pytest
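
Since the prompt requires pytest, a test for the hypothetical evaluate() sketch shown earlier might look like the following (the dmt_eval import path is an assumption, not the repository's real layout):

    from dmt_eval import CallableAdapter, evaluate  # hypothetical module path

    def test_perfect_model_has_full_accuracy():
        # A model that always matches the expected output should score 1.0.
        dataset = [(1, 2), (2, 3), (3, 4)]
        results = evaluate(CallableAdapter(lambda x: x + 1), dataset)
        assert results["accuracy"] == 1.0
        assert results["latency_ms"] >= 0.0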

Concepts (2)

Open methodology · Repobility · https://repobility.com/research/
Category: auto_description · Name: Project Description · Description: Data, Models, Tests — universal validation framework for the age of AI agents. · Confidence: 80%
Category: auto_category · Name: Testing · Description: testing · Confidence: 70%

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/83242.svg)