Deriva ML
Grade C+ (72) · completed
Library · CLI / Python · small
177 files · 59,719 LOC · 2 frameworks · 5 languages
Pipeline State
Status: completed
Run ID: #303847
Phase: done
Progress: 1%
Started: (not shown)
Finished: 2026-04-13 01:31:02
LLM tokens: 0
Pipeline Metadata
Stage: Cataloged
Decision: proceed
Novelty: 77.07
Framework unique: n/a
Isolation: n/a
Last stage change: 2026-05-10 03:34:57
Deduplication group #56108
Member of a group with 3 similar repo(s); canonical: #47695
Top concepts (12)
Project Description, Singleton, business_logic, testing, Testing, Factory, Strategy, Testing, File Management, Database, Configuration, Logging
Repobility · code-quality intelligence platform · https://repobility.com
Code Distillation
Sample distilled functions
- validate_ml_schema: Validates the structure of a DerivaML catalog instance, accepting the catalog object and an optional boolean flag to control strictness. It returns a SchemaValidationReport object containing the results of the validation process. If the strict flag is set to true, the function treats unexpected tabl…
- SchemaValidator._check_extra_tables: Validates the provided schema by comparing its listed tables against a predefined set of expected tables. It accepts a schema object containing table information and a report object to record findings. If any table name in the schema is not found within the expected set, the function records an "ext…
- SchemaValidator._validate_vocabulary_terms: Validates the presence of required vocabulary terms across multiple specified tables. It accepts a SchemaValidationReport object as input and returns nothing. The function queries the system's vocabulary terms for each expected table, comparing the retrieved terms against a predefined set of require…
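As described, `_check_extra_tables` and `_validate_vocabulary_terms` reduce to set comparisons: tables in the schema minus the expected set, and required terms minus the terms actually present. The sketch below shows that logic under stated assumptions. `ValidationReport`, `check_extra_tables`, and `check_vocabulary_terms` are hypothetical stand-ins written for illustration, not DerivaML's actual classes or signatures, and the table and term names are made up. It does not query a live catalog; the present terms are passed in as plain sets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Hypothetical stand-in for SchemaValidationReport: collects (kind, detail) findings."""

    issues: list[tuple[str, str]] = field(default_factory=list)

    def add(self, kind: str, detail: str) -> None:
        self.issues.append((kind, detail))

    @property
    def ok(self) -> bool:
        # The report is clean when no findings were recorded.
        return not self.issues


def check_extra_tables(schema_tables: set[str], expected: set[str], report: ValidationReport) -> None:
    """Record every table in the schema that is not in the expected set."""
    for name in sorted(schema_tables - expected):
        report.add("extra_table", name)


def check_vocabulary_terms(
    present: dict[str, set[str]],
    required: dict[str, set[str]],
    report: ValidationReport,
) -> None:
    """For each vocabulary table, record required terms that are not present."""
    for table, terms in required.items():
        missing = terms - present.get(table, set())
        for term in sorted(missing):
            report.add("missing_term", f"{table}:{term}")


# Illustrative table and term names only.
report = ValidationReport()
check_extra_tables({"Subject", "Image", "Scratch"}, {"Subject", "Image"}, report)
check_vocabulary_terms({"Image_Type": {"Raw"}}, {"Image_Type": {"Raw", "Processed"}}, report)
print(report.issues)  # [('extra_table', 'Scratch'), ('missing_term', 'Image_Type:Processed')]
```

Sorting each set difference keeps the order of findings deterministic, which makes reports easier to compare across runs.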
AI Prompt
Create a command-line interface (CLI) Python library called DerivaML. The goal is to simplify creating and executing reproducible machine learning pipelines using a Deriva catalog. The project should utilize pytest for testing and SQLAlchemy for database interactions. Please structure the code to handle configuration using YAML, JSON, and TOML files, and ensure the documentation setup is ready, perhaps using mkdocs.yml.
python cli machine-learning deriva pandas pytest sqlalchemy mlops
Generated by gemma4:latest
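The prompt above asks for configuration handling across YAML, JSON, and TOML files. One minimal way to sketch that is to dispatch on the file extension. `load_config` is a hypothetical helper, not DerivaML's API; the TOML branch assumes Python 3.11+ (where `tomllib` is in the standard library), and the YAML branch assumes the third-party PyYAML package is installed.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_config(path: str | Path) -> dict[str, Any]:
    """Load a configuration mapping from a .json, .toml, or .yaml/.yml file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open("r", encoding="utf-8") as f:
            return json.load(f)
    if suffix == ".toml":
        import tomllib  # standard library on Python 3.11+

        with path.open("rb") as f:  # tomllib.load needs a binary file object
            return tomllib.load(f)
    if suffix in (".yaml", ".yml"):
        import yaml  # third-party PyYAML package (assumed installed)

        with path.open("r", encoding="utf-8") as f:
            # safe_load returns None for an empty file; normalize to {}.
            return yaml.safe_load(f) or {}
    raise ValueError(f"unsupported config format: {suffix!r}")


# Usage (file name is hypothetical):
# config = load_config("deriva_ml.toml")
```

Importing `tomllib` and `yaml` inside their branches means a JSON-only setup does not need PyYAML installed.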
Catalog Information
This project simplifies the use of Deriva and Pandas for creating reproducible machine learning pipelines.
Description
Deriva-ML is a collection of utilities designed to streamline the process of building machine learning pipelines using Deriva and Pandas. It aims to provide a simple and efficient way to create reproducible workflows, making it easier for data scientists to focus on model development rather than tedious setup tasks.
Description (Arabic)
This project aims to make it easier to use Deriva and Pandas to build reproducible machine learning systems. It provides a set of tools designed to simplify building these systems, making them easier to use and reducing the time spent on planning.
Novelty
5/10
Tags
machine-learning pipeline-creation reproducibility data-science workflow-management
Technologies
pandas pydantic sqlalchemy
Claude Models
claude-opus-4.6
Quality Score
C+ (72.5/100)
| Dimension | Score |
|---|---|
| Structure | 80 |
| Code Quality | 73 |
| Documentation | 80 |
| Testing | 85 |
| Practices | 52 |
| Security | 57 |
| Dependencies | 90 |
Strengths
- CI/CD pipeline configured (github_actions)
- Substantial test suite (69% test-to-source ratio)
- Code linting configured (possibly ruff)
- Consistent naming conventions (snake_case)
- Properly licensed project
Weaknesses
- Potential hardcoded secrets in 1 file
- 2516 duplicate lines detected; consider DRY refactoring
- 16 "god files" with more than 500 LOC each should be decomposed
Recommendations
- Move hardcoded secrets to environment variables or a secrets manager
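The recommendation to move hardcoded secrets into environment variables can be sketched in a few lines of Python. This is a minimal illustration, assuming the secret is supplied through an environment variable; `get_secret`, `MissingSecretError`, and the variable name `DERIVA_ML_TOKEN` are hypothetical and not part of DerivaML. A secrets manager would replace the `os.environ` lookup with a call to that service's client.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not set in the environment."""


def get_secret(name: str) -> str:
    # Read the secret at runtime instead of embedding it in source code.
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than continuing with an empty credential.
        raise MissingSecretError(f"environment variable {name!r} is not set")
    return value


# Usage (DERIVA_ML_TOKEN is a hypothetical variable name):
# token = get_secret("DERIVA_ML_TOKEN")
```

Raising on a missing or empty value surfaces misconfiguration at startup instead of as an authentication failure later.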
Security & Health
Tech Debt: 8.8h (A)
DORA Rating: Medium
OWASP: A (100%)
Quality Gate: PASS
Risk: A (0)
License: Apache-2.0
Duplication: 4.6%
Frameworks
pytest SQLAlchemy
Symbols
| Kind | Count |
|---|---|
| variable | 614 |
| method | 548 |
| function | 163 |
| class | 132 |
| constant | 99 |
| property | 63 |
| protocol | 10 |
Concepts (12)
| Category | Name | Description | Confidence |
|---|---|---|---|
| auto_description | Project Description | Deriva-ML is a python library to simplify the process of creating and executing reproducible machine learning workflows using a deriva catalog. | 80% |
| design_pattern | Singleton | Found get_instance/instance patterns | 70% |
| arch_layer | business_logic | Detected business_logic layer | 70% |
| arch_layer | testing | Detected testing layer | 70% |
| auto_category | Testing | testing | 70% |
| design_pattern | Factory | Found factory/create_ naming patterns | 60% |
| design_pattern | Strategy | Found strategy/policy-named files | 60% |
| business_logic | Testing | Detected from 52 related files | 50% |
| business_logic | File Management | Detected from 15 related files | 50% |
| business_logic | Database | Detected from 33 related files | 50% |
| business_logic | Configuration | Detected from 15 related files | 50% |
| business_logic | Logging | Detected from 15 related files | 50% |
BinComp Dependency Hardening
13 of this repo's dependencies have been scanned for binary hardening. The grade reflects RELRO, stack canary, FORTIFY, and PIE coverage.
| Grade | Package | Version | Gadgets | Risk |
|---|---|---|---|---|
| N | semver | 3.0.4 | 0 | 5565.0 |
| N | requests | 2.33.1 | 0 | 3687.0 |
| N | ipython | 9.12.0 | 0 | 738.0 |
| N | nbconvert | 7.17.1 | 0 | 631.7 |
| N | deepdiff | 9.0.0 | 0 | 106.9 |
| N | asyncio | 4.0.0 | 0 | 0.0 |
| N | ipykernel | 7.2.0 | 0 | 0.0 |
| N | nbformat | 5.10.4 | 0 | 0.0 |
| F | numpy | 2.4.4 | 6,596 | 0.0 |
| F | pandas | 3.0.2 | 6,381 | 0.0 |
| N | pydantic | 2.12.5 | 0 | 0.0 |
| F | regex | 2026.4.4 | 216 | 0.0 |
| F | sqlalchemy | 2.0.49 | 376 | 0.0 |