Vlm

D 52 completed
Api
containerized / python · tiny
Files: 37
LOC: 2,810
Frameworks: 1
Languages: 5

Pipeline State

completed
Run ID
#370846
Phase
done
Progress
1%
Started
Finished
2026-04-13 01:31:02
LLM tokens
0

Pipeline Metadata

Stage
Skipped
Decision
skip_scaffold_dup
Novelty
38.52
Framework unique
Isolation
Last stage change
2026-04-16 18:15:42
Deduplication group #47702
Member of a group with 1 similar repo; canonical #29960
Top concepts (2)
Project Description, Web Backend
Repobility · MCP-ready · https://repobility.com

AI Prompt

Create an end-to-end Vision-Language Model pipeline for understanding temporal video data related to warehouse packaging operations. I need this built using FastAPI for the API, and it should handle video clip prediction via a POST endpoint. The system needs components for data loading, specifically using the OpenPack dataset, and must include scripts for both fine-tuning using QLoRA and evaluating the model using metrics like OCA, tIoU, and AA@1. Please structure the deployment using `docker-compose.yml` and provide the necessary Python scripts for the data pipeline and evaluation.
python fastapi vlm video-analysis machine-learning docker openpack nlp computer-vision
Generated by gemma4:latest
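
Of the evaluation metrics the prompt names, tIoU (temporal intersection over union) has a standard definition that is easy to state in code. The following is a minimal sketch, not the repository's evaluation script: the function name `temporal_iou` and the `(start, end)` tuple representation are assumptions made for illustration. OCA and AA@1 are not sketched because the page does not define them.

```python
def temporal_iou(pred, truth):
    """Temporal IoU of two (start, end) intervals, e.g. in seconds.

    Hypothetical helper: the interval representation is an assumption,
    not taken from the catalogued repository.
    """
    pred_start, pred_end = pred
    true_start, true_end = truth
    # Overlap length is zero when the intervals do not intersect.
    intersection = max(0.0, min(pred_end, true_end) - max(pred_start, true_start))
    # Union = sum of both lengths minus the part counted twice.
    union = (pred_end - pred_start) + (true_end - true_start) - intersection
    return intersection / union if union > 0 else 0.0
```

A predicted segment of `(0, 4)` against a ground-truth segment of `(2, 6)` overlaps for 2 units out of a union of 6, giving a tIoU of one third.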

Catalog Information

A vision‑language API that classifies and predicts warehouse packaging operations from video clips.

Description

The service exposes a FastAPI endpoint that accepts short video clips of warehouse packaging operations and returns structured predictions about the operation type, its temporal boundaries, and the anticipated next step. It leverages a fine‑tuned Qwen2.5‑VL‑2B model, trained with QLoRA on the OpenPack dataset, to understand both visual content and textual labels. The pipeline includes motion‑adaptive frame sampling to capture key moments around operation transitions, improving temporal precision. Target users are logistics engineers and warehouse automation teams seeking real‑time analytics and predictive insights. The system addresses the need for accurate, low‑latency operation recognition in industrial video streams, reducing manual monitoring effort.
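
The page does not say how the motion-adaptive frame sampling is implemented. One plausible reading, sketched below under stated assumptions, scores each frame by its mean absolute pixel difference from the previous frame, then combines a few evenly spaced frames with the highest-motion frames. The function names, the half-uniform split, and the representation of a frame as a flat list of pixel intensities are all hypothetical choices for illustration.

```python
def frame_motion_scores(frames):
    """Mean absolute pixel difference between each frame and its predecessor.

    Assumes each frame is a flat, equal-length list of pixel intensities.
    The first frame has no predecessor and gets a score of 0.0.
    """
    scores = [0.0]
    for prev, curr in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, curr))
        scores.append(diff / len(curr))
    return scores


def sample_frames(frames, num_samples):
    """Return sorted frame indices: some evenly spaced, the rest high-motion.

    Hypothetical policy: half of the budget (rounded down) goes to uniform
    coverage, and the remainder is filled with the highest-motion frames.
    """
    n = len(frames)
    if num_samples >= n:
        return list(range(n))
    num_uniform = num_samples // 2
    chosen = set()
    if num_uniform:
        step = n / num_uniform
        chosen = {int(i * step) for i in range(num_uniform)}
    scores = frame_motion_scores(frames)
    # Visit frames from highest to lowest motion until the budget is filled.
    for idx in sorted(range(n), key=lambda i: scores[i], reverse=True):
        if len(chosen) >= num_samples:
            break
        chosen.add(idx)
    return sorted(chosen)
```

With this policy, a clip that is static except for one abrupt change will always include the frame where the change happens, which is the behaviour the description attributes to sampling "around operation transitions".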

Description (translated from Arabic)

This system provides a FastAPI endpoint that accepts short video clips showing warehouse packaging operations and returns structured predictions about the operation type, its temporal boundaries, and the expected next step. It relies on a Qwen2.5‑VL‑2B model fine-tuned with QLoRA on the OpenPack dataset to understand visual content and textual labels together. The pipeline includes motion-based frame selection to highlight key moments around operation transitions, which improves temporal precision. It targets logistics engineers and warehouse automation teams who need real-time analytics and predictive insights. The system addresses the need for accurate, low-latency operation recognition in industrial video streams, reducing manual monitoring effort.

Novelty

7/10

Tags

vision-language temporal-video-understanding logistics-automation operation-classification motion-sampling fine-tuning warehouse-packaging

Technologies

fastapi huggingface numpy pandas pytorch scikit-learn scipy uvicorn

Claude Models

claude-opus-4.6

Quality Score

Overall: D (52.4/100)
Structure: 43
Code Quality: 65
Documentation: 51
Testing: 0
Practices: 77
Security: 84
Dependencies: 60

Strengths

  • Consistent naming conventions (snake_case)
  • Good security practices: no major issues detected
  • Containerized deployment (Docker)

Weaknesses

  • No LICENSE file: legal ambiguity for contributors
  • No tests found: high risk of regressions
  • No CI/CD configuration: manual testing and deployment
  • 165 duplicate lines detected: consider DRY refactoring
  • 1 "god file" with >500 LOC needs decomposition

Recommendations

  • Add a test suite: start with critical path integration tests
  • Set up CI/CD (GitHub Actions recommended) to automate testing and deployment
  • Add a linter configuration to enforce code style consistency
  • Add a LICENSE file (MIT recommended for open source)

Security & Health

Tech Debt (C): 4.1h
OWASP (100%): A
Quality Gate: PASS
Risk (4): A
License: Unknown
Duplication: 7.4%
Provenance: Repobility (https://repobility.com); every score reproducible from /scan/

Languages

python: 67.3%
json: 22.9%
markdown: 7.6%
yaml: 1.3%
text: 0.9%

Frameworks

FastAPI

Concepts (2)

Category | Name | Description | Confidence
auto_description | Project Description | End-to-end Vision-Language Model pipeline for temporal video understanding in warehouse packaging operations, built on Qwen2.5-VL-2B with QLoRA fine-tuning on the OpenPack dataset. | 80%
auto_category | Web Backend | web-backend | 70%

Methodology: Repobility · https://repobility.com/research/state-of-ai-code-2026/

Quality Timeline

1 quality score recorded.

Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/95053.svg)