Youtube Content Pipeline

C 60 completed
API
unknown / python · small
Files: 95
LOC: 18,775
Frameworks: 2
Languages: 7

Pipeline State

completed
Run ID: #362249
Phase: done
Progress: 1%
Started:
Finished: 2026-04-13 01:31:02
LLM tokens: 0

Pipeline Metadata

Stage
Skipped
Decision
skip_scaffold_dup
Novelty
53.67
Framework unique
Isolation
Last stage change
2026-04-16 18:15:42
Deduplication group #48867
Member of a group with 1 similar repo(s); canonical #68026
Top concepts (2)
Project Description, Web Backend
Open data scored by Repobility · https://repobility.com

AI Prompt

Create a production-grade web API using FastAPI for YouTube content processing. I need it to handle video transcription by first getting the transcript and then saving the results to MongoDB. The system should support automatic cookie management from Chrome and use Whisper for transcription, with a fallback mechanism. Key features to include are REST API endpoints, API key authentication, rate limiting using Redis, and comprehensive Prometheus metrics for monitoring. Additionally, I need functionality for channel tracking and integration with an MCP server.
python fastapi mongodb whisper rest-api prometheus youtube api-key redis ai-integration
Generated by gemma4:latest

Catalog Information

A web API that extracts YouTube video content, transcribes it, and stores the results in MongoDB for easy retrieval.

Description

The project provides a RESTful API that accepts YouTube video URLs, downloads the audio and video streams, and runs automatic speech recognition to produce a transcript. It stores the original video metadata, the transcript, and related timestamps in a MongoDB collection for quick querying. The service is built with a lightweight web framework, enabling fast deployment and horizontal scaling. Target users include content creators, media analysts, and data engineers who need structured text from video content. It solves the problem of manual transcription and data ingestion by automating the entire pipeline.
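The flow described above (accept a URL, obtain a transcript, store the result) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `extract_video_id`, `build_record`, `fetch_captions`, and `run_asr` are hypothetical names, the captions-first order with a speech-recognition fallback is taken from the AI prompt, and the returned dictionary only stands in for a MongoDB document.

```python
from datetime import datetime, timezone
from typing import Callable, Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        parts = parsed.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] in ("embed", "shorts"):
            return parts[1]
    return None


def build_record(
    url: str,
    fetch_captions: Callable[[str], Optional[str]],  # hypothetical caption fetcher
    run_asr: Callable[[str], str],                   # hypothetical ASR step (e.g. Whisper)
) -> dict:
    """Try existing captions first, fall back to ASR, return a storable document."""
    video_id = extract_video_id(url)
    if video_id is None:
        raise ValueError(f"not a recognised YouTube URL: {url!r}")

    source = "captions"
    try:
        text = fetch_captions(video_id)
    except Exception:
        # Any caption failure triggers the fallback in this sketch.
        text = None
    if not text:
        source = "asr"
        text = run_asr(video_id)

    # Shape of the document is an assumption; a real service would insert it
    # into a MongoDB collection instead of returning it.
    return {
        "video_id": video_id,
        "url": url,
        "transcript": text,
        "source": source,
        "created_at": datetime.now(timezone.utc),
    }
```

Passing the two fetchers in as callables keeps the sketch free of network and model dependencies, so the fallback logic can be exercised with simple stubs.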

Description (translated from Arabic)

The project provides a RESTful API that accepts YouTube video URLs, downloads the audio and video streams, and then runs automatic speech recognition to produce a transcript. The original video metadata, the transcript, and the timestamp of each segment are stored in a MongoDB collection for quick querying. The system is built on a lightweight framework that allows fast deployment and horizontal scaling. Target users include content creators, media analysts, and data engineers who need structured text from video content. The project solves the problem of manual transcription and data entry by automating the entire processing pipeline. It is distinguished by its ability to handle long videos and to deliver accurate results with flexible storage. It also lets users search quickly across the transcripts to support analysis and research.

Novelty

6/10

Tags

video-extraction transcription content-ingestion data-storage searchable-text automation media-analytics

Technologies

fastapi huggingface numpy pydantic pytorch rich typer uvicorn

Claude Models

claude-opus-4.6 claude-opus-4.5

Quality Score

C (60.0/100)
Structure: 68
Code Quality: 54
Documentation: 64
Testing: 50
Practices: 57
Security: 72
Dependencies: 60

Strengths

  • Code linting configured (possibly ruff)
  • Consistent naming conventions (snake_case)

Weaknesses

  • No LICENSE file: legal ambiguity for contributors
  • No CI/CD configuration: manual testing and deployment
  • Potential hardcoded secrets in 2 files
  • 2177 duplicate lines detected: consider DRY refactoring
  • 4 'god files' with >500 LOC need decomposition

Recommendations

  • Set up CI/CD (GitHub Actions recommended) to automate testing and deployment
  • Add a LICENSE file (MIT recommended for open source)
  • Move hardcoded secrets to environment variables or a secrets manager

Security & Health

Tech Debt (A): 6.6h
OWASP (100%): A
Quality Gate: PASS
Risk (1): A
License: Unknown
Duplication: 9.7%

Languages

python: 70.2%
markdown: 21.4%
shell: 7.6%
toml: 0.4%
yaml: 0.3%
json: 0.1%
text: 0.0%

Frameworks

FastAPI pytest

Concepts (2)

Category | Name | Description | Confidence
auto_description | Project Description | A production-grade API for YouTube video transcription and transcript management. Features automatic cookie management, Whisper fallback, REST API with authentication, rate limiting, Prometheus metrics, MCP integration, and channel tracking. | 80%
auto_category | Web Backend | web-backend | 70%
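
Two of the features listed above, API key authentication and rate limiting, can be illustrated with a small standard-library sketch. Everything here is an assumption for illustration: `is_valid_key` and `FixedWindowLimiter` are hypothetical names, the fixed-window strategy is one common choice rather than the project's confirmed method, and an in-memory dictionary stands in for Redis (where the same idea is typically built on the `INCR` and `EXPIRE` commands).

```python
import hmac
import time
from collections import defaultdict
from typing import Callable, Iterable


def is_valid_key(presented: str, expected_keys: Iterable[str]) -> bool:
    """Compare the presented key against each known key in constant time."""
    return any(
        hmac.compare_digest(presented.encode(), known.encode())
        for known in expected_keys
    )


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each `window_seconds` window."""

    def __init__(
        self,
        limit: int,
        window_seconds: int,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        # (api_key, window index) -> number of hits; stands in for Redis counters.
        # Old windows are never evicted in this sketch.
        self.counts = defaultdict(int)

    def allow(self, api_key: str) -> bool:
        bucket = (api_key, int(self.clock() // self.window))
        self.counts[bucket] += 1
        return self.counts[bucket] <= self.limit
```

A request handler would first call `is_valid_key` and then `allow`, rejecting the request (for example with HTTP 401 or 429 respectively) when either returns `False`.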

Quality Timeline

1 quality score recorded.

Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/86409.svg)