Youtube Transcriber

B 81 completed
Web App
library / python · small
Files: 88
LOC: 5,878
Frameworks: 3
Languages: 9

Pipeline State

completed
Run ID: #352005
Phase: done
Progress: 1%
Started:
Finished: 2026-04-13 01:31:02
LLM tokens: 0

Pipeline Metadata

Stage: Cataloged
Decision: proceed
Novelty: 69.60
Framework unique
Isolation
Last stage change: 2026-05-10 03:35:17
Deduplication group #54150
Member of a group with 2 similar repo(s); canonical #75018
Top concepts (2)
Project Description, Web Backend

AI Prompt

Build me a web application using Python and FastAPI that acts as a YouTube transcriber. I need it to allow users to submit either a single YouTube video URL or a channel URL for batch processing. The system must transcribe the audio, generate a summary for the content, and provide a dedicated interface for running semantic search across the transcript chunks. Please ensure the architecture supports viewing a job queue and status updates, and include necessary setup instructions using Docker Compose.
python fastapi youtube transcription summarization vector-search web-app docker
Generated by gemma4:latest
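
The prompt asks for one submission path that accepts either a single video URL or a channel URL. A minimal sketch of how such input might be told apart is shown below. The function name `classify_youtube_url` and the returned labels are hypothetical and are not taken from the repository; only the common public YouTube URL shapes are handled.

```python
from urllib.parse import urlparse, parse_qs


def classify_youtube_url(url: str) -> str:
    """Return 'video', 'channel', or 'unknown' for a submitted URL.

    Hypothetical helper for illustration; not the repository's code.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Normalize the common "www." and mobile "m." host prefixes.
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    path = parsed.path.rstrip("/")

    # Short links carry the video ID as the whole path.
    if host == "youtu.be":
        return "video" if path.strip("/") else "unknown"
    if host != "youtube.com":
        return "unknown"

    # Standard watch pages carry the video ID in the "v" query parameter.
    if path == "/watch" and "v" in parse_qs(parsed.query):
        return "video"
    if path.startswith("/shorts/"):
        return "video"
    # Channel pages: /channel/<id>, /c/<name>, /user/<name>, /@<handle>.
    if path.startswith(("/channel/", "/c/", "/user/", "/@")):
        return "channel"
    return "unknown"
```

A "channel" result would route the request to batch processing, while "video" would enqueue a single job; anything else can be rejected before any download is attempted.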

Catalog Information

A service that downloads YouTube audio, transcribes it, summarizes the content, and enables vector-based search of the transcripts.

Description

The application retrieves audio from YouTube videos and processes it through a fast, high‑accuracy transcription engine. The resulting text is then summarized using a large language model to produce concise, readable overviews. All transcripts and summaries are stored in a PostgreSQL database with vector embeddings for semantic search. Users can query the system via a RESTful API to retrieve full transcripts, summaries, or search results based on keyword or semantic similarity. The architecture leverages asynchronous task queues to handle large volumes of requests efficiently.
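
The semantic search step described above can be illustrated with a small in-memory sketch. It assumes each transcript chunk already has an embedding vector and ranks chunks by cosine similarity to a query vector. The `Chunk` class, the `search` function, and the toy two-dimensional vectors are hypothetical; the catalog entry says the embeddings are stored in PostgreSQL, so the real project presumably ranks inside the database rather than in Python.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """A transcript chunk with a precomputed embedding (hypothetical shape)."""
    video_id: str
    text: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding: list[float], chunks: list[Chunk], top_k: int = 3):
    """Return the top_k (score, chunk) pairs, most similar first."""
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: real embeddings would have hundreds of dimensions.
chunks = [
    Chunk("vid1", "intro to the topic", [1.0, 0.0]),
    Chunk("vid1", "unrelated tangent", [0.0, 1.0]),
    Chunk("vid2", "closely related point", [0.9, 0.1]),
]
results = search([1.0, 0.0], chunks, top_k=2)
```

With the toy data, the chunk whose vector matches the query direction ranks first and the orthogonal chunk is excluded from the top two.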

Description (translated from Arabic)

The application extracts audio from YouTube videos and processes it through a fast, accurate transcription engine. The resulting text is then summarized by a large language model to produce concise, readable summaries. All transcripts and summaries are stored in a PostgreSQL database with vector embeddings for semantic search. Users can query the system through a RESTful API to retrieve full transcripts, summaries, or search results based on keywords or semantic similarity. The infrastructure, built on asynchronous task queues, handles large volumes of requests efficiently. The solution aims to make long-form video content easier to access and its topics easier to explore. It also offers an easy-to-use interface for researchers, creators, and accessibility professionals to generate searchable transcripts and summarize content quickly.

Novelty

6/10

Tags

audio-transcription text-summarization vector-search youtube-integration content-accessibility searchable-transcripts

Technologies

alembic anthropic celery fastapi huggingface pydantic sqlalchemy uvicorn

Claude Models

claude-opus-4.6

Quality Score

B (80.9/100)
Structure: 77
Code Quality: 100
Documentation: 55
Testing: 65
Practices: 84
Security: 100
Dependencies: 60

Strengths

  • CI/CD pipeline configured (github_actions)
  • Code linting configured (ruff (possible))
  • Consistent naming conventions (snake_case)
  • Low average code complexity: well-structured code
  • Good security practices: no major issues detected
  • Containerized deployment (Docker)

Weaknesses

  • No LICENSE file: legal ambiguity for contributors
  • 162 duplicate lines detected: consider DRY refactoring

Recommendations

  • Add a LICENSE file (MIT recommended for open source)

Security & Health

Tech Debt (B): 4.6h
OWASP (100%): A
Quality Gate: PASS
Risk (2): A
License: Unknown
Duplication: 2.6%

Languages

python: 55.4%
html: 23.6%
css: 13.8%
markdown: 3.6%
yaml: 2.0%
toml: 0.7%
ini: 0.5%
text: 0.3%
sql: 0.0%

Frameworks

FastAPI pytest SQLAlchemy

Concepts (2)

Category | Name | Description | Confidence
auto_description | Project Description | A web app that lets you submit YouTube videos (or channels), transcribe audio, generate summaries, and search transcript content. | 80%
auto_category | Web Backend | web-backend | 70%

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/76114.svg)