RAG Provider

C+ 73 completed
AI/ML
containerized / python · small
343 Files · 68,970 LOC · 2 Frameworks · 7 Languages

Pipeline State

completed
Run ID: #306034
Phase: done
Progress: 1%
Started: (not shown)
Finished: 2026-04-13 01:31:02
LLM tokens: 0

Pipeline Metadata

Stage: Cataloged
Decision: proceed
Novelty: 67.47
Framework unique
Isolation
Last stage change: 2026-05-10 03:35:34
Deduplication group #49699
Member of a group with 14 similar repo(s); canonical: #72385
Top concepts (12)
Project Description, Repository, Web Backend, business_logic, infrastructure, testing, presentation, Singleton, Layered Architecture, api, data_access, Factory

AI Prompt

Create a production-ready Retrieval-Augmented Generation (RAG) provider system using Python. I need it to be able to extract and link entities from various documents, verified against a test corpus. The system should support multiple LLM providers via LiteLLM, enforce type-safe outputs using Instructor, and handle document processing from formats like PDF, Office, and text. Key features include comprehensive entity linking (people, orgs, tech, places), auto-generating WikiLinks, and creating structured knowledge graph files, ideally compatible with Obsidian. Please structure the deployment using Docker Compose.
python rag fastapi llm entity-linking docker ai production
Generated by gemma4:latest
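The prompt asks for type-safe LLM outputs via Instructor. As a rough illustration of the underlying idea (validate the model's reply against a schema before using it), here is a minimal standard-library sketch. It does not use Instructor or LiteLLM; the `Entity` class, `parse_entities`, and the four type labels are hypothetical names chosen to mirror the entity categories listed in the prompt.

```python
import json
from dataclasses import dataclass

# Assumption: labels chosen to mirror the prompt's "people, orgs, tech, places".
ALLOWED_TYPES = {"person", "organization", "technology", "place"}


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


def parse_entities(raw_json: str) -> list[Entity]:
    """Parse an LLM reply and reject anything that does not match the schema."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of entities")
    entities = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError(f"entity must be an object, got {item!r}")
        name, etype = item.get("name"), item.get("type")
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"entity is missing a name: {item!r}")
        if not isinstance(etype, str) or etype not in ALLOWED_TYPES:
            raise ValueError(f"unknown entity type: {etype!r}")
        entities.append(Entity(name=name.strip(), type=etype))
    return entities
```

A reply that fails validation raises `ValueError`, so the caller can retry or log the failure instead of passing malformed data downstream.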

Catalog Information

The RAG Provider is a production-ready system for extracting and linking entities from documents, verified through comprehensive testing on 100 real documents.

Description

This project provides a RAG (Retrieval-Augmented Generation) system that extracts and links entities from various document formats. It features LiteLLM integration with support for over 100 providers, Instructor for type-safe outputs, modular routes, and a RAGService orchestrator. The system has been tested on 100 real documents, with a reported 100% success rate in ingesting documents, extracting chunks, and creating auto-links.
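To make the auto-linking step concrete, here is a minimal sketch of how entity names could be turned into Obsidian-style WikiLinks. It is not the project's implementation; `add_wikilinks` is a hypothetical helper, and it assumes the entity names have already been extracted.

```python
import re


def add_wikilinks(text: str, entities: list[str]) -> str:
    """Wrap each whole-word occurrence of an entity name in [[...]]."""
    # Longest names first so "Ada Lovelace" wins over "Ada" at the same position.
    names = sorted({n for n in entities if n}, key=len, reverse=True)
    if not names:
        return text
    alternatives = "|".join(re.escape(n) for n in names)
    # Lookarounds skip partial words and names already adjacent to brackets.
    pattern = re.compile(r"(?<![\w\[])(" + alternatives + r")(?![\w\]])")
    return pattern.sub(r"[[\1]]", text)


print(add_wikilinks("Ada Lovelace met Ada in London.", ["Ada", "Ada Lovelace", "London"]))
```

Sorting by length is the main design choice here: regex alternation takes the first alternative that matches, so shorter names must come last to avoid splitting a longer entity.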

Description (translated from Arabic)

This project provides an advanced RAG (Retrieval-Augmented Generation) system for recognizing and identifying entities in various documents. It includes LiteLLM integration with support for more than 100 providers, Instructor for type-safe outputs, custom routes, and a RAG Service orchestrator. The system was comprehensively tested on 100 real documents, achieving a 100% success rate in ingesting documents, extracting chunks, and creating automatic links.

Novelty

9/10

Tags

entity-extraction document-processing link-discovery knowledge-graph natural-language-processing machine-learning

Technologies

anthropic beautifulsoup chromadb click fastapi huggingface matplotlib nginx numpy openai pandas pydantic pytorch rich scikit-learn scipy typer uvicorn

Claude Models

claude (unknown version)

Quality Score

C+ · 72.9/100
Structure: 74
Code Quality: 73
Documentation: 83
Testing: 85
Practices: 54
Security: 65
Dependencies: 90

Strengths

  • CI/CD pipeline configured (github_actions)
  • Good test coverage (93% test-to-source ratio)
  • Consistent naming conventions (snake_case)
  • Containerized deployment (Docker)

Weaknesses

  • No LICENSE file: legal ambiguity for contributors
  • 7 bare except/catch blocks swallowing errors
  • Potential hardcoded secrets in 1 file
  • 1611 duplicate lines detected; consider DRY refactoring
  • 6 'god files' with >500 LOC need decomposition

Recommendations

  • Add a linter configuration to enforce code style consistency
  • Add a LICENSE file (MIT recommended for open source)
  • Replace bare except/catch blocks with specific exception types
  • Move hardcoded secrets to environment variables or a secrets manager
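
The last two recommendations can be sketched as follows. This is an illustrative example only, not code from the repository: `load_metadata`, `get_api_key`, and the `LLM_API_KEY` variable name are hypothetical.

```python
import json
import logging
import os

logger = logging.getLogger(__name__)


def load_metadata(path: str) -> dict:
    """Return parsed JSON metadata, or {} when the file is absent or malformed."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        # Expected case: log it and fall through to the empty default.
        logger.warning("metadata file not found: %s", path)
    except json.JSONDecodeError as exc:
        # Malformed content is reported instead of being silently swallowed.
        logger.error("invalid JSON in %s: %s", path, exc)
    return {}


def get_api_key(var_name: str = "LLM_API_KEY") -> str:
    """Read a secret from the environment and fail loudly if it is unset."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key
```

Catching only `FileNotFoundError` and `json.JSONDecodeError` means unexpected failures (for example a permissions error) still propagate, which a bare `except:` would hide.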

Security & Health

Tech Debt (A): 11.6h
DORA Rating: Medium
OWASP (100%): A
Quality Gate: PASS
Risk (0): A
License: Unknown
Duplication: 2.5%

Languages

python: 58.7%
markdown: 30.0%
text: 6.6%
yaml: 2.3%
shell: 1.9%
json: 0.5%
ini: 0.0%

Frameworks

FastAPI pytest

Symbols

variable: 623
method: 526
class: 152
function: 127
constant: 124

API Endpoints (51)

Method | Path | Handler | Framework
POST | (empty) | ingest_document | FastAPI
GET | / | web_interface | FastAPI/Flask
POST | /admin/initialize-enhanced | initialize_enhanced_search | FastAPI/Flask
POST | /batch | ingest_batch_files | FastAPI
POST | /chat | chat_with_rag | FastAPI
POST | /chat/enhanced | enhanced_chat_endpoint | FastAPI/Flask
POST | /cleanup-corrupted | cleanup_corrupted_documents | FastAPI
POST | /cleanup-duplicates | cleanup_duplicates | FastAPI
GET | /cost-stats | get_cost_stats | FastAPI
GET | /cost/stats | get_cost_stats | FastAPI
GET | /daily-note/{date} | get_daily_note | FastAPI
GET | /documents | list_documents | FastAPI
GET | /documents | list_documents_admin | FastAPI
GET | /documents | list_documents | FastAPI
DELETE | /documents/{doc_id} | delete_document | FastAPI
GET | /documents/{doc_id} | get_document | FastAPI
POST | /enrich-entities | enrich_entities | FastAPI
GET | /entities/{entity_name}/timeline | get_entity_timeline | FastAPI
GET | /evaluation/compare | compare_evaluation_runs | FastAPI
GET | /evaluation/gold-queries | list_gold_queries | FastAPI
POST | /evaluation/gold-queries | add_gold_query | FastAPI
GET | /evaluation/history | get_evaluation_history | FastAPI
GET | /evaluation/report/{run_id} | get_evaluation_report | FastAPI
POST | /evaluation/run | run_evaluation | FastAPI
GET | /evaluation/status | get_evaluation_status | FastAPI
POST | /evaluation/upload-gold-set | upload_gold_query_set | FastAPI
POST | /file | ingest_file | FastAPI
POST | /generate-monthly-note | generate_monthly_note | FastAPI
POST | /generate-weekly-note | generate_weekly_note | FastAPI
GET | /health | health_check | FastAPI
GET | /models | list_available_models | FastAPI
GET | /monitoring/alerts | list_alerts | FastAPI
GET | /monitoring/dashboard | get_dashboard_data | FastAPI
GET | /monitoring/drift | detect_drift | FastAPI
GET | /monitoring/health | monitoring_health | FastAPI
POST | /monitoring/report | generate_drift_report | FastAPI
POST | /monitoring/schedule-snapshot | schedule_snapshot | FastAPI
POST | /monitoring/snapshot | capture_snapshot | FastAPI
GET | /monitoring/snapshots | list_snapshots | FastAPI
POST | /reset-collection | reset_collection | FastAPI
POST | /search | search_documents | FastAPI
GET | /search/config | get_enhanced_search_config | FastAPI/Flask
POST | /search/enhanced | enhanced_search_endpoint | FastAPI/Flask
GET | /stats | get_stats | FastAPI
POST | /test-llm | test_llm_provider | FastAPI
POST | /threads/create | create_threads_from_files | FastAPI
GET | /threads/example | get_example_thread | FastAPI
POST | /threads/process-mailbox | process_mailbox | FastAPI
POST | /threads/statistics | get_thread_statistics | FastAPI
GET | /threads/{thread_id} | get_thread_messages | FastAPI

Showing 50 of 51
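
As a small usage sketch for the endpoints listed above, the helper below builds (but does not send) HTTP requests for paths such as `/health` or `/documents/{doc_id}`. The base URL and port are assumptions, since the listing does not state where the service runs, and `build_request` is a hypothetical helper. Request bodies are left out because the listing does not show any request schemas.

```python
import urllib.parse
import urllib.request

# Assumption: host and port are placeholders; match them to your deployment.
BASE_URL = "http://localhost:8000"


def build_request(
    method: str, path: str, base_url: str = BASE_URL, **path_params: str
) -> urllib.request.Request:
    """Fill {placeholders} in a listed path and return an unsent Request."""
    # Percent-encode every parameter so slashes or spaces cannot alter the path.
    quoted = {k: urllib.parse.quote(str(v), safe="") for k, v in path_params.items()}
    url = base_url.rstrip("/") + path.format(**quoted)
    return urllib.request.Request(url, method=method.upper())


health = build_request("GET", "/health")
doc = build_request("GET", "/documents/{doc_id}", doc_id="a b/1")
print(health.get_method(), health.full_url)
print(doc.get_method(), doc.full_url)
```

Passing a built request to `urllib.request.urlopen` would actually send it; keeping construction separate makes the URL handling easy to check without a running server.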

Concepts (25)

Category | Name | Description | Confidence
design_pattern | Repository | Found repository-named files | 80%
auto_description | Project Description | !Tests !Nightly Tests | 80%
design_pattern | Singleton | Found get_instance/instance patterns | 70%
arch_pattern | Layered Architecture | Found API/routes, service, and data layers | 70%
arch_layer | presentation | Detected presentation layer | 70%
arch_layer | api | Detected api layer | 70%
arch_layer | business_logic | Detected business_logic layer | 70%
arch_layer | data_access | Detected data_access layer | 70%
arch_layer | infrastructure | Detected infrastructure layer | 70%
arch_layer | testing | Detected testing layer | 70%
auto_category | Web Backend | web-backend | 70%
design_pattern | Strategy | Found strategy/policy-named files | 60%
design_pattern | Factory | Found factory/create_ naming patterns | 60%
business_logic | Notifications | Detected from 12 related files | 50%
business_logic | Payment Processing | Detected from 2 related files | 50%
business_logic | Search | Detected from 12 related files | 50%
business_logic | Testing | Detected from 116 related files | 50%
arch_pattern | Containerized/Microservices | Multiple Dockerfiles found at package level | 50%
business_logic | Analytics | Detected from 2 related files | 50%
business_logic | Authentication | Detected from 20 related files | 50%
business_logic | Caching | Detected from 2 related files | 50%
business_logic | Configuration | Detected from 7 related files | 50%
business_logic | File Management | Detected from 5 related files | 50%
business_logic | Database | Detected from 25 related files | 50%
business_logic | Logging | Detected from 13 related files | 50%

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/29869.svg)