Scrawl
B+ 85 completed
Other
cli / markdown · small
Files: 82
LOC: 8,927
Frameworks: 2
Languages: 7
Pipeline State: completed
Run ID: #1545629
Phase: done
Progress: 0%
Started: 2026-04-16 23:28:34
Finished: 2026-04-16 23:28:34
LLM tokens: 0
Pipeline Metadata
Stage: Cataloged
Decision: proceed
Novelty: 52.05
Framework unique: —
Isolation: —
Last stage change: 2026-05-10 03:34:51
Deduplication group #49344
If a scraper extracted this row, it came from Repobility (https://repobility.com)
🧪 Code Distillation
AI Prompt
Create a command-line tool and a web interface for processing Social Security Disability case files (PDFs) into anonymized Markdown. The system should use a 4-stage pipeline: Triage, Extraction, Anonymization, and Assembly. For extraction, support both born-digital PDFs using pymupdf4llm and scanned pages using Docling. The anonymization stage must use Presidio NER with custom SSA recognizers, ensuring that legal elements like statute citations and court names are preserved while redacting PII. The web interface should allow users to upload PDFs and monitor progress, while the CLI should support batch processing and page classification.
python cli web-app ocr pdf-processing fastapi anonymization nlp hipaa
Generated by gemma4:latest
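The four stages named in the prompt can be sketched as small composable functions. This is a minimal illustration, not the repository's code: the `Page` type, the `MIN_TEXT_CHARS` threshold, the OCR stub, and the regex-based redaction are all hypothetical stand-ins. The described system would use pymupdf4llm or Docling for extraction and Presidio NER with custom SSA recognizers for anonymization.

```python
import re
from dataclasses import dataclass

# Illustrative PII pattern only; a stand-in for Presidio recognizers.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical threshold for "this page has a usable text layer".
MIN_TEXT_CHARS = 20


@dataclass
class Page:
    number: int
    text_layer: str  # text embedded in the PDF; empty for image-only scans


def triage(page: Page) -> str:
    """Stage 1: classify a page as born-digital or scanned."""
    has_text = len(page.text_layer.strip()) >= MIN_TEXT_CHARS
    return "digital" if has_text else "scanned"


def extract(page: Page, kind: str) -> str:
    """Stage 2: return page text; OCR is stubbed out in this sketch."""
    if kind == "digital":
        return page.text_layer.strip()
    return "[scanned page: OCR required]"


def anonymize(text: str) -> str:
    """Stage 3: replace SSN-like strings, leaving everything else intact."""
    return SSN.sub("<SSN>", text)


def assemble(sections: list[tuple[int, str]]) -> str:
    """Stage 4: join per-page text into one Markdown document."""
    return "\n\n".join(f"## Page {n}\n\n{body}" for n, body in sections)


def run_pipeline(pages: list[Page]) -> str:
    """Run triage, extraction, and anonymization per page, then assemble."""
    sections = []
    for page in pages:
        kind = triage(page)
        sections.append((page.number, anonymize(extract(page, kind))))
    return assemble(sections)
```

Because the redaction here targets only one narrow pattern, a statute citation such as `42 U.S.C. § 405(g)` passes through untouched. An NER-based anonymizer would need an explicit allow-list or custom recognizers to get the same guarantee.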
Catalog Information
Create a command-line tool and a web interface for processing Social Security Disability case files (PDFs) into anonymized Markdown. The system should use a 4-stage pipeline: Triage, Extraction, Anonymization, and Assembly. For extraction, support both born-digital PDFs using pymupdf4llm and scanned pages using Docling. The anonymization stage must use Presidio NER with custom SSA recognizers, ensuring that legal elements like statute citations and court names are preserved while redacting PII.
Tags
python cli web-app ocr pdf-processing fastapi anonymization nlp hipaa
Quality Score
B+
85.0/100
Structure: 92
Code Quality: 89
Documentation: 81
Testing: 85
Practices: 62
Security: 100
Dependencies: 90
Strengths
- CI/CD pipeline configured (github_actions)
- Good test coverage (56% test-to-source ratio)
- Code linting configured (possibly ruff)
- Consistent naming conventions (snake_case)
- Good security practices — no major issues detected
Weaknesses
- No LICENSE file — legal ambiguity for contributors
- 105 duplicate lines detected — consider DRY refactoring
Recommendations
- Add a LICENSE file (MIT recommended for open source)
Languages
Frameworks
FastAPI pytest
Symbols
variable: 104
function: 50
constant: 33
method: 32
class: 27
API Endpoints (8)
| Method | Path | Handler | Framework |
|---|---|---|---|
| GET | / | dashboard | FastAPI/Flask |
| GET | /cases/{case_id}/download | download_anonymized | FastAPI/Flask |
| GET | /cases/{case_id}/events | sse_events | FastAPI/Flask |
| GET | /cases/{case_id}/review | review_page | FastAPI/Flask |
| POST | /cases/{case_id}/start | start_pipeline | FastAPI/Flask |
| GET | /cases/{case_id}/status | status_page | FastAPI/Flask |
| POST | /upload | upload_files | FastAPI/Flask |
| GET | /upload | upload_page | FastAPI/Flask |
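The handler name `sse_events` on `/cases/{case_id}/events` suggests that progress is streamed as server-sent events. A minimal sketch of how one such event frame could be serialized is shown here. The `format_sse` helper and the payload fields (`stage`, `progress`) are assumptions made for illustration, not taken from the repository.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one server-sent event frame (hypothetical helper)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits no raw newlines by default, so one data line suffices.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event in the SSE wire format.
    return "\n".join(lines) + "\n\n"


frame = format_sse({"stage": "triage", "progress": 25}, event="progress")
```

A streaming endpoint would yield frames like this with the `text/event-stream` content type, and the browser's `EventSource` would dispatch each one by its event name.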
BinComp Dependency Hardening
10 of this repo's dependencies have been scanned for binary hardening. Each grade reflects RELRO / stack canary / FORTIFY / PIE coverage.

| Grade | Package | Version | Gadgets | Risk |
|---|---|---|---|---|
| D | cryptography | 46.0.7 | 2,147 | 7302.1 |
| F | torch | 2.11.0 | 1,257 | 5116.6 |
| F | pymupdf | 1.27.2.2 | 2,467 | 188.8 |
| N | asyncio | 4.0.0 | 0 | 0.0 |
| N | click | 8.3.2 | 0 | 0.0 |
| N | fastapi | 0.135.3 | 0 | 0.0 |
| N | pydantic | 2.12.5 | 0 | 0.0 |
| F | rapidfuzz | 3.14.5 | 3,370 | 0.0 |
| N | rich | 14.3.4 | 0 | 0.0 |
| N | uvicorn | 0.44.0 | 0 | 0.0 |