Ingestion3
Grade C (68) · completed
Other
unknown / scala · medium
589 files · 317,309 LOC · 1 framework · 8 languages
Pipeline State: completed
Run ID: #1545677
Phase: done
Progress: 0%
Started: 2026-04-16 23:30:14
Finished: 2026-04-16 23:30:14
LLM tokens: 0
Pipeline Metadata
Stage: Cataloged
Decision: proceed
Novelty: 65.13
Framework unique: n/a
Isolation: n/a
Last stage change: 2026-05-10 03:34:57
Deduplication group #1938269: member of a group with 1 similar repo; this repo is canonical.
Open data scored by Repobility · https://repobility.com
🧪 Code Distillation
AI Prompt
Create a system for DPLA's core business data ingestion process. I need to model the workflow for harvesting, mapping, and enriching cultural heritage metadata from various partners. The system should handle validation, including specific checks like `edmRights`, and support different data formats such as XML and JSON. Please structure the process to include steps for text normalization, data provider enrichment, and generating summary reports. I'm looking for a structure that can manage data from sources such as the Internet Archive and NARA.
scala python data-ingestion metadata etl xml json workflow cultural-heritage
Generated by gemma4:latest
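The workflow the prompt describes (harvest, map, normalize, enrich, validate, report) can be sketched as a minimal Python pipeline. This is not Ingestion3's implementation, which is written in Scala. Harvesting is reduced to a list of already-parsed XML/JSON records, and the record fields, the `PROVIDER_ALIASES` lookup, and the edmRights allow-list (rightsstatements.org and Creative Commons URIs) are hypothetical assumptions for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical allow-list: assumes edmRights must be a rightsstatements.org
# or Creative Commons URI. Illustrative only.
ALLOWED_RIGHTS_PREFIXES = (
    "http://rightsstatements.org/vocab/",
    "http://creativecommons.org/",
    "https://creativecommons.org/",
)

# Hypothetical lookup table used by the data-provider enrichment step.
PROVIDER_ALIASES = {
    "nara": "National Archives and Records Administration",
    "ia": "Internet Archive",
}


@dataclass
class Record:
    record_id: str
    title: str
    data_provider: str
    edm_rights: Optional[str]
    errors: list = field(default_factory=list)


def map_record(raw: dict) -> Record:
    """Map a harvested record (already parsed from XML or JSON) into the common model."""
    return Record(
        record_id=raw["id"],
        title=raw.get("title", ""),
        data_provider=raw.get("provider", ""),
        edm_rights=raw.get("rights"),
    )


def normalize_text(rec: Record) -> Record:
    # Collapse runs of whitespace into single spaces and trim the ends.
    rec.title = re.sub(r"\s+", " ", rec.title).strip()
    return rec


def enrich_provider(rec: Record) -> Record:
    # Replace a known provider alias with its canonical name; keep unknown values as-is.
    key = rec.data_provider.strip().lower()
    rec.data_provider = PROVIDER_ALIASES.get(key, rec.data_provider)
    return rec


def validate(rec: Record) -> Record:
    # Record errors instead of raising so they can appear in the summary report.
    if not rec.edm_rights:
        rec.errors.append("missing edmRights")
    elif not rec.edm_rights.startswith(ALLOWED_RIGHTS_PREFIXES):
        rec.errors.append("invalid edmRights")
    return rec


def run(raw_records: list) -> tuple:
    """Run every record through all stages; return valid records and a summary Counter."""
    summary = Counter()
    valid = []
    for raw in raw_records:
        rec = validate(enrich_provider(normalize_text(map_record(raw))))
        summary["harvested"] += 1
        if rec.errors:
            summary.update(rec.errors)  # one count per error message
        else:
            valid.append(rec)
            summary["valid"] += 1
    return valid, summary


if __name__ == "__main__":
    sample = [
        {"id": "1", "title": "  Map of   Ohio ", "provider": "NARA",
         "rights": "http://rightsstatements.org/vocab/NoC-US/1.0/"},
        {"id": "2", "title": "Photo", "provider": "ia"},
    ]
    valid, summary = run(sample)
    print(dict(summary))
```

Each stage is a plain function, so stages can be reordered or tested in isolation. Validation failures are counted rather than raised, which keeps the run going and lets the summary report include them.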
Catalog Information
Quality Score: C (68.5/100)
Structure: 70
Code Quality: 79
Documentation: 83
Testing: 85
Practices: 36
Security: 40
Dependencies: 90
Strengths
- Well-documented README with substantial content
- CI/CD pipeline configured (github_actions)
- Good test coverage (64% test-to-source ratio)
- Properly licensed project
Weaknesses
- Potential hardcoded secrets in 4 files
- 2,804 duplicate lines detected; consider DRY refactoring
- 8 'god files' with >500 LOC need decomposition
Recommendations
- Add a linter configuration to enforce code style consistency
- Move hardcoded secrets to environment variables or a secrets manager
- Address 116 TODO/FIXME items; consider tracking them as issues
Frameworks: pytest
Symbols
method: 122
variable: 114
constant: 86
function: 84
class: 27
property: 13
BinComp Dependency Hardening
4 of this repo's dependencies have been scanned for binary hardening. The grade reflects RELRO, stack canary, FORTIFY, and PIE coverage.