Ingestion3

C 68 completed
Other
unknown / scala · medium
Files: 589
LOC: 317,309
Frameworks: 1
Languages: 8

Pipeline State

completed
Run ID: #1545677
Phase: done
Progress: 0%
Started: 2026-04-16 23:30:14
Finished: 2026-04-16 23:30:14
LLM tokens: 0

Pipeline Metadata

Stage: Cataloged
Decision: proceed
Novelty: 65.13
Framework unique
Isolation
Last stage change: 2026-05-10 03:34:57
Deduplication group #1938269
Member of a group with 1 similar repo(s); this repo is the canonical one.
Open data scored by Repobility · https://repobility.com

AI Prompt

Create a system for DPLA's core business data ingestion process. I need to model the workflow for harvesting, mapping, and enriching cultural heritage metadata from various partners. The system should handle validation, including specific checks like `edmRights`, and support different data formats such as XML and JSON. Please structure the process to include steps for text normalization, data provider enrichment, and generating summary reports. I'm looking at a structure that can manage data from sources like the Internet Archive and NARA.
Generated by gemma4:latest
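
The workflow the prompt describes (text normalization, an `edmRights` check, and a summary report) can be sketched roughly as follows. This is a minimal illustration in Python, not the repository's implementation: the `Record` shape, the function names, and the rule that `edmRights` must be an http(s) URI are all assumptions made for the example.

```python
import re
import unicodedata
from dataclasses import dataclass, field


# Hypothetical record shape; the repository's real data model is not shown here.
@dataclass
class Record:
    title: str
    data_provider: str
    edm_rights: str = ""
    messages: list = field(default_factory=list)


def normalize_text(value: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    value = unicodedata.normalize("NFC", value)
    return re.sub(r"\s+", " ", value).strip()


# Assumed rule: edmRights must look like an http(s) URI.
RIGHTS_URI = re.compile(r"^https?://\S+$")


def validate_edm_rights(record: Record) -> bool:
    """Record a message on the record when edmRights fails the assumed rule."""
    ok = bool(RIGHTS_URI.match(record.edm_rights))
    if not ok:
        record.messages.append(f"invalid edmRights: {record.edm_rights!r}")
    return ok


def enrich(record: Record) -> Record:
    """Normalize text fields, then validate edmRights."""
    record.title = normalize_text(record.title)
    record.data_provider = normalize_text(record.data_provider)
    validate_edm_rights(record)
    return record


def summarize(records) -> dict:
    """Build a small summary report: total records and records with messages."""
    return {
        "total": len(records),
        "with_messages": sum(1 for r in records if r.messages),
    }
```

Calling `enrich` on each harvested record and then `summarize` on the results gives a per-run count of records that failed validation. A real pipeline would add harvesting and format-specific (XML, JSON) mapping steps in front of this.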

Tags

scala python data-ingestion metadata etl xml json workflow cultural-heritage

Quality Score

Overall: C (68.5/100)
Structure: 70
Code Quality: 79
Documentation: 83
Testing: 85
Practices: 36
Security: 40
Dependencies: 90

Strengths

  • Well-documented README with substantial content
  • CI/CD pipeline configured (github_actions)
  • Good test-to-source ratio (64%)
  • Properly licensed project

Weaknesses

  • Potential hardcoded secrets in 4 files
  • 2,804 duplicate lines detected; consider DRY refactoring
  • 8 'god files' with >500 LOC need decomposition

Recommendations

  • Add a linter configuration to enforce code style consistency
  • Move hardcoded secrets to environment variables or a secrets manager
  • Address 116 TODO/FIXME items; consider tracking them as issues
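
One way to act on the hardcoded-secrets recommendation is to read each secret from the environment and fail loudly when it is missing. The sketch below is illustrative only: `get_secret`, `MissingSecretError`, and the variable name `HARVEST_API_KEY` are hypothetical and do not come from the repository.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def get_secret(name: str, env=None) -> str:
    """Return the secret stored under `name`, or raise if it is unset or empty.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    value = env.get(name, "")
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# Usage (hypothetical variable name):
# api_key = get_secret("HARVEST_API_KEY")
```

For AWS access specifically, boto3 already resolves credentials from environment variables such as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, so keys do not need to appear in source files.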

Languages

scala: 41.2%
markdown: 17.5%
python: 10.9%
text: 8.2%
xml: 7.8%
shell: 7.7%
json: 6.5%
yaml: 0.1%

Frameworks

pytest

Symbols

method: 122
variable: 114
constant: 86
function: 84
class: 27
property: 13

Quality Timeline

1 quality score recorded.

Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/1369430.svg)

BinComp Dependency Hardening

4 of this repo's dependencies have been scanned for binary hardening. Grade reflects RELRO / stack canary / FORTIFY / PIE coverage.
  • N · asyncio 4.0.0 · 0 gadgets · risk 0.0
  • N · boto3 1.42.88 · 0 gadgets · risk 0.0
  • N · botocore 1.42.88 · 0 gadgets · risk 0.0
  • F · pyarrow 23.0.1 · 8,505 gadgets · risk 0.0