Ingestion3

Grade C · 68 · completed

Other

unknown / scala · medium

589 files · 317,309 LOC


Pipeline State

Status: completed
Run ID: #1545677
Phase: done
Started: 2026-04-16 23:30:14
Finished: 2026-04-16 23:30:14

Pipeline Metadata

Stage: Cataloged
Decision: proceed
Novelty: 65.13
Framework unique: n/a
Isolation: n/a
Last stage change: 2026-05-10 03:34:57

Deduplication group #1938269: member of a group with 1 similar repo; this repo is the canonical one.

Open data scored by Repobility · https://repobility.com

🧪 Code Distillation

AI Prompt

Create a system for DPLA's core business data ingestion process. I need to model the workflow for harvesting, mapping, and enriching cultural heritage metadata from various partners. The system should handle validation, including specific checks like `edmRights`, and support different data formats such as XML and JSON. Please structure the process to include steps for text normalization, data provider enrichment, and generating summary reports. I'm looking at a structure that can manage data from sources like the Internet Archive and NARA.

scala python data-ingestion metadata etl xml json workflow cultural-heritage

Generated by gemma4:latest
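The prompt above describes a staged flow: map partner records from XML or JSON, validate required fields such as `edmRights`, normalize text, enrich the data provider, and summarize the run. The following is a minimal Python sketch of that flow under stated assumptions. The `Record` fields, the JSON keys and XML element names, the provider alias table and every function name are hypothetical, and this is not how the repository (mostly Scala) implements these stages.

```python
import json
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Iterable


# Hypothetical record model; field names are illustrative, not ingestion3's schema.
@dataclass
class Record:
    id: str
    title: str = ""
    data_provider: str = ""
    edm_rights: str = ""
    errors: list[str] = field(default_factory=list)


def map_json(raw: str) -> Record:
    """Map one JSON partner record (assumed keys: id, title, provider, rights)."""
    doc = json.loads(raw)
    return Record(id=doc["id"], title=doc.get("title", ""),
                  data_provider=doc.get("provider", ""),
                  edm_rights=doc.get("rights", ""))


def map_xml(raw: str) -> Record:
    """Map one XML partner record (assumed child elements with the same names)."""
    root = ET.fromstring(raw)
    return Record(id=root.findtext("id", ""), title=root.findtext("title", ""),
                  data_provider=root.findtext("provider", ""),
                  edm_rights=root.findtext("rights", ""))


def validate(rec: Record) -> Record:
    # Treat a missing or blank edmRights value as a validation error.
    if not rec.edm_rights.strip():
        rec.errors.append("missing edmRights")
    return rec


def normalize_text(rec: Record) -> Record:
    # Collapse whitespace runs, then trim trailing punctuation from the title.
    rec.title = re.sub(r"\s+", " ", rec.title).strip().rstrip(" .;,/")
    return rec


def enrich_provider(aliases: dict[str, str]) -> Callable[[Record], Record]:
    # Replace a known provider alias with its canonical name (hypothetical lookup table).
    def step(rec: Record) -> Record:
        rec.data_provider = aliases.get(rec.data_provider, rec.data_provider)
        return rec
    return step


def run(records: Iterable[Record], steps: list[Callable[[Record], Record]]) -> dict:
    """Apply each step in order to every record and return a summary report."""
    done = []
    for rec in records:
        for step in steps:
            rec = step(rec)
        done.append(rec)
    failed = [r for r in done if r.errors]
    return {"total": len(done), "valid": len(done) - len(failed),
            "failed": {r.id: r.errors for r in failed}, "records": done}


# Example run with one JSON record and one XML record that lacks rights.
raw_json = ('{"id": "a1", "title": "  Map of   Ohio. ", "provider": "IA", '
            '"rights": "http://rightsstatements.org/vocab/NoC-US/1.0/"}')
raw_xml = "<record><id>n1</id><title>Letter</title><provider>NARA</provider></record>"
report = run([map_json(raw_json), map_xml(raw_xml)],
             [validate, normalize_text, enrich_provider({"IA": "Internet Archive"})])
```

Keeping each stage as a plain `Record -> Record` function means stages can be reordered or tested in isolation, and the summary report falls out of the errors collected along the way.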


Quality Score: 68.5/100

Scored dimensions: Structure, Code Quality, Documentation, Testing, Practices, Security, Dependencies

Strengths

Well-documented README with substantial content
CI/CD pipeline configured (github_actions)
Substantial test suite (64% test-to-source ratio)
Properly licensed project

Weaknesses

Potential hardcoded secrets in 4 files
2,804 duplicate lines detected; consider DRY refactoring
8 'god files' with >500 LOC need decomposition

Recommendations

Add a linter configuration to enforce code style consistency
Move hardcoded secrets to environment variables or a secrets manager
Address 116 TODO/FIXME items; consider tracking them as issues
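The report does not say which 4 files hold the suspected secrets or what kind of secrets they are. As a minimal sketch of the second recommendation, assuming credentials such as API keys are read at startup, a Python helper can pull them from environment variables and fail loudly when one is missing. `require_secret`, `MissingSecretError` and `DPLA_API_KEY` are hypothetical names.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent or empty."""


def require_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the secret stored in environment variable `name`.

    `env` defaults to os.environ; passing a plain dict keeps tests hermetic.
    """
    source = os.environ if env is None else env
    value = source.get(name, "")
    if not value:
        # Fail fast instead of silently falling back to a hardcoded default.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


# Hypothetical usage; the variable name is illustrative only.
# api_key = require_secret("DPLA_API_KEY")
```

Routing every lookup through one function means a secrets manager client could later replace the environment lookup without touching call sites.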

Languages

scala: 41.2%
markdown: 17.5%
python: 10.9%
text: 8.2%
xml: 7.8%
shell: 7.7%
json: 6.5%
yaml: 0.1%

Frameworks

pytest

Symbols

method: 122
variable: 114
constant: 86
function: 84
class: 27
property: 13

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/1369430.svg)



BinComp Dependency Hardening


4 of this repo's dependencies have been scanned for binary hardening. Grade reflects RELRO / stack canary / FORTIFY / PIE coverage.

asyncio 4.0.0 · grade N · 0 gadgets · risk 0.0
boto3 1.42.88 · grade N · 0 gadgets · risk 0.0
botocore 1.42.88 · grade N · 0 gadgets · risk 0.0
pyarrow 23.0.1 · grade F · 8,505 gadgets · risk 0.0
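Two of the four hardening properties named above, PIE and RELRO, can be read directly from an ELF file's headers. The following is a minimal Python sketch, not Repobility's scanner, and it does not reproduce the letter grades. It reports PIE when `e_type` is `ET_DYN`, which is also true for every shared library, and RELRO when a `PT_GNU_RELRO` program header exists, without distinguishing partial from full RELRO. Stack canary and FORTIFY detection need symbol-table parsing and are not covered. `elf_hardening` is a hypothetical name.

```python
import struct

ET_DYN = 3                 # e_type of position-independent objects (PIE or shared library)
PT_GNU_RELRO = 0x6474E552  # program header type marking the RELRO region


def elf_hardening(data: bytes) -> dict:
    """Report PIE and RELRO presence from an ELF image's headers."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]  # 1/2 = 32/64-bit; 1/2 = little/big endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    end = "<" if ei_data == 1 else ">"
    (e_type,) = struct.unpack_from(end + "H", data, 16)
    if ei_class == 2:  # ELF64 header offsets
        (phoff,) = struct.unpack_from(end + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 54)
    else:  # ELF32 header offsets
        (phoff,) = struct.unpack_from(end + "I", data, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 42)
    # p_type is the first 32-bit field of every program header entry.
    p_types = [struct.unpack_from(end + "I", data, phoff + i * phentsize)[0]
               for i in range(phnum)]
    return {"pie": e_type == ET_DYN, "relro": PT_GNU_RELRO in p_types}
```

Usage on a file from an installed wheel: `with open(path, "rb") as f: print(elf_hardening(f.read()))`.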
