Papersift

C 69 completed

Library

cli / python · tiny

Files

5,679

LOC

Frameworks

Languages


Pipeline State

completed

Run ID

#307737

Phase

done

Progress

Started

Finished

2026-04-13 01:31:02

LLM tokens

Previous runs

Data scored by Repobility · https://repobility.com
#	Status	Phase	Started	Finished
#111989	failed	AI_REASONING	2026-03-21 08:49:09	—
#31633	failed	SYMBOL_EXTRACTION	2026-03-07 09:02:21	—

Pipeline Metadata

Stage

Skipped

Decision

skip_scaffold_dup

Novelty

43.90

Framework unique

—

Isolation

—

Last stage change

2026-04-16 18:15:42

Deduplication group #48116

Member of a group with 1 similar repo; canonical repo: #88432

Top concepts (1)

Data/ML


🧪 Code Distillation

Sample distilled functions

ClusterValidator.generate_report

Generates a comprehensive validation report detailing the agreement between entity clustering and citation patterns. It takes no explicit inputs but relies on previously computed internal state, including entity clusters and citation clusters. The function outputs a ValidationReport object containin

ClusterValidator.compute_confidence

Calculates a confidence score for every paper based on its assigned cluster. It takes no explicit inputs other than the object's internal state, which includes cluster assignments and citation data. The function computes the ratio of other papers within the same cluster that are citation-connected t
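The confidence measure above can be illustrated with a minimal sketch. It assumes cluster assignments arrive as a dict mapping paper ID to cluster label and citations as a set of (citing, cited) pairs treated as undirected. The name `cluster_confidence`, these input shapes, and the choice to give singleton clusters a score of 0.0 are all hypothetical, not Papersift's actual API.

```python
from collections import defaultdict


def cluster_confidence(
    assignments: dict[str, int],
    citations: set[tuple[str, str]],
) -> dict[str, float]:
    """Hypothetical sketch: for each paper, the fraction of the other papers in
    its cluster that are citation-connected to it (in either direction)."""
    # Treat citations as undirected edges: a cites b or b cites a.
    neighbors: dict[str, set[str]] = defaultdict(set)
    for a, b in citations:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Group paper IDs by cluster label.
    members: dict[int, list[str]] = defaultdict(list)
    for paper, label in assignments.items():
        members[label].append(paper)

    scores: dict[str, float] = {}
    for paper, label in assignments.items():
        others = [p for p in members[label] if p != paper]
        if not others:
            # Assumption: a paper alone in its cluster gets zero confidence.
            scores[paper] = 0.0
            continue
        connected = sum(1 for p in others if p in neighbors[paper])
        scores[paper] = connected / len(others)
    return scores
```

For example, with assignments `{"p1": 0, "p2": 0, "p3": 0, "p4": 1}` and the single citation `("p1", "p2")`, papers p1 and p2 each score 0.5, p3 scores 0.0, and the singleton p4 scores 0.0.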

ClusterValidator.compute_ari

Calculates the Adjusted Rand Index comparing the entity clusters against the citation clusters. It requires the instance to have pre-computed entity and citation clusters, and if citation clusters are missing, it first computes them. The function returns a floating-point score representing the simil
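The Adjusted Rand Index counts, over all pairs of papers, how often two clusterings agree on placing the pair together or apart, then corrects that count for chance agreement. The source does not say how compute_ari is implemented; the sketch below uses the standard contingency-table formula, and `adjusted_rand_index` is a hypothetical name. scikit-learn (listed under Technologies) provides `sklearn.metrics.adjusted_rand_score`, which computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a: list, labels_b: list) -> float:
    """Adjusted Rand Index between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        # No pairs to compare; treat as full agreement.
        return 1.0
    # Contingency counts n_ij: items put in cluster i by A and cluster j by B.
    pairs = Counter(zip(labels_a, labels_b))
    row_sums = Counter(labels_a)
    col_sums = Counter(labels_b)

    index = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Both partitions are trivial in the same way (all singletons or one cluster).
        return 1.0
    return (index - expected) / (max_index - expected)
```

Relabeling clusters does not change the score: `[0, 0, 1, 1]` against `[1, 1, 0, 0]` gives 1.0, while `[0, 0, 1, 1]` against `[0, 1, 0, 1]` gives -0.5, i.e. worse than chance.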

AI Prompt

Create a command-line tool in Python called PaperSift that clusters research papers based on shared entities extracted from their titles. The tool should use rule-based patterns for entity extraction, not LLMs. It needs a `cluster` command that takes an input JSON file and an output directory, allowing users to specify a resolution parameter, a random seed, and flags for citation validation or using OpenAlex topics. Additionally, include an `enrich` command to fetch supplementary data from OpenAlex, requiring an input file and an output JSON file. The core dependencies include igraph and leidenalg.
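The command layout described in the prompt maps naturally onto `argparse` subcommands. The sketch below is one possible layout, not Papersift's real interface: the flag names `--resolution`, `--seed`, `--validate`, and `--use-topics`, and the defaults 1.0 and 42, are assumptions chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI layout following the prompt; flag names are assumptions."""
    parser = argparse.ArgumentParser(prog="papersift")
    sub = parser.add_subparsers(dest="command", required=True)

    # `cluster`: input JSON file and output directory are positional arguments.
    cluster = sub.add_parser("cluster", help="cluster papers by shared entities")
    cluster.add_argument("input", help="input JSON file of papers")
    cluster.add_argument("output_dir", help="directory for clustering results")
    cluster.add_argument("--resolution", type=float, default=1.0,
                         help="clustering resolution parameter")
    cluster.add_argument("--seed", type=int, default=42, help="random seed")
    cluster.add_argument("--validate", action="store_true",
                         help="validate clusters against citations")
    cluster.add_argument("--use-topics", action="store_true",
                         help="use OpenAlex topics")

    # `enrich`: fetch supplementary data from OpenAlex.
    enrich = sub.add_parser("enrich", help="fetch supplementary data from OpenAlex")
    enrich.add_argument("input", help="input file of papers")
    enrich.add_argument("output", help="output JSON file")
    return parser
```

For example, `build_parser().parse_args(["cluster", "papers.json", "out", "--resolution", "0.8", "--validate"])` yields `command="cluster"`, `resolution=0.8`, `validate=True`, and the default `seed=42`.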

python cli research clustering nlp command-line graph-theory data-science

Generated by gemma4:latest

Catalog Information

The papersift project is designed to cluster research papers by the entities they share, for use with Claude Code.

Description

Papersift is an entity-based paper clustering tool for Claude Code. It groups research papers by the entities they share, which makes related papers easier to find and use. The project aims to support code generation by providing a structured way to organize papers.
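Grouping papers by shared entities starts with extracting entities from titles using rule-based patterns, as the AI Prompt section describes, and then linking papers that share them. The sketch below shows that idea under loud assumptions: the single regex (uppercase tokens of two or more characters, such as "CRISPR" or "TP53") and the names `extract_entities` and `shared_entity_edges` are hypothetical, not Papersift's actual rules.

```python
import re
from itertools import combinations

# Hypothetical rule: uppercase tokens of 2+ characters (acronyms and
# gene-like symbols such as "TP53") count as entities.
ENTITY_RE = re.compile(r"\b[A-Z][A-Z0-9]+\b")


def extract_entities(title: str) -> set[str]:
    """Return the set of rule-matched entities in a paper title."""
    return set(ENTITY_RE.findall(title))


def shared_entity_edges(titles: dict[str, str]) -> dict[tuple[str, str], int]:
    """Weight each pair of papers by the number of entities their titles share."""
    entities = {pid: extract_entities(t) for pid, t in titles.items()}
    edges: dict[tuple[str, str], int] = {}
    for a, b in combinations(sorted(entities), 2):
        shared = len(entities[a] & entities[b])
        if shared:
            edges[(a, b)] = shared
    return edges
```

The resulting weighted edge list is the kind of graph a community-detection method (the prompt names igraph and leidenalg) could then partition into clusters.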

Description (Arabic)

This project organizes research articles by entities for use with Claude Code. It groups the articles by their entities, which makes relevant information easier to retrieve. The project aims to support the development of code-generation capabilities by providing a structured method for organizing articles.

Novelty

5/10

Technologies

numpy pandas plotly rich scikit-learn

Claude Models

claude-opus-4.6

Quality Score

68.8/100

Structure

Code Quality

Documentation

Testing

Practices

Security

Dependencies

Strengths

Substantial test suite (50% test-to-source ratio)
Code linting configured (possibly ruff)
Consistent naming conventions (snake_case)
Good security practices: no major issues detected

Weaknesses

No LICENSE file, leaving legal ambiguity for contributors
No CI/CD configuration; testing and deployment are manual
445 duplicate lines detected; consider DRY refactoring
1 'god file' with >500 LOC needs decomposition

Recommendations

Set up CI/CD (GitHub Actions recommended) to automate testing and deployment
Add a LICENSE file (MIT recommended for open source)

Security & Health

5.1h

Tech Debt (C)

Medium

DORA Rating

OWASP (100%)


PASS

Quality Gate

Risk (2)

Unknown

License

7.5%

Duplication


Languages

python

66.3%

markdown

17.7%

json

15.4%

toml

0.6%

text

0.1%

Frameworks

None detected

Symbols

function: 51

method: 23

variable: 13

class: 5

constant: 3

property: 1

Concepts (1)

Category	Name	Description	Confidence
auto_category	Data/ML	data-ml	60%

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/31582.svg)


BinComp Dependency Hardening


3 of this repo's dependencies have been scanned for binary hardening. Grade reflects RELRO / stack canary / FORTIFY / PIE coverage.

networkx 3.6.1 · grade N · 0 gadgets · risk 0.0
numpy 2.4.4 · grade F · 6,596 gadgets · risk 0.0
plotly 6.7.0 · grade N · 0 gadgets · risk 0.0
