Papersift

C 69 completed
Library
cli / python · tiny
Files: 42
LOC: 5,679
Frameworks: 0
Languages: 5

Pipeline State

completed
Run ID
#307737
Phase
done
Progress
1%
Started
Finished
2026-04-13 01:31:02
LLM tokens
0
Previous runs
Data scored by Repobility · https://repobility.com
# | Status | Phase | Started | Finished
#111989 | failed | AI_REASONING | 2026-03-21 08:49:09
#31633 | failed | SYMBOL_EXTRACTION | 2026-03-07 09:02:21

Pipeline Metadata

Stage
Skipped
Decision
skip_scaffold_dup
Novelty
43.90
Framework unique
Isolation
Last stage change
2026-04-16 18:15:42
Deduplication group #48116
Member of a group with 1 similar repo; canonical: #88432
Top concepts (1)
Data/ML

AI Prompt

Create a command-line tool in Python called PaperSift that clusters research papers based on shared entities extracted from their titles. The tool should use rule-based patterns for entity extraction, not LLMs. It needs a `cluster` command that takes an input JSON file and an output directory, allowing users to specify a resolution parameter, a random seed, and flags for citation validation or using OpenAlex topics. Additionally, include an `enrich` command to fetch supplementary data from OpenAlex, requiring an input file and an output JSON file. The core dependencies include igraph and leidenalg.
python cli research clustering nlp command-line graph-theory data-science
Generated by gemma4:latest
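
The command-line surface described in the prompt could be sketched with `argparse` roughly as follows. This is a minimal illustration, not the project's actual interface: the prompt names the `cluster` and `enrich` commands and their inputs, but the flag spellings (`--resolution`, `--seed`, `--validate-citations`, `--use-topics`), the positional argument names, and the default resolution of 1.0 are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two subcommands named in the prompt."""
    parser = argparse.ArgumentParser(prog="papersift")
    sub = parser.add_subparsers(dest="command", required=True)

    # `cluster`: input JSON file, output directory, and tuning options.
    cluster = sub.add_parser("cluster", help="cluster papers by shared title entities")
    cluster.add_argument("input", help="input JSON file of papers")
    cluster.add_argument("output_dir", help="directory for clustering results")
    # Flag names and defaults below are hypothetical.
    cluster.add_argument("--resolution", type=float, default=1.0)
    cluster.add_argument("--seed", type=int, default=None)
    cluster.add_argument("--validate-citations", action="store_true")
    cluster.add_argument("--use-topics", action="store_true")

    # `enrich`: input file and output JSON file for OpenAlex data.
    enrich = sub.add_parser("enrich", help="fetch supplementary data from OpenAlex")
    enrich.add_argument("input", help="input JSON file of papers")
    enrich.add_argument("output", help="output JSON file")

    return parser
```

Under these assumptions, `build_parser().parse_args(["cluster", "papers.json", "out", "--seed", "42"])` yields a namespace with `command="cluster"` and `seed=42`; the actual clustering and OpenAlex calls are left out of the sketch.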

Catalog Information

The papersift project is designed to cluster research papers based on entities for use in code generation.

Description

Papersift is an entity-based paper clustering tool for Claude Code. It groups research papers by entities, enabling efficient retrieval and utilization of relevant information. This project aims to facilitate the development of code generation capabilities by providing a structured approach to paper organization.
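
To make "grouping papers by entities" concrete, here is a small standard-library sketch of the general idea: extract entities from titles with regular expressions, link papers that share at least one entity, and group the linked papers. Everything in it is illustrative. The two patterns (acronyms and hyphenated terms) are hypothetical, and the grouping step uses plain connected components as a stand-in for the Leiden community detection that the `igraph` and `leidenalg` dependencies suggest the project uses.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical rule-based patterns: acronyms ("BERT") and hyphenated terms.
ENTITY_PATTERNS = [
    re.compile(r"\b[A-Z]{2,}[0-9]*\b"),
    re.compile(r"\b[a-z]+(?:-[a-z]+)+\b", re.IGNORECASE),
]


def extract_entities(title: str) -> set[str]:
    """Return the lowercased entities matched by any pattern."""
    found: set[str] = set()
    for pattern in ENTITY_PATTERNS:
        found.update(match.lower() for match in pattern.findall(title))
    return found


def shared_entity_edges(papers: dict[str, str]) -> dict[tuple[str, str], int]:
    """Map each pair of paper IDs to the number of entities they share."""
    entities = {pid: extract_entities(title) for pid, title in papers.items()}
    edges = {}
    for a, b in combinations(sorted(entities), 2):
        weight = len(entities[a] & entities[b])
        if weight:
            edges[(a, b)] = weight
    return edges


def connected_groups(nodes, edges) -> list[set[str]]:
    """Group nodes by connected component (stand-in for Leiden), via union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(groups.values(), key=sorted)


papers = {
    "p1": "BERT for Named-Entity Recognition",
    "p2": "Distilling BERT into Small Models",
    "p3": "Graph Clustering with the Leiden Algorithm",
}
groups = connected_groups(papers, shared_entity_edges(papers))
```

Here `p1` and `p2` end up in one group because both titles contain `BERT`, while `p3` stays alone. A real implementation would weight edges and apply a resolution parameter, which connected components cannot express.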

Description (translated from Arabic)

This project organizes research papers by entities for use in Claude's code generation. It groups the papers by entity, which makes relevant information easier to retrieve. The project aims to facilitate the development of code generation capabilities by providing a structured method for organizing papers.

Novelty

5/10

Tags

paper-clustering entity-based-clustering code-generation research-paper-organization information-retrieval

Technologies

numpy pandas plotly rich scikit-learn

Claude Models

claude-opus-4.6

Quality Score

C
68.8/100
Structure
77
Code Quality
63
Documentation
65
Testing
70
Practices
60
Security
84
Dependencies
90

Strengths

  • Good test coverage (50% test-to-source ratio)
  • Code linting possibly configured (ruff)
  • Consistent naming conventions (snake_case)
  • Good security practices: no major issues detected

Weaknesses

  • No LICENSE file: legal ambiguity for contributors
  • No CI/CD configuration: testing and deployment are manual
  • 445 duplicate lines detected: consider DRY refactoring
  • 1 'god file' with >500 LOC needs decomposition

Recommendations

  • Set up CI/CD (GitHub Actions recommended) to automate testing and deployment
  • Add a LICENSE file (MIT recommended for open source)

Security & Health

Tech Debt (C): 5.1h
DORA Rating: Medium
OWASP (100%): A
Quality Gate: PASS
Risk (2): A
License: Unknown
Duplication: 7.5%

Languages

python: 66.3%
markdown: 17.7%
json: 15.4%
toml: 0.6%
text: 0.1%

Frameworks

None detected

Symbols

function: 51
method: 23
variable: 13
class: 5
constant: 3
property: 1

Concepts (1)

Category | Name | Description | Confidence
auto_category | Data/ML | data-ml | 60%

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/31582.svg)

BinComp Dependency Hardening

3 of this repo's dependencies have been scanned for binary hardening. Grade reflects RELRO / stack canary / FORTIFY / PIE coverage.
Grade N · networkx 3.6.1 · 0 gadgets · risk 0.0
Grade F · numpy 2.4.4 · 6,596 gadgets · risk 0.0
Grade N · plotly 6.7.0 · 0 gadgets · risk 0.0