Papersift
C 69 completed
Library
cli / python · tiny
Files: 42
LOC: 5,679
Frameworks: 0
Languages: 5
Pipeline State
completed
Run ID: #307737
Phase: done
Progress: 1%
Started:
Finished: 2026-04-13 01:31:02
LLM tokens: 0
Previous runs
| # | Status | Phase | Started | Finished |
|---|---|---|---|---|
| #111989 | failed | AI_REASONING | 2026-03-21 08:49:09 | — |
| #31633 | failed | SYMBOL_EXTRACTION | 2026-03-07 09:02:21 | — |

Repobility · code-quality intelligence platform · https://repobility.com
Pipeline Metadata
Stage: Skipped
Decision: skip_scaffold_dup
Novelty: 43.90
Framework unique: —
Isolation: —
Last stage change: 2026-04-16 18:15:42
Deduplication group #48116
Member of a group with 1 similar repo(s); canonical: #88432
Top concepts (1)
Data/ML
🧪 Code Distillation
Sample distilled functions
ClusterValidator.generate_report: Generates a comprehensive validation report detailing the agreement between entity clustering and citation patterns. It takes no explicit inputs but relies on previously computed internal state, including entity clusters and citation clusters. The function outputs a ValidationReport object containin…
ClusterValidator.compute_confidence: Calculates a confidence score for every paper based on its assigned cluster. It takes no explicit inputs other than the object's internal state, which includes cluster assignments and citation data. The function computes the ratio of other papers within the same cluster that are citation-connected t…
ClusterValidator.compute_ari: Calculates the Adjusted Rand Index comparing the entity clusters against the citation clusters. It requires the instance to have pre-computed entity and citation clusters, and if citation clusters are missing, it first computes them. The function returns a floating-point score representing the simil…
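The two scoring ideas summarized above can be illustrated with a minimal, standard-library sketch. This is not Papersift's actual code: the function names `adjusted_rand_index` and `cluster_confidence`, the input shapes, and the handling of singleton clusters are all assumptions made for illustration, and the truncated descriptions leave the exact definitions open.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the clusterings trivially agree
    # Count co-clustered pairs in the contingency table and in each marginal.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings are one big cluster
    return (sum_cells - expected) / (max_index - expected)


def cluster_confidence(clusters, citations):
    """Per paper: fraction of same-cluster peers it shares a citation link with.

    clusters:  dict of paper_id -> cluster label (hypothetical input shape)
    citations: iterable of (paper_id, paper_id) pairs, treated as undirected
    """
    linked = {}
    for a, b in citations:
        linked.setdefault(a, set()).add(b)
        linked.setdefault(b, set()).add(a)
    members = {}
    for paper, label in clusters.items():
        members.setdefault(label, set()).add(paper)
    scores = {}
    for paper, label in clusters.items():
        peers = members[label] - {paper}
        if not peers:
            scores[paper] = 0.0  # assumption: a singleton cluster scores zero
            continue
        scores[paper] = len(peers & linked.get(paper, set())) / len(peers)
    return scores
```

An ARI of 1.0 means the entity and citation clusterings agree exactly (up to relabeling), values near 0 indicate chance-level agreement, and negative values indicate less agreement than chance.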
AI Prompt
Create a command-line tool in Python called PaperSift that clusters research papers based on shared entities extracted from their titles. The tool should use rule-based patterns for entity extraction, not LLMs. It needs a `cluster` command that takes an input JSON file and an output directory, allowing users to specify a resolution parameter, a random seed, and flags for citation validation or using OpenAlex topics. Additionally, include an `enrich` command to fetch supplementary data from OpenAlex, requiring an input file and an output JSON file. The core dependencies include igraph and leidenalg.
python cli research clustering nlp command-line graph-theory data-science
Generated by gemma4:latest
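The command layout described in the prompt could be wired up roughly as follows. This is only a sketch using `argparse`: the prompt names the `cluster` and `enrich` commands and their inputs, but the flag spellings (`--output`, `--resolution`, `--seed`, `--validate`, `--use-topics`) and the defaults are hypothetical and may differ from the real tool.

```python
import argparse


def build_parser():
    """Build a two-command CLI parser; all flag names here are assumptions."""
    parser = argparse.ArgumentParser(prog="papersift")
    sub = parser.add_subparsers(dest="command", required=True)

    # `cluster`: input JSON file plus an output directory and tuning options.
    cluster = sub.add_parser("cluster", help="cluster papers by shared title entities")
    cluster.add_argument("input", help="input JSON file of papers")
    cluster.add_argument("-o", "--output", required=True, help="output directory")
    cluster.add_argument("--resolution", type=float, default=1.0,
                         help="clustering resolution parameter")
    cluster.add_argument("--seed", type=int, default=None, help="random seed")
    cluster.add_argument("--validate", action="store_true",
                         help="validate clusters against citation data")
    cluster.add_argument("--use-topics", action="store_true",
                         help="use OpenAlex topics")

    # `enrich`: input file plus an output JSON file.
    enrich = sub.add_parser("enrich", help="fetch supplementary data from OpenAlex")
    enrich.add_argument("input", help="input file of papers")
    enrich.add_argument("-o", "--output", required=True, help="output JSON file")
    return parser
```

Under these assumed flags, `build_parser().parse_args(["cluster", "papers.json", "-o", "out", "--seed", "42"])` yields a namespace with `command="cluster"` and `seed=42`, which a dispatcher could route to the clustering code.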
Catalog Information
The papersift project is designed to cluster research papers based on entities for use in code generation.
Description
Papersift is an entity-based paper clustering tool for Claude Code. It groups research papers by entities, enabling efficient retrieval and utilization of relevant information. This project aims to facilitate the development of code generation capabilities by providing a structured approach to paper organization.
Description (translated from Arabic)
This project organizes research papers by entities for use in Claude's code generation. It groups the papers by entity, which makes relevant information easier to retrieve. The project aims to facilitate the development of code generation capabilities by providing a structured methodology for organizing papers.
Novelty
5/10
Tags
paper-clustering entity-based-clustering code-generation research-paper-organization information-retrieval
Technologies
numpy pandas plotly rich scikit-learn
Claude Models
claude-opus-4.6
Quality Score
C
68.8/100
Structure: 77
Code Quality: 63
Documentation: 65
Testing: 70
Practices: 60
Security: 84
Dependencies: 90
Strengths
- Good test coverage (50% test-to-source ratio)
- Code linting configured (ruff (possible))
- Consistent naming conventions (snake_case)
- Good security practices: no major issues detected
Weaknesses
- No LICENSE file: legal ambiguity for contributors
- No CI/CD configuration: manual testing and deployment
- 445 duplicate lines detected: consider DRY refactoring
- 1 'god file' with >500 LOC needs decomposition
Recommendations
- Set up CI/CD (GitHub Actions recommended) to automate testing and deployment
- Add a LICENSE file (MIT recommended for open source)
Security & Health
Tech Debt (C): 5.1h
DORA Rating: Medium
OWASP (100%): A
Quality Gate: PASS
Risk (2): A
License: Unknown
Duplication: 7.5%
Languages
Frameworks
None detected
Symbols
function: 51
method: 23
variable: 13
class: 5
constant: 3
property: 1
Concepts (1)
| Category | Name | Description | Confidence |
|---|---|---|---|
| auto_category | Data/ML | data-ml | 60% |
BinComp Dependency Hardening
3 of this repo's dependencies have been scanned for binary hardening. Grade reflects RELRO / stack canary / FORTIFY / PIE coverage.