Omicidx

Quality grade: D (59) · Status: completed
Category: Data Tool
Type / size: unknown / sql · tiny
Files: 10
LOC: 818
Frameworks: 0
Languages: 5

Pipeline State

Status: completed
Run ID: #367465
Phase: done
Progress: 1%
Started:
Finished: 2026-04-13 01:31:02
LLM tokens: 0

Pipeline Metadata

Stage: Cataloged
Decision: proceed
Novelty: 41.06
Framework unique
Isolation
Last stage change: 2026-05-10 03:34:51
Deduplication group: #48292
Member of a group with 12 similar repos (canonical: #2911)
Top concepts (1)
Project Description

AI Prompt

Create a cloud-native tool, similar to SRAdb and GEOmetadb, that allows researchers to efficiently query millions of genomic runs, samples, and biosamples. The core functionality should use DuckDB to query data available as Parquet files over HTTPS. I need to be able to run Python scripts to query specific datasets, like finding RNA-Seq runs by joining `sra_runs` and `sra_experiments` tables, or querying structured views like `sradb.study` or `geometadb.gse` after building the local database using `build_db.py`.
sql python duckdb genomics bioinformatics parquet data-querying cloud-native
Generated by gemma4:latest
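The RNA-Seq lookup described in the prompt above amounts to a join between run and experiment metadata. Because the real Parquet locations and column names are not given on this page, the following is only a minimal offline sketch: Python's built-in `sqlite3` module and a few placeholder rows stand in for DuckDB and the omicidx Parquet files. The table names `sra_runs` and `sra_experiments` come from the prompt; the columns `accession`, `experiment_accession`, and `library_strategy` and all accession values are assumptions for illustration.

```python
import sqlite3

# Stand-in for DuckDB: an in-memory SQLite database with two toy tables.
# Table names follow the prompt; column names and rows are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE sra_experiments (accession TEXT PRIMARY KEY, library_strategy TEXT);
    CREATE TABLE sra_runs (accession TEXT PRIMARY KEY, experiment_accession TEXT);

    INSERT INTO sra_experiments VALUES
        ('SRX0001', 'RNA-Seq'),
        ('SRX0002', 'WGS'),
        ('SRX0003', 'RNA-Seq');
    INSERT INTO sra_runs VALUES
        ('SRR0001', 'SRX0001'),
        ('SRR0002', 'SRX0002'),
        ('SRR0003', 'SRX0003'),
        ('SRR0004', 'SRX0003');
    """
)


def rna_seq_runs(con):
    """Return run accessions whose parent experiment is RNA-Seq."""
    query = """
        SELECT r.accession
        FROM sra_runs AS r
        JOIN sra_experiments AS e ON r.experiment_accession = e.accession
        WHERE e.library_strategy = ?
        ORDER BY r.accession
    """
    # The strategy is bound as a parameter instead of being pasted into the SQL.
    return [row[0] for row in con.execute(query, ("RNA-Seq",))]


print(rna_seq_runs(con))  # ['SRR0001', 'SRR0003', 'SRR0004']
```

Since the query is plain SQL, the same statement should carry over to a DuckDB connection once `sra_runs` and `sra_experiments` exist there, for example as views defined over DuckDB's `read_parquet` function; the actual file URLs are not listed here.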

Catalog Information

A cloud‑native tool that lets researchers query millions of genomic runs, samples, and biosamples efficiently.

Description

omicidx provides a fast, cloud‑native interface for querying large volumes of genomic metadata, including over 80 million SRA runs, 8 million GEO samples, and 50 million biosamples. It replaces legacy databases such as SRAdb and GEOmetadb by using DuckDB, an in-process analytical engine, to query metadata published as Parquet files over HTTPS. Users can run complex queries on study attributes, sample characteristics, and run details without first downloading the full datasets. The tool is aimed at bioinformatics researchers and data scientists who need quick access to metadata for downstream analysis, and it can be integrated into existing pipelines or deployed on cloud platforms.
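A query that combines study attributes with run details, of the kind described above, might look like the following sketch. As before, `sqlite3` and placeholder rows stand in for DuckDB and the real data; the table `sra_studies` and the columns `title`, `study_accession`, and `total_bases` are hypothetical and not taken from omicidx.

```python
import sqlite3

# Toy study- and run-level tables (hypothetical schema and rows).
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE sra_studies (accession TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE sra_runs (accession TEXT PRIMARY KEY, study_accession TEXT, total_bases INTEGER);

    INSERT INTO sra_studies VALUES
        ('SRP0001', 'Single-cell atlas of mouse liver'),
        ('SRP0002', 'Soil metagenome survey'),
        ('SRP0003', 'Human liver cancer transcriptomes');
    INSERT INTO sra_runs VALUES
        ('SRR0001', 'SRP0001', 1000),
        ('SRR0002', 'SRP0001', 1500),
        ('SRR0003', 'SRP0002', 800),
        ('SRR0004', 'SRP0003', 2000);
    """
)


def study_summary(con, keyword):
    """Run count and total bases per study whose title contains `keyword`."""
    query = """
        SELECT s.accession, COUNT(r.accession) AS n_runs, SUM(r.total_bases) AS bases
        FROM sra_studies AS s
        JOIN sra_runs AS r ON r.study_accession = s.accession
        WHERE lower(s.title) LIKE '%' || lower(?) || '%'
        GROUP BY s.accession
        ORDER BY n_runs DESC, s.accession
    """
    return con.execute(query, (keyword,)).fetchall()


print(study_summary(con, "liver"))
# [('SRP0001', 2, 2500), ('SRP0003', 1, 2000)]
```

Binding the keyword with a `?` placeholder rather than formatting it into the SQL string avoids quoting problems; DuckDB's Python API also accepts `?` placeholders, so the pattern should transfer.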

Novelty

8/10

Tags

genomic-metadata-search large-scale-data-querying research-sample-retrieval bioinformatics-data-access cloud-native-data-tool sra-and-geo-integration biosample-exploration

Claude Models

claude-opus-4.6

Quality Score

Overall: D (58.8/100)
Structure: 64
Code Quality: 70
Documentation: 48
Testing: 15
Practices: 66
Security: 92
Dependencies: 60

Strengths

  • CI/CD pipeline configured (github_actions)
  • Code linting configured (possibly ruff)
  • Consistent naming conventions (snake_case)
  • Good security practices: no major issues detected
  • Properly licensed project

Weaknesses

  • No tests found: high risk of regressions
  • 112 duplicate lines detected; consider DRY refactoring

Recommendations

  • Add a test suite, starting with critical-path integration tests

Security & Health

Tech Debt: 4.1h (E)
OWASP: A (100%)
Quality Gate: PASS
Risk: A (10)
License: MIT
Duplication: 4.5%

Languages

sql: 66.2%
python: 14.8%
markdown: 10.5%
yaml: 4.3%
toml: 4.3%

Frameworks

None detected

Concepts (1)

Category: auto_description
Name: Project Description
Description: Cloud-native replacement for SRAdb and GEOmetadb: query 80M+ SRA runs, 8M GEO samples, and 50M biosamples via DuckDB.
Confidence: 80%

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/91656.svg)