BinComp — Status & Roadmap


Live health of the BinComp pipeline and the remaining work. All counts come from live PostgreSQL queries; the roadmap is read from /tank0/aljefra_binary_coder/data/roadmap.json.

  • Last scan: 30d ago (0 in last hour)
  • Scanned in 24h: 0 of 474 total
  • Install success: 100% (0 failed)
  • Enriched (24h): 0 of 343 total
  • Analyses running: 0 (live mapper jobs)

Recent crawler runs

All metrics by Repobility · https://repobility.com
#    Started              Finished             Scanned/Total  Failed  Status
376  2026-04-18 18:54:24  2026-04-18 18:54:28  0/1 (0%)       0       failed
375  2026-04-17 02:53:02  2026-04-17 02:53:10  1/1 (100%)     0       completed
374  2026-04-17 02:47:57  2026-04-17 02:48:02  1/1 (100%)     0       completed
373  2026-04-17 02:42:51  2026-04-17 02:42:56  1/1 (100%)     0       completed
372  2026-04-17 02:37:38  2026-04-17 02:37:51  1/1 (100%)     0       completed
371  2026-04-17 02:32:01  2026-04-17 02:32:38  1/1 (100%)     0       completed
370  2026-04-17 02:26:00  2026-04-17 02:26:14  1/1 (100%)     0       completed
369  2026-04-17 02:20:47  2026-04-17 02:21:00  1/1 (100%)     0       completed

Methodology: Repobility · https://repobility.com/research/state-of-ai-code-2026/

Enriched findings by source

lazy: 191 · triple: 48 · cuda: 40 · syscall: 21 · multiproc: 20 · ct: 13 · discovery: 10

Mapper run status (all-time)

completed: 676,294 · ?: 67,493 · failed: 47,860

PostgreSQL table inventory

library_binaries: 373 · composition_findings: 1,066 · composition_patches: 2,153 · llm_explanations: 1,065 · triple_cooccurrence: 39,861 · triple_findings: 244 · enriched_findings: 343 · bincomp_packages: 474
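The inventory above is described as live PostgreSQL counts. A minimal sketch of how such counts could be gathered, assuming any DB-API style cursor (for example one from a PostgreSQL driver); the function name `table_counts` and the allow-list approach are illustrative assumptions, not the dashboard's actual code:

```python
from typing import Dict, Iterable

# Table names exactly as listed in the inventory above.
INVENTORY_TABLES = (
    "library_binaries",
    "composition_findings",
    "composition_patches",
    "llm_explanations",
    "triple_cooccurrence",
    "triple_findings",
    "enriched_findings",
    "bincomp_packages",
)


def table_counts(cursor, tables: Iterable[str] = INVENTORY_TABLES) -> Dict[str, int]:
    """Return {table: row_count} using one count(*) query per table.

    `cursor` is any DB-API cursor (hypothetical here). Identifiers cannot be
    passed as query parameters, so names are checked against an allow-list
    before being interpolated into the SQL text.
    """
    counts: Dict[str, int] = {}
    for name in tables:
        if name not in INVENTORY_TABLES:
            raise ValueError(f"unknown table: {name!r}")
        cursor.execute(f'SELECT count(*) FROM "{name}"')
        counts[name] = cursor.fetchone()[0]
    return counts


def format_inventory(counts: Dict[str, int]) -> str:
    """Render counts in the same 'name: 1,066' style used on this page."""
    return " · ".join(f"{name}: {value:,}" for name, value in counts.items())
```

An exact `count(*)` scans the table, so on very large tables a dashboard might prefer cached or estimated counts; that trade-off is outside what this page states.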

Roadmap

Updated 2026-04-17 · source data/roadmap.json
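Each milestone below shows a status label and an "N/M done (P%)" line. A minimal sketch of how those lines could be derived from roadmap items, assuming each item carries a `status` field; the actual schema of data/roadmap.json is not shown on this page, so the field names here are hypothetical:

```python
from typing import List, Mapping


def progress_line(items: List[Mapping[str, str]]) -> str:
    """Format 'N/M done (P%)' for one milestone's items."""
    total = len(items)
    done = sum(1 for item in items if item.get("status") == "done")
    pct = round(100 * done / total) if total else 0
    return f"{done}/{total} done ({pct}%)"


def status_conflicts(milestone_status: str, items: List[Mapping[str, str]]) -> bool:
    """True when the milestone label disagrees with its items.

    A milestone labeled 'done' with open items, or one whose items are all
    done but whose label is something else, counts as a conflict.
    """
    total = len(items)
    done = sum(1 for item in items if item.get("status") == "done")
    all_done = total > 0 and done == total
    return (milestone_status == "done") != all_done
```

A check like `status_conflicts` could run whenever the roadmap file is updated, so that a label and its item counts do not drift apart.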

M1 — BinComp v2 frontend coverage

done
9/9 done (100%)
  • M1-1 BinComp link in topnav
  • M1-2 PyPI crawler daemon
  • M1-3 Enriched finding detail page
  • M1-4 Per-repo dependency hardening panel
  • M1-5 Dangerous-packages leaderboard
  • M1-6 Pipeline status / health page
  • M1-7 Triple findings browser
  • M1-8 Composition findings list view
  • M1-9 LLM explanations browser

M2 — Cross-host integration with repobility.com

in_progress
2/5 done (40%)
  • M2-1 HTTP API at /api/bincomp/* on server001
  • M2-2 REPOBILITY_HANDOFF.md spec
  • M2-3 Django views in repobility.com consuming API [repobility-coder]
  • M2-4 Per-repo BinComp badge on repobility.com [repobility-coder]
  • M2-5 Supply-chain analyzer (requirements.txt upload) [repobility-coder]

M3 — Continuous data freshness

done
5/5 done (100%)
  • M3-1 Hourly bincomp-pipeline.timer
  • M3-2 Crawler daemon (1 pkg / 5 min)
  • M3-3 Crawler progress counter (bincomp_crawl_progress)
  • M3-4 Wiki auto-rebuild on new data
  • M3-5 Stale-data alerting (>48h since scan)
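The 48-hour staleness rule and the crawler rate can be made concrete with a short sketch. The function names are hypothetical, and the full-pass arithmetic assumes the crawler cycles through every package in bincomp_packages at a steady 1 package per 5 minutes, which this page does not confirm:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold taken from the roadmap item: alert when >48h since last scan.
STALE_AFTER = timedelta(hours=48)


def is_stale(last_scan: datetime,
             now: Optional[datetime] = None,
             threshold: timedelta = STALE_AFTER) -> bool:
    """True when the most recent scan is older than the threshold."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_scan > threshold


def full_pass_hours(package_count: int, minutes_per_package: float) -> float:
    """Hours needed to visit every package once at a fixed crawl rate."""
    return package_count * minutes_per_package / 60


# With the 474 packages reported above and 1 package per 5 minutes,
# one full pass would take 474 * 5 / 60 = 39.5 hours.
hours = full_pass_hours(474, 5)
```

Under those assumptions a full pass fits inside the 48-hour window, so a "Last scan: 30d ago" reading would trip `is_stale` rather than reflect normal crawl latency.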

M4 — Repo analysis status visibility

done
4/4 done (100%)
  • M4-1 Pipeline state card on /repos/{id}
  • M4-2 Phase progress bar (analysis_runs.phase_progress)
  • M4-3 Recent run history (last 5)
  • M4-4 Failed-run reason surface

M5 — Dataset hardening + quality

in_progress
3/4 done (75%)
  • M5-1 Mass enrichment via 6-GPU gemma4 cluster
  • M5-2 CWE / CVSS / severity normalization
  • M5-3 Triple co-occurrence mining (39,861 triples)
  • M5-4 Quality re-validation pass on enriched_findings

M6 — Clean-room spec pipeline

in_progress
7/9 done (78%)
  • M6-1 PG schema (clean_specs, bincomp_jobs, clean_room_audit)
  • M6-2 extract_v2.py: license-filtered, file-level, dedup'd extractor (228,755 v2 records)
  • M6-3 synthesize_specs.py: gemma4 dirty room
  • M6-4 Static distillation (Python/TS/Go/Rust) producing 4.09M specs
  • M6-5 Leakage audit: 3.21M audited, 0 verbatim, 215 major, 472 minor
  • M6-6 Worker daemon on server001 (one host)
  • M6-7 Audit backfill complete: 100% coverage across all models; 0 verbatim; 6,048 major + 81,495 minor flagged for review; 826k Rust/Go specs were marked clean with reason=source_unavailable (source repos no longer indexed in mapper PG)
  • M6-8 Re-synthesize static specs through gemma4 for behavioral (not structural) specs
  • M6-9 Workers deployed across .10/.101/.103/.104 with 3 redundant paths (HTTP + SSH tunnel + PG LAN). Not deployed on .106/.107/.108 (ollama dead) or .150/.151 (deferred).
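The leakage audit above sorts specs into verbatim, major, and minor, and the source_unavailable case marks a spec clean without any comparison. A minimal sketch of one way such a classifier could work, using the longest common substring relative to spec length; the thresholds, the function name, and the choice of substring (rather than subsequence) matching are all assumptions, since this page does not describe the real audit's method:

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical cut-offs; the real audit's thresholds are not given here.
MAJOR_RATIO = 0.5
MINOR_RATIO = 0.2


def classify_leakage(spec: str, source: Optional[str]) -> Tuple[str, str]:
    """Return (label, reason) for one spec compared against its source text."""
    if source is None:
        # No source text to compare against: nothing is actually measured.
        return ("clean", "source_unavailable")
    if not spec:
        return ("clean", "empty_spec")
    if spec in source:
        return ("verbatim", "spec_is_substring_of_source")
    matcher = SequenceMatcher(None, spec, source, autojunk=False)
    match = matcher.find_longest_match(0, len(spec), 0, len(source))
    ratio = match.size / len(spec)  # share of the spec copied contiguously
    if ratio >= MAJOR_RATIO:
        return ("major", f"lcs_ratio={ratio:.2f}")
    if ratio >= MINOR_RATIO:
        return ("minor", f"lcs_ratio={ratio:.2f}")
    return ("clean", f"lcs_ratio={ratio:.2f}")
```

Returning the reason alongside the label keeps the source_unavailable slice distinguishable from specs that were compared and found clean.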

M7 — Binary-codegen model training

todo
1/4 done (25%)
  • M7-1 Decide first model: 7B fine-tune on clean_specs vs (source, .so basic block) alignment
  • M7-2 Held-out split already carved (177 repos / 21,965 specs)
  • M7-3 Training harness
  • M7-4 Eval on held-out
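The held-out split above is counted in repos as well as specs, which suggests the split is made at repo level. A minimal sketch of one common way to do that, assigning whole repos by a salted hash so that no repo contributes specs to both sides; the salt, the fraction, and the `repo_id` field are hypothetical and not taken from the actual split:

```python
import hashlib
from typing import Dict, List, Tuple


def is_held_out(repo_id: str,
                held_out_fraction: float = 0.05,
                salt: str = "bincomp-heldout") -> bool:
    """Deterministically assign a repo to the held-out set by hashing its ID."""
    digest = hashlib.sha256(f"{salt}:{repo_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < held_out_fraction


def split_specs(specs: List[Dict[str, str]],
                held_out_fraction: float = 0.05
                ) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split specs into (train, held_out), keeping each repo on one side."""
    train: List[Dict[str, str]] = []
    held_out: List[Dict[str, str]] = []
    for spec in specs:
        if is_held_out(spec["repo_id"], held_out_fraction):
            held_out.append(spec)
        else:
            train.append(spec)
    return train, held_out
```

Hashing the repo ID rather than sampling specs at random keeps the assignment stable across reruns and avoids near-duplicate specs from one repo appearing in both training and evaluation.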

M8 — Solidity/smart-contract antifunction mining (POC done 2026-04-16)

parked
1/3 done (33%)
  • M8-1 Brute-force POC on Zeppelin fork (723 fn, 1,139 critical findings, 14ms)
  • M8-2 Decision: integrate as antifunction_engine layer 10, or spin out to separate repo
  • M8-3 If integrated: solidity_contracts PG table + contract_bruteforce job_kind

M9 — Commercial dataset export (Product 3)

todo
1/4 done (25%)
  • M9-1 Antifunction engine exists (commercial_datasets/tools/antifunction_engine.py)
  • M9-2 Export antifunction_dataset.jsonl to commercial_datasets/v1/
  • M9-3 Dataset card (schema, license, intended use, limitations)
  • M9-4 License decision (CC-BY-SA vs dual commercial)

Non-goals

  • No fine-tuning at this stage (tracked as M7, not started)
  • No publishing to HuggingFace / arXiv yet (deferred until M6-7 and M6-8 complete)
  • No SQLite anywhere (PG-only)

Blockers

  • B2: 99.85% of specs are static (not gemma4): structural, not behavioral, which weakens the clean-room argument. (refs: M6-8)
  • B4: No binary-codegen model trained; M7 fully untouched. (refs: M7)
  • B5: Solidity POC orphaned; no integration decision. (refs: M8-2)
  • B6: 826k Rust/Go static specs marked clean without real LCS (source_unavailable); the reason is stamped in leakage_examples.reason. Not a legal blocker (static specs are structural), but it limits what the audit proves about that slice. (refs: M6-7)