BinComp — Status & Roadmap

Live health of the BinComp pipeline and the remaining work. All counts come from live PostgreSQL queries; the roadmap is read from /tank0/aljefra_binary_coder/data/roadmap.json.

Last scan: 30d ago (0 in last hour)
Scanned in 24h: of 474 total
Install success: 100% (0 failed)
Enriched (24h): of 343 total
Analyses running: live mapper jobs

Recent crawler runs

All metrics by Repobility · https://repobility.com
#	Started	Finished	Scanned/Total	Failed	Status
376	2026-04-18 18:54:24	2026-04-18 18:54:28	0/1 (0%)	0	failed
375	2026-04-17 02:53:02	2026-04-17 02:53:10	1/1 (100%)	0	completed
374	2026-04-17 02:47:57	2026-04-17 02:48:02	1/1 (100%)	0	completed
373	2026-04-17 02:42:51	2026-04-17 02:42:56	1/1 (100%)	0	completed
372	2026-04-17 02:37:38	2026-04-17 02:37:51	1/1 (100%)	0	completed
371	2026-04-17 02:32:01	2026-04-17 02:32:38	1/1 (100%)	0	completed
370	2026-04-17 02:26:00	2026-04-17 02:26:14	1/1 (100%)	0	completed
369	2026-04-17 02:20:47	2026-04-17 02:21:00	1/1 (100%)	0	completed

Enriched findings by source

lazy: 191 · triple: 48 · cuda: 40 · syscall: 21 · multiproc: 20 · ct: 13 · discovery: 10

Mapper run status (all-time)

completed: 676,294 · ?: 67,493 · failed: 47,860
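
The all-time counts above are easier to compare as shares of all mapper runs. A minimal sketch, assuming the three buckets shown are exhaustive; the helper name `status_shares` is hypothetical and not part of BinComp.

```python
def status_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each status's share of all runs, in percent (2 decimal places)."""
    total = sum(counts.values())
    if total == 0:
        return {status: 0.0 for status in counts}
    return {status: round(100 * n / total, 2) for status, n in counts.items()}

# Counts copied from the status line above; "?" is the unlabeled bucket.
mapper_runs = {"completed": 676_294, "?": 67_493, "failed": 47_860}
shares = status_shares(mapper_runs)

for status, pct in shares.items():
    print(f"{status}: {pct}%")
```

With these numbers, roughly 85% of runs completed, and the unlabeled "?" bucket is larger than the failed one.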

PostgreSQL table inventory

library_binaries: 373composition_findings: 1,066composition_patches: 2,153llm_explanations: 1,065triple_cooccurrence: 39,861triple_findings: 244enriched_findings: 343bincomp_packages: 474

Roadmap

Updated 2026-04-17 · source data/roadmap.json
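
Each milestone below shows an "N/M done (P%)" line next to a status badge. A minimal sketch of how such a line could be derived from roadmap data, assuming a hypothetical JSON shape (a `milestones` list whose entries hold `items` with a boolean `done`); the real roadmap.json schema is not shown on this page, and both function names are illustrative.

```python
import json

# Hypothetical roadmap.json fragment; the real schema is an assumption.
SAMPLE = """
{"milestones": [
  {"id": "M5", "items": [
    {"id": "M5-1", "done": true},
    {"id": "M5-2", "done": true},
    {"id": "M5-3", "done": true},
    {"id": "M5-4", "done": false}
  ]}
]}
"""

def progress_line(milestone: dict) -> str:
    """Format the 'N/M done (P%)' line from a milestone's items."""
    items = milestone["items"]
    done = sum(1 for item in items if item["done"])
    pct = round(100 * done / len(items)) if items else 0
    return f"{done}/{len(items)} done ({pct}%)"

def derived_status(milestone: dict) -> str:
    """Status implied by the items alone: todo, in_progress, or done."""
    items = milestone["items"]
    done = sum(1 for item in items if item["done"])
    if items and done == len(items):
        return "done"
    return "in_progress" if done else "todo"

milestone = json.loads(SAMPLE)["milestones"][0]
print(progress_line(milestone), derived_status(milestone))
```

Statuses such as "parked" cannot be derived from item counts, so a check like `derived_status` is only useful for flagging a badge that disagrees with its own checklist.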

M1 — BinComp v2 frontend coverage

done

9/9 done (100%)

✓ M1-1 BinComp link in topnav
✓ M1-2 PyPI crawler daemon
✓ M1-3 Enriched finding detail page
✓ M1-4 Per-repo dependency hardening panel
✓ M1-5 Dangerous-packages leaderboard
✓ M1-6 Pipeline status / health page
✓ M1-7 Triple findings browser
✓ M1-8 Composition findings list view
✓ M1-9 LLM explanations browser

M2 — Cross-host integration with repobility.com

in_progress

2/5 done (40%)

✓ M2-1 HTTP API at /api/bincomp/* on server001
✓ M2-2 REPOBILITY_HANDOFF.md spec
○ M2-3 Django views in repobility.com consuming API [repobility-coder]
○ M2-4 Per-repo BinComp badge on repobility.com [repobility-coder]
○ M2-5 Supply-chain analyzer (requirements.txt upload) [repobility-coder]

M3 — Continuous data freshness

done

5/5 done (100%)

✓ M3-1 Hourly bincomp-pipeline.timer
✓ M3-2 Crawler daemon (1 pkg / 5 min)
✓ M3-3 Crawler progress counter (bincomp_crawl_progress)
✓ M3-4 Wiki auto-rebuild on new data
✓ M3-5 Stale-data alerting (>48h since scan)
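
The crawler rate (M3-2) and the stale-data threshold (M3-5) interact: the crawl has to cover the package set before data counts as stale. A minimal sketch of both calculations, using the 474-package total from the health cards; the function names and the exact comparison are assumptions, not the deployed alerting code.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=48)       # threshold named in M3-5
CRAWL_INTERVAL = timedelta(minutes=5)   # one package per interval, per M3-2

def is_stale(last_scan: datetime, now: datetime) -> bool:
    """True when more than 48 hours have passed since the last scan."""
    return now - last_scan > STALE_AFTER

def full_cycle(package_count: int) -> timedelta:
    """Time for the crawler to visit every package once at the fixed rate."""
    return package_count * CRAWL_INTERVAL

# Timestamps taken from crawler runs 375 (finished) and 376 (started).
last_scan = datetime(2026, 4, 17, 2, 53, 10)
now = datetime(2026, 4, 18, 18, 54, 24)

print(is_stale(last_scan, now))   # gap is about 40 hours
print(full_cycle(474))            # 474 packages at 5 minutes each
```

At one package per five minutes, a full pass over 474 packages takes 39.5 hours, which fits inside the 48-hour threshold with limited headroom.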

M4 — Repo analysis status visibility

done

4/4 done (100%)

✓ M4-1 Pipeline state card on /repos/{id}
✓ M4-2 Phase progress bar (analysis_runs.phase_progress)
✓ M4-3 Recent run history (last 5)
✓ M4-4 Failed-run reason surface

M5 — Dataset hardening + quality

in_progress

3/4 done (75%)

✓ M5-1 Mass enrichment via 6-GPU gemma4 cluster
✓ M5-2 CWE / CVSS / severity normalization
✓ M5-3 Triple co-occurrence mining (39,861 triples)
○ M5-4 Quality re-validation pass on enriched_findings

M6 — Clean-room spec pipeline

in_progress

7/9 done (78%)

✓ M6-1 PG schema (clean_specs, bincomp_jobs, clean_room_audit)
✓ M6-2 extract_v2.py: license-filtered, file-level, dedup'd extractor (228,755 v2 records)
✓ M6-3 synthesize_specs.py: gemma4 dirty room
✓ M6-4 Static distillation (Python/TS/Go/Rust) producing 4.09M specs
✓ M6-5 Leakage audit: 3.21M audited, 0 verbatim, 215 major, 472 minor
✓ M6-6 Worker daemon on server001 (one host)
✓ M6-7 Audit backfill complete: 100% coverage across all models; 0 verbatim; 6,048 major + 81,495 minor flagged for review; 826k Rust/Go specs were marked clean with reason=source_unavailable (source repos no longer indexed in mapper PG)
○ M6-8 Re-synthesize static specs through gemma4 for behavioral (not structural) specs
○ M6-9 Workers deployed across .10/.101/.103/.104 with 3 redundant paths (HTTP + SSH tunnel + PG LAN). Not deployed on .106/.107/.108 (ollama dead) or .150/.151 (deferred).
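
The leakage audit in M6-5 and M6-7 sorts specs into verbatim, major, minor, and clean, and marks specs clean with reason=source_unavailable when the source can no longer be fetched. A minimal sketch of that kind of classification, assuming "LCS" means the longest contiguous run of shared tokens; the thresholds, the `empty_spec` reason, and the function name are all hypothetical, and the real audit may measure overlap differently.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical cutoffs: fraction of spec tokens covered by the longest shared run.
THRESHOLDS = [("verbatim", 0.90), ("major", 0.50), ("minor", 0.20)]

def classify_leakage(spec: str, source: Optional[str]) -> tuple[str, str]:
    """Return (label, reason) for one spec compared against its source text."""
    if source is None:
        # Mirrors the behavior described in M6-7: no comparison is possible.
        return ("clean", "source_unavailable")
    spec_tokens, source_tokens = spec.split(), source.split()
    if not spec_tokens:
        return ("clean", "empty_spec")  # hypothetical reason string
    matcher = SequenceMatcher(None, spec_tokens, source_tokens, autojunk=False)
    match = matcher.find_longest_match(0, len(spec_tokens), 0, len(source_tokens))
    overlap = match.size / len(spec_tokens)
    for label, cutoff in THRESHOLDS:
        if overlap >= cutoff:
            return (label, f"overlap={overlap:.2f}")
    return ("clean", f"overlap={overlap:.2f}")

print(classify_leakage("def add(a, b): return a + b", "def add(a, b): return a + b"))
print(classify_leakage("returns the sum of two integers", None))
```

Returning the reason alongside the label keeps a "clean" result that came from a real comparison distinguishable from one that was assigned because the source was missing.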

M7 — Binary-codegen model training

todo

1/4 done (25%)

○ M7-1 Decide first model: 7B fine-tune on clean_specs vs (source, .so basic block) alignment
✓ M7-2 Held-out split already carved (177 repos / 21,965 specs)
○ M7-3 Training harness
○ M7-4 Eval on held-out

M8 — Solidity/smart-contract antifunction mining (POC done 2026-04-16)

parked

1/3 done (33%)

✓ M8-1 Brute-force POC on Zeppelin fork (723 fn, 1,139 critical findings, 14ms)
○ M8-2 Decision: integrate as antifunction_engine layer 10, or spin out to separate repo
○ M8-3 If integrated: solidity_contracts PG table + contract_bruteforce job_kind

M9 — Commercial dataset export (Product 3)

todo

1/4 done (25%)

✓ M9-1 Antifunction engine exists (commercial_datasets/tools/antifunction_engine.py)
○ M9-2 Export antifunction_dataset.jsonl to commercial_datasets/v1/
○ M9-3 Dataset card (schema, license, intended use, limitations)
○ M9-4 License decision (CC-BY-SA vs dual commercial)

Non-goals

No fine-tuning at this stage (tracked as M7, not started)
No publishing to HuggingFace / arXiv yet (deferred until M6-7 and M6-8 complete)
No SQLite anywhere (PG-only)

Blockers

B2: 99.85% of specs are static (not gemma4), so they are structural rather than behavioral, which weakens the clean-room argument (refs: M6-8)
B4: No binary-codegen model trained; M7 fully untouched (refs: M7)
B5: Solidity POC orphaned; no integration decision (refs: M8-2)
B6: 826k Rust/Go static specs marked clean without real LCS (source_unavailable); the reason is stamped in leakage_examples.reason. Not a legal blocker (static specs are structural), but it limits what the audit proves about that slice. (refs: M6-7)
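
Each blocker entry carries an id, a text, and a list of refs. A minimal sketch of turning one such entry into a single readable line instead of printing the raw dictionary; the function name `format_blocker` is hypothetical and the key names are taken from the entries above.

```python
def format_blocker(blocker: dict) -> str:
    """Render one blocker as 'ID: text (refs: A, B)'; refs are optional."""
    refs = ", ".join(blocker.get("refs", []))
    line = f"{blocker['id']}: {blocker['text']}"
    return f"{line} (refs: {refs})" if refs else line

# Example entry modeled on B5 above.
example = {
    "id": "B5",
    "text": "Solidity POC orphaned; no integration decision",
    "refs": ["M8-2"],
}
print(format_blocker(example))
```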