tps.sh

D 58 completed
CLI Tool
web_app / python · small
75
Files
9,485
LOC
2
Frameworks
8
Languages

Pipeline State

completed
Run ID
#368935
Phase
done
Progress
1%
Started
Finished
2026-04-13 01:31:02
LLM tokens
0

Pipeline Metadata

Stage
Cataloged
Decision
proceed
Novelty
71.70
Framework unique
Isolation
Last stage change
2026-05-10 03:35:38
Deduplication group #52342
Member of a group with 3 similar repo(s) — canonical #86328
Top concepts (2)
Project Description · Web Frontend
Provenance: Repobility (https://repobility.com) — every score reproducible from /scan/

AI Prompt

Create a benchmarking tool called tps.sh using Python. This tool should benchmark local Ollama models against the Claude API on Apple Silicon. It needs to measure speed metrics like tokens per second (TPS) and Time To First Token (TTFT), as well as quality scores using Claude Sonnet. The system should run benchmarks across 21 coding prompts categorized into 7 groups. Include functionality to generate comparison reports, potentially as a PPTX file, and display results on a dashboard.
python web-app llm benchmarking ollama claude apple-silicon api
Generated by gemma4:latest
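The two speed metrics named in the prompt, tokens per second (TPS) and time to first token (TTFT), can both be derived from the timestamps of a streaming response. A minimal sketch of that computation (the function name and event format are hypothetical, not taken from the repository):

```python
def compute_stream_metrics(start, events):
    """Derive TTFT and TPS from a token stream.

    start:  timestamp (seconds) when the request was sent
    events: list of (arrival_timestamp, tokens_in_chunk) tuples,
            one per streamed chunk, in arrival order
    """
    if not events:
        return {"ttft": None, "tps": 0.0}
    # TTFT: delay until the first chunk arrives.
    ttft = events[0][0] - start
    # TPS: total tokens over total wall-clock generation time.
    total_tokens = sum(n for _, n in events)
    elapsed = events[-1][0] - start
    tps = total_tokens / elapsed if elapsed > 0 else float("inf")
    return {"ttft": ttft, "tps": tps}
```

With real backends, the timestamps would come from wrapping Ollama's or Anthropic's streaming iterators; Ollama's final response also reports `eval_count` and `eval_duration`, which give TPS directly.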

Catalog Information

tps.sh is a benchmarking tool for local Ollama models against the Claude API on Apple Silicon, measuring speed and quality across various coding prompts.

Description

tps.sh is a benchmarking tool that compares the performance of local Ollama models with the cloud-based Claude API. It measures speed (tokens per second) and quality (scored by Claude Sonnet) across 21 coding prompts in 7 categories. The results are displayed on a live site, allowing users to interactively compare the performance of different models.

Description (translated from Arabic)

This tool evaluates the performance of local Ollama models against the cloud-based Claude API on Apple Silicon, measuring speed and quality across 21 prompts in 7 categories. Results are displayed on a live site, allowing users to explore the performance of the different models.

Novelty

7/10

Tags

benchmarking model-comparison performance-evaluation coding-prompts quality-scoring

Technologies

anthropic matplotlib numpy pandas rich typer

Claude Models

claude-opus-4.6

Quality Score

D
57.8/100
Structure
45
Code Quality
84
Documentation
59
Testing
0
Practices
60
Security
100
Dependencies
60
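The letter grade appears to be derived from the 0–100 composite score. A sketch with assumed grade bands, chosen only to be consistent with the D shown for 57.8 (Repobility's actual thresholds are not given on this page):

```python
def letter_grade(score: float) -> str:
    # Assumed bands -- not the analyzer's published thresholds.
    bands = [(85, "A"), (70, "B"), (60, "C"), (50, "D")]
    for cutoff, grade in bands:
        if score >= cutoff:
            return grade
    return "F"
```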

Strengths

  • Code linting configured (eslint)
  • Good security practices — no major issues detected

Weaknesses

  • No LICENSE file — legal ambiguity for contributors
  • No tests found — high risk of regressions
  • No CI/CD configuration — manual testing and deployment
  • 1 file with critical complexity needs refactoring
  • 302 duplicate lines detected — consider DRY refactoring
  • 3 'god files' with >500 LOC need decomposition

Recommendations

  • Add a test suite — start with critical path integration tests
  • Set up CI/CD (GitHub Actions recommended) to automate testing and deployment
  • Add a LICENSE file (MIT recommended for open source)

Security & Health

12.6h
Tech Debt (C)
A
OWASP (100%)
PASS
Quality Gate
A
Risk (3)
Unknown
License
6.3%
Duplication
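The 6.3% duplication figure is a ratio of duplicated lines to total lines. A simplified stand-in for that kind of check, counting exact repeated lines (real duplication detectors typically hash multi-line blocks rather than single lines):

```python
from collections import Counter

def duplication_ratio(lines, min_len=10):
    """Percentage of meaningful lines that occur more than once.

    Simplified sketch: exact line matching after stripping whitespace,
    ignoring short lines that would produce trivial matches.
    """
    meaningful = [ln.strip() for ln in lines if len(ln.strip()) >= min_len]
    if not meaningful:
        return 0.0
    counts = Counter(meaningful)
    duplicated = sum(c for c in counts.values() if c > 1)
    return 100.0 * duplicated / len(meaningful)
```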

Languages

python 51.1%
html 27.6%
typescript 12.1%
yaml 5.7%
markdown 2.2%
json 0.9%
javascript 0.2%
text 0.1%

Frameworks

React Vite

Concepts (2)

Category | Name | Description | Confidence
auto_description | Project Description | Tokens Per Second — LLM Benchmark | 80%
auto_category | Web Frontend | web-frontend | 70%

Quality Timeline

1 quality score recorded.


Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/93137.svg)