Transformer From Scratch

C 64 completed
Library
unknown / cpp · small
Files: 58
LOC: 6,664
Frameworks: 0
Languages: 4

Pipeline State

completed
Run ID: #354760
Phase: done
Progress: 1%
Started:
Finished: 2026-04-13 01:31:02
LLM tokens: 0

Pipeline Metadata

Stage: Skipped
Decision: skip_scaffold_dup
Novelty: 42.74
Framework unique
Isolation
Last stage change: 2026-04-16 18:15:42
Deduplication group #48631
Member of a group with 1 similar repo; canonical #8906
Top concepts (2)
Project Description, Data/ML
Powered by Repobility — scan your code at https://repobility.com

AI Prompt

Create a minimal, complete Transformer/GPT-style model implemented entirely from scratch using modern C++ (C++17). I need the core components like `Tensor`, `Variable`, and a reverse-mode autograd system. Please include implementations for `MultiHeadAttention`, `LayerNorm`, `PositionalEncoding`, and the full `GPTModel`. The project should also feature a simple `BPE` tokenizer and utilities for training, such as a `DataLoader` and `Dataset`, to demonstrate end-to-end training and inference mechanics without relying on any deep learning frameworks.
cpp c++17 transformer gpt autograd machine-learning nlp from-scratch cpp-library
Generated by gemma4:latest
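
As a rough illustration of the reverse-mode autograd the prompt asks for, the sketch below builds a tiny computation graph over scalars and propagates gradients from the output back to the inputs. It is not taken from the repository: `Node`, `Var`, `make_var`, `add`, `mul`, and `backward` are hypothetical names, and the project's own `Variable` presumably wraps a `Tensor` rather than a single `double`.

```cpp
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical scalar graph node (illustration only, not the repository's Variable).
struct Node {
    double value = 0.0;
    double grad = 0.0;
    std::vector<std::shared_ptr<Node>> parents;
    std::function<void(Node&)> backward_fn;  // adds this node's grad into its parents
};
using Var = std::shared_ptr<Node>;

Var make_var(double v) {
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

Var add(const Var& a, const Var& b) {
    auto out = make_var(a->value + b->value);
    out->parents = {a, b};
    // d(a+b)/da = 1 and d(a+b)/db = 1
    out->backward_fn = [](Node& self) {
        self.parents[0]->grad += self.grad;
        self.parents[1]->grad += self.grad;
    };
    return out;
}

Var mul(const Var& a, const Var& b) {
    auto out = make_var(a->value * b->value);
    out->parents = {a, b};
    // d(a*b)/da = b and d(a*b)/db = a
    out->backward_fn = [](Node& self) {
        self.parents[0]->grad += self.parents[1]->value * self.grad;
        self.parents[1]->grad += self.parents[0]->value * self.grad;
    };
    return out;
}

// Post-order DFS: every node is appended after all of its parents.
void topo_sort(const Var& n, std::unordered_set<Node*>& seen, std::vector<Var>& order) {
    if (!seen.insert(n.get()).second) return;
    for (const auto& p : n->parents) topo_sort(p, seen, order);
    order.push_back(n);
}

// Seed the output gradient with 1 and walk the graph from output to inputs.
void backward(const Var& root) {
    std::unordered_set<Node*> seen;
    std::vector<Var> order;
    topo_sort(root, seen, order);
    root->grad = 1.0;
    for (auto it = order.rbegin(); it != order.rend(); ++it) {
        if ((*it)->backward_fn) (*it)->backward_fn(**it);
    }
}
```

For `z = add(mul(x, y), x)` with `x = 2` and `y = 3`, calling `backward(z)` accumulates `x->grad = y + 1` and `y->grad = x`. Gradients are accumulated with `+=` so that a value used in several places, like `x` here, receives the sum of all contributions.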

Catalog Information

This project demonstrates a complete Transformer/GPT-style model implemented entirely from scratch in modern C++ (C++17) to highlight systems-level understanding.

Description

The project is designed to showcase the implementation of a Transformer/GPT-style model without relying on deep learning frameworks or external numerical libraries. It includes tensors, autograd, attention, optimizer, and a tokenizer to demonstrate end-to-end training and inference mechanics. The code highlights systems-level understanding by implementing memory layout, numerics, backprop, and training loops in clean, readable C++.
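
To make the memory-layout and numerics points concrete, here is a minimal sketch of single-head causal attention over row-major buffers, with a softmax that subtracts the row maximum before exponentiating. Everything in it is an assumption for illustration: `Matrix`, `softmax_inplace`, and `causal_attention` are hypothetical names, and the repository's `MultiHeadAttention` may be organized quite differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix: element (r, c) is stored at data[r * cols + c].
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Softmax over the first n entries of row. Subtracting the maximum keeps
// exp() from overflowing when the scores are large.
void softmax_inplace(float* row, std::size_t n) {
    const float max_v = *std::max_element(row, row + n);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        row[i] = std::exp(row[i] - max_v);
        sum += row[i];
    }
    for (std::size_t i = 0; i < n; ++i) row[i] /= sum;
}

// out = softmax(Q K^T / sqrt(d)) V, where position t only attends to
// positions s <= t (the causal mask used by GPT-style decoders).
Matrix causal_attention(const Matrix& q, const Matrix& k, const Matrix& v) {
    const std::size_t T = q.rows;
    const std::size_t d = q.cols;
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    Matrix out(T, v.cols);
    std::vector<float> scores(T);
    for (std::size_t t = 0; t < T; ++t) {
        for (std::size_t s = 0; s <= t; ++s) {
            float dot = 0.0f;
            for (std::size_t i = 0; i < d; ++i) dot += q.at(t, i) * k.at(s, i);
            scores[s] = dot * scale;
        }
        softmax_inplace(scores.data(), t + 1);  // masked positions are never scored
        for (std::size_t s = 0; s <= t; ++s)
            for (std::size_t j = 0; j < v.cols; ++j)
                out.at(t, j) += scores[s] * v.at(s, j);
    }
    return out;
}
```

Implementing the mask by looping only over `s <= t` avoids filling a score matrix with negative infinity; a batched, multi-head version would more likely apply an explicit mask before the softmax.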

Description (translated from Arabic)

This project demonstrates a Transformer/GPT-style model implemented from scratch in modern C++ (C++17) to highlight systems-level understanding. The implementation includes tensors, autograd, attention, an optimizer, and a tokenizer to illustrate training and inference. The code shows systems-level understanding by implementing memory layout, numerics, backprop, and training loops in clean C++.

Novelty

7/10

Tags

transformer gpt-style-model from-scratch-implementation c++17 autograd attention optimizer tokenizer

Claude Models

claude-sonnet-4.5

Quality Score

Overall: C (64.0/100)
Structure: 63
Code Quality: 52
Documentation: 61
Testing: 50
Practices: 78
Security: 100
Dependencies: 60

Strengths

  • Consistent naming conventions (snake_case)
  • Good security practices: no major issues detected

Weaknesses

  • No LICENSE file: legal ambiguity for contributors
  • No CI/CD configuration: manual testing and deployment
  • 1689 duplicate lines detected: consider DRY refactoring
  • 3 'god files' with >500 LOC need decomposition

Recommendations

  • Set up CI/CD (GitHub Actions recommended) to automate testing and deployment
  • Add a linter configuration to enforce code style consistency
  • Add a LICENSE file (MIT recommended for open source)

Security & Health

Tech Debt (C): 7.1h
OWASP (100%): A
Quality Gate: PASS
Risk (3): A
License: Unknown
Duplication: 16.1%

Languages

cpp: 82.0%
c: 10.1%
text: 4.1%
markdown: 3.8%

Frameworks

None detected

Concepts (2)

| Category | Name | Description | Confidence |
| --- | --- | --- | --- |
| auto_description | Project Description | A minimal yet complete Transformer/GPT-style model implemented entirely from scratch in modern C++ (C++17). No deep learning frameworks or external numerical libraries; just raw tensors, autograd, attention, optimizer, and a tiny tokenizer to demonstrate end-to-end training and inference mechanics. | 80% |
| auto_category | Data/ML | data-ml | 70% |
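
The "tiny tokenizer" named in the project description is a BPE tokenizer according to the AI prompt. One training step of byte-pair encoding can be sketched as: count adjacent token pairs, pick the most frequent, and merge every occurrence into a single token. The names below (`Tokens`, `Pair`, `most_frequent_pair`, `merge_pair`) are hypothetical and not taken from the repository, and the tie-breaking rule is an arbitrary choice for this sketch.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Tokens = std::vector<std::string>;
using Pair = std::pair<std::string, std::string>;

// Returns the most frequent adjacent pair. Ties go to the lexicographically
// smallest pair because std::map iterates in sorted key order and only a
// strictly larger count replaces the current best.
Pair most_frequent_pair(const Tokens& toks) {
    std::map<Pair, int> counts;
    for (std::size_t i = 0; i + 1 < toks.size(); ++i)
        ++counts[Pair(toks[i], toks[i + 1])];
    Pair best;
    int best_count = 0;
    for (const auto& [pair, count] : counts) {
        if (count > best_count) {
            best = pair;
            best_count = count;
        }
    }
    return best;
}

// Replaces each non-overlapping occurrence of p (scanning left to right)
// with the concatenated token.
Tokens merge_pair(const Tokens& toks, const Pair& p) {
    Tokens out;
    for (std::size_t i = 0; i < toks.size();) {
        if (i + 1 < toks.size() && toks[i] == p.first && toks[i + 1] == p.second) {
            out.push_back(p.first + p.second);
            i += 2;
        } else {
            out.push_back(toks[i]);
            ++i;
        }
    }
    return out;
}
```

Starting from the tokens `a b a b c`, the most frequent pair is `(a, b)` and one merge yields `ab ab c`. A full tokenizer would repeat this step until a target vocabulary size is reached and record the merge order for use at encoding time.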

Quality Timeline

1 quality score recorded.

Embed Badge

Add to your README:

![Quality](https://repos.aljefra.com/badge/78882.svg)