- As a software architect working with Cloud AI and local LLMs, I felt a gap between using these powerful tools and truly understanding them. So, I decided to close that gap by building one myself, from the ground up, using only Python and PyTorch on my personal Linux laptop.
- My objective wasn't to compete with large-scale models. It was to demystify the "magic" of Generative AI.
- I wanted to build a Small Language Model (SLM) that was genuinely capable rather than a toy, yet highly specialized in a single domain.
- The Challenge: Could this be done on a standard CPU, without GPUs, by carefully managing the scope of knowledge, not the quality of the architecture?
- I created a character-level language model trained exclusively on the text of the Bhagavad Gita.
- This means its entire "world" and knowledge base is contained within this single, profound philosophical text.
- The result is not a generalist chatbot, but a deep expert, capable of generating text that is stylistically and thematically perfect for its domain.
My SLM's "brain" is a Transformer architecture, which works in a few key stages:
Stage 1: The Input - Turning Language into Numbers
- What it does: A simple tokenizer converts each character of a prompt (e.g., 'A', 'r', 'j', 'u', 'n', 'a') into a unique number. The model only ever sees these numbers.
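To make this concrete, here is a minimal sketch of a character-level tokenizer along these lines (the stand-in corpus string and the variable names are illustrative, not my exact code):

```python
# In my project the corpus is the full text of the Bhagavad Gita; a short
# stand-in string keeps this snippet self-contained.
text = "Arjuna said: You have a right to perform your prescribed duties"

chars = sorted(set(text))                     # every unique character becomes a token
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

def encode(s: str) -> list[int]:
    """Turn a string into a list of integer ids."""
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    """Turn a list of integer ids back into a string."""
    return "".join(itos[i] for i in ids)

print(encode("Arjuna"))          # a short list of integers -- this is all the model ever sees
print(decode(encode("Arjuna")))  # round-trips back to "Arjuna"
```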
Stage 2: The Embedding - Giving Numbers Meaning
- What it does: This layer acts like a rich dictionary, converting each number into a dense vector (a list of 128 numbers). This vector captures the character's "personality" and, crucially, its position in the sequence.
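In PyTorch, this stage boils down to two small lookup tables, one answering "which character is this?" and one answering "where does it sit in the sequence?". A simplified sketch; the 128-dimensional size matches the description above, while the vocabulary size and context length are assumed values:

```python
import torch
import torch.nn as nn

vocab_size = 91   # number of unique characters in the corpus (illustrative figure)
n_embd = 128      # size of each character's vector, as described above
block_size = 256  # maximum context length the model attends over (assumed value)

token_embedding = nn.Embedding(vocab_size, n_embd)     # identity: "which character am I?"
position_embedding = nn.Embedding(block_size, n_embd)  # position: "where am I in the sequence?"

idx = torch.randint(0, vocab_size, (1, 6))             # stand-in for an encoded prompt like "Arjuna"
tok = token_embedding(idx)                             # (1, 6, 128)
pos = position_embedding(torch.arange(idx.shape[1]))   # (6, 128), broadcast over the batch
x = tok + pos                                          # this is what the Transformer blocks receive
print(x.shape)                                         # torch.Size([1, 6, 128])
```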
Stage 3: The "Thinking" Core - The Transformer Blocks
- The Collaboration Phase (Self-Attention): This is the magic. Each character looks at all the previous characters in the sequence to understand its context. It's like people in a meeting listening to what's been said before they speak.
- The Introspection Phase (Feed-Forward Network): After gathering context, each character "thinks" individually to process what it has learned.
- My model stacks 6 of these "thinking" blocks, allowing it to understand deeper and more complex patterns with each layer.
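A simplified PyTorch sketch of one such block is below, using the library's built-in multi-head attention for brevity; the head count and other details here are assumptions, not necessarily my exact training configuration:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One "thinking" block: self-attention (collaboration) then a feed-forward network (introspection)."""

    def __init__(self, n_embd: int = 128, n_head: int = 4):  # 4 heads is an assumed value
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffwd = nn.Sequential(                # each position "thinks" on its own
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.shape[1]
        # Causal mask: a position may only attend to itself and earlier positions.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                          # residual connection around attention
        x = x + self.ffwd(self.ln2(x))            # residual connection around the feed-forward network
        return x

# Stack 6 blocks, as described above.
blocks = nn.Sequential(*[Block() for _ in range(6)])
out = blocks(torch.randn(1, 6, 128))              # (batch, time, n_embd) in, same shape out
```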
Stage 4: The Output - Turning Thought back into Language
- What it does: After passing through all the layers, the model generates a list of scores for every possible next character. It then uses these scores to predict the most likely next character, appends it to the sequence, and repeats the entire process, generating the text one character at a time.
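The generation loop itself is only a few lines. Here is a sketch of the idea; this version samples from the probabilities, a common variant of always taking the single most likely character, and it assumes model, encode, and decode behave like the earlier sketches:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx: torch.Tensor, max_new_tokens: int, block_size: int = 256) -> torch.Tensor:
    """Grow the sequence `idx` (shape: batch x time) one character at a time."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                    # never feed more than the context window
        logits = model(idx_cond)                           # (batch, time, vocab_size) scores
        logits = logits[:, -1, :]                          # only the last position's prediction matters
        probs = F.softmax(logits, dim=-1)                  # scores -> probabilities for the next character
        next_id = torch.multinomial(probs, num_samples=1)  # pick the next character
        idx = torch.cat([idx, next_id], dim=1)             # append it and repeat
    return idx

# Usage (hypothetical names): start from an encoded prompt and let the model continue it.
# prompt = torch.tensor([encode("He who is free from the grip of desire")])
# print(decode(generate(model, prompt, max_new_tokens=200)[0].tolist()))
```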
I tested the model with different kinds of prompts to see how it "thinks":
The Scholar: When given the first half of a real verse, it completed it with a thematically perfect (though not always identical) conclusion, proving it's not just a lookup table.
- Prompt: "You have a right to perform your prescribed duties, but"
- Result: "...but those soul Who tread the path celestial, worship Me With hearts unwandering..."
The Creative Poet: When given a brand new sentence in the right style, it generated a completely original and authentic-sounding passage.
- Prompt: "He who is free from the grip of desire"
- Result: "...Burned clean in act by the white fire of truth, The wise call that man wise; and such an one..."
The Oracle: When given a single core concept, it expounded on it in a way that demonstrated deep contextual understanding.
- Prompt: "karma"
- Result: "...He that, being self-contained, hath vanquished doubt, Disparting self from service, soul from works, Enlightened and emancipate, my Prince! Works fetter him no more!"
My key takeaways from this project:
- Feasibility: Building a powerful, custom language model from scratch on a CPU is absolutely possible if you scope the domain correctly.
- The Power of Specialization: A smaller, specialized model can outperform a giant, generalist model within its narrow field of expertise.
- Prompting is Pattern-Matching: I learned firsthand that prompting isn't about giving "commands." It's about providing the beginning of a statistical pattern that the model knows how to complete based on its training.
- The "Why" Matters: This journey from using to understanding has been invaluable for me as an architect and has fundamentally deepened my grasp of AI.