- As a software architect working with Cloud AI and local LLMs, I felt a gap between using these powerful tools and truly understanding them. So, I decided to close that gap by building one myself, from the ground up, using only Python and PyTorch on my personal Linux laptop.
- My objective wasn't to compete with large-scale models. It was to demystify the "magic" of Generative AI.
- I wanted to build a Small Language Model (SLM) that was genuinely capable rather than a toy, yet highly specialized in a single domain.
- The Challenge: Could this be done on a standard CPU, without GPUs, by carefully managing the scope of knowledge, not the quality of the architecture?
- I created a character-level language model trained exclusively on the text of the Bhagavad Gita.
- This means its entire "world" and knowledge base is contained within this single, profound philosophical text.
- The result is not a generalist chatbot, but a deep expert, capable of generating text that is stylistically and thematically perfect for its domain.
My SLM's "brain" is a Transformer architecture, which works in a few key stages:
Stage 1: The Input - Turning Language into Numbers
- What it does: A simple tokenizer converts each character of a prompt (e.g., 'A', 'r', 'j', 'u', 'n', 'a') into a unique number. The model only ever sees these numbers.
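To make this concrete, here is a minimal sketch of a character-level tokenizer along these lines (the stand-in corpus string and the variable names are illustrative, not my exact code):

```python
# In my project the corpus is the full text of the Bhagavad Gita; a short
# stand-in string keeps this snippet self-contained.
text = "Arjuna said: You have a right to perform your prescribed duties"

chars = sorted(set(text))                     # every unique character becomes a token
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

def encode(s: str) -> list[int]:
    """Turn a string into a list of integer ids."""
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    """Turn a list of integer ids back into a string."""
    return "".join(itos[i] for i in ids)

print(encode("Arjuna"))          # a short list of integers -- this is all the model ever sees
print(decode(encode("Arjuna")))  # round-trips back to "Arjuna"
```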
Stage 2: The Embedding - Giving Numbers Meaning
- What it does: This layer acts like a rich dictionary, converting each number into a dense vector (a list of 128 numbers). This vector captures the character's "personality" and, crucially, its position in the sequence.
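In PyTorch, this stage boils down to two small lookup tables, one answering "which character is this?" and one answering "where does it sit in the sequence?". A simplified sketch; the 128-dimensional size matches the description above, while the vocabulary size and context length are assumed values:

```python
import torch
import torch.nn as nn

vocab_size = 91   # number of unique characters in the corpus (illustrative figure)
n_embd = 128      # size of each character's vector, as described above
block_size = 256  # maximum context length the model attends over (assumed value)

token_embedding = nn.Embedding(vocab_size, n_embd)     # identity: "which character am I?"
position_embedding = nn.Embedding(block_size, n_embd)  # position: "where am I in the sequence?"

idx = torch.randint(0, vocab_size, (1, 6))             # stand-in for an encoded prompt like "Arjuna"
tok = token_embedding(idx)                             # (1, 6, 128)
pos = position_embedding(torch.arange(idx.shape[1]))   # (6, 128), broadcast over the batch
x = tok + pos                                          # this is what the Transformer blocks receive
print(x.shape)                                         # torch.Size([1, 6, 128])
```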
Stage 3: The "Thinking" Core - The Transformer Blocks
- The Collaboration Phase (Self-Attention): This is the magic. Each character looks at all the previous characters in the sequence to understand its context. It's like people in a meeting listening to what's been said before they speak.
- The Introspection Phase (Feed-Forward Network): After gathering context, each character "thinks" individually to process what it has learned.
- My model stacks 6 of these "thinking" blocks, allowing it to understand deeper and more complex patterns with each layer.
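A simplified PyTorch sketch of one such block is below, using the library's built-in multi-head attention for brevity; the head count and other details here are assumptions, not necessarily my exact training configuration:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One "thinking" block: self-attention (collaboration) then a feed-forward network (introspection)."""

    def __init__(self, n_embd: int = 128, n_head: int = 4):  # 4 heads is an assumed value
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffwd = nn.Sequential(                # each position "thinks" on its own
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.shape[1]
        # Causal mask: a position may only attend to itself and earlier positions.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                          # residual connection around attention
        x = x + self.ffwd(self.ln2(x))            # residual connection around the feed-forward network
        return x

# Stack 6 blocks, as described above.
blocks = nn.Sequential(*[Block() for _ in range(6)])
out = blocks(torch.randn(1, 6, 128))              # (batch, time, n_embd) in, same shape out
```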
Stage 4: The Output - Turning Thought back into Language
- What it does: After passing through all the layers, the model generates a list of scores for every possible next character. It then uses these scores to predict the most likely next character, appends it to the sequence, and repeats the entire process, generating the text one character at a time.
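The generation loop itself is only a few lines. Here is a sketch of the idea; this version samples from the probabilities, a common variant of always taking the single most likely character, and it assumes model, encode, and decode behave like the earlier sketches:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx: torch.Tensor, max_new_tokens: int, block_size: int = 256) -> torch.Tensor:
    """Grow the sequence `idx` (shape: batch x time) one character at a time."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                    # never feed more than the context window
        logits = model(idx_cond)                           # (batch, time, vocab_size) scores
        logits = logits[:, -1, :]                          # only the last position's prediction matters
        probs = F.softmax(logits, dim=-1)                  # scores -> probabilities for the next character
        next_id = torch.multinomial(probs, num_samples=1)  # pick the next character
        idx = torch.cat([idx, next_id], dim=1)             # append it and repeat
    return idx

# Usage (hypothetical names): start from an encoded prompt and let the model continue it.
# prompt = torch.tensor([encode("He who is free from the grip of desire")])
# print(decode(generate(model, prompt, max_new_tokens=200)[0].tolist()))
```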
I tested the model with different kinds of prompts to see how it "thinks":
The Scholar: When given the first half of a real verse, it completed it with a thematically perfect (though not always identical) conclusion, proving it's not just a lookup table.
- Prompt: "You have a right to perform your prescribed duties, but"
- Result: "...but those soul Who tread the path celestial, worship Me With hearts unwandering..."
The Creative Poet: When given a brand new sentence in the right style, it generated a completely original and authentic-sounding passage.
- Prompt: "He who is free from the grip of desire"
- Result: "...Burned clean in act by the white fire of truth, The wise call that man wise; and such an one..."
The Oracle: When given a single core concept, it expounded on it in a way that demonstrated deep contextual understanding.
- Prompt: "karma"
- Result: "...He that, being self-contained, hath vanquished doubt, Disparting self from service, soul from works, Enlightened and emancipate, my Prince! Works fetter him no more!"
My key takeaways from this project:
- Feasibility: Building a powerful, custom language model from scratch on a CPU is absolutely possible if you scope the domain correctly.
- The Power of Specialization: A smaller, specialized model can outperform a giant, generalist model within its narrow field of expertise.
- Prompting is Pattern-Matching: I learned firsthand that prompting isn't about giving "commands." It's about providing the beginning of a statistical pattern that the model knows how to complete based on its training.
- The "Why" Matters: This journey from using to understanding has been invaluable for me as an architect and has fundamentally deepened my grasp of AI.