Learning Hardware Design Through Practice
A progressive learning framework for RTL development using open-source tools
Documentation Index - Complete guide to all documentation, organized by type
RTL Design Sherpa guides you through digital hardware design with hands-on learning from first principles.
We start with fundamental building blocks (adders, multipliers, FIFOs), progress to protocol-specific modules (AXI, DMA engines), and culminate in complete FPGA-ready systems. Every module is both educational and production-quality - meeting real timing and resource constraints.
What makes RTL Design Sherpa different:
- From scratch: Python generators → SystemVerilog → synthesis. No black boxes, every design decision explained.
- Safety net for exploration: Comprehensive test suites at every level (unit, integration, formal) let you experiment with confidence. Try different optimizations; the tests catch regressions.
- Performance-driven: Multiple implementations of key modules, with measured area/speed tradeoffs. SimPy models predict behavior before writing RTL.
- Industry practices: Open-source tools (cocotb, Verilator, Yosys) demonstrating verification methodologies used in production.
- Complete transparency: Build systems, Makefiles, debugging sessions - all the "hidden knowledge" made visible.
Whether you're learning your first Verilog module or optimizing a high-speed interconnect, RTL Design Sherpa provides the detailed explanations, working examples, and verification infrastructure to build understanding from the ground up.
graph TD
L1[Level 1: Common Building Blocks<br/>224 modules] --> L2[Level 2: AMBA Protocol Infrastructure<br/>124 modules]
L2 --> L3[Level 3: Integration Examples]
L3 --> L4[Level 4: Production Components<br/>10+ components]
L4 --> L5[Level 5: Complete FPGA Projects]
L1 -.- L1D[Counters, FIFOs, Arbiters<br/>Math, Floating-Point, Data Integrity]
L2 -.- L2D[APB, AXI4, AXI4-Lite<br/>AXI-Stream, AMBA5 protocols]
L3 -.- L3D[CDC Counter Display<br/>APB Crossbar, Bridges]
L4 -.- L4D[STREAM, RAPIDS, Bridge<br/>Converters, Retro Legacy Blocks]
L5 -.- L5D[NexysA7 FPGA Projects<br/>Full SoC designs]
- Component Projects Index - All production-ready components
- Documentation Index - Complete documentation guide
- Common Library (224 modules) - Documentation - AI Guide
- Counters, FIFOs, arbiters, integer math, floating-point (BF16/FP16/FP32/FP8), data integrity
- AMBA Infrastructure (124 modules) - Documentation - AI Guide
- APB, AXI4, AXI4-Lite, AXI-Stream, AMBA5 protocols
| Component | Status | Description |
|---|---|---|
| STREAM | Ready | Tutorial DMA engine with scatter-gather |
| RAPIDS | In Progress | Advanced DMA with network interfaces |
| Bridge | Ready | AXI protocol bridges and converters |
| Converters | Ready | UART-to-AXI4-Lite, protocol conversion |
| APB Crossbar | Ready | M×N APB interconnect |
| Retro Legacy | Ready | HPET, PIC, PIT, RTC, UART, GPIO, etc. |
| Delta | Planned | Network-on-Chip mesh |
| HIVE | Planned | Distributed RISC-V control |
- Common Tests - Unit tests for common modules
- AMBA Tests - Protocol compliance tests
- TBClasses - Project-specific testbench classes (local)
- cocotb-framework (PyPI) - Reusable verification components (BFMs, scoreboards)
- Source: RTLDesignSherpa-DV
- Install: pip install cocotb-framework (included in requirements.txt)
- RTL Generators - Math circuits, floating-point modules
- Documentation Tools - md_to_docx.py, header management
Location: rtl/common/ | Documentation: Full Index | AI Guide
Learn fundamental RTL design patterns through 224 reusable modules:
- Counters: Binary, Gray code, Johnson, Ring, Load/Clear variants
- Adders: Han-Carlson prefix adders (16/22/32/44/48/72-bit), Brent-Kung
- Multipliers: Dadda 4:2 compressor trees (8/11/24-bit)
- Math: Leading zeros, bit reversal, parity, CRC
- BF16: Adder, multiplier, FMA, reciprocal, division, square root
- FP16 (IEEE 754): Complete arithmetic suite
- FP32 (IEEE 754): Adder, multiplier, FMA
- FP8 (E4M3/E5M2): ML-optimized formats
- Converters: Cross-format conversion between FP32, FP16, BF16, and FP8
- FIFOs: Synchronous, asynchronous, dual-clock domain
- Shift Registers: LFSR (Fibonacci/Galois), universal shifters
- Memory: CAM (Content Addressable Memory), buffers
- Arbiters: Round-robin (simple, weighted, PWM), priority encoders
- Encoders/Decoders: Priority encoding, address decoding
- Clock Management: Dividers, gate control, pulse generation
- Reset: Synchronizers, CDC utilities
- CRC Engines: Generic CRC supporting 300+ standards
- ECC: Hamming code (SECDED), parity checkers
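A generic CRC engine like the one listed above is usually described by a handful of parameters (width, polynomial, initial value, input/output reflection, final XOR). The following is a minimal Python reference model of that parameterization, the kind of golden model a scoreboard could compare RTL output against. It is a sketch only: the function names and the parameter dictionary are illustrative assumptions, not the repository's API, and it assumes a CRC width of at least 8 bits.

```python
import binascii


def reflect(value: int, nbits: int) -> int:
    """Reverse the bit order of an nbits-wide value."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc_bitwise(data: bytes, width: int, poly: int, init: int,
                refin: bool, refout: bool, xorout: int) -> int:
    """Bit-serial CRC over `data` (hypothetical helper, width >= 8 assumed)."""
    mask = (1 << width) - 1
    top_bit = 1 << (width - 1)
    crc = init & mask
    for byte in data:
        if refin:
            byte = reflect(byte, 8)
        crc ^= byte << (width - 8)          # align the byte with the CRC MSBs
        for _ in range(8):
            if crc & top_bit:
                crc = ((crc << 1) ^ poly) & mask
            else:
                crc = (crc << 1) & mask
    if refout:
        crc = reflect(crc, width)
    return (crc ^ xorout) & mask


# Parameter set for the widely used CRC-32 (the one zlib/binascii implement).
CRC32 = dict(width=32, poly=0x04C11DB7, init=0xFFFFFFFF,
             refin=True, refout=True, xorout=0xFFFFFFFF)

# Cross-check the model against Python's built-in CRC-32.
assert crc_bitwise(b"123456789", **CRC32) == binascii.crc32(b"123456789")
```

Changing only the parameter dictionary selects a different standard, which mirrors how a single parameterized RTL module can cover many CRC variants.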
Example Module: counter_bin.sv
// Simple binary counter - foundation for timers, state machines
module counter_bin #(
parameter WIDTH = 8
) (
input logic i_clk,
input logic i_rst_n,
input logic i_enable,
output logic [WIDTH-1:0] o_count
);
Tests: val/common/ - Every module has comprehensive CocoTB tests
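To make the counter's intended behavior concrete, here is a small Python behavioral model matching the `counter_bin` port list. The source shows only the module header, so the behavior is an assumption: an active-low reset (suggested by the `i_rst_n` name) that clears the count, an increment on each enabled clock edge, and wrap-around at 2^WIDTH. The class name `CounterBinModel` is hypothetical.

```python
class CounterBinModel:
    """Cycle-level reference model for a WIDTH-bit binary counter (assumed behavior)."""

    def __init__(self, width: int = 8):
        self.width = width
        self.mask = (1 << width) - 1
        self.count = 0

    def step(self, rst_n: int, enable: int) -> int:
        """Advance one clock edge and return the value of o_count afterwards."""
        if not rst_n:
            self.count = 0                              # reset has priority over enable
        elif enable:
            self.count = (self.count + 1) & self.mask   # wrap at 2**width
        return self.count


# Usage: reset once, then count 20 enabled cycles on a 4-bit counter.
model = CounterBinModel(width=4)
model.step(rst_n=0, enable=0)
values = [model.step(rst_n=1, enable=1) for _ in range(20)]
```

A testbench could step such a model alongside the DUT and compare `o_count` every cycle.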
Location: rtl/amba/ | Documentation: Full Index | AI Guide
Apply common building blocks to implement industry-standard protocols (124 modules):
- APB Masters - Command/response interfaces with FIFO buffering
- APB Slaves - Register interfaces with address decoding
- APB Interconnect - Multi-master/multi-slave crossbar
- APB Bridges - Protocol conversion, CDC
Example: APB register slave demonstrates parameter-driven design
apb_slave #(
.ADDR_WIDTH(12),
.DATA_WIDTH(32)
) u_apb_slave (
.pclk, .presetn, .paddr, .psel, .penable, .pwrite,
.pwdata, .pready, .prdata, .pslverr
);
- AXI4 Masters - Read/write with dual skid buffers
- AXI4 Slaves - Response generation, address decoding
- AXI4 Infrastructure - FIFOs, skid buffers, arbiters
- Monitoring - Protocol compliance checkers
- AXI4-Lite Masters - Register-optimized masters
- AXI4-Lite Slaves - Configuration registers
- Protocol Bridges - Conversion between APB and AXI-Lite
- Stream Masters/Slaves - Streaming interfaces
- Flow Control - Backpressure, buffering
- Sideband Support - TID, TDEST, TUSER, TSTRB
- GAXI Buffers - Generic skid buffers, FIFOs, CDC
- Monitors - Transaction monitoring, performance analysis
- Arbiters - Advanced arbitration for monitor buses
Tests: val/amba/ - Protocol compliance and integration tests
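The register-slave pattern mentioned above (address decoding in front of a bank of registers) can be sketched as a Python reference model. This is an illustration under stated assumptions, not the behavior of the repository's `apb_slave`: the class name, the `num_regs` parameter, and the choice to raise `pslverr` on unaligned or out-of-range addresses are all hypothetical.

```python
class ApbRegisterModel:
    """Behavioral model of a small APB register bank (illustrative only)."""

    def __init__(self, addr_width: int = 12, data_width: int = 32, num_regs: int = 4):
        self.addr_mask = (1 << addr_width) - 1
        self.data_mask = (1 << data_width) - 1
        self.bytes_per_reg = data_width // 8
        self.regs = [0] * num_regs

    def access(self, paddr: int, pwrite: bool, pwdata: int = 0):
        """Perform one transfer and return (prdata, pslverr)."""
        paddr &= self.addr_mask
        # Address decode: byte address -> register index plus byte offset.
        index, offset = divmod(paddr, self.bytes_per_reg)
        if offset != 0 or index >= len(self.regs):
            return 0, 1                      # assumed policy: flag an error
        if pwrite:
            self.regs[index] = pwdata & self.data_mask
            return 0, 0
        return self.regs[index], 0


# Usage: write register 1 (byte address 0x004), then read it back.
bank = ApbRegisterModel()
bank.access(0x004, pwrite=True, pwdata=0xDEADBEEF)
readback, error = bank.access(0x004, pwrite=False)
```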
Locations: rtl/integ_common/ | rtl/integ_amba/
Practice integrating multiple modules into working systems:
- CDC Counter Display - Cross clock domain counter with display logic
- Multi-Clock Systems - Demonstrate CDC techniques
Example: CDC Counter Display
Clock Domain A (Fast)          Clock Domain B (Slow)
    Counter      →     CDC     →     Display
    @ 100MHz           Sync          @ 10MHz
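One common way to move a multi-bit counter value across clock domains is to Gray-encode it first, so that only one bit changes per increment and a synchronizer can never sample a wildly inconsistent value. The source does not state which scheme this example uses, so the sketch below is a general illustration of the technique; the function names are hypothetical.

```python
def bin_to_gray(value: int) -> int:
    """Convert a binary value to its reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Convert a Gray-coded value back to binary."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def single_bit_steps(width: int) -> bool:
    """Check that consecutive counts (including wrap-around) differ in exactly one bit."""
    size = 1 << width
    for i in range(size):
        diff = bin_to_gray(i) ^ bin_to_gray((i + 1) % size)
        if bin(diff).count("1") != 1:
            return False
    return True


# Usage: a 4-bit counter value 7 -> 8 flips four bits in binary but one in Gray.
binary_flips = bin(7 ^ 8).count("1")
gray_flips = bin(bin_to_gray(7) ^ bin_to_gray(8)).count("1")
```

The single-bit property is what makes a simple two-flop synchronizer per bit safe for this kind of counter.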
- APB Crossbar - Multi-master to multi-slave interconnect
- 1-to-1, 1-to-4, 2-to-1, 2-to-4 configurations
- Address decoding, weighted arbitration
- APB Bridges - Protocol conversion examples
- AXI Systems - Multi-component integration
Tests: val/integ_common/ | val/integ_amba/
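When several masters target the same slave, a crossbar needs an arbiter to pick one request per transfer. The sketch below models plain round-robin arbitration in Python as a reference for that idea. It is an assumption-laden illustration: the class name is hypothetical, requests are passed as a bitmask, and the weighted variant mentioned above is not modeled.

```python
from typing import Optional


class RoundRobinArbiter:
    """Reference model: grant the next requesting client after the last winner."""

    def __init__(self, num_clients: int):
        self.num_clients = num_clients
        self.last_grant = num_clients - 1   # so client 0 is first in line

    def grant(self, requests: int) -> Optional[int]:
        """Return the index of the granted client, or None if nobody requests."""
        for offset in range(1, self.num_clients + 1):
            candidate = (self.last_grant + offset) % self.num_clients
            if (requests >> candidate) & 1:
                self.last_grant = candidate
                return candidate
        return None


# Usage: two masters (bits 0 and 1) requesting continuously share the slave fairly.
arbiter = RoundRobinArbiter(num_clients=2)
grants = [arbiter.grant(0b11) for _ in range(4)]
```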
Location: projects/components/ | Documentation: Component Index
Build complete, production-ready peripherals for FPGA deployment (10+ components):
| Component | Status | Description |
|---|---|---|
| STREAM | Ready | Tutorial DMA with 8 channels, scatter-gather, APB config |
| RAPIDS | In Progress | Advanced DMA with alignment fixup, network TX/RX, credit flow |

| Component | Status | Description |
|---|---|---|
| APB Crossbar | Ready | Parametric M×N APB interconnect with round-robin arbitration |
| Bridge | Ready | AXI4 protocol bridges, width converters, CDC |
| Converters | Ready | UART-to-AXI4-Lite, protocol conversion bridges |

Status: Production Ready | Location: projects/components/retro_legacy_blocks/
Collection of 9 legacy/retro peripherals with full APB interfaces:
| Peripheral | Description |
|---|---|
| HPET | High Precision Event Timer (2/3/8 timers, 64-bit) |
| GPIO | General Purpose I/O with interrupts |
| UART 16550 | Full 16550-compatible UART |
| 8259 PIC | Programmable Interrupt Controller |
| 8254 PIT | Programmable Interval Timer |
| RTC | Real-Time Clock |
| SMBUS | System Management Bus controller |
| PM/ACPI | Power Management / ACPI support |
| IOAPIC | I/O Advanced PIC |
Documentation: Block Status | PRD
| Component | Status | Description |
|---|---|---|
| Delta | Planned | 4×4 Network-on-Chip mesh with virtual channels |
| HIVE | Planned | Distributed RISC-V control (VexRiscv + 16 SERV monitors) |
| BCH | Planned | BCH error correction encoder/decoder |
Planned: Full SoC designs combining all levels:
- Simple SoC: APB HPET + Memory + UART
- DMA System: RAPIDS DMA + Multi-bank memory
- Communication Hub: Ethernet MAC + DMA + Buffers
- Processing Subsystem: Custom accelerators + Interconnect
Every module demonstrates professional verification practices:
Test Structure:
# Reusable testbench class (in bin/TBClasses/)
class ModuleTB(TBBase):
    def __init__(self, dut):
        super().__init__(dut)
        self.setup_drivers()
        self.setup_monitors()
        self.setup_scoreboards()

    async def setup_clocks_and_reset(self):
        """Standard clock and reset initialization"""

    async def write_register(self, addr, data):
        """Protocol-specific register write"""

# Test suite (organized by level)
class ModuleBasicTests:
    async def test_register_access(self): ...
    async def run_all_basic_tests(self): ...

class ModuleMediumTests:
    async def test_complex_scenario(self): ...
    async def run_all_medium_tests(self): ...

class ModuleFullTests:
    async def test_stress(self): ...
    async def run_all_full_tests(self): ...

Test Hierarchy:
- Basic Tests - Register access, reset behavior, simple operations
- Medium Tests - Complex features, multi-component interactions
- Full Tests - Stress testing, CDC, edge cases
Test Configuration (conftest.py):
- Auto-creates logs directory
- Registers pytest markers (basic, medium, full)
- Preserves all logs
- Parametrized test fixtures
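The first two items above can be sketched with pytest's standard `pytest_configure` hook. This is a minimal illustration, not the repository's actual conftest.py: placing the `logs` directory under the pytest root and the marker description strings are assumptions.

```python
from pathlib import Path

# Marker names taken from the three test levels described above.
TEST_LEVELS = ("basic", "medium", "full")


def pytest_configure(config):
    """pytest startup hook: create the log directory and register level markers."""
    # Assumption: logs live in a "logs" directory under the pytest root.
    log_dir = Path(str(config.rootpath)) / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)

    # Registering markers avoids "unknown marker" warnings for -m basic/medium/full.
    for level in TEST_LEVELS:
        config.addinivalue_line("markers", f"{level}: {level}-level tests")
```

With markers registered this way, a test decorated with `@pytest.mark.basic` is selected by `pytest -m basic`.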
Running Tests:
# Run all tests for a module
pytest val/common/test_counter_bin.py -v
# Run specific test level
pytest val/amba/ -v -m basic # Basic tests only
pytest val/amba/ -v -m medium # Medium tests only
pytest val/amba/ -v -m full # Full tests only
# Run component tests (example: Retro Legacy Blocks)
pytest projects/components/retro_legacy_blocks/dv/tests/hpet/ -v
- Verilator - High-performance RTL simulator
- Supports SystemVerilog
- VCD/FST waveform generation
- Fast execution for large designs
- GTKWave - Waveform viewer
- Pre-configured signal groups
- Professional visualization
- Verible - SystemVerilog tools
- Linting and style checking
- Code formatting
- Parsing and analysis
- CocoTB - Python-based testbench framework
- Intuitive Python test writing
- Full SystemVerilog integration
- Extensive protocol libraries
- pytest - Test runner and framework
- Test discovery and execution
- Parametrized testing
- Rich reporting
- Custom VIP - Verification IP for protocols
- APB, AXI4, AXI4-Lite, AXI-Stream drivers/monitors
- Scoreboards and coverage collectors
- PeakRDL - SystemRDL tools
- Register file generation from specifications
- APB4, AXI4-Lite interface generation
- C header generation
- Documentation generation
- Python 3.8+ - Scripting and automation
- Code generation (math circuits, register files)
- Analysis tools (dependency, UML)
- Documentation generation (Wavedrom)
- Make - Build automation
- Git - Version control with CI/CD integration
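As an illustration of the Python code-generation role listed above, the sketch below emits SystemVerilog text for a parameterized counter from a Python function. It is a toy under stated assumptions, not one of the repository's generators: the function name, the default module name `counter_bin_gen`, and the asynchronous active-low reset style are all hypothetical choices.

```python
def generate_counter(name: str = "counter_bin_gen", width: int = 8) -> str:
    """Return SystemVerilog source for a simple enabled binary counter."""
    return f"""\
// Auto-generated by a Python script; edit the generator, not this file.
module {name} #(
    parameter WIDTH = {width}
) (
    input  logic             i_clk,
    input  logic             i_rst_n,
    input  logic             i_enable,
    output logic [WIDTH-1:0] o_count
);
    // Assumed style: asynchronous active-low reset, increment when enabled.
    always_ff @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n)      o_count <= '0;
        else if (i_enable) o_count <= o_count + 1'b1;
    end
endmodule
"""


# Usage: generate a 16-bit variant and write it wherever the build expects it.
source_text = generate_counter(name="counter16", width=16)
```

Keeping the generator in Python means the same script can later emit several widths or architectures for area/speed comparison.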
rtldesignsherpa/
├── rtl/                          # RTL source code (350+ modules)
│   ├── common/                   # 224 building blocks (counters, math, FP, etc.)
│   ├── amba/                     # 124 AMBA protocol modules
│   │   ├── apb/                  # APB protocol
│   │   ├── axi4/                 # AXI4 full protocol
│   │   ├── axil4/                # AXI4-Lite
│   │   ├── axis/                 # AXI4-Stream
│   │   ├── axi5/                 # AMBA5 components
│   │   ├── apb5/                 # APB5 protocol
│   │   ├── gaxi/                 # Generic AXI infrastructure
│   │   └── shared/               # Shared utilities (CDC, monitors)
│   ├── integ_common/             # Common integration examples
│   └── integ_amba/               # AMBA integration examples
│
├── projects/                     # Component projects (10+)
│   ├── components/
│   │   ├── stream/               # STREAM DMA engine
│   │   ├── rapids/               # RAPIDS DMA engine
│   │   ├── bridge/               # Protocol bridges
│   │   ├── converters/           # UART-to-AXI4-Lite, etc.
│   │   ├── apb_xbar/             # APB crossbar
│   │   ├── retro_legacy_blocks/  # 9 legacy peripherals
│   │   ├── delta/                # Network-on-Chip (planned)
│   │   ├── hive/                 # RISC-V control (planned)
│   │   └── bch/                  # BCH ECC (planned)
│   └── NexysA7/                  # FPGA projects
│
├── val/                          # Validation/Test suites
│   ├── common/                   # Common module tests
│   └── amba/                     # AMBA protocol tests
│
├── bin/                          # Tools and automation
│   ├── CocoTBFramework/          # Testbench infrastructure (200+ files)
│   ├── rtl_generators/           # RTL code generators
│   │   ├── bf16/                 # BF16 floating-point generators
│   │   ├── ieee754/              # IEEE 754 FP generators
│   │   └── verilog/              # Generic RTL generators
│   ├── md_to_docx.py             # Documentation generator
│   └── update_doc_headers.py     # Header management
│
├── docs/                         # Documentation
│   ├── markdown/                 # Technical documentation
│   │   ├── RTLCommon/            # Common library docs
│   │   ├── RTLAmba/              # AMBA library docs
│   │   ├── CocoTBFramework/      # Framework docs
│   │   └── projects/             # Component docs
│   └── DOCUMENTATION_INDEX.md    # Master doc index
│
├── CLAUDE.md                     # Repository AI guide
└── README.md → docs/markdown/overview.md   # This file (symlink)
1. Install Prerequisites:
# Ubuntu/Debian
sudo apt update
sudo apt install -y verilator gtkwave python3 python3-pip git make
# Fedora/RHEL
sudo dnf install -y verilator gtkwave python3 python3-pip git make
# macOS (via Homebrew)
brew install verilator gtkwave python3 git make
2. Install Python Dependencies:
pip3 install cocotb pytest cocotb-test
pip3 install peakrdl peakrdl-regblock # For register generation
3. Clone Repository:
git clone https://github.com/yourusername/rtldesignsherpa.git
cd rtldesignsherpa
# Run basic counter test
pytest val/common/test_counter_bin.py -v
# View waveforms (after test generates VCD)
gtkwave val/common/local_sim_build/test_counter_bin/dump.vcd
# Run APB slave tests
pytest val/amba/test_apb_slave.py -v
# Run only basic tests
pytest val/amba/test_apb_slave.py -v -m basic
# Run 2-to-4 crossbar test
pytest val/integ_amba/test_apb_xbar.py -v -k "2to4"
# Run HPET tests from Retro Legacy Blocks collection
pytest projects/components/retro_legacy_blocks/dv/tests/hpet/ -v
# Run specific test
pytest projects/components/retro_legacy_blocks/dv/tests/hpet/test_apb_hpet.py -v
Level 1 - Common Modules:
- Common Library PRD - Requirements and specifications
- Common CLAUDE Guide - AI-assisted development
- Common Tests - Example test patterns
Level 2 - AMBA Protocols:
- AMBA Infrastructure PRD - Protocol specifications
- AMBA CLAUDE Guide - Implementation patterns
- AMBA Tests - Protocol compliance tests
Level 3 - Integration:
- Integration Examples - Working multi-module designs
- Integration Tests - System-level verification
Level 4 - Components:
- Component Index - All components
- Component Overview - Design patterns
- Retro Legacy Blocks - Legacy peripheral collection
- HPET Specification - Complete HPET guide
Standards:
- AMBA Specifications - ARM protocols
- SystemRDL 2.0 - Register specification
Tools:
- CocoTB Documentation - Verification framework
- Verilator Manual - Simulator guide
- PeakRDL Docs - Register generation
Books Referenced:
- Advanced FPGA Design by Steve Kilts
- Synthesis of Arithmetic Circuits by Deschamps, Bioul, Sutter
1. Design the Module (choose your level):
// rtl/common/my_module.sv (Level 1)
// or
// rtl/amba/my_protocol.sv (Level 2)
module my_module #(
parameter WIDTH = 8
) (
input logic i_clk,
input logic i_rst_n,
// ... ports
);
2. Create Testbench:
# bin/TBClasses/{subsystem}/my_module_tb.py
class MyModuleTB(TBBase):
    def __init__(self, dut):
        super().__init__(dut)

    async def setup_clocks_and_reset(self):
        # Clock and reset initialization
        pass

# bin/TBClasses/{subsystem}/my_module_tests_basic.py
class MyModuleBasicTests:
    async def test_basic_functionality(self):
        # Test implementation
        pass
3. Create Test Runner:
# val/{subsystem}/test_my_module.py
import cocotb
import pytest
from cocotb_test.simulator import run
from TBClasses.{subsystem}.my_module_tb import MyModuleTB
from TBClasses.{subsystem}.my_module_tests_basic import MyModuleBasicTests

@cocotb.test()
async def my_module_test(dut):
    tb = MyModuleTB(dut)
    await tb.setup_clocks_and_reset()
    tests = MyModuleBasicTests(tb)
    result = await tests.run_all_basic_tests()
    assert result

@pytest.mark.parametrize("width", [8, 16, 32])
def test_my_module(request, width):
    run(verilog_sources=[...], parameters={'WIDTH': width}, ...)
4. Run Tests:
pytest val/{subsystem}/test_my_module.py -v
5. Document:
- Add to subsystem PRD.md
- Update CLAUDE.md with patterns
- Create examples in documentation
Current Status:
- Common Library: >95% line coverage, >90% branch coverage
- AMBA Protocols: >95% line coverage, 100% protocol compliance
- APB HPET: 5/6 configurations at 100% (12 tests each)
- Integration: Full system-level verification
- 224 Common Modules - Counters, FIFOs, arbiters, math, floating-point
- 124 AMBA Modules - APB, AXI4, AXI4-Lite, AXI-Stream, AMBA5
- 10+ Production Components - DMA engines, bridges, legacy peripherals
- 350+ Total RTL Modules - Complete verification infrastructure
Modules have been characterized across FPGA technologies:
| Category | Fmax Range | Use Cases |
|---|---|---|
| Basic Logic | 100-800 MHz | Counters, registers, control |
| Advanced Math | 200-600 MHz | DSP, arithmetic operations |
| Protocol Masters/Slaves | 200-500 MHz | APB, AXI interfaces |
| Integration Examples | 100-400 MHz | Multi-module systems |
| Production Components | 100-200 MHz | Complete peripherals |
We welcome contributions at all levels:
- Level 1-2: New building blocks or protocol modules
- Level 3: Integration examples and use cases
- Level 4: Production components
- Level 5: Complete FPGA projects
Guidelines:
- Follow existing module structure and naming
- Include comprehensive CocoTB tests (3-level hierarchy)
- Document in PRD.md and CLAUDE.md
- Achieve >95% test coverage
- Provide integration examples
- University Courses: Complete RTL design curriculum
- Self-Learning: Progressive path from basics to production
- Industry Preparation: Professional verification practices
- IP Development: Starting point for commercial IP
- Prototyping: Rapid hardware proof-of-concept
- Tool Evaluation: Open-source vs. commercial comparison
- Cost-Effective Development: No expensive EDA licenses
- Team Training: Standardized practices and workflows
- IP Portfolio: Foundation for valuable hardware assets
- STREAM DMA - Tutorial DMA engine complete
- Bridge components - AXI4 width converters, CDC bridges complete
- Retro Legacy Blocks - 9 peripherals with MAS documentation complete
- RAPIDS DMA - Advanced DMA in progress
- Floating-Point - FP32 FMA, additional converters in progress
- Delta Network-on-Chip mesh implementation
- HIVE distributed RISC-V control
- BCH error correction codec
- NexysA7 FPGA integration examples
- Complete SoC reference designs
- PCIe/Ethernet/USB controllers
- Formal verification integration
- ASIC synthesis flow examples
RTL Design Sherpa believes that:
- Learning by Doing - The best way to learn hardware design is to build real circuits
- Progressive Complexity - Start simple, build up systematically
- Verification First - Quality comes from comprehensive testing
- Open Source - Knowledge should be accessible to everyone
- Industry Practices - Teach real-world professional techniques
The journey from a simple counter to a complete DMA engine teaches not just RTL, but the entire hardware development process.
[Your License Here]
- GitHub Issues: [Report issues or request features]
- Documentation: [Link to docs]
- Community: [Link to discussions/forum]
RTL Design Sherpa: Guiding you from first principles to production-ready hardware design.
