sean-galloway/RTLDesignSherpa

RTL Design Sherpa

Learning Hardware Design Through Practice

A progressive learning framework for RTL development using open-source tools

📚 Documentation Index - Complete guide to all documentation, organized by type


Project Mission

RTL Design Sherpa guides you through digital hardware design with hands-on learning from first principles.

We start with fundamental building blocks (adders, multipliers, FIFOs), progress to protocol-specific modules (AXI, DMA engines), and culminate in complete FPGA-ready systems. Every module is both educational and production-quality - meeting real timing and resource constraints.

What makes RTL Design Sherpa different:

  • From scratch: Python generators → SystemVerilog → synthesis. No black boxes, every design decision explained.

  • Safety net for exploration: Comprehensive test suites at every level (unit, integration, formal) let you experiment with confidence. Try different optimizations - the tests catch regressions.

  • Performance-driven: Multiple implementations of key modules, with measured area/speed tradeoffs. SimPy models predict behavior before writing RTL.

  • Industry practices: Open-source tools (cocotb, Verilator, Yosys) demonstrating verification methodologies used in production.

  • Complete transparency: Build systems, Makefiles, debugging sessions - all the "hidden knowledge" made visible.

Whether you're learning your first Verilog module or optimizing a high-speed interconnect, RTL Design Sherpa provides the detailed explanations, working examples, and verification infrastructure to build understanding from the ground up.

Learning Path

graph TD
    L1[Level 1: Common Building Blocks<br/>224 modules] --> L2[Level 2: AMBA Protocol Infrastructure<br/>124 modules]
    L2 --> L3[Level 3: Integration Examples]
    L3 --> L4[Level 4: Production Components<br/>10+ components]
    L4 --> L5[Level 5: Complete FPGA Projects]

    L1 -.- L1D[Counters, FIFOs, Arbiters<br/>Math, Floating-Point, Data Integrity]
    L2 -.- L2D[APB, AXI4, AXI4-Lite<br/>AXI-Stream, AMBA5 protocols]
    L3 -.- L3D[CDC Counter Display<br/>APB Crossbar, Bridges]
    L4 -.- L4D[STREAM, RAPIDS, Bridge<br/>Converters, Retro Legacy Blocks]
    L5 -.- L5D[NexysA7 FPGA Projects<br/>Full SoC designs]

Quick Navigation

📚 Documentation

🏗️ RTL Building Blocks

🎯 Component Projects

| Component | Status | Description |
|---|---|---|
| STREAM | ✅ Ready | Tutorial DMA engine with scatter-gather |
| RAPIDS | 🟡 In Progress | Advanced DMA with network interfaces |
| Bridge | ✅ Ready | AXI protocol bridges and converters |
| Converters | ✅ Ready | UART-to-AXI4-Lite, protocol conversion |
| APB Crossbar | ✅ Ready | M×N APB interconnect |
| Retro Legacy | ✅ Ready | HPET, PIC, PIT, RTC, UART, GPIO, etc. |
| Delta | 📋 Planned | Network-on-Chip mesh |
| HIVE | 📋 Planned | Distributed RISC-V control |

🧪 Verification

🛠️ Tools


Progressive Learning Approach

Level 1: Common Building Blocks (Foundation)

Location: rtl/common/ | Documentation: Full Index | AI Guide

Learn fundamental RTL design patterns through 224 reusable modules:

Integer Arithmetic (44+ modules)

  • Counters: Binary, Gray code, Johnson, Ring, Load/Clear variants
  • Adders: Han-Carlson prefix adders (16/22/32/44/48/72-bit), Brent-Kung
  • Multipliers: Dadda 4:2 compressor trees (8/11/24-bit)
  • Math: Leading zeros, bit reversal, parity, CRC
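
As a rough illustration of how parallel-prefix adders like those listed above compute their carries, the following bit-vector model may help. It uses the Kogge-Stone prefix pattern rather than Han-Carlson or Brent-Kung, simply because it is the shortest to write down. The function name `prefix_add` is invented for this sketch and is not taken from the repository.

```python
from typing import Tuple


def prefix_add(a: int, b: int, width: int) -> Tuple[int, int]:
    """Add two unsigned width-bit values; return (sum, carry_out).

    Illustrative Kogge-Stone style model (an assumption of this sketch,
    not the repository's Han-Carlson or Brent-Kung implementation).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b        # per-bit generate
    p = a ^ b        # per-bit propagate
    half_sum = p     # original propagate bits, needed for the final XOR
    d = 1
    while d < width:
        # Merge each bit's group with the group d positions below it.
        g = g | (p & ((g << d) & mask))
        p = p & ((p << d) & mask)
        d *= 2
    carries = (g << 1) & mask            # carry into bit i comes from bits i-1..0
    total = half_sum ^ carries
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out
```

A model like this can serve as a scoreboard reference: for any inputs, `prefix_add(a, b, width)` should agree with ordinary integer addition truncated to `width` bits.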

Floating-Point (120+ modules)

  • BF16: Adder, multiplier, FMA, reciprocal, division, square root
  • FP16 (IEEE 754): Complete arithmetic suite
  • FP32 (IEEE 754): Adder, multiplier, FMA
  • FP8 (E4M3/E5M2): ML-optimized formats
  • Converters: Cross-format conversion (FP32↔FP16↔BF16↔FP8)
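
To make the FP32-to-BF16 relationship concrete, here is a small Python reference model. BF16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 single, so conversion is a 16-bit truncation plus rounding. The function names are hypothetical, and round-to-nearest-even is an assumption of this sketch; the repository's converters may round differently.

```python
import struct


def fp32_to_bf16(value: float) -> int:
    """Return the 16-bit BF16 pattern for value (round to nearest even)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Truncate and force a mantissa bit so the result stays a NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Bias depends on the LSB that survives, which gives ties-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_to_fp32(pattern: int) -> float:
    """Expand a BF16 pattern back to a Python float (exact, no rounding)."""
    bits = (pattern & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, `fp32_to_bf16(1.0)` yields `0x3F80`, and `bf16_to_fp32(0x4049)` yields `3.140625`, the closest BF16 value to pi.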

Data Structures

  • FIFOs: Synchronous, asynchronous, dual-clock domain
  • Shift Registers: LFSR (Fibonacci/Galois), universal shifters
  • Memory: CAM (Content Addressable Memory), buffers
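
For readers new to LFSRs, a minimal Python model of the Fibonacci form is sketched below. The function names and the tap convention (bit indices XORed together to form the new MSB of a right-shifting register) are choices made for this example only and are not the repository's interface.

```python
from typing import Tuple


def fibonacci_lfsr_step(state: int, taps: Tuple[int, ...], width: int) -> int:
    """Advance a right-shifting Fibonacci LFSR by one clock."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> tap) & 1
    # Shift right and insert the feedback bit at the top.
    return (state >> 1) | (feedback << (width - 1))


def lfsr_period(seed: int, taps: Tuple[int, ...], width: int) -> int:
    """Count clocks until the register returns to its seed value."""
    state = fibonacci_lfsr_step(seed, taps, width)
    count = 1
    while state != seed:
        state = fibonacci_lfsr_step(state, taps, width)
        count += 1
    return count
```

With a 4-bit register and taps `(0, 1)`, `lfsr_period(0b0001, (0, 1), 4)` returns 15, the maximum for four bits, since the all-zero state is excluded.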

Control Logic

  • Arbiters: Round-robin (simple, weighted, PWM), priority encoders
  • Encoders/Decoders: Priority encoding, address decoding
  • Clock Management: Dividers, gate control, pulse generation
  • Reset: Synchronizers, CDC utilities
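
The simple round-robin policy can be sketched as a behavioral model in a few lines of Python. This is an illustration of the general technique under assumed names (`RoundRobinArbiter`, `arbitrate`), not a description of the repository's arbiter modules, and it omits the weighted and PWM variants.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Behavioral model: the client after the last winner has top priority."""

    def __init__(self, clients: int) -> None:
        self.clients = clients
        # Start so that client 0 has the highest priority after reset.
        self.last_grant = clients - 1

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Return the index granted this cycle, or None if nobody requests."""
        for offset in range(1, self.clients + 1):
            candidate = (self.last_grant + offset) % self.clients
            if requests[candidate]:
                self.last_grant = candidate
                return candidate
        return None
```

When every client requests continuously, the grants cycle 0, 1, 2, ... in order, so no requester can be starved.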

Data Integrity

  • CRC Engines: Generic CRC supporting 300+ standards
  • ECC: Hamming code (SECDED), parity checkers
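
A generic CRC engine is usually described by a handful of parameters: width, polynomial, initial value, input/output reflection, and a final XOR. The bit-serial Python model below shows how those parameters interact. The parameter names follow common CRC catalogue conventions and are assumptions of this sketch; they may not match the repository's parameter names.

```python
def reflect(value: int, width: int) -> int:
    """Reverse the bit order of a width-bit value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc(data: bytes, width: int, poly: int, init: int,
        refin: bool, refout: bool, xorout: int) -> int:
    """Bit-serial CRC over data for any width of at least 8 bits."""
    if width < 8:
        raise ValueError("this sketch only handles width >= 8")
    mask = (1 << width) - 1
    top_bit = 1 << (width - 1)
    reg = init & mask
    for byte in data:
        if refin:
            byte = reflect(byte, 8)
        reg ^= byte << (width - 8)      # feed the byte in at the MSB end
        for _ in range(8):
            if reg & top_bit:
                reg = ((reg << 1) ^ poly) & mask
            else:
                reg = (reg << 1) & mask
    if refout:
        reg = reflect(reg, width)
    return reg ^ xorout
```

Using the standard check string, `crc(b"123456789", 32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF)` gives `0xCBF43926`, the published CRC-32 check value.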

Example Module: counter_bin.sv

// Simple binary counter - foundation for timers, state machines
module counter_bin #(
    parameter WIDTH = 8
) (
    input  logic             i_clk,
    input  logic             i_rst_n,
    input  logic             i_enable,
    output logic [WIDTH-1:0] o_count
);
    // ... implementation omitted in this excerpt; see counter_bin.sv in rtl/common/
endmodule

Tests: val/common/ - Every module has comprehensive CocoTB tests


Level 2: AMBA Protocol Infrastructure

Location: rtl/amba/ | Documentation: Full Index | AI Guide

Apply common building blocks to implement industry-standard protocols (124 modules):

APB (Advanced Peripheral Bus)

Example: APB register slave demonstrates parameter-driven design

apb_slave #(
    .ADDR_WIDTH(12),
    .DATA_WIDTH(32)
) u_apb_slave (
    .pclk, .presetn, .paddr, .psel, .penable, .pwrite,
    .pwdata, .pready, .prdata, .pslverr
);
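
To show what the signals in that port list do during a transfer, here is a cycle-level Python model of a zero-wait-state APB register slave. An APB transfer has a setup phase (`psel` high, `penable` low) followed by an access phase (`psel` and `penable` both high) that completes when `pready` is high. The class and helper names are hypothetical, and the model does not claim to reproduce the behavior of the repository's `apb_slave`.

```python
from typing import Dict, Tuple


class ApbRegisterSlaveModel:
    """Zero-wait-state APB register file (illustrative sketch only)."""

    def __init__(self, addr_width: int = 12, data_width: int = 32) -> None:
        self.addr_mask = (1 << addr_width) - 1
        self.data_mask = (1 << data_width) - 1
        self.regs: Dict[int, int] = {}

    def clock(self, psel: int, penable: int, pwrite: int,
              paddr: int, pwdata: int = 0) -> Tuple[int, int]:
        """Evaluate one PCLK cycle and return (pready, prdata)."""
        if not (psel and penable):
            return 0, 0                      # idle or setup phase
        addr = paddr & self.addr_mask
        if pwrite:
            self.regs[addr] = pwdata & self.data_mask
            return 1, 0
        return 1, self.regs.get(addr, 0)


def apb_write(slave: ApbRegisterSlaveModel, addr: int, data: int) -> None:
    slave.clock(1, 0, 1, addr, data)         # setup phase
    pready, _ = slave.clock(1, 1, 1, addr, data)  # access phase
    assert pready == 1


def apb_read(slave: ApbRegisterSlaveModel, addr: int) -> int:
    slave.clock(1, 0, 0, addr)               # setup phase
    pready, prdata = slave.clock(1, 1, 0, addr)   # access phase
    assert pready == 1
    return prdata
```

Each transfer therefore takes at least two clock cycles; a slave that needs more time would hold `pready` low during the access phase.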

AXI4 Full Protocol

AXI4-Lite (Simplified Register Interface)

AXI4-Stream (High-Throughput Data)

Shared Infrastructure

  • GAXI Buffers - Generic skid buffers, FIFOs, CDC
  • Monitors - Transaction monitoring, performance analysis
  • Arbiters - Advanced arbitration for monitor buses

Tests: val/amba/ - Protocol compliance and integration tests


Level 3: Integration Examples

Locations: rtl/integ_common/ | rtl/integ_amba/

Practice integrating multiple modules into working systems:

Simple Integrations (integ_common)

  • CDC Counter Display - Cross clock domain counter with display logic
  • Multi-Clock Systems - Demonstrate CDC techniques

Example: CDC Counter Display

Clock Domain A (Fast)    Clock Domain B (Slow)
    Counter      →  CDC  →    Display
   @ 100MHz         Sync      @ 10MHz
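
One common way to move a multi-bit counter across clock domains is to Gray-encode it first, so that only one bit changes per increment and a synchronizer in the slow domain samples either the old or the new value. Whether this particular example uses Gray coding is not stated here, so treat the Python sketch below as background on the technique; the helper names are invented for illustration.

```python
def bin_to_gray(value: int) -> int:
    """Convert a binary count to reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Convert reflected Gray code back to binary."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def max_bits_changed_per_step(width: int) -> int:
    """Worst-case bit flips between consecutive Gray codes, including wrap."""
    size = 1 << width
    return max(
        bits_changed(bin_to_gray(i), bin_to_gray((i + 1) % size))
        for i in range(size)
    )
```

For an 8-bit counter, `max_bits_changed_per_step(8)` is 1, whereas a plain binary counter flips all eight bits when it wraps from 255 to 0.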

Protocol Integrations (integ_amba)

  • APB Crossbar - Multi-master to multi-slave interconnect
    • 1-to-1, 1-to-4, 2-to-1, 2-to-4 configurations
    • Address decoding, weighted arbitration
  • APB Bridges - Protocol conversion examples
  • AXI Systems - Multi-component integration

Tests: val/integ_common/ | val/integ_amba/


Level 4: Production Components

Location: projects/components/ | Documentation: Component Index

Build complete, production-ready peripherals for FPGA deployment (10+ components):

DMA Engines

| Component | Status | Description |
|---|---|---|
| STREAM | ✅ Ready | Tutorial DMA with 8 channels, scatter-gather, APB config |
| RAPIDS | 🟡 In Progress | Advanced DMA with alignment fixup, network TX/RX, credit flow |

Interconnect and Bridges

| Component | Status | Description |
|---|---|---|
| APB Crossbar | ✅ Ready | Parametric M×N APB interconnect with round-robin arbitration |
| Bridge | ✅ Ready | AXI4 protocol bridges, width converters, CDC |
| Converters | ✅ Ready | UART-to-AXI4-Lite, protocol conversion bridges |

Retro Legacy Blocks

Status: ✅ Production Ready | Location: projects/components/retro_legacy_blocks/

Collection of 9 legacy/retro peripherals with full APB interfaces:

| Peripheral | Description |
|---|---|
| HPET | High Precision Event Timer (2/3/8 timers, 64-bit) |
| GPIO | General Purpose I/O with interrupts |
| UART 16550 | Full 16550-compatible UART |
| 8259 PIC | Programmable Interrupt Controller |
| 8254 PIT | Programmable Interval Timer |
| RTC | Real-Time Clock |
| SMBUS | System Management Bus controller |
| PM/ACPI | Power Management / ACPI support |
| IOAPIC | I/O Advanced PIC |

Documentation: Block Status | PRD

Future Components

| Component | Status | Description |
|---|---|---|
| Delta | 📋 Planned | 4×4 Network-on-Chip mesh with virtual channels |
| HIVE | 📋 Planned | Distributed RISC-V control (VexRiscv + 16 SERV monitors) |
| BCH | 📋 Planned | BCH error correction encoder/decoder |

Level 5: Complete FPGA Projects (Future)

Planned: Full SoC designs combining all levels:

  • Simple SoC: APB HPET + Memory + UART
  • DMA System: RAPIDS DMA + Multi-bank memory
  • Communication Hub: Ethernet MAC + DMA + Buffers
  • Processing Subsystem: Custom accelerators + Interconnect

Verification Methodology

CocoTB-Based Testing

Every module demonstrates professional verification practices:

Test Structure:

# Reusable testbench class (in bin/TBClasses/)
class ModuleTB(TBBase):
    def __init__(self, dut):
        super().__init__(dut)
        self.setup_drivers()
        self.setup_monitors()
        self.setup_scoreboards()

    async def setup_clocks_and_reset(self):
        """Standard clock and reset initialization"""

    async def write_register(self, addr, data):
        """Protocol-specific register write"""

# Test suite (organized by level)
class ModuleBasicTests:
    async def test_register_access(self): ...
    async def run_all_basic_tests(self): ...

class ModuleMediumTests:
    async def test_complex_scenario(self): ...
    async def run_all_medium_tests(self): ...

class ModuleFullTests:
    async def test_stress(self): ...
    async def run_all_full_tests(self): ...

Test Hierarchy:

  1. Basic Tests - Register access, reset behavior, simple operations
  2. Medium Tests - Complex features, multi-component interactions
  3. Full Tests - Stress testing, CDC, edge cases

Test Configuration (conftest.py):

  • Auto-creates logs directory
  • Registers pytest markers (basic, medium, full)
  • Preserves all logs
  • Parametrized test fixtures
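
The first two conftest.py responsibilities listed above can be sketched as follows. `pytest_configure`, `config.addinivalue_line`, and `config.rootpath` are standard pytest hooks and attributes (`rootpath` needs pytest 6.1 or later); the `MARKERS` table, the `ensure_log_dir` helper, and the `logs` directory location are assumptions made for this example and may differ from the repository's actual conftest.py.

```python
import os

# Marker descriptions taken from the test hierarchy; wording is illustrative.
MARKERS = {
    "basic": "register access, reset behavior, simple operations",
    "medium": "complex features, multi-component interactions",
    "full": "stress testing, CDC, edge cases",
}


def ensure_log_dir(base_dir: str) -> str:
    """Create <base_dir>/logs if it does not exist and return its path."""
    log_dir = os.path.join(base_dir, "logs")
    os.makedirs(log_dir, exist_ok=True)
    return log_dir


def pytest_configure(config):
    """pytest hook: register the level markers and prepare the log directory."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
    ensure_log_dir(str(config.rootpath))
```

Registering the markers this way is what allows selections such as `-m basic` to run without unknown-marker warnings.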

Running Tests:

# Run all tests for a module
pytest val/common/test_counter_bin.py -v

# Run specific test level
pytest val/amba/ -v -m basic      # Basic tests only
pytest val/amba/ -v -m medium     # Medium tests only
pytest val/amba/ -v -m full       # Full tests only

# Run component tests (example: Retro Legacy Blocks)
pytest projects/components/retro_legacy_blocks/dv/tests/hpet/ -v

Technology Stack

Core Tools (All Open-Source)

Simulation and Analysis

  • Verilator - High-performance RTL simulator
    • Supports SystemVerilog
    • VCD/FST waveform generation
    • Fast execution for large designs
  • GTKWave - Waveform viewer
    • Pre-configured signal groups
    • Professional visualization
  • Verible - SystemVerilog tools
    • Linting and style checking
    • Code formatting
    • Parsing and analysis

Verification Framework

  • CocoTB - Python-based testbench framework
    • Intuitive Python test writing
    • Full SystemVerilog integration
    • Extensive protocol libraries
  • pytest - Test runner and framework
    • Test discovery and execution
    • Parametrized testing
    • Rich reporting
  • Custom VIP - Verification IP for protocols
    • APB, AXI4, AXI4-Lite, AXI-Stream drivers/monitors
    • Scoreboards and coverage collectors

Register Generation

  • PeakRDL - SystemRDL tools
    • Register file generation from specifications
    • APB4, AXI4-Lite interface generation
    • C header generation
    • Documentation generation

Development and Automation

  • Python 3.8+ - Scripting and automation
    • Code generation (math circuits, register files)
    • Analysis tools (dependency, UML)
    • Documentation generation (Wavedrom)
  • Make - Build automation
  • Git - Version control with CI/CD integration
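
As a small taste of Python-driven code generation, the sketch below emits a SystemVerilog counter from a template. The port list mirrors the counter_bin.sv excerpt shown earlier in this README, but the `generate_counter` function and the module body it emits are assumptions written for illustration; the repository's generators in bin/rtl_generators/ are more elaborate.

```python
def generate_counter(name: str = "counter_bin", default_width: int = 8) -> str:
    """Return SystemVerilog source for a simple enable-gated binary counter."""
    # The always_ff body below is an illustrative guess, not the repo's code.
    return f"""\
module {name} #(
    parameter WIDTH = {default_width}
) (
    input  logic             i_clk,
    input  logic             i_rst_n,
    input  logic             i_enable,
    output logic [WIDTH-1:0] o_count
);
    always_ff @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n)      o_count <= '0;
        else if (i_enable) o_count <= o_count + 1'b1;
    end
endmodule
"""


if __name__ == "__main__":
    # Print a 16-bit default-width variant to stdout.
    print(generate_counter("counter_bin", 16))
```

Keeping the template in Python makes it easy to sweep parameters or emit families of related modules from one description.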

Repository Structure

rtldesignsherpa/
├── rtl/                          # RTL source code (350+ modules)
│   ├── common/                   # 224 building blocks (counters, math, FP, etc.)
│   ├── amba/                     # 124 AMBA protocol modules
│   │   ├── apb/                 # APB protocol
│   │   ├── axi4/                # AXI4 full protocol
│   │   ├── axil4/               # AXI4-Lite
│   │   ├── axis/                # AXI4-Stream
│   │   ├── axi5/                # AMBA5 components
│   │   ├── apb5/                # APB5 protocol
│   │   ├── gaxi/                # Generic AXI infrastructure
│   │   └── shared/              # Shared utilities (CDC, monitors)
│   ├── integ_common/            # Common integration examples
│   └── integ_amba/              # AMBA integration examples
│
├── projects/                     # Component projects (10+)
│   ├── components/
│   │   ├── stream/              # STREAM DMA engine
│   │   ├── rapids/              # RAPIDS DMA engine
│   │   ├── bridge/              # Protocol bridges
│   │   ├── converters/          # UART-to-AXI4-Lite, etc.
│   │   ├── apb_xbar/            # APB crossbar
│   │   ├── retro_legacy_blocks/ # 9 legacy peripherals
│   │   ├── delta/               # Network-on-Chip (planned)
│   │   ├── hive/                # RISC-V control (planned)
│   │   └── bch/                 # BCH ECC (planned)
│   └── NexysA7/                 # FPGA projects
│
├── val/                          # Validation/Test suites
│   ├── common/                  # Common module tests
│   └── amba/                    # AMBA protocol tests
│
├── bin/                          # Tools and automation
│   ├── CocoTBFramework/         # Testbench infrastructure (200+ files)
│   ├── rtl_generators/          # RTL code generators
│   │   ├── bf16/                # BF16 floating-point generators
│   │   ├── ieee754/             # IEEE 754 FP generators
│   │   └── verilog/             # Generic RTL generators
│   ├── md_to_docx.py            # Documentation generator
│   └── update_doc_headers.py    # Header management
│
├── docs/                         # Documentation
│   ├── markdown/                # Technical documentation
│   │   ├── RTLCommon/           # Common library docs
│   │   ├── RTLAmba/             # AMBA library docs
│   │   ├── CocoTBFramework/     # Framework docs
│   │   └── projects/            # Component docs
│   └── DOCUMENTATION_INDEX.md   # Master doc index
│
├── CLAUDE.md                     # Repository AI guide
└── README.md → docs/markdown/overview.md  # This file (symlink)

Getting Started

Installation

1. Install Prerequisites:

# Ubuntu/Debian
sudo apt update
sudo apt install -y verilator gtkwave python3 python3-pip git make

# Fedora/RHEL
sudo dnf install -y verilator gtkwave python3 python3-pip git make

# macOS (via Homebrew)
brew install verilator gtkwave python3 git make

2. Install Python Dependencies:

pip3 install cocotb pytest cocotb-test
pip3 install peakrdl peakrdl-regblock  # For register generation

3. Clone Repository:

git clone https://github.com/sean-galloway/RTLDesignSherpa.git
cd RTLDesignSherpa

Quick Start Examples

Level 1: Test a Simple Counter

# Run basic counter test
pytest val/common/test_counter_bin.py -v

# View waveforms (after test generates VCD)
gtkwave val/common/local_sim_build/test_counter_bin/dump.vcd

Level 2: Test APB Slave

# Run APB slave tests
pytest val/amba/test_apb_slave.py -v

# Run only basic tests
pytest val/amba/test_apb_slave.py -v -m basic

Level 3: Test APB Crossbar Integration

# Run 2-to-4 crossbar test
pytest val/integ_amba/test_apb_xbar.py -v -k "2to4"

Level 4: Test Retro Legacy Block Component

# Run HPET tests from Retro Legacy Blocks collection
pytest projects/components/retro_legacy_blocks/dv/tests/hpet/ -v

# Run specific test
pytest projects/components/retro_legacy_blocks/dv/tests/hpet/test_apb_hpet.py -v

Learning Resources

Documentation by Level

Level 1 - Common Modules:

Level 2 - AMBA Protocols:

Level 3 - Integration:

Level 4 - Components:

External References

Standards:

Tools:

Books Referenced:

  • Advanced FPGA Design by Steve Kilts
  • Synthesis of Arithmetic Circuits by Deschamps, Bioul, Sutter

Development Workflow

Creating a New Module

1. Design the Module (choose your level):

// rtl/common/my_module.sv (Level 1)
// or
// rtl/amba/my_protocol.sv (Level 2)
module my_module #(
    parameter WIDTH = 8
) (
    input  logic             i_clk,
    input  logic             i_rst_n,
    // ... ports
);
    // ... module logic
endmodule

2. Create Testbench:

# bin/TBClasses/{subsystem}/my_module_tb.py
class MyModuleTB(TBBase):
    def __init__(self, dut):
        super().__init__(dut)

    async def setup_clocks_and_reset(self):
        # Clock and reset initialization
        pass

# bin/TBClasses/{subsystem}/my_module_tests_basic.py
class MyModuleBasicTests:
    async def test_basic_functionality(self):
        # Test implementation
        pass

3. Create Test Runner:

# val/{subsystem}/test_my_module.py
import cocotb
import pytest
from cocotb_test.simulator import run

from TBClasses.{subsystem}.my_module_tb import MyModuleTB
from TBClasses.{subsystem}.my_module_tests_basic import MyModuleBasicTests

@cocotb.test()
async def my_module_test(dut):
    tb = MyModuleTB(dut)
    await tb.setup_clocks_and_reset()
    tests = MyModuleBasicTests(tb)
    result = await tests.run_all_basic_tests()
    assert result

@pytest.mark.parametrize("width", [8, 16, 32])
def test_my_module(request, width):
    run(verilog_sources=[...], parameters={'WIDTH': width}, ...)

4. Run Tests:

pytest val/{subsystem}/test_my_module.py -v

5. Document:

  • Add to subsystem PRD.md
  • Update CLAUDE.md with patterns
  • Create examples in documentation

Performance and Quality

Test Coverage

Current Status:

  • Common Library: >95% line coverage, >90% branch coverage
  • AMBA Protocols: >95% line coverage, 100% protocol compliance
  • APB HPET: 5/6 configurations at 100% (12 tests each)
  • Integration: Full system-level verification

Module Counts

  • 224 Common Modules - Counters, FIFOs, arbiters, math, floating-point
  • 124 AMBA Modules - APB, AXI4, AXI4-Lite, AXI-Stream, AMBA5
  • 10+ Production Components - DMA engines, bridges, legacy peripherals
  • 350+ Total RTL Modules - Complete verification infrastructure

Synthesis Results

Modules have been characterized across FPGA technologies:

| Category | Fmax Range | Use Cases |
|---|---|---|
| Basic Logic | 100-800 MHz | Counters, registers, control |
| Advanced Math | 200-600 MHz | DSP, arithmetic operations |
| Protocol Masters/Slaves | 200-500 MHz | APB, AXI interfaces |
| Integration Examples | 100-400 MHz | Multi-module systems |
| Production Components | 100-200 MHz | Complete peripherals |

Contributing

We welcome contributions at all levels:

  • Level 1-2: New building blocks or protocol modules
  • Level 3: Integration examples and use cases
  • Level 4: Production components
  • Level 5: Complete FPGA projects

Guidelines:

  • Follow existing module structure and naming
  • Include comprehensive CocoTB tests (3-level hierarchy)
  • Document in PRD.md and CLAUDE.md
  • Achieve >95% test coverage
  • Provide integration examples

Use Cases

Educational

  • University Courses: Complete RTL design curriculum
  • Self-Learning: Progressive path from basics to production
  • Industry Preparation: Professional verification practices

Professional

  • IP Development: Starting point for commercial IP
  • Prototyping: Rapid hardware proof-of-concept
  • Tool Evaluation: Open-source vs. commercial comparison

Startup/Small Teams

  • Cost-Effective Development: No expensive EDA licenses
  • Team Training: Standardized practices and workflows
  • IP Portfolio: Foundation for valuable hardware assets

Roadmap

Current Focus

  • ✅ STREAM DMA - Tutorial DMA engine complete
  • ✅ Bridge components - AXI4 width converters, CDC bridges complete
  • ✅ Retro Legacy Blocks - 9 peripherals with MAS documentation
  • 🟡 RAPIDS DMA - Advanced DMA in progress
  • 🟡 Floating-Point - FP32 FMA, additional converters

Near-Term

  • Delta Network-on-Chip mesh implementation
  • HIVE distributed RISC-V control
  • BCH error correction codec
  • NexysA7 FPGA integration examples

Long-Term

  • Complete SoC reference designs
  • PCIe/Ethernet/USB controllers
  • Formal verification integration
  • ASIC synthesis flow examples

Project Philosophy

RTL Design Sherpa is built on five principles:

  1. Learning by Doing - The best way to learn hardware design is to build real circuits
  2. Progressive Complexity - Start simple and build up systematically
  3. Verification First - Quality comes from comprehensive testing
  4. Open Source - Knowledge should be accessible to everyone
  5. Industry Practices - Teach real-world professional techniques

The journey from a simple counter to a complete DMA engine teaches not just RTL, but the entire hardware development process.


License

[Your License Here]


Contact and Support

  • GitHub Issues: [Report issues or request features]
  • Documentation: [Link to docs]
  • Community: [Link to discussions/forum]

RTL Design Sherpa: Guiding you from first principles to production-ready hardware design.

About

This site is, hopefully, a springboard for others to learn about coding in SystemVerilog and experimenting with FPGAs.
