
Architecture

zwabbit edited this page Mar 15, 2016 · 11 revisions

The compute unit is composed of the following logical components, each performing one step in executing instructions. Each component is complex enough that modification and extension are tricky. The following pages attempt to explain in detail how each module performs its duties and how those duties interact with the rest of the pipeline. Please note that the guide does not attempt a file-by-file walkthrough of the project. It does, however, highlight the important modules and any technical points that warrant attention.

[Fetch] - The fetch unit retrieves instructions from the instruction memory and also acts as the interface between the compute unit and the outside world.

Wavepool - The wavepool tracks the progress of instruction execution for up to 40 wavefronts running in the compute unit.

Decode - The decode module takes in an instruction provided to it by the wavepool and determines what resources are needed for it to execute.

[Issue] - The issue unit determines when the resources an instruction requires are available and execution can proceed. It is easily the most complicated piece in MIAOW, and unless you have an explicit need to modify the scheduling system, it is best left alone. Seriously. Here be dragons. Also, the fewer people that ask about it, the longer I can put off trying to document it.

[Exec] - The exec unit generates an execution mask that indicates which of the 64 threads in a wavefront are actually supposed to be executing.
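To make the idea of an execution mask concrete, here is a minimal Python sketch. It is only a software model, not MIAOW's implementation: the function names are hypothetical, and the convention that bit `i` of the mask corresponds to thread `i` is an assumption made for illustration.

```python
WAVEFRONT_SIZE = 64  # threads per wavefront, as stated above


def make_exec_mask(active_threads):
    """Build a 64-bit mask with bit i set when thread i should execute.

    The bit-i-equals-thread-i layout is an assumption for this sketch.
    """
    mask = 0
    for tid in active_threads:
        if not 0 <= tid < WAVEFRONT_SIZE:
            raise ValueError(f"thread id {tid} out of range")
        mask |= 1 << tid
    return mask


def is_active(mask, tid):
    """Return True when thread tid is enabled in the mask."""
    return (mask >> tid) & 1 == 1


# Example: only the even-numbered threads are supposed to execute.
mask = make_exec_mask(range(0, WAVEFRONT_SIZE, 2))
```

With this mask, `is_active(mask, 0)` is true and `is_active(mask, 1)` is false, so an odd-numbered thread would sit out the instruction.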

[SALU] - The scalar ALU performs arithmetic operations for things like branch instructions.

[SIMD] - The SIMD is a 16-wide integer vector ALU. There are four of them in a full compute unit configuration.

[SIMF] - The SIMF is a 16-wide floating point vector ALU. There are four of them in a full compute unit configuration.
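Since a wavefront has 64 threads and each vector ALU is 16 lanes wide, one way to picture a vector operation is as four passes of 16 threads each, with the execution mask deciding which lanes write a result. The sketch below models that in Python. The four-pass scheme, the 32-bit wraparound, and the name `vector_add` are assumptions for illustration, not a description of the actual RTL.

```python
WAVEFRONT_SIZE = 64  # threads per wavefront
SIMD_WIDTH = 16      # lanes per vector ALU


def vector_add(a, b, dest, exec_mask):
    """Model a masked vector add over one wavefront.

    Assumes the wavefront is processed in 16-thread passes and that
    results wrap at 32 bits; both are assumptions for this sketch.
    Inactive threads keep their old destination value.
    """
    result = list(dest)
    passes = 0
    for base in range(0, WAVEFRONT_SIZE, SIMD_WIDTH):
        for lane in range(SIMD_WIDTH):
            tid = base + lane
            if (exec_mask >> tid) & 1:
                result[tid] = (a[tid] + b[tid]) & 0xFFFFFFFF
        passes += 1
    return result, passes


# Example: only the first 32 threads are active.
a = list(range(WAVEFRONT_SIZE))
b = [1] * WAVEFRONT_SIZE
dest = [0] * WAVEFRONT_SIZE
result, passes = vector_add(a, b, dest, (1 << 32) - 1)
```

Here threads 0 through 31 receive `a[tid] + 1`, while threads 32 through 63 keep the original destination value.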

[SGPR] - The scalar general purpose register file is used for scalar arithmetic operations. It is also where constant data shared across multiple threads is stored.

[VGPR] - The vector general purpose register file is used for vector arithmetic operations.

[RFA] - The register file arbitrator mediates access to the single write port of the VGPR. It gives priority to the LSU because memory operations tend to be high latency, so completing them as quickly as possible lets whichever wavefront was stalled on them make progress.
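The arbitration policy described above can be sketched as a fixed-priority pick among the units requesting the write port. This is a simplified model: the requester names and the ordering among the non-LSU requesters are hypothetical, and only "LSU first" comes from the description above.

```python
# Priority order for the single VGPR write port. Only "lsu" being first
# reflects the text; the order of the remaining entries is an assumption.
PRIORITY = ["lsu", "simd", "simf"]


def arbitrate(requests):
    """Return the requester granted the write port this cycle, or None.

    `requests` is a set of requester names asking to write this cycle.
    """
    for source in PRIORITY:
        if source in requests:
            return source
    return None


# Example: the LSU wins whenever it is among the requesters.
winner = arbitrate({"simd", "lsu"})
```

A fixed-priority scheme like this is simple, but in this model a steady stream of LSU writes would keep the other requesters waiting.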

LSU - The load/store unit (LSU) handles memory requests.
