DLX

DLX
Designer	John L. Hennessy and David A. Patterson
Bits	32-bit
Introduced	1994
Version	1.0
Design	RISC
Type	Load–store
Encoding	Fixed
Branching	Condition register
Endianness	Bi-endian
Extensions	None, but MDMX & MIPS-3D could be used
Open	Yes
Registers
General-purpose	32 (R0=0)
Floating point	32 (paired DP for 32-bit)

The DLX (pronounced "Deluxe") is a RISC processor architecture designed by John L. Hennessy and David A. Patterson, the principal designers of the Stanford MIPS and the Berkeley RISC designs (respectively), the two benchmark examples of RISC design (named after the Berkeley design).

The DLX is essentially a cleaned-up, modernized and simplified Stanford MIPS CPU. It has a simple 32-bit load/store architecture, somewhat unlike the modern MIPS architecture CPU. Because the DLX was intended primarily for teaching purposes, its design is widely used in university-level computer architecture courses.

There are two known "softcore" hardware implementations: ASPIDA and VAMP. The ASPIDA project resulted in a core that is open source, supports Wishbone, has an asynchronous design, supports multiple ISAs, and is ASIC proven. VAMP is a DLX variant that was mathematically verified as part of the Verisoft project. It was specified with PVS, implemented in Verilog, and runs on a Xilinx FPGA. A full stack from compiler to kernel to TCP/IP was built on it.

History

In the Stanford MIPS architecture, one of the methods used to gain performance was to force all instructions to complete in one clock cycle. This forced compilers to insert "no-ops" in cases where an instruction would definitely take longer than one clock cycle. Input and output activities (such as memory accesses) in particular forced this behaviour, leading to artificial program bloat. In general, MIPS programs were forced to contain many wasteful NOP instructions, an unintended consequence of the design. The DLX architecture does not force single-clock-cycle execution and is therefore immune to this problem.

In the DLX design, a more modern approach to handling long instructions was used: data forwarding and instruction reordering. Longer instructions are "stalled" in their functional units and then re-inserted into the instruction stream when they can complete. Externally, this behaviour makes it appear as if execution had occurred linearly.
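The effect of data forwarding on stalls can be illustrated with a rough sketch. It is not taken from any DLX implementation: it assumes a textbook-style five-stage, in-order pipeline in which the register file is written in the first half of a cycle and read in the second half, and the function name `count_stalls` and the tuple format for instructions are hypothetical.

```python
def count_stalls(program, forwarding):
    """Count stall cycles for a list of (op, dest, sources) instructions.

    Assumed model (not from the article): stages IF, ID, EX, MEM, WB;
    in-order issue; one instruction enters ID per cycle unless stalled.
    """
    ready = {}      # register -> earliest ID cycle at which a consumer may proceed
    id_cycle = 0    # cycle in which the current instruction occupies ID
    stalls = 0
    for op, dest, sources in program:
        id_cycle += 1  # next free ID slot after the previous instruction
        needed = max((ready.get(r, 0) for r in sources), default=0)
        if needed > id_cycle:
            stalls += needed - id_cycle  # wait for the operand
            id_cycle = needed
        if dest is not None:
            if forwarding:
                # ALU results are forwarded from the end of EX,
                # load results from the end of MEM.
                ready[dest] = id_cycle + (2 if op == "load" else 1)
            else:
                # Without forwarding the consumer must read the register
                # file in the producer's WB cycle or later.
                ready[dest] = id_cycle + 3
    return stalls


program = [
    ("load", "R1", ["R2"]),        # R1 <- Mem[R2]
    ("add",  "R3", ["R1", "R4"]),  # uses the loaded value immediately
    ("sub",  "R5", ["R3", "R1"]),  # uses the ALU result immediately
]
stalls_without = count_stalls(program, forwarding=False)
stalls_with = count_stalls(program, forwarding=True)
```

Under these assumptions the three-instruction sequence needs four stall cycles without forwarding and one with it; the remaining stall is the load followed directly by its consumer.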

How it works

DLX instructions can be broken down into three types: R-type, I-type and J-type. R-type instructions are pure register instructions, with three register references contained in the 32-bit word. I-type instructions specify two registers and use 16 bits to hold an immediate value. Finally, J-type instructions are jumps, containing a 26-bit address.
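The three formats can be sketched as a small field decoder. The field widths come from the description above, but the exact bit positions (opcode in the top six bits, register fields following it, MIPS-style) and the names `decode`, `rs1`, `rs2`, `rd`, `imm` and `address` are assumptions made for illustration; the opcode value in the usage line is hypothetical.

```python
def decode(word):
    """Split a 32-bit instruction word into the fields of each format.

    The caller picks the R, I or J view according to the opcode.
    Bit positions are an assumed MIPS-like layout, not a specification.
    """
    word &= 0xFFFFFFFF
    opcode = (word >> 26) & 0x3F          # 6-bit opcode
    r_type = {
        "rs1": (word >> 21) & 0x1F,       # first source register
        "rs2": (word >> 16) & 0x1F,       # second source register
        "rd": (word >> 11) & 0x1F,        # destination register
    }
    i_type = {
        "rs1": (word >> 21) & 0x1F,       # source register
        "rd": (word >> 16) & 0x1F,        # destination register
        "imm": word & 0xFFFF,             # 16-bit immediate
    }
    j_type = {"address": word & 0x03FFFFFF}  # 26-bit address
    return {"opcode": opcode, "R": r_type, "I": i_type, "J": j_type}


# An I-type word with a hypothetical opcode 8, rs1 = R1, rd = R2, immediate 16.
word = (8 << 26) | (1 << 21) | (2 << 16) | 0x0010
fields = decode(word)
```

Here `fields["opcode"]` is 8 and `fields["I"]` holds register numbers 1 and 2 with an immediate of 16.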

Opcodes are 6 bits long, for a total of 64 possible basic instructions. To select one of 32 registers, 5 bits are needed.

In the case of R-type instructions, this means that only 21 bits of the 32-bit word are used (6 opcode bits plus three 5-bit register fields), which allows the lower 6 bits to be used as "extended instructions". The DLX can therefore support more than 64 instructions, as long as those instructions work purely on registers. This quirk is useful for things like FPU support.
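The bit budget and the "extended instruction" idea can be sketched as follows. The arithmetic follows the text; the function codes, the shared R-type opcode value, the field positions and the helper names `encode_r` and `execute_r` are all hypothetical and chosen only for illustration.

```python
OPCODE_BITS = 6
REGISTER_BITS = 5
WORD_BITS = 32
FUNC_BITS = 6                                   # lower 6 bits, per the text

used_bits = OPCODE_BITS + 3 * REGISTER_BITS     # 6 + 15 = 21
spare_bits = WORD_BITS - used_bits              # 11 bits left over
extended_per_opcode = 2 ** FUNC_BITS            # 64 register-only operations

R_TYPE_OPCODE = 0                               # hypothetical shared opcode
FUNC_ADD, FUNC_SUB, FUNC_AND = 0x20, 0x22, 0x24  # hypothetical function codes
FUNCTIONS = {
    FUNC_ADD: lambda a, b: a + b,
    FUNC_SUB: lambda a, b: a - b,
    FUNC_AND: lambda a, b: a & b,
}


def encode_r(rs1, rs2, rd, func):
    """Pack an R-type word (assumed layout: opcode, rs1, rs2, rd, ..., func)."""
    return (R_TYPE_OPCODE << 26) | (rs1 << 21) | (rs2 << 16) | (rd << 11) | func


def execute_r(word, regs):
    """Select the operation from the low 6 bits, then apply it to registers."""
    func = word & 0x3F
    rs1 = (word >> 21) & 0x1F
    rs2 = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    result = FUNCTIONS[func](regs[rs1], regs[rs2]) & 0xFFFFFFFF
    if rd != 0:                                  # R0 is hard-wired to zero
        regs[rd] = result
    return regs


regs = [0] * 32
regs[1], regs[2] = 5, 7
execute_r(encode_r(1, 2, 3, FUNC_ADD), regs)     # R3 <- R1 + R2
execute_r(encode_r(1, 2, 4, FUNC_SUB), regs)     # R4 <- R1 - R2 (wraps to 32 bits)
```

A single opcode thus fans out into up to 64 register-only operations without consuming any more of the 64 basic opcodes.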

Pipeline

The DLX, like the MIPS design, bases its performance on the use of an instruction pipeline. In the DLX design this is a fairly simple one, "classic" RISC in concept. The pipeline contains five stages:

IF – Instruction Fetch unit/cycle: IR <- Mem(PC); NPC <- PC + 4. Operation: send out the PC and fetch the instruction from memory into the instruction register (IR); increment the PC by 4 to address the next sequential instruction. The IR holds the instruction that will be needed on subsequent clock cycles; likewise, the register NPC holds the next sequential PC.
ID – Instruction Decode unit: Operation: decode the instruction and access the register file to read the registers. This unit gets the instruction from IF and extracts the opcode and operands from it. It also retrieves register values if the operation requires them.
EX – Execution unit/effective address cycle: Operation: the ALU operates on the operands prepared in the prior cycle, performing one of four functions depending on the DLX instruction type: memory reference, register–register ALU instruction, register–immediate ALU instruction, or branch.
MEM – Memory access unit: The DLX instructions active in this unit are loads, stores and branches. For a memory reference, memory is accessed if needed; if the instruction is a load, data returns from memory and is placed in the LMD (load memory data) register.
WB – WriteBack unit: Typically referred to as "the store unit" in modern terminology. Writes the result into the register file, whether it comes from the memory system or from the ALU.
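How the five stages overlap can be sketched with a small occupancy table. This is an idealized illustration that assumes no stalls or hazards, so each instruction simply enters IF one cycle after its predecessor; the helper names `pipeline_schedule` and `format_schedule` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(n_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c.

    Idealized model: instruction i is in stage s during cycle i + s.
    """
    if n_instructions <= 0:
        return []
    total_cycles = n_instructions + len(STAGES) - 1
    table = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        table.append(row)
    return table


def format_schedule(table):
    """Render the table as text, one line per instruction."""
    lines = []
    for i, row in enumerate(table):
        cells = " ".join(f"{cell:<3}" for cell in row)
        lines.append(f"I{i + 1}: {cells}".rstrip())
    return "\n".join(lines)


table = pipeline_schedule(3)
print(format_schedule(table))
```

With three instructions the table spans seven cycles rather than fifteen, which is the source of the pipeline's throughput gain: once the pipeline is full, one instruction completes per cycle.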

