# CSE 30321 MIPS Single Cycle Dataflow

#### Lectures 11-12

## The organization of a computer

Von Neumann Model:

- Stored-program machine: instructions are represented as numbers
- Programs can be stored in memory to be read/written just like numbers



#### The goals of this lecture are...


- ...to show how ISAs map to real HW and affect the organization of processing logic...
- ...and to set up a discussion of pipelining + other principles of modern processing...


### Functions of Each Component

- Datapath: performs data manipulation operations
  - arithmetic logic unit (ALU)
  - floating point unit (FPU)
- Control: directs operation of other components
  - finite state machines
  - micro-programming
- Memory: stores instructions and data
  - random access vs. sequential access
  - volatile vs. non-volatile
  - RAMs (SRAM, DRAM), ROMs (PROM, EEPROM), disk
  - tradeoff between speed and cost/bit
- Input/Output and I/O devices: interface to the environment
  - mouse, keyboard, display, device drivers


#### The Performance Perspective

- Performance of a machine is determined by:
  - Instruction count, clock cycles per instruction, clock cycle time
- Processor design (datapath and control) determines:
  - Clock cycles per instruction
  - Clock cycle time
- We will discuss a simplified MIPS implementation
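
The three factors listed above multiply together to give execution time. A minimal sketch of that relationship follows; the function name `cpu_time` and every number in it are illustrative assumptions, not course data.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """Execution time = instruction count x CPI x clock cycle time (seconds)."""
    return instruction_count * cpi * clock_cycle_time


# Two hypothetical designs running the same 1,000,000-instruction program.
# All values below are made up for illustration only.
design_a = cpu_time(1_000_000, cpi=1.0, clock_cycle_time=8e-9)  # long cycle, CPI = 1
design_b = cpu_time(1_000_000, cpi=4.0, clock_cycle_time=2e-9)  # short cycle, higher CPI

print(f"design A: {design_a * 1e3:.1f} ms")
print(f"design B: {design_b * 1e3:.1f} ms")
```

With these assumed numbers the two designs come out at roughly the same execution time, which is why CPI and clock cycle time have to be considered together.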


#### **MIPS Instruction Formats**

- All MIPS instructions are 32 bits (4 bytes) long.
- R-type:

| 31:26  | 25:21  | 20:16  | 15:11  | 10:6      | 5:0       |
|--------|--------|--------|--------|-----------|-----------|
| op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6) |

- I-type:

| 31:26  | 25:21  | 20:16  | 15:0                         |
|--------|--------|--------|------------------------------|
| op (6) | rs (5) | rt (5) | Address/Immediate value (16) |

- J-type:

| 31:26  | 25:0                |
|--------|---------------------|
| op (6) | Target address (26) |
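
Because every format is a fixed 32-bit layout, the fields can be pulled out with shifts and masks. The sketch below extracts all fields at once and leaves it to the caller to pick the ones that apply to the format at hand; `decode` is a hypothetical helper name, not something defined in the lecture.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into every field of the three formats."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31:26, all formats
        "rs": (word >> 21) & 0x1F,      # bits 25:21, R- and I-type
        "rt": (word >> 16) & 0x1F,      # bits 20:16, R- and I-type
        "rd": (word >> 11) & 0x1F,      # bits 15:11, R-type
        "shamt": (word >> 6) & 0x1F,    # bits 10:6,  R-type
        "funct": word & 0x3F,           # bits 5:0,   R-type
        "imm16": word & 0xFFFF,         # bits 15:0,  I-type
        "target": word & 0x3FFFFFF,     # bits 25:0,  J-type
    }


# 0x00C72820 is the encoding of: add $5, $6, $7
fields = decode(0x00C72820)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["funct"])
```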

## Review of Design Steps

- Instruction Set Architecture => RTL representation
- RTL representation =>
  - Datapath components
  - Datapath interconnects
- Datapath components => Control signals
- Control signals => Control logic
- Writing RTL: How many states (cycles) should an instruction take?
  - CPI
  - Datapath component sharing


# Let's talk about this generally on the board first...

- Let's just look at our instruction formats and "derive" a simple datapath
  - (we need to make all of these instruction formats "work")

#### The MIPS Subset

- To simplify things a bit we'll just look at a few instructions:
  - memory-reference: lw, sw
  - arithmetic-logical: add, sub, and, or, slt
  - branching: beq, j
- Organizational overview:
  - fetch an instruction based on the content of PC
  - decode the instruction
  - fetch operands
    - (read one or two registers)
  - execute
    - (effective address calculation / arithmetic-logical operations / comparison)
  - store result
    - (write to memory / write to register / update PC)
- At the simplest level, this is how the Von Neumann, RISC model works


### Implementation Overview

Abstract / Simplified View:

simplest view of Von Neumann, RISC μP



- 2 types of signals: data and control
- **Clocking strategy**: All storage elements are clocked by the same clock edge.

#### What we'll do ...

- ...look at instruction encodings...
- ...look at datapath development...
- ...discuss how we generate the control signals to make the datapath elements work...


#### Review of Design Steps

- Instruction Set Architecture => RTL representation
  - e.g., PC ← PC + 4 (or \$4 ← \$3 + \$2)
- RTL representation =>
  - Datapath components
  - Datapath interconnects
- Datapath components => Control signals
- Control signals => Control logic
- Writing RTL: How many states (cycles) should an instruction take?
  - CPI
  - Datapath component sharing


## What to be Done for Each Instruction?



- How many cycles should the above take?
- You are the architect, so you decide!
- Fewer cycles => more to be done in one cycle


#### **Instruction Fetch Unit**

- Fetch the instruction: mem[PC]
- Update the program counter:
  - sequential code: PC <- PC + 4
  - branch and jump: PC <- "something else"



## Single Cycle Implementation

- Each instruction takes one cycle to complete.
- We wait for everything to settle down, and the right thing to be done
  - ALU might not produce "right answer" right away (why?)
  - we use write signals along with clock to determine when to write
- Cycle time is determined by the length of the longest path


# Let's say we want to fetch... an R-type instruction (arithmetic)

Instruction format:

| 31:26  | 25:21  | 20:16  | 15:11  | 10:6      | 5:0       |
|--------|--------|--------|--------|-----------|-----------|
| op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6) |

- RTL:
  - Instruction fetch: mem[PC]
  - ALU operation: reg[rd] <- reg[rs] op reg[rt]
  - Go to next instruction: PC <- PC + 4
- Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields.
- The actual ALU operation and register write should occur after decoding the instruction.
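
The R-type RTL above can be mimicked in a few lines. This is only a behavioral sketch: the register file is a plain Python list, the names `exec_r_type`, `ALU_OPS`, and `to_signed` are hypothetical, and details such as overflow exceptions and the hard-wired zero register are deliberately ignored.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


# The arithmetic-logical subset used in these slides.
ALU_OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "slt": lambda a, b: int(to_signed(a) < to_signed(b)),
}


def exec_r_type(reg, pc, op, rs, rt, rd):
    """reg[rd] <- reg[rs] op reg[rt]; PC <- PC + 4. Returns the new PC."""
    reg[rd] = ALU_OPS[op](reg[rs], reg[rt])
    return pc + 4


reg = [0] * 32
reg[6], reg[7] = 4, 5
pc = exec_r_type(reg, pc=0, op="add", rs=6, rt=7, rd=5)  # add $5, $6, $7
print(reg[5], pc)
```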

#### Datapath for R-Type Instructions



- Register timing:
  - Register can always be read.
  - Register write only happens when RegWr is set to high and at the falling edge of the clock


# Datapath for I-Type A/L Instructions

note that we reuse ALU...



#### I-Type Arithmetic/Logic Instructions


Instruction format:

| 31:26  | 25:21  | 20:16  | 15:0                         |
|--------|--------|--------|------------------------------|
| Op (6) | rs (5) | rt (5) | Address/Immediate value (16) |

- RTL for arithmetic operations: e.g., ADDI
  - Instruction fetch: mem[PC]
  - Add operation: reg[rt] <- reg[rs] + SignExt(imm16)
  - Go to next instruction: PC <- PC + 4
- Also, immediate instructions
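
The SignExt(imm16) step is the only new piece of hardware behavior relative to the R-type case. A minimal sketch of it, and of the ADDI RTL built on top, is shown below; `sign_ext16` and `exec_addi` are hypothetical names chosen for this example.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16):
    """Sign-extend a 16-bit field to a Python integer (negative if bit 15 is set)."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def exec_addi(reg, pc, rs, rt, imm16):
    """reg[rt] <- reg[rs] + SignExt(imm16); PC <- PC + 4. Returns the new PC."""
    reg[rt] = (reg[rs] + sign_ext16(imm16)) & MASK32
    return pc + 4


reg = [0] * 32
reg[1] = 10
pc = exec_addi(reg, pc=0, rs=1, rt=2, imm16=0xFFFF)  # addi $2, $1, -1
print(reg[2], pc)
```

Note that the destination is rt rather than rd, which is what later forces a multiplexer (RegDst) in front of the register file's write address.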


## I-Type Load/Store Instructions

Instruction format:

| 31:26  | 25:21  | 20:16  | 15:0                         |
|--------|--------|--------|------------------------------|
| Op (6) | rs (5) | rt (5) | Address/Immediate value (16) |

- RTL for load/store operations: e.g., LW
  - Instruction fetch: mem[PC]
  - Compute memory address: Addr <- reg[rs] + SignExt(imm16)
  - Load data into register: reg[rt] <- mem[Addr]
  - Go to next instruction: PC <- PC + 4
- How about store?

  - Same thing, except the 3rd step becomes a memory write: mem[Addr] <- reg[rt]
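
A behavioral sketch of the load and store RTL may help make the difference concrete. It assumes data memory is a Python dict keyed by byte address and holding whole words, which is a simplification; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16):
    """Sign-extend a 16-bit field to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def exec_lw(reg, mem, pc, rs, rt, imm16):
    """Addr <- reg[rs] + SignExt(imm16); reg[rt] <- mem[Addr]; PC <- PC + 4."""
    addr = (reg[rs] + sign_ext16(imm16)) & MASK32
    reg[rt] = mem.get(addr, 0)  # assumption: unwritten memory reads as 0
    return pc + 4


def exec_sw(reg, mem, pc, rs, rt, imm16):
    """Addr <- reg[rs] + SignExt(imm16); mem[Addr] <- reg[rt]; PC <- PC + 4."""
    addr = (reg[rs] + sign_ext16(imm16)) & MASK32
    mem[addr] = reg[rt]
    return pc + 4


reg, mem = [0] * 32, {}
reg[9], reg[10], reg[12] = 0x100, 42, 0x104
pc = exec_sw(reg, mem, 0, rs=9, rt=10, imm16=0)        # sw $10, 0($9)
pc = exec_lw(reg, mem, pc, rs=12, rt=11, imm16=0xFFFC)  # lw $11, -4($12)
print(reg[11], pc)
```

Both instructions share the address calculation; only the last data movement differs, which is why they can share one ALU path in the datapath.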

# Datapath for Load/Store Instructions




## Datapath for Branch Instructions



#### I-Type Branch Instructions

Instruction format:

| 31:26  | 25:21  | 20:16  | 15:0                         |
|--------|--------|--------|------------------------------|
| Op (6) | rs (5) | rt (5) | Address/Immediate value (16) |

- RTL for branch operations: e.g., BEQ
  - Instruction fetch: mem[PC]
  - Compute condition: Cond <- reg[rs] - reg[rt]
  - Calculate the next instruction's address:

if (Cond eq 0) then
$$PC \leftarrow PC + 4 + (SignExt(imm16) \times 4)$$
else?
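
The next-address calculation for BEQ can be sketched as a small function. The fall-through case is written here as PC + 4, the usual sequential update; `next_pc_beq` is a hypothetical name for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16):
    """Sign-extend a 16-bit field to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, rs_val, rt_val, imm16):
    """Return the next PC for beq given the two register values read."""
    cond = (rs_val - rt_val) & MASK32           # the ALU subtracts to compare
    if cond == 0:                               # registers equal: branch taken
        return (pc + 4 + (sign_ext16(imm16) << 2)) & MASK32
    return (pc + 4) & MASK32                    # not taken: sequential


print(hex(next_pc_beq(0x100, 5, 5, 3)))       # taken, forward by 3 words
print(hex(next_pc_beq(0x100, 5, 6, 3)))       # not taken
print(hex(next_pc_beq(0x100, 1, 1, 0xFFFF)))  # taken, offset of -1 word
```

The offset is counted in words relative to PC + 4, so an immediate of -1 branches back to the branch instruction itself.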


#### Next Address Logic



When does the correct new PC become available? Can we do better?


#### J-Type Jump Instructions

Instruction format:

| 31:26  | 25:0                |
|--------|---------------------|
| Op (6) | Target address (26) |

- RTL operations: e.g., J
  - Instruction fetch: mem[PC]
  - Set up PC: PC <- (PC + 4)<31:28> CONCAT (target<25:0> x 4)
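
In bit terms, the jump address is the upper four bits of PC + 4 followed by the 26-bit target shifted left by two. A minimal sketch, with `jump_target` as a hypothetical helper name:

```python
def jump_target(pc, target26):
    """Return the new PC for a J-type jump."""
    upper = (pc + 4) & 0xF0000000              # bits 31:28 of PC + 4
    lower = (target26 & 0x3FFFFFF) << 2        # 26-bit target times 4 (28 bits)
    return upper | lower


print(hex(jump_target(0x00400000, 0x0100000)))
print(hex(jump_target(0x10000000, 1)))
```

Since only 28 bits come from the instruction, a jump stays within the same 256 MB region as the instruction that follows it.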


## A Single Cycle Datapath



**Instruction Fetch Unit** 




#### Let's trace a few instructions

- For example...
  - Add \$5, \$6, \$7
  - SW \$10, 0(\$9)
  - Sub \$1, \$2, \$3
  - LW \$11, 0(\$12)
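
To trace the four example instructions without the datapath figure, a tiny behavioral model is enough. The sketch below treats each instruction as a tuple and executes one per "cycle"; the store is read as `sw $10, 0($9)` (store register 10 at the address in register 9), and the initial register values are arbitrary assumptions chosen so the results are easy to check.

```python
MASK32 = 0xFFFFFFFF


def step(instr, reg, mem, pc):
    """Execute one instruction in a single 'cycle' and return the next PC."""
    name = instr[0]
    if name == "add":                       # add rd, rs, rt
        _, rd, rs, rt = instr
        reg[rd] = (reg[rs] + reg[rt]) & MASK32
    elif name == "sub":                     # sub rd, rs, rt
        _, rd, rs, rt = instr
        reg[rd] = (reg[rs] - reg[rt]) & MASK32
    elif name == "lw":                      # lw rt, offset(rs)
        _, rt, offset, rs = instr
        reg[rt] = mem.get((reg[rs] + offset) & MASK32, 0)
    elif name == "sw":                      # sw rt, offset(rs)
        _, rt, offset, rs = instr
        mem[(reg[rs] + offset) & MASK32] = reg[rt]
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return pc + 4


def run(program, reg, mem):
    """Fetch and execute instructions until the PC runs off the program."""
    pc = 0
    while pc // 4 < len(program):
        instr = program[pc // 4]            # instruction fetch: mem[PC]
        pc = step(instr, reg, mem, pc)
        print(f"executed {instr}, next PC = {pc}")
    return pc


program = [
    ("add", 5, 6, 7),     # add $5, $6, $7
    ("sw", 10, 0, 9),     # sw  $10, 0($9)
    ("sub", 1, 2, 3),     # sub $1, $2, $3
    ("lw", 11, 0, 12),    # lw  $11, 0($12)
]
reg = [0] * 32
reg[6], reg[7] = 4, 5            # assumed initial values
reg[9], reg[10] = 0x100, 42
reg[2], reg[3] = 10, 3
reg[12] = 0x100
mem = {}
final_pc = run(program, reg, mem)
```

When tracing by hand on the real datapath, the interesting part is which multiplexer inputs and write enables each instruction needs; this model only confirms the architectural results.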


## The HW needed, plus control

Single cycle MIPS machine



(I.e. now, we need to make the HW do what we want it to do - add, subtract, etc. - when we want it to...)


## Implementing Control

- Implementation Steps:
  - Identify control inputs and control outputs (control words)
  - Make a control signal table for each cycle
  - Derive control logic from the control table
- Do we need an FSM here?

Control inputs: Opcode (6 bits), Func (6 bits)



Control outputs:

- RegDst
- MemtoReg
- RegWrite
- MemRead
- MemWrite
- ALUSrc
- ALUctr
- Branch
- Jump




### Implementing Control

- Implementation Steps:
  1. Identify control inputs and control outputs
  2. Make a control signal table for each cycle
  3. Derive control logic from the control table
     - This logic can take on many forms: combinational logic, ROMs, microcode, or combinations...


## Single Cycle Control Input/Output




## The HW needed, plus control

Single cycle MIPS machine



Control Signal Table

|              | Add (R-type) | Sub (R-type) | LW     | SW     | BEQ    |
|--------------|--------------|--------------|--------|--------|--------|
| Func (input) | 100000       | 100010       | xxxxxx | xxxxxx | xxxxxx |
| Op (input)   | 000000       | 000000       | 100011 | 101011 | 000100 |
| RegDst       | 1            | 1            | 0      | X      | X      |
| ALUSrc       | 0            | 0            | 1      | 1      | 0      |
| Mem-to-Reg   | 0            | 0            | 1      | X      | X      |
| Reg. Write   | 1            | 1            | 1      | 0      | 0      |
| Mem. Read    | 0            | 0            | 1      | 0      | 0      |
| Mem. Write   | 0            | 0            | 0      | 1      | 0      |
| Branch       | 0            | 0            | 0      | 0      | 1      |
| ALUOp        | Add          | Sub          | 00     | 00     | 01     |

Func and Op are inputs; the remaining rows are outputs. X marks a don't-care.
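
Since each instruction finishes in one cycle, the control table can be modeled as a pure lookup from (Op, Func) to a control word. The sketch below copies the table's values as written, including the symbolic `Add`/`Sub` entries for ALUOp; the names `CONTROL_TABLE` and `control` are hypothetical, and `None` stands in for a don't-care.

```python
X = None  # don't-care output

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")

# (opcode, funct) -> control word. funct is None where it is a don't-care input.
CONTROL_TABLE = {
    (0b000000, 0b100000): (1, 0, 0, 1, 0, 0, 0, "Add"),  # add
    (0b000000, 0b100010): (1, 0, 0, 1, 0, 0, 0, "Sub"),  # sub
    (0b100011, None):     (0, 1, 1, 1, 1, 0, 0, "00"),   # lw
    (0b101011, None):     (X, 1, X, 0, 0, 1, 0, "00"),   # sw
    (0b000100, None):     (X, 0, X, 0, 0, 0, 1, "01"),   # beq
}


def control(opcode, funct):
    """Return the control word for an instruction as a dict of signal values."""
    # Func only matters for R-type instructions (opcode 000000).
    key = (opcode, funct) if opcode == 0b000000 else (opcode, None)
    if key not in CONTROL_TABLE:
        raise ValueError(f"no control entry for op={opcode:06b}, funct={funct:06b}")
    return dict(zip(SIGNALS, CONTROL_TABLE[key]))


print(control(0b100011, 0b000000))  # lw
```

A lookup with no stored state corresponds to purely combinational control logic, which is one of the implementation forms the slides mention.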


## Main control, ALU control



- Use OP field to generate ALUOp (encoding)
  - Control signal fed to ALU control block
- Use Func field and ALUOp to generate ALUctr (decoding)
  - Specifically sets 3 ALU control signals
    - B-Invert, Carry-in, operation
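
The two-level scheme can be sketched as a second small lookup. The bit patterns below are assumptions, since the slides do not spell them out: they follow a common textbook encoding in which ALUOp = 00 forces an add (lw/sw), 01 forces a subtract (beq), and 10 defers to the Func field, and in which the 3-bit ALUctr is a B-negate bit (B-invert and carry-in tied together) followed by a 2-bit operation select.

```python
# Assumed encoding of ALUctr: (B-negate, operation[1:0]).
ALUCTR_AND = 0b000
ALUCTR_OR = 0b001
ALUCTR_ADD = 0b010
ALUCTR_SUB = 0b110
ALUCTR_SLT = 0b111

# Func field -> ALUctr for the R-type subset in these slides.
ALUCTR_BY_FUNCT = {
    0b100000: ALUCTR_ADD,  # add
    0b100010: ALUCTR_SUB,  # sub
    0b100100: ALUCTR_AND,  # and
    0b100101: ALUCTR_OR,   # or
    0b101010: ALUCTR_SLT,  # slt
}


def alu_control(alu_op, funct):
    """Decode ALUOp (from main control) and Func into the 3-bit ALUctr."""
    if alu_op == 0b00:          # lw / sw: address calculation
        return ALUCTR_ADD
    if alu_op == 0b01:          # beq: compare by subtracting
        return ALUCTR_SUB
    if alu_op == 0b10:          # R-type: Func selects the operation
        return ALUCTR_BY_FUNCT[funct]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


print(f"{alu_control(0b10, 0b101010):03b}")  # slt
```

Splitting the decode this way keeps the main control small: it only looks at Op, and the Func field is examined in one place.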


#### Main control, ALU control




#### The Logic



#### Generating ALUctr

We want these outputs:






#### Recall...

Single cycle MIPS machine



## Well, here's what we did...

Single cycle MIPS machine




## Single cycle versus multi-cycle


(and again, remember, realistically logic, ISAs, instruction types, etc. would be much more complex)

(we'd also have to route all signals too...which may affect how we'd like to organize processing logic)


### Single-Cycle Implementation

- Single-cycle, fixed-length clock:
  - CPI = 1
  - Clock cycle = propagation delay of the longest datapath operation among all instruction types
  - Easy to implement
- Single-cycle, variable-length clock:
  - CPI = 1
  - Clock cycle = $\sum_i (\text{fraction of type-}i\text{ instructions} \times \text{propagation delay of the type-}i\text{ datapath operations})$
  - Better than the previous, but impractical to implement
- Disadvantages:
  - What if we have floating-point operations?
  - How about component usage?
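
The two clocking options can be compared numerically. The per-type delays and the instruction mix below are hypothetical values picked for illustration, not figures from the lecture.

```python
# Hypothetical datapath delay per instruction type, in picoseconds.
DELAY_PS = {"R-type": 400, "lw": 600, "sw": 550, "beq": 350, "j": 200}

# Hypothetical instruction mix (fractions sum to 1).
MIX = {"R-type": 0.45, "lw": 0.25, "sw": 0.10, "beq": 0.15, "j": 0.05}


def fixed_clock(delays):
    """Fixed-length clock: every cycle is as long as the slowest instruction."""
    return max(delays.values())


def variable_clock(delays, mix):
    """Variable-length clock: average cycle weighted by the instruction mix."""
    return sum(mix[kind] * delays[kind] for kind in delays)


print(f"fixed:    {fixed_clock(DELAY_PS)} ps")
print(f"variable: {variable_clock(DELAY_PS, MIX):.1f} ps")
```

Under these assumptions the fixed clock is set by the load, so every other instruction type wastes part of its cycle.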

## Multiple Cycle Alternative

- Break an instruction into smaller steps
- Execute each step in one cycle.
- Execution sequence:
  - Balance amount of work to be done
  - Restrict each cycle to use only one major functional unit
  - At the end of a cycle
    - Store values for use in later cycles, why?
    - Introduce additional "internal" registers
- The advantages:
  - Cycle time much shorter
  - Different instructions take different numbers of cycles to complete
  - A functional unit can be used more than once per instruction
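
Because instructions now take different numbers of cycles, the CPI of a multi-cycle design becomes a weighted average over the instruction mix. The cycle counts and mix below are hypothetical, chosen only to show the calculation.

```python
# Hypothetical cycle counts per instruction type in a multi-cycle design.
CYCLES = {"R-type": 4, "lw": 5, "sw": 4, "beq": 3, "j": 3}

# Hypothetical instruction mix (fractions sum to 1).
MIX = {"R-type": 0.45, "lw": 0.25, "sw": 0.10, "beq": 0.15, "j": 0.05}


def average_cpi(cycles, mix):
    """Average CPI = sum over types of (fraction of type) x (cycles for type)."""
    return sum(mix[kind] * cycles[kind] for kind in cycles)


def time_per_instruction(cpi, clock_cycle_ps):
    """Average time per instruction in picoseconds."""
    return cpi * clock_cycle_ps


cpi = average_cpi(CYCLES, MIX)
print(f"average CPI: {cpi:.2f}")
print(f"time/instr at an assumed 200 ps cycle: {time_per_instruction(cpi, 200):.0f} ps")
```

Whether this beats a single-cycle design depends on how much shorter the cycle really is, which is why the steps need to be balanced.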