
# CSE 30321 Computer Architecture I

### Lecture 13-14 - Multi Cycle Machines

*Michael Niemier* Department of Computer Science and Engineering

## Single cycle Control Implementation



X.S. Hu

## How to Determine Cycle Length?




#### □ Calculate cycle time assuming negligible delays except:

- memory (2ns), ALU and adders (2ns), register file access (1ns)
- R-type: max {mem + RF + ALU + RF, Add} = 6ns
- LW: max{mem + RF + ALU + mem + RF, Add} = 8ns
- SW: max{mem + RF + ALU + mem, Add} = 7ns
- BEQ: max{mem + RF + ALU, max{Add, mem + Add}} = 5ns
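The arithmetic above can be checked with a short Python sketch. The delays are the ones listed on the slide; the constant and function names are illustrative only and not part of the lecture.

```python
# Component delays in ns, as listed on the slide.
MEM, ALU, ADD, RF = 2, 2, 2, 1

def critical_path_ns():
    """Return the critical-path delay (ns) of each instruction type."""
    return {
        "R-type": max(MEM + RF + ALU + RF, ADD),
        "lw": max(MEM + RF + ALU + MEM + RF, ADD),
        "sw": max(MEM + RF + ALU + MEM, ADD),
        "beq": max(MEM + RF + ALU, max(ADD, MEM + ADD)),
    }

delays = critical_path_ns()
# A fixed-length clock must accommodate the slowest instruction type.
cycle_time = max(delays.values())
print(delays, cycle_time)
```

With these delays, the load instruction sets the cycle time of a fixed-length single-cycle clock.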

## Some Observations

### **Datapath:**

- How many times is each component used during an instruction execution?
- Components can be combined by overlapping different instruction types
  - > Register file by all instruction types
  - > How about ALU?
  - > How about sign-extension unit?

#### **Control**:

- For each type of instruction, identify control signals for each datapath component involved
- Control signals are generated from the instruction opcode (instr[31:26])


## Single-Cycle Implementation



- □ Single-cycle, fixed-length clock:
  - CPI = 1
  - Clock cycle = propagation delay of the longest datapath operations among all instruction types
  - Easy to implement
- □ Single-cycle, variable-length clock:
  - CPI = 1
  - Average clock cycle = Σ (fraction of type-i instructions × propagation delay of the type-i instruction's datapath operations)
  - Better than the fixed-length clock, but impractical to implement
- □ Disadvantages:
  - What if we have floating-point operations?
  - How about component usage?
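As a rough illustration of the two clocking schemes, the sketch below compares the fixed clock with the weighted-average variable clock. The per-type delays come from the cycle-length slide; the instruction mix is a hypothetical assumption chosen only to make the arithmetic concrete, and the function names are illustrative.

```python
# Per-type delays in ns, from the cycle-length slide.
DELAY_NS = {"R-type": 6, "lw": 8, "sw": 7, "beq": 5}

# HYPOTHETICAL instruction mix (fractions sum to 1); not from the lecture.
MIX = {"R-type": 0.5, "lw": 0.25, "sw": 0.125, "beq": 0.125}

def fixed_clock_ns(delay):
    """Fixed-length clock: cycle time equals the slowest instruction type."""
    return max(delay.values())

def variable_clock_ns(delay, mix):
    """Variable-length clock: average cycle time weighted by the mix."""
    return sum(mix[t] * delay[t] for t in delay)

fixed = fixed_clock_ns(DELAY_NS)
variable = variable_clock_ns(DELAY_NS, MIX)
print(fixed, variable)
```

Under this assumed mix the average variable-length cycle is shorter than the fixed one; a different mix would give a different ratio.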

## Multiple Cycle Alternative

- □ Break an instruction into smaller steps
- □ Execute each step in one cycle
- **Execution sequence:** 
  - Balance the amount of work to be done, why?
  - Restrict each cycle to use only one major functional unit, why?
  - At the end of a cycle
    - > store values for use in later cycles, why?
    - introduce additional "internal" registers

### □ The advantages:

- Cycle time is much shorter
- Different instructions take different number of cycles to complete
- Allows a functional unit to be used more than once per instruction


# Multiple-Cycle Implementation




### **Datapath**

- Component sharing: ALU, Instruction/Data memory
  - > ALU used to compute address and to increment PC
  - > Memory used for instruction and data
- Additional elements: MUX's, Instr Register, Target Register
  - If a value needs to be alive during multiple cycles, it should stay unchanged during the whole time.

### **Control**:

Needed for each datapath element during each clock cycle

# What to be Done for Each Instruction?



How many cycles should the above take?

- You are the architect, so you decide!
- Fewer cycles => more to be done in one cycle


## **Five Step Execution**

- 1. Instruction Fetch (Ifetch):
  - Fetch instruction at address (\$PC)
  - Store the instruction in register <u>IR</u>
  - Increment PC
- 2. Instruction Decode and Register Read (Decode):
  - Decode the instruction type and read registers
  - Store the register contents in registers <u>A</u> and <u>B</u>
  - Compute the branch target address and store it in <u>ALUOut</u>
- 3. Execution, Memory Address Computation, or Branch Completion (Execute):
  - Compute memory address (for LW and SW), or
  - Perform R-type operation (for R-type instruction), or
  - Update PC (for Branch and Jump)
  - Store memory address or register operation result in <u>ALUOut</u>

## Five Step Execution (cont'd)

- 4. Memory Access or R-type instruction completion (MemRead/RegWrite/MemWrite):
  - Read memory at address ALUOut and store the value in <u>MDR</u> (for LW), or
  - Write ALUOut content into the register file (for R-type), or
  - Write memory at address ALUOut with the value in <u>B</u> (for SW)
- 5. Write-back step (WrBack):
  - Write the value read from memory (held in <u>MDR</u>) into the register file

#### □ Number of cycles for an instruction:

- R-type: 4
- lw: 5
- sw: 4
- Branch or Jump: 3


# Some Simple Questions


### □ How many cycles will it take to execute this code?

lw \$t2, 0(\$t3)
lw \$t3, 4(\$t3)
beq \$t2, \$t3, Label #assume branch is not taken
add \$t5, \$t2, \$t3
sw \$t5, 8(\$t3)
Label: ...

### 5+5+3+4+4=21

- □ What is being done during the 8th cycle of execution? Compute memory address: 4+\$t3
- □ In what cycle does the actual addition of \$t2 and \$t3 take place? 16
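The answers above can be reproduced by laying the five-step sequences end to end. The following is a minimal sketch; the table and function names are illustrative, and the step labels follow the five-step execution slides.

```python
# Step sequence per instruction, following the five-step breakdown.
STEPS = {
    "lw":  ["Ifetch", "Decode", "Execute", "MemRead", "WrBack"],
    "sw":  ["Ifetch", "Decode", "Execute", "MemWrite"],
    "add": ["Ifetch", "Decode", "Execute", "RegWrite"],
    "beq": ["Ifetch", "Decode", "Execute"],
}

# The code sequence from the slide; the branch is assumed not taken.
PROGRAM = ["lw", "lw", "beq", "add", "sw"]

def timeline(program):
    """Return (cycle, instruction index, opcode, step) tuples, cycles from 1."""
    out = []
    cycle = 1
    for index, op in enumerate(program):
        for step in STEPS[op]:
            out.append((cycle, index, op, step))
            cycle += 1
    return out

tl = timeline(PROGRAM)
total_cycles = len(tl)
cycle_8 = tl[7]  # entry for the 8th cycle
add_execute = next(c for c, _, op, s in tl if op == "add" and s == "Execute")
print(total_cycles, cycle_8, add_execute)
```

Cycle 8 falls in the Execute step of the second `lw`, which is the address computation, and the `add` reaches its Execute step in cycle 16.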

# Step 1: Instruction Fetch

- □ Use PC to fetch instruction and put it in the Instruction Register.
- □ Increment the PC by 4 and put the result back in the PC.
- □ How do we express this in RTL?

### IR=Mem[PC], PC=PC+4

- □ What is the advantage of updating the PC now?
- Basic principle: do it ASAP!


## Step 2: Decode and Register Read




## Step 4: RegWrite/MemRead



```
MDR = Mem[ALUOut];
  or
Mem[ALUOut] = B;
```

R-type instructions finish:

```
RF[IR[15:11]] = ALUOut;
```






## Step 5: Write-Back

- □ Which type of instruction needs this?
  - **RF[IR[20:16]] = MDR;**
- □ What about all the other instructions?

## RTL Description: Put All Together (1)

- **Ifetch:** -> Decode, IR = Mem[PC], PC = PC + 4;
- **Decode:** -> Execute, A = RF[IR[25:21]], B = RF[IR[20:16]], ALUOut = PC + (Sign\_Ext(IR[15:0]) << 2);
- **Execute:**
  - if (opcode = lw) or (opcode = sw) then -> MRead/RegWrite, ALUOut = A + Sign\_Ext(IR[15:0]);
  - if (opcode = "R-type") then -> MRead/RegWrite, ALUOut = A op B;
  - if (opcode = branch) then -> Ifetch, if (A = B) then PC = ALUOut;
  - if (opcode = jump) then -> Ifetch, PC = PC[31:28] || IR[25:0] || 00;

## RTL Description: Put All Together (2)

- **MRead/RegWrite:**
  - if (opcode = lw) then -> WriteBack, MDR = Mem[ALUOut];
  - if (opcode = sw) then -> Ifetch, Mem[ALUOut] = B;
  - if (opcode = "R-type") then -> Ifetch, RF[IR[15:11]] = ALUOut;
- **WriteBack:** -> Ifetch, RF[IR[20:16]] = MDR;
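To make the state sequencing concrete, here is a minimal Python sketch of the register transfers, following the per-step RTL from the earlier slides (a store writes B to memory; a load writes MDR to RF[IR[20:16]]). It is an illustration under assumptions, not the lecture's implementation: the function names, the dictionary-based machine state, and the memory contents are hypothetical, only `add`/`sub` are modeled for R-type, and `$zero` is not forced to 0. The opcode and funct values are the standard MIPS encodings.

```python
# Standard MIPS opcodes; only this small subset is modeled.
RTYPE, J, BEQ, LW, SW = 0x00, 0x02, 0x04, 0x23, 0x2B
MASK = 0xFFFFFFFF

def sign_ext16(x):
    """Sign-extend a 16-bit field."""
    return x - 0x10000 if x & 0x8000 else x

def alu(funct, a, b):
    """R-type ALU operation; only add (0x20) and sub (0x22) are sketched."""
    if funct == 0x20:
        return (a + b) & MASK
    if funct == 0x22:
        return (a - b) & MASK
    raise ValueError("unsupported funct")

def run_instruction(m):
    """Step one instruction through the states; return the cycles used."""
    state, cycles = "Ifetch", 0
    while True:
        cycles += 1
        if state == "Ifetch":
            IR = m["Mem"][m["PC"]]
            m["PC"] = (m["PC"] + 4) & MASK
            state = "Decode"
        elif state == "Decode":
            op, rs = IR >> 26, (IR >> 21) & 31
            rt, rd = (IR >> 16) & 31, (IR >> 11) & 31
            imm = sign_ext16(IR & 0xFFFF)
            A, B = m["RF"][rs], m["RF"][rt]
            ALUOut = (m["PC"] + (imm << 2)) & MASK  # branch target
            state = "Execute"
        elif state == "Execute":
            if op in (LW, SW):
                ALUOut = (A + imm) & MASK  # memory address
                state = "MRead/RegWrite"
            elif op == RTYPE:
                ALUOut = alu(IR & 0x3F, A, B)
                state = "MRead/RegWrite"
            elif op == BEQ:
                if A == B:
                    m["PC"] = ALUOut
                return cycles
            elif op == J:
                m["PC"] = (m["PC"] & 0xF0000000) | ((IR & 0x03FFFFFF) << 2)
                return cycles
            else:
                raise ValueError("unsupported opcode")
        elif state == "MRead/RegWrite":
            if op == LW:
                MDR = m["Mem"][ALUOut]
                state = "WriteBack"
            elif op == SW:
                m["Mem"][ALUOut] = B
                return cycles
            else:  # R-type completion
                m["RF"][rd] = ALUOut
                return cycles
        else:  # WriteBack (loads only)
            m["RF"][rt] = MDR
            return cycles

def enc_i(op, rs, rt, imm):
    """Encode an I-format instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def enc_r(rs, rt, rd, funct):
    """Encode an R-format instruction word (opcode 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

T2, T3, T5 = 10, 11, 13  # MIPS register numbers of $t2, $t3, $t5
m = {"PC": 0, "RF": [0] * 32, "Mem": {
    0: enc_i(LW, T3, T2, 0),       # lw  $t2, 0($t3)
    4: enc_i(LW, T3, T3, 4),       # lw  $t3, 4($t3)
    8: enc_i(BEQ, T2, T3, 2),      # beq $t2, $t3, Label (not taken here)
    12: enc_r(T2, T3, T5, 0x20),   # add $t5, $t2, $t3
    16: enc_i(SW, T3, T5, 8),      # sw  $t5, 8($t3)
    0x100: 7, 0x104: 0x200,        # hypothetical data values
}}
m["RF"][T3] = 0x100
total = 0
while m["PC"] < 20:  # Label sits at address 20
    total += run_instruction(m)
print(total, hex(m["Mem"][0x208]))  # prints: 21 0x207
```

The program is the five-instruction sequence from the "Some Simple Questions" slide, so the cycle total matches the 5+5+3+4+4 count given there.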


## **Execution Sequence Summary**


| Step name | Action for R-type instructions | Action for memory-reference instructions | Action for branches | Action for jumps |
|-----------|--------------------------------|------------------------------------------|---------------------|------------------|
| Instruction fetch | IR = Mem[PC],<br>PC = PC + 4 | (same) | (same) | (same) |
| Instruction decode/register fetch | A = RF[IR[25:21]],<br>B = RF[IR[20:16]],<br>ALUOut = PC + (sign-extend(IR[15:0]) << 2) | (same) | (same) | (same) |
| Execution, address computation, branch/jump completion | ALUOut = A op B | ALUOut = A + sign-extend(IR[15:0]) | if (A = B) then<br>PC = ALUOut | PC = PC[31:28] \|\| (IR[25:0] << 2) |
| Memory access or R-type completion | RF[IR[15:11]] = ALUOut | Load: MDR = Mem[ALUOut]<br>or<br>Store: Mem[ALUOut] = B | | |
| Memory read completion | | Load: RF[IR[20:16]] = MDR | | |
# A Multiple Cycle Datapath



Where do we need to insert mux's? Any other functional units?


## Multiple Cycle Design

### □ Break up the instructions into steps, each step takes a cycle

- balance the amount of work to be done
- restrict each cycle to use only one major functional unit

### □ At the end of a cycle

- store values for use in later cycles (easiest thing to do)
- introduce additional "internal" registers



## Control Signals



## Exercise: Add a New Instruction

### Let's try "jal"


RTL: PC = (PC+4)[31:28] || TargetAddr[25:0] || 00, RF[31] = PC + 4;
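The effect of `jal` can be sketched in Python, assuming the standard MIPS jump-target formation: the upper four bits of PC+4 concatenated with the 26-bit target field shifted left by two. As on the slide, the return address is PC + 4 (no branch delay slot is modeled). The function name and example addresses are hypothetical.

```python
def jal(pc, target26, rf):
    """Apply jal; pc is the address of the jal instruction itself.

    Writes the return address into rf[31] and returns the new PC.
    """
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    rf[31] = pc_plus_4  # RF[31] = PC + 4
    # (PC+4)[31:28] || TargetAddr[25:0] || 00
    return (pc_plus_4 & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)

rf = [0] * 32
new_pc = jal(0x00400010, 0x0100040, rf)
print(hex(new_pc), hex(rf[31]))
```

In the multi-cycle datapath the PC already holds PC + 4 after Ifetch, so both the register write and the target formation can read the PC register directly.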




## Implementing the Control

- □ Value of control signals is dependent upon:
  - what instruction is being executed
  - which step is being performed
- □ How to represent all the information?
  - finite state diagram
  - microprogramming
- □ Realization of a control unit is independent of the representation used
  - Control outputs: random logic, ROM, PLA
  - Next-state function: same as above or an explicit sequencer
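One way to picture a table-driven (ROM-style) next-state function is a lookup keyed by the current state and the instruction class. The sketch below is only an illustration: the state names follow the RTL slides, while the dictionary layout, the `"*"` wildcard, and the function names are assumptions made for this example.

```python
# Next-state "ROM": (current state, instruction class) -> next state.
# "*" marks a transition that does not depend on the instruction class.
NEXT_STATE = {
    ("Ifetch", "*"): "Decode",
    ("Decode", "*"): "Execute",
    ("Execute", "lw"): "MRead/RegWrite",
    ("Execute", "sw"): "MRead/RegWrite",
    ("Execute", "R-type"): "MRead/RegWrite",
    ("Execute", "branch"): "Ifetch",
    ("Execute", "jump"): "Ifetch",
    ("MRead/RegWrite", "lw"): "WriteBack",
    ("MRead/RegWrite", "sw"): "Ifetch",
    ("MRead/RegWrite", "R-type"): "Ifetch",
    ("WriteBack", "*"): "Ifetch",
}

def next_state(state, instr_class):
    """Look up the next state, falling back to the class-independent entry."""
    key = (state, instr_class)
    if key in NEXT_STATE:
        return NEXT_STATE[key]
    return NEXT_STATE[(state, "*")]

def cycles_for(instr_class):
    """Count the states visited before control returns to Ifetch."""
    state, n = "Ifetch", 0
    while True:
        n += 1
        state = next_state(state, instr_class)
        if state == "Ifetch":
            return n

counts = {c: cycles_for(c) for c in ("R-type", "lw", "sw", "branch", "jump")}
print(counts)
```

Walking the table reproduces the per-instruction cycle counts given earlier (R-type 4, lw 5, sw 4, branch or jump 3). The same table could equally be realized as random logic or a PLA.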
