## L07-L09 recap: Fundamental lesson(s)

- Over the next 3 lectures (using the MIPS ISA as context), I'll explain:
  - How functions are treated and processed in assembly
  - How system calls are enabled in assembly
  - How exceptions are handled in assembly
- I'll also explain why it's important that register conventions be followed

## L07-L09 recap: Why it's important...

- If you ever write a compiler or OS, you will need to be aware of, and code for, all of the issues to be discussed over the next 3 lectures
- If you understand what architectural overhead may be associated with (compiled) procedure calls, you should be able to write much more efficient HLL code


## Lecture 10: The MIPS datapath

Suggested reading: (HP Chapter 4.1 - 4.4)



#### **Processor components**





*Figure: HLL code translation. An HLL loop such as `for (i = 0; i < 5; i++) { ... }` is compiled into assembly instructions such as `MULT r1, r2, r3` (r1 ← r2 * r3) and `ADD r2, r1, r4` (r2 ← r1 + r4).*

## **Fundamental lesson(s)**

Today we'll discuss what hardware is required to execute an instruction, as well as how best to "organize" it.

## Why it's important...

- If you ever design the HW for a microprocessor (or a similar device), you'll need to be aware of these types of issues
- Understanding organization, and how it impacts the delay of operations such as a memory reference, will make you a better programmer
- It is increasingly important to design HW and SW together

### The organization of a computer

#### Von Neumann Model:

- Stored-program machine: instructions are represented as numbers
- Programs can be stored in memory to be read/written just like numbers.
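To make the stored-program idea concrete, here is a minimal Python sketch (Python is used only for illustration; it is not part of the lecture) that packs a MIPS R-type instruction into one 32-bit number and unpacks it again. The field positions are the standard MIPS R-type layout; the list named `memory` is a hypothetical toy memory, not a model of the real MIPS address map.

```python
# Field layout of a MIPS R-type instruction word:
# op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]


def encode_rtype(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the six R-type fields into one 32-bit number."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_rtype(word: int) -> dict:
    """Recover the six fields from a 32-bit instruction word."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $t5, $t2, $t3  ($t2 = reg 10, $t3 = reg 11, $t5 = reg 13; add's funct = 100000)
word = encode_rtype(op=0, rs=10, rt=11, rd=13, shamt=0, funct=0b100000)
print(hex(word))  # 0x14b6820

# A toy memory: the instruction and an ordinary data value are both just numbers.
memory = [word, 42]
print(decode_rtype(memory[0])["rd"])  # 13
```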



### **Review: Functions of Each Component**

- Datapath: performs data manipulation operations
  - arithmetic logic unit (ALU)
  - floating point unit (FPU)
- Control: directs operation of other components
  - finite state machines
  - micro-programming (to be discussed)
- Memory: stores instructions and data
  - random access vs. sequential access
  - volatile vs. non-volatile
  - RAMs (SRAM, DRAM), ROMs (PROM, EEPROM), disk
  - tradeoff between speed and cost/bit
- Input/Output and I/O devices: interface to environment
  - mouse, keyboard, display, device drivers


## Let's "derive" the MIPS datapath:

- To simplify things a bit we'll just look at a few instructions:
  - memory-reference: lw, sw
  - arithmetic-logical: add, sub, and, or, slt
  - branching: beq, j

These are very common instructions. With the Von Neumann, RISC model, we do similar things for each instruction:

- Organizational overview:
  - fetch an instruction based on the content of PC
  - decode the instruction
  - fetch operands
    - (read one or two registers)
  - execute
    - (effective address calculation/arithmetic-logical operations/ comparison)
  - store result
    - (write to memory / write to register / update PC)



# A Single-Cycle Datapath



## **Single-Cycle Implementation**

- Single-cycle, fixed-length clock:
  - CPI = 1
  - Clock cycle = propagation delay of the longest datapath operations among all instruction types
  - Easy to implement
- How to determine cycle length?
- Calculate cycle time assuming negligible delays except:
  - memory (2ns), ALU and adders (2ns), register file access (1ns)
  - R-type: max {mem + RF + ALU + RF, Add} = 6ns
  - LW: max{mem + RF + ALU + mem + RF, Add} = 8ns
  - SW: max{mem + RF + ALU + mem, Add} = 7ns
  - BEQ: max{mem + RF + ALU, max{Add, mem + Add}} = 5ns

What is the clock cycle (CC) time?
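The slide's arithmetic can be checked with a few lines of Python. The delays are the ones given above and, as on the slide, every other delay is assumed to be zero. Because a single fixed-length clock has to accommodate the slowest instruction, the clock cycle time is the largest of these critical paths: 8 ns, set by `lw`.

```python
# Component delays from the slide (ns); all other delays assumed negligible.
MEM, ALU, ADD, RF = 2, 2, 2, 1

# Critical path of each instruction type, computed exactly as on the slide.
paths = {
    "R-type": max(MEM + RF + ALU + RF, ADD),
    "LW": max(MEM + RF + ALU + MEM + RF, ADD),
    "SW": max(MEM + RF + ALU + MEM, ADD),
    "BEQ": max(MEM + RF + ALU, max(ADD, MEM + ADD)),
}

# A single fixed-length clock must fit the slowest instruction.
cycle_time = max(paths.values())
print(paths)       # {'R-type': 6, 'LW': 8, 'SW': 7, 'BEQ': 5}
print(cycle_time)  # 8
```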

## PERFORMANCE

### Previously, we discussed the "multi-cycle" datapath



### The multi-cycle approach (& benefits):

- Break an instruction into smaller steps
- Execute each step in one cycle.
- Execution sequence:
  - Balance amount of work to be done
  - Restrict each cycle to use only one major functional unit
  - At the end of a cycle
    - Store values for use in later cycles
    - Introduce additional "internal" registers
- The advantages:
  - Cycle time is much shorter
  - Different instructions take different numbers of cycles to complete
  - A functional unit can be used more than once per instruction


## **MIPS multi-cycle datapath**



# 5 steps an instruction could take:

- 1. Instruction Fetch (IFetch):
  - Fetch instruction at address (\$PC)
  - Store the instruction in register <u>IR</u>
  - Increment PC
- 2. Instruction Decode and Register Read (Decode):
  - Decode the instruction type and read registers
  - Store the register contents in registers  $\underline{\textbf{A}}$  and  $\underline{\textbf{B}}$
  - Compute the potential branch target address and store it in <u>ALUOut</u>
- 3. Execution, Memory Address Computation, or Branch Completion (Execute):
  - Compute memory address (for LW and SW), or
  - Perform R-type operation (for R-type instruction), or
  - Update PC (for Branch and Jump)
  - Store memory address or register operation result in ALUOut


- 4. Memory Access or R-type instruction completion (MemRead/RegWrite/MemWrite):
  - Read memory at address ALUOut and store it in MDR (load), or
  - Write ALUOut content into the register file (R-type), or
  - Write memory at address ALUOut with the value in **B** (store)
- 5. Write-back step (WrBack):
  - Write the data read from memory (held in MDR) into the register file

#### Number of cycles for an instruction:

- R-type: 4
- lw: 5
- sw: 4
- Branch or Jump: 3
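A small sketch for counting cycles in this multi-cycle design, assuming (as above) that every instruction of a class always takes the same number of cycles and that instructions never overlap. The helper names `total_cycles` and `whats_at` are hypothetical, and the step names follow the five steps listed earlier. The usage lines apply it to the code sequence in the questions below, with the 2 ns and 10 ns clock periods taken from the last of those questions.

```python
# Cycles per instruction class in the multi-cycle design.
CYCLES = {"R-type": 4, "lw": 5, "sw": 4, "branch": 3, "jump": 3}

# The five steps in order; an instruction needing n cycles uses the first n.
STEPS = ["IFetch", "Decode", "Execute", "MemAccess/RegWrite", "WrBack"]


def total_cycles(program):
    """Sum the cycle counts of a list of instruction classes."""
    return sum(CYCLES[kind] for kind in program)


def whats_at(program, cycle):
    """Return (instruction index, step name) active during a 1-based cycle."""
    start = 1
    for i, kind in enumerate(program):
        n = CYCLES[kind]
        if start <= cycle < start + n:
            return i, STEPS[cycle - start]
        start += n
    raise ValueError("cycle is past the end of the program")


# lw, lw, beq (not taken), add, sw
prog = ["lw", "lw", "branch", "R-type", "sw"]
print(total_cycles(prog))  # 21
print(whats_at(prog, 8))   # (1, 'Execute'): address calculation of the 2nd lw
print(whats_at(prog, 16))  # (3, 'Execute'): the add
# Multi-cycle at 2 ns per cycle vs. single-cycle at 10 ns per instruction (ns)
print(total_cycles(prog) * 2, len(prog) * 10)  # 42 50
```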

# Instruction execution (summary):

| Step name | Action for R-type instructions | Action for memory-reference instructions | Action for branches | Action for jumps |
|---|---|---|---|---|
| Instruction fetch | IR = Mem[PC]; PC = PC + 4 (all instruction types) | | | |
| Instruction decode/register fetch | A = RF[IR[25:21]]; B = RF[IR[20:16]]; ALUOut = PC + (sign-extend(IR[15:0]) << 2) (all instruction types) | | | |
| Execution, address computation, branch/jump completion | ALUOut = A op B | ALUOut = A + sign-extend(IR[15:0]) | if (A == B) then PC = ALUOut | PC = PC[31:28] ∥ (IR[25:0] << 2) |
| Memory access or R-type completion | RF[IR[15:11]] = ALUOut | Load: MDR = Mem[ALUOut], or Store: Mem[ALUOut] = B | | |
| Memory read completion | | Load: RF[IR[20:16]] = MDR | | |
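The table can be read almost as a program. The sketch below steps one `lw` through the five register transfers in order, using plain Python variables for the internal registers IR, A, B, ALUOut and MDR. It is a simplification under stated assumptions: memory is a hypothetical dict from byte address to 32-bit word, registers are unbounded Python ints, and no overflow or alignment checks are made. The helper names are hypothetical.

```python
def sign_extend16(x):
    """Sign-extend a 16-bit field to a Python int."""
    return x - (1 << 16) if x & 0x8000 else x


def bits(word, hi, lo):
    """Extract the bit field word[hi:lo]."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def run_lw_multicycle(mem, rf, pc):
    """Execute one lw using the table's five steps; returns the new PC."""
    # Step 1: instruction fetch
    ir = mem[pc]
    pc = pc + 4
    # Step 2: decode/register fetch (B is read for every instruction, unused by lw;
    # the branch target is computed speculatively and later overwritten)
    a = rf[bits(ir, 25, 21)]
    b = rf[bits(ir, 20, 16)]
    alu_out = pc + (sign_extend16(bits(ir, 15, 0)) << 2)
    # Step 3: effective address computation
    alu_out = a + sign_extend16(bits(ir, 15, 0))
    # Step 4: memory access
    mdr = mem[alu_out]
    # Step 5: memory read completion (write back into rt)
    rf[bits(ir, 20, 16)] = mdr
    return pc


rf = [0] * 32
rf[11] = 0x1000                 # $t3 holds a (hypothetical) base address
mem = {0x0: 0x8D6A0004,         # lw $t2, 4($t3)
       0x1004: 99}              # data word at $t3 + 4
pc = run_lw_multicycle(mem, rf, 0x0)
print(rf[10], pc)               # 99 4
```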


### Some questions:

#### How many cycles will it take to execute this code?

```
lw  $t2, 0($t3)
lw  $t3, 4($t3)
beq $t2, $t3, Label   # assume branch is not taken
add $t5, $t2, $t3
sw  $t5, 8($t3)
Label: ...
```

5 + 5 + 3 + 4 + 4 = 21 cycles

- What is being done during the 8th cycle of execution?
  - Computing the memory address of the second lw: 4 + content of \$t3
- In what cycle does the addition of \$t2 and \$t3 take place?
  - Cycle 16
- How would performance compare if the multi-cycle clock period is 2 ns and the single-cycle clock period is 10 ns?

```
(21 CCs x 2 ns) vs. (5 CCs x 10 ns): 42 ns vs. 50 ns
```

## **Foreshadowing: pipelines**



Data must be passed from one stage to the next through pipeline registers/latches, which hold temporary values between clock cycles along with the information needed for later execution.
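A minimal sketch of the pipelining idea, assuming an ideal 5-stage pipeline with no hazards or stalls (the lw/beq sequence in the questions above has a dependence on \$t3 that a real pipeline would have to handle). Each cycle, every instruction in flight advances one stage, and the values it needs travel with it in the pipeline registers between stages. The function name `pipeline_timeline` is hypothetical.

```python
# Ideal 5-stage pipeline (assumption: no hazards, no stalls).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_timeline(instructions):
    """Return {cycle: {stage: instruction}} for an ideal pipeline.

    Instruction i enters IF in cycle i + 1 and moves one stage per cycle;
    between stages its values sit in the IF/ID, ID/EX, EX/MEM and MEM/WB
    pipeline registers.
    """
    timeline = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            timeline.setdefault(i + s + 1, {})[stage] = instr
    return timeline


tl = pipeline_timeline(["lw", "lw", "beq", "add", "sw"])
print(len(tl))       # 9 (cycles, ignoring hazards)
print(tl[5]["EX"])   # beq (in cycle 5 all five stages are busy)
```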

## Another way to look at it...



(Tying computer architecture to logic design...)

# A SHORT DISCUSSION ABOUT CONTROL LOGIC


### The HW needed, plus control





## Single cycle control: inputs & outputs



## **Control Signal Table**

| Signal | Add (R-type) | Sub (R-type) | LW | SW | BEQ |
|--------|--------------|--------------|----|----|-----|
| Func (input) | 100000 | 100010 | xxxxxx | xxxxxx | xxxxxx |
| Op (input) | 000000 | 000000 | 100011 | 101011 | 000100 |
| RegDst | 1 | 1 | 0 | x | x |
| ALUSrc | 0 | 0 | 1 | 1 | 0 |
| Mem-to-Reg | 0 | 0 | 1 | x | x |
| Reg. Write | 1 | 1 | 1 | 0 | 0 |
| Mem. Read | 0 | 0 | 1 | 0 | 0 |
| Mem. Write | 0 | 0 | 0 | 1 | 0 |
| Branch | 0 | 0 | 0 | 0 | 1 |
| ALUOp | 10 (ALU adds) | 10 (ALU subtracts) | 00 | 00 | 01 |

All rows below Op are outputs; x = don't care.
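The main control is effectively a lookup keyed by the Op field, which the sketch below transcribes from the table into a Python dict (None stands for a don't-care x). For R-type it uses ALUOp = 10, the encoding given for ALU control in this lecture; whether the ALU then adds or subtracts is decided later from Func, which is why add and sub share one entry. `main_control` is a hypothetical name.

```python
# Single-cycle main control, keyed by the 6-bit Op field; None = don't care.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemToReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-type
    0b100011: dict(RegDst=0, ALUSrc=1, MemToReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemToReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemToReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}


def main_control(op):
    """Return the control outputs for a 6-bit opcode (Func is not needed here)."""
    try:
        return CONTROL[op]
    except KeyError:
        raise ValueError(f"unsupported opcode {op:06b}") from None


print(main_control(0b100011)["MemRead"])  # 1 (lw reads memory)
```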


## Main control, ALU control



- Use the Op field to generate ALUOp (encoding); ALUOp is the control signal fed to the ALU control block
- Use Func field and ALUOp to generate ALUctr (decoding)
  - Specifically sets 3 ALU control signals
    - B-Invert, Carry-in, operation

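## Build a Main Control Block

*Figure: Main control and ALU control. The 6-bit Op field feeds the Main Control block, which produces the 2-bit ALUOp; ALUOp and the 6-bit Func field feed the ALU Control block, which produces the 3-bit ALUctr.*

The outputs of main control become inputs to ALU control.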

|               | R-type   | lw  | sw  | beq      |
|---------------|----------|-----|-----|----------|
| ALU Operation | "R-type" | add | add | subtract |
| ALUOp<1:0>    | 10       | 00  | 00  | 01       |

We have 8 bits of input to our ALU control block; we need 3 bits of output...

Or in other words, ALUOp encodes:

- 00 = ALU performs add
- 01 = ALU performs subtract
- 10 = ALU does what the function code says
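A sketch of the ALU control block as a Python function: 2 bits of ALUOp plus the 6-bit Func field in (8 input bits), 3 bits of ALUctr out. The funct codes are the standard MIPS ones. The ALUctr bit patterns are an assumption, since the lecture does not list them here: they follow the classic Hennessy and Patterson 3-bit encoding, in which the top bit drives B-invert and carry-in together for subtraction.

```python
# Assumed 3-bit ALUctr encoding (classic H&P): MSB = B-invert/carry-in,
# low two bits select the operation.
AND, OR, ADD, SUB, SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# Standard MIPS funct codes for the R-type instructions considered here.
FUNCT_TO_ALUCTR = {
    0b100000: ADD,  # add
    0b100010: SUB,  # sub
    0b100100: AND,  # and
    0b100101: OR,   # or
    0b101010: SLT,  # slt
}


def alu_control(alu_op, funct):
    """Map 2-bit ALUOp and 6-bit Func to a 3-bit ALUctr value."""
    if alu_op == 0b00:   # lw/sw: add for the address calculation
        return ADD
    if alu_op == 0b01:   # beq: subtract to compare the registers
        return SUB
    if alu_op == 0b10:   # R-type: the function code decides
        return FUNCT_TO_ALUCTR[funct]
    raise ValueError("ALUOp 11 is unused in this design")


print(format(alu_control(0b10, 0b101010), "03b"))  # 111 (slt)
```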






# **Generating ALUctr**

- We want these outputs:








# The logic:



### Well, here's what we did...

Single cycle MIPS machine



## **Controller FSM for 6-instruction processor**

#### **Recall:**

- With a multi-cycle datapath, control signals must be generated at the right time (in the right cycle)
- This is captured by a finite state machine (FSM)
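The FSM's next-state logic can be sketched as a Python function. This is an illustration, not the lecture's diagram: the state names are borrowed loosely from the Hennessy and Patterson multi-cycle controller, only R-type, lw, sw, beq and j are handled, and the control outputs asserted in each state are omitted. Walking the states for one instruction reproduces the cycle counts given earlier (R-type 4, lw 5, sw 4, branch/jump 3).

```python
OP_RTYPE, OP_LW, OP_SW, OP_BEQ, OP_J = 0b000000, 0b100011, 0b101011, 0b000100, 0b000010


def next_state(state, op):
    """Next-state function of a multi-cycle controller FSM (sketch)."""
    if state == "Fetch":
        return "Decode"
    if state == "Decode":
        if op in (OP_LW, OP_SW):
            return "MemAdr"
        if op == OP_RTYPE:
            return "Execute"
        if op == OP_BEQ:
            return "Branch"
        if op == OP_J:
            return "Jump"
        raise ValueError(f"undefined instruction, op={op:06b}")
    if state == "MemAdr":
        return "MemRead" if op == OP_LW else "MemWrite"
    if state == "MemRead":
        return "MemWriteback"
    if state == "Execute":
        return "ALUWriteback"
    if state in ("MemWriteback", "MemWrite", "ALUWriteback", "Branch", "Jump"):
        return "Fetch"  # instruction complete; start the next one
    raise ValueError(f"unknown state {state!r}")


def states_visited(op):
    """List the states one instruction passes through, starting at Fetch."""
    path = ["Fetch"]
    while True:
        nxt = next_state(path[-1], op)
        if nxt == "Fetch":
            return path
        path.append(nxt)


print(states_visited(OP_LW))  # ['Fetch', 'Decode', 'MemAdr', 'MemRead', 'MemWriteback']
```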




*Table 8.2: Instruction opcodes (columns: Opcode, Instruction).*

# Might also store control signals in ROM





# **MIPS FSM diagram**




### **Exceptions**

- Exceptions: unexpected events from within the processor
  - arithmetic overflow
  - undefined instruction
  - switching from user program to OS
- Interrupts: unexpected events from outside of the processor
  - I/O request
- Consequence: alter the normal flow of instruction execution
- Key issues:
  - detection
  - action
    - save the address of the offending instruction in the EPC
    - transfer control to OS at some specified address
- Exception type indication:
  - status register
  - interrupt vector

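This is another reason that register naming conventions are important: the OS code that handles an exception must not clobber the interrupted program's registers (MIPS reserves \$k0 and \$k1 for the OS kernel).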
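A minimal Python sketch of the detection and action steps for one case, arithmetic overflow on a signed 32-bit add: save the offending instruction's address in EPC, record the reason, and transfer control to the OS. It assumes a status-register (Cause) style of indication rather than a vectored one. The exception code 12 is the MIPS code for overflow; the handler address 0x80000180 is an assumption (the general exception vector used in Hennessy and Patterson's MIPS examples), as are the names `CPUState`, `raise_exception` and `add_with_overflow_check`.

```python
from dataclasses import dataclass
from typing import Optional

HANDLER_ADDR = 0x80000180  # assumed OS exception-handler entry point
CAUSE_OVERFLOW = 12        # MIPS exception code for arithmetic overflow


@dataclass
class CPUState:
    pc: int
    epc: Optional[int] = None
    cause: Optional[int] = None


def raise_exception(cpu, faulting_pc, code):
    """Save the offending instruction's address, record why, jump to the OS."""
    cpu.epc = faulting_pc
    cpu.cause = code
    cpu.pc = HANDLER_ADDR


def add_with_overflow_check(cpu, x, y):
    """Signed 32-bit add; returns the sum, or None if it raised an exception."""
    faulting_pc = cpu.pc
    result = x + y
    if not -(1 << 31) <= result < (1 << 31):  # detection
        raise_exception(cpu, faulting_pc, CAUSE_OVERFLOW)  # action
        return None
    cpu.pc += 4  # normal flow: next instruction
    return result


cpu = CPUState(pc=0x00400010)
add_with_overflow_check(cpu, 0x7FFFFFFF, 1)
print(hex(cpu.epc), cpu.cause, hex(cpu.pc))  # 0x400010 12 0x80000180
```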



## **Datapath with Exception Handling**

