# L07-L09 recap: Fundamental lesson(s)

- Over the next 3 lectures (using the MIPS ISA as context)
   I'll explain:
  - How functions are treated and processed in assembly
  - How system calls are enabled in assembly
  - How exceptions are handled in assembly
- I'll also explain why it's important that register conventions be followed

#### L07-L09 recap: Why it's important...

- If you ever write a compiler or OS some day, you will need to be aware of, and code for, all of the issues to be discussed over the next 3 lectures
- If you understand what architectural overhead may be associated with (compiled) procedure calls, you should be able to write much more efficient HLL code

# Lecture 10 The MIPS datapath

Suggested reading: HP Chapter 4.1-4.4

# Multicore processors and programming



#### **Processor components**



**Goal:** describe the fundamental components required in a single core of a modern microprocessor as well as how they interact with each other, with main memory, and with external storage media.

#### **Processor comparison**






Writing more efficient code



The right HW for the right application



**HLL** code translation

#### Fundamental lesson(s)

 Today we'll discuss what hardware is required to execute an instruction, as well as how to best "organize" it.

# Why it's important...

- If you ever design the HW for a microprocessor, etc.
   you'll need to be aware of these types of issues
- Understanding organization, and how it impacts the delay of something like a memory reference, will make you a better programmer
- It is now more and more important to design HW/SW simultaneously

#### The organization of a computer

#### **Von Neumann Model:**

- Stored-program machine: instructions are represented as numbers
- Programs can be stored in memory to be read/written just like numbers.



#### **Review: Functions of Each Component**

- Datapath: performs data manipulation operations
  - arithmetic logic unit (ALU)
  - floating point unit (FPU)
- Control: directs operation of other components
  - finite state machines
  - micro-programming (to be discussed)
- Memory: stores instructions and data
  - random access vs. sequential access
  - volatile vs. non-volatile
  - RAMs (SRAM, DRAM), ROMs (PROM, EEPROM), disk
  - tradeoff between speed and cost/bit
- Input/Output and I/O devices: interface to environment
  - mouse, keyboard, display, device drivers

#### Let's "derive" the MIPS datapath:

- To simplify things a bit we'll just look at a few instructions:
  - memory-reference: lw, sw
  - arithmetic-logical: add, sub, and, or, slt
  - branching: beq, j

Very common instructions

- Organizational overview:
  - fetch an instruction based on the content of PC
  - decode the instruction
  - fetch operands
    - (read one or two registers)
  - execute
  - store result
    - (write to memory / write to register / update PC)



With the Von Neumann, RISC model, we do similar things for each instruction
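The organizational overview above (fetch, decode, fetch operands, execute, store result) can be sketched as a small interpreter loop. This is only an illustration under stated assumptions: the names `step` and `sign_extend16`, the use of a single Python dict as a unified instruction/data memory, and the choice to ignore 32-bit wraparound are all made up for this sketch and are not part of the lecture. The opcode and function-code values are the standard MIPS encodings for the instructions listed above.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


def step(pc, regs, mem):
    """Run one instruction and return the next PC (hypothetical helper)."""
    instr = mem[pc]                       # fetch, based on the content of PC
    opcode = (instr >> 26) & 0x3F         # decode the instruction fields
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    funct = instr & 0x3F
    imm = sign_extend16(instr & 0xFFFF)
    a, b = regs[rs], regs[rt]             # fetch operands (one or two registers)
    next_pc = pc + 4

    if opcode == 0x00:                    # R-type: add, sub, and, or, slt
        results = {0x20: a + b, 0x22: a - b, 0x24: a & b,
                   0x25: a | b, 0x2A: int(a < b)}
        regs[rd] = results[funct]         # store result: write to register
    elif opcode == 0x23:                  # lw
        regs[rt] = mem[a + imm]           # store result: write to register
    elif opcode == 0x2B:                  # sw
        mem[a + imm] = b                  # store result: write to memory
    elif opcode == 0x04:                  # beq
        if a == b:
            next_pc += imm << 2           # store result: update PC
    elif opcode == 0x02:                  # j
        next_pc = (next_pc & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    else:
        raise ValueError(f"unsupported opcode {opcode:#x}")

    regs[0] = 0                           # register $zero always reads as 0
    return next_pc


# Usage: add $t2, $t0, $t1 with $t0 = 5 and $t1 = 7 ($t0..$t2 are registers 8..10).
regs = [0] * 32
regs[8], regs[9] = 5, 7
mem = {0: (8 << 21) | (9 << 16) | (10 << 11) | 0x20}
pc = step(0, regs, mem)
```

Note that instructions and data share the same `mem` dictionary, which mirrors the stored-program idea: an instruction is just a number that happens to be fetched through the PC.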

# A Single Cycle Datapath



#### **PERFORMANCE**

#### Single-Cycle Implementation

- Single-cycle, fixed-length clock:
  - CPI = 1
  - Clock cycle = propagation delay of the longest datapath operations among all instruction types
  - Easy to implement
- How to determine cycle length?
- Calculate cycle time assuming negligible delays except:
  - memory (2ns), ALU and adders (2ns), register file access (1ns)

```
    R-type: max {mem + RF + ALU + RF, Add} = 6ns
    LW: max{mem + RF + ALU + mem + RF, Add} = 8ns
    SW: max{mem + RF + ALU + mem, Add} = 7ns
    BEQ: max{mem + RF + ALU, max{Add, mem + Add}} = 5ns
```

What is the CC time?
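The calculation above can be checked with a few lines of code. This is a minimal sketch using the delays given on the slide; the variable names (`MEM`, `ALU`, `ADD`, `RF`, `paths`) are chosen here for illustration only.

```python
# Component delays in ns, as assumed on the slide.
MEM, ALU, ADD, RF = 2, 2, 2, 1

# Critical path per instruction type, mirroring the expressions above.
paths = {
    "R-type": max(MEM + RF + ALU + RF, ADD),
    "LW": max(MEM + RF + ALU + MEM + RF, ADD),
    "SW": max(MEM + RF + ALU + MEM, ADD),
    "BEQ": max(MEM + RF + ALU, max(ADD, MEM + ADD)),
}

# With a single fixed-length clock, the slowest instruction sets the cycle time.
cycle_time = max(paths.values())
slowest = max(paths, key=paths.get)
print(paths, cycle_time, slowest)
```

Under these delays the clock cycle must be as long as the LW path, so every instruction pays for the slowest one.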

#### Earlier, we spoke about the "multi-cycle" datapath



#### The multi-cycle approach (& benefits):

- Break an instruction into smaller steps
- Execute each step in one cycle.
- Execution sequence:
  - Balance amount of work to be done
  - Restrict each cycle to use only one major functional unit
  - At the end of a cycle
    - Store values for use in later cycles
    - Introduce additional "internal" registers

#### The advantages:

- Cycle time much shorter
- Diff. inst. take different # of cycles to complete
- Functional unit used more than once per instruction

# MIPS multi-cycle datapath



Note introduction of temporary storage registers

#### 5 steps an instruction could take:

#### 1. Instruction Fetch (Ifetch):

- Fetch instruction at address (\$PC)
- Store the instruction in register <u>IR</u>
- Increment PC

#### 2. Instruction Decode and Register Read (Decode):

- Decode the instruction type and read register
- Store the register contents in registers A and B
- Compute the branch target address (in case the instruction turns out to be a branch) and store it in <u>ALUOut</u>

#### 3. Execution, Memory Address Computation, or Branch Completion (Execute):

- Compute memory address (for LW and SW), or
- Perform R-type operation (for R-type instruction), or
- Update PC (for Branch and Jump)
- Store memory address or register operation result in <u>ALUOut</u>

#### 4. Memory Access or R-type instruction completion (MemRead/RegWrite/MemWrite):

- Read memory at address ALUOut and store it in MDR, or
- Write ALUOut content into register file, or
- Write memory at address ALUOut with the value in B

#### 5. Write-back step (WrBack):

- Write the memory content read into register file

#### Number of cycles for an instruction:

- R-type: 4
- lw: 5
- sw: 4
- Branch or Jump: 3

## Instruction execution (summary):

| Step name | Action for R-type instructions | Action for memory-reference instructions | Action for branches | Action for jumps |
|---|---|---|---|---|
| Instruction fetch | IR = Mem[PC], PC = PC + 4 | same as R-type | same as R-type | same as R-type |
| Instruction decode/register fetch | A = RF[IR[25:21]], B = RF[IR[20:16]], ALUOut = PC + (sign-extend(IR[15:0]) << 2) | same as R-type | same as R-type | same as R-type |
| Execution, address computation, branch/jump completion | ALUOut = A op B | ALUOut = A + sign-extend(IR[15:0]) | if (A == B) then PC = ALUOut | PC = PC[31:28] concatenated with (IR[25:0] << 2) |
| Memory access or R-type completion | RF[IR[15:11]] = ALUOut | Load: MDR = Mem[ALUOut] or Store: Mem[ALUOut] = B | | |
| Memory read completion | | Load: RF[IR[20:16]] = MDR | | |

#### Some questions:

How many cycles will it take to execute this code?

```
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label #assume branch is not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: ...
```

5 + 5 + 3 + 4 + 4 = 21

- What is being done during the 8th cycle of execution?
   Compute memory address: 4 + content of \$t3
- In what cycle does the addition of \$t2 and \$t3 take place?
   Cycle 16: the add occupies cycles 14-17 and uses the ALU in its third step
- How would performance compare if the multi-cycle clock period is 2 ns and the single-cycle clock period is 10 ns?

(21 CCs x 2 ns) vs. (5 CCs x 10 ns): 42 ns vs. 50 ns
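The answers to these questions can be reproduced by laying the instructions out on a cycle-by-cycle timeline. This is a rough sketch: the `CYCLES` table restates the per-instruction cycle counts given earlier, while the `schedule` helper and the step labels are names invented here for illustration.

```python
# Cycles per instruction in the multi-cycle datapath (from the list above).
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}
STEPS = ["Ifetch", "Decode", "Execute", "Mem/R-type completion", "WrBack"]

program = ["lw", "lw", "beq", "add", "sw"]   # branch assumed not taken


def schedule(instructions):
    """Return a list whose entry i-1 describes what happens in clock cycle i."""
    timeline = []
    for index, op in enumerate(instructions):
        for step in range(CYCLES[op]):
            timeline.append((index, op, STEPS[step]))
    return timeline


timeline = schedule(program)
total_cycles = len(timeline)                               # whole program
cycle_8 = timeline[8 - 1]                                  # activity in cycle 8
add_cycle = timeline.index((3, "add", "Execute")) + 1      # cycle of the addition

multi_cycle_ns = total_cycles * 2       # 2 ns multi-cycle clock
single_cycle_ns = len(program) * 10     # 10 ns single-cycle clock, CPI = 1
print(total_cycles, cycle_8, add_cycle, multi_cycle_ns, single_cycle_ns)
```

Cycle 8 falls in the Execute step of the second `lw`, which is the memory address computation, and the addition happens in the Execute step of the `add`.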

#### Foreshadowing: pipelines



Data must be stored from one stage to the next in pipeline registers/latches. These hold temporary values between clocks, along with the info. needed for execution.

#### Another way to look at it...



(Tying computer architecture to logic design...)

# A SHORT DISCUSSION ABOUT CONTROL LOGIC

#### The HW needed, plus control

Single cycle MIPS machine



# Single cycle control: inputs & outputs



# **Control Signal Table**

| Signal       | Add (R-type) | Sub (R-type) | LW     | SW     | BEQ    |
|--------------|--------------|--------------|--------|--------|--------|
| Func (input) | 100000       | 100010       | XXXXXX | XXXXXX | XXXXXX |
| Op (input)   | 000000       | 000000       | 100011 | 101011 | 000100 |
| RegDst       | 1            | 1            | 0      | X      | X      |
| ALUSrc       | 0            | 0            | 1      | 1      | 0      |
| Mem-to-Reg   | 0            | 0            | 1      | X      | X      |
| Reg. Write   | 1            | 1            | 1      | 0      | 0      |
| Mem. Read    | 0            | 0            | 1      | 0      | 0      |
| Mem. Write   | 0            | 0            | 0      | 1      | 0      |
| Branch       | 0            | 0            | 0      | 0      | 1      |
| ALUOp †      | Add          | Sub          | 00     | 00     | 01     |

Func and Op are the inputs; all other rows are outputs.
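One way to read the control signal table is as a lookup from opcode to output signals. The sketch below is an illustration, not the actual hardware implementation: the dictionary layout and the `main_control` function are assumptions, `None` stands for a don't-care (X) entry, and the R-type ALUOp value of `10` is taken from the ALUOp encoding table that appears later in these notes, since the table above lists it symbolically as Add/Sub.

```python
# Main control as a lookup table keyed by the 6-bit Op field.
# None marks a don't-care (X) output.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),      # R-type
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),      # LW
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),      # SW
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),      # BEQ
}


def main_control(opcode):
    """Return the control signals for an opcode (hypothetical helper)."""
    if opcode not in CONTROL:
        raise ValueError(f"no control entry for opcode {opcode:06b}")
    return CONTROL[opcode]


lw_signals = main_control(0b100011)
print(lw_signals)
```

In the single-cycle machine these outputs depend only on the Op field, which is why a plain table is enough here; no state is carried between instructions.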


#### Main control, ALU control



- Use OP field to generate ALUOp (encoding)
  - Control signal fed to ALU control block
- Use Func field and ALUOp to generate ALUctr (decoding)
  - Specifically sets 3 ALU control signals
    - B-Invert, Carry-in, operation

#### Main control, ALU control



|                      | R-type   | lw  | sw  | beq      |
|----------------------|----------|-----|-----|----------|
| <b>ALU Operation</b> | "R-type" | add | add | subtract |
| ALUOp<1:0>           | 10       | 00  | 00  | 01       |

We have 8 bits of input to our ALU control block; we need 3 bits of output...

#### Or in other words...

00 = ALU performs add

01 = ALU performs sub

10 = ALU does what function code says

#### **Generating ALUctr**

We want these outputs:

| ALU Operation | and | or  | add | sub | slt |
|---------------|-----|-----|-----|-----|-----|
| ALUctr<2:0>   | 000 | 001 | 010 | 110 | 111 |

- ALUctr<2> = B-negate (C-in & B-invert)
  - Invert B and C-in must be a 1 for subtract
- ALUctr<1> = Select ALU Output
- ALUctr<0> = Select ALU Output
  - mux inputs: and - 00, or - 01, adder - 10, less - 11
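The two-level decoding described above (ALUOp plus the Func field in, ALUctr out) can be sketched as follows. The function names `alu_control` and `alu` are invented for this example, and the ALU model uses unbounded Python integers, so 32-bit overflow is not modelled. The function codes are the standard MIPS values for add, sub, and, or, and slt.

```python
# Func field -> ALUctr<2:0>, used only when ALUOp says "R-type" (10).
FUNCT_TO_ALUCTR = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and 6-bit Func field to the 3-bit ALUctr."""
    if alu_op == 0b00:            # lw / sw: ALU performs add
        return 0b010
    if alu_op == 0b01:            # beq: ALU performs sub
        return 0b110
    if alu_op == 0b10:            # R-type: ALU does what the function code says
        return FUNCT_TO_ALUCTR[funct]
    raise ValueError(f"unexpected ALUOp {alu_op:02b}")


def alu(alu_ctr, a, b):
    """Behavioural model of the ALU driven by ALUctr."""
    b_negate = (alu_ctr >> 2) & 1           # ALUctr<2>: B-invert and carry-in
    select = alu_ctr & 0b11                 # ALUctr<1:0>: output mux select
    operand = ~b if b_negate else b         # invert B when B-negate is set
    total = a + operand + b_negate          # adder with carry-in = B-negate
    if select == 0b00:
        return a & operand                  # and
    if select == 0b01:
        return a | operand                  # or
    if select == 0b10:
        return total                        # adder output (add or sub)
    return 1 if total < 0 else 0            # less: sign of a - b


ctr = alu_control(0b10, 0b100010)   # R-type sub
print(ctr, alu(ctr, 7, 5))
```

Inverting B and setting carry-in to 1 computes `a + ~b + 1`, which is `a - b` in two's complement; this is why one B-negate bit is enough to turn the adder into a subtractor.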



#### The logic:




# Well, here's what we did...

Single cycle MIPS machine



#### Controller FSM for 6-instruction processor

#### Recall:

- With multi-cycle, need to generate control signals at right time
- Captured by FSM
- Each level analogous to 1 CC



TABLE 8.2 Instruction opcodes.

| Opcode | Instruction |
|--------|-------------|
| 0000   | MOV Ra, d   |

# MIPS FSM diagram



# Might also store control signals in ROM





#### **Exceptions**

- Exceptions: unexpected events from within the processor
  - arithmetic overflow
  - undefined instruction
  - switching from user program to OS
- Interrupts: unexpected events from outside of the processor
  - I/O request
- Consequence: alter the normal flow of instruction execution
- Key issues:
  - detection
  - action
    - save the address of the offending instruction in the EPC
    - transfer control to OS at some specified address
- Exception type indication:
  - status register
  - interrupt vector

Another reason that register naming conventions are important.
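The "detection" and "action" steps listed above can be illustrated with a small sketch of an add that checks for arithmetic overflow. Everything specific here is an assumption made for illustration: the handler address, the numeric cause codes, the `state` dictionary standing in for the EPC and status/cause registers, and the function names are not taken from the lecture.

```python
# Assumed values for illustration only.
HANDLER_ADDRESS = 0x80000180          # hypothetical OS entry point
CAUSE_UNDEFINED_INSTRUCTION = 0       # hypothetical cause encodings
CAUSE_ARITHMETIC_OVERFLOW = 1


def fits_in_32_bits(value):
    """True if value is representable as a signed 32-bit integer."""
    return -(2 ** 31) <= value <= 2 ** 31 - 1


def add_with_overflow_check(pc, a, b, state):
    """Return (next_pc, result); on overflow, redirect to the handler."""
    result = a + b
    if not fits_in_32_bits(result):                  # detection
        state["EPC"] = pc                            # save offending address
        state["Cause"] = CAUSE_ARITHMETIC_OVERFLOW   # exception type indication
        return HANDLER_ADDRESS, None                 # transfer control to OS
    return pc + 4, result


state = {}
next_pc, result = add_with_overflow_check(0x00400000, 2 ** 31 - 1, 1, state)
print(hex(next_pc), result, state)
```

The handler can later inspect `state["Cause"]` to decide what to do and use `state["EPC"]` to identify, or return past, the offending instruction.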

# FSM with Exception Handling



#### Datapath with Exception Handling

