# Lecture 21 Pipelining Hazards, Branches, Modern

University of Notre Dame, Department of Computer Science & Engineering

#### CSE 30321 - Lecture 21 - Pipelining (Hazards, Branches, Modern)

## How do we deal with hazards?

- Often, pipeline must be stalled
- Stalling pipeline usually lets some instruction(s) in pipeline proceed, another/others wait for data, resource, etc.
- A note on terminology:
  - If we say an instruction was "issued later than instruction x", we mean that it was issued after instruction x and is not as far along in the pipeline
  - If we say an instruction was "issued earlier than instruction x", we mean that it was issued before instruction x and is further along in the pipeline

# The hazards of pipelining

- Pipeline hazards prevent next instruction from executing during designated clock cycle
- There are 3 classes of hazards:
  - Structural Hazards:
    - Arise from resource conflicts
    - HW cannot support all possible combinations of instructions
  - Data Hazards:
    - Occur when given instruction depends on data from an instruction ahead of it in pipeline
  - Control Hazards:
    - Result from branch, other instructions that change flow of program (i.e. change PC)


# Stalls and performance

- Stalls impede progress of a pipeline and result in deviation from 1 instruction executing/clock cycle
- Pipelining can be viewed as a way to:
  - Decrease CPI or clock cycle time for instruction
  - Let's see what effect stalls have on CPI...
- CPI pipelined =
  - Ideal CPI + Pipeline stall cycles per instruction
  - 1 + Pipeline stall cycles per instruction
- Ignoring overhead and assuming stages are balanced:

$$\text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{pipeline stall cycles per instruction}}$$

(Recall combinational logic slide)
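The speedup relation above can be sketched numerically. This is only an illustration: the function names and the sample numbers (an unpipelined CPI of 5 and 0.5 stall cycles per instruction) are assumptions, not values from the lecture.

```python
def pipelined_cpi(stall_cycles_per_instr, ideal_cpi=1.0):
    """CPI pipelined = ideal CPI + pipeline stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instr


def speedup(cpi_unpipelined, stall_cycles_per_instr):
    """Speedup, ignoring overhead and assuming balanced stages."""
    return cpi_unpipelined / pipelined_cpi(stall_cycles_per_instr)


# Hypothetical machine: unpipelined CPI of 5, 0.5 stall cycles per instruction.
print(round(speedup(5, 0.5), 2))   # 3.33
print(speedup(5, 0.0))             # 5.0 (no stalls: speedup equals unpipelined CPI)
```

With no stalls the speedup equals the unpipelined CPI; every stall cycle per instruction adds directly to the denominator.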








| Inst. # / Clock number | 1  | 2  | 3  | 4     | 5   | 6   | 7  | 8   | 9   | 10  |
|------------------------|----|----|----|-------|-----|-----|----|-----|-----|-----|
| LOAD                   | IF | ID | EX | MEM   | WB  |     |    |     |     |     |
| Inst. i+1              |    | IF | ID | EX    | MEM | WB  |    |     |     |     |
| Inst. i+2              |    |    | IF | ID    | EX  | MEM | WB |     |     |     |
| Inst. i+3              |    |    |    | stall | IF  | ID  | EX | MEM | WB  |     |
| Inst. i+4              |    |    |    |       |     | IF  | ID | EX  | MEM | WB  |
| Inst. i+5              |    |    |    |       |     |     | IF | ID  | EX  | MEM |
| Inst. i+6              |    |    |    |       |     |     |    | IF  | ID  | EX  |


- In clock cycle 4 the LOAD's memory access (MEM) "steals" the instruction fetch cycle of Inst. i+3, so the pipeline stalls for one cycle (a structural hazard)
- Thus, no instruction completes in clock cycle 8
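The table can be reproduced with a small sketch. It assumes a single memory port shared by instruction fetch and the LOAD's MEM stage, and no other sources of stalls; `fetch_cycles` and its argument are hypothetical names chosen for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def fetch_cycles(is_load):
    """Return the IF cycle of each instruction, given one shared memory port."""
    port_busy = set()   # cycles in which a LOAD's MEM stage owns the port
    cycles = []
    cycle = 1
    for load in is_load:
        while cycle in port_busy:   # structural hazard: delay the fetch
            cycle += 1
        cycles.append(cycle)
        if load:
            port_busy.add(cycle + STAGES.index("MEM"))
        cycle += 1
    return cycles


fetch = fetch_cycles([True] + [False] * 6)   # LOAD, then Inst. i+1 .. i+6
writeback = [f + STAGES.index("WB") for f in fetch]
print(fetch)       # [1, 2, 3, 5, 6, 7, 8]
print(writeback)   # [5, 6, 7, 9, 10, 11, 12]
```

No entry in `writeback` equals 8, matching the observation that no instruction completes in that cycle.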





## Forwarding

- Problem illustrated on previous slide can actually be solved relatively easily – with <u>forwarding</u>
- In this example, result of the ADD instruction not really needed until after ADD actually produces it
- Can we move the result from EX/MEM register to the beginning of ALU (where SUB needs it)?
  - Yes! Hence this slide!
- Generally speaking:
  - Forwarding occurs when a result is passed directly to functional unit that requires it.
  - Result goes from output of one unit to input of another
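A minimal sketch of the selection a forwarding unit might make for one ALU input. The slide only describes the EX/MEM-to-ALU path; the MEM/WB path, the check that register 0 is never forwarded (it is hardwired to zero in MIPS), and all names below are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Latch:
    """Hypothetical view of a pipeline register (EX/MEM or MEM/WB)."""
    reg_write: bool   # does the instruction held here write a register?
    write_reg: int    # destination register number


def forward_source(src_reg, ex_mem, mem_wb):
    """Pick where an ALU operand should come from."""
    if ex_mem.reg_write and ex_mem.write_reg != 0 and ex_mem.write_reg == src_reg:
        return "EX/MEM"    # newest value: result just computed by the ALU
    if mem_wb.reg_write and mem_wb.write_reg != 0 and mem_wb.write_reg == src_reg:
        return "MEM/WB"    # older value, about to be written back
    return "REGFILE"       # no hazard: use the value read in ID


# ADD R1,R2,R3 sits in EX/MEM while SUB R4,R1,R6 is in EX.
add_in_ex_mem = Latch(reg_write=True, write_reg=1)
idle = Latch(reg_write=False, write_reg=0)
print(forward_source(1, add_in_ex_mem, idle))   # EX/MEM
print(forward_source(6, add_in_ex_mem, idle))   # REGFILE
```

EX/MEM is checked first so that, if both latches hold a write to the same register, the more recent result wins.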






# HW Change for Forwarding







## Memory Data Hazards

- Seen register hazards, can also have <u>memory hazards</u>
  - RAW:
    - store R1, 0(SP)
    - load R4, 0(SP)

|                 | 1 | 2 | 3  | 4  | 5  | 6  |
|-----------------|---|---|----|----|----|----|
| Store R1, 0(SP) | F | D | EX | M  | WB |    |
| Load R4, 0(SP)  |   | F | D  | EX | M  | WB |

- In simple pipeline, memory hazards are easy
  - In order, one at a time, read & write in same stage
- In general though, more difficult than register hazards


# Data hazards and the compiler

- Compiler should be able to help eliminate some stalls caused by data hazards
- e.g., compiler could avoid generating a LOAD instruction that is immediately followed by an instruction that uses the LOAD's destination register
- Technique is called "pipeline/instruction scheduling"
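To illustrate what scheduling buys, the sketch below counts load-use pairs in two orderings of the same code. The instruction sequences, register names, and the assumption of one stall per LOAD immediately followed by a consumer are all hypothetical and chosen only for this example.

```python
def load_use_stalls(program):
    """Count LOADs immediately followed by a reader of the loaded register."""
    stalls = 0
    for (op, dest, _), (_, _, next_srcs) in zip(program, program[1:]):
        if op == "LOAD" and dest in next_srcs:
            stalls += 1
    return stalls


# a = b + c; d = e - f, as (opcode, destination, sources) tuples.
naive = [
    ("LOAD", "R1", ["b"]),
    ("LOAD", "R2", ["c"]),
    ("ADD", "R3", ["R1", "R2"]),
    ("STORE", None, ["R3", "a"]),
    ("LOAD", "R4", ["e"]),
    ("LOAD", "R5", ["f"]),
    ("SUB", "R6", ["R4", "R5"]),
    ("STORE", None, ["R6", "d"]),
]

# Same instructions, reordered so no LOAD is directly followed by its consumer.
scheduled = [
    ("LOAD", "R1", ["b"]),
    ("LOAD", "R2", ["c"]),
    ("LOAD", "R4", ["e"]),
    ("ADD", "R3", ["R1", "R2"]),
    ("LOAD", "R5", ["f"]),
    ("STORE", None, ["R3", "a"]),
    ("SUB", "R6", ["R4", "R5"]),
    ("STORE", None, ["R6", "d"]),
]

print(load_use_stalls(naive))       # 2
print(load_use_stalls(scheduled))   # 0
```

The reordering keeps every instruction after the ones that produce its operands, so program meaning is unchanged while both load-use pairs disappear.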










## Dealing w/branch hazards: always stall

- Branch not taken
  - Still must wait 3 cycles
  - Time lost
  - Could have spent cycles fetching and decoding next instructions

Figure (pipeline diagram, CC 1 through CC 12): 40 beq \$1, \$3, \$28 is followed by three stall cycles before 44 and \$12, \$2, \$5, 48 or \$13, \$6, \$2, and 52 add \$14, \$2, \$2 enter the pipeline.
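A rough sketch of what always stalling costs. The 3-cycle stall comes from the slide; the branch fraction of 20% and the function name are assumptions made for illustration only.

```python
def cpi_always_stall(branch_fraction, stall_cycles=3, base_cpi=1.0):
    """CPI when every branch stalls the pipeline for `stall_cycles` cycles."""
    return base_cpi + branch_fraction * stall_cycles


# Hypothetical workload in which 20% of instructions are branches.
print(round(cpi_always_stall(0.20), 2))   # 1.6
```

Under these assumptions the 3-cycle wait is paid on every branch, taken or not, which is the "time lost" noted above.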




## Flushing unwanted instructions from pipeline

- Useful to compare w/stalling pipeline:
  - Simple stall: inject bubble into pipe at ID stage only
    - Change control to 0 in the ID stage
    - Let "bubbles" percolate to the right
  - Flushing pipe: must change inst. in IF, ID, and EX
    - Zero instruction field of IF/ID pipeline register
    - Use new control signal IF.Flush
    - Use existing "bubble injection" mux that zeros control for stalls
    - Signal ID.Flush is ORed w/stall signal from hazard detection unit
    - Add new muxes to zero EX pipeline register control lines
    - Both muxes controlled by single EX.Flush signal
- Control determines when to flush:
  - Depends on Opcode and value of branch condition
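The difference between a stall and a flush can be sketched as a function over the three affected fields. The signal names follow the slide (IF.Flush, ID.Flush, EX.Flush); the function, its arguments, and the sample bit patterns are hypothetical.

```python
def flush_stage_controls(if_id_instr, id_control, ex_control,
                         stall=False, if_flush=False,
                         id_flush=False, ex_flush=False):
    """Return the (possibly zeroed) IF/ID instruction and ID, EX control bits."""
    if if_flush:
        if_id_instr = 0            # zero instruction field of IF/ID register
    if stall or id_flush:          # ID.Flush is ORed with the stall signal
        id_control = 0             # existing bubble-injection mux
    if ex_flush:
        ex_control = 0             # new muxes driven by EX.Flush
    return if_id_instr, id_control, ex_control


# Simple stall: only the ID-stage control is zeroed.
print(flush_stage_controls(0x1234, 0b101, 0b011, stall=True))
# (4660, 0, 3)

# Full flush: instruction in IF and control in ID and EX are all zeroed.
print(flush_stage_controls(0x1234, 0b101, 0b011,
                           if_flush=True, id_flush=True, ex_flush=True))
# (0, 0, 0)
```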

















Branch Prediction Accuracy: how often is our prediction correct
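As a small illustration of the definition, accuracy is the fraction of branches whose predicted direction matched the actual outcome. The trace below (a loop branch taken nine times, then not taken) and the always-taken predictor are hypothetical.

```python
def prediction_accuracy(predictions, outcomes):
    """Fraction of branches whose predicted direction matched the outcome."""
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)


outcomes = [True] * 9 + [False]       # True = branch taken
always_taken = [True] * len(outcomes)
print(prediction_accuracy(always_taken, outcomes))   # 0.9
```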



# What about control logic?

- For MIPS integer pipeline, all data hazards can be checked during ID phase of pipeline
- If data hazard, instruction stalled before it's issued
- Whether forwarding is needed can also be determined at this stage, control signals set
- If hazard detected, control unit of pipeline must stall pipeline and prevent instructions in IF, ID from advancing
- All control information carried along in pipeline registers so only these fields must be changed


## Hazard Detection Logic

- Insert a bubble into pipeline if any are true:
  - ID/EX.RegWrite AND
    - ((ID/EX.RegDst=0 AND ID/EX.WriteRegRt=IF/ID.ReadRegRs) OR
    - (ID/EX.RegDst=1 AND ID/EX.WriteRegRd=IF/ID.ReadRegRs) OR
    - (ID/EX.RegDst=0 AND ID/EX.WriteRegRt=IF/ID.ReadRegRt) OR
    - (ID/EX.RegDst=1 AND ID/EX.WriteRegRd=IF/ID.ReadRegRt))
  - OR EX/MEM.RegWrite AND
    - ((EX/MEM.WriteReg = IF/ID.ReadRegRs) OR
    - (EX/MEM.WriteReg = IF/ID.ReadRegRt))
  - OR MEM/WB.RegWrite AND
    - ((MEM/WB.WriteReg = IF/ID.ReadRegRs) OR
    - (MEM/WB.WriteReg = IF/ID.ReadRegRt))
- Notation: in ID/EX.RegDst, ID/EX is the pipeline register and RegDst is the field
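The bubble condition can be transcribed fairly directly into code. The field names mirror the slide's notation; the dataclasses themselves are a hypothetical modeling choice, and the sketch assumes RegDst takes only the values 0 (destination is Rt) and 1 (destination is Rd).

```python
from dataclasses import dataclass


@dataclass
class IfId:
    read_reg_rs: int
    read_reg_rt: int


@dataclass
class IdEx:
    reg_write: bool
    reg_dst: int        # 0 -> destination is Rt, 1 -> destination is Rd
    write_reg_rt: int
    write_reg_rd: int


@dataclass
class LaterLatch:       # stands in for EX/MEM or MEM/WB
    reg_write: bool
    write_reg: int


def needs_bubble(if_id, id_ex, ex_mem, mem_wb):
    """True if the instruction in ID reads a register still being produced."""
    sources = (if_id.read_reg_rs, if_id.read_reg_rt)
    id_ex_dest = id_ex.write_reg_rd if id_ex.reg_dst == 1 else id_ex.write_reg_rt
    return (
        (id_ex.reg_write and id_ex_dest in sources)
        or (ex_mem.reg_write and ex_mem.write_reg in sources)
        or (mem_wb.reg_write and mem_wb.write_reg in sources)
    )


# ADD R1,R2,R3 is in ID/EX (writes Rd = 1); SUB R4,R1,R6 is in IF/ID.
sub_in_id = IfId(read_reg_rs=1, read_reg_rt=6)
add_in_ex = IdEx(reg_write=True, reg_dst=1, write_reg_rt=3, write_reg_rd=1)
idle = LaterLatch(reg_write=False, write_reg=0)
print(needs_bubble(sub_in_id, add_in_ex, idle, idle))   # True
```

Selecting the ID/EX destination once with RegDst is equivalent to the four explicit RegDst comparisons in the list.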


# **Detecting Data Hazards**




## RAW: Detect and Stall

- detect RAW & stall instruction at ID before register read
  - mechanics? disable PC, F/D write
  - RAW detection? compare register names
    - notation: rs1(D) = src register #1 of inst. in D stage
    - compare: rs1(D) & rs2(D) w/ rd(D/X), rd(X/M), rd(M/W)
    - stall (disable PC + F/D, clear D/X) on any match
  - RAW detection? register busy-bits
    - set for rd(D/X) when instruction passes ID
    - clear for rd(M/W)
    - stall if rs1(D) or rs2(D) are "busy"
  - (plus) low cost, simple
  - (minus) low performance (many stalls)
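The busy-bit variant can be sketched as below. One deliberate departure from the slide is labeled here: a per-register counter is used instead of a single bit so that two in-flight writers to the same register are tracked correctly. The class and method names are hypothetical.

```python
class BusyBits:
    """Track registers with a pending write (counter per register)."""

    def __init__(self, num_regs=32):
        self.pending = [0] * num_regs

    def must_stall(self, rs1, rs2):
        """Stall in ID if either source register is still 'busy'."""
        return self.pending[rs1] > 0 or self.pending[rs2] > 0

    def pass_id(self, rd):
        """Set: the instruction writing rd has passed ID."""
        self.pending[rd] += 1

    def leave_mw(self, rd):
        """Clear: the instruction writing rd has reached M/W."""
        self.pending[rd] -= 1


bits = BusyBits()
bits.pass_id(1)                 # add R1,R2,R3 passes ID
print(bits.must_stall(4, 1))    # sub R2,R4,R1 must wait: True
bits.leave_mw(1)                # add reaches M/W
print(bits.must_stall(4, 1))    # False
```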



## Flushing pipeline after exception

Figure (pipeline diagram, CC 1 through CC 12): here the exception is detected when add is in the EX stage. Instructions shown, with hex addresses:

- 40 sub \$11, \$2, \$4
- 44 and \$12, \$2, \$5
- 48 or \$13, \$6, \$2
- 4C add \$1, \$2, \$1
- 50 slt \$15, \$6, \$7
- 54 lw \$16, 50(\$7)
- 40000040 sw \$25, 1000(\$0)

What happens in each cycle:

- Cycle 6:
  - Exception detected, flush signals generated, bubbles injected
- Cycle 7:
  - 3 bubbles appear in ID, EX, MEM stages
  - PC gets 40000040<sub>hex</sub>, TrapPC gets 50<sub>hex</sub>


## Managing exception hazards gets much worse!

- Different exception types may occur in different stages:

| Exception Cause       | Where it occurs   |
|-----------------------|-------------------|
| Undefined instruction | ID                |
| Invoking OS           | EX                |
| I/O device request    | Flexible          |
| Hardware malfunction  | Anywhere/flexible |

- Challenge is to associate exception with proper instruction: difficult!
  - Relax this requirement in non-critical cases: imprecise exceptions
    - Most machines use precise exceptions
  - Further challenge: exceptions can happen at same time





Increased time to resolve hazards





# Data hazard specifics

- There are actually 3 different kinds of data hazards!
  - Read After Write (RAW)
  - Write After Write (WAW)
  - Write After Read (WAR)
- We'll discuss/illustrate each on forthcoming slides. However, first a note on convention.
  - Discussion of hazards will use generic instructions i & j.
  - i is always issued before j.
  - Thus, i will always be further along in pipeline than j.
- With an in-order issue/in-order completion machine,


# Read after write (RAW) hazards

- With RAW hazard, instruction j tries to read a source operand before instruction i writes it.
- Thus, j would incorrectly receive an old or incorrect value
- Graphically/Example:
  - i: ADD R1, R2, R3 (instruction i is a write instruction issued before j)
  - j: SUB R4, R1, R6 (instruction j is a read instruction issued after i)

# Write after write (WAW) hazards

- With WAW hazard, instruction j tries to write an operand before instruction i writes it.
- The writes are performed in wrong order, leaving the value written by earlier instruction
- Graphically/Example: (Note: how can this happen???)
  - i: DIV F1, F2, F3 (instruction i is a write instruction issued before j)
  - Instruction j is a write instruction issued after i

## WAW

- write-after-write (WAW) = artificial (name) dependence
  - add R1,R2,R3
  - sub R2,R4,R1
  - or R1,R6,R3
  - problem: reordering could leave wrong value in R1
    - later instruction that reads R1 would get wrong value
  - can't happen in vanilla pipeline (reg. writes are in order)
    - another reason for making ALU ops go through MEM stage
    - can happen: multi-cycle operations (e.g., FP, cache misses)
  - artificial: using different output register for or solves
    - Also a dependence on name (R1)


## WAR

- write-after-read (WAR) = artificial (name) dependence
  - add R1, R2, R3
  - sub R2, R4, R1
  - or R1, R6, R3
  - problem: add could use wrong value for R2
    - can't happen in vanilla pipeline (reads in ID, writes in WB)
  - can happen if: early writes (e.g., auto-increment) + late reads (??)
  - can happen if: out-of-order reads (e.g., out-of-order execution)
  - <u>artificial</u>: using different output register for sub solves
    - The dependence is on the name R2, but not on actual dataflow
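The three dependence types can be checked mechanically by comparing register names. The sketch below runs the add/sub/or sequence used on these slides; the tuple encoding of instructions as (destination, sources) and the function name are hypothetical.

```python
def classify_hazards(earlier, later):
    """Return the dependence types from `earlier` (i) to `later` (j)."""
    i_dest, i_srcs = earlier
    j_dest, j_srcs = later
    kinds = set()
    if i_dest in j_srcs:
        kinds.add("RAW")    # j reads what i writes
    if i_dest == j_dest:
        kinds.add("WAW")    # j writes what i writes
    if j_dest in i_srcs:
        kinds.add("WAR")    # j writes what i reads
    return kinds


add = ("R1", ("R2", "R3"))   # add R1, R2, R3
sub = ("R2", ("R4", "R1"))   # sub R2, R4, R1
or_ = ("R1", ("R6", "R3"))   # or  R1, R6, R3

print(sorted(classify_hazards(add, sub)))   # ['RAW', 'WAR']
print(sorted(classify_hazards(add, or_)))   # ['WAW']
print(sorted(classify_hazards(sub, or_)))   # ['WAR']
```

Only the RAW result reflects actual dataflow; the WAW and WAR results depend on the names R1 and R2 alone, which is why renaming the output register removes them.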


# Write after read (WAR) hazards

- With WAR hazard, instruction j tries to write an operand before instruction i reads it.
- Instruction i would incorrectly receive newer value of its operand;
  - Instead of getting old value, it could receive some newer, undesired value:
- Graphically/Example:



  - i: DIV F7, F1, F3 (instruction i is a read instruction issued before j)
  - j: SUB F1, F4, F6 (instruction j is a write instruction issued after i)