# CSE 30321 - Lecture 13/14 - In Class Handout

For the sequence of instructions shown below, show how they would progress through the pipeline. **For all of these problems:** 

- A stall is shown by repeating, in the succeeding square, the code of the stage where the hazard is discovered
- We will assume a standard 5 stage pipeline
  - (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, M = Memory Access, WB = Write Back)
- Assume that each stage of the pipeline takes just 1 clock cycle to finish.

## Example 1:

- Assume that forwarding HAS NOT been implemented
- Assume that you CANNOT read and write a register in the same clock cycle

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <b>Add</b><br>\$5, \$3, \$4 | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |
| <b>Add</b><br>\$6, \$5, \$7 |  | IF | ID | ID | ID | ID | EX | M | WB |  |  |  |  |  |  |  |  | Add must wait until \$5 is written by the previous add; reads \$5 in ID stage |
| <b>LW</b><br>\$7, 0(\$6) |  |  | IF | IF | IF | IF | ID | ID | ID | ID | EX | M | WB |  |  |  |  | LW is stalled by the add; it needs \$6, which it reads in ID |
| <b>SUB</b><br>\$1, \$2, \$3 |  |  |  |  |  |  | IF | IF | IF | IF | ID | EX | M | WB |  |  |  | Pipeline is full, so SUB can't go |
| <b>Add</b><br>\$9, \$7, \$8 |  |  |  |  |  |  |  |  |  |  | IF | ID | ID | ID | EX | M | WB |  |

(Last add must wait for \$7 from LW)
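
The stall pattern in Example 1 can be reproduced with a few lines of Python. This is only a sketch under the handout's assumptions (no forwarding, in-order issue, one cycle per stage); the function name `schedule_no_forwarding` and the `(dest, sources)` tuple format are made up for illustration.

```python
def schedule_no_forwarding(program, same_cycle_rw=False):
    """Return (first IF, last ID, WB) cycles for each instruction.

    program is a list of (dest, sources) register-number tuples.
    """
    wb_of = {}                          # register -> WB cycle of its latest writer
    rows = []
    prev_id_start, prev_id_done = 1, 1  # chosen so instruction 1 gets IF=1, ID=2
    for dest, sources in program:
        if_start = prev_id_start        # IF frees up when the predecessor enters ID
        id_start = prev_id_done + 1     # ID frees up when the predecessor leaves it
        id_done = id_start
        for reg in sources:
            if reg in wb_of:
                # Without forwarding, the value is read from the register file in ID.
                readable = wb_of[reg] if same_cycle_rw else wb_of[reg] + 1
                id_done = max(id_done, readable)
        wb = id_done + 3                # EX, M, WB follow ID with no further stalls
        wb_of[dest] = wb
        rows.append((if_start, id_done, wb))
        prev_id_start, prev_id_done = id_start, id_done
    return rows


# Example 1: add $5,$3,$4 / add $6,$5,$7 / lw $7,0($6) / sub $1,$2,$3 / add $9,$7,$8
example_1 = [(5, (3, 4)), (6, (5, 7)), (7, (6,)), (1, (2, 3)), (9, (7, 8))]
for row in schedule_no_forwarding(example_1):
    print(row)
```

The last tuple printed is `(11, 14, 17)`: the final add is fetched in CC 11, leaves ID in CC 14, and writes back in CC 17, as in the table.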

## Example 2:

- Let's do the same problem as before, but now assume that forwarding HAS been implemented
- Assume that you CANNOT read and write from the same register in the register file in the same clock cycle

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <b>Add</b><br>\$5, \$3, \$4 | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  |  |  | Data to be loaded into \$5 is available at the end of CC 3 / beginning of CC 4 |
| <b>Add</b><br>\$6, \$5, \$7 |  | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  |  | Add gets data for \$5 directly from output of ALU |
| <b>LW</b><br>\$7, 0(\$6) |  |  | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  | LW gets data for \$6 directly from output of ALU |
| <b>SUB</b><br>\$1, \$2, \$3 |  |  |  | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  | No dependencies on any other instructions |
| <b>Add</b><br>\$9, \$7, \$8 |  |  |  |  | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  | Add gets \$7 from the register between the memory and write back stages |

## Example 3:

- Like Example 2, assume that forwarding HAS been implemented
- Assume that you CAN read and write a register in the same clock cycle

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <b>Add</b><br>\$1, \$6, \$9 | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |
| <b>Add</b><br>\$6, \$2, \$4 |  | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  |  |  |
| <b>LW</b><br>\$7, 0(\$6) |  |  | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  | LW instruction producing result that will be stored in \$7 |
| <b>SUB</b><br>\$1, \$7, \$8 |  |  |  | IF | ID | ID | EX | M | WB |  |  |  |  |  |  |  |  | Even with forwarding, SUB must stall; data not available until end of CC #6 and needed at beginning of CC #6 |
| <b>Add</b><br>\$9, \$1, \$8 |  |  |  |  | IF | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |
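
With forwarding, the only stall left in Examples 2 and 3 is the load-use case. The sketch below models that, assuming an ALU result can be forwarded once its EX stage ends and a loaded value once its M stage ends. It ignores register-file read/write timing, which does not affect either example. The name `schedule_with_forwarding` and the `(op, dest, sources)` format are illustrative, not part of the handout.

```python
def schedule_with_forwarding(program):
    """Return (first IF, last ID, WB) cycles for each instruction.

    program is a list of (op, dest, sources) tuples; op == "lw" marks a load.
    """
    avail = {}                          # register -> cycle at whose end the value exists
    rows = []
    prev_id_start, prev_id_done = 1, 1  # chosen so instruction 1 gets IF=1, ID=2
    for op, dest, sources in program:
        if_start = prev_id_start        # IF frees up when the predecessor enters ID
        id_start = prev_id_done + 1
        id_done = id_start
        for reg in sources:
            if reg in avail:
                # EX (= id_done + 1) must start after the value has been produced.
                id_done = max(id_done, avail[reg])
        ex, mem, wb = id_done + 1, id_done + 2, id_done + 3
        avail[dest] = mem if op == "lw" else ex
        rows.append((if_start, id_done, wb))
        prev_id_start, prev_id_done = id_start, id_done
    return rows


example_2 = [("add", 5, (3, 4)), ("add", 6, (5, 7)), ("lw", 7, (6,)),
             ("sub", 1, (2, 3)), ("add", 9, (7, 8))]
example_3 = [("add", 1, (6, 9)), ("add", 6, (2, 4)), ("lw", 7, (6,)),
             ("sub", 1, (7, 8)), ("add", 9, (1, 8))]
print(schedule_with_forwarding(example_2)[-1])  # (5, 6, 9)
print(schedule_with_forwarding(example_3)[-1])  # (5, 7, 10)
```

Example 2 finishes in CC 9 with no stalls. Example 3 finishes in CC 10 because SUB sits in ID for one extra cycle waiting for the loaded \$7.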

## Example 4:

- Assume that 16% of instructions change the flow of a program
  - 1 in 6 is about right, actually.
  - 4% are unconditional branches
    - Unconditional branches incur a 3 CC penalty because the new address must be calculated
  - 12% are conditional branches
    - 50% of conditional branches are taken
    - 50% of conditional branches are not taken
- What is the impact on performance assuming:
  - N instructions are executed
  - We predict that branches are not taken
    - Seemingly the only way, right?
- Well...
  - 4% of the time we will have a 3 CC penalty
    - Therefore: 0.04 x N x 3 = 0.12 N
  - 6% of the time we will also have a 3 CC penalty because we guessed wrong
    - i.e. 0.12 x 0.5 x 3 x N = 0.18 N
- If our ideal CPI is 1 we now have...
  - N + 0.12N + 0.18N = N + 0.3N = 1.3N
- We take a 30% performance hit!
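
The same arithmetic as a small Python check. The function name and parameter names are illustrative; the numbers are the ones given in Example 4.

```python
def cycles_with_branches(n, uncond=0.04, cond=0.12, taken=0.5, penalty=3):
    """Total cycles for n instructions when every branch is predicted not taken."""
    ideal = n * 1                                    # ideal CPI of 1
    uncond_stalls = uncond * n * penalty             # every unconditional branch pays
    mispredict_stalls = cond * taken * n * penalty   # taken conditionals were guessed wrong
    return ideal + uncond_stalls + mispredict_stalls


n = 1000
print(round(cycles_with_branches(n) / n, 2))  # 1.3
```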

## Example 5:

Assume the following:

- 23% of instructions are loads
  - 50% of the time, the <u>next</u> instruction uses the loaded value
- 13% of instructions are stores
- 19% of instructions are conditional branches
- 2% of instructions are unconditional branches
- 43% of instructions are something else

Also...

- You have a 5 stage pipeline with forwarding
- There is a 1 CC penalty if an instruction immediately needs a loaded value
- We have added extra hardware to resolve a jump/branch instruction in the decode stage
  - Therefore, there is just a 1 CC penalty
- 75% of conditional branches are predicted correctly

What is the CPI of our pipeline?

- $0.23 \times 0.5 \times 1 = 0.115$
  - 23% of the time we have a lw and 50% of those times, we need the result right away
- $0.02 \times 1 = 0.02$
  - 2% of the time we have a jump and have a 1 CC penalty
- $0.25 \times 0.19 \times 1 = 0.0475$
  - 19% of the time we have a conditional branch; 25% of those times we guess wrong and have a 1 CC penalty

Therefore $0.115 + 0.02 + 0.0475 = 0.1825$

If our ideal CPI is 1, then our new CPI is 1.1825
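
A short Python check of this sum, using the frequencies from the calculation above (including the 23% load frequency). The helper `pipeline_cpi` and the list layout are illustrative only.

```python
# (label, fraction of all instructions that stall, penalty in CCs)
stall_sources = [
    ("load-use", 0.23 * 0.5, 1),             # loads whose value is needed right away
    ("jump", 0.02, 1),                       # unconditional branches
    ("mispredicted branch", 0.19 * 0.25, 1), # conditional branches guessed wrong
]


def pipeline_cpi(ideal_cpi, sources):
    """Ideal CPI plus the average stall cycles contributed by each source."""
    return ideal_cpi + sum(fraction * penalty for _, fraction, penalty in sources)


print(round(pipeline_cpi(1.0, stall_sources), 4))  # 1.1825
```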

## Example 6:

- Assume that forwarding HAS been implemented
- We will stall if we encounter a branch instruction
- Branches or Jumps are resolved after the EX stage.
- Assume that register \$2 has the value of 0 and \$3 has the value of 0

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <b>LW</b><br>\$1, 4(\$9) | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |
| <b>Add</b><br>\$4, \$1, \$9 |  | IF | ID | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  | Add gets data from lw via forwarding |
| <b>Sub</b><br>\$7, \$4, \$9 |  |  | IF | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  | Sub gets data from add via forwarding |
| <b>BEQ</b><br>\$2, \$3, X |  |  |  |  | IF | ID | EX |  |  |  |  |  |  |  |  |  |  |  |
| <b>Add</b><br>\$9, \$8, \$7 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| <b>And</b><br>\$4, \$5, \$5 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| <b>X: Add</b><br>\$4, \$5, \$9 |  |  |  |  |  |  |  | IF | ID | EX | M | WB |  |  |  |  |  | Can't start Add until after BEQ finishes its comparison |

## Example 7:

- Assume that forwarding HAS been implemented
- We will predict that any branch instruction is **NOT TAKEN**
- Branches or Jumps are resolved after the EX stage.
- Assume that register \$2 has the value of 0 and \$3 has the value of 0

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <b>LW</b><br>\$1, 4(\$9) | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |
| <b>Add</b><br>\$4, \$1, \$9 |  | IF | ID | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  |  |
| <b>Sub</b><br>\$7, \$4, \$9 |  |  | IF | IF | ID | EX | M | WB |  |  |  |  |  |  |  |  |  |  |
| <b>BEQ</b><br>\$2, \$3, X |  |  |  |  | IF | ID | EX |  |  |  |  |  |  |  |  |  |  |  |
| <b>Add</b><br>\$9, \$8, \$7 |  |  |  |  |  | IF | ID |  |  |  |  |  |  |  |  |  |  | Add and And enter the pipeline; however, they would not change state until CC 10 and 11. They never get this far so there is no harm done. We can kill them and restart the next add instruction. |
| <b>And</b><br>\$4, \$5, \$5 |  |  |  |  |  |  | IF |  |  |  |  |  |  |  |  |  |  |  |
| <b>X: Add</b><br>\$4, \$5, \$9 |  |  |  |  |  |  |  | IF | ID | EX | M | WB |  |  |  |  |  |  |

## Example 8:

- Assume that forwarding HAS been implemented
- We will predict that any branch instruction is TAKEN
- Branches or jumps are resolved after the EX stage.
- Assume that register \$2 has the value of 0 and \$3 has the value of 0

| Instruction                    | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|--------------------------------|----|----|----|----|----|----|----|----|---|----|----|----|----|----|----|----|----|
| <b>LW</b><br>\$1, 4(\$9)       | IF | ID | EX | M  | WB |    |    |    |   |    |    |    |    |    |    |    |    |
| <b>Add</b><br>\$4, \$1, \$9    |    | IF | ID | ID | EX | M  | WB |    |   |    |    |    |    |    |    |    |    |
| <b>Sub</b><br>\$7, \$4, \$9    |    |    | IF | IF | ID | EX | M  | WB |   |    |    |    |    |    |    |    |    |
| <b>BEQ</b><br>\$2, \$3, X      |    |    |    |    | IF | ID | EX |    |   |    |    |    |    |    |    |    |    |
| <b>Add</b><br>\$9, \$8, \$7    |    |    |    |    |    |    |    |    |   |    |    |    |    |    |    |    |    |
| <b>And</b><br>\$4, \$5, \$5    |    |    |    |    |    |    |    |    |   |    |    |    |    |    |    |    |    |
| <b>X: Add</b><br>\$4, \$5, \$9 |    |    |    |    |    | IF | ID | EX | M | WB |    |    |    |    |    |    |    |

This is the best situation: the last add instruction finishes 2 CCs earlier than in Examples 6 and 7.
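
Examples 6 through 8 differ only in when the instruction at X can be fetched. The Python sketch below captures that for a taken branch resolved after EX. It assumes, as the Example 8 table implies, that under a correct taken prediction the target can be fetched in the cycle right after the branch is fetched. The function name and policy strings are made up for illustration.

```python
def target_writeback_cycle(branch_if, policy):
    """WB cycle of the branch target for a TAKEN branch resolved after EX.

    policy is one of "stall", "predict_not_taken", "predict_taken".
    """
    branch_ex = branch_if + 2
    if policy == "predict_taken":
        target_if = branch_if + 1   # correct guess: fetch X right behind the branch
    elif policy in ("stall", "predict_not_taken"):
        target_if = branch_ex + 1   # wait (or squash wrong-path work) until EX is done
    else:
        raise ValueError(f"unknown policy: {policy}")
    return target_if + 4            # ID, EX, M, WB follow IF


# BEQ is fetched in CC 5 in Examples 6, 7, and 8.
for policy in ("stall", "predict_not_taken", "predict_taken"):
    print(policy, target_writeback_cycle(5, policy))
```

This prints 12, 12, and 10: stalling and a wrong not-taken guess cost the same here, while a correct taken guess saves 2 CCs.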

## Example 9:

For the sequence of instructions shown below, show how they would progress through the pipeline.

<u>Part 1:</u>

- Assume that forwarding HAS been implemented
- We will predict that any branch instruction is **NOT TAKEN**
- Branches or Jumps are resolved after the EX stage.
- Assume that register \$8 <u>does not equal</u> \$1 for the $1^{st}$ Beq instruction
- Assume that register \$17 <u>does equal</u> \$26 for the $2^{nd}$ Beq instruction

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <b>SUB</b><br>\$1, \$2, \$3 | F | D | E | M | W |  |  |  |  |  |  |  |  |  |  |  |  |
| <b>Add</b><br>\$8, \$9, \$10 |  | F | D | E | M | W |  |  |  |  |  |  |  |  |  |  |  |
| <b>Beq</b><br>\$1, \$8, X |  |  | F | D | E | M | W |  |  |  |  |  |  |  |  |  |  |
| <b>Lw</b><br>\$7, 0(\$20) |  |  |  | F | D | E | M | W |  |  |  |  |  |  |  |  |  |
| <b>Add</b><br>\$11, \$7, \$12 |  |  |  |  | F | D | D | E | M | W |  |  |  |  |  |  |  |
| <b>Sw</b><br>\$11, 0(\$24) |  |  |  |  |  | F | F | D | E | M | W |  |  |  |  |  |  |
| <b>X: Addi</b><br>\$17, \$17, 1 |  |  |  |  |  |  |  | F | D | E | M | W |  |  |  |  |  |
| <b>Beq</b><br>\$17, \$26, Y |  |  |  |  |  |  |  |  | F | D | E | M | W |  |  |  |  |
| <b>Sub</b><br>\$5, \$6, \$7 |  |  |  |  |  |  |  |  |  | F | D |  |  |  |  |  |  |
| <b>Or</b><br>\$8, \$5, \$5 |  |  |  |  |  |  |  |  |  |  | F |  |  |  |  |  |  |
| <b>Y: Addi</b><br>\$17, \$17, 1 |  |  |  |  |  |  |  |  |  |  |  | F | D | E | M | W |  |
| <b>Sw</b><br>\$17, 0(\$10) |  |  |  |  |  |  |  |  |  |  |  |  | F | D | E | M | W |
| <b>SUB</b><br>\$1, \$2, \$3 |  |  |  |  |  |  |  |  |  |  |  |  |  | F | D | E |  |
| <b>Add</b><br>\$8, \$9, \$10 |  |  |  |  |  |  |  |  |  |  |  |  |  |  | F | D |  |

<u>Part 2:</u>

- (i) Assume that this sequence of code is executed 100 times. How many cycles does the pipelined implementation take?
- (ii) How many cycles would this code take in a multi-cycle implementation?
- From Part 1, you can see that it takes 17 clock cycles to execute 12 instructions.
- However, we can start the next "iteration" in clock cycle 14. Therefore, it *really* only takes 13 cycles for each iteration and 17 CCs for the last one.
- Therefore, iterations 1 through 99 take 13 CCs each (13 x 99 = 1287 CCs)
- Iteration 100 takes 17 CCs
- Therefore 1287 CCs + 17 CCs = 1304 CCs
- For the multi-cycle implementation, we have:
  - 9 instructions that take 4 CCs
  - 2 instructions that take 3 CCs
  - 1 instruction that takes 5 CCs
- Therefore, each "iteration" takes: (9x4) + (2x3) + (1x5) = 36 + 6 + 5 = 47 CCs
- If there are 100 iterations, then 4700 CCs are required

Pipelining gives us a speedup of 4700 / 1304 ≈ 3.6 for this implementation

- Little to no extra HW is needed!
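
The Part 2 totals can be double-checked in Python. The sketch takes the handout's per-iteration figures as given (13 CCs per overlapped iteration, 17 CCs for the last one, and the 9/2/1 instruction mix); the function names are illustrative.

```python
def pipelined_cycles(iterations, steady=13, last=17):
    """All but the final iteration overlap with the next one and cost `steady` CCs."""
    return steady * (iterations - 1) + last


def multicycle_cycles(iterations, mix=((9, 4), (2, 3), (1, 5))):
    """mix holds (instruction count, CCs per instruction) pairs for one iteration."""
    per_iteration = sum(count * ccs for count, ccs in mix)
    return iterations * per_iteration


pipe = pipelined_cycles(100)
multi = multicycle_cycles(100)
print(pipe, multi, round(multi / pipe, 1))  # 1304 4700 3.6
```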