#### Lectures 02, 03 Introduction to Stored Programs

Suggested reading: HP Chapters 1.1-1.3

#### Some slides/images from Vahid text - hence this notice:

Copyright © 2007 Frank Vahid

Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites. PowerPoint source (or pdf with animations) may <u>not</u> be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley – see <u>http://www.ddvahid.com</u> for information.

#### **Fundamental lesson(s)**

- How code you write (compiled C, for example) is ultimately run on HW.
  - You'll learn what your microprocessor actually does when you compile and execute code written in a HLL.
- Why it's important...
  - Processor components
  - Writing more efficient code
  - The right HW for the right application
  - Multicore processors and programming
- Equally important, at the heart of this discussion is the "stored program model".
  - This is a fundamental idea that we'll discuss over the entire semester, and many things build off of it.




| Instruction   | Meaning     | Opcode | Destination | Source 1 | Source 2 |
|---------------|-------------|--------|-------------|----------|----------|
| MULT r1,r2,r3 | r1 ← r2*r3  | 110011 | 000001      | 000010   | 000011   |
| ADD r2,r1,r4  | r2 ← r1+r4  | 001110 | 000010      | 000001   | 000100   |


### Board Discussion #1: Introduction to stored programs

#### **Board discussion summary:**

Stored program model has been around for a long time...






### **Board discussion summary:**

[Figure: after a fetch, PC becomes PC+1; the instruction word shown begins 001110 000010 000001.]

#### A simple "Von Neumann" architecture

- "Von Neumann architecture" is synonymous with "programmable processor"
- Processing generally consists of:
  - Loading some data
  - Transforming that data
  - Storing that data
- Datapath: core of a programmable processor
  - Can read/write data memory
  - Has register file to hold subsets of memory (in a local, fast memory)
  - Has ALU to transform local data



#### Datapath


#### **Basic datapath operations**

- Load operation: loads data from data memory into the RF
- ALU operation: transforms data by passing one or two RF values through ALU (for ADD, SUB, AND, OR, etc.); data written back to RF
- Store operation: stores RF register value back into data memory
- Each operation can be done in one clock cycle
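The three operations above can be modeled in a few lines of Python. This is a minimal sketch rather than a description of the actual hardware: the function names `load`, `alu_op`, and `store` are made up for illustration, and the sizes (256 data words, 16 registers) are taken from the operand ranges given later in these slides.

```python
import operator

D = [0] * 256   # data memory
RF = [0] * 16   # register file: a small, fast local memory

def load(r, d):
    """Load: copy data memory word D[d] into register RF[r]."""
    RF[r] = D[d]

def alu_op(op, ra, rb, rc):
    """ALU operation: pass two RF values through the ALU, write the result back to RF."""
    RF[ra] = op(RF[rb], RF[rc])

def store(d, r):
    """Store: copy register RF[r] back into data memory word D[d]."""
    D[d] = RF[r]

# Sample data (arbitrary values chosen for this sketch).
D[0], D[1] = 5, 7

load(0, 0)                      # RF[0] = D[0]
load(1, 1)                      # RF[1] = D[1]
alu_op(operator.add, 2, 0, 1)   # RF[2] = RF[0] + RF[1]
store(9, 2)                     # D[9] = RF[2]
print(D[9])
```

Each call stands for one datapath operation, so in this model the whole computation takes four clock cycles of datapath work.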



#### The datapath control unit

- To carry out each instruction, the control unit must:
  - Fetch: read the instruction from instruction memory
  - Decode: determine the operation and operands of the instruction
  - Execute: carry out the instruction's operation using the datapath



#### The datapath control unit

- D[9] = D[0] + D[1] requires a sequence of four datapath operations:
  - 0: RF[0] = D[0]
  - 1: RF[1] = D[1]
  - 2: RF[2] = RF[0] + RF[1]
  - 3: D[9] = RF[2]
- Each operation is an *instruction*
- A sequence of instructions is a *program*
- Programmable processors work by decomposing desired computations into processor-supported operations
- Store the program in *instruction memory*
- *Control unit* reads each instruction and executes it on the datapath
  - PC: program counter, holds the address of the current instruction
  - IR: instruction register, holds the current instruction
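The role of PC and IR can be sketched as a loop over an instruction memory. This is an illustrative model only: the tuple format such as `("LOAD", r, d)` is a hypothetical stand-in for real machine code, and the loop simply stops when PC runs past the last instruction, which is a simplification for this sketch.

```python
D = [0] * 256   # data memory
RF = [0] * 16   # register file
D[0], D[1] = 3, 4   # sample data (arbitrary)

# Instruction memory holding the program for D[9] = D[0] + D[1].
I = [
    ("LOAD", 0, 0),     # 0: RF[0] = D[0]
    ("LOAD", 1, 1),     # 1: RF[1] = D[1]
    ("ADD", 2, 0, 1),   # 2: RF[2] = RF[0] + RF[1]
    ("STORE", 2, 9),    # 3: D[9] = RF[2]
]

PC = 0
while PC < len(I):
    IR = I[PC]            # fetch: read the instruction at address PC
    PC += 1
    op, *operands = IR    # decode: split operation from operands
    if op == "LOAD":      # execute: drive the datapath
        r, d = operands
        RF[r] = D[d]
    elif op == "ADD":
        ra, rb, rc = operands
        RF[ra] = RF[rb] + RF[rc]
    elif op == "STORE":
        r, d = operands
        D[d] = RF[r]
    else:
        raise ValueError(f"unknown operation: {op}")

print(D[9])
```

Changing the contents of `I` changes what the processor computes without touching the loop, which is the essence of the stored program model.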



<u>Foreshadowing</u>: What if we want ALU to add, subtract? How do we tell it what to do?




#### Datapath + control = 3-instruction programmable processor

- Instruction set: list of allowable instructions and their representation in memory, e.g.,
  - Load instruction: 0000 $r_3r_2r_1r_0$ $d_7d_6d_5d_4d_3d_2d_1d_0$
  - Store instruction: 0001 $r_3r_2r_1r_0$ $d_7d_6d_5d_4d_3d_2d_1d_0$
  - Add instruction: 0010 $ra_3ra_2ra_1ra_0$ $rb_3rb_2rb_1rb_0$ $rc_3rc_2rc_1rc_0$
- What does this tell you about data memory?
- What does this tell us about the register file?
- "Instruction" is an idea that helps abstract 1s, 0s, but still provides info. about HW

| Address | Desired program   | Instruction memory I (machine code: opcode, then operands) |
|---------|-------------------|------------------------------------------------------------|
| 0       | RF[0]=D[0]        | 0000 0000 00000000                                         |
| 1       | RF[1]=D[1]        | 0000 0001 00000001                                         |
| 2       | RF[2]=RF[0]+RF[1] | 0010 0010 0000 0001                                        |
| 3       | D[9]=RF[2]        | 0001 0010 00001001                                         |
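To make the bit layout concrete, the sketch below pulls the opcode and operand fields out of 16-bit instruction words using shifts and masks. The names `OPCODES` and `decode` and the returned tuple format are assumptions made for this example; only the bit patterns come from the slide.

```python
# Opcode values from the three instruction formats above.
OPCODES = {0b0000: "LOAD", 0b0001: "STORE", 0b0010: "ADD"}

def decode(word):
    """Split a 16-bit instruction word into its operation name and operand fields."""
    name = OPCODES[(word >> 12) & 0xF]   # top 4 bits: opcode
    if name == "ADD":
        ra = (word >> 8) & 0xF           # three 4-bit register numbers
        rb = (word >> 4) & 0xF
        rc = word & 0xF
        return (name, ra, rb, rc)
    r = (word >> 8) & 0xF                # 4-bit register number
    d = word & 0xFF                      # 8-bit data memory address
    return (name, r, d)

# The four machine-code words of the desired program.
program = [
    0b0000_0000_00000000,
    0b0000_0001_00000001,
    0b0010_0010_0000_0001,
    0b0001_0010_00001001,
]

decoded = [decode(w) for w in program]
for addr, fields in enumerate(decoded):
    print(addr, fields)
```

The masks hint at the answers to the two questions on the slide: a 4-bit register field can name 16 registers, and an 8-bit address field can reach 256 data memory words.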

#### Toward a more detailed, realistic datapath...

Now, create detailed connections among components



#### Toward a more detailed, realistic datapath...

- To design the processor, we can begin with a high-level state machine description of the processor's behavior
  - Control unit manages instruction fetch, flow through datapath HW



#### Toward a more detailed, realistic datapath...

- Convert high-level state machine description of entire processor to FSM description of controller
  - Use datapath and other components to achieve same behavior



#### Be sure you understand the timing!



#### Important: understand the timing!

- Will the correct instruction be fetched if PC is incremented during the fetch cycle?
- While executing "MOV R1, 3", what is the content of PC and IR at the end of the 1st cycle, 2nd cycle, 3rd cycle, etc.? (assume we're at start of program)
- What if it takes more than 1 cycle for memory read?
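One way to reason about the first two questions is to trace PC and IR cycle by cycle. The sketch below assumes a controller with exactly three states (Fetch, Decode, Execute), one state per clock cycle, a single-cycle instruction memory, and registers that all update on the same clock edge. These are assumptions made for illustration; the controller drawn on the slide may differ, for example by having an initial state. The second instruction is a hypothetical filler.

```python
# Symbolic instruction memory; "MOV R1, 3" is at address 0.
I = ["MOV R1, 3", "MOV R2, 4"]   # second entry is a made-up filler instruction

PC, IR = 0, None
state = "Fetch"
trace = []

for cycle in range(1, 5):
    if state == "Fetch":
        # Both registers load on the same clock edge, so IR receives
        # I[old PC] even though PC is incremented in this same cycle.
        PC, IR = PC + 1, I[PC]
        next_state = "Decode"
    elif state == "Decode":
        # Controller works out the operation; PC and IR hold their values.
        next_state = "Execute"
    else:
        # Execute: the datapath performs the operation; PC and IR hold.
        next_state = "Fetch"
    trace.append((cycle, state, PC, IR))   # values at the end of the cycle
    state = next_state

for row in trace:
    print(row)
```

Under these assumptions the increment during Fetch is harmless, because the memory address is taken from the value PC held before the clock edge. A multi-cycle memory read would need extra wait states in Fetch, which this sketch does not model.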



#### Assembly code (for 3-instruction processor)

- Machine code (0s and 1s) is hard to work with
- Assembly code uses mnemonics
  - Load instruction: MOV Ra, d
    - specifies the operation RF[a]=D[d]
      - a is a number between 0 and 15
      - R0 means RF[0], R1 means RF[1], etc.
      - d is a number between 0 and 255
  - Store instruction: MOV d, Ra
    - specifies the operation D[d]=RF[a]
  - Add instruction: ADD Ra, Rb, Rc
    - specifies the operation RF[a]=RF[b]+RF[c]
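Translating these mnemonics into machine code is mechanical, which is what an assembler does. The following is a minimal sketch for the three instructions only; the function names `field` and `assemble` are invented for this example, and it accepts only the exact spacing used on these slides.

```python
import re

def field(value, bits):
    """Render value as a fixed-width binary string, rejecting values that do not fit."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value, f"0{bits}b")

def assemble(line):
    """Translate one assembly line into its 16-bit machine code, shown as a bit string."""
    m = re.fullmatch(r"MOV R(\d+), (\d+)", line)
    if m:  # load: RF[a] = D[d]
        a, d = map(int, m.groups())
        return f"0000 {field(a, 4)} {field(d, 8)}"
    m = re.fullmatch(r"MOV (\d+), R(\d+)", line)
    if m:  # store: D[d] = RF[a]
        d, a = map(int, m.groups())
        return f"0001 {field(a, 4)} {field(d, 8)}"
    m = re.fullmatch(r"ADD R(\d+), R(\d+), R(\d+)", line)
    if m:  # add: RF[a] = RF[b] + RF[c]
        a, b, c = map(int, m.groups())
        return f"0010 {field(a, 4)} {field(b, 4)} {field(c, 4)}"
    raise ValueError(f"unrecognized instruction: {line!r}")

for text in ["MOV R0, 0", "MOV R1, 1", "ADD R2, R0, R1", "MOV 9, R2"]:
    print(assemble(text), " ", text)
```

The range check in `field` enforces the operand limits stated above: register numbers must fit in 4 bits and data addresses in 8 bits.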

| Address | Machine code        | Assembly code  |
|---------|---------------------|----------------|
| 0       | 0000 0000 00000000  | MOV R0, 0      |
| 1       | 0000 0001 00000001  | MOV R1, 1      |
| 2       | 0010 0010 0000 0001 | ADD R2, R0, R1 |
| 3       | 0001 0010 00001001  | MOV 9, R2      |

#### A 6-Instruction programmable processor

- Let's add three more (useful) instructions:
  - Load-constant instruction: 0011 $r_3r_2r_1r_0$ $c_7c_6c_5c_4c_3c_2c_1c_0$
    - MOV Ra, #c: specifies the operation RF[a]=c
  - Subtract instruction: 0100 $ra_3ra_2ra_1ra_0$ $rb_3rb_2rb_1rb_0$ $rc_3rc_2rc_1rc_0$
    - SUB Ra, Rb, Rc: specifies the operation RF[a]=RF[b]-RF[c]
  - Jump-if-zero instruction: 0101 $ra_3ra_2ra_1ra_0$ $o_7o_6o_5o_4o_3o_2o_1o_0$
    - JMPZ Ra, offset: specifies the operation PC = PC + offset if RF[a] is 0

| Instruction     | Meaning                 |
|-----------------|-------------------------|
| MOV Ra, d       | RF[a] = D[d]            |
| MOV d, Ra       | D[d] = RF[a]            |
| ADD Ra, Rb, Rc  | RF[a] = RF[b] + RF[c]   |
| MOV Ra, #C      | RF[a] = C               |
| SUB Ra, Rb, Rc  | RF[a] = RF[b] - RF[c]   |
| JMPZ Ra, offset | PC=PC+offset if RF[a]=0 |

TABLE 8.2 Instruction opcodes.

| Instruction     | Opcode |
|-----------------|--------|
| MOV Ra, d       | 0000   |
| MOV d, Ra       | 0001   |
| ADD Ra, Rb, Rc  | 0010   |
| MOV Ra, #C      | 0011   |
| SUB Ra, Rb, Rc  | 0100   |
| JMPZ Ra, offset | 0101   |

#### Example program

Compare the contents of D[4] and D[5]. If equal, set D[3]=1; otherwise set D[3]=0.

| Label | Instruction     | Comment                  |
|-------|-----------------|--------------------------|
|       | MOV R0, #1      | RF[0] = 1                |
|       | MOV R1, 4       | RF[1] = D[4]             |
|       | MOV R2, 5       | RF[2] = D[5]             |
|       | SUB R3, R1, R2  | RF[3] = RF[1]-RF[2]      |
|       | JMPZ R3, B1     | if RF[3] = 0, jump to B1 |
|       | SUB R0, R0, R0  | RF[0] = 0                |
| B1:   | MOV 3, R0       | D[3] = RF[0]             |
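The example program above can be checked with a small symbolic simulator. This is a sketch under stated assumptions: the tuple format, the operation names `LOAD`, `STORE`, and `LOADC` (used to tell the three MOV forms apart), and the function names are all invented here. It also assumes the JMPZ offset is measured from the address of the JMPZ instruction itself, and it uses Python's unbounded integers instead of fixed-width registers.

```python
def run(program, D, max_steps=1000):
    """Execute a list of (label, op, *operands) tuples against data memory D."""
    RF = [0] * 16
    # Map each label to the address of the instruction it marks.
    labels = {label: addr for addr, (label, *_) in enumerate(program) if label}
    PC = 0
    for _ in range(max_steps):
        if PC >= len(program):
            return RF                     # ran past the last instruction
        _, op, *args = program[PC]
        here = PC                         # address of the current instruction
        PC += 1
        if op == "LOAD":                  # MOV Ra, d
            a, d = args
            RF[a] = D[d]
        elif op == "STORE":               # MOV d, Ra
            d, a = args
            D[d] = RF[a]
        elif op == "LOADC":               # MOV Ra, #c
            a, c = args
            RF[a] = c
        elif op == "ADD":
            a, b, c = args
            RF[a] = RF[b] + RF[c]
        elif op == "SUB":
            a, b, c = args
            RF[a] = RF[b] - RF[c]
        elif op == "JMPZ":
            a, target = args
            if RF[a] == 0:
                offset = labels[target] - here   # what an assembler would encode
                PC = here + offset
        else:
            raise ValueError(f"unknown operation: {op}")
    raise RuntimeError("step limit reached")

compare = [
    (None, "LOADC", 0, 1),     # MOV R0, #1
    (None, "LOAD", 1, 4),      # MOV R1, 4
    (None, "LOAD", 2, 5),      # MOV R2, 5
    (None, "SUB", 3, 1, 2),    # SUB R3, R1, R2
    (None, "JMPZ", 3, "B1"),   # JMPZ R3, B1
    (None, "SUB", 0, 0, 0),    # SUB R0, R0, R0
    ("B1", "STORE", 3, 0),     # MOV 3, R0
]

def compare_words(x, y):
    """Place x and y in D[4] and D[5], run the program, and return D[3]."""
    D = [0] * 256
    D[4], D[5] = x, y
    run(compare, D)
    return D[3]

print(compare_words(7, 7), compare_words(7, 8))
```

With equal inputs the subtraction yields zero, the jump skips the instruction that clears R0, and D[3] ends up as 1; with unequal inputs R0 is cleared first and D[3] ends up as 0.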

#### **Program for the 6-Instruction processor**

- Example program:
  - Count number of non-zero words in D[4] and D[5]
  - Result will be either 0, 1, or 2
  - Put result in D[9]
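Here is one possible program for this task, run this time from 16-bit machine words, so that the instruction memory really does hold nothing but numbers. It is not necessarily the program shown on the original slide. The helper names `word`, `regs`, `run`, and `count_nonzero` are invented for this sketch, and it assumes the JMPZ offset is a signed 8-bit value measured from the address of the JMPZ instruction.

```python
# Opcodes 0000 to 0101, in the order given in Table 8.2.
LOAD, STORE, ADD, LOADC, SUB, JMPZ = range(6)

def word(opcode, a, low8):
    """Pack a 4-bit opcode, a 4-bit register number, and an 8-bit field."""
    return (opcode << 12) | (a << 8) | (low8 & 0xFF)

def regs(b, c):
    """Pack two 4-bit register numbers into the low 8 bits (ADD/SUB format)."""
    return (b << 4) | c

PROGRAM = [
    word(LOADC, 0, 0),          # 0: MOV R0, #0       result = 0
    word(LOADC, 1, 1),          # 1: MOV R1, #1       constant 1
    word(LOAD, 2, 4),           # 2: MOV R2, 4        RF[2] = D[4]
    word(JMPZ, 2, 2),           # 3: JMPZ R2, 2       skip increment if zero
    word(ADD, 0, regs(0, 1)),   # 4: ADD R0, R0, R1   result = result + 1
    word(LOAD, 2, 5),           # 5: MOV R2, 5        RF[2] = D[5]
    word(JMPZ, 2, 2),           # 6: JMPZ R2, 2       skip increment if zero
    word(ADD, 0, regs(0, 1)),   # 7: ADD R0, R0, R1   result = result + 1
    word(STORE, 0, 9),          # 8: MOV 9, R0        D[9] = result
]

def run(I, D):
    """Fetch, decode, and execute words from I until PC passes the end."""
    RF = [0] * 16
    PC = 0
    while PC < len(I):
        IR = I[PC]                                   # fetch
        opcode, a, low8 = IR >> 12, (IR >> 8) & 0xF, IR & 0xFF   # decode
        b, c = low8 >> 4, low8 & 0xF
        next_pc = PC + 1
        if opcode == LOAD:                           # execute
            RF[a] = D[low8]
        elif opcode == STORE:
            D[low8] = RF[a]
        elif opcode == ADD:
            RF[a] = RF[b] + RF[c]
        elif opcode == LOADC:
            RF[a] = low8
        elif opcode == SUB:
            RF[a] = RF[b] - RF[c]
        elif opcode == JMPZ:
            if RF[a] == 0:
                offset = low8 - 256 if low8 >= 128 else low8   # signed 8-bit
                next_pc = PC + offset
        else:
            raise ValueError(f"unknown opcode: {opcode:04b}")
        PC = next_pc
    return D

def count_nonzero(x, y):
    """Place x and y in D[4] and D[5], run the program, and return D[9]."""
    D = [0] * 256
    D[4], D[5] = x, y
    return run(PROGRAM, D)[9]

print(count_nonzero(0, 0), count_nonzero(5, 0), count_nonzero(2, 9))
```

Each JMPZ jumps over exactly one ADD, so the result register is incremented once per non-zero word.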


### **Modifications to 3-instruction processor**

- Load-constant instruction: 0011 $r_3r_2r_1r_0$ $c_7c_6c_5c_4c_3c_2c_1c_0$
- Subtract instruction: 0100 $ra_3ra_2ra_1ra_0$ $rb_3rb_2rb_1rb_0$ $rc_3rc_2rc_1rc_0$
- Jump-if-zero instruction: 0101 $ra_3ra_2ra_1ra_0$ $o_7o_6o_5o_4o_3o_2o_1o_0$



#### Adding instructions can also mean adding hardware

#### Extending the control unit and datapath





#### **Controller FSM for 6-instruction processor**



# HOW "REALISTIC" IS WHAT WE JUST DISCUSSED?


#### **ARM7TDMI** is a real, commodity processor





#### Very similar to instruction execution stages just discussed

#### Where is it used?

- Fuji Xerox DocuPrint C2090FS Colour Printer
- Microsoft Xbox 360 Wireless Steering Wheel
- Triworks BEAUTY RF PLUS mod. BRF1
- Nokia 500 Navigation

Over 10 billion units shipped.

http://www.arm.com/products/processors/classic/arm7/arm7tdmi.php

**Board Discussion #5:** Wrap up, final examples

