# Lecture 27 Memory Technology + Storage + I/O

#### University of Notre Dame

Lecture 27 - Memory Technology + Storage + I/O

## SRAM (Static Random Access Memory)



- "logic" (CPU process, registers are SRAM)
- store bits in flip-flops (cross-coupled NORs)
- not very dense (six transistors per bit)
- + fast
- + doesn't need to be "refreshed" (data stays as long as power is on)

© 2004 by Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti

COMPSCI 220 / ECE 252 Lecture Notes, Storage Hierarchy I: Caches

### Storage Hierarchy II: Main Memory



main memory

- memory technology (DRAM)
- interleaving
- special DRAMs
- processor/memory integration

virtual memory and address translation


## DRAM (Dynamic Random Access Memory)



- bit stored as charge in capacitor
  - optimized for density (1 transistor for DRAM vs. 6 for SRAM)
- capacitor discharges on a read (destructive read)
  - read is automatically followed by a write (to restore bit)
- charge leaks away over time (not static)
  - refresh by reading/writing every bit once every 2ms (row at a time)
- access time = time to read
- cycle time = time between reads > access time


## **DRAM Chip Specs**

| Year | #bits | Access Time | Cycle Time |
|------|-------|-------------|------------|
| 1980 | 64Kb  | 150ns       | 300ns      |
| 1990 | 1Mb   | 80ns        | 160ns      |
| 1993 | 4Mb   | 60ns        | 120ns      |
| 2000 | 64Mb  | 50ns        | 100ns      |
| 2004 | 1Gb   | 45ns        | 75ns       |

- density: +60% annual
  - Moore's law: density doubles every 18 months
- speed: 7% annual
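As a quick sanity check on how the two density figures relate (a back-of-the-envelope conversion, not part of the original slide): doubling every 18 months corresponds to an annual growth factor of

$$
2^{12/18} = 2^{2/3} \approx 1.59,
$$

i.e. roughly +59% per year, which is in line with the +60% annual figure quoted for DRAM density.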


## **Example: Simple Main Memory**

- 32-bit wide DRAM (1 word of data at a time)
  - pretty wide for an actual DRAM
- access time: 2 cycles (A)
- transfer time: 1 cycle (T)
  - time on the bus
- cycle time: 4 cycles (B = cycle time − access time = 2 cycles)
  - B includes time to refresh after a read
- what is the miss penalty for a 4-word block?

## Comparison with SRAM

#### SRAM

- optimized for speed, then density
  - + 1/4-1/8 access time of DRAM
  - 1/4 density of DRAM
- bits stored as flip-flops (4-6 transistors per bit)
- static: bit not erased on a read
  - + no need to refresh
  - greater power dissipated than DRAM (think about this in the context of leakage!)
  - + access time = cycle time


## Simple Main Memory

| cycle | addr | mem |
|-------|------|-----|
| 1     | 12   | A   |
| 2     |      | A   |
| 3     |      | T/B |
| 4     |      | B   |
| 5     | 13   | A   |
| 6     |      | A   |
| 7     |      | T/B |
| 8     |      | B   |
| 9     | 14   | A   |
| 10    |      | A   |
| 11    |      | T/B |
| 12    |      | B   |
| 13    | 15   | A   |
| 14    |      | A   |
| 15    |      | T/B |
| 16    |      | B   |

4-word access = 15 cycles

4-word cycle = 16 cycles

can we speed this up?

- lower latency?
  - no
  - A,B & T are fixed
- higher bandwidth?


### Bandwidth: Wider DRAMs

| cycle | addr | mem |
|-------|------|-----|
| 1     | 12   | A   |
| 2     |      | A   |
| 3     |      | T/B |
| 4     |      | B   |
| 5     | 14   | A   |
| 6     |      | A   |
| 7     |      | T/B |
| 8     |      | B   |
new parameter

- 64-bit DRAMs
- 4-word access = 7 cycles
- 4-word cycle = 8 cycles

- 64-bit bus
  - wide buses (especially off-chip) are hard
  - electrical problems
- 64-bit DRAM is probably too wide


## Simple Interleaving

| cycle | addr | bank0 | bank1 | bank2 | bank3 |
|-------|------|-------|-------|-------|-------|
| 1     | 12   | A     | A     | A     | A     |
| 2     |      | A     | A     | A     | A     |
| 3     |      | T/B   | B     | B     | B     |
| 4     |      | B     | T/B   | B     | B     |
| 5     |      |       |       | T     | B     |
| 6     |      |       |       |       | T     |

4-word access = 6 cycles

4-word cycle = 4 cycles


- + can start a new access in cycle 5
- + overlap access with transfer
- + and still use a 32-bit bus!
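The cycle counts for the three organizations (simple, wider DRAM, simple interleaving) can be reproduced with a small model. This is only a sketch under the example's parameters (A = 2, T = 1, DRAM cycle time = 4); the function names are hypothetical, and the interleaved case assumes the block has no more words than there are banks.

```python
A = 2      # access time in cycles
T = 1      # transfer time per bus transfer, in cycles
CYCLE = 4  # DRAM cycle time in cycles (access time + B)


def simple_memory(words, words_per_access=1):
    """One DRAM; each access returns `words_per_access` words.

    Returns (block access time, block cycle time) in cycles.
    """
    accesses = -(-words // words_per_access)  # ceiling division
    # Every access but the last occupies a full DRAM cycle; the last one
    # only needs its access time plus one bus transfer to deliver the data.
    access_time = (accesses - 1) * CYCLE + A + T
    cycle_time = accesses * CYCLE
    return access_time, cycle_time


def interleaved_memory(words, banks):
    """All banks access in parallel, then take turns on a one-word bus.

    Assumes words <= banks (one word per bank).
    """
    if words > banks:
        raise ValueError("this sketch assumes one word per bank")
    access_time = A + words * T  # one parallel access, then serialized transfers
    cycle_time = CYCLE           # banks are ready again after one DRAM cycle
    return access_time, cycle_time


print("simple, 32-bit:  ", simple_memory(4))
print("wide, 64-bit:    ", simple_memory(4, words_per_access=2))
print("interleaved, M=4:", interleaved_memory(4, banks=4))
```

The three results match the slides: 15/16 cycles for the simple memory, 7/8 for the 64-bit DRAM, and 6/4 for four interleaved banks.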


## Bandwidth: Simple Interleaving/Banking

use multiple DRAMs, exploit their aggregate bandwidth

- each DRAM called a bank
  - not strictly true: sometimes a collection of DRAMs together is called a bank
- M 32-bit banks
- simple interleaving: banks share address lines
- word A in bank (A % M) at location (A div M)
  - e.g., M=4, A=9: bank 1, location 2

| bank 0 | bank 1 | bank 2 | bank 3 |
|--------|--------|--------|--------|
| 0      | 1      | 2      | 3      |
| 4      | 5      | 6      | 7      |
| 8      | 9      | 10     | 11     |
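The word-to-bank mapping is just a modulo and an integer division. A minimal sketch (the function name `bank_and_location` is hypothetical):

```python
def bank_and_location(addr, num_banks):
    """Simple interleaving: word `addr` lives in bank addr % M at row addr div M."""
    return addr % num_banks, addr // num_banks


# Rebuild the layout for words 0..11 across M = 4 banks.
M = 4
rows = {}
for addr in range(12):
    bank, location = bank_and_location(addr, M)
    rows.setdefault(location, [None] * M)[bank] = addr

for location in sorted(rows):
    print(location, rows[location])
```

Consecutive words land in consecutive banks, which is why a 4-word block read touches every bank exactly once.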


## **Processor/Memory Integration**

the next logical step: processor and memory on same chip

- move on-chip: FP, L2 caches, graphics. why not memory?
- problem: processor/memory technologies incompatible
  - different number/kinds of metal layers
  - DRAM: capacitance is a good thing, logic: capacitance a bad thing

what needs to be done?

- use some DRAM area for simple processor (10% enough)
- eliminate external memory bus, milk performance from that
- integrate interconnect interfaces (processor/memory unit)
- re-examine tradeoffs: technology, cost, performance
- research projects: PIM, IRAM


### Storage Hierarchy III: I/O System



- often boring, but still quite important
  - ostensibly about general I/O, mainly about disks
- performance: latency & throughput
- disks
  - parameters
  - extensions
- buses


### I/O Device Characteristics

- type
  - input: read only
  - output: write only
  - storage: both
- partner
  - human
  - machine
- data rate
  - peak transfer rate
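| device | type    | partner | data rate (KB/s) |
|--------|---------|---------|------------------|
| mouse  | I       | human   | 0.01             |
| CRT    | O       | human   | 60,000           |
| modem  | I/O     | machine | 2-8              |
| LAN    | I/O     | machine | 500-6000         |
| tape   | storage | machine | 2000             |
| disk   | storage | machine | 2000-10,000      |

- I: input to system
- O: output to display
- I/O: both input & output
- storage: of interest to this discussion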


## I/O (Disk) Performance

- who cares? you do
  - remember Amdahl's Law
  - want fast disk access (fast swap, fast file reads)
- I/O performance metrics
  - raw data bandwidth: bytes per second
  - latency: response time
- is I/O (disk) latency important? why not just context-switch?
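As a reminder of why this matters, Amdahl's Law bounds the overall speedup when only part of the execution time is improved. The symbols and numbers below are illustrative assumptions, not values from the lecture: $f$ is the fraction of time spent computing, $s$ is the speedup of that fraction, and the remaining $1-f$ is I/O time that does not improve.

$$
\text{speedup} = \frac{1}{(1-f) + f/s}, \qquad f = 0.9,\ s = 10 \;\Rightarrow\; \frac{1}{0.1 + 0.09} \approx 5.3
$$

Under these assumed numbers, a 10× faster CPU yields only about a 5.3× faster program, because the 10% spent waiting on I/O is untouched.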


### **Disk Parameters**



- 1–20 platters (data on both sides)
  - magnetic iron-oxide coating
  - 1 read/write head per side
- 500–2500 tracks per platter
- 32–128 sectors per track
  - sometimes fewer on inside tracks
- 512–2048 bytes per sector
  - usually fixed number of bytes/sector
  - data + ECC (parity) + gap
- 4–24GB total
- 3000–10000 RPM


What metrics are important for what applications?

## Disk Performance Example

#### parameters

- 3600 RPM ⇒ 60 RPS (may help to think in units of tracks/sec)
- avg seek time: 9ms
- 100 sectors per track, 512 bytes per sector
- controller + queuing delays: 1ms
- Q: average time to read 1 sector (512 bytes)?
  - rate<sub>transfer</sub> = 100 sectors/track × 512 B/sector × 60 RPS ≈ 3.1 MB/s
  - t<sub>transfer</sub> = 512 B / 3.1 MB/s ≈ 0.2ms
  - t<sub>rotation</sub> = 0.5 rotation / 60 RPS = 8.3ms
  - t<sub>disk</sub> = 9ms (seek) + 8.3ms (rotation) + 0.2ms (xfer) + 1ms = 18.5ms
  - t<sub>transfer</sub> is only a small component! counter-intuitive?
  - end of story? no! t<sub>queuing</sub> not fixed (gets longer with more requests)
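The same calculation can be written as a small function so the parameters are easy to vary. This is a sketch using the example's numbers; the function and key names are hypothetical, and the controller + queuing delay is treated as a fixed constant even though, as noted above, queuing grows with load.

```python
def disk_read_time_ms(rpm, seek_ms, sectors_per_track, bytes_per_sector,
                      overhead_ms, sectors=1):
    """Average time to read `sectors` consecutive sectors, in milliseconds."""
    rps = rpm / 60.0
    # Bytes passing under the head per second while on one track.
    rate_bytes_per_s = sectors_per_track * bytes_per_sector * rps
    transfer_ms = sectors * bytes_per_sector / rate_bytes_per_s * 1000.0
    # On average the target sector is half a rotation away.
    rotation_ms = 0.5 / rps * 1000.0
    total_ms = seek_ms + rotation_ms + transfer_ms + overhead_ms
    return {
        "rate_bytes_per_s": rate_bytes_per_s,
        "seek_ms": seek_ms,
        "rotation_ms": rotation_ms,
        "transfer_ms": transfer_ms,
        "overhead_ms": overhead_ms,
        "total_ms": total_ms,
    }


parts = disk_read_time_ms(rpm=3600, seek_ms=9.0, sectors_per_track=100,
                          bytes_per_sector=512, overhead_ms=1.0)
for name, value in parts.items():
    print(f"{name:>17}: {value:.2f}")
```

Seek and rotation dominate the total; the transfer itself is well under a millisecond, which is why small random reads are so much slower than large sequential ones.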


### **Disk Alternatives**

- solid state disk (SSD)
  - DRAM + battery backup with standard disk interface
  - + fast: no seek time, no rotation time, fast transfer rate
  - expensive
- FLASH memory
  - + fast: no seek time, no rotation time, fast transfer rate
  - + non-volatile
  - slow
    - actually, reads are comparable to normal DRAM, but writes take longer
  - "wears" out over time
- optical disks (CDs, DVDs)
  - cheap if write-once, expensive if write-multiple
  - slow


### **Disk Usage Models**

- data mining + supercomputing
  - large files, sequential reads
  - raw data transfer rate (rate<sub>transfer</sub>) is most important
- transaction processing
  - large files, but random access, many small requests
  - IOPS (I/O operations per second) is most important
- time sharing filesystems
  - small files, sequential accesses, potential for file caching
  - IOPS is most important

#### must design disk (I/O) system based on target workload

- use disk benchmarks (they exist)


### **Extensions to Conventional Disks**

- increasing density: more sensitive heads, finer control
  - increases cost
- fixed head: head per track
  - + seek time eliminated
  - low track density
- parallel transfer: simultaneous read from multiple platters
  - difficulty in locking onto different tracks on multiple surfaces
  - lower cost alternatives possible (disk arrays)


### More Extensions to Conventional Disks

- disk caches: disk-controller RAM buffers data
  - + fast writes: RAM acts as a write buffer
  - + better utilization of host-to-device path
  - high miss rate increases request latency
- disk scheduling: schedule requests to reduce latency
  - e.g., schedule request with shortest seek time
  - e.g., "elevator" algorithm for seeks (head sweeps back and forth)
  - works best for unlikely cases (long queues)
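The two scheduling policies named above can be compared on a toy request queue. This is a minimal sketch: the track numbers are made up for illustration, the function names are hypothetical, and seek cost is approximated as the number of tracks crossed.

```python
def shortest_seek_first(head, requests):
    """Greedy policy: always service the pending track closest to the head."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def elevator(head, requests, moving_up=True):
    """Sweep in one direction servicing tracks on the way, then reverse."""
    if moving_up:
        first = sorted(t for t in requests if t >= head)
        second = sorted((t for t in requests if t < head), reverse=True)
    else:
        first = sorted((t for t in requests if t <= head), reverse=True)
        second = sorted(t for t in requests if t > head)
    return first + second


def total_seek_distance(head, order):
    """Tracks crossed when servicing `order` starting from `head`."""
    distance = 0
    for track in order:
        distance += abs(track - head)
        head = track
    return distance


queue = [10, 95, 52, 40, 60]  # hypothetical pending track numbers
start = 50                    # hypothetical current head position
for name, order in [("arrival order", queue),
                    ("shortest seek", shortest_seek_first(start, queue)),
                    ("elevator", elevator(start, queue))]:
    print(f"{name:>13}: {order} -> {total_seek_distance(start, order)} tracks")
```

With only five requests the difference is modest; the benefit of reordering grows with queue length, consistent with the note that scheduling works best with long queues.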


## Bus Issues (Memory & I/O Buses)

- clocking: is bus clocked?
  - synchronous: clocked, short bus ⇒ fast
  - asynchronous: no clock, use "handshaking" instead ⇒ slow
- switching: when is control of bus acquired and released?
  - atomic: bus held until request complete ⇒ slow
  - split-transaction (pipelined): bus free between request & reply ⇒ fast
- arbitration: how do we decide who gets the bus next?
  - overlap arbitration for next master with current transfer
  - daisy chain: closer devices have priority ⇒ slow
  - distributed: wired-OR, low-priority back-off ⇒ medium
- some other issues
  - split data/address lines, width, burst transfer


### I/O System Architecture




- buses
  - memory bus
  - I/O bus
- I/O processing
  - program controlled
  - DMA
  - I/O processors (IOPs)


## I/O and Memory Buses

| category     | bus       | bits   | MHz    | peak MB/s | special features       |
|--------------|-----------|--------|--------|-----------|------------------------|
| memory buses | Summit    | 128    | 60     | 960       |                        |
|              | Challenge | 256    | 48     | 1200      |                        |
|              | XDBus     | 144    | 66     | 1056      |                        |
| I/O buses    | ISA       | 16     | 8      | 16        | original PC bus        |
|              | IDE       | 16     | 8      | 16        | tape, CD-ROM           |
|              | PCI       | 32(64) | 33(66) | 133(266)  | "plug+play"            |
|              | SCSI/2    | 8/16   | 5/10   | 10/20     | high-level interface   |
|              | PCMCIA    | 8/16   | 8      | 16        | modem, "hot-swap"      |
|              | USB       | serial | isoch. | 1.5       | power line, packetized |
|              | FireWire  | serial | isoch. | 100       | fast USB               |

- memory buses: speed (usually custom design)
- I/O buses: compatibility (usually industry standard) + cost
