COSC 3P92
Week 9 Lecture slides
"The human brain starts working the moment you are born and never stops until you stand up to speak in public."
George Jessel
2
COSC 3P92
• In a typical computer system, the storage system is organized according to
the following hierarchy:
Archival Storage (magnetic tape
or photographic)
Moving head disk (magnetic or optical)
High speed drum
Charge Coupled Device
Main memory
Cache
Internal
decreasing
cost/bit
decreasing
access time
fast access (1-20 ns.) and small capacity (1-4K byte)
slow access (1-5 s.) and large capacity (almost unlimited)
Memory
Organization
Memory speed
• Access time (Ta)
– the average time taken to read a unit of information
e.g., 100 ns (100 x 10**-9 s)
• Access rate (Ra) = 1/Ta (bits/second)
e.g., 1/100ns = 10 Mb/s
• Cycle time (Tc)
– the average time lapse between two successive read
operations
e.g., 500 ns (500 x 10**-9 s)
• Bandwidth or transfer rate (Rc) = 1/Tc
(bits/second)
e.g., 1/500ns = 2 Mb/s
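The two rates above are simple reciprocals, which a short sketch can check. The function name `rate_bits_per_second` and the `bits_per_unit` parameter are illustrative assumptions; the slide's figures imply one bit delivered per access.

```python
def rate_bits_per_second(period_ns: float, bits_per_unit: int = 1) -> float:
    """Bits per second when `bits_per_unit` bits arrive every `period_ns` nanoseconds."""
    return bits_per_unit / (period_ns * 1e-9)


# Access rate Ra = 1/Ta with Ta = 100 ns
access_rate = rate_bits_per_second(100)
# Bandwidth Rc = 1/Tc with Tc = 500 ns
bandwidth = rate_bits_per_second(500)

print(f"Ra = {access_rate / 1e6:.0f} Mb/s")
print(f"Rc = {bandwidth / 1e6:.0f} Mb/s")
```

Because the cycle time is longer than the access time, the sustained bandwidth (Rc) is lower than the access rate (Ra).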
Classes of Memory
• RAM (“normal memory”)
• Direct-access storage: HD, CD ROM, DVD
• Sequential-access storage: tapes (e.g., DAT)
• Associative (content-addressable) memory:
searches for data via bit patterns
– CAM (Content Addressable Memory)
» Includes comparison logic with each bit of storage.
» A data value is broadcast to all words of storage and
compared with the values there.
» Words which match are flagged.
» Subsequent operations can then work on flagged words.
» (computing-dictionary.thefreedictionary.com)
• ROM
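The content-addressable memory described in this list can be modelled roughly in software. This is only a sketch: real CAM compares every word in parallel in hardware, whereas the loop below is sequential, and the class name, the `mask` parameter, and the `update_flagged` operation are hypothetical names chosen for illustration.

```python
class ContentAddressableMemory:
    """Toy model of a CAM: search by value instead of by address."""

    def __init__(self, words):
        self.words = list(words)
        self.flags = [False] * len(self.words)

    def search(self, value, mask=~0):
        # Broadcast `value` to all words; flag the words whose masked bits match.
        self.flags = [(w & mask) == (value & mask) for w in self.words]
        return [i for i, flagged in enumerate(self.flags) if flagged]

    def update_flagged(self, new_value):
        # A subsequent operation that works only on the flagged words.
        self.words = [new_value if f else w for w, f in zip(self.words, self.flags)]


cam = ContentAddressableMemory([0x12, 0x34, 0x12, 0x56])
matches = cam.search(0x12)      # indices of the words equal to 0x12
cam.update_flagged(0x00)        # clear every flagged word
```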
Categories of RAM and ROM
[Figure: classification tree of primary memory. RAM: magnetic core or semiconductor (static or dynamic; bipolar or MOS). ROM: mask ROM, PROM, EPROM, EAROM.]
Main Memory Design
1K x 4 RAM chip: 10 address lines (A9-A0), 4 bi-directional data lines (D3-D0), write enable (WE), chip select (CS)

WE   Mode           Status of the bi-directional data lines D3-D0   Power
X    Not selected   High impedance                                   Standby
L    Write          Acts as input bus                                Active
H    Read           Acts as output bus                               Active
Main Memory Design
Q. How do we build a 4K x 4 RAM using four
1K x 4 RAM chips?
Chip  A11  A10  A9 ... A0  Range
0     0    0    x  ... x   0 to 1023
1     0    1    x  ... x   1024 to 2047
2     1    0    x  ... x   2048 to 3071
3     1    1    x  ... x   3072 to 4095
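The table amounts to using the top two address bits (A11, A10) as the chip number and the low ten bits (A9-A0) as the address within the chip. A minimal sketch of that decoding follows; the function name `decode` is illustrative.

```python
def decode(address: int) -> tuple[int, int]:
    """Split a 12-bit address into (chip number, address within the 1K chip)."""
    if not 0 <= address < 4096:
        raise ValueError("address must fit in 12 bits (0 to 4095)")
    chip = address >> 10        # A11 and A10 select one of the four chips
    offset = address & 0x3FF    # A9-A0 go to every chip in parallel
    return chip, offset


for addr in (0, 1023, 1024, 3072, 4095):
    print(addr, decode(addr))
```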
Main Memory Design
• Q. How do we build a 256KB RAM system
with a 16-bit address bus and four 64KB
RAM chips?
• Memory bank-switching
[Figure: the processor drives log2 n select lines into a 1-of-n decoder, whose outputs Enable 1, Enable 2, ..., Enable n each enable one of the n memory banks; all banks sit on the address bus in parallel.]
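One way to picture bank switching is a bank-select register that the processor writes before accessing memory: the 16-bit address reaches every bank, but only the enabled bank responds. The sketch below assumes that arrangement; the class and method names are hypothetical.

```python
class BankSwitchedMemory:
    """Four 64KB banks behind a 16-bit address bus (assumed layout)."""

    BANK_SIZE = 64 * 1024

    def __init__(self, banks: int = 4):
        self.banks = [bytearray(self.BANK_SIZE) for _ in range(banks)]
        self.selected = 0  # plays the role of the 1-of-n decoder input

    def select_bank(self, n: int) -> None:
        if not 0 <= n < len(self.banks):
            raise ValueError("no such bank")
        self.selected = n

    def read(self, address: int) -> int:
        return self.banks[self.selected][address & 0xFFFF]

    def write(self, address: int, value: int) -> None:
        self.banks[self.selected][address & 0xFFFF] = value & 0xFF


mem = BankSwitchedMemory()
mem.write(0x1234, 0xAA)          # lands in bank 0
mem.select_bank(3)
mem.write(0x1234, 0x55)          # same 16-bit address, different bank
mem.select_bank(0)
value_in_bank0 = mem.read(0x1234)
total_bytes = len(mem.banks) * BankSwitchedMemory.BANK_SIZE
```

The same 16-bit address can therefore name four different bytes, one per bank, which is how a 64KB address space reaches 256KB of RAM.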
Main memory design
• Memory address extension
[Figure: the processor supplies a 4-bit base (the base address, loaded over the data bus) and a 16-bit offset on the address bus; together they form the 20-bit physical address sent to memory.]
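A plausible reading of the address-extension scheme is that the 4-bit base supplies the top four bits of the 20-bit physical address and the 16-bit offset supplies the rest. The sketch below assumes simple concatenation; the slide does not spell out how the two fields are combined, and the function name is illustrative.

```python
def physical_address(base: int, offset: int) -> int:
    """Concatenate a 4-bit base and a 16-bit offset into a 20-bit address (assumed scheme)."""
    if not 0 <= base < 16:
        raise ValueError("base must fit in 4 bits")
    if not 0 <= offset < 65536:
        raise ValueError("offset must fit in 16 bits")
    return (base << 16) | offset


print(hex(physical_address(0xA, 0x1234)))
```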
Cache Memory
[Figure: CPU, cache, main memory and external storage, with caches shown between levels.]
• Cache: fast-access memory buffer
• locality principle: programs usually use limited memory
areas, in contrast to totally random access
– spatial: location, address
– temporal: time accessed
– if commonly used memory can be buffered in high-speed cache,
overall performance enhanced
– cache takes form of small amount of store, with hardware support
for maintenance and lookup
– each cache cell saves a cache line - block of main memory (4-64
words)
• cache hit:
– requested memory resides in cache
Cache
• cache miss:
– requested memory not in cache, and must be fetched from main
memory and put into cache
• unified cache:
– instructions and data share the same cache
• split cache:
– separate instruction and data caches
• parallel access:
– doubles the bandwidth
• level 2 cache:
– sits between the instruction/data cache and main memory
• Cache maintenance algorithms are similar in spirit to
virtual memory ideas at the operating system level; the main
difference is that the cache is hardware-supported,
whereas virtual memory is software-implemented
Measuring cache performance
• c - cache access time
• m - main memory access time
• hr - hit ratio ( 0 <= hr <= 1) :
– # cache hits / total memory requests
• mr - miss ratio (1-hr)
• mean access time = c + (1-hr)m
– if hr --> 1 then m.a.t. = c
– if hr --> 0 then m.a.t. = c + m
Cache Example
• example:
let c = 160 ns
m = 960 ns
hr = 0.90 (common)
mean = 160 + (1-.90)960
= 256 ns
efficiency = c / mean
= 160/256 = 62.5%
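The worked example can be checked with a few lines of code. The function name `mean_access_time` is illustrative; the formula is the one given for the mean access time, c + (1 - hr)m.

```python
def mean_access_time(c: float, m: float, hr: float) -> float:
    """Mean access time = c + (1 - hr) * m, all times in the same unit."""
    if not 0.0 <= hr <= 1.0:
        raise ValueError("hit ratio must lie between 0 and 1")
    return c + (1.0 - hr) * m


mean = mean_access_time(c=160, m=960, hr=0.90)
efficiency = 160 / mean

print(f"mean = {mean:.0f} ns, efficiency = {efficiency:.1%}")
```

The limiting cases follow directly: with hr = 1 the result is c, and with hr = 0 it is c + m.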
Direct mapping
[Figure: main memory blocks 0, 1, ..., i, ..., M-1 map onto cache slots 0 to N-1; block i goes to slot i mod N.]
[Figure: Direct Mapped Cache]
• use a hash function to find cache location
• normally, modulo some bit field of address, then
just use low end field
• cache fields:
– valid bit
– tag - block # being held
– value - data block
• scheme:
• memory request:
– compute cache slot (low n bits)
– check block (tag) field
» hit: return value
» miss: fetch block from memory, give to CPU, and put into
that computed slot (replace existing item if there)
• can occasionally produce thrashing
– e.g., addresses that differ by a multiple of the cache size (64K) map to the
same entry
– split instruction/data cache helps avoid thrashing
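The lookup scheme and the thrashing case can be sketched as a small simulation. This is a simplified model that tracks block numbers only (no data), with hypothetical class and attribute names; the slot is the block number modulo the number of slots and the tag is the remaining high part.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that records hits and misses per block access."""

    def __init__(self, num_slots: int):
        self.num_slots = num_slots
        self.valid = [False] * num_slots
        self.tags = [0] * num_slots
        self.hits = 0
        self.misses = 0

    def access(self, block: int) -> bool:
        slot = block % self.num_slots    # low bits pick the slot
        tag = block // self.num_slots    # high bits identify the block held
        if self.valid[slot] and self.tags[slot] == tag:
            self.hits += 1
            return True
        # Miss: fetch the block and replace whatever the slot held.
        self.valid[slot] = True
        self.tags[slot] = tag
        self.misses += 1
        return False


cache = DirectMappedCache(num_slots=8)
for block in [0, 8, 0, 8, 0, 8]:   # blocks 0 and 8 both map to slot 0
    cache.access(block)
print(cache.hits, cache.misses)
```

Alternating between two blocks that share a slot evicts the other block on every access, so this pattern never hits even though the other seven slots are empty.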
Set associative mapping
[Figure: main memory blocks 0, 1, ..., i, ..., M-1 map onto cache sets Set 0 to Set N/S - 1, with S blocks per set; block i goes to set i mod (N/S).]
• [4.39]
• use same hash function as direct mapping, except
that each cache slot holds multiple data blocks
– usually max. 4 blocks (“4-way”)
• searching blocks in a slot done associatively:
simultaneous pattern matching
• more flexible than direct: multiple blocks in set
• use smaller tag than associative, therefore
cheaper to implement associative matching
• commonly used in larger systems (VAX 11-780)
• which line should be replaced when slot full?
– eg. LRU (least recently used)
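A set-associative cache with LRU replacement can be sketched in the same style. This is an illustrative model, not a description of any particular machine: the class name is hypothetical, only block numbers are tracked, and an `OrderedDict` stands in for the hardware's associative match and recency bookkeeping.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy S-way set-associative cache with LRU replacement within each set."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set: keys are tags, oldest (least recent) first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block: int) -> bool:
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used line
        lines[tag] = None
        self.misses += 1
        return False


# Same total capacity as an 8-slot direct-mapped cache: 4 sets x 2 ways.
cache = SetAssociativeCache(num_sets=4, ways=2)
for block in [0, 8, 0, 8, 0, 8]:   # both blocks map to set 0 but can coexist
    cache.access(block)
print(cache.hits, cache.misses)
```

The access pattern that misses every time in a direct-mapped cache of the same size misses only twice here, because the two conflicting blocks share one set.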
Writing back to the memory
• only write to memory if cache data modified.
• write back (write-deferred):
– (i) use a modified bit. When swapping a cache slot or ending
job, write slot if its modified bit is set
• write through:
– (ii) whenever modifying data, always write it back to main
memory
– have to do this if memory being shared in a DMA or
multiprocessing system
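The difference between the two policies shows up in how many main-memory writes a run of stores to one cached block causes. The sketch below models a single cache slot; the class name and counters are hypothetical, chosen only for illustration.

```python
class CacheSlot:
    """One cache slot under either write policy, counting main-memory writes."""

    def __init__(self, write_through: bool):
        self.write_through = write_through
        self.modified = False       # the "modified bit" used by write back
        self.memory_writes = 0

    def store(self) -> None:
        if self.write_through:
            self.memory_writes += 1     # every store also goes to main memory
        else:
            self.modified = True        # defer the write until eviction

    def evict(self) -> None:
        if self.modified:
            self.memory_writes += 1     # write the slot back once
            self.modified = False


write_back = CacheSlot(write_through=False)
write_through = CacheSlot(write_through=True)
for slot in (write_back, write_through):
    for _ in range(5):
        slot.store()
    slot.evict()
print(write_back.memory_writes, write_through.memory_writes)
```

Write back saves memory traffic, but main memory is stale until the eviction, which is why shared memory (DMA, multiprocessing) calls for write through.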
Example: direct mapping
4 byte blocks
1 byte words
8 slots in cache
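With these parameters a byte address splits into three fields. The sketch assumes the usual layout in which the lowest bits give the byte within the block, the next bits give the slot, and the remaining high bits form the tag; the function name is illustrative.

```python
BLOCK_SIZE = 4   # bytes per block -> 2 offset bits
NUM_SLOTS = 8    # slots in cache  -> 3 slot bits


def split_address(address: int) -> tuple[int, int, int]:
    """Return (tag, slot, byte offset) for a byte address."""
    offset = address % BLOCK_SIZE
    block = address // BLOCK_SIZE
    slot = block % NUM_SLOTS
    tag = block // NUM_SLOTS
    return tag, slot, offset


for addr in (0, 5, 31, 32, 0b10111001):
    print(addr, split_address(addr))
```

Addresses 0 and 32 land in the same slot with different tags, so they compete for slot 0.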
Example (cont)
The end

Memory Organization In Assembly Language