COSC 3P92
Week 9 Lecture Slides
The human brain starts working the moment you are
born and never stops until you stand up to speak in
public.
George Jessel
Memory Organization
• In a typical computer system, the storage system is organized according to
the following hierarchy, listed here from the slowest, largest level to the
fastest, smallest level:
– Archival storage (magnetic tape or photographic):
slow access (1-5 s) and large capacity (almost unlimited)
– Moving-head disk (magnetic or optical)
– High-speed drum
– Charge-coupled device
– Main memory
– Cache
– Internal:
fast access (1-20 ns) and small capacity (1-4K bytes)
• Cost per bit decreases toward archival storage
• Access time decreases toward internal storage
Memory speed
• Access time (Ta)
– the average time taken to read a unit of information
e.g., 100 ns (100 x 10**-9 s)
• Access rate (Ra) = 1/Ta (bits/second)
e.g., 1/100 ns = 10 Mb/s
• Cycle time (Tc)
– the average time lapse between two successive read operations
e.g., 500 ns (500 x 10**-9 s)
• Bandwidth or transfer rate (Rc) = 1/Tc (bits/second)
e.g., 1/500 ns = 2 Mb/s
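The two worked examples above can be checked with a few lines of Python. This is a minimal sketch that assumes the "unit of information" is one bit, as the slide's bits/second figures imply; the function name rate_bits_per_second is chosen here for illustration.

```python
def rate_bits_per_second(period_ns: float) -> float:
    """Return 1/T in bits per second for a period T given in nanoseconds.

    Assumes one bit is transferred per period.
    """
    return 1.0 / (period_ns * 1e-9)


access_rate = rate_bits_per_second(100)  # Ra = 1/Ta with Ta = 100 ns
bandwidth = rate_bits_per_second(500)    # Rc = 1/Tc with Tc = 500 ns

print(f"Ra = {access_rate / 1e6:.0f} Mb/s")
print(f"Rc = {bandwidth / 1e6:.0f} Mb/s")
```

Because Tc is at least Ta, the sustained bandwidth Rc can never exceed the access rate Ra.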
Classes of Memory
• RAM (“normal memory”)
• Direct-access storage: HD, CD ROM, DVD
• Sequential-access storage: tapes (e.g., DAT)
• Associative (content-addressable) memory:
searches for data via bit patterns
– CAM (Content Addressable Memory)
» Includes comparison logic with each bit of storage.
» A data value is broadcast to all words of storage and
compared with the values there.
» Words which match are flagged.
» Subsequent operations can then work on flagged words.
» (computing-dictionary.thefreedictionary.com)
• ROM
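The CAM behaviour described above (broadcast a value, compare it against every word, flag the matches) can be imitated in software. This is only a sketch: real CAM hardware compares all words at once, whereas the loop below visits them one by one. The mask argument, which restricts the comparison to selected bits, is an assumption added for illustration and is not on the slide.

```python
def cam_search(words, pattern, mask):
    """Return the indices of stored words that match `pattern`.

    Only the bit positions set in `mask` take part in the comparison
    (the mask is an illustrative assumption, not part of the slide).
    """
    return [i for i, word in enumerate(words)
            if ((word ^ pattern) & mask) == 0]


words = [0b1010, 0b0110, 0b1011, 0b0010]

# Broadcast 1010 and compare on the top three bits only.
flagged = cam_search(words, pattern=0b1010, mask=0b1110)
print(flagged)  # indices of the flagged words
```

Subsequent operations would then act only on the words whose indices were returned.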
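Categories of RAM and ROM
[Figure: classification tree of primary memory]
• primary memory: RAM, ROM
• RAM: magnetic core, semiconductor
• semiconductor: static, dynamic
• semiconductor technologies shown: Bipolar, MOS
• ROM: Mask ROM, PROM, EPROM, EAROM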
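Main Memory Design
• 1K x 4 RAM chip
– address lines A9-A0 (10 bits)
– bi-directional data lines D3-D0 (4 bits)
– control inputs: WE, CS

CS          WE   Mode           Status of the bi-directional   Power
                                data lines D3-D0
deasserted  X    Not selected   High impedance                 Standby
asserted    L    Write          Acts as input bus              Active
asserted    H    Read           Acts as output bus             Active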
Main Memory Design
Q. How do we build a 4K x 4 RAM using four
1K x 4 RAM chips?

Chip   A11   A10   A9 ... A0   Range
0      0     0     x  ... x    0 to 1023
1      0     1     x  ... x    1024 to 2047
2      1     0     x  ... x    2048 to 3071
3      1     1     x  ... x    3072 to 4095
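The table amounts to splitting a 12-bit address into a 2-bit chip select (A11, A10) and a 10-bit offset within the chip (A9-A0). A minimal sketch of that decoding, with the function name decode chosen for illustration:

```python
def decode(address: int) -> tuple[int, int]:
    """Split a 12-bit address into (chip number, offset within the chip)."""
    if not 0 <= address < 4096:
        raise ValueError("address must fit in 12 bits")
    chip = address >> 10      # A11 and A10 select one of the four chips
    offset = address & 0x3FF  # A9-A0 go to every chip in parallel
    return chip, offset


for address in (0, 1023, 1024, 3072, 4095):
    print(address, decode(address))
```

In hardware, the chip number would drive a 2-to-4 decoder whose outputs feed the CS inputs of the four chips.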
Main Memory Design
• Q. How do we build a 256KB RAM system
with a 16-bit address bus and four 64KB
RAM chips?
• Memory bank-switching
[Figure: the processor drives log2 n select lines into a 1-of-n decoder;
the decoder outputs Enable 1, Enable 2, ..., Enable n each enable one of
memory banks 1, 2, ..., n. All banks sit on the address bus in parallel.]
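Bank switching can be sketched in software as a bank-select register plus a 16-bit address. The class below is an illustrative model only: the names BankedMemory and select_bank are hypothetical, and in real hardware the selected bank number would feed the 1-of-n decoder that enables a single chip.

```python
BANK_SIZE = 64 * 1024  # each chip holds 64KB, the reach of a 16-bit address


class BankedMemory:
    """Four 64KB banks behind one 16-bit address bus (illustrative model)."""

    def __init__(self, num_banks: int = 4):
        self.banks = [bytearray(BANK_SIZE) for _ in range(num_banks)]
        self.selected = 0  # bank-select register, written by the processor

    def select_bank(self, bank: int) -> None:
        if not 0 <= bank < len(self.banks):
            raise ValueError("no such bank")
        self.selected = bank

    def read(self, address: int) -> int:
        return self.banks[self.selected][address & 0xFFFF]

    def write(self, address: int, value: int) -> None:
        self.banks[self.selected][address & 0xFFFF] = value & 0xFF


memory = BankedMemory()
memory.write(0x1234, 1)   # goes to bank 0
memory.select_bank(2)
memory.write(0x1234, 2)   # same address, different bank
memory.select_bank(0)
print(memory.read(0x1234))
```

The processor still issues only 16-bit addresses; the extra two bits of the 256KB space live in the bank-select register.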
Cache Memory
[Figure: block diagram with CPU, cache, main memory, external storage,
and a second cache]
• Cache: fast-access memory buffer
• locality principle: programs usually use limited memory
areas, in contrast to totally random access
– spatial: location, address
– temporal: time accessed
– if commonly used memory can be buffered in high-speed cache,
overall performance enhanced
– cache takes form of small amount of store, with hardware support
for maintenance and lookup
– each cache cell saves a cache line - block of main memory (4-64
words)
• cache hit:
– requested memory resides in cache
Cache
• cache miss:
– requested memory not in cache, and must be fetched from main
memory and put into cache
• unified cache:
– instructions and data share the same cache
• split cache:
– separate instruction and data caches
• parallel access:
– double the bandwidth
• level 2 cache:
– between instruction/data cache and main memory
• Cache maintenance algorithms are similar in spirit to
virtual memory ideas at the operating system level; the main
difference is that the cache is hardware-supported,
whereas virtual memory is software-implemented
Measuring cache performance
• c - cache access time
• m - main memory access time
• hr - hit ratio ( 0 <= hr <= 1) :
– # cache hits / total memory requests
• mr - miss ratio (1-hr)
• mean access time = c + (1-hr)m
– if hr --> 1 then m.a.t. = c
– if hr --> 0 then m.a.t. = c + m
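The formula is easy to tabulate. In the sketch below the timings (c = 10 ns, m = 100 ns) and the hit ratios are hypothetical values picked for illustration, not figures from the slides.

```python
def mean_access_time(c: float, m: float, hr: float) -> float:
    """Mean access time = c + (1 - hr) * m, for a hit ratio 0 <= hr <= 1."""
    if not 0.0 <= hr <= 1.0:
        raise ValueError("hit ratio must lie between 0 and 1")
    return c + (1.0 - hr) * m


# Hypothetical timings in nanoseconds.
c, m = 10.0, 100.0
for hr in (0.0, 0.5, 0.95, 1.0):
    print(f"hr = {hr:.2f}: m.a.t. = {mean_access_time(c, m, hr):.1f} ns")
```

The two limiting cases from the slide appear at the ends of the table: c + m when hr = 0 and c when hr = 1.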
Direct mapping
• use a hash function to find cache location
• normally the hash is a modulo operation, so the slot number is
just a low-order bit field of the address
• cache fields:
– valid bit
– tag - block # being held
– value - data block
• scheme:
• memory request:
– compute cache slot (low n bits)
– check block (tag) field
» hit: return value
» miss: fetch block from memory, give to CPU, and put into
that computed slot (replace existing item if there)
• can occasionally produce thrashing
– e.g., addresses that differ by a multiple of the cache size (64K)
map to the same entry
– split instruction/data cache helps avoid thrashing
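The lookup scheme and the thrashing case can be sketched with a small simulation. This is an illustrative model that works on block numbers rather than byte addresses; the class name, the 8-slot cache size, and the list standing in for main memory are all assumptions made for the example.

```python
class DirectMappedCache:
    """Direct-mapped cache over a list of memory blocks (illustrative)."""

    def __init__(self, num_slots, memory):
        self.num_slots = num_slots       # assumed to be a power of two
        self.slots = [None] * num_slots  # None means the valid bit is clear
        self.memory = memory
        self.hits = 0
        self.misses = 0

    def read(self, block):
        slot = block % self.num_slots    # low-order bits pick the slot
        tag = block // self.num_slots    # remaining bits form the tag
        entry = self.slots[slot]
        if entry is not None and entry[0] == tag:
            self.hits += 1
            return entry[1]
        # Miss: fetch from memory and replace whatever the slot held.
        self.misses += 1
        value = self.memory[block]
        self.slots[slot] = (tag, value)
        return value


cache = DirectMappedCache(num_slots=8, memory=list(range(64)))

# Blocks 0 and 8 differ by the cache size, so they fight over slot 0.
for _ in range(4):
    cache.read(0)
    cache.read(8)
print(cache.hits, cache.misses)
```

Every access in the loop misses, because each block evicts the other before it can be reused.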
Set associative mapping
• [4.39]
• use same hash function as direct mapping, except
that each cache slot holds multiple data blocks
– usually max. 4 blocks (“4-way”)
• searching blocks in a slot done associatively:
simultaneous pattern matching
• more flexible than direct: multiple blocks in set
• use smaller tag than associative, therefore
cheaper to implement associative matching
• commonly used in larger systems (VAX 11-780)
• which line should be replaced when slot full?
– e.g., LRU (least recently used)
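A set associative cache with LRU replacement can be sketched as follows. This is an illustrative model only: it works on block numbers, it searches a set with a dictionary lookup rather than parallel comparison hardware, and the class name and sizes (8 sets, 2 ways) are assumptions made for the example.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """N-way set associative cache with LRU replacement (illustrative)."""

    def __init__(self, num_sets, ways, memory):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set: tag -> value, least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.memory = memory
        self.hits = 0
        self.misses = 0

    def read(self, block):
        index = block % self.num_sets  # same hash as direct mapping
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # mark as most recently used
            self.hits += 1
            return lines[tag]
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # set is full: evict the LRU line
        lines[tag] = self.memory[block]
        return lines[tag]


cache = SetAssociativeCache(num_sets=8, ways=2, memory=list(range(64)))

# Blocks 0 and 8 map to the same set but can now live side by side.
for _ in range(4):
    cache.read(0)
    cache.read(8)
print(cache.hits, cache.misses)
```

With two ways, only the first access to each block misses; a third block mapping to the same set would evict whichever of the two was used least recently.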
Writing back to the memory
• only write to memory if cache data modified.
• write back (write-deferred):
– (i) use a modified bit. When swapping a cache slot or ending
job, write slot if its modified bit is set
• write through:
– (ii) whenever modifying data, always write it back to main
memory
– have to do this if memory being shared in a DMA or
multiprocessing system
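The difference between the two policies shows up in how often main memory is written. The sketch below models a cache with a single line, which is enough to show the modified bit at work; the class and method names are hypothetical, and loading the old value on a write miss is an assumption made to keep the model simple.

```python
class OneLineCache:
    """Single-line cache using write-back or write-through (illustrative)."""

    def __init__(self, memory, write_through):
        self.memory = memory
        self.write_through = write_through
        self.address = None    # address currently held, None if empty
        self.value = None
        self.modified = False  # the modified bit used by write-back
        self.memory_writes = 0

    def _store(self):
        self.memory[self.address] = self.value
        self.memory_writes += 1
        self.modified = False

    def flush(self):
        """Write the line to memory if its modified bit is set."""
        if self.address is not None and self.modified:
            self._store()

    def write(self, address, value):
        if self.address != address:
            self.flush()                     # swapping the slot out
            self.address = address
            self.value = self.memory[address]
        self.value = value
        if self.write_through:
            self._store()                    # always update main memory
        else:
            self.modified = True             # defer until swap or flush


back = OneLineCache([0] * 4, write_through=False)
through = OneLineCache([0] * 4, write_through=True)
for cache in (back, through):
    for value in (1, 2, 3):
        cache.write(0, value)
print(back.memory_writes, through.memory_writes)
```

Write-back leaves main memory stale until the line is swapped out or flushed, which is why shared-memory and DMA systems need write-through or an equivalent mechanism.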