Computer Organization and Architecture: Memory and I/O Systems (Updated: Mar 2026)

Memory Hierarchy

Comprehensive study notes on Memory Hierarchy for GATE CS preparation. This chapter covers key concepts, formulas, and examples needed for your exam.

Overview

In this chapter, we shall explore the principles and design of the memory hierarchy, a foundational concept in modern computer architecture. The central processing unit (CPU) operates at a speed far exceeding that of main memory. This disparity, often termed the "processor-memory performance gap," presents a significant bottleneck that would otherwise leave the processor perpetually idle, waiting for data. The memory hierarchy is the architectural solution to this critical problem, organizing storage into a pyramid of levels, each varying in speed, capacity, and cost. Our primary goal is to create the illusion of a single, large, and fast memory that can keep pace with the processor's demands.

To mitigate this performance gap, computer systems employ a multi-level memory structure predicated on the principle of locality of reference. This principle observes that programs tend to reuse data and instructions they have recently accessed (temporal locality) and access data items near those they have recently accessed (spatial locality). By placing smaller, faster, and more expensive memory levels—such as caches—closer to the CPU, we can satisfy the vast majority of memory requests swiftly. The objective is to engineer a system that approaches the speed of the fastest memory level and the capacity of the largest, all at a reasonable cost.

For the GATE examination, a thorough understanding of the memory hierarchy is paramount. A significant portion of questions in Computer Organization and Architecture will test one's ability to analyze system performance based on memory parameters. We will delve into quantitative metrics such as hit rate, miss rate, miss penalty, and the calculation of Average Memory Access Time (AMAT), expressed as T_{avg} = \text{Hit Time} + (\text{Miss Rate} \times \text{Miss Penalty}). Mastery of these calculations is essential for solving numerical problems related to system performance, which are a recurring feature of the examination.

---

Chapter Contents

| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | Cache Memory | Principles, organization, mapping, and performance analysis. |

---

Learning Objectives

By the End of This Chapter

After completing this chapter, you will be able to:

  • Explain the principle of locality of reference and its role in justifying the memory hierarchy.

  • Calculate the Average Memory Access Time (AMAT) for a multi-level memory system.

  • Analyze and compare different cache mapping techniques: direct, fully associative, and set-associative.

  • Evaluate the impact of cache parameters, such as block size and replacement policies, on system performance.

---

We now turn our attention to Cache Memory...

Part 1: Cache Memory

Introduction

In modern computer systems, the speed of the processor has increased at a far greater rate than the speed of main memory. This disparity, often termed the processor-memory performance gap, creates a significant bottleneck. A processor capable of executing billions of instructions per second would be rendered ineffective if it had to wait for slow main memory on every data access. Cache memory is the principal architectural solution devised to bridge this gap.

The effectiveness of cache memory is predicated on a fundamental property of program behavior known as the Principle of Locality. This principle has two components: temporal locality, which is the tendency for a program to access the same memory location again in the near future, and spatial locality, which is the tendency to access memory locations that are physically close to recently accessed locations. By storing recently used data and instructions in a small, fast memory (the cache) situated between the processor and main memory, the average memory access time can be dramatically reduced. This chapter provides a comprehensive examination of cache organization, performance metrics, and operational policies, which are essential topics for the GATE examination.

📖 Cache Memory

A cache is a small, fast, and expensive memory that is used to store copies of data from the most frequently used main memory locations. When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor reads from or writes to the cache, which is much faster than accessing main memory.

---

Key Concepts

1. Cache Organization and Address Mapping

The central challenge in cache design is determining how blocks of main memory are mapped to blocks (or lines) in the cache. The physical address generated by the processor is interpreted by the cache controller to locate data. This address is partitioned into three fields: the Tag, the Index (or Set), and the Block Offset.




| Address Field | What It Does |
| :--- | :--- |
| Tag | Identifies which main memory block is in the cache line. |
| Index / Set | Selects the cache line or set to check. |
| Block Offset | Selects the specific byte within the cache block. |

  • Block Offset: Determines the location of the desired byte within a cache block. If the block size is 2^b bytes, then b bits are required for the block offset.
  • Index: Selects which cache line (or set of lines) the main memory block is placed into. If there are 2^i lines in the cache, i bits are used for the index.
  • Tag: The remaining bits of the address are stored along with the data in the cache line. The tag is used to verify that the block stored in the cache line is the one requested by the processor.
The mapping function determines how these fields are defined. There are three primary techniques.

a) Direct Mapping

In a direct-mapped cache, each block from main memory can map to only one specific cache line. The mapping is determined by the index field of the address.

📐 Direct Mapping Address Division
\text{Cache Line Number} = (\text{Main Memory Block Address}) \pmod{\text{Number of Cache Lines}}

Address Fields:

    • Block Offset = \log_2(\text{Block Size in bytes}) bits

    • Index = \log_2(\text{Number of Cache Lines}) bits

    • Tag = (Total Address Bits) - (Index Bits) - (Block Offset Bits)


Application: Used in problems where cache size, block size, and memory size are given, and you need to find the size of the tag, index, or offset fields. This is a very frequent question pattern.

Worked Example:

Problem: Consider a system with a 16 KB direct-mapped cache, a block size of 32 bytes, and a 32-bit physical address. Determine the number of bits for the Tag, Index, and Block Offset fields. Assume the memory is byte-addressable.

Solution:

Step 1: Calculate the Block Offset bits.
The block size is 32 bytes.

\text{Block Offset bits} = \log_2(32) = 5 \text{ bits}

Step 2: Calculate the number of cache lines.

\text{Cache Size} = 16 \text{ KB} = 16 \times 2^{10} \text{ bytes}

\text{Block Size} = 32 \text{ bytes}

\text{Number of Cache Lines} = \frac{\text{Cache Size}}{\text{Block Size}} = \frac{2^4 \times 2^{10}}{2^5} = 2^9 = 512 \text{ lines}

Step 3: Calculate the Index bits.
The number of index bits is the logarithm of the number of cache lines.

\text{Index bits} = \log_2(512) = 9 \text{ bits}

Step 4: Calculate the Tag bits.
The total address is 32 bits.

\text{Tag bits} = \text{Total bits} - \text{Index bits} - \text{Offset bits} = 32 - 9 - 5 = 18 \text{ bits}

Answer: The address is divided into 18 Tag bits, 9 Index bits, and 5 Block Offset bits.
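The arithmetic above can be double-checked with a few lines of Python (a quick sketch; the variable names are my own):

```python
import math

address_bits = 32        # 32-bit physical address
cache_size = 16 * 1024   # 16 KB cache, in bytes
block_size = 32          # 32-byte blocks

offset_bits = int(math.log2(block_size))            # log2(32) = 5
num_lines = cache_size // block_size                # 16 KB / 32 B = 512 lines
index_bits = int(math.log2(num_lines))              # log2(512) = 9
tag_bits = address_bits - index_bits - offset_bits  # 32 - 9 - 5 = 18

print(tag_bits, index_bits, offset_bits)  # 18 9 5
```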

b) K-Way Set-Associative Mapping

This is a compromise between direct mapping and fully associative mapping. The cache is divided into sets, each containing K cache lines. A main memory block can map to any of the K lines within a specific set, but only to that set.

📐 Set-Associative Mapping Address Division
\text{Cache Set Number} = (\text{Main Memory Block Address}) \pmod{\text{Number of Sets}}

Address Fields:

    • Number of Sets = \frac{\text{Number of Cache Lines}}{K \text{ (Associativity)}}

    • Block Offset = \log_2(\text{Block Size in bytes}) bits

    • Set (Index) = \log_2(\text{Number of Sets}) bits

    • Tag = (Total Address Bits) - (Set Bits) - (Block Offset Bits)


Application: Most practical caches use this scheme. GATE problems often ask to find the associativity K or the size of one of the address fields. Note that direct mapping is simply 1-way set-associative mapping.

Worked Example:

Problem: A 32-bit address is used for a 128 KB, 4-way set-associative cache with a block size of 64 bytes. Find the number of bits in the Tag, Set, and Block Offset fields.

Solution:

Step 1: Calculate the Block Offset bits.
The block size is 64 bytes.

\text{Block Offset bits} = \log_2(64) = 6 \text{ bits}

Step 2: Calculate the number of cache lines.

\text{Cache Size} = 128 \text{ KB} = 2^7 \times 2^{10} = 2^{17} \text{ bytes}

\text{Block Size} = 64 \text{ bytes} = 2^6 \text{ bytes}

\text{Number of Cache Lines} = \frac{2^{17}}{2^6} = 2^{11} \text{ lines}

Step 3: Calculate the number of sets.
The cache is 4-way set-associative (K = 4).

\text{Number of Sets} = \frac{\text{Number of Cache Lines}}{\text{Associativity}} = \frac{2^{11}}{2^2} = 2^9 = 512 \text{ sets}

Step 4: Calculate the Set (Index) bits.

\text{Set bits} = \log_2(\text{Number of Sets}) = \log_2(512) = 9 \text{ bits}

Step 5: Calculate the Tag bits.
The total address is 32 bits.

\text{Tag bits} = 32 - \text{Set bits} - \text{Offset bits} = 32 - 9 - 6 = 17 \text{ bits}

Answer: The address is divided into 17 Tag bits, 9 Set bits, and 6 Block Offset bits.
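Since direct mapping is just the 1-way case, both worked examples can be checked with a single helper (an illustrative sketch; `cache_fields` is not a standard API):

```python
import math

def cache_fields(addr_bits, cache_bytes, block_bytes, k):
    """Return (tag_bits, set_bits, offset_bits) for a k-way set-associative cache."""
    offset_bits = int(math.log2(block_bytes))
    num_sets = (cache_bytes // block_bytes) // k   # total lines divided by associativity
    set_bits = int(math.log2(num_sets))
    return addr_bits - set_bits - offset_bits, set_bits, offset_bits

print(cache_fields(32, 128 * 1024, 64, 4))  # 4-way example: (17, 9, 6)
print(cache_fields(32, 16 * 1024, 32, 1))   # direct-mapped example: (18, 9, 5)
```

Passing k = 1 reproduces the direct-mapped split, which is exactly the "direct mapping is 1-way set-associative" observation above.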

c) Fully Associative Mapping

In this scheme, a main memory block can be placed in any available cache line. There is no index field; the entire address (except for the block offset) is used as the tag. This offers the most flexibility and avoids conflict misses, but the hardware required to simultaneously compare the tag with all cache lines is prohibitively expensive for large caches.

---

2. Cache Performance Evaluation

The performance of a cache is primarily measured by its ability to satisfy memory requests quickly. This is quantified using the Average Memory Access Time (AMAT).

📖 Hit and Miss
    • Cache Hit: The requested data is found in the cache. The time taken is the Hit Time (T_{hit} or T_C).
    • Cache Miss: The requested data is not in the cache and must be fetched from main memory. The additional time required is the Miss Penalty (M_P).
    • Hit Rate (H): The fraction of memory accesses that are hits.
    • Miss Rate (M): The fraction of memory accesses that are misses. M = 1 - H.

a) Average Memory Access Time (AMAT) for a Single-Level Cache

The AMAT is the weighted average of the time taken for a hit and the time taken for a miss.

📐 Average Memory Access Time (AMAT)
AMAT = (\text{Hit Rate} \times \text{Hit Time}) + (\text{Miss Rate} \times \text{Miss Time})
AMAT = H \times T_{hit} + M \times (T_{hit} + M_P)

Variables:

    • H = Hit rate

    • M = Miss rate (1 - H)

    • T_{hit} = Time to access the cache (hit time)

    • M_P = Miss penalty (time to fetch a block from main memory)


When to use: For calculating the effective memory access time for a system with a single level of cache.
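As a sanity check, the single-level formula is easy to evaluate directly (a minimal sketch; the numbers are made up for illustration):

```python
def amat_single_level(hit_rate, hit_time, miss_penalty):
    """AMAT = H * T_hit + (1 - H) * (T_hit + M_P), equivalently T_hit + (1 - H) * M_P."""
    miss_rate = 1 - hit_rate
    return hit_rate * hit_time + miss_rate * (hit_time + miss_penalty)

# Illustrative values: 95% hit rate, 2 ns hit time, 100 ns miss penalty
print(amat_single_level(0.95, 2, 100))  # ≈ 7.0 ns
```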

b) AMAT for Multi-Level Caches

Modern systems employ a hierarchy of caches (L1, L2, L3). The calculation of AMAT becomes more involved. There are two common ways these problems are presented in GATE.

1. Hierarchical Access (Global Hit/Miss Rates):
The processor first checks L1. If it misses, it checks L2. If L2 misses, it goes to main memory.

📐 AMAT for Multi-Level Cache (Hierarchical)
AMAT = T_{L1} + M_{L1} \times T_{L2} + M_{L2,global} \times M_{P_{L2}}

Variables:

    • T_{L1} = L1 hit time

    • M_{L1} = L1 miss rate

    • T_{L2} = L2 hit time (time to access L2 after an L1 miss)

    • M_{L2,global} = global L2 miss rate, i.e., the fraction of total accesses that miss in both L1 and L2 (equal to M_{L1} \times M_{L2,local})

    • M_{P_{L2}} = Miss penalty for L2 (time to access main memory)


When to use: When global miss rates are given or can be inferred.

2. Using Local Miss Rates:
This is a more common and intuitive formulation.

📐 AMAT for Multi-Level Cache (Local Miss Rates)
AMAT = T_{L1} + M_{L1} \times (\text{L1 Miss Penalty})
where the L1 Miss Penalty is the AMAT for the L2 cache:
\text{L1 Miss Penalty} = T_{L2} + M_{L2} \times (\text{L2 Miss Penalty})
So, for a two-level cache system:
AMAT = T_{L1} + M_{L1} \times (T_{L2} + M_{L2} \times T_{Mem})

Variables:

    • T_{L1} = L1 access time

    • M_{L1} = L1 local miss rate (misses in L1 / accesses to L1)

    • T_{L2} = L2 access time

    • M_{L2} = L2 local miss rate (misses in L2 / accesses to L2)

    • T_{Mem} = Main memory access time


When to use: This is the standard formula when local miss rates for each cache level are provided.
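With local miss rates, the two-level formula can be coded the same way (a sketch; the parameter values below are illustrative, not taken from a specific problem):

```python
def amat_two_level(t_l1, m_l1, t_l2, m_l2, t_mem):
    """AMAT = T_L1 + M_L1 * (T_L2 + M_L2 * T_mem), using local miss rates."""
    return t_l1 + m_l1 * (t_l2 + m_l2 * t_mem)

# 1 ns L1 with 10% local miss rate, 10 ns L2 with 20% local miss rate, 100 ns memory
print(amat_two_level(1, 0.10, 10, 0.20, 100))  # ≈ 4.0 ns
```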

c) Cache Performance and CPU Execution Time

Cache misses introduce stalls into the processor pipeline, increasing the overall execution time. The impact is measured by an increase in the effective Cycles Per Instruction (CPI).

📐 Effective CPI with Memory Stalls
CPI_{effective} = CPI_{base} + \text{Memory Stall Cycles per Instruction}
\text{Stalls} = (\text{Memory Accesses per Instruction}) \times (\text{Miss Rate}) \times (\text{Miss Penalty in cycles})

When to use: When analyzing the impact of cache performance on overall processor throughput. Note that instruction fetches and data loads/stores are both memory accesses.

Worked Example:

Problem: A processor has a base CPI of 1.5. It features a data cache with a 5% miss rate and an instruction cache with a 2% miss rate. 30% of instructions are loads or stores. The miss penalty is 80 cycles for both caches. Calculate the effective CPI.

Solution:

Step 1: Calculate stall cycles from the instruction cache.
Every instruction requires one memory access for the fetch.

\text{Instruction Stalls} = (1 \text{ access/instruction}) \times (0.02 \text{ miss rate}) \times (80 \text{ cycles penalty}) = 1.6 \text{ cycles/instruction}

Step 2: Calculate stall cycles from the data cache.
30% of instructions are loads/stores, so they access the data cache.

\text{Data Stalls} = (0.3 \text{ accesses/instruction}) \times (0.05 \text{ miss rate}) \times (80 \text{ cycles penalty}) = 0.3 \times 4 = 1.2 \text{ cycles/instruction}

Step 3: Calculate the total effective CPI.

CPI_{effective} = CPI_{base} + \text{Instruction Stalls} + \text{Data Stalls} = 1.5 + 1.6 + 1.2 = 4.3

Answer: \boxed{4.3}
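The worked example reduces to three multiplications and a sum, easy to confirm in Python (a quick sketch of the same arithmetic):

```python
# Reproducing the effective CPI worked example
base_cpi = 1.5
instr_stalls = 1.0 * 0.02 * 80   # one instruction fetch per instruction
data_stalls = 0.3 * 0.05 * 80    # 30% of instructions access the data cache
effective_cpi = base_cpi + instr_stalls + data_stalls
print(round(effective_cpi, 2))   # 4.3
```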

---

3. Write Policies

When a processor writes to memory, the cache must be updated. Write policies govern how and when the main memory is updated to maintain consistency.

a) Write-Through vs. Write-Back

| Feature | Write-Through | Write-Back |
| :--- | :--- | :--- |
| On a Write Hit | Data is written to both the cache and main memory simultaneously. | Data is written only to the cache block. |
| Dirty Bit | Not required. The cache and main memory are always consistent. | Required. A dirty bit for each cache line is set to 1 when the line is modified. |
| On Eviction | No action is needed for main memory, as it is already up-to-date. | If the dirty bit is 1, the modified cache block must be written back to main memory before eviction. |
| Memory Traffic | High. Every write operation generates memory traffic. | Low. Memory traffic is generated only on misses and evictions of dirty blocks. |
| Performance | Can be slower due to waiting for main memory writes (often mitigated by write buffers). | Faster writes, as they occur at cache speed. |

Must Remember

In a write-through cache, eviction of a block never triggers a write to main memory. In a write-back cache, eviction of a dirty block (dirty bit = 1) always triggers a write to main memory. A read miss can cause the eviction of a dirty block, which in turn leads to a write-back operation.

b) Write Allocation Policies

This policy determines what happens on a write miss.

  • Write-Allocate (Fetch on Write): On a write miss, the block is first fetched from main memory and loaded into the cache. The write operation then proceeds as a write hit. This policy is almost always used with write-back caches, as subsequent writes to the same block will then be fast cache hits.
  • No-Write-Allocate (Write Around): On a write miss, the data is written directly to main memory, bypassing the cache. The cache is not modified. This policy is often paired with write-through caches, as it avoids polluting the cache with data that might not be read soon.
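The interaction between write-back, write-allocate, and the dirty bit can be made concrete with a toy cache model (a deliberately simplified sketch: one fully associative set of two blocks with FIFO eviction; real caches are larger, track tags per line, and usually use LRU):

```python
from collections import OrderedDict

class ToyWriteBackCache:
    """Toy write-back, write-allocate cache; reports dirty blocks written back on eviction."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.lines = OrderedDict()   # block address -> dirty bit

    def _allocate(self, block):
        writebacks = []
        if block not in self.lines and len(self.lines) >= self.capacity:
            victim, dirty = self.lines.popitem(last=False)  # evict the oldest block
            if dirty:
                writebacks.append(victim)  # dirty victim must go back to main memory
        self.lines.setdefault(block, False)  # load the block (clean) if absent
        return writebacks

    def read(self, block):
        return self._allocate(block)   # returns the list of blocks written back

    def write(self, block):
        wb = self._allocate(block)     # write-allocate: fetch the block on a write miss
        self.lines[block] = True       # writing sets the dirty bit
        return wb

cache = ToyWriteBackCache()
cache.write(1)            # write miss: allocate block 1, mark it dirty
cache.read(2)             # read miss: allocate block 2 (clean)
print(cache.read(3))      # read miss evicts dirty block 1 -> [1] written back
```

Note the last line: a plain read miss forces a write to main memory, exactly the "Must Remember" point above.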
---

Problem-Solving Strategies

💡 GATE Strategy: The Address Dissection Method

For any cache mapping problem, immediately determine the three key parameters: Cache Size (C), Block Size (B), and Associativity (K). Then, follow this sequence:

  • Block Offset Bits: This is always the first and easiest calculation. Bits = \log_2(B).

  • Number of Lines: Total Lines = C/B.

  • Number of Sets: Sets = (Total Lines) / K. (For direct mapped, K = 1, so Sets = Lines.)

  • Set/Index Bits: Bits = \log_2(\text{Number of Sets}).

  • Tag Bits: Bits = (Total Address Bits) - (Set/Index Bits) - (Block Offset Bits).

By following this rigid sequence, you can solve almost any address mapping question without confusion.

---

Common Mistakes

⚠️ Avoid These Errors
    • Confusing Bytes and Words: Always check if the system is byte-addressable or word-addressable. Most GATE problems assume byte-addressable. Ensure all units (cache size, block size) are consistently in bytes for calculations.
    • Incorrect AMAT Formula for Multi-Level Caches: Confusing local and global miss rates. If the problem states "L2 hit rate is 80%", this is a local rate. If it says "80% of accesses that miss L1 hit in L2", this is also local. Read the phrasing carefully. The standard formula AMAT = T_{L1} + M_{L1} \times (T_{L2} + M_{L2} \times T_{Mem}) is the safest to use.
    • Forgetting the Dirty Bit in Write-Back: Misunderstanding that a read miss can cause a write to main memory. If a read miss forces the eviction of a dirty block from a write-back cache, a write-back operation must occur before the new block is loaded.
    • Calculating Total Tag Memory: The total size of the tag directory is (Number of Cache Lines) \times (Tag Bits per Line). Students often forget to multiply by the number of lines.

---

Practice Questions

:::question type="NAT" question="A 16-way set associative cache has a capacity of 256 KB. The block size is 128 bytes. The system uses a 37-bit physical address. The number of bits required for the tag field is ______." answer="23" hint="Follow the address dissection method: first find offset bits, then set bits, and finally the tag bits." solution="
Step 1: Calculate the Block Offset bits.
Block Size = 128 bytes = 2^7 bytes.

\text{Block Offset bits} = \log_2(128) = 7 \text{ bits}

Step 2: Calculate the number of cache lines.
Cache Size = 256 KB = 2^8 \times 2^{10} = 2^{18} bytes.

\text{Number of Lines} = \frac{\text{Cache Size}}{\text{Block Size}} = \frac{2^{18}}{2^7} = 2^{11} \text{ lines}

Step 3: Calculate the number of sets.
Associativity (K) = 16 = 2^4.

\text{Number of Sets} = \frac{\text{Number of Lines}}{K} = \frac{2^{11}}{2^4} = 2^7 = 128 \text{ sets}

Step 4: Calculate the Set bits.

\text{Set bits} = \log_2(\text{Number of Sets}) = \log_2(128) = 7 \text{ bits}

Step 5: Calculate the Tag bits.
Total Address Bits = 37.

\text{Tag bits} = 37 - (\text{Set bits}) - (\text{Offset bits}) = 37 - 7 - 7 = 23 \text{ bits}
Answer: \boxed{23} " :::

:::question type="MSQ" question="A system uses a write-back cache with a write-allocate policy. Which of the following statements is/are TRUE?" options=["A write hit will always set the dirty bit to 1.","A read miss might cause data to be written to the main memory.","On a write miss, the relevant block is first read from main memory into the cache.","The cache does not need to store a dirty bit for each cache line."] answer="A,B,C" hint="Consider the sequence of operations for each scenario: write hit, read miss (causing eviction), and write miss." solution="

  • A: A write hit will always set the dirty bit to 1. This is true. In a write-back cache, any modification to a cache line must be marked by setting the dirty bit, so the system knows to write it back to memory upon eviction.

  • B: A read miss might cause data to be written to the main memory. This is true. If a read miss occurs and the cache set where the new block must be placed is full, a block must be evicted. If the evicted block is 'dirty' (its dirty bit is 1), it must be written back to main memory before the new block is loaded.

  • C: On a write miss, the relevant block is first read from main memory into the cache. This is true. This is the definition of the write-allocate policy. The block is brought into the cache so that the write can proceed as a cache hit.

  • D: The cache does not need to store a dirty bit for each cache line. This is false. The dirty bit is essential for the operation of a write-back cache to track which lines have been modified and need to be written back to memory.

"
:::

:::question type="NAT" question="A processor has a two-level cache system. L1 cache access time is 2 ns, and its local miss rate is 10%. L2 cache access time is 20 ns, and its local miss rate is 40%. The main memory access time is 95 ns. The average memory access time (in ns, rounded to one decimal place) is ______." answer="7.8" hint="Use the hierarchical AMAT formula: AMAT = T_{L1} + M_{L1} \times (T_{L2} + M_{L2} \times T_{Mem})." solution="
Step 1: State the formula for AMAT in a two-level cache system.

AMAT = T_{L1} + M_{L1} \times (T_{L2} + M_{L2} \times T_{Mem})

Step 2: Substitute the given values into the formula.

  • T_{L1} = 2 ns

  • M_{L1} = 0.10

  • T_{L2} = 20 ns

  • M_{L2} = 0.40

  • T_{Mem} = 95 ns


AMAT = 2 + 0.10 \times (20 + 0.40 \times 95)

Step 3: Calculate the value inside the parenthesis (L1 miss penalty).

\text{L1 Miss Penalty} = 20 + 0.40 \times 95 = 20 + 38 = 58 \text{ ns}

Step 4: Complete the AMAT calculation.

AMAT = 2 + 0.10 \times 58 = 2 + 5.8 = 7.8 \text{ ns}
Answer: \boxed{7.8} " :::

:::question type="MCQ" question="A direct-mapped cache has 16 lines. A program accesses the following sequence of memory block addresses: 3, 4, 3, 19, 4, 20. How many hits does this sequence result in, assuming the cache is initially empty?" options=["0","1","2","3"] answer="2" hint="A block with address `B` maps to cache line `B mod 16`. Trace the state of the cache for each access." solution="
The mapping function is `Cache Line = Block Address mod 16`.

  • Access 3: `3 mod 16 = 3`. Miss. Cache is empty. Line 3 is loaded with Block 3.
  • Access 4: `4 mod 16 = 4`. Miss. Cache is empty at this line. Line 4 is loaded with Block 4.
  • Access 3: `3 mod 16 = 3`. Hit. Line 3 contains Block 3.
  • Access 19: `19 mod 16 = 3`. Miss. Line 3 is occupied by Block 3. Since the tags will not match (Tag for 3 is different from Tag for 19), this is a conflict miss. Line 3 is replaced with Block 19.
  • Access 4: `4 mod 16 = 4`. Hit. Line 4 still contains Block 4.
  • Access 20: `20 mod 16 = 4`. Miss. Line 4 is occupied by Block 4. This is a conflict miss. Line 4 is replaced with Block 20.
The total number of hits is 2. Answer: \boxed{2} " :::
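A trace like the one in the previous question can also be replayed mechanically (a short sketch; `count_hits` is a hypothetical helper, not a library function):

```python
def count_hits(blocks, num_lines=16):
    """Count hits for a direct-mapped cache given a sequence of block addresses."""
    lines = [None] * num_lines   # each line remembers which block it currently holds
    hits = 0
    for b in blocks:
        line = b % num_lines     # direct mapping: line = block address mod num_lines
        if lines[line] == b:
            hits += 1
        else:
            lines[line] = b      # miss: load the block, replacing any previous one
    return hits

print(count_hits([3, 4, 3, 19, 4, 20]))  # 2
```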

---

Summary

Key Takeaways for GATE

  • Address Mapping is Fundamental: You must be able to rapidly calculate the sizes of Tag, Index/Set, and Block Offset fields for Direct, Set-Associative, and Fully Associative caches. This is the most frequently tested calculation.

  • AMAT is Key to Performance: Master the calculation of Average Memory Access Time for both single-level and multi-level caches. Pay close attention to the problem wording to distinguish between local and global miss rates.

  • Write Policies Dictate Behavior: Understand the precise operational differences between Write-Through and Write-Back, especially concerning the dirty bit, memory traffic, and actions on eviction. The interaction with Write-Allocate vs. No-Write-Allocate is a common source of conceptual questions.

  • Connect Cache to CPU Performance: Be prepared to calculate the effective CPI of a processor by quantifying the stall cycles introduced by instruction and data cache misses.

---

What's Next?

💡 Continue Learning

This topic connects to:

    • Virtual Memory: The concepts of address translation in caches are analogous to page mapping in virtual memory. The Translation Lookaside Buffer (TLB) is itself a specialized cache for page table entries. Understanding one strengthens your grasp of the other.

    • Pipelining: Cache misses are a primary cause of pipeline stalls (structural hazards if memory unit is busy, or data hazards leading to stalls). Understanding cache behavior is crucial for analyzing the performance of a pipelined processor.


Master these connections for a more holistic understanding of computer architecture and to solve integrated problems in the GATE exam.

---

Chapter Summary

📖 Memory Hierarchy - Key Takeaways

From our detailed examination of the memory hierarchy and cache systems, we can distill several essential principles that are paramount for both theoretical understanding and practical problem-solving.

  • The Principle of Locality: The design and effectiveness of the entire memory hierarchy are predicated on the principle of locality. Temporal locality suggests that recently accessed data is likely to be accessed again, while spatial locality suggests that data with addresses near a recently accessed item are also likely to be accessed. High-performance memory systems are engineered to exploit this predictable program behavior.

  • Hierarchy and Trade-offs: The memory hierarchy is a layered structure (registers, cache, main memory, secondary storage) built upon a fundamental trade-off between speed, cost, and capacity. As we move further from the CPU, access time and cost per bit decrease, while capacity increases. The goal is to create the illusion of a large, fast, and inexpensive memory.

  • Cache Organization Schemes: We have analyzed three primary methods of cache organization. Direct-mapped caches are simple but prone to conflict misses. Fully-associative caches eliminate conflict misses but are complex and costly to implement. Set-associative caches offer a pragmatic balance, providing a significant reduction in conflict misses over direct-mapped caches with manageable hardware complexity.

  • Address Translation: The process of mapping a main memory address to a cache location is a critical skill. An address is partitioned into a Tag, an Index (or Set), and a Block Offset. The size of these fields is determined by the main memory size, cache size, block size, and associativity, and their correct calculation is fundamental to solving cache-related problems.

  • Write Policies: The strategy for handling write operations, primarily Write-Through and Write-Back, involves a trade-off between memory bandwidth and write latency. A write-through policy ensures main memory is always up-to-date but can create a performance bottleneck, often mitigated by a write buffer. A write-back policy is more efficient but requires additional complexity (e.g., a dirty bit) to manage data consistency.

  • Performance Metrics (AMAT): The definitive measure of memory system performance is the Average Memory Access Time (AMAT). It is calculated using the formula:

AMAT = T_{hit} + (p_{miss} \times T_{miss\_penalty})

where T_{hit} is the cache hit time, p_{miss} is the miss rate, and T_{miss\_penalty} is the time required to service a miss. Minimizing each of these components is the central objective of cache design.

  • Cache Misses and Replacement Policies: Cache misses can be categorized into three types: Compulsory, Capacity, and Conflict (the "3Cs"). Understanding this classification is key to diagnosing performance issues. When a miss occurs in a set-associative or fully-associative cache and the corresponding set is full, a replacement policy such as Least Recently Used (LRU) or First-In, First-Out (FIFO) is used to select a block for eviction.

---

Chapter Review Questions

:::question type="MCQ" question="A processor uses a 2-way set-associative cache with a total of 4 sets and an LRU replacement policy. The main memory is byte-addressable and the block size is 16 bytes. The system is subjected to the following sequence of block address accesses: 1, 5, 1, 9, 5. What are the final tag values present in the cache set accessed by this sequence?" options=["{1, 2}","{0, 1}","{0, 2}","{1, 9}"] answer="A" hint="All given block addresses map to the same set. Calculate the tag for each block address and trace the cache state using LRU." solution="
1. Determine Set Index and Tag:

  • Number of Sets = 4.

  • Set Index = Block Address mod 4.

  • Tag = \lfloor Block Address / 4 \rfloor.

  • For all block addresses in the sequence {1, 5, 9}, the Set Index is 1 \pmod 4 = 1, 5 \pmod 4 = 1, and 9 \pmod 4 = 1. All accesses map to Set 1.


2. Calculate Tags:
  • Tag for Block 1 = \lfloor 1 / 4 \rfloor = 0.

  • Tag for Block 5 = \lfloor 5 / 4 \rfloor = 1.

  • Tag for Block 9 = \lfloor 9 / 4 \rfloor = 2.


3. Trace the State of Set 1 (Capacity = 2 blocks, LRU policy):
  • Access 1 (Block 1, Tag 0): Miss. Set 1 is empty. Load block. State: `{T=0}`.

  • Access 2 (Block 5, Tag 1): Miss. Set 1 has space. Load block. State: `{T=0, T=1}` (T=0 is LRU).

  • Access 3 (Block 1, Tag 0): Hit. Tag 0 is present. Update recency order. State: `{T=1, T=0}` (T=1 is now LRU).

  • Access 4 (Block 9, Tag 2): Miss. Set 1 is full. Evict LRU (Tag 1). Load block. State: `{T=0, T=2}` (T=0 is now LRU).

  • Access 5 (Block 5, Tag 1): Miss. Set 1 is full. Evict LRU (Tag 0). Load block. State: `{T=2, T=1}` (T=2 is now LRU).

The final tags present in Set 1 are 1 and 2.
Answer: \boxed{A}
"
:::
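The LRU trace in the solution above can be replayed programmatically. This short Python simulation hard-codes the question's parameters (4 sets, 2 ways, LRU) and reports the final contents of Set 1:

```python
NUM_SETS, WAYS = 4, 2
sets = {i: [] for i in range(NUM_SETS)}   # each list is ordered LRU -> MRU

for block in [1, 5, 1, 9, 5]:
    idx, tag = block % NUM_SETS, block // NUM_SETS
    ways = sets[idx]
    if tag in ways:                  # hit: move tag to the MRU position
        ways.remove(tag)
    elif len(ways) == WAYS:          # miss on a full set: evict the LRU tag
        ways.pop(0)
    ways.append(tag)

print(sorted(sets[1]))  # → [1, 2]
```

The output matches option A: tags 1 and 2 remain in Set 1.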

:::question type="NAT" question="A processor has a two-level cache system. The L1 cache has a hit time of 1 ns and a miss rate of 10%. The L2 cache has a hit time of 10 ns and a local miss rate of 20%. The main memory access time is 120 ns. What is the Average Memory Access Time (AMAT) for this system, in nanoseconds?" answer="4.4" hint="The AMAT for a multi-level cache is calculated hierarchically. The miss penalty for L1 is the time it takes to access L2. The miss penalty for L2 is the time it takes to access main memory." solution="
The Average Memory Access Time (AMAT) for a two-level cache system is given by the formula:

AMAT = T_{L1\_hit} + p_{L1\_miss} \times (\text{L1 Miss Penalty})

The L1 Miss Penalty is the time to access the next level of the hierarchy, which is L2. This time itself is the AMAT of the L2 cache.
\text{L1 Miss Penalty} = T_{L2\_hit} + p_{L2\_miss\_local} \times (\text{L2 Miss Penalty})

The L2 Miss Penalty is the time to access main memory, T_{MM}.

Combining these, we get the full formula:

AMAT = T_{L1\_hit} + p_{L1\_miss} \times (T_{L2\_hit} + p_{L2\_miss\_local} \times T_{MM})

Given values:

  • T_{L1\_hit} = 1 ns

  • p_{L1\_miss} = 0.10

  • T_{L2\_hit} = 10 ns

  • p_{L2\_miss\_local} = 0.20

  • T_{MM} = 120 ns


Substitute the values into the formula:
AMAT = 1 + 0.10 \times (10 + 0.20 \times 120)

AMAT = 1 + 0.10 \times (10 + 24)

AMAT = 1 + 0.10 \times 34

AMAT = 1 + 3.4

AMAT = 4.4 \text{ ns}

Answer: \boxed{4.4}
"
:::
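The hierarchical AMAT computation above can be checked with a small Python function (the function name is illustrative):

```python
def amat_two_level(t_l1, p_l1_miss, t_l2, p_l2_miss_local, t_mm):
    """Hierarchical AMAT: the L1 miss penalty is itself the AMAT of L2."""
    l1_miss_penalty = t_l2 + p_l2_miss_local * t_mm
    return t_l1 + p_l1_miss * l1_miss_penalty

# Values from the question: 1 ns L1, 10% L1 miss, 10 ns L2,
# 20% local L2 miss, 120 ns main memory.
print(round(amat_two_level(1, 0.10, 10, 0.20, 120), 2))  # → 4.4
```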

:::question type="MCQ" question="For a fixed cache size and block size, increasing the associativity of a cache from 2-way to 4-way will typically lead to which of the following outcomes?" options=["A decrease in conflict misses and an increase in the number of tag bits.","An increase in conflict misses and a decrease in the number of tag bits.","A decrease in compulsory misses and an increase in the number of tag bits.","An increase in capacity misses and a decrease in the number of tag bits."] answer="A" hint="Consider how associativity affects the mapping flexibility and how the main memory address is partitioned into tag, set, and offset fields." solution="
Let us analyze the effects of increasing associativity while keeping cache size and block size constant.

1. Effect on Misses:

  • Conflict Misses: Associativity determines how many blocks can reside in a single set. Increasing associativity from 2-way to 4-way means that four different memory blocks that map to the same set can coexist in the cache, whereas previously only two could. This increased flexibility directly reduces the probability of conflict misses, where a block is evicted because another block maps to the same set, even though other sets in the cache are empty.

  • Compulsory and Capacity Misses: These are generally unaffected by associativity. Compulsory misses are unavoidable first-time accesses. Capacity misses occur when the program's working set is larger than the entire cache, which is a function of cache size, not associativity.


2. Effect on Address Fields:
  • The total cache size is given by C = S \times A \times B, where S is the number of sets, A is the associativity, and B is the block size.

  • If C and B are constant and we increase A, the number of sets S must decrease (S = C / (A \times B)).

  • The number of set index bits is \log_2(S). Since S decreases, the number of set bits also decreases.

  • The number of block offset bits (\log_2(B)) remains constant as B is fixed.

  • A 32-bit or 64-bit memory address is partitioned as: `Tag | Set | Offset`.

  • Since the total number of address bits is fixed, and the number of set bits decreases while offset bits remain constant, the number of tag bits must increase to compensate.


Therefore, increasing associativity leads to a decrease in conflict misses and an increase in the number of tag bits required.
"
:::
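The trade-off between index bits and tag bits can be verified numerically. This sketch (the function name and the 64 KB / 32-byte example are hypothetical, chosen only for illustration) splits a 32-bit address for a fixed cache size and block size at two associativities:

```python
from math import log2

def address_fields(cache_bytes, block_bytes, ways, addr_bits=32):
    """Return the (tag, set index, offset) bit widths for a cache address."""
    num_sets = cache_bytes // (ways * block_bytes)
    offset = int(log2(block_bytes))
    index = int(log2(num_sets))
    tag = addr_bits - index - offset
    return tag, index, offset

# 64 KB cache, 32 B blocks: doubling the associativity halves the sets,
# removing one index bit and adding one tag bit.
print(address_fields(64 * 1024, 32, 2))  # → (17, 10, 5)
print(address_fields(64 * 1024, 32, 4))  # → (18, 9, 5)
```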

:::question type="NAT" question="Consider a 4-way set-associative cache with a capacity of 64 KB and a block size of 32 bytes. The system uses a 32-bit physical address. Calculate the total size of the cache memory in bits, including the data, tag, and valid bits for each block." answer="563200" hint="First, calculate the number of blocks and sets. Then, determine the size of the tag, set, and offset fields from the 32-bit address. The total size is the sum of data bits, tag bits, and status bits (e.g., one valid bit per block) across all blocks in the cache." solution="
1. Calculate Cache Parameters:

  • Cache Size = 64 \text{ KB} = 64 \times 2^{10} bytes

  • Block Size = 32 \text{ bytes} = 2^5 bytes

  • Associativity (A) = 4


2. Calculate Number of Blocks and Sets:
  • Number of Blocks = Cache Size / Block Size = (64 \times 2^{10}) / 32 = (2^6 \times 2^{10}) / 2^5 = 2^{11} = 2048 blocks.

  • Number of Sets (S) = Number of Blocks / Associativity = 2048 / 4 = 512 sets.


3. Deconstruct the 32-bit Address:
  • Offset bits: Required to address a byte within a block. Size = \log_2(\text{Block Size}) = \log_2(32) = 5 bits.

  • Set bits: Required to select a set. Size = \log_2(\text{Number of Sets}) = \log_2(512) = 9 bits.

  • Tag bits: The remaining bits of the address. Size = Total bits - Set bits - Offset bits = 32 - 9 - 5 = 18 bits.


4. Calculate Total Cache Size in Bits:
The total size is the sum of the storage required for all blocks. For each block, we need to store the data, the tag, and at least one valid bit.
  • Data per block = 32 bytes = 32 \times 8 = 256 bits.

  • Tag bits per block = 18 bits.

  • Status bits per block = 1 valid bit. (We assume only a valid bit for simplicity, as is common in GATE problems unless other bits like dirty bits are specified).

  • Total bits per block = Data bits + Tag bits + Valid bit = 256 + 18 + 1 = 275 bits.


Total Cache Size = Number of Blocks \times Total bits per block
\text{Total Size} = 2048 \times 275

\text{Total Size} = 563200 \text{ bits}

Answer: \boxed{563200}
"
:::
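The arithmetic in this solution generalizes to any cache configuration. The Python helper below is a sketch; the function name is an assumption, and it models exactly one status (valid) bit per block by default, matching the simplification used in the solution:

```python
def total_cache_bits(cache_bytes, block_bytes, ways, addr_bits=32,
                     status_bits=1):
    """Total storage (data + tag + status) for the whole cache, in bits."""
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    offset = (block_bytes - 1).bit_length()   # log2(block size)
    index = (num_sets - 1).bit_length()       # log2(number of sets)
    tag = addr_bits - index - offset
    bits_per_block = block_bytes * 8 + tag + status_bits
    return num_blocks * bits_per_block

# Question's parameters: 64 KB, 32 B blocks, 4-way, 32-bit addresses.
print(total_cache_bits(64 * 1024, 32, 4))  # → 563200
```

Adding a dirty bit for a write-back cache would simply mean passing `status_bits=2`.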

---

What's Next?

💡 Continue Your GATE Journey

Having completed Memory Hierarchy, you have established a firm foundation for several advanced chapters in Computer Organization and Architecture. The principles of caching, locality, and performance trade-offs are recurrent themes in system design.

Key connections:

    • Relation to Previous Learning: The concepts explored in this chapter directly address the processor-memory performance gap, a fundamental challenge introduced in our initial discussion of computer performance. The principles of locality and hierarchical memory design are the primary architectural solutions to this bottleneck, enabling the high-speed CPUs we have studied to operate efficiently without being constantly stalled by slow main memory.


    • Chapters That Build on These Concepts:

- Pipelining: Your understanding of cache hits, misses, and the miss penalty is critical for analyzing CPU pipelines. A cache miss results in a pipeline stall, and the Average Memory Access Time (AMAT) you learned to calculate is a key parameter in determining the overall Cycles Per Instruction (CPI) of a pipelined processor.
- Virtual Memory: The next logical step in our study is Virtual Memory. You will find that the mechanisms of virtual memory—such as paging and the Translation Lookaside Buffer (TLB)—are conceptually analogous to cache memory. Pages are like large cache blocks, page tables are like cache directories, and a TLB is, in effect, a specialized cache for page table entries. The principles of mapping, replacement, and locality apply directly.
- I/O Systems: In the study of Input/Output systems, the interaction between devices, memory, and the cache becomes important. Concepts like Direct Memory Access (DMA) must be considered in the context of cache coherency to ensure that the CPU and I/O devices see a consistent view of memory.

🎯 Key Points to Remember

  • Master the core concepts in Memory Hierarchy before moving to advanced topics
  • Practice with previous year questions to understand exam patterns
  • Review short notes regularly for quick revision before exams
