Compiler Design: Back-End Design (Updated: Mar 2026)

Intermediate Code Generation

Comprehensive study notes on Intermediate Code Generation for GATE CS preparation. This chapter covers key concepts, formulas, and examples needed for your exam.


Overview

The compilation process can be logically divided into two primary parts: the analysis phase, or front end, and the synthesis phase, or back end. Between these two lies the crucial stage of intermediate code generation. In this chapter, we shall explore the design and implementation of this intermediate phase, which serves as a conceptual bridge connecting source language constructs to target machine instructions. The primary purpose of generating an intermediate representation is to create a machine-independent abstraction of the source program. This abstraction simplifies the overall compiler design, enabling a clean separation between the concerns of source language parsing and target-specific code generation, thereby enhancing the compiler's modularity and portability.

For the Graduate Aptitude Test in Engineering (GATE), a thorough understanding of this phase is indispensable. Questions frequently probe the translation of high-level programming constructs, such as conditional statements, looping structures, and array addressing, into their intermediate forms. Mastery of these translations is not merely an academic exercise; it is fundamental to solving problems related to subsequent phases, particularly code optimization. The choice of intermediate representation directly influences the scope and effectiveness of optimization techniques, and a firm grasp of these concepts will provide the necessary foundation for tackling complex questions that integrate multiple stages of compilation.

We begin our study by examining the various forms that an intermediate representation can take. These forms range from high-level, graphical representations like Abstract Syntax Trees (ASTs), which remain close to the source language structure, to low-level, linear representations like Three-Address Code, which more closely resemble machine instructions. Each representation possesses distinct advantages and is suited for different types of analysis and transformation. Our objective is to build a systematic framework for understanding how to generate these forms and why one might be preferred over another in a given compiler architecture.

---

Chapter Contents

| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | Intermediate Representations | Machine-independent representations of source programs. |

---

Learning Objectives

❗ By the End of This Chapter

After completing this chapter, you will be able to:

  • Explain the role and significance of intermediate code generation in the compilation process.

  • Differentiate between various forms of intermediate representations, such as Syntax Trees, Postfix Notation, and Three-Address Code.

  • Translate high-level language constructs, including conditional statements, loops, and procedure calls, into Three-Address Code.

  • Analyze the suitability of different intermediate representations for facilitating code optimization and target code generation.

---

We now turn our attention to Intermediate Representations...
## Part 1: Intermediate Representations

Introduction

In the architecture of a modern compiler, the process of translating a high-level source program into low-level target machine code is seldom a monolithic operation. Instead, this complex transformation is judiciously divided into two major phases: the front-end and the back-end. The front-end is concerned with the analysis of the source language, performing lexical analysis, parsing, and semantic analysis to produce a representation that captures the meaning of the program. The back-end then takes this representation and performs optimization and code generation for a specific target architecture. The critical bridge between these two phases is the Intermediate Representation (IR), or intermediate code.

The IR is a machine-independent and language-independent representation of the source program. Its design is a trade-off: it must be high-level enough to be easily generated from the source language constructs, yet low-level enough to be readily translated into the instruction set of the target machine. By employing an IR, we achieve a profound modularity. A compiler for N source languages and M target machines can be constructed with N front-ends and M back-ends, rather than N × M distinct compilers. This abstraction is fundamental to the portability and reusability of compiler technology. Our focus here will be on the structure, types, and generation of these crucial intermediate forms.






Front-End (Source Language Dependent) → Intermediate Representation (IR) → Back-End (Target Machine Dependent)

📖 Intermediate Representation (IR)

An Intermediate Representation is an abstract data structure or code used internally by a compiler to represent the source program. It is generated by the compiler's front-end and consumed by its back-end. The IR is designed to be independent of both the specific source language and the target machine architecture.

---

Key Concepts

## 1. Types of Intermediate Representations

IRs can be broadly classified based on their level of abstraction. High-level IRs retain more source language structure, while low-level IRs are closer to the target machine's instruction set.

* Abstract Syntax Tree (AST): An AST is a tree representation of the abstract syntactic structure of the source code. Each interior node represents an operator, and the children of the node represent the operands. The AST is a high-level IR that is very close to the source language, making it suitable for semantic analysis and source-level transformations. For instance, the expression `a = b + 5` would be represented as a tree with an assignment operator at the root.

* Three-Address Code (TAC): TAC is a low-level, linear IR where each instruction has at most one operator on the right-hand side. The general form is `x = y op z`, where `x`, `y`, and `z` are names, constants, or compiler-generated temporaries. This form is advantageous because it exposes the computation order and is easy to translate into machine code.

* Control Flow Graph (CFG): A CFG represents the flow of control within a program. The nodes of the graph are basic blocks (sequences of straight-line code), and a directed edge from block B1 to B2 indicates that control can pass from the last instruction of B1 to the first instruction of B2.

❗ Must Remember

The Symbol Table is a critical data structure used throughout the compilation process to store information about identifiers (variables, functions, etc.). However, it is not an Intermediate Representation of the program's executable code. It is an auxiliary data structure that works in tandem with the IR.

## 2. Three-Address Code (TAC) and its Implementations

Three-Address Code is arguably the most common form of IR for optimization and code generation. Let us consider the expression `x = (a + b) * (c + d)`. Its TAC representation might be:

```
t1 = a + b
t2 = c + d
t3 = t1 * t2
x = t3
```
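The translation above can be sketched in a few lines. The following is a minimal illustration, not a production compiler; the nested-tuple encoding of expressions and the helper names are assumptions made for this sketch:

```python
# Minimal sketch: translate a nested-tuple expression such as
# ("*", ("+", "a", "b"), ("+", "c", "d")) into three-address code.
# The tuple encoding and helper names are illustrative assumptions.
import itertools

def gen_tac(expr, code, fresh):
    """Emit TAC for expr into `code`; return the name holding its value."""
    if isinstance(expr, str):              # leaf: a variable or constant
        return expr
    op, left, right = expr
    l = gen_tac(left, code, fresh)         # translate the left operand first
    r = gen_tac(right, code, fresh)
    temp = f"t{next(fresh)}"               # fresh compiler temporary
    code.append(f"{temp} = {l} {op} {r}")  # at most one operator per line
    return temp

code = []
result = gen_tac(("*", ("+", "a", "b"), ("+", "c", "d")), code, itertools.count(1))
code.append(f"x = {result}")
# code is now exactly the four-instruction sequence shown above
```

Note how each recursive call returns a name (a variable or a temporary), so every emitted instruction has at most one operator on its right-hand side.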

TAC can be implemented in several ways, each with its own characteristics.

### a) Quadruples

A quadruple, or "quad," is a record structure with four fields: `op`, `arg1`, `arg2`, and `result`.

πŸ“ Quadruple Representation
(op,arg1,arg2,result)(op, arg1, arg2, result)

Variables:

    • opop: The operator.

    • arg1arg1: The first operand.

    • arg2arg2: The second operand.

    • resultresult: The destination for the result of the operation.


When to use: This is the most common and straightforward implementation of TAC. The explicit `result` field makes it easy to reorder instructions during optimization.

### b) Triples

A triple is a record structure with three fields: `op`, `arg1`, and `arg2`. The result of an operation is not stored explicitly but is referred to by its position (or index) in the sequence of triples.

πŸ“ Triple Representation
(op,arg1,arg2)(op, arg1, arg2)

Variables:

    • opop: The operator.

    • arg1arg1: The first operand (can be a pointer to another triple).

    • arg2arg2: The second operand (can be a pointer to another triple).


When to use: Triples are more space-efficient than quadruples. However, moving code during optimization is difficult because it requires updating all references to the moved triple.

### c) Indirect Triples

Indirect triples offer a solution to the code-movement problem of triples. This implementation uses two data structures: an array of triples and a separate list of pointers to these triples. Optimization involves reordering the pointers in the list, not the triples themselves.

Worked Example:

Problem: Generate the Quadruple, Triple, and Indirect Triple representations for the statement `w = (a + b) * -(a + b)`.

Solution:

First, we write the Three-Address Code for the statement. Notice the common subexpression `a + b`.

```
t1 = a + b
t2 = uminus t1
t3 = t1 * t2
w = t3
```

Quadruples:

| # | op | arg1 | arg2 | result |
|---|---|---|---|---|
| 0 | `+` | `a` | `b` | `t1` |
| 1 | `uminus` | `t1` | | `t2` |
| 2 | `*` | `t1` | `t2` | `t3` |
| 3 | `=` | `t3` | | `w` |

Triples:

Here, references to results are made by index. For instance, in triple (1), `(0)` refers to the result of the operation at index 0.

| # | op | arg1 | arg2 |
|---|---|---|---|
| 0 | `+` | `a` | `b` |
| 1 | `uminus` | `(0)` | |
| 2 | `*` | `(0)` | `(1)` |
| 3 | `=` | `w` | `(2)` |

Indirect Triples:

The triples themselves are the same as above. The execution order is given by a separate instruction list.

Instruction List:

| Slot | Triple |
|---|---|
| 35 | `(0)` |
| 36 | `(1)` |
| 37 | `(2)` |
| 38 | `(3)` |

Triple Array:

| # | op | arg1 | arg2 |
|---|---|---|---|
| 0 | `+` | `a` | `b` |
| 1 | `uminus` | `(0)` | |
| 2 | `*` | `(0)` | `(1)` |
| 3 | `=` | `w` | `(2)` |

If an optimization decided to swap instructions 36 and 37 (if it were legal), we would only need to swap the pointers in the instruction list, leaving the triple array untouched.
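The pointer-swap idea can be sketched as follows. The Python data layout is an illustrative assumption, not a compiler's actual structures; the point is that reordering touches only the instruction list:

```python
# Hypothetical sketch of indirect triples: the triple array is fixed,
# and execution order lives in a separate list of indices.

triples = [
    ("+", "a", "b"),          # (0)
    ("uminus", "(0)", None),  # (1)
    ("*", "(0)", "(1)"),      # (2)
    ("=", "w", "(2)"),        # (3)
]
instruction_list = [0, 1, 2, 3]   # execution order (e.g. slots 35..38)

# "Moving code" = permuting the instruction list only; the triple
# array, and hence every (i)-style reference within it, is untouched.
instruction_list[1], instruction_list[2] = instruction_list[2], instruction_list[1]
```

With plain triples, the same swap would force us to renumber every `(i)` reference in the array; here the references stay valid because the triples themselves never move.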

---

## 3. Directed Acyclic Graphs (DAGs)

A Directed Acyclic Graph (DAG) is a more compact representation of an expression that is particularly useful for identifying common subexpressions. In a DAG:

  • Interior nodes represent operators.

  • Leaf nodes represent variables or constants.

  • A node can have multiple parents if its computed value is used in several places.


This structure naturally eliminates redundancy.

Worked Example:

Problem: Construct the DAG for the following code segment and determine the number of nodes.

```c
x = a + b;
y = x - c;
z = a + b;
w = z - c;
```

Solution:

Step 1: Process `x = a + b`.
Create nodes for `a` and `b`. Create a `+` node with `a` and `b` as children. Label this `+` node as `x`.

Step 2: Process `y = x - c`.
The node for `x` already exists. Create a node for `c`. Create a `-` node with `x` and `c` as children. Label this `-` node as `y`.

Step 3: Process `z = a + b`.
The compiler observes that the expression `a + b` has already been computed. Instead of creating a new `+` node, it simply adds the label `z` to the existing `+` node that is already labeled `x`.

Step 4: Process `w = z - c`.
The compiler observes that `z` refers to the same node as `x`. Therefore, `z - c` is identical to `x - c`, which has also been computed. It adds the label `w` to the existing `-` node that is already labeled `y`.

Step 5: Count the nodes in the final DAG.
The distinct nodes are: `a`, `b`, `c`, the `+` node, and the `-` node.

Result:
The total number of nodes in the DAG is 5.
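This relabeling process is essentially value numbering: hash each `(op, left, right)` triple and create a node only when it has not been seen before. A minimal sketch, in which the `(target, op, left, right)` tuple encoding is an illustrative assumption:

```python
# Sketch of DAG construction by hashing ("value numbering"): a node for
# (op, left, right) is created at most once, and each assignment simply
# relabels an existing node. The tuple encoding is an assumption.

def count_dag_nodes(assignments):
    nodes = {}   # structural key -> node id
    env = {}     # variable name -> node id of its current value

    def operand(name):
        # A variable holding a computed value maps to that value's node;
        # a variable used before being assigned becomes a leaf node.
        if name in env:
            return env[name]
        key = ("leaf", name)
        nodes.setdefault(key, len(nodes))
        return nodes[key]

    for target, op, a, b in assignments:
        key = (op, operand(a), operand(b))
        nodes.setdefault(key, len(nodes))  # reuse the node if it exists
        env[target] = nodes[key]           # attach the label to the node
    return len(nodes)

n = count_dag_nodes([("x", "+", "a", "b"), ("y", "-", "x", "c"),
                     ("z", "+", "a", "b"), ("w", "-", "z", "c")])
# n == 5, matching the count above
```

Because `z` maps to the same node as `x`, the lookup key for `z - c` collides with the existing `x - c` node, which is exactly how the common subexpression is detected.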




The resulting DAG:

```
        -    (labels: y, w)
       / \
      +   c
     / \
    a   b
(the + node carries the labels x, z)
```

---

## 4. Basic Blocks and Control Flow Graphs

For analyzing control flow and performing many optimizations, we first partition the three-address code into basic blocks.

📖 Basic Block

A basic block is a maximal sequence of consecutive instructions such that:

  • Control flow can only enter the block at the first instruction (the leader).

  • Control flow can only leave the block at the last instruction, without halting or branching out from the middle.

To find the basic blocks, we must first identify the leaders in the code.

Algorithm to Identify Leaders:

  • The very first instruction in the program is a leader.

  • Any instruction that is the target of a conditional or unconditional `goto` is a leader.

  • Any instruction that immediately follows a conditional or unconditional `goto` is a leader.
Once leaders are identified, a basic block consists of a leader and all subsequent instructions up to, but not including, the next leader.

Worked Example:

Problem: Partition the following three-address code into basic blocks.

```
(1) i = 1
(2) L1: t1 = 4 * i
(3) t2 = a[t1]
(4) if t2 < 10 goto L2
(5) t3 = 0
(6) goto L3
(7) L2: t3 = 1
(8) L3: b[t1] = t3
(9) i = i + 1
(10) if i <= 20 goto L1
```

Solution:

Step 1: Identify the leaders.

• Instruction (1) is the first instruction, so it is a leader.

• Instruction (2) is the target of the `goto L1` at (10), so it is a leader.

• Instruction (5) immediately follows the conditional `goto` at (4), so it is a leader.

• Instruction (7) is the target of the `goto L2` at (4); it also immediately follows the `goto L3` at (6). So it is a leader.

• Instruction (8) is the target of the `goto L3` at (6), so it is a leader.

The leaders are: (1), (2), (5), (7), (8).

Step 2: Form the basic blocks.
Each block starts with a leader and runs until the instruction before the next leader.

Block B1:
(1) `i = 1`

Block B2:
(2) `L1: t1 = 4 * i`
(3) `t2 = a[t1]`
(4) `if t2 < 10 goto L2`

Block B3:
(5) `t3 = 0`
(6) `goto L3`

Block B4:
(7) `L2: t3 = 1`

Block B5:
(8) `L3: b[t1] = t3`
(9) `i = i + 1`
(10) `if i <= 20 goto L1`

Result: There are 5 basic blocks. From these blocks, we can construct a Control Flow Graph (CFG) where nodes are B1-B5 and edges represent the flow of control (e.g., an edge from B2 to B4 for the `goto L2` and an edge from B2 to B3 for the fall-through case).
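The three leader rules translate directly into a short procedure. A minimal sketch, assuming each instruction is encoded as a `(label, jump_target)` pair (either may be `None`); the rules treat conditional and unconditional jumps identically:

```python
# Sketch of the leader-finding algorithm over a simplified encoding:
# one (label, jump_target) pair per instruction, 1-indexed.

def find_leaders(instrs):
    labels = {lab: i + 1 for i, (lab, _) in enumerate(instrs) if lab}
    leaders = {1}                          # rule 1: first instruction
    for i, (_, target) in enumerate(instrs, start=1):
        if target is not None:
            leaders.add(labels[target])    # rule 2: target of a jump
            if i < len(instrs):
                leaders.add(i + 1)         # rule 3: instruction after a jump
    return sorted(leaders)

program = [
    (None, None),   # (1) i = 1
    ("L1", None),   # (2) L1: t1 = 4 * i
    (None, None),   # (3) t2 = a[t1]
    (None, "L2"),   # (4) if t2 < 10 goto L2
    (None, None),   # (5) t3 = 0
    (None, "L3"),   # (6) goto L3
    ("L2", None),   # (7) L2: t3 = 1
    ("L3", None),   # (8) L3: b[t1] = t3
    (None, None),   # (9) i = i + 1
    (None, "L1"),   # (10) if i <= 20 goto L1
]
leaders = find_leaders(program)   # [1, 2, 5, 7, 8] -> 5 basic blocks
```

The number of leaders equals the number of basic blocks, since each block runs from one leader up to the next.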

---

## 5. Backpatching

When generating code for Boolean expressions and flow-of-control statements like `if-else` or `while` loops in a single pass, the compiler faces a challenge. For a forward jump, the target label's address is not yet known when the `goto` statement is generated. Backpatching is a technique to solve this problem.

The core idea is to generate `goto` instructions with empty target fields. The addresses of these incomplete instructions are maintained in lists. When the target label is finally defined, the compiler can go back and fill in, or "backpatch," the correct target address into all the instructions on the list.

We typically use the following synthesized attributes, where `E` is a Boolean expression and `S` is a statement:

• `E.truelist`: A list of addresses of incomplete jumps that should go to the `true` label.

• `E.falselist`: A list of addresses of incomplete jumps that should go to the `false` label.

• `S.nextlist`: A list of addresses of incomplete jumps to the instruction that immediately follows the code for `S` (used for statements like `while`).

Consider the grammar `S -> if E then S1`. When the parser sees `then`, the expression `E` has been processed. We now know the location of the code for `S1`. We can backpatch `E.truelist` to point to the start of `S1`'s code. This technique allows for efficient, one-pass generation of control-flow code. Both Boolean expressions and general flow-of-control statements can be handled this way.
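The mechanics reduce to three tiny operations: emit an incomplete jump, keep its index on a list, and later write the target into every listed jump. A minimal sketch; the `["goto", target]` encoding and the specific indices are illustrative assumptions, not a real compiler's structures:

```python
# Minimal sketch of backpatching: jumps are emitted with an empty
# target, their indices are collected on lists, and the targets are
# filled in once the destination address is known.

code = []                      # each entry: a mutable ["goto", target]

def emit_jump():
    """Emit a goto with an unfilled target; return its index."""
    code.append(["goto", None])
    return len(code) - 1

def backpatch(jump_list, target):
    """Fill the given target into every incomplete jump on the list."""
    for i in jump_list:
        code[i][1] = target

# For a condition E with one pending true-jump and one pending false-jump:
truelist = [emit_jump()]
falselist = [emit_jump()]
backpatch(truelist, 2)         # suppose S1's code turns out to start at index 2
backpatch(falselist, 4)        # and the code after the if-statement at index 4
```

The lists play the role of `E.truelist` and `E.falselist`: they carry forward the indices of jumps whose destinations cannot be known until later in the one-pass translation.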

---

Problem-Solving Strategies

💡 GATE Strategy

• Counting Basic Blocks: To quickly find basic blocks, scan the code for `goto` statements and labels. Mark the target of every `goto` as a leader. Mark the statement immediately following any `goto` as a leader. Finally, mark the first statement. Then, count the number of marked leaders to find the number of blocks.

• Counting DAG Nodes: To count DAG nodes for a sequence of assignments, process them one by one. For each expression `y op z`, check if a node representing `y op z` already exists. If yes, reuse it. If no, create a new interior node. The total count is the number of leaf nodes (unique variables/constants) plus the number of unique interior nodes (operations).

• TAC Implementations: Remember the key difference: Quadruples are `(op, arg1, arg2, result)`, making each instruction self-contained. Triples are `(op, arg1, arg2)`, with results referenced by position, making them position-dependent.

---

Common Mistakes

⚠️ Avoid These Errors

• ❌ Confusing Symbol Table with IR: The Symbol Table stores metadata about identifiers. The IR represents the program's executable logic. They are distinct.
  ✅ Correct Approach: Understand that the IR describes what the program does, while the Symbol Table describes what the names mean.

• ❌ Incorrectly Identifying Leaders: A common error is forgetting that the statement immediately following a `goto` (conditional or unconditional) is always a leader.
  ✅ Correct Approach: Strictly apply all three rules for identifying leaders: (1) first statement, (2) target of a jump, (3) statement after a jump.

• ❌ Misinterpreting Triple Pointers: In a triple like `(2): (*, (0), (1))`, the operands `(0)` and `(1)` are not integer values. They are pointers to the results of the triples at index 0 and index 1.
  ✅ Correct Approach: Treat parenthesized numbers in triples as references to other instructions in the triple list.

---

Practice Questions

:::question type="MCQ" question="Consider the C assignment `p = q[i+1] * r;`, where `q` is an array of integers (4 bytes per element). Which of the following is a plausible three-address code representation?" options=["t1 = i + 1; t2 = q[t1]; t3 = t2 * r; p = t3;","t1 = i + 1; t2 = 4 * t1; t3 = addr(q); t4 = t3[t2]; t5 = t4 * r; p = t5;","t1 = i + 1; t2 = addr(q); t3 = t2 + t1; p = t3 * r;","t1 = i + 1; t2 = 4 * t1; t3 = addr(q) + t2; t4 = *t3; t5 = t4 * r; p = t5;"] answer="t1 = i + 1; t2 = 4 * t1; t3 = addr(q) + t2; t4 = *t3; t5 = t4 * r; p = t5;" hint="Think about how array addressing is broken down into simpler address calculations at the TAC level. The offset must be calculated explicitly." solution="Step 1: The expression `i+1` is computed first. `t1 = i + 1`.
Step 2: To find the memory offset for `q[i+1]`, the index `t1` must be multiplied by the size of the element, which is 4. `t2 = 4 * t1`.
Step 3: The base address of the array `q` is added to the offset `t2` to get the final address of the element. `t3 = addr(q) + t2`.
Step 4: The value at this memory address must be dereferenced (loaded). `t4 = *t3`.
Step 5: This value is then multiplied by `r`. `t5 = t4 * r`.
Step 6: The final result is assigned to `p`. `p = t5`. This matches the correct option."
:::

:::question type="NAT" question="Consider the following sequence of three-address code instructions. What is the number of basic blocks this sequence can be partitioned into?

(1) read(n)
(2) i = 1
(3) prod = 1
(4) L1: if i > n goto L2
(5) prod = prod * i
(6) i = i + 1
(7) goto L1
(8) L2: print(prod)" answer="4" hint="Identify all leaders using the three rules. The number of leaders equals the number of basic blocks." solution="Step 1: Identify the leaders.

• Rule 1: Instruction (1) is the first statement, so it is a leader.

• Rule 2: Instruction (4), label `L1`, is the target of the `goto` at (7), so it is a leader.

• Rule 2: Instruction (8), label `L2`, is the target of the conditional `goto` at (4), so it is a leader.

• Rule 3: Instruction (5) immediately follows the conditional `goto` at (4), so it is a leader. (Instruction (8) also immediately follows the `goto` at (7), but it is already a leader.)

Step 2: Form the basic blocks. The leaders are (1), (4), (5), and (8).
B1: (1), (2), (3)
B2: (4)
B3: (5), (6), (7)
B4: (8)
The number of basic blocks is 4."
:::

:::question type="NAT" question="A Directed Acyclic Graph (DAG) is constructed for the following C code segment. What is the minimum number of nodes in the DAG?

```c
p = x * y;
q = z + p;
r = x * y;
s = z + r;
t = s + q;
```
" answer="6" hint="Construct the DAG step-by-step, reusing nodes for common subexpressions. Count the final number of unique leaf and operator nodes." solution="Step 1: `p = x * y`. Create leaves for `x`, `y`. Create an operator node `*` with children `x`, `y`. Label this node `p`. Nodes: `x`, `y`, `*`. Total: 3.
Step 2: `q = z + p`. Create a leaf for `z`. The node for `p` already exists. Create an operator node `+` with children `z` and `p`'s node (`*`). Label this node `q`. Nodes: `x`, `y`, `z`, `*`, `+`. Total: 5.
Step 3: `r = x * y`. The expression `x * y` has already been computed and its result is available at the node labeled `p`. We reuse this node and add the label `r` to it. No new nodes are created.
Step 4: `s = z + r`. The variable `r` refers to the same node as `p`, so `z + r` is the same as `z + p`, which has already been computed and is available at the node labeled `q`. We reuse this node and add the label `s` to it. No new nodes are created.
Step 5: `t = s + q`. The variables `s` and `q` both refer to the same `+` node. The expression is `(z+p) + (z+p)`. We create a new `+` node whose two children are both that existing `+` node. Label this new node `t`. Nodes: `x`, `y`, `z`, `*`, `+` (for q/s), `+` (for t). Total: 6.
The final nodes are: leaves `x`, `y`, `z`; and operators `*`, `+`, `+`. Total is 6."
:::

:::question type="MSQ" question="Which of the following statements about intermediate representations and related concepts are correct?" options=["Backpatching is a technique used to resolve forward jumps in a single pass.","A Directed Acyclic Graph (DAG) for an expression can have more nodes than its corresponding Abstract Syntax Tree (AST).","In the three-address code implementation using triples, reordering code for optimization is computationally expensive.","A basic block can contain multiple jump instructions."] answer="A,C" hint="Evaluate each statement's validity. Consider the properties of each data structure and algorithm." solution="A. Correct. Backpatching is precisely for generating code for control-flow statements with forward jumps (like `if-then-else`) in one pass by filling in target addresses later.
B. Incorrect. A DAG is always more compact than or equal in size to an AST for the same expression. The DAG merges common sub-nodes, while the AST duplicates them, so a DAG can never have more nodes.
C. Correct. In triples, results are referenced by their position. If a triple is moved, all other triples that reference it must be updated, which is an expensive operation. This is the primary motivation for indirect triples.
D. Incorrect. By definition, a basic block can only have a jump instruction (if any) as its very last statement. Control cannot branch out from the middle of a block."
:::

---

Summary

❗ Key Takeaways for GATE

• IR's Role: The Intermediate Representation is the crucial link between the language-dependent front-end and the machine-dependent back-end, enabling compiler modularity and portability.

• TAC is Central: Three-Address Code (TAC) is the most prevalent form of IR. Understand its implementations (Quadruples, Triples) and how to convert expressions into TAC.

• DAGs for Optimization: Directed Acyclic Graphs (DAGs) are powerful for representing expressions compactly and are the primary mechanism for identifying and eliminating common subexpressions.

• Master Basic Blocks: The ability to correctly identify leaders and partition code into basic blocks is a frequently tested skill. Remember the three rules for finding leaders.

• Backpatching for Control Flow: Backpatching is the standard one-pass technique for generating code for control-flow statements by resolving forward jump addresses after the target is known.

---

What's Next?

💡 Continue Learning

This topic serves as a foundation for the subsequent phases in the compiler's back-end.

• Code Optimization: Many optimization techniques, such as common subexpression elimination, constant folding, and loop optimizations, operate directly on intermediate representations like TAC, DAGs, and Control Flow Graphs. Understanding IR is a prerequisite for studying optimization.

• Code Generation: The final phase of the compiler translates the optimized IR into target machine code. The structure of the IR (especially TAC) heavily influences the complexity and efficiency of the code generation algorithms.

---

Chapter Summary

📖 Intermediate Code Generation - Key Takeaways

In our study of the intermediate code generation phase, we have established its critical role as a bridge between the front-end (analysis) and back-end (synthesis) of a compiler. The following points encapsulate the essential knowledge required for a thorough understanding of this topic.

• Purpose and Abstraction: The primary purpose of an intermediate representation (IR) is to provide a machine-independent and language-independent abstraction of the source program. This decoupling allows for the development of retargetable compilers and facilitates powerful, machine-independent code optimization.

• Types of Intermediate Representations: We have classified IRs into several categories. High-level representations, such as Abstract Syntax Trees (ASTs), preserve hierarchical structural information from the source code. Low-level representations, like Three-Address Code (TAC), are linear, closer to machine instructions, and more suitable for optimization and final code generation.

• Three-Address Code (TAC): This is the most prevalent form of IR. Each TAC instruction has at most one operator on the right-hand side and is of the general form `x = y op z`. This format simplifies the analysis of data flow and the application of optimization techniques. We examined various instruction types, including assignment, arithmetic, logical, unconditional/conditional jumps, and procedure calls.

• Implementation of TAC: We explored three primary data structures for implementing TAC: Quadruples, Triples, and Indirect Triples.

    * Quadruples: A record with four fields (operator, arg1, arg2, result). The result is explicitly named, making it easy to move instructions during optimization.
    * Triples: A record with three fields (operator, arg1, arg2). Results are referred to by their position, making them more space-efficient but harder to manipulate for optimizations that involve code movement.
    * Indirect Triples: A hybrid approach that uses a separate list of pointers to triples. This representation combines the space efficiency of triples with the flexibility of quadruples for code motion.

• Syntax-Directed Translation: The generation of intermediate code is systematically driven by the grammar of the source language. We have seen how semantic rules, embedded within a Syntax-Directed Definition (SDD), can be used to generate TAC for various constructs like arithmetic expressions, boolean expressions (using short-circuiting), and control flow statements (e.g., `if-else`, `while`).

• Addressing and Data Structures: We discussed the translation of array references into TAC, which involves calculating memory offsets based on the array's base address and dimensions. Furthermore, we considered how symbol table information, such as type and storage location, is crucial for generating correct intermediate code.

---

Chapter Review Questions

:::question type="MCQ" question="Consider the following expression: `p = a * (b + c) - (b + c) / d`. Which of the following statements is TRUE regarding its translation into Three-Address Code (TAC) and its implementation using Triples, assuming a common subexpression elimination optimization is performed during generation?" options=["The Triples representation would require re-computation or complex pointer manipulation if the TAC block containing the result of `(b+c)` were moved.", "The number of temporary variables required is 3.", "The TAC will contain exactly 4 instructions.", "Using Indirect Triples offers no advantage over Triples in this specific context."] answer="A" hint="Think about how Triples refer to intermediate results. What happens if an instruction that produces such a result is moved to a different location?" solution="Let us first generate the optimized Three-Address Code for the expression `p = a * (b + c) - (b + c) / d`.

The common subexpression is `(b + c)`.

• `t1 = b + c`

• `t2 = a * t1`

• `t3 = t1 / d`

• `t4 = t2 - t3`

• `p = t4`

This requires 4 temporary variables (t1, t2, t3, t4) and 5 instructions. Thus, options B and C are false.

Now let's represent this using Triples. Results are referred to by their instruction number.

| # | Op | Arg1 | Arg2 |
|---|---|---|---|
| (0) | + | b | c |
| (1) | * | a | (0) |
| (2) | / | (0) | d |
| (3) | - | (1) | (2) |
| (4) | = | p | (3) |

In the Triples representation, the results of instructions (0), (1), (2), and (3) are implicitly defined by their position. For example, instruction (1) uses the result of instruction (0) as its second argument.

If a code optimization pass were to move the block of code containing instruction (0), all subsequent instructions that refer to `(0)` (namely, (1) and (2)) would become incorrect unless they are also updated. This makes code motion difficult. Indirect Triples solve this by adding a layer of indirection: we would move the pointer in the instruction list, not the triple itself. Therefore, the statement in option A is correct. Option D is false because Indirect Triples are specifically designed to overcome this limitation of Triples.
"
:::

    :::question type="NAT" question="Consider the following C code snippet involving short-circuit evaluation:
    ```c
    if (a > b && (c < d || e != f)) {
    x = 1;
    }
    ```
    What is the minimum number of conditional and unconditional jump instructions required in the Three-Address Code representation for this `if` statement?" answer="3" hint="Recall the 'fall-through' logic for short-circuiting. A jump is needed to skip code when a condition is met (e.g., `a > b` is false) or to enter the `then` block when the full condition is true." solution="Let us translate the boolean expression using short-circuit logic. We will define `L_true` as the label for the `x = 1;` block and `L_false` as the label for the code following the `if` statement.

    The expression is `a > b && (c < d || e != f)`.

  • Translate `a > b`: If `a > b` is false, the entire `&&` expression is false. So, we jump to `L_false`.

  • `if a <= b goto L_false` (This is the first conditional jump)

  • Translate `c < d`: If the first part was true, we now evaluate the second part. If `c < d` is true, the `||` expression is true, and thus the entire condition is true. We can jump to `L_true`.

  • `if c < d goto L_true` (This is the second conditional jump)

  • Translate `e != f`: This is only evaluated if `c < d` was false. If `e != f` is false, the `||` is false, and the entire condition is false. So we jump to `L_false`.

  • `if e == f goto L_false` (This is the third conditional jump)

  • If `e != f` was true, we fall through to the `L_true` block.
  • The generated TAC structure would look like this:

    ```
    if a <= b goto L_false // Jump 1 (conditional)
    if c < d goto L_true // Jump 2 (conditional)
    if e == f goto L_false // Jump 3 (conditional)
    L_true:
    x = 1
    L_false:
    ...
    ```

    In this standard translation, we require 3 conditional jump instructions. No unconditional jumps are necessary if the `L_true` block is placed immediately after the conditional checks. Therefore, the minimum number of jump instructions (both conditional and unconditional) is 3.
    "
    :::
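
The fall-through scheme used in the solution above generalizes to any boolean expression. The following is a sketch of the classic "jumping code" translation (function and label names are illustrative): `&&` and `||` become conditional jumps, a special pseudo-label `fall` means "fall through to the code that follows", and each relational leaf carries both the test and its negation.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    return f"L{next(_labels)}"

def gen(expr, t, f, code):
    """Emit jumps so control reaches label t iff expr is true, f iff false.
    The pseudo-label 'fall' means: fall through to the following code."""
    op = expr[0]
    if op == "&&":
        f1 = f if f != "fall" else newlabel()
        gen(expr[1], "fall", f1, code)   # left false => whole expression false
        gen(expr[2], t, f, code)
        if f == "fall":
            code.append(f"{f1}:")
    elif op == "||":
        t1 = t if t != "fall" else newlabel()
        gen(expr[1], t1, "fall", code)   # left true => whole expression true
        gen(expr[2], t, f, code)
        if t == "fall":
            code.append(f"{t1}:")
    else:                                # leaf: ("rel", test, negated test)
        rel, neg = expr[1], expr[2]
        if t == "fall":
            code.append(f"if {neg} goto {f}")
        elif f == "fall":
            code.append(f"if {rel} goto {t}")
        else:
            code.append(f"if {rel} goto {t}")
            code.append(f"goto {f}")

# a > b && (c < d || e != f), as nested tuples
cond = ("&&", ("rel", "a > b", "a <= b"),
              ("||", ("rel", "c < d", "c >= d"),
                     ("rel", "e != f", "e == f")))

code = []
gen(cond, "fall", "L_false", code)  # the then-block follows; L_false skips it
code += ["x = 1", "L_false:"]
print("\n".join(code))
```

Running this produces exactly the three conditional jumps counted in the solution (the generated `L1:` plays the role of `L_true`), and no unconditional jump is emitted because the true case always falls through.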

    :::question type="MCQ" question="A compiler uses an Abstract Syntax Tree (AST) as its primary intermediate representation. Which of the following compiler phases would find this representation least suitable for its primary operations?" options=["Semantic Analysis", "Source-level Debugging", "Machine-Independent Code Optimization", "Machine Code Generation"] answer="D" hint="Consider what information is most useful for each phase. ASTs are hierarchical and abstract. Which phase requires a linear, machine-proximate representation?" solution="Let's analyze the suitability of an AST for each phase:

    * A. Semantic Analysis: The AST is highly suitable for semantic analysis. The tree structure allows for easy traversal to perform tasks like type checking, scope resolution, and verifying that variables are declared before use. The hierarchical nature directly reflects the program's logical structure.

    * B. Source-level Debugging: An AST is useful for mapping executable code back to the original source lines, which is essential for source-level debuggers. The tree nodes can be annotated with line and column numbers.

    * C. Machine-Independent Code Optimization: While many optimizations are performed on linear IRs like Three-Address Code, some high-level, structure-based optimizations (e.g., function inlining, some forms of loop optimization) can be performed directly on the AST.

    * D. Machine Code Generation: This is the phase where the representation is least suitable. The final code generation phase is concerned with register allocation, instruction selection, and memory addressing. These tasks require a linear, low-level representation that is close to the target machine's instruction set, such as Three-Address Code or a similar linear IR. The abstract, hierarchical nature of an AST is a poor match for the sequential nature of machine instructions. Compilers almost always convert the AST to a linear IR before this phase.

    Therefore, Machine Code Generation finds the AST least suitable for its operations.
    "
    :::
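
The "convert the AST to a linear IR" step mentioned in option D can be sketched in a few lines (node shapes and names here are assumed for illustration): a post-order walk of the tree emits one three-address instruction per interior node, flattening the hierarchy into the sequential form that code generation consumes.

```python
import itertools

_temps = itertools.count(1)

def lower(node, code):
    """Return the name holding node's value, appending TAC to `code`."""
    if isinstance(node, str):        # leaf: a variable name
        return node
    op, left, right = node           # interior node: (op, left, right)
    l = lower(left, code)
    r = lower(right, code)
    t = f"t{next(_temps)}"           # fresh temporary for this node's result
    code.append(f"{t} = {l} {op} {r}")
    return t

# AST for a * (b + c), written as nested tuples
ast = ("*", "a", ("+", "b", "c"))
code = []
result = lower(ast, code)
print(code)    # ['t1 = b + c', 't2 = a * t1']
print(result)  # t2
```

The post-order traversal guarantees that an operand's instruction is emitted before any instruction that uses it, which is precisely the ordering property a sequential instruction stream requires.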

    :::question type="NAT" question="Consider the following sequence of Three-Address Code instructions being executed. The initial value of memory location `b` is 10 and `c` is 5.
    ```
    (1) t1 = b + c
    (2) t2 = t1 * 2
    (3) b = t2 - 5
    (4) t1 = t1 + 1
    (5) c = t1
    ```
    After the execution of all five instructions, what is the final value stored in the memory location `c`?" answer="16" hint="Trace the value of each variable and temporary step-by-step. Be careful about how `t1` is updated mid-sequence." solution="We will trace the values of the variables `b`, `c` and the temporaries `t1`, `t2` through the execution of the instructions.

    Initial state: `b = 10`, `c = 5`

  • `(1) t1 = b + c`

  • `t1 = 10 + 5 = 15`
    State: `b=10`, `c=5`, `t1=15`

  • `(2) t2 = t1 * 2`

  • `t2 = 15 * 2 = 30`
    State: `b=10`, `c=5`, `t1=15`, `t2=30`

  • `(3) b = t2 - 5`

  • `b = 30 - 5 = 25`
    State: `b=25`, `c=5`, `t1=15`, `t2=30`

  • `(4) t1 = t1 + 1`

  • The value of `t1` is updated.
    `t1 = 15 + 1 = 16`
    State: `b=25`, `c=5`, `t1=16`, `t2=30`

  • `(5) c = t1`

  • The new value of `t1` is assigned to `c`.
    `c = 16`
    State: `b=25`, `c=16`, `t1=16`, `t2=30`

    After all instructions are executed, the final value in memory location `c` is 16.
    "
    :::
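
The trace above can also be checked mechanically. Below is a toy interpreter (illustrative only) for the simple `x = y op z` and `x = y` forms used in the question; it evaluates each right-hand side in the current state and stores the result.

```python
def run(program, state):
    """Execute simple TAC of the form 'lhs = expr' over a state dict."""
    for instr in program:
        lhs, rhs = instr.split(" = ")
        # eval is acceptable here only because the program text is fixed
        state[lhs] = eval(rhs, {}, state)
    return state

program = [
    "t1 = b + c",
    "t2 = t1 * 2",
    "b = t2 - 5",
    "t1 = t1 + 1",
    "c = t1",
]

final = run(program, {"b": 10, "c": 5})
print(final)  # {'b': 25, 'c': 16, 't1': 16, 't2': 30}
```

The final state matches the hand trace: `b = 25`, `c = 16`, confirming the answer of 16.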

    ---

    What's Next?

    πŸ’‘ Continue Your GATE Journey

    Having completed Intermediate Code Generation, you have established a firm foundation for the subsequent synthesis phases of the compiler. The abstract, machine-independent representation we have constructed is not an end in itself, but rather a powerful substrate for improvement and translation.

    Key connections:

    * Relation to Previous Chapters: We began this phase with the output of the Syntax Analysis phaseβ€”typically an Abstract Syntax Tree (AST). The process of syntax-directed translation, which we used extensively, is a direct application of the concepts of Syntax-Directed Definitions learned in the parsing chapter. We have effectively "flattened" the hierarchical parse tree into a linear, instruction-like sequence.

    * Foundation for Future Chapters: The Three-Address Code generated in this chapter serves as the direct input to the Code Optimization phase. The regular, simple structure of TAC is precisely what enables the application of various optimization algorithms, such as common subexpression elimination, dead code elimination, and loop optimizations. Following optimization, the refined intermediate code is then passed to the Code Generation phase, which will translate it into the target machine's assembly or machine language. Your understanding of IR is therefore fundamental to comprehending the entire back-end of the compiler.

    🎯 Key Points to Remember

    • βœ“ Master the core concepts in Intermediate Code Generation before moving to advanced topics
    • βœ“ Practice with previous year questions to understand exam patterns
    • βœ“ Review short notes regularly for quick revision before exams
