Deep Dive into V8 JS Engine | Node js
Episode 08 - Deep Dive into V8 JS Engine
Hey everyone! Welcome back to the Node.js tutorial series. Today we are going to take a deep dive into the V8 JavaScript Engine!
Understanding how V8 works behind the scenes will make you a better developer and help you write more performant code!
What we will cover:
- Parsing Stage: Lexical Analysis
- Tokenization
- Syntax Analysis and AST
- Interpreted vs Compiled Languages
- Ignition (Interpreter)
- TurboFan (JIT Compiler)
- Hot Code Optimization
- Deoptimization
Behind the Scenes: Parsing Stage in V8 Engine
When JavaScript code is executed, it goes through several stages in the V8 engine. The first stage is parsing, which includes lexical analysis and tokenization.
V8 Engine Pipeline:
===================
Your JavaScript Code
│
▼
┌─────────────────────┐
│ A) PARSING STAGE │
│ - Lexical Analysis │
│ - Tokenization │
│ - Syntax Analysis │
│ - AST Generation │
└─────────────────────┘
│
▼
┌─────────────────────┐
│ B) INTERPRETATION │
│ - Ignition │
│ - Bytecode │
└─────────────────────┘
│
▼
┌─────────────────────┐
│ C) OPTIMIZATION │
│ - TurboFan │
│ - Machine Code │
└─────────────────────┘
A) Parsing Stage: Lexical Analysis and Tokenization
1. Lexical Analysis
Purpose: The main goal of lexical analysis is to break down the raw JavaScript code into manageable pieces called tokens.
Process:
Input Code:
===========
var a = 10;
Lexical Analysis Process:
=========================
The code is scanned character by character to identify individual tokens.
2. What is Tokenization?
Definition: Tokenization is the process of converting code into a series of tokens. Each token represents a fundamental element of the language, such as keywords, operators, identifiers, and literals.
Example:
For the code var a = 10;, the tokens might be:
Tokenization Example:
=====================
var a = 10;
Token 1: var → (keyword)
Token 2: a → (identifier)
Token 3: = → (operator)
Token 4: 10 → (literal)
Token 5: ; → (punctuation)
Why Tokenization?
Tokenization helps the V8 engine to read and understand the code more effectively by breaking it down into smaller, more manageable pieces. This step is crucial for further analysis and compilation.
3. Output
Tokens: The result of tokenization is a list of tokens that the V8 engine uses in subsequent stages of parsing.
Another Example:
================
function add(a, b) {
  return a + b;
}
Tokens:
-------
function → (keyword)
add → (identifier)
( → (punctuation)
a → (identifier)
, → (punctuation)
b → (identifier)
) → (punctuation)
{ → (punctuation)
return → (keyword)
a → (identifier)
+ → (operator)
b → (identifier)
; → (punctuation)
} → (punctuation)
In the parsing stage, lexical analysis breaks down JavaScript code into tokens. This process helps the V8 engine understand and process the code by converting it into a format that can be easily analyzed and optimized in later stages.
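To make the idea concrete, here is a minimal sketch of a tokenizer written in plain JavaScript. It is not how V8's scanner is implemented; the `tokenize` function, the `KEYWORDS` set, and the token shape (`{ type, value }`) are assumptions made up for this illustration, and it only knows the handful of token kinds used in the examples above.

```javascript
// Toy tokenizer: NOT V8's scanner, just an illustration of the idea.
const KEYWORDS = new Set(["var", "let", "const", "function", "return"]);

function tokenize(source) {
  const tokens = [];
  // Each alternative matches one kind of token; whitespace is matched and skipped.
  // The sticky flag (y) forces every match to start exactly at lastIndex.
  const pattern = /\s+|([A-Za-z_$][\w$]*)|(\d+(?:\.\d+)?)|([=+\-*\/])|([;,(){}])/y;
  let position = 0;
  while (position < source.length) {
    pattern.lastIndex = position;
    const match = pattern.exec(source);
    if (match === null) {
      throw new SyntaxError(`Unexpected character '${source[position]}' at index ${position}`);
    }
    const [, word, number, operator, punctuation] = match;
    if (word !== undefined) {
      tokens.push({ type: KEYWORDS.has(word) ? "keyword" : "identifier", value: word });
    } else if (number !== undefined) {
      tokens.push({ type: "literal", value: number });
    } else if (operator !== undefined) {
      tokens.push({ type: "operator", value: operator });
    } else if (punctuation !== undefined) {
      tokens.push({ type: "punctuation", value: punctuation });
    }
    position = pattern.lastIndex;
  }
  return tokens;
}

// Five tokens: keyword, identifier, operator, literal, punctuation
console.log(tokenize("var a = 10;"));
```

Running it on `function add(a, b) { return a + b; }` produces the same 14 tokens listed in the second example.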
2nd Stage: Syntax Analysis and Abstract Syntax Tree (AST)
After the lexical analysis and tokenization stages, the next step in the parsing process is syntax analysis. In this stage, the tokens are converted into an Abstract Syntax Tree (AST).
1. Syntax Analysis
Purpose: To analyze the syntactic structure of the tokens and build the Abstract Syntax Tree (AST).
The tokens are analyzed according to the grammar rules of JavaScript to create a hierarchical tree structure that represents the code.
2. Abstract Syntax Tree (AST)
The AST is a tree-like data structure that represents the syntactic structure of the source code. Each node in the tree corresponds to a construct in the code, such as variables, expressions, or statements.
Example:
For the code:
var a = 10;
The AST might look something like this:
Abstract Syntax Tree:
=====================
            Program
               │
               ▼
      VariableDeclaration
               │
        ┌──────┴──────┐
        │             │
        ▼             ▼
      kind      declarations
      "var"           │
                      ▼
             VariableDeclarator
                      │
               ┌──────┴──────┐
               │             │
               ▼             ▼
              id           init
               │             │
               ▼             ▼
          Identifier      Literal
           name: "a"     value: 10
Website: You can explore AST structures using tools like AST Explorer, which provides a visual representation of the AST for various pieces of code.
Visit: https://astexplorer.net/
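As a rough sketch of what syntax analysis does, the snippet below turns the five tokens of `var a = 10;` into a tree. The `parseVarDeclaration` function and its `expect` helper are hypothetical names invented for this example, and the parser only understands this one statement shape. The node names follow the ESTree-style output you see in AST Explorer; V8's internal AST uses its own classes.

```javascript
// Toy parser for exactly one statement shape: var <identifier> = <number> ;
// Node names follow the ESTree convention; V8's internal AST differs.
function parseVarDeclaration(tokens) {
  let index = 0;

  // Consume the next token, throwing if it is not what the grammar expects.
  function expect(type, value) {
    const token = tokens[index];
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      const found = token ? token.value : "end of input";
      throw new SyntaxError(`Unexpected token '${found}'`);
    }
    index += 1;
    return token;
  }

  expect("keyword", "var");
  const id = expect("identifier");
  expect("operator", "=");
  const init = expect("literal");
  expect("punctuation", ";");

  return {
    type: "Program",
    body: [{
      type: "VariableDeclaration",
      kind: "var",
      declarations: [{
        type: "VariableDeclarator",
        id: { type: "Identifier", name: id.value },
        init: { type: "Literal", value: Number(init.value) },
      }],
    }],
  };
}

// Tokens for `var a = 10;`, written out by hand to keep the example self-contained.
const sampleTokens = [
  { type: "keyword", value: "var" },
  { type: "identifier", value: "a" },
  { type: "operator", value: "=" },
  { type: "literal", value: "10" },
  { type: "punctuation", value: ";" },
];

const ast = parseVarDeclaration(sampleTokens);
console.log(JSON.stringify(ast, null, 2));
```

The printed object has the same shape as the tree above: a Program containing a VariableDeclaration, whose VariableDeclarator holds an Identifier and a Literal.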
3. Interesting Fact: Syntax Errors
Syntax Errors: When the V8 engine reads code, it processes tokens one by one. If an unexpected token is encountered that does not fit the grammar rules, a syntax error occurs.
This is because the AST cannot be generated if the code does not adhere to the expected syntax, indicating that something is wrong with the structure of the code.
Syntax Error Example:
=====================
// Invalid code
var = 10;
// V8 encounters unexpected token
// Expected: identifier after 'var'
// Got: '=' operator
SyntaxError: Unexpected token '='
In the syntax analysis stage, the tokens are analyzed to create an Abstract Syntax Tree (AST), which provides a structured and hierarchical representation of the code. This tree structure helps the V8 engine understand and process the code more effectively.
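You can watch the parser reject code without running it. The sketch below uses the standard `Function` constructor, which asks the engine to parse a string of source code; the `checkSyntax` helper is a hypothetical name for this example, and the exact error message wording depends on the engine.

```javascript
// `new Function(source)` makes the engine parse `source`.
// Invalid syntax throws a SyntaxError before any of that code runs.
function checkSyntax(source) {
  try {
    new Function(source);
    return { ok: true };
  } catch (error) {
    if (error instanceof SyntaxError) {
      return { ok: false, message: error.message };
    }
    throw error; // anything else is not a parsing problem
  }
}

console.log(checkSyntax("var a = 10;")); // parses fine
console.log(checkSyntax("var = 10;"));   // rejected at parse time
```

Note that the function body is never called here. The error comes purely from the parsing stage, which is the point of the example.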
B) Interpreter and Compilation
1. Interpreted vs. Compiled Languages
Interpreted Languages:
- Definition: These languages are executed line by line. The interpreter reads and executes the code directly.
- Pros: Faster to start executing code, easier to debug.
- Cons: Slower execution compared to compiled languages because of the line-by-line interpretation.
- Example: Python
Interpreted Language:
=====================
Line 1 → Read → Execute
Line 2 → Read → Execute
Line 3 → Read → Execute
...
No pre-compilation step!
Compiled Languages:
- Definition: These languages are first translated into machine code (binary code) through a process called compilation. The machine code is then executed by the computer's hardware.
- Pros: Faster execution because the code is pre-compiled into machine code.
- Cons: Longer initial compilation time, more complex debugging process.
- Example: C, C++
Compiled Language:
==================
Source Code
│
▼
Compiler (Takes time)
│
▼
Machine Code (Binary)
│
▼
Execute (Very Fast!)
Q: Is JavaScript an interpreted or a compiled language? 🤨
A: JavaScript is neither purely interpreted nor purely compiled. It utilizes a combination of both techniques!
JavaScript = Interpreted + Compiled (JIT)
=========================================
  ┌─────────────┐
  │  Your Code  │
  └──────┬──────┘
         │
         ▼
  ┌─────────────┐
  │   Parsing   │
  │    (AST)    │
  └──────┬──────┘
         │
         ▼
  ┌─────────────┐
  │  Ignition   │ ← Interpreter
  │ (Bytecode)  │
  └──────┬──────┘
         │
   ┌─────┴─────┐
   │           │
   ▼           ▼
Execute    Hot Code?
   │           │
   │           ▼
   │    ┌─────────────┐
   │    │  TurboFan   │ ← JIT Compiler
   │    │ (Optimized  │
   │    │Machine Code)│
   │    └──────┬──────┘
   │           │
   └─────┬─────┘
         │
         ▼
     Execution
Interpreter:
- Initial Execution: JavaScript uses an interpreter to start running code quickly, without waiting for a full compilation step.
Compiler:
- Just-In-Time (JIT) Compilation: JavaScript engines like V8 use JIT compilation to improve performance. Instead of compiling everything ahead of time, the engine compiles code into machine code at runtime, focusing on frequently executed code paths.
1. Abstract Syntax Tree (AST) to Bytecode
AST to Bytecode: After parsing the code and generating the AST, the code is passed to the interpreter. In the V8 engine, this interpreter is called Ignition.
Ignition:
- Converts the AST into bytecode
- Bytecode is a lower-level, intermediate representation of the code that the JavaScript engine can execute more efficiently than raw source code
- Execution: Ignition reads and executes the bytecode one instruction at a time
Ignition Process:
=================
      AST
       │
       ▼
┌─────────────┐
│  Ignition   │
│(Interpreter)│
└──────┬──────┘
       │
       ▼
   Bytecode
       │
       ▼
   Execute!
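Here is a toy version of the "AST → bytecode → execute" flow for the expression `a + 10`. Everything in it is an assumption for teaching purposes: the instruction names (`PUSH`, `LOAD`, `ADD`) are made up, and the interpreter is a simple stack machine, whereas Ignition's real bytecode is register-based and much richer.

```javascript
// Toy compiler: walks the AST and emits a flat list of instructions.
function compile(node, bytecode = []) {
  switch (node.type) {
    case "Literal":
      bytecode.push(["PUSH", node.value]);
      break;
    case "Identifier":
      bytecode.push(["LOAD", node.name]);
      break;
    case "BinaryExpression":
      if (node.operator !== "+") throw new Error(`Unsupported operator ${node.operator}`);
      compile(node.left, bytecode);
      compile(node.right, bytecode);
      bytecode.push(["ADD"]);
      break;
    default:
      throw new Error(`Unsupported node type ${node.type}`);
  }
  return bytecode;
}

// Toy interpreter: executes one instruction at a time using a value stack.
function run(bytecode, variables) {
  const stack = [];
  for (const [op, operand] of bytecode) {
    if (op === "PUSH") {
      stack.push(operand);
    } else if (op === "LOAD") {
      stack.push(variables[operand]);
    } else if (op === "ADD") {
      const right = stack.pop();
      const left = stack.pop();
      stack.push(left + right);
    }
  }
  return stack.pop();
}

// AST for the expression `a + 10`
const expression = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Identifier", name: "a" },
  right: { type: "Literal", value: 10 },
};

const bytecode = compile(expression);
console.log(bytecode);                // LOAD a, PUSH 10, ADD
console.log(run(bytecode, { a: 5 })); // 15
```

The takeaway is the split of responsibilities: the tree is flattened once into simple instructions, and the interpreter loop only has to step through them.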
2. Just-In-Time (JIT) Compilation
TurboFan:
A compiler within the V8 engine that optimizes frequently executed (hot) code paths.
While Ignition runs the bytecode, V8 also collects feedback about the code, such as which types a function receives. When a portion of the code runs frequently (hot code), it is handed to TurboFan for optimization.
Optimization: TurboFan converts the bytecode into optimized machine code, which improves performance for repeated executions.
TurboFan Optimization:
======================
Bytecode
│
│ (This code runs 1000 times!)
│
▼
┌──────────┐
│ TurboFan │
│ (JIT │
│ Compiler)│
└────┬─────┘
│
▼
Optimized
Machine Code
│
▼
Super Fast
Execution!
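The "hot code" decision can be pictured with a call counter. This is only a mental model: `withTierUp`, `HOT_THRESHOLD`, and the `tier` label are hypothetical names for this sketch, and V8 uses its own profiling heuristics rather than a fixed call count.

```javascript
// Toy model of tier-up. The threshold is an assumption for illustration only.
const HOT_THRESHOLD = 1000;

function withTierUp(baseline, optimized) {
  let calls = 0;
  const state = { tier: "interpreter" };

  function wrapper(...args) {
    calls += 1;
    if (state.tier === "interpreter" && calls > HOT_THRESHOLD) {
      state.tier = "optimized"; // the function became "hot"
    }
    return state.tier === "optimized" ? optimized(...args) : baseline(...args);
  }

  wrapper.state = state;
  return wrapper;
}

const add = withTierUp(
  (a, b) => a + b, // stands in for bytecode run by the interpreter
  (a, b) => a + b, // stands in for optimized machine code
);

for (let i = 0; i < 1500; i++) add(i, 1);
console.log(add.state.tier); // "optimized"
```

Both versions compute the same result on purpose. Optimization must never change what the code does, only how fast it runs.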
3. Hot Code Optimization and Deoptimization
Hot Code: Refers to code that is executed frequently. TurboFan focuses on optimizing hot code to improve performance.
Optimization Assumptions:
TurboFan makes certain assumptions during optimization based on the types and values it encounters. For example, if a function is optimized with the assumption that it only processes numbers, it will run very efficiently for such cases.
Optimization Example:
=====================
function add(a, b) {
  return a + b;
}
// Called many times with numbers
add(5, 10);
add(20, 30);
add(100, 200);
// ... 1000 more times
TurboFan: "This function always gets numbers!"
"Let me optimize it for number addition!"
Result: Super fast optimized machine code!
Deoptimization:
Scenario: If TurboFan's assumptions turn out to be wrong (e.g., a function that was optimized for numbers receives strings), the optimized machine code can no longer be used safely.
Process: In such cases, V8 deoptimizes: it discards the optimized code and falls back to running the bytecode in Ignition. If the function stays hot, it can be optimized again later with the updated type information.
Deoptimization Example:
=======================
function add(a, b) {
  return a + b;
}
// Optimized for numbers
add(5, 10); // ✅ Fast!
add(20, 30); // ✅ Fast!
// Suddenly...
add("hello", "world"); // 😱 Strings?!
TurboFan: "Wait! My assumption was wrong!"
"I optimized for numbers, not strings!"
"Deoptimizing... sending back to Ignition!"
Result: Code runs slower until re-optimized
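The pattern behind this is "speculate, guard, fall back". The sketch below imitates it in plain JavaScript; `makeSpeculativeAdd` and the `stats` object are hypothetical names for this example, and real deoptimization happens inside V8's generated machine code, not in user code.

```javascript
// Toy model of speculative optimization with a guard.
function makeSpeculativeAdd() {
  const stats = { fastPathHits: 0, deopts: 0 };
  let optimized = true;

  // Generic path: handles numbers, strings, and everything else.
  function genericAdd(a, b) {
    return a + b;
  }

  function add(a, b) {
    if (optimized) {
      // Guard: the fast path is only valid while the assumption holds.
      if (typeof a === "number" && typeof b === "number") {
        stats.fastPathHits += 1;
        return a + b;
      }
      optimized = false; // assumption broken: "deoptimize"
      stats.deopts += 1;
    }
    return genericAdd(a, b);
  }

  return { add, stats };
}

const { add, stats } = makeSpeculativeAdd();
add(5, 10);
add(20, 30);
console.log(add("hello", "world")); // "helloworld"
console.log(stats);                 // { fastPathHits: 2, deopts: 1 }
```

Notice that the string call still returns the correct result. Deoptimization costs time, never correctness.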
Key Terms
Inline Caching: A technique used to speed up property access by caching the results of lookups.
Copy Elision: A compiler optimization that eliminates unnecessary copying of objects. The term comes mainly from C++ compilers.
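Inline caching is easier to picture with a small model. In the sketch below, objects carry an explicit `shape` that maps property names to slot positions, and a property reader remembers the last shape it saw. The names `createShape`, `createObject`, and `makeCachedGetter` are hypothetical; the shape object is a stand-in for V8's hidden classes, and the real mechanism works at the machine-code level.

```javascript
// A shape records where each property lives inside an object's slots.
function createShape(propertyNames) {
  return { offsets: new Map(propertyNames.map((name, index) => [name, index])) };
}

function createObject(shape, values) {
  return { shape, slots: values };
}

// Returns a property reader that caches (shape, offset) from its last lookup.
function makeCachedGetter(propertyName) {
  let cachedShape = null;
  let cachedOffset = -1;
  const stats = { hits: 0, misses: 0 };

  function get(object) {
    if (object.shape === cachedShape) {
      stats.hits += 1; // fast path: skip the lookup entirely
      return object.slots[cachedOffset];
    }
    stats.misses += 1; // slow path: full lookup, then remember the result
    cachedShape = object.shape;
    cachedOffset = object.shape.offsets.get(propertyName);
    return object.slots[cachedOffset];
  }

  get.stats = stats;
  return get;
}

const pointShape = createShape(["x", "y"]);
const getX = makeCachedGetter("x");
const points = [createObject(pointShape, [1, 2]), createObject(pointShape, [3, 4])];

console.log(points.map((point) => getX(point))); // [ 1, 3 ]
console.log(getX.stats);                         // { hits: 1, misses: 1 }
```

The first read pays for the lookup; the second object has the same shape, so the cached offset is reused.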
Developer Note - Best Practice!
Best Practice: For optimal performance, try to pass consistent types and values to functions.
For example, if a function is optimized for numeric calculations, avoid passing strings to prevent deoptimization.
// ❌ BAD - Inconsistent types (can cause deoptimization)
function calculate(value) {
  return value * 2;
}
calculate(10); // Number
calculate("hello"); // String - result is NaN, may deoptimize!
calculate(20); // Number
calculate([1,2,3]); // Array - result is NaN, may deoptimize again!
// ✅ GOOD - Consistent types (stays optimized)
function calculateNumber(value) {
  return value * 2;
}
calculateNumber(10); // Number
calculateNumber(20); // Number
calculateNumber(30); // Number
// TurboFan stays happy! Optimized code runs fast!
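One way to apply this in practice is to convert untrusted input once, at the boundary, so the hot function only ever sees numbers. This is a sketch of that idea, not a guaranteed speed-up: `toNumber` is a hypothetical helper, and whether it matters for your code is something to measure.

```javascript
// Hot function: kept simple and only ever called with numbers.
function double(value) {
  return value * 2;
}

// Hypothetical helper: turn external input into a number or fail loudly.
function toNumber(input) {
  const number = typeof input === "number" ? input : Number(input);
  if (Number.isNaN(number)) {
    throw new TypeError(`Expected a numeric value, got ${JSON.stringify(input)}`);
  }
  return number;
}

const rawInputs = [10, "20", 30]; // e.g. values coming from a form or query string
const results = rawInputs.map((input) => double(toNumber(input)));
console.log(results); // [ 20, 40, 60 ]
```

Bad input such as `"hello"` now fails early with a clear error instead of silently producing `NaN` inside the hot path.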
The Complete V8 Pipeline
Complete V8 Engine Flow:
========================
 JavaScript Source Code
            │
            ▼
┌───────────────────────┐
│   LEXICAL ANALYSIS    │
│    (Tokenization)     │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│    SYNTAX ANALYSIS    │
│   (AST Generation)    │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│       IGNITION        │
│     (Interpreter)     │
│    AST → Bytecode     │
└───────────┬───────────┘
            │
    Execute Bytecode
            │
    Is this HOT code?
    (Runs frequently?)
            │
    ┌───────┴───────┐
    │               │
    No             Yes
    │               │
    ▼               ▼
 Continue     ┌───────────┐
   with       │ TURBOFAN  │
 Ignition     │   (JIT)   │
    │         │ Optimize! │
    │         └─────┬─────┘
    │               │
    │           Optimized
    │         Machine Code
    │               │
    │         ┌─────┴─────┐
    │         │           │
    │    Assumption  Assumption
    │     Correct?     Wrong?
    │         │           │
    │         ▼           ▼
    │       Super    Deoptimize
    │       Fast!     (Back to
    │                 Ignition)
    │               │
    └───────┬───────┘
            │
            ▼
        Execution
Different JavaScript Engines
These stages are implemented differently in each JavaScript engine, such as SpiderMonkey or JavaScriptCore. V8 is one of the most widely used engines, since it powers both Chrome and Node.js.
Understanding the structure of the V8 engine is very beneficial for you!
| Engine | Used By | Interpreter | Compiler |
|---|---|---|---|
| V8 | Chrome, Node.js | Ignition | TurboFan |
| SpiderMonkey | Firefox | Interpreter | IonMonkey |
| JavaScriptCore | Safari | LLInt | FTL |
| Chakra | Legacy Microsoft Edge | Interpreter | SimpleJIT, FullJIT |
Explore More!
Now, I'll show you what bytecode looks like. Your task is to explore this code!
Bytecode Examples:
https://github.com/v8/v8/blob/master/test/cctest/interpreter/bytecode_expectations/IfConditions.golden
V8 Official Website:
https://v8.dev/
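You can also dump the bytecode for your own functions. The sketch below assumes you save it as `bytecode-demo.js` (a made-up file name) and run it with the V8 flags shown in the comment, which Node.js passes through to V8. The exact output format differs between Node.js versions, so treat what you see as something to explore rather than a stable format.

```javascript
// bytecode-demo.js (hypothetical file name)
// Run with: node --print-bytecode --print-bytecode-filter=add bytecode-demo.js
function add(a, b) {
  return a + b;
}

// The function has to be called: V8 compiles functions lazily,
// so bytecode for `add` is only generated once it is actually needed.
console.log(add(5, 10)); // 15
```

The filter flag limits the output to the function named `add`; without it, you also get bytecode for Node.js internals, which is a lot to scroll through.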
Quick Recap
| Concept | Description |
|---|---|
| Lexical Analysis | Breaking code into tokens |
| Tokenization | Converting code into keywords, operators, etc. |
| AST | Tree structure representing code syntax |
| Ignition | V8's interpreter (AST → Bytecode) |
| TurboFan | V8's JIT compiler (Bytecode → Machine Code) |
| Hot Code | Frequently executed code |
| Deoptimization | When optimization assumptions fail |
Interview Questions
Q: What is tokenization in V8?
"Tokenization is the process of breaking down JavaScript code into tokens. Each token represents a fundamental element like keywords (var, function), operators (+, =), identifiers (variable names), and literals (values)."
Q: What is an Abstract Syntax Tree (AST)?
"AST is a tree-like data structure that represents the syntactic structure of the source code. Each node in the tree corresponds to a construct in the code like variables, expressions, or statements."
Q: Is JavaScript interpreted or compiled?
"JavaScript is neither purely interpreted nor purely compiled. It uses both techniques. Initially, code is interpreted by Ignition for quick execution. Then, frequently executed (hot) code is compiled by TurboFan using JIT compilation for optimization."
Q: What are Ignition and TurboFan?
"Ignition is V8's interpreter that converts AST to bytecode and executes it. TurboFan is V8's JIT compiler that optimizes frequently executed code by converting bytecode into optimized machine code."
Q: What is deoptimization in V8?
"Deoptimization occurs when TurboFan's optimization assumptions are incorrect. For example, if a function optimized for numbers suddenly receives strings, TurboFan deoptimizes and sends the code back to Ignition."
Key Points to Remember
- Lexical Analysis breaks code into tokens
- Tokens = keywords, operators, identifiers, literals
- AST = tree representation of code structure
- Syntax errors occur when AST can't be generated
- JavaScript uses both interpretation and compilation
- Ignition = interpreter (AST → Bytecode)
- TurboFan = JIT compiler (optimizes hot code)
- Hot code = frequently executed code
- Deoptimization happens when assumptions fail
- Best practice: Use consistent types for better optimization
What's Next?
Now you understand how the V8 engine works behind the scenes! In the next episode, we will:
- Learn about the libuv thread pool
- Understand worker threads
- Explore more about async operations
Keep coding, keep learning! See you in the next one!