Deep Dive into V8 JS Engine | Node.js

Episode 08 - Deep Dive into V8 JS Engine

Hey everyone! Welcome back to the Node.js tutorial series. Today we are going to take a deep dive into the V8 JavaScript Engine!

Understanding how V8 works behind the scenes will make you a better developer and help you write more performant code!

What we will cover:

  • Parsing Stage: Lexical Analysis
  • Tokenization
  • Syntax Analysis and AST
  • Interpreted vs Compiled Languages
  • Ignition (Interpreter)
  • TurboFan (JIT Compiler)
  • Hot Code Optimization
  • Deoptimization

Behind the Scenes: Parsing Stage in V8 Engine

When JavaScript code is executed, it goes through several stages in the V8 engine. The first stage is parsing, which includes lexical analysis (tokenization) and syntax analysis (AST generation).

V8 Engine Pipeline:
===================

Your JavaScript Code
        │
        ▼
┌─────────────────────┐
│  A) PARSING STAGE   │
│  - Lexical Analysis │
│  - Tokenization     │
│  - Syntax Analysis  │
│  - AST Generation   │
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│  B) INTERPRETATION  │
│  - Ignition         │
│  - Bytecode         │
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│  C) OPTIMIZATION    │
│  - TurboFan         │
│  - Machine Code     │
└─────────────────────┘

A) Parsing Stage: Lexical Analysis and Tokenization

1. Lexical Analysis

Purpose: The main goal of lexical analysis is to break down the raw JavaScript code into manageable pieces called tokens.

Process:

Input Code:
===========
var a = 10;

Lexical Analysis Process:
=========================
The code is scanned character by character
to identify individual tokens.

2. What is Tokenization?

Definition: Tokenization is the process of converting code into a series of tokens. Each token represents a fundamental element of the language, such as keywords, operators, identifiers, and literals.

Example:

For the code below, the tokens might be:

Tokenization Example:
=====================

var a = 10;

Token 1: var     → (keyword)
Token 2: a       → (identifier)
Token 3: =       → (operator)
Token 4: 10      → (literal)
Token 5: ;       → (punctuation)

Why Tokenization?

Tokenization helps the V8 engine to read and understand the code more effectively by breaking it down into smaller, more manageable pieces. This step is crucial for further analysis and compilation.

3. Output

Tokens: The result of tokenization is a list of tokens that the V8 engine uses in subsequent stages of parsing.

Another Example:
================

function add(a, b) {
    return a + b;
}

Tokens:
-------
function  → (keyword)
add       → (identifier)
(         → (punctuation)
a         → (identifier)
,         → (punctuation)
b         → (identifier)
)         → (punctuation)
{         → (punctuation)
return    → (keyword)
a         → (identifier)
+         → (operator)
b         → (identifier)
;         → (punctuation)
}         → (punctuation)
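The scanning and tokenization steps above can be sketched as a tiny tokenizer. This is a minimal sketch, assuming a toy subset of JavaScript (a few keywords, identifiers, whole-number literals, and single-character operators and punctuation). The names tokenize, KEYWORDS, OPERATORS, and PUNCTUATION are hypothetical; V8's real scanner is written in C++ and handles the full language.

```javascript
// Toy tokenizer for a tiny JavaScript subset (illustrative only, not V8's scanner).
const KEYWORDS = new Set(["var", "let", "const", "function", "return"]);
const OPERATORS = new Set(["=", "+", "-", "*", "/"]);
const PUNCTUATION = new Set([";", ",", "(", ")", "{", "}"]);

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      // Whitespace separates tokens but produces none.
      i++;
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Read a whole word, then decide: keyword or identifier?
      let word = "";
      while (i < source.length && /[\w$]/.test(source[i])) word += source[i++];
      tokens.push({ type: KEYWORDS.has(word) ? "keyword" : "identifier", value: word });
    } else if (/[0-9]/.test(ch)) {
      // Read consecutive digits as one numeric literal.
      let digits = "";
      while (i < source.length && /[0-9]/.test(source[i])) digits += source[i++];
      tokens.push({ type: "literal", value: Number(digits) });
    } else if (OPERATORS.has(ch)) {
      tokens.push({ type: "operator", value: ch });
      i++;
    } else if (PUNCTUATION.has(ch)) {
      tokens.push({ type: "punctuation", value: ch });
      i++;
    } else {
      throw new SyntaxError(`Invalid or unexpected token '${ch}' at position ${i}`);
    }
  }
  return tokens;
}

for (const { type, value } of tokenize("var a = 10;")) {
  console.log(`${String(value).padEnd(8)}→ (${type})`);
}
```

Running it prints the same five tokens listed earlier for var a = 10;, and tokenize("function add(a, b) { return a + b; }") yields the fourteen tokens of the second example.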

In the parsing stage, lexical analysis breaks down JavaScript code into tokens. This process helps the V8 engine understand and process the code by converting it into a format that can be easily analyzed and optimized in later stages.

2nd Stage: Syntax Analysis and Abstract Syntax Tree (AST)

After the lexical analysis and tokenization stages, the next step in the parsing process is syntax analysis. In this stage, the tokens are converted into an Abstract Syntax Tree (AST).

1. Syntax Analysis

Purpose: To analyze the syntactic structure of the tokens and build the Abstract Syntax Tree (AST).

The tokens are analyzed according to the grammar rules of JavaScript to create a hierarchical tree structure that represents the code.

2. Abstract Syntax Tree (AST)

The AST is a tree-like data structure that represents the syntactic structure of the source code. Each node in the tree corresponds to a construct in the code, such as variables, expressions, or statements.

Example:

For the code:

var a = 10;

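The AST might look something like this (shown in the ESTree format used by tools such as AST Explorer; V8's internal AST uses its own node classes, but the idea is the same):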

Abstract Syntax Tree:
=====================

        Program
           │
           ▼
    VariableDeclaration
           │
    ┌──────┴──────┐
    │             │
    ▼             ▼
  kind         declarations
  "var"            │
                   ▼
           VariableDeclarator
                   │
            ┌──────┴──────┐
            │             │
            ▼             ▼
           id           init
            │             │
            ▼             ▼
       Identifier      Literal
        name: "a"     value: 10
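Syntax analysis can be sketched as a small recursive-descent parser that reads tokens in order and builds the tree shown above. This is a minimal sketch, assuming tokens shaped like { type, value } and supporting only statements of the form var <identifier> = <number>;. The function names parseProgram, expect, and parseVariableDeclaration are hypothetical, and the output uses ESTree-style node names rather than V8's internal AST classes. The last lines also show where a syntax error comes from: the parser gives up as soon as a token does not fit the grammar.

```javascript
// Toy recursive-descent parser for statements like: var <identifier> = <number>;
// Builds ESTree-style nodes (the format AST Explorer shows), not V8's internal AST.
function parseProgram(tokens) {
  let pos = 0;

  // Consume the next token if it has the expected type (and value, if given);
  // otherwise stop with a syntax error, just as a real parser would.
  function expect(type, value) {
    const token = tokens[pos];
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      const found = token ? `'${token.value}'` : "end of input";
      throw new SyntaxError(`Unexpected token ${found}`);
    }
    pos++;
    return token;
  }

  function parseVariableDeclaration() {
    const kind = expect("keyword", "var").value;
    const id = expect("identifier");
    expect("operator", "=");
    const init = expect("literal");
    expect("punctuation", ";");
    return {
      type: "VariableDeclaration",
      kind,
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: id.value },
          init: { type: "Literal", value: init.value },
        },
      ],
    };
  }

  const body = [];
  while (pos < tokens.length) body.push(parseVariableDeclaration());
  return { type: "Program", body };
}

// Tokens for: var a = 10;
const tokens = [
  { type: "keyword", value: "var" },
  { type: "identifier", value: "a" },
  { type: "operator", value: "=" },
  { type: "literal", value: 10 },
  { type: "punctuation", value: ";" },
];
console.log(JSON.stringify(parseProgram(tokens), null, 2));

// Tokens for: var = 10;  (the identifier is missing)
try {
  parseProgram(tokens.filter((t) => t.type !== "identifier"));
} catch (err) {
  console.log(`${err.name}: ${err.message}`); // SyntaxError: Unexpected token '='
}
```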

Website: You can explore AST structures using tools like AST Explorer, which provides a visual representation of the AST for various pieces of code.

Visit: https://astexplorer.net/

3. Interesting Fact: Syntax Errors

Syntax Errors: When the V8 engine reads code, it processes tokens one by one. If an unexpected token is encountered that does not fit the grammar rules, a syntax error occurs.

This is because the AST cannot be generated if the code does not adhere to the expected syntax, indicating that something is wrong with the structure of the code.

Syntax Error Example:
=====================

// Invalid code
var = 10;

// V8 encounters unexpected token
// Expected: identifier after 'var'
// Got: '=' operator

SyntaxError: Unexpected token '='
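You can watch V8's own parser reject this code without crashing your program by passing the source to the Function constructor, which parses the body string when the function is created. A minimal sketch, assuming Node.js or another V8-based runtime; checkSyntax is a hypothetical helper, and because error wording can differ between engines and versions, it relies only on the error being a SyntaxError. The second part shows that parsing comes before execution: the valid first statement never runs.

```javascript
// Ask the engine's parser whether a snippet is valid, without running it.
function checkSyntax(source) {
  try {
    new Function(source); // parses (compiles) the body string but does not call it
    return { ok: true };
  } catch (err) {
    if (err instanceof SyntaxError) return { ok: false, message: err.message };
    throw err; // anything other than a parse error is unexpected here
  }
}

console.log(checkSyntax("var a = 10;")); // { ok: true }
console.log(checkSyntax("var = 10;"));   // ok: false, plus the engine's error message

// Parsing happens before execution, so the valid first statement never runs.
globalThis.ranFirstStatement = false;
try {
  new Function("globalThis.ranFirstStatement = true; var = 10;")();
} catch (err) {
  console.log(err.name, globalThis.ranFirstStatement); // SyntaxError false
}
```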

In the syntax analysis stage, the tokens are analyzed to create an Abstract Syntax Tree (AST), which provides a structured and hierarchical representation of the code. This tree structure helps the V8 engine understand and process the code more effectively.

B) Interpreter and Compilation

1. Interpreted vs. Compiled Languages

Interpreted Languages:

  • Definition: These languages are executed line by line. The interpreter reads and executes the code directly.
  • Pros: Faster to start executing code, easier to debug.
  • Cons: Slower execution compared to compiled languages because of the line-by-line interpretation.
  • Example: Python (strictly speaking, CPython first compiles source code to bytecode and then interprets that bytecode, much like V8 does with Ignition)

Interpreted Language:
=====================

Line 1  →  Read  →  Execute
Line 2  →  Read  →  Execute
Line 3  →  Read  →  Execute
...

No pre-compilation step!

Compiled Languages:

  • Definition: These languages are first translated into machine code (binary code) through a process called compilation. The machine code is then executed by the computer's hardware.
  • Pros: Faster execution because the code is pre-compiled into machine code.
  • Cons: Longer initial compilation time, more complex debugging process.
  • Example: C, C++

Compiled Language:
==================

Source Code
    │
    ▼
Compiler (Takes time)
    │
    ▼
Machine Code (Binary)
    │
    ▼
Execute (Very Fast!)
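The trade-off can be felt in miniature with a tiny expression language. In the sketch below, the hypothetical functions interpret and compile evaluate the same AST for x * 2 + 1 in two ways: interpret walks the tree on every evaluation, while compile walks it once up front and returns a JavaScript closure that can be called again and again. Closures are not machine code, so treat this as an analogy for "translate once, run many times", not as how a C compiler or V8 actually generates code.

```javascript
// AST for the expression: x * 2 + 1
const ast = {
  type: "Add",
  left: { type: "Mul", left: { type: "Var", name: "x" }, right: { type: "Num", value: 2 } },
  right: { type: "Num", value: 1 },
};

// Interpreter: inspects every node again each time the expression is evaluated.
function interpret(node, env) {
  switch (node.type) {
    case "Num": return node.value;
    case "Var": return env[node.name];
    case "Add": return interpret(node.left, env) + interpret(node.right, env);
    case "Mul": return interpret(node.left, env) * interpret(node.right, env);
    default: throw new Error(`Unknown node type: ${node.type}`);
  }
}

// "Compiler": inspects each node once and returns a closure;
// calling the closure later skips the tree walk and the switch entirely.
function compile(node) {
  switch (node.type) {
    case "Num": {
      const value = node.value;
      return () => value;
    }
    case "Var": {
      const name = node.name;
      return (env) => env[name];
    }
    case "Add": {
      const left = compile(node.left);
      const right = compile(node.right);
      return (env) => left(env) + right(env);
    }
    case "Mul": {
      const left = compile(node.left);
      const right = compile(node.right);
      return (env) => left(env) * right(env);
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

const compiled = compile(ast); // one-time translation cost
for (const x of [1, 5, 10]) {
  console.log(interpret(ast, { x }), compiled({ x })); // same answer both ways
}
```

The more often the expression runs, the more the one-time cost of compile is spread out, which is the same reasoning behind compiling only frequently executed code.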

Q: Is JavaScript an interpreted or a compiled language? 🤨

A: JavaScript is neither purely interpreted nor purely compiled. It utilizes a combination of both techniques!

JavaScript = Interpreted + Compiled (JIT)
=========================================

      ┌─────────────┐
      │  Your Code  │
      └──────┬──────┘
             │
             ▼
      ┌─────────────┐
      │   Parsing   │
      │    (AST)    │
      └──────┬──────┘
             │
             ▼
      ┌─────────────┐
      │  Ignition   │ ← Interpreter
      │ (Bytecode)  │
      └──────┬──────┘
             │
       ┌─────┴─────┐
       │           │
       ▼           ▼
    Execute    Hot Code?
       │           │
       │           ▼
       │    ┌─────────────┐
       │    │  TurboFan   │ ← JIT Compiler
       │    │(Optimized   │
       │    │Machine Code)│
       │    └─────────────┘
       │           │
       └─────┬─────┘
             │
             ▼
        Execution

Interpreter:

  • Initial Execution: V8 first runs code in its interpreter, so the script can start executing right away instead of waiting for everything to be compiled to machine code. This gives fast startup and immediate feedback.

Compiler:

  • Just-In-Time (JIT) Compilation: JavaScript engines like V8 use JIT compilation to improve performance. JIT compilation means compiling code into machine code at runtime, while the program is already running, rather than ahead of time. V8 applies it selectively: frequently executed code paths are compiled into optimized machine code.

1. Abstract Syntax Tree (AST) to Bytecode

AST to Bytecode: After parsing the code and generating the AST, the code is passed to the interpreter. In the V8 engine, this interpreter is called Ignition.

Ignition:

  • Converts the AST into bytecode
  • Bytecode is a lower-level, intermediate representation of the code that the JavaScript engine can execute more efficiently than raw source code
  • Execution: Ignition reads and executes the bytecode one instruction at a time

Ignition Process:
=================

       AST
        │
        ▼
 ┌─────────────┐
 │  Ignition   │
 │(Interpreter)│
 └──────┬──────┘
        │
        ▼
    Bytecode
        │
        ▼
    Execute!
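The AST-to-bytecode step can be sketched with a toy compiler and interpreter. Ignition is a register machine with an accumulator, so this sketch follows the same style, but the opcode names (LdaParam, LdaConst, Star, Add, Mul, Return), the instruction format, and the helpers compile, run, and disassemble are hypothetical simplifications, loosely modeled on but not identical to V8's real bytecodes. To see the real thing, Node.js can print Ignition's bytecode with node --print-bytecode --print-bytecode-filter=add your-script.js, where your-script.js is any file that defines and calls add.

```javascript
// Toy AST → bytecode → execution pipeline in an accumulator style.
// Opcodes and formats are hypothetical, not real V8 bytecodes.

// AST for the body of: function add(a, b) { return a + b; }
const ast = {
  type: "ReturnStatement",
  argument: {
    type: "BinaryExpression",
    operator: "+",
    left: { type: "Identifier", name: "a" },
    right: { type: "Identifier", name: "b" },
  },
};

// Compile an AST node into a flat list of instructions.
// nextReg is the first free temporary register.
function compile(node, params, nextReg = 0) {
  switch (node.type) {
    case "ReturnStatement":
      return [...compile(node.argument, params, nextReg), { op: "Return" }];
    case "Identifier": {
      const index = params.indexOf(node.name);
      if (index === -1) throw new ReferenceError(`${node.name} is not defined`);
      return [{ op: "LdaParam", index }]; // load a parameter into the accumulator
    }
    case "Literal":
      return [{ op: "LdaConst", value: node.value }]; // load a constant into the accumulator
    case "BinaryExpression": {
      const op = { "+": "Add", "*": "Mul" }[node.operator];
      if (!op) throw new Error(`Unsupported operator ${node.operator}`);
      return [
        ...compile(node.left, params, nextReg),      // left value → accumulator
        { op: "Star", reg: nextReg },                // save it in a temp register
        ...compile(node.right, params, nextReg + 1), // right value → accumulator
        { op, reg: nextReg },                        // accumulator = register OP accumulator
      ];
    }
    default:
      throw new Error(`Unsupported node type ${node.type}`);
  }
}

// Execute the instructions one at a time using a single accumulator.
function run(bytecode, args) {
  const registers = [];
  let acc;
  for (const ins of bytecode) {
    switch (ins.op) {
      case "LdaParam": acc = args[ins.index]; break;
      case "LdaConst": acc = ins.value; break;
      case "Star": registers[ins.reg] = acc; break;
      case "Add": acc = registers[ins.reg] + acc; break;
      case "Mul": acc = registers[ins.reg] * acc; break;
      case "Return": return acc;
      default: throw new Error(`Unknown opcode ${ins.op}`);
    }
  }
  return undefined;
}

// Render instructions as text, e.g. "Star 0".
const disassemble = (bytecode) =>
  bytecode.map(({ op, ...operands }) => `${op} ${Object.values(operands).join(", ")}`.trim());

const bytecode = compile(ast, ["a", "b"]);
console.log(disassemble(bytecode).join("\n"));
console.log(run(bytecode, [5, 10])); // 15
```

Because the instruction list is flat, the run loop never revisits the tree, which is one reason bytecode is cheaper to execute than the AST itself.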

2. Just-In-Time (JIT) Compilation

TurboFan:

A compiler within the V8 engine that optimizes frequently executed (hot) code paths.

When Ignition identifies a portion of the code that runs frequently (hot code), it sends this code to TurboFan for optimization.

Optimization: TurboFan converts the bytecode into optimized machine code, which improves performance for repeated executions.

TurboFan Optimization:
======================

    Bytecode
       │
       │ (This code runs 1000 times!)
       │
       ▼
  ┌──────────┐
  │ TurboFan │
  │  (JIT    │
  │ Compiler)│
  └────┬─────┘
       │
       ▼
  Optimized
  Machine Code
       │
       ▼
  Super Fast
  Execution!
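The hand-off from the interpreter to TurboFan can be imitated with a call counter. In this sketch, withTiering and HOT_THRESHOLD are hypothetical, the threshold of 1000 simply mirrors the diagram, and both tiers compute the same result (as real tiers must). V8's actual tiering decisions are internal, change between versions, and are based on more than a raw call count.

```javascript
// Hypothetical "hotness" threshold, mirroring the diagram; real V8 heuristics differ.
const HOT_THRESHOLD = 1000;

// Start with a baseline implementation; once it has been called HOT_THRESHOLD
// times, build the optimized version once and use it from then on.
function withTiering(baseline, optimize) {
  let calls = 0;
  let current = baseline;
  const tiered = (...args) => {
    calls++;
    if (current === baseline && calls >= HOT_THRESHOLD) {
      current = optimize(); // one-time cost, paid only for hot code
      tiered.tier = "optimized";
    }
    return current(...args);
  };
  tiered.tier = "baseline";
  return tiered;
}

const add = withTiering(
  (a, b) => a + b,       // stands in for bytecode run by the interpreter
  () => (a, b) => a + b  // stands in for building TurboFan's optimized machine code
);

for (let i = 0; i < HOT_THRESHOLD - 1; i++) add(i, i);
console.log(add.tier); // baseline
add(1, 2);             // this call makes add "hot"
console.log(add.tier); // optimized
```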

3. Hot Code Optimization and Deoptimization

Hot Code: Refers to code that is executed frequently. TurboFan focuses on optimizing hot code to improve performance.

Optimization Assumptions:

TurboFan makes certain assumptions during optimization based on the types and values observed while the code ran in Ignition (V8 records these observations as type feedback). For example, if a function is optimized with the assumption that it only processes numbers, it will run very efficiently for such cases.

Optimization Example:
=====================

function add(a, b) {
    return a + b;
}

// Called many times with numbers
add(5, 10);
add(20, 30);
add(100, 200);
// ... 1000 more times

TurboFan: "This function always gets numbers!"
         "Let me optimize it for number addition!"

Result: Super fast optimized machine code!
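TurboFan's "this function always gets numbers!" conclusion comes from type feedback that V8 collects while the code runs in Ignition. The sketch below imitates that bookkeeping for add: the hypothetical recordFeedback wrapper records the typeof of each call's arguments and labels the function monomorphic (one combination seen), polymorphic (a few), or megamorphic (many). Those three labels are real V8 vocabulary, but the cutoff of 4 and the data structure here are assumptions for illustration.

```javascript
// Hypothetical cutoff between "polymorphic" and "megamorphic" for this toy.
const MAX_POLYMORPHIC = 4;

// Wrap a function and record which argument-type combinations it has seen.
function recordFeedback(fn) {
  const seen = new Set();
  const wrapped = (...args) => {
    seen.add(args.map((arg) => typeof arg).join(","));
    return fn(...args);
  };
  wrapped.state = () => {
    if (seen.size === 0) return "uninitialized";
    if (seen.size === 1) return `monomorphic (${[...seen][0]})`;
    if (seen.size <= MAX_POLYMORPHIC) return "polymorphic";
    return "megamorphic";
  };
  return wrapped;
}

const add = recordFeedback((a, b) => a + b);
add(5, 10);
add(20, 30);
add(100, 200);
console.log(add.state()); // monomorphic (number,number): a good candidate for number-only code

add("hello", "world");
console.log(add.state()); // polymorphic
```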

Deoptimization:

Scenario: If TurboFan's assumptions turn out to be wrong (e.g., a function that was optimized for numbers receives strings), the type checks that TurboFan built into the optimized code fail, and that code can no longer be used.

Process: In such cases, TurboFan will deoptimize the code and revert it to a less optimized state. The code is then sent back to Ignition for further interpretation and possible re-optimization.

Deoptimization Example:
=======================

function add(a, b) {
    return a + b;
}

// Optimized for numbers
add(5, 10);     // ✅ Fast!
add(20, 30);    // ✅ Fast!

// Suddenly...
add("hello", "world");  // 😱 Strings?!

TurboFan: "Wait! My assumption was wrong!"
         "I optimized for numbers, not strings!"
         "Deoptimizing... sending back to Ignition!"

Result: Code runs slower until re-optimized
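Deoptimization boils down to a guard check inside the specialized code. The sketch below hand-writes what a number-specialized add conceptually does: check the assumption first, and if it fails, bail out to the generic path and stop using the fast version. makeSpeculativeAdd, genericAdd, and status are hypothetical names; real TurboFan output is machine code, and V8's deoptimizer also rebuilds the interpreter's frame so execution can continue in Ignition, which this toy does not attempt.

```javascript
// Generic path: handles any types, like the interpreter running bytecode.
const genericAdd = (a, b) => a + b;

function makeSpeculativeAdd() {
  let optimized = true; // pretend TurboFan already specialized add() for numbers
  let deoptCount = 0;

  function add(a, b) {
    if (optimized) {
      // Guard: the specialized code is only valid if both inputs are numbers.
      if (typeof a === "number" && typeof b === "number") {
        return a + b; // fast path: plain numeric addition
      }
      // Assumption violated: "deoptimize" and fall back to the generic path.
      optimized = false;
      deoptCount++;
    }
    return genericAdd(a, b);
  }

  add.status = () => ({ optimized, deoptCount });
  return add;
}

const add = makeSpeculativeAdd();
console.log(add(5, 10), add.status());            // 15 { optimized: true, deoptCount: 0 }
console.log(add("hello", "world"), add.status()); // helloworld { optimized: false, deoptCount: 1 }
```

A fuller model would keep counting calls after the bailout and re-optimize once fresh feedback settles, matching the "possible re-optimization" step described above.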

Key Terms

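Inline Caching: A technique used to speed up property access by caching, at each place in the code where a property is accessed, where that property was found for the object shapes (hidden classes) seen so far. The next access with the same shape can skip the full lookup.

Copy Elision: An optimization technique (a term best known from C++ compilers) that eliminates unnecessary copying of objects.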
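Inline caching can be illustrated with a lookup that remembers, for one property-access site, where the property sat for the last object shape it saw. In this sketch, an object's shape is approximated by its ordered list of keys, and createPropertySite, load, hits, and misses are hypothetical names. V8 tracks shapes with internal hidden classes (called maps) and caches property offsets directly in generated code, so this plain-JavaScript version models only the hit/miss bookkeeping, not the speedup.

```javascript
// Toy monomorphic inline cache for reading one property at one access site.
function createPropertySite(propertyName) {
  let cachedShape = null; // shape seen last time
  let cachedIndex = -1;   // where the property sat in objects of that shape
  const stats = { hits: 0, misses: 0 };

  function load(obj) {
    const keys = Object.keys(obj);
    const shape = keys.join(","); // stand-in for V8's hidden class (map)
    if (shape === cachedShape) {
      stats.hits++; // fast path: reuse the remembered position
    } else {
      stats.misses++; // slow path: full lookup, then cache it for this shape
      cachedShape = shape;
      cachedIndex = keys.indexOf(propertyName);
    }
    return cachedIndex === -1 ? undefined : Object.values(obj)[cachedIndex];
  }

  return { load, stats };
}

const getX = createPropertySite("x");
getX.load({ x: 1, y: 2 }); // miss: first time this shape is seen
getX.load({ x: 3, y: 4 }); // hit: same shape
getX.load({ y: 5, x: 6 }); // miss: same keys, different order
console.log(getX.stats);   // { hits: 1, misses: 2 }
```

The third call misses even though the keys are the same, because they were added in a different order; in V8 this also produces a different hidden class, which is one more reason to create objects consistently.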

Developer Note - Best Practice!

Best Practice: For optimal performance, try to pass consistent types and values to functions.

For example, if a function is optimized for numeric calculations, avoid passing strings to prevent deoptimization.

// ❌ BAD - Inconsistent types (can lead to deoptimization)
function calculate(value) {
    return value * 2;
}

calculate(10);      // Number
calculate("hello"); // String → NaN; if calculate() is hot and optimized for numbers, this can deoptimize it
calculate(20);      // Number
calculate([1,2,3]); // Array → NaN; yet another type, possibly another deoptimization


// ✅ GOOD - Consistent types (more likely to stay optimized)
// Same body as above; the difference is only in the types passed in.
function calculateNumber(value) {
    return value * 2;
}

calculateNumber(10);  // Number
calculateNumber(20);  // Number
calculateNumber(30);  // Number
// TurboFan stays happy! Optimized code runs fast!
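To watch this in a real engine, the script below warms up calculate with numbers and then passes other types. It is a sketch assuming Node.js, meant to be saved under any file name (calculate.js here) and run as node --trace-opt --trace-deopt calculate.js. Both flags are real V8 options, but their log format is internal and varies between versions, so no output is reproduced here, and whether and when V8 optimizes depends on its heuristics. The 100,000-call warm-up is an arbitrary choice; without the flags, the script simply runs and prints its results.

```javascript
// Save as calculate.js and run: node --trace-opt --trace-deopt calculate.js
function calculate(value) {
  return value * 2;
}

// Warm-up: many calls with numbers make calculate() hot, so V8 is likely to optimize it.
let total = 0;
for (let i = 0; i < 100000; i++) {
  total += calculate(i);
}
console.log("warm-up total:", total);

// Break the "always a number" assumption. With --trace-deopt,
// V8 may report a deoptimization around here.
console.log(calculate("hello"));   // NaN: "hello" cannot be converted to a number
console.log(calculate([1, 2, 3])); // NaN: the array becomes "1,2,3", which is not a number
console.log(calculate("21"));      // 42: numeric strings are converted first
```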

The Complete V8 Pipeline

Complete V8 Engine Flow:
========================

        JavaScript Source Code
                │
                ▼
    ┌───────────────────────┐
    │     LEXICAL ANALYSIS  │
    │     (Tokenization)    │
    └───────────┬───────────┘
                │
                ▼
    ┌───────────────────────┐
    │    SYNTAX ANALYSIS    │
    │    (AST Generation)   │
    └───────────┬───────────┘
                │
                ▼
    ┌───────────────────────┐
    │       IGNITION        │
    │    (Interpreter)      │
    │  AST → Bytecode       │
    └───────────┬───────────┘
                │
         Execute Bytecode
                │
        Is this HOT code?
        (Runs frequently?)
                │
        ┌───────┴───────┐
        │               │
       No              Yes
        │               │
        ▼               ▼
    Continue      ┌───────────┐
    with          │ TURBOFAN  │
    Ignition      │  (JIT)    │
        │         │ Optimize! │
        │         └─────┬─────┘
        │               │
        │         Optimized
        │         Machine Code
        │               │
        │         ┌─────┴─────┐
        │         │           │
        │     Assumption   Assumption
        │      Correct?      Wrong?
        │         │           │
        │         ▼           ▼
        │      Super       Deoptimize
        │      Fast!       (Back to
        │         │         Ignition)
        │         │
        └────┬────┘
             │
             ▼
         Execution

Different JavaScript Engines

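These processes work differently in each JavaScript engine, such as SpiderMonkey (Firefox) or JavaScriptCore (Safari), although most follow the same broad pattern of an interpreter plus an optimizing JIT compiler. V8, which powers Chrome and Node.js, is one of the most widely used engines.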

Understanding the structure of the V8 engine is very beneficial for you!

Engine          Used By          Interpreter  Compiler
--------------  ---------------  -----------  ------------------
V8              Chrome, Node.js  Ignition     TurboFan
SpiderMonkey    Firefox          Interpreter  IonMonkey
JavaScriptCore  Safari           LLInt        FTL
Chakra          Legacy Edge      Interpreter  SimpleJIT, FullJIT

Explore More!

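Now let's look at what real bytecode looks like. The link below points to one of V8's bytecode expectation files, which pair small JavaScript snippets with the bytecode Ignition generates for them. Your task is to explore it!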

Bytecode Examples:

https://github.com/v8/v8/blob/master/test/cctest/interpreter/bytecode_expectations/IfConditions.golden

V8 Official Website:

https://v8.dev/

Quick Recap

Concept           Description
----------------  ----------------------------------------------
Lexical Analysis  Breaking code into tokens
Tokenization      Converting code into keywords, operators, etc.
AST               Tree structure representing code syntax
Ignition          V8's interpreter (AST → Bytecode)
TurboFan          V8's JIT compiler (Bytecode → Machine Code)
Hot Code          Frequently executed code
Deoptimization    When optimization assumptions fail

Interview Questions

Q: What is tokenization in V8?

"Tokenization is the process of breaking down JavaScript code into tokens. Each token represents a fundamental element like keywords (var, function), operators (+, =), identifiers (variable names), and literals (values)."

Q: What is an Abstract Syntax Tree (AST)?

"AST is a tree-like data structure that represents the syntactic structure of the source code. Each node in the tree corresponds to a construct in the code like variables, expressions, or statements."

Q: Is JavaScript interpreted or compiled?

"JavaScript is neither purely interpreted nor purely compiled. It uses both techniques. Initially, code is interpreted by Ignition for quick execution. Then, frequently executed (hot) code is compiled by TurboFan using JIT compilation for optimization."

Q: What are Ignition and TurboFan?

"Ignition is V8's interpreter that converts AST to bytecode and executes it. TurboFan is V8's JIT compiler that optimizes frequently executed code by converting bytecode into optimized machine code."

Q: What is deoptimization in V8?

"Deoptimization occurs when TurboFan's optimization assumptions are incorrect. For example, if a function optimized for numbers suddenly receives strings, TurboFan deoptimizes and sends the code back to Ignition."

Key Points to Remember

  • Lexical Analysis breaks code into tokens
  • Tokens = keywords, operators, identifiers, literals
  • AST = tree representation of code structure
  • Syntax errors occur when AST can't be generated
  • JavaScript uses both interpretation and compilation
  • Ignition = interpreter (AST → Bytecode)
  • TurboFan = JIT compiler (optimizes hot code)
  • Hot code = frequently executed code
  • Deoptimization happens when assumptions fail
  • Best practice: Use consistent types for better optimization

What's Next?

Now you understand how the V8 engine works behind the scenes! In the next episode, we will:

  • Learn about the libuv thread pool
  • Understand worker threads
  • Explore more about async operations

Keep coding, keep learning! See you in the next one!