Databases - SQL & NoSQL | Node js

Episode 12 - Databases - SQL & NoSQL

Hey everyone! Welcome back to the Node.js tutorial series. Today we're going to learn about something that every backend developer MUST understand - Databases!

Whether you're building a small app or a large-scale system, you'll need to store data somewhere. Let's dive into the world of databases!

What we will cover:

  • What is a Database?
  • Types of Databases
  • RDBMS - Relational Databases
  • NoSQL Databases
  • MongoDB - Document Database
  • RDBMS vs NoSQL Comparison
  • When to Use What?
  • Interview Questions

What is a Database?

A database is an organized collection of structured information, or data, typically stored electronically in a computer system.

Think of it like a super-powered Excel sheet that can:

  • Store millions of records
  • Search data incredibly fast
  • Handle multiple users at once
  • Keep data safe and consistent
Without Database:
=================
// Storing data in a file
const users = [
    { name: "John", age: 25 },
    { name: "Jane", age: 30 }
];
fs.writeFileSync("users.json", JSON.stringify(users));

Problems:
- What if 1000 users try to write at same time?
- How to search efficiently in millions of records?
- What if the file gets corrupted?
- No relationships between data!

With Database:
==============
- Handles concurrent access
- Lightning fast queries
- Data integrity & backup
- Relationships & constraints

Types of Databases

There are MANY types of databases! Let's look at the most important ones:

Types of Databases:
===================

1. Relational Databases (RDBMS)
   └── MySQL, PostgreSQL, Oracle, SQL Server

2. NoSQL Databases
   ├── Document: MongoDB, CouchDB
   ├── Key-Value: Redis, DynamoDB
   ├── Column-Family: Cassandra, HBase
   └── Graph: Neo4j, Amazon Neptune

3. In-Memory Databases
   └── Redis, Memcached

4. Time Series Databases
   └── InfluxDB, TimescaleDB

5. Graph Databases
   └── Neo4j, Amazon Neptune

6. Distributed SQL
   └── CockroachDB, Google Spanner

7. Object-Oriented Databases
   └── db4o, ObjectDB

8. Hierarchical Databases
   └── IBM IMS

9. Network Databases
   └── IDMS

10. Cloud Databases
    └── Amazon RDS, Google Cloud SQL, Azure SQL

Don't worry! As a beginner, you mainly need to focus on Relational (SQL) and NoSQL databases.

RDBMS - Relational Database Management System

Let's start with the most traditional and widely used type of database!

What is RDBMS?

RDBMS stands for Relational Database Management System. It stores data in tables with rows and columns, just like an Excel spreadsheet!

RDBMS Structure:
================

Table: users
+----+----------+-----+------------------+
| id | name     | age | email            |
+----+----------+-----+------------------+
| 1  | John     | 25  | john@email.com   |
| 2  | Jane     | 30  | jane@email.com   |
| 3  | Bob      | 22  | bob@email.com    |
+----+----------+-----+------------------+

- Each row is a RECORD
- Each column is a FIELD
- Tables can be RELATED to each other!

History of RDBMS

The relational database was invented by Edgar F. Codd in 1970 while working at IBM!

Timeline:
=========

1970 → Edgar F. Codd publishes "A Relational Model of Data"
       at IBM Research Laboratory

1974 → IBM creates System R (first RDBMS implementation)

1979 → Oracle releases first commercial RDBMS

1995 → MySQL is released (Open Source!)

1996 → PostgreSQL is released (Open Source!)

Today → RDBMS powers most of the internet!

Codd's 12 Rules

Edgar F. Codd defined 12 rules (actually 13, numbered 0-12) that a database must follow to be considered a true RDBMS!

Codd's Rules (Simplified):
==========================

Rule 0: Foundation Rule
        → Must use relational capabilities to manage data

Rule 1: Information Rule
        → All data must be stored in tables

Rule 2: Guaranteed Access Rule
        → Every data element must be accessible

Rule 3: Systematic Treatment of Null Values
        → NULL means missing/unknown data

Rule 4: Dynamic Online Catalog
        → Database structure stored in tables too

Rule 5: Comprehensive Data Sublanguage Rule
        → Must support a query language (SQL!)

Rule 6: View Updating Rule
        → Views should be updatable

Rule 7: High-Level Insert, Update, Delete
        → Must support set-level operations

Rule 8: Physical Data Independence
        → Changes to storage shouldn't affect apps

Rule 9: Logical Data Independence
        → Changes to tables shouldn't break apps

Rule 10: Integrity Independence
         → Integrity constraints defined in database

Rule 11: Distribution Independence
         → Should work even if data is distributed

Rule 12: Non-subversion Rule
         → Can't bypass integrity using low-level access

Popular RDBMS Systems

MySQL:
======
- Most popular open-source RDBMS
- Powers WordPress, Facebook (partially)
- Great for web applications
- Easy to learn and use

PostgreSQL:
===========
- Advanced open-source RDBMS
- More features than MySQL
- Better for complex queries
- Used by Instagram, Spotify

Oracle:
=======
- Enterprise-grade RDBMS
- Very expensive but powerful
- Used by banks, large corporations

SQL Server:
===========
- Microsoft's RDBMS
- Integrates well with Windows/.NET
- Used in enterprise environments

SQL - Structured Query Language

To interact with RDBMS, we use SQL (Structured Query Language)!

SQL Examples:
=============

-- Create a table
CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    age INT,
    email VARCHAR(255)
);

-- Insert data
INSERT INTO users (name, age, email)
VALUES ('John', 25, 'john@email.com');

-- Read data
SELECT * FROM users WHERE age > 20;

-- Update data
UPDATE users SET age = 26 WHERE name = 'John';

-- Delete data
DELETE FROM users WHERE id = 1;

-- Join tables (POWERFUL!)
SELECT users.name, orders.product
FROM users
JOIN orders ON users.id = orders.user_id;

NoSQL Databases

Now let's talk about the newer kid on the block - NoSQL!

What is NoSQL?

NoSQL stands for "Not Only SQL" (not "No SQL"!). These databases don't use traditional table-based structure.

NoSQL = Not Only SQL
====================

- Doesn't mean "No SQL at all"
- Means "Not ONLY SQL" - more flexible!
- Different data models
- Schema-less or flexible schema
- Built for scale and speed

History of NoSQL

Timeline:
=========

1998 → Carlo Strozzi creates first "NoSQL" database
       (But it was still relational!)

2000s → Internet boom creates new challenges
        - Millions of users
        - Huge amounts of data
        - Need for speed and scale

2007 → Amazon publishes Dynamo paper

2008 → Facebook releases Cassandra

2009 → MongoDB is released!
       The term "NoSQL" gains popularity

Today → NoSQL is essential for modern apps!

Why NoSQL Was Created

The Problem with RDBMS at Scale:
================================

Imagine you have 100 MILLION users!

RDBMS Challenges:
- Fixed schema (hard to change structure)
- Scaling is expensive (vertical scaling)
- JOINs become slow with huge data
- Not great for unstructured data

NoSQL Solutions:
- Flexible schema (easy to evolve)
- Horizontal scaling (add more servers)
- No complex JOINs needed
- Perfect for unstructured data

Types of NoSQL Databases

1. Document Databases (MongoDB, CouchDB):
=========================================
{
    "_id": "12345",
    "name": "John",
    "age": 25,
    "address": {
        "city": "New York",
        "zip": "10001"
    },
    "hobbies": ["reading", "gaming"]
}

- Data stored as JSON-like documents
- Flexible structure
- Great for: Content management, catalogs

2. Key-Value Stores (Redis, DynamoDB):
======================================
key: "user:123"  →  value: "John Doe"
key: "session:abc" → value: "{...session data...}"

- Simple key-value pairs
- Super fast!
- Great for: Caching, sessions

3. Column-Family (Cassandra, HBase):
====================================
Row Key: "user123"
    Column Family: "profile"
        name: "John"
        age: 25
    Column Family: "activity"
        last_login: "2024-01-15"
        posts: 42

- Data stored in column families
- Great for: Analytics, time-series

4. Graph Databases (Neo4j):
===========================
(John)-[:FRIENDS_WITH]->(Jane)
(Jane)-[:WORKS_AT]->(Google)
(John)-[:LIKES]->(Post123)

- Data stored as nodes and relationships
- Great for: Social networks, recommendations

MongoDB - The Most Popular NoSQL Database

Let's focus on MongoDB since it's the most commonly used NoSQL database with Node.js!

MongoDB Facts:
==============

- Name comes from "huMONGOus" (huge!)
- Released in 2009
- Document-oriented database
- Stores data as BSON (Binary JSON)
- Perfect match for JavaScript/Node.js!

MongoDB Structure

MongoDB vs RDBMS Terminology:
=============================

RDBMS          →    MongoDB
------              -------
Database       →    Database
Table          →    Collection
Row            →    Document
Column         →    Field
Primary Key    →    _id
JOIN           →    Embedding/Referencing
MongoDB Document Example:
=========================

// This is a MongoDB document (in a "users" collection)
{
    "_id": ObjectId("507f1f77bcf86cd799439011"),
    "name": "John Doe",
    "age": 25,
    "email": "john@email.com",
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "zip": "10001"
    },
    "hobbies": ["reading", "coding", "gaming"],
    "createdAt": ISODate("2024-01-15T10:30:00Z")
}

Notice:
- Nested objects (address)
- Arrays (hobbies)
- Auto-generated _id
- Flexible structure!

MongoDB Queries

MongoDB Query Examples:
=======================

// Insert a document
db.users.insertOne({
    name: "John",
    age: 25,
    email: "john@email.com"
});

// Find all users
db.users.find();

// Find users over 20
db.users.find({ age: { $gt: 20 } });

// Update a user
db.users.updateOne(
    { name: "John" },
    { $set: { age: 26 } }
);

// Delete a user
db.users.deleteOne({ name: "John" });

RDBMS vs NoSQL - The Ultimate Comparison

Feature RDBMS (SQL) NoSQL
Data Structure Tables with rows & columns Documents, Key-Value, Graph, etc.
Schema Fixed schema (rigid) Flexible schema (dynamic)
Query Language SQL (standardized) Varies by database
Scaling Vertical (bigger server) Horizontal (more servers)
Relationships JOINs (powerful) Embedding or References
ACID Compliance Strong ACID Eventual consistency (usually)
Best For Complex queries, transactions Large scale, flexible data
Examples MySQL, PostgreSQL, Oracle MongoDB, Redis, Cassandra

When to Use RDBMS?

Use RDBMS When:
===============

✅ You need complex queries with JOINs
✅ Data has clear relationships
✅ ACID transactions are critical
✅ Data structure is well-defined
✅ You need strong data integrity

Examples:
- Banking systems (transactions!)
- E-commerce (orders, inventory)
- ERP systems
- Traditional business applications

When to Use NoSQL?

Use NoSQL When:
===============

✅ You have huge amounts of data
✅ Data structure changes frequently
✅ You need horizontal scaling
✅ Working with unstructured data
✅ Speed is more important than consistency

Examples:
- Social media platforms
- Real-time analytics
- Content management systems
- IoT applications
- Gaming leaderboards
- Session storage

ACID vs BASE

ACID (RDBMS):
=============
A - Atomicity    → All or nothing
C - Consistency  → Data always valid
I - Isolation    → Transactions don't interfere
D - Durability   → Data survives crashes

BASE (NoSQL):
=============
BA - Basically Available  → System always responds
S  - Soft State          → State may change over time
E  - Eventual Consistency → Data syncs eventually

Vertical vs Horizontal Scaling

Vertical Scaling (RDBMS):
=========================

Before:        After:
[Server]  →   [BIGGER Server]
4GB RAM       64GB RAM
2 CPU         16 CPU

- Add more power to ONE server
- Has limits (can't scale forever)
- Expensive at high scale

Horizontal Scaling (NoSQL):
===========================

Before:           After:
[Server]    →    [Server] [Server] [Server]

- Add MORE servers
- Theoretically unlimited
- More cost-effective at scale
- Data distributed across servers

Using Databases with Node.js

// MySQL with Node.js
const mysql = require('mysql2');

const connection = mysql.createConnection({
    host: 'localhost',
    user: 'root',
    database: 'myapp'
});

connection.query('SELECT * FROM users', (err, results) => {
    console.log(results);
});
// MongoDB with Node.js
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');

async function main() {
    await client.connect();
    const db = client.db('myapp');
    const users = await db.collection('users').find().toArray();
    console.log(users);
}

main();

Quick Recap

Concept Description
Database Organized collection of structured data
RDBMS Relational database with tables (MySQL, PostgreSQL)
NoSQL "Not Only SQL" - flexible, scalable databases
SQL Query language for relational databases
MongoDB Popular document-based NoSQL database
Document JSON-like data structure in MongoDB
ACID Transaction properties (Atomicity, Consistency, Isolation, Durability)
Vertical Scaling Adding more power to one server
Horizontal Scaling Adding more servers

Interview Questions

Q: What is the difference between SQL and NoSQL databases?

"SQL databases are relational, store data in tables with fixed schemas, use SQL for queries, and scale vertically. NoSQL databases are non-relational, have flexible schemas, use various query methods, and scale horizontally. SQL is best for complex queries and transactions, while NoSQL excels at handling large-scale, unstructured data."

Q: What is ACID in databases?

"ACID stands for Atomicity (all or nothing), Consistency (data always valid), Isolation (transactions don't interfere), and Durability (data survives crashes). RDBMS systems are ACID compliant, ensuring reliable transactions."

Q: When would you choose MongoDB over MySQL?

"I would choose MongoDB when dealing with large amounts of unstructured or semi-structured data, when the schema needs to evolve frequently, when horizontal scaling is required, or when working with document-based data like content management systems. MySQL is better for complex relational data with transactions."

Q: What is horizontal vs vertical scaling?

"Vertical scaling means adding more resources (CPU, RAM) to a single server - it has limits and gets expensive. Horizontal scaling means adding more servers to distribute the load - it's more scalable and cost-effective. NoSQL databases are designed for horizontal scaling, while RDBMS traditionally uses vertical scaling."

Q: Who invented the relational database model?

"Edgar F. Codd invented the relational database model in 1970 while working at IBM. He defined 12 rules (0-12) that a true RDBMS should follow."

Key Points to Remember

  • Database is an organized collection of data
  • RDBMS uses tables with rows and columns
  • NoSQL means "Not Only SQL" - flexible alternatives
  • SQL is the query language for relational databases
  • MongoDB stores data as JSON-like documents
  • Edgar F. Codd invented RDBMS in 1970
  • RDBMS is ACID compliant (reliable transactions)
  • NoSQL follows BASE (eventual consistency)
  • RDBMS scales vertically, NoSQL scales horizontally
  • Choose based on your data structure and scale needs
  • MySQL/PostgreSQL for relational data with complex queries
  • MongoDB for flexible, large-scale document data

What's Next?

Now you understand the fundamentals of databases! In the upcoming episodes, we will:

  • Connect Node.js to MongoDB
  • Perform CRUD operations
  • Learn about Mongoose ODM
  • Build a complete API with database

Keep coding, keep learning! See you in the next one!