🗄️ Module 12: SQL & Database Querying
This module covers essential data analytics concepts and practical applications.
Advanced Level
⏱️ 45-60 minutes
📚 Topics Covered
-
✓ Database Fundamentals for Analytics
-
✓ SQL Basics: SELECT, WHERE, ORDER BY
-
✓ Filtering & Sorting Data
-
✓ Aggregations & GROUP BY
-
✓ JOINs: Combining Multiple Tables
-
✓ Subqueries & Common Table Expressions
-
✓ Window Functions for Advanced Analytics
-
✓ Query Optimization & Best Practices
🔑 Key Concepts
-
• Writing SQL queries to extract business data
-
• Joining tables to answer complex questions
-
• Aggregating data for reporting and analysis
-
• Using advanced SQL techniques efficiently
-
• Optimizing queries for performance
12.1 Why SQL is Essential for Data Analytics
SQL (Structured Query Language) is the universal language for working with databases. Most business data lives in relational databases.
Why Analysts Need SQL:
- Direct Data Access - Query production databases yourself, no dependency on IT
- Handle Big Data - Analyze millions of rows efficiently on the server
- Data Preparation - Clean, filter, aggregate data before analysis
- Universal Skill - Works with MySQL, PostgreSQL, SQL Server, Oracle, Snowflake
- Integration - Connects to Excel, Python, Tableau, Power BI, Google Sheets
Industry Reality:
According to Stack Overflow's Developer Survey, SQL is consistently in the top 3 most-used languages.
For data analysts specifically, 85%+ of job postings require SQL proficiency. It's the foundation
skill that enables everything else in analytics.
Relational Database Structure:
| Component |
Description |
Example |
| Database |
Container holding related tables |
Sales_DB, HR_Database |
| Table |
Collection of related records (like Excel sheet) |
Customers, Orders, Products |
| Row (Record) |
Single data entry |
One customer, one order |
| Column (Field) |
Attribute or property |
Name, Email, Order_Date |
| Primary Key |
Unique identifier for each row |
Customer_ID, Order_ID |
| Foreign Key |
Reference to primary key in another table |
Customer_ID in Orders table |
Sample Database Schema - E-commerce:
CUSTOMERS (Customer_ID, First_Name, Last_Name, Email, City, State, Join_Date)
ORDERS (Order_ID, Customer_ID, Order_Date, Total_Amount, Status)
ORDER_ITEMS (Item_ID, Order_ID, Product_ID, Quantity, Unit_Price)
PRODUCTS (Product_ID, Product_Name, Category, Price, Stock_Quantity)
Relationships:
• Customers ←→ Orders (one customer can have many orders)
• Orders ←→ Order_Items (one order contains many items)
• Products ←→ Order_Items (one product appears in many order items)
12.2 SQL Basics - SELECT Statement
SELECT retrieves data from tables. It's the foundation of every SQL query.
Basic SELECT Syntax:
Retrieve all columns and rows:
SELECT * FROM Customers;
Retrieve specific columns:
SELECT First_Name, Last_Name, Email
FROM Customers;
Retrieve with calculated column:
SELECT
First_Name,
Last_Name,
YEAR(CURRENT_DATE) - YEAR(Join_Date) AS Years_Member
FROM Customers;
Retrieve unique values (remove duplicates):
SELECT DISTINCT City
FROM Customers;
Count total rows:
SELECT COUNT(*) AS Total_Customers
FROM Customers;
Column Aliases (AS keyword):
Purpose: Rename columns in output for clarity
SELECT
First_Name AS "First",
Last_Name AS "Last",
Total_Amount AS "Order Total"
FROM Orders;
12.3 Filtering Data with WHERE Clause
WHERE filters rows based on conditions - critical for focused analysis.
Comparison Operators:
| Operator |
Meaning |
Example |
| = |
Equal to |
WHERE Status = 'Shipped' |
| != or <> |
Not equal to |
WHERE Status != 'Cancelled' |
| > , >= , < , <= |
Greater/Less than |
WHERE Total_Amount > 100 |
| BETWEEN |
In range (inclusive) |
WHERE Price BETWEEN 10 AND 50 |
| IN |
Matches any in list |
WHERE State IN ('CA','NY','TX') |
| LIKE |
Pattern matching |
WHERE Email LIKE '%@gmail.com' |
| IS NULL |
Missing value |
WHERE Phone IS NULL |
WHERE Clause Examples:
Single condition:
SELECT * FROM Orders
WHERE Order_Date >= '2025-01-01';
Multiple conditions with AND:
SELECT * FROM Products
WHERE Category = 'Electronics'
AND Price > 500;
Multiple conditions with OR:
SELECT * FROM Customers
WHERE State = 'California'
OR State = 'New York';
Combining AND/OR (use parentheses):
SELECT * FROM Orders
WHERE (Status = 'Pending' OR Status = 'Processing')
AND Total_Amount > 1000;
Pattern matching with LIKE:
SELECT * FROM Customers
WHERE Last_Name LIKE 'S%'; -- Starts with S
SELECT * FROM Products
WHERE Product_Name LIKE '%Pro%'; -- Contains "Pro"
12.4 Sorting with ORDER BY
ORDER BY controls the sequence of results - essential for reports and top N analysis.
ORDER BY Examples:
Sort ascending (default):
SELECT * FROM Products
ORDER BY Price;
Sort descending:
SELECT * FROM Orders
ORDER BY Total_Amount DESC;
Sort by multiple columns:
SELECT * FROM Customers
ORDER BY State, City, Last_Name;
Top 10 highest value orders:
SELECT * FROM Orders
ORDER BY Total_Amount DESC
LIMIT 10;
Bottom 5 products by stock:
SELECT Product_Name, Stock_Quantity
FROM Products
ORDER BY Stock_Quantity ASC
LIMIT 5;
Performance Tip: ORDER BY can be slow on large tables. If you only need top N results,
always use LIMIT to reduce processing. Example: "Get top 100" not "Get all then take top 100 in Excel."
12.5 Aggregate Functions & GROUP BY
Aggregate functions summarize data - the backbone of analytical queries.
Core Aggregate Functions:
| Function |
Purpose |
Example |
| COUNT() |
Count rows or non-null values |
COUNT(*), COUNT(Email) |
| SUM() |
Total of numeric column |
SUM(Total_Amount) |
| AVG() |
Average value |
AVG(Price) |
| MIN() |
Minimum value |
MIN(Order_Date) |
| MAX() |
Maximum value |
MAX(Stock_Quantity) |
GROUP BY - Summarizing by Categories:
Total sales by month:
SELECT
DATE_FORMAT(Order_Date, '%Y-%m') AS Month,
COUNT(*) AS Order_Count,
SUM(Total_Amount) AS Total_Revenue,
AVG(Total_Amount) AS Avg_Order_Value
FROM Orders
GROUP BY DATE_FORMAT(Order_Date, '%Y-%m')
ORDER BY Month;
Sales by product category:
SELECT
Category,
COUNT(*) AS Product_Count,
AVG(Price) AS Avg_Price,
MIN(Price) AS Lowest_Price,
MAX(Price) AS Highest_Price
FROM Products
GROUP BY Category
ORDER BY Product_Count DESC;
Customers by state:
SELECT
State,
COUNT(*) AS Customer_Count
FROM Customers
GROUP BY State
ORDER BY Customer_Count DESC
LIMIT 10;
HAVING Clause - Filtering After Aggregation:
Customers who spent over $5,000:
SELECT
Customer_ID,
COUNT(*) AS Order_Count,
SUM(Total_Amount) AS Total_Spent
FROM Orders
GROUP BY Customer_ID
HAVING SUM(Total_Amount) > 5000
ORDER BY Total_Spent DESC;
Note: WHERE filters rows BEFORE grouping
HAVING filters groups AFTER aggregation
12.6 JOINs - Combining Tables
JOINs connect related tables to answer complex business questions.
Types of JOINs:
| JOIN Type |
Returns |
Use Case |
| INNER JOIN |
Only matching rows from both tables |
Orders with customer info (exclude orphans) |
| LEFT JOIN |
All from left table + matches from right |
All customers including those with no orders |
| RIGHT JOIN |
All from right table + matches from left |
Rarely used (use LEFT JOIN instead) |
| FULL OUTER JOIN |
All rows from both tables |
Complete reconciliation, find mismatches |
JOIN Examples:
INNER JOIN - Orders with customer details:
SELECT
o.Order_ID,
o.Order_Date,
c.First_Name,
c.Last_Name,
c.City,
o.Total_Amount
FROM Orders o
INNER JOIN Customers c ON o.Customer_ID = c.Customer_ID
WHERE o.Order_Date >= '2025-01-01';
LEFT JOIN - All customers with order counts:
SELECT
c.Customer_ID,
c.First_Name,
c.Last_Name,
COUNT(o.Order_ID) AS Order_Count,
COALESCE(SUM(o.Total_Amount), 0) AS Total_Spent
FROM Customers c
LEFT JOIN Orders o ON c.Customer_ID = o.Customer_ID
GROUP BY c.Customer_ID, c.First_Name, c.Last_Name
ORDER BY Total_Spent DESC;
Multiple JOINs - Complete order details:
SELECT
o.Order_ID,
c.First_Name || ' ' || c.Last_Name AS Customer,
p.Product_Name,
oi.Quantity,
oi.Unit_Price,
(oi.Quantity * oi.Unit_Price) AS Line_Total
FROM Orders o
INNER JOIN Customers c ON o.Customer_ID = c.Customer_ID
INNER JOIN Order_Items oi ON o.Order_ID = oi.Order_ID
INNER JOIN Products p ON oi.Product_ID = p.Product_ID
WHERE o.Order_Date BETWEEN '2025-01-01' AND '2025-03-31';
12.7 Subqueries & Common Table Expressions (CTEs)
Subqueries and CTEs break complex problems into manageable steps.
Subquery Examples:
Above-average spending customers:
SELECT Customer_ID, Total_Spent
FROM (
SELECT
Customer_ID,
SUM(Total_Amount) AS Total_Spent
FROM Orders
GROUP BY Customer_ID
) AS customer_totals
WHERE Total_Spent > (
SELECT AVG(Total_Spent)
FROM (
SELECT SUM(Total_Amount) AS Total_Spent
FROM Orders
GROUP BY Customer_ID
) AS averages
);
Products never ordered:
SELECT Product_ID, Product_Name
FROM Products
WHERE Product_ID NOT IN (
SELECT DISTINCT Product_ID
FROM Order_Items
);
CTEs - Cleaner and More Readable:
Same query using CTE (WITH clause):
WITH customer_spending AS (
SELECT
Customer_ID,
SUM(Total_Amount) AS Total_Spent
FROM Orders
GROUP BY Customer_ID
),
average_spending AS (
SELECT AVG(Total_Spent) AS avg_amount
FROM customer_spending
)
SELECT
cs.Customer_ID,
cs.Total_Spent,
av.avg_amount AS Average_Spent
FROM customer_spending cs
CROSS JOIN average_spending av
WHERE cs.Total_Spent > av.avg_amount
ORDER BY cs.Total_Spent DESC;
CTE Advantages:
• More readable - each step has a meaningful name
• Reusable - reference same CTE multiple times
• Easier debugging - test each CTE independently
• Better performance in some databases
12.8 Window Functions - Advanced Analytics
Window functions perform calculations across rows while preserving individual row details (unlike GROUP BY which collapses rows).
Common Window Functions:
- ROW_NUMBER() - Unique sequential number for each row
- RANK() - Ranking with gaps for ties (1, 2, 2, 4)
- DENSE_RANK() - Ranking without gaps (1, 2, 2, 3)
- LAG() / LEAD() - Access previous/next row value
- SUM/AVG/COUNT OVER() - Running totals, moving averages
Window Function Examples:
Rank customers by spending:
SELECT
Customer_ID,
SUM(Total_Amount) AS Total_Spent,
RANK() OVER (ORDER BY SUM(Total_Amount) DESC) AS Rank
FROM Orders
GROUP BY Customer_ID
ORDER BY Total_Spent DESC;
Running total of daily revenue:
SELECT
Order_Date,
SUM(Total_Amount) AS Daily_Revenue,
SUM(SUM(Total_Amount)) OVER (
ORDER BY Order_Date
) AS Running_Total
FROM Orders
GROUP BY Order_Date
ORDER BY Order_Date;
Month-over-month growth:
WITH monthly_revenue AS (
SELECT
DATE_FORMAT(Order_Date, '%Y-%m') AS Month,
SUM(Total_Amount) AS Revenue
FROM Orders
GROUP BY DATE_FORMAT(Order_Date, '%Y-%m')
)
SELECT
Month,
Revenue,
LAG(Revenue) OVER (ORDER BY Month) AS Prev_Month,
Revenue - LAG(Revenue) OVER (ORDER BY Month) AS Change,
ROUND(((Revenue - LAG(Revenue) OVER (ORDER BY Month)) /
LAG(Revenue) OVER (ORDER BY Month)) * 100, 2) AS Growth_Pct
FROM monthly_revenue;
12.9 Query Optimization Best Practices
Writing efficient SQL saves time and reduces server load.
Performance Best Practices:
- Select Only What You Need - Avoid SELECT * in production queries
- Use WHERE to Filter Early - Reduce data before JOIN operations
- Index Key Columns - Create indexes on frequently queried/joined columns
- Limit Results - Use LIMIT for testing and previews
- Avoid Functions on Indexed Columns - WHERE YEAR(date) prevents index use
- Use EXPLAIN - Analyze query execution plan to find bottlenecks
Before/After Optimization:
❌ Slow Query:
SELECT *
FROM Orders o
JOIN Customers c ON o.Customer_ID = c.Customer_ID
WHERE YEAR(o.Order_Date) = 2025;
Problems:
• SELECT * retrieves unnecessary columns
• YEAR() function prevents index use on Order_Date
• No LIMIT (returns all rows)
✓ Optimized Query:
SELECT
o.Order_ID,
o.Order_Date,
o.Total_Amount,
c.First_Name,
c.Last_Name
FROM Orders o
JOIN Customers c ON o.Customer_ID = c.Customer_ID
WHERE o.Order_Date >= '2025-01-01'
AND o.Order_Date < '2026-01-01'
LIMIT 1000;
Improvements:
• Specific columns only
• Date range allows index use
• LIMIT controls output size
• Result: 10x-100x faster on large tables
✓ Module 12 Complete
You've learned:
- Database fundamentals and relational structure
- SQL basics: SELECT, WHERE, ORDER BY with multiple examples
- Filtering data with comparison and pattern matching operators
- Aggregating data with COUNT, SUM, AVG, MIN, MAX and GROUP BY
- Joining tables with INNER JOIN, LEFT JOIN for complex queries
- Subqueries and CTEs for multi-step analysis
- Window functions for advanced analytics (rankings, running totals)
- Query optimization techniques for better performance
- Real-world SQL examples ready to adapt for your business
Next: Module 13 covers predictive analytics and machine learning basics.