📊 Data Analytics Mastery Course

Master techniques for collecting, analyzing, and interpreting data to drive informed business decisions and strategic insights.

📚 Total Modules

20

🎯 Skill Levels

All Levels

🌎 Coverage

USA & Canada

⏱️ Total Duration

~20 Hours

🗄️ Module 12: SQL & Database Querying

This module covers essential data analytics concepts and practical applications.

Advanced Level
⏱️ 45-60 minutes

📚 Topics Covered

  • ✓ Database Fundamentals for Analytics
  • ✓ SQL Basics: SELECT, WHERE, ORDER BY
  • ✓ Filtering & Sorting Data
  • ✓ Aggregations & GROUP BY
  • ✓ JOINs: Combining Multiple Tables
  • ✓ Subqueries & Common Table Expressions
  • ✓ Window Functions for Advanced Analytics
  • ✓ Query Optimization & Best Practices

🔑 Key Concepts

  • • Writing SQL queries to extract business data
  • • Joining tables to answer complex questions
  • • Aggregating data for reporting and analysis
  • • Using advanced SQL techniques efficiently
  • • Optimizing queries for performance

12.1 Why SQL is Essential for Data Analytics

SQL (Structured Query Language) is the universal language for working with databases. Most business data lives in relational databases.

Why Analysts Need SQL:

  • Direct Data Access - Query production databases yourself, no dependency on IT
  • Handle Big Data - Analyze millions of rows efficiently on the server
  • Data Preparation - Clean, filter, aggregate data before analysis
  • Universal Skill - Works with MySQL, PostgreSQL, SQL Server, Oracle, Snowflake
  • Integration - Connects to Excel, Python, Tableau, Power BI, Google Sheets
Industry Reality:
According to Stack Overflow's Developer Survey, SQL is consistently in the top 3 most-used languages. For data analysts specifically, 85%+ of job postings require SQL proficiency. It's the foundation skill that enables everything else in analytics.

Relational Database Structure:

Component Description Example
Database Container holding related tables Sales_DB, HR_Database
Table Collection of related records (like Excel sheet) Customers, Orders, Products
Row (Record) Single data entry One customer, one order
Column (Field) Attribute or property Name, Email, Order_Date
Primary Key Unique identifier for each row Customer_ID, Order_ID
Foreign Key Reference to primary key in another table Customer_ID in Orders table

Sample Database Schema - E-commerce:

CUSTOMERS (Customer_ID, First_Name, Last_Name, Email, City, State, Join_Date)
ORDERS (Order_ID, Customer_ID, Order_Date, Total_Amount, Status)
ORDER_ITEMS (Item_ID, Order_ID, Product_ID, Quantity, Unit_Price)
PRODUCTS (Product_ID, Product_Name, Category, Price, Stock_Quantity)

Relationships:
• Customers ←→ Orders (one customer can have many orders)
• Orders ←→ Order_Items (one order contains many items)
• Products ←→ Order_Items (one product appears in many order items)

12.2 SQL Basics - SELECT Statement

SELECT retrieves data from tables. It's the foundation of every SQL query.

Basic SELECT Syntax:

Retrieve all columns and rows:
SELECT * FROM Customers;

Retrieve specific columns:
SELECT First_Name, Last_Name, Email
FROM Customers;

Retrieve with calculated column:
SELECT
  First_Name,
  Last_Name,
  YEAR(CURRENT_DATE) - YEAR(Join_Date) AS Years_Member
FROM Customers;

Retrieve unique values (remove duplicates):
SELECT DISTINCT City
FROM Customers;

Count total rows:
SELECT COUNT(*) AS Total_Customers
FROM Customers;

Column Aliases (AS keyword):

Purpose: Rename columns in output for clarity

SELECT
  First_Name AS "First",
  Last_Name AS "Last",
  Total_Amount AS "Order Total"
FROM Orders;

12.3 Filtering Data with WHERE Clause

WHERE filters rows based on conditions - critical for focused analysis.

Comparison Operators:

Operator Meaning Example
= Equal to WHERE Status = 'Shipped'
!= or <> Not equal to WHERE Status != 'Cancelled'
> , >= , < , <= Greater/Less than WHERE Total_Amount > 100
BETWEEN In range (inclusive) WHERE Price BETWEEN 10 AND 50
IN Matches any in list WHERE State IN ('CA','NY','TX')
LIKE Pattern matching WHERE Email LIKE '%@gmail.com'
IS NULL Missing value WHERE Phone IS NULL

WHERE Clause Examples:

Single condition:
SELECT * FROM Orders
WHERE Order_Date >= '2025-01-01';

Multiple conditions with AND:
SELECT * FROM Products
WHERE Category = 'Electronics'
AND Price > 500;

Multiple conditions with OR:
SELECT * FROM Customers
WHERE State = 'California'
OR State = 'New York';

Combining AND/OR (use parentheses):
SELECT * FROM Orders
WHERE (Status = 'Pending' OR Status = 'Processing')
AND Total_Amount > 1000;

Pattern matching with LIKE:
SELECT * FROM Customers
WHERE Last_Name LIKE 'S%'; -- Starts with S

SELECT * FROM Products
WHERE Product_Name LIKE '%Pro%'; -- Contains "Pro"

12.4 Sorting with ORDER BY

ORDER BY controls the sequence of results - essential for reports and top N analysis.

ORDER BY Examples:

Sort ascending (default):
SELECT * FROM Products
ORDER BY Price;

Sort descending:
SELECT * FROM Orders
ORDER BY Total_Amount DESC;

Sort by multiple columns:
SELECT * FROM Customers
ORDER BY State, City, Last_Name;

Top 10 highest value orders:
SELECT * FROM Orders
ORDER BY Total_Amount DESC
LIMIT 10;

Bottom 5 products by stock:
SELECT Product_Name, Stock_Quantity
FROM Products
ORDER BY Stock_Quantity ASC
LIMIT 5;
Performance Tip: ORDER BY can be slow on large tables. If you only need top N results, always use LIMIT to reduce processing. Example: "Get top 100" not "Get all then take top 100 in Excel."

12.5 Aggregate Functions & GROUP BY

Aggregate functions summarize data - the backbone of analytical queries.

Core Aggregate Functions:

Function Purpose Example
COUNT() Count rows or non-null values COUNT(*), COUNT(Email)
SUM() Total of numeric column SUM(Total_Amount)
AVG() Average value AVG(Price)
MIN() Minimum value MIN(Order_Date)
MAX() Maximum value MAX(Stock_Quantity)

GROUP BY - Summarizing by Categories:

Total sales by month:
SELECT
  DATE_FORMAT(Order_Date, '%Y-%m') AS Month,
  COUNT(*) AS Order_Count,
  SUM(Total_Amount) AS Total_Revenue,
  AVG(Total_Amount) AS Avg_Order_Value
FROM Orders
GROUP BY DATE_FORMAT(Order_Date, '%Y-%m')
ORDER BY Month;

Sales by product category:
SELECT
  Category,
  COUNT(*) AS Product_Count,
  AVG(Price) AS Avg_Price,
  MIN(Price) AS Lowest_Price,
  MAX(Price) AS Highest_Price
FROM Products
GROUP BY Category
ORDER BY Product_Count DESC;

Customers by state:
SELECT
  State,
  COUNT(*) AS Customer_Count
FROM Customers
GROUP BY State
ORDER BY Customer_Count DESC
LIMIT 10;

HAVING Clause - Filtering After Aggregation:

Customers who spent over $5,000:
SELECT
  Customer_ID,
  COUNT(*) AS Order_Count,
  SUM(Total_Amount) AS Total_Spent
FROM Orders
GROUP BY Customer_ID
HAVING SUM(Total_Amount) > 5000
ORDER BY Total_Spent DESC;

Note: WHERE filters rows BEFORE grouping
HAVING filters groups AFTER aggregation

12.6 JOINs - Combining Tables

JOINs connect related tables to answer complex business questions.

Types of JOINs:

JOIN Type Returns Use Case
INNER JOIN Only matching rows from both tables Orders with customer info (exclude orphans)
LEFT JOIN All from left table + matches from right All customers including those with no orders
RIGHT JOIN All from right table + matches from left Rarely used (use LEFT JOIN instead)
FULL OUTER JOIN All rows from both tables Complete reconciliation, find mismatches

JOIN Examples:

INNER JOIN - Orders with customer details:
SELECT
  o.Order_ID,
  o.Order_Date,
  c.First_Name,
  c.Last_Name,
  c.City,
  o.Total_Amount
FROM Orders o
INNER JOIN Customers c ON o.Customer_ID = c.Customer_ID
WHERE o.Order_Date >= '2025-01-01';

LEFT JOIN - All customers with order counts:
SELECT
  c.Customer_ID,
  c.First_Name,
  c.Last_Name,
  COUNT(o.Order_ID) AS Order_Count,
  COALESCE(SUM(o.Total_Amount), 0) AS Total_Spent
FROM Customers c
LEFT JOIN Orders o ON c.Customer_ID = o.Customer_ID
GROUP BY c.Customer_ID, c.First_Name, c.Last_Name
ORDER BY Total_Spent DESC;

Multiple JOINs - Complete order details:
SELECT
  o.Order_ID,
  c.First_Name || ' ' || c.Last_Name AS Customer,
  p.Product_Name,
  oi.Quantity,
  oi.Unit_Price,
  (oi.Quantity * oi.Unit_Price) AS Line_Total
FROM Orders o
INNER JOIN Customers c ON o.Customer_ID = c.Customer_ID
INNER JOIN Order_Items oi ON o.Order_ID = oi.Order_ID
INNER JOIN Products p ON oi.Product_ID = p.Product_ID
WHERE o.Order_Date BETWEEN '2025-01-01' AND '2025-03-31';

12.7 Subqueries & Common Table Expressions (CTEs)

Subqueries and CTEs break complex problems into manageable steps.

Subquery Examples:

Above-average spending customers:
SELECT Customer_ID, Total_Spent
FROM (
  SELECT
    Customer_ID,
    SUM(Total_Amount) AS Total_Spent
  FROM Orders
  GROUP BY Customer_ID
) AS customer_totals
WHERE Total_Spent > (
  SELECT AVG(Total_Spent)
  FROM (
    SELECT SUM(Total_Amount) AS Total_Spent
    FROM Orders
    GROUP BY Customer_ID
  ) AS averages
);

Products never ordered:
SELECT Product_ID, Product_Name
FROM Products
WHERE Product_ID NOT IN (
  SELECT DISTINCT Product_ID
  FROM Order_Items
);

CTEs - Cleaner and More Readable:

Same query using CTE (WITH clause):
WITH customer_spending AS (
  SELECT
    Customer_ID,
    SUM(Total_Amount) AS Total_Spent
  FROM Orders
  GROUP BY Customer_ID
),
average_spending AS (
  SELECT AVG(Total_Spent) AS avg_amount
  FROM customer_spending
)
SELECT
  cs.Customer_ID,
  cs.Total_Spent,
  av.avg_amount AS Average_Spent
FROM customer_spending cs
CROSS JOIN average_spending av
WHERE cs.Total_Spent > av.avg_amount
ORDER BY cs.Total_Spent DESC;
CTE Advantages:
• More readable - each step has a meaningful name
• Reusable - reference same CTE multiple times
• Easier debugging - test each CTE independently
• Better performance in some databases

12.8 Window Functions - Advanced Analytics

Window functions perform calculations across rows while preserving individual row details (unlike GROUP BY which collapses rows).

Common Window Functions:

  • ROW_NUMBER() - Unique sequential number for each row
  • RANK() - Ranking with gaps for ties (1, 2, 2, 4)
  • DENSE_RANK() - Ranking without gaps (1, 2, 2, 3)
  • LAG() / LEAD() - Access previous/next row value
  • SUM/AVG/COUNT OVER() - Running totals, moving averages

Window Function Examples:

Rank customers by spending:
SELECT
  Customer_ID,
  SUM(Total_Amount) AS Total_Spent,
  RANK() OVER (ORDER BY SUM(Total_Amount) DESC) AS Rank
FROM Orders
GROUP BY Customer_ID
ORDER BY Total_Spent DESC;

Running total of daily revenue:
SELECT
  Order_Date,
  SUM(Total_Amount) AS Daily_Revenue,
  SUM(SUM(Total_Amount)) OVER (
    ORDER BY Order_Date
  ) AS Running_Total
FROM Orders
GROUP BY Order_Date
ORDER BY Order_Date;

Month-over-month growth:
WITH monthly_revenue AS (
  SELECT
    DATE_FORMAT(Order_Date, '%Y-%m') AS Month,
    SUM(Total_Amount) AS Revenue
  FROM Orders
  GROUP BY DATE_FORMAT(Order_Date, '%Y-%m')
)
SELECT
  Month,
  Revenue,
  LAG(Revenue) OVER (ORDER BY Month) AS Prev_Month,
  Revenue - LAG(Revenue) OVER (ORDER BY Month) AS Change,
  ROUND(((Revenue - LAG(Revenue) OVER (ORDER BY Month)) /
    LAG(Revenue) OVER (ORDER BY Month)) * 100, 2) AS Growth_Pct
FROM monthly_revenue;

12.9 Query Optimization Best Practices

Writing efficient SQL saves time and reduces server load.

Performance Best Practices:

  • Select Only What You Need - Avoid SELECT * in production queries
  • Use WHERE to Filter Early - Reduce data before JOIN operations
  • Index Key Columns - Create indexes on frequently queried/joined columns
  • Limit Results - Use LIMIT for testing and previews
  • Avoid Functions on Indexed Columns - WHERE YEAR(date) prevents index use
  • Use EXPLAIN - Analyze query execution plan to find bottlenecks

Before/After Optimization:

❌ Slow Query:
SELECT *
FROM Orders o
JOIN Customers c ON o.Customer_ID = c.Customer_ID
WHERE YEAR(o.Order_Date) = 2025;


Problems:
• SELECT * retrieves unnecessary columns
• YEAR() function prevents index use on Order_Date
• No LIMIT (returns all rows)

✓ Optimized Query:
SELECT
  o.Order_ID,
  o.Order_Date,
  o.Total_Amount,
  c.First_Name,
  c.Last_Name
FROM Orders o
JOIN Customers c ON o.Customer_ID = c.Customer_ID
WHERE o.Order_Date >= '2025-01-01'
AND o.Order_Date < '2026-01-01'
LIMIT 1000;


Improvements:
• Specific columns only
• Date range allows index use
• LIMIT controls output size
• Result: 10x-100x faster on large tables

✓ Module 12 Complete

You've learned:

  • Database fundamentals and relational structure
  • SQL basics: SELECT, WHERE, ORDER BY with multiple examples
  • Filtering data with comparison and pattern matching operators
  • Aggregating data with COUNT, SUM, AVG, MIN, MAX and GROUP BY
  • Joining tables with INNER JOIN, LEFT JOIN for complex queries
  • Subqueries and CTEs for multi-step analysis
  • Window functions for advanced analytics (rankings, running totals)
  • Query optimization techniques for better performance
  • Real-world SQL examples ready to adapt for your business

Next: Module 13 covers predictive analytics and machine learning basics.

← Back to All Modules Next Module →