Skip to content

Latest commit

 

History

History
120 lines (97 loc) · 3.09 KB

2-sql-vs-aggregation.mdx

File metadata and controls

120 lines (97 loc) · 3.09 KB
description
Let's compare SQL with an aggregation pipeline.

📘 SQL vs Aggregations

SQL

A SQL query statement usually starts with a SELECT where we put a list of the fields we need, then a FROM clause to state the table/s (or in this case, the subquery) that will return the rows. We can filter out using WHERE and group data using GROUP.

We read this from the inside. If there's too much nesting, it's not easy to follow.

Get authors’ bios with books that have an average 5-star rating

SELECT authors.name, authors.bio
FROM library.authors authors
JOIN library.author_book author_book_join ON authors.id = author_book_join.author_id
JOIN (
    SELECT books.id
    FROM library.books books
    JOIN library.reviews reviews ON books.id = reviews.book_id
    GROUP BY books.id
    HAVING AVG(reviews.rating) = 5
) five_star_books ON author_book_join.book_id = five_star_books.id;

Get annual, average, and max spending from customers in all cities

SELECT
    city,
    SUM(annual_spend) Total_Spend,
    AVG(annual_spend) Average_Spend,
    MAX(annual_spend) Max_Spend,
    COUNT(annual_spend) customers
FROM (
    SELECT t1.city, customer.annual_spend
    FROM customer 
    LEFT JOIN (
        SELECT address.address_id, city.city,
               address.customer_id, address.location
        FROM address LEFT JOIN city
        ON address.city_id = city.city_id
        ) AS t1
    ON
    (customer.customer_id = t1.customer_id AND
              t1.location = "home")
) AS t2
GROUP BY city;

Equivalent MongoDB aggregation pipeline

Get authors’ bios with books that have an average 5-star rating

We go through four stages:

  • Group all the reviews for every book, calculating the average rating.
  • Filter out all average ratings other than 5.
  • Now we have reviews with 5 stars, but we also want the author bio, so we join with author to get the bio.
  • We add a new field called bio with just the author's bio.
db.reviews.aggregate([
  { 
    $group: {
        _id: "$bookId",
        averageRating: {
          $avg: "$rating",
        },
      },
  },
  { $match: { averageRating: 5 } },
  { $lookup: {
        from: "authors",
        localField: "_id",
        foreignField: "books",
        as: "author",
      },
  },
  {$addFields: {
    bio: "$author.bio"
  }},
])

Get annual, average, and max spending from customers in all cities

Here we pass three stages: one to return one document per element in the address array, and then we filter out to get only the documents that have a home address location. Finally, we do the grouping. As we'll see, this can be split and tested separately and resembles our code.

db.customers.aggregate([
	{
		$unwind: "$address",
	},
	{
		$match: {"address.location": "home"}
	},
	{
		 $group: {
			_id: "$address.city",
			totalSpend: {$sum: "$annualSpend"},
			averageSpend: {$avg: "$annualSpend"},
			maximumSpend: {$max: "$annualSpend"},
			customers: {$sum: 1}
		}
	}
])

:::info See also SQL to Aggregation Mapping Chart :::