One of our subscribers reached us and asked How Grouping works in SQL ? So today , this is what we are going to do . We will discuss in detail about grouping in SQL and how it actually works behind the scenes. Also , this is our first post in Questions category and we will be doing more as there are bundle of requests in the queue. So if you want to contribute and like us to write something for you , please do not hesitate.
How Grouping works in SQL?
Grouping summarises the whole data set into smaller chunks of data where each row represents a group . Group is when we combine a group of things, elements under some rule. While working with SQL reports , a lot of times we will come across scenarios where we want to summarise the data. For this purpose we use MS SQL Group By Clause
Also see : MS SQL Group By Clause
because this clause summarise the data into groups . Time to dig in now.
Prepare the data
Examples and queries that we will see here will run on our University Management System database . If you want to use these SQL queries for practice , create the same database , you need to just copy paste the script from here .
Case Study : Grade Reports from University Management System
We are asked to get the top 5 students with top grades . One best query writing practice is to try and visualise the end result and then build the query around it. To do this , you need first to see the underlying data. If you run the below SQL query , you will have a good look at the data we have.
SELECT P.FirstName,C.Title,Grade FROM [School].[dbo].[StudentGrade] SG LEFT JOIN [School].[dbo].Course C ON C.CourseID=SG.CourseID LEFT JOIN [dbo].Person P ON P.PersonID=SG.StudentID
Remember our objective. We need first five students with top marks out of these forty rows. Give a brief and a long look at the table. You are seeing three columns here. Student names , course names and respective grades. Objective is to find the top 5 students .
Let’s break up the table and visualise what is we want. We want two things
- Student Names
- Student Grades
Now it is clear that we want top achievers which are actually the students who have obtained highest marks. How can you find that? By adding marks in all courses of each student.
Aggregate functions in SQL will help us to add the grades . See the Grades columns , its a column of numbers.
Can you guess of which aggregate functions in SQL we will use? SQL Count () or SQL Sum() ?
See : Difference Between SUM() & Count ()
It has to be SQL Group By SUM which will do the job for us. Now , we have already determined that when Grouping clause is in effect each row represents
a group . So to achieve the objective remove Course column from the query
Read : MS SQL Group By Clause
SELECT P.FirstName,GradeFROM [School].[dbo].[StudentGrade] SG LEFT JOIN [School].[dbo].Course C ON C.CourseID=SG.CourseID LEFT JOIN [dbo].Person P ON P.PersonID=SG.StudentID
See the table again. Let’s find total marks of each student. Run this query
SELECT P.FirstName,SUM(GRADE) AS TotalStudentsinCourse FROM [School].[dbo].[StudentGrade] SG LEFT JOIN [School].[dbo].Course C ON C.CourseID=SG.CourseID LEFT JOIN [dbo].Person P ON P.PersonID=SG.StudentID
It will give us out this
We have applied the aggregate function but haven’t told our SQL Server on what basis it should be grouped. We want to group the grades against each student that is why we are writing FirstName in query . To apply SQL aggregate function of SUM on grades column you need to put First Name in Group By Clause. When you do that it will create groups for each student like this
SELECT P.FirstName,SUM(GRADE) AS TotalStudentsinCourse FROM [School].[dbo].[StudentGrade] SG LEFT JOIN [School].[dbo].Course C ON C.CourseID=SG.CourseID LEFT JOIN [dbo].Person P ON P.PersonID=SG.StudentID GROUP BY P.FirstName
All students are appearing only once because all same elements are grouped together . Each student was taking up more than one course so SQL Server first combines the elements into group. In our case , unique student names. Now it will apply SQL Sum for each group. Combine both steps and SQL Server handles bot these steps like this
This is the final output you will see
See how SQL Server reduces the table into groups and then by the use of aggregate functions in SQL it gives us results for those groups. We will use TOP clause with Order By Clause to find the results
SELECT TOP (5) P.FirstName,SUM(GRADE) AS TotalStudentsinCourse FROM [School].[dbo].[StudentGrade] SG LEFT JOIN [School].[dbo].Course C ON C.CourseID=SG.CourseID LEFT JOIN [dbo].Person P ON P.PersonID=SG.StudentID GROUP BY P.FirstName ORDER BY SUM(GRADE) DESC
If you use TOP clause always use Order By clause with it to get correct data. Because , results can be random , so if you apply Order By then the order will be consistent. Topic for another day!
ALL YOU NEED TO KNOW ABOUT GROUPING
- Grouping is applied on all columns as combination in the same order they appear in query.
- Each row represents a unique row.
- Filtering on grouping is done by using HAVING Clause
- Use ORDER BY clause to arrange the data set in a order.
if you can find more points about grouping , do comment and we will add this in our list.
I hope now you understand about How Grouping Works in SQL because this has been a brief article. Now if you like this article , do share with your friends and colleagues and recommend us . Oh! and if the sponsored content is bothering you , apologies for that too because it’s the only source of revenue for this blog.
See you with next post very soon!