4分钟内说明sql窗口分析功能
表中的内容(Table of Content)
- Introduction介绍
- What are Window (Analytic) Functions?什么是窗口(分析)功能?
- Anatomy of a Window Function 窗函数剖析
- Example of an Aggregate Function vs Window Function集合函数与窗口函数的示例
- Advantages of Window Functions窗口功能的优点
介绍(Introduction)
You’re comfortable with simple queries (SELECT FROM WHERE GROUP BY) and you want to take your skills to the next level. What should you learn next?
您对简单的查询(从WHERE GROUP BY中选择)感到满意,并且希望将自己的技能提高到一个新的水平。 接下来您应该学什么?
Window functions, also called analytics functions, are powerful SQL functions that will take you from a beginner SQL user to an intermediate SQL user. In this article, we’ll briefly go over what a window function is and then provide a simple example to give some clarity.
窗口函数(也称为分析函数)是功能强大SQL函数,可将您从初学者SQL用户带到中级SQL用户。 在本文中,我们将简要介绍什么是窗口函数,然后提供一个简单的示例以使其更加清晰。
什么是窗口(分析)功能? (What are Window (Analytic) Functions?)
A window function is like an aggregate function in the sense that it returns aggregate values (eg. SUM(), COUNT(), MAX()).
窗口函数在返回聚合值(例如SUM(),COUNT(),MAX())的意义上类似于聚合函数。
What makes window functions different is that it does not group the result set. The number of rows in the output is the same as the number of rows in the input.
使窗口函数与众不同的原因是它不对结果集进行分组。 输出中的行数与输入中的行数相同。
If this doesn’t make sense to you yet, don’t worry. Stick around until the example and it will make much more sense. :)
如果这对您还没有意义,请不要担心。 坚持学习直到该示例,它将变得更加有意义。 :)
窗函数剖析 (Anatomy of a Window Function)
The basic anatomy of a window function is as follows:
窗口函数的基本结构如下:
SELECT SUM() OVER(PARTITION BY ___ ORDER BY ___) FROM Table
To make it easier to understand, there are three main parts to remember:
为了更容易理解,需要记住三个主要部分:
The aggregate function: In the example above, I used SUM() but you can also use COUNT(), AVG(), etc…
聚合函数:在上面的示例中,我使用了SUM(),但您也可以使用COUNT(),AVG()等…
PARTITION BY: Simply think of this as a GROUP BY clause, but instead, it’s called PARTITION BY.
PARTITION BY:只需将其视为GROUP BY子句,而是将其称为PARTITION BY。
ORDER BY: ORDER BY is the same as you would expect. This is important to consider when the order of your output matters.
ORDER BY: ORDER BY与您期望的相同。 当您的输出顺序很重要时,考虑这一点很重要。
集合函数与窗口函数的示例 (Example of an Aggregate Function vs Window Function)
To provide some more clarity, suppose we have the following table:
为了提供更多的清晰度,假设我们有下表:
If we wanted to get the average GPA by gender, we could use an aggregate function and run the following query to get the following result:
如果要按性别获取平均GPA,可以使用汇总函数并运行以下查询以获取以下结果:
SELECT Gender, AVG(GPA) as avg_gpa
FROM students
GROUP BY Gender
The next part is key.
下一部分是关键。
Now suppose we wanted to get the following output:
现在假设我们想要获得以下输出:
We could use an aggregate function (like the one above) and then join the results to the initial table, but that would require two steps.
我们可以使用一个聚合函数(如上面的函数),然后将结果连接到初始表,但这需要两个步骤。
Instead, we can use a window function to achieve the same output:
相反,我们可以使用窗口函数来实现相同的输出:
SELECT *,
AVG(GPA) OVER (PARTITION BY Gender) as avg_gpa
FROM table
With the query above, we are partitioning (splitting) the data by gender, and calculating the average GPA for each gender. It then creates a new column called avg_gpa and appends the associated average GPA for each row.
通过上面的查询,我们正在按性别对数据进行划分(拆分),并计算每种性别的平均GPA。 然后,它将创建一个称为avg_gpa的新列,并为每行附加关联的平均GPA。
窗口功能的优点 (Advantages of Window Functions)
There are three main advantages to using window functions:
使用窗口功能有三个主要优点:
简单 (Simplicity)
Once you get the hang of it, window functions are much more simple to use. Notice in the example above that only one extra line is required to get the desired output using a window function, as opposed to using an aggregating function and then joining the results.
一旦掌握了窍门,窗口函数就更易于使用。 请注意,在上面的示例中,与使用聚合函数然后合并结果相比,使用窗口函数仅需要多一行即可获得所需的输出。
速度 (Speed)
Related to the point above, using window functions are much faster than using the alternative method. This will come in handy when you’re processing hundreds or thousands of gigabytes of data.
与上述要点相关,使用窗口函数比使用替代方法要快得多。 当您处理成百上千个千兆字节的数据时,这将派上用场。
多功能性 (Versatility)
Most importantly, there’s a ton of versatility with window functions that goes beyond the scope of this article. Some examples include adding a moving average, adding a row number, and lagging the data.
最重要的是,窗口函数具有大量的多功能性,这超出了本文的范围。 一些示例包括添加移动平均线,添加行号和滞后数据。
谢谢阅读! (Thanks for Reading!)
By reading this, you should have an understanding of what a window function is, how to write a window function, and the advantages of a window function.
通过阅读本文,您应该了解什么是窗口函数,如何编写窗口函数以及窗口函数的优点。
Hopefully, the examples have made it easier to understand how a window function works. As always, I wish you the best of luck in your learning endeavors!
希望这些示例可以使您更容易理解窗口函数的工作原理。 一如既往,祝您学习顺利!
特伦斯·辛 (Terence Shin)
Check out my free data science resource with new material every week!
每周查看我的免费数据科学资源以及新材料!
If you enjoyed this, follow me on Medium for more
如果您喜欢这个,请在Medium上关注我以了解更多
Let’s connect on LinkedIn
让我们在LinkedIn上建立联系
翻译自: https://towardsdatascience.com/sql-window-analytic-functions-explained-in-4-minutes-6171c6983263