在业务数据分析中,经常需要统计每周的订单量、销售额等指标,同时计算近3周、近4周的滚动平均值来观察趋势变化,这个需求可以通过SQL的日期处理结合窗口函数高效实现。
第一步:实现按周统计基础指标
首先需要把原始数据按周分组,统计每周的目标指标。这里需要先处理日期字段,将日期转换为对应的周标识,不同数据库的日期函数略有差异,以下是常见数据库的实现方式:
MySQL按周统计示例
MySQL中可以使用YEARWEEK()函数获取年和周的组合标识,第一个参数传日期字段,第二个参数传1表示周从周一开始,符合国内常用的周起始规则:
-- 假设有订单表order_table,包含order_date(订单日期)、order_amount(订单金额)字段
SELECT
YEARWEEK(order_date, 1) AS week_id, -- 周标识,格式为YYYYWW
MIN(order_date) AS week_start, -- 周起始日
MAX(order_date) AS week_end, -- 周结束日
SUM(order_amount) AS week_sales -- 周销售额
FROM order_table
GROUP BY YEARWEEK(order_date, 1)
ORDER BY week_id;
PostgreSQL按周统计示例
PostgreSQL可以使用DATE_TRUNC()函数直接截断到周维度,得到周的起始日期:
SELECT
DATE_TRUNC('week', order_date) AS week_start, -- 周起始日,默认周从周日开始,可通过设置调整
SUM(order_amount) AS week_sales
FROM order_table
GROUP BY DATE_TRUNC('week', order_date)
ORDER BY week_start;
第二步:用窗口函数计算滚动平均
得到按周统计的结果后,就可以使用窗口函数计算滚动平均值。滚动平均属于典型的滑动窗口计算场景,SQL的AVG()函数结合OVER()子句即可实现,核心是通过ROWS或RANGE指定窗口范围。
如果需要计算近3周(包含当前周)的滚动平均,窗口范围可以设置为ROWS BETWEEN 2 PRECEDING AND CURRENT ROW,表示当前行往前2行到当前行的范围:
-- 以上一步的周统计结果作为子查询,计算近3周滚动平均
WITH week_stat AS (
SELECT
YEARWEEK(order_date, 1) AS week_id,
MIN(order_date) AS week_start,
SUM(order_amount) AS week_sales
FROM order_table
GROUP BY YEARWEEK(order_date, 1)
)
SELECT
week_id,
week_start,
week_sales,
-- 计算近3周滚动平均,不足3周时按实际行数计算
AVG(week_sales) OVER (
ORDER BY week_id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS rolling_3w_avg
FROM week_stat
ORDER BY week_id;
日期处理的注意事项
- 周起始日设置:不同数据库的周默认起始日不同,MySQL的
YEARWEEK()第二个参数可以指定,PostgreSQL可以通过SET datestyle调整,避免周划分不符合业务预期。 - 跨年周处理:跨年的周标识需要保证排序正确,比如2023年第52周之后是2024年第1周,按周标识排序时不会出现顺序错乱。
- 缺失周处理:如果某周没有数据,分组后不会出现该周的记录,会导致滚动窗口的行数不对,需要先生成连续的周序列再左关联统计结果,补0处理缺失周。
补充:缺失周补0的完整示例
以下是MySQL中生成连续周序列并补0的完整实现:
-- 生成近12周的连续周序列
WITH RECURSIVE week_series AS (
SELECT
YEARWEEK(DATE_SUB(CURDATE(), INTERVAL 11 WEEK), 1) AS week_id,
DATE_SUB(CURDATE(), INTERVAL 11 WEEK) AS week_start
UNION ALL
SELECT
YEARWEEK(DATE_ADD(week_start, INTERVAL 1 WEEK), 1),
DATE_ADD(week_start, INTERVAL 1 WEEK)
FROM week_series
WHERE week_start < CURDATE()
),
-- 原始周统计
week_stat AS (
SELECT
YEARWEEK(order_date, 1) AS week_id,
SUM(order_amount) AS week_sales
FROM order_table
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 11 WEEK)
GROUP BY YEARWEEK(order_date, 1)
)
-- 关联计算滚动平均
SELECT
ws.week_id,
ws.week_start,
COALESCE(ws2.week_sales, 0) AS week_sales,
AVG(COALESCE(ws2.week_sales, 0)) OVER (
ORDER BY ws.week_id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS rolling_3w_avg
FROM week_series ws
LEFT JOIN week_stat ws2 ON ws.week_id = ws2.week_id
ORDER BY ws.week_id;