Add missing data from previous month or year cumul

2020-02-16 02:13发布

Say I have the following data:

select 1 id, 'A' name, '2007' year, '04' month,  5 sales  from dual union all
select 2 id, 'A' name, '2007' year, '05' month,  2 sales  from dual union all
select 3 id, 'B' name, '2008' year, '12' month,  3 sales  from dual union all
select 4 id, 'B' name, '2009' year, '12' month, 56 sales  from dual union all
select 5 id, 'C' name, '2009' year, '08' month, 89 sales  from dual union all
select 13 id,'B' name, '2016' year, '01' month, 10 sales  from dual union all
select 14 id,'A' name, '2016' year, '02' month,  8 sales  from dual union all
select 15 id,'D' name, '2016' year, '03' month, 12 sales  from dual union all
select 16 id,'E' name, '2016' year, '04' month, 34 sales  from dual

I want to cumulatively add up all the sales across all years and their respective periods (months). The output should look like the following:

name    year    month   sale   opening bal   closing bal
 A      2007     04      5        0              5
 A      2007     05      2        5              7
 B      2008     12      3        12             15
 A      2008     04      0        5              5    -- to be generated
 A      2008     05      0        7              7    -- to be generated
 B      2009     12      56       15             71
 C      2009     08      89       71             160
 A      2009     04      0        5              5    -- to be generated
 A      2009     05      0        7              7    -- to be generated
 B      2016     01      10       278            288
 B      2016     12      0         71             71  -- to be generated
 A      2016     02      8        288            296
 A      2016     04      0         5              5   -- to be generated
 A      2016     05      0         7              7   -- to be generated
 D      2016     03      12       296            308
 E      2016     04      34       308            342
 C      2016     08      0        160            160  -- to be generated

The Opening balance is the closing balance of previous month, and if it goes into next year than the opening balance for next year is the closing balance of the previous year. It should be able to work like this for subsequent years. I've got this part working. However, I don't know how to get around ths missing in say 2009 that exists in 2008. For instance the key A,2008,04 and also A,2008,05 does not exist in 2009 and the code should be able to add it in 2009 like above. Same applies for other years and months.

I'm working on Oracle 12c.

Thanks in advance.

标签: sql oracle plsql
2条回答
The star\"
2楼-- · 2020-02-16 02:46

A variation on @boneists approach, starting with your sample data in a CTE:

with t as (
  select 1 id, 'A' name, '2007' year, '04' month,  5 sales  from dual union all
  select 2 id, 'A' name, '2007' year, '05' month,  2 sales  from dual union all
  select 3 id, 'B' name, '2008' year, '12' month,  3 sales  from dual union all
  select 4 id, 'B' name, '2009' year, '12' month, 56 sales  from dual union all
  select 5 id, 'C' name, '2009' year, '08' month, 89 sales  from dual union all
  select 13 id,'B' name, '2016' year, '01' month, 10 sales  from dual union all
  select 14 id,'A' name, '2016' year, '02' month,  8 sales  from dual union all
  select 15 id,'D' name, '2016' year, '03' month, 12 sales  from dual union all
  select 16 id,'E' name, '2016' year, '04' month, 34 sales  from dual
),
y (year, rnk) as (
  select year, dense_rank() over (order by year)
  from (select distinct year from t)
),
r (name, year, month, sales, rnk) as (
  select t.name, t.year, t.month, t.sales, y.rnk
  from t
  join y on y.year = t.year
  union all
  select r.name, y.year, r.month, 0, y.rnk
  from y
  join r on r.rnk = y.rnk - 1
  where not exists (
    select 1 from t where t.year = y.year and t.month = r.month and t.name = r.name
  )
)
select name, year, month, sales,
  nvl(sum(sales) over (partition by name order by year, month
    rows between unbounded preceding and 1 preceding), 0) as opening_bal,
  nvl(sum(sales) over (partition by name order by year, month
    rows between unbounded preceding and current row), 0) as closing_bal
from r
order by year, month, name;

Which gets the same result too, though it also doesn't match the expected results in the question:

NAME YEAR MONTH      SALES OPENING_BAL CLOSING_BAL
---- ---- ----- ---------- ----------- -----------
A    2007 04             5           0           5
A    2007 05             2           5           7
A    2008 04             0           7           7
A    2008 05             0           7           7
B    2008 12             3           0           3
A    2009 04             0           7           7
A    2009 05             0           7           7
C    2009 08            89           0          89
B    2009 12            56           3          59
B    2016 01            10          59          69
A    2016 02             8           7          15
D    2016 03            12           0          12
A    2016 04             0          15          15
E    2016 04            34           0          34
A    2016 05             0          15          15
C    2016 08             0          89          89
B    2016 12             0          69          69

The y CTE (feel free to use more meaningful names!) generates all the distinct years from your original data, and also adds a ranking, so 2007 is 1, 2008 is 2, 2009 is 3, and 2016 is 4.

The r recursive CTE combines your actual data with dummy rows with zero sales, based on the name/month data from previous years.

From what that recursive CTE produces you can do your analytic cumulative sum to add the opening/closing balances. This is using windowing clauses to decide which sales values to include - essentially the opening and closing balances are the sum of all values up to this point, but opening doesn't include the current row.

查看更多
劫难
3楼-- · 2020-02-16 03:00

This is the closest I can get to your result, although I realise it's not an exact match. For example, your opening balances don't look correct (where did the opening balance of 12 come from for the output row for id = 3?). Anyway, hopefully the following will enable you to amend as appropriate:

with sample_data as (select 1 id, 'A' name, '2007' year, '04' month,  5 sales  from dual union all
                     select 2 id, 'A' name, '2007' year, '05' month,  2 sales  from dual union all
                     select 3 id, 'B' name, '2008' year, '12' month,  3 sales  from dual union all
                     select 4 id, 'B' name, '2009' year, '12' month, 56 sales  from dual union all
                     select 5 id, 'C' name, '2009' year, '08' month, 89 sales  from dual union all
                     select 13 id, 'B' name, '2016' year, '01' month, 10 sales  from dual union all
                     select 14 id, 'A' name, '2016' year, '02' month,  8 sales  from dual union all
                     select 15 id, 'D' name, '2016' year, '03' month, 12 sales  from dual union all
                     select 16 id, 'E' name, '2016' year, '04' month, 34 sales  from dual),
             dts as (select distinct year
                     from   sample_data),
             res as (select sd.name,
                            dts.year,
                            sd.month,
                            nvl(sd.sales, 0) sales,
                            min(sd.year) over (partition by sd.name, sd.month) min_year_per_name_month,
                            sum(nvl(sd.sales, 0)) over (partition by name order by to_date(dts.year||'-'||sd.month, 'yyyy-mm')) - nvl(sd.sales, 0) as opening,
                            sum(nvl(sd.sales, 0)) over (partition by name order by to_date(dts.year||'-'||sd.month, 'yyyy-mm')) as closing
                     from   dts
                            left outer join sample_data sd partition by (sd.name, sd.month) on (sd.year = dts.year))
select name,
       year,
       month,
       sales,
       opening,
       closing
from   res
where  (opening != 0 or closing != 0)
and    year >= min_year_per_name_month
order by to_date(year||'-'||month, 'yyyy-mm'),
         name;

NAME YEAR MONTH      SALES    OPENING    CLOSING
---- ---- ----- ---------- ---------- ----------
A    2007 04             5          0          5
A    2007 05             2          5          7
A    2008 04             0          7          7
A    2008 05             0          7          7
B    2008 12             3          0          3
A    2009 04             0          7          7
A    2009 05             0          7          7
C    2009 08            89          0         89
B    2009 12            56          3         59
B    2016 01            10         59         69
A    2016 02             8          7         15
D    2016 03            12          0         12
A    2016 04             0         15         15
E    2016 04            34          0         34
A    2016 05             0         15         15
C    2016 08             0         89         89
B    2016 12             0         69         69

I've used Partition Outer Join to link any month and name combination in the table (in my query, the sample_data subquery - you wouldn't need that subquery, you'd just use your table instead!) to any year in the same table, and then working out the opening / closing balances. I then discard any rows that have an opening and closing balance of 0.

查看更多
登录 后发表回答