Filter Rows Within Each Group

Question

I have a table t containing minute-level data with fields such as datetime (timestamp), stock_id (stock code), and ret (return). My goal is to:

First, group by date(datetime) and stock_id, and calculate the moving standard deviation mstdp(ret, 5) within each group, labeling it as st.
Then, group by date(datetime) and stock_id again, and filter rows within each group where st is greater than mean(st) + stdp(st).

Here is a minimal data example:

t=table(1:0, `datetime`stock_id`ret,[STRING,SYMBOL,DECIMAL32(4)])

insert into t values ('2023-01-03 09:30:00', `A, 0.012),
('2023-01-03 10:00:00', `A, 0.008),
('2023-01-03 10:30:00', `A, 0.015),
('2023-01-03 11:00:00', `A, -0.005),
('2023-01-03 11:30:00', `A, 0.020),
('2023-01-03 13:00:00', `A, 0.025),
('2023-01-03 13:30:00', `A, 0.018),
('2023-01-03 09:30:00', `B, 0.005),
('2023-01-03 10:00:00', `B, 0.010),
('2023-01-03 10:30:00', `B, 0.003),
('2023-01-03 11:00:00', `B, 0.015),
('2023-01-03 11:30:00', `B, -0.008),
('2023-01-03 13:00:00', `B, 0.022),
('2023-01-04 09:30:00', `A, 0.009),
('2023-01-04 10:00:00', `A, 0.014),
('2023-01-04 10:30:00', `A, 0.007),
('2023-01-04 11:00:00', `A, 0.019),
('2023-01-04 11:30:00', `A, -0.003),
('2023-01-04 09:30:00', `B, 0.012),
('2023-01-04 10:00:00', `B, 0.008),
('2023-01-04 10:30:00', `B, 0.016),
('2023-01-04 11:00:00', `B, -0.002),
('2023-01-04 11:30:00', `B, 0.011);

I attempted the following code:

select myFunc(st) as value
from (
    select datetime, stock_id, mstdp(ret, 5) as st
    from t
    context by date(datetime), stock_id
)
group by date(datetime) as date, stock_id
having st > mean(st) + stdp(st)

However, this returns an error:

The HAVING clause after GROUP BY must be followed by a boolean expression.

I understand that HAVING is typically used for filtering aggregated results, but here I want to evaluate each row's st value within its group rather than filtering aggregated values. How can I correctly implement this type of row-wise filtering within groups in DolphinDB? Should I use WHERE or another approach instead?

I tried it this way: having st > (mean(st) + stdp(st)), but I'm still getting the same error.
– haru, Commented Dec 2 at 2:52
You normally can't (in any RDBMS I have used) reference both st and an aggregated version of st in the same HAVING clause. I suggest you provide a minimal reproducible example to illustrate what you are trying to achieve, data-wise, so that someone can help you build the correct query. As it stands, it's not clear what the desired outcome is.
– Dale K, Commented Dec 2 at 3:10
To calculate aggregate values without GROUP BY (so that you can subsequently filter the rows), use analytic functions. If you add a minimal reproducible example, I'll gladly add example code. docs.dolphindb.com/en/Programming/SQLStatements/…
– MatBailie, Commented Dec 2 at 11:03
You can't filter rows within a group; a group is a single row. As other people have suggested, please update your question with sample data, the result you are trying to achieve, and (if necessary) an explanation of the logic.
– NickW, Commented Dec 2 at 15:36

Robert Hamilton · Accepted Answer · 2025-12-05 21:16:26Z

I can't give a full answer because you are using DolphinDB-specific extensions rather than standard SQL, but the easiest way is to use SQL analytic (window) functions. Here is an ANSI-standard example (using H2 for convenience) of how you might filter the rows of a group based on group aggregates such as the mean and standard deviation:

-- tested in H2
drop table if exists stocks;
create table stocks(a int, x double);
insert into stocks values (1, 1.0), (1, 2.0), (1, 3.0), (1, 1.0);

-- attach each partition's standard deviation and mean to every row,
-- then keep only the rows above mean + std
with t as (
    select
        x,
        stddev(x) over (partition by a) as std,
        avg(x) over (partition by a) as mean
    from stocks
)
select x, std, mean
from t
where x > mean + std;

Output:

X    STD                 MEAN
3.0  0.9574271077563381  1.75


Furqan Masood · Accepted Answer · 2025-12-05 21:20:04Z

The issue is that you're using HAVING after GROUP BY to filter individual rows by comparing them with group-level aggregates. In DolphinDB, HAVING after GROUP BY filters whole groups on aggregate conditions; once the rows are grouped, each group collapses to a single row, so there are no individual rows left to filter.

Here's the correct approach using a window function with context by:

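select *
from (
    select datetime, stock_id, ret, mstdp(ret, 5) as st
    from t
    context by date(datetime), stock_id
)
// a WHERE condition is evaluated before context by, so its aggregates would
// span the whole table; HAVING after context by is evaluated within each
// group and, with a row-level condition like this one, keeps matching rows
context by date(datetime), stock_id
having st > avg(st) + std(st)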

Explanation:

Inner query: Calculate st = mstdp(ret, 5) for each row within (date, stock_id) groups
Outer query with context by: For each row, calculate avg(st) and std(st) within its group, then filter rows where st > avg(st) + std(st)
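
As a cross-check of the two steps above, here is a minimal reference sketch in plain Python rather than DolphinDB. It is an illustration under assumptions, not DolphinDB's implementation: the helper names moving_pstdev and filter_within_groups are hypothetical, rows are assumed to arrive in time order, and the moving window is assumed to yield no value until it is full. It uses the population standard deviation throughout, as the question specifies.

```python
from collections import defaultdict
from statistics import mean, pstdev


def moving_pstdev(values, window):
    """Population std over a trailing window; None until the window is full (assumed)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(pstdev(values[i + 1 - window:i + 1]))
    return out


def filter_within_groups(rows, window=5):
    """rows: iterable of (date, stock_id, ret) tuples, already in time order.

    Returns ((date, stock_id), ret, st) for every row whose st exceeds
    mean(st) + pstdev(st) computed over that row's own group.
    """
    groups = defaultdict(list)
    for date, stock_id, ret in rows:
        groups[(date, stock_id)].append(ret)

    kept = []
    for key, rets in groups.items():
        st = moving_pstdev(rets, window)              # step 1: per-group moving std
        valid = [s for s in st if s is not None]
        if not valid:
            continue                                  # group shorter than the window
        threshold = mean(valid) + pstdev(valid)       # step 2: per-group threshold
        kept.extend(
            (key, ret, s)
            for ret, s in zip(rets, st)
            if s is not None and s > threshold
        )
    return kept
```

Comparing this sketch's output with the query result on a small table is one way to confirm that the threshold is computed per group and not over the whole table.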

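Note: the filter above uses std(), the sample standard deviation (divides by n - 1), whereas the question specifies stdp(), the population standard deviation (divides by n). The two give different thresholds, especially for small groups, so replace std(st) with stdp(st) if you need to match the original definition exactly.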
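
To show that the choice between the two standard deviations can change which rows pass, here is a small Python sketch. The values are made up for illustration; statistics.pstdev and statistics.stdev stand in for stdp() and std() respectively.

```python
from statistics import mean, pstdev, stdev

values = [0.0, 0.6, 1.0]  # hypothetical st values for one group

pop_threshold = mean(values) + pstdev(values)    # population std, like stdp()
sample_threshold = mean(values) + stdev(values)  # sample std, like std()

passes_pop = [v for v in values if v > pop_threshold]
passes_sample = [v for v in values if v > sample_threshold]

print(passes_pop)     # [1.0]
print(passes_sample)  # []
```

With only three values, the sample threshold rises above the largest value, so nothing passes, while the population threshold still lets the largest value through.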

Alternative approach using a subquery:

If you prefer to be more explicit, you can calculate the threshold first:

select t1.*
from (
    select datetime, stock_id, ret, mstdp(ret, 5) as st
    from t
    context by date(datetime), stock_id
) t1
left join (
    select date(datetime) as date, stock_id, avg(st) + std(st) as threshold
    from (
        select datetime, stock_id, mstdp(ret, 5) as st
        from t
        context by date(datetime), stock_id
    )
    group by date(datetime), stock_id
) t2
on date(t1.datetime) = t2.date and t1.stock_id = t2.stock_id
where t1.st > t2.threshold

The first approach is cleaner and more efficient for this use case: it computes st only once and needs no join, whereas the join version evaluates the mstdp subquery twice.
