-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
Currently I am investigating if Pinot can help us to calculate inbytes and outbytes from Radius accounting data.
The setup is:
sysbench (produces artificial accounting data) -> freeradius -> kafka -> pinot
The randomness of the artificial data is low so we end up with several "sessions" having the same Acct-Unique-Session-ID.
e.g. query:
select a.acctuniquesessionid, a.acctstatustype, a.eventtime, a.eventtimestamp from radius_json a
where a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82'
order by a.eventtime
returns:
acctuniquesessionid acctstatustype eventtime eventtimestamp
da0f86138ec36b2364889f048bf2ac82 Start 1712050245468 Apr 2 2024 09:30:45 UTC
da0f86138ec36b2364889f048bf2ac82 Stop 1712050293427 Apr 2 2024 09:31:33 UTC
da0f86138ec36b2364889f048bf2ac82 Start 1712759564017 Apr 10 2024 14:32:43 UTC
da0f86138ec36b2364889f048bf2ac82 Interim-Update 1712759589307 Apr 10 2024 14:33:09 UTC
da0f86138ec36b2364889f048bf2ac82 Interim-Update 1712759612449 Apr 10 2024 14:33:32 UTC
da0f86138ec36b2364889f048bf2ac82 Interim-Update 1712759616370 Apr 10 2024 14:33:36 UTC
da0f86138ec36b2364889f048bf2ac82 Stop 1712759668126 Apr 10 2024 14:34:27 UTC
Obviously there are two different sessions with the same acctuniquesessionid in the data.
In real life we can not guarantee that Session ID's are always unique, so this might also happen in real life data.
The solution is to filter out all 'Start' - 'Stop' combinations where there is a 'Start' or a 'Stop' in between.
And here comes the problem:
query:
select a.acctuniquesessionid, (a.eventtime - b.eventtime)/1000 as Zeit, a.acctstatustype, b.acctstatustype, a.eventtime, b.eventtime
,( select count(c.eventtime) from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid
and a.acctuniquesessionid = c.acctuniquesessionid
and a.eventtime > 0 and b.eventtime > 0
and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' )
and c.eventtime between b.eventtime+1 and a.eventtime-1
)
from radius_json a
join radius_json b
on b.acctuniquesessionid = a.acctuniquesessionid
and a.acctstatustype = 'Stop' and b.acctstatustype = 'Start' -- and a._3gppimsi <> 'null'
and a.eventtime > b.eventtime
where
not exists ( select 1 from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid
and a.eventtime > 0 and b.eventtime > 0
and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' )
and c.eventtime between b.eventtime+1 and a.eventtime-1
)
-- and a.acctuniquesessionid = 'fac07ae7853810946d87e868e463af2c'
-- and a.acctuniquesessionid = 'ac6baa744130522c1eb1eec161114d1b'
and a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82'
order by Zeit desc
Returns:
acctuniquesessionid Zeit acctstatustype acctstatustype eventtime eventtime EXPR$4
da0f86138ec36b2364889f048bf2ac82 104 Stop Start 1712759668126 1712759564017 0
da0f86138ec36b2364889f048bf2ac82 47 Stop Start 1712050293427 1712050245468 0
The two sessions as expected.
Removing the filter a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82' shows a very different result
query:
select a.acctuniquesessionid, (a.eventtime - b.eventtime)/1000 as Zeit, a.acctstatustype, b.acctstatustype, a.eventtime, b.eventtime
,( select count(c.eventtime) from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid
and a.acctuniquesessionid = c.acctuniquesessionid
and a.eventtime > 0 and b.eventtime > 0
and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' )
and c.eventtime between b.eventtime+1 and a.eventtime-1
)
from radius_json a
join radius_json b
on b.acctuniquesessionid = a.acctuniquesessionid
and a.acctstatustype = 'Stop' and b.acctstatustype = 'Start' -- and a._3gppimsi <> 'null'
and a.eventtime > b.eventtime
where
not exists ( select 1 from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid
and a.eventtime > 0 and b.eventtime > 0
and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' )
and c.eventtime between b.eventtime+1 and a.eventtime-1
)
-- and a.acctuniquesessionid = 'fac07ae7853810946d87e868e463af2c'
-- and a.acctuniquesessionid = 'ac6baa744130522c1eb1eec161114d1b'
-- and a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82'
order by Zeit desc
result (shortened by thousands of other sessions):
da0f86138ec36b2364889f048bf2ac82 709422 Stop Start 1712759668126 1712050245468 0
So here obviously both subqueries return wrong results. There is a 'Start' and a 'Stop' between timestamp 1712759668126 and 1712050245468 but it is not detected anymore when the filter for the acctuniquessionid is removed
Am I missing here something or am I hitting a bug in multistage engine?
I am using Pinot 1.1.0 as docker compose image (single instance of all server processes)
The query plan for both queries is the same beside the Logical Filter
first query:
LogicalFilter(condition=[AND(=($14, _UTF-8'da0f86138ec36b2364889f048bf2ac82'), =($12, _UTF-8'Stop'))])
second query:
LogicalFilter(condition=[=($12, _UTF-8'Stop')])
PLEASE NOTE: Some filters are redundant like "b.acctuniquesessionid = c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid" and are a test if the result changes. But it doesn't