
Refer to the example attached below. It consists of two methods of streaming: one via a defined function, the other via a class `append` method. The two methods produce exactly the same results no matter how I tune the replay or subscribeTable settings.

This is a bit confusing. I thought that batchSize=0 in subscribeTable means the engine processes a batch of messages upon arrival, so the number of messages in a batch could be either 1 or many. In the function I take volume_[0], picking the first element of the vector, so I would expect to see something like 0 1 2 3 3 5 5 5 8 9 etc., but it produces strictly increasing results.

So my questions are:

  1. Is the input to the function always a vector of length 1?

  2. Can I confirm that the class `append` method processes data per row, while the function processes per batch, and that the batch size is always 1?

  3. Or are my subscribeTable settings wrong, so that each batch contains only one row? If so, how should I correct them?
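A quick sanity check of the reasoning above, in plain Python rather than DolphinDB (`first_of_batch` is a hypothetical stand-in for the `tes` metric): if calls received multi-row batches and `volume_[0]` were broadcast over each batch, the output would contain repeated values; a strictly increasing output implies every call saw a single-row vector.

```python
def first_of_batch(volume_):
    # stand-in for the DolphinDB metric: return the first element of the vector
    return volume_[0]

volumes = list(range(10))

# multi-row batches of sizes 1, 3, 2, 4 -> the first element repeats per batch
batches = [volumes[0:1], volumes[1:4], volumes[4:6], volumes[6:10]]
multi = [first_of_batch(b) for b in batches for _ in b]
print(multi)   # [0, 1, 1, 1, 4, 4, 6, 6, 6, 6]

# single-row batches -> strictly increasing output, as observed
single = [first_of_batch([v]) for v in volumes]
print(single)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```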



try{unsubscribeTable(,tableName="input_stream",actionName="tes")}catch(ex){}
try{dropStreamTable("input_stream")}catch(ex){}
try{dropStreamTable("volume_stream")}catch(ex){}
try{dropStreamEngine(`tes)}catch(ex){}
go;


max_number = 999999
t = table(
    2020.01.01 00:00:01 + 0..max_number as datetime, 
    0..max_number as `volume,
    take(`apple,max_number+1) as `id
)
// insert into t values(2020.01.01 00:00:00,, )
// t.sortBy!(`datetime)


def tes(volume_,id_){
    return volume_[0], id_[0]
}

class MyCumSum {
    def MyCumSum() {}
    def append(volume_,id_) {
        return volume_, id_
    }
}

share(table=streamTable(1000:0, `datetime`volume`id, [DATETIME,INT,STRING]), sharedName=`input_stream)
share(table=streamTable(1000:0, `datetime`volume`id, [DATETIME,INT,STRING]),sharedName=`volume_stream)
go;

createReactiveStateEngine(
    name="tes", 
    metrics=<[datetime, tes(volume,id) as `volume`id]>,
    // metrics=<[datetime, MyCumSum().append(volume,id)]>,
    dummyTable=input_stream, 
    outputTable=volume_stream
)

subscribeTable(
    tableName="input_stream",
    actionName="tes",
    handler=getStreamEngine("tes"),
    // batchSize=200000,
    // throttle=0.001,
    reconnect=true,
    msgAsTable=true
)

timing = now()
replay(inputTables=t, outputTables = input_stream, timeColumn=`datetime)

do{}while ((select count(*) from volume_stream)["count"][0] < max_number+1)
timing = now() - timing
print(timing)

1 Answer

Based on the example you provided, there seems to be some confusion about how the reactive state engine processes data with different subscription settings. Let me clarify how it works.

In the reactive state engine, when a batch of data enters the engine, rows belonging to the same group are gathered into vectors, and the metric function is called once per group with those vectors as arguments; different groups are processed separately. The input to the function is therefore indeed in vector form.
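The dispatch described above can be sketched in plain Python (not DolphinDB; `dispatch_batch` is a hypothetical helper, and it simplifies away output-row ordering): rows of a batch are grouped by key, the metric function is called once per group with vector arguments, and the scalar result is broadcast back to every row of that group.

```python
def tes(volume_, id_):
    # mirrors the DolphinDB `tes` function: pick the first element of each vector
    return volume_[0], id_[0]

def dispatch_batch(rows):
    """Group a batch of (volume, id) rows by id, call the metric function
    once per group with vector arguments, and broadcast the result back
    to every row of that group."""
    groups = {}
    for v, i in rows:
        vols, ids = groups.setdefault(i, ([], []))
        vols.append(v)
        ids.append(i)
    out = []
    for vols, ids in groups.values():
        result = tes(vols, ids)           # one call per group, vector input
        out.extend([result] * len(vols))  # same result for every row in group
    return out

batch = [(0, "apple"), (1, "amazon"), (2, "apple"), (3, "amazon")]
print(dispatch_batch(batch))
# → [(0, 'apple'), (0, 'apple'), (1, 'amazon'), (1, 'amazon')]
```

This is why, with a keyColumn set, every row of a group carries the first volume seen for that group within the batch.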

You can test this with the example below:

try{unsubscribeTable(,tableName="input_stream",actionName="tes")}catch(ex){}
try{dropStreamTable("input_stream")}catch(ex){}
try{dropStreamTable("volume_stream")}catch(ex){}
try{dropStreamEngine(`tes)}catch(ex){}
go;


max_number = 999999
t = table(
    2020.01.01 00:00:01 + 0..max_number as datetime, 
    0..max_number as `volume,
    take(`apple`amazon,max_number+1) as `id
)
// insert into t values(2020.01.01 00:00:00,, )
// t.sortBy!(`datetime)


def tes(volume_,id_){
    return volume_[0], id_[0]
}

class MyCumSum {
    def MyCumSum() {}
    def append(volume_,id_) {
        return volume_, id_
    }
}

share(table=streamTable(1000:0, `datetime`volume`id, [DATETIME,INT,STRING]), sharedName=`input_stream)
share(table=streamTable(1000:0, `id1`datetime`volume`id, [STRING,DATETIME,INT,STRING]),sharedName=`volume_stream)
go;

createReactiveStateEngine(
    name="tes", 
    metrics=<[datetime, tes(volume,id) as `volume`id]>,
    // metrics=<[datetime, MyCumSum().append(volume,id)]>,
    dummyTable=input_stream, 
    outputTable=volume_stream,
    keyColumn="id"
)

subscribeTable(
    tableName="input_stream",
    actionName="tes",
    handler=getStreamEngine("tes"),
    // batchSize=200000,
    // throttle=0.001,
    reconnect=true,
    msgAsTable=true
)


replay(inputTables=t, outputTables = input_stream, timeColumn=`datetime)


In this case, you can see that rows in the same group return the same result (namely, the first volume of that group).
