
I understand that this is not possible in DynamoDB.

DynamoDB Streams seem to be an option, but...

My struggle involves being able to create aggregates over a specific period of time. I want to be able to get the top XX in the past 24 hours at any point in time.

Assume I have categories such as Topics (e.g., IoT, AI, AML, etc.). I want to count the top trending topics over a period of time, such as the top 10 trending topics in the past 24 or 72 hours, based on the number of posts in these topics.

How can I achieve something like this with DynamoDB?

I use Topic as my PK, and Timestamp#UserId as SK. So each time a user makes a post on a topic, a new document is created. I save some attributes in the document, but that's not relevant here.
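For illustration, a write with that key layout might look like the sketch below. Table and attribute names are hypothetical, and the boto3 import is kept local so the pure key helper runs without the AWS SDK:

```python
import time

def make_sort_key(timestamp: int, user_id: str) -> str:
    """Compose the SK as Timestamp#UserId so items sort by time within a topic."""
    return f"{timestamp}#{user_id}"

def record_post(topic: str, user_id: str) -> None:
    """Write one post item; PK = Topic, SK = Timestamp#UserId."""
    import boto3  # local import: the helper above stays usable without the AWS SDK
    table = boto3.resource("dynamodb").Table("Posts")  # hypothetical table name
    table.put_item(
        Item={
            "Topic": topic,                                       # partition key
            "PostKey": make_sort_key(int(time.time()), user_id),  # sort key
            # ...other post attributes go here...
        }
    )
```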

Assume:

  1. There are thousands of inserts every second (i.e., thousands of posts on different topics)
  2. Getting the trending topics isn't a real-time requirement. It can be something that gets calculated every hour or so and posted somewhere else.

Any pointers on the best approach are appreciated, and whether another stack would make more sense.

Cheers

  • Do you have a fixed range of topics? Do you expect more than 1000 writes per second per topic? Commented Jan 20, 2024 at 14:10
  • Might be a use case for custom CloudWatch Metrics, populated via DynamoDB Streams CDC, and CloudWatch Insights. Commented Jan 20, 2024 at 15:42
  • Having a CloudWatch metric for each of hundreds of topics would be wildly expensive, though that would be a very convenient way to get the time windows. Commented Jan 20, 2024 at 15:45
  • @LeeroyHannigan Not fixed. Topics can be created by any user at any point in time. Kind of like the "hashtag" concept in X/Twitter. Commented Jan 21, 2024 at 0:04
  • @erik258 Correct. And two more issues would be the limitation of the document size (400 KB) and the associated cost, as I'll be charged for the full size every time. Still thinking of another solution, and will post the one I eventually implement. But keen to hear more thoughts and arguments from others. Commented Jan 22, 2024 at 1:00

1 Answer


I ended up doing this:

A Lambda function writes to the DDB aggregate table (and the original table) from the same invocation.

The aggregate table has TimestampHour as PK and Country#City#Topic as SK. Attributes: Count (for now).

The Lambda function finds the right document using the PK and SK, and updates the count.
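That update fits DynamoDB's atomic-counter pattern. A sketch of the UpdateItem arguments, with key and attribute names assumed from the description above:

```python
def build_aggregate_update(timestamp_hour: str, country: str, city: str, topic: str) -> dict:
    """Arguments for an atomic-counter UpdateItem on the aggregate table.

    Key layout from the answer: PK = TimestampHour, SK = Country#City#Topic.
    Attribute names are assumptions; the answer doesn't show code.
    """
    return {
        "Key": {
            "TimestampHour": timestamp_hour,                 # e.g. "2024-01-21T13"
            "CountryCityTopic": f"{country}#{city}#{topic}",
        },
        # ADD creates Count at 1 on the first write and increments it afterwards;
        # the #c alias is needed because COUNT is a DynamoDB reserved word.
        "UpdateExpression": "ADD #c :one",
        "ExpressionAttributeNames": {"#c": "Count"},
        "ExpressionAttributeValues": {":one": 1},
    }
```

A Lambda would pass these straight to `table.update_item(**build_aggregate_update(...))`; since ADD is atomic, no read-before-write is needed.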

EventBridge runs every hour and invokes another Lambda function, which goes through the last 24 hours of documents in the aggregate table, creates trending scores (based on summed counts), and ranks the top 10 in descending order (for each country, each city, and globally). The results are written to S3 as JSON files with a structure that makes it easy to construct URLs and retrieve the trending topics per country, per city, or globally.
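The ranking step of that hourly job can be sketched in pure Python. The item shape is assumed to mirror the aggregate table above; a real job would first Query the last 24 TimestampHour partitions to collect the items:

```python
from collections import defaultdict

def top_topics(items, limit=10):
    """Sum Count per topic across hourly aggregate items, ranked descending."""
    totals = defaultdict(int)
    for item in items:
        topic = item["CountryCityTopic"].split("#")[-1]  # SK = Country#City#Topic
        totals[topic] += item["Count"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```

Per-country and per-city rankings would group on the first or first two SK segments instead of the last.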

CloudFront caches the S3 content, so the app queries that directly. Pretty fast.

The S3 bucket has a lifecycle policy that expires objects that haven't been updated in the last 24 hours.
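Such a policy could be expressed as below (the rule ID and key prefix are assumptions). Because each hourly run overwrites the JSON objects, and overwriting resets an object's age, a plain 1-day expiration removes only snapshots that stopped being refreshed:

```python
# Hypothetical S3 lifecycle rule for the trending snapshots.
lifecycle_config = {
    "Rules": [
        {
            "ID": "expire-stale-trending",
            "Filter": {"Prefix": "trending/"},  # assumed key prefix
            "Status": "Enabled",
            "Expiration": {"Days": 1},
        }
    ]
}
# Applied with: s3.put_bucket_lifecycle_configuration(
#     Bucket="...", LifecycleConfiguration=lifecycle_config)
```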

Seems to do the trick for now; I'll continue to optimise.
