0

Is it possible to limit the number of rows read from a table with an Athena query, and if so how?

I looked at the LIMIT statement, but it seems LIMIT doesn't affect how many rows are scanned, but simply limits the number of rows returned from a query. But I'm not looking to reduce the number of rows returned, but restrict the query to be performed on, say, the top N rows in a table. For example, I have a table that has a date column, call it mydate, that is the same for every row in the table, mytable. I want to return a single value of mydate. I could do this with:

select 
     min(mydate)
from
     mytable

but this will perform a full table scan. What I really want is for a query to take the table I provide it and "look" at the first row and return mydate, so it only has to scan a single record.

I use the SAS statistical software and this can be done in it by running:

data work.temp;         /*create a temp dataset in the working directory*/
set  mytable(obs=2);    /*read in only the first two rows from the mytable table*/
run;

I also use R and in the R language, I could do this even more simply and directly using the following code:

mytable[1, ]$mydate

So, I'm looking for a way, if possible, to accomplish the same thing that SAS and R are doing above.

Thanks.

2
  • 1
    The main way to optimize the big data queries in Athena is using partitions. The engine knows how to translate the hints from your query to the relevant partitions and the folders of the data. Commented May 10, 2024 at 5:14
  • Right. I already have partitions set up, but was just looking for way to further optimize. Commented May 11, 2024 at 18:04

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.