
I have a question regarding SQL performance and was hoping someone would have the answer.

I have the database table tbl_users and I want to get the total number of users I have. I could write it as SELECT COUNT(*) FROM tbl_users. I presume such a query would have performance implications were I to have a handful of users vs. several million of them. (So, assumption #1 is that the more rows I have, the more resources this query will consume.)

In this particular case I need to run this query at a relatively high frequency and each time I need to get up-to-date data (so, caching is not an option).

Assuming my assumption #1 is correct, I then thought of structuring it the following way:

  • create tbl_stats with a field userCounter
  • each time there is an insert in tbl_users, userCounter is updated +1
  • each time I need to get my user count, I can pull that one field from tbl_stats
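The counter-table design above can be sketched quickly. This is a minimal illustration in SQLite via Python, not the asker's actual schema; tbl_users, tbl_stats, and userCounter come from the question, while the trigger name and columns are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users (userID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tbl_stats (userCounter INTEGER NOT NULL);
    INSERT INTO tbl_stats (userCounter) VALUES (0);

    -- Keep the counter in sync on every insert into tbl_users.
    CREATE TRIGGER trg_user_insert AFTER INSERT ON tbl_users
    BEGIN
        UPDATE tbl_stats SET userCounter = userCounter + 1;
    END;
""")

conn.executemany("INSERT INTO tbl_users (name) VALUES (?)",
                 [("alice",), ("bob",), ("carol",)])

# Reading one row from tbl_stats avoids scanning tbl_users at all.
count = conn.execute("SELECT userCounter FROM tbl_stats").fetchone()[0]
print(count)  # 3
```

Note that a real version would also need a matching DELETE trigger, and every insert now serializes on that single counter row.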

Now, I realize that by doing it this way, the data in userCounter is technically a duplicate, which is bad form.

So, will my first query (assuming millions of rows of data) consume enough resources to warrant implementing my alternative design? If so (or possibly so), is my alternative design consistent with best practices?


6 Answers


If your table is indexed, which it almost certainly will be, then the performance of select count(*) probably will not be as bad as you might anticipate - even if you have millions of rows.

But, if it does become a concern, then rather than roll your own solution, look into using an indexed view.


2 Comments

If you are working on a high TPM system, a 1000ms query is very significant
thanks. my main concern was regarding performance, which apparently isn't such a big issue in this case. i also never knew of indexed views and im going to research it a bit.

I have a database table with almost 5 million records; the following query returns in less than a second:

select count(userID) from tblUsers

This query returns in 2 seconds:

select count(*) from tblUsers

I'd personally just go with select count(*) rather than creating a duplicate field.
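One caveat worth adding to the timing comparison above: the two queries are only interchangeable when the counted column is NOT NULL, because COUNT(column) skips NULL values while COUNT(*) counts rows. A small illustration (SQLite via Python; the toy table here is a stand-in for tblUsers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblUsers (userID INTEGER, name TEXT)")
conn.executemany("INSERT INTO tblUsers VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (None, "ghost")])

# COUNT(*) counts rows; COUNT(userID) skips rows where userID is NULL.
total = conn.execute("SELECT COUNT(*) FROM tblUsers").fetchone()[0]
non_null = conn.execute("SELECT COUNT(userID) FROM tblUsers").fetchone()[0]
print(total, non_null)  # 3 2
```

Since a real userID column is presumably the NOT NULL primary key, the two counts agree there and the difference is purely one of speed.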

1 Comment

thanks. it's good to have some real performance data to help me make an assessment.

On some systems you can ask the system to maintain the counts for you. For example, in SQL Server you can have an indexed view on the count:

create view vwCountUsers
with schema binding
as
select count_big(*) as [count]
from dbo.tbl_users;

create clustered index cdxCountUsers on vwCountUsers ([count]);

The system will maintain the count for you, and it will always be available at nearly no cost.

2 Comments

+1 essentially doing what the trigger would, to perform delta-updating. You have the same problems with concurrency and blocking on high TPM/TPS systems.
True, concurrency blocking will be a serious issue under heavy load. You could, as an alternative, query sys.partitions.rows, but that one is not guaranteed to be accurate (although we try very hard to keep it accurate).

I think this is one of those scenarios where you really need to measure the performance to make a good decision. I would wager that a simple COUNT(*) isn't going to create enough latency that you would need to implement your proposed workaround.

If you are worried, I would encapsulate your COUNT(*) in a function or stored procedure so you can quickly swap it out later if performance does become a problem.
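The encapsulation suggested here can be as simple as one function that owns the counting query, so call sites never change if a counter table or view is swapped in later. A sketch in Python with SQLite; get_user_count is a hypothetical name, not anything from the question:

```python
import sqlite3

def get_user_count(conn):
    """Single place that knows how users are counted.

    If COUNT(*) ever becomes too slow, only this body changes
    (e.g. to read a pre-maintained counter), not the callers.
    """
    return conn.execute("SELECT COUNT(*) FROM tbl_users").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_users (userID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tbl_users (userID) VALUES (?)", [(1,), (2,)])
print(get_user_count(conn))  # 2
```

A stored procedure gives the same seam on the database side: the application keeps calling the same name regardless of how the count is produced.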

1 Comment

excuse my ignorance, but would an indexed view be an implementation of a stored procedure?

If you have a desperate need and a real business case for up to the minute accurate counts, then the trigger would be the way to go. Just make sure it caters for all multi-user issues such as concurrency and transactions.

It could become a bottleneck: instead of five transactions being able to insert into tbl_users concurrently, they will queue up waiting to update the userCounter table, and you may even get deadlocks.

There are other options for less accurate counts, but if you want accurate then there are very few other choices, but I'll try to think of some:

  1. You could partition the data and in userCounter store a count by day. If the data only gets added for the current day, do a select sum(dailycount) from counter + select count(*) from table where {date=today}

  2. You could at least use the nolock or readpast options to lessen resource usage:

select count(*) from tbl with (readpast)
select count(*) from tbl with (nolock)
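Option 1 above, summing precomputed per-day counts and only live-counting today's rows, can be sketched like this (SQLite via Python; the daily-counts table, column names, and dates are all invented for the illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users (userID INTEGER PRIMARY KEY, created TEXT);
    -- One precomputed row per closed-out day.
    CREATE TABLE tbl_daily_counts (day TEXT PRIMARY KEY, dailycount INTEGER);
    INSERT INTO tbl_daily_counts VALUES ('2013-01-01', 1000), ('2013-01-02', 1500);
    -- Only today's rows still need a live COUNT(*).
    INSERT INTO tbl_users (created) VALUES (date('now')), (date('now'));
""")

historical = conn.execute(
    "SELECT COALESCE(SUM(dailycount), 0) FROM tbl_daily_counts").fetchone()[0]
today = conn.execute(
    "SELECT COUNT(*) FROM tbl_users WHERE created = date('now')").fetchone()[0]
print(historical + today)  # 2502
```

The live COUNT(*) now touches only the current day's rows, so contention on the counter table drops to one update per day instead of one per insert.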

1 Comment

you bring up a good point re: concurrency and transactions. throws my proposed alternative design out of the window.

There are some things it makes sense to precalculate for performance reasons (complex calculations over years of data); that's why data warehouses exist, much of the time, to speed up reporting. Select count(*) is generally not one of them if you have any indexing on the table at all. There are far worse performance problems to solve than that. It takes 1 second to return the count on a table with 13 million rows.

I'm all about writing code that is more likely to perform well than the alternative (avoiding correlated subqueries, using set-based operations instead of cursors, having sargable where clauses), but this is a micro-optimization that should not be addressed until there is a real performance problem.

