
I have a question regarding SQL performance and was hoping someone would have the answer.

I have the database table tbl_users and I want to get the total number of users I have. I could write it as SELECT COUNT(*) FROM tbl_users. I presume such a query would have performance implications were I to have a handful of users vs. several million of them. (So, assumption #1 is that the more rows I have, the more resources this query will consume.)

In this particular case I need to run this query at a relatively high frequency and each time I need to get up-to-date data (so, caching is not an option).

Assuming my assumption #1 is correct, I then thought of structuring it the following way:

  • create tbl_stats with a field userCounter
  • each time there is an insert in tbl_users, userCounter is updated +1
  • each time I need to get my user count, I can pull that one field from tbl_stats
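The counter-table design above can be sketched quickly. This is a minimal illustration in SQLite via Python, not the asker's actual schema; tbl_users, tbl_stats, and userCounter come from the question, while the trigger name and columns are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users (userID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tbl_stats (userCounter INTEGER NOT NULL);
    INSERT INTO tbl_stats (userCounter) VALUES (0);

    -- Keep the counter in sync on every insert into tbl_users.
    CREATE TRIGGER trg_user_insert AFTER INSERT ON tbl_users
    BEGIN
        UPDATE tbl_stats SET userCounter = userCounter + 1;
    END;
""")

conn.executemany("INSERT INTO tbl_users (name) VALUES (?)",
                 [("alice",), ("bob",), ("carol",)])

# Reading one row from tbl_stats avoids scanning tbl_users at all.
count = conn.execute("SELECT userCounter FROM tbl_stats").fetchone()[0]
print(count)  # 3
```

Note that a real version would also need a matching DELETE trigger, and every insert now serializes on that single counter row.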

Now, I realize that by doing it this way, the data in userCounter is technically a duplicate, which is bad form.

So, will my first query (assuming millions of rows of data) consume enough resources to warrant implementing my alternative design? If so (or possibly so), is my alternative design consistent with best practices?


6 Answers


If your table is indexed, which it almost certainly will be, then the performance of select count(*) probably will not be as bad as you might anticipate - even if you have millions of rows.

But, if it does become a concern, then rather than roll your own solution, look into using an indexed view.


2 Comments

If you are working on a high TPM system, a 1000ms query is very significant
thanks. my main concern was regarding performance, which apparently isn't such a big issue in this case. i also never knew of indexed views and im going to research it a bit.

I have a database table with almost 5 million records; the following query returns in less than a second:

select count(userID) from tblUsers

This query returns in 2 seconds:

select count(*) from tblUsers

I'd personally just go with select count(*) rather than creating a duplicate field.
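One caveat worth adding to the timing comparison above: the two queries are only interchangeable when the counted column is NOT NULL, because COUNT(column) skips NULL values while COUNT(*) counts rows. A small illustration (SQLite via Python; the toy table here is a stand-in for tblUsers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblUsers (userID INTEGER, name TEXT)")
conn.executemany("INSERT INTO tblUsers VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (None, "ghost")])

# COUNT(*) counts rows; COUNT(userID) skips rows where userID is NULL.
total = conn.execute("SELECT COUNT(*) FROM tblUsers").fetchone()[0]
non_null = conn.execute("SELECT COUNT(userID) FROM tblUsers").fetchone()[0]
print(total, non_null)  # 3 2
```

Since a real userID column is presumably the NOT NULL primary key, the two counts agree there and the difference is purely one of speed.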

1 Comment

thanks. it's good to have some real performance data to help me make an assessment.

On some systems you can ask the system to maintain the counts for you. For example, in SQL Server you can have an indexed view on the count:

create view vwCountUsers
with schema binding
as
select count_big(*) as [count]
from dbo.tbl_users;

create clustered index cdxCountUsers on vwCountUsers ([count]);

The system will maintain the count for you, and it will always be available at nearly no cost.

2 Comments

+1 essentially doing what the trigger would, to perform delta-updating. You have the same problems with concurrency and blocking on high TPM/TPS systems.
True, concurrency blocking will be a serious issue under heavy load. You could, as an alternative, query sys.partitions.rows, but that one is not guaranteed to be accurate (although we try very hard to keep it accurate).

I think this is one of those scenarios where you really need to measure the performance to make a good decision. I would wager that a simple COUNT(*) isn't going to create enough latency that you would need to implement your proposed workaround.

If you are worried, I would encapsulate your COUNT(*) in a function or stored procedure so you can quickly swap it out later if performance does become a problem.
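The encapsulation suggested here can be as simple as one function that owns the counting query, so call sites never change if a counter table or view is swapped in later. A sketch in Python with SQLite; get_user_count is a hypothetical name, not anything from the question:

```python
import sqlite3

def get_user_count(conn):
    """Single place that knows how users are counted.

    If COUNT(*) ever becomes too slow, only this body changes
    (e.g. to read a pre-maintained counter), not the callers.
    """
    return conn.execute("SELECT COUNT(*) FROM tbl_users").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_users (userID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tbl_users (userID) VALUES (?)", [(1,), (2,)])
print(get_user_count(conn))  # 2
```

A stored procedure gives the same seam on the database side: the application keeps calling the same name regardless of how the count is produced.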

1 Comment

excuse my ignorance, but would an indexed view be an implementation of a stored procedure?

If you have a desperate need and a real business case for up to the minute accurate counts, then the trigger would be the way to go. Just make sure it caters for all multi-user issues such as concurrency and transactions.

It could become a bottleneck: instead of five transactions being able to insert into tbl_users concurrently, they will queue up waiting to update the userCounter table, and you may even get deadlocks.

There are other options for less accurate counts, but if you want accurate then there are very few other choices, but I'll try to think of some:

  1. You could partition the data and in userCounter store a count by day. If the data only gets added for the current day, do a select sum(dailycount) from counter + select count(*) from table where {date=today}

  2. You could at least use the nolock or readpast options to lessen resource usage:

select count(*) from tbl with (readpast)
select count(*) from tbl with (nolock)
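Option 1 above, summing precomputed per-day counts and only live-counting today's rows, can be sketched like this (SQLite via Python; the daily-counts table, column names, and dates are all invented for the illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users (userID INTEGER PRIMARY KEY, created TEXT);
    -- One precomputed row per closed-out day.
    CREATE TABLE tbl_daily_counts (day TEXT PRIMARY KEY, dailycount INTEGER);
    INSERT INTO tbl_daily_counts VALUES ('2013-01-01', 1000), ('2013-01-02', 1500);
    -- Only today's rows still need a live COUNT(*).
    INSERT INTO tbl_users (created) VALUES (date('now')), (date('now'));
""")

historical = conn.execute(
    "SELECT COALESCE(SUM(dailycount), 0) FROM tbl_daily_counts").fetchone()[0]
today = conn.execute(
    "SELECT COUNT(*) FROM tbl_users WHERE created = date('now')").fetchone()[0]
print(historical + today)  # 2502
```

The live COUNT(*) now touches only the current day's rows, so contention on the counter table drops to one update per day instead of one per insert.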

1 Comment

you bring up a good point re: concurrency and transactions. throws my proposed alternative design out of the window.

There are some things it makes sense to precalculate for performance reasons (complex calculations over years of data); that's why data warehouses exist, much of the time, to speed up reporting. Select count(*) is generally not one of them if you have any indexing on the table at all. There are far worse performance problems to solve than that. It takes 1 second to return the count on a table with 13 million rows.

I'm all about writing code that is more likely to perform well than the alternative (avoiding correlated subqueries, using set-based operations instead of cursors, having sargable where clauses), but this is a micro-optimization that should not be addressed until there is a real performance problem.

