Comments on "Just a little Python: MongoDB's Write Lock"

Rick Copeland (2012-05-30 21:17):

This scenario wasn't tried, and it really falls outside the problem that the 2.0 write lock improvements address, as most of the documents you were updating were likely already in RAM.

I will point out, however, that during bulk updates the write lock *is* yielded, which is why you saw 99.9% write lock and not 100.0%. So reads were making *some* progress.

I would be interested in seeing a detailed analysis of your scenario so we can quantify just how much bulk updates *do* slow reads (and also to put it in context with other SQL and NoSQL databases). If you have code that can reproduce that behavior, I'd love to see it.

Thanks so much for the comment!

Anonymous (2012-05-30 20:08):

What we have found is that if you do a bulk update of documents, or update a single document containing an array of considerable size with an index on that array, the write lock is held at 99.9% and never yielded. This hurts read performance.

I don't know if this kind of scenario was tried in this test case to see the write lock woes.
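The yield-during-bulk-update behavior Rick describes can be illustrated with a toy Python sketch. This only demonstrates the general idea of periodically releasing an exclusive lock so queued readers make progress; it is not MongoDB's implementation, and all names below are made up:

```python
import threading

lock = threading.Lock()

def bulk_update(docs, yield_every=100):
    """Apply updates in batches, releasing ('yielding') the lock
    between batches so readers queued on it can make progress.
    A toy version of the idea, not MongoDB's code."""
    for start in range(0, len(docs), yield_every):
        with lock:
            for doc in docs[start:start + yield_every]:
                doc["n"] = doc.get("n", 0) + 1
        # Lock released here: this is the "yield" point where a
        # concurrent reader would get a chance to run.

def read_count(docs):
    with lock:
        return sum(d.get("n", 0) for d in docs)

docs = [{} for _ in range(1000)]
bulk_update(docs)
print(read_count(docs))  # 1000
```

A writer that never leaves the `with lock:` block is the 99.9%-held case the commenter hit; the batched version trades a little writer throughput for reader progress.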
Aparna Chaudhary (2012-05-13 14:52):

Hi Rick,

Thanks for the clear and concise explanation of MongoDB locking.

Regards,
Aparna

Rick Copeland (2012-04-21 12:51):

Sorry, didn't mean to mislead with the graphs; I just wanted the first graph to show the effect of the write lock in the absence of faulting.

As to making the array "read-only", that's a little misleading. Looking at http://victortrac.com/EC2_Ephemeral_Disks_vs_EBS_Volumes we see that a 4x RAID-0 array on AWS ephemeral storage gets approx. 165 random seeks per second. If we scale that down to 1 ephemeral disk (which I was using), we are looking at 40-50 random seeks per second (and my benchmark was calculated to cause a random seek during a page fault for most of the writes). This means that the *theoretical* maximum number of random page faults per second is 40-50 for the hardware I was using.

In other words, we were saturating the disk bandwidth due to a pathological benchmark setup. The point was to see how this "worst-case scenario" would affect read throughput, and as the graphs show, there was indeed a problem in 1.8, since the write lock was being held during all those expensive random seeks.

tl;dr: the disk can't handle more than 50 random seeks per second, so that's the max write fault rate we can even consider for the benchmark.

PS: Your comment about "READ-ONLY for data sets larger than RAM" is only true if you have writes faulting a majority of the time. In most real-world scenarios, your working set (the data you frequently access) is *significantly* smaller than your total data set.
The only thing we really care about here is the working set, so it's true that MongoDB becomes *much* slower if your RAM is insufficient to hold your working set, but then again, that's the case with any SQL-based solution as well. 10gen has *always* recommended that your RAM be sufficient to contain your working set, and if it is, your performance looks like the first graph, regardless of whether your *total* data set fits in RAM.

Anonymous (2012-04-21 03:09):

Great data, this was very interesting. But please note the different scales of the X-axis between the two graphs! That's a bit misleading.

Before this change, Mongo was mandatory READ-ONLY for data sets larger than RAM (note the drops to 0 queries/sec). After the change, you now see "just" a 30% performance loss due to 50 write-faults/sec.

It's only an 'improvement' relative to what existed before. 50 writes/sec is a very low load to trade for 1000 reads/sec. This means Mongo is still essentially READ-ONLY for data sets larger than RAM.
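The figures quoted in the exchange above (about 165 seeks/sec for a 4-disk RAID-0, roughly 1000 reads/sec unloaded, and a ~30% read loss at ~50 faulting writes/sec) can be worked out explicitly. A quick sketch of the arithmetic, using only the approximate numbers from the comments:

```python
# Back-of-the-envelope arithmetic from the discussion above.
# All figures are the approximate numbers quoted in the comments.

# 1. Max faulting writes/sec the hardware can absorb: a 4-disk
#    RAID-0 ephemeral array does ~165 random seeks/sec, so a single
#    ephemeral disk does roughly a quarter of that.
raid0_seeks_per_sec = 165
single_disk_seeks = raid0_seeks_per_sec / 4       # ~41 seeks/sec
# Each faulting write costs about one random seek, so faulting
# writes are bounded by the seek rate -- hence the 40-50/sec ceiling.

# 2. Cost of each faulting write in lost reads, post-2.0:
reads_per_sec = 1000      # approximate unloaded read throughput
read_loss = 0.30          # ~30% slowdown at the fault ceiling
faulting_writes = 50
reads_lost_per_fault = reads_per_sec * read_loss / faulting_writes
print(round(single_disk_seeks), reads_lost_per_fault)  # 41 6.0
```

So at the disk's own ceiling, each faulting write costs on the order of six reads, which is the trade-off the two comments are debating.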
https...This issue on FreeBSD is being tracked here<br /><br />https://jira.mongodb.org/browse/SERVER-663Alvinnoreply@blogger.comtag:blogger.com,1999:blog-18508356.post-27341618294953122682012-01-20T09:34:35.725-05:002012-01-20T09:34:35.725-05:00Since I was trying to measure the effect of writes...Since I was trying to measure the effect of writes on reads, I used the 'fire and forget' model. I did, however, call getLastError after the last write to make sure they had all completed, and the time elapsed from the beginning of the writes to the last one completed was used in the 'writes per second' calculation.Rick Copelandhttps://www.blogger.com/profile/11612114223288841087noreply@blogger.comtag:blogger.com,1999:blog-18508356.post-37583124772870722202012-01-20T00:43:33.301-05:002012-01-20T00:43:33.301-05:00Very good and interesting article. Comments have a...Very good and interesting article. Comments have added more insight to this. <br />One question: I know MongoDB supports different write consistency models like fire and forget, safe and replica_safe. On what model these tests were carried out? (I understand it may not be replica_safe) What will happen if the write consistency model is changed?Prithiviraj kulasinghanhttps://www.blogger.com/profile/15899411508678919419noreply@blogger.comtag:blogger.com,1999:blog-18508356.post-79357550145587494012012-01-09T10:25:06.924-05:002012-01-09T10:25:06.924-05:00@felipe: If you can keep your working set in RAM, ...@felipe: If you can keep your working set in RAM, then MongoDB should have no problem at all scaling writes. One way you can do this is by using MongoDB's auto-sharding feature, specifically built to scale writes. In that case, you have a write lock per *shard*, not per *database*, plus you have more RAM (split over your shard servers), so I'd say you should be able to scale an ebay like site. 
Rick Copeland (2012-01-09 10:25):

@felipe: If you can keep your working set in RAM, then MongoDB should have no problem at all scaling writes. One way you can do this is by using MongoDB's auto-sharding feature, specifically built to scale writes. In that case, you have a write lock per *shard*, not per *database*, plus you have more RAM (split over your shard servers), so I'd say you should be able to scale an eBay-like site. You might want to get a consultation with 10gen (http://www.10gen.com/lightning-consult) to make sure you size everything appropriately, though.

felipe (2012-01-09 09:00):

Rick, always wonderful to see some empirical testing. I'm planning an eBay-like site, and one of my main motivations behind choosing MongoDB is its scalability.

What do you reckon about an eBay-like site which would eventually have heavy writing?

Anonymous (2012-01-05 14:33):

Thank you, nice work.

Rick Copeland (2012-01-04 09:18):

@charsyam: no, the ephemeral disk is the non-EBS volume. I used it so that network latency wouldn't affect the benchmarks, as there have been reports of EBS volumes giving inconsistent performance. In a real MongoDB deployment, you'd need to either a) use EBS or b) use replica sets to achieve durability on EC2, but for the benchmark I thought it was the best option available.

Anonymous (2012-01-04 09:05):

Thanks, Rick!
I have a question:
Does "ephemeral disk" mean EBS?

Thank you.

Rick Copeland (2012-01-03 08:34):

Thanks for the comments!
@Anonymous re: BSD: I'm not actually a MongoDB developer, but I'll pass that info along to 10gen.

Anonymous (2012-01-03 08:11):

Excellent summary. Thanks for this very informative post.

Anonymous (2012-01-03 05:00):

Nice work!

Anonymous (2012-01-03 01:23):

Regarding "hot" and "cold" pages, you guys could really improve performance on FreeBSD if you'd stop using msync() and start using fsync(). There was a discussion I partook in a few weeks ago with a user complaining about horrible performance in MongoDB on FreeBSD, which resulted in analysis of your use of msync(), which is obviously intended for Linux. Given that you call msync() with the entire mapped region, you're effectively calling fsync(), and the user changed your code to do exactly that and saw a tremendous speed improvement. Kernel developers commented as well. I urge you guys to consider improving this too. The thread in question:

http://www.mail-archive.com/freebsd-stable@freebsd.org/msg118225.html

And the performance discovery after I recommended the user try using fsync() instead (but I recommend you read the entire thread):

http://www.mail-archive.com/freebsd-stable@freebsd.org/msg118283.html
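For readers following along in Python, the msync-versus-fsync distinction the commenter describes maps onto the standard library like this: `mmap.flush()` issues msync over the mapping, while `os.fsync()` flushes through the file descriptor. This is a small illustrative sketch, not MongoDB's code:

```python
import mmap
import os
import tempfile

# Set up a one-page file and memory-map it (a stand-in for a
# memory-mapped data file; everything here is illustrative).
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * mmap.PAGESIZE)
mm = mmap.mmap(fd, mmap.PAGESIZE)
mm[:5] = b"hello"       # dirty a page through the mapping

# Option 1: msync over the *entire* mapping -- the pattern the
# FreeBSD thread found pathologically slow on that platform:
mm.flush()              # msync(addr, whole_length, MS_SYNC)

# Option 2: fsync on the file descriptor -- the change the thread
# reported as a tremendous speedup on FreeBSD:
os.fsync(fd)

with open(path, "rb") as f:
    persisted = f.read(5)
print(persisted)        # b'hello' -- either call persists the page

mm.close()
os.close(fd)
os.unlink(path)
```

Since msync over the full mapped region is semantically close to fsync of the whole file anyway, the thread's point is that the cheaper call can be substituted where the platform's msync path is slow.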
Wayne Witzel III (2012-01-03 01:09):

Nice. All my data fits in memory at the moment, but it's good to know that I won't have to worry about the occasional faulting write if it ever doesn't.

Nicolas Grilly (2012-01-02 11:57):

Very interesting and useful. Thanks for sharing this!

Anonymous (2011-12-31 19:58):

This comment has been removed by a blog administrator.

Rick Copeland (2011-12-31 19:03):

@Francis: the code is already open-sourced. I linked it in the article: https://sourceforge.net/u/rick446/random/

Francis Begbie (2011-12-31 17:13):

Any chance you can open-source the scripts used to generate this data/analysis?