Comments on Just a little Python: GridFS: The MongoDB Filesystem (20 comments)

Rick Copeland — 2015-10-29 09:00
If you're trying to ensure that MongoDB places each segment on a different shard, you can do that with shard tags (see https://docs.mongodb.org/manual/core/tag-aware-sharding/ and https://docs.mongodb.org/manual/tutorial/administer-shard-tags/). You could set up GridFS to store 'chunks' of the size of your segments (this is a client option). Then if you want to force every 'chunk #0' to shard 0, you would use the 'n' field in the chunks collection as the shard key and then tag shard 0 with 'chunk #0'. This is pretty fragile, however (since you have to tag each 'n' individually), and you might be better off writing a custom GridFS-like layer that includes a shard_id in each chunk. Hopefully this helps!

Amir — 2015-10-29 06:27
Segments are all the same size; what matters to me is efficiency, since the segments retrieved by the Django server from the GridFS servers are to be processed and compiled together to yield the real file.
Is there any specific configuration needed on the servers where the MongoDB instances are installed to serve uploads/downloads?

Rick Copeland — 2015-10-26 02:56
If you're thinking you'll be storing the file's segments as 'chunks' in GridFS, it would only really work well if all the segments were the same size (and it looks like that's not the case). You could still store the segments as independent files in GridFS, though.

Amir — 2015-10-25 13:57
Hi Rick, very informative post, thanks. I have a Django app that processes a file and yields N segments (segments are encrypted), each to be stored on a different server, each segment's size ranging from KBs to 70 MB. Do you think using GridFS would be a wise choice for such an application?

Rick Copeland — 2013-04-17 15:25
Thanks for the comment!
I get the colors via Pygments (pygments.org)
-Rick

KSC — 2013-04-17 14:23
This tutorial is very helpful. This really helped me get started with GridFS. Thanks a lot.
By the way, can you let me know how you got colors in the Python shell? Thanks.

Wes — 2013-01-24 21:19
Hey Rick -
Does GridFS require you to have an underlying NFS layer to provide storage, or does it store everything locally on each server? Are you aware of a limit on the number of servers that can be part of the same database with Mongo and GridFS? If we expand a network out to be 100 servers wide, I think we would have issues with replica sets, since Mongo's limit is 12 members, I thought?
I guess I'm just thinking through practical applications for a large-scale website.

Rick Copeland — 2012-10-13 10:22
My pleasure!

David — 2012-10-05 15:14
Thanks Rick. That was very helpful!

Rick Copeland — 2012-10-04 17:41
Thanks for the comment, David!
I don't have the exact data on the number of files stored in the SourceForge MongoDB GridFS, but my off-the-cuff response is that hundreds of millions of files should be fine as long as whatever you're using to query is well-indexed. Whether you can get good performance out of such a system depends on your usage patterns, hardware, etc.
If your files are guaranteed to be under 16 MB and frequently smaller than, say, 256 kB, you're probably better off using bson.Binary objects to store them inside documents. Assuming you're usually reading or writing an entire file at once, this will perform much better than GridFS. With the advent of MongoDB 2.2, you might also consider storing your files in a separate database on the main server to reduce the impact that GridFS has on your "normal" MongoDB performance.
Hope that helps!

David — 2012-10-04 17:20
How many files do you store? We currently store hundreds of millions of mostly small files on a single server. However, file systems don't handle this well.
We're looking at moving to CouchDB or Mongo. We eventually want to be able to store billions of small files.

Rick Copeland — 2012-06-09 13:11
One benefit to using GridFS over the server's native filesystem is that GridFS will be available to your application servers automatically, without having to worry about setting up NFS. Another is that as you grow your MongoDB cluster, adding shards and replicas, GridFS performance can scale as well. It's really a question (to me) of reducing the number of moving parts that can break.
Already using MongoDB for the majority of your app's data and want to support multiple app servers, but some of your objects are too big to fit in a bson.Binary field? GridFS is probably the shortest path to completion. Already have a filesystem shared between your app servers? Then that filesystem might be the best place to put stuff. Never intend to grow beyond the need for a single application server? Might as well use the filesystem.

Anonymous — 2012-06-09 09:24
What's the benefit of using this over the server's filesystem? Wouldn't an association be simpler and faster than trying to fit a file into the db? Unless of course you want to copy the db and ship it off to another server, but then again there are many file-sync options out there.
Please enlighten me.

Rick Copeland — 2012-05-26 13:11
mod_gridfs (https://bitbucket.org/onyxmaster/mod_gridfs/) looks interesting, thanks for pointing it out!

Aristarkh — 2012-05-26 12:06
Shameless self-promotion: http://xm.x-infinity.com/2012/04/as-were-to-move-our-terabytes-of-files.html

Rick Copeland — 2012-05-25 21:05
Thanks for the comment. I'll definitely have to check out Khartoum. I should also mention nginx-gridfs (https://github.com/mdirolf/nginx-gridfs), an nginx module with similar functionality. I haven't used it, but if you want to serve frequently-changing large files, it's probably worth checking out.

Rick Copeland — 2012-05-25 21:01
I'm not sure what exactly you were doing to cause the problems you describe, but your experience certainly doesn't square with mine working at SourceForge. There's no magic to GridFS; reading any file under 256 kB will cause two document fetches from MongoDB; up to 512 kB, 3 document fetches, etc. Remember that *all* the GridFS magic (well, except for md5 computation, IIRC) happens in the *client*.
Perhaps if you're constantly writing to GridFS, you might cause problems, but that's due more to MongoDB's global read/write lock (http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html) that I covered in a previous post.
If you actually have any test cases that cause performance degradation due to GridFS usage, I'd be more than interested in seeing them.

Brent — 2012-05-25 18:30
After playing with GridFS a bit, people start to wonder how they can serve files to web browsers from it. I've written a Python server for this, at https://bitbucket.org/btubbs/khartoum/.

Anonymous — 2012-05-25 18:27
The truth is, GridFS is not production-ready. You will have insane problems and it will reduce performance on your main collections by 70%.
Either use a second Mongo cluster just for GridFS, or just don't :) and use the filesystem.

Tim Van Steenburgh — 2012-05-25 14:24
Great intro to GridFS, thanks Rick. While I'd heard of GridFS, I'd never paid attention to what it was, how it worked, or how to use it. Your post explains it all very well - cool stuff!
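Editor's note: Rick's point about read cost above — one metadata document in `fs.files` plus one `fs.chunks` document per chunk, so a file under 256 kB costs two document fetches, one under 512 kB costs three — can be sketched with a few lines of arithmetic. This is a minimal illustration of that claim only, assuming the 256 kB default chunk size he cites (later drivers changed the default), not an actual GridFS client:

```python
import math

# Per Rick's comment: reading a GridFS file fetches 1 document from
# fs.files plus ceil(length / chunkSize) documents from fs.chunks.
# 256 kB is the default chunk size he cites in the thread (an assumption
# here; later driver versions use a different default).
DEFAULT_CHUNK_SIZE = 256 * 1024

def document_fetches(file_length, chunk_size=DEFAULT_CHUNK_SIZE):
    """Documents fetched to read a whole file: 1 files doc + its chunks."""
    if file_length <= 0:
        return 1  # empty file: only the fs.files metadata document
    return 1 + math.ceil(file_length / chunk_size)

print(document_fetches(100 * 1024))  # under 256 kB -> 2
print(document_fetches(300 * 1024))  # under 512 kB -> 3
```

Since all of this bookkeeping happens client-side (as Rick notes), the fetch count is the main per-read overhead GridFS adds over storing a small payload directly in a bson.Binary field, which costs a single document fetch.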