Comments on Just a little Python: Getting Started with MongoDB and Python

My experience with MongoDB has generally used it a...

2016-02-01T14:55:27.167-05:00

In my experience, I have generally used MongoDB as the main data store for a new application. Migrations, when they happened, were typically Python scripts that extracted data from one source and inserted it into the database. In some cases the source was CSV data; in others it was result sets from queries against MySQL or Postgres databases.
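
For anyone wondering what such a migration script might look like, here is a rough sketch rather than the actual scripts from those projects. It assumes a DB-API cursor as the source and any object with a PyMongo-style `insert_many` method as the target. The `customers` table, its columns, and the helper names `rows_to_documents` and `migrate` are made up for illustration, and `sqlite3` stands in for MySQL or Postgres so the example runs without a server.

```python
import sqlite3


def rows_to_documents(cursor):
    """Turn each row of a DB-API result set into a dict keyed by column name."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def migrate(cursor, collection):
    """Insert every row of the result set into a MongoDB-style collection.

    `collection` is assumed to expose insert_many(), as a PyMongo collection does.
    """
    docs = rows_to_documents(cursor)
    if docs:  # PyMongo's insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)


# Hypothetical source data; sqlite3 is only a stand-in for MySQL/Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "Iowa"), ("Bo", "Ohio")],
)

cursor = conn.execute("SELECT name, state FROM customers ORDER BY name")
docs = rows_to_documents(cursor)
```

With a real database you would pass something like `MongoClient()["mydb"]["customers"]` as `collection`; those database and collection names are placeholders too.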

Rick, thanks for replying so quickly. Would you m...

2016-02-01T13:21:34.071-05:00

Rick, thanks for replying so quickly. Would you mind sharing what method you have used in your real-world projects to get the information into MongoDB? What physical form did the original information come in, and how did you load it into the collection?

MongoDB "documents," as it seems you'...

2016-02-01T07:27:40.532-05:00

MongoDB "documents," as it seems you've discovered, are completely different from Word documents. Since MongoDB stores structured data (JSON/BSON), extracting the structure you're looking for in a pile of unstructured Word documents is something you'd have to configure manually, maybe with an ETL tool (I don't really know much about the market there) or some manual coding.

For Excel documents, the story is a little better, as at least Excel has rows and columns, which can be exported as a CSV file. CSV files can then be imported using the mongoimport (I think that's the name) tool that comes with the MongoDB distribution. Of course, what you end up with there is documents that look a whole lot like table rows, since they actually came from spreadsheet rows.

Hope this helps!
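
If you would rather script the CSV route in Python than use the import tool, a minimal sketch could look like the following. The column names (`first_name`, `last_name`, `state`) and the sample rows are invented for illustration, and the "query" is simulated on plain dicts so the example runs without a MongoDB server.

```python
import csv
import io


def csv_to_documents(csv_file):
    """Read a CSV file object and return one dict (document) per row.

    The header row supplies the field names, so each document looks
    a lot like the spreadsheet row it came from.
    """
    return [dict(row) for row in csv.DictReader(csv_file)]


# Hypothetical export of a customer spreadsheet.
sample = io.StringIO(
    "first_name,last_name,state\n"
    "Ann,Lee,Iowa\n"
    "Bo,Ray,Ohio\n"
)
documents = csv_to_documents(sample)

# The same filter document you would hand to collection.find().
iowa_query = {"state": "Iowa"}

# Local stand-in for the query, so no server is needed here.
iowa_customers = [d for d in documents if d["state"] == iowa_query["state"]]
```

Against a real PyMongo collection the last two steps would be `collection.insert_many(documents)` followed by `collection.find(iowa_query)`.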

Sorry for such a real basic question here, but I'...

2016-01-31T22:27:48.101-05:00

Sorry for such a real basic question here, but I'm trying to find some real-world examples of how people actually get a pile of documents (physical documents like Word docs or Excel sheets) into a MongoDB collection. I've read a lot of articles that demonstrate how you can manually code information with JSON syntax using the doc ID, first name field and value, last name field and value, etc. But if I've got a folder full of, say, 10,000 Word docs with customer info in each one, and I want to be able to query that pile of docs and pull up, say, a result set that contains all customers from Iowa, how would I do that? How are all those documents parsed into JSON and then loaded as document objects into the collection? Is there some ETL type of program that does that (and if so, what would it be)? I've googled like crazy trying to find an answer to that, but have come up with zilch.

Cool tool, Bob. Thanks for the comment!

2012-12-13T11:28:16.989-05:00

Cool tool, Bob. Thanks for the comment!

Cool rundown, thanks Rick! In case anyone who is ...

2012-12-13T11:25:26.063-05:00

Cool rundown, thanks Rick! In case anyone who is learning MongoDB finds it useful, I just launched a free tool called querymongo.com that translates MySQL syntax into MongoDB syntax. Hope someone can use it to get up to speed faster!

Glad to help!

2012-07-22T14:09:42.529-04:00

Glad to help!

Ah, I worked out the "perfect match" ju...

2012-07-22T13:20:23.032-04:00

Ah, I worked out the "perfect match" just after posting this. My fault for trying to do things in a rush. Then I spent a while looking for how to use regexes.

That did the job and got me a bit further on.

Many Thanks.

Thanks for the comment. I'm sorry if the examp...

2012-07-22T10:11:23.662-04:00

Thanks for the comment. I'm sorry if the example was confusing. The example I used was looking for blog articles with an *exact match* with one of the app_config_ids passed in.

$in always looks for an exact match against the values in a list. If you want to find a partial match (such as a prefix), you need to use either $regex or a compiled Python regular expression. For instance, if you're trying to find the articles starting with 'Hadoop', the query would be:

articles.find({'title': {'$regex': '^Hadoop.*'}})

Again, sorry for the confusion. I'll try to be more explicit in the future. Thanks again for commenting.
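
To make the difference concrete, here is a small sketch of the three query shapes side by side. The query dicts are what you would pass to `find()`; the matching itself is simulated with plain Python so it runs without a server, and the `prefix_pattern` helper is my own illustration, not part of PyMongo.

```python
import re

title = "Hadoop Development Environment OS X"
keys = ["Hadoop"]

# Exact match against a list: only finds titles equal to one of the keys.
in_query = {"title": {"$in": keys}}

# Prefix match, written with $regex or with a compiled Python pattern.
regex_query = {"title": {"$regex": "^Hadoop"}}
compiled_query = {"title": re.compile(r"^Hadoop")}


def prefix_pattern(prefixes):
    """Build one compiled pattern matching any of several literal prefixes."""
    alternatives = "|".join(re.escape(p) for p in prefixes)
    return re.compile("^(?:" + alternatives + ")")


# Local simulation of what each query would do with the title above.
exact_hit = title in keys
prefix_hit = compiled_query["title"].search(title) is not None
multi_hit = prefix_pattern(["Spark", "Hadoop"]).search(title) is not None
```

Here `exact_hit` is False and `prefix_hit` is True, which mirrors the empty result reported below for the `$in` query.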

Warning to others blog_post.find({'state'...

2012-07-22T09:20:10.978-04:00

Warning to others

blog_post.find({'state':'published','app_config_id':{'$in':app_config_ids}})

I just wasted an hour finding out that this syntax does not work.

I have a collection "articles" with a field "title" and one document with the title
"Hadoop Development Environment OS X".

These work:

temp = articles.find({"title": "Hadoop Development Environment OS X"})

temp = articles.find({"title": {"$in": ["Hadoop Development Environment OS X"]}})

This returns nothing:

keys = ["Hadoop"]
temp = articles.find({"title": {"$in": keys}})

I.e., all I can get back is a perfect match, not a partial match.

Either I am misunderstanding something (though I worked from an internet example that the author said works fine), or there is a problem with the driver.

I am fairly confident I did not misunderstand the mongo docs.

FYI I deleted the user and DB I used in the post, ...

2012-01-18T13:15:03.506-05:00

FYI I deleted the user and DB I used in the post, so don't go trying any funny business ;-).