How do I perform the SQL Join equivalent in MongoD

2019-08-21 18:08发布

How do I perform the SQL Join equivalent in MongoDB?

For example say you have two collections (users and comments) and I want to pull all the comments with pid=444 along with the user info for each.

  { uid:12345, pid:444, comment="blah" }
  { uid:12345, pid:888, comment="asdf" }
  { uid:99999, pid:444, comment="qwer" }

  { uid:12345, name:"john" }
  { uid:99999, name:"mia"  }

Is there a way to pull all the comments with a certain field (eg. ...find({pid:444}) ) and the user information associated with each comment in one go?

At the moment, I am first getting the comments which match my criteria, then figuring out all the uid's in that result set, getting the user objects, and merging them with the comment's results. Seems like I am doing it wrong.

标签: mongodb join
2楼-- · 2019-08-21 18:45

You can join two collection in Mongo by using lookup which is offered in 3.2 version. In your case the query would be


or you can also join with respect to users then there will be a little change as given below.


It will work just same as left and right join in SQL.

3楼-- · 2019-08-21 18:50

There is a specification that a lot of drivers support that's called DBRef.

DBRef is a more formal specification for creating references between documents. DBRefs (generally) include a collection name as well as an object id. Most developers only use DBRefs if the collection can change from one document to the next. If your referenced collection will always be the same, the manual references outlined above are more efficient.

Taken from MongoDB Documentation: Data Models > Data Model Reference > Database References

4楼-- · 2019-08-21 18:52

This page on the official mongodb site addresses exactly this question:

When we display our list of stories, we'll need to show the name of the user who posted the story. If we were using a relational database, we could perform a join on users and stores, and get all our objects in a single query. But MongoDB does not support joins and so, at times, requires bit of denormalization. Here, this means caching the 'username' attribute.

Relational purists may be feeling uneasy already, as if we were violating some universal law. But let’s bear in mind that MongoDB collections are not equivalent to relational tables; each serves a unique design objective. A normalized table provides an atomic, isolated chunk of data. A document, however, more closely represents an object as a whole. In the case of a social news site, it can be argued that a username is intrinsic to the story being posted.

5楼-- · 2019-08-21 18:52

It depends on what you're trying to do.

You currently have it set up as a normalized database, which is fine, and the way you are doing it is appropriate.

However, there are other ways of doing it.

You could have a posts collection that has imbedded comments for each post with references to the users that you can iteratively query to get. You could store the user's name with the comments, you could store them all in one document.

The thing with NoSQL is it's designed for flexible schemas and very fast reading and writing. In a typical Big Data farm the database is the biggest bottleneck, you have fewer database engines than you do application and front end servers...they're more expensive but more powerful, also hard drive space is very cheap comparatively. Normalization comes from the concept of trying to save space, but it comes with a cost at making your databases perform complicated Joins and verifying the integrity of relationships, performing cascading operations. All of which saves the developers some headaches if they designed the database properly.

With NoSQL, if you accept that redundancy and storage space aren't issues because of their cost (both in processor time required to do updates and hard drive costs to store extra data), denormalizing isn't an issue (for embedded arrays that become hundreds of thousands of items it can be a performance issue, but most of the time that's not a problem). Additionally you'll have several application and front end servers for every database cluster. Have them do the heavy lifting of the joins and let the database servers stick to reading and writing.

TL;DR: What you're doing is fine, and there are other ways of doing it. Check out the mongodb documentation's data model patterns for some great examples.

6楼-- · 2019-08-21 18:53

Here's an example of a "join" * Actors and Movies collections:

It makes use of .mapReduce() method

* join - an alternative to join in document-oriented databases

7楼-- · 2019-08-21 18:53

I think, if You need normalized data tables - You need to try some other database solutions.

But I've foun that sollution for MOngo on Git By the way, in inserts code - it has movie's name, but noi movie's ID.


You have a collection of Actors with an array of the Movies they've done.

You want to generate a collection of Movies with an array of Actors in each.

Some sample data

 db.actors.insert( { actor: "Richard Gere", movies: ['Pretty Woman', 'Runaway Bride', 'Chicago'] });
 db.actors.insert( { actor: "Julia Roberts", movies: ['Pretty Woman', 'Runaway Bride', 'Erin Brockovich'] });


We need to loop through each movie in the Actor document and emit each Movie individually.

The catch here is in the reduce phase. We cannot emit an array from the reduce phase, so we must build an Actors array inside of the "value" document that is returned.

The code
map = function() {
  for(var i in this.movies){
    key = { movie: this.movies[i] };
    value = { actors: [ ] };
    emit(key, value);

reduce = function(key, values) {
  actor_list = { actors: [] };
  for(var i in values) {
    actor_list.actors = values[i].actors.concat(actor_list.actors);
  return actor_list;

Notice how actor_list is actually a javascript object that contains an array. Also notice that map emits the same structure.

Run the following to execute the map / reduce, output it to the "pivot" collection and print the result:

printjson(db.actors.mapReduce(map, reduce, "pivot")); db.pivot.find().forEach(printjson);

Here is the sample output, note that "Pretty Woman" and "Runaway Bride" have both "Richard Gere" and "Julia Roberts".

{ "_id" : { "movie" : "Chicago" }, "value" : { "actors" : [ "Richard Gere" ] } }
{ "_id" : { "movie" : "Erin Brockovich" }, "value" : { "actors" : [ "Julia Roberts" ] } }
{ "_id" : { "movie" : "Pretty Woman" }, "value" : { "actors" : [ "Richard Gere", "Julia Roberts" ] } }
{ "_id" : { "movie" : "Runaway Bride" }, "value" : { "actors" : [ "Richard Gere", "Julia Roberts" ] } }

登录 后发表回答