Migrating from SQL Server DB to MongoDB: questions about whether to embed or to reference

I'm working on my first NoSql design and need some help with how much to normalize.

I had a simple relational database:

   Users (Id, UserName, Password, Email, Name, FacebookId, DateCreated)
   Questions (Id, UserId, Question, DateCreated)
   Answers (Id, QuestionId, Answer, DateCreated)

I'd like to convert this to a Mongoose schema. I am not sure how much to embed and how much to reference. Here are some of my ideas:

Have just one collection, Users, and embed everything in it:

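const mongoose = require('mongoose');
// Schema is needed for the ObjectId type used in the references below.
const Schema = mongoose.Schema;

mongoose.model('Users', {
    userName: String,
    password: String,
    email: String,
    name: String,
    facebookId: String,
    dateCreated: Date,
    questions: [{
        question: String,
        date: Date,
        answers: [{ answer: String, answeredByUserId: { type: Schema.Types.ObjectId, ref: 'Users' } }]
    }]
});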

Have two collections (I'll limit the maximum number of answers per question to 10):

mongoose.model('Users', {
    userName: String,
    password: String,
    email: String,
    name: String,
    facebookId: String,
    dateCreated: Date
});

mongoose.model('Questions', {
    question: String,
    dateCreated: Date,
    askedByUserId: { type: Schema.Types.ObjectId, ref: 'Users' },
    // Embedded answers; the cap of 10 is not enforced by this schema itself.
    answers: [{ answer: String, date: Date, answeredByUserId: { type: Schema.Types.ObjectId, ref: 'Users' } }]
});

Have three separate collections (no limit on the number of answers):

mongoose.model('Users', {
    userName: String,
    password: String,
    email: String,
    name: String,
    facebookId: String,
    dateCreated: Date
});

mongoose.model('Questions', {
    question: String,
    dateCreated: Date,
    askedByUserId: { type: Schema.Types.ObjectId, ref: 'Users' }
});

mongoose.model('Answers', {
    answer: String,
    dateCreated: Date,
    answeredByUserId: { type: Schema.Types.ObjectId, ref: 'Users' },
    questionId: { type: Schema.Types.ObjectId, ref: 'Questions' }
});

These are the kinds of queries I'll be making:

  • GetAllUsers
  • GetAllQuestions
  • GetAllQuestionsWithAnswers
  • GetAllQuestionsAskedByUser(userId)
  • GetAllAnswersAnsweredByUser(userId)
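A minimal sketch of these five queries against the three-collection layout, using plain JavaScript arrays as hypothetical stand-ins for the Users, Questions and Answers collections (no Mongoose or MongoDB involved; the ids and sample data are made up). It illustrates that the last two queries are simple filters on askedByUserId and answeredByUserId:

```javascript
// Hypothetical in-memory stand-ins for the three collections (plain arrays, not Mongoose).
const users = [
  { _id: 'u1', userName: 'alice' },
  { _id: 'u2', userName: 'bob' },
];
const questions = [
  { _id: 'q1', question: 'Embed or reference?', askedByUserId: 'u1' },
  { _id: 'q2', question: 'How many collections?', askedByUserId: 'u2' },
];
const answers = [
  { _id: 'a1', answer: 'First answer to q1', questionId: 'q1', answeredByUserId: 'u2' },
  { _id: 'a2', answer: 'Only answer to q2', questionId: 'q2', answeredByUserId: 'u1' },
  { _id: 'a3', answer: 'Second answer to q1', questionId: 'q1', answeredByUserId: 'u1' },
];

const getAllUsers = () => users;
const getAllQuestions = () => questions;

// Application-side join: group answers by questionId once, then attach them to each question.
function getAllQuestionsWithAnswers() {
  const byQuestion = new Map();
  for (const a of answers) {
    if (!byQuestion.has(a.questionId)) byQuestion.set(a.questionId, []);
    byQuestion.get(a.questionId).push(a);
  }
  return questions.map(q => ({ ...q, answers: byQuestion.get(q._id) || [] }));
}

// The last two queries are plain filters on the foreign-key fields;
// no arrays of question/answer ids on the user document are involved.
const getAllQuestionsAskedByUser = userId => questions.filter(q => q.askedByUserId === userId);
const getAllAnswersAnsweredByUser = userId => answers.filter(a => a.answeredByUserId === userId);
```

In Mongoose, assuming the compiled models are held in variables named Question and Answer, the last two functions would correspond to something like `Question.find({ askedByUserId: userId })` and `Answer.find({ answeredByUserId: userId })`. What keeps those fast is an index on the filtered field (for example `schema.index({ askedByUserId: 1 })`), rather than a copy of the ids on the user document.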

Given the last two queries, does it make sense to keep a reference to questions in the Users collection for faster access?

Have references to Questions and Answers in the Users collection:

mongoose.model('Users', {   
    userName: String, 
    password: String,
    email: String,
    name: String,
    facebookId: String,
    dateCreated: Date,
    Questions: [{ type: Schema.Types.ObjectId, ref: 'Questions' }],
    Answers: [{ type: Schema.Types.ObjectId, ref: 'Answers' }]    });

Am I thinking in the right direction? In my scenario, which schema would be the best option?

1 Answer

Best answer
I like your way of thinking and the analysis you are applying to the problem.

The thing to consider is that, with the MMAPv1 storage engine, records on disk are laid out one after another. If you store everything in one collection, and questions and answers are arrays that can grow, then once there is no space left between records to add another question or answer, the record has to be moved, which causes disk file fragmentation. You could pre-allocate padding between records for growth, but that's a waste of disk space. So that approach is out.

The other thing I am thinking is that, most likely, you will not display questions without their answers. Or maybe you will display a list of questions with the top 2-3 answers per question. That suggests a hybrid approach: the Questions collection keeps 3 answers per question in an array (so no fragmentation), and the rest of the answers live in a separate collection. Alternatively, since you mentioned that you will limit the number of answers to 10, perhaps you can pre-allocate 10 "dummy" answers up front and avoid fragmentation (at the expense of disk space).

In summary, I would go with one Users collection, one Questions collection where each question record has a field pointing to the user who asked it, and either a hybrid question/answer approach with a separate Answers collection, or one combined Question/Answer collection where the answer array is limited to 10.