2012-03-20

MongoDB Berlin 2012

Today was this year's MongoDB Berlin, with two parallel tracks and some interesting talks. I attended the following ones:
  1. Scalable and Flexible File Storage for Financial Websites - Powered by MongoDB GridFS (held in German): A talk about a document management system based on MongoDB, including an introduction to the system architecture (replica sets, arbiters, sharding, config servers, mongos); the mongos process also runs on the application servers. It closed with a short note on problems encountered: hotspots caused by the shard key, a lack of monitoring for the config servers, and cases where index usage had to be forced by using $elemMatch.
  2. Journaling and the Storage Engine: A rundown of the internal file format. There are plans to split the data files and the index files, so that the index files can be moved to other drives to increase I/O throughput. Journals are rotated append-only files (three of 1 GB each) and are only used for crash recovery (not for replication!) - the journal is written every 100 ms or after 100 MB of data; the write overhead is about 5-30% (when the journal resides on the same drive as the data files) or about 3% (when the drives are separated). Journaling should generally be enabled, or at least on one node within a replica set. For compaction, either repairDatabase() (version 1.8) or compact() (version 2.0+) can be used.
  3. Tips, Tricks and Hacks: Write operations are normally "fire and forget" - by default there is no acknowledgement. The getLastErrorModes setting defines, based on tags (e.g. DC Europe and DC North America), when a write counts as successful - used for data center awareness. Use update() with the upsert option and insert() instead of save(). mongos (the routing server) can be used to hide different replica sets behind a single connection. findAndModify() updates a document and returns it in a single operation. Polling the replication oplog can be used for further asynchronous processing or triggering.
  4. Indexing and Query Optimization: Indexes speed up queries but slow down inserts and updates; there is a maximum of 64 indexes per collection. Hints can be given to force a query to use a certain index, and explain() shows which index a query actually uses.
  5. Node.js and Mongodb Building Blocks for Your Next HTML5 game: Presentation of an online game based on Akihabara, WebSocket, MongoDB (capped collections), Express, Node (incl. cluster). The source code is hosted on GitHub.
  6. How and When to Scale MongoDB with Sharding: Sharding is used to scale write performance and to extend the memory available for the hot set. Exactly 3 config servers are needed - they are critical, as they influence the routing of client requests to certain shards! Some hints on shard key selection, chunk splitting and re-balancing between shards were also given.
  7. 10 Key Performance Indicators: Introduction to the following tools and commands: mongostat, profiling, the MongoDB monitoring service (MMS), serverStatus(), stats() etc. Some other hints: Avoid connection flooding, as too many connections lead to lots of context switches. The padding factor should be less than 2.
  8. MongoDB's New Aggregation Framework: Introduction to a new pipeline architecture available with version 2.1+ - no MapReduce needed.
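
The shard key hotspots mentioned in talks 1 and 6 are easier to see with a small toy model. The following Python sketch is my own illustration, not code from the talks and not MongoDB's actual routing algorithm: it assumes four range-based chunks over a made-up 32-bit key space, and the names shard_for, hashed and distribution are hypothetical helpers. It compares where inserts land when the key grows monotonically (e.g. a timestamp) versus when the key is scrambled by a hash first.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

# Assumption: four chunks, one per shard, splitting a 32-bit key space evenly.
NUM_SHARDS = 4
KEY_SPACE = 2 ** 32
BOUNDARIES = [KEY_SPACE * i // NUM_SHARDS for i in range(1, NUM_SHARDS)]


def shard_for(key: int) -> int:
    """Range-based routing: index of the chunk whose range contains key."""
    return bisect_right(BOUNDARIES, key)


def hashed(key: int) -> int:
    """Scramble a key into a pseudo-random position in the key space."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def distribution(keys) -> Counter:
    """Count how many inserts each shard would receive."""
    return Counter(shard_for(k) for k in keys)


# 10,000 inserts keyed by a monotonically increasing timestamp (seconds).
timestamps = range(1_332_200_000, 1_332_200_000 + 10_000)

# All neighbouring keys fall into the same chunk: one shard takes every write.
monotonic = distribution(timestamps)

# Scrambling the key first spreads the same inserts over all chunks.
spread = distribution(hashed(t) for t in timestamps)

print("monotonic key:", dict(monotonic))
print("scrambled key:", dict(spread))
```

With the monotonic key, every insert ends up in a single chunk, which is the hotspot; with the scrambled key, the writes are spread over all four. The trade-off is that a scrambled key no longer supports efficient range queries on the original value, so this is only one of several possible ways to pick a shard key.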
