MongoDB Indexing, count(), and unique validations

11/10/2012 Comments off

Slow Queries in MongoDB

I rebuilt the database tier powering App Cloud earlier this week and uncovered some performance problems caused by slow queries. As usual, two or three stemmed from missing indexes and were easily fixed by adding index coverage. MongoDB has decent index functionality for most use cases.

Investigating Slow count() queries

Unfortunately, I also noticed a large number of slow, complex count() queries like:

{ 
  count: "users", 
  query: { 
    email: "bob@company.com", 
    _id: { $ne: ObjectId('509e83e132a5752f5f000001') }
  }, 
  fields: null 
}

Investigating our users collection, I saw proper indexes on both _id and email. Unfortunately, MongoDB can’t use indexes properly for count() operations. That’s a serious drawback, but not one I can change.

Where were these odd-looking queries coming from? Why would we be looking for a user with a given email but NOT a given id?

The uniqueness validation on the email key of the User document (and many other models) was the culprit. Whenever a User is created or updated, ActiveModel verifies there are no other Users with the given email:

class User
  include MongoMapper::Document

  key :email, String, unique: true
end
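
Any create or update then runs this validation. A minimal sketch of what fires the count() above (my illustration, assuming the model as defined):

user = User.new(email: "bob@company.com")
user.valid?  # runs the uniqueness validation, issuing the count() query shown earlier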

Use the Source!

Why is a unique validation triggering this type of count() query? Within Rails 3.x, this functionality is handled by the UniquenessValidator#validate_each implementation, which checks for records using the model’s exists?() query:

  finder_class.unscoped.where(relation).exists?

The exists?() method is a convention in both ActiveRecord and MongoMapper, checking for any records within the given scope. MongoMapper delegates its querying capability to the Plucky gem, where we can find the exists?() implementation using count():

  def exists?(query_options={})
    !count(query_options).zero?
  end

Root Cause and a Patch to Work Around MongoMapper/Plucky

In SQL, using count() is a nice way to check for the existence of records. Unfortunately, since MongoDB won’t use indexes properly for count(), this incurs a big performance hit on large collections.

I added a MongoMapper patch to work around the issue. We can patch the exists?() method to use find_one(), fetching only _id, instead of the expensive count() path:

module MongoMapper
  module Plugins
    module Querying
      module ClassMethods
        # Performance Hack: count() operations can't use indexes properly.
        # Use find() instead of count() for faster queries via indexes.
        def exists?(query_options={})
          !!only(:_id).find_one(query_options)
        end
      end
    end
  end 
end
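
With the patch applied, uniqueness checks can use the email index via find_one(). A rough sketch of the before/after query shapes (my approximation, not captured server output):

User.exists?(email: "bob@company.com")
# before: { count: "users", query: { email: "bob@company.com" }, fields: null }
# after:  db.users.findOne({ email: "bob@company.com" }, { _id: 1 })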

Password Salting isn’t Just For Servers

06/07/2012 Comments off

Current Events: LinkedIn Password Breach

Everyone has been talking about the massive LinkedIn password breach. Apparently, unknown actors have compromised some subset of the LinkedIn user base and a password hash database has been released into the wild.

What Does it Mean?

The good news is that the leaked hash database isn’t paired with login credentials, so it isn’t immediately exploitable by everyone in the world. The bad news is that this is near-certain proof that a group of bad actors has the full credentials for most of LinkedIn.

Weak Password Storage

It appears this breach was so damaging due to poor algorithm choices by LinkedIn: a simple, unsalted one-way hash of each password, which can be cracked far more easily than proper password storage. There are several great posts about how to do this right as an application developer, but this post is focused on the user.

What Does it Mean For You?

The real threat in breaches like this is that people share the same password on multiple sites. It’s easy enough to recognize when LinkedIn or one site has a breach and change your password, but what else could someone with your email/login gain access to if you re-use passwords like most people do?

What’s the Best Plan to Protect Yourself?

The best way to manage this is to use a password manager with a random password for every site you authenticate to. However, that is too difficult and complex for the majority of people. There’s an easier way: use a few strong secrets to create unique, user-salted passwords.

What’s the Realistic, Everyday Plan?

If you can’t use truly random passwords and a password manager, generate one random eight-character password, then add a word or four to six letters to it to get a unique password for each site. For example, if your strong password were “Nafra6he”, you could make your LinkedIn password “linkedinNafra6he”. It’s just as easy to remember and only a little harder to type. If LinkedIn’s password database is compromised but you use a custom prefix on every other site, those accounts won’t be at risk; under this scheme your Google password would be “googleNafra6he”, for example.
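
The scheme is simple enough to express in a couple of lines of Ruby. As an illustration of the idea only (a sketch, not a vetted security tool):

# Derive a per-site password from one memorized random secret.
# base_secret is the random password you memorize; site is e.g. "linkedin".
def site_password(base_secret, site)
  "#{site}#{base_secret}"
end

site_password("Nafra6he", "linkedin")  # => "linkedinNafra6he"
site_password("Nafra6he", "google")    # => "googleNafra6he"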

Categories: LifeHacks, Security

Mapping Rails 3.0 Commands

03/31/2012 Comments off

Rails 3.0 introduces ./script/rails

When Rails 3.0 was released, all of the individual command utilities (the Rails console, the Rails generator, etc.) were consolidated into a single script:

./script/rails

That’s Annoying

Having typed “./script/console” for the last four years of my life, I find this annoying. It’s a lot easier to tab-complete within the ./script directory to the exact command you want and then fill in the arguments.

My Answer: rails_command_stubs

To solve this, I built a wrapper script you can drop into your Rails ./script directory. You can then symlink commonly used command names to it and call them directly; each stub passes through to the ./script/rails equivalent.

./script/console production
# Calls ./script/rails console production
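
A minimal sketch of how such a stub can work (my illustration, not the actual rails_command_stubs code): the name the script is invoked under selects the ./script/rails subcommand.

#!/usr/bin/env ruby
# Hypothetical stub: symlink this file as ./script/console, ./script/generate, etc.
# The name it was invoked under becomes the ./script/rails subcommand.
command = File.basename($0)  # "console" when run as ./script/console
exec "ruby", File.expand_path("../rails", __FILE__), command, *ARGV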

Enter rails_command_stubs.

Categories: Uncategorized

Resque Queue Priority

02/17/2012 Comments off

Queue Priority

Resque allows each worker process to work a prioritized list of queues. When a job is enqueued, it lands on one particular queue. Each worker scans its queues in the listed order to find the next job to process, which ensures that higher-priority work is processed before lower-priority work.

TL;DR: Resque workers process work in the priority order specified.
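
For example (a sketch with hypothetical job class names), each job class declares the queue it lands on, and a worker started with a prioritized queue list drains them in order:

require 'resque'

# Hypothetical job classes; @queue names the queue each job lands on.
class CriticalJob
  @queue = :critical
  def self.perform(payload); end
end

class BulkJob
  @queue = :low
  def self.perform(payload); end
end

Resque.enqueue(CriticalJob, 42)  # lands on the "critical" queue
Resque.enqueue(BulkJob, 42)      # lands on the "low" queue

# A worker started like this works "critical" before touching "low":
#   QUEUES=critical,high,low rake resque:work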

Categories: Debugging, Operations, Rails

A Year of WOW (20% time) at PatientsLikeMe

01/27/2012 1 comment

What is WOW Week?

PatientsLikeMe has built our own version of Google’s “20% Time” that we call “WOW Week”. WOW Week is a week of unstructured development time for engineers, where they can work on anything they choose to improve our products. This lets people focus on their personal passions or explore riskier ideas. See my more detailed post about what WOW Week is and how it works for PatientsLikeMe.

2011 WOW Week Projects in Review

It’s easy to pay lip-service to the idea of 20% time, but PatientsLikeMe actually dedicates entire weeks to it. This post showcases what a year of WOW produced in 2011. Each of these projects was initiated by an engineer during that time, and most made it into production.

Clinical Trials (In Production)

Credit: James Kebinger and Jeff Dwyer

Provide a friendly search interface to the National Clinical Trials registry and automatically match PatientsLikeMe patients to relevant trials they qualify for.


WOW Week at PatientsLikeMe

01/20/2012 2 comments

What is WOW Week?

PatientsLikeMe has built our own version of Google’s “20% Time” that we call “WOW Week”. WOW Week is a week of unstructured development time for engineers, where they can work on anything to improve our products as long as they demo their progress in front of the company at the end of the week.

The engineering team works in two-week development sprints. After three sprints in a row, we hold a “Technical Debt” week and a WOW Week.

Why a Week at a time versus 20% Time?

It’s easy to pay lip-service to the concept of 20% time for engineers while scheduling a full load of work; I’ve seen it happen many times at other companies. PatientsLikeMe avoids this pitfall by carving out a public block of time for the entire company. See the 2011 WOW showcase for how much we built.

Scheduling a complete week allows a single context-switch into innovation mode for everyone. This maximizes the value of this time, instead of dividing it into smaller chunks that are diluted by context switching and deadlines.


How PatientsLikeMe.com Monitors Ops w/ PagerDuty

04/26/2011 1 comment

PagerDuty Dispatch

Summary (TL;DR)

We have a network of production monitoring tools at patientslikeme.com, where monit, NewRelic, and Pingdom feed alerts through PagerDuty to produce e-mail, SMS, and pager alerts for production issues. PagerDuty has a ticketing system that assigns a given problem to a single person. It’s awesome.

Life Before PagerDuty

Whenever a background worker was automatically restarted, we deployed a fix, or any other minor system event occurred, a handful of e-mails would go out to our whole Ops team, and most of us would get an SMS for each. We mostly ignored all of this noise, so when a genuine emergency occurred, we often didn’t react immediately. And because we were all getting alerted, often two or three of us would respond in a piling-on effect. This sucks.

Principles of Proper Ops Monitoring

  1. People only get alerts for serious issues requiring human intervention
  2. Only one person is alerted at a time
  3. Serious issues should wake you up at 4 AM

