
MongoDB Indexing, count(), and unique validations

11/10/2012

Slow Queries in MongoDB

I rebuilt the database tier powering App Cloud earlier this week and uncovered some performance problems from slow queries. As usual, two or three stemmed from missing indexes and were easily fixed by adding index coverage. MongoDB has decent index functionality for most use cases.
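
For reference, here’s roughly how we add that index coverage with MongoMapper; a minimal sketch (the model and key names are illustrative, not our actual schema):

class Account
  include MongoMapper::Document

  key :owner_id, ObjectId

  # Declare an index so lookups by owner_id hit the index
  # instead of scanning the whole collection.
  ensure_index :owner_id
end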

Investigating Slow count() queries

Unfortunately, I also noticed a large number of slow count() queries like:

{ 
  count: "users", 
  query: { 
    email: "bob@company.com", 
    _id: { $ne: ObjectId('509e83e132a5752f5f000001') }
  }, 
  fields: null 
}

Investigating our users collection, I saw proper indexes on both _id and email. Unfortunately, MongoDB can’t use indexes properly for count() operations. That’s a serious drawback, but not one I can change.

Where were these odd looking queries coming from? Why would we be looking for a user with a given email but NOT a given id?

The uniqueness validation on the email key of the User document (and many other models) was the culprit. Whenever a User is created or updated, ActiveModel verifies there are no other Users with the given email:

class User
  include MongoMapper::Document

  key :email, String, unique: true
end
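
Under the hood, unique: true wires up a standard uniqueness validation on the key, roughly equivalent to declaring it by hand:

class User
  include MongoMapper::Document

  key :email, String

  # Same effect as unique: true on the key above
  validates_uniqueness_of :email
end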

Use the Source!

Why is a unique validation triggering this type of count() query? Within Rails 3.x, this functionality is handled by the UniquenessValidator#validate_each implementation, which checks for records using the model’s exists?() query:

  finder_class.unscoped.where(relation).exists?

The exists?() method is a convention in both ActiveRecord and MongoMapper, checking for any records within the given scope. MongoMapper delegates its querying capability to the Plucky gem, where we find the exists?() implementation built on count():

  def exists?(query_options={})
    !count(query_options).zero?
  end

Root Cause and a Patch to Work Around MongoMapper/Plucky

In SQL, count() is a cheap way to check for the existence of records. Unfortunately, since MongoDB won’t use indexes properly for count(), the same approach incurs a big performance hit on large collections.

I added a MongoMapper patch to work around the issue. We can patch the exists?() method to use find_one(), fetching only the _id field, instead of the expensive count() path:

module MongoMapper
  module Plugins
    module Querying
      module ClassMethods
        # Performance Hack: count() operations can't use indexes properly.
        # Use find() instead of count() for faster queries via indexes.
        def exists?(query_options={})
          !!only(:_id).find_one(query_options)
        end
      end
    end
  end 
end
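
With the patch loaded (we drop it in a Rails initializer; the file name is up to you), the uniqueness validation now issues an index-friendly find instead of a count. A quick console sanity check (the email value is illustrative):

  User.exists?(email: "bob@company.com")  # now a find_one limited to _id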

Resque Queue Priority

02/17/2012

Queue Priority

Resque allows each worker process to work from a prioritized list of queues. When a job is enqueued, it lands on one particular queue. Each worker scans the queues in priority order to find the next job to process, ensuring that higher-priority work is handled before lower-priority work.
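
As a concrete sketch (the queue and job names here are illustrative): each job class declares its queue, and a worker started with QUEUES=critical,mailers,low rake resque:work reserves jobs from critical first whenever any are waiting:

class ChargeCard
  @queue = :critical

  def self.perform(order_id)
    # charge the customer's card...
  end
end

class SendNewsletter
  @queue = :low

  def self.perform(user_id)
    # send the newsletter...
  end
end

Resque.enqueue(ChargeCard, 42)     # lands on the "critical" queue
Resque.enqueue(SendNewsletter, 7)  # lands on the "low" queue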

TL;DR: Resque workers process work in the priority order specified.

Categories: Debugging, Operations, Rails

WOW Week at PatientsLikeMe

01/20/2012 2 comments

What is WOW Week?

At PatientsLikeMe we’ve built our own version of Google’s “20% Time” that we call “WOW Week”. WOW Week is a week of unstructured development time for engineers: they can work on anything that improves our products, as long as they demo their progress in front of the company at the end of the week.

The engineering team works in 2-week long development sprints. After three development sprints in a row, we have a “Technical Debt” week and a WOW Week.

Why a Week at a Time versus 20% Time?

It’s easy to pay lip service to the concept of 20% time for engineers while scheduling a full load of work. I’ve seen this happen many times at other companies. PatientsLikeMe avoids this pitfall by carving out a public block of time for the entire company. See the 2011 WOW showcase for a sample of what we’ve built.

Scheduling a complete week allows a single context-switch into innovation mode for everyone. This maximizes the value of this time, instead of dividing it into smaller chunks that are diluted by context switching and deadlines.


How PatientsLikeMe.com Monitors Ops w/ PagerDuty

04/26/2011 1 comment

PagerDuty Dispatch

Summary (TL;DR)
We have a network of production monitoring tools at patientslikeme.com: monit, New Relic, and Pingdom feed alerts through PagerDuty, which produces e-mail, SMS, and pager alerts for production issues. PagerDuty has a ticketing system to assign a given problem to a single person. It’s awesome.

Life Before PagerDuty
Whenever a background worker was automatically restarted, we deployed a fix, or any minor system event occurred, a handful of e-mails went out to our whole Ops team, and most of us got an SMS for each one. We mostly ignored all of this noise. When a genuine emergency occurred, we often didn’t react immediately. And because we were all alerted, two or three of us would often respond in a piling-on effect. This sucks.

Principles of Proper Ops Monitoring

  1. People only get alerts for serious issues requiring human intervention
  2. Only one person is alerted at a time
  3. Serious issues should wake you up at 4 AM


Guerilla Scrum: Minimum Viable Process

04/13/2011

TL;DR
You can start with the minimum viable process right now (Standups, a Backlog, and Demo/Retro), then use this foundation to build a customized process framework that works for your team.

Frustrations with Heavy Process
The software world seems split between process dogmatists and pragmatists. Dogmatists believe that the entire canon of Scrum practices must be enacted as a whole, or “you’re doing it wrong.”

I couldn’t disagree more: every process was designed to solve a problem in a particular context. Yet the dogmatic attitude runs rampant, and many teams chafe under the weight of heavy process. This post aims to enumerate the minimum viable Scrum process.

Think Critically: A Process Should Solve a Problem
Any process or rule should solve a specific, measurable problem. Processes should exist to make your life better. You should know the problem(s) each process is intended to solve and answer:

  1. Is it working?
  2. Do you still need it?

Guerilla Scrum: Core Processes
I call these core Scrum practices Guerilla Scrum: the minimum viable process to get started and grow what you need:

  1. Daily Standups
  2. Single, Prioritized Backlog
  3. Fixed Iterations
  4. Demo & Retrospective


What Makes a Great Startup Engineer?

03/20/2011

(The following is an answer to this question on Quora)

What Makes a Great Startup Engineer?
Working in a startup is both difficult and awesome for the same reasons: very little process/politics, but always more work than anyone has time to do.

  1. Organized & Driven – aka they can Get Shit Done
  2. Smart
  3. Energetic/Passionate
  4. Quick Learner
  5. Not an Asshole: you should enjoy spending time with them


Passenger Resource Collision

01/06/2011

Passenger and Smart Spawn Mode
Phusion Passenger provides an intelligent application-server worker pool on top of Apache for hosting Rails apps. It uses a clever forking process spawner with Ruby Enterprise Edition (REE) to pre-load your Rails application and environment in a parent process before forking off child workers.

This lets you load your application once for N workers, speeding up startup time and dramatically reducing memory usage via REE’s copy-on-write support.

Preventing Resource Collision
However, there is a hidden risk in forking children this way: each child inherits an initialized Rails environment from the parent, which can cause resource duplication or collision. Passenger resets the database connection out of the box, so each Rails worker has its own ActiveRecord connection pool.

However, other resources like Redis or memcached are NOT reset or protected this way. When deploying an application at production scale, it’s essential to reset these connections after forking; otherwise, each of 8+ workers on one server could end up re-using the same connection or file handle.

PatientsLikeMe’s Solution: a Rails Initializer to Reset Resources

require 'passenger_safe_resource'

# Initialize Redis
PassengerSafeResource.load('redis') do
  env = ENV["RACK_ENV"] || ENV["RAILS_ENV"] || "development"
  redis_conf = YAML.load_file("config/redis.yml")[env]
  fail "No configuration for #{env}" unless redis_conf
  host, port, db = redis_conf.split(':')

  REDIS = Redis.new(:host => host, :port => port, :thread_safe => true, :db => db)
end

# Reset Vanity Redis Connection
PassengerSafeResource.load('vanity') do
  Vanity.playground.reconnect!
end

# Reset Memcached Connection
PassengerSafeResource.load('memcache') do
  PassengerSafeResource.reset_memcache(Rails.cache)
end

Helper to Manage Passenger events

module PassengerSafeResource
  # Helper to reset memcache connection
  def self.reset_memcache(cache)
    return unless cache.kind_of?(ActiveSupport::Cache::MemCacheStore)

    cache.instance_variable_get(:@data).reset
  end

  # Helper to load/reset a resource with Passenger
  def self.load(resource_name, &block)
    if defined?(PhusionPassenger)
      PhusionPassenger.on_event(:starting_worker_process) do |forked|
        if forked
          Rails.logger.info "PassengerSafeResource(#{resource_name}): Forking Child, loading in child"
          yield
        else
          Rails.logger.info "PassengerSafeResource(#{resource_name}): Non-Forking Spawn, NOOP"
        end
      end
    else
      Rails.logger.info "PassengerSafeResource(#{resource_name}): Non-Passenger mode, loading resource"
      yield
    end
  end
end
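
Note that in non-Passenger contexts (rake tasks, the console, tests), the else branch loads the resource immediately, so the same initializers work everywhere without modification.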