Archive

Archive for the ‘Database’ Category

ActiveModel Type Coercion and API Validation

02/03/2013 1 comment

ActiveModel Type Coercion

ActiveModel is a powerful ORM that looks up the data types for model values and applies automatic type coercion and casting for user data. Data coming from Web requests as JSON or query parameters are usually cast from Strings into other datatypes, implicitly, during assignment. This is a helpful feature 99% of the time, but can be frustrating when invalid values are silently discarded. Especially if you’re trying to use model level validations for API validation.

Optional API Value Validation

As we built the API for App Cloud, we put together rules for handling JSON input and validating data. For optional user input values, we decided that if a user provided invalid data, we should return a validation error vs. silently discarding invalid values. For example, if there was a “duration” value, that was an Integer number of days a value of “foo” should produce an error. However, the default behavior is to silently discard a value like “foo” assigned to a numeric field.

We know this has happened when the coerced value is nil but there is some user data present before the coercion. There’s an example Validator below implementing this logic.

A Coerced Value Validator

Digging into the source code, ActiveModel first assigns the value for a field ‘duration’ into an attribute ‘duration_before_type_case’, which preserves the raw value before coercing the input based on the type of the attribute. Using this, we can mix-in validations for models in our API that will mark the record invalid if data is silently discarded during type coercion. Before this, providing ‘foo’ as an integer value results in nil but no errors. Now, we’ll capture the error on this input.

class CoercedValueValidator < ActiveModel::Validator
  def validate(record)
    coercible_attrs = record.keys.select do |k,v|
      [Integer, Float, Date, Time].include?(v.type)
    end.map(&:first)

    coercible_attrs.each do |attr|
      if record.send("#{attr}_before_type_cast").present? && record.send(attr).nil?
        record.errors.add(attr, 'is invalid')
      end
    end
  end
end
Advertisements
Categories: Database, Rails

MongoDB Indexing, count(), and unique validations

11/10/2012 Comments off

Slow Queries in MongoDB

I rebuilt the database tier powering App Cloud earlier this week and uncovered some performance problems caused by slow queries. As usual, two or three were caused by missing indexes and were easily fixed by adding index coverage. MongoDB has decent index functionality for most use cases.

Investigating Slow count() queries

Unfortunately, I noticed a large variety of slow queries issuing complex count() queries like:

{ 
  count: "users", 
  query: { 
    email: "bob@company.com", 
    _id: { $ne: ObjectId('509e83e132a5752f5f000001') }
  }, 
  fields: null 
}

Investigating our users collection, I saw a proper index on _id and email. Unfortunately, MongoDB can’t use indexes properly for count() operations. That’s a serious drawback, but not one I can change.

Where were these odd looking queries coming from? Why would we be looking for a user with a given email but NOT a given id?

The uniqueness validation on the email key of the User document and many other models was the culprit. Whenever a User is created/updated, ActiveModel is verifying there are no other Users with the given email:

class User
  include MongoMapper::Document

  key :email, String, unique: true
end

Use the Source!

Why is a unique validation triggering this type of count() query? Within Rails 3.x, this functionality is handled by the UniquenessValidator#validate_each implementation, which checks for records using the model’s exists?() query:

  finder_class.unscoped.where(relation).exists?

The exists?() method is a convention in both ActiveRecord and MongoMapper, checking for any records within the given scope. MongoMapper delegates it’s querying capability to the Plucky gem, where we can find the exists?() implementation using count():

  def exists?(query_options={})
    !count(query_options).zero?
  end

Root Cause and a Patch to work-around MongoMapper/Plucky

In SQL, using count() is a nice way to check for the existence of records. Unfortunately, since MongoDB won’t use indices properly for count(), this incurs a big performance hit on large collections.

I added a MongoMapper patch to work-around the issue. We can patch the exists?() method to use find_one() without any fields instead of the expensive count() path:

module MongoMapper
  module Plugins
    module Querying
      module ClassMethods
        # Performance Hack: count() operations can't use indexes properly.
        # Use find() instead of count() for faster queries via indexes.
        def exists?(query_options={})
          !!only(:_id).find_one(query_options)
        end
      end
    end
  end 
end