Archive for the ‘Operations’ Category

Adventures in DHCP

12/08/2010

Internet Down
When you work for a startup, office IT crises can be as important as production operations.

We all arrived in the office this morning to find out there was no internet access. Further debugging showed that all of the DHCP lease requests were failing. However, the internet connection was working if you manually configured your IP/DNS info.

Background
We had installed a new DHCP/DNS server last Friday night, so we immediately suspected some failure there. Thankfully, we had the old server as a hot backup. Surprisingly, the hot backup server didn’t work either.

We fired up a network sniffer and saw that another device on the office network was intercepting the clients' DHCP discovery requests and responding negatively to every attempt before the real server could.

DHCP is a broadcast protocol, so it’s easy for a misconfigured machine or router to hijack valid network requests. All of the evidence pointed to a similar thing happening here.
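
To make the failure mode concrete, here is a minimal Ruby sketch of the kind of check a sniffer lets you do by eye: broadcast a DHCPDISCOVER, then read the message type and server identifier out of every reply so unexpected responders stand out. This is not the tool we used. The helper names (build_dhcp_discover, parse_dhcp_reply, probe_dhcp_servers) are hypothetical, the packet layout follows RFC 2131, and the probe assumes a modern Ruby, root privileges, and a free UDP port 68.

```ruby
require 'socket'

DHCP_MAGIC_COOKIE  = [99, 130, 83, 99].pack('C4')
DHCP_MESSAGE_TYPES = { 2 => :offer, 5 => :ack, 6 => :nak }.freeze

# Build a minimal broadcast DHCPDISCOVER payload (244 bytes).
def build_dhcp_discover(mac_bytes, xid)
  header  = [1, 1, 6, 0, xid, 0, 0x8000].pack('C4Nnn') # op, htype, hlen, hops, xid, secs, flags
  addrs   = [0, 0, 0, 0].pack('N4')                    # ciaddr, yiaddr, siaddr, giaddr
  chaddr  = mac_bytes.pack('C6') + ("\0".b * 10)       # client MAC padded to 16 bytes
  legacy  = "\0".b * 192                               # unused sname and file fields
  options = [53, 1, 1, 255].pack('C4')                 # option 53 = DISCOVER, then end marker
  header + addrs + chaddr + legacy + DHCP_MAGIC_COOKIE + options
end

# Extract the message type (option 53) and server identifier (option 54) from a reply.
# Returns nil when the payload is too short or lacks the DHCP magic cookie.
def parse_dhcp_reply(payload)
  bytes = payload.bytes
  return nil unless bytes.length >= 240 && bytes[236, 4] == DHCP_MAGIC_COOKIE.bytes
  result = {}
  i = 240
  while i < bytes.length
    code = bytes[i]
    break if code == 255 # end option
    if code.zero?        # pad option has no length byte
      i += 1
      next
    end
    length = bytes[i + 1]
    break if length.nil?
    value = bytes[i + 2, length] || []
    result[:type]   = DHCP_MESSAGE_TYPES.fetch(value[0], :other) if code == 53
    result[:server] = value.join('.') if code == 54
    i += 2 + length
  end
  result
end

# Broadcast one DISCOVER and collect replies until `timeout` seconds pass in silence.
# Assumption: run as root on a host where nothing else is bound to UDP port 68.
def probe_dhcp_servers(mac_bytes, timeout = 3)
  sock = UDPSocket.new
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_BROADCAST, true)
  sock.bind('0.0.0.0', 68)
  sock.send(build_dhcp_discover(mac_bytes, rand(2**32)), 0, '255.255.255.255', 67)
  replies = []
  while IO.select([sock], nil, nil, timeout)
    data, addr = sock.recvfrom(1500)
    reply = parse_dhcp_reply(data)
    replies << reply.merge(:from => addr[3]) if reply
  end
  replies
ensure
  sock.close if sock
end
```

More than one distinct :from or :server value in the result, or a :nak from an address that is not your DHCP server, points at a rogue responder like the one we were chasing.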

At the same time, we had just opened up a new room in our offices and had wired drops put in. There was a new patch panel with wiring for several drops in the room. I was immediately suspicious of a new router/device in this area due to the timing. However, we didn’t find any new devices.

Mystery Solved
It turned out that someone had wired the new patch panel up to an existing Time Capsule/AirPort hub, but plugged it into the Uplink port instead of a hub port. This caused the AirPort to configure itself as a router, turn on DHCP forwarding to the dead network (the new room), and intercept all DHCP requests from the entire office.

Categories: Debugging, Operations

REE Cuts Rails Test Time in Half

12/07/2010

Ruby Enterprise Edition (REE)
I spent the night after work switching our build/stage server to Ruby Enterprise Edition. I switched both our Hudson based builds and our Passenger staging servers.

REE is well known for its superior garbage collection and memory management, but I was shocked to see how much faster it executed in Ruby CPU-bound contexts. We saw about a 45% drop in runtime, taking our average build times from 55min to 30min.

Build/Test Times Cut Almost in Half
Hudson Screen Shot

Performance Drill-down

  • Unit Tests: From 1036s to 579s (to run 3880 tests)
  • Functional Tests: From 844s to 448s (to run 860 tests)
  • Cucumber Tests: From 498s to 255s (to run 1078 steps)
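
As a quick sanity check on these numbers, the savings per suite can be worked out with a few lines of Ruby. The timings are copied from the list above; the helper name percent_saved is just for illustration.

```ruby
# Timings in seconds, copied from the drill-down list: [before, after].
SUITE_TIMINGS = {
  'Unit'       => [1036, 579],
  'Functional' => [844, 448],
  'Cucumber'   => [498, 255]
}.freeze

# Percentage of runtime saved, rounded to one decimal place.
def percent_saved(before, after)
  ((before - after) * 100.0 / before).round(1)
end

SUITE_TIMINGS.each do |suite, (before, after)|
  puts format('%-10s %5ds -> %4ds  (%.1f%% less time)', suite, before, after, percent_saved(before, after))
end

# Combined figure across the three suites.
total_before = SUITE_TIMINGS.values.map(&:first).inject(:+)
total_after  = SUITE_TIMINGS.values.map(&:last).inject(:+)
puts format('%-10s %5ds -> %4ds  (%.1f%% less time)', 'Total', total_before, total_after, percent_saved(total_before, total_after))
```

Each suite lands between roughly 44% and 49% less runtime, which lines up with the overall build dropping from 55min to 30min.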

The nginx/REE/Passenger stack is known as the best-of-breed production Rails stack, but I can’t believe how much of a benefit we’ve gotten from introducing the same components into our build and staging systems.

This effort started as a functional testing pass to verify that our system performed correctly under REE; I never expected it to deliver such massive performance gains on its own.

Tips/Tricks & Gotchas

  • RVM is the best way to test/incrementally introduce a new Ruby interpreter
  • If you’re using bundler w/ file-system bundles (via --path) you need to completely rebuild them when you switch Ruby interpreters
  • If you have a previously installed Passenger Apache module, you need to rebuild/reinstall it as an REE-based Apache module

Footnotes:
Original Ruby Version
ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux]
REE Version
ruby 1.8.7 (2010-04-19 patchlevel 253) [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2010.02
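
When the interpreter is switched in several places (Hudson builds, Passenger staging), it helps to confirm which one a given process is actually running. A minimal sketch, assuming the REE banner always contains the phrase shown in the version string above; the method name ree? is hypothetical, and RUBY_DESCRIPTION is the same string that ruby -v prints.

```ruby
# True when the given `ruby -v` style description comes from Ruby Enterprise Edition.
# Assumption: REE builds always include this phrase in their description string.
def ree?(description = RUBY_DESCRIPTION)
  description.include?('Ruby Enterprise Edition')
end

puts "Running under REE: #{ree?}"
```

Dropping a check like this into a build step or a staging health page makes it obvious when a stale interpreter or Apache module is still in use.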

Cache Segmentation for Rails Apps

11/23/2010 3 comments

Problem: Cache Correctness vs. Cache Persistence
In many Rails apps, there are two competing caching requirements: cache as much static data as possible, and flush any cached content that changes with each release (cached models, for example). Many simple deployment strategies flush all cache data to prevent serialization conflicts with models, but this incurs an expensive run of cache misses until the cache is warmed up again.

At PatientsLikeMe, we faced both of these problems. We have a high volume of action-cached data with a long shelf life that’s difficult to pre-cache. We also have a number of volatile model-based caches that need to be flushed every time we deploy.

Solution: Segment Stable vs. Volatile Caches
I built this simple segmented cache strategy to meet our needs: a stable memcache namespace for any data that should persist between deploys, and a volatile cache namespace that’s swapped on every deploy.

Rails Cache Configuration Example

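# Derive the volatile namespace from the REVISION file, if one is present.
revision_file = Rails.root.join('REVISION')
if File.exist?(revision_file)
  # Take the last six hex characters of the revision (nil if nothing matches).
  revision = File.read(revision_file)[/[a-f0-9]{6}$/]
end
# Default Rails cache (models and most things): the namespace changes with each revision.
config.cache_store = :mem_cache_store, memcache_host, { :namespace => "volatile-#{ revision || '0' }" }
# Controller caches (action and fragment): the namespace persists between deploys.
config.action_controller.cache_store = :mem_cache_store, memcache_host, { :namespace => 'stable' }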

EDIT: Thanks to Jeremy for refactoring this sample.

You’ll notice this sets one Cache object for the default Rails cache (used by models and most things) and a custom, stable Cache object for anything in a Controller (Action and Fragment Caches).
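
To see why swapping only the volatile namespace is enough, here is a small self-contained simulation. It is a sketch, not our production code: a plain Hash stands in for the shared memcached server, the NamespacedCache class and volatile_namespace helper are hypothetical, and the revision strings are made up for illustration.

```ruby
# Hypothetical stand-in for a namespaced client on a shared memcached instance.
class NamespacedCache
  def initialize(backend, namespace)
    @backend = backend
    @namespace = namespace
  end

  def write(key, value)
    @backend["#{@namespace}:#{key}"] = value
  end

  def read(key)
    @backend["#{@namespace}:#{key}"]
  end
end

# Same idea as the Rails config: last six hex characters of the revision, or '0'.
def volatile_namespace(revision_text)
  "volatile-#{revision_text.to_s[/[a-f0-9]{6}$/] || '0'}"
end

memcache = {} # survives across "deploys", like the real memcached server

# Deploy 1: fill both caches.
stable = NamespacedCache.new(memcache, 'stable')
before = NamespacedCache.new(memcache, volatile_namespace("4f2a9c1d3b7e\n"))
stable.write('pages/about', '<html>...</html>')
before.write('user/1', 'serialized model v1')

# Deploy 2: only the volatile namespace changes.
after = NamespacedCache.new(memcache, volatile_namespace("9e8d7c6b5a40\n"))
puts after.read('user/1').inspect       # nil: the old model entry is no longer reachable
puts stable.read('pages/about').inspect # still served from the stable namespace
```

Note that nothing is deleted on deploy: entries under the old volatile namespace simply become unreachable and are left for the cache server to evict, while the stable entries stay warm.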

Categories: Caching, Operations