A Year of WOW (20% time) at PatientsLikeMe

01/27/2012 1 comment

What is WOW Week?

PatientsLikeMe has built its own version of Google’s “20% Time” that we call “WOW Week”. WOW Week is a week of unstructured development time for engineers, where they can work on anything they choose to improve our products. This lets people focus on their personal passions or explore riskier ideas. See my more detailed post about what WOW Week is and how it works for PatientsLikeMe.

2011 WOW Week Projects in Review

It’s easy to pay lip-service to the idea of 20% time, but PatientsLikeMe actually dedicates entire weeks to it. This post showcases what a year of WOW produced in 2011. Each of these projects was initiated by an engineer during their own WOW time, and most made it into production.

Clinical Trials (In Production)

Credit: James Kebinger and Jeff Dwyer

Provide a friendly search interface to the National Clinical Trial registry and automatically match patients within PatientsLikeMe to relevant trials they qualify for.

Clinical Trials

Read more…

WOW Week at PatientsLikeMe

01/20/2012 2 comments

What is WOW Week?

PatientsLikeMe has built its own version of Google’s “20% Time” that we call “WOW Week”. WOW Week is a week of unstructured development time for engineers, where they can work on anything to improve our products as long as they demo their progress in front of the company at the end of the week.

The engineering team works in 2-week long development sprints. After three development sprints in a row, we have a “Technical Debt” week and a WOW Week.

Why a Week at a time versus 20% Time?

It’s easy to pay lip-service to the concept of 20% time for engineers while scheduling a full load of work. I’ve seen this happen many times at other companies. PatientsLikeMe avoids this pitfall by carving out a public block of time for the entire company. See the 2011 WOW showcase for how much we built.

Scheduling a complete week allows a single context-switch into innovation mode for everyone. This maximizes the value of that time, rather than dividing it into smaller chunks that get diluted by context switching and deadlines.

Read more…

How PatientsLikeMe.com Monitors Ops w/ PagerDuty

04/26/2011 1 comment

PagerDuty Dispatch

Summary (TL;DR)
We have a network of production monitoring tools at patientslikeme.com, where monit, NewRelic, and Pingdom feed alerts through PagerDuty, which produces e-mail, SMS, and pager notifications for production issues. PagerDuty has a ticketing system to assign a given problem to a single person. It’s awesome.
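
Most of these tools feed PagerDuty through their own built-in e-mail or API integrations, but anything that can make an HTTP call can open an incident too. Here is a rough sketch against PagerDuty’s generic events API; the service key and alert text below are placeholders, not our real configuration:

# Hypothetical example: open a PagerDuty incident from a script or cron job.
# YOUR_SERVICE_KEY is a placeholder for the key on a PagerDuty "Generic API" service.
curl -s -H "Content-Type: application/json" \
  -d '{"service_key": "YOUR_SERVICE_KEY", "event_type": "trigger", "description": "Background queue backed up", "incident_key": "queue-backup"}' \
  https://events.pagerduty.com/generic/2010-04-15/create_event.json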

Life Before PagerDuty
Whenever a background worker was automatically restarted, we deployed a fix, or any other minor system event occurred, a handful of e-mails went out to our whole Ops team, and most of us got an SMS for each one. We mostly ignored all of this noise, so when a genuine emergency occurred, we often didn’t react immediately. And because we were all being alerted, often 2-3 of us would respond in a piling-on effect. This sucks.

Principles of Proper Ops Monitoring

  1. People only get alerts for serious issues requiring human intervention
  2. Only One Person Alerted at a Time
  3. Serious Issues Should Wake You Up at 4AM

Read more…

Guerilla Scrum: Minimum Viable Process

04/13/2011 Comments off

TL;DR
You can start with the minimum viable process right now (Standups, a Backlog, and Demo/Retro). You can use this foundation to build a customized process framework that works for your team.

Frustrations with Heavy Process
The software world seems split between process dogmatists and pragmatists. Dogmatists believe that the entire canon of Scrum practices must be enacted as a whole, or “you’re doing it wrong.”

I couldn’t disagree more – every process was designed to solve a problem in a particular context. Yet the dogmatic attitude runs rampant, and many teams chafe under the weight of heavy process. This post aims to enumerate the minimum viable Scrum process.

Think Critically: A Process Should Solve a Problem
Any process or rule should solve a specific, measurable problem. Processes should exist to make your life better. You should know the problem(s) each process is intended to solve and answer:

  1. Is it working?
  2. Do you still need it?

Guerilla Scrum: Core Processes
I call these core Scrum processes Guerilla Scrum, the minimum viable process to get started and generate what you need:

  1. Daily Standups
  2. Single, Prioritized Backlog
  3. Fixed Iterations
  4. Demo & Retrospective

Read more…

Categories: Uncategorized Tags:

What Makes a Great Startup Engineer?

03/20/2011 1 comment

(The following is an answer to this question on Quora)

What Makes a Great Startup Engineer?
Working in a startup is both difficult and awesome for the same reasons: very little process/politics, but always more work than anyone has time to do.

  1. Organized & Driven – aka they can Get Shit Done
  2. Smart
  3. Energetic/Passionate
  4. Quick Learner
  5. Not an Asshole – you should enjoy spending time with them

Read more…

Passenger Resource Collision

01/06/2011 1 comment

Passenger and Smart Spawn Mode
Phusion Passenger is an intelligent application server for hosting Rails apps on top of Apache, managing a pool of worker processes. Its clever forking spawner, combined with Ruby Enterprise Edition, pre-loads your Rails application and environment in a parent process before forking off child workers.

This lets you load your application once for N workers, speeding up worker startup and dramatically reducing memory usage via REE’s copy-on-write-friendly garbage collector.
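
If you want to see the sharing in action, Passenger ships with a passenger-memory-stats tool that reports per-worker memory, including private (non-shared) usage; the exact output format varies by Passenger version:

passenger-memory-stats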

Preventing Resource Collision
However, there is a hidden risk in forking children this way. Each child inherits an initialized Rails environment from the parent, which can cause resource duplication or collision. Passenger resets the database connection out of the box – each Rails worker gets its own ActiveRecord connection pool.

Other resources like Redis or Memcached are NOT reset or protected this way. When deploying an application at production scale, it’s essential to reset them yourself after the fork. Otherwise, each of 8+ workers on one server could be trying to re-use the same connection or file handle.

PatientsLikeMe solution – Rails Initializer to Reset Resources

require 'passenger_safe_resource'

# Initialize Redis
PassengerSafeResource.load('redis') do
  env = ENV["RACK_ENV"] || ENV["RAILS_ENV"] || "development"
  redis_conf = YAML.load_file("config/redis.yml")[env]
  fail "No configuration for #{env}" unless redis_conf
  host, port, db = redis_conf.split(':')

  REDIS = Redis.new(:host => host, :port => port, :thread_safe => true, :db => db)
end

# Reset Vanity Redis Connection
PassengerSafeResource.load('vanity') do
  Vanity.playground.reconnect!
end

# Reset Memcached Connection
PassengerSafeResource.load('memcache') do
  PassengerSafeResource.reset_memcache(Rails.cache)
end

Helper to Manage Passenger events

module PassengerSafeResource
  # Helper to reset memcache connection
  def self.reset_memcache(cache)
    return unless cache.kind_of?(ActiveSupport::Cache::MemCacheStore)

    cache.instance_variable_get(:@data).reset
  end

  # Helper to load/reset a resource with Passenger
  def self.load(resource_name, &block)
    if defined?(PhusionPassenger)
      PhusionPassenger.on_event(:starting_worker_process) do |forked|
        if forked
          Rails.logger.info "PassengerSafeResource(#{resource_name}): Forking Child, loading in child"
          yield
        else
          Rails.logger.info "PassengerSafeResource(#{resource_name}): Non-Forking Spawn, NOOP"
        end
      end
    else
      Rails.logger.info "PassengerSafeResource(#{resource_name}): Non-Passenger mode, loading resource"
      yield
    end
  end
end
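
Both snippets presumably live under config/initializers/ (an assumption – the post doesn’t say where they’re checked in). After a deploy, you can confirm the fork-time reload actually ran by looking for the log lines above:

grep 'PassengerSafeResource' log/production.log | tail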
Categories: Uncategorized Tags:

Best of 2010 – Top 5 Books/Albums/Films

12/31/2010 1 comment

Books

  1. William Gibson – Zero History

    This book is the culmination of the trilogy of present-day futurist fiction from Gibson, a great adventure through the streets of London and Paris in search of underground fashion and conspiracy.

  2. Freedom – Jonathan Franzen

    This is the first novel to live up to The Corrections: beautiful writing, horribly true-to-life characters, and a gift for capturing the modern American situation. Deeply human, flawed characters grapple with building a life, being parents, and finding meaning despite the unintended choices that lead us to our lives.

  3. Although Of Course You End Up Becoming Yourself – David Lipsky

    Less a book than a fairly raw presentation of a 48-hour road trip/interview with David Foster Wallace; a solid meal for anyone interested in the world of writers, the life of DFW, or interesting conversations about making a life for yourself.

  4. Read more…

Categories: Uncategorized

File Handle Leaks in Hudson

12/17/2010 Comments off

Hudson is Awesome
We recently switched from cruisecontrol.rb to Hudson and have been much happier. It’s more reliable and we get much better resource management using build queues.

Hudson Failure
However, this week Hudson has stopped responding several times with the following error:

Dec 17, 2010 12:41:29 PM hudson.triggers.SCMTrigger$Runner runPolling
SEVERE: Failed to record SCM polling
hudson.plugins.git.GitException: Error retrieving tag names
        at hudson.plugins.git.GitAPI.getTagNames(GitAPI.java:650)
   ... snip ...
Caused by: java.io.IOException: Cannot run program "git" (in directory "/home/cruise/.hudson/server/jobs/plm-website-master/workspace"): java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
   ... snip ...
Caused by: java.io.IOException: java.io.IOException: error=24, Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
   ... snip ...

Proximate Cause
Hudson is bound by the default per-process open file limit of 1024 on this Linux box. When it hits the limit, failures like this occur and prevent it from forking child processes. Something was leaking file handles: lsof showed handles allocated all the way up to the limit, 99% of which were pipes.
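
For reference, here is roughly how to check the limit and see what is eating the handles (the pgrep pattern is a guess – match however Hudson runs on your box):

# Default per-process open file limit for new processes (typically 1024)
ulimit -n

# Count Hudson's open handles, and how many of them are pipes (lsof lists pipes as FIFO)
lsof -p $(pgrep -f hudson | head -1) | wc -l
lsof -p $(pgrep -f hudson | head -1) | grep -c FIFO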

Root Cause
Whenever Hudson forks off a child process, it keeps monitoring the child’s stdout/stderr streams, even after the build step finishes. So if you spawn off a daemonized process and don’t close out its output streams, you will leak 2-3 file handles on every execution. We have a simple Ruby script, spawned as a post-build step, that reports build status in our Campfire chatroom.

It was a naive script, and we weren’t properly closing out its file handles.

We switched the Post-build script execution from:

ruby campfire.rb &

To:

./scrub_fds.sh ruby campfire.rb &

Where this is scrub_fds.sh:

#!/bin/bash
# bash, not sh: the brace expansions below are a bashism

eval exec {0..2}\>/dev/null # Point stdin/stdout/stderr at /dev/null
eval exec {3..255}\>\&-     # Close ALL other file descriptors

"$@"
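
If the spawned script only inherits the standard streams (which seemed to be our case), a plain redirect on the post-build line may be enough on its own; the wrapper above just also closes any stray descriptors above 2:

ruby campfire.rb < /dev/null > /dev/null 2>&1 &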

UPDATE: Looks like this is mostly caused by a bug in the Git plugin for Hudson (Fixed in 1.390).

Categories: Debugging, Operations Tags:

Adding Bundler to Passenger Hosted Apps

12/15/2010 Comments off

At PatientsLikeMe, we upgraded one of our applications to manage dependencies with Bundler. When we deployed the new version of the application on a Passenger app server, we saw errors loading our bundle:

rubygems/dependency.rb:52:in `initialize': Valid types are [:development, :runtime], not nil (ArgumentError)

Some quick googling showed this is a problem with rubygems on the system, with the fix being to upgrade rubygems as follows:

$> sudo gem update --system
$> gem -v 
1.3.7

We restarted the application via:

touch tmp/restart.txt

However, we still experienced the same bundler problem – as if the wrong gem system were being used in an RVM or multi-ruby environment.

This is an easy gotcha: touching tmp/restart.txt only re-loads the application via the Passenger spawn process; it doesn’t reload Passenger itself or its configuration.

When you’re changing system gems or other configuration loaded by Passenger, you need to restart the entire Apache stack hosting Passenger:

sudo /usr/sbin/apachectl restart

This resolved the problem.

Categories: Debugging, Operations, Rails

Protecting Yourself from Firesheep Using an SSH Tunnel

12/13/2010 Comments off

What is Firesheep?
Firesheep is a recently released packet sniffer with built-in session hijacking (“sidejacking”) that monitors insecure networks (usually open WiFi) for web application traffic, steals session information, and automatically impersonates your logged-in session on many sites (Google, Facebook, Yahoo, etc.).

SSH Tunneling/Proxy in OSX
The simplest way to protect yourself is to establish a secure VPN/tunnel for all of your web browsing, so that traffic can’t be sniffed on the local network. The unencrypted portion of the traffic then travels between your tunnel server and the web application, off of the local network you’re browsing on.

If you have SSH access to a Linux server, you can build a tunnel from a local port on your machine out through that server to the internet. For those of you at PatientsLikeMe, dev2 is a great server to use for this. Below is an example SSH command that starts a persistent tunnel with a local SOCKS proxy forwarding traffic over it.

ssh -D 8080 -f -C -q -N wpeterson@dev2.plm
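
The -D 8080 flag opens a local SOCKS proxy on port 8080, -N skips running a remote command, -f backgrounds the session, -C compresses, and -q keeps it quiet. Then point your browser (or the system-wide proxy settings) at that SOCKS proxy. A minimal sketch for OS X – the network service name is an assumption, check networksetup -listallnetworkservices for yours:

networksetup -setsocksfirewallproxy "AirPort" localhost 8080
networksetup -setsocksfirewallproxystate "AirPort" on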

Read more…

Categories: Uncategorized