Jason Punyon

A Wild Anomaly Appears! Part 2: The Anomaling

After all the rave reviews of my last post I knew you were just on the edge of your seat waiting to hear more about my little unsupervised text anomaly detector.

So, we’ve got some working ML! Our job’s done, right?

‘Round these parts, it ain’t shipped ‘til it’s fast and it’s got its own chat bot. We spend all day in chat and there’s a cast of characters we’ve come to know, and love, and hate.


Pinbot helps us not take each other’s database migrations. You can say “taking ###” and Pinbot removes the last taken migration from this room’s pins and replaces it with this one. So we always know what migration we’re up to, even if someone hasn’t pushed it yet. Also pictured: Me calling myself a dumbass in the git logs. Which gets broadcast by our TeamCity bot. Someone starred it.

Hair on Fire

Hair on fire bot helps Careers keep making money. Hair on fire pops up every now and again to tell us a seriously critical error that affects the bottom line has happened on the site. If someone is buying something or on their way to buying something and gets an exception Hair On Fire dumps the details directly to chat so someone can get to work immediately.


Here’s another little Careers helper. We have a policy that any advertising images that go up on Stack Overflow must be reviewed by a real live person. When display advertising is purchased from our ad sales/ad operations teams, they do it. When we introduced Company Page Ads we automated the process of getting images on Stack Overflow and no one had to talk to a sales person. This begat LogoBot who reminds us to do our job and make sure no one’s putting up animated gifs or other such tawdriness.

Malfunctioning Eddie

Malfunctioning Eddie’s…got problems.

Anomaly Bot

Which brings me to the Anomaly bot. We need to make sure that all these anomalous jobs I’m detecting get in front of the sales support team. They are the human layer of detectors I alluded to in my last post who used to have to check over every single job posted to the board.

There it is. Anomaly bot. Where does it lead us puny humans?

Welcome to the super secret admin area of Careers. At the top of the page we have a list of the jobs that were posted today. There are 3 columns: the anomaly score (which is based solely on title), the job title, and a few buttons to mark the job anomalous or not. The second section of the page is for all jobs currently on the board.

I’m hoping the heatmap pops out at you. It runs from Red (pinned to the all-time most anomalous job) to Green (pinned to the all-time most middle-of-the-road job ever). The jobs posted today are light orange at worst, so that’s pretty good! On the “all jobs” list there’s a bunch of red that we need to review.

Just to give a reference, here was the first version sans heatmap.

So much more information in that tiny extra bit of color. If you want to make simple heatmaps it’s really easy to throw together some JavaScript that uses the power of HSL.
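Here's a minimal sketch of the HSL trick, written in Python rather than JavaScript. It assumes a linear mapping from score to hue, with hue 120 (green) pinned to the lowest score and hue 0 (red) pinned to the highest. The function name, the clamping, and the fixed 100% saturation and 50% lightness are illustrative choices, not necessarily what the report page does.

```python
def heat_color(score, lowest, highest):
    """Map an anomaly score to a CSS hsl() string: green at lowest, red at highest."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    # Clamp so scores outside the pinned range still get a valid color.
    t = min(max((score - lowest) / (highest - lowest), 0.0), 1.0)
    hue = round(120 * (1 - t))  # hue 120 is green, hue 0 is red
    return f"hsl({hue}, 100%, 50%)"


# A score halfway between the pins lands on yellow (hue 60).
print(heat_color(5, lowest=0, highest=10))
```

The returned string can be dropped straight into a table row's background-color style.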

What’s Next?

We’re gonna let this marinate for a while to actually test my hypothesis that we only have to look at the top 10% of jobs by anomaly score. The sales support team’s gonna clear the backlog in the “all jobs” section of the report, then use the tool for a little while and then we’ll have the data we need to actually set the threshold. Once we do that the Anomaly bot can get a little smarter. Right now Anomaly bot just shows up every three hours with that same dumb message. Maybe it’ll only show up when there are jobs above our human-trained threshold (modulo a safety factor). Maybe we’ll change it to pop up as soon as an anomalous job gets posted on the board.

Here, have some code

If you want to use the very code we’re using right now to score the job titles, it’s up on NuGet, and the source is on GitHub.

Got experience solving problems like this one? Wanna work at a place like Stack Exchange? Head on over to our completely middle-of-the-road job listing and get in touch with us.

A Wild Anomaly Appears!

So, I’m working on the new Data Team at Stack Exchange now. Truth is we have no idea what we’re doing (WANNA JOIN US?). But every now and then we come across something that works a little too well and wonder why we haven’t heard about it before.

We run a niche job board for programmers that has about 2900 jobs on it this morning. Quality has been pretty easy to maintain. We have a great spam filter called “The $350 Price Tag”. Then we have some humans that look over the jobs that get posted looking for problems. Overall the system works well, but at 2900 jobs a month that means someone has to look through about 150 jobs every working day. They’re looking for a needle in a haystack as most (>95%) of the jobs posted are perfectly appropriate for the board, so there’s a lot of “wasted” time spent looking for ghosts that aren’t there. And it’s pretty boring to boot. I’m sure that person would rather do other things with their time.

It’d be nice if we had an automated way of dealing with this. We have no idea what we’re doing, so we frequently just reach into our decidedly meager bag of tricks, pull one out, and try it on a problem. I’d done that a few times before on this problem, trying Naive Bayes or Regularized Logistic Regression, but had gotten nowhere. There are a lot of different ways a job can be appropriate for the board and there are a lot of different ways a job could be not appropriate for the board which makes coming up with a representative training set difficult.

Last week while taking another whack at the problem I Googled “Text Anomaly” and came across David Guthrie’s 186-page Ph.D. thesis, Unsupervised Detection of Anomalous Text. There’s a lot there, but the novel idea was simple enough (and it worked in his experiments and mine) that I’m surprised I haven’t heard about it until now.

Distance to the Textual Complement

Say you have a bunch of documents. You pull one out and want to determine how anomalous it is with respect to all the others. Here’s what you do:

  1. Choose some features to characterize the document in question.
  2. Convert the document to its feature representation.
  3. Treat all the other documents as one giant document and convert that to its feature representation.
  4. Calculate the distance between the two.

Do this for every document and sort the results descending by the distance calculated in step 4. The documents at the top of the list are the “most anomalous”.

That’s it. Pretty simple to understand and implement. There are two choices to make: which features, and which distance metric to use.
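The four steps above fit in a few lines. This is a rough sketch in Python, not the code from the thesis or from our scorer: `rank_by_anomaly` is a made-up name, and the feature function and distance are passed in because those are exactly the two choices left open. The toy features (mean word length, fraction of long words), the Euclidean `math.dist`, and the sample titles are placeholders purely for illustration.

```python
import math


def rank_by_anomaly(documents, featurize, distance):
    """Score each document against all the others combined; most anomalous first."""
    scored = []
    for i, doc in enumerate(documents):
        # Step 3: treat every other document as one giant document.
        complement = " ".join(documents[:i] + documents[i + 1:])
        # Steps 2 and 4: featurize both sides, then measure how far apart they are.
        score = distance(featurize(doc), featurize(complement))
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_features(text):
    """Placeholder features: mean word length and fraction of words over 8 letters."""
    words = text.split()
    mean_length = sum(len(w) for w in words) / len(words)
    long_fraction = sum(len(w) > 8 for w in words) / len(words)
    return [mean_length, long_fraction]


titles = [
    "Senior Java Developer",
    "Java Developer",
    "Senior Python Developer",
    "Supercalifragilistic Administrationist",
]
ranking = rank_by_anomaly(titles, toy_features, math.dist)
print(ranking[0][1])  # the title furthest from everything else
```

Swapping in real features and a real distance is just a matter of passing different functions.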

Obscurity of Vocabulary, I choose you!

In any machine learning problem you have to come up with features to characterize the set of things you’re trying to work on. This thesis is chock full of features, 166 of them broken up into a few different categories. This part of the paper was a goldmine for me (remember, I have no idea what I’m doing). The text features I knew about before this were word counts, frequencies, tf-idf and maybe getting a little into part of speech tags. The kinds of features he talks about are stuff I never would’ve come up with on my own. If you’re doing similar work and are similarly lost, take a look there for some good examples of feature engineering.

The set of features that stood out to me the most were the ones in a section called “Obscurity of Vocabulary Usage”. The idea was to look at a giant reference corpus and rank the words in the corpus descending by frequency. Then you make lists of the top 1K, 5K, 10K, 50K, etc. words. Then you characterize a document by calculating the percentages of the document’s words that fall into each bucket.
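A sketch of that bucket idea in Python, under a few assumptions of mine: words are already lowercased and split on whitespace, the buckets are cumulative (the top-2 list includes the top-1 list), and the function names and the tiny reference corpus are invented for the example.

```python
from collections import Counter


def top_n_buckets(reference_words, sizes):
    """For each N in sizes, the set of the N most frequent words in the reference corpus."""
    ranked = [word for word, _ in Counter(reference_words).most_common()]
    return [set(ranked[:n]) for n in sizes]


def obscurity_features(words, buckets):
    """Fraction of the document's words that fall inside each top-N bucket."""
    if not words:
        return [0.0] * len(buckets)
    return [sum(w in bucket for w in words) / len(words) for bucket in buckets]


reference = "java developer senior java developer java engineer".split()
buckets = top_n_buckets(reference, sizes=[1, 2, 4])

print(obscurity_features("senior java developer".split(), buckets))
print(obscurity_features("actuarial consultant".split(), buckets))
```

A document full of common words scores high in every bucket, while one made of words the reference corpus has never seen gets zeros across the board.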

Manhattan Distance, I choose you!

Guthrie pits a bunch of distance metrics against each other, and for the Distance to the Textual Complement method the Manhattan distance got the blue ribbon, so I used that.
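For reference, the Manhattan distance is just the sum of the absolute differences between matching coordinates. A small Python sketch (the function name and the example vectors are mine):

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Two made-up feature vectors: |0.0 - 0.5| + |0.5 - 0.75| + |1.0 - 1.0| = 0.75
print(manhattan_distance([0.0, 0.5, 1.0], [0.5, 0.75, 1.0]))
```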


When I’ve looked through the jobs before, I could pretty much tell by their titles whether they were busted or not, so my documents were just the job titles. There isn’t really a good reference corpus from which to build the Top-N word lists, so I just used the job titles themselves. I tried a couple different sets of Ns but ended up on 100, 300, 1000, 3000, and 10000 (really ~7,000, as that’s the number of unique terms in all job titles).


Here’s the sort of all the jobs that were on the board yesterday.

Basically everything on the right is good and has low scores (distances).

Most of the jobs on the left have something anomalous in their titles. Let’s group up the anomalies by the reasons they’re broken and look over some of them.

Stuff that just ain’t right

These jobs just don’t belong on the board. We need to follow up with these people and issue refunds.

  1. Supervisor Commercial Administration Fox Networks Group
  2. Sales Executive Risk North Americas
  3. Associate Portfolio Manager Top Down Research
  4. Senior Actuarial Pre-Sales Consultant
  5. Ad Operations Coordinator
  6. NA Customer Service Representative for Sungard Energy
  7. Manager, Curation

Just Terrible Titles

These jobs would belong, but the titles chosen were pretty bad. Misspellings, too many abbreviations, etc. We need to follow up with our customers about these titles to improve them.

  1. Javascript Devlopers Gibraltar
  2. Sr Sys Admin Tech Oper Aux Svcs
  3. VR/AR Developer

Duplicate Information

Anywhere you see the title for a job on Stack Overflow or Careers we also show you things like the location of the job, whether it’s remote or not, and the company’s name. These titles duplicate that information. We need to follow up with our customers to improve them.

  1. Delivery Manager, Trade Me, New Zealand
  2. Visualization Developer, Calgary
  3. Technical Expert Hyderabad
  4. Technical Expert Pune
  5. Technical Expert Chennai
  6. Technical Expert Gurgaon
  7. New York Solutions Architect
  8. Sr Fullstack Eng needed for Cargurus We reach over 10MM unique visitors monthly
  9. Sr. Tester, Sky News, West London
  10. Chief Information Officer/CIO Audible.com
  11. Machine Learning Engineer Part Time Remote Working Available

What about the false positives?

A number of false positives are produced (this is just a sample):

  1. Computer Vision Scientist
  2. Mac OSX Guru
  3. Angularjs + .NET + You
  4. Android Developer 100% Boredom Free Zone
  5. Java Developer 100% Boredom Free Zone
  6. DevOps Engineer - Winner, Hottest DC Startups!!! $10M Series A
  7. Jr. Engineer at Drizly

Some of these (Computer Vision, Mac OSX) are just infrequently found on our board. Some of these people are trying to be unique (and are successful, by this analysis) so that their listing stands out.

Guthrie goes into a bit of detail about this in a section on precision and recall in the paper. His conclusion is that this kind of anomaly detection is particularly suited to when you have a human layer of detectors as the last line of defense and want to reduce the work they have to do. An exhaustive exploration of the scores finds that all of the jobs we need to follow up on are in the top 10% when ordered descending by their anomaly scores. Setting that threshold should cut the job our humans have to do by 90%, making them happier and less bored, and improving the quality of the job board.
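Picking out that top slice is simple once the scores exist. A hedged Python sketch, assuming jobs arrive as (score, title) pairs; the function name, the rounding-up rule, and the fake scores are illustrative, and the 10% default is only the hypothesis from this post, not a tuned threshold.

```python
import math


def review_queue(scored_jobs, fraction=0.10):
    """Return the highest-scoring fraction of (score, title) pairs, most anomalous first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scored_jobs, key=lambda job: job[0], reverse=True)
    keep = math.ceil(len(ranked) * fraction)  # round up so nothing borderline is dropped
    return ranked[:keep]


# Twenty fake jobs with scores 0..19: the humans only see the top two.
jobs = [(score, f"job {score}") for score in range(20)]
print(review_queue(jobs))
```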

So You Want a Zillion Developers…

I work at Stack Overflow on Careers 2.0. In addition to our job board we have a candidate database where you can search for developers to hire. Our candidate database has 124K+ developers in it right now.

Customers frequently gawk at this number because they’ve looked at other products in the dev hiring space that offer millions of candidates in their databases. Sourcing.io claims to have “over 4 million developers” in their database. Gild offers “Over 6 Million Developers”. Entelo will give you access to “18+ million candidates indexed from 20+ social sites.”

Yeah man, your numbers stink

Hey. That hurts.

Let’s put those numbers in perspective. The vast majority of the developers “in” these other databases don’t even know they exist. The devs never signed up to be listed or even indicated that they were looking for work. There isn’t even a way to opt out. These databases are built by scraping APIs and data dumps from sites developers actually care about like Stack Overflow and GitHub.

On the other hand the only people you’ll find in the Careers 2.0 database are ones who made the affirmative choice to be found. They start by getting an invitation to create a profile. They build out a profile with their employment and education history, open source projects, books they read, peer reviewed answers on Stack Overflow, and so on. Then they can choose to be listed as either an active candidate (they’re in the market for a job right now) or a passive candidate (they’re happy where they are but are willing to hear your offer). After a candidate gets hired they can delist themselves from the database so future employers don’t waste any time on them.

So the difference between us and them is that we give you a smaller number of candidates who are already interested in job offers and they give you a giant database filled with hope and built by skeez.

We have some data from Careers that tells us hope is not a recruiting strategy.

Our Experiment

Careers 2.0 experimented with the “index a bunch of people who don’t know they’re being indexed” model to see if it could possibly work. We created what we called “mini-profiles” which consisted exclusively of already public information available on Stack Overflow. We would add mini-profiles to the database if the Stack Overflow user provided a location in their profile and had a minimum number of answers with a minimum score. We showed these mini-profiles along with our “real” candidates in search results. If an employer wanted to contact one of the people behind a mini-profile Careers 2.0 would send them an e-mail asking if they want to open up a conversation with the employer. If the candidate wanted to continue they could authorize us to share their contact information with the employer and they’d start working on their love connection.

Our Results

We track response rates to employer messages to look out for bad actors and generally make sure the messaging system is healthy. A candidate can respond to a message interested/not interested or they can fail to respond at all. Response rate is defined as Messages Responded To / Messages Sent. When we compared the response rates of messages to mini-profiles to the response rates of messages to “real” profiles the results were not good for mini-profiles. Messages to “real” profiles were 6.5x more likely to get a response than messages to mini-profiles. That was the last and only straw for mini-profiles. We retired the experiment earlier this year.

So what about the zillions of programmers?

All those services I named at the beginning of this post do what we did in our experiment, just a little more extensively by including devs from more places online. I have to believe that the response rates from their unqualified leads are similar to the ones we found in our experiment. I suppose technically the response rates from randodevs on GitHub or Bitbucket could be higher than that of randodevs on Stack Overflow thus invalidating our conclusion, but anecdotal evidence from our customers about those other services suggests not.

“Wait a sec there Jason,” you’re thinking, “if their databases are at least 6.5x larger than yours I’ll still get more responses to my messages right?” Absolutely! That’s called spam. You are totally allowed to go down the path of the spammer but let me hip you to the two problems there. The first problem with your plan is that devs hate recruiting spam more than they hate PHP, and they hate PHP a lot. The word will get out that you’re wasting everyone’s time. People will write about it. The second problem is that spam is supposed to be cheap. This isn’t cheap. In this case you’ll have to spend at least 6.5x the time wading through these zillions of devs identifying the ones that meet your hiring criteria, messaging them, and waiting for responses. So not only are you wasting their time, you’re wasting yours.

We aren’t going to build this business off hope and spam and wasting people’s time. If a smaller database is the price, so be it.

Commuting: A Perverse Incentive at Stack Exchange

So, we just went through comp review season here at the Stack Exchange. This is pretty much the only time of year we talk about money, because that’s the way we want it. We pay people enough to be happy and then shut up about it. You’ll probably only ever hear stuff about comp from me around September each year because that’s the only time it’s on my mind. The system works, and I’m generally happy about my financial situation, but we have a comp policy about remote work that subjects me to a bit of a perverse incentive when it comes to commuting.

The policy is that if you work out of the New York office, you get a 10% pay increase relative to what you’d be making if you worked remote. The reason for this has always been a little cloudy to me. I’ve heard cost of living adjustment. I’ve heard we want to incentivize people to be in the office because of “accidental” innovation from pick-up meetings and conversations in the hall. Regardless of the reason, that’s the policy.

I live in Stamford, CT and have been commuting to the New York Office 3 days a week (down from 5) since my daughter Elle was born in December. My total commute time averages just under 3 hours a day (10 min from my house to the Metro North, 55 minutes to Grand Central, 20 minutes from Grand Central down to the Office). So I end up commuting about 36 hours per month (down from 60).

On the Metro North getting a seat means cramming in next to one or two other people in side-by-side seats leaving little elbow room for typing (or living, FSM forbid they’re overweight), sitting in the seats that face each other and knee-knock with people who are drawn from a population with a mean height of 7 feet, or sitting on the floor in a vestibule near the doors. Some days the Metro North crawls because apparently they didn’t design this surface rail line to deal with even the slightest amount of rain. The subway is the subway, you get what you get. This commute stinks and it’d be my default position to forgo it.

Here’s where the perversion comes in. Let’s say I make $120K a year (I’m using this number because the math works out simply) out of the New York Office and decide to go remote. Every month I’ll make $1K less and get 36 hours of my life back. So Stack Exchange thinks my commute is worth $27.78 an hour. 4x minimum wage for no productive output is nice work if you can get it.
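The arithmetic, spelled out in Python. This follows the simplification used above, where the pay cut is taken as 10% of the in-office salary; the function and parameter names are made up for the example.

```python
def implied_hourly_rate(office_salary, premium_percent, commute_hours_per_month):
    """Monthly pay given up by going remote, divided by commute hours saved."""
    # Simplification from the post: the cut is premium_percent of the in-office salary.
    monthly_pay_cut = office_salary * premium_percent / 100 / 12
    return monthly_pay_cut / commute_hours_per_month


rate = implied_hourly_rate(120_000, premium_percent=10, commute_hours_per_month=36)
print(f"${rate:.2f} per commute hour")  # $1,000 a month spread over 36 hours
```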

When done right, remote work makes people extremely productive. Private office? Check. Flexible hours? Check. Short commute? Check. I’ll let you in on a secret: most of our remote developers work longer hours than our in-office devs. It’s not required, and probably won’t always be the case, but when going to work is as simple as walking upstairs (pants optional, but recommended) people just tend to put in more hours and work more productively.

Going remote means a large portion of the 36 hours a month I spend commuting would go back to productive work (I won’t lie, some of it will be spent enjoying time with my daughter) so Stack Exchange is better off. I’d be happier because I get to skip the dreadful commute and work instead so I’d be better off. But I don’t make nearly enough that I can just drop 10% of my pay and not feel it.

Fun With RNGs: Calculating π

So, calculating π is a fun pastime for people it seems. There are many ways to do it, but this one is mine. It’s 12 lines of code, it wastes a lot of electricity and it takes forever to converge.

// Requires: using System; using System.Linq;
public double EstimatePi(int numberOfTrials)
{
  var r = new Random();
  return 4 * Enumerable.Range(1, numberOfTrials)
                       .Select(o => {
                                      var x = r.NextDouble();
                                      var y = r.NextDouble();
                                      return Math.Pow(x, 2) + Math.Pow(y, 2) < 1 ? 1 : 0; // 1 if inside the unit circle
                                    })
                       .Average();
}

What’s going on here? First we initialize our random number generator. Then for 1 to the number of trials we specify in the argument we do the following:

  1. Generate two random numbers between 0 and 1. We use one for the X coordinate and one for the Y coordinate of a point.
  2. We test if the point (X,Y) is inside the unit circle by using the formula for a circle (x² + y² = r²).
  3. If the point (X,Y) is inside the circle we return a 1 otherwise a zero.

Then we take the average of all those zeros and ones and multiply it by a magic number, 4. We have to multiply by four because the points we generate are all in the upper right quadrant of the xy-plane.

How bad is it? Here’s some output:

    Number Of Trials       Estimate of Pi
        10                  3.6
        100                 3.24
        1000                3.156
        10000               3.1856
        100000              3.14064
        1000000             3.139544
        10000000            3.1426372
        100000000           3.14183268
        1000000000          3.141593 (Took 2:23 to complete)
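Why so slow? Each trial is a coin flip that comes up "inside" with probability π/4, so the typical error of the estimate shrinks only like one over the square root of the number of trials. Here's a back-of-the-envelope Python sketch of that, assuming the random points are independent and uniform; the function name is mine.

```python
import math


def expected_error(number_of_trials):
    """Standard error of the 4 * (hit fraction) estimator after the given number of trials."""
    p = math.pi / 4  # chance a uniform point in the unit square lands inside the quarter circle
    return 4 * math.sqrt(p * (1 - p) / number_of_trials)


for trials in (10, 1_000, 1_000_000, 1_000_000_000):
    print(f"{trials:>13,} trials: typical error about {expected_error(trials):.6f}")
```

The constant works out to roughly 1.64 divided by the square root of the trial count, so every extra correct digit costs about 100 times as many trials.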

Things That, Were I to Unfortunately Wake Up Tomorrow as a Recruiter, I Would Never Do

I would never send e-mails that make potential candidates for a position think I’m not effective at finding potential candidates for a position. Giving candidates that impression just makes them think I stink at everything else too.

Subject: Barbara Nelson in Search of Javascript Expertise

Do you mean the Barbara Nelson?

Hello from Barbara!

What a great salutation! Not. Save that one for your next family newsletter.

I saw your profile either on github or on stackoverflow

Really? WOW! It sounds like you did a lot of research on me and moreover you’re the kind of go-getter who keeps the relevant information she needs at her fingertips at all times.

I am looking for several strong JavaScript Object-Oriented Engineers (not “just” web developers). These three openings have been especially challenging to fill…

Well alright let me click through and see what these jobs are about. Oh…no company names? The third one is really a C++ job? And you say you’re having trouble filling these positions?

Some JavaScript opportunities I am helping to fill are at solid funded start-ups, some are at start-ups already acquired by a well-known global company with solid benefits. We can make your relocation to the beautiful Bay Area happen if there’s a good fit.

That’s good I guess…I’m not really that interested in moving to the Bay Area.

Those who are interested in a brief discussion on the phone: please send a resume or an online profile that reflects your experience, a good time to talk, and a good phone number, and we’ll schedule a quick call.

Those who sent this e-mail should learn how to address the recipient directly and singularly instead of giving the impression that this is just another useless e-mail blast from a contingency recruiter.

If you never want to hear about career opportunities from me again, just let me know; reply and say so.

By the way you almost whited that out, I’d almost think you didn’t want me to actually do that.

I love referrals.

I love how I almost don’t even get the feeling you’re trying to get me to do your job for you.



So let’s look at an e-mail with a similar goal.

Subject: Facebook Engineering

Do you mean the Facebook? Let’s not be unfair to poor Barbara. Her subject line is much harder to get right than this one.

I hope all is well. I had the pleasure of stumbling upon your information online and saw that you have been working on some pretty neat stuff with Stack Overflow and various companies (it wasn’t disclosed on your resume) plus you have an awesome academic background from SUNY Geneseo to complement it.

This is much better than what Barbara had to say about me. Minimally Jeremy has read my public Careers 2.0 profile and noted my current position and where I went to school. He also called out the fact that I don’t list the companies I’ve worked at before on my profile (mainly so I can write about my experiences there when I want to without anyone getting bent out of shape). This e-mail is about me. It’s not a cattle call.

I am currently helping grow our engineering team in the NYC office and would love to chat with you about what you’ve been up to and perhaps put us on your radar; if nothing else we can have a friendly conversation. Let me know what works for you and we can schedule a time at your convenience. If this isn’t the right time, I completely understand and we can stay in touch based on your schedule – no rush. I look forward to hearing from you.

Great tone. Sounds like a human. He tells me what he’s after while being accommodating and not pushy. He makes me believe that if I respond, he’s going to respond back. Jeremy could’ve broken some of this down into paragraphs to make it less WALLOFTEXT but other than that it was a decent recruiting e-mail.

Get Your Redis On on Windows

TL;DR: Want a virtual machine running redis in however long it takes you to download 400MB + a little completely automated install time? Follow the instructions here and you’ll be on your way.

Well, it only took me a year of having this blog for me to write up something even remotely technical. But here you are, and here I am…so let’s just tone down the celebration a little bit and get on with it already.

So…it’s hard running a dev environment sometimes. We at the Stack Exchange will use anything to get the job done, but on the programmer’s side we’re mainly a windows shop. One piece of software we’ve come to know and love is Redis though. We love it so much we’ve got antirez on speed dial. It’s really the greatest.

Here’s where it isn’t quite the greatest though (for us): it’s really meant to run on Linux. Some people have made mostly working windows builds in the past that were good enough for dev’ing on but had weird behavior when it came to background operations. They’re great and I appreciate the work they d(o|id), but they fall behind when redis bumps stable versions (it’s behind 1 stable version right now leaving out features like the Lua scripting engine). Microsoft went through the rigamarole of patching redis so that it will run on windows, but that patch isn’t getting merged to master…ever.

So what’s a girl to do? When I’ve been on a team of one and had this kind of problem I thought to myself, “Self! Get VMWare on here, spin up a one off VM with ubuntu and just run it there! Problem Solved!” and many internal high-fives were had. But when you’re on a team of 6 (the Careers team, plug: we’re hiring) that doesn’t really scale well. So what are my choices? Let’s go to the big board of options:

  • Just tell my teammates “Hey, spend a couple hours spinning up your own VM and hope the one you have and the one I have match up and behave exactly the same”. (HINT: No)
  • Check in a 10 gig VM into source control and push so the other members of the team can run it too? (HINT: NO. That’s an example of what we call the “I quit” check-in.)

So how do you solve this problem?

Enter the Hobo

So it turns out a bunch of other people have this problem too (WEIRD RITE?). A smart dude decided to solve it and created Vagrant. Vagrant is a super simple yet powerful way to create and manage a reproducible virtual dev environment. Check in a couple kB of config and you get a virtual machine (or a multiple machine environment) your whole team can run. Vagrant wraps around VirtualBox for its virtual machines and it’s not just for Windows. It runs on Linux and Mac too. Let’s run it down.

Getting it Installed

Follow the startup guide here. It’s basically install VirtualBox and install Vagrant.

Creating a machine

To create the machine, the first thing we do is create your Vagrantfile. Don’t worry…it isn’t a drifter fetish. It’s just a config file that outlines how your virtual machine is set up. It’s also just a bit of Ruby. Here’s the one we’re using:

[Code listing: Vagrantfile (file not included)]

So first we tell Vagrant which box to use. A box is essentially a map from a key to a file. Box names can be anything you want, in this case I just have a name telling me that it’s ubuntu’s latest 64-bit release.

Next we have a url that points to a file. As it says in the comment there, this url points to a box file that will be downloaded if the box with the name in config.vm.box doesn’t exist. This is nice because it means I send the file and when my teammate runs it, it will go fetch everything it needs to create the virtual machine. Brilliant. A bunch of base boxes can be found at Vagrantbox.es. They have many different guest operating systems and versions and such to use. Very cool.

Next we have some port forwarding settings. Vagrant takes care of setting up the network for you, you just have to tell it what you need. So I’m just forwarding to port 6379 on the guest machine (the default port on which redis runs) from port 6379 on my host machine.
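Once the VM is up, a quick way to check the forwarded port from the host is to send Redis a PING and look for PONG. This is a sketch using only Python's standard library; `redis_ping` is a hypothetical helper, not part of Vagrant or any Redis client, and it assumes the server accepts the plain inline PING command on an unauthenticated connection.

```python
import socket


def redis_ping(host="127.0.0.1", port=6379, timeout=2.0):
    """Return True if something on host:port answers a Redis PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # Redis accepts simple inline commands
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections and timeouts alike.
        return False
    return reply.startswith(b"+PONG")
```

Calling `redis_ping()` on the host should return True if the forward from host port 6379 to guest port 6379 is working and Redis is running.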

Next I customize the vm to have a gig of memory instead of whatever the base box has by default.

So that’s it for the configuration of the box. The last line runs a provisioner which will set up the box once it’s running. There are a number of provisioners to choose from including Puppet, Chef and shell. This was the gotcha for me when I was doing it the first time. The docs list the provisioners in this order…

So I spent an hour or two trying to grok the chef and puppet docs and ended up getting frustrated. Those systems have a bunch of abstractions in them which probably make them great for doing sys-adminny type stuff but in my head I was screaming “AAAARRRGH. JUST LET ME RUN A FUCKING SHELL SCRIPT!”. Of course I go back to the vagrant docs after that, look an inch or two down and feel like an idiot. I do wish the bullet points there went in order of increasing complexity though.

Long story short, the provisioner just executes the specified shell script on the guest box after it boots up.

Shellack It

So once the machine boots up what do we want it to do? Well, this:

[Code listing: init.sh (file not included)]

So first we make the directories where redis will live. Then we go to the top level one, download and extract the code for the version of redis we’re interested in and build it. Then we copy the resulting executables to their final home in /opt/redis/bin.

Next we copy an init.d script to where it needs to be, then we copy the redis configuration to where it lives. Add a redis user, start redis and we’re all finished.

You might be asking “How did that init.d script and redis configuration get into the vagrant directory on the guest box?”

The way you run vagrant is by going to the directory where the Vagrantfile lives and typing vagrant up. That starts the whole ball rolling. When vagrant starts up your VM, it automatically shares the directory where the Vagrantfile is with the guest box at /vagrant on the guest box. It’s a magical default behavior.

So that’s pretty much it

Well, for now anyway. Vagrant can be used to set up multiple machine environments (which I might do next to test out an elasticsearch cluster for Careers). It has many more bells and whistles to keep your virtual dev environment running lean and mean. I’ve been super impressed with just how easy it is to work with (total home grown code to get my redis VM up was 31 lines, 15 of which were the shell script for installing redis) and bonus everyone on my team thinks I’m a hero. It’s that magical.

+1 Vagrant…+1.


This is the init.d script I used which I cribbed from Ian Lewis.

[redis.init.d: script listing not available]

Geography’s the Fuck.

Raganwald poses an interesting question. Why do some of the best minds in our industry spend time figuring out how to make people click more on ads? Aren’t there more interesting problems for these bright up-and-comers to spend their valuable time and insight on?

One simple answer to the question is Geography.

How do you know this?

I’ve spent a little time figuring out how to make people click more on ads :)

I work at Stack Exchange on Careers 2.0. We try to make it easier for programmers to get better jobs. We have two main ways of getting people to the Careers 2.0 website. Our users evangelize on our behalf by inviting their programmer friends to show off their accomplishments on their own Careers profiles (like mine here).

The second way looks like this

Me and the team spent a bit of time trying to figure out how to get programmers to click on this ad. It (and its smaller variant) shows up on the vast majority of the pages of Stack Overflow.

The number one way to make people click on this ad is to show them a job from a place near them. It’s as simple as that. Showing the user a job near them outperforms every other way of constructing the ad we could come up with by a factor of between 2 and 5.

Location is still incredibly important to job seekers (and employers). If our ad analysis is to be believed, they are more interested in having a job near them than having a job that matches their skills. Or a job that is interesting. Or a job where they work on “super important” problems.


I know this to be true from my own experience. I spent the summers of my college years working in Berkeley, CA for Lawrence Livermore National Laboratory. We did low energy nuclear physics experiments measuring certain reaction cross sections on radioactive nuclei. Some of this work gets used downstream by the Stockpile Stewardship program. Stockpile Stewardship is the program responsible for maintaining the reliability of the United States nuclear arsenal. The United States doesn’t test nuclear weapons anymore and this program ensures that the ones we have continue to work, and ascertains the failure modes of the ones that aren’t going to work. I think this would qualify as an important problem to Raganwald (and most other people). Most of the work I did was on software for simulating our particle and gamma ray detectors.

After college was over I ended up back in New York because that’s where my life was. I worked at a management consulting company and then a couple hedge funds before I got my head out of my ass and realized I wanted to work on something that mattered a little bit more than the continued aggrandizement of the uber-rich. By then I’d been married for a few years. My wife had a job in New York, my family was in New York (and elsewhere on the East coast). If I was going to work on something that mattered, it was going to have to be something that mattered…in New York.

Wait…haven’t we solved this?

But, what about technology? Hasn’t technology solved the geography problem? Don’t we have Skype, company chat, Google hangouts and (shudder) the lowly telephone to connect people remotely? All these media come with their own problems. You need people who are exceptionally good at communicating through electronic media for these solutions to work. It is REALLY HARD. Stack Exchange was founded remotely. We built our own chat system because others were inadequate. We hire a bunch of remote devs, sys admins and others. I do internal support for people in 3 timezones (that’s more than any of my international hedge fund jobs). A lot of people aren’t cut out for it. I’ll admit it, sometimes I’m not.

Long story short, it’s really easy to say “Hey, we’ll just hire the best people remotely,” but it’s much harder to do in practice. Your culture has to be just so.


Unfortunately you and the “right job for you” (for some criteria of right) are star-crossed. At all times there’s an ordered list of jobs that are a best match for you based on all factors. Rarely does anyone have the Juliet at the top of their list. You and your Romeo pass each other silently (and not so silently) in the night for myriad reasons. Transaction costs for switching jobs are high. People need certainty. I’m harping on geography here, but it’s merely one of the reasons people aren’t working “the right” job or fixing “the right” problems.

Geography is important to programmers. It’s probably at the top of the list of factors that go into deciding whether a job is “right”. If the company working on “important” problems doesn’t jibe with your geography, you’re probably going to leave it by the wayside.

Thanks to Matt Jibson for an edit.

A Guy Walks Into an Apple Store…

So me and my wife had a babby recently. Unfortunately my wife had some preterm labor around week 33 and we had to spend about 10 days in the hospital. Don’t worry, everything turned out all right (see perfection below), but I was pretty burnt out after the 10 days in the hospital. My wife was happy with my performance during our mini-crisis and she told me to go indulge a little. I’d been super excited about the iPhone 5 release, so I went and picked one up.


Fast forward a few weeks and a couple things started going wrong with the phone. The first thing you can probably see from Elle’s picture. Let me blow it up for you…

That’s not a birthmark, something was wrong with the camera. It showed up the week my wife was due. I had slightly more important things happening so it just went on the ever growing pile of stuff I needed to do later.

Then right after the babby was born another problem popped up. I was talking to my sister on the phone at CVS and the receiver just cut out. Tried calling back, no dice. The speakerphone worked and if I used the headphones I got sound but the receiver was completely borked. But then a day later I picked up the phone and called someone and voila the receiver was working again. Weird. Cut to a few days later and it was out again. So the problem was intermittent.

I’d never had a problem with an Apple product before, so I had no idea how their customer service was in situations like this. I assumed the worst. I worried they were going to tell me it was my fault and they had never seen problems like this and if I wanted to get it fixed it’d cost ${hefty sum}. So as I usually do when I think I’m going in against a company that’s going to try to screw me I went about building my case.

I started searching around for problems like mine. I found a couple threads talking about the earpiece problem. So it wasn’t just me or my phone. I fired up my iPad and loaded up those bookmarks in Safari and headed to the store. I didn’t have a plan for the camera discussion, but the earpiece case seemed pretty strong. I figured I’d lead with that, then throw the camera problem in later.

I drove to the store, steeling myself to deal with crappy customer service and a maybe dim/maybe dumb rep. Went over the plan in my head a couple times and thought about how to deal with refutations. I arrived, walked up to the Genius Bar and made an appointment. They told me it’d be an hour so I cruised around the mall for a while and came back. I bellied up to the bar to meet my genius with my first rehearsed line on the tip of my tongue ready to start sparring.

She asked me what the problem was and I demonstrated it for her by pressing play on one of my voicemails and holding the phone up to her ear. “See, no sound.” I said. Then before I was able to say anything else she whisked my phone away into the back and came back a minute later. “Yep. I confirmed there was no sound. We’re going to go ahead and replace your phone.”

But…but…I had all my plans to deal with a crappy customer service rep and get outraged about some ridiculous policy. I practiced arguments. I didn’t even get to show you my links!

I had dealing-with-bad-customer-service-blue-balls. I was so ready for an argument that just never materialized. So what do you do when you get what you want? YOU SHUT UP and let them give it to you. I almost made it but broke while she was setting up the new phone. I tossed out a “Is this kind of thing common? I read about it on blah blah blah” nonchalantly, half making conversation but mostly just needing to get out some of that pent up preparation frustration. I picked up my brand new working phone and headed out happy as a school girl.

Compare and Contrast

A few days later I picked up Influence: The Psychology of Persuasion. This isn’t usually the kind of thing I read, but I was really impressed with a guy I learned about named Apollo Robbins. It was reported he was enamored with the book so I grabbed it on my Kindle and started reading.

Chapter 1 of this book talks about a bunch of stuff, but the thing that caught my eye was the contrast principle.

There is a principle in human perception, the contrast principle, that affects the way we see the difference between two things that are presented one after another. Simply put, if the second item is fairly different from the first, we will tend to see it as more different than it actually is. So if we lift a light object first and then lift a heavy object, we will estimate the second object to be heavier than if we had lifted it without first trying the light one.

Robert B. Cialdini, Influence: The Psychology of Persuasion

So What’s The Point?

OK, so I’ve had you reading this pedestrian story about increasing the planet’s population, things working out splendidly at the Apple store, a cool video on the internet and a 6 year old book for 13 paragraphs now. Here’s the point:

Having good customer service is even more powerful than you think.

I know…that came out of nowhere, right?

Things go wrong with companies’ products. This makes customers worry. Not just because the thing they have is broken, but because there’s so much bad customer service out there and they now have to go interact with it. Cable companies, utility companies, these guys, the MTA, and the DMV are all out there creating bad customer service experiences.

When something goes wrong with your product your customer gets worried. They’re primed by all the bad customer service experiences they’ve ever had and they call you up or walk into your store. At this point you have a tremendous opportunity to take advantage of the contrast principle. They’ve presented themselves with all the bad service they’ve ever gotten worrying if you’ll be the same. They’re aiding and abetting your success before they even talk to you. Present them with just adequate service and because of the amplifying effect of the contrast principle they’ll think you’re the greatest thing since that last thing that was so great.

Now imagine how over the moon your customers will be when you empower your customer service team to go the extra mile and literally do everything it takes to make them happy…

Rock Stars Went Where?

So I read this article today about how the tech industry is too elitist and thinks everyone who’s good is already rich and can follow their bliss until the cows come home waiting for the olympian software companies of the world to come hire them. I just wanted to provide a data point from Stack Exchange. I’m not gonna bloviate that we’re all rock stars at Stack Exchange (I wouldn’t even describe myself like that) but here’s the list of schools attended by all the programmers and sysadmins at Stack Exchange:

Georgia State University
Arcadia University
Metro State College of Denver
Technische Universität Clausthal
University of Texas
University of Exeter
Boston University
University of New South Wales
Rensselaer Polytechnic Institute
State University of New York at Geneseo (Plug: The physics department is awesome.)
University of Pittsburgh
Colorado State University
New England Conservatory of Music
Cleveland Conservatory of Music
North Carolina State University (2)
Washington University in St. Louis
Montgomery County Community College
Northern Territory University

You might notice a significant dearth of Ivy there. (And the one Ivy Leaguer we do have has been promoted to management so he doesn’t even code anymore. We still love him though :))

An underlying premise of Stack Overflow and Stack Exchange is that there’s tremendous amounts of knowledge tied up in “ordinary” people, and given an easy way to show it off and a little incentive they can make this knowledge available and help as many people as possible. Given Stack Exchange’s belief in the common (wo)?man it’s not surprising that the educational background of our programmers and system administrators is pretty…well, ordinary.

Locke set up a straw man by invoking Joel. I know he said the words “Ivy League” but Joel wasn’t really talking about the Ivy League. He was talking about selectivity. Specifically, that as a sorting criterion a resume should show that the applicant had successfully navigated some process that was highly selective:

Selectivity. Another thing we look for on resumes is evidence that someone has gone through some highly selective process in the past. Not everyone at Ivy League schools is worth hiring, and not everyone at community college is worth avoiding, but getting into a very selective school does at least mean that someone, somewhere judged you using some kind of selection process and decided that you were pretty smart. Our company criterion for selectivity is usually getting into a school or program that accepts less than 30% of its applicants (there are about 60 schools in the US that meet this standard), or working for a company which is known to have a difficult application process, like a whole day of interviews. Highly selective branches of the military like officer’s training or pilot’s courses, or even just getting into the Marines indicates someone that has made it through some kind of difficult application/selection procedure and all in all this is a positive sign.

Tech companies are looking outside the Ivy League and the Dan Shippers of the world for their “rock star” developers. I could list the previous positions of all the Stack Exchange programmers and sysadmins but I’ll save you the suspense: none of us were paying our bills running our own startups just waiting for something awesome like Stack Exchange to come along. In my case I sought them out and quit my cushy yet crappy hedge fund job.

Complaining about how it’s not fair because they went to an Ivy League school and you didn’t or they started a company and you didn’t and these things make their resume look better than yours isn’t going to get you very far, though. You have to make it easy for employers to choose you from the pile of resumes. You know there’s competition out there for these jobs. If you want to work at a startup you have to know you’re going up against some of the best in the industry and you have to make yourself stand out.

Your resume (or your Careers 2.0 profile) has the single purpose of showing them you’re worth it. It has to make them believe you’re worth calling for an interview. Provide some evidence. Don’t have any? You’re a programmer…manufacture some! Make a blog. Write a ruby gem. Answer some questions on Stack Overflow. Write some code for a charity. Make a website for your band. This stuff won’t take forever to do. It isn’t like the old days where you had to apprentice with a master furniture maker for a decade to get some cred. The medium you work in allows you to make things in days, not years. It doesn’t have to change the world, it just has to give the person looking at your resume a reason to choose you instead of the other g(al|uy).

Just in case you’re in the market and looking to stand out…we built Careers 2.0 to help you stand out by enabling you to show off your programmery stuff. It’s invitation only so if you’d like an invite just tweet me @JasonPunyon and I’ll hook you up. You may not have gone to an Ivy League school but if you can show off some evidence that you’ve got skills you rank up there with the best of the Yalees and Browners. You’ve got a section for open source projects hosted at GitHub, Bitbucket, CodePlex or SourceForge. There’s a section for Apps you’ve written. Write a blog or read any good books lately? Choose your favorites and show them off. When employers search the Careers database for candidates with particular skills we take all of this stuff into account and sort you accordingly.

Also! We’re always looking for great developers to come work with us at Stack Exchange. Don’t let what school you went to keep you from applying today (we’ll even take you Ivy Leaguers).

Oh and again, if you’re into physics…GO TO GENESEO…if only to watch Dr. Fletcher’s valiant yearly attempts to demonstrate quantum tunneling by running full speed towards the classroom wall. He hasn’t succeeded yet, but if he tries long enough probability says…