Jan 24, 2015

Avoid Car Accidents in NYC

With all of the driving I do in the city, I wondered how I could improve my odds of avoiding accidents (anything from a minor fender-bender to a serious crash) and safely reach my destination. If you're in the same boat (even if you take taxis on a regular basis), you may be interested in this post.

Using data from NY's Open Data website, I analyzed all reported motor vehicle collisions in NYC during 2014. Because time and location data (in the form of longitude/latitude coordinates) were available for each collision, I was able to create a series of diagrams and heat maps to answer the two core questions of avoiding vehicle accidents: (1) what areas should you avoid while driving, and (2) what times of day should you avoid while driving? There were about 173,000 reported vehicle collisions in NYC last year; let's see what the data reveals.

Where Not to Drive
To start, below is a heat map of those 173,000 collisions in NYC (we'll zoom in on specific areas in a moment). Green areas have few collisions; red areas have many (over 50 a year). Unsurprisingly, Manhattan's Midtown area has the highest concentration of accidents. However, you can also see that major highways in Brooklyn and Queens sustain regular accidents (again, not a surprise).

Heat Map of Vehicle Collisions in NYC, 2014 (Red = Highest Concentration of Collisions)
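The heat maps were made in QGIS; as a rough illustration of the idea underneath them (not the QGIS workflow itself), the Python sketch below snaps each collision's latitude/longitude to a grid cell and counts collisions per cell, so the busiest cells correspond to the red areas. The 0.002-degree cell size, the function names, and the sample coordinates are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

# Cell size in degrees (~220 m north-south). An illustrative choice,
# not the resolution used for the QGIS heat maps in this post.
CELL_DEG = 0.002

Cell = Tuple[int, int]


def cell_of(lat: float, lon: float, cell_deg: float = CELL_DEG) -> Cell:
    """Snap a latitude/longitude pair to an integer (row, col) grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def count_by_cell(points: Iterable[Tuple[float, float]],
                  cell_deg: float = CELL_DEG) -> Counter:
    """Count collisions per grid cell; these counts drive the heat-map colors."""
    return Counter(cell_of(lat, lon, cell_deg) for lat, lon in points)


def hottest_cells(points: Iterable[Tuple[float, float]], n: int = 3,
                  cell_deg: float = CELL_DEG) -> List[Tuple[Cell, int]]:
    """Return the n cells with the most collisions, busiest first."""
    return count_by_cell(points, cell_deg).most_common(n)


# Hypothetical sample: three collisions in one cell, one elsewhere.
sample = [(40.7601, -73.9611), (40.7602, -73.9612),
          (40.7603, -73.9613), (40.7011, -74.0011)]
print(hottest_cells(sample, n=1))  # [((20380, -36981), 3)]
```

A plain grid count is cruder than a smoothed density surface, but it answers the same question: which small patch of the city has the most collisions.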

If we zoom in on Manhattan and the surrounding boroughs, we get the heat map below. I have marked the areas where accidents are most prevalent.



Midtown Manhattan (running from ~34th St to ~59th St) is not only the busiest area of NYC, it is also fraught with construction, so it's no surprise that accidents are heavily concentrated here. Delancey Street and Canal Street are the main roads leading to two major bridges, the Williamsburg Bridge and the Manhattan Bridge, respectively. In addition, heading west, Canal St. is the main thoroughfare to the Holland Tunnel to NJ. As a result, these two roads have to accommodate a lot of traffic. Rounding out the accident-prone areas of NYC is the Barclays Center in Brooklyn, at the intersection of two major roads: Atlantic Avenue and Flatbush Avenue. It might be hard to avoid these areas due to their centrality, but we'll also discuss how timing plays a role when driving.

The Worst Place to Drive
Is there a specific area in NYC that has the highest concentration of collisions per year? In fact, there is. It is in midtown Manhattan (east side), where the Queensboro Bridge enters the city at 60th Street & 2nd Avenue. This area sustains over 275 collisions a year! That's about one collision every 32 hours. Craziness. Is there something about this bridge and 2nd Ave that leads to this staggering statistic? I think so. Check out how the bridge enters the city: it essentially comes to a T-intersection with 2nd Ave and the surrounding side streets. Imagine trying to navigate this:




No thanks. Adding to the mayhem, the aerial tram from Manhattan to Roosevelt Island is located in the immediate vicinity, bringing lots of pedestrian foot traffic with it.

And that T-intersection I was talking about? Here's a street view of the area. As you can see, if you're approaching from the bridge and inch out even a little onto 2nd Ave, you're going to be hit.



The Queensboro Bridge is also a heavily used bridge because there's no toll for crossing it, which probably adds to the concentration of accidents in this area.
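The "one collision every 32 hours" figure is simple division: the hours in a year over the collisions at the spot. A tiny Python helper (the name is hypothetical) makes the arithmetic explicit:

```python
HOURS_PER_YEAR = 365 * 24  # 2014 was not a leap year: 8,760 hours


def hours_between_collisions(collisions_per_year: int) -> float:
    """Average gap, in hours, between collisions at a single location."""
    if collisions_per_year <= 0:
        raise ValueError("need at least one collision per year")
    return HOURS_PER_YEAR / collisions_per_year


print(round(hours_between_collisions(275), 1))  # 31.9
```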

When Not to Drive
Avoiding the above-mentioned 'hot spots' makes sense, but this recommendation probably doesn't need to be heeded all day. What are the best and worst times to drive in NYC? Using the same data set, I created the following histogram, which shows how many collisions occurred in 2014 at each time of day.



Not surprisingly, most collisions occur during rush hour, but using the above chart as a proxy, rush 'hour' turns out to be a rush day: accidents stay at daily highs from ~8:00AM to ~7:00PM. Ouch. To decrease your chances of a collision, you'll have to leave / enter NYC before 8:00AM or after 7:00PM. That kind of sucks, but at least you know when your chances of being in a collision increase / decrease.

There are also a few interesting side-notes worth mentioning about the above histogram:

  1. You can see there is a slight up-tick in accidents in the early hours (around midnight to 2:00AM).  This is likely due to party-goers driving themselves home or taking a taxi home.
  2. You'll notice a very sharp decrease in the number of collisions in the middle of rush hour (around 4:00PM)! My guess is that this drop of nearly 3,000 collisions is caused by taxi shift changes. In NYC, taxis are driven by two drivers, each on duty for a 12hr shift. To make each driver's shift equally attractive, the high-demand part of the day is split between the two, with the handoff occurring at peak demand time. Unfortunately, the first driver has to hand off the taxi to the second driver in Queens (where most taxi depots are located), resulting in a massive drop in the number of taxis in the city. This NY Times article explains this further.
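For the curious, an hour-of-day histogram like the one above can be built from raw collision times with a few lines of Python. This is a sketch, not the SAS code used for the post: it assumes times arrive as 24-hour "H:MM" or "HH:MM" strings (check the format of your download), and the function names and sample times are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def collisions_by_hour(times: Iterable[str]) -> List[int]:
    """Histogram of collisions by hour of day (index 0 = midnight to 1AM).

    Assumes 24-hour 'H:MM' or 'HH:MM' strings.
    """
    counts = Counter(datetime.strptime(t, "%H:%M").hour for t in times)
    return [counts[h] for h in range(24)]


def hours_at_or_above(hourly: List[int], threshold: int) -> List[int]:
    """Hours of the day whose collision count meets the threshold."""
    return [hour for hour, count in enumerate(hourly) if count >= threshold]


def text_histogram(hourly: List[int], width: int = 40) -> str:
    """Render the hourly counts as rows of '#' scaled to the busiest hour."""
    peak = max(hourly) or 1  # avoid dividing by zero on empty input
    return "\n".join(f"{h:02d}:00 {'#' * round(width * c / peak)}"
                     for h, c in enumerate(hourly))


hourly = collisions_by_hour(["8:05", "08:30", "17:45", "23:59"])
print(text_histogram(hourly))
```

Running hours_at_or_above on the real 2014 counts with a "daily high" threshold is one way to make the ~8:00AM to ~7:00PM window concrete.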

Driving on The Weekends
Breaking down collisions by day of the week, we obtain the below histogram (which is a little hard to read at first):




You can see two interesting things:

  1. Saturday and Sunday experience about 30% fewer collisions than any weekday
  2. (Likely) due to Saturday night party-goers, late-night collisions peak late Saturday / early Sunday (at least double the number of late-night collisions on any weekday).
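The weekend comparison boils down to counting collisions per day of the week and comparing averages. A minimal Python sketch, assuming you already have one date per collision; the helper name and the toy dates are hypothetical, and the ~30% figure above comes from the real data, not from this snippet.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def weekend_vs_weekday(dates: Iterable[date]) -> Tuple[float, float, float]:
    """Compare average collisions per weekend day-of-week vs per weekday.

    Returns (weekday average, weekend average, percent fewer on weekends).
    Totals are per day of the week over the whole period; this ignores that
    one day of the week occurs 53 times in a 365-day year.
    """
    by_dow = Counter(d.weekday() for d in dates)  # Monday=0 ... Sunday=6
    weekday_avg = sum(by_dow[i] for i in range(5)) / 5
    weekend_avg = (by_dow[5] + by_dow[6]) / 2
    percent_fewer = 100 * (1 - weekend_avg / weekday_avg)
    return weekday_avg, weekend_avg, percent_fewer


# Toy data: two collisions each weekday (Mon Jan 6 - Fri Jan 10, 2014),
# one each on Saturday Jan 11 and Sunday Jan 12.
toy = [date(2014, 1, 6 + i) for i in range(5) for _ in range(2)]
toy += [date(2014, 1, 11), date(2014, 1, 12)]
print(weekend_vs_weekday(toy))  # (2.0, 1.0, 50.0)
```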

Recommendations
Using the above heat maps and histograms, you should:

  1. Avoid Midtown (especially the Queensboro Bridge) and Lower Manhattan (Delancey / Canal St.) by using other entries into the city, or (even better) by using the subway
  2. If you have to drive into / out of the city via the 'hot spots', do so before 8:00AM or after 7:00PM to lower your chances of an accident
  3. Drive into / out of the city at any time during the weekends (but be aware of driving late at night: there are more collisions at this time, and you'll likely be tired as well)

Beyond these recommendations, the data also suggests that the number of accidents increases as more vehicles are on the road. However, with the help of some informative data analysis, maybe we can break this trend. Drive safely!

______
SAS was used for this analysis; code is available upon request
Heat maps generated using QGIS
Data source: NY's Open Data website

Jan 11, 2015

All of NYC's Farmers Market Locations

Fresh veggies, spices, cheeses, beers, soaps - the litany of products available at NYC's farmers markets is impressive!  If you're a fellow New Yorker, you've probably gone to at least one market.  But I'm also sure you've overlooked or missed at least one because you weren't aware of its location or season of operation.  Why not make a map showing all of NYC's farmers market locations?  That could be useful.

I made the map below using data from NY State's Open Data initiative. From the state's website, anyone can download data sets of publicly available information; in this case, the location of every NYC farmers market.

All of the market locations have been color-coded to indicate when they're running (Summer, Multiple Seasons, Year-Round), and you can filter the map to highlight only the markets that operate in the season of your choice. Click a market icon, and you'll see the name, operating days and hours, and website of the farmers market. If you want to view the map in full screen, click the link at the bottom-left of the map.


There's probably a farmers market closer to you than you think; take a look!



View NYC Farmers' Markets in a full screen map


If you don't see a market on the map that you definitely know is there (remember, Smorgasburg in Williamsburg, Brooklyn technically doesn't count), let me know and I'll ask the state to update its database!
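The season color coding was done in Excel; the Python sketch below shows one plausible rule for it, assuming each market record carries a first and last month of operation. The field names first_month / last_month, the month-to-season mapping, and the sample markets are all hypothetical. A market open all 12 months is Year-Round, one whose months fall within a single season gets that season's name, and anything else is Multiple Seasons; filtering by category mirrors the map's season filter.

```python
from typing import Dict, List

# Hypothetical month-to-season mapping (meteorological seasons).
SEASON_OF_MONTH = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Fall", 10: "Fall", 11: "Fall"}


def months_open(first: int, last: int) -> List[int]:
    """Months from first to last (1-12) inclusive, wrapping past December."""
    span = (last - first) % 12
    return [(first - 1 + i) % 12 + 1 for i in range(span + 1)]


def season_category(first: int, last: int) -> str:
    """Classify a market as Year-Round, a single season, or Multiple Seasons."""
    months = months_open(first, last)
    if len(months) == 12:
        return "Year-Round"
    seasons = {SEASON_OF_MONTH[m] for m in months}
    if len(seasons) == 1:
        return seasons.pop()  # e.g. "Summer"
    return "Multiple Seasons"


def filter_by_category(markets: List[Dict], wanted: str) -> List[Dict]:
    """Keep only the markets whose season category matches `wanted`."""
    return [m for m in markets
            if season_category(m["first_month"], m["last_month"]) == wanted]


markets = [
    {"name": "Market A", "first_month": 6, "last_month": 8},
    {"name": "Market B", "first_month": 1, "last_month": 12},
    {"name": "Market C", "first_month": 6, "last_month": 11},
]
print([m["name"] for m in filter_by_category(markets, "Summer")])  # ['Market A']
```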


______
Excel used for categorizing markets
Map produced using BatchGeo
Raw data from Data.ny.gov

Jan 8, 2015

When & How Should you Play the PowerBall Lottery?

Do you play the PowerBall lottery on a consistent basis?  Why not improve your odds of making money by following some simple principles about when and how you should play PowerBall.

If you're unfamiliar with PowerBall, here's a quick rundown on how to play and your odds of winning. Every Wednesday and Saturday, 5 white balls are drawn at random from a group of 59, and a single red PowerBall is drawn from a group of 35. To win the jackpot (which is always a minimum of $40,000,000), you have to match all 5 white balls and the red PowerBall. In addition to the jackpot, there are eight other ways to win, each with its own prize and odds of winning; here are the ways to win:


powerball.com
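The odds in the table above follow from counting combinations: choose which of your 5 white numbers match, choose the rest from the 54 white balls that weren't drawn, and multiply by the chance of matching (or missing) the red ball. A Python sketch of that calculation using exact fractions (the function and constant names are my own):

```python
from fractions import Fraction
from math import comb

WHITE_BALLS, WHITE_DRAWN, RED_BALLS = 59, 5, 35  # game format described above


def probability(white_matched: int, red_matched: bool) -> Fraction:
    """Exact chance that one ticket matches exactly `white_matched` white
    balls and does (or does not) match the red PowerBall."""
    ways_white = (comb(WHITE_DRAWN, white_matched)
                  * comb(WHITE_BALLS - WHITE_DRAWN, WHITE_DRAWN - white_matched))
    p_white = Fraction(ways_white, comb(WHITE_BALLS, WHITE_DRAWN))
    p_red = Fraction(1, RED_BALLS) if red_matched else Fraction(RED_BALLS - 1, RED_BALLS)
    return p_white * p_red


# The nine winning patterns: (white balls matched, red PowerBall matched)
WINNING_PATTERNS = [(5, True), (5, False), (4, True), (4, False),
                    (3, True), (3, False), (2, True), (1, True), (0, True)]

for white, red in WINNING_PATTERNS:
    odds = 1 / probability(white, red)
    label = f"{white} white" + (" + PowerBall" if red else "")
    print(f"{label:<20} 1 in {float(odds):,.2f}")
```

The jackpot line comes out to exactly 1 in 175,223,510, i.e. 5,006,386 possible white-ball combinations times 35 red balls.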
To play, each ticket costs $2. There is a 'multiplier' option that costs an additional $1 per ticket, and it works like this: all winnings (except for the jackpot) are multiplied by a number (2 through 5) chosen by a computer before the balls are drawn (note that the higher numbers are selected less often). The winnings with the multiplier are as follows:


powerball.com

Great, you know how to play - but should you?
To help answer this question, we first need to think of your ticket purchase as an investment, and the drawing outcome as either a profit or a loss. In the simplest case, you invest $2 and hope for a profit. If you win, your profit is the winnings minus the ticket cost (e.g. $1,000,000 - $2 = $999,998).

We know the odds of each type of win and the respective payout for each (we'll assume NO multiplier for now). With this information, we can calculate your ticket's expected value: each prize multiplied by its probability of being won, summed over all prizes, minus the $2 ticket cost. Expected value is the average profit (or loss) you can expect per ticket if you play on a continual basis. If you play only once, your ticket will either be worth $0 (no win) or more than $0 (you won something), obviously NOT the average. However, if you play consistently, you will win sometimes and lose other times, and your tickets will produce an average profit (or loss). This average profit/loss is the expected value. If we assume a jackpot of $40,000,000 (the minimum), your ticket has the following expected value:



The above table tells us that by consistently buying $2 tickets when the jackpot is $40,000,000, you can expect a loss in the long run. Put another way, for each ticket you buy, your $2 investment will, on average, reap a $1.41 loss. Not good. You might as well put that money towards your NYC rent.
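The expected-value table can be reproduced in a few lines of Python. This is a sketch under assumptions: the fixed prize amounts below ($1,000,000 for 5 white balls down to $4 for the PowerBall alone) are my reading of the standard prize table at the time, since the table itself is an image, and the names are hypothetical. With a $40,000,000 jackpot it lands on the same roughly $1.41 loss per ticket.

```python
from fractions import Fraction
from math import comb

WHITE_BALLS, WHITE_DRAWN, RED_BALLS = 59, 5, 35
TICKET = 2  # dollars, no multiplier

# Assumed fixed prizes in dollars, keyed by (white balls matched, red matched).
FIXED_PRIZES = {
    (5, False): 1_000_000,
    (4, True): 10_000,
    (4, False): 100,
    (3, True): 100,
    (3, False): 7,
    (2, True): 7,
    (1, True): 4,
    (0, True): 4,
}


def probability(white_matched: int, red_matched: bool) -> Fraction:
    """Exact chance of one ticket landing on this match pattern."""
    ways = (comb(WHITE_DRAWN, white_matched)
            * comb(WHITE_BALLS - WHITE_DRAWN, WHITE_DRAWN - white_matched))
    p_white = Fraction(ways, comb(WHITE_BALLS, WHITE_DRAWN))
    p_red = Fraction(1, RED_BALLS) if red_matched else Fraction(RED_BALLS - 1, RED_BALLS)
    return p_white * p_red


def expected_value(jackpot: float) -> float:
    """Average profit per ticket: every prize times its probability, summed,
    minus the ticket price. Ignores split jackpots and taxes."""
    total = Fraction(jackpot) * probability(5, True)
    for (white, red), prize in FIXED_PRIZES.items():
        total += prize * probability(white, red)
    return float(total) - TICKET


print(f"{expected_value(40_000_000):.2f}")  # -1.41
```

Because the fixed prizes contribute only about $0.36 per ticket, nearly all of the remaining $1.64 has to come from the jackpot term, which is why the break-even jackpot is so large.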

But we also know that the jackpot grows over time if there are no winners. So there must be some jackpot size that is large enough for the investment to break even. There is, and the graph below shows the jackpot size needed to achieve this:


FIGURE #1

The above graph tells us that you shouldn't play PowerBall unless the advertised jackpot is greater than $290,000,000. Why? Because a jackpot this large compensates for the dismal odds of winning, so your investment will have a positive expected return in the long run.
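For readers who want the arithmetic behind the break-even point, here is one way to write it, with J the advertised jackpot, p_J the jackpot probability, and w_i, p_i the eight fixed prizes and their probabilities. The fixed-prize total assumes the standard prize amounts of the time ($1,000,000 down to $4, no multiplier), read from the prize table rather than stated in the text, so treat it as an assumption. The result lands at about $287M, in line with the roughly $290M read off the graph.

```latex
\begin{aligned}
\mathrm{EV}(J) &= J\,p_J + \sum_{i=1}^{8} w_i\,p_i - 2,
\qquad p_J = \frac{1}{\binom{59}{5}\times 35} = \frac{1}{175{,}223{,}510},\\
\sum_{i=1}^{8} w_i\,p_i &= \frac{63{,}166{,}120}{175{,}223{,}510} \approx \$0.36,\\
\mathrm{EV}(J^{*}) = 0 \;\Longrightarrow\; J^{*} &= \frac{2 - \sum_{i} w_i\,p_i}{p_J}
= 2 \times 175{,}223{,}510 - 63{,}166{,}120 = 287{,}280{,}900.
\end{aligned}
```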

But there's a big catch
What if there are multiple jackpot winners? If multiple people win a jackpot, the winnings are split among all of them. This outcome would significantly reduce your ticket's value, so we have to include the probability of more than one jackpot winner in the expected value calculations made above. To do this, we use the binomial distribution, which models a series of independent yes/no events called Bernoulli trials. In a nutshell, it gives the probability of N successes in X attempts for a particular scenario. Here, the scenario is winning the jackpot, a success is a jackpot-winning ticket (we care about the case of more than one), and the number of attempts (i.e. trials) is the number of tickets purchased. The following graph, created using the binomial formula, shows the probability of 0 winners, 1 winner and more than 1 winner as the number of participants (tickets sold) increases:



If the graph looks complicated, don't worry about it. The takeaway is that the probability of more than one winner increases as more lottery tickets are sold (intuitive, right?), and this, unfortunately, decreases your ticket's expected value. How much does it change your ticket's expected value? In other words, how will this change Figure #1 above? To answer this question, we need to know how many people will participate in the lottery at different jackpot sizes. Luckily for us, there is a strong correlation between the jackpot size and the number of tickets sold. The graph below (built from two years of historical PowerBall data) clearly shows this correlation: as the jackpot grows, more people are struck with lottery fever, and more tickets are sold. Note that I have also overlaid a regression line on the data; it will be used to estimate the number of players for a given jackpot size.


lottoreport.com
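The 0 / 1 / more-than-1 winner curves described above come from the binomial formula. Here is a small Python sketch, assuming every ticket's numbers are chosen independently and uniformly at random (real players favor birthdays and other patterns, which this ignores); the function name is my own, and the ticket counts in the loop are illustrative, not PowerBall sales figures.

```python
import math
from typing import Tuple

P_JACKPOT = 1 / 175_223_510  # chance that a single ticket wins the jackpot


def winner_probabilities(tickets: int, p: float = P_JACKPOT) -> Tuple[float, float, float]:
    """Binomial model: P(0 winners), P(exactly 1), P(more than 1) when
    `tickets` independent tickets each win with probability p."""
    log_miss = math.log1p(-p)  # log(1 - p), accurate for tiny p
    p0 = math.exp(tickets * log_miss)  # (1 - p)^n
    p1 = tickets * p * math.exp((tickets - 1) * log_miss)  # n p (1 - p)^(n - 1)
    return p0, p1, max(0.0, 1.0 - p0 - p1)


for n in (50_000_000, 100_000_000, 200_000_000, 400_000_000):  # illustrative
    p0, p1, more = winner_probabilities(n)
    print(f"{n:>12,} tickets: P(0)={p0:.3f}  P(1)={p1:.3f}  P(>1)={more:.3f}")
```

Working in log space with log1p keeps the result accurate even though p is around 6 in a billion.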

Now we have everything needed to update Figure #1 and better understand your ticket's expected value, accounting for the probability of multiple jackpot winners. The chart below is the updated version of Figure #1, and it incorporates everything we've covered so far (your odds of winning, your ticket's expected value, and the probability of other players winning). As you can see, the chart has an interesting shape: your ticket's expected value becomes positive over a certain range and then turns negative again. Specifically, your ticket's expected value is positive if the jackpot is between $300,000,000 and $570,000,000:


FIGURE #2

This means that if you play on a consistent basis ONLY when the jackpot is between $300 and $570MM, you will have, on average, a positive return.  Put another way, for every $2 you spend to purchase a PowerBall ticket, you can expect up to a $0.50/ticket return on your investment.
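One way to combine the pieces into a chart like Figure #2 is sketched below in Python. It is my own formulation under stated assumptions, not necessarily the spreadsheet model behind the chart: each ticket wins independently with probability 1/175,223,510; a winner splits the jackpot evenly with K other winners, where K follows a binomial distribution; the non-jackpot prizes (about $0.36 per ticket, assuming the standard prize table of the time) are never split; and tickets sold is a straight-line function of the jackpot. The closed form E[J/(1+K)] = J(1 - (1-p)^n)/(np) avoids summing over K. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

P_JACKPOT = 1 / 175_223_510
# Expected value of the eight fixed (non-jackpot) prizes, assuming the
# standard prize table of the time ($1,000,000 down to $4): about $0.36.
OTHER_PRIZES_EV = 63_166_120 / 175_223_510
TICKET = 2.0


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares: return (slope, intercept) for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expected_share(jackpot: float, tickets: float, p: float = P_JACKPOT) -> float:
    """Expected payout to a winning ticket when the other tickets may also win.

    With K ~ Binomial(n - 1, p) other winners,
    E[J / (1 + K)] = J * (1 - (1 - p)^n) / (n p).
    """
    if tickets <= 1:
        return jackpot
    prob_any_winner = -math.expm1(tickets * math.log1p(-p))  # 1 - (1 - p)^n
    return jackpot * prob_any_winner / (tickets * p)


def ev_with_sharing(jackpot: float, tickets: float, p: float = P_JACKPOT) -> float:
    """Expected profit per $2 ticket (annuity, no multiplier) with jackpot splitting."""
    return p * expected_share(jackpot, tickets, p) + OTHER_PRIZES_EV - TICKET


def positive_ev_jackpots(jackpots: Sequence[float], slope: float,
                         intercept: float) -> List[float]:
    """Jackpot sizes with positive expected value, estimating tickets sold
    from the fitted line tickets = slope * jackpot + intercept."""
    return [j for j in jackpots if ev_with_sharing(j, slope * j + intercept) > 0]
```

To use it, fit the line on historical (jackpot, tickets sold) pairs with fit_line, then pass a grid of jackpot sizes to positive_ev_jackpots. Where the positive range starts and ends depends entirely on the fitted line, so it will only match Figure #2 if the regression does.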

Cash-Out & Multiplier Options
To be clear, everything we've covered thus far assumes a regular $2 ticket (without the $1 multiplier option) and the annuity option if you win the jackpot. The annuity option means you win the advertised jackpot amount, paid over the course of 30 years (one payment a year). However, most people choose the 'cash-out' option, meaning the jackpot winnings are paid out immediately; the consequence is that the cash-out amount is significantly smaller than the advertised / annuity amount. I looked at all the winning amounts for the past several years and, on average, the amount is reduced by a factor of 1.85. For example, your $40,000,000 annuity winning would be reduced to approximately $21,620,000 if you opted for the cash-out option.
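The cash-out adjustment is a single division by the 1.85 average factor from the text; in the expected-value calculation it only shrinks the jackpot term, since the smaller prizes are paid the same either way. A trivial Python helper (hypothetical name) for the conversion:

```python
CASH_FACTOR = 1.85  # average annuity-to-cash ratio observed in past winnings


def cash_value(advertised_annuity: float, factor: float = CASH_FACTOR) -> float:
    """Estimate the lump-sum cash-out from the advertised (annuity) jackpot."""
    return advertised_annuity / factor


print(f"${cash_value(40_000_000):,.0f}")  # $21,621,622
```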

Now Figure #2 above can be updated with the expected value of your ticket under two new scenarios: (1) if you choose the cash-out option, and (2) if you choose the multiplier option (note, I have assumed the best multiplier of 5). The results are shown in Figure #3 below. As you can see, in the long run, neither the cash-out option nor the multiplier option (even in the best-case scenario) results in a positive investment:


FIGURE #3

Recommendations
Your chances of winning are the same for every ticket you purchase (and the odds are dismally small, let's just be honest with ourselves), but if you consistently buy PowerBall tickets, you might as well improve your return on investment by following these recommendations:

  • Purchase tickets only when the advertised jackpot is between $300 and $570MM
  • Purchase only the regular $2 ticket without the multiplier option
  • Choose the annuity option for collecting your jackpot winnings

Best of luck!

______
Excel was used for this analysis; spreadsheets available upon request
Source of data and pictures from Powerball:
http://www.powerball.com/powerball/pb_stories.asp
http://www.powerball.com/powerball/pb_prizes.asp