Wednesday, September 23, 2015

Bonus Stats - Year vs Minutes to Spare



Here's some additional data (you might need to click the image to make it clearer). Of all the finishers of the Feeder Races (minus Berlin, because I still think that's an outlier), here are the breakdowns of how many people finished per cutoff category.

As you can clearly see, for races run this year vs last year, despite fewer total finishers, more people beat the various cutoffs.
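For anyone who wants to build a similar breakdown from their own results file, here is a minimal sketch in Python. It assumes each finisher has already been reduced to a margin in seconds under their qualifying standard. The category edges (20+, 10-20, 5-10 and 0-5 minutes) and the names `CATEGORY_EDGES`, `category` and `breakdown` are hypothetical choices made for this illustration, not necessarily the ones used in the chart.

```python
from collections import Counter

# Hypothetical category edges, in seconds under the qualifying standard.
# These are an assumption for illustration, ordered from largest to smallest.
CATEGORY_EDGES = [
    (20 * 60, "20+ min"),
    (10 * 60, "10-20 min"),
    (5 * 60, "5-10 min"),
    (0, "0-5 min"),
]


def category(seconds_to_spare):
    """Return the category label for a margin, or None if the standard was missed."""
    for lower_edge, label in CATEGORY_EDGES:
        if seconds_to_spare >= lower_edge:
            return label
    return None


def breakdown(margins):
    """Count finishers per category; margins are seconds under the standard."""
    counts = Counter()
    for margin in margins:
        label = category(margin)
        if label is not None:
            counts[label] += 1
    return counts


# Made-up margins for illustration only (not the blog's data).
sample = [67, 1500, 650, -30, 299, 300]
print(breakdown(sample))
```

Running `breakdown` once per race year gives two sets of counts that can be compared side by side.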

Bonus Update - with the last 5 minutes expanded in 15 second intervals (from the data pool of 24 feeder marathons)
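As explained in the comments, each percentage is the number of people in a particular 15 second block divided by the total number of people in the 5 minute block. A minimal sketch of that calculation follows, assuming margins are given in seconds under the standard. The function name `fifteen_second_shares` and the handling of block edges are assumptions for illustration.

```python
def fifteen_second_shares(margins):
    """Percent of the 0-5 minute finishers that fall in each 15 second block.

    margins: seconds under the qualifying standard, one value per finisher.
    Returns 20 percentages; index 0 covers 0-14 s, index 19 covers 285-299 s.
    """
    blocks = [0] * 20  # 300 seconds / 15 seconds per block
    for margin in margins:
        if 0 <= margin < 300:
            blocks[int(margin // 15)] += 1
    total = sum(blocks)
    if total == 0:
        return [0.0] * 20
    return [100.0 * count / total for count in blocks]


# Made-up margins for illustration only (not the blog's data).
shares = fifteen_second_shares([0, 14, 15, 67, 299, 300, -5])
for index, share in enumerate(shares):
    if share > 0:
        print(f"{index * 15}-{index * 15 + 14} s: {share:.1f}%")
```

Margins of 5 minutes or more, and finishers who missed the standard, are left out of both the block counts and the total.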




FAQ
Believe it or not, I started this thing as a way to ease my mind. I had the post-marathon depression, and needed something to occupy my time. I had a BQ with 67 seconds in the bank, and I wanted to try and prove to myself that I would get in. ... this data exercise has morphed into something much larger. With nearly 100,000 views, I did not expect this blog to get so much attention.

Q1. How accurate were you in predicting last year's cutoff?
A1. I did not do this exercise last year.

Q2. Can you run the numbers for last year and see how close your method would be?
A2. No friggin' way! It has taken hours and hours and hours to assemble all the finisher data so far. It would be a neat project; if you want to pay me to do it, I will consider it :-)

Q3. Do you still think your method is correct?
A3. Yes, based on the assumptions. However, the biggest assumption is that the same proportion of people apply for Boston as did last year, and I do not know if that is true or not. Moreover, I do not know whether more of the faster people are applying this year vs last year.

Q4. Do you still think the cutoff will be 91 seconds?
A4. No. I think it will be closer to 2 minutes and 10 seconds. [Edit to add: I think it was a mistake to carry the Berlin Marathon through the analysis. It was clearly an outlier, and although listed as one of the top feeder races, is likely not proportionally representative of the BAA application field. The analysis without Berlin is probably accurate, hence my 2:10 prediction].

Q5. Are you going to do this next year?
A5. No!

Q6. What about Erie and Lehigh and marathon X that are "double-qualifiers"?
A6. Yes, I know about that. I only counted them once. Yes, it might be a source of error. But: These marathons were also "double-qualifiers" the year before so by only counting them once and comparing them against each other, I think it smooths the results and keeps a consistent methodology.

Q7. What about people who "age-up"? (i.e. were 44 when they qualified, but are running Boston as a 45 year old - hence different standard)
A7. Yes, again, a source of error... But: The methodology is applied consistently, and there would have been people who "aged-up" the previous year too. As long as the same percentage of people "age-up" across categories, this error will smooth itself out.

Q8. Can I get a copy of all your data?
A8. I'm not sure. Technically I think it is copyrighted by the various races and/or timing companies. I have "screen-scraped" it into a nice tidy database for my own use, which I'm pretty sure is not in contravention of the copyright laws. Perhaps I can send it to you with names removed? I don't know. It's a large data file with over 500,000 records. Let me think about it and I'll get back to you.

And now, let me ask you this... I've seen my blog posted on Facebook and Forums. Every time, people refer to me as "he". Why is this? ;-)








37 comments:

  1. Thank you very much for keeping this blog going throughout the year.

    To answer your question: "And now, let me ask you this... I've seen my blog posted on Facebook and Forums. Every time, people refer to me as "he". Why is this? ;-)" Click on the View My Complete Profile link on the right. The About section refers to you as Male. Might be the reason why he, or him, rather than her or she has been used by many in the forums and fb. :-) Best of Luck to getting into Boston 2016!!!

    Replies
    1. Oops. My auto-anti-creeper mechanism must have been set to "on" when I created the account.

    2. Hi,
      Great work. Like so many, I have spent the last few days calculating my chances. Your data on feeder races is great information and your predicted cutoff at -2:07 would leave me in (-2:50), but I am concerned about some aspects:
      - What about the -5:00 and better runners that deferred registration to the second week? Hopefully not many...
      - Last year, there were about 350 streakers in that group; they get in first and raise the cutoff.
      - BAA data doesn't seem to be quite consistent. They announced as many time spots as 2015, about 23,500, and said that there were about 15% more first week registrations than last year's ~16,750, which would be around 19,250, meaning there are only 4,250 spots left...
      So the pessimist in me suggests a cutoff of over three minutes. Can you please tell me that I am wrong and put my mind at ease?

      Thanks, Peter

    3. Three minutes?! Holy crap that would make me sad. I thought I was safe with my -2:40 :(

    4. Peter,

      I'm pretty much in the same boat as you regarding my time. I'm thinking (maybe I'm being too optimistic) that the BAA would not overestimate the number of spots left. Also, the only place I saw that there were 5,000 spots available was the Runner's World article, and some numbers from that article didn't match up with the BAA website.

      Overall, I'm just hoping for the best!

  2. You deserve an honorary spot at the Boston Marathon if you do not make the cutoff. Thanks for sharing all of your work. I am a data geek and I have referred many to your website. From female geek to female geek, I salute you!

  3. Great stuff. Why not use the distribution of times in your 2016 feeder race data to predict the cutoff? They announced 5000 spots left after week 1, and after calibrating the distribution to predict the 2014 and 2015 results, you can compute the 2016 time now that they have revealed the number of slots left. That method gives BQ-118s (versus simple regression, which gave BQ-114s).

    Replies
    1. Corrected the model based on the finer cuts of times (seconds in the bank) you posted from the feeder races. Now it looks like the cutoff will be 127 seconds. Puts me out of contention, sadly. Good luck to everyone.

  4. Thank you so much for doing all this number crunching! Even though we'll only know for sure on September 30, it's helpful to get a sense of where the predictions stand. Keeping my fingers crossed!

  5. How did you get those % values for the 15 second intervals? Are they accurate?
    By the way, you're amazing.

    Replies
    1. The percentages are % of people in that particular 15 second block vs the total number of people in the 5 minute block, from the data that I have harvested (not all of the race results).

      Oh dear, I think I'm even starting to confuse myself.

  6. You freaking rule. And, yes, sell some ads or something and do this next year. You are the only person predicting marathon cutoff times and about 10,000 people are glued to your every word.

    Replies
    1. Agreeing with you again, Dick. I've been on this page several times since I found it.
      Feeling physically ill from the anxiousness.

      Thank you for this analysis! Even if it has flaws, it's 100x better than anything I could figure out on my own.
      I don't understand your latest post but will take 127 seconds. Hoping it doesn't creep much higher than that.

  7. You freaking rule. And, yes, sell some ads or something and do this next year. You are the only person predicting marathon cutoff times and about 10,000 people are glued to your every word.

    Replies
    1. If (s)he doesn't do it, I'm considering it myself. I love numbers and am a developer so this is probably in my wheelhouse. Lol! Put my Java skills to work!

  8. I have BQ-94 secs and I'm crying now with your last prediction.

    Hope you are very wrong, so we can both get in :)


    Big hug ;)

  9. TY, you made me go get a better BQ

  10. Thank you for a great blog. As a data scientist, I used a few tricks I picked up from you in presenting my work.

    Unfortunately, I'm at BQ-99 and have given up hope. Still, I got something out of the experience.

  11. I'm BQ-124 sec. It has made me more determined to crush this barrier on my next marathon. Fellow squeakers RISE UP! Boston will always be there for the "new" you.

    Replies
    1. I yearn for the good old days. When I first started running, the BAA gave you a bonus 59 seconds - you need 3:40? Don't worry, just run 3:40:59. Make the time and you're in. Now we're fighting tooth and nail for a second here and a second there. I'm pretty sure I can't run any faster than I did last year. I'll have to wait till I age-up, and try again.

    2. Haha - I love this! "Fellow squeakers RISE UP!"

      Yes, let's all rejoice in the fact that we are not alone in our suffering and waiting. I'm praying to the running gods that BQ-152 will get me in!

  12. I first found this page when the projected cut-off time was 48 seconds. I have BQ -49 seconds. I was sweating it then and have checked back on this page weekly hoping things would work out in my favor, but the projected cutoff kept climbing. Still holding out a little hope that we might get a miracle, but I have almost come to terms with the impending reality of not getting to run my first Boston in 2016. However, I AM STILL A BOSTON QUALIFIER and will proudly display my 26.2 magnet along with my blue and yellow BQ magnet on my car. Those of us that have a BQ have earned it, and just because we won't get to run this year shouldn't lessen the accomplishment of qualifying for the prestigious Boston Marathon. Time to regroup, reassess training, and just go after a faster time. Hoping a 2:59 is in my future to leave no doubt during next year's registration. Congrats to everyone who has earned a BQ whether you make the cut or not!

    Replies
    1. Nicely said.

      I have a BQ-94 and I'm suffering bad. Would be my first Boston Marathon too :(

  13. Yes, you can only control so much. It is the struggle and determination that counts. There is an element of luck in all of this. Some people get unlucky. A high demand year, hot and humid weather, pouring down rain, terrorists. We are all blessed to be able to run a BQ time, or even to run 26.2 miles.

    Replies
    1. Looks like 2:02 under the cutoff will not make it for 2016. My CC was credited back the 180 fee this morning.
      I'm bummed!

    2. Not necessarily:

      The Boston Marathon: We have not yet notified all runners who submitted applications during the second week of registration. Most banks only hold a pre-authorization charge for up to three days. If we don't charge the card by then, then the pre-authorization goes away. This neither means you are accepted or not accepted into the race. We are continuing the process of manually verifying submissions, and plan to notify all runners by the middle of next week. At that time, if you are accepted, then your card will either be charged, or pre-authorized again and then charged. Thank you for your patience.

    3. Don't give up hope yet. That's just your card dropping the pending charge. The BAA has a post on their FB page about that.

    4. Agreed. My wife's card was not charged. She is BQ - 14+ minutes. Signed up late Monday in week 2. Guessing they are just backed up on verification. Pretty sure she is getting in. For those who don't get in, are you considering a late fall marathon?

  14. So true to the above 2 comments and thanks!! Needed to read these right about now.

  15. Re: Why are you a "he"... because your stats are sexy, so I can only hope ;)

  16. This is a fantastic analysis!

    How much are you scripting the process of gathering the data (how much room for improvement is there that could make it more palatable to keep this going)? Have you been using athlinks.com to gather data?

    I have so many other questions I want to address with this data set! I was drawn to your blog because I wanted data to address the question of how fair it is that women get 30 more minutes than men, not knowing whether fairness should more heavily weight physiological differences or trying to get comparable participation rates. In the absence of cultural and other issues, they'd be the same, but they're really not. It is really hard for women who want to start a family to commit to qualifying and then running in the next Boston. An option to defer in case of pregnancy would seriously help but opens up other cans of worms..... Gotta stop rambling.

    Wouldn't it be fantastic if we could get more registrant data from the BAA? Haha yeah right. But we do get the results ;)

  17. Tremendous job on the analysis. I really hope you do it again next year, although I cannot even fathom how much time this takes. The raw data by age group from 25 races alone is daunting. Thank you again!

  18. I was looking at the raw data you have from Boston qualifiers who qualified at the 2015 Boston Marathon. There was about a 14% jump there to 46% who requalified. I am beginning to think that this factor could have contributed substantially to the increased demand and higher cutoff this year. Sort of like the opposite of the Berlin marathon (where people aren't as likely to travel to Boston), qualifiers at Boston: a) are more likely to live near Boston, making it easier to go; and b) have already shown a strong desire to actually run Boston, as by definition they have done it at least once. With thousands of qualifiers at Boston, the simple fact that there were good racing conditions at Boston this year may drive the analysis a decent amount.

  19. This is a terrific analysis. Thanks very much for doing it. Helped me stay calm until my name was in the entry list! I should also add that I BQ'd (with 7 minutes plus to spare) in a race not listed in your top 25.... Good luck to everyone!

  20. We should have the final cutoff this week, if the BAA chooses to share it in a timely fashion. Given the early closing of registration, I'm guessing closer to 2 minutes than 90 seconds, in a totally science-free manner. I wonder how many people were going to register, but hadn't gotten around to it?

  21. In your split data, do you have many races with 5K splits? I've done some studies, which you can find poking around from this one:
    http://www.y42k.com/2014/07/11/5k-splits-for-men-and-women-at-the-2013-new-york-marathon/
    and would be interested in having more data to play with.
