
Pro cycling statistics?

Road Cycling “It is by riding a bicycle that you learn the contours of a country best, since you have to sweat up the hills and coast down them. Thus you remember them as they actually are, while in a motor car only a high hill impresses you, and you have no such accurate remembrance of country you have driven through as you gain by riding a bicycle.” -- Ernest Hemingway

Old 04-14-10 | 12:46 PM
  #1  
Thread Starter
Senior Member
 
Joined: Apr 2010
Posts: 58
Likes: 0
Pro cycling statistics?

Does anyone know of a source or repository containing any sort of data-driven statistics on pro cyclists? Or if there is anywhere that breaks down major races into "play-by-play" sequences with subsequent results?

I work at a lab that does a lot of data mining research, and an emerging topic of interest is sports data mining. I haven't found any academic papers on cycling and sports data mining, so I want to investigate the topic and see if I can come up with anything interesting.

Anything you think might be relevant would be helpful as I do not know of any resources out there. Once again, I'm looking for any collection of data-driven statistics about cycling available.

If anyone is interested in the topic, this is a nerdy but interesting read: https://creativity-online.com/news/da...le-tour/137926

Thanks very much
Aero Sapien is offline  
Old 04-14-10 | 01:28 PM
  #2  
umd's Avatar
umd
Banned
 
Joined: Sep 2005
Posts: 28,387
Likes: 3
From: Santa Barbara, CA

Bikes: Specialized Tarmac SL2, Specialized Tarmac SL, Giant TCR Composite, Specialized StumpJumper Expert HT

I don't really understand what kind of data you are looking for.
umd is offline  
Old 04-14-10 | 01:37 PM
  #3  
Thread Starter
Senior Member
 
Joined: Apr 2010
Posts: 58
Likes: 0
Essentially anything about cyclists at this point. I don't know what data is available, so I don't know what can be investigated.

Some examples of data mining in other sports:

-In baseball, regression analysis has shown that a sacrifice bunt to move a runner from first base to second base is never as statistically effective as just leaving the runner on first and having the batter take a real at-bat.

-In basketball, large collections of data about individual player performances on a team can be analyzed to show which players complement other players the best. This is an important managerial decision.

-Data mining plays a huge role in sports betting. For example, in dog racing it is found that the dog that wins is almost always in the front of the pack in the first turn of a race, so dogs that have a tendency to sprint out of the gate are the ones worth betting on rather than dogs that pace themselves in the beginning.

In cycling, for example, one could look at which riders complement other riders the best in team time trials, road races, etc., if the data were available. Basically no real sports data mining research has been done on cycling at all, and I am just looking at what type of statistics are available at this point. After looking at what is available, I could proceed to asking questions and forming hypotheses given the data I have.

I'm a college student writing a paper and figured if I could do it on cycling, I absolutely will.

Last edited by Aero Sapien; 04-14-10 at 01:42 PM.
Aero Sapien is offline  
Old 04-14-10 | 01:44 PM
  #4  
umd's Avatar
umd
Banned
 
Joined: Sep 2005
Posts: 28,387
Likes: 3
From: Santa Barbara, CA

Bikes: Specialized Tarmac SL2, Specialized Tarmac SL, Giant TCR Composite, Specialized StumpJumper Expert HT

You would have to watch a race and compile the stats yourself, as I doubt that there are any meaningful play-by-play stats within a race. If you had the data, it would be interesting to know about the breaks in a race: which riders went off the front, for how long, and with which other riders; how long the break lasted and how much work each rider did in the break; and, if the break lasted, what place each rider got. Then for races which finished in a sprint, it would be interesting to know how far from the line the sprinter started their sprint, and whether they had their own team lead them out (and with how many riders), or whether they didn't have a leadout or sat on another team's train. What position in the field were they in at 10k to go, 5k to go, 1k to go, 900m, 800m, 700m, etc.?

Then there are stage races. Stage races would be easier to get stats on because they are individual events which add up to make the whole. So you can track each rider's relative placing on each stage and in the whole, etc.
umd is offline  
Old 04-14-10 | 01:52 PM
  #5  
Nate552's Avatar
Senior Member
 
Joined: Nov 2008
Posts: 2,620
Likes: 0
From: TX

Bikes: Orbea Orca Trek 5500 Trek Equinox

I've seen Versus display how much work each rider was doing in a breakaway. I wonder if they keep that data somewhere.
Nate552 is offline  
Old 04-14-10 | 01:53 PM
  #6  
Thread Starter
Senior Member
 
Joined: Apr 2010
Posts: 58
Likes: 0
Originally Posted by umd
Then there are stage races. Stage races would be easier to get stats on because they are individual events which add up to make the whole. So you can track each rider's relative placing on each stage and in the whole, etc.
Hmm...

Do you think that, looking at the stage-by-stage results of the Tour de France, one could predict a rider's performance on a stage given that rider's previous stage or something? I don't exactly know the dynamics of pro stage racing, but are there days where riders purposely take the day off and do the bare minimum to complete a day's work so that they can win a future stage, or does everyone go max effort every day?

The stage-by-stage results of the Tour are readily available. If one could make a model to predict a stage's results given previous stage history, that would be interesting, I think.



EDIT: I would definitely love to watch crits or something and gather the data about sprints, breakaways, etc., but that's way too much work for one individual =/
Aero Sapien is offline  
Old 04-14-10 | 01:56 PM
  #7  
umd's Avatar
umd
Banned
 
Joined: Sep 2005
Posts: 28,387
Likes: 3
From: Santa Barbara, CA

Bikes: Specialized Tarmac SL2, Specialized Tarmac SL, Giant TCR Composite, Specialized StumpJumper Expert HT

Originally Posted by Aero Sapien
Do you think that, looking at the stage-by-stage results of the Tour de France, one could predict a rider's performance on a stage given that rider's previous stage or something?
Probably not
umd is offline  
Old 04-14-10 | 01:59 PM
  #8  
Thread Starter
Senior Member
 
Joined: Apr 2010
Posts: 58
Likes: 0
I'm sort of confused as to what you meant by tracking each player's relative placing on each stage and in the whole.
Aero Sapien is offline  
Old 04-14-10 | 02:07 PM
  #9  
umd's Avatar
umd
Banned
 
Joined: Sep 2005
Posts: 28,387
Likes: 3
From: Santa Barbara, CA

Bikes: Specialized Tarmac SL2, Specialized Tarmac SL, Giant TCR Composite, Specialized StumpJumper Expert HT

Each stage is its own race. The rider gets a ranking for that race based on when they crossed the finish line.

They also get a time. Their cumulative time for all stages determines their place in the GC (General Classification).

Riders also get points for being the first (several) over climbs and at sprints. There are separate points competitions.

All the data is there, I just don't know what you could do with it. I doubt you could predict the winner from one stage to another based on how they did on the previous stages.
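To make the bookkeeping concrete, here is a minimal sketch of per-stage ranking and GC by cumulative time. The rider names and times are invented, and the sketch ignores time bonuses, the points competitions, and the rule that everyone in a group gets the same time, so treat it as an illustration rather than how race organizers actually compute standings.

```python
from collections import defaultdict

# Hypothetical stage results: stage number -> list of (rider, time in seconds).
# Names and times are invented for illustration.
stage_results = {
    1: [("Rider A", 14400), ("Rider B", 14400), ("Rider C", 14460)],
    2: [("Rider A", 18120), ("Rider B", 18000), ("Rider C", 18030)],
}

def stage_ranking(results):
    """Order riders on one stage by finishing time (ties keep input order)."""
    return [rider for rider, _ in sorted(results, key=lambda r: r[1])]

def general_classification(stages):
    """Sum each rider's stage times and rank by cumulative time."""
    totals = defaultdict(int)
    for results in stages.values():
        for rider, seconds in results:
            totals[rider] += seconds
    return sorted(totals.items(), key=lambda item: item[1])

gc = general_classification(stage_results)
for place, (rider, seconds) in enumerate(gc, start=1):
    print(place, rider, seconds)
```

With a table like this for every stage, each rider's stage placings and running GC position become a simple time series to work with.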
umd is offline  
Old 04-14-10 | 02:21 PM
  #10  
Quel's Avatar
Senior Member
 
Joined: Jan 2008
Posts: 3,653
Likes: 1
From: Washington, DC
Pros ride at least 70 miles per day on average.
Quel is offline  
Old 04-14-10 | 02:29 PM
  #11  
everything
Guest
 
Posts: n/a
This topic is interesting, but collecting a lot of data from pro cycling, I think, is pretty much a dead end. You can only find snippets here and there, such as average speeds from the Tour or a rider's power output for one race. It seems L'Equipe obtains a lot of data on specific aspects like speeds up Alpe d'Huez. I vaguely remember people writing that Sastre went up the mountain in 2008 as fast as LeMond/Fignon used to. Those types of records exist.

Forget about expecting basketball- and baseball-type data from cycling. The sports are too different. Each race is unique and happens once per year with different racers. Courses and the racing calendar can remain the same for many years and then suddenly change. I don't know what the right language would be, but something like: no trial can be repeated. That's unlike the NBA, where they play so many games and they are either home or away. So in college ball you get RPIs or whatever that index is that the NCAA uses.

Sure, you can collect disparate data piecemeal, laboriously, or gain knowledge over time, and maybe you could be a superior gambler, but nothing exists to allow data mining. During last year's Tour, commentators tried to analyze Contador's climbing ability, but it was controversial, with some people arguing there was not enough data available. And if I remember correctly, Contador would never say what his VO2 max was.

https://bikeraceinfo.com/tdf/tdfstats.html
You could focus on one thing like the Tour. The link above has some basic data. Maybe you could add columns for average temperatures, precipitation, indexes for level of technology or sophistication of teams' operations, amount of climbing, number of sprint stages, mountaintop finishes, number of teams, team budgets, etc. But that's a lot of work unless, I guess, you work for L'Equipe.
 
Old 04-14-10 | 02:38 PM
  #12  
everything
Guest
 
Posts: n/a
https://www.poissons52.fr/actualites/...atistiques.php
https://www.memoire-du-cyclisme.net/e.../stats_tdf.php
https://www.les-sports.info/tour-de-f...21-t94-u0.html

There's some, but it's all too disparate and it doesn't get to the most interesting data about individuals...
 
Old 04-14-10 | 02:56 PM
  #13  
Senior Member
 
Joined: Jul 2007
Posts: 57
Likes: 0
Originally Posted by everything
And if I remember correctly, Contador would never say what his VO2 max was.
This, among many things, would be difficult. Many riders are unwilling to give out power numbers or VO2 max numbers, because they give opposing teams an advantage. If people knew what Contador's 5-minute power number was, they would have a target to work at.

Secondly, there is simply too much variation in races. Weather makes a huge difference in the outcome of a race. How are you going to account for the weather, let alone the variations in weather over a 180 mile bike ride? Routes in many of the grand tours, where betting would be most common, change every year. Teams change every year, and any cycling statistic is going to be heavily influenced by the rider's team.

You could probably collect data on how often a breakaway succeeds. However, to produce a useful model you have to take into account all of the variables. Here is a list of variables to review, and I'm sure there are some that I left off:

-Race
-Distance to finish when breakaway begins
-Weather conditions at breakaway
-Weather conditions in peloton
-Power output of riders in breakaway
-VO2 max of riders in breakaway
-Aerodynamics of riders' positions in breakaway
-Level of cooperation in breakaway
-Power output of riders in peloton
-VO2 max of riders in peloton
-Aerodynamics of riders' positions in peloton
-Level of cooperation in peloton
-GC position of riders in breakaway
-Race stage within tour
-Course profile
-Road conditions
-Fatigue level of riders
-Number of riders in breakaway
-Likelihood of a crash
-Likelihood of a mechanical
-Effect of GC leaders getting a mechanical

There are many, many more. Simply too many variables to have any kind of statistically significant model.
aecky01 is offline  
Old 04-14-10 | 03:01 PM
  #14  
Thread Starter
Senior Member
 
Joined: Apr 2010
Posts: 58
Likes: 0
Eesh, I am feeling a bit discouraged but don't want to give up yet.

Data mining is all about finding patterns, and they are worthy to report whether they are trivial or surprising. A classic example of an unexpected pattern found was that diapers and alcohol are purchased together at significant rates; young fathers who buy diapers for their kids often also desire the stress-relief alcohol provides.

I'm thinking focusing on one race might be easier for now; so during the Tour de France, all that matters is the cumulative time right?

I'm looking at the results @ https://www.letour.fr/2009/TDF/LIVE/u...ent/index.html

Do you think any sort of predictive modeling, or correlation between individual stage results, could be found? For example, do you think you could accurately predict that a rider finishing in the top 10% 2 days in a row may take the third day easy and finish in the bottom half? Any interesting predictions or correlations you think may exist?
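Here is a rough sketch of how that "top 10% two days running, then bottom half" question could be counted from stage placings. All riders, placings, and the field size are made up, and the "top 10%" cutoff is simply taken as the field size divided by ten, which is my own simplification.

```python
# Hypothetical placings: rider -> finishing place on consecutive stages.
# Field size and all numbers are invented for illustration.
FIELD_SIZE = 20
placings = {
    "Rider A": [1, 2, 15, 3, 1, 12],
    "Rider B": [10, 11, 9, 12, 10, 8],
    "Rider C": [2, 1, 4, 18, 2, 2],
}

def easy_day_frequency(placings, field_size):
    """Count cases where a rider was in the top 10% on two consecutive
    stages, and how many of those were followed by a bottom-half finish."""
    top_cutoff = max(1, field_size // 10)   # place <= cutoff counts as top 10%
    half_cutoff = field_size // 2           # place > cutoff counts as bottom half
    cases = 0
    followed_by_bottom_half = 0
    for places in placings.values():
        # Slide a window of three consecutive stages over each rider's results.
        for first, second, third in zip(places, places[1:], places[2:]):
            if first <= top_cutoff and second <= top_cutoff:
                cases += 1
                if third > half_cutoff:
                    followed_by_bottom_half += 1
    return cases, followed_by_bottom_half

cases, hits = easy_day_frequency(placings, FIELD_SIZE)
print(f"{hits} of {cases} qualifying cases were followed by a bottom-half finish")
```

On real results the counts would need to be compared against how often any rider finishes in the bottom half, otherwise the number says nothing about taking a day easy.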
Aero Sapien is offline  
Old 04-14-10 | 03:11 PM
  #15  
umd's Avatar
umd
Banned
 
Joined: Sep 2005
Posts: 28,387
Likes: 3
From: Santa Barbara, CA

Bikes: Specialized Tarmac SL2, Specialized Tarmac SL, Giant TCR Composite, Specialized StumpJumper Expert HT

Originally Posted by aecky01
You could probably collect data on how often a breakaway succeeds. However, to produce a useful model you have to take into account all of the variables. Here is a list of variables to review, and I'm sure there are some that I left off:
That's basically what I was getting at. That would probably give the most useful/interesting results, but you would have to create all of the data for it by watching endless hours of race videos.

Last edited by umd; 04-14-10 at 03:14 PM.
umd is offline  
Old 04-14-10 | 03:22 PM
  #16  
Thread Starter
Senior Member
 
Joined: Apr 2010
Posts: 58
Likes: 0
So essentially we are coming to the conclusion that I won't be able to get all hot and passionate over studying some statistics in cycling.

Thanks for crushing my dreams guys.

I'm going to look at stage races and omniums more because I think UMD is right about them being a bit easier to study, and I'll also read some papers on horse racing and such to get some more ideas.

If anyone comes up with something magical please let me know
Aero Sapien is offline  
Old 04-14-10 | 03:25 PM
  #17  
mattm's Avatar
**** that
 
Joined: Dec 2006
Posts: 15,402
Likes: 106
From: CALI
Originally Posted by Aero Sapien
Essentially anything about cyclists at this point. I don't know what data is available, so I don't know what can be investigated.

Some examples of data mining in other sports:

-In baseball, regression analysis has shown that a sacrifice bunt to move a runner from first base to second base is never as statistically effective as just leaving the runner on first and having the batter take a real at-bat.
Baseball has been broken down so many ways because there's a lot of time between the action, and everything happens in a defined order. You could even argue it's slow motion compared to cycling.

In a road race things are so much more fluid, "scorers" would have a hard time keeping up. Also think about how the race splits up, etc.

But if you want to talk about results, that's a different story. See here for a start: https://www.bikeforums.net/showthread...iction-Website
__________________
cat 1.

my race videos
mattm is offline  
Old 04-14-10 | 03:25 PM
  #18  
Senior Member
 
Joined: Jul 2006
Posts: 158
Likes: 0
From: Philadelphia
I think you might find some interesting data just looking at the relationship among times and placing for sprinters v. their teammates in stages with a sprint finish, particularly those near to TT or TTT stages. That is, how deep will a lead-out train go for a sprint stage the day before or after a TT? It might be interesting to look at teams like Garmin or HTC-Columbia from last year, where in each GT, they sent teams with several time trial specialists along with 1 or 2 sprinters.

This wouldn't be easy to figure, as they only record individual placing, not individual times, for everybody finishing in a large group, but you might be able to look at the top 5-10 finishers in a sprint stage v. the placing of their teammates (i.e. their train) over the stages.
epenthetic is offline  
Old 04-14-10 | 03:29 PM
  #19  
Ygduf's Avatar
¯\_(ツ)_/¯
 
Joined: Jun 2008
Posts: 10,978
Likes: 4
From: Redwood City, CA

Bikes: aggressive agreement is what I ride.

The Garmin Team was posting their .gpx of the TdF. Their HR/Power numbers were removed, but you could put known rider weight into different websites to get power estimates. That's all without wind/draft/exact equip weight, etc, though.
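For a sense of what such an estimate involves, here is a back-of-the-envelope sketch: the power needed just to lift rider plus bike up a climb, from mass, elevation gain, and elapsed time. This is my own simplification and not what any particular website does. It leaves out wind, drafting, and rolling resistance, so it understates real output, and the example numbers are invented.

```python
G = 9.81  # gravitational acceleration in m/s^2

def climbing_power_watts(total_mass_kg, elevation_gain_m, duration_s):
    """Average power spent lifting rider + bike up a climb.

    Ignores air drag, rolling resistance and drafting, so the result is
    lower than the power the rider actually produced.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return total_mass_kg * G * elevation_gain_m / duration_s

# Invented example: 70 kg of rider and bike, 1000 m of climbing in 40 minutes.
estimate = climbing_power_watts(total_mass_kg=70.0,
                                elevation_gain_m=1000.0,
                                duration_s=2400.0)
print(f"{estimate:.1f} W")
```

Elevation gain and elapsed time are the two inputs a GPS track can supply; the mass has to come from published rider weights plus a guess at the equipment.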
Ygduf is offline  
Old 04-14-10 | 03:32 PM
  #20  
Senior Member
 
Joined: Jul 2006
Posts: 158
Likes: 0
From: Philadelphia
Actually, I'm not sure HTC-Columbia sent many TT'ers, I was really just thinking about who was doing work for Cav v. other (individual) goals.
epenthetic is offline  
Old 04-14-10 | 04:32 PM
  #21  
everything
Guest
 
Posts: n/a
It would be neat to study how often breakaways succeeded prior to radios being popular and compare it to how often they've succeeded with radios being used. Then, if radios are banned, you could update the study and hypothesize the new rate will be more like the older rate without radios. I bet the French have already done this for the Tour, but unless you can actually find evidence it's been done, why not do it yourself? And when I say rate, I mean a ratio of breaks winning compared to total road races: stages minus TTs. You could also disregard tough mountain stages, etc.
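A minimal sketch of that rate, assuming each stage has been hand-labelled with its type and whether a break won. The records are invented and the field names are just placeholders.

```python
def break_success_rate(stages):
    """Share of road stages won from a breakaway; time trials are excluded.

    Returns None when there are no road stages to count.
    """
    road_stages = [s for s in stages if s["type"] != "tt"]
    if not road_stages:
        return None
    wins = sum(1 for s in road_stages if s["won_by_break"])
    return wins / len(road_stages)

# Hypothetical stage records, invented for illustration only.
pre_radio = [
    {"type": "road", "won_by_break": True},
    {"type": "road", "won_by_break": False},
    {"type": "tt", "won_by_break": False},
    {"type": "road", "won_by_break": True},
    {"type": "road", "won_by_break": False},
]
radio_era = [
    {"type": "road", "won_by_break": False},
    {"type": "road", "won_by_break": True},
    {"type": "road", "won_by_break": False},
    {"type": "road", "won_by_break": False},
    {"type": "tt", "won_by_break": False},
]

pre_rate = break_success_rate(pre_radio)
radio_rate = break_success_rate(radio_era)
print(f"pre-radio: {pre_rate:.2f}, radio era: {radio_rate:.2f}")
```

Dropping tough mountain stages would just mean adding another stage type to the filter. A real comparison would also need enough stages per era for the difference in rates to mean anything.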
 
Old 04-14-10 | 05:19 PM
  #22  
Keith99's Avatar
Senior Member
 
Joined: Apr 2005
Posts: 5,863
Likes: 3
Originally Posted by Aero Sapien
Hmm...

Do you think that, looking at the stage-by-stage results of the Tour de France, one could predict a rider's performance on a stage given that rider's previous stage or something? I don't exactly know the dynamics of pro stage racing, but are there days where riders purposely take the day off and do the bare minimum to complete a day's work so that they can win a future stage, or does everyone go max effort every day?

The stage-by-stage results of the Tour are readily available. If one could make a model to predict a stage's results given previous stage history, that would be interesting, I think.



EDIT: I would definitely love to watch crits or something and gather the data about sprints, breakaways, etc., but that's way too much work for one individual =/

Yes, but it won't show in the stats!

When riders do this they may be different kinds of riders and do it differently. The most common would be the sprinters. On any mountain stage their goal is to survive with the least work. But that does not mean just going easy; too slow and they are eliminated. Generally all the sprinters end up in one huge pack off the back. The exceptions are in stages with moderate mountains, where guys who are not pure sprinters may try to stay with the lead pack, and if it stays together they may be the best sprinter left.

Anyone who is a GC contender never takes a day off, or is it always takes the day off when it is flat? They can not afford to lose time to other GC contenders. No easy day if any other contender is off the front.

Pure climbers may take a day off to a degree to be fresh for the next day. But that just means finishing in the main group. On a flat stage that is pretty much always the case. In the mountains it can mean they were resting, it can mean they tried early and then did not have the gas later, or it can mean they are tired or sick.

There are several huge problems with statistics in a major tour. Except for a handful of riders in contention for the green jersey, placing at the finish does not matter. Official time matters more, and everyone in a group gets the same time. For teams with a GC contender, the first goal of at least all but one or two of the other riders is the GC rider's placing, not their own placing. For teams with a sprinter the same kind of dedication often takes over. (But if the team has only a sprinter, then other riders may be free to seek their own good on stages that may not have a sprint finish.)

There might be something to find on how often a break stays away, but even there it is not simple. A lot depends on whether the other teams care. In today's huge peloton, if the other teams all want to chase, a break will fail. Success depends more on willingness to chase than legs. There MIGHT be something to be found on how often breaks succeed in a tour based on the final 30 miles of the day and the next day's stage. But I doubt there are enough data points to show anything with mathematical rigour.

Here is a grain of hope for you. Most years the TDF has 3 individual time trials. First the prologue, very short, then later 2 others, often one much longer than the other. Perhaps you could find something there. At least you have numbers to start with. I'd go for comparing to a particular placing; I'm thinking 5th place. Why? Because you see distortion in the top places, especially in the first and last time trials. In the prologue there are actually specialists who seek just that stage win and then afterwards to hold the yellow for as long as possible. The last time trial is almost always the last chance to make up time. That means the leaders are all concerned about gaining or losing time, not in general but in chunks that are enough to change placing that day. This means very often the tour leader is simply matching the second place rider.
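One way to set up the 5th-place comparison is to express every rider's time trial time as a ratio to the 5th-place time, so that a short prologue and a long time trial end up on the same scale. This is only a sketch, and the rider labels and times are invented.

```python
def relative_to_place(times, reference_place=5):
    """Express each rider's time as a ratio to the reference place's time.

    A ratio below 1.0 means faster than the reference rider.
    """
    ranked = sorted(times.items(), key=lambda item: item[1])
    if len(ranked) < reference_place:
        raise ValueError("not enough finishers for the reference place")
    reference_time = ranked[reference_place - 1][1]
    return {rider: seconds / reference_time for rider, seconds in ranked}

# Hypothetical prologue times in seconds, invented for illustration.
prologue = {"A": 600, "B": 606, "C": 612, "D": 618, "E": 624, "F": 650}
ratios = relative_to_place(prologue)
for rider, ratio in ratios.items():
    print(rider, round(ratio, 4))
```

Comparing one rider's ratio across the prologue and the two later time trials is then a small, well-defined dataset to start from.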
Keith99 is offline  
Old 04-14-10 | 05:22 PM
  #23  
Keith99's Avatar
Senior Member
 
Joined: Apr 2005
Posts: 5,863
Likes: 3
Originally Posted by everything
It would be neat to study how often breakaways succeeded prior to radios being popular and compare it to how often they've succeeded with radios being used. Then, if radios are banned, you could update the study and hypothesize the new rate will be more like the older rate without radios. I bet the French have already done this for the Tour, but unless you can actually find evidence it's been done, why not do it yourself? And when I say rate, I mean a ratio of breaks winning compared to total road races: stages minus TTs. You could also disregard tough mountain stages, etc.
Studying successful breaks has a chance of finding something. How far out the break started. How many breaks there were earlier in the stage and how many miles those previous breaks stayed away. Number of riders initially and maximum in the break. Is the next stage tough? (OK, a judgement call, but time trial or mountain to start.) And of course, was it Bastille Day?
Keith99 is offline  
Old 04-14-10 | 05:42 PM
  #24  
everything
Guest
 
Posts: n/a
Originally Posted by Keith99
Studying successful breaks has a chance of finding something. How far out the break started. How many breaks there were earlier in the stage and how many miles those previous breaks stayed away. Number of riders initially and maximum in the break. Is the next stage tough? (OK, a judgement call, but time trial or mountain to start.) And of course, was it Bastille Day?
Eh, that's not quite what I had in mind. You'd have to watch every second of the earliest parts of stages to figure out some of those things, like breaks attempted, and that might not be doable. I'm not even sure a lot of those early moments are broadcast in the US. In the early parts of stages people are attempting breaks extremely frequently and it's hard to differentiate among them. I mean, it's really, really dynamic.
 
Old 04-14-10 | 07:22 PM
  #25  
DXchulo's Avatar
Upgrading my engine
 
Joined: Aug 2004
Posts: 6,218
Likes: 0
From: Alamogordo
I agree that your best shot is working with breakaways. That is the one thing with actual numbers that would be relatively easy to track besides finishing time & position.

Look on CyclingNews and VeloNews for races that they covered with a live report. You can go back through those and track how many riders got in a break and their time gap vs. km until the finish. Then match that to when breakaways succeeded and when the race ended in a bunch sprint. It will take a lot of digging and you might not get perfect data for every race, but you'll at least have something to work with. Basically try to find anything you can to predict when a breakaway will succeed. Maybe for a stage race you could track each break rider's position on GC, as well.
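As a sketch of how notes taken from those live reports could be stored and compared, here is one hypothetical record layout. The class name, fields, and numbers are all my own invention; a real dataset would have gaps wherever a report did not give the time gap at a checkpoint.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class BreakRecord:
    race: str                 # race or stage label
    riders_in_break: int
    gap_by_km_to_go: dict     # km to the finish -> time gap in seconds
    succeeded: bool           # True if the break stayed away to the line

def mean_gap_by_outcome(records, km_to_go):
    """Mean gap at a checkpoint, split by whether the break succeeded.

    Records with no reported gap at that checkpoint are skipped.
    """
    gaps = {True: [], False: []}
    for record in records:
        if km_to_go in record.gap_by_km_to_go:
            gaps[record.succeeded].append(record.gap_by_km_to_go[km_to_go])
    return {outcome: (mean(values) if values else None)
            for outcome, values in gaps.items()}

# Hypothetical records, invented for illustration only.
records = [
    BreakRecord("Stage X", 5, {50: 300, 20: 180, 10: 120}, True),
    BreakRecord("Stage Y", 3, {50: 240, 20: 90, 10: 30}, False),
    BreakRecord("Stage Z", 7, {50: 360, 20: 200}, True),
]

summary = mean_gap_by_outcome(records, km_to_go=20)
print(summary)
```

Each break rider's GC position could go in as one more field if the race is a stage race.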

You might also think about TTs and various time checks along the route, though I don't know if those are actually recorded and/or what you'd actually learn from it.
DXchulo is offline  

