Pro cycling statistics?
#1
Thread Starter
Senior Member
Joined: Apr 2010
Posts: 58
Likes: 0
Does anyone know of a source or repository containing any sort of data-driven statistics on pro cyclists? Or if there is anywhere that breaks down major races into "play-by-play" sequences with subsequent results?
I work at a lab that does a lot of data mining research, and an emerging topic of interest is sports data mining. I haven't found any academic papers on cycling and sports data mining, so I am looking to investigate the topic to see if I can come up with anything interesting.
Anything you think might be relevant would be helpful as I do not know of any resources out there. Once again, I'm looking for any collection of data-driven statistics about cycling available.
If anyone is interested in the topic, this is a nerdy but interesting read: https://creativity-online.com/news/da...le-tour/137926
Thanks very much
#3
Thread Starter
Senior Member
Joined: Apr 2010
Posts: 58
Likes: 0
Essentially anything about cyclists at this point. I don't know what data is available, so I don't know what can be investigated.
Some examples of data mining in other sports:
-In baseball, regression analysis has shown that a sacrifice bunt to move a runner from first base to second base is never as statistically effective as just leaving the runner on first and having the batter take a real at-bat.
-In basketball, large collections of data about individual player performances on a team can be analyzed to show which players complement other players the best. This is an important managerial decision.
-Data mining plays a huge role in sports betting. For example, in dog racing it is found that the dog that wins is almost always in the front of the pack in the first turn of a race, so dogs that have a tendency to sprint out of the gate are the ones worth betting on rather than dogs that pace themselves in the beginning.
In cycling, for example, one could look at which riders complement other riders the best in team time trials, road races, etc. if the data were available. Basically no real sports data mining research has been done on cycling at all, and I am just looking at what types of statistics are available at this point. After looking at what is available, I could proceed to asking questions and forming hypotheses given the data I have.
I'm a college student writing a paper and figured if I could do it on cycling, I absolutely will.
Last edited by Aero Sapien; 04-14-10 at 01:42 PM.
#4
Banned
Joined: Sep 2005
Posts: 28,387
Likes: 3
From: Santa Barbara, CA
Bikes: Specialized Tarmac SL2, Specialized Tarmac SL, Giant TCR Composite, Specialized StumpJumper Expert HT
You would have to watch a race and compile the stats yourself, as I doubt that there is any kind of meaningful play-by-play stats within a race. If you had the data, it would be interesting to know about the breaks in a race. Which riders went off the front, for how long, with which other riders. How long did the break last, how much work did each rider do in the break. If the break lasted, what place did each rider get. Then for races which finished in a sprint, it would be interesting to know how far from the line the sprinter started their sprint, whether they had their own team lead them out (and how many riders). Or if they didn't have a leadout or sat on another team's train. What position in the field were they at 10k to go, 5k to go, 1k to go, 900m, 800m, 700m, etc.
Then there are stage races. Stage races would be easier to get stats on because they are individual events which add up to make the whole. So you can track each rider's relative placing on each stage and in the whole, etc.
#6
Thread Starter
Senior Member
Joined: Apr 2010
Posts: 58
Likes: 0
Do you think that, looking at the stage-by-stage results of the Tour de France, one could predict a rider's performance on a stage given that rider's previous stage or something? I don't exactly know the dynamics of pro stage racing, but are there days where riders purposely take the day off and do the bare minimum to complete a day's work so that they can win a future stage, or does everyone go max effort every day?
The stage-by-stage results of the Tour are readily available. If one could make a model to predict a stage's results given previous stage history, that would be interesting, I think.
EDIT: I would definitely love to watch crits or something and gather the data about sprints, break aways, etc, but that's way too much work for one individual =/
#9
Banned
Joined: Sep 2005
Posts: 28,387
Likes: 3
From: Santa Barbara, CA
Bikes: Specialized Tarmac SL2, Specialized Tarmac SL, Giant TCR Composite, Specialized StumpJumper Expert HT
Each stage is its own race. The rider gets a ranking for that race based on when they crossed the finish line.
They also get a time. Their cumulative time for all stages determines their place in the GC (General Classification).
Riders also get points for being the first (several) over climbs and at sprints. There are separate points competitions.
All the data is there, I just don't know what you could do with it. I doubt you could predict the winner from one stage to another based on how they did on the previous stages.
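As a rough illustration of how the cumulative-time ranking described above could be computed from per-stage results, here is a minimal Python sketch. The function name, the input layout, and the rider times are all made up for illustration, and the sketch ignores time bonuses and penalties, so it is a simplification rather than the official method.

```python
from collections import defaultdict


def general_classification(stage_times):
    """Rank riders by cumulative time over all stages.

    stage_times: list of dicts, one per stage, mapping rider -> seconds.
    Riders missing from any stage (e.g. abandoned) are left out.
    """
    totals = defaultdict(int)
    for stage in stage_times:
        for rider, seconds in stage.items():
            totals[rider] += seconds
    # Keep only riders who have a time on every stage.
    finishers = {
        rider: total
        for rider, total in totals.items()
        if all(rider in stage for stage in stage_times)
    }
    # Lowest cumulative time first.
    return sorted(finishers.items(), key=lambda item: item[1])


# Made-up stage times in seconds, for illustration only.
stages = [
    {"Rider A": 14400, "Rider B": 14400, "Rider C": 14460},
    {"Rider A": 18030, "Rider B": 18000, "Rider C": 17900},
]
gc = general_classification(stages)
for place, (rider, seconds) in enumerate(gc, start=1):
    print(place, rider, seconds)
```

Note that Rider A and Rider B share a time on the first stage, as riders finishing in the same group would, so only the second stage separates them.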
#11
Guest
Posts: n/a
This topic is interesting, but collecting a lot of data from pro cycling is, I think, pretty much a dead end. You can only find snippets here and there, such as average speeds from the Tour or a rider's power output for one race. It seems L'Equipe obtains a lot of data on specific aspects like speeds on Alpe d'Huez. I vaguely remember people writing that Sastre went up the mountain in 2008 as fast as LeMond/Fignon used to. Those types of records exist.
Forget about expecting basketball- and baseball-type data from cycling. The sports are too different. Each race is unique and happens once per year with different racers. Courses and the racing calendar can remain the same for many years and then suddenly change. I don't know what the right language would be, but something like: no trial can be repeated. That's unlike the NBA, where they play so many games and each one is either home or away. So in college ball you get RPIs or whatever that index is that the NCAA uses.
Sure, you can collect disparate data piecemeal and laboriously, or gain knowledge over time, and maybe you could be a superior gambler, but nothing exists to allow data mining. During last year's Tour, commentators tried to analyze Contador's climbing ability, but it was controversial, with some people arguing there was not enough data available. And if I remember correctly, Contador would never say what his VO2 max was.
https://bikeraceinfo.com/tdf/tdfstats.html
You could focus on one thing like the Tour. The link above has some basic data. Maybe you could add columns for average temperatures, precipitation, indexes for level of technology or sophistication of team operations, amount of climbing, number of sprint stages, mountain-top finishes, number of teams, team budgets, etc. But that's a lot of work unless, I guess, you work for L'Equipe.
#12
Guest
Posts: n/a
https://www.poissons52.fr/actualites/...atistiques.php
https://www.memoire-du-cyclisme.net/e.../stats_tdf.php
https://www.les-sports.info/tour-de-f...21-t94-u0.html
there's some but it's all too disparate and it doesn't get to the most interesting data about individuals...
#13
Senior Member
Joined: Jul 2007
Posts: 57
Likes: 0
Secondly, there is simply too much variation in races. Weather makes a huge difference in the outcome of a race. How are you going to account for the weather, let alone the variations in weather over a 180-mile bike ride? Routes in many of the grand tours, where betting would be most common, change every year. Teams change every year, and any cycling statistic is going to be heavily influenced by the rider's team.
You could probably collect data on how often a breakaway succeeds. However, to produce a useful model you have to take into account all of the variables. Here is a list of variables to review, and I'm sure there are some that I left off:
-Race
-Distance to finish when breakaway begins
-Weather conditions at breakaway
-Weather conditions in peloton
-Power output of riders in breakaway
-VO2 max of riders in breakaway
-Aerodynamics of riders' positions in breakaway
-Level of cooperation in breakaway
-Power output of riders in peloton
-VO2 max of riders in peloton
-Aerodynamics of riders' positions in peloton
-Level of cooperation in peloton
-GC position of riders in breakaway
-Race stage within tour
-Course profile
-Road Conditions
-Fatigue level of riders
-Number of riders in breakaway
-Likelihood of a crash
-Likelihood of a mechanical
-Effect of GC leaders getting a mechanical
There are many, many more. Simply too many variables to have any kind of statistically significant model.
#14
Thread Starter
Senior Member
Joined: Apr 2010
Posts: 58
Likes: 0
Eesh, I am feeling a bit discouraged but don't want to give up yet.
Data mining is all about finding patterns, and they are worth reporting whether they are trivial or surprising. A classic example of an unexpected pattern is that diapers and alcohol are purchased together at significant rates; young fathers who buy diapers for their kids often also desire the stress relief alcohol provides.
I'm thinking focusing on one race might be easier for now; so during the Tour de France, all that matters is the cumulative time right?
I'm looking at the results @ https://www.letour.fr/2009/TDF/LIVE/u...ent/index.html
Do you think any sort of predictive modeling, or correlation between individual stage results, could be found? For example, do you think you could accurately predict that a rider finishing in the top 10% two days in a row may take the third day easy and finish in the bottom half? Any interesting predictions or correlations you think may exist?
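To make the question above concrete, one way to check it would be to count, over every rider and every run of three consecutive stages, how often two top-10% finishes are followed by a bottom-half finish. The Python below is only a sketch: the function name, the input layout, and the placings are invented for illustration, and real results would need care around abandonments and bunch finishes.

```python
def easy_day_rate(placings, field_size, top_frac=0.10):
    """Count how often two top finishes are followed by a bottom-half finish.

    placings: dict mapping rider -> list of stage finishing positions (1 = winner),
              in stage order.
    Returns (hits, trials): trials is the number of times a rider finished in the
    top `top_frac` of the field on two consecutive stages, and hits is how many
    of those were followed by a bottom-half placing on the next stage.
    """
    top_cut = field_size * top_frac
    half = field_size / 2
    hits = 0
    trials = 0
    for positions in placings.values():
        # Slide a window of three consecutive stages over each rider's results.
        for first, second, third in zip(positions, positions[1:], positions[2:]):
            if first <= top_cut and second <= top_cut:
                trials += 1
                if third > half:
                    hits += 1
    return hits, trials


# Invented placings in a 100-rider field, for illustration only.
sample = {
    "Rider A": [5, 8, 70, 3],
    "Rider B": [2, 9, 10, 60],
    "Rider C": [50, 40, 30, 20],
}
hits, trials = easy_day_rate(sample, field_size=100)
print(f"{hits} of {trials} two-day top-10% runs were followed by a bottom-half finish")
```

The resulting rate only means something when compared against the baseline rate of bottom-half finishes for the same riders, which this sketch does not compute.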
#15
Banned
Joined: Sep 2005
Posts: 28,387
Likes: 3
From: Santa Barbara, CA
Bikes: Specialized Tarmac SL2, Specialized Tarmac SL, Giant TCR Composite, Specialized StumpJumper Expert HT
That's basically what I was getting at. That would probably give the most useful/interesting results, but you would have to create all of the data for it by watching endless hours of race videos.
Last edited by umd; 04-14-10 at 03:14 PM.
#16
Thread Starter
Senior Member
Joined: Apr 2010
Posts: 58
Likes: 0
So essentially we are coming to the conclusion that I won't be able to get all hot and passionate over studying some statistics in cycling.
Thanks for crushing my dreams guys.
I'm going to look at stage races and omniums more because I think UMD is right about them being a bit easier to study, and I'll also read some papers on horse racing and such to get some more ideas.
If anyone comes up with something magical please let me know
#17
Essentially anything about cyclists at this point. I don't know what data is available, so I don't know what can be investigated.
Some examples of data mining in other sports:
-In baseball, regression analysis has shown that a sacrifice bunt to move a runner from first base to second base is never as statistically effective as just leaving the runner on first and having the batter take a real at-bat.
In a road race things are so much more fluid, "scorers" would have a hard time keeping up. Also think about how the race splits up, etc.
But if you want to talk about results, that's a different story. See here for a start: https://www.bikeforums.net/showthread...iction-Website
#18
Senior Member
Joined: Jul 2006
Posts: 158
Likes: 0
From: Philadelphia
I think you might find some interesting data just looking at the relationship among times and placings for sprinters v. their teammates in stages with a sprint finish, particularly those close to TT or TTT stages. That is, how deep will a lead-out train go for a sprint stage the day before or after a TT? It might be interesting to look at teams like Garmin or HTC-Columbia from last year, where in each GT they sent teams with several time trial specialists along with 1 or 2 sprinters.
This wouldn't be easy to figure, as they only record individual placing, not individual times, for everybody finishing in a large group, but you might be able to look at the top 5-10 finishers in a sprint stage v. the placing of their teammates (i.e. their train) over the stages.
#19
¯\_(ツ)_/¯
Joined: Jun 2008
Posts: 10,978
Likes: 4
From: Redwood City, CA
Bikes: aggressive agreement is what I ride.
The Garmin Team was posting their .gpx of the TdF. Their HR/Power numbers were removed, but you could put known rider weight into different websites to get power estimates. That's all without wind/draft/exact equip weight, etc, though.
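For a sense of what such a power estimate involves, here is a rough Python sketch of the simplest version: the power needed just to lift rider and bike up a climb, from elevation gain and elapsed time. This is not what any particular website does; it is an assumption-laden lower bound that ignores wind, drafting, rolling resistance, and drivetrain losses, and the function name and example numbers are invented.

```python
G = 9.81  # gravitational acceleration in m/s^2


def climbing_power_watts(rider_kg, bike_kg, elevation_gain_m, seconds):
    """Average power spent against gravity alone on a climb.

    Uses P = m * g * h / t, so air and rolling resistance are ignored and the
    result understates the rider's real output.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    total_mass = rider_kg + bike_kg
    return total_mass * G * elevation_gain_m / seconds


# Invented example: 70 kg rider, 8 kg bike, 1000 m of climbing in 50 minutes.
example_watts = climbing_power_watts(70.0, 8.0, 1000.0, 3000.0)
print(f"about {example_watts:.0f} W against gravity")
```

The elevation gain and elapsed time would come from the track points in a .gpx file; the rider and bike weights have to be looked up or guessed, which is where much of the uncertainty comes from.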
#21
Guest
Posts: n/a
it would be neat to study how often breakaways succeeded prior to radios being popular and compare it to how often they've succeeded with radios being used. Then if radios are banned, you could update the study and hypothesize the new rate will be more like the older rate without radios. I bet the french have already done this for the tour but unless you can actually find evidence it's been done, why not do it yourself? And when i say rate, i mean a ratio of breaks winning compared to total road races, stages minus tts. you could also disregard tough mtn stages etc.
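The ratio described above could be tallied along these lines. This Python sketch is hypothetical throughout: the record layout, the field names, the cutoff year passed as `radio_from_year`, and the sample stages are all invented for illustration, and someone would still have to decide by hand whether each stage counts as won by a break.

```python
def break_success_rate(stages, radio_from_year):
    """Share of road stages won from a breakaway, before and after a cutoff year.

    stages: list of dicts with keys "year", "kind" ("road" or "tt"), "break_won".
    radio_from_year: first season treated as the radio era (an assumption the
                     analyst has to choose; it is not supplied by the data).
    Time trials are skipped, matching "stages minus tts".
    """
    counts = {"pre_radio": [0, 0], "radio": [0, 0]}  # era -> [break wins, stages]
    for stage in stages:
        if stage["kind"] == "tt":
            continue
        era = "radio" if stage["year"] >= radio_from_year else "pre_radio"
        counts[era][1] += 1
        if stage["break_won"]:
            counts[era][0] += 1
    return {
        era: (wins / total if total else None)
        for era, (wins, total) in counts.items()
    }


# Invented records, for illustration only.
sample_stages = [
    {"year": 1985, "kind": "road", "break_won": True},
    {"year": 1985, "kind": "road", "break_won": False},
    {"year": 1985, "kind": "tt", "break_won": False},
    {"year": 2005, "kind": "road", "break_won": False},
    {"year": 2005, "kind": "road", "break_won": False},
    {"year": 2005, "kind": "road", "break_won": True},
    {"year": 2005, "kind": "road", "break_won": False},
]
rates = break_success_rate(sample_stages, radio_from_year=1995)
print(rates)
```

Dropping tough mountain stages, as suggested, would just mean adding another value for "kind" and skipping it the same way time trials are skipped.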
#22
Hmm...
Do you think looking at the stage-by-stage results of the Tour de France, one could predict a riders performance on a stage given a riders previous stage or something? I don't exactly know the dynamics of pro stage racing, but are there days where riders purposely take the day off and do the bare minimum to complete a days work so that they can win a future stage, or does everyone go max effort every day?
The stage by stage results of the tour are readily available..if one could make a model to predict a stage's results given previous stage history, that would be interesting I think
EDIT: I would definitely love to watch crits or something and gather the data about sprints, break aways, etc, but that's way too much work for one individual =/
Yes, but it won't show in the stats!
When riders do this they may be different kinds of riders and do it differently. The most common would be the sprinters. On any mountain stage their goal is to survive with the least work. But that does not mean just going easy: too slow and they are eliminated. Generally all the sprinters end up in one huge pack off the back. The exceptions are stages with moderate mountains, where guys who are not pure sprinters may try to stay with the lead pack, and if it stays together they may be the best sprinter left.
Anyone who is a GC contender never takes a day off, or is it always takes the day off when it is flat? They cannot afford to lose time to other GC contenders. No easy day if any other contender is off the front.
Pure climbers may take a day off to a degree to be fresh for the next day. But that just means finishing in the main group. On a flat stage that is pretty much always the case. In the mountains it can mean they were resting, it can mean they tried early and then did not have the gas later, or it can mean they are tired or sick.
There are several huge problems with statistics in a major tour. Except for a handful of riders in contention for the green jersey, placing at the finish does not matter. Official time matters more, and everyone in a group gets the same time. For teams with a GC contender, the first goal of all but one or two of the other riders is the GC rider's placing, not their own placing. For teams with a sprinter the same kind of dedication often takes over. (But if the team has only a sprinter, then other riders may be free to seek their own good on stages that may not have a sprint finish.)
There might be something to find on how often a break stays away, but even there it is not simple. A lot depends on whether the other teams care. In today's huge peloton, if the other teams all want to chase, a break will fail. Success depends more on willingness to chase than legs. There MIGHT be something to be found on how often breaks succeed in a tour based on the final 30 miles of the day and the next day's stage. But I doubt there are enough data points to show anything with mathematical rigour.
Here is a grain of hope for you. Most years the TDF has 3 individual time trials. First the prologue, very short, then later 2 others, often one much longer than the other. Perhaps you could find something there. At least you have numbers to start with. I'd go for comparing to a particular placing, I'm thinking 5th place. Why? Because you see distortion in the top places, especially in the first and last time trials. In the prologue there are actually specialists who seek just that stage win and then afterwards to hold the yellow for as long as possible. The last time trial is almost always the last chance to make up time. That means the leaders are all concerned about gaining or losing time, not in general but in chunks that are enough to change placing that day. This means very often the tour leader is simply matching the second place rider.
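The comparison against a fixed placing suggested above might look something like this in Python. It is only a sketch: the function name and the times are invented, and expressing each gap as a fraction of the reference time is my own assumption, chosen so that a short prologue and a long time trial can be put on the same scale.

```python
def gap_to_reference(tt_times, ref_place=5):
    """Each rider's time trial gap to the rider in `ref_place`, as a fraction.

    tt_times: dict mapping rider -> time trial time in seconds.
    Negative values mean the rider was faster than the reference placing.
    """
    ordered = sorted(tt_times.values())
    if len(ordered) < ref_place:
        raise ValueError("not enough riders for the chosen reference place")
    ref_time = ordered[ref_place - 1]
    return {rider: (t - ref_time) / ref_time for rider, t in tt_times.items()}


# Invented times in seconds, for illustration only.
sample_tt = {
    "Rider A": 1000,
    "Rider B": 1010,
    "Rider C": 1020,
    "Rider D": 1030,
    "Rider E": 1040,
    "Rider F": 1092,
}
gaps = gap_to_reference(sample_tt)
for rider, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{rider}: {gap:+.1%}")
```

Computing the same relative gap for each rider in each of the time trials would give numbers that could then be compared across the race.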
#23
it would be neat to study how often breakaways succeeded prior to radios being popular and compare it to how often they've succeeded with radios being used. Then if radios are banned, you could update the study and hypothesize the new rate will be more like the older rate without radios. I bet the french have already done this for the tour but unless you can actually find evidence it's been done, why not do it yourself? And when i say rate, i mean a ratio of breaks winning compared to total road races, stages minus tts. you could also disregard tough mtn stages etc.
#24
Guest
Posts: n/a
Succeeding breaks has a chance of finding something. How far out the break started. How many breaks there were earlier in the stage and how many miles those previous breaks stayed away. Number of riders initially and the maximum in the break. Is the next stage tough? (OK, a judgement call, but a time trial or mountain stage to start.) And of course, was it Bastille Day?
#25
I agree that your best shot is working with breakaways. That is the one thing with actual numbers that would be relatively easy to track besides finishing time & position.
Look on CyclingNews and VeloNews for races that they covered with a live report. You can go back through those and track how many riders got in a break and their time gap vs. km until the finish. Then match that to when breakaways succeeded and when the race ended in a bunch sprint. It will take a lot of digging and you might not get perfect data for every race, but you'll at least have something to work with. Basically try to find anything you can to predict when a breakaway will succeed. Maybe for a stage race you could track each break rider's position on GC, as well.
You might also think about TTs and various time checks along the route, though I don't know if those are actually recorded and/or what you'd actually learn from it.
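If you do go through the live reports, each break could be boiled down to a few numbers per stage before any modelling. The Python below is a hypothetical sketch of that bookkeeping: the function name, the field names, and the time checks are invented, and the "seconds per km to close" figure is just one possible summary, not an established statistic.

```python
def summarise_break(checks, riders, succeeded):
    """Reduce a break's time checks from a live report to a few features.

    checks: list of (km_to_go, gap_seconds) pairs noted from the report.
    riders: number of riders in the break.
    succeeded: True if the break stayed away to the finish.
    """
    if not checks:
        raise ValueError("need at least one time check")
    # Put the checks in race order: furthest from the finish first.
    ordered = sorted(checks, reverse=True)
    last_km, last_gap = ordered[-1]
    return {
        "riders": riders,
        "max_gap_s": max(gap for _, gap in ordered),
        "last_check_km": last_km,
        # How fast the peloton would need to take back time after the last check.
        "s_per_km_to_close": last_gap / last_km if last_km else None,
        "succeeded": succeeded,
    }


# Invented time checks, for illustration only.
row = summarise_break(
    checks=[(50, 300), (100, 240), (10, 45), (20, 120)],
    riders=4,
    succeeded=False,
)
print(row)
```

One such row per stage, with GC positions of the break riders added where available, would give a table that success can be compared against.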




