Friday, January 29, 2010

The human touch

As mentioned in my previous post, I interviewed Emmeline Hill, PhD, geneticist at University College Dublin and co-founder of the new equine genetic testing company Equinome last week for Thoroughbred Times. You'll have to read the interview in the February 13 issue to get that copyrighted information, but, suffice to say, Dr. Hill is personable, straightforward, and highly confident that what she has discovered is valuable information for Thoroughbred owners and breeders.

I'm not so sure. One of the questions I asked Dr. Hill was whether her test is superior to the opinion of an experienced, competent conformation and pedigree professional. Again, you'll have to read the magazine to see her answer.

One of the reasons I have doubts about Dr. Hill's research is sample size. Hill's basic sample size was 148 elite Thoroughbreds. That's pretty small for a study of a population as large as the Thoroughbred. There are certainly more than 500,000 Thoroughbreds alive in the world right now. A few years ago, the number was pushing 1-million, but that has probably declined significantly.

Hill's elite sample gets divided up further into CCs, CTs, and TTs, naturally, producing even smaller samples of horses with those genotypes. One result of that small sample size can be seen in the standard error of the average best racing distance of the three groups.

Standard error is a measure of uncertainty. I'm sure you've all seen political polls that predict that candidate A will win 51% of the vote and candidate B 49% +- 2%. That “+- 2%” is the standard error. In that case, that means that the race is actually a statistical dead heat, since the real percentages for each candidate theoretically could vary by two percentage points and they're only two percentage points apart in the poll. Another poll the next day could (and often does) give exactly the opposite result.

Hill's results show that the average best racing distance for horses with the CC genotype that her “speed gene test” tests for is 6.2 furlongs, +- 0.8 furlongs. That's not bad. That means that it's pretty likely that the average best winning distance for those horses is going to be between 5.4 and 7 furlongs roughly 70% of the time (in a normal distribution), and 6.2 furlongs is the most likely number. You're probably not going to do much better than that with such a small sample size.

The problem appears with the CTs and TTs. The average best winning distance for CTs is 9.1 furlongs, +- 2.4 furlongs; average for TTs is 10.5 furlongs, +- 2.7 furlongs. I'm not an expert—it's been almost 40 years since my graduate level statistics courses—but those look like pretty big standard errors to me....just the kind you might get from a human expert.
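For anyone who wants to see the arithmetic, here is a minimal sketch in Python that turns the three reported averages and their "+-" figures into ranges and checks whether those ranges overlap. The numbers are the ones quoted above; the function names are my own, and treating each "+-" figure as a simple one-spread-either-side range is an assumption for illustration, not Hill's published method.

```python
# Averages and "+-" figures (in furlongs) as quoted in this post.
reported = {
    "CC": (6.2, 0.8),
    "CT": (9.1, 2.4),
    "TT": (10.5, 2.7),
}

def interval(mean, spread):
    """Return the (low, high) range one spread either side of the mean."""
    return (round(mean - spread, 1), round(mean + spread, 1))

def overlaps(a, b):
    """True if two (low, high) ranges share any distance."""
    return a[0] <= b[1] and b[0] <= a[1]

ranges = {genotype: interval(m, s) for genotype, (m, s) in reported.items()}

for genotype, (low, high) in ranges.items():
    print(f"{genotype}: {low} to {high} furlongs")

print("CC and CT overlap:", overlaps(ranges["CC"], ranges["CT"]))
print("CT and TT overlap:", overlaps(ranges["CT"], ranges["TT"]))
print("CC and TT overlap:", overlaps(ranges["CC"], ranges["TT"]))
```

On these figures the CT range runs from 6.7 to 11.5 furlongs and the TT range from 7.8 to 13.2, so the two overlap for most of their length, while the CC range stays under a mile.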

I am quite confident that, with the appropriate pedigree information and a good look at the physical horse, I could predict its best racing distance within about a quarter mile pretty damned consistently.

Still, people tend to want what they perceive as certainty. I strongly suspect that many of the large farms and racing stables in England and Ireland have already opted to have all of their horses tested. To a billionaire, the $1,400 per horse cost of the test is pretty meaningless.

I also strongly suspect, however, that the test will have virtually no effect on employment of bloodstock agents and other advisers. Equinome's test cannot tell you if that yearling with the genotype you prefer has offset knees or a curby hock, or whether he moves like a racehorse.

That requires a human eye, human intelligence, and human experience.

P.S. It may well be true that Dr. Hill has since gathered more unpublished evidence that reduces the standard error. But the spreads in the data actually make sense in terms of the way horses actually race. There are horses that are purely sprinters. There is another group that can win at sprint distances but are better up to about 9 furlongs. Then there is another group that generally can't beat decent horses at six furlongs but can win between 7 or 8 and 12 furlongs or more. And then there are horses that don't really fit within any of those patterns but are superior at every distance. Those are rare animals indeed.

Wednesday, January 20, 2010

Don't panic

As expected, Emmeline Hill, PhD, published her study on the relationship of specific gene alleles to maximum win distance today at the PLOS1 online journal. If you prefer not to slog your way through all the scientific jargon of academe, you can read the commercial version of the results at Equinome, the website of the company founded by Hill and trainer Jim Bolger to market the test based on Hill's research. The core findings are embodied in this page from the website.

Briefly, the research shows that there are two alleles, "C" and "T", at a particular position on a gene that governs muscle mass in Thoroughbreds. This means the horse's genetic code at that particular spot must read either "CC", "CT" or "TT". The important finding from the research on populations of both elite and non-elite Thoroughbreds is that CC horses strongly prefer sprint distances and are more precocious, CTs are mostly milers and 10-furlong horses who may or may not be precocious, and TTs are mostly 10-furlong and up horses.

The distributions of the genes are about the same in the elite and non-elite groups, so Equinome does not claim to test for the class of the animal, just the distance capacity.

What does this mean for those of us who make our livings looking at horses and/or analyzing pedigrees? Not as much as you might first think, at least not if the market responds rationally (perhaps too much to hope for in an irrational business). The important point is that the test has nothing to do with class, only probable distance capacity. I don't know about you, but I think I generally have a pretty good idea of the probable distance capacity of a prospective foal from a mating I recommend. The test would give breeders more information on the prospective sire and dam and the statistical probabilities of the outcome. It is obvious from the data that, in the contemporary, commercial Thoroughbred world, the most desirable combination is CT. And if you mate two CCs or two TTs, you're not going to get any CTs.

It is also obvious, however, that the only way to guarantee you get all CTs is to breed a CC to a TT. In racing mythology, this is what is familiarly known as a "fish and fowl mating", and it is just about as far out of favor as it could get, and for good reason. For instance, if one bred a 2 1/2 mile Ascot Gold Cup winning sire (Yeats, for example) to a filly winner of the 6-furlong Breeders' Cup Filly and Mare Sprint (Informed Decision, for example), what would you expect to get? The perfect 10-furlong horse? Well, no. History and practice have shown far too often that this simply does not work well, and it is very rarely attempted these days, even taking into account the fact that Gold Cup winners get virtually no chance at stud nowadays.

If you breed CTs to CTs (the most obvious and common tactic), you're going to get 25% CCs, 50% CTs and 25% TTs. That matches up extraordinarily well with what happens in the real world when you breed an 8-10 furlong sire to an 8-10 furlong mare. You'll get a few fast horses that can't stay, a good number of middle-distance types, and a few slow ones that can gallop forever.
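Those percentages are just the standard Mendelian arithmetic for a single gene with two alleles, where each parent passes on one of its two alleles at random. Here is a minimal sketch in Python that works it out for any pairing. The function name `cross` and the genotype strings are my own labels for illustration, and the sketch assumes each allele is equally likely to be passed on; it has nothing to do with how Equinome's actual test works.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(sire, dam):
    """Expected offspring genotype fractions for one gene with alleles C and T.

    Each parent is a two-letter genotype such as "CC", "CT" or "TT" and is
    assumed to pass on either of its two alleles with equal probability.
    """
    # Pair every sire allele with every dam allele; sort so "TC" counts as "CT".
    counts = Counter("".join(sorted(a + b)) for a, b in product(sire, dam))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

for sire, dam in [("CT", "CT"), ("CC", "TT"), ("CC", "CC"), ("CC", "CT")]:
    result = cross(sire, dam)
    summary = ", ".join(f"{g} {float(p):.0%}" for g, p in sorted(result.items()))
    print(f"{sire} x {dam}: {summary}")
```

The CT x CT line gives the 25/50/25 split, CC x TT gives nothing but CTs, and CC x CC gives no CTs at all.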

The market response to this test is going to be very interesting indeed. The test is a bit pricey at 1,000 euros (about $1,400 currently) per sample (according to the terms of service on the website), but then if you're pondering spending $1-million on a yearling or even $100,000 on a stud fee, what's $1,400? The more interesting question for the prospective racehorse market is....exactly who is going to buy the test?

The problem for yearling or juvenile buyers is that, according to the website, the test takes three weeks, so you can't look at a horse at the yearling sale, obtain a blood sample (after, of course, obtaining the seller's permission to do so), get it tested, and buy the horse the next day. That means that the real market may actually be the sellers of yearlings and two-year-olds, not the buyers. And if you were selling ten yearlings, would you really want to tell buyers that five of them are CTs, three CCs, and two TTs? I doubt many will, though I can envision an environment where essentially all are forced to do so should a prestigious breeder begin the practice, just as they are now forced to put damning radiographs in repositories. On the American market at least, those two TTs would be just about guaranteed to be no bids, no matter how handsome they might be. Once the horse is bought, of course, then the buyer has plenty of time to find out just what type of horse he has acquired. It seems to me the likelihood of both buyers and sellers utilizing the test is higher in the juvenile market, where horses are breezing well ahead of the sale and both sides have more time to consider their options.

I plan on interviewing Emmeline Hill on behalf of Thoroughbred Times on Thursday, so check into Thoroughbred Times Today and the Thoroughbred Times website for excerpts and into the weekly print issue for the full interview.

Sunday, January 17, 2010

The future is now?

Easily the most interesting thing that flickered across my laptop screen over the last week was this item from the Irish Times. It may also have been the most significant, but it is impossible to tell at this point.

Emmeline Hill is both a horsewoman and a geneticist. Her grandmother, Charmian Hill, owned the immortal jumper Dawn Run, who won 23 of 37 races in the 1980s and remains the only horse to have won the two biggest races at the Cheltenham festival, the Champion Hurdle (1984) and the Cheltenham Gold Cup Steeplechase (1986). Dawn Run is still regarded as the best mare in the history of English National Hunt racing.

Emmeline Hill's primary occupation over the last decade or more, however, has been as a researcher into the genetics of the Thoroughbred. She began her career as a student of and assistant to Patrick Cunningham, the famed geneticist at Trinity College Dublin, one of the first geneticists to publish peer-reviewed research on Thoroughbred genetics. Cunningham's research showed, among other things, that certain founding fathers of the breed--Herod, Eclipse, Godolphin Arabian--each contribute between 13% and 17% of the genes of modern Thoroughbreds. He also showed that racing ability is about 35% heritable, and that the ability of the modern racehorse, as measured by Timeform ratings, is still increasing at a rate predicted by that measure of heritability.

Hill has built upon and markedly expanded the genetic research Cunningham began. Hill led the pioneering study of mitochondrial DNA patterns in Thoroughbreds published in the August 2002 issue of Animal Genetics that showed that a few hallowed Thoroughbred female lines do not trace to the foundation mare legitimized by the General Stud Book. You can read Patricia Erigero's excellent summary (originally published in Thoroughbred Times) of Hill's mtDNA work at Erigero's and Anne Peters's Thoroughbred Heritage website, which also includes a link to Hill's original Animal Genetics article.

Not surprisingly, far more geneticists work on human genetics than Thoroughbred genetics, and by the mid-naughts, those researchers had discovered more than 140 human genes connected with fitness and athletic performance. Hill began work on cross-referencing those genes with the equine genome and researching their relationship to performance in Thoroughbreds by sequencing DNA of a sizable sample of Thoroughbreds compared to non-Thoroughbreds.

The result of that study was published in the online journal PLOS1 (Public Library of Science) in June 2009, and publicized in the Irish Times.

As published, Hill's study shows that, yes, Thoroughbred genetics is different from non-Thoroughbred genetics in ways that are known to contribute to muscle mass, aerobic and anaerobic energy usage, lung capacity, etc., etc. Surprise, surprise! And, no, I don't mean to belittle this important research, but as far as I can see, that particular paper simply confirmed what any thoughtful horseman already knew--the Thoroughbred possesses better genes for athletic performance than the non-Thoroughbred.

What I am trying to puzzle out now is exactly how this research translates to Hill's and Jim Bolger's new venture, Equinome, which plans to test DNA of prospective sires and dams and prospective racehorses and make recommendations for breeders and buyers. Those recommendations are clearly meant to be based on Hill's research, but nowhere in what has been published is there any information on variations within the population of Thoroughbreds she tested for the study. In fact there are only two sentences in the PLOS1 paper that presage Equinome's proposed services: "We have identified a number of candidate performance genes that may contain variants that could distinguish elite racehorses from members of the population with less genetic potential for success. Revealing such polymorphisms may aid in the early selection of young Thoroughbreds in the multi-billion dollar global Thoroughbred industry."

It is quite possible, even probable, that Hill and colleagues have, in fact, found variations in the Thoroughbred population used in their study, or in larger populations not included in the published study, but, if so, that data has not been published. Therefore, it is impossible at this point to evaluate the validity of the approach. From the information presented in the PLOS1 paper, one could obviously devise a test to tell a Thoroughbred from a non-Thoroughbred, but then, again, we already have one....it's called the Stud Book. From a scientific point of view, it is clear that this could be a fruitful avenue of further research....but that research, if it exists, has not been published.

All we know about the research behind Hill's and Bolger's tool is this sentence from last week's Irish Times article: "The test is based on research by Hill into athletic performance traits in horses conducted and the project was supported by the Irish Thoroughbred Breeders Association which provided the required DNA samples from elite racehorses." Well, that is certainly interesting, but it tells us virtually nothing about whether or not this might be a useful tool.

As I said in the beginning, I'm just trying to figure this out. Nothing that I've written above should be taken as a put down of Hill's research or even of the potential usefulness of the tool (though the Irish Times's rather breathless adulation is perhaps a valid target). I hold a graduate degree in a scientific field (statistical experimental psychology), and I have always approached Thoroughbred breeding as a quasi-scientific endeavor, applying the scientific approach to pedigrees as much as our tools allow. I'm not about to denigrate anyone doing real science on Thoroughbred genetics. Instead, I heartily applaud Hill's research and hunger for more. And I accept that in our capitalist world, the line between science and commercialism is essentially non-existent. In fact, University College Dublin's Nova unit, specifically designed to commercialize UCD's research, awarded Hill a grant to help fund Equinome. But I still think the scientific data should be published first and I wonder what business the ITBA has at least indirectly funding this commercial venture as well.

Do elite Thoroughbreds possess different genes in critical locations than the non-elite population? There can be no doubt whatsoever that this is true. Does Equinome have a test that can tell you whether or not a particular Thoroughbred possesses some of those critical genes? Clearly they believe they do.

If they do, then they will forever change the process of buying and selling horses. Mr. Horse Breeder, do you think it's difficult to sell your less attractive, less well-bred horses now? Mr. Pinhooker, do you find it almost impossible to market a horse that cannot work fast? Wait until everyone demands a DNA test before they will buy your product. The individuals targeted no doubt would change, but the gap between the favored few and the rest? If you think it's bad now, you ain't seen nothing yet.

It was rumored at the time that the Maktoum family cooperated with and perhaps helped fund Hill's mtDNA study. It will be interesting over the next few years to see who pays Equinome for their services and whether their record as buyers improves.

Friday, January 8, 2010

The Northern Dancer effect

Old friend Bill Oppenheim penned one of his most interesting articles in years in the January 6 issue of Thoroughbred Daily News. If you don't subscribe to TDN, that article is behind their pay wall, so I can't help you there.

The theme of the story was Bill's take on the changes in the industry over the last 20 years or so, both in the U.S., where he started out, and in England, where he now resides (well, Scotland actually). It's a wide-ranging piece and a very good read that covers the main points pretty accurately and succinctly, with good stuff on the changes in the way people buy horses, the decline of female families as related to owner-breeders, why Europeans don't trust American catalogs, and more.

The most interesting part of the article for this old curmudgeon, however, was the section on inbreeding to Northern Dancer. You'll have to learn about Bill's system of rating runners and sires on your own, but the interesting thing to a pedigree maven is that Bill found that the percentage of his "A runners" (perhaps roughly equivalent to listed winners and above) inbred (though only through sire and broodmare sire lines) to Northern Dancer increased from 4.3% of "A runners" foaled in 1996 to 9.6% of "A runners" foaled in 2005. In other words, about 10% of these elite runners are now by Northern Dancer line horses out of mares by Northern Dancer line horses. But note that he's not counting any other occurrences of Northern Dancer on either the top or the bottom of the pedigree, so, no doubt, he's missing a substantial number of Northern Dancer crosses.

Bill rightly notes all the appropriate caveats about this number, and I'm here to tell you that I have no doubt whatsoever that this represents a serious underestimate of the percentage of horses actually inbred to Northern Dancer among his "A runner" population.

For the racing year of 2008 (the last year Bill's data covered), I kept certain data on every Graded stakes winner in the United States in a spreadsheet, including all inbreedings within the first six generations. 36.5% of them were inbred to Northern Dancer. If I'm not mistaken, Bill's data covers North American racing plus racing in the five major European countries, whereas, as mentioned, my spreadsheet covered only U.S. (not including Canada) racing.

Now here's the thing....Northern Dancer is far, far more widespread in Europe--particularly England and Ireland--than he is currently in the U.S. The son of Nearctic so thoroughly dominates European racing that over half (71 of 140) of the stallions currently listed in Weatherbys' stallion book are Northern Dancer line stallions....and most of the non-Northern Dancer line horses will have at least one cross of Northern Dancer on the bottom side.

By contrast, of the 501 horses listed in the Thoroughbred Times online Stallion directory, 162, or 32.3%, are from the Northern Dancer male line. That's just one illustration of how much more dominant Northern Dancer is in Europe than in the U.S., but it helps explain why I'm so certain Bill's figures on inbreeding to Northern Dancer are much too low. If the percentage of graded winners in the U.S. inbred to Northern Dancer is somewhere around 35% (and rising, by the way), then it simply has to be higher than that in Europe, where Northern Dancer has been totally dominant for the last 20 years. The last time a horse from a male line other than Northern Dancer led the English sire list was 1989, when Blushing Groom topped the list. As we've been discussing this week, exactly how many times which sire line topped the North American list is a vexed question at best, but no matter how you count it, Northern Dancers have stood at the top no more than half the time over that 20-year span.

Now, let's be clear. Neither Bill nor I are complaining about inbreeding to Northern Dancer. It's very clearly a good thing. As Vuillier pointed out over 100 years ago, the best horses (graded stakes winners in this case) of the present basically predict the pedigrees of the future because they're the ones that get the best chance to breed on. So obviously the future of the breed is inbreeding to Northern Dancer.

Those who don't like it can try to avoid it if they wish, at their own cost.

Wednesday, January 6, 2010

None of this makes any sense

This will be my last post on sire lists. I promise.

Here's the deal.

Thoroughbred Times's general sire list includes earnings for North American-based sires from the 18 countries for which the Jockey Club database receives complete racing data. According to the Thoroughbred Times general sire list, Giant's Causeway was leading sire of 2009 with earnings of $15,950,453. (Full disclosure: I currently write part time for Thoroughbred Times and was intimately involved in developing the software that produces that list.) Thoroughbred Times also calculates a general sire list for its annual Racing Almanac by North American earnings only. By that criterion, Smart Strike was leading sire in North America in 2009 with earnings of $9,048,551.

The Blood-Horse general sire list includes earnings for North American-based sires for Northern Hemisphere countries--except Hong Kong and Japan (I think....I don't receive the BH print magazine and can't find any explanation online). According to the Blood-Horse general sire list, Giant's Causeway was leading sire of 2009 with earnings of $11,079,918. The Blood-Horse also makes available on their website a leading sire list by North American earnings only. By that criterion, Smart Strike was leading sire of 2009 with earnings of $9,048,551. Like Thoroughbred Times, however, the Blood-Horse uses its list that includes international earnings to designate their leading sire.

The Jockey Club's EquineLine sire reports also report Giant's Causeway as leading sire. According to a Jockey Club representative, they use basically the same criteria as the Blood-Horse "to avoid over inflating progeny earnings because of the purses in Japan/Hong Kong." Okay, so you use earnings from the richest day in Thoroughbred racing--Dubai World Cup day--but don't use Japan and Hong Kong because they "inflate" progeny earnings. Somehow I thought inflating progeny earnings was what sire lists were all about, but never mind.

The Daily Racing Form's sire lists (which, again, I don't see the print version so someone tell me if I have this wrong) list Smart Strike as leading sire of 2009, because their list is based on earnings in North America and Dubai World Cup day only. That's right....not all of Dubai, just World Cup day.

Bloodstock Research's Bloodstock Journal and Brisnet service publish a leading sire list on their website, but frankly I have no idea what criteria it is based on, because, although Giant's Causeway is listed as the leader, Cape Cross, who stands in Ireland, is listed second, and Irish-based Danehill Dancer and English-based Oasis Dream both appear in the top ten.

The NTRA website--which I suppose is as close to an "official" site for Thoroughbred racing information as anything else, at least to the general sporting public--links to this EquineLine list, which, it turns out, is very similar to the Brisnet list, though the earnings totals are different. Best I can figure, Cape Cross and Danehill Dancer are included because they had a starter in North America in 2009.

Is it any wonder the sporting public is confused by and steadily losing interest in Thoroughbred racing? We can't even begin to agree on how to keep our most vital statistics. By my count we have at least five different ways of counting what should be a simple thing, and, as a result, come up with two different horses as leading sire.

None of this makes any sense.

No other country in the world that I know of includes racing outside their borders in their leading sire statistics. And, yes, when I participated in producing specs for the Thoroughbred Times sire list software, I argued in favor of including worldwide earnings, because that gives the most complete picture of the sire's accomplishments, though, in my defense, I insisted we calculate a North America only list as well.

Sire lists that reflect worldwide earnings keep advertisers happy, because they generally give higher numbers. We have to have those stats, and sire lists based on international earnings (all international earnings--no picking and choosing what countries you use) are a perfectly valid way to look at it. But, ultimately, that should not determine the leading North American sire.

That title should be determined by earnings in North America only. And nobody in the business--that's right, nobody--uses those criteria to determine the horse they call simply "leading sire."