Jays Centre

Posted
Also are you guys using standard league average splits when adjusting for home vs road instead of looking at home vs road splits for individual players and trying to regress them? That would obviously make sense since it would be a lot of work to separate possible park factors from differences in home vs road splits. Using league average differences obviously has its own problems, we'd be overestimating or underestimating the outliers, guys who for some reason perform worse at home and better on the road.

Posted
Also are you guys using standard league average splits when adjusting for home vs road instead of looking at home vs road splits for individual players and trying to regress them? That would obviously make sense since it would be a lot of work to separate possible park factors from differences in home vs road splits. Using league average differences obviously has its own problems, we'd be overestimating or underestimating the outliers, guys who for some reason perform worse at home and better on the road.

 

I'm using league average data from 5 years for home/away splits with a few constants used as multipliers. But you could also go by individual player home/away splits and regress towards the mean... seems like it could be a little more precise even (not sure it's worthwhile). The principle is the same as applying regressed platoon splits.

 

Your point about "outliers" is troubling because it's valid... we're using generalization because it might be the only realistic way to try and normalize the projected data. Unfortunately, there are exceptions.

 

Ichiro Suzuki is a good example, with career reverse splits. Apply the standard regressed splits to Ichiro Suzuki and you will end up with erroneous projections. Nori Aoki is another example. You'd be better off not regressing their splits at all, or perhaps flagging them as having reverse splits so you could apply the opposite approach.

Posted
I'm using league average data from 5 years for home/away splits with a few constants used as multipliers. But you could also go by individual player home/away splits and regress towards the mean... seems like it could be a little more precise even (not sure it's worthwhile). The principle is the same as applying regressed platoon splits.

 

Your point about "outliers" is troubling because it's valid... we're using generalization because it might be the only realistic way to try and normalize the projected data. Unfortunately, there are exceptions.

 

Ichiro Suzuki is a good example, with career reverse splits. Apply the standard regressed splits to Ichiro Suzuki and you will end up with erroneous projections. Nori Aoki is another example. You'd be better off not regressing their splits at all, or perhaps flagging them as having reverse splits so you could apply the opposite approach.

 

For platoon splits i do them on an individual basis and it works great and there are a few guys with neutral or negative platoon splits and i'm applying my numbers according to that. You mention Nori Aoki and my regression calculations don't see him with reverse platoon splits and if you look at how much BABIP is driving the difference in numbers against lefties versus righties it makes sense. For Ichiro, he has ~2500 PA's against lefties, that's enough to take his platoon splits at face value.

The thing with home vs away numbers for individual players is you gotta separate park factors. I don't know how you do that with hitters, because K to BB rates (which are theoretically stable from park to park) aren't that important for hitters. So for now i'm just applying home vs away league average differences.

Posted
For platoon splits i do them on an individual basis and it works great and there are a few guys with neutral or negative platoon splits and i'm applying my numbers according to that. You mention Nori Aoki and my regression calculations don't see him with reverse platoon splits and if you look at how much BABIP is driving the difference in numbers against lefties versus righties it makes sense. For Ichiro, he has ~2500 PA's against lefties, that's enough to take his platoon splits at face value.

The thing with home vs away numbers for individual players is you gotta separate park factors. I don't know how you do that with hitters, because K to BB rates (which are theoretically stable from park to park) aren't that important for hitters. So for now i'm just applying home vs away league average differences, there's also the fact that

 

You seem pretty knowledgeable. You should post more.

Posted
For platoon splits i do them on an individual basis and it works great and there are a few guys with neutral or negative platoon splits and i'm applying my numbers according to that. You mention Nori Aoki and my regression calculations don't see him with reverse platoon splits and if you look at how much BABIP is driving the difference in numbers against lefties versus righties it makes sense. For Ichiro, he has ~2500 PA's against lefties, that's enough to take his platoon splits at face value.

 

In my opinion Aoki IS a reverse platoon player (most likely), the same as Ichiro, it's because of their unorthodox hitting style. You may have some case with the BABIP but I still say he's a reverse platoon type. You said your regression calculations don't see him with reverse splits but regression calculations don't see anyone with reverse splits; which goes back to my original point. You said 2500 PA is enough to see Ichiro at face value but actually the standard regression formula doesn't work that way. If you're regressing with 2500 PA against 1000 PA of league average you're still regressing against about 25% of league average (non-reversed splits) which would actually be wrong.
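To make the arithmetic in this exchange concrete, here is a minimal sketch of the usual "add league-average PA as ballast" regression. The function names, the sign convention (positive means a conventional split, negative means reverse) and every number below are assumptions for illustration, not anyone's actual model or any player's real stats.

```python
def league_weight(pa, ballast_pa):
    """Share of the regressed estimate that comes from the league average."""
    return ballast_pa / (pa + ballast_pa)


def regress_split(observed, pa, league_avg, ballast_pa):
    """Shrink an observed split toward the league-average split."""
    w = league_weight(pa, ballast_pa)
    return (1 - w) * observed + w * league_avg


# Hypothetical inputs: an observed split of -0.020 (reverse) over 2500 PA,
# a league-average split of +0.030, and 1000 PA of league-average ballast.
weight = league_weight(2500, 1000)
regressed = regress_split(-0.020, 2500, 0.030, 1000)
print(f"league weight: {weight:.3f}")
print(f"regressed split: {regressed:+.4f}")
```

With 2500 PA against 1000 PA of ballast the league average carries 1000/3500 of the weight, so a bit under 29%. With these made-up inputs the regressed split is pulled most of the way toward zero but keeps its sign, so how much a reverse split survives depends on the sample size, the ballast and the size of the observed split.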

 

The thing with home vs away numbers for individual players is you gotta separate park factors. I don't know how you do that with hitters, because K to BB rates (which are theoretically stable from park to park) aren't that important for hitters. So for now i'm just applying home vs away league average differences.

 

By the way K and BB rates are changing with parks (i.e. thinner air makes for less break on balls and vice-versa), and that's why SO/BB rates are included in Park Factor data.

http://www.fangraphs.com/guts.aspx?type=pf&season=2014&teamid=0&sort=7,d

Posted (edited)
In my opinion Aoki IS a reverse platoon player (most likely), the same as Ichiro, it's because of their unorthodox hitting style. You may have some case with the BABIP but I still say he's a reverse platoon type. You said your regression calculations don't see him with reverse splits but regression calculations don't see anyone with reverse splits; which goes back to my original point. You said 2500 PA is enough to see Ichiro at face value but actually the standard regression formula doesn't work that way. If you're regressing with 2500 PA against 1000 PA of league average you're still regressing against about 25% of league average (non-reversed splits) which would actually be wrong.

On second look i noticed that Aoki has both a higher BB rate and a higher ISO against righties than lefties. The 110 point difference in BABIP is almost entirely driving those splits; not only does accounting for BABIP erase those reverse platoon splits, but it would make him a conventional left handed hitter. But your bigger point is interesting, i'm pretty new to dealing with regressing splits so i'll defer to your knowledge. You're right that the whole point behind regressing splits is the assumption that all lefties do worse against LHP than RHP and all righties do worse against RHP than LHP, and it leaves out the outliers.

 

By the way K and BB rates are changing with parks (i.e. thinner air makes for less break on balls and vice-versa), and that's why SO/BB rates are included in Park Factor data.

http://www.fangraphs.com/guts.aspx?type=pf&season=2014&teamid=0&sort=7,d

Yeah i've noticed that on the park factors page on Fangraphs, but i question the validity, because what explains these results? Thinner air doesn't: Colorado makes sense, but why are Pittsburgh and Detroit showing up at the bottom? There doesn't seem to be a relationship with temperature or humidity either; there's a mix of cold weather and warm weather cities at the top and the bottom. But the results are statistically significant, so there's some variable or combination of variables that explains this, so maybe i should incorporate them.

Edited by nmrch
Posted

LTR and JFas, this might be a stupid question to ask and should tell you how new i am to this, but am i wrong in assuming that if a LHH has more than 1000 PA's against LHP (or 1500 if you want to use that as a threshold like some do apparently) then we don't have to regress his splits? Same with 2200 PA (or whatever threshold you use) for RHH vs LHP and 600 PA's for switch hitters vs LHP.

 

I would also appreciate it if you guys could shed light on what you guys are doing with switch hitters in general including what constant you're using.

Posted
What times are the "late" contests usually? 10 PM EST? I have full day and evening (7pm) lineups in my model right now.

 

IIRC (for FD) late games are 9:40 PM EST or later. Evening are 6:40 PM EST or later. There is also early only (2 or more games). If there is only 1 game in early-mid afternoon it's not included in any games.

 

Note: I don't play any full day games if the start time is in the afternoon since line-ups are never up in time for evening/late games. It's an even playing field but I prefer not to leave my line-ups to chance.
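For anyone filtering projections by slate, the cutoffs above can be sketched roughly like this. The 6:40 PM and 9:40 PM thresholds and the "2 or more games" rule for early slates come from the post (and were given from memory there); treating "early" as anything before 6:40 PM and letting the evening slate include late games are my assumptions, and `slate_games` is just a made-up helper name.

```python
from datetime import time

# Cutoffs as recalled in the post above (Eastern time).
EVENING_START = time(18, 40)
LATE_START = time(21, 40)


def slate_games(start_times, slate):
    """Return the start times that belong to the given slate."""
    if slate == "late":
        return [t for t in start_times if t >= LATE_START]
    if slate == "evening":
        # Assumption: the evening slate also contains the late games.
        return [t for t in start_times if t >= EVENING_START]
    if slate == "early":
        # Assumption: "early" means anything before the evening cutoff.
        early = [t for t in start_times if t < EVENING_START]
        # A lone afternoon game does not make an early slate.
        return early if len(early) >= 2 else []
    raise ValueError(f"unknown slate: {slate}")
```

You would feed it the day's first-pitch times and only build projections for the games that come back.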

Posted

 

This was nice, i now don't have to worry about whether i was doing the proper regressions. Some wasted efforts, but i also now only have to scrape 1,000 pages a day instead of 3,000. Cuts down my program running time from 6 minutes to ~90 seconds and my IP's less likely to be banned by the admin over at Fangraphs.

Posted
This was nice, i now don't have to worry about whether i was doing the proper regressions. Some wasted efforts, but i also now only have to scrape 1,000 pages a day instead of 3,000. Cuts down my program running time from 6 minutes to ~90 seconds and my IP's less likely to be banned by the admin over at Fangraphs.

 

I'm not sure if this program still works or not, or if MLB has changed what's available, but it might be worth your while checking out.

 

http://sourceforge.net/projects/baseballonastic/

Posted

JFas, what does your model do with late games? Your all day projection just ignores the games from which the lineups aren't in yet, right? For 1pm starts, even on opening day i don't see the late lineups being declared officially before 1 pm. Today's lineup for the season opener at 8:30pm was sent out like an hour ago at ~3pm.

If i might make a request, can you add a column to your projections that says exactly which games are included in them as they get updated throughout the day, if that's not too much of a hassle.

For my projection, I'm looking for a different lineup source than BaseballPress, one that makes best guesses, but i can't find a good one. It sucks because doing them manually and formatting them for my program is a pain in the ass.

Posted
JFas, what does your model do with late games? Your all day projection just ignores the games from which the lineups aren't in yet, right? For 1pm starts, even on opening day i don't see the late lineups being declared officially before 1 pm. Today's lineup for the season opener at 8:30pm was sent out like an hour ago at ~3pm.

If i might make a request, can you add a column to your projections that says exactly which games are included in them as they get updated throughout the day, if that's not too much of a hassle.

For my projection, I'm looking for a different lineup source than BaseballPress, one that makes best guesses, but i can't find a good one. It sucks because doing them manually and formatting them for my program is a pain in the ass.

http://mlbstartingnine.com/ seems decent. They only seem to post them when the lineup comes from a reputable source (like a team's twitter feed). Once in a while they change last minute if a manager makes a last minute adjustment. For Fanduel purposes, sometimes the lineups aren't posted for late games until it's too late, but they probably aren't confirmed anywhere else for those games either.

Posted

http://mlbstartingnine.com/

seems decent. They only seem to post them when the lineup comes from a reputable source (like a team's twitter feed). Once in a while they change last minute if a manager makes a last minute adjustment. For Fanduel purposes, sometimes the lineups aren't posted for late games until it's too late, but they probably aren't confirmed anywhere else for those games either.

That's what Baseball press does too, i use it for my program and i think JFas does too, it would just be nice if there was a good source that makes best guesses before they're officially in.

Posted
Ya they ignore games with no lineup in. Opening day is a weird one with spread out games. No other day will be like that really, except Saturdays.

 

I could include a teams included csv separately to not screw up formatting?

 

that would be awesome, thank you very much

Posted
Would take a little work, but you could write an algorithm that guessed lineups based on most recent lineups and opposing pitchers.

 

You could probably pull injury status off fantrax for it too.
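A very rough sketch of that idea, in case it helps: reuse the team's most recent lineup against a pitcher of the same hand, fall back to the most recent lineup of any kind, and drop anyone flagged as out. The record layout (`date`, `opp_hand`, `lineup`), the `unavailable` set and the function name are all hypothetical; how you would actually pull lineups or injury status is not shown here.

```python
def guess_lineup(recent_games, opp_hand, unavailable=frozenset()):
    """Guess today's batting order from a team's recent games.

    recent_games: list of dicts with keys
        "date"     ISO date string, e.g. "2015-04-04"
        "opp_hand" hand of the opposing starter, "L" or "R"
        "lineup"   list of player names in batting order
    """
    # Prefer games against a starter of the same hand as today's.
    same_hand = [g for g in recent_games if g["opp_hand"] == opp_hand]
    pool = same_hand or recent_games
    if not pool:
        return []
    # ISO date strings sort chronologically, so max() is the latest game.
    latest = max(pool, key=lambda g: g["date"])
    # Drop players known to be out; replacements are left to the caller.
    return [p for p in latest["lineup"] if p not in unavailable]
```

It will come back short when someone is out, so you would still need a rule for filling the open spot, and it knows nothing about rest days or platoon partners.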

Posted
If enough line-ups get posted before 1 PM I will post my FD line-up (filling in the blanks for the non-posted line-ups).... hoping this happens.
Posted
Hopefully some more lineups are announced by 1. My model likes to have both teams lineups in a game to work properly. Only working with the first 3 games right now.

 

Are you gonna be playing the afternoon games on Fanduel? There obviously would be guaranteed lineups in that case, i might skip the all day contests and go for that.

Posted
Don't have the code set up for that. Not enough time to do it this morning. Might just go with a gut lineup with small money today.

 

I didn't have time to write an optimizer yet so i signed up for the Optimizer tool at Fantasy Cruncher, you can upload custom projections and it works pretty well. They have a 7 day "free" trial and i might end up paying for the month until i can find the time to write my optimizer.

 

If you want to post projected point totals here, i can filter out the games and post the optimum lineups for everyone.

Posted
JFas, you've noticed that your lineups violate the 4 player per team rule right? Is that a bug being caused by very few lineups being in?
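A quick sanity check for that rule could look something like the sketch below. The limit of 4 is taken from the post; the `(player, team)` tuple layout and the function name are assumptions, and the limit is a parameter in case the site you play on uses a different one.

```python
from collections import Counter


def teams_over_limit(lineup, max_per_team=4):
    """Return {team: count} for every team that exceeds the per-team limit.

    lineup: iterable of (player_name, team) tuples.
    """
    counts = Counter(team for _, team in lineup)
    return {team: n for team, n in counts.items() if n > max_per_team}
```

An empty dict means the lineup passes, so it is easy to run on every optimizer output before posting it.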
Posted

P M. Bumgarner P SF@Ari $9,800
P Kyle Kendrick P Col@Mil $4,200
C Buster Posey SF@Ari $4,600
1B Miguel Cabrera Min@Det $5,000
2B Devon Travis Tor@NYY $3,000
3B Aramis Ramirez Col@Mil $4,200
SS Troy Tulowitzki Col@Mil $5,000
OF Ryan Braun Col@Mil $4,800
OF Bryce Harper NYM@Was $4,200
OF G. Stanton Atl@Mia $4,900

 

Thoughts?

 

Edit: This is DK
