I was curious to see what would happen if I applied the same regression model Chris used for xK% (same independent variables) to batters instead. Interestingly, there is a strong relationship between xK% and K% (p < 0.0001):
I did the same for xBB%. While the relationship between xBB% and BB% wasn't as strong, it was still pretty strong and statistically significant (p < 0.0001):
There was also more noticeable heteroskedasticity in this model (the non-linear R^2 value was around 0.67), which I should probably check using Levene's test. Otherwise, it looks good!
In addition, just for interest's sake, I ran the same multiple regression model but with HR/FB ratio as the dependent variable. I got an adjusted R^2 value of 0.37 (p < 0.0001). Much lower than the other two, but still surprisingly significant.
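For anyone unfamiliar with adjusted R^2, here is a minimal sketch of the standard formula, which penalizes plain R^2 for the number of predictors in the model. The function name and the example numbers are made up for illustration; they are not the sample size or predictor count from the model above.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / dof


# Hypothetical inputs, chosen only to make the arithmetic easy to follow:
# R^2 = 0.5 with 11 observations and 2 predictors.
print(adjusted_r2(0.5, n_obs=11, n_predictors=2))
```

With several hundred batters and only a handful of predictors, the adjustment is small, so adjusted R^2 and plain R^2 would be close to each other.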
What do you think, Chris?
I should state what data I used: player data from the 2008-2013 seasons, minimum 200 PA (that's combined across seasons, not single-season data).
EDIT: Okay, so after running Levene's test on the xBB%-BB% relationship, the p-value was less than 0.05. As such, the model violates the equal-variance (homoskedasticity) assumption of regression. I'm guessing this is because BB% needs a larger sample size than K%.
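A rough sketch of how a check like this could be set up, not necessarily how it was run here. It fits a simple line of BB% on xBB%, splits the residuals into two groups at the median xBB% (an assumed grouping), and computes the mean-centered Levene W statistic by hand. The data is synthetic and made up purely for illustration, and the helper names `ols_residuals` and `levene_w` are hypothetical.

```python
import random
import statistics


def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def levene_w(groups):
    """Levene's W using absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        g_mean = statistics.fmean(g)
        z.append([abs(v - g_mean) for v in g])
    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Synthetic, made-up data (NOT the 2008-2013 sample): noise grows with xBB%.
rng = random.Random(0)
xbb = [rng.uniform(0.03, 0.15) for _ in range(300)]
bb = [x + rng.gauss(0, 0.1 * x) for x in xbb]

resid = ols_residuals(xbb, bb)

# Assumed grouping: split the residuals at the median xBB%.
cut = statistics.median(xbb)
low = [r for x, r in zip(xbb, resid) if x <= cut]
high = [r for x, r in zip(xbb, resid) if x > cut]

w_stat = levene_w([low, high])
print(f"Levene W = {w_stat:.2f} on (1, {len(resid) - 2}) degrees of freedom")
```

W is compared against an F distribution with (k - 1, N - k) degrees of freedom; a large W (small p-value) points to unequal variances. If SciPy is available, `scipy.stats.levene` returns both the statistic and the p-value directly, though it centers on the median by default rather than the mean used in this sketch.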