I think in general you're correct. I don't see how you could account for enough variables to get meaningful results on game calling. The catcherFX data COULD be useful in evaluating framing, though: by showing how consistently pitchers hit their target, it might help determine whether the pitcher should share credit for the framing results. The other major factors should be relatively easy to account for (umpire familiarity with the pitcher, pitcher stuff, and pitcher reputation via previous success and control).