-
-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plots for Statistical Models #290
Comments
Thanks!
|
What I'm personally trying to accomplish is just a line with the error band representing the fixed coefficient estimates. I've been able to hack together something that rips into the guts of a mixed model and do that when I know what coefficient is a slope or intercept. However, I know that what I'm asking requires a lot of other decision to be made for it to work for other people flexibly. So I'm assuming you want me to answer those questions as potential design implementations and not just y special case. Feel free to rein me in if I'm off the mark here. I would think this would be a two step process. The first part determines what subplots to do given the type of model (e.g., scatter, boxplot, etc.) and then there would be a more deterministic step that can use generic functions (hopefully) to extract the appropriate coefficients from the model to make the series. So the code would be something like: plot(fm::StatisticalModel) = [plot(fm, seriestype=s) for s in determine_series_types(fm)] Now, to answer your questions more directly:
|
Exactly, I'm leveraging the fact that you have a present interest to pick your brain for ideas :-) In terms of your answer to 5, do you have an example of anyone doing this in practice? |
I've seen several of varying quality.
The last one is probably the one I trust the most but there aren't many plots there (first |
I've been thinking about how to actually get this working and have a small example plotting just the confidence band. I've tried to be obnoxiously verbose about the variable names so it's self explanatory. I think this part is pretty close to being the ideal implementation given the correct arguments: function xi_error(xi, x̄, stderror_intercept, stderror_slope)
return sqrt(stderror_intercept + ((x_i - x̄) * stderror_slope)^2)
end
function linear_error_band!(sc, x, ŷ, x̄, stderror_intercept, stderror_slope)
band_error = [xi_error(x_i) for x_i in x]
band!(sc, x, ŷ .- band_error, ŷ .+ band_error)
end The problem is that I don't think there is a common syntax that can be used to extract these. We'd need to be able to do something like: # name is the model's coefname for x
function linear_error_band!(sc, fm::RegressionModel, name)
x = modelmatrix(fm)[:,name]
linear_error_band!(
sc,
x,
response(fm),
mean(x),
stderror_intercept(fm, name),
stderror_slope(fm, name)
)
end |
I was thinking more about using the inbuilt using GLM, StatsModels, StatsPlots, DataFrames, RDatasets
iris = dataset("datasets", "iris")
mod = lm(@formula(PetalLength ~ PetalWidth), iris)
@recipe function f(mod::RegressionModel)
newx = [ones(200) range(extrema(mod.model.pp.X[:,2])..., length = 200)]
newy, l, u = predict(mod, newx, interval = :confidence)
@series begin
newx[:,2], newy
end
@series begin
primary := false
fillalpha --> 0.5
linealpha := 0
fillto := l
newx, u
end
end
plot(mod)
@df iris scatter!(:PetalWidth, :PetalLength, ms = 2, c = :black) |
Note that I'm having trouble with |
I don't know if it's useful but I have a an example here that can be used to produce a ribbon and line given two continuous variables. |
This is where I've been getting stuck too. I've been doing stuff like I've been working on some stuff in StatsModels that could hopefully make it easier to map all this information back to the original formula terms, but I don't anticipate quick changes in the JuliaStats repos even if I PR's into StatsModels. |
oh they can be quick. I think making the statistical objects offer the correct data back with accessors is the right way to go here |
I could make a PR based on this to start with and we could hash it out from there? using RecipesBase
@recipe function f(mod::RegressionModel)
newx = [ones(200) range(extrema(mod.model.pp.X[:,2])..., length = 200)]
newy, l, u = predict(mod, newx, interval = :confidence)
ribbon := (u-newy, newy-l)
label --> "regression"
newx[:,2], newy
end
# use as
using GLM, StatsModels, StatsPlots, DataFrames, RDatasets
iris = dataset("datasets", "iris")
mod = lm(@formula(PetalLength ~ SepalLength), iris)
plot(mod, lw = 4, legend = :topleft)
@df iris scatter!(:SepalLength, :PetalLength, ms = 4, group = :Species, msc = :white) |
I'd be up for that. I'm sure a lot of people have opinions about this so it would be nice to not solve everything and find out that people disagree with it. BTW, I have a PR to StatsModels.jl here that starts to solve the problem of indexing formulas. It's definitely a breaking change (switches A less robust but temporary solution to deriving information about the variables we want to plot is something like this: modelmatrix(mod)[:, findifirst(==(coefficient_name_i_want_to_plot), coefnames(mod))] |
Might make more sense to define the recipes based on the combination of term and model. Then you could have different recipes for continuous vs. categorical terms. And the term gives you all the information you need to generate the x axis (e.g., the |
Getting the correct data back via accessors would be a lot easier with some API changes we've been considering for a long time (requiring that models hold onto their |
This sounds like a good idea. We can build plots around StatsModels terms and then it's easier for packages like GLM to make a custom method for something like |
As I brought up on slack, it would be nice to have a way of directly creating plots form fitted statistical models. I've been able to make a plot similar to this for linear models:
There are other plotting packages that allow you to throw your x and y in there and get something similar but this means that you can't actually plot the coefficients of a fitted model that may have other variables that influence estimated coefficients. Just for reference, I've been using GLMs and mixed linear models.
The text was updated successfully, but these errors were encountered: