My basic idea was to mimic the trends of already-benchmarked algorithms, one for each complexity class (substring for linear, PeakSegPDPA for log-linear, cDPA for quadratic), via a glm/lm fit (going with the straight-line formula y~x, though curves like y~I(x^2) or y~log(x) would work too), and subsequently combine them to obtain a plot with all three regression lines marking the margins for the linear, log-linear and quadratic complexity classes.
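To make that concrete, a minimal sketch of what one such fit per class could look like (assuming data frames as returned by asymptoticTimings(), with columns 'Timings' and 'Data sizes'; the formulas here are illustrative, e.g. x*log(x) for the log-linear trend):

# Illustrative sketch, not the final code: one glm per complexity class,
# each fit on the timings of the algorithm representing that class.
# df.substring, df.PeakSegPDPA and df.cDPA are assumed to hold
# asymptoticTimings() output with columns 'Timings' and 'Data sizes'.
fit.linear    <- glm(Timings ~ `Data sizes`, data = df.substring)
fit.loglinear <- glm(Timings ~ I(`Data sizes` * log(`Data sizes`)), data = df.PeakSegPDPA)
fit.quadratic <- glm(Timings ~ I(`Data sizes`^2), data = df.cDPA)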
As an initial try with geom_smooth() to plot the regression, I made the mistake of applying the old logic (adding a third column to the data frames to distinguish each algorithm by the color aesthetic, rbind()'ing the data frames, and plotting the result). This produced a single line stretching between the two geom lines (kept at first just for comparison) for the PDPA and cDPA algorithms (I tried with two of them first), which made me realize I was plotting one regression line for the whole rbind()'ed data frame (an average that falls between the two, hence its position) instead of creating regression lines for the two algorithms separately:
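In hindsight, the fix for that rbind() approach would have been mapping the grouping column inside aes(), since geom_smooth() then fits one regression per group rather than one over the pooled data; a rough sketch, assuming the combined frame has 'Timings' and 'Algorithm' columns:

library(ggplot2)
# Rough sketch: with a discrete aesthetic like color mapped, geom_smooth()
# fits a separate regression per algorithm instead of a single line
# over the whole rbind()'ed data frame.
ggplot(df.combined, aes(x = `Data sizes`, y = Timings, color = Algorithm)) +
  geom_point() +
  geom_smooth(method = "glm", se = FALSE)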
So I thought to create separate geom_smooth() layers (plus a point/line layer with a different colour assigned) for the algorithms, each with separately specified aesthetics. I started by re-assigning column names in the data frames (containing the benchmarked data of PDPA/cDPA), replacing the 'Timings' column with the name of the respective complexity class (so as to distinguish the y-axis component by that), combining them via cbind() (the common column being Data sizes), and finally plotting them with the PDPA layer first (y set as Loglinear) and then the cDPA layer (y = Quadratic):
library(testComplexity)
library(data.table)
library(ggplot2)
# (N inside the quoted call is substituted with each data size by asymptoticTimings)
df.PeakSegPDPA <- asymptoticTimings(PeakSegOptimal::PeakSegPDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5))
df.cDPA <- asymptoticTimings(PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5))
# Changing the 'Timings' column to complexity class names, in order to distinguish via that (the y component):
colnames(df.cDPA)[1] <- "Quadratic"
colnames(df.PeakSegPDPA)[1] <- "Loglinear"
# merge() by the data sizes column created copies x 10 (70k obs. for the 700 here),
# so used cbind(), which just creates an extra column:
df <- cbind(df.PeakSegPDPA, df.cDPA)
# Deleting one data size column, since cbind() keeps that column from both
# data frames (at the 2nd and 4th indices here):
df[2] <- NULL
# Resultant data frame:
data.table(df)
      Loglinear  Quadratic Data sizes
  1:     296102     138101         10
  2:     157700      53802         10
  3:     123300      45301         10
  4:     141501      41402         10
  5:     126301      41201         10
 ---
696:  306150800 3749064301      10000
697:  335036102 3261133301      10000
698:  310979501 3951080402      10000
699:  354960002 3211433401      10000
700:  381043901 3728054701      10000
# plot:
ggplot(df, aes(x = `Data sizes`, y = "Loglinear")) +
  geom_point(color = "red") + # using point since a line would get overlapped by the straight regression line
  geom_smooth(method = "glm", se = FALSE, fullrange = TRUE) +
  geom_point(aes(y = "Quadratic"), color = "blue") +
  geom_smooth(aes(y = "Quadratic"), method = "glm", se = FALSE, fullrange = TRUE) +
  labs(x = "Data size", y = "Runtime") +
  scale_x_log10() + scale_y_log10()
This throws an error:
Removed the log scale on Y, but did not get what was expected:
You are using glm to fit the models, right? Use the predict method for each glm model to get the predicted times, then save them to a data table, then use geom_line with aes(color=complexity), something like....
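Presumably something along these lines (a rough sketch of that suggestion, not the exact code from this thread; the formulas are illustrative, and the frames are assumed to still have their original 'Timings' column):

library(data.table)
library(ggplot2)
# Fit one glm per algorithm, each with a formula matching its expected class.
fit.PDPA <- glm(Timings ~ I(`Data sizes` * log(`Data sizes`)), data = df.PeakSegPDPA)
fit.cDPA <- glm(Timings ~ I(`Data sizes`^2), data = df.cDPA)
# Use predict() on each fitted model to get the predicted times, then
# stack them into one data table with a 'complexity' label per model.
pred.dt <- rbind(
  data.table(`Data sizes` = df.PeakSegPDPA$`Data sizes`,
             Timings = predict(fit.PDPA),
             complexity = "loglinear"),
  data.table(`Data sizes` = df.cDPA$`Data sizes`,
             Timings = predict(fit.cDPA),
             complexity = "quadratic"))
# One regression line per complexity class, distinguished by color.
ggplot(pred.dt, aes(x = `Data sizes`, y = Timings, color = complexity)) +
  geom_line()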