Adding a time limit (checked per iteration) to asymptoticTimings #4
Comments
First of all, please use variable names like seconds. "cDPA is classified incorrectly as log-linear for N = 2 or 1" is also confusing, because N is typically used to refer to the absolute number of rows/data/observations, but here you are using it for the 10^N value, right? Can you please use something like N = 10^power.of.ten instead? Yes, adding a time limit makes sense, but please make sure that our default time limit works with all of the test cases we have discussed in #2.
A default time limit of 0.1 seconds seems reasonable to me (then the user probably has a result in less than one second?)
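To illustrate the suggested convention, a minimal sketch (all names here are hypothetical):

# hypothetical sketch: N is the absolute data size, derived from a power of
# ten, and the proposed default time limit is expressed in seconds
power.of.ten <- 4
N <- 10^power.of.ten # N = number of rows/observations, here 10000
default.max.seconds <- 0.1 # proposed default per-size time limit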
Right! I am using seconds now (that should be more convenient for the user than nanoseconds).
Yup, I got what you're saying: the user would typically expect N to be the data size. In our case, since we need a sequence of powers of 10 for the data set sizes, we could take something like the maximum data set size as the parameter, i.e. the size up to which the expression is tested (we can directly include a sequence of sizes up to that maximum).
Following this, the user can conveniently enter the maximum data set size (we can rename the parameter):

library(microbenchmark)

asymptoticTimings <- function(e, power.of.ten, max.seconds)
{
lang.obj <- substitute(e) # capture the expression unevaluated
fun.obj <- function(power.of.ten) # rebinds power.of.ten to each data size
{
eval(lang.obj)
}
# default limit of 10^7 ns; a user-supplied limit in seconds is scaled to ns
time.limit = ifelse(missing(max.seconds), 10^7, max.seconds*10^9)
data.set.sizes <- 10^seq(1, power.of.ten, by = 0.5)
l <- length(data.set.sizes)
timings.list <- list()
for(i in 1:l)
{
benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.set.sizes[i])))
benchmarked.timings$data.set.size <- data.set.sizes[i] # collecting data set sizes at each iteration
timings.list[[i]] <- data.frame(benchmarked.timings$time, benchmarked.timings$data.set.size)
if(mean(benchmarked.timings$time) > time.limit) break # stop once the mean timing crosses the limit
}
resultant.df <- do.call(rbind, timings.list)
colnames(resultant.df) <- c("Timings", "Data set sizes")
return(resultant.df)
}

Also I'll add the additional parameter RN (as mentioned in the proposal), or something with a better name.
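For concreteness, a hypothetical usage sketch of the function above (the expression refers to power.of.ten, which fun.obj rebinds to each concrete data size; the values are made up):

timings <- asymptoticTimings(rpois(power.of.ten, 10), power.of.ten = 4, max.seconds = 0.1)
head(timings)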
Yup sure, I'll test them and let you know the results in that issue's thread
In my case it takes a bit more than one second (2-3 seconds in most runs) to get the result, but I guess that's because of other factors, such as the time taken to evaluate the rest of the R code, or my laptop's speed.
I'm sorry, but can you explain what you are using N for? I am super confused now. For me, N is the number of data points, the data size, or rows/observations. The user should be able to specify N in their expression.
I am using N for the data size.
Yup, they can specify the data set size, like so:

library(microbenchmark)

asymptoticTimings <- function(e, N, max.seconds)
{
lang.obj <- substitute(e)
fun.obj <- function(N)
{
eval(lang.obj)
}
time.limit = ifelse(missing(max.seconds), 10^7, max.seconds*10^9)
# derive the power of ten from N by repeated division
# (note: this loop only terminates when N is an exact power of 10)
power.of.ten = 0
while(N > 1)
{ if(N %% 10 == 0)
{ power.of.ten <- power.of.ten + 1
N <- N / 10
}
}
data.set.sizes <- 10^seq(1, power.of.ten, by = 0.5)
l <- length(data.set.sizes)
timings.list <- list()
for(i in 1:l)
{
benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.set.sizes[i])))
benchmarked.timings$data.set.size <- data.set.sizes[i]
timings.list[[i]] <- data.frame(benchmarked.timings$time, benchmarked.timings$data.set.size)
if(mean(benchmarked.timings$time) > time.limit) break
}
resultant.df <- do.call(rbind, timings.list)
colnames(resultant.df) <- c("Timings", "Data set sizes")
return(resultant.df)
}
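A hypothetical call for this version (N must be an exact power of ten, or the loop above never terminates):

timings <- asymptoticTimings(rpois(N, 10), N = 1000, max.seconds = 0.1)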
this is inefficient, what are you trying to do?
Sorry, I hadn't thought much while writing that. I was trying to get the power of ten from the value supplied as N, the simple way to do which was:

power.of.ten = log(N, 10)
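As a quick runnable check of that one-liner (valid only when N is an exact power of ten):

N <- 1000
power.of.ten <- log(N, 10) # 3, replacing the repeated-division loop
stopifnot(all.equal(power.of.ten, 3))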
Why are you trying to get the power of ten from the value supplied as N? If the user wants to specify a sequence of data sizes manually, I think the syntax should be something like testComplexity(cDPA(rpois(N, 10)), N = 10^seq(1, 4))
So we can use N for a range of values? Sorry for the confusion; I initially thought we had to include a single data size as N.

library(microbenchmark)

asymptoticTimings <- function(e, data.sizes, max.seconds)
{
lang.obj <- substitute(e)
fun.obj <- function(data.sizes)
{
eval(lang.obj)
}
# default of 10^8 ns; a user-supplied limit in seconds is scaled to nanoseconds
time.limit = ifelse(missing(max.seconds), 10^8, max.seconds*10^9)
l <- length(data.sizes)
timings.list <- list()
for(i in 1:l)
{
benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.sizes[i])))
benchmarked.timings$data.size <- data.sizes[i]
timings.list[[i]] <- data.frame(benchmarked.timings$time, benchmarked.timings$data.size)
if(mean(benchmarked.timings$time) > time.limit) break
}
resultant.df <- do.call(rbind, timings.list)
colnames(resultant.df) <- c("Timings", "Data sizes")
return(resultant.df)
}

> asymptoticTimeComplexityClass(asymptoticTimings(PeakSegDP::cDPA(rpois(data.sizes, 1), rep(1, length(rpois(data.sizes, 1))), 3L), data.sizes = 10^seq(1, 3, by = 0.5)))
[1] "quadratic"

Renamed the parameter to data.sizes.
Edit: Closing this thread, as the core objective here (implementation of the time-limit parameter) has been achieved.
@tdhock @neerajdhanraj
Consider a threshold time.limit as the maximum time value the benchmark is allowed to produce, on average, in an iteration (of a particular data set size). (The mean is taken so that only one comparison is needed per iteration.) If the set threshold is crossed, which is checked after each run/iteration, the function stops further computation by breaking out of the loop, avoiding extra time spent on larger data set sizes.
This is a double-edged sword, depending on what time limit the user provides: if a very small time bound is given, the computed complexity class might not be correct. But user-set limits inevitably come with the possibility of unfavourable values (the same is the case for GuessCompx), and the user can always test with a higher time bound to verify.
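As a minimal, self-contained sketch of that check (the timing values are made up; microbenchmark reports times in nanoseconds):

# made-up per-run timings in nanoseconds, checked against a 10^7 ns threshold
time.limit <- 10^7
benchmarked <- c(9.5e6, 1.2e7, 8.8e6)
if (mean(benchmarked) > time.limit) message("limit crossed: break out of the loop")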
For instance, cDPA is classified incorrectly as log-linear for N = 2 or 1 (as expected, since its trend is similar to PeakSegPDPA's log-linear complexity at smaller data set sizes) and is classified correctly as quadratic from N = 3 onwards. So, by observation of the timings obtained for N = 3, I selected time.limit = 10^7 as a default time limit for our function (I will modify it if the value is not suitable), a possibly safe value after which it classifies correctly. For demonstration, I also tested time.limit = 10^6, one-tenth of our chosen threshold value, which classifies incorrectly, as shown below. (The function also works pretty fast in this case, with both the timing computation from asymptoticTimings and the complexity classification from asymptoticComplexityClass taking under 5 seconds, since N = 3 here.)
Timings function:
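(A sketch of the timings function, consistent with the data.sizes version posted above but assuming the 10^7 ns default described here:)

library(microbenchmark)

asymptoticTimings <- function(e, data.sizes, max.seconds)
{
lang.obj <- substitute(e)
fun.obj <- function(data.sizes)
{
eval(lang.obj)
}
time.limit = ifelse(missing(max.seconds), 10^7, max.seconds*10^9)
timings.list <- list()
for(i in 1:length(data.sizes))
{
benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.sizes[i])))
benchmarked.timings$data.size <- data.sizes[i]
timings.list[[i]] <- data.frame(benchmarked.timings$time, benchmarked.timings$data.size)
if(mean(benchmarked.timings$time) > time.limit) break
}
resultant.df <- do.call(rbind, timings.list)
colnames(resultant.df) <- c("Timings", "Data sizes")
return(resultant.df)
}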
Also, I've merged the asymptoticTimings and asymptoticTimingsFun parts, as we can make do with one single function.
Do these changes seem okay, or should I discard this logic?