forked from ck37/varimpact
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTODO
74 lines (70 loc) · 3.38 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
VarImpact TODOs
Alan priorities:
- HIGHEST: Add CV-TMLE (also Mark priority!)
* IN PROGRESS, largely working.
* Needs more testing with non-binary outcomes. E.g. transform theta?
- Unadjusted difference in means estimate (t-test) to compare to TMLE estimate
- Cell sizes in the groups that it chooses (A = 0 and A = 1)
- Option of ratios (delta method), rather than differences
- Alan will email Minh, maybe Jeremy too. (may be embedded in TMLE)
- In ByV table, use the labels of the quantiles to show the actual range.
- Specify names of variables to use for variable importance
Chris items:
* Arguments
- Also support "X" optional argument, ala glm
- Parameter for the number of covariates after which we use dimensionality reduction (HOPACH).
- Argument for p-value cutoff for consistent results.
* Coding style
- Create subfunctions.
* Output
- Sort consistent results by p-value then estimate size?
- Support summary()
* Quality
- Warn/error if there are character columns in the data, as a bonus check.
- Figure out how to reduce the number of potential failures in various steps.
- Count and report how many variables are skipped due to minCell constraint.
- Automatically handle rows that are all NAs?
* Testing
- Make examples quicker to execute, esp. during devtools::check()
- Add other tests.
* Performance
- Try Rprofvis
- Use Jeremy's method of not wasting SL folds during CV-TMLE.
* Documentation
- Improve varImpact() roxygen description.
- Document the return object
- Vignette
* Examples
- Use doParallel instead of doMC so that it works on Windows.
* Benchmarking
- Benchmark to random forest, lasso, OLS and SL-perturbation algorithms.
* BUGS/ERRORS
- HOPACH can fail twice in a row. Need to figure out why and fix (see factor test 1).
- HOPACH (Minh example): Error in rowMeans(dmat[labels == ordlabels[j], labels == labels[medoid2]]) :
* seem to occur in hopach::mssnextlevel()
* possibly due to call to msssplitcluster()
* may need a ", drop = F" when subsetting a matrix somewhere.
* TODO: fork HOPACH and add a dimension check / debug stuff near there.
'x' must be an array of at least two dimensions
- Error in predmat[which, seq(nlami)] = preds : replacement has length zero
- Occurs when estimating TMLE on training samples.
- Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
one multinomial or binomial class has 1 or 0 observations; not allowed
-- Can happen with BreastCancer dataset, seems to depend on seed.
-- Happens in parkinson's competition dataset.
-- Seems to occur when a variable has value with a small number of obs.
Then when estimating the treatment propensity score for a given fold
there may be only a single value for all observations.
- Error in y %*% rep(1, nc) : non-conformable arguments
-- When estimating TMLE
- "Missing value(s) in medoidsdata in collap()"
* WARNINGS
- Warning in collap(data, level, d, dmat, newmed) (conducting HOPACH, at least on factors):
Not enough medoids to use newmed='medsil' in collap() -
using newmed='nn' instead
(CK: suppressing warnings currently to avoid this one.)
From existing documentation:
- Allow reporting of results that randomly do not have estimates for some of validation samples.
Wishlist
- Support clustering or variable reduction algorithms other than HOPACH.
- Support more than 2 external (CV-TMLE) folds.