Commit 33d01a8

mlesnoff committed Jan 8, 2024
1 parent 4ed05eb

Showing 3 changed files with 31 additions and 143 deletions.

README.md (26 additions, 139 deletions)
Object `Ttrain` above can also be built directly by:

```julia
Ttrain = mod.fm.T
```

Some summary of the model (% of explained variance, etc.) can be displayed by:

```julia
summary(mod, Xtrain)
```

### **Fitting a predictive model**

#### **Example of a KPLSR**

Let us consider a Gaussian kernel partial least squares regression (KPLSR), using function `kplsr`. The keyword arguments required or allowed in the function can be found at its help page:

```julia
?kplsr
```

The embedded syntax to fit the model is as follows:
```julia
## Below, the character `;` in the function call
## specifies that `nlv`, `kern` and `gamma` are
## passed as keyword arguments.
nlv = 15    # nb. latent variables
kern = :krbf ; gamma = .001
mod = kplsr(; nlv, kern, gamma)
fit!(mod, Xtrain, ytrain)
```
This is strictly the same as:
```julia
mod = kplsr(nlv = 15, kern = :krbf, gamma = .001)
fit!(mod, Xtrain, ytrain)
```
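
The snippets in this section assume that `Xtrain`, `ytrain`, `Xtest` and `ytest` are already in memory. For a self-contained run, here is a minimal setup with simulated data (the dimensions are taken from the tuning examples formerly in this README; any data of matching shapes would do):

```julia
using Jchemo
ntrain = 150 ; p = 200    # nb. training observations and variables
ntest = 80                # nb. test observations
Xtrain = rand(ntrain, p) ; ytrain = rand(ntrain)
Xtest = rand(ntest, p) ; ytest = rand(ntest)
```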

As for PCA, the score matrices can be computed by:
```julia
Ttrain = transf(mod, Xtrain)
## or: Ttrain = mod.fm.T
Ttest = transf(mod, Xtest)
```
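
In earlier revisions of this README, the number of LVs could also be chosen at transform time (`Jchemo.transform(fm, Xtest; nlv = 1)`). With the embedded syntax this would presumably read as below; this is a sketch, and `?transf` gives the current signature:

```julia
## Scores for Xtest restricted to the first LV
## (`nlv` keyword assumed here by analogy with the
## former Jchemo.transform)
Ttest1 = transf(mod, Xtest; nlv = 1)
```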

and model summary by:
```julia
summary(mod, Xtrain)
```

Predictions (Y-values) are given by:
```julia
pred = predict(mod, Xtest).pred
```
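
The performance of the fitted model on the test set can then be computed, as in earlier revisions of this README:

```julia
rmsep(pred, ytest)    # root mean squared error of prediction (RMSEP)
mse(pred, ytest)      # mean squared error
```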

**Examples of tuning** of predictive models (test-set validation and cross-validation) are given in the help pages of functions `gridscore` and `gridcv`:

```julia
?gridscore
?gridcv
```
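
For reference, condensed versions of the tuning examples that formerly lived in this README are sketched below. They use the older functional syntax (`fm = plskern(X, y; nlv)`) rather than the embedded `mod`/`fit!` syntax shown above, so check `?gridscore` and `?gridcv` for the current forms. First, test-set validation with `gridscorelv`, where Train is split into Cal+Val for tuning and the generalization error is estimated on Test:

```julia
using Jchemo, StatsBase, CairoMakie
nval = 50    # size of the validation set
s = sample(1:ntrain, nval; replace = false)
Xcal = rmrow(Xtrain, s) ; ycal = rmrow(ytrain, s)
Xval = Xtrain[s, :] ; yval = ytrain[s]
## Performance over the grid: the model is fitted
## on Cal, and the performance is computed on Val
nlv = 0:10
res = gridscorelv(Xcal, ycal, Xval, yval;
    score = rmsep, fun = plskern, nlv)
plotgrid(res.nlv, res.y1, xlabel = "Nb. LVs", ylabel = "RMSEP").f
## Refit the best model and estimate its error on Test
u = findall(res.y1 .== minimum(res.y1))[1]
fm = plskern(Xtrain, ytrain; nlv = res.nlv[u])
rmsep(Jchemo.predict(fm, Xtest).pred, ytest)
```

Second, replicated K-fold cross-validation with `gridcvlv`. Note that for PLSR models, `gridscorelv` and `gridcvlv` are much faster than the generic `gridscore` and `gridcv` (likewise `gridscorelb` and `gridcvlb` for ridge regression models):

```julia
## Replicated K-fold CV segments
K = 5      # nb. folds
rep = 10   # nb. replications (rep = 1 ==> no replication)
segm = segmkf(ntrain, K; rep = rep)
## (or replicated test-set CV: segm = segmts(ntrain, 30; rep = rep))
rescv = gridcvlv(Xtrain, ytrain; segm = segm,
    score = rmsep, fun = plskern, nlv)
res = rescv.res
plotgrid(res.nlv, res.y1, xlabel = "Nb. LVs", ylabel = "RMSEP").f
u = findall(res.y1 .== minimum(res.y1))[1]
fm = plskern(Xtrain, ytrain; nlv = res.nlv[u])
rmsep(Jchemo.predict(fm, Xtest).pred, ytest)
```
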
### **Pipelines**




# <span style="color:green"> Credit </span>


### How to cite

Lesnoff, M. 2021. Jchemo: Chemometrics and machine learning on high-dimensional data with Julia. https://github.com/mlesnoff/Jchemo.
UMR SELMET, Univ Montpellier, CIRAD, INRA, Institut Agro, Montpellier, France

### Acknowledgments
docs/make.jl (3 additions, 3 deletions)

Inside the `makedocs(;` call, the Examples and Datasets pages are commented out:

    "Home" => "index.md",
    "Available methods" => "domains.md",
    "Index of functions" => "api.md",
    "News" => "news.md"
    #"Examples" => "see_jchemodemo.md",
    #"Datasets" => "see_jchemodata.md"
    ]
)

docs/src/index.md (2 additions, 1 deletion)

See also the related projects:

- [JchemoDemo](https://github.com/mlesnoff/JchemoDemo): Training material

- [JchemoData.jl](https://github.com/mlesnoff/JchemoData.jl): Datasets repository (used in the examples)

[Return to [Jchemo.jl](https://github.com/mlesnoff/Jchemo.jl)]


2 comments on commit 33d01a8

@mlesnoff (Owner, Author)

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/98462

Tip: Release Notes

Did you know you can add release notes too? Just add markdown-formatted text underneath the comment after the text "Release notes:" and it will be added to the registry PR; if TagBot is installed, it will also be added to the release that TagBot creates. For example:

    @JuliaRegistrator register

    Release notes:

    ## Breaking changes

    - blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

    git tag -a v0.3.0 -m "<description of version>" 33d01a8c72c76b48cba5f199d8fec0c73b0c7b29
    git push origin v0.3.0
