Provides Scikit-Learn compatible transforms for spectroscopic data preprocessing.

franckalbinet/soilspectfm

SoilSpecTfm

Spectral Processing Tools for Soil Spectroscopy

By translating specialized soil spectroscopy methods into the scikit-learn framework, SoilSpecTfm and SoilSpecData connect this niche domain with Python’s vast machine learning ecosystem, making advanced ML/DL tools accessible to soil scientists.

Transforms implemented so far:

  • Baseline corrections:

    • SNV: Standard Normal Variate
    • MSC: Multiplicative Scatter Correction
    • Detrend: Detrend the spectrum (planned)
    • ALS: Asymmetric Least Squares baseline correction (planned)
  • Derivatives:

    • TakeDerivative: Take derivative (1st, 2nd, etc.) of the spectrum and apply Savitzky-Golay smoothing
    • GapSegmentDerivative: (planned)
  • Smoothing:

    • WaveletDenoise: Wavelet denoising
  • Other transformations:

    • ToAbsorbance: Transform the spectrum to absorbance
    • Resample: Resample the spectrum to a new wavenumber range
    • Trim: Trim the spectrum to a specific wavenumber range
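
To make the reflectance-to-absorbance step concrete, here is a minimal NumPy sketch. It assumes the input is reflectance in (0, 1] and uses the common conversion A = log10(1/R); the function name `to_absorbance_sketch` and the `eps` guard are hypothetical and are not taken from the SoilSpecTfm API, whose `ToAbsorbance` transform may differ in its details.

```python
import numpy as np


def to_absorbance_sketch(reflectance, eps=1e-8):
    """Illustrative conversion A = log10(1 / R); not the library's implementation."""
    # Clip to a small positive floor so that log10 never receives zero.
    r = np.clip(np.asarray(reflectance, dtype=float), eps, None)
    return -np.log10(r)


# One synthetic spectrum with three reflectance values.
reflectance = np.array([[1.0, 0.1, 0.01]])
absorbance = to_absorbance_sketch(reflectance)
```

Lower reflectance maps to higher absorbance, so the three synthetic values above increase from left to right after the conversion.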

Key Features:

  • Seamless integration with scikit-learn’s machine learning ecosystem
  • Complements the SoilSpecData package for soil spectroscopy workflows
  • Pipeline-ready transformers with consistent API

All transformers follow scikit-learn conventions:

  • Implement fit/transform interface
  • Support get_params/set_params for GridSearchCV
  • Provide detailed documentation and examples
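
The conventions above can be illustrated with a small, self-contained sketch. The `SNVSketch` class below is hypothetical and is not the library's `SNV` implementation; it only shows how a stateless row-wise transform picks up `fit`/`transform` and `get_params`/`set_params` by subclassing scikit-learn's `BaseEstimator` and `TransformerMixin`. The `eps` parameter is likewise an assumption added for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class SNVSketch(BaseEstimator, TransformerMixin):
    """Illustrative row-wise Standard Normal Variate (hypothetical class)."""

    def __init__(self, eps=1e-12):
        # Guards against division by zero for perfectly flat spectra.
        self.eps = eps

    def fit(self, X, y=None):
        # SNV is stateless: nothing is learned from the training data.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Center and scale each spectrum (row) independently.
        mean = X.mean(axis=1, keepdims=True)
        std = X.std(axis=1, keepdims=True)
        return (X - mean) / (std + self.eps)


rng = np.random.default_rng(0)
X = rng.random((5, 50))  # 5 synthetic spectra, 50 wavenumbers each

pipe = Pipeline([("snv", SNVSketch())])
X_snv = pipe.fit_transform(X)

# get_params/set_params are inherited from BaseEstimator, which is what
# lets tools such as GridSearchCV address a step as "snv__eps".
params = pipe.get_params()
pipe.set_params(snv__eps=1e-9)
```

Because every constructor argument is stored under its own name, nested parameters such as `snv__eps` can be listed in a `GridSearchCV` parameter grid without any extra code.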

Installation

pip install soilspectfm

Quick Start

from soilspectfm.core import (SNV, 
                              TakeDerivative, 
                              ToAbsorbance, 
                              Resample, 
                              WaveletDenoise)

from sklearn.pipeline import Pipeline

Loading the OSSL dataset

Let's load the OSSL dataset with the SoilSpecData package as an example.

from soilspecdata.datasets.ossl import get_ossl
ossl = get_ossl()
mir_data = ossl.get_mir()

Preprocessing pipeline

Transforms are fully compatible with scikit-learn and can be used in a pipeline as follows:

pipe = Pipeline([
    ('snv', SNV()), # Standard Normal Variate transformation
    ('denoise', WaveletDenoise()), # Wavelet denoising
    ('deriv', TakeDerivative(window_length=11, polyorder=2, deriv=1)) # First derivative
])

X_tfm = pipe.fit_transform(mir_data.spectra)
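
The `window_length`, `polyorder` and `deriv` arguments passed to `TakeDerivative` share their names with the parameters of `scipy.signal.savgol_filter`. Assuming the transform performs a comparable Savitzky-Golay derivative along the wavenumber axis (an assumption, not something confirmed here), the step can be sketched on synthetic spectra as follows. The data below are invented for illustration only.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic descending wavenumber axis, as is typical for MIR spectra.
wavenumbers = np.linspace(4000, 600, 200)

# Two synthetic spectra, each a single Gaussian band.
spectra = np.stack([
    np.exp(-((wavenumbers - 1600) / 150.0) ** 2),
    np.exp(-((wavenumbers - 2900) / 200.0) ** 2),
])

# First derivative with Savitzky-Golay smoothing, applied per spectrum (axis=1).
# The derivative is expressed per sample because delta keeps its default of 1.0.
first_deriv = savgol_filter(
    spectra, window_length=11, polyorder=2, deriv=1, axis=1
)
```

The output keeps the input shape, and the derivative changes sign across each band maximum, which is what makes derivative spectra useful for resolving overlapping peaks.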

Quick visualization

from soilspectfm.visualization import plot_spectra
from matplotlib import pyplot as plt
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 7))

ax1 = plot_spectra(
    mir_data.spectra, 
    mir_data.wavenumbers,
    ax=ax1,
    ascending=False,
    color='black',
    alpha=0.6,
    lw=0.5,
    xlabel='Wavenumber (cm$^{-1}$)',
    title='Raw Spectra'
)

ax2 = plot_spectra(
    X_tfm,
    mir_data.wavenumbers,
    ax=ax2,
    ascending=False,
    color='steelblue',
    alpha=0.6,
    lw=0.5,
    xlabel='Wavenumber (cm$^{-1}$)',
    title='SNV + Wavelet Denoising + 1st Derivative Transformed Spectra'
)

plt.tight_layout()

Dependencies

  • fastcore
  • numpy
  • scipy
  • scikit-learn
  • matplotlib

Further references

Contributing

Developer guide

If you are new to nbdev, here are some useful pointers to get you started.

Install soilspectfm in development mode:

# make sure the soilspectfm package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile so that the changes apply to soilspectfm
$ nbdev_prepare

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Support

For questions and support, please open an issue on GitHub.
