Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hello,can you help me ? #29

Open
wants to merge 96 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
Show all changes
96 commits
Select commit Hold shift + click to select a range
1db4060
fix s2_t_hat calculation
ilia-kats Jan 15, 2020
1727ab2
First port of SpatialDE to gpflow + tensorflow
ilia-kats Jan 28, 2020
c125580
remove apparently unused file
ilia-kats Jan 29, 2020
ef72229
fix s2_t_hat calculation
ilia-kats Jan 15, 2020
2f0737b
remove apparently unused file
ilia-kats Jan 29, 2020
8a0c4f4
add option to use score test instead of LRT test
ilia-kats Mar 5, 2020
ded3891
Merge branch 'master' of https://github.com/ilia-kats/SpatialDE
ilia-kats Mar 5, 2020
1e570ae
do not fit multiple GPs for multiple kernels
ilia-kats Mar 17, 2020
d8f03a4
enable pickling of frozen models, improve power spectrum plot
ilia-kats Mar 18, 2020
cdec422
fix deleted function in __init__.py
ilia-kats Mar 18, 2020
22717c6
implement an optimizer based on tensorflow probability, add some minor
ilia-kats Mar 20, 2020
84cae8b
full refactor
ilia-kats Mar 23, 2020
09c441d
minor bugfixes
ilia-kats Mar 24, 2020
781aba7
refactor score tests in prepration for NB null model
ilia-kats Mar 27, 2020
b09ad95
testing negative binomial null model
ilia-kats Mar 30, 2020
dd1d203
performance: perform score test once per gene and model
ilia-kats Mar 30, 2020
9a19a45
Merge branch 'gpflow'
ilia-kats Mar 31, 2020
5a6902b
fix up some merge issues
ilia-kats Mar 31, 2020
db545ad
fully integrate gpflow into spatialde
ilia-kats Apr 6, 2020
b687830
robustify against singular matrices, some bugfixes
ilia-kats Apr 7, 2020
f9d400b
NB score test: use proper MLE for mean and dispersion
ilia-kats Apr 14, 2020
a0b3227
clean up obsolete functions
ilia-kats Apr 14, 2020
9714c83
setup: bump version, add gpflow dependency
ilia-kats Apr 14, 2020
11ef99e
remove deleted function from __init__.py
ilia-kats Apr 14, 2020
88cf687
increase minimum lengthscale for automatically generated kernel space
ilia-kats Apr 15, 2020
fbd166d
bugfixes for spectral mixture kernel
ilia-kats Apr 15, 2020
2217ff2
work around negative variances
ilia-kats Apr 15, 2020
599abb4
NB score test: initialize alpha MLE with MoM estimator
ilia-kats Apr 15, 2020
8809ddf
score test: do score test before GP fitting
ilia-kats Jun 2, 2020
0e4b883
add tissue segmentation functionality
ilia-kats Jun 2, 2020
fd44a3c
tissue segmentation: fix some variable names left over from testing
ilia-kats Jun 2, 2020
62597ea
major refactor
ilia-kats Jun 4, 2020
a1749f5
add missing type hinting imports
ilia-kats Jun 4, 2020
065ea38
implement reader for 10x SpaceRanger output
ilia-kats Jun 4, 2020
0d24dec
score test: fix the case where data are subset due to several
ilia-kats Jun 4, 2020
1d32da8
full sparse matrix support
ilia-kats Jun 4, 2020
27bcc40
score test: fix testing multiple kernels
ilia-kats Jun 5, 2020
fd76bcd
simplify kernels + distance cache
ilia-kats Jun 5, 2020
ff61793
tissue segmentation: fix off-by-one error
ilia-kats Jun 8, 2020
3fc4f73
tissue segmentation: fix elbo calculation
ilia-kats Jun 9, 2020
7404517
more tissue segmentation fixes
ilia-kats Jun 12, 2020
be9ff05
tissue segmentation: some more elbo fixes
ilia-kats Jun 19, 2020
28a5477
fix sizefactor calculation for sparse matrices
ilia-kats Jul 8, 2020
5014f93
space ranger import: properly handle duplicate gene names
ilia-kats Jul 8, 2020
1694543
tissue segmentation: properly log about not using spatial information
ilia-kats Jul 9, 2020
389cc57
tissue segmentation: tighten convergence tolerance
ilia-kats Jul 10, 2020
8e088d3
properly handle AnnData files with duplicate gene names in the
ilia-kats Jul 14, 2020
76630fe
make score test work with scipy 1.5
ilia-kats Jul 14, 2020
b568612
port AEH to tensorflow with Dirichlet Process prior
ilia-kats Sep 2, 2020
eb78e70
also export SpatialPatterns class
ilia-kats Sep 2, 2020
f3905b5
aeh: clean up some imports
ilia-kats Sep 2, 2020
468360e
tissue segmentation: fix class pruning
ilia-kats Sep 7, 2020
4c44a7a
tissue segmentation: adjust alpha hyperprior
ilia-kats Sep 7, 2020
8068e0a
aeh: discard unused classes after optimization
ilia-kats Sep 7, 2020
75898a2
implement SVCA model
ilia-kats Oct 8, 2020
d2dd0bf
score test: use gpflow.default_float() consistently
ilia-kats Oct 8, 2020
fe9e67b
clean up repository, modernize setup
ilia-kats Oct 8, 2020
5a7472f
black-ify
ilia-kats Oct 8, 2020
f13a046
fix importing pip-installed SpatialDE
ilia-kats Oct 9, 2020
4bbe4b4
move kernels to _internal, set up basic sphinx-docs
ilia-kats Oct 9, 2020
f09094d
replace old README
ilia-kats Oct 9, 2020
2dd0fb2
SVCA: allow running fit_spatial_interactions without testing first
ilia-kats Oct 12, 2020
0d6b7c8
svca.fit_spatial_interactions: properly initialize kernel
ilia-kats Oct 13, 2020
262dec6
svca: better initialization
ilia-kats Oct 15, 2020
81a7038
svca: use squared exponential kernel (faster)
ilia-kats Oct 19, 2020
5b93d99
svca: performance optimization
ilia-kats Oct 20, 2020
916e0e3
svca: fix test (got broken in previous commit)
ilia-kats Oct 22, 2020
8c3d7c8
fix spatially variable genes test
ilia-kats Nov 4, 2020
6cc9e16
tissue segmentation: write the labels to the AnnData as pd.Categorical
ilia-kats Nov 5, 2020
f3a853f
Merge branch 'master' of https://github.com/ilia-kats/SpatialDE into …
ilia-kats Nov 5, 2020
e93cef6
svca: handle sparse matrices
ilia-kats Nov 6, 2020
3c15d79
aeh: save spatial patterns into adata.obs
ilia-kats Nov 11, 2020
267fd68
when working with AnnData, allow the user to specify the layer
ilia-kats May 17, 2021
714a63c
make setting a different default_float work for score test
ilia-kats Jun 28, 2021
4da263e
make caching the distance matrices optional
ilia-kats Jun 28, 2021
ec40f5a
add docstrings, adjust Sphinx config
ilia-kats Aug 27, 2021
603fbaf
docs: use custom template for classes
ilia-kats Aug 30, 2021
6bd6d1c
auto-build docs on push
ilia-kats Aug 30, 2021
13eddb8
update black version
ilia-kats Aug 31, 2021
f9ff81f
test: improve speed for AnnData with lots of metadata
ilia-kats Sep 21, 2021
37e22ad
tissue segmentation: don't crash when no coordinates are found
ilia-kats Oct 1, 2021
d1bbc5f
workflow: use Python 3.9 for docs (no NumPy wheels for 3.10 yet)
ilia-kats Oct 26, 2021
44d5dbe
docs workflow: debug
ilia-kats Oct 26, 2021
0345513
docs workflow: explicitly install missing NaiveDE dep
ilia-kats Oct 26, 2021
bc8ee56
docs workflow: remove debug step
ilia-kats Oct 26, 2021
906c5e7
README: add docs badge
ilia-kats Oct 26, 2021
78da0ac
fixes for GPflow 2.1
ilia-kats Nov 10, 2021
a3d5ec7
add layer argument to GP fitting
ilia-kats Jan 9, 2023
0e27c17
some more bugfixes, black-ify
ilia-kats Jan 9, 2023
cde7720
update pre-commit action
ilia-kats Jan 9, 2023
c5aa9cd
update pre-commit config
ilia-kats Jan 9, 2023
42b076c
allow specifying a .obs column to use as size factor for normalization
ilia-kats Jan 12, 2023
a69ba45
fix SGPR model
ilia-kats Jan 13, 2023
25f96b6
fit_fast: add parameter to enable/disable sparse GPs
ilia-kats Jan 13, 2023
3703cef
Add `.test` for Normal distribution (#1)
jykr Dec 20, 2024
6dbdb55
update sphinx-docs-to-gh-pages version
ilia-kats Dec 20, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
performance: perform score test once per gene and model
Previously, a score test was performed for every lengthscale. Now, only
the maximum likelihood model is tested.
  • Loading branch information
ilia-kats committed Mar 30, 2020
commit dd1d203927c4536154ea86c95d4ecf2c121d011b
1 change: 0 additions & 1 deletion Python-module/SpatialDE/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
from .base import dyn_de
from .base import run
from .base import model_search
from .aeh import fit_patterns
151 changes: 84 additions & 67 deletions Python-module/SpatialDE/base.py
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@
import logging
from time import time
import warnings
from typing import Optional
from typing import Optional, Dict, Tuple

import numpy as np
from scipy import stats
@@ -64,48 +64,43 @@ def inducers_grid(X, ninducers):
xx, xy = np.meshgrid(xvals, yvals)
return np.hstack((xx.reshape((xx.size, 1)), xy.reshape((xy.size, 1))))


def fit_model(
model: Model,
exp_tab: pd.DataFrame,
raw_counts: pd.DataFrame,
score_test: ScoreTest
raw_counts: pd.DataFrame
):
results = []
with warnings.catch_warnings():
warnings.simplefilter("ignore", RuntimeWarning)
for i, gene in enumerate(tqdm(exp_tab.columns)):
y = exp_tab.iloc[:, i].to_numpy()
rawy = raw_counts.iloc[:,i].to_numpy()
model.sety(y, rawy)
t0 = time()

res = model.optimize()
t = time() - t0
res = {
"g": gene,
"max_ll": model.log_marginal_likelihood,
"max_delta": model.delta,
"max_mu_hat": model.mu,
"max_s2_t_hat": model.sigma_s2,
"max_s2_e_hat": model.sigma_n2,
"time": t,
"n": model.n,
"FSV": model.FSV,
"s2_FSV": model.s2_FSV,
"s2_logdelta": model.s2_logdelta,
"converged": res.success,
"M": model.n_parameters,
}
for (k, v) in vars(model.kernel).items():
if k not in res:
res[k] = v
stest = score_test()
res["pval"] = stest.pval
res["kappa"] = stest.kappa
res["U_tilde"] = stest.U_tilde
res["nu"] = stest.nu
results.append(res)
with model:
for i, gene in enumerate(tqdm(exp_tab.columns)):
y = exp_tab.iloc[:, i].to_numpy()
rawy = raw_counts.iloc[:,i].to_numpy()
model.sety(y, rawy)
t0 = time()

res = model.optimize()
t = time() - t0
res = {
"g": gene,
"max_ll": model.log_marginal_likelihood,
"max_delta": model.delta,
"max_mu_hat": model.mu,
"max_s2_t_hat": model.sigma_s2,
"max_s2_e_hat": model.sigma_n2,
"time": t,
"n": model.n,
"FSV": model.FSV,
"s2_FSV": model.s2_FSV,
"s2_logdelta": model.s2_logdelta,
"converged": res.success,
"M": model.n_parameters,
}
for (k, v) in vars(model.kernel).items():
if k not in res:
res[k] = v

results.append(res)
return pd.DataFrame(results)


@@ -136,12 +131,51 @@ def kspace_walk(kernel_space: dict, X: np.ndarray):
except TypeError:
yield factory(kern, X, lengthscales), kern

def dyn_de(
X: pd.DataFrame, exp_tab: pd.DataFrame, raw_counts: pd.DataFrame, kernel_space:Optional[dict]=None, null_model:str="const"
):
def score_test(results: pd.DataFrame, exp_tab:pd.DataFrame, raw_counts:pd.DataFrame, tests: Dict[Tuple[str, float], ScoreTest], testskey: str):
with tqdm(total=results.shape[0]) as pbar:
def test(df):
results = []
with tests[df.name] as test:
for gene in df.g:
t0 = time()
test.model.sety(exp_tab[gene], raw_counts[gene])
stest = test()
t = time() - t0
res = {
"g": gene,
"pval": stest.pval,
"kappa": stest.kappa,
"U_tilde": stest.U_tilde,
"nu": stest.nu,
"test_time": t
}
results.append(res)
pbar.update()
return pd.DataFrame(results)
testresults = results.groupby(testskey, sort=False).apply(test)
results = pd.concat((results.set_index('g'), testresults.set_index('g')), axis=1)
results.time += results.test_time
results.index.name = 'g' # FIXME: https://github.com/pandas-dev/pandas/issues/21629
return results.drop(columns='test_time').reset_index()

def run(X: pd.DataFrame, exp_tab:pd.DataFrame, raw_counts:pd.DataFrame, kernel_space:Optional[dict]=None, null_model:str="const") -> pd.DataFrame:
""" Perform SpatialDE test

X : matrix of spatial coordinates times observations
exp_tab : Expression table, assumed appropriatealy normalised.

The grid of covariance matrices to search over for the alternative
model can be specifiec using the kernel_space parameter.
"""
if kernel_space == None:
kernel_space = {"SE": [5.0, 25.0, 50.0]}
l_min, l_max = get_l_limits(X)
kernel_space = {
"SE": np.logspace(np.log10(l_min), np.log10(l_max), 10),
#'PER': np.logspace(np.log10(l_min), np.log10(l_max), 10),
#'linear': None
}

logging.info("Performing DE test")
results = []

stest_class = lambda x: None
@@ -154,50 +188,33 @@ def dyn_de(

logging.info("Fitting gene models")
n_models = 0
stests = {}
for model, mname in kspace_walk(kernel_space, X.to_numpy()):
res = fit_model(model, exp_tab, raw_counts, stest_class(X.to_numpy(), exp_tab.to_numpy(), raw_counts.to_numpy(), model))
res = fit_model(model, exp_tab, raw_counts)
stests[model] = stest_class(X.to_numpy(), exp_tab.to_numpy(), raw_counts.to_numpy(), model)
res["model"] = mname
res["_model"] = model
results.append(res)
n_models += 1

n_genes = exp_tab.shape[1]
logging.info("Finished fitting {} models to {} genes".format(n_models, n_genes))

results = pd.concat(results, sort=True).reset_index(drop=True)
sizes = results.groupby(["model", "g"]).size().groupby("model").unique()
sizes = results.groupby(["model", "g"], sort=False).size().groupby("model", sort=False).unique()
results = results.set_index("model")
results.loc[sizes > 1, "M"] += 1
results = results.reset_index()
results["BIC"] = -2 * results["max_ll"] + results["M"] * np.log(results["n"])

return results


def run(X: pd.DataFrame, exp_tab:pd.DataFrame, raw_counts:pd.DataFrame, kernel_space:Optional[dict]=None, null_model:str="const") -> pd.DataFrame:
""" Perform SpatialDE test

X : matrix of spatial coordinates times observations
exp_tab : Expression table, assumed appropriatealy normalised.

The grid of covariance matrices to search over for the alternative
model can be specifiec using the kernel_space parameter.
"""
if kernel_space == None:
l_min, l_max = get_l_limits(X)
kernel_space = {
"SE": np.logspace(np.log10(l_min), np.log10(l_max), 10),
#'PER': np.logspace(np.log10(l_min), np.log10(l_max), 10),
#'linear': None
}

logging.info("Performing DE test")
results = dyn_de(X, exp_tab, raw_counts, kernel_space, null_model)
results = results.loc[results.groupby(["model", "g"], sort=False)["max_ll"].idxmax()]
results = results.loc[results.groupby("g", sort=False)["BIC"].idxmin()]

results = results.loc[results.groupby(["model", "g"])["max_ll"].idxmax()]
results = results.loc[results.groupby("g")["BIC"].idxmin()]
logging.info("Performing score test")
results = score_test(results, exp_tab, raw_counts, stests, "_model")
results["p.adj"] = bh_adjust(results["pval"].to_numpy())

return results.reset_index(drop=True)
return results.drop(columns="_model").reset_index(drop=True)

def model_search(X, exp_tab, DE_mll_results, kernel_space=None):
""" Compare model fits with different models.
71 changes: 59 additions & 12 deletions Python-module/SpatialDE/models.py
Original file line number Diff line number Diff line change
@@ -15,14 +15,21 @@ def __init__(self, X: np.ndarray, kernel: Kernel):
self.X = X
self.n = X.shape[0]
self.kernel = kernel
self.K = self.kernel.K(self.X)

self._K = None
self._y = None
self._rawy = None

def _reset(self):
pass

@property
def K(self):
if self._K is not None:
return self._K
else:
return self.kernel.K(self.X)

@property
def n_parameters(self) -> float:
return 0
@@ -109,19 +116,32 @@ def _check_y(self):
if self.y.shape[0] != self.n:
raise RuntimeError("different numbers of observations in y and X")

def __enter__(self):
self._K = self.K

def __exit__(self, *args):
self._K = None


class GPModel(Model):
def __init__(self, X: np.ndarray, kernel: Kernel):
super().__init__(X, kernel)

K = kernel.K(X)
self._y = None
self._logdelta = 0
self._reset()

def __enter__(self):
super().__enter__()
self._reset()
# Gower normalization factor for covariance matric K
# Based on https://github.com/PMBio/limix/blob/master/limix/utils/preprocess.py
self.gower = (np.trace(self.K) - np.sum(np.mean(self.K, axis=0))) / (self.n - 1)

self._y = None
self._logdelta = 0
self._reset()
return self

def __exit__(self, *args):
super().__exit__(*args)

def _reset(self):
self._s2_FSV = None
@@ -253,12 +273,15 @@ def _calc_s2_FSV(self) -> float:
class SGPR(GPModel):
def __init__(self, X: np.ndarray, Z: np.ndarray, kern: Kernel):
super().__init__(X, kernel=kern)
self.Z = Z

K_uu = kern.K(Z)
K_uf = kern.K(Z, X)
K_ff = kern.K_diag(X)
def __enter__(self):
super().__enter__()
K_uu = self.kernel.K(self.Z)
K_uf = self.kernel.K(self.Z, self.X)
K_ff = self.kernel.K_diag(self.X)

L = np.linalg.cholesky(K_uu + 1e-6 * np.eye(Z.shape[0]))
L = np.linalg.cholesky(K_uu + 1e-6 * np.eye(self.Z.shape[0]))
LK_uf = scipy.linalg.solve_triangular(L, K_uf, lower=True)
A = LK_uf @ LK_uf.T

@@ -268,6 +291,16 @@ def __init__(self, X: np.ndarray, Z: np.ndarray, kern: Kernel):
self._By = None
self._traceterm = np.sum(K_ff) - np.sum(LK_uf ** 2)

return self

def __exit__(self, *args):
super().__exit(*args)
self._Lambda = None
self._B = None
self._B1 = None
self._By = None
self._traceterm = None

@property
def n_parameters(self) -> float:
return 3
@@ -306,7 +339,8 @@ def _calc_sigma_n2(self):
return self._residual_quadratic() / self.n

def _ychanged(self):
self._By = np.dot(self._B, self.y)
if self._B is not None:
self._By = np.dot(self._B, self.y)

def _logdeltachanged(self):
self._dL = self.delta + self._Lambda
@@ -315,11 +349,23 @@ def _logdeltachanged(self):
class GPR(GPModel):
def __init__(self, X: np.ndarray, kern: Kernel):
super().__init__(X, kernel=kern)
K = kern.K(X)

def __enter__(self):
super().__enter__()

K = self.kernel.K(self.X)
self._Lambda, self._U = np.linalg.eigh(K)
self._U1 = np.sum(self._U, axis=0)
self._Uy = None

return self

def __exit__(self, *args):
super().__exit__(*args)
self._Lambda = self._U = None
self._U1 = None
self._Uy = None

@property
def n_parameters(self) -> float:
return 3
@@ -347,7 +393,8 @@ def _calc_sigma_n2(self) -> float:
return self.sigma_s2 * self.delta

def _ychanged(self):
self._Uy = np.dot(self._U.T, self.y)
if self._U is not None:
self._Uy = np.dot(self._U.T, self.y)

def _logdeltachanged(self):
self._dL = self.delta + self._Lambda
Loading