Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize intercept as logit of outcome rate for logistic regression models #81

Open
egillax opened this issue Jan 29, 2025 · 0 comments

Comments

@egillax
Copy link

egillax commented Jan 29, 2025

Hi @msuchard ,

I believe currently the intercept is initialized as zero (at least for logistic regression, only model I checked). Given we are often dealing with rare outcomes I think it would be better to use the logit of the outcome rate ( log(p / (1 - p)) ). In this case the model converges faster (as measured in fever iterations, see below) and seems to give the same results. I do think this could end up giving ~ 20-30% speed up for outcome rates of 1-3% which are common in our data. The example below for Eunomia gives a speedup of over 2x with outcome rate of 2%, but it's rather small data (n=1800, p=372). There's also a pure Cyclops reprex at the bottom giving about 30% fewer iterations but I had to change the simulateCyclopsData to give 1-3% outcome rates.

Of course I could supply startingCoefficients myself from the PLP to add this. But if there are no downsides maybe this should be the default in Cyclops ? What do you think ?

reprex:

library(PatientLevelPrediction)
    library(Cyclops)
    
    connectionDetails <- Eunomia::getEunomiaConnectionDetails()
#> attempting to download GiBleed
#> attempting to extract and load: /tmp/RtmpplDETg/GiBleed_5.3.zip to: /tmp/RtmpplDETg/GiBleed_5.3.sqlite
    Eunomia::createCohorts(connectionDetails)
#> Cohorts created in table main.cohort
#>   cohortId       name
#> 1        1  Celecoxib
#> 2        2 Diclofenac
#> 3        3    GiBleed
#> 4        4     NSAIDs
#>                                                                                        description
#> 1    A simplified cohort definition for new users of celecoxib, designed specifically for Eunomia.
#> 2    A simplified cohort definition for new users ofdiclofenac, designed specifically for Eunomia.
#> 3 A simplified cohort definition for gastrointestinal bleeding, designed specifically for Eunomia.
#> 4       A simplified cohort definition for new users of NSAIDs, designed specifically for Eunomia.
#>   count
#> 1  1844
#> 2   850
#> 3   479
#> 4  2694
    outcomeId <- 3 # GIbleed
    seed <- 42
    
    databaseDetails <- createDatabaseDetails(
        connectionDetails = connectionDetails,
        cdmDatabaseSchema = "main",
        cdmDatabaseName = "main",
        cohortDatabaseSchema = "main",
        cohortTable = "cohort",
        outcomeDatabaseSchema = "main",
        outcomeTable =  "cohort",
        targetId = 1,
        outcomeIds = outcomeId,
        cdmVersion = 5)
#> No cdm database id entered so using cdmDatabaseSchema - if cdmDatabaseSchema is the same for multiple different databases, please use cdmDatabaseId to specify a unique identifier for the database and version
    
    covariateSettings <- FeatureExtraction::createDefaultCovariateSettings()
    
    plpData <- getPlpData(databaseDetails = databaseDetails,
                          covariateSettings = covariateSettings,
                          restrictPlpDataSettings = createRestrictPlpDataSettings())
#> Connecting using SQLite driver
#> 
#> Constructing the at risk cohort
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> Executing SQL took 0.026 secs
#> Fetching cohorts from server
#> Loading cohorts took 0.031 secs
#> Constructing features on server
#>   |                                                                              |                                                                      |   0%  |                                                                              |=                                                                     |   1%  |                                                                              |=                                                                     |   2%  |                                                                              |==                                                                    |   2%  |                                                                              |==                                                                    |   3%  |                                                                              |===                                                                   |   4%  |                                                                              |===                                                                   |   5%  |                                                                              |====                                                                  |   5%  |                                                                              |====                                                                  |   6%  |                                                                              |=====                                                                 |   7%  |                                                                              |=====                                                                 |   8%  |                                                                              |======                                                                |   8%  |                                                                              |======                                                                |   9%  |                                                                              |=======                                                               |   9%  |                                                                              |=======                                                               |  10%  |                                                                              |=======                                                               |  11%  |                                                                              |========                                                              |  11%  |                                                                              |========                                                              |  12%  |                                                                              |=========                                                             |  12%  |                                                                              |=========                                                             |  13%  |                                                                              |=========                                                             |  14%  |                                                                              |==========                                                            |  14%  |                                                                              |==========                                                            |  15%  |                                                                              |===========                                                           |  15%  |                                                                              |===========                                                           |  16%  |                                                                              |============                                                          |  17%  |                                                                              |============                                                          |  18%  |                                                                              |=============                                                         |  18%  |                                                                              |=============                                                         |  19%  |                                                                              |==============                                                        |  20%  |                                                                              |==============                                                        |  21%  |                                                                              |===============                                                       |  21%  |                                                                              |===============                                                       |  22%  |                                                                              |================                                                      |  22%  |                                                                              |================                                                      |  23%  |                                                                              |=================                                                     |  24%  |                                                                              |=================                                                     |  25%  |                                                                              |==================                                                    |  25%  |                                                                              |==================                                                    |  26%  |                                                                              |===================                                                   |  27%  |                                                                              |===================                                                   |  28%  |                                                                              |====================                                                  |  28%  |                                                                              |====================                                                  |  29%  |                                                                              |=====================                                                 |  29%  |                                                                              |=====================                                                 |  30%  |                                                                              |======================                                                |  31%  |                                                                              |======================                                                |  32%  |                                                                              |=======================                                               |  32%  |                                                                              |=======================                                               |  33%  |                                                                              |========================                                              |  34%  |                                                                              |========================                                              |  35%  |                                                                              |=========================                                             |  35%  |                                                                              |=========================                                             |  36%  |                                                                              |==========================                                            |  36%  |                                                                              |==========================                                            |  37%  |                                                                              |==========================                                            |  38%  |                                                                              |===========================                                           |  38%  |                                                                              |===========================                                           |  39%  |                                                                              |============================                                          |  39%  |                                                                              |============================                                          |  40%  |                                                                              |============================                                          |  41%  |                                                                              |=============================                                         |  41%  |                                                                              |=============================                                         |  42%  |                                                                              |==============================                                        |  42%  |                                                                              |==============================                                        |  43%  |                                                                              |===============================                                       |  44%  |                                                                              |===============================                                       |  45%  |                                                                              |================================                                      |  45%  |                                                                              |================================                                      |  46%  |                                                                              |=================================                                     |  47%  |                                                                              |=================================                                     |  48%  |                                                                              |==================================                                    |  48%  |                                                                              |==================================                                    |  49%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================                                  |  51%  |                                                                              |====================================                                  |  52%  |                                                                              |=====================================                                 |  52%  |                                                                              |=====================================                                 |  53%  |                                                                              |======================================                                |  54%  |                                                                              |======================================                                |  55%  |                                                                              |=======================================                               |  55%  |                                                                              |=======================================                               |  56%  |                                                                              |========================================                              |  57%  |                                                                              |========================================                              |  58%  |                                                                              |=========================================                             |  58%  |                                                                              |=========================================                             |  59%  |                                                                              |==========================================                            |  59%  |                                                                              |==========================================                            |  60%  |                                                                              |==========================================                            |  61%  |                                                                              |===========================================                           |  61%  |                                                                              |===========================================                           |  62%  |                                                                              |============================================                          |  62%  |                                                                              |============================================                          |  63%  |                                                                              |============================================                          |  64%  |                                                                              |=============================================                         |  64%  |                                                                              |=============================================                         |  65%  |                                                                              |==============================================                        |  65%  |                                                                              |==============================================                        |  66%  |                                                                              |===============================================                       |  67%  |                                                                              |===============================================                       |  68%  |                                                                              |================================================                      |  68%  |                                                                              |================================================                      |  69%  |                                                                              |=================================================                     |  70%  |                                                                              |=================================================                     |  71%  |                                                                              |==================================================                    |  71%  |                                                                              |==================================================                    |  72%  |                                                                              |===================================================                   |  72%  |                                                                              |===================================================                   |  73%  |                                                                              |====================================================                  |  74%  |                                                                              |====================================================                  |  75%  |                                                                              |=====================================================                 |  75%  |                                                                              |=====================================================                 |  76%  |                                                                              |======================================================                |  77%  |                                                                              |======================================================                |  78%  |                                                                              |=======================================================               |  78%  |                                                                              |=======================================================               |  79%  |                                                                              |========================================================              |  79%  |                                                                              |========================================================              |  80%  |                                                                              |=========================================================             |  81%  |                                                                              |=========================================================             |  82%  |                                                                              |==========================================================            |  82%  |                                                                              |==========================================================            |  83%  |                                                                              |===========================================================           |  84%  |                                                                              |===========================================================           |  85%  |                                                                              |============================================================          |  85%  |                                                                              |============================================================          |  86%  |                                                                              |=============================================================         |  86%  |                                                                              |=============================================================         |  87%  |                                                                              |=============================================================         |  88%  |                                                                              |==============================================================        |  88%  |                                                                              |==============================================================        |  89%  |                                                                              |===============================================================       |  89%  |                                                                              |===============================================================       |  90%  |                                                                              |===============================================================       |  91%  |                                                                              |================================================================      |  91%  |                                                                              |================================================================      |  92%  |                                                                              |=================================================================     |  92%  |                                                                              |=================================================================     |  93%  |                                                                              |==================================================================    |  94%  |                                                                              |==================================================================    |  95%  |                                                                              |===================================================================   |  95%  |                                                                              |===================================================================   |  96%  |                                                                              |====================================================================  |  97%  |                                                                              |====================================================================  |  98%  |                                                                              |===================================================================== |  98%  |                                                                              |===================================================================== |  99%  |                                                                              |======================================================================| 100%
#> Executing SQL took 3.26 secs
#> Fetching data from server
#> Fetching data took 0.312 secs
#> Fetching outcomes from server
#> Loading outcomes took 0.0241 secs
    
    population <- createStudyPopulation(
        plpData,
        outcomeId = outcomeId,
        populationSettings = createStudyPopulationSettings(
            minTimeAtRisk = 1,
            riskWindowEnd = 15) # short TAR end for lower outcome rate
    )
#> outcomeId: 3
#> binary: TRUE
#> includeAllOutcomes: TRUE
#> firstExposureOnly: FALSE
#> washoutPeriod: 0
#> removeSubjectsWithPriorOutcome: TRUE
#> priorOutcomeLookback: 99999
#> requireTimeAtRisk: TRUE
#> minTimeAtRisk: 1
#> restrictTarToCohortEnd: FALSE
#> riskWindowStart: 1
#> startAnchor: cohort start
#> riskWindowEnd: 15
#> endAnchor: cohort start
#> restrictTarToCohortEnd: FALSE
#> Removing subjects with prior outcomes (if any)
#> Removing non outcome subjects with insufficient time at risk (if any)
#> Outcome is 0 or 1
#> Population created with: 1800 observations, 1800 unique subjects and 38 outcomes
#> Population created in 0.044 secs
    
    population$y <- population$outcomeCount
    plpData$covariateData$outcomes <- population
    cyclopsData <- convertToCyclopsData(plpData$covariateData$outcomes,
                                        plpData$covariateData$covariates,
                                        modelType = "lr",
                                        addIntercept = TRUE)
    
    prior <- createPrior("laplace", exclude = c(0), useCrossValidation = FALSE)
    
    control <- createControl(noiseLevel = "quiet",
                             cvRepetitions = 1,
                             tolerance = 1e-9,
                             seed = seed)
    
    out <- capture.output(fit <- fitCyclopsModel(cyclopsData, prior, control))
    
    
    
    # Now fit the same model with starting coefficients
    cyclopsData <- convertToCyclopsData(plpData$covariateData$outcomes,
                                        plpData$covariateData$covariates,
                                        modelType = "lr",
                                        addIntercept = TRUE)
    prior <- createPrior("laplace", exclude = c(0), useCrossValidation = FALSE)
    control <- createControl(noiseLevel = "quiet",
                             cvRepetitions = 1,
                             seed = seed,
                             tolerance = 1e-9)
    ncovars <- length(summary(cyclopsData)$covariateId)
    startingCoefficients <- rep(0, ncovars)
    outcomeRate <- mean(population$y)
    startingCoefficients[1] <- log(outcomeRate / (1 - outcomeRate))
    out2 <- capture.output(fit2 <- fitCyclopsModel(cyclopsData,
                                                   prior,
                                                   control,
                                                   startingCoefficients = startingCoefficients))
    message("Original Cyclops fit complete in ", round(fit$timeFit, 4), " seconds. And ",
            fit$iterations, " iterations. With logLikelihood: ", fit$log_likelihood)
#> Original Cyclops fit complete in 0.018 seconds. And 125 iterations. With logLikelihood: -166.622177178957
    
    message("Modified Cyclops fit complete in ", round(fit2$timeFit, 4), " seconds. And ",
            fit2$iterations, " iterations. With logLikelihood: ", fit2$log_likelihood)
#> Modified Cyclops fit complete in 0.0077 seconds. And 67 iterations. With logLikelihood: -166.622177181422

Created on 2025-01-29 with reprex v2.1.1

Pure Cyclops reprex:

library(Cyclops)
    
    # modify https://github.com/OHDSI/Cyclops/blob/develop/R/Simulation.R#L80C1-L80C63
    # strataBackgroundProb <- runif(nstrata,min=0.01,max=0.03)
    
    seed <- 666
    set.seed(seed)
    
    nrows <- 10000
    ncovars <- 500
    
    data <- simulateCyclopsData(nstrata = 1,
                                nrows = nrows,
                                ncovars = ncovars,
                                model = "logistic")
#> Sparseness = 98.99538 %
    
    cyclopsData <- convertToCyclopsData(data$outcomes,
                                        data$covariates,
                                        modelType = "lr",
                                        addIntercept = TRUE)
#> Sorting covariates by covariateId and rowId
    
    prior <- createPrior("laplace", exclude = c(0), useCrossValidation = FALSE)
    
    control <- createControl(noiseLevel = "quiet",
                             cvType = "auto",
                             cvRepetitions = 1,
                             seed = seed)
    out <- capture.output(fit <- fitCyclopsModel(cyclopsData, prior, control))
    
    
    # Now fit the same model with starting coefficients
    cyclopsData <- convertToCyclopsData(data$outcomes,
                                        data$covariates,
                                        modelType = "lr",
                                        addIntercept = TRUE)
#> Sorting covariates by covariateId and rowId
    prior <- createPrior("laplace", exclude = c(0), useCrossValidation = FALSE)
    control <- createControl(noiseLevel = "quiet",
                             cvRepetitions = 1,
                             seed = seed)
    startingCoefficients <- rep(0, ncovars + 1)
    outcomeRate <- mean(data$outcomes$y)
    startingCoefficients[1] <- log(outcomeRate / (1 - outcomeRate))
    out2 <- capture.output(fit2 <- fitCyclopsModel(cyclopsData,
                                                   prior,
                                                   control,
                                                   startingCoefficients = startingCoefficients))
    
    message("Original Cyclops fit complete in ", round(fit$timeFit, 4), " seconds. And ",
            fit$iterations, " iterations. With logLikelihood: ", fit$log_likelihood)
#> Original Cyclops fit complete in 0.0686 seconds. And 17 iterations. With logLikelihood: -756.202262114616
    
    message("Modified Cyclops fit complete in ", round(fit2$timeFit, 4), " seconds. And ",
            fit2$iterations, " iterations. With logLikelihood: ", fit2$log_likelihood)
#> Modified Cyclops fit complete in 0.0472 seconds. And 12 iterations. With logLikelihood: -756.202248337887

Created on 2025-01-29 with reprex v2.1.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant