[Bug]: It seems that longer time budgets result in worse outputs #1394

Open
kabeersvohra opened this issue Jan 17, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@kabeersvohra

Describe the bug

I have a dataset on which I have tried to optimise hyperparameters with FLAML, and it seems that the model keeps getting worse the longer I give it. Here is a simple example of the code for the model I am trying to optimise:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.utils.class_weight import compute_sample_weight
from sklearn.metrics import f1_score, confusion_matrix, classification_report, precision_score, recall_score
from flaml import AutoML
import numpy as np
import joblib

def create_and_train_pipeline(X_train, y_train, X_test, y_test, numerical_features, categorical_features, time_budget=60):
    """
    Creates and trains a pipeline without requiring a custom wrapper class
    """
    # First, create and fit the preprocessor
    numeric_transformer = Pipeline(steps=[
        ('scaler', StandardScaler())
    ])
    
    categorical_transformer = Pipeline(steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])
    
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numerical_features),
            ('cat', categorical_transformer, categorical_features)
        ],
        remainder='drop',
        sparse_threshold=0
    )
    
    # Fit the preprocessor first
    X_train_transformed = preprocessor.fit_transform(X_train)
    
    # Train AutoML on the transformed data
    automl = AutoML()
    
    # Train AutoML
    settings = {
        "time_budget": time_budget,
        "task": "classification",
        "estimator_list": ['lgbm', 'rf'],
        "eval_method": "cv",
        "metric": "f1",
        "n_splits": 5,
        "split_type": "stratified"
    }
    
    automl.fit(X_train_transformed, y_train, **settings)
    
    # Create final pipeline with best model
    final_pipeline = Pipeline([
        ('preprocessor', preprocessor),
        ('classifier', automl.model.estimator)  # Use the best model directly
    ])
    
    # Print training results
    print("Best ML model:")
    print(automl.model.estimator)
    print("\nBest hyperparameter configuration:")
    print(automl.best_config)
    # best_loss is the objective FLAML minimises (for the built-in "f1" metric this is a loss, not a score)
    print("\nBest score on validation data: {:.4f}".format(automl.best_loss))
    
    # Generate and print metrics on the held-out test set
    y_pred = final_pipeline.predict(X_test)
    print("\nTest Set Metrics:")
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))
    print("\nConfusion Matrix:")
    print(confusion_matrix(y_test, y_pred))
    
    # Save the pipeline
    joblib.dump(final_pipeline, 'full_prediction_pipeline.joblib')
    
    return final_pipeline, automl

if __name__ == "__main__":
    # X_train, y_train, X_test, y_test are assumed to be loaded and split beforehand
    categorical_features = ['created_on', 'dex_id', 'price_confidence']
    numerical_features = [col for col in X_train.columns if col not in categorical_features]
    
    pipeline, automl = create_and_train_pipeline(
        X_train=X_train,
        y_train=y_train,
        X_test=X_test,
        y_test=y_test,
        numerical_features=numerical_features,
        categorical_features=categorical_features,
        time_budget=35
    )
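
As far as I understand, the "Best score on validation data" line above actually prints automl.best_loss, which is the objective FLAML minimises; for the built-in "f1" metric this should be 1 - f1, so a lower number is better. Something like this should recover the cross-validated f1 of the best configuration:

# assuming best_loss = 1 - f1 for FLAML's built-in "f1" metric
cv_f1 = 1 - automl.best_loss
print("Best cross-validated f1: {:.4f}".format(cv_f1))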

With a budget of 35 seconds this gives a minority-class f1 of 0.37 and a majority-class f1 of 0.96 on the test set:

Best score on validation data: 0.5886

Test Set Metrics:

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.95      0.96       930
           1       0.32      0.45      0.37        49

    accuracy                           0.92       979
   macro avg       0.64      0.70      0.67       979
weighted avg       0.94      0.92      0.93       979


Confusion Matrix:
[[883  47]
 [ 27  22]]

If I increase the budget to 60 seconds I get a minority-class f1 of 0.34 and a majority-class f1 of 0.96:

Best score on validation data: 0.5815

Test Set Metrics:

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.95      0.96       930
           1       0.30      0.39      0.34        49

    accuracy                           0.92       979
   macro avg       0.63      0.67      0.65       979
weighted avg       0.93      0.92      0.93       979


Confusion Matrix:
[[885  45]
 [ 30  19]]

And after 120 seconds, a minority-class f1 of 0.33 and a majority-class f1 of 0.96:

Test Set Metrics:

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.95      0.96       930
           1       0.29      0.39      0.33        49

    accuracy                           0.92       979
   macro avg       0.63      0.67      0.65       979
weighted avg       0.93      0.92      0.93       979


Confusion Matrix:
[[884  46]
 [ 30  19]]

I am wondering why this happens. The error in the logs keeps going down, yet the resulting model is worse. This also happens when I define my own custom metric (negating its output, of course, since FLAML minimises it). Even as the negative number is minimised (its absolute value getting larger), the final confusion matrix gets worse. What am I doing wrong here? Thanks a lot.
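
For reference, the custom metric I tried is roughly shaped like this (a sketch following the custom-metric signature shown in the FLAML docs; the name minority_f1_metric is just mine):

from sklearn.metrics import f1_score

def minority_f1_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train, weight_val=None, weight_train=None,
    *args, **kwargs,
):
    # FLAML minimises the first return value, so return the negated f1
    # (returning 1 - f1 would work equivalently); the dict is only logged.
    y_pred = estimator.predict(X_val)
    val_f1 = f1_score(y_val, y_pred, pos_label=1)
    return -val_f1, {"val_f1": val_f1}

# passed in via: automl.fit(X_train_transformed, y_train, metric=minority_f1_metric, **other_settings)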

Steps to reproduce

No response

Model Used

No response

Expected Behavior

No response

Screenshots and logs

No response

Additional Information

No response

@kabeersvohra kabeersvohra added the bug Something isn't working label Jan 17, 2025
@thinkall
Collaborator

Hi @kabeersvohra , it could be caused by overfitting or randomness. Looking at the confusion matrices, you can see that the numbers are very close.
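
With only 49 positive samples in your test set, a handful of flipped predictions moves the minority-class f1 by several points, so some variation between runs is expected. To rule out randomness you could fix the search seed and rerun with different budgets, e.g. (a minimal sketch; as far as I remember, fit accepts a seed setting):

settings = {
    "time_budget": 120,
    "task": "classification",
    "estimator_list": ["lgbm", "rf"],
    "eval_method": "cv",
    "metric": "f1",
    "n_splits": 5,
    "split_type": "stratified",
    "seed": 42,  # fix the hyperparameter-search seed so runs with different budgets are comparable
}
automl.fit(X_train_transformed, y_train, **settings)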
