This repository has been archived by the owner on Mar 20, 2021. It is now read-only.

Unable to ce.fit_transform on the test set #2

Open
lavpy opened this issue Sep 5, 2020 · 1 comment

Comments


lavpy commented Sep 5, 2020

Hello Shivanand:

I am trying to use your library in a Kaggle competition (https://www.kaggle.com/c/house-prices-advanced-regression-techniques). I transformed the training set with the following code:

embeddings = ce.get_embeddings(X_train, y_train, categorical_embedding_info=embedding_info, is_classification=True, epochs=100,batch_size=256)

The code above returned the embeddings. I then tried to transform the test set as follows:

test_transformed = ce.fit_transform(X_test, embeddings=embeddings, encoders=encoders, drop_categorical_vars=True)

But it raises the following error: You are trying to merge on int32 and object columns. If you wish to proceed you should use pd.concat

@Shivanandroy
Owner

Hi @lavpy, this repo is deprecated and is no longer maintained.

To solve your problem, you may need to downgrade the dependencies:

!pip install tensorflow_addons==0.8.3
!pip install tqdm==4.41.1
!pip install keras==2.3.1
!pip install tensorflow==2.2.0

Then,

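import categorical_embedder as ce
# Identify the categorical columns and an embedding size for each
embedding_info = ce.get_embedding_info(X)
# Label-encode the categorical columns and keep the fitted encoders
X_encoded, encoders = ce.get_label_encoded_data(X)

# Train the network and extract the learned embeddings
embeddings = ce.get_embeddings(X, y, categorical_embedding_info=embedding_info,
                               is_classification=True, epochs=100, batch_size=256)
# Convert the embeddings into DataFrames, one per categorical variable
embeddings_df = ce.get_embeddings_in_dataframe(embeddings, encoders)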

Now embeddings_df holds the embeddings of every categorical variable. You can access them like this:

embeddings_df['education']

                  education_embedding_0  education_embedding_1
Bachelor's                     0.226899               0.150172
Below Secondary                0.438177               0.406307
Master's & above               0.071212               0.054443

Now just map these embeddings onto the categorical variables in your data.
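
A minimal sketch of that mapping step with plain pandas, in case it helps. It assumes embeddings_df behaves like a dict of DataFrames keyed by column name and indexed by category label, as the output above suggests. The helper name map_embeddings and the toy X_test are hypothetical and not part of categorical_embedder. Both the lookup keys and the embedding index are cast to str before the lookup, which should avoid dtype mismatches such as int32 versus object.

```python
import pandas as pd


def map_embeddings(df, embeddings_df, drop_categorical_vars=True):
    """Hypothetical helper: attach each column's embedding vectors to df.

    embeddings_df is assumed to map a column name to a DataFrame whose
    index holds the category labels and whose columns hold the embedding
    dimensions.
    """
    out = df.copy()
    for col, emb in embeddings_df.items():
        if col not in out.columns:
            continue
        emb = emb.copy()
        # Compare labels as strings on both sides to avoid dtype mismatches
        emb.index = emb.index.astype(str)
        keys = out[col].astype(str).to_numpy()
        # One embedding row per data row; unseen categories become NaN
        mapped = emb.reindex(keys)
        mapped.index = out.index
        out = pd.concat([out, mapped], axis=1)
        if drop_categorical_vars:
            out = out.drop(columns=col)
    return out


# Toy data: embedding values copied from the output above, rows invented
embeddings_df = {
    "education": pd.DataFrame(
        {
            "education_embedding_0": [0.226899, 0.438177, 0.071212],
            "education_embedding_1": [0.150172, 0.406307, 0.054443],
        },
        index=["Bachelor's", "Below Secondary", "Master's & above"],
    )
}
X_test = pd.DataFrame(
    {"education": ["Master's & above", "Bachelor's", "PhD"], "age": [41, 29, 35]}
)
X_test_mapped = map_embeddings(X_test, embeddings_df)
print(X_test_mapped)
```

Categories that appear only in the test set (the invented "PhD" row here) end up with NaN embeddings, so you would still need to decide how to fill them before modelling.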
