How does Collie deal with mixed datatypes in df, such as cont, cat, text, datetime? #40
-
Hi Collie team, great work on making this amazing tool. I like how clean and simple the API is. I have a couple of questions about how to use Collie to prep input data with various data types, such as continuous, categorical (both nominal and ordinal), text, datetime, list (of tags), image, etc.
Learning with mixed datatypes is called multimodal learning. Multiple packages (Google AutoML Tables, AWS AutoGluon) can already solve classification and regression problems this way. Here I am interested in seeing such a feature for building recommendation systems.
Replies: 1 comment
-
Hi @wjlgatech.

Great question, I'll do my best to answer this!

Our plan with Collie was to build a flexible framework for recommendations such that we could include mixed data in the model with ease (images, text, and even tabular data). Right now, the only supported way to do this is to include it as item metadata that is passed to a hybrid model (see the docs here for information on that).
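To make the data-prep side of your question concrete, here's a minimal sketch of collapsing mixed-dtype item features into one numeric matrix. This is plain pandas/scikit-learn/NumPy, not Collie's API, and the columns are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical item-level DataFrame, one row per item ID, with mixed dtypes.
items = pd.DataFrame({
    'price': [9.99, 24.50, 3.25],                      # continuous
    'category': ['shoes', 'shirts', 'shoes'],          # nominal categorical
    'description': ['red running shoe', 'cotton tee',  # free text
                    'kids sandal'],
    'released': pd.to_datetime(['2020-01-15', '2021-06-01', '2019-11-20']),
})

# Continuous columns: standardize.
cont = StandardScaler().fit_transform(items[['price']])

# Nominal categoricals: one-hot encode.
cat = pd.get_dummies(items['category']).to_numpy(dtype=np.float32)

# Text: any fixed-width representation works; TF-IDF is a simple stand-in
# for real text embeddings.
text = TfidfVectorizer().fit_transform(items['description']).toarray()

# Datetimes: convert to numeric features (days since epoch, month, etc.).
days = (items['released'] - pd.Timestamp('1970-01-01')).dt.days.to_numpy()
dates = StandardScaler().fit_transform(
    np.column_stack([days, items['released'].dt.month]))

# One dense num_items x num_features matrix, rows aligned with item IDs,
# ready to hand to a hybrid model as item metadata.
item_metadata = np.hstack([cont, cat, text, dates]).astype(np.float32)
```

The only real requirement is that the rows line up with the item IDs used in your interactions.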
Basically, with this, the model takes in the user embedding and item embedding it is trained to optimize for, concatenates these with the item metadata (i.e. image embeddings, text embeddings, etc.), and uses this full representation to predict the ranking for that user-item pair. This isn't perfect or fully complete, but it's the start we currently have implemented in the library.
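For illustration, here's a toy PyTorch version of that concatenation. It is not Collie's actual implementation; the class, layer sizes, and MLP head are all made up for the sketch:

```python
import torch
from torch import nn

class HybridScorer(nn.Module):
    """Toy version of the flow described above: learned embeddings are
    concatenated with fixed item metadata, and an MLP scores the pair."""

    def __init__(self, num_users, num_items, embed_dim, item_metadata):
        super().__init__()
        self.user_embeddings = nn.Embedding(num_users, embed_dim)
        self.item_embeddings = nn.Embedding(num_items, embed_dim)
        # Non-learnable num_items x num_features metadata matrix.
        self.register_buffer(
            'item_metadata',
            torch.as_tensor(item_metadata, dtype=torch.float32),
        )
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim + item_metadata.shape[1], 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, user_ids, item_ids):
        full_representation = torch.cat([
            self.user_embeddings(user_ids),   # learned user factors
            self.item_embeddings(item_ids),   # learned item factors
            self.item_metadata[item_ids],     # image/text/tabular features
        ], dim=-1)
        return self.mlp(full_representation).squeeze(-1)  # score per pair
```

From there, training against a ranking loss on user-item pairs works the same as for any other embedding model.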
In previous hack weeks, I have experimented with including a full-on language model (something like a fine-tuned BERT model) as a learnable part of the model in a custom architecture. Ideally, it's easy to do this by just inheriting the base model class and overriding the pieces you need; we sorta try to show how this works in the tutorial notebooks.
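As a rough sketch of that hack-week idea (my own illustration using Hugging Face transformers, not something Collie ships), the language model becomes a learnable item encoder whose weights update along with the rest of the recommender:

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class TextAwareScorer(nn.Module):
    """Sketch of a custom architecture with a language model as a
    learnable item encoder, fine-tuned jointly with the recommender."""

    def __init__(self, num_users, embed_dim=64, lm_name='bert-base-uncased'):
        super().__init__()
        self.user_embeddings = nn.Embedding(num_users, embed_dim)
        self.tokenizer = AutoTokenizer.from_pretrained(lm_name)
        self.text_encoder = AutoModel.from_pretrained(lm_name)
        self.project = nn.Linear(self.text_encoder.config.hidden_size, embed_dim)

    def forward(self, user_ids, item_texts):
        # Tokenize raw item text on the fly (fine for a sketch; cache in practice).
        tokens = self.tokenizer(item_texts, padding=True, truncation=True,
                                return_tensors='pt')
        # Use the [CLS] token's hidden state as the item representation.
        cls = self.text_encoder(**tokens).last_hidden_state[:, 0]
        item_vectors = self.project(cls)
        # Score = dot product of user factors and text-derived item factors.
        return (self.user_embeddings(user_ids) * item_vectors).sum(dim=-1)
```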
Future improvements to the library should include more dedicated hybrid models that better encode this data into the model, plus the ability to also include user metadata, so my goal is to one day add these to the library.
Let me know if any of this doesn't make sense, if you have ideas about how we can improve this, or if you want to chat more about the exciting world of multimodal learning. Cheers!