Evaluate multiple conditions in a vectorized
fashion
conditions = [<array-of-conditions>]
choices = [<array-of-conditional-outputs>]
result = np.select(conditions, choices, <default-value>)
Given a list of conditional statements, you wrapped them
in a list
and then get another list
but with their,
directly mapped, outputs
The outputs can be scalar, values, other vectors or even an evaluated function
You can use this for a better handling of conditions
Keep in mind that the order of conditions to outputs matters since they will be directly mapped
Also remember that it will return the output of the
first condition that evaluates to true
The traditional approach for vectorized
condtional replacement
would be to use the where function
But when dealing with multiple conditions this can become unsustainable
For this cases you can use the select
function instead
conditions = [
df['Original Record: Date Created'].values == df['Date Created'].values,
df['normalized_status'].str.startswith('CLI').values,
df['normalized_status'].isin(list1).values,
df['normalized_status'].isin(list2).values
]
choices = [
'New Lead',
'Client Lead',
'MTouch Lead',
'EMTouch Lead'
]
df['lead_category1'] = np.select(conditions, choices, default='NA') # Order of operations matter!
You can vectorize multiple nested conditions by creating adding it as a composite condition into the list (one for each branch)
Considering the following multiple condition function:
def sub_conditional(row):
if row['Inactive'] == 'No':
if row['Providers'] == 0:
return 'active_no_providers'
elif row['Providers'] < 5:
return 'active_small'
else:
return 'active_normal'
elif row['duplicate_leads']:
return 'is_dup'
else:
if row['bad_leads']:
return 'active_bad'
else:
return 'active_good'
df['lead_type'] = df.apply(sub_conditional, axis=1)
You can transform it into a vectorized
operation:
conditions = [
((df['Inactive'].values == 'No') & (df['Providers'].values == 0)),
((df['Inactive'].values == 'No') & (df['Providers'].values < 5)),
df['Inactive'].values == 'No',
df['duplicate_leads'].values, # <-- you can also just evaluate boolean arrays
df['bad_leads'].values,
]
choices = [
'active_no_providers',
'active_small',
'active_normal',
'is_dup',
'active_bad',
]
df['lead_type_vec'] = np.select(conditions, choices, default='NA')