Skip to content

Latest commit

 

History

History
110 lines (82 loc) · 2.69 KB

4yz1.md

File metadata and controls

110 lines (82 loc) · 2.69 KB

lang.py.module.numpy.select

Evaluate multiple conditions in a vectorized fashion

Synopsis

conditions = [<array-of-conditions>]
choices = [<array-of-conditional-outputs>]

result = np.select(conditions, choices, <default-value>)

Overview

Given a list of conditional statements, you wrapped them in a list and then get another list but with their, directly mapped, outputs

The outputs can be scalar, values, other vectors or even an evaluated function

You can use this for a better handling of conditions

Keep in mind that the order of conditions to outputs matters since they will be directly mapped

Also remember that it will return the output of the first condition that evaluates to true

Cookbook

Vectorizing multiple conditions

The traditional approach for vectorized condtional replacement would be to use the where function

But when dealing with multiple conditions this can become unsustainable

For this cases you can use the select function instead

conditions = [
    df['Original Record: Date Created'].values == df['Date Created'].values,
    df['normalized_status'].str.startswith('CLI').values,
    df['normalized_status'].isin(list1).values,
    df['normalized_status'].isin(list2).values
]

choices = [
    'New Lead', 
    'Client Lead', 
    'MTouch Lead',
    'EMTouch Lead'
]


df['lead_category1'] = np.select(conditions, choices, default='NA')  # Order of operations matter!

Vectorizing nested multiple conditions

You can vectorize multiple nested conditions by creating adding it as a composite condition into the list (one for each branch)

Considering the following multiple condition function:

def sub_conditional(row):
    if row['Inactive'] == 'No':
        if row['Providers'] == 0:
            return 'active_no_providers'
        elif row['Providers'] < 5:
            return 'active_small'
        else:
            return 'active_normal'
    elif row['duplicate_leads']:
        return 'is_dup'
    else:
        if row['bad_leads']:
            return 'active_bad'
        else:
            return 'active_good'

df['lead_type'] = df.apply(sub_conditional, axis=1)

You can transform it into a vectorized operation:

conditions = [
    ((df['Inactive'].values == 'No') & (df['Providers'].values == 0)),
    ((df['Inactive'].values == 'No') & (df['Providers'].values < 5)),
    df['Inactive'].values == 'No',
    df['duplicate_leads'].values,  # <-- you can also just evaluate boolean arrays
    df['bad_leads'].values,
]

choices = [
    'active_no_providers',
    'active_small',
    'active_normal',
    'is_dup',
    'active_bad',
]

df['lead_type_vec'] = np.select(conditions, choices, default='NA')