lang.py.module.numpy.select

Evaluate multiple conditions in a vectorized fashion

Synopsis

conditions = [<array-of-conditions>]
choices = [<array-of-conditional-outputs>]

result = np.select(conditions, choices, <default-value>)

Overview

Given a list of conditional statements, you wrapped them in a list and then get another list but with their, directly mapped, outputs

The outputs can be scalar, values, other vectors or even an evaluated function

You can use this for a better handling of conditions

Keep in mind that the order of conditions to outputs matters since they will be directly mapped

Also remember that it will return the output of the first condition that evaluates to true

Cookbook

Vectorizing multiple conditions

The traditional approach for vectorized condtional replacement would be to use the where function

But when dealing with multiple conditions this can become unsustainable

For this cases you can use the select function instead

conditions = [
    df['Original Record: Date Created'].values == df['Date Created'].values,
    df['normalized_status'].str.startswith('CLI').values,
    df['normalized_status'].isin(list1).values,
    df['normalized_status'].isin(list2).values
]

choices = [
    'New Lead', 
    'Client Lead', 
    'MTouch Lead',
    'EMTouch Lead'
]


df['lead_category1'] = np.select(conditions, choices, default='NA')  # Order of operations matter!

Vectorizing nested multiple conditions

You can vectorize multiple nested conditions by creating adding it as a composite condition into the list (one for each branch)

Considering the following multiple condition function:

def sub_conditional(row):
    if row['Inactive'] == 'No':
        if row['Providers'] == 0:
            return 'active_no_providers'
        elif row['Providers'] < 5:
            return 'active_small'
        else:
            return 'active_normal'
    elif row['duplicate_leads']:
        return 'is_dup'
    else:
        if row['bad_leads']:
            return 'active_bad'
        else:
            return 'active_good'

df['lead_type'] = df.apply(sub_conditional, axis=1)

You can transform it into a vectorized operation:

conditions = [
    ((df['Inactive'].values == 'No') & (df['Providers'].values == 0)),
    ((df['Inactive'].values == 'No') & (df['Providers'].values < 5)),
    df['Inactive'].values == 'No',
    df['duplicate_leads'].values,  # <-- you can also just evaluate boolean arrays
    df['bad_leads'].values,
]

choices = [
    'active_no_providers',
    'active_small',
    'active_normal',
    'is_dup',
    'active_bad',
]

df['lead_type_vec'] = np.select(conditions, choices, default='NA')

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

4yz1.md

4yz1.md

lang.py.module.numpy.select

Synopsis

Overview

Cookbook

Vectorizing multiple conditions

Vectorizing nested multiple conditions

Files

4yz1.md

Latest commit

History

4yz1.md

File metadata and controls

lang.py.module.numpy.select

Synopsis

Overview

Cookbook

Vectorizing multiple conditions

Vectorizing nested multiple conditions