How to sort a dataframe based on column or row values
df.sort_values(by=<[field(s)]>[, <extra-arguments>])
You can sort an entire dataframe with the sort_values()
method. It takes only one mandatory argument, which can be a single column
(as a string) or a list of columns.
Unlike sort_index, it sorts based on the values of a column or row.
If a list is provided, sorting is performed in the order of the elements in the list.
by
: A single column (as a string) or a list of columns to sort on
ascending
: Receives a boolean or a list of booleans
- If True, it sorts in ascending order (lower to higher)
- If False, it sorts in descending order (higher to lower)
key
: A function applied to the column values before sorting
axis
: Indicates whether to sort by rows (axis=0) or columns (axis=1)
- You can also use columns passed as a string
na_position
: Whether you want rows with NaN to appear first
kind
: The type of sorting algorithm to use
- This is ignored when sorting on more than one column
- quicksort: Default sorting algorithm
- mergesort: Stable sorting algorithm
- heapsort: Heap style sorting algorithm
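The axis argument is the least obvious of the options listed above, so here is a minimal sketch of it. The dataframe, its row labels ("row1", "row2") and its column names are made up for illustration. With axis=1, by refers to a row label, and the columns are reordered according to the values in that row.

```python
import pandas as pd

# Hypothetical data: three numeric columns and two labelled rows
df = pd.DataFrame(
    {"a": [3, 1], "b": [1, 2], "c": [2, 0]},
    index=["row1", "row2"],
)

# Reorder the columns by the values found in the row labelled "row1"
by_row = df.sort_values(by="row1", axis=1)

column_order = by_row.columns.tolist()
print(column_order)  # ['b', 'c', 'a']
```

The rows stay where they are; only the column order changes.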
The simplest way to sort a dataframe is to pass the name of a column to
sort_values(). This sorts the rows in ascending order of that column's
values by default.
Sorting does not mutate your original dataframe; it returns a new sorted dataframe.
sorted_df = df.sort_values("Scores")
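To check that the original dataframe is left untouched, here is a small self-contained sketch. The sample names and scores are invented for the example; only the "Scores" column name comes from the line above.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Name": ["Ana", "Luis", "Eva"], "Scores": [82, 95, 67]})

sorted_df = df.sort_values("Scores")

# The original keeps its row order; only the returned dataframe is sorted
original_scores = df["Scores"].tolist()
sorted_scores = sorted_df["Scores"].tolist()

print(original_scores)  # [82, 95, 67]
print(sorted_scores)    # [67, 82, 95]
```

Note that the sorted dataframe keeps the original index labels, so its index reads 2, 0, 1 here.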
You can sort a dataframe according to the values of more than one column by
passing a list as the by argument. The sorting follows the order of the
elements in the list.
import pandas as pd
df_exams = pd.read_csv('StudentsPerformance.csv')
df_exams.sort_values(by=['math score', 'lang score'], ascending=[False, True])
# Output
# The dataframe sorted first by 'math score' in descending order
# and then by 'lang score' in ascending order
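If you do not have the CSV file at hand, the same behaviour can be reproduced with a small in-memory dataframe. The student labels and scores below are made up; the column names follow the example above.

```python
import pandas as pd

# Hypothetical data with ties in 'math score' so the second key matters
df_exams = pd.DataFrame({
    "student": ["a", "b", "c", "d"],
    "math score": [90, 75, 90, 75],
    "lang score": [70, 88, 65, 60],
})

result = df_exams.sort_values(
    by=["math score", "lang score"],
    ascending=[False, True],  # one flag per column in `by`
)

ordered_students = result["student"].tolist()
print(ordered_students)  # ['c', 'a', 'd', 'b']
```

Rows with the same math score (90 or 75) are ordered among themselves by the language score, lowest first.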
You can sort on a different criterion by passing a function through the key
argument. The function processes the values of the column and returns the
values to sort on.
import pandas as pd
df_exams = pd.read_csv('StudentsPerformance.csv')
df_exams.sort_values(by='race', ascending=True, key=lambda col: col.str.lower())
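The effect of the key function is easier to see on a few rows with mixed capitalization. The values below are invented for the example; the key argument itself assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical values with inconsistent capitalization
df = pd.DataFrame({"race": ["group B", "Group A", "group C", "Group D"]})

# Default string sorting puts uppercase letters before lowercase ones
default_order = df.sort_values(by="race")["race"].tolist()

# The key receives the whole column (a Series) and returns the values to compare
case_insensitive = df.sort_values(
    by="race", key=lambda col: col.str.lower()
)["race"].tolist()

print(default_order)     # ['Group A', 'Group D', 'group B', 'group C']
print(case_insensitive)  # ['Group A', 'group B', 'group C', 'Group D']
```

The key only affects the comparison; the values stored in the result are unchanged.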
You can change the sorting algorithm with the kind argument. It can take any
of the following sorting algorithms:
quicksort
: Default sorting algorithm
mergesort
: Stable sorting algorithm
- It will maintain the original order of the records with the same key
- Necessary if you plan to perform multiple sorts
heapsort
: Heap style sorting algorithm
This will be ignored if you are sorting on more than one column or label.
df.sort_values(
by="cityId",
ascending=False,
kind="mergesort"
)
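A stable sort matters when you chain sorts. The sketch below, using made-up names and city ids, sorts by name first and then by cityId with mergesort, so rows that share a cityId keep their alphabetical order from the first pass.

```python
import pandas as pd

# Hypothetical data: two cities, two people in each
df = pd.DataFrame({
    "cityId": [2, 1, 2, 1],
    "name": ["Dana", "Bob", "Alice", "Carl"],
})

# First pass: alphabetical by name
by_name = df.sort_values(by="name", kind="mergesort")

# Second pass: by cityId; the stable sort preserves the name order within each city
by_city = by_name.sort_values(by="cityId", kind="mergesort")

names = by_city["name"].tolist()
print(names)  # ['Bob', 'Carl', 'Alice', 'Dana']
```

For this particular result, a single call with by=["cityId", "name"] would be the more direct route; the two-step version is only meant to show what stability guarantees.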
By default, missing values go at the end of the dataframe, regardless of the sorting order.
You can use na_position='first'
to put them at the beginning of your dataframe.
df.sort_values(by='Notas', na_position='first')
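As a quick illustration with invented grades, the sketch below compares the default placement of missing values with na_position='first'. Only the "Notas" column name is taken from the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical grades with one missing value (index 1)
df = pd.DataFrame({"Notas": [7.5, np.nan, 9.0, 6.0]})

nan_last_asc = df.sort_values(by="Notas")                    # default: NaN at the end
nan_last_desc = df.sort_values(by="Notas", ascending=False)  # still at the end
nan_first = df.sort_values(by="Notas", na_position="first")  # NaN moved to the top

print(nan_last_asc.index.tolist())   # [3, 0, 2, 1]
print(nan_last_desc.index.tolist())  # [2, 0, 3, 1]
print(nan_first.index.tolist())      # [1, 3, 0, 2]
```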