Extract elements, rows or columns by indices
df.<accessor>[<rows>, <columns>]
You can get elements, rows or columns by using an accessor that references them by row and column.
Pandas defines four accessors that select rows or columns from your DataFrame and return them as a Series, a DataFrame, or a single value.
You can use the following accessors to retrieve rows or columns:
loc[]: Retrieves rows or columns referenced by their labels
iloc[]: Retrieves rows or columns referenced by their integer positions
- It also supports negative positions to count back from the last element
at[]: Takes a row label and a column label and returns a single value
iat[]: Takes a row position and a column position and returns a single value
- You can use it like Excel's INDEX() function, which returns the value at a given row and column number
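As a minimal sketch, the four accessors can be compared on a small DataFrame. The data below is hypothetical, chosen so that row label 12 sits at position 2, which matches the 'Ana' examples later in this section.

```python
import pandas as pd

# Hypothetical sample data: row labels start at 10, so label 12 is at position 2
df = pd.DataFrame(
    {
        "name": ["Xavier", "Ann", "Ana", "Jana", "Yi", "Robin"],
        "city": ["Mexico City", "Toronto", "Prague", "Shanghai", "Manchester", "Cairo"],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0],
    },
    index=range(10, 16),
)

row_by_label = df.loc[12]            # Series: the row labeled 12
row_by_position = df.iloc[2]         # Series: the row at position 2 (same row)
last_row = df.iloc[-1]               # negative positions count back from the end
value_by_label = df.at[12, "name"]   # single value looked up by labels
value_by_position = df.iat[2, 0]     # single value looked up by positions

print(value_by_label, value_by_position, last_row["name"])  # Ana Ana Robin
```

Both row lookups return a Series, while at[] and iat[] return a plain value.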
The loc[] and iloc[] accessors support slicing.
Each part (<rows> and <columns>) accepts either a slice or a list/array, and you can mix the two in the same call. When you slice by label with loc[], both ends of the slice are inclusive; when you slice by position with iloc[], the end is exclusive, as in regular Python slicing.
df.loc[:, 'city'] # Get all the rows from the city column
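To make the inclusive/exclusive difference concrete, here is a small sketch on a hypothetical DataFrame with string row labels, where a label slice and a position slice over the "same" range return different rows.

```python
import pandas as pd

# Hypothetical DataFrame with string row labels
df = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=["a", "b", "c", "d", "e"])

by_label = df.loc["b":"d"]     # label slice: both 'b' and 'd' are included
by_position = df.iloc[1:3]     # position slice: positions 1 and 2, position 3 excluded

print(list(by_label.index))     # ['b', 'c', 'd']
print(list(by_position.index))  # ['b', 'c']
```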
In this example we select the rows labeled 11 to 15 (positions 1 to 5) from the first and second columns:
df.loc[11:15, ['name', 'city']] # Referenced by label
df.iloc[1:6, [0, 1]] # Referenced by index number
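A runnable sketch of the two calls above, assuming hypothetical data whose row labels run from 10 to 15 (so each label is its position plus 10). Under that assumption both calls select exactly the same block.

```python
import pandas as pd

# Hypothetical sample data with row labels 10..15 (positions 0..5)
df = pd.DataFrame(
    {
        "name": ["Xavier", "Ann", "Ana", "Jana", "Yi", "Robin"],
        "city": ["Mexico City", "Toronto", "Prague", "Shanghai", "Manchester", "Cairo"],
    },
    index=range(10, 16),
)

by_label = df.loc[11:15, ["name", "city"]]  # labels 11..15, end label included
by_position = df.iloc[1:6, [0, 1]]          # positions 1..5, stop position 6 excluded

print(by_label.equals(by_position))  # True
```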
To get a single value at a given row and column, it's best to use the at[] and iat[] accessors, which are meant for fast scalar access.
# Referenced by label
df.at[12, 'name'] # Ana
# Referenced by index number
df.iat[2, 0] # Ana
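A short sketch, again on hypothetical data, showing that at[] and iat[] return the same scalar as loc[] and iloc[] with one row and one column, and that they can also be used to set a single value.

```python
import pandas as pd

# Hypothetical sample data: label 12 is at position 2, and 'name' is column 0
df = pd.DataFrame(
    {"name": ["Xavier", "Ann", "Ana"], "py-score": [88.0, 79.0, 81.0]},
    index=[10, 11, 12],
)

# at[]/iat[] give the same scalar as loc[]/iloc[] for a single cell
assert df.at[12, "name"] == df.loc[12, "name"] == "Ana"
assert df.iat[2, 0] == df.iloc[2, 0] == "Ana"

# They also accept assignment of a single value
df.at[12, "py-score"] = 90.0
print(df.iat[2, 1])  # 90.0
```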
After selecting part of your DataFrame with any accessor, you can assign new values to it to modify the existing ones.
In this example we set the 'py-score' value of every row up to and including label 13 from a list, and set the remaining rows to 0:
df.loc[:13, 'py-score'] = [40, 50, 60, 70]
df.loc[14:, 'py-score'] = 0
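A runnable sketch of the assignment, assuming hypothetical row labels 10 to 15. Because the label slice :13 is inclusive, it covers labels 10, 11, 12 and 13, which is why the list holds exactly four values; a single scalar such as 0 is broadcast to every selected row.

```python
import pandas as pd

# Hypothetical sample data with row labels 10..15
df = pd.DataFrame({"py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0]}, index=range(10, 16))

df.loc[:13, "py-score"] = [40, 50, 60, 70]  # labels 10..13: one value per row
df.loc[14:, "py-score"] = 0                 # labels 14 and 15: scalar broadcast

print(df["py-score"].tolist())  # [40.0, 50.0, 60.0, 70.0, 0.0, 0.0]
```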
You can use a conditional expression as a filter inside your accessor, since the rows part of the accessor accepts any boolean array-like:
filtered_column_df = df.loc[df[<filter-column>] == "<filter-value>", "<accessed-column>"]
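Filling in the template with hypothetical values, here 'city' plays the role of <filter-column>, "Prague" of <filter-value>, and 'name' of <accessed-column>. Note that passing a single column label returns a Series; passing a list such as ["name"] would return a DataFrame instead.

```python
import pandas as pd

# Hypothetical data for the filter template
df = pd.DataFrame(
    {
        "name": ["Xavier", "Ann", "Ana", "Jana"],
        "city": ["Prague", "Toronto", "Prague", "Shanghai"],
    },
    index=range(10, 14),
)

# Rows where city == "Prague", restricted to the 'name' column
filtered_column_df = df.loc[df["city"] == "Prague", "name"]
print(filtered_column_df.tolist())  # ['Xavier', 'Ana']
```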
A conditional expression returns a Series of boolean values, one per row, and loc[] keeps only the rows whose value is True. Any other boolean array of the same length works the same way, so we can also use it to select rows.
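A small sketch on hypothetical data that shows the intermediate boolean Series and confirms that a plain list of booleans selects the same rows.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"py-score": [88.0, 79.0, 81.0, 61.0]}, index=range(10, 14))

mask = df["py-score"] > 80   # the conditional expression...
print(mask.dtype)            # bool  (...returns a boolean Series)
print(mask.tolist())         # [True, False, True, False]

high = df.loc[mask]                                 # keeps rows marked True
high_from_list = df.loc[[True, False, True, False]]  # same selection from a plain list
print(high.equals(high_from_list))  # True
```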
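In this example we get the average price of the rows whose hour falls between 5 pm and 10 pm (hours 17 to 21 on a 24-hour clock):
df['date_time'] = pd.to_datetime(df['date_time'])  # .hour needs datetime values
df.set_index('date_time', inplace=True)
peak_hours = df.index.hour.isin(range(17, 22))  # hours 17, 18, ..., 21
df.loc[peak_hours, 'average_price']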
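A self-contained sketch of the peak-hours filter, assuming hypothetical hourly data for a single day in which each price simply equals its hour, so the selected rows are easy to check. The column names 'date_time' and 'average_price' mirror the example; the values are made up.

```python
import pandas as pd

# Hypothetical hourly prices for one day: 24 evenly spaced timestamps, 00:00..23:00
df = pd.DataFrame(
    {
        "date_time": pd.date_range("2023-01-01 00:00", "2023-01-01 23:00", periods=24),
        "average_price": [float(h) for h in range(24)],  # price equals the hour
    }
)
df.set_index("date_time", inplace=True)

# Boolean NumPy array: True for 17:00 through 21:59
peak_hours = df.index.hour.isin(range(17, 22))
evening = df.loc[peak_hours, "average_price"]
print(evening.tolist())  # [17.0, 18.0, 19.0, 20.0, 21.0]
```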
The Series object also has accessors for specific data types, including:
cat: Working with categorical data
str: Working with string values in a vectorized way
dt: Working with datetime values
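A minimal sketch of the three accessors on small hypothetical Series, one per data type.

```python
import pandas as pd

s_cat = pd.Series(["low", "high", "low"], dtype="category")
s_str = pd.Series(["Ana", "Jana", "Yi"])
s_dt = pd.Series(pd.to_datetime(["2023-01-01 17:30", "2023-06-15 08:00"]))

print(list(s_cat.cat.categories))  # ['high', 'low']  (inferred categories are sorted)
print(s_str.str.upper().tolist())  # ['ANA', 'JANA', 'YI']
print(s_dt.dt.hour.tolist())       # [17, 8]
```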
You can combine the iloc[] accessor with the get_loc() method of an Index object to get the same result as with loc[].
In this example we set the value at row position 0 of the 'py-score' column to 13:
df.iloc[0, df.columns.get_loc('py-score')] = 13
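A sketch on hypothetical data showing that get_loc() turns a column label into its integer position, so the iloc[] call above writes to the same cell that loc[] addresses by label. For a unique label get_loc() returns an int; with duplicate labels it can return a slice or boolean mask instead.

```python
import pandas as pd

# Hypothetical data with non-default row labels
df = pd.DataFrame({"name": ["Xavier", "Ann"], "py-score": [88.0, 79.0]}, index=[10, 11])

col_pos = df.columns.get_loc("py-score")  # integer position of the column label
print(col_pos)                            # 1

df.iloc[0, col_pos] = 13                  # row by position, column looked up by label
print(df.loc[10, "py-score"])             # 13.0  (the row at position 0 has label 10)
```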