Learn Pandas
- Pandas cheat sheets: 1, 2, 3
- The Pandas section of the excellent open-source book Python Data Science Handbook is a very good introduction to Pandas
- Pandas axis
- Indexing and Selection
- Operating on dataframes
- Python for Data Analysis - A great book by Wes McKinney (the original creator of Pandas)
After completing the exercises below, you should be comfortable with
- Using Pandas
★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus
A1 - Import Pandas and print out the current version (★☆☆)
A2 - Create an empty dataframe (★☆☆)
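As a starting point for A1 and A2, a minimal sketch might look like the following. The variable name `empty` is only an illustration.

```python
import pandas as pd

# A1: the installed version is exposed as a string attribute on the module.
print(pd.__version__)

# A2: calling the constructor with no arguments gives a dataframe
# with no rows and no columns.
empty = pd.DataFrame()
print(empty.empty)
```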
A3 - Create a dataframe from a dictionary
Create a dictionary like this first, then convert it into a dataframe:
d = {'a': [1, 2],
'b': [3, 4] }
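One way to approach A3, as a sketch: pass the dictionary straight to the `pd.DataFrame` constructor. Each key becomes a column name and each list becomes that column's values. The variable name `df` is an assumption for illustration.

```python
import pandas as pd

# Dictionary from the exercise: keys are column names, lists are column values.
d = {'a': [1, 2],
     'b': [3, 4]}

# The constructor accepts a dict of equal-length lists directly.
df = pd.DataFrame(d)
print(df)
```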
A4 - Create the following dataframe (★☆☆)
Expected output:
            city  population  rainfall
0       San Jose          10      15.5
1  San Francisco           5      10.2
2    Los Angeles          30       5.5
3        Seattle           7      50.5
A5 - Print the shape of above dataframe (★☆☆)
Hint: df.shape (shape is an attribute of the dataframe, not of the pd module)
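A possible sketch for A4 and A5, assuming the dataframe is built from a dictionary of columns and stored in a variable called `cities` (the name used in a later hint):

```python
import pandas as pd

# Build the dataframe column by column; values are taken from the expected output.
cities = pd.DataFrame({
    'city': ['San Jose', 'San Francisco', 'Los Angeles', 'Seattle'],
    'population': [10, 5, 30, 7],
    'rainfall': [15.5, 10.2, 5.5, 50.5],
})

# shape is an attribute (no parentheses) returning (rows, columns).
print(cities.shape)  # (4, 3)
```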
B1 - In the above dataframe, print out the population column (★☆☆)
Expected output:
0 10
1 5
2 30
3 7
B2 - Print out column 2 (counting from zero, i.e. rainfall)
Expected output:
0 15.5
1 10.2
2 5.5
3 50.5
B3 - In the previous example, just print out the city names, without index (★☆☆)
Hint : values
Expected output: ['San Jose', 'San Francisco', 'Los Angeles', 'Seattle']
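The three column exercises above (B1 to B3) could be sketched as follows. This is one approach among several; the variable names are illustrative, and the dataframe is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

cities = pd.DataFrame({
    'city': ['San Jose', 'San Francisco', 'Los Angeles', 'Seattle'],
    'population': [10, 5, 30, 7],
    'rainfall': [15.5, 10.2, 5.5, 50.5],
})

# B1: select a column by its label; the result is a Series with the index attached.
population = cities['population']
print(population)

# B2: select a column by zero-based position with iloc (all rows, column 2).
column_2 = cities.iloc[:, 2]
print(column_2)

# B3: .values drops the index and returns only the underlying array of values.
names = cities['city'].values
print(list(names))
```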
B4 - Print out the row for San Francisco (★☆☆)
Hint: iloc
Expected output:
city San Francisco
population 5
rainfall 10.2
B5 - Print out the rainfall number for Seattle (★☆☆)
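For B4 and B5, a hedged sketch: `iloc` picks a row by position, while `loc` with a boolean condition finds a row by its contents. Looking Seattle up by name rather than by position is a choice made here, not something the exercise requires.

```python
import pandas as pd

cities = pd.DataFrame({
    'city': ['San Jose', 'San Francisco', 'Los Angeles', 'Seattle'],
    'population': [10, 5, 30, 7],
    'rainfall': [15.5, 10.2, 5.5, 50.5],
})

# B4: San Francisco sits at position 1, so iloc[1] returns that row as a Series.
sf_row = cities.iloc[1]
print(sf_row)

# B5: filter rows by city name, keep the rainfall column, take the first match.
seattle_rain = cities.loc[cities['city'] == 'Seattle', 'rainfall'].iloc[0]
print(seattle_rain)
```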
C1 - Set population for San Francisco to 12 (★☆☆)
Hint: iloc
Expected output:
            city  population  rainfall
0       San Jose          10      15.5
1  San Francisco          12      10.2
2    Los Angeles          30       5.5
3        Seattle           7      50.5
C2 - Add a new row as follows (★☆☆)
city: 'San Diego', population: 8, rainfall: 7.5
Hint: DataFrame.append (removed in pandas 2.0; use pd.concat there)
Expected output:
            city  population  rainfall
0       San Jose          10      15.5
1  San Francisco          12      10.2
2    Los Angeles          30       5.5
3        Seattle           7      50.5
0      San Diego           8       7.5
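A sketch of C1 and C2 that also works on recent pandas releases. `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is used here as a substitute. The name `new_row` is illustrative.

```python
import pandas as pd

cities = pd.DataFrame({
    'city': ['San Jose', 'San Francisco', 'Los Angeles', 'Seattle'],
    'population': [10, 5, 30, 7],
    'rainfall': [15.5, 10.2, 5.5, 50.5],
})

# C1: assign through iloc using (row position, column position).
cities.iloc[1, 1] = 12

# C2: wrap the new row in a one-row dataframe and concatenate.
new_row = pd.DataFrame({'city': ['San Diego'], 'population': [8], 'rainfall': [7.5]})
cities = pd.concat([cities, new_row])
print(cities)
```

The new row keeps its own index label 0, which matches the expected output above. Passing `ignore_index=True` to `pd.concat` would renumber the rows 0 to 4 instead.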
D1 - Print rows where population > 10 (★★☆)
Hint: cities['population'] > 10 and cities.loc
Expected output:
            city  population  rainfall
1  San Francisco          12      10.2
2    Los Angeles          30       5.5
D2 - Print rows where population > 10 and rainfall < 10 (★★☆)
Expected output:
          city  population  rainfall
2  Los Angeles          30       5.5
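Both filters can be sketched with boolean masks. The dataframe below is rebuilt with San Francisco already at 12 and the San Diego row included, so the snippet stands alone. The names `big` and `big_dry` are illustrative.

```python
import pandas as pd

cities = pd.DataFrame(
    {
        'city': ['San Jose', 'San Francisco', 'Los Angeles', 'Seattle', 'San Diego'],
        'population': [10, 12, 30, 7, 8],
        'rainfall': [15.5, 10.2, 5.5, 50.5, 7.5],
    },
    index=[0, 1, 2, 3, 0],
)

# D1: the comparison yields a True/False Series; loc keeps the True rows.
big = cities.loc[cities['population'] > 10]
print(big)

# D2: combine conditions with & and wrap each one in parentheses,
# because & binds more tightly than the comparison operators.
big_dry = cities.loc[(cities['population'] > 10) & (cities['rainfall'] < 10)]
print(big_dry)
```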
E1 - Read the CSV file (★☆☆)
Read house-sales-sample.csv.
Hint: pd.read_csv
E2 - Print out the column types of the above dataframe (★☆☆)
Hint: df.dtypes
E3 - Print out the information about the above dataframe. Note the datatypes and memory usage (★☆☆)
Hint: df.info()
E4 - Count how many sales have 'Bedrooms = 4' (★☆☆)
Hint: query, df indexing, or size
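The E exercises can be tried without the real file. The sketch below reads a small made-up CSV from memory. Only the Bedrooms column name comes from the exercise; the other columns and all values are invented stand-ins, not the contents of house-sales-sample.csv.

```python
import io
import pandas as pd

# Made-up stand-in data; the real file will have different columns and rows.
csv_text = """SaleID,Bedrooms,SalePrice
1,3,250000
2,4,410000
3,4,395000
4,2,180000
"""

# E1: read_csv accepts a file path or any file-like object.
sales = pd.read_csv(io.StringIO(csv_text))

# E2 and E3: column types, then a summary including memory usage.
print(sales.dtypes)
sales.info()

# E4: two equivalent counts, one with query and one with a boolean mask.
n_query = len(sales.query('Bedrooms == 4'))
n_mask = int((sales['Bedrooms'] == 4).sum())
print(n_query, n_mask)
```

For the real exercise, replace the `io.StringIO(...)` argument with the path to house-sales-sample.csv.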
- https://www.w3resource.com/python-exercises/pandas/index.php
- https://www.machinelearningplus.com/python/101-pandas-exercises-python/
- https://github.com/guipsamora/pandas_exercises