Pandas Intro

Objective

Learn Pandas

Reference

Essentials (★☆☆)

More Resources

Checklist

After completing the exercises below, you should be comfortable with

  • Using Pandas

Exercises

Difficulty Level

★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus

A - Creating dataframes

A1 - Import Pandas and print out the current version (★☆☆)

A2 - Create an empty dataframe (★☆☆)

A3 - Create a dataframe from a dictionary
First create a dictionary like the one below, then convert it into a dataframe.

d = {'a': [1, 2],
     'b': [3, 4] }

A4 - Create the following dataframe (★☆☆)

Expected output:

            city  population  rainfall
0       San Jose          10      15.5
1  San Francisco           5      10.2
2    Los Angeles          30       5.5
3        Seattle           7      50.5

A5 - Print the shape of the above dataframe (★☆☆)
Hint: df.shape (shape is an attribute of the dataframe, not of the pd module)
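
The steps in section A can be sketched as follows. This is a minimal illustration on a made-up `fruits` dataframe (the name and its values are assumptions, not part of the exercises), so it shows the calls without solving A4 directly.

```python
import pandas as pd

# Hypothetical data, unrelated to the cities dataframe in the exercises.
data = {
    "fruit": ["apple", "banana", "cherry"],
    "quantity": [3, 12, 40],
    "price": [0.5, 0.25, 0.1],
}

# Each dictionary key becomes a column; each list holds that column's values.
fruits = pd.DataFrame(data)

# An empty dataframe has no rows and no columns.
empty = pd.DataFrame()

print(pd.__version__)  # installed pandas version
print(fruits)
print(fruits.shape)    # (number of rows, number of columns)
```

Note that `shape` is accessed without parentheses because it is an attribute, not a method.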

B - Indexing and Slicing

B1 - In the above dataframe, print out the population column (★☆☆)
Expected output:

0    10
1     5
2    30
3     7

B2 - Print out the column at position 2, counting from 0 (the rainfall column)
Expected output:

0    15.5
1    10.2
2     5.5
3    50.5

B3 - In the previous example, just print out the city names, without index (★☆☆)
Hint : values
Expected output: ['San Jose', 'San Francisco', 'Los Angeles', 'Seattle']

B4 - Print out the row for San Francisco (★☆☆)
Hint: iloc
Expected output:

city          San Francisco
population                5
rainfall               10.2

B5 - Print out the rainfall number for Seattle (★☆☆)
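
The selection styles used in section B can be sketched on a small made-up dataframe. The `fruits` data below is an assumption for illustration only; the same calls apply to the cities dataframe.

```python
import pandas as pd

# Hypothetical data, unrelated to the cities dataframe in the exercises.
fruits = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry"],
    "quantity": [3, 12, 40],
    "price": [0.5, 0.25, 0.1],
})

by_name = fruits["quantity"]     # column selected by label (a Series)
by_position = fruits.iloc[:, 2]  # column selected by 0-based position
names = fruits["fruit"].values   # underlying array, without the index
row = fruits.iloc[1]             # second row, returned as a Series

# Single value: filter rows with a boolean mask, pick one column,
# then take the first match.
cherry_price = fruits.loc[fruits["fruit"] == "cherry", "price"].iloc[0]

print(by_name)
print(by_position)
print(list(names))
print(row)
print(cherry_price)
```

`iloc` selects by integer position, while `loc` selects by label or boolean mask; mixing the two up is a common source of errors.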

C - Manipulating dataframes

C1 - Set population for San Francisco to 12 (★☆☆)
Hint: iloc

Expected output:

            city  population  rainfall
0       San Jose          10      15.5
1  San Francisco          12      10.2
2    Los Angeles          30       5.5
3        Seattle           7      50.5

C2 - Add a new row as follows (★☆☆)
city: 'San Diego', population: 8, rainfall: 7.5
Hint: pd.concat (DataFrame.append was removed in pandas 2.0)
Expected output:

            city  population  rainfall
0       San Jose          10      15.5
1  San Francisco          12      10.2
2    Los Angeles          30       5.5
3        Seattle           7      50.5
0      San Diego           8       7.5
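
Updating a cell and adding a row can be sketched like this, again on a made-up `fruits` dataframe (an assumption for illustration). Because `DataFrame.append` no longer exists in pandas 2.x, the sketch wraps the new row in a one-row dataframe and uses `pd.concat`.

```python
import pandas as pd

# Hypothetical data, unrelated to the cities dataframe in the exercises.
fruits = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry"],
    "quantity": [3, 12, 40],
    "price": [0.5, 0.25, 0.1],
})

# Set one cell by row label and column label.
# The positional equivalent would be: fruits.iloc[1, 1] = 15
fruits.loc[1, "quantity"] = 15

# Build the new row as its own one-row dataframe, then concatenate.
new_row = pd.DataFrame({"fruit": ["date"], "quantity": [8], "price": [0.75]})

# The default keeps each frame's original index, so the new row has index 0.
combined = pd.concat([fruits, new_row])

# ignore_index=True renumbers the rows 0..n-1 instead.
renumbered = pd.concat([fruits, new_row], ignore_index=True)

print(combined)
print(renumbered)
```

The repeated index 0 in the expected output above matches the default behavior of `pd.concat`; pass `ignore_index=True` if a clean 0..n-1 index is preferred.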

D - Searching

D1 - Print rows where population > 10 (★★☆)

Hint: cities['population'] > 10 and cities.loc

Expected output:

            city  population  rainfall
1  San Francisco          12      10.2
2    Los Angeles          30       5.5

D2 - Print rows where population > 10 and rainfall < 10 (★★☆)

Expected output:

            city  population  rainfall
2    Los Angeles          30       5.5
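
Filtering with boolean masks can be sketched on a made-up `fruits` dataframe (an assumption for illustration). Conditions are combined with `&` rather than the Python keyword `and`, and each condition needs its own parentheses.

```python
import pandas as pd

# Hypothetical data, unrelated to the cities dataframe in the exercises.
fruits = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry"],
    "quantity": [3, 12, 40],
    "price": [0.5, 0.25, 0.1],
})

# A comparison on a column yields a boolean Series (one True/False per row).
mask = fruits["quantity"] > 10

# Passing the mask to .loc keeps only the rows where it is True.
many = fruits.loc[mask]

# Combine conditions element-wise with & (and) or | (or).
many_and_cheap = fruits.loc[(fruits["quantity"] > 10) & (fruits["price"] < 0.2)]

print(mask)
print(many)
print(many_and_cheap)
```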

E - Reading Files

E1 - Read the CSV file (★☆☆)
Read house-sales-sample.csv.

Hint: pd.read_csv

E2 - Print out the column types of the above dataframe (★☆☆)
Hint: df.dtypes

E3 - Print out the information about the above dataframe. Note the datatypes and memory usage (★☆☆)
Hint: df.info()

E4 - Count the sales where Bedrooms equals 4 (★☆☆)
Hint: query or df indexing or size
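
The workflow in section E can be sketched without the real file. The CSV text below is made up: only the `Bedrooms` column name comes from the exercise, while `SalePrice` and all values are assumptions. With the actual file, the path 'house-sales-sample.csv' would be passed to `pd.read_csv` instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for house-sales-sample.csv.
csv_text = """Bedrooms,SalePrice
3,250000
4,410000
4,385000
2,180000
"""

# read_csv accepts a file path or any file-like object.
sales = pd.read_csv(io.StringIO(csv_text))

print(sales.dtypes)  # data type of each column
sales.info()         # column types, non-null counts, and memory usage

# Two equivalent ways to count matching rows.
count_mask = len(sales[sales["Bedrooms"] == 4])
count_query = len(sales.query("Bedrooms == 4"))

print(count_mask, count_query)
```

`query` takes the condition as a string, which can read more naturally for simple filters; boolean indexing is more flexible when the condition is built programmatically.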

More Exercises