mod6

Main

main

main()

This main() function is all that anyone working on this project should need to alter. It provides sample code showing how to retrieve a Pandas DataFrame from a KSIFeed object and how to retrieve the column mapping for categorical data.

ColumnType Objects

class ColumnType():
 |  ColumnType(name: str, datatype: str)

This base class represents a column in a data feed. Initialize it with a datatype to give hints on how to transform the value. Override transform_value() for fine-grained control over how the DataFrame is constructed.

__str__

 | __str__()

This returns a string representation of the ColumnType. If it is called before the data is parsed, categorical data will not yet be mapped. Access it through str() on the ColumnMapper object instead: the categorical data is mapped on ColumnMapper creation, so that call always returns the full mapping for categorical data.

override_map

 | override_map(dict)

This is used by the ColumnMapper to handle ordinal values

transform_value

 | transform_value(value: str)

Columns are transformed via the following logic:

None values are returned as is
"Yes" or "No" are returned as a Python bool
if the value looks like an integer, return a Python int
if the value looks like a float, return a Python float
if the value is categorical, i.e. it did not pass the previous checks, then build a map of the values as we encounter them and return a Python int. The int represents the order in which the categorical value first appeared in the feed; these are not ordinal values. To return ordinal values, subclass this object and pre-populate the ColumnMapper with an instance of this subclass
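The rules above can be sketched as a small standalone class. This is only an illustration of the described behaviour, not the project's code: the class name SketchColumnType, the category_map attribute, the sample values, and the choice to start category codes at 0 are all assumptions.

```python
from typing import Any, Dict, Optional


class SketchColumnType:
    """Hypothetical stand-in for ColumnType (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # ASSUMPTION: categories are numbered from 0 in first-seen order.
        self.category_map: Dict[str, int] = {}

    def transform_value(self, value: Optional[str]) -> Any:
        # Rule 1: None passes through untouched.
        if value is None:
            return None
        # Rule 2: "Yes"/"No" become a Python bool.
        if value in ("Yes", "No"):
            return value == "Yes"
        # Rules 3 and 4: try int first, then float.
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                continue
        # Rule 5: anything else is categorical; record first-seen order.
        if value not in self.category_map:
            self.category_map[value] = len(self.category_map)
        return self.category_map[value]


column = SketchColumnType("LIGHT")
codes = [column.transform_value(v) for v in ("Dark", "Daylight", "Dark")]
print(codes, column.category_map)
```

Because the codes only reflect arrival order, the same category can receive a different integer on a different run of the feed, which is why they should not be treated as ordinal.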

KSIColumnType Objects

class KSIColumnType(ColumnType)

The KSI data feed has two significant problems that this object tries to address:

Objects are stored and returned as a string, even if the data is a DateTime, Integer, Float, or Boolean value
For categorical values, there is no list of possible values in the KSI documentation

Therefore, this object has two main functions:

Parse the appropriate Python datatype for the column
Maintain an internal map of Integer values for categorical data

If we want the Integer values to be ordinal, we will have to modify this code so that the ColumnMapper is pre-populated with subclasses of ColumnType where necessary.

transform_value

 | transform_value(value: str)

This is the "meat" of the project. The logic is as follows:

None values are returned as is
Columns with SQL datatype "Integer" are returned as a Python int
Columns with SQL datatype "Date" are returned as a Python int
Columns with SQL datatype "OID" are returned as a Python int
Columns with SQL datatype "String" require the following logic:
- "Yes" or "No" are returned as a Python bool
- if the value looks like an integer, return a Python int
- if the value looks like a float, return a Python float
- if the value is categorical, i.e. it did not pass the previous checks, then build a map of the values as we encounter them and return a Python int. The int represents the order in which the categorical value first appeared in the feed; these are not ordinal values. To return ordinal values, subclass this object and pre-populate the ColumnMapper with an instance of this subclass
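As a rough sketch of the dispatch described above, the datatype hint can be checked first and the string rules applied only as a fallback. The function names ksi_transform and transform_string, the shared category_map argument, the sample values, and the decision to route any unlisted datatype through the string rules are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def transform_string(value: str, category_map: Dict[str, int]) -> Any:
    """Apply the "String" rules: bool, int, float, then categorical."""
    if value in ("Yes", "No"):
        return value == "Yes"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    # setdefault keeps the existing code, or assigns the next free one.
    return category_map.setdefault(value, len(category_map))


def ksi_transform(datatype: str, value: Optional[str],
                  category_map: Dict[str, int]) -> Any:
    """Hypothetical dispatch on the datatype reported by the feed."""
    if value is None:
        return None
    if datatype in ("Integer", "Date", "OID"):
        return int(value)
    # ASSUMPTION: every other datatype falls back to the string rules.
    return transform_string(value, category_map)


categories: Dict[str, int] = {}
print(ksi_transform("Integer", "7", categories))
print(ksi_transform("String", "Clear", categories))
print(ksi_transform("String", "Rain", categories))
```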

ColumnMapper Objects

class ColumnMapper():
 |  ColumnMapper()

This object builds a list of ColumnType objects from the "fields" section of the JSON. Use this object to interact with ColumnTypes instead of instantiating ColumnTypes directly.

__str__

 | __str__()

Returns a printable listing of the current ColumnTypes. :returns str

load_columns_from_json

 | load_columns_from_json(columns_json: dict)

To be called after calling set_ordinal_values() but before calling transform_value().

transform_value

 | transform_value(value: tuple)

Convenience method to look up the correct ColumnType for a value. :argument tuple (name of column, column value to transform) :returns the transformed value, in the Python data type that makes the most sense as per the corresponding ColumnType object's transform_value() method
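A minimal sketch of the lookup, assuming each entry of the "fields" array carries "name" and "type" keys (the source does not spell out that shape). SketchColumnMapper, the per-column caster functions, and the column names YEAR and STREET1 are hypothetical; the real ColumnMapper delegates to ColumnType objects instead.

```python
from typing import Any, Callable, Dict, List, Tuple


class SketchColumnMapper:
    """Hypothetical stand-in for ColumnMapper (illustration only)."""

    def __init__(self) -> None:
        # Column name -> function that converts the raw value.
        self._columns: Dict[str, Callable[[Any], Any]] = {}

    def load_columns_from_json(self, columns_json: dict) -> None:
        for field in columns_json.get("fields", []):
            # ASSUMPTION: each field is a dict with "name" and "type".
            caster = int if field.get("type") == "Integer" else str
            self._columns[field["name"]] = caster

    def transform_value(self, value: Tuple[str, Any]) -> Any:
        # The tuple is (name of column, column value to transform).
        name, raw = value
        if raw is None:
            return None
        return self._columns[name](raw)

    def get_column_names(self) -> List[str]:
        # dicts keep insertion order, so this matches the feed order.
        return list(self._columns)


mapper = SketchColumnMapper()
mapper.load_columns_from_json(
    {"fields": [{"name": "YEAR", "type": "Integer"},
                {"name": "STREET1", "type": "String"}]}
)
print(mapper.get_column_names())
print(mapper.transform_value(("YEAR", "2015")))
```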

get_column_types

 | get_column_types()

:returns list a list of all the ColumnTypes this object found in the "fields" array of the JSON on init.

get_column_names

 | get_column_names()

:returns list a list of all the column names, useful in creating a pandas DataFrame

KSIColumnMapper Objects

class KSIColumnMapper(ColumnMapper):
 |  KSIColumnMapper()

load_columns_from_json

 | load_columns_from_json(columns_json: dict)

To be called after calling set_ordinal_values() but before calling transform_value().

Feed Objects

class Feed():
 |  Feed(baseQuery: str, mapper: ColumnMapper = ColumnMapper())

Abstract class representing a feed. Override parse() to implement.

set_ordinal_values

 | set_ordinal_values(column_name: str, value_map: dict)

This must be called before parse().
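The ordering matters because category codes are assigned while rows are being parsed. The sketch below illustrates the idea with a hypothetical SketchOrdinalColumn class; the severity labels, and the choice to give unseen values the next integer after the largest supplied code, are assumptions rather than documented behaviour.

```python
from typing import Dict, Optional


class SketchOrdinalColumn:
    """Hypothetical column whose category codes can be pre-populated."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value_map: Dict[str, int] = {}

    def override_map(self, value_map: Dict[str, int]) -> None:
        # Copy so that later additions do not mutate the caller's dict.
        self.value_map = dict(value_map)

    def transform_value(self, value: Optional[str]) -> Optional[int]:
        if value is None:
            return None
        if value not in self.value_map:
            # ASSUMPTION: unseen values get the next code after the maximum.
            next_code = max(self.value_map.values(), default=-1) + 1
            self.value_map[value] = next_code
        return self.value_map[value]


# Without an override, codes follow arrival order: "Fatal" becomes 0.
unordered = SketchOrdinalColumn("INJURY")
print(unordered.transform_value("Fatal"))

# With the map supplied first, "Fatal" keeps its intended rank.
ordered = SketchOrdinalColumn("INJURY")
ordered.override_map({"Minor": 0, "Major": 1, "Fatal": 2})
print(ordered.transform_value("Fatal"))
```

If the map were supplied only after parsing had started, values already seen would keep their arrival-order codes.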

parse

 | parse(json: object = {})

Override this method in order to read a feed.

run

 | run()

Calls parse.

get_query

 | get_query()

:returns the query that this object called the API with. Cut and paste into a browser to see the raw data

get_rows

 | get_rows()

:returns list a Python two-dimensional list of mixed datatypes in consistent order

get_column_mapper

 | get_column_mapper()

:returns ColumnMapper object

get_data_frame

 | get_data_frame()

:returns Pandas DataFrame. This is the main method of this class

PagedFeedConfiguration Objects

class PagedFeedConfiguration():
 |  PagedFeedConfiguration(page_size: int = 2000, page_size_param_name: str = "resultRecordCount", offset_param_name: str = "resultOffset")

This configuration assumes that the model for pagination is to set page size and page number as URL parameters. This configuration does not support next-page tokens embedded in the response JSON.
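As an illustration of this style of pagination, the sketch below appends the two parameters to a base query using the standard library. The helper name build_page_query and the example URL are hypothetical, and computing the offset as page * page_size is an assumption about how the page number is converted.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_query(base_query: str, page: int, page_size: int = 2000,
                     page_size_param_name: str = "resultRecordCount",
                     offset_param_name: str = "resultOffset") -> str:
    """Return base_query with the paging parameters for `page` appended."""
    parts = urlsplit(base_query)
    # Keep the existing query parameters in their original order.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append((page_size_param_name, str(page_size)))
    # ASSUMPTION: the server expects a row offset, not a page index.
    params.append((offset_param_name, str(page * page_size)))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(build_page_query("https://example.com/query?where=1%3D1&f=json",
                       page=2, page_size=100))
```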

PagedFeed Objects

class PagedFeed(Feed):
 |  PagedFeed(baseQuery: str, mapper: ColumnMapper = ColumnMapper(), paging_config: PagedFeedConfiguration = PagedFeedConfiguration())

This class extends Feed. It handles the case where page size is limited by the server and you must iterate through many pages. Page size and page number URL parameters are configured via the PagedFeedConfiguration object.

get_json

 | get_json(page: int = 0)

:returns the JSON that the API returned, already parsed into Python objects. Defaults to the first page retrieved unless you specify the page number

get_query

 | get_query(page: int = 0)

:returns the query that this object called the API with. Cut and paste into a browser to see the raw data. Defaults to the first page retrieved unless you specify the page number

run

 | run()

Recursive method that calls parse() repeatedly until the number of results is less than the page size. Due to time constraints, edge cases are not tested.
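The stopping rule can be illustrated with a short loop. Note that this sketch swaps the recursion for plain iteration, and that fetch_all_pages and the in-memory fake_fetch function are hypothetical stand-ins for the real network call and parse() step.

```python
from typing import Callable, List


def fetch_all_pages(fetch_page: Callable[[int], List[dict]],
                    page_size: int) -> List[dict]:
    """Collect rows page by page until a short page signals the end."""
    rows: List[dict] = []
    page = 0
    while True:
        batch = fetch_page(page)
        rows.extend(batch)
        # A page with fewer rows than page_size is the last one.
        if len(batch) < page_size:
            break
        page += 1
    return rows


# Fake data source: five records served two at a time.
records = [{"id": i} for i in range(5)]


def fake_fetch(page: int) -> List[dict]:
    return records[page * 2:(page + 1) * 2]


print(len(fetch_all_pages(fake_fetch, page_size=2)))
```

One edge case worth knowing: when the total row count is an exact multiple of the page size, this rule costs one extra request that returns an empty page before the loop stops.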

parse

 | parse(json: object = {})

Override this method in order to read a feed.

KSIFeed Objects

class KSIFeed(PagedFeed):
 |  KSIFeed(index_start: int = None, index_end: int = None, year_start: int = None, year_end: int = None)

This is the main object for calling the KSI API and retrieving a Pandas DataFrame object. You may also retrieve the ColumnMapper from this object to see how the values were mapped.
:argument index_start the starting index of the values we want to retrieve; if not set, all indexes are retrieved
:argument index_end the largest index of the values we want to retrieve; if not set, all indexes are retrieved. index_start and index_end must both be set to be included in the query
:argument year_start the starting year of the values we want to retrieve; if not set, all years are retrieved
:argument year_end the largest year of the values we want to retrieve; if not set, all years are retrieved. year_start and year_end must both be set to be included in the query

parse

 | parse(json: object = {})

This will fetch the data and populate the internal map. It is called repeatedly by the run() method of the parent class until there are no more pages of rows.

randyweinstein/TorontoPoliceKSIAnalysis