From 060c37c39dadf4c4166257b37f3b35b5b29087a9 Mon Sep 17 00:00:00 2001 From: Sweaty Handshake Date: Mon, 11 Mar 2024 18:19:28 +0000 Subject: [PATCH 1/9] Add multisale and nonlivable sections --- .gitignore | 57 +- README.Rmd | 527 ++++++++++--------- README.md | 100 ++-- docs/spreadsheets/condo_nonlivable_demo.xlsx | Bin 0 -> 10144 bytes 4 files changed, 362 insertions(+), 322 deletions(-) create mode 100644 docs/spreadsheets/condo_nonlivable_demo.xlsx diff --git a/.gitignore b/.gitignore index 6ec62cc..15cd087 100644 --- a/.gitignore +++ b/.gitignore @@ -1,28 +1,29 @@ -# History files -.Rhistory -.Rapp.history - -# R project files -.Rproj.user/ -reports/*_files/ - -# knitr and R markdown default cache directories -*_cache/ -cache/ - -# Temporary files created by R markdown -*.utf8.md -*.knit.md - -# Ignore all data files -*.parquet -*.rds -*.zip -*.csv -*.xlsx -*.xlsm -*.html -*.rmarkdown - -# Ignore scratch documents -scratch*.* +# History files +.Rhistory +.Rapp.history + +# R project files +.Rproj.user/ +reports/*_files/ + +# knitr and R markdown default cache directories +*_cache/ +cache/ + +# Temporary files created by R markdown +*.utf8.md +*.knit.md + +# Ignore all data files +*.parquet +*.rds +*.zip +*.csv +*.xlsx +!condo_nonlivable_demo.xlsx +*.xlsm +*.html +*.rmarkdown + +# Ignore scratch documents +scratch*.* diff --git a/README.Rmd b/README.Rmd index 6bac107..15018a6 100644 --- a/README.Rmd +++ b/README.Rmd @@ -1,258 +1,269 @@ ---- -title: "Table of Contents" -output: - github_document: - toc: true - toc_depth: 3 ---- - - - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.path = "docs/figures/", - out.width = "100%" -) -``` - -> :warning: **NOTE** :warning: -> -> The [condominium model](https://github.com/ccao-data/model-condo-avm) (this repo) is nearly identical to the [residential (single/multi-family) model](https://github.com/ccao-data/model-res-avm), with a few [key 
differences](#differences-compared-to-the-residential-model). Please read the documentation for the [residential model](https://github.com/ccao-data/model-res-avm) first. - -# Prior Models - -This repository contains code, data, and documentation for the Cook County Assessor's condominium reassessment model. Information about prior year models can be found at the following links: - -| Year(s) | Triad(s) | Method | Language / Framework | Link | -|---------|----------|---------------------------------------------|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| -| 2015 | City | N/A | SPSS | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev/-/tree/master/code.legacy/2015%20City%20Tri/2015%20Condo%20Models) | -| 2018 | City | N/A | N/A | Not available. Values provided by vendor | -| 2019 | North | Linear regression or GBM model per township | R (Base) | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev) | -| 2020 | South | Linear regression or GBM model per township | R (Base) | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev) | -| 2021 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2021-assessment-year) | -| 2022 | North | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2022-assessment-year) | -| 2023 | South | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2023-assessment-year) | - -# Model Overview - -The duty of the Cook County Assessor's Office is to value property in a fair, accurate, and transparent way. The Assessor is committed to transparency throughout the assessment process. 
As such, this document contains: - -* [A description of the differences between the residential model and this (condominium) model](#differences-compared-to-the-residential-model) -* [An outline of ongoing issues specific to condominium assessments](#ongoing-issues) - -The repository itself contains the [code](./pipeline) and [data](./input) for the Automated Valuation Model (AVM) used to generate initial assessed values for all condominium properties in Cook County. This system is effectively an advanced machine learning model (hereafter referred to as "the model"). It uses previous sales to generate estimated sale values (assessments) for all properties. - -## Differences Compared to the Residential Model - -The Cook County Assessor's Office ***does not track characteristic data for condominiums***. Like most assessors nationwide, our office staff cannot enter buildings to observe property characteristics. For condos, this means we cannot observe amenities, quality, or any other interior characteristics. - -The only information our office has about individual condominium units is their age, location, sale date/price, and percentage of ownership. This makes modeling condos particularly challenging, as the number of usable features is quite small. Fortunately, condos have two qualities which make modeling a bit easier: - -1. Condos are more homogeneous than single/multi-family properties, i.e. the range of potential condo sale prices is much narrower. -2. Condo are pre-grouped into clusters of like units (buildings), and units within the same building usually have similar sale prices. - -We leverage these qualities to produce what we call ***strata***, a feature unique to the condo model. See [Condo Strata](#condo-strata) for more information about how strata is used and calculated. 
- -> :warning: **NOTE** :warning: -> -> Recently, the CCAO has started to manually collect high-level condominium data, including total building square footage and estimated unit square footage/number of bedrooms. This data is sourced from listings and a number of additional third-party sources and is available for the North and South triads only. - -### Features Used - -Because our office (mostly) cannot observe individual condo unit characteristics, we must rely on aggregate geospatial features, economic features, [strata](#condo-strata), and time of sale to determine condo assessed values. The features in the table below are the ones used in the 2023 assessment model. - -```{r features_used, message=FALSE, echo=FALSE} -library(dplyr) -library(tidyr) -library(yaml) - -condo_params <- read_yaml("params.yaml") -condo_preds <- condo_params$model$predictor$all - -res_params <- read_yaml( - "https://raw.githubusercontent.com/ccao-data/model-res-avm/master/params.yaml" -) -res_preds <- res_params$model$predictor$all - -condo_unique_preds <- setdiff(condo_preds, res_preds) - -ccao::vars_dict %>% - inner_join( - as_tibble(condo_preds), - by = c("var_name_model" = "value") - ) %>% - distinct( - var_name_model, - `Feature Name` = var_name_pretty, - Category = var_type, - Type = var_data_type, - ) %>% - mutate( - Category = recode( - Category, - char = "Characteristic", - econ = "Economic", - geo = "Geospatial", - ind = "Indicator", - time = "Time", - meta = "Meta" - ), - `Feature Name` = recode( - `Feature Name`, - "Tieback Proration Rate" = "Condominium % Ownership", - "Year Built" = "Condominium Building Year Built" - ) - ) %>% - mutate(`Unique to Condo Model` = ifelse( - var_name_model %in% condo_unique_preds | - `Feature Name` %in% - c("Condominium Building Year Built", "Condominium % Ownership"), - "X", "" - )) %>% - arrange(desc(`Unique to Condo Model`), Category) %>% - select(-var_name_model) %>% - knitr::kable(format = "markdown") -``` - -### Valuation - -For the 
most part, condos are valued the same way as single- and multi-family residential property. We [train a model](https://github.com/ccao-data/model-res-avm#how-it-works) using individual condo unit sales, predict the value of all units, and then apply any [post-modeling adjustment](https://github.com/ccao-data/model-res-avm#post-modeling). - -However, because the CCAO has so [little information about individual units](#differences-compared-to-the-residential-model), we must rely on the [condominium percentage of ownership](#features-used) to differentiate between units in a building. This feature is effectively the proportion of the building's overall value held by a unit. It is created when a condominium declaration is filed with the County (usually by the developer of the building). The critical assumption underlying the condo valuation process is that percentage of ownership correlates with current market value. - -Percentage of ownership is used in two ways: - -1. It is used directly as a predictor/feature in the regression model to estimate differing unit values within the same building. -2. It is used to reapportion unit values directly i.e. the value of a unit is ultimately equal to `% of ownership * total building value`. - -Visually, this looks like: - -![](docs/figures/valuation_perc_owner.png) - -Percentage of ownership is the single most important feature in the condo model. It determines almost all intra-building differences in unit values. - -## Condo Strata - -The condo model uses an engineered feature called *strata* to deliver much of its predictive power. Strata is the binned, time-weighted, 5-year average sale price of the building. There are two strata features used in the model, one with 10 bins and one with 300 bins. Buildings are binned across each triad using either quantiles or 1-dimensional k-means. 
A visual representation of quantile-based strata binning looks like: - -![](docs/figures/strata.png) - -To put strata in more concrete terms, the table below shows a sample 5-level strata. Each condominium unit would be assigned a strata from this table (Strata 1, Strata 2, etc.) based on the 5-year weighted average sale price of its building. All units in a building will have the same strata. - -```{r strata, echo=FALSE} -library(tibble) - -tribble( - ~"Strata", ~"Range of 5-year Average Sale Price", - "Strata 1", "$0 - $121K", - "Strata 2", "$121K - $149K", - "Strata 3", "$149K - $199K", - "Strata 4", "$199K - $276K", - "Strata 5", "$276K+" -) %>% - knitr::kable(format = "markdown") -``` - -Some additional notes on strata: - -- Strata is calculated in the [ingest stage](./pipeline/00-ingest.R) of this repository. -- Calculating the 5-year average sale price of a building requires at least 1 sale. Buildings with no sales have their strata imputed via KNN (using year built, number of units, and location as features). -- Number of bins (10 and 100) was chosen based on model performance. These numbers yielded the lowest root mean-squared error (RMSE). - -# Ongoing Issues - -The CCAO faces a number of ongoing issues specific to condominium modeling. We are currently working on processes to fix these issues. We list the issues here for the sake of transparency and to provide a sense of the challenges we face. - -### Unit Heterogeneity - -The current modeling methodology for condominiums makes two assumptions: - -1. Condos units within the same building are similar and will sell for similar amounts. -2. If units are not similar, the percentage of ownership will accurately reflect and be proportional to any difference in value between units. - -The model process works even in heterogeneous buildings as long as assumption 2 is met. For example, imagine a building with 8 identical units and 1 penthouse unit. 
This building violates assumption 1 because the penthouse unit is likely larger and worth more than the other 10. However, if the percentage of ownership of each unit is roughly proportional to its value, then each unit will still receive a fair assessment. - -However, the model can produce poor results when both of these assumptions are violated. For example, if a building has an extreme mix of different units, each with the same percentage of ownership, then smaller, less expensive units will be overvalued and larger, more expensive units will be undervalued. - -This problem is rare, but does occur in certain buildings with many heterogeneous units. Such buildings typically go through a process of secondary review to ensure the accuracy of the individual unit values. - -### Buildings With Few Sales - -The condo model relies on sales within the same building to calculate [strata](#condo-strata). This method works well for large buildings with many sales, but can break down when there are only 1 or 2 sales in a building. The primary danger here is _unrepresentative_ sales, i.e. sales that deviate significantly from the real average value of a building's units. When this happens, buildings can have their average unit sale value pegged too high or low. - -Fortunately, buildings without any recent sales are relatively rare, as condos have a higher turnover rate than single and multi-family property. Smaller buildings with low turnover are the most likely to not have recent sales. - -### Buildings Without Sales - -When no sales have occurred in a building in the 5 years prior to assessment, the building's strata features are imputed. The model will look at nearby buildings that have similar unit counts/age and then try to assign an appropriate strata to the target building. - -Most of the time, this technique produces reasonable results. However, buildings without sales still go through an additional round of review to ensure the accuracy of individual unit values. 
- -# FAQs - -**Note:** The FAQs listed here are for condo-specific questions. See the residential model documentation for [more general FAQs](https://github.com/ccao-data/model-res-avm#faqs). - -**Q: What are the most important features in the condo model?** - -As with the [residential model](https://github.com/ccao-data/model-res-avm), the importance of individual features varies by location and time. However, generally speaking, the most important features are: - -* Location, location, location. Location is the largest driver of county-wide variation in condo value. We account for location using [geospatial features like neighborhood](#features-used). -* Condo percentage of ownership, which determines the intra-building variation in unit price. -* [Condo building strata](#condo-strata). Strata provides us with a good estimate of the average sale price of a building's units. - -**Q: How do I see my condo building's strata?** - -Individual building [strata](#condo-strata) are not included with assessment notices or shown on the CCAO's website. However, strata *are* stored in the sample data included in this repository. You can load the data ([`input/condo_strata_data.parquet`](./input/condo_strata_data.parquet)) using R and the `read_parquet()` function from the `arrow` library. - -**Q: How do I see the assessed value of other units in my building?** - -You can use the [CCAO's Address Search](https://www.cookcountyassessor.com/address-search#address) to see all the PINs and values associated with a specific condominium building, simply leave the `Unit Number` field blank when submitting a search. - -**Q: How do I view my unit's percentage of ownership?** - -The percentage of ownership for individual units is printed on assessment notices. You may also be able to find it via your building's board or condo declaration. 
- -# Usage - -Installation and usage of this model is identical to the [installation and usage of the residential model](https://github.com/ccao-data/model-res-avm#usage). Please follow the instructions listed there. - -## Getting Data - -The data required to run these scripts is produced by the [ingest stage](pipeline/00-ingest.R), which uses SQL pulls from the CCAO's Athena database as a primary data source. CCAO employees can run the ingest stage or pull the latest version of the input data from our internal DVC store using: - -```bash -dvc pull -``` - -Public users can download data for each assessment year using the links below. Each file should be placed in the `input/` directory prior to running the model pipeline. - -#### 2021 - -- [assmntdata.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2021/assmntdata.parquet) -- [modeldata.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2021/modeldata.parquet) - -#### 2022 - -- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/assessment_data.parquet) -- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/condo_strata_data.parquet) -- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/land_nbhd_rate_data.parquet) -- [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/training_data.parquet) - -#### 2023 - -- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/assessment_data.parquet) -- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/condo_strata_data.parquet) -- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/land_nbhd_rate_data.parquet) -- 
[training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/training_data.parquet) - -For other data from the CCAO, please visit the [Cook County Data Portal](https://datacatalog.cookcountyil.gov/). - -# License - -Distributed under the AGPL-3 License. See [LICENSE](./LICENSE) for more information. - -# Contributing - -We welcome pull requests, comments, and other feedback via GitHub. For more involved collaboration or projects, please see the [Developer Engagement Program](https://github.com/ccao-data/people#external) documentation on our group wiki. +--- +title: "Table of Contents" +output: + github_document: + toc: true + toc_depth: 3 +--- + + + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + fig.path = "docs/figures/", + out.width = "100%" +) +``` + +> :warning: **NOTE** :warning: +> +> The [condominium model](https://github.com/ccao-data/model-condo-avm) (this repo) is nearly identical to the [residential (single/multi-family) model](https://github.com/ccao-data/model-res-avm), with a few [key differences](#differences-compared-to-the-residential-model). Please read the documentation for the [residential model](https://github.com/ccao-data/model-res-avm) first. + +# Prior Models + +This repository contains code, data, and documentation for the Cook County Assessor's condominium reassessment model. Information about prior year models can be found at the following links: + +| Year(s) | Triad(s) | Method | Language / Framework | Link | +|---------|----------|---------------------------------------------|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| +| 2015 | City | N/A | SPSS | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev/-/tree/master/code.legacy/2015%20City%20Tri/2015%20Condo%20Models) | +| 2018 | City | N/A | N/A | Not available. 
Values provided by vendor | +| 2019 | North | Linear regression or GBM model per township | R (Base) | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev) | +| 2020 | South | Linear regression or GBM model per township | R (Base) | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev) | +| 2021 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2021-assessment-year) | +| 2022 | North | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2022-assessment-year) | +| 2023 | South | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2023-assessment-year) | +| 2024 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2024-assessment-year) | + +# Model Overview + +The duty of the Cook County Assessor's Office is to value property in a fair, accurate, and transparent way. The Assessor is committed to transparency throughout the assessment process. As such, this document contains: + +* [A description of the differences between the residential model and this (condominium) model](#differences-compared-to-the-residential-model) +* [An outline of ongoing issues specific to condominium assessments](#ongoing-issues) + +The repository itself contains the [code](./pipeline) for the Automated Valuation Model (AVM) used to generate initial assessed values for all condominium properties in Cook County. This system is effectively an advanced machine learning model (hereafter referred to as "the model"). It uses previous sales to generate estimated sale values (assessments) for all properties. 
+ +## Differences Compared to the Residential Model + +The Cook County Assessor's Office has begun to track a limited number of characteristics for condominiums (building-level square footage, plus unit-level square footage, bedrooms, and bathrooms), but the data we have ***varies in both the characteristics available and their completeness*** between triads. Staffing limitations have forced the office to prioritize smaller condo buildings, which are less likely to have recent unit sales, in certain parts of the county. Like most assessors nationwide, our office staff cannot enter buildings to observe property characteristics. For condos, this means we cannot directly observe amenities, quality, or any other interior characteristics; these must instead be gathered from listings and a number of additional third-party sources. + +The only complete information our office currently has about individual condominium units is their age, location, sale date/price, and percentage of ownership. This makes modeling condos particularly challenging, as the number of usable features is quite small. Fortunately, condos have two qualities which make modeling a bit easier: + +1. Condos are more homogeneous than single/multi-family properties, i.e., the range of potential condo sale prices is much narrower. +2. Condos are pre-grouped into clusters of like units (buildings), and units within the same building usually have similar sale prices. + +We leverage these qualities to produce what we call ***strata***, a feature unique to the condo model. See [Condo Strata](#condo-strata) for more information about how strata is used and calculated. + +### Features Used + +Because our data on individual condo unit characteristics is sparse and incomplete, we must rely on aggregate geospatial features, economic features, [strata](#condo-strata), and time of sale to determine condo assessed values. The features in the table below are the ones used in the 2023 assessment model.
+ +```{r features_used, message=FALSE, echo=FALSE} +library(dplyr) +library(tidyr) +library(yaml) + +condo_params <- read_yaml("params.yaml") +condo_preds <- condo_params$model$predictor$all + +res_params <- read_yaml( + "https://raw.githubusercontent.com/ccao-data/model-res-avm/master/params.yaml" +) +res_preds <- res_params$model$predictor$all + +condo_unique_preds <- setdiff(condo_preds, res_preds) + +ccao::vars_dict %>% + inner_join( + as_tibble(condo_preds), + by = c("var_name_model" = "value") + ) %>% + distinct( + var_name_model, + `Feature Name` = var_name_pretty, + Category = var_type, + Type = var_data_type, + ) %>% + mutate( + Category = recode( + Category, + char = "Characteristic", + econ = "Economic", + geo = "Geospatial", + ind = "Indicator", + time = "Time", + meta = "Meta" + ), + `Feature Name` = recode( + `Feature Name`, + "Tieback Proration Rate" = "Condominium % Ownership", + "Year Built" = "Condominium Building Year Built" + ) + ) %>% + mutate(`Unique to Condo Model` = ifelse( + var_name_model %in% condo_unique_preds | + `Feature Name` %in% + c("Condominium Building Year Built", "Condominium % Ownership"), + "X", "" + )) %>% + arrange(desc(`Unique to Condo Model`), Category) %>% + select(-var_name_model) %>% + knitr::kable(format = "markdown") +``` + +### Valuation + +For the most part, condos are valued the same way as single- and multi-family residential property. We [train a model](https://github.com/ccao-data/model-res-avm#how-it-works) using individual condo unit sales, predict the value of all units, and then apply any [post-modeling adjustment](https://github.com/ccao-data/model-res-avm#post-modeling). + +However, because the CCAO has so [little information about individual units](#differences-compared-to-the-residential-model), we must rely on the [condominium percentage of ownership](#features-used) to differentiate between units in a building. This feature is effectively the proportion of the building's overall value held by a unit. 
It is created when a condominium declaration is filed with the County (usually by the developer of the building). The critical assumption underlying the condo valuation process is that percentage of ownership correlates with current market value. + +Percentage of ownership is used in two ways: + +1. It is used directly as a predictor/feature in the regression model to estimate differing unit values within the same building. +2. It is used to reapportion unit values directly, i.e., the value of a unit is ultimately equal to `% of ownership * total building value`. + +Visually, this looks like: + +![](docs/figures/valuation_perc_owner.png) + +For what the office terms "nonlivable" spaces, i.e., parking spaces, storage spaces, and common areas, the breakout of value works differently. See [this Excel spreadsheet](docs/spreadsheets/condo_nonlivable_demo.xlsx) for an interactive example of how nonlivable spaces are valued based on the total value of a building's livable space. + +Percentage of ownership is the single most important feature in the condo model. It determines almost all intra-building differences in unit values. + +### Multisales + +The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel, and they rarely reflect the market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many units are sold bundled with deeded parking spaces ("nonlivable" parcels) that are separate parcels, and these two-parcel sales closely reflect the livable parcel's actual market price. We split the total sale price of these two-parcel sales between the two parcels according to their relative percentages of ownership before using them for training. + +## Condo Strata + +The condo model uses an engineered feature called *strata* to deliver much of its predictive power. Strata is the binned, time-weighted, 5-year average sale price of the building.
There are two strata features used in the model, one with 10 bins and one with 100 bins. Buildings are binned across each triad using either quantiles or 1-dimensional k-means. A visual representation of quantile-based strata binning looks like: + +![](docs/figures/strata.png) + +To put strata in more concrete terms, the table below shows a sample 5-level strata. Each condominium unit would be assigned a strata from this table (Strata 1, Strata 2, etc.) based on the 5-year weighted average sale price of its building. All units in a building will have the same strata. + +```{r strata, echo=FALSE} +library(tibble) + +tribble( + ~"Strata", ~"Range of 5-year Average Sale Price", + "Strata 1", "$0 - $121K", + "Strata 2", "$121K - $149K", + "Strata 3", "$149K - $199K", + "Strata 4", "$199K - $276K", + "Strata 5", "$276K+" +) %>% + knitr::kable(format = "markdown") +``` + +Some additional notes on strata: + +- Strata is calculated in the [ingest stage](./pipeline/00-ingest.R) of this repository. +- Calculating the 5-year average sale price of a building requires at least 1 sale. Buildings with no sales have their strata imputed via KNN (using year built, number of units, and location as features). +- The number of bins (10 and 100) was chosen based on model performance. These numbers yielded the lowest root mean-squared error (RMSE). + +# Ongoing Issues + +The CCAO faces a number of ongoing issues specific to condominium modeling. We are currently working on processes to fix these issues. We list the issues here for the sake of transparency and to provide a sense of the challenges we face. + +### Unit Heterogeneity + +The current modeling methodology for condominiums makes two assumptions: + +1. Condo units within the same building are similar and will sell for similar amounts. +2. If units are not similar, the percentage of ownership will accurately reflect and be proportional to any difference in value between units.
+ +The modeling process works even in heterogeneous buildings as long as assumption 2 is met. For example, imagine a building with 8 identical units and 1 penthouse unit. This building violates assumption 1 because the penthouse unit is likely larger and worth more than the other 8. However, if the percentage of ownership of each unit is roughly proportional to its value, then each unit will still receive a fair assessment. + +However, the model can produce poor results when both of these assumptions are violated. For example, if a building has an extreme mix of different units, each with the same percentage of ownership, then smaller, less expensive units will be overvalued and larger, more expensive units will be undervalued. + +This problem is rare but does occur in certain buildings with many heterogeneous units. Such buildings typically go through a process of secondary review to ensure the accuracy of the individual unit values. + +### Buildings With Few Sales + +The condo model relies on sales within the same building to calculate [strata](#condo-strata). This method works well for large buildings with many sales, but can break down when there are only 1 or 2 sales in a building. The primary danger here is _unrepresentative_ sales, i.e., sales that deviate significantly from the real average value of a building's units. When this happens, buildings can have their average unit sale value pegged too high or too low. + +Fortunately, buildings without any recent sales are relatively rare, as condos have a higher turnover rate than single- and multi-family property. Smaller buildings with low turnover are the most likely to lack recent sales. + +### Buildings Without Sales + +When no sales have occurred in a building in the 5 years prior to assessment, the building's strata features are imputed. The model will look at nearby buildings that have similar unit counts/age and then try to assign an appropriate strata to the target building.
+ +Most of the time, this technique produces reasonable results. However, buildings without sales still go through an additional round of review to ensure the accuracy of individual unit values. + +# FAQs + +**Note:** The FAQs listed here are for condo-specific questions. See the residential model documentation for [more general FAQs](https://github.com/ccao-data/model-res-avm#faqs). + +**Q: What are the most important features in the condo model?** + +As with the [residential model](https://github.com/ccao-data/model-res-avm), the importance of individual features varies by location and time. However, generally speaking, the most important features are: + +* Location, location, location. Location is the largest driver of county-wide variation in condo value. We account for location using [geospatial features like neighborhood](#features-used). +* Condo percentage of ownership, which determines the intra-building variation in unit price. +* [Condo building strata](#condo-strata). Strata provides us with a good estimate of the average sale price of a building's units. + +**Q: How do I see my condo building's strata?** + +Individual building [strata](#condo-strata) are not included with assessment notices or shown on the CCAO's website. However, strata *are* included in the model input data. You can download `condo_strata_data.parquet` for a given year (see [Getting Data](#getting-data)) and load it using R and the `read_parquet()` function from the `arrow` library. + +**Q: How do I see the assessed value of other units in my building?** + +You can use the [CCAO's Address Search](https://www.cookcountyassessor.com/address-search#address) to see all the PINs and values associated with a specific condominium building; simply leave the `Unit Number` field blank when submitting a search. + +**Q: How do I view my unit's percentage of ownership?** + +The percentage of ownership for individual units is printed on assessment notices.
You may also be able to find it via your building's board or condo declaration. + +# Usage + +Installation and usage of this model is identical to the [installation and usage of the residential model](https://github.com/ccao-data/model-res-avm#usage). Please follow the instructions listed there. + +## Getting Data + +The data required to run these scripts is produced by the [ingest stage](pipeline/00-ingest.R), which uses SQL pulls from the CCAO's Athena database as a primary data source. CCAO employees can run the ingest stage or pull the latest version of the input data from our internal DVC store using: + +```bash +dvc pull +``` + +Public users can download data for each assessment year using the links below. Each file should be placed in the `input/` directory prior to running the model pipeline. + +#### 2021 + +- [assmntdata.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2021/assmntdata.parquet) +- [modeldata.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2021/modeldata.parquet) + +#### 2022 + +- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/assessment_data.parquet) +- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/condo_strata_data.parquet) +- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/land_nbhd_rate_data.parquet) +- [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/training_data.parquet) + +#### 2023 + +- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/assessment_data.parquet) +- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/condo_strata_data.parquet) +- 
[land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/land_nbhd_rate_data.parquet) +- [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/training_data.parquet) + +#### 2024 + +- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/assessment_data.parquet) +- [char_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/char_data.parquet) +- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/condo_strata_data.parquet) +- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/land_nbhd_rate_data.parquet) +- [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/training_data.parquet) + +For other data from the CCAO, please visit the [Cook County Data Portal](https://datacatalog.cookcountyil.gov/). + +# License + +Distributed under the AGPL-3 License. See [LICENSE](./LICENSE) for more information. + +# Contributing + +We welcome pull requests, comments, and other feedback via GitHub. For more involved collaboration or projects, please see the [Developer Engagement Program](https://github.com/ccao-data/people#external) documentation on our group wiki. 
diff --git a/README.md b/README.md index 81e7aed..04e6cbd 100644 --- a/README.md +++ b/README.md @@ -7,6 +7,7 @@ Table of Contents Model](#differences-compared-to-the-residential-model) - [Features Used](#features-used) - [Valuation](#valuation) + - [Multisales](#multisales) - [Condo Strata](#condo-strata) - [Ongoing Issues](#ongoing-issues) - [Unit Heterogeneity](#unit-heterogeneity) @@ -45,6 +46,7 @@ prior year models can be found at the following links: | 2021 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2021-assessment-year) | | 2022 | North | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2022-assessment-year) | | 2023 | South | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2023-assessment-year) | +| 2024 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2024-assessment-year) | # Model Overview @@ -59,26 +61,32 @@ contains: - [An outline of ongoing issues specific to condominium assessments](#ongoing-issues) -The repository itself contains the [code](./pipeline) and -[data](./input) for the Automated Valuation Model (AVM) used to generate -initial assessed values for all condominium properties in Cook County. -This system is effectively an advanced machine learning model (hereafter -referred to as “the model”). It uses previous sales to generate -estimated sale values (assessments) for all properties. +The repository itself contains the [code](./pipeline) for the Automated +Valuation Model (AVM) used to generate initial assessed values for all +condominium properties in Cook County. This system is effectively an +advanced machine learning model (hereafter referred to as “the model”). +It uses previous sales to generate estimated sale values (assessments) +for all properties. 
## Differences Compared to the Residential Model -The Cook County Assessor’s Office ***does not track characteristic data -for condominiums***. Like most assessors nationwide, our office staff -cannot enter buildings to observe property characteristics. For condos, -this means we cannot observe amenities, quality, or any other interior -characteristics. - -The only information our office has about individual condominium units -is their age, location, sale date/price, and percentage of ownership. -This makes modeling condos particularly challenging, as the number of -usable features is quite small. Fortunately, condos have two qualities -which make modeling a bit easier: +The Cook County Assessor’s Office has begun to track a limited number of +characteristics (building-level square footage and unit-level square +footage, bedrooms, and bathrooms) for condominiums, but the data we have +***varies in both the characteristics available and their +completeness*** between triads. Staffing limitations have forced the +office to prioritize smaller condo buildings less likely to have recent +unit sales in certain parts of the county. Like most assessors +nationwide, our office staff cannot enter buildings to observe property +characteristics. For condos, this means we cannot observe amenities, +quality, or any other interior characteristics, which must instead be +gathered from listings and a number of additional third-party sources. + +The only complete information our office currently has about individual +condominium units is their age, location, sale date/price, and +percentage of ownership. This makes modeling condos particularly +challenging, as the number of usable features is quite small. +Fortunately, condos have two qualities which make modeling a bit easier: 1. Condos are more homogeneous than single/multi-family properties, i.e. the range of potential condo sale prices is much narrower.
@@ -89,18 +97,10 @@ We leverage these qualities to produce what we call ***strata***, a feature unique to the condo model. See [Condo Strata](#condo-strata) for more information about how strata is used and calculated. -> :warning: **NOTE** :warning: -> -> Recently, the CCAO has started to manually collect high-level -> condominium data, including total building square footage and -> estimated unit square footage/number of bedrooms. This data is sourced -> from listings and a number of additional third-party sources and is -> available for the North and South triads only. - ### Features Used -Because our office (mostly) cannot observe individual condo unit -characteristics, we must rely on aggregate geospatial features, economic +Because our individual condo unit characteristics are sparse and +incomplete, we must rely on aggregate geospatial features, economic features, [strata](#condo-strata), and time of sale to determine condo assessed values. The features in the table below are the ones used in the 2023 assessment model. @@ -117,8 +117,6 @@ the 2023 assessment model. | Condominium Unit Full Baths | Characteristic | numeric | X | | Condominium Building Is Mixed Use | Characteristic | logical | X | | Condominium % Ownership | Meta | numeric | X | -| Condominium Building Strata 1 | Meta | character | X | -| Condominium Building Strata 2 | Meta | character | X | | Land Square Feet | Characteristic | numeric | | | Township Code | Meta | character | | | Neighborhood Code | Meta | character | | @@ -133,7 +131,6 @@ the 2023 assessment model. 
| Percent Population Age, Under 19 Years Old | acs5 | numeric | | | Percent Population Age, Over 65 Years Old | acs5 | numeric | | | Median Population Age | acs5 | numeric | | -| Percent Population Mobility, In Same House 1 Year Ago | acs5 | numeric | | | Percent Population Mobility, Moved From Other State in Past Year | acs5 | numeric | | | Percent Households Family, Married | acs5 | numeric | | | Percent Households Nonfamily, Living Alone | acs5 | numeric | | @@ -150,24 +147,23 @@ the 2023 assessment model. | Percent Occupied Households, Owner | acs5 | numeric | | | Percent Occupied Households, Total, One or More Selected Conditions | acs5 | numeric | | | Percent Population Mobility, Moved From Within Same County in Past Year | acs5 | numeric | | +| Corner Lot | ccao | logical | | +| Active Homeowner Exemption | ccao | logical | | +| Number of Years Active Homeowner Exemption | ccao | numeric | | | Longitude | loc | numeric | | | Latitude | loc | numeric | | -| Municipality Name | loc | character | | -| FEMA Special Flood Hazard Area | loc | logical | | +| Census Tract GEOID | loc | character | | | First Street Factor | loc | numeric | | -| First Street Risk Direction | loc | numeric | | | School Elementary District GEOID | loc | character | | | School Secondary District GEOID | loc | character | | +| Municipality Name | loc | character | | | CMAP Walkability Score (No Transit) | loc | numeric | | | CMAP Walkability Total Score | loc | numeric | | -| Airport Noise DNL | loc | numeric | | | Property Tax Bill Aggregate Rate | other | numeric | | | Number of PINs in Half Mile | prox | numeric | | | Number of Bus Stops in Half Mile | prox | numeric | | | Number of Foreclosures Per 1000 PINs (Past 5 Years) | prox | numeric | | | Number of Schools in Half Mile | prox | numeric | | -| Number of Schools with Rating in Half Mile | prox | numeric | | -| Average School Rating in Half Mile | prox | numeric | | | Nearest Bike Trail Distance (Feet) | prox | numeric | | | 
Nearest Cemetery Distance (Feet) | prox | numeric | | | Nearest CTA Route Distance (Feet) | prox | numeric | | @@ -179,7 +175,12 @@ the 2023 assessment model. | Nearest Metra Stop Distance (Feet) | prox | numeric | | | Nearest Park Distance (Feet) | prox | numeric | | | Nearest Railroad Distance (Feet) | prox | numeric | | +| Nearest Secondary Road Distance (Feet) | prox | numeric | | +| Nearest University Distance (Feet) | prox | numeric | | +| Nearest Vacant Land Parcel Distance (Feet) | prox | numeric | | | Nearest Water Distance (Feet) | prox | numeric | | +| Nearest Golf Course Distance (Feet) | prox | numeric | | +| Total Airport Noise DNL | prox | numeric | | ### Valuation @@ -211,10 +212,29 @@ Visually, this looks like: ![](docs/figures/valuation_perc_owner.png) +For what the office terms “nonlivable” spaces, i.e. parking spaces, +storage space, and common area, the breakout of value works differently. +See [this Excel spreadsheet](docs/spreadsheets/condo_nonlivable_demo.xlsx) for +an interactive example of how nonlivable spaces are valued based on the +total value of a building’s livable space. + Percentage of ownership is the single most important feature in the condo model. It determines almost all intra-building differences in unit values. +### Multisales + +The condo model is trained on a select number of “multisales” in +addition to single-parcel sales. Multisales are sales that include more +than one parcel and rarely reflect the accurate market price the +included parcels would fetch if they were sold individually. In the case +of condominiums, however, many are sold bundled with deeded parking +spaces (“nonlivable” parcels) that are separate parcels and these +two-parcel sales are highly reflective of a livable parcel’s actual +market price. We split the total value of these two-parcel sales +according to their relative percent of ownership before using them for +training.
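The ownership-weighted split described above can be sketched in R, the language used throughout this repository. The function name `split_multisale()` and its arguments are hypothetical and written only for illustration; this is not the pipeline's own code. The sketch assumes every parcel's percent of ownership is known and positive.

```r
# Hypothetical helper, not part of the CCAO pipeline: allocate the price of a
# multi-parcel sale across its parcels in proportion to percent of ownership.
split_multisale <- function(sale_price, pct_owner) {
  stopifnot(length(sale_price) == 1, length(pct_owner) >= 1, all(pct_owner > 0))
  # Each parcel's share of the combined ownership, times the total sale price
  sale_price * pct_owner / sum(pct_owner)
}

# A unit (4% ownership) sold together with a deeded parking space (1%)
allocated <- split_multisale(100000, c(unit = 0.04, parking = 0.01))
allocated["unit"]
```

Here the unit receives 0.04 / 0.05 = 4/5 of the price, or \$80,000, and the parking space the remaining \$20,000, so the allocated parts always sum to the original sale price.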
+ ## Condo Strata The condo model uses an engineered feature called *strata* to deliver @@ -398,6 +418,14 @@ running the model pipeline. - [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/land_nbhd_rate_data.parquet) - [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/training_data.parquet) +#### 2024 + +- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/assessment_data.parquet) +- [char_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/char_data.parquet) +- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/condo_strata_data.parquet) +- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/land_nbhd_rate_data.parquet) +- [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/training_data.parquet) + For other data from the CCAO, please visit the [Cook County Data Portal](https://datacatalog.cookcountyil.gov/). 
diff --git a/docs/spreadsheets/condo_nonlivable_demo.xlsx b/docs/spreadsheets/condo_nonlivable_demo.xlsx new file mode 100644 index 0000000000000000000000000000000000000000..9ed84289635e27faf6401ce3894a6a436004e9c1 GIT binary patch literal 10144 zcmeHNWm_EE(j9^X4eky@(BSUw?oP1aK6r2s?m>b(1cF0wC&As_9fJEi$(3_X&V7Hu zz5Sv4>FHiI&#tbus`jp}C<6(F0e}I(0ssI~Kn9()ofQNCfC3EwpaWoE=!n?af=q2e z2CDA%rcQeFZZ_5=IZ!XCvH&l@@Bi=kFZMuLLWg`8BVzE0WV1*jt!1@?AnfM_!*#R*dDG;ik1x_*DH zCPW)y)9=G{7GW}xT42rOftB_<9sj9cH)FN_iWG)bYsg05VMHMdq zQo^Uj`{Db-5^v;wKk3yvYgsrN1`m0i>*wI)I|t{NG_M_##2w1kx=~%GE~l-)~S)2M^cF+QwkfSwKc>Olu`8l&cf)LU-O64U%u zK7;S*zU*~*$m@LterHV>)~}rG3F3@l@48iyuFC?ih18dc20Wp=wi$8_W2TD|VdJia zSY1bk#z*9n%d$Ci_AE_m?_{;rip6K}#s`{>;?!A|LMRL-T=9)GDIl%dEpO{EcYEjb zhc$#tcjFKvb$w$f-F6X25Z-9G`!lYKE03W{sVqy4uF;npOf80fnkK*~AKGh!M908G zxPY$gl>8pLc&#BiJx`LFk|FMD-~s!5oWzDiex%nsqt}()bG&RGPr%-(P!yj)V*+Wk z{9~T~8Ao6geJrn?mYUU#e}p~@er6NS@%hJXbn!~hZ`7e;v-wtXnoQj6`@;Br8XA|NQ%NhqOe|jb(y_Nw zG+ogGDbFh zux?i-PY&1X@H!jUYWOwoBs!gJIthb~>^q^Up?He4aX0m(k=&zpL z5%`ZW)G~^U8wdpegd+g}IN&$_DfCX}rlue#hF=$^pNgLzKUbc{h!T84dr3&QQjLQq ztk|!_KIx?4+~N{}UiLP9ez>6I#2W@yYFC$Jn)>OE!pSHiB415jM(qqEv!R1wLMA5>g%4M+J)|OMB0YDUNs&4 zjmOQC~xZa%)N5Beegt6v4tcx^ZYdMMd zL)7#7)1l5#MvCIy+|=-l-*q$f<_TBGwMaJelxUvPGn%bq`!$6?6Q=BiIa zSIt*b1=IvOtVmm`t6s>Ye<@p#wYiQm8>Atq_kL1UH>Q{Mi;zp8e z{&@N678shIL5yyeDi?-6Sg7TLO80v1%QHf>M5FJknev$JQVa#jxSm=3=+xWW*Eb!D zWwH4KBh8zGjzvO@C4ElClLhkcgCzJ-5GJ{#Re7t~cAi^ER~(}oeZHYuEDy6=B5*HIo%wWP}g+I9dLbHp6pH;!+h%kBgY=c8d7wiuDJHzJpKvHFEe^3dEV^ znR`e;xsno&fu8>MYMqw;#0JN&F#C1NOB7NFV8rl!hPQB0L>5Fq6!#USTI9v`0T|Mq zGb7QdJh#}az+XU&Ss>%Dc~y*7R7B?q)E%YF>~PC`tw%1Efum4=*3a~Cz_bgylzRLc zc5Tpnly(J@53{QFHIX8plYlD(0+V1b){iN2Ssm!L)>$=x27Sz{f$5J^(^N&Q-84G7c!<=G}(Wvl9 z$03SaQ{8qx_?ZYUs;DHV?fS8VAWujLNqVIf3tToGerxK953HrEX=;r6?D<02Q&jt6*~CyDGs!+0=SZ&LU_kpQ0#i>_$CN-O zSn~qTr@{qrj^iEmMc2b{i`LtL=Bg9Xf^3xP!@{60acnesxQo7>lvTKtqmv#Pq*d>ilB2CuV;*JEP^S7WJK|d@^8=whr$8EB>Xw4y zwvka^+|o$e5;fGUEH5T0)8NI%?!w$|@(#8swpm_5((KQ)_4n*oDTZ{Y`w{?nLHcv8 
z<(G1S%uQ`f8GfCAiPoW(Of)_RTI{WBbm0~V3a#2NY*3sDVj4bUDsdpus zWi;Mp^!jbLK*QYxSSj~;xRMzf2(rCE1q<#IDCE9eWm-YyHP8oK1u05RRNTepvondA zvy2Z?!YKWPj>xY5n#Dc(jdYZZ5nF7nB-HOYHpn{QiF%rOF>eqrUYF^Xs^V=yO|S}5 z%L^X0%tgqOMyn;>v6%2kny3Ea(DXanRhODyUpE(!-}^9?7J4 zt`&PZs#B>tiIzbz1}RN;LS@jcgg(v{&U>&2wp}~aP)sFH9f7$Ez5jy$-G!ziNN|S< zxb~t@x6?jf9anAtRO2~#J3fqlnS3^Gn%!T`(Mob5cvEyVXdkLx$m*?-KEKb?`kVqo z#^Zetwn4Mol^1`#(pLY`2FP~c@v5TY`96AN%*Xp-$8P0;E$xn1fBr^e#p`x|$G}^G zudKnoXQ1uzbia^T|LL*RpzX0Yw}tk!5{Y;sP!g1!p8b;758C69vTuC|*vnK91r;S! zFL9~Z^Xl{%o`twvLKmAKt(}v9_4Af--*m1f9p>fWdrEnNRnncZzMvTh*9+aGkiG%X zai%Vv%_3xkf|t{^!#ECI_2OF9iNCAE^IPoDzpS=q6QG+`Ujc-U&( z2*zDP4pD)Aa>QO`ykyWQzZ?lYZKfZJWgkaKv|=>NO3BO3NwLnl_Ce!^kBohh^{2{f z-Gz=3=|lQNb!?`3l<^zL(lUZIFl?$oHf=6!gke9_thC zzFhUOkly3b4ppb~vJYK ziTYGyWN0pvEiOgVvUhd&G3&*09?oM>Ipos)M4o{uk51=e(Tbqts*Ekq#gMpc z9FLCgmMk0wd%N!Zoez$?Y?J7C+(V#ucrHIPU%)`4Gi*id3g>X3&+aROP;hf4BFcS` z7fbJ8N|xkDrr;M{67?kOfo&*bJ0++(9$W8@3*K}B#Nt#=fUI6NXHMmZVNU|4(QUqs zf&P6JH+s^a^b)Fk%G>9H?je)dAe&`R&^e{*xDHB68cQtw1-#U%tr!fZm^Eq#)!uA) z-g_|A$||)AnyP@_%RW;!D&CukK%ebt5-fL+m_A|Sp*XM^qfq_Iytbt2Qwk4- zdXF2$e1u4!wCS#R_3pmSWE%h8BFJb4#(a*0sWWwuE$5gir;#Wy^dSTFooTX>*n+W` zOsPSLC1Fn_ZF`NJ>gpwahm0p>XlGe9Vk8&UR3W##R=hi*9$~snRw%)^UjZp{VLTa!8}6pGLTc`mvP46UmJ1k#Y93!${>Z{ER?X31R_s2u#bKZc}D=Xgv6XO)Gs<`ST9~vSS|7P zCW5wk1@El>rIl1`pgnQV1Pf>|X&oXMBRo~@sGT~|K1^|_nJeF3q>i+T?0J|_mg{I< zL-ln?mBSzw)35?Y+8|2<=~yJ!w#5eI0kr`OCukw4&&jL3^}P80E?3)gm~3c$5Tq%V znd;5#Nt!_GLvs_`?;e5n4Bb4Kz5H^$k6`K=0Gw61nKLQqS%cfp-f9+)rvn!uHO0)BR!09&zxQ59TI z8Mv^djKNsPQqokQuAytg&%^PTiY|RI2*1mhro~I@^w6ZP&9QBnjLh+H#!;gU&_ALk zpres5QF3V!@IOGT`_6DXbTcjHQ`K78$i60(k{13@#k8nopJivR>GCBihcha}ZwuP8I__K2E{7G`BxEMqJ6g^v) z_Gf(drq716Ua85ft3eUcj%~iBk&zrv;FSCKWV}kWT>}rEM%W_)0N^6v=QP3zeWf{N-7Ys%sQ0-nnvfzWj*J#VAWe2TBtDFk4ViCP5^DjPHZ|7SjC* z50E0<_86Cu4jI*Mhi9cRE`wl(LWDz9UG=mW`%)MgH6twj$K~Yx`^Jk9q7sa4u6O?N z#r5QguDthWAyWA+5xTnQIVggsmiv0NwYV>Z67}NIH25k>30&YwWy;>t%m~%SLhU4C z(P*KBtdUSLZDokzB+sb`O4GI3a(KV~j-(C}!&T!6*ve+cZ!>TrJ^KJL#Vu_gyl+lo 
ztU90`rqz;Pn3sZ)NTrHw52=K~N?yMg<*(#h{R%F|AQ&TG2#0u9e*l||!Vk!y8Zwe5 zcT=aER3e-*>j=HJ$Uxe3&Y$TawN8qk5Y*oxr%~bkWK28RlzF&=#C^B5OLZ8uHg(`y zf7RM->DsnfX@&a!o{OIy(5ByV3a`H>!eOn9Y0%|s!tC7 z;9H7uoRqi1WYBxiHAC=y6qS_T=;5d97ZBp9D|{M>e-ep zI74jVmz_xGI>^)|PF#oi?k5uGOH?HWT4-<_1vB~OHmK-}*He`r;0s|>xS zC3V_$vrU}Mp9_o?_G_ZhGdQR~!@p@mlNQqWDChWL#^1Q$yfM)n29So@OApd`XyH7y z(&L z$(bx1E1R-FGMT|k!n{BGJj459re%6ADl;U8Yvh`xB)p06Sd_YTW#x0tf!jpVvGEUU zy70q5DOb>I?{D+oY|UR)$zaPloZ z>e6nFN2|hYQbcaQsDu$kq z`cR=e&NHw`8RN(-6J*x~zSHR#ZUd8B-cI|@8_~eR1vFN+!r{G0MHnOS56E8)Bp_{o z30)!Z8Lv*obfgNG4b0B#b0?}z50SfWM-}dXmlAsqlM3-Y4;LUV_znBB`g7g~>}Zkm z)F8ZwIES89IBI-j{2$m$^OH`mlUf_=ed`9iFpxXrS>Yc|1tK0yd_$h>M-?7T@p=ja zOuUomRYE-PIweeSL}R_8EPw22j>R#4U(l>Yb<`l0Wb|qe&59HqeN_GU2wtK3yJ!eE zno3fDi+f{mCkEr6qG4|6XlkMYa*e0H{LBqnowF1O+9%|s4J<{@NIi;r5VPpN#q(vPE(6eH|<~Ik|c`LlqlEH6(rXq zregzx*aN8-vW&Y9FMtliA{TKoT1Q@>_o%iP{3HtOT9kjd)!5M5Sj61W!uF>eTgS-3cQT>`UHiMjR?ke{1xQZN zYdgK*6O`<$i^+_G8e726{#>+#IGE;sk_MzD(vIcfUeW4$TP^RK6K+qLSkR!WXS z99SGw1%t~wCKAhloq{2q^m!LC46c{OTx3@v!DB85%iG42)xoh^J{(17MsbA)qt*^J zK%G@C4swbK@Iw}@Z?IFpMwUs z_Wrj53J!+AI0Kn3MvR^n=r)mW@%%Lw^J~Bw!Y4z8SE97&E5eR5&}lWmG24~pnuYyd zDB^ghV}>2wxFPnESFIhIbGrnwRN@fk%do|z1TBf@r&!R_Om8%o%=^c%=u$D0v2pFF z?GSm>3ZOTDv+eOU`OQOP^NmSH*0*>V)v>PT4CQ#aAI~LJ){@Q?)FI^8Z(wNcWf}YI zkr5YSL$4$|Buws2_aR6)gYhuT&6vVqpDwu>zdvy?Qc1{DEY)3px^jq~^FQexYpHZ( zb#wlZy%v~G(IVYhP>Dt+bb*> zqp(=h!Zcg9w@MvL=yb%t0dk7LrdU>#np+!`^{o$a$Wwa(ci@ROkf`wOrw`&>(wZ&S zvca=3c~l1P-}k`yDxLyy!bLcz^{nBA);meYDBv;aX1|3O+o=a8K?dKmMJaAzr)zXX5|7o9SO) z>|fXau(wH3=C20+y4B)ez@OJV@L&FE2hZSqoUsir!wEW$`?^CNk4BUbnHh-CD z{SN)T!1@D9Pxue$@8#C-7JjcK{;&{4^vl9uDvRIIf6dB&U;zMkuxkD-NB<80tA_p= f4o~(c_&*g@Q3e_u8$b8-AOKpyWxyct=iUDSo2soJ literal 0 HcmV?d00001 From 9490e010dfdaeacb450fc28b1883e873e8d29c8f Mon Sep 17 00:00:00 2001 From: Sweaty Handshake Date: Mon, 11 Mar 2024 18:35:58 +0000 Subject: [PATCH 2/9] Improve multisale language --- README.Rmd | 5 ++++- README.md | 15 +++++++++------ 2 files changed, 13 insertions(+), 7 deletions(-) diff 
--git a/README.Rmd b/README.Rmd index 15018a6..a490e62 100644 --- a/README.Rmd +++ b/README.Rmd @@ -134,7 +134,10 @@ Percentage of ownership is the single most important feature in the condo model. ### Multisales -The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the accurate market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many are sold bundled with deeded parking spaces ("nonlivable" parcels) that are separate parcels and these two-parcel sales are highly reflective of a livable parcel's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. +The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the accurate market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. For example, a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale is adjusted to \$80,000: +$$ +\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000 +$$ ## Condo Strata diff --git a/README.md b/README.md index 04e6cbd..57034aa 100644 --- a/README.md +++ b/README.md @@ -228,12 +228,15 @@ The condo model is trained on a select number of “multisales” in addition to single-parcel sales. 
Multisales are sales that include more than one parcel and rarely reflect the accurate market price the included parcels would fetch if they were sold individually. In the case -of condominiums, however, many are sold bundled with deeded parking -spaces (“nonlivable” parcels) that are separate parcels and these -two-parcel sales are highly reflective of a livable parcel’s actual -market price. We split the total value of these two-parcel sales -according to their relative percent of ownership before using them for -training. +of condominiums, however, many units are sold bundled with deeded +parking spaces that are separate parcels and these two-parcel sales are +highly reflective of the unit’s actual market price. We split the total +value of these two-parcel sales according to their relative percent of +ownership before using them for training. For example, a \$100,000 sale +of a unit (4% ownership) and a parking space (1% ownership), the sale is +adjusted to \$80,000: $$ +\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000 +$$ ## Condo Strata From b04733c4bba07fd0a057bfbf967065e8846dd875 Mon Sep 17 00:00:00 2001 From: Sweaty Handshake Date: Mon, 11 Mar 2024 18:40:16 +0000 Subject: [PATCH 3/9] Trying to fix github rendering of formula --- README.Rmd | 4 +--- README.md | 5 ++--- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/README.Rmd b/README.Rmd index a490e62..82c88a0 100644 --- a/README.Rmd +++ b/README.Rmd @@ -135,9 +135,7 @@ Percentage of ownership is the single most important feature in the condo model. ### Multisales The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the accurate market price the included parcels would fetch if they were sold individually. 
In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. For example, a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale is adjusted to \$80,000: -$$ -\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000 -$$ +$$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ ## Condo Strata diff --git a/README.md b/README.md index 57034aa..da3b0b3 100644 --- a/README.md +++ b/README.md @@ -234,9 +234,8 @@ highly reflective of the unit’s actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. For example, a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale is -adjusted to \$80,000: $$ -\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000 -$$ +adjusted to \$80,000: +$$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ ## Condo Strata From 37e067611cf4e18b3335665cfa05dc61ac53a916 Mon Sep 17 00:00:00 2001 From: Sweaty Handshake Date: Mon, 11 Mar 2024 18:41:46 +0000 Subject: [PATCH 4/9] Trying to fix github rendering of formula --- README.Rmd | 1 + README.md | 1 + 2 files changed, 2 insertions(+) diff --git a/README.Rmd b/README.Rmd index 82c88a0..0ba27a7 100644 --- a/README.Rmd +++ b/README.Rmd @@ -135,6 +135,7 @@ Percentage of ownership is the single most important feature in the condo model. ### Multisales The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the accurate market price the included parcels would fetch if they were sold individually. 
In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. For example, a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale is adjusted to \$80,000: + $$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ ## Condo Strata diff --git a/README.md b/README.md index da3b0b3..234bc4d 100644 --- a/README.md +++ b/README.md @@ -235,6 +235,7 @@ value of these two-parcel sales according to their relative percent of ownership before using them for training. For example, a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale is adjusted to \$80,000: + $$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ ## Condo Strata From 3aef95371043606574b6ec5927e399e90ee8106d Mon Sep 17 00:00:00 2001 From: Sweaty Handshake Date: Mon, 11 Mar 2024 18:43:28 +0000 Subject: [PATCH 5/9] Improve multisale language --- README.Rmd | 2 +- README.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/README.Rmd b/README.Rmd index 0ba27a7..3cdd7e5 100644 --- a/README.Rmd +++ b/README.Rmd @@ -134,7 +134,7 @@ Percentage of ownership is the single most important feature in the condo model. ### Multisales -The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the accurate market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit's actual market price. 
We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. For example, a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale is adjusted to \$80,000: +The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the accurate market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. For a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale would be adjusted to \$80,000: $$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ diff --git a/README.md b/README.md index 234bc4d..9f8c9f8 100644 --- a/README.md +++ b/README.md @@ -232,8 +232,8 @@ of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit’s actual market price. We split the total value of these two-parcel sales according to their relative percent of -ownership before using them for training. For example, a \$100,000 sale -of a unit (4% ownership) and a parking space (1% ownership), the sale is +ownership before using them for training. 
For a \$100,000 sale of a unit +(4% ownership) and a parking space (1% ownership), the sale would be adjusted to \$80,000: $$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ From 9d2be18e52c7986bb75be410595b6cfeceb6a8f4 Mon Sep 17 00:00:00 2001 From: Sweaty Handshake Date: Mon, 11 Mar 2024 18:47:01 +0000 Subject: [PATCH 6/9] Remove superfluous language --- README.Rmd | 2 +- README.md | 17 ++++++++--------- 2 files changed, 9 insertions(+), 10 deletions(-) diff --git a/README.Rmd b/README.Rmd index 3cdd7e5..fc05d0a 100644 --- a/README.Rmd +++ b/README.Rmd @@ -134,7 +134,7 @@ Percentage of ownership is the single most important feature in the condo model. ### Multisales -The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the accurate market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. For a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale would be adjusted to \$80,000: +The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. 
For a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale would be adjusted to \$80,000: $$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ diff --git a/README.md b/README.md index 9f8c9f8..e8c97ba 100644 --- a/README.md +++ b/README.md @@ -226,15 +226,14 @@ values. The condo model is trained on a select number of “multisales” in addition to single-parcel sales. Multisales are sales that include more -than one parcel and rarely reflect the accurate market price the -included parcels would fetch if they were sold individually. In the case -of condominiums, however, many units are sold bundled with deeded -parking spaces that are separate parcels and these two-parcel sales are -highly reflective of the unit’s actual market price. We split the total -value of these two-parcel sales according to their relative percent of -ownership before using them for training. For a \$100,000 sale of a unit -(4% ownership) and a parking space (1% ownership), the sale would be -adjusted to \$80,000: +than one parcel and rarely reflect the market price the included parcels +would fetch if they were sold individually. In the case of condominiums, +however, many units are sold bundled with deeded parking spaces that are +separate parcels and these two-parcel sales are highly reflective of the +unit’s actual market price. We split the total value of these two-parcel +sales according to their relative percent of ownership before using them +for training. 
For a \$100,000 sale of a unit (4% ownership) and a +parking space (1% ownership), the sale would be adjusted to \$80,000: $$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ From 39bab02b1cc319884817cfde9fc7a41e0b2266d9 Mon Sep 17 00:00:00 2001 From: Sweaty Handshake Date: Mon, 18 Mar 2024 13:29:28 +0000 Subject: [PATCH 7/9] Fix line ending --- .gitignore | 58 +++--- README.Rmd | 542 ++++++++++++++++++++++++++--------------------------- 2 files changed, 300 insertions(+), 300 deletions(-) diff --git a/.gitignore b/.gitignore index 15cd087..3a4727e 100644 --- a/.gitignore +++ b/.gitignore @@ -1,29 +1,29 @@ -# History files -.Rhistory -.Rapp.history - -# R project files -.Rproj.user/ -reports/*_files/ - -# knitr and R markdown default cache directories -*_cache/ -cache/ - -# Temporary files created by R markdown -*.utf8.md -*.knit.md - -# Ignore all data files -*.parquet -*.rds -*.zip -*.csv -*.xlsx -!condo_nonlivable_demo.xlsx -*.xlsm -*.html -*.rmarkdown - -# Ignore scratch documents -scratch*.* +# History files +.Rhistory +.Rapp.history + +# R project files +.Rproj.user/ +reports/*_files/ + +# knitr and R markdown default cache directories +*_cache/ +cache/ + +# Temporary files created by R markdown +*.utf8.md +*.knit.md + +# Ignore all data files +*.parquet +*.rds +*.zip +*.csv +*.xlsx +!condo_nonlivable_demo.xlsx +*.xlsm +*.html +*.rmarkdown + +# Ignore scratch documents +scratch*.* diff --git a/README.Rmd b/README.Rmd index fc05d0a..c7e7d29 100644 --- a/README.Rmd +++ b/README.Rmd @@ -1,271 +1,271 @@ ---- -title: "Table of Contents" -output: - github_document: - toc: true - toc_depth: 3 ---- - - - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.path = "docs/figures/", - out.width = "100%" -) -``` - -> :warning: **NOTE** :warning: -> -> The [condominium model](https://github.com/ccao-data/model-condo-avm) (this repo) is nearly identical to the [residential (single/multi-family) 
model](https://github.com/ccao-data/model-res-avm), with a few [key differences](#differences-compared-to-the-residential-model). Please read the documentation for the [residential model](https://github.com/ccao-data/model-res-avm) first. - -# Prior Models - -This repository contains code, data, and documentation for the Cook County Assessor's condominium reassessment model. Information about prior year models can be found at the following links: - -| Year(s) | Triad(s) | Method | Language / Framework | Link | -|---------|----------|---------------------------------------------|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| -| 2015 | City | N/A | SPSS | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev/-/tree/master/code.legacy/2015%20City%20Tri/2015%20Condo%20Models) | -| 2018 | City | N/A | N/A | Not available. Values provided by vendor | -| 2019 | North | Linear regression or GBM model per township | R (Base) | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev) | -| 2020 | South | Linear regression or GBM model per township | R (Base) | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev) | -| 2021 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2021-assessment-year) | -| 2022 | North | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2022-assessment-year) | -| 2023 | South | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2023-assessment-year) | -| 2024 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2024-assessment-year) | - -# Model Overview - -The duty of the Cook County Assessor's Office is to 
value property in a fair, accurate, and transparent way. The Assessor is committed to transparency throughout the assessment process. As such, this document contains: - -* [A description of the differences between the residential model and this (condominium) model](#differences-compared-to-the-residential-model) -* [An outline of ongoing issues specific to condominium assessments](#ongoing-issues) - -The repository itself contains the [code](./pipeline) for the Automated Valuation Model (AVM) used to generate initial assessed values for all condominium properties in Cook County. This system is effectively an advanced machine learning model (hereafter referred to as "the model"). It uses previous sales to generate estimated sale values (assessments) for all properties. - -## Differences Compared to the Residential Model - -The Cook County Assessor's Office has begun to track a limited number of characteristics (building-level square footage and unit-level square footage, bedrooms, and bathrooms) for condominiums, but the data we have ***varies in both the characteristics available and their completeness*** between triads. Staffing limitations have forced the office to prioritizes smaller condo buildings less likely to have recent unit sales in certain parts of the county. Like most assessors nationwide, our office staff cannot enter buildings to observe property characteristics. For condos, this means we cannot observe amenities, quality, or any other interior characteristics which must instead be gathered from listings and a number of additional third-party sources. - -The only complete information our office currently has about individual condominium units is their age, location, sale date/price, and percentage of ownership. This makes modeling condos particularly challenging, as the number of usable features is quite small. Fortunately, condos have two qualities which make modeling a bit easier: - -1. 
Condos are more homogeneous than single/multi-family properties, i.e. the range of potential condo sale prices is much narrower. -2. Condo are pre-grouped into clusters of like units (buildings), and units within the same building usually have similar sale prices. - -We leverage these qualities to produce what we call ***strata***, a feature unique to the condo model. See [Condo Strata](#condo-strata) for more information about how strata is used and calculated. - -### Features Used - -Because our individual condo unit characteristics are sparse and incomplete, we must rely on aggregate geospatial features, economic features, [strata](#condo-strata), and time of sale to determine condo assessed values. The features in the table below are the ones used in the 2023 assessment model. - -```{r features_used, message=FALSE, echo=FALSE} -library(dplyr) -library(tidyr) -library(yaml) - -condo_params <- read_yaml("params.yaml") -condo_preds <- condo_params$model$predictor$all - -res_params <- read_yaml( - "https://raw.githubusercontent.com/ccao-data/model-res-avm/master/params.yaml" -) -res_preds <- res_params$model$predictor$all - -condo_unique_preds <- setdiff(condo_preds, res_preds) - -ccao::vars_dict %>% - inner_join( - as_tibble(condo_preds), - by = c("var_name_model" = "value") - ) %>% - distinct( - var_name_model, - `Feature Name` = var_name_pretty, - Category = var_type, - Type = var_data_type, - ) %>% - mutate( - Category = recode( - Category, - char = "Characteristic", - econ = "Economic", - geo = "Geospatial", - ind = "Indicator", - time = "Time", - meta = "Meta" - ), - `Feature Name` = recode( - `Feature Name`, - "Tieback Proration Rate" = "Condominium % Ownership", - "Year Built" = "Condominium Building Year Built" - ) - ) %>% - mutate(`Unique to Condo Model` = ifelse( - var_name_model %in% condo_unique_preds | - `Feature Name` %in% - c("Condominium Building Year Built", "Condominium % Ownership"), - "X", "" - )) %>% - arrange(desc(`Unique to Condo Model`), 
Category) %>% - select(-var_name_model) %>% - knitr::kable(format = "markdown") -``` - -### Valuation - -For the most part, condos are valued the same way as single- and multi-family residential property. We [train a model](https://github.com/ccao-data/model-res-avm#how-it-works) using individual condo unit sales, predict the value of all units, and then apply any [post-modeling adjustment](https://github.com/ccao-data/model-res-avm#post-modeling). - -However, because the CCAO has so [little information about individual units](#differences-compared-to-the-residential-model), we must rely on the [condominium percentage of ownership](#features-used) to differentiate between units in a building. This feature is effectively the proportion of the building's overall value held by a unit. It is created when a condominium declaration is filed with the County (usually by the developer of the building). The critical assumption underlying the condo valuation process is that percentage of ownership correlates with current market value. - -Percentage of ownership is used in two ways: - -1. It is used directly as a predictor/feature in the regression model to estimate differing unit values within the same building. -2. It is used to reapportion unit values directly i.e. the value of a unit is ultimately equal to `% of ownership * total building value`. - -Visually, this looks like: - -![](docs/figures/valuation_perc_owner.png) - -For what the office terms "nonlivable" spaces, i.e. parking spaces, storage space, and common area, the breakout of value works differently. See [this excel sheet](docs/spreadsheets/condo_nonlivable_demo.xlsx) for an interactive example of how nonlivable spaces are valued based on the total value of a building's livable space. - -Percentage of ownership is the single most important feature in the condo model. It determines almost all intra-building differences in unit values. 
- -### Multisales - -The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. For a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale would be adjusted to \$80,000: - -$$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ - -## Condo Strata - -The condo model uses an engineered feature called *strata* to deliver much of its predictive power. Strata is the binned, time-weighted, 5-year average sale price of the building. There are two strata features used in the model, one with 10 bins and one with 300 bins. Buildings are binned across each triad using either quantiles or 1-dimensional k-means. A visual representation of quantile-based strata binning looks like: - -![](docs/figures/strata.png) - -To put strata in more concrete terms, the table below shows a sample 5-level strata. Each condominium unit would be assigned a strata from this table (Strata 1, Strata 2, etc.) based on the 5-year weighted average sale price of its building. All units in a building will have the same strata. - -```{r strata, echo=FALSE} -library(tibble) - -tribble( - ~"Strata", ~"Range of 5-year Average Sale Price", - "Strata 1", "$0 - $121K", - "Strata 2", "$121K - $149K", - "Strata 3", "$149K - $199K", - "Strata 4", "$199K - $276K", - "Strata 5", "$276K+" -) %>% - knitr::kable(format = "markdown") -``` - -Some additional notes on strata: - -- Strata is calculated in the [ingest stage](./pipeline/00-ingest.R) of this repository. 
-- Calculating the 5-year average sale price of a building requires at least 1 sale. Buildings with no sales have their strata imputed via KNN (using year built, number of units, and location as features). -- Number of bins (10 and 100) was chosen based on model performance. These numbers yielded the lowest root mean-squared error (RMSE). - -# Ongoing Issues - -The CCAO faces a number of ongoing issues specific to condominium modeling. We are currently working on processes to fix these issues. We list the issues here for the sake of transparency and to provide a sense of the challenges we face. - -### Unit Heterogeneity - -The current modeling methodology for condominiums makes two assumptions: - -1. Condos units within the same building are similar and will sell for similar amounts. -2. If units are not similar, the percentage of ownership will accurately reflect and be proportional to any difference in value between units. - -The model process works even in heterogeneous buildings as long as assumption 2 is met. For example, imagine a building with 8 identical units and 1 penthouse unit. This building violates assumption 1 because the penthouse unit is likely larger and worth more than the other 10. However, if the percentage of ownership of each unit is roughly proportional to its value, then each unit will still receive a fair assessment. - -However, the model can produce poor results when both of these assumptions are violated. For example, if a building has an extreme mix of different units, each with the same percentage of ownership, then smaller, less expensive units will be overvalued and larger, more expensive units will be undervalued. - -This problem is rare, but does occur in certain buildings with many heterogeneous units. Such buildings typically go through a process of secondary review to ensure the accuracy of the individual unit values. 
- -### Buildings With Few Sales - -The condo model relies on sales within the same building to calculate [strata](#condo-strata). This method works well for large buildings with many sales, but can break down when there are only 1 or 2 sales in a building. The primary danger here is _unrepresentative_ sales, i.e. sales that deviate significantly from the real average value of a building's units. When this happens, buildings can have their average unit sale value pegged too high or low. - -Fortunately, buildings without any recent sales are relatively rare, as condos have a higher turnover rate than single and multi-family property. Smaller buildings with low turnover are the most likely to not have recent sales. - -### Buildings Without Sales - -When no sales have occurred in a building in the 5 years prior to assessment, the building's strata features are imputed. The model will look at nearby buildings that have similar unit counts/age and then try to assign an appropriate strata to the target building. - -Most of the time, this technique produces reasonable results. However, buildings without sales still go through an additional round of review to ensure the accuracy of individual unit values. - -# FAQs - -**Note:** The FAQs listed here are for condo-specific questions. See the residential model documentation for [more general FAQs](https://github.com/ccao-data/model-res-avm#faqs). - -**Q: What are the most important features in the condo model?** - -As with the [residential model](https://github.com/ccao-data/model-res-avm), the importance of individual features varies by location and time. However, generally speaking, the most important features are: - -* Location, location, location. Location is the largest driver of county-wide variation in condo value. We account for location using [geospatial features like neighborhood](#features-used). -* Condo percentage of ownership, which determines the intra-building variation in unit price. 
-* [Condo building strata](#condo-strata). Strata provides us with a good estimate of the average sale price of a building's units. - -**Q: How do I see my condo building's strata?** - -Individual building [strata](#condo-strata) are not included with assessment notices or shown on the CCAO's website. However, strata *are* stored in the sample data included in this repository. You can load the data ([`input/condo_strata_data.parquet`](./input/condo_strata_data.parquet)) using R and the `read_parquet()` function from the `arrow` library. - -**Q: How do I see the assessed value of other units in my building?** - -You can use the [CCAO's Address Search](https://www.cookcountyassessor.com/address-search#address) to see all the PINs and values associated with a specific condominium building, simply leave the `Unit Number` field blank when submitting a search. - -**Q: How do I view my unit's percentage of ownership?** - -The percentage of ownership for individual units is printed on assessment notices. You may also be able to find it via your building's board or condo declaration. - -# Usage - -Installation and usage of this model is identical to the [installation and usage of the residential model](https://github.com/ccao-data/model-res-avm#usage). Please follow the instructions listed there. - -## Getting Data - -The data required to run these scripts is produced by the [ingest stage](pipeline/00-ingest.R), which uses SQL pulls from the CCAO's Athena database as a primary data source. CCAO employees can run the ingest stage or pull the latest version of the input data from our internal DVC store using: - -```bash -dvc pull -``` - -Public users can download data for each assessment year using the links below. Each file should be placed in the `input/` directory prior to running the model pipeline. 
- -#### 2021 - -- [assmntdata.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2021/assmntdata.parquet) -- [modeldata.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2021/modeldata.parquet) - -#### 2022 - -- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/assessment_data.parquet) -- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/condo_strata_data.parquet) -- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/land_nbhd_rate_data.parquet) -- [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/training_data.parquet) - -#### 2023 - -- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/assessment_data.parquet) -- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/condo_strata_data.parquet) -- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/land_nbhd_rate_data.parquet) -- [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/training_data.parquet) - -#### 2024 - -- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/assessment_data.parquet) -- [char_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/char_data.parquet) -- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/condo_strata_data.parquet) -- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/land_nbhd_rate_data.parquet) -- 
[training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/training_data.parquet) - -For other data from the CCAO, please visit the [Cook County Data Portal](https://datacatalog.cookcountyil.gov/). - -# License - -Distributed under the AGPL-3 License. See [LICENSE](./LICENSE) for more information. - -# Contributing - -We welcome pull requests, comments, and other feedback via GitHub. For more involved collaboration or projects, please see the [Developer Engagement Program](https://github.com/ccao-data/people#external) documentation on our group wiki. +--- +title: "Table of Contents" +output: + github_document: + toc: true + toc_depth: 3 +--- + + + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + fig.path = "docs/figures/", + out.width = "100%" +) +``` + +> :warning: **NOTE** :warning: +> +> The [condominium model](https://github.com/ccao-data/model-condo-avm) (this repo) is nearly identical to the [residential (single/multi-family) model](https://github.com/ccao-data/model-res-avm), with a few [key differences](#differences-compared-to-the-residential-model). Please read the documentation for the [residential model](https://github.com/ccao-data/model-res-avm) first. + +# Prior Models + +This repository contains code, data, and documentation for the Cook County Assessor's condominium reassessment model. Information about prior year models can be found at the following links: + +| Year(s) | Triad(s) | Method | Language / Framework | Link | +|---------|----------|---------------------------------------------|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| +| 2015 | City | N/A | SPSS | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev/-/tree/master/code.legacy/2015%20City%20Tri/2015%20Condo%20Models) | +| 2018 | City | N/A | N/A | Not available. 
Values provided by vendor | +| 2019 | North | Linear regression or GBM model per township | R (Base) | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev) | +| 2020 | South | Linear regression or GBM model per township | R (Base) | [Link](https://gitlab.com/ccao-data-science---modeling/ccao_sf_cama_dev) | +| 2021 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2021-assessment-year) | +| 2022 | North | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2022-assessment-year) | +| 2023 | South | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2023-assessment-year) | +| 2024 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2024-assessment-year) | + +# Model Overview + +The duty of the Cook County Assessor's Office is to value property in a fair, accurate, and transparent way. The Assessor is committed to transparency throughout the assessment process. As such, this document contains: + +* [A description of the differences between the residential model and this (condominium) model](#differences-compared-to-the-residential-model) +* [An outline of ongoing issues specific to condominium assessments](#ongoing-issues) + +The repository itself contains the [code](./pipeline) for the Automated Valuation Model (AVM) used to generate initial assessed values for all condominium properties in Cook County. This system is effectively an advanced machine learning model (hereafter referred to as "the model"). It uses previous sales to generate estimated sale values (assessments) for all properties. 
+ +## Differences Compared to the Residential Model + +The Cook County Assessor's Office has begun to track a limited number of characteristics (building-level square footage and unit-level square footage, bedrooms, and bathrooms) for condominiums, but the data we have ***varies in both the characteristics available and their completeness*** between triads. Staffing limitations have forced the office to prioritize smaller condo buildings less likely to have recent unit sales in certain parts of the county. Like most assessors nationwide, our office staff cannot enter buildings to observe property characteristics. For condos, this means we cannot observe amenities, quality, or any other interior characteristics, which must instead be gathered from listings and a number of additional third-party sources. + +The only complete information our office currently has about individual condominium units is their age, location, sale date/price, and percentage of ownership. This makes modeling condos particularly challenging, as the number of usable features is quite small. Fortunately, condos have two qualities that make modeling a bit easier: + +1. Condos are more homogeneous than single/multi-family properties, i.e., the range of potential condo sale prices is much narrower. +2. Condos are pre-grouped into clusters of like units (buildings), and units within the same building usually have similar sale prices. + +We leverage these qualities to produce what we call ***strata***, a feature unique to the condo model. See [Condo Strata](#condo-strata) for more information about how strata is used and calculated. + +### Features Used + +Because our individual condo unit characteristics are sparse and incomplete, we must rely on aggregate geospatial features, economic features, [strata](#condo-strata), and time of sale to determine condo assessed values. The features in the table below are the ones used in the 2023 assessment model.
+ +```{r features_used, message=FALSE, echo=FALSE} +library(dplyr) +library(tidyr) +library(yaml) + +condo_params <- read_yaml("params.yaml") +condo_preds <- condo_params$model$predictor$all + +res_params <- read_yaml( + "https://raw.githubusercontent.com/ccao-data/model-res-avm/master/params.yaml" +) +res_preds <- res_params$model$predictor$all + +condo_unique_preds <- setdiff(condo_preds, res_preds) + +ccao::vars_dict %>% + inner_join( + as_tibble(condo_preds), + by = c("var_name_model" = "value") + ) %>% + distinct( + var_name_model, + `Feature Name` = var_name_pretty, + Category = var_type, + Type = var_data_type, + ) %>% + mutate( + Category = recode( + Category, + char = "Characteristic", + econ = "Economic", + geo = "Geospatial", + ind = "Indicator", + time = "Time", + meta = "Meta" + ), + `Feature Name` = recode( + `Feature Name`, + "Tieback Proration Rate" = "Condominium % Ownership", + "Year Built" = "Condominium Building Year Built" + ) + ) %>% + mutate(`Unique to Condo Model` = ifelse( + var_name_model %in% condo_unique_preds | + `Feature Name` %in% + c("Condominium Building Year Built", "Condominium % Ownership"), + "X", "" + )) %>% + arrange(desc(`Unique to Condo Model`), Category) %>% + select(-var_name_model) %>% + knitr::kable(format = "markdown") +``` + +### Valuation + +For the most part, condos are valued the same way as single- and multi-family residential property. We [train a model](https://github.com/ccao-data/model-res-avm#how-it-works) using individual condo unit sales, predict the value of all units, and then apply any [post-modeling adjustment](https://github.com/ccao-data/model-res-avm#post-modeling). + +However, because the CCAO has so [little information about individual units](#differences-compared-to-the-residential-model), we must rely on the [condominium percentage of ownership](#features-used) to differentiate between units in a building. This feature is effectively the proportion of the building's overall value held by a unit. 
It is created when a condominium declaration is filed with the County (usually by the developer of the building). The critical assumption underlying the condo valuation process is that percentage of ownership correlates with current market value. + +Percentage of ownership is used in two ways: + +1. It is used directly as a predictor/feature in the regression model to estimate differing unit values within the same building. +2. It is used to reapportion unit values directly, i.e., the value of a unit is ultimately equal to `% of ownership * total building value`. + +Visually, this looks like: + +![](docs/figures/valuation_perc_owner.png) + +For what the office terms "nonlivable" spaces, i.e., parking spaces, storage spaces, and common areas, the breakout of value works differently. See [this Excel sheet](docs/spreadsheets/condo_nonlivable_demo.xlsx) for an interactive example of how nonlivable spaces are valued based on the total value of a building's livable space. + +Percentage of ownership is the single most important feature in the condo model. It determines almost all intra-building differences in unit values. + +### Multisales + +The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels, and these two-parcel sales are highly reflective of the unit's actual market price. We split the total price of these two-parcel sales according to their relative percentages of ownership before using them for training.
For a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale would be adjusted to \$80,000: + +$$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ + +## Condo Strata + +The condo model uses an engineered feature called *strata* to deliver much of its predictive power. Strata is the binned, time-weighted, 5-year average sale price of the building. There are two strata features used in the model, one with 10 bins and one with 300 bins. Buildings are binned across each triad using either quantiles or 1-dimensional k-means. A visual representation of quantile-based strata binning looks like: + +![](docs/figures/strata.png) + +To put strata in more concrete terms, the table below shows a sample 5-level strata. Each condominium unit would be assigned a strata from this table (Strata 1, Strata 2, etc.) based on the 5-year weighted average sale price of its building. All units in a building will have the same strata. + +```{r strata, echo=FALSE} +library(tibble) + +tribble( + ~"Strata", ~"Range of 5-year Average Sale Price", + "Strata 1", "$0 - $121K", + "Strata 2", "$121K - $149K", + "Strata 3", "$149K - $199K", + "Strata 4", "$199K - $276K", + "Strata 5", "$276K+" +) %>% + knitr::kable(format = "markdown") +``` + +Some additional notes on strata: + +- Strata is calculated in the [ingest stage](./pipeline/00-ingest.R) of this repository. +- Calculating the 5-year average sale price of a building requires at least 1 sale. Buildings with no sales have their strata imputed via KNN (using year built, number of units, and location as features). +- Number of bins (10 and 300) was chosen based on model performance. These numbers yielded the lowest root mean-squared error (RMSE). + +# Ongoing Issues + +The CCAO faces a number of ongoing issues specific to condominium modeling. We are currently working on processes to fix these issues. We list the issues here for the sake of transparency and to provide a sense of the challenges we face.
+ +### Unit Heterogeneity + +The current modeling methodology for condominiums makes two assumptions: + +1. Condo units within the same building are similar and will sell for similar amounts. +2. If units are not similar, the percentage of ownership will accurately reflect and be proportional to any difference in value between units. + +The model process works even in heterogeneous buildings as long as assumption 2 is met. For example, imagine a building with 8 identical units and 1 penthouse unit. This building violates assumption 1 because the penthouse unit is likely larger and worth more than the other 8. However, if the percentage of ownership of each unit is roughly proportional to its value, then each unit will still receive a fair assessment. + +However, the model can produce poor results when both of these assumptions are violated. For example, if a building has an extreme mix of different units, each with the same percentage of ownership, then smaller, less expensive units will be overvalued and larger, more expensive units will be undervalued. + +This problem is rare but does occur in certain buildings with many heterogeneous units. Such buildings typically go through a process of secondary review to ensure the accuracy of the individual unit values. + +### Buildings With Few Sales + +The condo model relies on sales within the same building to calculate [strata](#condo-strata). This method works well for large buildings with many sales, but can break down when there are only 1 or 2 sales in a building. The primary danger here is _unrepresentative_ sales, i.e., sales that deviate significantly from the real average value of a building's units. When this happens, buildings can have their average unit sale value pegged too high or too low. + +Fortunately, buildings without any recent sales are relatively rare, as condos have a higher turnover rate than single- and multi-family property.
Smaller buildings with low turnover are the most likely to not have recent sales. + +### Buildings Without Sales + +When no sales have occurred in a building in the 5 years prior to assessment, the building's strata features are imputed. The model will look at nearby buildings that have similar unit counts/age and then try to assign an appropriate strata to the target building. + +Most of the time, this technique produces reasonable results. However, buildings without sales still go through an additional round of review to ensure the accuracy of individual unit values. + +# FAQs + +**Note:** The FAQs listed here are for condo-specific questions. See the residential model documentation for [more general FAQs](https://github.com/ccao-data/model-res-avm#faqs). + +**Q: What are the most important features in the condo model?** + +As with the [residential model](https://github.com/ccao-data/model-res-avm), the importance of individual features varies by location and time. However, generally speaking, the most important features are: + +* Location, location, location. Location is the largest driver of county-wide variation in condo value. We account for location using [geospatial features like neighborhood](#features-used). +* Condo percentage of ownership, which determines the intra-building variation in unit price. +* [Condo building strata](#condo-strata). Strata provides us with a good estimate of the average sale price of a building's units. + +**Q: How do I see my condo building's strata?** + +Individual building [strata](#condo-strata) are not included with assessment notices or shown on the CCAO's website. However, strata *are* stored in the sample data included in this repository. You can load the data ([`input/condo_strata_data.parquet`](./input/condo_strata_data.parquet)) using R and the `read_parquet()` function from the `arrow` library. 
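To illustrate the quantile-based binning described under Condo Strata, here is a minimal R sketch. It assumes each building's time-weighted 5-year average sale price has already been computed, and it only shows the quantile variant (not 1-dimensional k-means). The function `assign_strata()` and the prices are hypothetical; this is not the code used in the ingest stage.

```r
# Hypothetical sketch of quantile-based strata binning (illustrative only).
# Assumes building-level average sale prices are already computed; buildings
# with no sales would need their strata imputed separately.
assign_strata <- function(avg_price, k) {
  # Quantile cut points that split buildings into k roughly equal-sized bins
  breaks <- quantile(
    avg_price,
    probs = seq(0, 1, length.out = k + 1),
    names = FALSE,
    na.rm = TRUE
  )
  # Integer strata labels: 1 = lowest-priced bin, k = highest-priced bin.
  # unique() avoids an error from cut() when tied prices create duplicate
  # cut points (at the cost of producing fewer than k bins)
  cut(avg_price, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# Hypothetical 5-year average sale prices for 10 buildings
building_avg_price <- c(
  95000, 130000, 160000, 210000, 300000,
  120000, 180000, 250000, 400000, 140000
)
strata <- assign_strata(building_avg_price, k = 5)
print(strata)
#> [1] 1 2 3 4 5 1 3 4 5 2
```

Every unit in a building inherits that building's strata, so unit-level data would then be joined to this building-level result.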
+ +**Q: How do I see the assessed value of other units in my building?** + +You can use the [CCAO's Address Search](https://www.cookcountyassessor.com/address-search#address) to see all the PINs and values associated with a specific condominium building. Simply leave the `Unit Number` field blank when submitting a search. + +**Q: How do I view my unit's percentage of ownership?** + +The percentage of ownership for individual units is printed on assessment notices. You may also be able to find it via your building's board or condo declaration. + +# Usage + +Installation and usage of this model are identical to the [installation and usage of the residential model](https://github.com/ccao-data/model-res-avm#usage). Please follow the instructions listed there. + +## Getting Data + +The data required to run these scripts is produced by the [ingest stage](pipeline/00-ingest.R), which uses SQL pulls from the CCAO's Athena database as a primary data source. CCAO employees can run the ingest stage or pull the latest version of the input data from our internal DVC store using: + +```bash +dvc pull +``` + +Public users can download data for each assessment year using the links below. Each file should be placed in the `input/` directory prior to running the model pipeline.
+ +#### 2021 + +- [assmntdata.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2021/assmntdata.parquet) +- [modeldata.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2021/modeldata.parquet) + +#### 2022 + +- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/assessment_data.parquet) +- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/condo_strata_data.parquet) +- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/land_nbhd_rate_data.parquet) +- [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2022/training_data.parquet) + +#### 2023 + +- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/assessment_data.parquet) +- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/condo_strata_data.parquet) +- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/land_nbhd_rate_data.parquet) +- [training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2023/training_data.parquet) + +#### 2024 + +- [assessment_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/assessment_data.parquet) +- [char_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/char_data.parquet) +- [condo_strata_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/condo_strata_data.parquet) +- [land_nbhd_rate_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/land_nbhd_rate_data.parquet) +- 
[training_data.parquet](https://ccao-data-public-us-east-1.s3.amazonaws.com/models/inputs/condo/2024/training_data.parquet) + +For other data from the CCAO, please visit the [Cook County Data Portal](https://datacatalog.cookcountyil.gov/). + +# License + +Distributed under the AGPL-3 License. See [LICENSE](./LICENSE) for more information. + +# Contributing + +We welcome pull requests, comments, and other feedback via GitHub. For more involved collaboration or projects, please see the [Developer Engagement Program](https://github.com/ccao-data/people#external) documentation on our group wiki. From 7e7a635c967afb8b01f60287210752988fcd28d4 Mon Sep 17 00:00:00 2001 From: Sweaty Handshake Date: Tue, 19 Mar 2024 21:03:52 +0000 Subject: [PATCH 8/9] Add strata back to model features --- README.Rmd | 97 +++++++++++++++++++++++++++++--- README.md | 159 ++++++++++++++++++++++++++++------------------------- 2 files changed, 171 insertions(+), 85 deletions(-) diff --git a/README.Rmd b/README.Rmd index c7e7d29..d6544e9 100644 --- a/README.Rmd +++ b/README.Rmd @@ -62,29 +62,108 @@ Because our individual condo unit characteristics are sparse and incomplete, we ```{r features_used, message=FALSE, echo=FALSE} library(dplyr) +library(glue) +library(jsonlite) +library(purrr) library(tidyr) library(yaml) condo_params <- read_yaml("params.yaml") -condo_preds <- condo_params$model$predictor$all +condo_preds <- as_tibble(condo_params$model$predictor$all) + +# Some values are derived in the model itself, so they are not documented +# in the dbt DAG and need to be documented here +# nolint start +hardcoded_descriptions <- tribble( + ~"column", ~"description", + "sale_year", "Sale year calculated as the number of years since 0 B.C.E", + "sale_day", + "Sale day calculated as the number of days since January 1st, 1997", + "sale_quarter_of_year", "Character encoding of quarter of year (Q1 - Q4)", + "sale_month_of_year", "Character encoding of month of year (Jan - Dec)", + "sale_day_of_year", 
"Numeric encoding of day of year (1 - 365)", + "sale_day_of_month", "Numeric encoding of day of month (1 - 31)", + "sale_day_of_week", "Numeric encoding of day of week (1 - 7)", + "sale_post_covid", "Indicator for whether sale occurred after COVID-19 was widely publicized (around March 15, 2020)", + "strata_1", + glue("Condominium Building Strata - {condo_params$input$strata$k_1} Levels"), + "strata_2", + glue("Condominium Building Strata - {condo_params$input$strata$k_2} Levels") + # nolint end +) + +# Load the dbt DAG from our prod docs site +dbt_manifest <- fromJSON( + "https://ccao-data.github.io/data-architecture/manifest.json" +) + +# nolint start: cyclomp_linter +get_column_description <- function(colname, dag_nodes, hardcoded_descriptions) { + # Retrieve the description for a column `colname` either from a set of + # dbt DAG nodes (`dag_nodes`) or a set of hardcoded descriptions + # (`hardcoded_descriptions`). Column descriptions that come from dbt DAG nodes + # will be truncated starting from the first period to reflect the fact that + # we use periods in our dbt documentation to separate high-level column + # summaries from their detailed notes + # + # Prefer the hardcoded descriptions, if they exist + if (colname %in% hardcoded_descriptions$column) { + return( + hardcoded_descriptions[ + match(colname, hardcoded_descriptions$column), + ]$description + ) + } + # If no hardcoded description exists, fall back to checking the dbt DAG + for (node_name in ls(dag_nodes)) { + node <- dag_nodes[[node_name]] + for (column_name in ls(node$columns)) { + if (column_name == colname) { + description <- node$columns[[column_name]]$description + if (!is.null(description) && trimws(description) != "") { + # Strip everything after the first period, since we use the first + # period as a delimiter separating a column's high-level summary from + # its detailed notes in our dbt docs + summary_description <- strsplit(description, ".", fixed = TRUE)[[1]][1] + return(gsub("\n", 
" ", summary_description)) + } + } + } + } + # No match in either the hardcoded descriptions or the dbt DAG, so fall + # back to an empty string + return("") +} +# nolint end + +# Make a vector of column descriptions that we can add to the param tibble +# as a new column +param_notes <- condo_preds$value %>% + ccao::vars_rename(names_from = "model", names_to = "athena") %>% + map(~ get_column_description( + .x, dbt_manifest$nodes, hardcoded_descriptions + )) %>% + unlist() res_params <- read_yaml( "https://raw.githubusercontent.com/ccao-data/model-res-avm/master/params.yaml" ) res_preds <- res_params$model$predictor$all -condo_unique_preds <- setdiff(condo_preds, res_preds) +condo_unique_preds <- setdiff(condo_preds$value, res_preds) -ccao::vars_dict %>% - inner_join( - as_tibble(condo_preds), - by = c("var_name_model" = "value") +condo_preds %>% + mutate(description = param_notes) %>% + left_join( + ccao::vars_dict, + by = c("value" = "var_name_model") ) %>% distinct( - var_name_model, `Feature Name` = var_name_pretty, Category = var_type, Type = var_data_type, + Notes = description, + value, ) %>% mutate( Category = recode( @@ -103,13 +182,13 @@ ccao::vars_dict %>% ) ) %>% mutate(`Unique to Condo Model` = ifelse( - var_name_model %in% condo_unique_preds | + value %in% condo_unique_preds | `Feature Name` %in% c("Condominium Building Year Built", "Condominium % Ownership"), "X", "" )) %>% arrange(desc(`Unique to Condo Model`), Category) %>% - select(-var_name_model) %>% + select(-value) %>% knitr::kable(format = "markdown") ``` diff --git a/README.md b/README.md index e8c97ba..64d58b1 100644 --- a/README.md +++ b/README.md @@ -105,82 +105,89 @@ features, [strata](#condo-strata), and time of sale to determine condo assessed values. The features in the table below are the ones used in the 2023 assessment model. 
-| Feature Name | Category | Type | Unique to Condo Model | -|:------------------------------------------------------------------------|:---------------|:----------|:----------------------| -| Condominium Building Year Built | Characteristic | numeric | X | -| Total Condominium Building Non-Livable Parcels | Characteristic | numeric | X | -| Total Condominium Building Livable Parcels | Characteristic | numeric | X | -| Total Condominium Building Square Footage | Characteristic | numeric | X | -| Condominium Unit Square Footage | Characteristic | numeric | X | -| Condominium Unit Bedrooms | Characteristic | numeric | X | -| Condominium Unit Half Baths | Characteristic | numeric | X | -| Condominium Unit Full Baths | Characteristic | numeric | X | -| Condominium Building Is Mixed Use | Characteristic | logical | X | -| Condominium % Ownership | Meta | numeric | X | -| Land Square Feet | Characteristic | numeric | | -| Township Code | Meta | character | | -| Neighborhood Code | Meta | character | | -| Sale Year | Time | numeric | | -| Sale Day | Time | numeric | | -| Sale Quarter of Year | Time | character | | -| Sale Month of Year | Time | character | | -| Sale Day of Year | Time | numeric | | -| Sale Day of Month | Time | numeric | | -| Sale Day of Week | Time | numeric | | -| Sale After COVID-19 | Time | logical | | -| Percent Population Age, Under 19 Years Old | acs5 | numeric | | -| Percent Population Age, Over 65 Years Old | acs5 | numeric | | -| Median Population Age | acs5 | numeric | | -| Percent Population Mobility, Moved From Other State in Past Year | acs5 | numeric | | -| Percent Households Family, Married | acs5 | numeric | | -| Percent Households Nonfamily, Living Alone | acs5 | numeric | | -| Percent Population Education, High School Degree | acs5 | numeric | | -| Percent Population Education, Bachelor Degree | acs5 | numeric | | -| Percent Population Education, Graduate Degree | acs5 | numeric | | -| Percent Population Income, Below Poverty Level | 
acs5 | numeric | | -| Median Income, Household in Past Year | acs5 | numeric | | -| Median Income, Per Capita in Past Year | acs5 | numeric | | -| Percent Population Income, Received SNAP in Past Year | acs5 | numeric | | -| Percent Population Employment, Unemployed | acs5 | numeric | | -| Median Occupied Household, Total, Year Built | acs5 | numeric | | -| Median Occupied Household, Renter, Gross Rent | acs5 | numeric | | -| Percent Occupied Households, Owner | acs5 | numeric | | -| Percent Occupied Households, Total, One or More Selected Conditions | acs5 | numeric | | -| Percent Population Mobility, Moved From Within Same County in Past Year | acs5 | numeric | | -| Corner Lot | ccao | logical | | -| Active Homeowner Exemption | ccao | logical | | -| Number of Years Active Homeowner Exemption | ccao | numeric | | -| Longitude | loc | numeric | | -| Latitude | loc | numeric | | -| Census Tract GEOID | loc | character | | -| First Street Factor | loc | numeric | | -| School Elementary District GEOID | loc | character | | -| School Secondary District GEOID | loc | character | | -| Municipality Name | loc | character | | -| CMAP Walkability Score (No Transit) | loc | numeric | | -| CMAP Walkability Total Score | loc | numeric | | -| Property Tax Bill Aggregate Rate | other | numeric | | -| Number of PINs in Half Mile | prox | numeric | | -| Number of Bus Stops in Half Mile | prox | numeric | | -| Number of Foreclosures Per 1000 PINs (Past 5 Years) | prox | numeric | | -| Number of Schools in Half Mile | prox | numeric | | -| Nearest Bike Trail Distance (Feet) | prox | numeric | | -| Nearest Cemetery Distance (Feet) | prox | numeric | | -| Nearest CTA Route Distance (Feet) | prox | numeric | | -| Nearest CTA Stop Distance (Feet) | prox | numeric | | -| Nearest Hospital Distance (Feet) | prox | numeric | | -| Lake Michigan Distance (Feet) | prox | numeric | | -| Nearest Major Road Distance (Feet) | prox | numeric | | -| Nearest Metra Route Distance (Feet) | prox | 
numeric | | -| Nearest Metra Stop Distance (Feet) | prox | numeric | | -| Nearest Park Distance (Feet) | prox | numeric | | -| Nearest Railroad Distance (Feet) | prox | numeric | | -| Nearest Secondary Road Distance (Feet) | prox | numeric | | -| Nearest University Distance (Feet) | prox | numeric | | -| Nearest Vacant Land Parcel Distance (Feet) | prox | numeric | | -| Nearest Water Distance (Feet) | prox | numeric | | -| Nearest Golf Course Distance (Feet) | prox | numeric | | -| Total Airport Noise DNL | prox | numeric | | +| Feature Name | Category | Type | Notes | Unique to Condo Model | +|:------------------------------------------------------------------------|:---------------|:----------|:------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------| +| Condominium Building Year Built | Characteristic | numeric | Year the property was constructed | X | +| Total Condominium Building Livable Parcels | Characteristic | numeric | Count of livable 14-digit PINs (AKA condo units) | X | +| Total Condominium Building Non-Livable Parcels | Characteristic | numeric | Count of non-livable 14-digit PINs | X | +| Condominium Building Is Mixed Use | Characteristic | logical | The 10-digit PIN (building) contains a 14-digit PIN that is neither class 299 nor 399 | X | +| Total Condominium Building Square Footage | Characteristic | numeric | Square footage of the *building* (PIN10) containing this unit | X | +| Building Square Footage | Characteristic | numeric | Square footage of the *building* (PIN10) containing this unit | X | +| Condominium Unit Square Footage | Characteristic | numeric | Square footage of the condominium unit associated with this PIN | X | +| Unit Square Footage | Characteristic | numeric | Square footage of the condominium unit associated with this PIN | X | +| Condominium Unit Bedrooms | Characteristic | numeric | Number of bedrooms in the 
building | X | +| Bedrooms | Characteristic | numeric | Number of bedrooms in the building | X | +| Condominium Unit Half Baths | Characteristic | numeric | Number of half baths | X | +| Half Baths | Characteristic | numeric | Number of half baths | X | +| Condominium Unit Full Baths | Characteristic | numeric | Number of full bathrooms | X | +| Full Baths | Characteristic | numeric | Number of full bathrooms | X | +| Condominium % Ownership | Meta | numeric | Proration rate applied to the PIN | X | +| Condominium Building Strata 1 | Meta | character | Condominium Building Strata - 10 Levels | X | +| Condominium Building Strata 2 | Meta | character | Condominium Building Strata - 100 Levels | X | +| Land Square Feet | Characteristic | numeric | Square footage of the land (not just the building) of the property | | +| Township Code | Meta | character | Cook County township code | | +| Neighborhood Code | Meta | character | Assessor neighborhood code | | +| Sale Year | Time | numeric | Sale year calculated as the number of years since 0 B.C.E | | +| Sale Day | Time | numeric | Sale day calculated as the number of days since January 1st, 1997 | | +| Sale Quarter of Year | Time | character | Character encoding of quarter of year (Q1 - Q4) | | +| Sale Month of Year | Time | character | Character encoding of month of year (Jan - Dec) | | +| Sale Day of Year | Time | numeric | Numeric encoding of day of year (1 - 365) | | +| Sale Day of Month | Time | numeric | Numeric encoding of day of month (1 - 31) | | +| Sale Day of Week | Time | numeric | Numeric encoding of day of week (1 - 7) | | +| Sale After COVID-19 | Time | logical | Indicator for whether sale occurred after COVID-19 was widely publicized (around March 15, 2020) | | +| Percent Population Age, Under 19 Years Old | acs5 | numeric | Percent of the people 17 years or younger | | +| Percent Population Age, Over 65 Years Old | acs5 | numeric | Percent of the people 65 years or older | | +| Median Population Age | 
acs5 | numeric | Median age for whole population | | +| Percent Population Mobility, Moved From Other State in Past Year | acs5 | numeric | Percent of people (older than 1 year) who moved from another state in the past 12 months | | +| Percent Households Family, Married | acs5 | numeric | Percent of households that are family, married | | +| Percent Households Nonfamily, Living Alone | acs5 | numeric | Percent of households that are non-family, alone (single) | | +| Percent Population Education, High School Degree | acs5 | numeric | Percent of people older than 25 who attained a high school degree | | +| Percent Population Education, Bachelor Degree | acs5 | numeric | Percent of people older than 25 who attained a bachelor’s degree | | +| Percent Population Education, Graduate Degree | acs5 | numeric | Percent of people older than 25 who attained a graduate degree | | +| Percent Population Income, Below Poverty Level | acs5 | numeric | Percent of people above the poverty level in the last 12 months | | +| Median Income, Household in Past Year | acs5 | numeric | Median income per household in the past 12 months | | +| Median Income, Per Capita in Past Year | acs5 | numeric | Median income per capita in the past 12 months | | +| Percent Population Income, Received SNAP in Past Year | acs5 | numeric | Percent of households that received SNAP in the past 12 months | | +| Percent Population Employment, Unemployed | acs5 | numeric | Percent of people 16 years and older unemployed | | +| Median Occupied Household, Total, Year Built | acs5 | numeric | Median year built for all occupied households | | +| Median Occupied Household, Renter, Gross Rent | acs5 | numeric | Median gross rent for only renter-occupied units | | +| Percent Occupied Households, Owner | acs5 | numeric | Percent of households that are owner-occupied | | +| Percent Occupied Households, Total, One or More Selected Conditions | acs5 | numeric | Percent of occupied households with selected conditions | | 
+| Percent Population Mobility, Moved From Within Same County in Past Year | acs5 | numeric | Percent of people (older than 1 year) who moved in county in the past 12 months | | +| Active Homeowner Exemption | ccao | logical | Parcel has an active homeowner exemption | | +| Corner Lot | ccao | logical | Corner lot indicator | | +| Number of Years Active Homeowner Exemption | ccao | numeric | Number of years parcel has had an active homeowner exemption | | +| Longitude | loc | numeric | X coordinate in degrees (global longitude) | | +| Latitude | loc | numeric | Y coordinate in degrees (global latitude) | | +| Census Tract GEOID | loc | character | 11-digit ACS/Census tract GEOID | | +| First Street Factor | loc | numeric | First Street flood factor The flood factor is a risk score, where 10 is the highest risk and 1 is the lowest risk | | +| School Elementary District GEOID | loc | character | School district (elementary) GEOID | | +| School Secondary District GEOID | loc | character | School district (secondary) GEOID | | +| CMAP Walkability Score (No Transit) | loc | numeric | CMAP walkability score for a given PIN, excluding transit walkability | | +| CMAP Walkability Total Score | loc | numeric | CMAP walkability score for a given PIN, including transit walkability | | +| Municipality Name | loc | character | Taxing district name, as seen on Cook County tax bills | | +| Property Tax Bill Aggregate Rate | other | numeric | Tax bill rate for the taxing district containing a given PIN | | +| Number of PINs in Half Mile | prox | numeric | Number of PINs within half mile | | +| Number of Bus Stops in Half Mile | prox | numeric | Number of bus stops within half mile | | +| Number of Foreclosures Per 1000 PINs (Past 5 Years) | prox | numeric | Number of foreclosures per 1000 PINs, within half mile (past 5 years) | | +| Number of Schools in Half Mile | prox | numeric | Number of schools (any kind) within half mile | | +| Total Airport Noise DNL | prox | numeric | 
Estimated DNL for a PIN, assuming a baseline DNL of 50 (“quiet suburban”) and adding predicted noise from O’Hare and Midway airports to that baseline | | +| Nearest Bike Trail Distance (Feet) | prox | numeric | Nearest bike trail distance (feet) | | +| Nearest Cemetery Distance (Feet) | prox | numeric | Nearest cemetery distance (feet) | | +| Nearest CTA Route Distance (Feet) | prox | numeric | Nearest CTA route distance (feet) | | +| Nearest CTA Stop Distance (Feet) | prox | numeric | Nearest CTA stop distance (feet) | | +| Nearest Hospital Distance (Feet) | prox | numeric | Nearest hospital distance (feet) | | +| Lake Michigan Distance (Feet) | prox | numeric | Distance to Lake Michigan shoreline (feet) | | +| Nearest Major Road Distance (Feet) | prox | numeric | Nearest major road distance (feet) | | +| Nearest Metra Route Distance (Feet) | prox | numeric | Nearest Metra route distance (feet) | | +| Nearest Metra Stop Distance (Feet) | prox | numeric | Nearest Metra stop distance (feet) | | +| Nearest Park Distance (Feet) | prox | numeric | Nearest park distance (feet) | | +| Nearest Railroad Distance (Feet) | prox | numeric | Nearest railroad distance (feet) | | +| Nearest Secondary Road Distance (Feet) | prox | numeric | Nearest secondary road distance (feet) | | +| Nearest University Distance (Feet) | prox | numeric | Nearest university distance (feet) | | +| Nearest Vacant Land Parcel Distance (Feet) | prox | numeric | Nearest vacant land (class 100) parcel distance (feet) | | +| Nearest Water Distance (Feet) | prox | numeric | Nearest water distance (feet) | | +| Nearest Golf Course Distance (Feet) | prox | numeric | Nearest golf course distance (feet) | | ### Valuation From 198341209cfb7207e536dd2515bf5f4e0ee1ef20 Mon Sep 17 00:00:00 2001 From: Sweaty Handshake Date: Wed, 20 Mar 2024 14:19:33 +0000 Subject: [PATCH 9/9] Address review comments --- README.Rmd | 15 ++++++++------- README.md | 48 ++++++++++++++++++++++++------------------------ 2 files 
changed, 32 insertions(+), 31 deletions(-) diff --git a/README.Rmd b/README.Rmd index d6544e9..500b362 100644 --- a/README.Rmd +++ b/README.Rmd @@ -34,7 +34,8 @@ This repository contains code, data, and documentation for the Cook County Asses | 2021 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2021-assessment-year) | | 2022 | North | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2022-assessment-year) | | 2023 | South | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2023-assessment-year) | -| 2024 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2024-assessment-year) | +| 2024 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2024-assessment-year) + | # Model Overview @@ -47,7 +48,7 @@ The repository itself contains the [code](./pipeline) for the Automated Valuatio ## Differences Compared to the Residential Model -The Cook County Assessor's Office has begun to track a limited number of characteristics (building-level square footage and unit-level square footage, bedrooms, and bathrooms) for condominiums, but the data we have ***varies in both the characteristics available and their completeness*** between triads. Staffing limitations have forced the office to prioritizes smaller condo buildings less likely to have recent unit sales in certain parts of the county. Like most assessors nationwide, our office staff cannot enter buildings to observe property characteristics. For condos, this means we cannot observe amenities, quality, or any other interior characteristics which must instead be gathered from listings and a number of additional third-party sources. 
+The Cook County Assessor's Office has started to track a limited number of characteristics (building-level square footage, unit-level square footage, bedrooms, and bathrooms) for condominiums, but the data we have ***varies in both the characteristics available and their completeness*** between triads. Staffing limitations have forced the office to prioritize smaller condo buildings less likely to have recent unit sales in certain parts of the county. Like most assessors nationwide, our office staff cannot enter buildings to observe property characteristics. For condos, this means we cannot observe amenities, quality, or any other interior characteristics which must instead be gathered from listings and a number of additional third-party sources. The only complete information our office currently has about individual condominium units is their age, location, sale date/price, and percentage of ownership. This makes modeling condos particularly challenging, as the number of usable features is quite small. Fortunately, condos have two qualities which make modeling a bit easier: @@ -58,7 +59,7 @@ We leverage these qualities to produce what we call ***strata***, a feature uniq ### Features Used -Because our individual condo unit characteristics are sparse and incomplete, we must rely on aggregate geospatial features, economic features, [strata](#condo-strata), and time of sale to determine condo assessed values. The features in the table below are the ones used in the 2023 assessment model. +Because our individual condo unit characteristics are sparse and incomplete, we primarily must rely on aggregate geospatial features, economic features, [strata](#condo-strata), and time of sale to determine condo assessed values. The features in the table below are the ones used in the 2023 assessment model. 
```{r features_used, message=FALSE, echo=FALSE} library(dplyr) @@ -89,8 +90,8 @@ hardcoded_descriptions <- tribble( glue("Condominium Building Strata - {condo_params$input$strata$k_1} Levels"), "strata_2", glue("Condominium Building Strata - {condo_params$input$strata$k_2} Levels") - # nolint end ) +# nolint end # Load the dbt DAG from our prod docs site dbt_manifest <- fromJSON( @@ -207,13 +208,13 @@ Visually, this looks like: ![](docs/figures/valuation_perc_owner.png) -For what the office terms "nonlivable" spaces, i.e. parking spaces, storage space, and common area, the breakout of value works differently. See [this excel sheet](docs/spreadsheets/condo_nonlivable_demo.xlsx) for an interactive example of how nonlivable spaces are valued based on the total value of a building's livable space. +For what the office terms "nonlivable" spaces — parking spaces, storage space, and common area — the breakout of value works differently. See [this excel sheet](docs/spreadsheets/condo_nonlivable_demo.xlsx) for an interactive example of how nonlivable spaces are valued based on the total value of a building's livable space. Percentage of ownership is the single most important feature in the condo model. It determines almost all intra-building differences in unit values. -### Multisales +### Multi-sales -The condo model is trained on a select number of "multisales" in addition to single-parcel sales. Multisales are sales that include more than one parcel and rarely reflect the market price the included parcels would fetch if they were sold individually. In the case of condominiums, however, many units are sold bundled with deeded parking spaces that are separate parcels and these two-parcel sales are highly reflective of the unit's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. 
For a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale would be adjusted to \$80,000: +The condo model is trained on a select number of "multi-sales" in addition to single-parcel sales. Multi-sales are sales that include more than one parcel. In the case of condominiums, many units are sold bundled with deeded parking spaces that are separate parcels. These two-parcel sales are highly reflective of the unit's actual market price. We split the total value of these two-parcel sales according to their relative percent of ownership before using them for training. For a \$100,000 sale of a unit (4% ownership) and a parking space (1% ownership), the sale would be adjusted to \$80,000: $$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$ diff --git a/README.md b/README.md index 64d58b1..13c0261 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ Table of Contents Model](#differences-compared-to-the-residential-model) - [Features Used](#features-used) - [Valuation](#valuation) - - [Multisales](#multisales) + - [Multi-sales](#multi-sales) - [Condo Strata](#condo-strata) - [Ongoing Issues](#ongoing-issues) - [Unit Heterogeneity](#unit-heterogeneity) @@ -47,6 +47,7 @@ prior year models can be found at the following links: | 2022 | North | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2022-assessment-year) | | 2023 | South | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2023-assessment-year) | | 2024 | City | County-wide LightGBM model | R (Tidyverse / Tidymodels) | [Link](https://github.com/ccao-data/model-condo-avm/tree/2024-assessment-year) | +| | | | | | # Model Overview @@ -70,12 +71,12 @@ for all properties. 
## Differences Compared to the Residential Model -The Cook County Assessor’s Office has begun to track a limited number of -characteristics (building-level square footage and unit-level square +The Cook County Assessor’s Office has started to track a limited number +of characteristics (building-level square footage, unit-level square footage, bedrooms, and bathrooms) for condominiums, but the data we have ***varies in both the characteristics available and their completeness*** between triads. Staffing limitations have forced the -office to prioritizes smaller condo buildings less likely to have recent +office to prioritize smaller condo buildings less likely to have recent unit sales in certain parts of the county. Like most assessors nationwide, our office staff cannot enter buildings to observe property characteristics. For condos, this means we cannot observe amenities, @@ -100,10 +101,10 @@ more information about how strata is used and calculated. ### Features Used Because our individual condo unit characteristics are sparse and -incomplete, we must rely on aggregate geospatial features, economic -features, [strata](#condo-strata), and time of sale to determine condo -assessed values. The features in the table below are the ones used in -the 2023 assessment model. +incomplete, we primarily must rely on aggregate geospatial features, +economic features, [strata](#condo-strata), and time of sale to +determine condo assessed values. The features in the table below are the +ones used in the 2023 assessment model. 
| Feature Name | Category | Type | Notes | Unique to Condo Model | |:------------------------------------------------------------------------|:---------------|:----------|:------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------| @@ -219,28 +220,27 @@ Visually, this looks like: ![](docs/figures/valuation_perc_owner.png) -For what the office terms “nonlivable” spaces, i.e. parking spaces, -storage space, and common area, the breakout of value works differently. -See [this excel sheet](docs/spreadsheets/condo_nonlivable_demo.xlsx) for -an interactive example of how nonlivable spaces are valued based on the +For what the office terms “nonlivable” spaces — parking spaces, storage +space, and common area — the breakout of value works differently. See +[this excel sheet](docs/spreadsheets/condo_nonlivable_demo.xlsx) for an +interactive example of how nonlivable spaces are valued based on the total value of a building’s livable space. Percentage of ownership is the single most important feature in the condo model. It determines almost all intra-building differences in unit values. -### Multisales - -The condo model is trained on a select number of “multisales” in -addition to single-parcel sales. Multisales are sales that include more -than one parcel and rarely reflect the market price the included parcels -would fetch if they were sold individually. In the case of condominiums, -however, many units are sold bundled with deeded parking spaces that are -separate parcels and these two-parcel sales are highly reflective of the -unit’s actual market price. We split the total value of these two-parcel -sales according to their relative percent of ownership before using them -for training. 
For a \$100,000 sale of a unit (4% ownership) and a -parking space (1% ownership), the sale would be adjusted to \$80,000: +### Multi-sales + +The condo model is trained on a select number of “multi-sales” in +addition to single-parcel sales. Multi-sales are sales that include more +than one parcel. In the case of condominiums, many units are sold +bundled with deeded parking spaces that are separate parcels. These +two-parcel sales are highly reflective of the unit’s actual market +price. We split the total value of these two-parcel sales according to +their relative percent of ownership before using them for training. For +a \$100,000 sale of a unit (4% ownership) and a parking space (1% +ownership), the sale would be adjusted to \$80,000: $$\frac{0.04}{0.04 + 0.01} * \$100,000 = \$80,000$$