This project consists mainly of three parts:
- Scraper 🕷️
- notebooks 📓
- webapp 💻
The notebooks use spatial analysis to take advantage of the coordinate features extracted from the webpage; in particular, we checked whether spatial features could improve the model's performance. The results are in the Spatial Autocorrelations Notebook. The conclusions about these features can be summarized as follows:
- Moran’s I test indicates that our target variable is related to location (spatial autocorrelation).
- This relationship is weak but present, meaning that spatial features could add some value to the model.
- The local spatial autocorrelation test supports the well-known hypothesis that Lima is a centralized city, since the clusters are distributed around the city center.
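The spatial tests above can be illustrated with a minimal NumPy sketch of global and local Moran's I. It assumes a row-standardized k-nearest-neighbour weights matrix (k=4) and a one-sided permutation test for significance; the Spatial Autocorrelations Notebook may have used a different library (such as PySAL's `esda`) and a different weights definition, so the function names and parameters here are illustrative only.

```python
import numpy as np


def knn_weights(coords: np.ndarray, k: int = 4) -> np.ndarray:
    """Row-standardized k-nearest-neighbour spatial weights (illustrative choice)."""
    n = len(coords)
    # Pairwise Euclidean distances; on raw lat/lon this is only an approximation,
    # acceptable for neighbour ranking inside a single city.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1


def morans_i(y: np.ndarray, w: np.ndarray) -> float:
    """Global Moran's I = (n / S0) * (z' W z) / (z' z), with z the centered target."""
    z = y - y.mean()
    n, s0 = len(y), w.sum()
    return float((n / s0) * (z @ w @ z) / (z @ z))


def permutation_p_value(y: np.ndarray, w: np.ndarray, n_perm: int = 999, seed: int = 0) -> float:
    """One-sided pseudo p-value for positive autocorrelation via random relabelling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(y, w)
    sims = np.array([morans_i(rng.permutation(y), w) for _ in range(n_perm)])
    return float((np.sum(sims >= observed) + 1) / (n_perm + 1))


def local_morans(y: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Local Moran's I_i = z_i * sum_j w_ij z_j / m2, with m2 = z'z / n (LISA)."""
    z = y - y.mean()
    m2 = (z @ z) / len(y)
    return z * (w @ z) / m2
```

A positive global I with a small p-value means similar prices sit near each other; points with a large positive local I and an above-average price form "high-high" clusters, which is the kind of pattern used to argue that prices concentrate around a center. With row-standardized weights, the mean of the local values equals the global I.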
Choropleths and other exploratory figures are available in the Exploratory Data Analysis Notebook.
Results are stored mainly in the 2020_Notebook04_Model_Selection notebook, and PyCaret was used for rapid model and feature selection. The main problem with this dataset is that it appears too small to cope with outliers; outliers were the main obstacle when trying to outperform the first benchmarks we tested.
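The README does not say how outliers were handled, so the following is only an illustration of one common approach: an interquartile-range (IQR) filter on the target price. The function name and the `k=1.5` multiplier are assumptions; PyCaret's `setup()` also offers a built-in `remove_outliers` option, which may be what the notebook actually relied on.

```python
import numpy as np


def iqr_mask(values: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Boolean mask keeping values inside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values >= lower) & (values <= upper)


# Usage: drop listings whose price is an extreme outlier before training.
prices = np.array([100.0, 110.0, 120.0, 130.0, 140.0, 5000.0])
kept = prices[iqr_mask(prices)]
```

On a small dataset every removed row is costly, which is why a fixed rule like this is a trade-off: it protects the fit from a few extreme listings but can also discard legitimate high-end properties.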
Several additional integrations, such as Point of Interest clusters and crime clusters, were also tested with the model. However, because taking them into production carries a high development cost relative to their gain on the benchmark metrics, the ML model is kept as a basic version in the API. It is also important to note that the value of the ML model depends entirely on the quality of the data. The model has been trained on housing data from a single period only; there is great potential in tracking trends over time, but the Urbania website does not allow its information to be scraped on a recurring basis.
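As a rough idea of what a point-of-interest integration involves, the sketch below counts the POIs within a given radius of each listing using the haversine distance. This is a hypothetical feature, not the project's actual POI-cluster implementation; the array layout (latitude, longitude in degrees) and the 1 km default radius are assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def poi_counts(listings: np.ndarray, pois: np.ndarray, radius_km: float = 1.0) -> np.ndarray:
    """Number of POIs within radius_km of each listing (hypothetical feature).

    listings: (n, 2) array of (lat, lon); pois: (m, 2) array of (lat, lon).
    """
    # Column vectors (n, 1) broadcast against row vectors (m,) -> (n, m) distances.
    d = haversine_km(listings[:, [0]], listings[:, [1]], pois[:, 0], pois[:, 1])
    return (d <= radius_km).sum(axis=1)
```

Even this simple version needs an up-to-date POI source at prediction time, which illustrates the production cost mentioned above.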
For deployment, the following stack is used:
- [CSS & HTML] - Basic building blocks for web apps!
- [Flask] - Backend (API done, form interface in progress)
- [Heroku] - Hosting server (in progress)
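For readers unfamiliar with Flask, here is a minimal sketch of what a prediction endpoint can look like. The `/predict` route, the input field names, and the `predict_price` stand-in are all hypothetical; the real API's schema is not documented in this README, and in practice the function would call the trained model (for example, a saved PyCaret pipeline) instead of a fixed formula.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input schema; the actual API's fields are not documented here.
REQUIRED_FIELDS = ("area_m2", "bedrooms", "latitude", "longitude")


def predict_price(features: dict) -> float:
    """Placeholder for the trained model so the sketch runs without a model file."""
    return 1500.0 * features["area_m2"] + 10000.0 * features["bedrooms"]


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        return jsonify(error=f"missing fields: {', '.join(missing)}"), 400
    try:
        features = {f: float(payload[f]) for f in REQUIRED_FIELDS}
    except (TypeError, ValueError):
        return jsonify(error="all fields must be numeric"), 400
    return jsonify(price=predict_price(features))


if __name__ == "__main__":
    app.run(debug=True)  # local development server only
```

On Heroku, a Flask app like this is typically served by gunicorn through a `Procfile` line such as `web: gunicorn app:app`, assuming the module is named `app.py`.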
MIT