The databeer project aims to extract intelligence from the hundreds of thousands of beer recipes accessible online. The first steps, which we are currently working on, are data crawling and data modelling. Then we'll move on to the machine learning part.
We use Scrapy, a powerful and versatile web crawling framework for Python.
The first source we crawled contains approximately 300,000 recipes. The Scrapy files can be found in databeer/brewtoad, and the extracted data is written to CSV files in databeer/brewtoad/csv.
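To show the kind of output involved, here is a sketch of serialising crawled recipes to CSV with the standard library. The field names are illustrative only, not the actual schema of the files in databeer/brewtoad/csv:

```python
import csv
import io

# Hypothetical recipe records; real ones come from the crawler.
recipes = [
    {"name": "Simple Pale Ale", "style": "American Pale Ale", "ibu": 38.5},
    {"name": "Dry Stout", "style": "Irish Stout", "ibu": 42.0},
]

# Write to an in-memory buffer; the project writes to files instead.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "style", "ibu"])
writer.writeheader()
writer.writerows(recipes)
print(buf.getvalue())
```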
TBD
This part is still at an early phase, and most notebooks are not fully commented or documented. For the time being, these are mostly sandboxes to play with the data and find ideas for further applications.
To keep the notebooks light, some functions are defined in the utils.py file. This file is also a work in progress and will be refactored in the future.
Various aggregations and other tests on the data.
An attempt to focus on Hops data.
Steps 1 and 2 gave us some ideas about what we'd like to obtain and how we might be able to do so. Here are some examples:
- From the sequence of hops in each recipe (each hop defined by time, alpha acid content and relative quantity), train a Recurrent Neural Network (RNN) to suggest an additional hop given a list of hops. This might also be done with a Hidden Markov Model (HMM)
- Same as for hops, but for fermentables
- Same, but for the full recipe (not thought through in detail for now; might not be ideal)
- IBU calculation: improve on the approximate formula used by most brewers
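On the last idea: the approximation most commonly used by brewers is Tinseth's formula (the README does not name it, so this is an assumption). A minimal sketch, to show what a learned IBU model would be benchmarked against; the example values are illustrative:

```python
import math


def tinseth_ibu(alpha_pct, mass_g, boil_min, volume_l, gravity):
    """Estimate IBUs for a single hop addition with Tinseth's approximation."""
    # Alpha acids added to the wort, in mg/L.
    mg_per_l = alpha_pct / 100 * mass_g * 1000 / volume_l
    # Utilisation = bigness factor (wort gravity) * boil-time factor.
    bigness = 1.65 * 0.000125 ** (gravity - 1.0)
    boil_factor = (1 - math.exp(-0.04 * boil_min)) / 4.15
    return mg_per_l * bigness * boil_factor


# Example: 28 g of 5.5% alpha hops boiled 60 min in 20 L of 1.050 wort.
print(round(tinseth_ibu(5.5, 28, 60, 20, 1.050), 1))
```

Per-addition IBUs are summed over all hop additions of a recipe, which is exactly the hops data (time, alpha, quantity) mentioned above.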