diff --git a/2018/seminars/7_linear_models/seminar7_v1.ipynb b/2018/seminars/7_linear_models/seminar7_v1.ipynb
new file mode 100644
index 0000000..b9d49d5
--- /dev/null
+++ b/2018/seminars/7_linear_models/seminar7_v1.ipynb
@@ -0,0 +1,539 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "FIVT, APT, Machine Learning course, Spring 2017, seminar 7\n",
+ "\n",
+ "Alexey Romanenko, \n",
+ "alexromsput@gmail.com"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Linear models\n",
+ "## Plan\n",
+ "\n",
+ "* **Linear Models overview** \n",
+ " - Linear Model for Classification\n",
+ " - Linear Model for Regression (preview)\n",
+ " - Linear Models and regularization\n",
+ "\n",
+ "* **Gradient descent**\n",
+ " - GD, SGD, SAG\n",
+ " - SGD with different loss functions\n",
+ " - SGD regularization\n",
+ " \n",
+ "* **SVM: basic overview**\n",
+ " - learning algorithm\n",
+ " - SVM implementation\n",
+ " - Multiclass SVM"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Populating the interactive namespace from numpy and matplotlib\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "import numpy as np\n",
+ "import seaborn as sns\n",
+ "import _pickle as pickle # in Python 2 try import cPickle as pickle\n",
+ "\n",
+ "from matplotlib.colors import ListedColormap\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "from sklearn.datasets import make_moons, make_circles, make_classification\n",
+ "\n",
+ "\n",
+ "sns.set_context(\"notebook\", font_scale=1.5)\n",
+ "\n",
+ "from IPython.display import Image, SVG\n",
+ "\n",
+ "from scipy import optimize\n",
+ "import matplotlib.pyplot as plt\n",
+ "%pylab inline\n",
+ "from IPython import display\n",
+ "import random\n",
+ "\n",
+ "plt.style.use('ggplot')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Warm Up: 3 datasets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X, y = make_classification(n_features=2, n_redundant=0, n_informative=2)\n",
+ "X += np.random.random(X.shape)\n",
+ "\n",
+ "datasets = [make_moons(noise=0.1), make_circles(noise=0.1, factor=0.5), (X, y)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.rcParams['figure.figsize'] = 14, 4\n",
+ "fig, axes = plt.subplots(1, len(datasets), sharex='col', sharey='row')\n",
+ "for i, (X, y) in enumerate(datasets):\n",
+ "    X = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature\n",
+ "    axes[i].scatter(X[:, 0], X[:, 1], c=y, cmap=ListedColormap(['#FF0000', '#0000FF']))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question**: Which dataset is linearly separable (i.e., calls for a linear classification model)?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Questions**\n",
+ " - 1) What does the decision rule of a linear model look like? What are the parameters of a linear model?\n",
+ " - 2) What do linear classification and linear regression models have in common?\n",
+ " - 3) Can a linear model's prediction fall outside the range of target values in the training set?\n",
+ " - 4) What is regularization and why is it needed?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Linear Model overview"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## For Classification"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "## $$Y = \\{+1, -1\\},~X\\in \\mathbf{R}^d$$\n",
+ "## $$y_{predict}(x) = sign(\\langle w, x \\rangle) $$ \n",
+ "## $$margin(x, y) = y \\cdot \\langle w, x \\rangle$$\n",
+ "## $$Q(w, X^\\ell) = \\frac{1}{n} \\sum_i^n L(y_i, \\langle w, x_i \\rangle) \\rightarrow \\min_w$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## For Regression"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "## $$Y = \\mathbf{R},~X\\in \\mathbf{R}^d$$\n",
+ "## $$y_{predict}(x) = \\langle w, x \\rangle$$ \n",
+ "## $$Q(w, X^\\ell) = \\frac{1}{n} \\sum_i^n L(y_i, \\langle w, x_i \\rangle) \\rightarrow \\min_w$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "### Loss functions:\n",
+ " - Hinge Loss \n",
+ "## $$L_i(x, y; w) = \\max(0, 1 - y\\cdot\\langle w, x \\rangle)$$\n",
+ " - Logistic Loss \n",
+ "## $$L_i(x, y; w) = \\log(1 + e^{-y\\cdot\\langle w, x \\rangle})$$\n",
+ " - Squared Loss\n",
+ "## $$L_i(x, y; w) = (1 - y\\cdot\\langle w, x \\rangle)^2$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Loss Functions\n",
+ "![](http://scikit-learn.org/0.15/_images/plot_sgd_loss_functions_001.png)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [],
+ "source": [
+ "plt.rcParams['figure.figsize'] = (7.0, 7.0)\n",
+ "m = np.linspace(-1, 2.5)  # margin values M = y * <w, x>\n",
+ "plt.plot(m, np.maximum(0, 1 - m), label='hinge')\n",
+ "plt.plot(m, np.log(1 + np.exp(-m)), label='logistic')\n",
+ "plt.plot(m, (1 - m)**2, label='squared')\n",
+ "plt.ylabel('Loss')\n",
+ "plt.xlabel('Margin')\n",
+ "plt.legend()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Regularization"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---------\n",
+ "# Gradient Descent for Linear Classifiers"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Questions**\n",
+ "* What is gradient descent (GD)?\n",
+ "* What are the drawbacks of plain GD?\n",
+ "* What is stochastic gradient descent (SGD)? SAG?\n",
+ "* What are the pros and cons of SGD?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Gradient descent"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "### Gradient descent update rule\n",
+ "## $$w_{k+1} = w_k - \\eta \\nabla Q(w_k, X^\\ell) = w_k - \\frac{\\eta}{n} \\sum_{i=1}^{n} \\nabla L(w_k, x_i)$$\n",
+ "where $\\eta$ is the learning rate (step size).\n",
+ "\n",
+ "### Main problems of the gradient method\n",
+ "* multicollinearity\n",
+ "* scaling problem\n",
+ "* plateau\n",
+ "* zig-zagging"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Plateau"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Zig-zagging"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## How to compute gradients\n",
+ "## $$ \\nabla_x \\langle a, x \\rangle = a. $$\n",
+ "## $$ \\nabla_x \\|x\\|_2^2 = 2 x. $$\n",
+ "## $$ \\nabla_x \\langle Ax, x \\rangle = (A + A^T) x, $$\n",
+ "### where $A \\in \\mathbf{R}^{d \\times d}$\n",
+ "## $$ \\nabla_x \\|Ax + b\\|_2^2 = 2 A^T (Ax + b). $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "**Exercise.** Find the gradient of the following function with respect to w, in matrix form:\n",
+ "## $$f(w) = \\sum_i \\log(1 + e^{-y_i \\langle w, x_i \\rangle})$$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Newton's method (HF, BFGS)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## $$w_{k+1} = w_k - [\\nabla^2 f(w_k)]^{-1} \\nabla f(w_k)$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
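+ "Second-order methods need far fewer iterations, but each iteration is usually expensive: they compute and store the $d \\times d$ Hessian and solve a linear system with it.\n",
+ "\n",
+ "Some methods avoid this drawback: BFGS builds a Hessian approximation from gradients only, its limited-memory variant L-BFGS does not store a $d \\times d$ matrix at all, and Hessian-free (HF) Newton only needs Hessian-vector products. Use them when needed."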
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
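+ "def nw(X, y, w, gradf, hessf, dold):\n",
+ "    # Newton direction: solve H d = -grad instead of inverting the Hessian explicitly\n",
+ "    # (dold, the previous direction, is not used by Newton's method)\n",
+ "    return -np.linalg.solve(hessf(X, y, w), gradf(X, y, w))\n",
+ "\n",
+ "# NOTE: expand, viz_opt, loss_function, grad_function and hess_function are not defined in this notebook\n",
+ "for X, y in datasets:\n",
+ "    X, y = expand(X), -2*(y-0.5)  # map labels {0, 1} -> {+1, -1}\n",
+ "    a = viz_opt(loss_function, grad_function, hess_function, X, y, nw)\n",
+ "\n",
+ "display.clear_output()"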
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Stochastic GD, Momentum, Nesterov"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "What if the objective contains a large sum over objects? Let's compute the gradient only on a random subsample."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SGD ($\\hat{f}$ is the loss on a random subsample):\n",
+ "## $$w_{k+1} = w_k - \\eta \\nabla \\hat{f}(w_k)$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
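+ "Momentum ($v_k$ accumulates an exponentially decaying sum of past stochastic gradients, $0 \\le \\alpha < 1$):\n",
+ "## $$v_{k+1} = \\alpha v_k + \\eta \\nabla \\hat{f}(w_k), \\quad w_{k+1} = w_k - v_{k+1}$$"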
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
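+ "Nesterov momentum (the gradient is taken at the look-ahead point $w_k - \\alpha v_k$):\n",
+ "## $$v_{k+1} = \\alpha v_k + \\eta \\nabla \\hat{f}(w_k - \\alpha v_k), \\quad w_{k+1} = w_k - v_{k+1}$$"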
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Pros and cons of SGD\n",
+ "* Benefits:\n",
+ " - Suitable for online learning\n",
+ " - Works on both large and small datasets\n",
+ " - Faster than classic GD (each step uses only a subsample)\n",
+ "\n",
+ "* Disadvantages and recommendations\n",
+ " - Convergence problems!\n",
+ " - Multi-extremal objective and local minima\n",
+ " - recommendation: random perturbation of the weights (weight shaking)\n",
+ " - Convergence can still be slow\n",
+ " - recommendation: SAG version of SGD\n",
+ "$$w^{(t+1)}=w^{(t)}-\\frac{\\eta_t}{\\ell} \\nabla \\left( (\\ell-1)\\cdot Q(w^{(t-1)},X^\\ell\\setminus \\{x_i\\}) + Q(w^{(t)},x_i)\\right)$$\n",
+ " - Sensitivity to feature scales\n",
+ " - recommendation: scale features\n",
+ " - Over-fitting and instability\n",
+ " - recommendation: regularization \n",
+ "$$Q_\\tau(w) = Q(w)+\\frac{\\tau}{2}\\lVert w\\rVert^2$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "-------\n",
+ "# SVM"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Questions**\n",
+ "* What is the main idea of SVM?\n",
+ "* What is\n",
+ " - a separating hyperplane\n",
+ " - a support vector\n",
+ " - the margin?\n",
+ "* How is an SVM trained?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*Figure: maximum-margin separating hyperplane (Svm_max_sep_hyperplane_with_margin.png)*"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's implement an SVM ourselves\n",
+ "See my_svm.ipynb"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Conclusion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "**SGD**:\n",
+ "* Pros:\n",
+ " - Fast\n",
+ " - Work well in practice\n",
+ " - Interpretable\n",
+ " - Scale to large datasets\n",
+ " - Can be trained online\n",
+ "* Cons:\n",
+ " - Not always good (convergence issues)\n",
+ "\n",
+ "**SVM**\n",
+ "* Pros:\n",
+ " - Strong generalization ability\n",
+ " - Convex optimization problem (a solution exists)\n",
+ " - Not all training objects are needed: the solution depends only on the support vectors\n",
+ "* Cons:\n",
+ " - we haven't gotten there yet :)\n",
+ "\n",
+ "**HW**\n",
+ " - implement an SVM and run it on a generated training set (see the link to the starter code above)\n",
+ "\n",
+ "**Feedback**\n",
+ " * rate the seminar\n",
+ " * leave feedback on the lecture"
+ ]
+ }
+ ],
+ "metadata": {
+ "anaconda-cloud": {},
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
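A minimal sketch of the mini-batch SGD update discussed in seminar 7, applied to the logistic loss with the L2 regularizer $Q_\tau(w) = Q(w) + \frac{\tau}{2}\|w\|^2$. It assumes labels in {-1, +1}, a constant learning rate, and a constant feature standing in for the bias; `sgd_logistic`, `logistic_loss` and all their parameters are hypothetical names, not the seminar's own helpers. The per-object gradient it uses is $\nabla_w \log(1 + e^{-y\langle w, x\rangle}) = -y\,x\,\sigma(-y\langle w, x\rangle)$.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification


def logistic_loss(w, X, y, tau):
    """Q_tau(w) = mean_i log(1 + exp(-y_i <w, x_i>)) + tau / 2 * ||w||^2."""
    margins = y * X.dot(w)
    # logaddexp(0, -m) = log(1 + exp(-m)), computed without overflow
    return np.mean(np.logaddexp(0, -margins)) + 0.5 * tau * w.dot(w)


def sgd_logistic(X, y, eta=0.1, tau=1e-3, n_epochs=20, batch_size=10, seed=0):
    """Hypothetical mini-batch SGD on Q_tau(w); labels y must be in {-1, +1}."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        order = rng.permutation(n)  # reshuffle the objects every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            margins = yb * Xb.dot(w)
            # d/dw log(1 + exp(-y <w, x>)) = -y * x * sigmoid(-y <w, x>)
            coef = -yb * expit(-margins)
            # mini-batch gradient of the mean loss plus the L2 term
            grad = Xb.T.dot(coef) / len(idx) + tau * w
            w -= eta * grad
    return w


X, y01 = make_classification(n_samples=500, n_features=2, n_redundant=0,
                             n_informative=2, n_clusters_per_class=1,
                             class_sep=2.0, random_state=0)
X = np.hstack([X, np.ones((X.shape[0], 1))])  # constant feature acts as the bias
y = 2 * y01 - 1  # map labels {0, 1} -> {-1, +1}

w = sgd_logistic(X, y)
accuracy = np.mean(np.sign(X.dot(w)) == y)  # decision rule sign(<w, x>)
print("loss at w=0:", logistic_loss(np.zeros(3), X, y, 1e-3))
print("loss after SGD:", logistic_loss(w, X, y, 1e-3))
print("train accuracy:", accuracy)
```

Swapping `expit`-based weights for hinge subgradients (`-y * x` where the margin is below 1) would turn the same loop into a linear-SVM trainer; the structure of the update does not change.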
diff --git a/2018/seminars/8_linear_models/TextClassification.ipynb b/2018/seminars/8_linear_models/TextClassification.ipynb
new file mode 100644
index 0000000..9ee9006
--- /dev/null
+++ b/2018/seminars/8_linear_models/TextClassification.ipynb
@@ -0,0 +1,745 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import fetch_20newsgroups\n",
+ "import numpy as np\n",
+ "import heapq"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['alt.atheism',\n",
+ " 'comp.graphics',\n",
+ " 'comp.os.ms-windows.misc',\n",
+ " 'comp.sys.ibm.pc.hardware',\n",
+ " 'comp.sys.mac.hardware',\n",
+ " 'comp.windows.x',\n",
+ " 'misc.forsale',\n",
+ " 'rec.autos',\n",
+ " 'rec.motorcycles',\n",
+ " 'rec.sport.baseball',\n",
+ " 'rec.sport.hockey',\n",
+ " 'sci.crypt',\n",
+ " 'sci.electronics',\n",
+ " 'sci.med',\n",
+ " 'sci.space',\n",
+ " 'soc.religion.christian',\n",
+ " 'talk.politics.guns',\n",
+ " 'talk.politics.mideast',\n",
+ " 'talk.politics.misc',\n",
+ " 'talk.religion.misc']"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "all_categories = fetch_20newsgroups().target_names\n",
+ "all_categories"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's take topics from a single section (sci.*): they may be harder to tell apart"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "categories = [\n",
+ " 'sci.electronics',\n",
+ " 'sci.space',\n",
+ " 'sci.med'\n",
+ "]\n",
+ "\n",
+ "train_data = fetch_20newsgroups(subset='train',\n",
+ " categories=categories,\n",
+ " remove=('headers', 'footers', 'quotes'))\n",
+ "\n",
+ "test_data = fetch_20newsgroups(subset='test',\n",
+ " categories=categories,\n",
+ " remove=('headers', 'footers', 'quotes'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To vectorize the texts we use CountVectorizer, which represents a document as a bag of words. Feature extraction can be varied in many ways (dropping rare words, dropping frequent words, dropping common-vocabulary words, adding bigrams, etc.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.feature_extraction.text import CountVectorizer"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "CountVectorizer(analyzer='word', binary=False, decode_error='strict',\n",
+ " dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',\n",
+ " lowercase=True, max_df=1.0, max_features=None, min_df=1,\n",
+ " ngram_range=(1, 1), preprocessor=None, stop_words=None,\n",
+ " strip_accents=None, token_pattern='(?u)\\\\b\\\\w\\\\w+\\\\b',\n",
+ " tokenizer=None, vocabulary=None)"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "CountVectorizer()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "count_vectorizer = CountVectorizer(min_df=5, ngram_range=(1, 2)) "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<1778x10885 sparse matrix of type '<class 'numpy.int64'>'\n",
+ "\twith 216486 stored elements in Compressed Sparse Row format>"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "sparse_feature_matrix = count_vectorizer.fit_transform(train_data.data)\n",
+ "sparse_feature_matrix"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_2_words = {\n",
+ " v: k\n",
+ " for k, v in count_vectorizer.vocabulary_.items() # use .iteritems() for Python 2\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "from sklearn.model_selection import cross_val_score"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Train a logistic regression to predict the topic of a document"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
+ " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
+ " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
+ " verbose=0, warm_start=False)"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "algo = LogisticRegression()\n",
+ "algo.fit(sparse_feature_matrix, train_data.target)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The words with the largest positive weights are the characteristic words of a topic"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "circuit, electronics, power, chips, parts, the number, them, used, tv, ve\n",
+ "msg, medical, my, blood, disease, doctor, health, treatment, your, needles\n",
+ "space, orbit, nasa, thanks for, launch, earth, sorry, moon, spacecraft, solar\n"
+ ]
+ }
+ ],
+ "source": [
+ "W = algo.coef_.shape[1]  # vocabulary size\n",
+ "for c in range(len(algo.classes_)):  # c indexes the rows of coef_, one per class\n",
+ "    topic_words = [\n",
+ "        num_2_words[w_num]\n",
+ "        for w_num in heapq.nlargest(10, range(W), key=lambda w: algo.coef_[c, w])\n",
+ "    ]\n",
+ "    print(', '.join(topic_words))\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's compare the quality on the CV folds with the quality on the training set and on the held-out test set"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0.8487395 0.84550562 0.83426966 0.83943662 0.82768362]\n",
+ "0.8391270024469429\n"
+ ]
+ }
+ ],
+ "source": [
+ "algo = LogisticRegression()\n",
+ "arr = cross_val_score(algo, sparse_feature_matrix, train_data.target, cv=5, scoring='accuracy')\n",
+ "print(arr)\n",
+ "print(np.mean(arr))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Why is this cross-validation incorrect?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
+ " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
+ " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
+ " verbose=0, warm_start=False)"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "algo.fit(sparse_feature_matrix, train_data.target)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9803149606299213"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "accuracy_score(train_data.target, algo.predict(sparse_feature_matrix))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.7928994082840237"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "accuracy_score(test_data.target, algo.predict(count_vectorizer.transform(test_data.data)))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "We see overfitting (0.98 on train vs 0.79 on test): this is the curse of dimensionality, with 10885 features for only 1778 training documents"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0.72829132 0.74719101 0.73033708 0.74647887 0.71186441]\n",
+ "0.7328325372866697\n"
+ ]
+ }
+ ],
+ "source": [
+ "algo = LogisticRegression(penalty='l1', C=0.1)\n",
+ "arr = cross_val_score(algo, sparse_feature_matrix, train_data.target, cv=5, scoring='accuracy')\n",
+ "print(arr)\n",
+ "print(np.mean(arr))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,\n",
+ " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
+ " penalty='l1', random_state=None, solver='liblinear', tol=0.0001,\n",
+ " verbose=0, warm_start=False)"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "algo.fit(sparse_feature_matrix, train_data.target)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.7935883014623172"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "accuracy_score(train_data.target, algo.predict(sparse_feature_matrix))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.6813186813186813"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "accuracy_score(test_data.target, algo.predict(count_vectorizer.transform(test_data.data)))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Stronger L1 regularization (C=0.1) reduces the gap between train and test, but lowers the quality. Experiment with the regularization parameters to get the best quality."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To avoid running vectorization and training separately, there is a convenient Pipeline class. It chains a sequence of steps together"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.pipeline import Pipeline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pipeline = Pipeline([\n",
+ " (\"vectorizer\", CountVectorizer(min_df=5, ngram_range=(1, 2))),\n",
+ " (\"algo\", LogisticRegression())\n",
+ "])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Pipeline(memory=None,\n",
+ " steps=[('vectorizer', CountVectorizer(analyzer='word', binary=False, decode_error='strict',\n",
+ " dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',\n",
+ " lowercase=True, max_df=1.0, max_features=None, min_df=5,\n",
+ " ngram_range=(1, 2), preprocessor=None, stop_words=None,\n",
+ " ...ty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
+ " verbose=0, warm_start=False))])"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pipeline.fit(train_data.data, train_data.target)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9803149606299213"
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "accuracy_score(train_data.target, pipeline.predict(train_data.data))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.7928994082840237"
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "accuracy_score(test_data.target, pipeline.predict(test_data.data))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The values are exactly the same as those we got earlier by running the steps separately."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.pipeline import make_pipeline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "During cross-validation, CountVectorizer must not be fitted on the test fold (otherwise the objects become dependent: the vocabulary is built using the test documents). Pipeline makes this easy."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0.83753501 0.84550562 0.82303371 0.83943662 0.83050847]\n",
+ "0.835203886828576\n"
+ ]
+ }
+ ],
+ "source": [
+ "pipeline = make_pipeline(CountVectorizer(min_df=5, ngram_range=(1, 2)), LogisticRegression())\n",
+ "arr = cross_val_score(pipeline, train_data.data, train_data.target, cv=5, scoring='accuracy')\n",
+ "print(arr)\n",
+ "print(np.mean(arr))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "New data preprocessing steps can be added to a Pipeline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.feature_extraction.text import TfidfTransformer"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0.87114846 0.87078652 0.84831461 0.85633803 0.83898305]\n",
+ "0.8571141323991462\n"
+ ]
+ }
+ ],
+ "source": [
+ "pipeline = make_pipeline(CountVectorizer(min_df=5, ngram_range=(1, 2)), TfidfTransformer(), LogisticRegression())\n",
+ "arr = cross_val_score(pipeline, train_data.data, train_data.target, cv=5, scoring='accuracy')\n",
+ "print(arr)\n",
+ "print(np.mean(arr))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Pipeline(memory=None,\n",
+ " steps=[('countvectorizer', CountVectorizer(analyzer='word', binary=False, decode_error='strict',\n",
+ " dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',\n",
+ " lowercase=True, max_df=1.0, max_features=None, min_df=5,\n",
+ " ngram_range=(1, 2), preprocessor=None, stop_words=None,\n",
+ " ...ty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
+ " verbose=0, warm_start=False))])"
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pipeline.fit(train_data.data, train_data.target)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.96962879640045"
+ ]
+ },
+ "execution_count": 30,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "accuracy_score(train_data.target, pipeline.predict(train_data.data))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.8241758241758241"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "accuracy_score(test_data.target, pipeline.predict(test_data.data))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The quality became slightly better (0.824 vs 0.793 on the test set)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Assignment\n",
+ "\n",
+ "1. Experiment with the regularization parameters and the parameters of CountVectorizer and TfidfTransformer to get the best quality.\n",
+ "2. Build a list of important words and phrases for each topic (based on the coefficient values)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
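Task 2 of the TextClassification notebook can be approached directly from a fitted pipeline. A minimal sketch, assuming a tiny made-up three-topic corpus in place of the 20 Newsgroups data so that it runs offline: it pulls the vectorizer and the classifier out of a `make_pipeline` object via `named_steps` (step names are the lowercased class names, as the `'countvectorizer'` step in the notebook's pipeline output shows) and lists the words with the largest coefficients for each class. The corpus, `topic_names` and the `top_words` helper are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the sci.* newsgroup posts
docs = [
    "the circuit needs a new power chip",
    "solder the chip onto the circuit board",
    "the doctor prescribed a treatment for the disease",
    "blood tests help the doctor find the disease",
    "nasa plans a launch to orbit the moon",
    "the spacecraft reached orbit after launch",
]
labels = np.array([0, 0, 1, 1, 2, 2])
topic_names = ["electronics", "med", "space"]

pipeline = make_pipeline(CountVectorizer(), TfidfTransformer(), LogisticRegression(C=10.0))
pipeline.fit(docs, labels)

vectorizer = pipeline.named_steps["countvectorizer"]
model = pipeline.named_steps["logisticregression"]
# invert the vocabulary: column index -> word (or n-gram)
num_2_words = {idx: word for word, idx in vectorizer.vocabulary_.items()}


def top_words(model, num_2_words, k=3):
    """Return the k words with the largest coefficients for each row of coef_."""
    result = []
    for row in model.coef_:
        best = np.argsort(row)[::-1][:k]  # column indices, largest coefficient first
        result.append([num_2_words[i] for i in best])
    return result


top = top_words(model, num_2_words)
for name, words in zip(topic_names, top):
    print(name + ": " + ", ".join(words))
```

With `ngram_range=(1, 2)` as in the notebook, bigram features appear in `vocabulary_` as space-separated phrases (like "thanks for" in the printed topic words), so the same helper also lists important phrases.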
diff --git a/2018/seminars/8_linear_models/seminar8_v0.ipynb b/2018/seminars/8_linear_models/seminar8_v0.ipynb
new file mode 100644
index 0000000..e9fda23
--- /dev/null
+++ b/2018/seminars/8_linear_models/seminar8_v0.ipynb
@@ -0,0 +1,1204 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "FIVT, APT, Machine Learning course, Spring 2017, seminar 8\n",
+ "\n",
+ "Alexey Romanenko, \n",
+ "alexromsput@gmail.com"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Linear models (continued)\n",
+ "## Plan\n",
+ "\n",
+ "\n",
+ "* **SVM:**\n",
+ " - learning algorithm (review)\n",
+ " - Kernel Trick\n",
+ " - Multiclass SVM\n",
+ " \n",
+ "* **SVM: implementation example**\n",
+ " - simple SVM\n",
+ " - true SVM\n",
+ " \n",
+ "* **Use cases**\n",
+ " - Budget optimization\n",
+ " - Intelligent email sending\n",
+ " - Man-hours forecasting"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import numpy as np\n",
+ "import seaborn as sns\n",
+ "# import _pickle as pickle # use for Python 2: import cPickle as pickle\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "from matplotlib.colors import ListedColormap\n",
+ "from matplotlib.pyplot import plot, contourf, clabel, contour\n",
+ "\n",
+ "from matplotlib import cm\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "from sklearn.datasets import make_moons, make_circles, make_classification\n",
+ "\n",
+ "%matplotlib inline\n",
+ "sns.set_context(\"notebook\", font_scale=1.5)\n",
+ "import random\n",
+ "from IPython.display import Image, SVG\n",
+ "from scipy import optimize"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "-------\n",
+ "# SVM"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Questions**\n",
+ "* What is the main idea of SVM?\n",
+ "* What is\n",
+ " - a separating hyperplane\n",
+ " - a support vector\n",
+ " - the margin?\n",
+ "* How is an SVM trained?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*Figure: maximum-margin separating hyperplane (Svm_max_sep_hyperplane_with_margin.png)*"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## $$ L_i = \\sum_{j \\neq y_i} \\max(0, w_j^T x_i - w_{y_i}^T x_i + 1)$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Quadratic Programming (QP) Problem:\n",
+ "\n",
+ "## $$ b = w_0 $$\n",
+ "\n",
+ "## $$ \\min_{i = 1, \\ldots, l} y_i (\\langle w, x_i \\rangle - w_0) = 1 $$\n",
+ "\n",
+ "Linearly separable case\n",
+ "\n",
+ "## \\begin{cases}\n",
+ " \\langle w, w \\rangle \\to \\min\\limits_{w} \\\\\n",
+ " y_i (\\langle w, x_i \\rangle - w_0) \\geq 1, i = 1, \\ldots, l\n",
+ "\\end{cases}\n",
+ "\n",
+ "Linearly inseparable case\n",
+ "\n",
+ "## \\begin{cases}\n",
+ " \\frac{1}{2} \\langle w, w \\rangle + C \\sum\\limits_{i=1}^{l} \\xi_i \\to \\min\\limits_{w, \\xi} \\\\\n",
+ " y_i (\\langle w, x_i \\rangle - w_0) \\geq 1 - \\xi_i, i = 1, \\ldots, l \\\\\n",
+ " \\xi_i \\geq 0, i = 1, \\ldots, l\n",
+ "\\end{cases}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Linear Model Equivalence\n",
+ "\n",
+ "$$ Q(w, w_0) = \\sum\\limits_{i=1}^{l} (1 - M_i(w, w_0))_{+} + \\frac{1}{2C} {\\|w\\|}^2 \\to \\min\\limits_{w, w_0} $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Dual Form\n",
+ "\n",
+ "Before:\n",
+ "## \\begin{cases}\n",
+ " \\sum f(x_i) \\to \\min\\limits_{x} \\\\\n",
+ " h(x_i) \\geq 0, i = 1, \\ldots, n\n",
+ "\\end{cases}\n",
+ "\n",
+ "After:\n",
+ "## \\begin{cases}\n",
+ " \\sum\\limits_i \\left( f(x_i) - \\lambda_i h(x_i) \\right) \\to \\min\\limits_{x} \\max\\limits_{\\lambda} \\\\\n",
+ " h(x_i) \\geq 0, i = 1, \\ldots, n \\\\\n",
+ " \\lambda_i \\geq 0, i = 1, \\ldots, n \\\\\n",
+ " \\lambda_i = 0 \\ or \\ h(x_i) = 0 \\ (\\sum \\lambda_i h(x_i) = 0)\n",
+ "\\end{cases}\n",
+ "\n",
+ "Take the derivatives with respect to x, set them to zero and substitute the result back; for SVM this gives the dual problem:\n",
+ "\n",
+ "\n",
+ "## \\begin{cases}\n",
+ " -\\sum\\limits_{i=1}^{l} \\lambda_i + \\frac{1}{2} \\sum\\limits_{i=1}^{l} \\sum\\limits_{j=1}^{l} \\lambda_i \\lambda_j y_i y_j \\langle x_i, x_j \\rangle \\to \\min\\limits_{\\lambda} \\\\\n",
+ " \\sum\\limits_{i=1}^{l} \\lambda_i y_i = 0 \\\\\n",
+ " 0 \\leq \\lambda_i \\leq C, i = 1, \\ldots, l\n",
+ "\\end{cases}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
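+ "### Decision Rule\n",
+ "\n",
+ "### $$ a(x) = \\operatorname{sign} \\left(\\sum\\limits_{i = 1}^{l} \\lambda_i y_i \\langle x_i, x \\rangle - w_0 \\right) $$\n",
+ "## $$ w_0 = \\operatorname{med} \\{ \\langle w, x_i \\rangle - y_i \\ |\\ 0 < \\lambda_i < C \\} $$\n",
+ "\n",
+ "Only objects with $\\lambda_i > 0$ (the support vectors) contribute to the sum; $w_0$ is estimated from the support vectors lying exactly on the margin ($0 < \\lambda_i < C$), for which $y_i (\\langle w, x_i \\rangle - w_0) = 1$."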
+ ]
+ },
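For the linear kernel the dual rule is the primal rule in disguise, because $w = \sum_i \lambda_i y_i x_i$. A sketch on a hand-solved toy problem (the points $(1, 0)$ with $y = +1$ and $(-1, 0)$ with $y = -1$; with $\lambda_1 = \lambda_2 = a$ the dual objective to maximize is $2a - 2a^2$, so $a = 0.5$) recovers $w$ and $w_0$ and checks that both rules agree. The value of `C` is an arbitrary assumption.

```python
import numpy as np

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
lambd = np.array([0.5, 0.5])          # dual solution of this toy problem, derived by hand
C = 10.0                              # assumed; any C > 0.5 leaves both points strictly inside the box

w = (lambd * y) @ X                   # w = sum_i lambda_i y_i x_i
on_margin = (lambd > 0) & (lambd < C) # support vectors with 0 < lambda_i < C
w0 = np.median(X[on_margin] @ w - y[on_margin])


def a_dual(x):
    # sign(sum_i lambda_i y_i <x_i, x> - w0)
    return np.sign(np.sum(lambd * y * (X @ x)) - w0)


def a_primal(x):
    # sign(<w, x> - w0)
    return np.sign(w @ x - w0)


print(w, w0)
```

The dual form never needs $w$ explicitly, which is what makes the kernel substitution below possible.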
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "--------\n",
+ "## Non-Linear SVM (Kernel Trick)\n",
+ "**Questions**\n",
+ " - What is a kernel?\n",
+ " - Give examples of kernels.\n",
+ " - How can kernels be constructed?\n",
+ " - How are kernels used to classify datasets that are not linearly separable?\n",
+ "\n",
+ "A function\n",
+ "## $$ K : X \\times X \\to \\mathbb{R} $$ \n",
+ "is a kernel if $ K(x, x') = \\langle \\phi(x), \\phi(x') \\rangle $ for some map $ \\phi : X \\to H $, where $H$ is a space with an inner product."
+ ]
+ },
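A quick check of the definition, assuming the polynomial kernel $K(x, x') = (1 + \langle x, x' \rangle)^2$ on $\mathbb{R}^2$: its feature map can be written explicitly as $\phi(x) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1x_2)$, so the kernel computes a 6-dimensional inner product without ever forming $\phi$. The function names are illustrative.

```python
import numpy as np


def poly_kernel(x, z):
    # K(x, z) = (1 + <x, z>)^2, computed in the original 2-D space
    return (1.0 + np.dot(x, z)) ** 2


def phi(x):
    # explicit 6-D feature map of the same kernel
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(poly_kernel(x, z), np.dot(phi(x), phi(z)))   # both are 4 up to rounding
```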
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Kernel Machine"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Kernel trick**\n",
+ "\n",
+ "$$\\begin{cases}\n",
+ " -\\sum\\limits_{i=1}^{l} \\lambda_i + \\frac{1}{2} \\sum\\limits_{i=1}^{l} \\sum\\limits_{j=1}^{l} \\lambda_i \\lambda_j y_i y_j \\color{red}{K(x_i, x_j)} \\to \\min\\limits_{\\lambda} \\\\\n",
+ " \\sum\\limits_{i=1}^{l} \\lambda_i y_i = 0 \\\\\n",
+ " 0 \\leq \\lambda_i \\leq C, \\quad i = 1, \\ldots, l\n",
+ "\\end{cases}$$\n",
+ "\n",
+ "### $$ a(x) = \\operatorname{sign} \\left(\\sum\\limits_{i = 1}^{l} \\lambda_i y_i \\color{red}{K(x_i, x)} - w_0 \\right) $$\n",
+ "## $$ w_0 = \\operatorname{med} \\{ \\color{red}{\\textstyle\\sum_{j=1}^{l} \\lambda_j y_j K(x_j, x_i)} - y_i \\ |\\ 0 < \\lambda_i < C \\} $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Kernels and Building Kernels"
+ ]
+ },
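The usual rules for building kernels (a sum of kernels, an elementwise product of kernels and a kernel multiplied by a positive constant are again kernels) can be illustrated numerically: a valid kernel gives a positive semidefinite Gram matrix on any sample. The sketch checks this on one random sample, which illustrates the rules but proves nothing; a difference of kernels is included as a case that can fail. The RBF width `gamma=0.5` and the helper names are arbitrary choices.

```python
import numpy as np


def gram(kernel, X):
    """Gram matrix G_ij = kernel(X_i, X_j)."""
    return np.array([[kernel(a, b) for b in X] for a in X])


def is_psd(G, tol=1e-9):
    """True if the symmetric matrix G has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(G).min() >= -tol)


def linear(a, b):
    return np.dot(a, b)


def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * np.dot(a - b, a - b))


candidates = {
    "sum": lambda a, b: linear(a, b) + rbf(a, b),
    "product": lambda a, b: linear(a, b) * rbf(a, b),
    "positive scaling": lambda a, b: 3.0 * rbf(a, b),
    "difference": lambda a, b: rbf(a, b) - 2.0 * rbf(a, b),   # not a kernel in general
}

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 3))
results = {name: is_psd(gram(k, X)) for name, k in candidates.items()}
print(results)
```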
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Examples of Kernels"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Which kernel makes the following dataset linearly separable?**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "x = np.vstack((np.random.uniform(size=100).reshape((50,2)), np.random.uniform(size=100).reshape((50,2)) + 1)) \n",
+ "plt.figure(figsize=(10, 6))\n",
+ "plot(x[:,0], x[:,1], 'bo')\n",
+ "y1 = np.random.uniform(size=100).reshape((50,2))\n",
+ "y2 = np.random.uniform(size=100).reshape((50,2))\n",
+ "y1[:,0] += 1\n",
+ "y2[:,1] += 1\n",
+ "plot(y2[:,0], y2[:,1], 'go')\n",
+ "plot(y1[:,0], y1[:,1], 'go')"
+ ]
+ },
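One possible answer, offered as a hint rather than the official solution: after shifting the origin to $(1, 1)$, the classes differ in the sign of $(x_1 - 1)(x_2 - 1) = x_1 x_2 - x_1 - x_2 + 1$. That is a linear combination of the features $1, x_1, x_2, x_1 x_2$ of the polynomial kernel $(1 + \langle x, x' \rangle)^2$, so a degree-2 polynomial kernel can in principle linearize this dataset. The sketch regenerates data the same way as the cell above, with a seeded generator (an assumption, for reproducibility).

```python
import numpy as np

rng = np.random.default_rng(0)
# blue: unit squares [0,1)^2 and [1,2)^2; green: the two remaining unit squares
blue = np.vstack((rng.uniform(size=(50, 2)), rng.uniform(size=(50, 2)) + 1))
green = np.vstack((rng.uniform(size=(50, 2)) + [1, 0], rng.uniform(size=(50, 2)) + [0, 1]))


def product_feature(P):
    # (x1 - 1)(x2 - 1): a linear function of the degree-2 polynomial features
    return (P[:, 0] - 1) * (P[:, 1] - 1)


print((product_feature(blue) >= 0).all(), (product_feature(green) <= 0).all())   # True True
```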
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Which kernel makes the following dataset linearly separable?**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAmAAAAFxCAYAAADUCRRzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X+Q5Hld3/HXe7YvWVE4EJblDnY4YvxRc3OtFdZfeAunZwxCAhVXtNc6jVAxWStZVCyRmbiyt+pMIRoMB6wRFCpXoTsxJ6Io/gC5y26KwtoraphhPDUqdwtravfgzvPgDofZT/7o6d2enu+3+/vt/ny+Pz7f56Nqane/09P97d6e7ld/Pu/P+2POOQEAAKA4c2WfAAAAQNMQwAAAAApGAAMAACgYAQwAAKBgBDAAAICCEcAAAAAKRgADAAAoGAEMAACgYAQwAACAgrXKPoFJnvWsZ7mbbrqp7NMAAACY6P7773/YOXdg0uUqH8BuuukmnT9/vuzTAAAAmMjMHsxyOaYgAQAACkYAAwAAKBgBDAAAoGAEMAAAgIIRwAAAAApGAAMAACgYAQwAAKBgBDAAAICCBQtgZtYyszeY2V+a2RfN7NNm9pZQtwcAAFAXITvhv1vS7ZLulPSApEOSFgLeHgAAQC0ECWBm9lJJHUlf75zbDHEbAFAHvY2eVs6uaPPyphYOLGj5yLI6i52yTwtAyUKNgL1G0p8QvgA0WW+jp2P3HLv67/VL61f/TQgDmi1UDdg3S/oLM3ubmT1mZl8ws98ysxsD3R4AVM7K2ZXE46vnVgs+EwBVEyqAPUfSD0v6BvWnIl8t6YWS3mdmFug2AaBSNi8nTwKkHQfQHKECmO18vdI59/vOuf8h6QclfZOk75j4w2anzMyZmbt48WKgUwSAsBYOJK87SjsOoDlCBbBHJK075z47dOycpH9QhpWQzrlTzjlzztmNNzJrCaCelo8sJx5funWp4DMBUDWhAtifpRw3SVcC3SYAVEpnsaPu0a7aB9tqzbXUPthW92iXAnwAwVZBfkDSnWb2LOfcwzvHXizpOklrgW4TACqns9ghcAHYI9QI2K9J+qyk3zWzf2VmPyDpbkkfcs6dC3SbAAAAtRAkgDnnHlO/2P4RST1Jb5f0YUnfF+L2AAAA6iTYVkTOuf8r6WWhrh8AAKCugm3GDQAAgGQEMAAAgIIRwAAAAApGAAMqrrfRU/tMW63TLbXPtNXb6JV9SgCAGQUrwgcwu95GT8fuOXb13+uX1q/+m95SAFBfjIABFbZydiXx+Oq51YLPBADgEwEMqLDNy5u5jgMA6oEABlTYwoHkvevTjgMA6oEABlTY8pHlxONLty4VfCYAAJ8IYECFdRY76h7tqn2wrdZcS+2DbXWPdinAB4CaM+dc2ecw1uHDh9358+fLPg0AAICJzOx+59zhSZdjBAwAAKBgBDAAAICCEcAAADNhtwYgPzrhAwCmxm4NwHQYAQMATI3dGoDpEMAAAFNjtwZgOgQwVAZ1JED9sFsDMB0CGCphUEeyfmld2277ah0JIQyoNnZrAKZDAEMlUEeCEBhVDa9JuzXwfIJPdMJHJbROt7Tttvcen2tp6+RWCWeEuhtdnTcQaziogt5GTytnV7R5eVMLBxa0fGQ5msea5xOyohM+aoU6EvjGqGqxYi8j4PkE3whgqATqSPxr+nQJq/OKFXtA4fkE3whgqIQm1ZEUIfbRiCwYVS1W7AGF5xN8I4ChMjqLHa0dX9PWyS2tHV8jfM0g1GhEnUbVGFUtVuwBhecTfCOAARHKOhqRJ1DVbVSNUdVixR5QeD7BN1ZBAhFqn2lr/dL63uMH21o7viYp/6quLNeJZutt9LR6bvXqKsilW5cIKGgcVkECDZZlNCLvNGWIUTXMrkqPN2UEQHatsk8AgH+DN75xoxF5i6YXDiwkjoAN1/iMjqoNpimHzwnTSeqxJYnH
G6gppiCBhso7pZhlypJpyjDSHvtDTzukC49d2HOcxxsoD1OQAMbKWzSdpQg576halabPqixtujgpfEnxtH4AYsYUJNBQWaYpk35m3PezTFMOMF2ZXd5AFUvrByBmjIABDea7aDrPqFrsndN9SgtU89fPJx6PpfUDEDMCGABv8vRKir1zuk9pwfZN3/kmelMBNcUUJACvJk1TDuSZrmy6SdPFBC6gfghgAEqxfGQ5cWUf02fJsgZbAPXAFCSAmUy7kpGtXQA0GX3AAEwt73ZGwLSSGtHyHEMVVaoPmJk918weNzNnZl9RxG0CCI+VjBgI2dOtKhvB07cOPhU1BflmSY8XdFsACsJKRkh+AtK4cFOFoF+VEIh4BA9gZnZE0ksl/VLo2wJQrLQVi6xkbJZZA9KkcFOFoF+FEIi4BA1gZrZP0l2STkt6OORtAShe3u2MEKdZA9KkcFOFoF+FEIi4hB4BOy5pv6S3B74dACVgJeN0YqslmjUgTQo3VQj6VQiBiEuwAGZmz5T0c5Je55zbyvmzp3YK9t3FixfDnCBQc1V5E/e9nVHsYqwlmjUgTQo3oYJ+nt+hKoRAxCXkCNgvSPqYc+738/6gc+6Uc86cc3bjjTcGOLXZVeXND80U45t4UxRZS1TU69SsASlLuPEd9PP+DjHaC9+C9AEzs5slfVzSiyU9sHP4B9SfinyepM85557Icl1V7ANG7yMUJa33UftMO3Ebn/bBttaOr5Vwpsiqdbqlbbe99/hcS1snc00WjFW316neRi91q6UQ+B1CKGX3AftqSddJ+qikR3a+BnVgn1a/ML+2Yl4Nw8hedYz7hE5BcH0VVUtUt9epoqey6/w7xOt0HEIFsHOSvn3k600733uZ+n3BaqvOv7jjMK1VLePeQCkIrq+iaonSXo82Lm3w5q36FtXzOh2PIAHMOfewc+7e4S9dm4o865z78xC3W5S6/uJOUrdPzLEbF/QpCK6vomqJ0l6PrrgrvHmrvkX1vE7Hg824p1DXX9xJYh3Zq6txQZ+C4HorYrot7XUqSRPfvOv6O8TrdDxaRd2Qc+49kt5T1O2FNPgFLbJgtAgLBxYSi1LrPrJXV8tHlhOLqAdBv7PYqf1zDuEkvU5tXNrQFXdlz2Wb+uZdx98hXqfjwQjYlGLsfRTryF5d1fUTOqpj9HXq5gM3J14uy5s3hd/VwOt0PIK0ofCpim0oYlb0UnAAxZm2NUXdWlrEjtfpasvahoIABgANMs2bNz2zgOyyBrDCasAAAOWbpu6Jwm/AP2rAAABjxdp6xxfq4zANAhj24MUEwDAKv9PRGBXTIoBhF15MAIxiRW46GqNiWhThYxeKbQEgu6I2V0d9lL0ZN2qKYlsAyI76OEyLAIZdeDEBgOyoj8O0CGDYhRcTAHVW9CIi6uMwLWrAsAddlgHUER37UQV0wgcANErsi4h6Gz2tnF25+uF4+cgywbKC6IQPAGiUmBcRjY7uDVoESSKE1RQ1YACAKMS8iIh+Y/EhgAEAohDzIqKYR/eaigAGAIhCUSsSy9iuLebRvaaiCB8AgIzKWmnJCs/6oBM+AACelVWLRb+x+LAKEgCAjMqsxeosdghcEWEEDACAjKjFgi8EMAAAMop5pSWKRQADACAjarHgC6sgAQCoMLYgqhe2IgIAoObYgiheTEECAFBRbEEULwIYAAAVxRZE8SKANUgZ22eguno9qd2WWq3+nz2eDkDl0PYiXgSwhhjUEaxfWte2275aR0AIa6ZeTzp2TFpfl7a3+38eO0YIA6qGthfxIoA1BHUEGLaS/HTQKk8HoFJoexEv2lA0ROt0S9tue+/xuZa2Tm6VcEYoU6vVH/lKOr7F0wEApsZm3NiFOgIMW0j5b087DgDwiwDWEGXXEbAAoFqWk58OWqKsBAAKQQBriDLrCFgAUD2djtTt7l4F2e32jwMAwqMGDMG1z7S1fml97/GDba0dXyvhjAAACIMaMFQGjQQBANiNAIbgWACA
NDSDBdBUQQKYmb3KzH7HzD5jZo+b2f1mdmzyTyJGZS8AQDXRDBZAk4UaAXudpMcl/YSkV0j6iKT3mtmJQLdXW01YHUgjQSShGSyAJgtShG9mz3LOPTxy7L2SvtU594I81xVzEf5gdeAowgmagGawQLX0NnpaObuizcubWjiwoOUjy7wXTaHUIvzR8LXj45KeHeL26ortgdBkNIMFqoN2QcUrsgj/RZJqt+wt5BQhqwPRZDSDBaqDAYHiFRLAzOx2Sa+U9PYibs+X0J8IWB2IJqMZLFAdDAgUL3gAM7ObJL1X0vudc+/J+DOnzMyZmbt48WLAsxsv9CcCVgei6TodaW2tX/O1tkb4AsrCgEDxggYwM/tKSR+U9JCkO7L+nHPulHPOnHN24403Bju/SUJ/ImB1IACgChgQKF4r1BWb2VMkfUDSP5L0cufc50PdVigLBxYSt9Dx+Ymgs9ghcAEASjV4H1o9t3p1FeTSrUu8PwUUJICZWUvSb0r6aknf5py7FOJ2Qls+spzYJoJPBACA2DAgUKxQI2DvkPQyST8m6SvN7FuGvvdx59wXA92uV3wiAAAAIYRqxPopSc9P+fYLnHOfynpdMTdiBQAAccnaiDXICJhz7qYQ1wsAABCDIhuxAkCper3dfcfY+BtAWYKtggSAquj1pNe/Xrpw4dqx9XXp2M4aG/qPASgaI2AAotbr9YPWcPgatspOK2iAkNvqYToEMOzBLypispK8ocVVm+y0gpqY9rWZjbariQCGXfhFRWwmBawFjzutUGOGUGZ5bWaj7WoigGEXflERm0kBa8lTX+XBVOf6urS9fa3GjBAGH2Z5bWaj7WoigNVAkVOC/KKirtJGn5aTt7jT/LzU7forwE+b6qTGDD7M8tpclY22KW/ZjQBWcUVPCVblFxXIY9zoU6fTD1rD4azblR580O/qx7SpTmrM4MMsr81V2Gib8pa9CGAVV/SUYBV+UYG8Jo0+dTrS2pq0tdX/M0TbibSpTp81ZiFQt1YPs7w2dxY76h7tqn2wrdZcS+2DbXWPdgvdVo/ylr0IYBVX9JRgFX5Rgbyyjj6FDBtpU52+asxCoG6tOTqLHa0dX9PWyS2tHV8r/DWd8pa9CGAVV8aUYNm/qEBeWUafQoeNtKnOKjd5pW6tPkKNIBVVl0V5y14EsIpjShC+xDzVlGX0qYiwUcRUp0/UrdVHiBGkIuuyeC/biwBWcUwJwkdwin2qKcvoE2Fjr7rWrTVRiBGkIuuyeC/by5xzZZ/DWIcPH3bnz58v+zSAUgyC06i8U1vtdj90JR1fW5v+/OqEx2AvX88vhDcYrRo1S4hpnW5p223vPT7X0tbJramuE5KZ3e+cOzzpcoyAARXma9qsTqM/oaZKb7st+fhLXuLn+oGQQowgUZdVLgIYUGG+glNdpppCTpXee2/y8fvum/26Qwldt0cRfr34XiBFXVa5CGBAhaUFpFYr35tyXVokhAwEdRoFlNLD6Py8vyBWt8cEflGXVS5qwIAKS6vRGZWlZqfX6weZzc1+sFtaql6dT6vVDxtJx7dmLEmpSw1Yr9cPoknnOsxHnVZdHhOgTqgBAyIwurpv//7ky2UZIapDi4SQU6VF1oBNO3U4POo1iY9RwbqMjAIxIoABFTccnNJGgWKZMgoZCIqqAZulji1tCjaJj//zOjaPBWLBFCRQI02YMgo1VRpyenPYLP9HaeeYdjux/J8DMWEKEohQVaaMQq7OCzVVWtRK0FkK2/OcC9OEQL0RwIAaqcKUUR276vd60qOPJn/Pd5CZJeilBewTJ5gmBGJDAAMqKm2Uqexi+qJ6R/kaZRsExgsXdh+fnw8TZGYZpUwL2G99a/UXUADIhwAGVFCVR5mK6B3l8/6nBcanPz09yMwS/mYdpSwzYMe8YTtQNQQwoIJCjjLN+iZbRC2Vz/ufNzDmDX9Jj2fZo5TTqHLoB2JEAAMqKNQok4832SIWAvi8/3kDY57wF1NoYVuiZutt
9NQ+01brdEvtM231Nnq5vo/8CGBABU0KDdOOYvl4ky1iIYDPUba8gTFP+IsptLAtUXP1Nno6ds8xrV9a17bb1vqldR2759jVkDXp+5gOAQyooHGhYZZRF19vsqGn2HyOsuUNjHnCX0yhpS4btsO/lbPJnyRWz61m+j6mQwADKmhcaJhl1KUub7K+R9nyBMY84a8uj2cWVekxh+JtXk7+xDA4Pun7WTCFuRcBDKiotNAwy6hLnd5kB/f/7rsl56Q77ihmZV6e8Fenx3OSKvSYQzkWDiR/Yhgcn/T9SbJMYTYxoBHAgJqZZdSlSm+yWerYyipyzzpiVqXH04c6rt7E7JaPJH+SWLp1KdP3J5k0hdnUGjP2ggRqZhBKRvl44+/1+lOcg30Yl5fDvAlnvQ9N2PsSqILeRk+r51a1eXlTCwcWtHTrkjqLnczfH6d1uqVtt3eT09ZcS1snt9Q+09b6pb2/6O2Dba0dr98veta9IAlgQA2F2LA6ZLAblTVYFbWBNoBwJgWsSQGtbtiMG4hYiKmiWYr787bFyFrHFlORe4zonI8sJk1hzlpjVlcEMACSpi/un6ZOKy1AtVq7fy6mIvfYxNSEFmF1FjvqHu2qfbCt1lxL7YNtdY92r05hzlpjVldMQQKQNH291aSfS6ork5KnOweGpz1DTLdiNr2e9OpXS08+ufd71OdhGrPUmFUNNWAAcpm2Bmxcndbdd6dfp8SbeB2lPU8GqM9D05VeA2ZmC2b2YTP7gpldNLPTZrYv1O0BmM00LRV6Pem665K/t7Awvq6s00l/o65jJ/mm1EOl/Z8OUJ8HZBMkgJnZMyR9SJKT9EpJpyX9pKQ7Q9weAD/yFPcPRkKSRrCk/lThJz+Z/L1BwIqlyD6tHmp+Pr4gNikcU58HZBNqBOy4pC+T9D3OuT92zv2q+uHrdWb2tEC3CaBAaSMh+/dfm2K8ciX5MoOAFUuRfdpjceFCfIXp48Lxvn3U5wFZhQpg3y3pD51zjw0d66kfyl4S6DYBFChtJORLXxq/Z6V0LWDF0kl+0qhQllYeZcozfZoWmiXp5pv9nxsQq1AB7OskPTB8wDn3kKQv7HwPQM1Nmj5MCyVzc7sD1ui0p1S/WqpJU6Zl1rRNCld520l0OtKJE8nfq9vIJVCmUAHsGZIeTTj+yM73xjKzU2bmzMxdvHjR+8kBsSiz8HvS9GFaKFlcTL/OsntLTft4jhsVksqracvyeE7TgPetb41j5BIolXPO+5ekLUk/lnD8M5J+Ic91vfCFL3TIrrvedbe84xa378597pZ33OK6692yTwmBdLvOSXu/ugX+l3e7zrXbzrVa/T+Hb3ua87vlluSfabeLuS+zPJ7drnPz82H+T7rd/mOzb1//z6zXl+Xx3Lcv+TKt1mznDDSVpPMuS1bKcqG8X5IuSXpjwvHHJf1UnusigGXXXe86ndKeL0JYnA4dKi+sZDUuoCUpMwxME/6SglHe+zzpOk+cyB7qRn/WbPLjWWboBWKUNYCFmoJ8QCO1XmZ2SNKXa6Q2DP6snE2eS1g9V/EKYOTW6/VX2CWpUg+tvHtWltmWIu9WTGnTe1Lyfc4yvZl0nXfdlXz7o1OEST/rUvpsDz+esaxEBeomVAD7oKR/YWZPHTr2/ZKekHRfoNtsvM3Lye8UacdRX+NWGIYMK6Frzm67Lfm4jzCQdu6D40nd/KX0xzNP7VTW2rZJTU6HjQbDPD87/HjGshIVqJ0sw2R5v9QvtP9bSX8s6Tsl/Tv1px9/Pu91MQWZ3S3vuCVxCrJ9hrmE2KRN1YWsAQtVczaYNpubS77+EyfCnXva9N7w16FDybVX4/4PRi+fdZpv3HVO+7P79s02JQogH5U5Bemce0TS7ZL2Sfpd9ZuwvkXSG0PcHvqauqN8E6WNyszPhxu5mGa13CTDI0NpTVvv8zBmnnbu73xn+s/Mz/f/vHAhedRq3Ejj6OWzTm/mGb0cHRVM+9mbb843DVxFTdnmqWp6
Gz21z7TVOt1S+0xbvQ0eeK+ypLQyvxgBy6e73nXtM23XOt1y7TNtCvAjVcYKyBAF8mkjQ7Ncf1JhfJ6RpcFtThq1Svs/SLp81hGwcSN141abjiu4r/uIVxVW+zYRi7qmpzJXQfr8IoAByWZdbTd8PVlaHPhaJTgsSzDKsxov7c06bcXo/v3ptzluSm9wnw4d6reeaLXG34c84SjP/2va/R1MO8YQUlilWQ5KWqZHAJsBvbQQm+EgdOjQtbqmtGCSFgzyjERkuXyWEbA8ISLt+tL6c40LRlnObZrL+wxHabe5f3/+nmFVRZ+ycuy7c19iAGud5oGfJGsAC7UKsrZ6Gz0du+eY1i+ta9tta/3Suo7dc4y5b9TW6Aq8Cxeu1TWltbJIquvKu1ouS81YWguEubnd15+1Biit1urixWvnPjf0qudc+m1O6m4/ep+yXn5Qk+WjFivt/j75ZDk7CYRQZmuSJls4kPwApx1HfuaGX4Eq6PDhw+78+fOF3V77TFvrl9b3Hj/Y1trxtcLOA/Cl3e6/EefRavWLtmfRaiW3dhi97l6vH2A2N/tvqktLu8PJIECOSgp/afe13d69z+Sky6SdW1pvrcF9Gr78l76093JJ938WWf9vk+5bXeT5/4c/g8GIUd2jXXUWeeDHMbP7nXOHJ12OEbAR9NJCbKZpzOpjdCHryMWkZq15Vl9maSqap+Hq6Lml7WM5uE/Dl7/llvGX9SHrqFuVmvPmRZ+ycnQWO+oe7ap9sK3WXEvtg23Cl2eNDmBJS2wZdkVspnnD99H41FeH9bTw8IlP7J2OzPJmPcuUVtp9euSRvdN8RXSYH72/+/cnX67u03V5d1SAH53FjtaOr2nr5JbWjq8RvjxrbABLq/W67abbEi9PLy3UVVp3+WHz82FGFw4d2n0b01z3uPCQVOM06c06azBKqjsbBJ5Bj7CBCxeSz6OIkZvh+/vud2e7bwDK19gAlrZv4n0P3sewK2ptNDj89m+nX3YQCh580O/owqBuZ7jI/6GHpruuLNNsg+nILMX6k4JRr9cPWGlbB3U60vXXjz+P4dsqcuSG6TqgPhpbhN863dK221sh3JpraeukpwpZoGBpBctJfBaDj8pT6J7FoLj9E59I/n6rJd199+zF2pMev8H5Z11gAKB5KMKfgFovxCjPhsw+64JGR54++cnky01bDD4YSRpX2O5jq6RJj9/g/H21RmCLHVQd2xGF09gAxr6JiFGegOOrLmi0z9i4fR1HA0reADKufivP6sY0ky47OH8fBfZJj1vde3YhLvTFDKuxAYwltojRuE26Q9UF5Rl1Gw4o0wSQcTVOPkalJl12cP4+aq1CbG4O+JRWK716jiepF1na5Zf5xV6QQHZV2qR7bm78noa+9/g7cSL5+k6c2H25bnf3FkyHDl07t7THb37e/2MYyxY7WfcSRf343I6oSVv8KeNWRK2yAyAAfwYjMOM6y/s26BA/anFxfMG9jynDYffem3z8vvuu/T2pyH7QQuL1r+9vWXTokGTW/3vIxy/tcatTz67Rx3Mwiimx8jIGCwcWEneGyVsrPdpVfzCVKanRs06NnYIEYlV064Np66F87/GXJdCNmy4d3h/zoYf6qypDPn5FNGod5bvon2nUuPmqlQ45lVnnRQIEMAAzmbYeyncAyRLo8oyuhQ4RRffsClH073sUE9XR2+hp5eyK5mxO+1v7NWdzU9dKh9rir+6LBAhgAGY2zaib7wCSJdDlGV0rIkQUOVoZYrTK9ygmqmE42FxxV/Tkl57UFXdFS7cuTTVlGKrtU90XCRDAAOxRVH8qnwEkLdBJ1449+mj268sSIgZd8836X/Pz1W0jEWK0qoxpVITnO9iEavsUamStKAQwALtUuT/VpGA4Guik3fdlsDXSM5957WeG/z5sUohI2m4paU/IqggxWsXWR3HyHWxCtX2qe0N1Ahgwgxg7mVe1sHqaYJh2X5773GuNHx5+eLoQMa6gv+zHKkmo
0aqiF30gvBDBprPY0drxNW2d3NLa8TUvqx/r3lC9sXtBArNK2zew7iMAVd3ncJr9JUPel7Tr9nX9IQz21CyqRQnqabRtxEAVm5X3NnpaPbeqzcubWjiwMHWdmk9Z94IkgAFT8r3hdFVU9X5NE6ZC3pe06/Z1/UCZqhhs6oLNuIHAYl2CX9XC6mlqmELel7Tr9nX9wKgie16FmDLEbgQwYEqxLsGvamH1NGEq5H0ZXPf8/LVj8/PVeKwQn7r3vMJeTEECU4q1BqzKqGFCU7XPtBO3BWofbGvtOPPdVcIUJBBYVUeKQip71Scr7tBUde95hb3YjBuYQafTnBDAxstAeXxtjI3qYAQMQCZV7Q8GNEHde15hLwIYgExiXfVZNWVP86KaQnWTR3kIYA1U5FJmxKNqqz5jDCpV3gYK5aM1RFwIYA3DUmZMq0r9wWINKkzzokx8OC8WAaxhfO9yj+ao0qrPWIMK07yYho/gxIfz4tEHrGFap1vadnv3c2nNtbR1soKb1wEJqrpf5ayqug0UqsvXvo30GfOHPmBIFGKXe6BoVatH86VK07yoB1+zGvQZKx4BrGFYyowYxBpUqjTNi3rwFZz4cF48AljDsJQZPpW1EjHmoEK3f+ThKzilfTh/yfNfkvuckA0BbEhTVoAMljLf/a/vlnNOd/zWHVHfX4RR9kpEggrgb1ajs9jRiW86sef4XX96F+8NgRDAdjRtBUjT7m9sqtADK9aViECd+JzVuPdT9yYeZ5V8GN5XQZrZ0yT9pKTvlvS1kp6Q9FFJP+2c+4u811fUKsimrQBp2v2NyeiejANFT8HFuhIRaCpWyftR5irIeUk/IukPJX2vpH8v6QZJHzOzQwFuz4umrQBp2v2NSVVGnmJdiQg0VdmF+E0pAxoIEcD+RtJXOedOOuf+2Dn3fkkvk3SdpNcEuD0vyn7iFa1p9zcmVWnWGetKRKCpylwl38SyGO8BzDn3eefcEyPHPifpQUnP9n17vjStPUPT7m9MQo88Za0vi3klItBEZa6Sb+IuLYV0wjezA5I+Lel1zrm35/nZIjvh9zZ6Wj23qs3Lm1o4sKClW5eibs/QtPsbi5A1YFWpLwPQLDHVn2WtASsqgP03SS+X9DXOuc9pU/i0AAAP7UlEQVRmuPwpSW+UpBtuuEEXL14Me4LIpbfR08rZlavBbfnIMsGtYL1ev+Zrc7M/8rW05CcgsRUOgDLEtDDMawAzs+vVL6Qfyzn3QMLP/qikt0s66px738QbG8FekNWSdd8xQlo9sbIRQBl87WlZBVkDWCvj9b1K0juz3O7ISbxC0l3qt6DIHb5QPePm6Qe/JKO/SINiSkm1+0VqmoWF5BEwVjYCCGnw3tCksphgU5Bm9iJJH5L0G865/zjt9TACVi1Z5uljGkpuGmrAAGA2ZfYBk5ndLOkDkv5A0mtD3AbKkaV9BT3G6ouVjQBQDO8BzMyerX7welzSWyV9k5l9y84XExk1l6V9BT3G6o09FgEgvBAjYAuSnifpkKSPqL8N0eDrHQFuDwXK0ieGHmMAAIxXSBuKWVADVk/0GAMQO1Z7I0ml+oDNggCWzscvPy8gAJBfTG0T4JfvNhSoGB+tHmgXAQDZjH5YffTJRxMvN9ySx+ft8eE4PkFWQSI8H/tmNXHvLQDIK2mj6AuPXUi8rI/V3k3cmDq03kZP7TNttU631D7TrsRjSQCrKR+tHmgXAQCTpX1YTeJjtTcfjv2qaqAlgNWUj1YPtIsAgMnyfCj1sdqbD8d+VTXQEsBqykerB9pFAMBkaR9K56+fH9uSx/ft8eF4OlUNtASwmsrSj6uI6wCA2KV9WH3Td75Ja8fXtHVyS2vH17y9dvLh2K+qBlraUAAAMEHRvQ3ppehP0S1D6AMGAACgYgMtAQwAAKBgWQMYNWAAAAAFI4ABAAAUjAAGAABQMAIYAABAwQhgAAAABSOAAQAAFIwABgAAUDACGAAAQMEIYAAAVFhvo6f2mbZa
p1tqn2mrt9Er+5TgQavsEwAAAMlG9zFcv7R+9d/sDVlvjIABADCDkCNUK2dXEo+vnlv1dhsoByNgAABMKfQI1eblzVzHUR+MgAEAMKXQI1QLBxZyHUd9EMAAAJhS6BGq5SPLiceXbl3ycv0oDwEMAIAphR6h6ix21D3aVftgW625ltoH2+oe7VKAHwFqwAAAmNLykeVdNWADPkeoOosdAleEGAEDxqD/DoBxGKHCtMw5V/Y5jHX48GF3/vz5sk8DDTS6ummAF1cAQBozu985d3jS5RgBA1LQfwcAEAoBDI2RdzqR/jsAgFAIYJhZHeqkBtOJ65fWte22rzZLHHeu9N8BAIRCAMNMpgk2ZZhmOpH+OwCAUAhgmEld6qSmmU5kdRMATKcOMyNlow8YZlKXOqmFAwtav7SeeHwc+u8AQD6h98eMBSNgmEld6qTKnE7kkyCAJqnLzEjZCGCYSV3qpKaZTvQRnMqqkSP0oYl43ldDXWZGykYjVsyst9HT6rlVbV7e1MKBBS3dulT7YWZfTVjbZ9qJU5/tg22tHV/LdT4rZ1euPsbLR5ZTz4MGshjI87ypO5731eHrda+usjZiDR7AzOzHJb1F0j3Oue/N+/MEMJTB1wtI63RL22577/G5lrZObmW6jrxvLE1/8UNf0wIJz/vqaNpzb1QlOuGb2bMl/aykyyFvB/DN1xC6jxq5vPUUDP9Dal4dDs/76mAFeTaha8BWJf2eJH4DUCu+Fhf4qJHL+8ZSl4URCKtpgYTnfbV0FjtaO76mrZNbWju+RvhKECyAmdk3Svo+SW8IdRtAKL4WF/j4JJj3jaUuCyMQVtMCCc971E2QAGZmJultkn7ROfeZELcBhORzCH3WT4J531iGz33O5rS/tV9zNqeVsyusCmuQpgUSpr1QN0GK8M3sNZLeKOnrnHNPmNm9kh6mCB+YzjQrTZteCIs4VygDVed1FaSZXS/phkmXc849sHPZP5f0Wufc/9z5+XuVI4CZ2Sn1A5xuuOEGXbx4McuPARjCqjAAKF7WAJZ1K6JXSXpnltuVtCzpgqQ/MrOnD93OdTv//nvnEtblD3HOnZJ0SuqPgGU8RwBDmlaEDQB1kqkGzDn3LuecTfraufjXSjos6ZGhr2+T9Iqdv39rgPsBYETTirABoE5CFOH/jKRvH/lak/S/d/6+d04EgHdNK8IGgDrJOgWZmXNuY/SYmT2qfg3Yvb5vD0CyQbE1RdgAUD3eAxiA6ugsdghcAFBBoTvhS5Kcc7dN04ICQBx6Gz21z7TVOt1S+0ybfmQAGo8RMABBjfYjW7+0fvXfjM4BaKpCRsAANFfTNoUGgCwIYACCoh8ZAOxFAAMQFP3IAGAvAhhQYTEUr9OPDAD2oggfqKhYitfpRwYAe2XajLtMhw8fdufPny/7NIDcehs9rZxduRo6lo8s5wodbKYNAPXjezNuADn4GL2ieB0A4kUNGBCAj9YLFK8DQLwIYEAAPkavKF4HgHgRwIAAfIxedRY76h7tqn2wrdZcS+2DbXWPdileB4AIUAMGBLB8ZHlXDdhA3tErNtMGgDgxAgYEwOgVAGAc2lAAACpt1pYuQJGytqFgBAyITAzd84GBQUuX9Uvr2nbbV1u68LxG3RHAgIg08c2KwBk3Hy1dgCoigAERadqbVRMDZ9PQkBixIoABEWnam1XTAmcT0ZAYsSKAARFp2ptVDIGTKdTxaEiMWBHAgIjU4c3KZ+Coe+BkCnUyWrogVrShACLT2+hp9dzq1SX7S7cuVebNanST8oFp31B9X1/R2mfaWr+0vvf4wbbWjq+VcEYAZpW1DQUBDEBhQgSOKgfOSVqnW9p223uPz7W0dXKrhDMCMKusAYytiAAUJkTNVp23a1o4sJAYSOsyhQpgetSAAQ1SdsF33Wu2fKtDzR6AMAhgQENUoeC7iMBRdsjMgwJzoLmoAQMaoioF3yFrtupelA+g/ijCB7BLEwq+qxIyATQXm3ED2KUJ9Vcx
NGYF0AwEMKAhmlDw3YSQCSAOBDCgIZpQ8N2EkDmNOi1MAJqCGjAAUalzY9YQWJgAFIsifAAACxOAglGED6CxmHK7hoUJQDWxFRGAqIxOuQ0azkpq5JQb2x0B1cQIGICorJxdSTy+em614DOpBhYmANVEAAMQFabcdmvC6legjpiCBBAVptz26ix2CFxAxTACBiAqTLkBqINgAczMnm9mXTP7nJl9wczWzOyloW4PACSm3ADUQ5ApSDM7JOmjktYkvVrS5yV9g6QvC3F7ADCMKTcAVReqBuzNkv5K0sudc1d2jn0o0G0BAADUivcAZmbXS/oeSf9mKHwBAABgR4gasH8m6TpJzsz+j5ltmdmnzWzJzCzA7QEAANRKiAD2nJ0//6uks5K+S9JvSPp5ST+a5QrM7JSZOTNzFy9eDHCKAAAA5ck0BbkzrXjDpMs55x7QtVD3QefcG3b+/hEze56kJUnvyHA9pySdkvqbcWc5RwAAgLrIWgP2KknvzHA5k/S5nb9/ZOR7fyLp1Wb2NOfcYxlvFwAAIDqZpiCdc+9yztmkr52L/1nK1Qy+T2E+AABoNO81YM65T0n6pKTbR751u6S/cs497vs2AQAA6iRUH7CTku4xszdL+iNJt0n6QUk/FOj2AAAAaiPIVkTOufepH7b+paTfk/T9kv6Dc+6/h7g9AACAOjHnqr3I0MwuS3qw7POIzI2S6O9RPB734vGYl4PHvRw87sVLesyf75w7MOkHKx/A4J+ZuaFFEygIj3vxeMzLweNeDh734s3ymAeZggQAAEA6AhgAAEDBCGDNdGfZJ9BQPO7F4zEvB497OXjcizf1Y04NGAAAQMEYAQMAACgYAQwAAKBgBDAAAICCEcAAAAAKRgADAAAoGAEMV5nZj5uZM7P/Vfa5xMrMnmZmd5rZn5rZ35nZ/zOz95nZ15R9bjExswUz+7CZfcHMLprZaTPbV/Z5xcrMXmVmv2NmnzGzx83sfjM7VvZ5NY2ZPXfn8Xdm9hVln0/MzKxlZm8ws780sy+a2afN7C15rqMV6uRQL2b2bEk/K+ly2ecSuXlJPyLp1yX9J0lPkbQk6WNm1nbOXSjz5GJgZs+Q9CFJm5JeKemrJP2y+h84f6bEU4vZ6yT9jaSfkPSwpJdJeq+ZPcs5d1epZ9Ysb5b0uKQvL/tEGuDdkm5Xvw/YA5IOSVrIcwX0AYMkycx+XdI/Uv9J9LBz7ntLPqUomdmXS7rinHti6NhXSnpI0pudczRSnJGZLUl6vfob4j62c+z1kk5Jes7gGPzZCVoPjxx7r6Rvdc69oKTTahQzOyLp/ZJW1A9iT3XOPV7uWcXJzF4q6Xclfb1zbnPa62EKEjKzb5T0fZLeUPa5xM459/nh8LVz7HOSHpT07HLOKjrfLekPR4JWT9KXSXpJOacUt9HwtePj4jldiJ3p9bsknVZ/BBJhvUbSn8wSviQCWOOZmUl6m6RfdM59puzzaSIzOyDpn6o/ZYbZfZ36UwJXOecekvSFne+hGC8Sz+miHJe0X9Lbyz6RhvhmSX9hZm8zs8d2ak1/y8xuzHMlBDC8WtJzJP1S2SfSYL+sft1Gr+wTicQzJD2acPyRne8hMDO7Xf36OwJBYGb2TEk/J+l1zrmtss+nIZ4j6YclfYOkjvrvoy+U9L6dQY1MKMKPjJldL+mGSZdzzj2wc9kVSa8dnRZDdnke84Sf/VFJd0g66pz7bIDTa6qk4lZLOQ6PzOwmSe+V9H7n3HtKPZlm+AVJH3PO/X7ZJ9IgtvP1ysHrtpn9raT7JH2HpA9nuRICWHxeJemdGS5nkpYlXZD0R2b29J3jLUnX7fz7751z22FOMyp5HvNr/zB7hfp1Gz/tnHtfiBNrqEckPT3h+PVKHhmDJzsLSj6o/qKSO0o+neiZ2c3q1yO9eOg1/Ck7f15vZtt8uA7iEUl/PfKh+Zykf1B/JSQBrImcc++S9K6MF/9aSYfVfzKNekTSEfWfVBgj52MuSTKz
F6k/5firzrk3Bzmx5npAI7VeZnZI/aX5e0Yh4YeZPUXSB9RfTf1y59znSz6lJvhqSddJ+mjC9z6tfrubf1voGTXDn0n6xwnHTdKVrFdCAGu2n5H0KyPHfkXS30l6o6T1ws+oAXY+tX5A0h9Iem3JpxOjD0r6KTN7qnPu73eOfb+kJ9SfIoBnZtaS9JvqB4Jvc85dKvmUmuKcpG8fOfZSST+tfi+2vy78jJrhA5LuHGm/8mL1w/Ba1iuhDxh2MbN7RR+wYHYa3t6vfi3SD0l6cujbj826rBlXG7FuStqQ9CZJ/0TSf5b0K845GrEGYGa/pn6D4R+T9Kcj3/64c+6LxZ9VM5nZD6vfJJQ+YIGY2dPUf335jPp11E9V/7XmAefcP896PYyAAcVakPS8nb9/ZOR790m6rdCziZBz7pGdVXhvU79Z4qOS3qJ+I1aE8V07f/6XhO+9QNKnijsVICzn3GNm9h2S3qp+Kck/qN8E9yfyXA8jYAAAAAWjDxgAAEDBCGAAAAAFI4ABAAAUjAAGAABQMAIYAABAwQhgAAAABSOAAQAAFIwABgAAUDACGAAAQMH+P/dUhYCt8HOmAAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "x = np.random.randn(100, 2)\n",
+ "r = np.abs(np.random.randn(100)) + 4\n",
+ "fi = np.random.uniform(0.0, 2 * np.pi, size = 100)\n",
+ "y = np.vstack((r * np.cos(fi), r * np.sin(fi))).T\n",
+ "plt.figure(figsize=(10, 6))\n",
+ "plot(x[:,0], x[:,1], 'bo')\n",
+ "plot(y[:,0], y[:,1], 'go')"
+ ]
+ },
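Here the classes differ by distance from the origin, so the single feature $x_1^2 + x_2^2$ (again contained in the feature space of $(1 + \langle x, x' \rangle)^2$; an RBF kernel is another common choice) separates them almost perfectly. A sketch that regenerates data as in the cell above with a seeded generator; the threshold 16 follows from the construction, since every ring point has radius at least 4.

```python
import numpy as np

rng = np.random.default_rng(0)
inner = rng.normal(size=(100, 2))                        # blue: standard normal cloud
r = np.abs(rng.normal(size=100)) + 4                     # green: ring with radius >= 4
fi = rng.uniform(0.0, 2 * np.pi, size=100)
ring = np.column_stack((r * np.cos(fi), r * np.sin(fi)))


def sq_radius(P):
    # x1^2 + x2^2 for every row of P
    return (P ** 2).sum(axis=1)


threshold = 16.0
acc = np.mean(np.concatenate((sq_radius(inner) < threshold, sq_radius(ring) >= threshold)))
print(acc)
```

A standard normal point falls outside radius 4 with probability $e^{-8} \approx 3 \cdot 10^{-4}$, so errors are rare but possible.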
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Example of RBF kernel\n",
+ "Demo SVM"
+ ]
+ },
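A minimal sketch with scikit-learn's `SVC` (imported later in this notebook), assuming `make_circles` data as a stand-in for the demo: a linear kernel cannot separate two concentric circles, while the RBF kernel $K(x, x') = \exp(-\gamma \|x - x'\|^2)$ can. The dataset parameters are arbitrary, and the scores are training accuracies.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# two concentric noisy circles (inner radius = 0.5 * outer radius)
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    scores[kernel] = clf.score(X, y)    # training accuracy
print(scores)
```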
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "------\n",
+ "## Multiclass SVM\n",
+ "## $$ Y = \\{1, \\ldots, K\\} $$\n",
+ "\n",
+ "**Questions**\n",
+ "* How can an SVM be built for a multiclass classification problem?\n",
+ "* What are the drawbacks of the One-vs-One and One-vs-All approaches?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Rule"
+ ]
+ },
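A sketch of the two standard decision rules, assuming linear binary classifiers are already available. To keep it self-contained, the hypothetical classifiers below are built by hand from class centres instead of being trained. One-vs-All needs $K$ classifiers and predicts $\arg\max_k (\langle w_k, x \rangle - w_{0k})$; One-vs-One needs $K(K-1)/2$ classifiers and predicts by majority vote, where ties must be broken somehow (here `np.argmax` takes the lowest index).

```python
import numpy as np
from itertools import combinations

# Hypothetical hand-made linear classifiers built from class centres c_k
# (they stand in for trained binary SVMs, which are not shown here).
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
K = len(centres)

# One-vs-All: K classifiers, score_k(x) = <w_k, x> - w0_k
W = centres
w0 = 0.5 * (centres ** 2).sum(axis=1)


def ova_predict(x):
    return int(np.argmax(W @ x - w0))


# One-vs-One: one classifier per pair (k, m); a positive score votes for k, otherwise for m
pairwise = {
    (k, m): (centres[k] - centres[m], 0.5 * (centres[k] @ centres[k] - centres[m] @ centres[m]))
    for k, m in combinations(range(K), 2)
}


def ovo_predict(x):
    votes = np.zeros(K, dtype=int)
    for (k, m), (w, b0) in pairwise.items():
        votes[k if w @ x - b0 > 0 else m] += 1
    return int(np.argmax(votes))


x = np.array([3.5, 0.5])
print(ova_predict(x), ovo_predict(x), K, len(pairwise))   # prints 1 1 3 3
```

For large $K$ the number of pairwise classifiers grows quadratically, while each One-vs-All classifier is trained on a strongly imbalanced problem.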
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Pixel Space"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# SVM Implementation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Collecting cvxopt\n",
+ " Using cached cvxopt-1.1.9-cp36-cp36m-manylinux1_x86_64.whl\n",
+ "Installing collected packages: cvxopt\n",
+ "Successfully installed cvxopt-1.1.9\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip3 install cvxopt # Convex optimization package"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from collections import Counter\n",
+ "from itertools import product #, izip\n",
+ "\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn import datasets\n",
+ "from sklearn.svm import SVC, LinearSVC\n",
+ "\n",
+ "# import time\n",
+ "from cvxopt import matrix, solvers"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "len(X) = 10000, len(y) = 10000\n",
+ "len(X_train) = 8000\n"
+ ]
+ }
+ ],
+ "source": [
+ "X, y = datasets.make_classification(n_samples=10000, n_features=20, n_classes=2, n_informative=20, n_redundant=0,\n",
+ " random_state=42)\n",
+ "\n",
+ "X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2,\n",
+ " random_state=42)\n",
+ "\n",
+ "print(\"len(X) = {}, len(y) = {}\".format(len(X), len(y)))\n",
+ "print(\"len(X_train) = \", len(X_train))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(8000,)"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "Y_train.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Explore SVM"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### QP-solver (cvxopt)\n",
+ "\n",
+ "* [CVXOPT library](http://cvxopt.org/)\n",
+ "* [Library documentation](http://cvxopt.org/documentation/index.html)\n",
+ "* [Sparse and dense matrices](http://abel.ee.ucla.edu/cvxopt/userguide/matrices.html)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def smo_svm(X, Y, C=1.0, kernl=lambda a, b: np.dot(a.T, b), max_passes=100, tol=1e-12):\n",
+ "    # Simplified SMO for the dual soft-margin SVM.\n",
+ "    # Returns (lambd, b) for f(x) = sum_i lambd_i * Y_i * K(X_i, x) + b, so b plays the role of -w_0.\n",
+ "    lambd = np.zeros(len(X))\n",
+ "    b = 0.0\n",
+ "\n",
+ "    passes = 0\n",
+ "    iters = 0\n",
+ "    while passes < max_passes:\n",
+ "\n",
+ "        if iters > 10000:\n",
+ "            print(\"10000 iters!!!\")\n",
+ "            break\n",
+ "\n",
+ "        num_changed_lambds = 0\n",
+ "\n",
+ "        # loop over training objects; the second index j is drawn from the objects after i\n",
+ "        for i in range(len(X) - 1):\n",
+ "\n",
+ "            Ei = svm_func(X[i, :], X, Y, lambd, b, kernl=kernl) - Y[i]\n",
+ "\n",
+ "            # optimize only if lambd_i violates the KKT conditions (up to tol)\n",
+ "            if (Y[i] * Ei < -tol and lambd[i] < C) or (Y[i] * Ei > tol and lambd[i] > 0.0):\n",
+ "                j = np.random.randint(i + 1, len(X))\n",
+ "\n",
+ "                # print(\"optimizing %d %d\" % (i, j))\n",
+ "\n",
+ "                Ej = svm_func(X[j, :], X, Y, lambd, b, kernl=kernl) - Y[j]\n",
+ "                lambd_i_old = lambd[i]\n",
+ "                lambd_j_old = lambd[j]\n",
+ "\n",
+ "                # box [L, H] for lambd_j implied by 0 <= lambd <= C and sum_i lambd_i Y_i = 0\n",
+ "                if Y[i] != Y[j]:\n",
+ "                    L = max(0, lambd[j] - lambd[i])\n",
+ "                    H = min(C, C + lambd[j] - lambd[i])\n",
+ "                else:\n",
+ "                    L = max(0, lambd[i] + lambd[j] - C)\n",
+ "                    H = min(C, lambd[i] + lambd[j])\n",
+ "                if L == H:\n",
+ "                    continue\n",
+ "\n",
+ "                # nu = 2 K_ij - K_ii - K_jj (second derivative along the constraint line)\n",
+ "                nu = 2 * kernl(X[i, :], X[j, :]) - kernl(X[i, :], X[i, :]) - kernl(X[j, :], X[j, :])\n",
+ "                if nu >= 0.0:\n",
+ "                    continue\n",
+ "\n",
+ "                # unconstrained optimum for lambd_j, then clip it to [L, H]\n",
+ "                lambd[j] = lambd[j] - Y[j] * (Ei - Ej) / nu\n",
+ "                lambd[j] = min(H, max(L, lambd[j]))\n",
+ "                if abs(lambd[j] - lambd_j_old) < 1e-7:\n",
+ "                    continue\n",
+ "\n",
+ "                # keep sum_i lambd_i Y_i unchanged\n",
+ "                lambd[i] = lambd[i] + Y[i] * Y[j] * (lambd_j_old - lambd[j])\n",
+ "\n",
+ "                # bias update: b1 makes object i satisfy f = Y_i, b2 does the same for object j\n",
+ "                b1 = (b - Ei\n",
+ "                      - Y[i] * (lambd[i] - lambd_i_old) * kernl(X[i, :], X[i, :])\n",
+ "                      - Y[j] * (lambd[j] - lambd_j_old) * kernl(X[i, :], X[j, :]))\n",
+ "                b2 = (b - Ej\n",
+ "                      - Y[i] * (lambd[i] - lambd_i_old) * kernl(X[i, :], X[j, :])\n",
+ "                      - Y[j] * (lambd[j] - lambd_j_old) * kernl(X[j, :], X[j, :]))\n",
+ "\n",
+ "                if 0.0 < lambd[i] < C:\n",
+ "                    b = b1\n",
+ "                elif 0.0 < lambd[j] < C:\n",
+ "                    b = b2\n",
+ "                else:\n",
+ "                    b = (b1 + b2) / 2\n",
+ "\n",
+ "                num_changed_lambds += 1\n",
+ "\n",
+ "        iters += 1\n",
+ "        if iters % 10 == 0:\n",
+ "            print(\"%d iters done\" % iters)\n",
+ "        if num_changed_lambds == 0:\n",
+ "            passes += 1\n",
+ "        else:\n",
+ "            passes = 0\n",
+ "\n",
+ "    return lambd, b"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def svm_func(x, X, Y, lambd, b, ind=None, kernl=lambda a, c: np.dot(a.T, c)):\n",
+ "    # f(x) = sum_i lambd_i * Y_i * K(X_i, x) + b;\n",
+ "    # if ind is given, lambd holds only the coefficients of the objects X[ind]\n",
+ "    if ind is None:\n",
+ "        ind = range(len(X))\n",
+ "    res = 0.0\n",
+ "    for i in range(len(lambd)):\n",
+ "        res += lambd[i] * Y[ind[i]] * kernl(X[ind[i], :], x)\n",
+ "    return res + b"
+ ]
+ },
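`svm_func` calls the kernel once per training object for every query point, which makes the grid plots slow. For the linear kernel the same value $\sum_i \lambda_i y_i \langle x_i, x \rangle + b$ can be evaluated by NumPy for a whole batch of points at once. A sketch; `svm_decision_linear` is a hypothetical helper and the random data only serve the comparison.

```python
import numpy as np


def svm_decision_linear(points, X, Y, lambd, b):
    """Vectorised f(x) = sum_i lambd_i * Y_i * <X_i, x> + b for every row x of points."""
    points = np.atleast_2d(points)
    return (points @ X.T) @ (lambd * Y) + b


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
Y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
lambd = rng.uniform(size=6)
b = 0.3
grid = rng.normal(size=(4, 2))
print(svm_decision_linear(grid, X, Y, lambd, b))
```

With the meshgrid of the plotting cells, `svm_decision_linear(np.column_stack((XX.ravel(), YY.ravel())), x, y, lambd, b).reshape(XX.shape)` should give the same array `F` as the double loop.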
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# The real SVM: solve the dual QP with cvxopt\n",
+ "def the_svm(X, Y, C=1.0, kernl=lambda a, c: np.dot(a.T, c)):\n",
+ "\n",
+ "    n_samples, n_features = X.shape\n",
+ "\n",
+ "    # Gram matrix K_ij = K(x_i, x_j)\n",
+ "    K = np.zeros((n_samples, n_samples))\n",
+ "    for i in range(n_samples):\n",
+ "        for j in range(n_samples):\n",
+ "            K[i, j] = kernl(X[i, :], X[j, :])\n",
+ "\n",
+ "    # cvxopt minimizes 1/2 x^T P x + q^T x  s.t.  G x <= h,  A x = b\n",
+ "    P = matrix(np.outer(Y, Y) * K)\n",
+ "    q = matrix(-np.ones(n_samples))\n",
+ "    A = matrix(Y.astype(float), (1, n_samples), 'd')\n",
+ "    b = matrix(0.0)\n",
+ "\n",
+ "    # box constraints 0 <= lambd_i <= C written as -lambd_i <= 0 and lambd_i <= C\n",
+ "    G = matrix(np.vstack((-np.identity(n_samples), np.identity(n_samples))))\n",
+ "    h = matrix(np.hstack((np.zeros(n_samples), np.ones(n_samples) * C)))\n",
+ "\n",
+ "    solution = solvers.qp(P, q, G, h, A, b)\n",
+ "    lambd = np.ravel(solution['x'])\n",
+ "\n",
+ "    # support vectors: objects with non-negligible lambd\n",
+ "    sv = lambd > 1e-5\n",
+ "    ind = np.arange(len(lambd))[sv]\n",
+ "    lambd = lambd[sv]\n",
+ "    sv_x = X[sv]\n",
+ "    sv_y = Y[sv]\n",
+ "\n",
+ "    # bias of f(x) = sum_i lambd_i y_i K(x_i, x) + b, averaged over the support vectors\n",
+ "    b = 0.0\n",
+ "    for n in range(len(lambd)):\n",
+ "        b += sv_y[n]\n",
+ "        b -= np.sum(lambd * sv_y * K[ind[n], sv])\n",
+ "    b /= len(lambd)\n",
+ "\n",
+ "    return lambd, b, sv_x, sv_y, ind"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.4]])\n",
+ "y = np.array([1.0, -1.0, 1.0, -1.0])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "10 iters done\n",
+ "20 iters done\n",
+ "30 iters done\n",
+ "40 iters done\n",
+ "50 iters done\n",
+ "60 iters done\n",
+ "70 iters done\n",
+ "80 iters done\n",
+ "90 iters done\n",
+ "100 iters done\n",
+ "110 iters done\n",
+ "120 iters done\n",
+ "130 iters done\n",
+ "140 iters done\n",
+ "150 iters done\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "(array([0.29588265, 0.78277329, 1.70411739, 1.21722671]), 1.0)"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "lambd, b = smo_svm(x, y, C=100.0)\n",
+ "lambd, b"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0. 0.] 1.0\n",
+ "[1. 0.] -1.0\n",
+ "[0. 1.] 1.0\n",
+ "[1. 1.4] -1.0\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "n = 10\n",
+ "xx = np.linspace(-2, 2, n)\n",
+ "yy = np.linspace(-2, 2, n)\n",
+ "XX, YY = np.meshgrid(xx, yy)\n",
+ "\n",
+ "F = np.zeros_like(XX)\n",
+ "for i in range(len(xx)):\n",
+ "    for j in range(len(yy)):\n",
+ "        F[j, i] = svm_func(np.array([xx[i], yy[j]]), x, y, lambd, b)  # F[j, i] = f(xx[i], yy[j])\n",
+ "plt.figure(figsize=(14, 7))\n",
+ "contourf(xx, yy, F, 8, alpha=.75, cmap=cm.hot)\n",
+ "clabel(contour(xx, yy, F, 8, colors='black'), inline=1, fontsize=10)\n",
+ "for i in range(len(x)):\n",
+ "    if y[i] == 1.0:\n",
+ "        plot(x[i, 0], x[i, 1], 'bo')\n",
+ "    else:\n",
+ "        plot(x[i, 0], x[i, 1], 'go')\n",
+ "    print(x[i, :], y[i])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "x = np.random.randn(20,2)\n",
+ "x[10:,0] += 3\n",
+ "y = np.hstack((np.ones(10), np.ones(10) * -1))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "10 iters done\n",
+ "20 iters done\n",
+ "30 iters done\n",
+ "40 iters done\n",
+ "50 iters done\n",
+ "60 iters done\n",
+ "70 iters done\n",
+ "80 iters done\n",
+ "90 iters done\n",
+ "100 iters done\n",
+ "110 iters done\n",
+ "120 iters done\n",
+ "130 iters done\n",
+ "140 iters done\n",
+ "150 iters done\n",
+ "160 iters done\n",
+ "170 iters done\n",
+ "180 iters done\n",
+ "190 iters done\n",
+ "200 iters done\n",
+ "210 iters done\n",
+ "220 iters done\n",
+ "230 iters done\n",
+ "240 iters done\n",
+ "250 iters done\n",
+ "260 iters done\n",
+ "270 iters done\n",
+ "280 iters done\n",
+ "290 iters done\n",
+ "300 iters done\n",
+ "310 iters done\n",
+ "320 iters done\n",
+ "330 iters done\n",
+ "340 iters done\n",
+ "350 iters done\n",
+ "360 iters done\n",
+ "370 iters done\n",
+ "380 iters done\n",
+ "390 iters done\n",
+ "400 iters done\n",
+ "410 iters done\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "(array([ 0.00000000e+00, 0.00000000e+00, -1.38777878e-17, 2.64223129e-01,\n",
+ " 0.00000000e+00, 0.00000000e+00, 5.84666925e+00, 0.00000000e+00,\n",
+ " 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,\n",
+ " 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.24699303e+00,\n",
+ " 0.00000000e+00, 0.00000000e+00, 4.86389934e+00, 0.00000000e+00]),\n",
+ " 5.430591994773475)"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "lambd, b = smo_svm(x, y, C=10.0)\n",
+ "lambd, b"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "n = 10\n",
+ "xx = np.linspace(np.min(x[:,0])-0.3, np.max(x[:,0]) + 0.3, n)\n",
+ "yy = np.linspace(np.min(x[:,1])-0.3, np.max(x[:,1]) + 0.3, n)\n",
+ "XX, YY = np.meshgrid(xx, yy)\n",
+ "\n",
+ "F = np.zeros_like(XX)\n",
+ "for i in range(len(xx)):\n",
+ "    for j in range(len(yy)):\n",
+ "        F[j, i] = svm_func(np.array([xx[i], yy[j]]), x, y, lambd, b)\n",
+ "\n",
+ "plt.figure(figsize=(14, 7))\n",
+ "contourf(xx, yy, F, 8, alpha=.75, cmap=cm.hot)\n",
+ "clabel(contour(xx, yy, F, 8, colors='black'), inline=1, fontsize=10)\n",
+ "for i in range(len(x)):\n",
+ "    if y[i] == 1.0:\n",
+ "        plot(x[i, 0], x[i, 1], 'bo')\n",
+ "    else:\n",
+ "        plot(x[i, 0], x[i, 1], 'go')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [],
+ "source": [
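+ "# 20 Gaussian points: the first 10 are class +1, the last 10 are shifted right by 2 and labelled -1\n",
+ "x = np.random.randn(20,2)\n",
+ "x[10:,0] += 2\n",
+ "x[10:,:] *= 1  # scale factor for the second class (1 leaves it unchanged)\n",
+ "y = np.hstack((np.ones(10), np.ones(10) * -1))"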
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def polynomial_kernel(x, y, p=2):\n",
+ "    # K(x, y) = (1 + <x, y>)^p\n",
+ "    return (1 + np.dot(x, y)) ** p\n",
+ "\n",
+ "def gaussian_kernel(x, y, gamma=0.1):\n",
+ "    # K(x, y) = exp(-gamma * ||x - y||^2)\n",
+ "    return np.exp(-gamma * np.dot(x - y, x - y))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "10 iters done\n",
+ "20 iters done\n",
+ "30 iters done\n",
+ "40 iters done\n",
+ "50 iters done\n",
+ "60 iters done\n",
+ "70 iters done\n",
+ "80 iters done\n",
+ "90 iters done\n",
+ "100 iters done\n",
+ "110 iters done\n",
+ "120 iters done\n",
+ "130 iters done\n",
+ "140 iters done\n",
+ "150 iters done\n",
+ "160 iters done\n",
+ "170 iters done\n",
+ "180 iters done\n",
+ "190 iters done\n",
+ "200 iters done\n",
+ "210 iters done\n",
+ "220 iters done\n",
+ "230 iters done\n",
+ "240 iters done\n",
+ "250 iters done\n",
+ "260 iters done\n",
+ "270 iters done\n",
+ "280 iters done\n",
+ "290 iters done\n",
+ "300 iters done\n",
+ "310 iters done\n",
+ "320 iters done\n",
+ "330 iters done\n",
+ "340 iters done\n",
+ "350 iters done\n",
+ "360 iters done\n",
+ "370 iters done\n",
+ "380 iters done\n",
+ "390 iters done\n",
+ "400 iters done\n",
+ "410 iters done\n",
+ "420 iters done\n",
+ "430 iters done\n",
+ "440 iters done\n",
+ "450 iters done\n",
+ "460 iters done\n",
+ "470 iters done\n",
+ "480 iters done\n",
+ "490 iters done\n",
+ "500 iters done\n",
+ "510 iters done\n",
+ "520 iters done\n",
+ "530 iters done\n",
+ "540 iters done\n",
+ "550 iters done\n",
+ "560 iters done\n",
+ "570 iters done\n",
+ "580 iters done\n",
+ "590 iters done\n",
+ "600 iters done\n",
+ "610 iters done\n",
+ "620 iters done\n",
+ "630 iters done\n",
+ "640 iters done\n",
+ "650 iters done\n",
+ "660 iters done\n",
+ "670 iters done\n",
+ "680 iters done\n",
+ "690 iters done\n",
+ "700 iters done\n",
+ "710 iters done\n",
+ "720 iters done\n",
+ "730 iters done\n",
+ "740 iters done\n",
+ "750 iters done\n",
+ "760 iters done\n",
+ "770 iters done\n",
+ "780 iters done\n",
+ "790 iters done\n",
+ "800 iters done\n",
+ "810 iters done\n",
+ "820 iters done\n",
+ "830 iters done\n",
+ "840 iters done\n",
+ "850 iters done\n",
+ "860 iters done\n",
+ "870 iters done\n",
+ "880 iters done\n",
+ "890 iters done\n",
+ "900 iters done\n",
+ "910 iters done\n",
+ "920 iters done\n",
+ "930 iters done\n",
+ "940 iters done\n",
+ "950 iters done\n",
+ "960 iters done\n",
+ "970 iters done\n",
+ "980 iters done\n",
+ "990 iters done\n",
+ "1000 iters done\n",
+ "1010 iters done\n",
+ "1020 iters done\n",
+ "1030 iters done\n",
+ "1040 iters done\n",
+ "1050 iters done\n",
+ "1060 iters done\n",
+ "1070 iters done\n",
+ "1080 iters done\n",
+ "1090 iters done\n",
+ "1100 iters done\n",
+ "1110 iters done\n",
+ "1120 iters done\n",
+ "1130 iters done\n",
+ "1140 iters done\n",
+ "1150 iters done\n",
+ "1160 iters done\n",
+ "1170 iters done\n",
+ "1180 iters done\n",
+ "1190 iters done\n",
+ "1200 iters done\n",
+ "1210 iters done\n",
+ "1220 iters done\n",
+ "1230 iters done\n",
+ "1240 iters done\n",
+ "1250 iters done\n",
+ "1260 iters done\n",
+ "1270 iters done\n",
+ "1280 iters done\n",
+ "1290 iters done\n",
+ "1300 iters done\n",
+ "1310 iters done\n",
+ "1320 iters done\n",
+ "1330 iters done\n",
+ "1340 iters done\n",
+ "1350 iters done\n",
+ "1360 iters done\n",
+ "1370 iters done\n",
+ "1380 iters done\n",
+ "1390 iters done\n",
+ "1400 iters done\n",
+ "1410 iters done\n",
+ "1420 iters done\n",
+ "1430 iters done\n",
+ "1440 iters done\n",
+ "1450 iters done\n",
+ "1460 iters done\n",
+ "1470 iters done\n",
+ "1480 iters done\n",
+ "1490 iters done\n",
+ "1500 iters done\n",
+ "1510 iters done\n",
+ "1520 iters done\n",
+ "1530 iters done\n",
+ "1540 iters done\n",
+ "1550 iters done\n",
+ "1560 iters done\n",
+ "1570 iters done\n",
+ "1580 iters done\n",
+ "1590 iters done\n",
+ "1600 iters done\n",
+ "1610 iters done\n",
+ "1620 iters done\n",
+ "1630 iters done\n",
+ "1640 iters done\n",
+ "1650 iters done\n",
+ "1660 iters done\n",
+ "1670 iters done\n",
+ "1680 iters done\n",
+ "1690 iters done\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "(array([1.00000000e+01, 2.49800181e-16, 0.00000000e+00, 8.62379338e+00,\n",
+ " 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,\n",
+ " 0.00000000e+00, 1.00000000e+01, 0.00000000e+00, 6.46810011e+00,\n",
+ " 1.00000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,\n",
+ " 0.00000000e+00, 0.00000000e+00, 1.00000000e+01, 2.15569327e+00]),\n",
+ " 2.895924148545535)"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Learn Kernel SVM\n",
+ "lambd, b = smo_svm(x, y, C=10.0, kernl=polynomial_kernel)\n",
+ "lambd, b"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Draw the decision function of the polynomial-kernel SVM\n",
+ "n = 10\n",
+ "xx = np.linspace(np.min(x[:,0])-0.3, np.max(x[:,0]) + 0.3, n)\n",
+ "yy = np.linspace(np.min(x[:,1])-0.3, np.max(x[:,1]) + 0.3, n)\n",
+ "XX, YY = np.meshgrid(xx, yy)\n",
+ "\n",
+ "F = np.zeros_like(XX)\n",
+ "for i in range(len(xx)):\n",
+ "    for j in range(len(yy)):\n",
+ "        # evaluate with the same kernel the model was trained with\n",
+ "        F[j,i] = svm_func(np.array([xx[i], yy[j]]), x, y, lambd, b, kernl=polynomial_kernel)\n",
+ "\n",
+ "plt.figure(figsize=(14, 7))\n",
+ "contourf(xx, yy, F, 8, alpha=.75, cmap=cm.hot)\n",
+ "clabel(contour(xx, yy, F, 8, colors='black'), inline=1, fontsize=10)\n",
+ "for i in range(len(x)):\n",
+ "    if y[i] == 1.0:\n",
+ "        plot(x[i,0], x[i,1], 'bo')\n",
+ "    else:\n",
+ "        plot(x[i, 0], x[i,1], 'go')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Use cases"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "## Budget optimization\n",
+ "See description\n",
+ "## Intelligent email sending\n",
+ "See description\n",
+ "## Man-hours forecasting"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Conclusion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "**SVM**\n",
+ "* Advantages:\n",
+ "  - Strong generalization ability\n",
+ "  - Convex optimization problem, so a solution exists and any local optimum is global\n",
+ "  - Only the support vectors determine the solution, not every training object\n",
+ "* Disadvantages:\n",
+ "  - we haven't gotten to them yet :)\n",
+ "\n",
+ "**HW**\n",
+ "\n",
+ "**Feedback**\n",
+ "  * rate the seminar\n",
+ "  * leave feedback on the lecture"
+ ]
+ }
+ ],
+ "metadata": {
+ "anaconda-cloud": {},
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}