forked from WinVector/zmPDSwR
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.html
95 lines (91 loc) · 7.28 KB
/
README.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
<h1 id="example-code-and-data-for-practical-data-science-with-r-by-nina-zumel-and-john-mount-manning-2014.">Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.</h1>
<ul>
<li>The book: <a href="http://www.manning.com/zumel/">"Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014</a> (book copyright Manning Publications Co., all rights reserved)</li>
<li>The support site: <a href="https://github.com/WinVector/zmPDSwR">GitHub WinVector/zmPDSwR</a></li>
</ul>
<h2 id="the-code-and-data-in-this-directory-supports-examples-from">The code and data in this directory supports examples from:</h2>
<ul>
<li>Appendix A: Working with R And Other Tools</li>
</ul>
<h1 id="sql-example-in-knitr-markdown">SQL example in knitr Markdown</h1>
<p>Material from <a href="https://github.com/WinVector/zmPDSwR/">Practical Data Science with R examples GitHub archive</a> In support of the Hotel/SQL example in Appendix A of <a href="http://www.manning.com/zumel/">Practical Data Science with R</a> by Nina Zumel and John Mount.</p>
<ul>
<li>start with README.Rmd and Workbook1.xlsx</li>
<li>produce README.md, HotelRelation.pdf and figure/* by running "knit('README.Rmd')" in R</li>
<li>produce README.html by running "pandoc README.md -o README.html" in bash shell</li>
<li>or all in one shot at the bash shell: echo "library('knitr'); knit('README.Rmd')" | R --vanilla ; pandoc README.md -o README.html</li>
</ul>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(<span class="st">'xlsx'</span>)</code></pre>
<pre><code>## Loading required package: xlsxjars
## Loading required package: rJava</code></pre>
<pre class="sourceCode r"><code class="sourceCode r">bookings <-<span class="st"> </span><span class="kw">read.xlsx</span>(<span class="st">'Workbook1.xlsx'</span>,<span class="dv">1</span>,<span class="dt">startRow=</span><span class="dv">3</span>)
prices <-<span class="st"> </span><span class="kw">read.xlsx</span>(<span class="st">'Workbook1.xlsx'</span>,<span class="dv">2</span>,<span class="dt">startRow=</span><span class="dv">3</span>)
<span class="kw">library</span>(<span class="st">'reshape2'</span>)
bthin <-<span class="st"> </span><span class="kw">melt</span>(bookings,<span class="dt">id.vars=</span><span class="kw">c</span>(<span class="st">'date'</span>),
<span class="dt">variable.name=</span><span class="st">'daysBefore'</span>,<span class="dt">value.name=</span><span class="st">'bookings'</span>)
pthin <-<span class="st"> </span><span class="kw">melt</span>(prices,<span class="dt">id.vars=</span><span class="kw">c</span>(<span class="st">'date'</span>),
<span class="dt">variable.name=</span><span class="st">'daysBefore'</span>,<span class="dt">value.name=</span><span class="st">'price'</span>)
daysCodes <-<span class="st"> </span><span class="kw">c</span>(<span class="st">'day.of.stay'</span>, <span class="st">'X1.before'</span>, <span class="st">'X2.before'</span>, <span class="st">'X3.before'</span>)
bthin$nDaysBefore <-<span class="st"> </span><span class="kw">match</span>(bthin$daysBefore,daysCodes)-<span class="dv">1</span>
pthin$nDaysBefore <-<span class="st"> </span><span class="kw">match</span>(pthin$daysBefore,daysCodes)-<span class="dv">1</span>
<span class="co"># prevent sqldf from triggering tcl/tk dependency</span>
<span class="co"># see: https://code.google.com/p/sqldf/ Troubleshooting</span>
<span class="kw">options</span>(<span class="dt">gsubfn.engine =</span> <span class="st">"R"</span>)
<span class="kw">library</span>(<span class="st">'sqldf'</span>)</code></pre>
<pre><code>## Loading required package: DBI
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: chron
## Loading required package: RSQLite
## Loading required package: RSQLite.extfuns</code></pre>
<pre class="sourceCode r"><code class="sourceCode r">joined <-<span class="st"> </span><span class="kw">sqldf</span>(<span class="st">'</span>
<span class="st"> select</span>
<span class="st"> bCurrent.date as StayDate,</span>
<span class="st"> bCurrent.daysBefore as daysBefore,</span>
<span class="st"> bCurrent.nDaysBefore as nDaysBefore,</span>
<span class="st"> p.price as price,</span>
<span class="st"> bCurrent.bookings as bookingsCurrent,</span>
<span class="st"> bPrevious.bookings as bookingsPrevious,</span>
<span class="st"> bCurrent.bookings - bPrevious.bookings as pickup</span>
<span class="st"> from</span>
<span class="st"> bthin bCurrent</span>
<span class="st"> join</span>
<span class="st"> bthin bPrevious</span>
<span class="st"> on</span>
<span class="st"> bCurrent.date=bPrevious.date</span>
<span class="st"> and bCurrent.nDaysBefore+1=bPrevious.nDaysBefore</span>
<span class="st"> join</span>
<span class="st"> pthin p</span>
<span class="st"> on</span>
<span class="st"> bCurrent.date=p.date</span>
<span class="st"> and bCurrent.nDaysBefore=p.nDaysBefore</span>
<span class="st">'</span>)
<span class="kw">library</span>(<span class="st">'ggplot2'</span>)
plt <-<span class="st"> </span><span class="kw">ggplot</span>(<span class="dt">data=</span>joined,<span class="kw">aes</span>(<span class="dt">x=</span>price,<span class="dt">y=</span>pickup)) +
<span class="st"> </span><span class="kw">geom_point</span>() +<span class="st"> </span><span class="kw">geom_jitter</span>() +<span class="st"> </span><span class="kw">geom_smooth</span>(<span class="dt">method=</span><span class="st">'lm'</span>)
<span class="kw">print</span>(plt)</code></pre>
<div class="figure">
<img src="figure/allsteps.png" alt="plot of chunk allsteps" /><p class="caption">plot of chunk allsteps</p>
</div>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">ggsave</span>(<span class="dt">filename=</span><span class="st">'HotelRelation.pdf'</span>,<span class="dt">plot=</span>plt)</code></pre>
<pre><code>## Saving 7 x 7 in image</code></pre>
<pre class="sourceCode r"><code class="sourceCode r">
<span class="kw">print</span>(<span class="kw">summary</span>(<span class="kw">lm</span>(pickup~price,<span class="dt">data=</span>joined)))</code></pre>
<pre><code>##
## Call:
## lm(formula = pickup ~ price, data = joined)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.62 -2.81 -1.21 3.39 6.38
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.0077 7.9874 1.38 0.2
## price -0.0280 0.0319 -0.88 0.4
##
## Residual standard error: 4.21 on 10 degrees of freedom
## Multiple R-squared: 0.0714, Adjusted R-squared: -0.0214
## F-statistic: 0.769 on 1 and 10 DF, p-value: 0.401</code></pre>
<h2 id="code-example-license">Code example license</h2>
<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</p>