---
title: "unnest tokens"
author:
  - name: John R Little
    affiliations:
      - name: Duke University
      - department: Center for Data & Visualization Sciences
# date: 'today'
date-modified: 'today'
date-format: long
format:
  html:
    embed-resources: true
    footer: "[John R Little](https://JohnLittle.info) ● [Center for Data & Visualization Sciences](https://library.duke.edu/data/) ● [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)"
    logo: images/Rfun_logo.png
    license: CC BY
    toc: true
    toc_float: true
    df-print: paged
---
```{r}
#| message: false
#| warning: false
library(tidyverse)
library(tidytext)
```
## Text
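The first stanza of "Because I could not stop for Death" by Emily Dickinson, stored as a character vector with one element per line.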
```{r}
text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")
text
```
## A tidy table
```{r}
text_df <- tibble(line = 1:4, text = text)
text_df
```
## Tokenization
```{r}
text_df %>%
  unnest_tokens(word, text)
```
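The call above returns one word per row, lowercased and with punctuation removed. The chunk below is a minimal sketch of how that default compares with two variations. It rebuilds the same four lines so it can run on its own. The object names (`poem_df`, `words_lower`, `words_cased`, `bigrams`) are made up for illustration and are not part of the original notebook. `to_lower` and `token` are arguments of `unnest_tokens()`, and `n` is passed through to the underlying tokenizer.

```{r}
library(tidyverse)
library(tidytext)

# Hypothetical copy of the poem table, rebuilt so this chunk is self-contained.
poem_df <- tibble(
  line = 1:4,
  text = c("Because I could not stop for Death -",
           "He kindly stopped for me -",
           "The Carriage held but just Ourselves -",
           "and Immortality")
)

# Default behavior: one lowercased word per row, punctuation dropped.
words_lower <- poem_df %>%
  unnest_tokens(word, text)

# to_lower = FALSE keeps the original capitalization.
words_cased <- poem_df %>%
  unnest_tokens(word, text, to_lower = FALSE)

# token = "ngrams" with n = 2 splits the text into overlapping word pairs.
bigrams <- poem_df %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

words_lower
words_cased
bigrams
```

The other columns (here `line`) are kept in every result, so each token can still be traced back to the line it came from.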