UC Davis DataLab
Spring 2022
Instructors: Tyler Shoemaker <[email protected]> and
Carl Stahmer <[email protected]>
This three-part workshop series introduces participants to natural language processing (NLP) with Python. It builds on our text mining series, "Getting Started with Textual Data", by extending the scope of data-inflected analysis to include various methods of modeling meaning. Sessions will cover NLP topics ranging from segmentation and dependency parsing to sentiment analysis and context-sensitive modeling. We will also discuss how to implement such methods for tasks like classification. Basic familiarity with analyzing textual data in Python is required. We welcome students, postdocs, faculty, and staff from a variety of research domains, ranging from health informatics to the humanities.
The course reader is a live webpage, hosted through GitHub, where you can enter curriculum content and post it to a public-facing site for learners.
To make alterations to the reader:

1. Run `git pull`, or if it's your first time contributing, see the Setup section of this document.
2. Edit an existing chapter file or create a new one. Chapter files are Markdown files (`.md`) in the `chapters/` directory. Enter your text, code, and other information directly into the file. Make sure your file:
    - Follows the naming scheme `##_topic-of-chapter.md` (the only exception is `index.md`, which contains the reader's front page).
    - Begins with a first-level header (like `# This`). This will be the title of your chapter. Subsequent section headers should be second-level headers (like `## This`) or below.

   A minimal example of a chapter file appears after this list. Put any supporting resources in `data/` or `img/`. For large files, see the Large Files section of this document. You do not need to add resources generated by your code (such as plots); the next step saves these in `_build/` automatically.
3. Run the command `jupyter-book build .` in a shell at the top level of the repo to regenerate the HTML files in `_build/`.
4. When you're finished, `git add`:
    - Any files you edited directly
    - Any supporting media you added to `img/`
    - The `.gitattributes` file (if you added a large file)

   Then `git commit` and `git push`. This updates the `main` branch of the repo, which contains source materials for the web page (but not the web page itself).
5. Run the command `ghp-import -n -p -f _build/html` in a shell at the top level of the repo to update the `gh-pages` branch of the repo. This uses the `ghp-import` Python package, which you will need to install first (`pip install ghp-import`). The live web page will update automatically after 1-10 minutes. A sketch of the full sequence for steps 3-5 appears below.
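As a reference for step 2, here is a minimal sketch of a new chapter file. The file name and headings are invented for illustration; any name following the scheme above works.

```
# Create a hypothetical chapter file (run at the top level of the repo).
cat > chapters/03_sentiment-analysis.md << 'EOF'
# Sentiment Analysis

Introductory text for the chapter goes here.

## A First Section

Section text, code, and figures. Supporting images live in img/ and
data files in data/.
EOF
```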
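And here is a sketch of the build-and-publish sequence from steps 3-5, assuming the hypothetical chapter file above:

```
# Regenerate the HTML files in _build/ (top level of the repo).
jupyter-book build .

# Stage your edits; add .gitattributes too if you registered a large file.
git add chapters/03_sentiment-analysis.md

# Update the main branch with the source materials.
git commit -m "Add sentiment analysis chapter"
git push

# Publish the built HTML to the gh-pages branch.
ghp-import -n -p -f _build/html
```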
## Large Files

If you want to include a large file (say, over 1 MB), you should use git LFS. You can register a large file with git LFS with the shell command:

```
git lfs track YOUR_FILE
```

This command updates the `.gitattributes` file at the top level of the repo. To make sure the change is saved, you also need to run:

```
git add .gitattributes
```

Now that your large file is registered with git LFS, you can add, commit, and push the file with git the same way you would any other file, and git LFS will automatically intercede as needed.
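For example, registering and pushing a hypothetical data file might look like this:

```
git lfs track data/large-corpus.csv   # register the file with git LFS
git add .gitattributes                # save the updated tracking rules
git add data/large-corpus.csv         # stage the file as usual
git commit -m "Add corpus data"
git push                              # git LFS uploads the file contents
```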
GitHub provides 1 GB of storage and 1 GB of monthly bandwidth free per repo for large files. If your large file is more than 50 MB, check with the other contributors before adding it.
## Setup

Install `jupyter-book` using pip:

```
pip install -U jupyter-book
```
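If you want to confirm the install succeeded, check the version:

```
jupyter-book --version
```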
This repo uses Git Large File Storage (git LFS) for large files. If you don't have git LFS installed, download it and run the installer. Then, in a shell (in any directory), run:

```
git lfs install
```

Your one-time setup of git LFS is now done. Next, clone this repo with `git clone`; the large files will be downloaded automatically with the rest of the repo.
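Putting the one-time setup together, with a placeholder URL standing in for this repo's actual address:

```
git lfs install                               # one-time git LFS setup
git clone https://github.com/OWNER/REPO.git   # replace with this repo's URL
```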