The documents (markdown and executable notebooks) in this repo are meant to be the source material from which a static website can be built using jupyter book.
- JB will execute notebooks in order to render them and extract an HTML representation.
- The execution environment must, therefore, contain all of the packages and compute resources needed for that to succeed (cluster hardware, S3 write access, python modules/libraries, etc).
- The environment must also contain the jupyter-book package. This package is not needed for any
of the notebooks to run; it is only needed for the build process. For this reason, it is not included
in the
HyTEST.yml
environment file. You should install jupyter-book to your hytest conda environment if you would like to build the JupyterBook - JB is based on Sphinx as the documentation engine. Sphinx uses templates to establish styling and features.
- Jupyter Book only builds the static website; it does not publish to a server.
- How to publish depends on where you want the docs to go.
- The default for this repo is to use 'GitHub Pages' as the deployment.
- Publication to GH Pages is made easier with
ghp-import
. This is yet another of those packages that your build environment will need, but is not included in theHyTEST.yml
file (because notebook users don't need it).
This is a set of notes to guide the publication workflow for this Jupyterbook project. The notes are intended for the project maintainers and contributors, not really for the end-users. (Although there isn't anything here that is secret or private, just not relevant to the subject matter in the notebooks).
These notebooks are written to execute on a cloud platform, specifically the nebari
configuration
at https://nebari.esipfed.org/
. It should be reasonably easy to adapt these to other cloud platforms,
or even HPC systems -- but that is not the focus of this project.
We read from and write to S3 object storage for most workflows. If the user wants to run the notebooks for themselves, they'll need to sort out their own access to do this. And likely will need to adjust path strings throughout the notebooks to ensure they are pointing to a location where they have write access.
The environment for this project is defined in the env.yml
file. It extends the "standard" pangeo
stack to add the pieces needed for jupyter-book
. The specific items I added in order to get
jupyter-book to work are:
jupyter-book
ghp-import
jsonschema-with-format-nongpl
webcolors
Those last two items are needed to get the jupyter-book
build to work, although they are
not declared as dependencies explicitly.
NOTE -- I added jupyter book to two different environments: pangeofu
and also the
"standard" pangeo
. These environments differ with respect to versions of a couple of
key libraries (zarr, fsspec). They both will work the same with respect to the jupyterbook
build process.
Build your workflow in a notebook and save it in an appropriate location. Doesn't really matter where it is, so long as it is in this repo -- I started a rough outline based on the old 'college numbering' system (101 is a freshman level course, 201 is sophomore, etc). But that can easily change as you grow the book project.
The finished notebook should be able to run all cells without interaction. Clear all outputs, reset the kernel, then "run all". That verifies that it will run -- but it does not build the book content. That comes later.
Your notebook can contain a mix of markdown and code cells. By default, all code cells are included in the rendered output. If you want to hide a code cell, you can add a tag to its metadata to control what gets shown in the rendered output. Here are some that I like:
hide-input
-- The code cell is 'hidden' behind a dropdown. The user can 'click to see more' to reveal the code cell contents. Code outputs always show.remove-input
-- The code cell is not shown at all. The user can't see the code, but the outputs are shown. I've included an example of this in an early cell in the00_file_inspection.ipynb
notebook. The code cell printing library versions has been tagged such that the book does not include the code cell, but the ouptut of that cell's execution is included.remove-cell
-- Does not show the code cell or its outputs. BUT THE CODE IS EXECUTED. I like this for cells that are used to set up the environment, but don't need to be shown in the rendered output. Those cells won't show in the book, but if the user downloads the notebook to run, all the code is there.
Other tags are available. See https://jupyterbook.org/en/stable/interactive/hiding.html
The jupyter-book project parses the markdown as 'MyST' markup. This is a superset of standard markdown, and adds some additional features. See https://jupyterbook.org/en/stable/reference/cheatsheet.html for the full list of options. Here are some useful directives for the HyTEST documentation:
There are several types... note
, seealso
, warning
, and so on. They are used like this:
:::{note}
This is a note.
:::
NOTE I have enabled the 'colon fence' extension, so you can use the :::
syntax
rather than the backticks. I find it easier to type. Backticks are still supported.
Referring to other documents in the book is made easier with the doc
directive.
This constructs a link for you, but it pulls the title of the target document automatically
to serve as the link text:
{doc}`/path/to/other/notebook`
Note that we are giving a path relative to the repo root (note the leading slash),
and we are NOT including the extension. This lets jupyterbook pick the right
file automatically (e.g. .md
or .ipynb
).
A couple of useful tips for using doc
directives:
- The document you are pointing to must be known to the indexer -- which means that it
must appear in the
_toc.yml
file. - It is possible to use relative path names by leaving off that leading slash.
The page layout of a rendered book page contains a left sidebar, the main content area, and a right sidebar. The left sidebar is used for the table of contents, and the right sidebar is used for the page navigation links (like a table of contents for sections in this specific page). This margin area on the right is often blank, but you can use it to put in additional content that is not part of the main flow of the page.
:::{margin}
This is a margin note.
:::
You can also nest directives... but you have to pair the 'fences' up correctly, so the
parser knows when to end a directive. If you wanted to put a note
in the margin, you
might do this:
::::{margin}
:::{note}
This is a note in the margin.
:::
::::
The margin
directive has a 4-colon fence, and the note
directive has a 3-colon fence.
A sidebar
is used much like a margin
-- it just puts the text in a different spot.
A sidebar
is placed in the main content area, against the right margin. The rest of
the main content text flows around it.
I have found that sidebars do not lay out consistently in the browsers I've tested. I
almost always prefer the effect of margin
over sidebar
.
See the back/Appendix_A.ipynb
notebook to see some of these tags in action.
You can include mathematical expressions in your markdown. The syntax is very similar to LaTeX, but
uses a lighter-weight markup (KaTeX). Enclose LaTeX-style math expressions in $
to
have them render.
I have configured the builder to ignore any document not explicitly listed in the
_toc.yml
file. You must add your notebook to the table of contents for it to be
included in the compiled/rendered output, and also for other documents to be able to reference it
using doc
directives.
The format of the _toc.yml
file is described in the jupyter-book documentation
at https://jupyterbook.org/en/stable/structure/toc.html. I have started a rough
draft already, which should be enough context to let you add what you like.
chapter
items appear in the sidebar table of contents on the left in the rendered output.- If a chapter contains
sections
, those sections appear as sub-items in a drop-down menu in the sidebar. - Sections can also contain sections. This will give you nested dropdown menus.
The build process operates like a mixture of sphinx
and papermill
. It uses the
standard document-generator sphinx
to build the book content, and it uses a
papermill
-like process to execute the notebooks and insert the results into the
compiled book content. (It's not papermill, but it's similar.)
cd
to the repo root directory- Activate the correct conda environment (
conda activate global-pangeo
) - Run
jb build .
to build the book.
This will create a _build
directory that contains the compiled book content as well
as the jupyterbook cache.
I have configured the builder to use a cache. That cache lives in the _build
folder.
It will only run notebooks for which it does not have already cached output. A notebook
with a newer timestamp that that in the cache is also re-run. To force all notebooks
to run (ignoring the cache), run jb build --all .
The jb
command is an alias to jupyter-book
. Use whichever you prefer. Pass a --help
to see the available options and arguments available. Mostly, the behavior is controlled
by the _config.yml
file.
NOTE // NOTE // NOTE // NOTE // NOTE
This is VERY important!!
The jb
command will execute all the notebooks using the currently active conda environment,
UNLESS the kernelspec
is explicitly set in the notebook metadata. Most of the notebooks
authored on nebari
will have that set -- but IT WILL BE WRONG.
If you use the standard jupyterlab interface on nebari, it will set the kernelspec
to something
like this:
"kernelspec": {
"display_name": "users-users-pangeofu",
"language": "python",
"name": "conda-env-users-users-pangeofu-py"
}
It will show you "users-users-pangeofu" as the kernel name, but the actual conda
environment it will activate is named something else. If you were to conda env list
,
you would see that an environment with name conda-env-users-users-pangeofu-py
does
not exist -- so jupyter-book will FAIL ("kernelspec not found", or similar).
There are a few workarounds:
- Use the
vscode
interface onnebari
. It will set thekernelspec
to a generic 'python'. I do not know why this is the case, but it is. Save the notebook out of thevscode
interface, and your jupyterbook runs will work, so long as thepangeo
orpangeofu
conda environments are currently active whenjb build
is run. - Use the jupyterlab interface to set the kernel to "Python 3 (ipykernel)". I have had mixed success with this. It usually works, but not always.
- Use
vi
to manually edit the notebook JSON. Completely remove thekernelspec
entry from the notebook metadata. THIS IS RISKY
Before I discovered the vscode
solution, I was manually editing the notebooks -- which
is not a sustainable way forward. The easiest workaround is probably to just reset the
kernel in the lab interface once your notebook is done (you've got all the executable
bits working).
The jb
command will build the book, but it will not publish it. The static HTML
content is generated in the _build/html
directory under the repo (this folder
is ignored via the .gitignore
file). You can copy that directory to a web server
and serve it up, or you can use the gh-pages
branch to publish it to github pages.
That's what we will do.
The typical workflow for this is to create a gh-pages
branch, and then push all
of the HTML to that branch on github. The gh-pages
branch is a special branch
that github will automatically publish to the github pages site. That workflow
is a little tedious/manual -- and it has been automated with another package we
like to use: ghp-import
.
Once built, execute the following command to publish the book to github pages:
ghp-import --no-jekyll -m 'commit message' --push --force _build/html
This will:
- Flag this as a 'bare' HTML site (no Jekyll processing)
- Create a commit on the
gh-pages
branch with the message 'commit message' - Push the commit to the
gh-pages
branch on the origin remote (i.e. github) - Force the push (overwrite any existing content)
...using the HTML content found in the _build/html
directory.
Now... go to the repo on github.... in the right sidebar of the main page, you should
see a link to a gh-pages
"Environment". Click on that link. You should see
the history of deployments to Pages. Click on the most recent deployment to visit
the published book.