Publish content to web server, GitHub is not a CDN #21

Open
Wants to merge 4 commits into main.

@amotl (Member) commented Jan 8, 2025

About

Upload the repository content to a web server whenever it changes, in order not to fall into cost or other lock-in traps when using S3. We need to move away from using GitHub as a CDN, because that is not viable.

References

Using raw download URLs in web pages or otherwise using those direct links as a form of CDN is discouraged.
-- https://stackoverflow.com/a/58227912

@amotl force-pushed the ci-publish-webserver branch 2 times, most recently from 5ed0533 to 161d756, January 8, 2025 20:56
@amotl (Member, Author) commented Jan 8, 2025

Problem

While I verified that this setup works well using https://github.com/cicerops/webdav-demo, it apparently does not work in our situation yet.

Maybe Fastly, fronting cdn.crate.io, needs special treatment to permit WebDAV traffic?

Observations

<3>ERROR : misc/cities.parquet: Failed to copy: Update mkParentDir failed: <html>
<head><title>404 Not Found</title></head>
<3>ERROR : misc/load_worldcities.sql: Failed to copy: Update mkParentDir failed: <html>
<head><title>404 Not Found</title></head>

/cc @WalBeh, @msbt
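One way to narrow down the 404 above is to send a single WebDAV MKCOL request (the "create collection" method from RFC 4918) by hand, once through cdn.crate.io and once against the origin server directly, and compare the status codes. This assumes that rclone's `mkParentDir` step issues MKCOL for the parent directory, which matches how its WebDAV backend creates directories but is not confirmed in this thread. The `probe_mkcol` helper and any test path you use with it are hypothetical; a real server will likely also need an `Authorization` header.

```python
import http.client
from typing import Dict, Optional
from urllib.parse import urlsplit


def probe_mkcol(url: str, headers: Optional[Dict[str, str]] = None, timeout: float = 10.0) -> int:
    """Send one WebDAV MKCOL request to `url` and return the HTTP status code.

    Hypothetical diagnostic helper. Per RFC 4918, 201 means the collection
    was created, 405 means the URL is already mapped (e.g. the directory
    exists), and 409 means a parent collection is missing. A 404 is not
    among the listed answers, which hints that the request may not have
    reached a WebDAV handler at all.
    """
    parts = urlsplit(url)
    connection_class = (
        http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    )
    conn = connection_class(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("MKCOL", parts.path or "/", headers=headers or {})
        response = conn.getresponse()
        response.read()  # drain the body (often an HTML error page) before closing
        return response.status
    finally:
        conn.close()
```

Running it with the same hypothetical test path through the CDN and against the origin host would show whether the CDN layer is the one answering 404.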

@amotl force-pushed the ci-publish-webserver branch 3 times, most recently from 801fe7b to 79dd780, January 8, 2025 21:24
@amotl force-pushed the ci-publish-webserver branch from 79dd780 to 17df6a5, January 8, 2025 21:31
.github/workflows/publish.yml (outdated, resolved)
@msbt commented Jan 9, 2025

@amotl can't we just use a Bash script and a bunch of wget calls to solve this? For example, by creating a new folder in https://cdn.crate.io/downloads/ for those assets? If the links change often, we could put them in an external text file (which doesn't cause much traffic) and iterate through it when a Jenkins job is triggered, which fetches each download and moves it into the download folder.
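The "pull" variant suggested here could look roughly like the following sketch. It is written in Python with `urllib.request` instead of Bash and wget, and it assumes a plain-text list with one URL per line, where blank lines and `#` comments are ignored. The list file, the target directory, and the function names are hypothetical; in the proposed setup, a Jenkins job would call `fetch_all` with a folder below https://cdn.crate.io/downloads/ as the target.

```python
import os
import shutil
import urllib.request
from pathlib import Path
from typing import List
from urllib.parse import unquote, urlsplit


def read_url_list(list_file: Path) -> List[str]:
    """Return the URLs from a text file, skipping blank lines and '#' comments."""
    urls = []
    for line in list_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def fetch_all(list_file: Path, target_dir: Path, timeout: float = 60.0) -> List[Path]:
    """Download every listed URL into `target_dir`, keeping each file's name (hypothetical helper)."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for url in read_url_list(list_file):
        name = os.path.basename(unquote(urlsplit(url).path))
        if not name:
            raise ValueError(f"cannot derive a file name from {url!r}")
        final = target_dir / name
        partial = target_dir / (name + ".part")
        # Write to a temporary file first and rename afterwards, so an
        # interrupted download never replaces a good copy with a broken one.
        with urllib.request.urlopen(url, timeout=timeout) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(partial, final)
        written.append(final)
    return written
```

Since only the small URL list lives in the repository, each asset is fetched from its source once per job run rather than on every request from a reader.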

@amotl (Member, Author) commented Jan 9, 2025

Hi @msbt, that's a nice idea for a thrust reversal: if "push" turns out not to be a viable option, we can resort to a "pull" paradigm instead. We will certainly consider this if no other option works. Thanks a stack!

@amotl (Member, Author) commented Jan 9, 2025

On the other hand, we can easily continue to use a Jenkins job, because it provides all the features we need without adding any obstacles:

  • Can monitor a Git repository for changes (by polling it).
  • Can run an rsync command to upload stuff to the web server without much ado.
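Jenkins performs the polling through its own SCM trigger, so the job itself needs no script for that. To illustrate what the two steps amount to, here is a minimal Python sketch, assuming `git` and `rsync` are installed and the job has SSH access to the web server. `publish_if_changed`, the checkout directory, and the rsync destination are hypothetical names, not the actual job configuration.

```python
import subprocess
from typing import List, Optional


def parse_ls_remote(output: str, branch: str) -> Optional[str]:
    """Extract the commit SHA of `branch` from `git ls-remote` output."""
    wanted = f"refs/heads/{branch}"
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref == wanted:
            return sha
    return None


def remote_head(repo_url: str, branch: str) -> Optional[str]:
    """Ask the remote for the current commit of `branch` (one cheap poll)."""
    result = subprocess.run(
        ["git", "ls-remote", repo_url, f"refs/heads/{branch}"],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout, branch)


def rsync_command(source_dir: str, destination: str) -> List[str]:
    """Build an rsync invocation that mirrors `source_dir` to `destination`."""
    # The trailing slash makes rsync copy the directory's contents, not the directory itself.
    source = source_dir.rstrip("/") + "/"
    # --delete removes files on the server that no longer exist in the checkout;
    # drop it if the target folder also holds other content.
    return ["rsync", "--archive", "--compress", "--delete", "--exclude=.git", source, destination]


def publish_if_changed(repo_url: str, branch: str, checkout_dir: str,
                       destination: str, last_seen: Optional[str]) -> Optional[str]:
    """Upload the checkout when `branch` moved since `last_seen`; return the SHA now published."""
    head = remote_head(repo_url, branch)
    if head is None or head == last_seen:
        return last_seen  # nothing new, skip the upload
    subprocess.run(["git", "-C", checkout_dir, "fetch", "--quiet", "origin", branch], check=True)
    subprocess.run(["git", "-C", checkout_dir, "checkout", "--quiet", "--detach", head], check=True)
    subprocess.run(rsync_command(checkout_dir, destination), check=True)
    return head
```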

This PR tried to make it happen through GitHub Actions, but maybe we are not ready for that yet, or it takes too much effort right now.

@amotl (Member, Author) commented Jan 9, 2025

Sharing a little conversation:

@amotl said:

I am thinking about doing a one-shot copy right now, in order to unblock downstream procedures, because DATASETS-21 apparently takes more time.

@simonprickett said:

Good idea: the files in the datasets repo don't change rapidly, so if there's a one-off copy for now, then I can fix the academy and city tour materials, as people find and use those all the time.

@amotl said:

Yeah exactly. Done. https://cdn.crate.io/downloads/datasets/cratedb-datasets/

@amotl (Member, Author) commented Jan 10, 2025

Comment on lines 41 to 51
- name: Acquire sources
  uses: actions/checkout@v4

It would be cool to convert README.md to README.html after acquiring the sources, so people can easily read it at https://cdn.crate.io/downloads/datasets/cratedb-datasets/README.html.

@amotl (Member, Author), Jan 10, 2025:

Done.
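The thread does not show how the conversion was implemented. As one possible sketch, the workflow could run a small Python script right after the checkout step. It assumes the third-party Python-Markdown package (`markdown.markdown()`) when it is installed, and otherwise falls back to publishing the escaped Markdown source inside `<pre>`, which still reads fine in a browser. `readme_to_html` and `convert_readme` are hypothetical names.

```python
import html
from pathlib import Path


def readme_to_html(markdown_text: str, title: str = "README") -> str:
    """Render Markdown into a standalone HTML page (hypothetical helper).

    Uses Python-Markdown when available; otherwise shows the escaped
    source in a <pre> block so the page stays readable.
    """
    try:
        import markdown  # third-party: pip install markdown
        body = markdown.markdown(markdown_text)
    except ImportError:
        body = "<pre>" + html.escape(markdown_text) + "</pre>"
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n</head>\n<body>\n"
        f"{body}\n</body>\n</html>\n"
    )


def convert_readme(checkout_dir: Path) -> Path:
    """Write README.html next to README.md inside `checkout_dir`."""
    source = checkout_dir / "README.md"
    target = checkout_dir / "README.html"
    target.write_text(readme_to_html(source.read_text(encoding="utf-8")), encoding="utf-8")
    return target
```

A dedicated converter such as pandoc would produce richer output; the fallback only guarantees a readable page when no Markdown renderer is installed.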

@amotl force-pushed the ci-publish-webserver branch 3 times, most recently from 9f17035 to 06db545, January 10, 2025 20:07
@amotl force-pushed the ci-publish-webserver branch from 06db545 to 57dd80f, January 10, 2025 20:09
@amotl force-pushed the ci-publish-webserver branch from 57dd80f to fcf1a12, January 10, 2025 20:14
@amotl force-pushed the ci-publish-webserver branch from fcf1a12 to 44213f3, January 10, 2025 20:14
Labels: None yet
Projects: None yet
2 participants