From 5084e58a9c6868883966d812adb4707c9a7884a8 Mon Sep 17 00:00:00 2001
From: Gareth Western
Date: Wed, 31 Jul 2024 13:59:00 +0200
Subject: [PATCH] chore: tidy up README and make cloud documentation more agnostic

---
 README.md | 56 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index e10bd74..1e09c3a 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,13 @@
 # DuckDB Delta Extension
+
 This is the experimental DuckDB extension for [Delta](https://delta.io/). It is built using the (also experimental)
 [Delta Kernel](https://github.com/delta-incubator/delta-kernel-rs). The extension (currently) offers **read** support for delta
 tables, both local and remote.
 
-# Supported platforms
+## Supported platforms
+
 The supported platforms are:
+
 - `linux_amd64` and `linux_amd64_gcc4` and `linux_arm64`
 - `osx_amd64` and `osx_arm64`
 - `windows_amd64`
@@ -12,29 +15,54 @@ The supported platforms are:
 Support for the [other](https://duckdb.org/docs/extensions/working_with_extensions#platforms) DuckDB platforms is
 work-in-progress
 
-# How to use
-**NOTE: this extension requires the DuckDB v0.10.3 or higher**
+## How to use
+
+> [!NOTE]
+> This extension requires the DuckDB v0.10.3 or higher
 
 This extension is distributed as a binary extension. To use it, simply use one of its functions from DuckDB and the extension will be autoloaded:
+
 ```SQL
 FROM delta_scan('s3://some/delta/table');
 ```
 
-Note that using DuckDB [Secrets](https://duckdb.org/docs/configuration/secrets_manager.html) for S3 authentication is supported:
+To scan a local table, use the full path prefixes with `file://`
+
+```SQL
+FROM delta_scan('file:///some/path/on/local/machine');
+```
+
+## Cloud Storage authentication
+
+Note that using DuckDB [Secrets](https://duckdb.org/docs/configuration/secrets_manager.html) for Cloud authentication is supported.
+
+### S3 Example
 
 ```SQL
-CREATE SECRET (TYPE S3, provider credential_chain);
+CREATE SECRET (
+    TYPE S3,
+    PROVIDER CREDENTIAL_CHAIN
+);
 FROM delta_scan('s3://some/delta/table/with/auth');
 ```
 
-To scan a local table, use the full path prefixes with `file://`
+### Azure Example
+
 ```SQL
-FROM delta_scan('file:///some/path/on/local/machine');
+CREATE SECRET (
+    TYPE AZURE,
+    PROVIDER CREDENTIAL_CHAIN,
+    CHAIN 'cli',
+    ACCOUNT_NAME 'mystorageaccount'
+);
+FROM delta_scan('abfss://some/delta/table/with/auth');
 ```
 
-# Features
+## Features
+
 While still experimental, many (scanning) features/optimizations are already supported in this extension as it reuses most of DuckDB's
 regular parquet scanning logic:
+
 - multithreaded scans and parquet metadata reading
 - data skipping/filter pushdown
   - skipping row-groups in file (based on parquet metadata)
@@ -43,24 +71,30 @@ regular parquet scanning logic:
 - scanning tables with deletion vectors
 - all primitive types
 - structs
-- S3 support with secrets
+- Cloud storage (AWS, Azure, GCP) support with secrets
 
 More features coming soon!
 
-# Building
+## Building
+
 See the [Extension Template](https://github.com/duckdb/extension-template) for generic build instructions
 
-# Running tests
+## Running tests
+
 There are various tests available for the delta extension:
+
 1. Delta Acceptence Test (DAT) based tests in `/test/sql/dat`
 2. delta-kernel-rs based tests in `/test/sql/delta_kernel_rs`
 3. Generated data based tests in `tests/sql/generated` (generated using [delta-rs](https://delta-io.github.io/delta-rs/), [PySpark](https://spark.apache.org/docs/latest/api/python/index.html), and DuckDB)
 
 To run the first 2 sets of tests:
+
 ```shell
 make test_debug
 ```
+
 or in release mode
+
 ```shell
 make test
 ```