Releases: dlt-hub/dlt
0.5.2
Core Library
- Add `upsert` merge strategy for Postgres and Snowflake, by @jorritsandbrink in #1466
- Add basic `upsert` support for `delta` table format in `filesystem` destination by @jorritsandbrink in #1600
- query tagging for snowflake by @rudolfix in #1582
- Support Open Source ClickHouse Deployments (MergeTree engine and more) by @Pipboyguy in #1496
- allows nested types in BigQuery via native `autodetect_schema` by @rudolfix in #1591
- Enable `upsert` merge strategy for more SQL destinations (Athena, BigQuery, Databricks, mssql) by @jorritsandbrink in #1628
- Fix/1512 fixes `current.pipeline()` access by @rudolfix in #1581
- feat: add config dataset_name_prefix to set custom staging dataset name by @donotpush in #1563
- fix: add airflow db reset for all tests by @donotpush in #1559
- Enable S3 compatible storage for `delta` table format by @jorritsandbrink in #1586
- feat/1495 rest_client: renames JSONResponsePaginator to JSONLinkPaginator by @willi-mueller in #1558
- Feat/1596 adds custom config providers + example of yaml config provider supporting profiles and jinja placeholders by @rudolfix in #1642
- Feat/1583 rest client session timeout configuration by @willi-mueller in #1590
- Add clarification for add_limit by @VioletM in #1594
- Fix/1606 fixes validator incremental step order to keep it always last in the pipe by @rudolfix in #1641
- Feat/1593 rest_client: allow setting of request kwargs by @willi-mueller in #1609
- prevent accidental wrapping of sources in resources when using adapters by @sh-rp in #1645
- Add empty source handling for `delta` table format on `filesystem` destination by @jorritsandbrink in #1617
- Surface original err msg from pydantic as extended_info on DataValidationError by @codingcyclist in #1569
- fix(dockerfile): remove extra spaces around equals sign in LABEL inst… by @thisisdope in #1573
- Qdrant uncommitted state restore and test by @steinitzu in #1545
- fix: suppress alembic logs for tests by @donotpush in #1578
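Several entries above add an `upsert` merge strategy. As a rough illustration of what "upsert" conventionally means (update rows whose key already exists, insert the rest), here is a minimal in-memory sketch. It is not dlt's implementation, which runs against the destination; the `upsert` helper and the sample rows are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def upsert(target: List[Row], incoming: Iterable[Row], primary_key: str) -> List[Row]:
    """Return a new table: rows with a matching key are updated, others are inserted."""
    # Copy the existing rows so the original table is left untouched.
    merged = {row[primary_key]: dict(row) for row in target}
    for row in incoming:
        # Update the existing row, or start a new one when the key is unseen.
        merged.setdefault(row[primary_key], {}).update(row)
    return list(merged.values())


users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
batch = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Cy"}]
result = upsert(users, batch, primary_key="id")
```

Here `id` 2 is updated in the result, `id` 3 is appended, and `id` 1 is carried over unchanged.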
Docs
- Document sql source reflection level and type adapter by @steinitzu in #1467
- Add to docs docs configuring file format options by @VioletM in #1543
- Added how dlt uses arrow by jorrit by @dat-a-man in #1577
- docs/514 rest_api: docs on pluggable paginators by @willi-mueller in #1557
- docs: documents new `convert` parameter in rest_api source incremental config by @willi-mueller in #1649
- Docs/1571 docs on handling NULL values at incremental cursor path by @willi-mueller in #1650
- Add note that pg_replication doesn't support scd2 by @akelad in #1608
- docs/505 updates documentation on custom hooks in response_actions by @willi-mueller in #1524
New Contributors
- @donotpush made their first contribution in #1559
- @thisisdope made their first contribution in #1573
- @akelad made their first contribution in #1608
Full Changelog: 0.5.1...0.5.2
0.5.1
This is a major release (0.4 -> 0.5) in our versioning scheme, so please review the breaking changes below. Most of them are relevant only for platform builders that use `dlt` internals. Some of the long-deprecated components were removed as well.
Breaking Changes
- `PageNumberPaginator` takes `base_page` and `page` arguments instead of `initial_page`. This allows paginating APIs that number pages e.g. from 0 or from 1. #1509
- deprecated `credentials` argument was removed from `dlt.pipeline`. #1537 Please use destination factories to instantiate destinations with explicit credentials. (https://dlthub.com/devel/general-usage/destination#pass-explicit-credentials)
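To illustrate why separating `base_page` from `page` helps, the sketch below generates the page numbers a client would request. It is a standalone toy, not the `PageNumberPaginator` code: the `page_numbers` function and its `total_pages` parameter are hypothetical, and it assumes `base_page` is the number the API gives its first page while `page` is where this run starts.

```python
from typing import Iterator, Optional


def page_numbers(total_pages: int, base_page: int = 0, page: Optional[int] = None) -> Iterator[int]:
    """Yield page numbers for an API whose first page is numbered `base_page`."""
    # Assumption: when no starting page is given, start at the API's first page.
    current = base_page if page is None else page
    last = base_page + total_pages - 1
    while current <= last:
        yield current
        current += 1


zero_based = list(page_numbers(3, base_page=0))
one_based = list(page_numbers(3, base_page=1))
resumed = list(page_numbers(3, base_page=1, page=2))
```

With three pages in total, a zero-based API is asked for pages 0 to 2, a one-based API for pages 1 to 3, and a run resumed at page 2 of a one-based API for pages 2 and 3.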
Breaking Changes (internals)
- if `dlt.source` or `dlt.resource` decorated function is passed a `None` in a default argument during a function call, it will be handled exactly like in regular Python function call. Previously such `None` would request argument injection from configuration. Please read more here: (#1430)
- `dlt.config.value` and `dlt.secrets.value` were evaluating to `None` at runtime. Now they will evaluate to a sentinel value. All the existing code should be backward compatible. (#1430)
- `full_refresh` flag of `dlt.pipeline` will be deprecated and replaced with `dev_mode`. (#1063) and (https://dlthub.com/devel/general-usage/pipeline#do-experiments-with-dev-mode)
- the default resource extraction sequence has changed to `round_robin` from `fifo` as a default setting. You can switch back to the previous behavior and learn more about what this means here: (https://dlthub.com/docs/reference/performance#resources-extraction-fifo-vs-round-robin)
- if you create an instance of a SPEC (ie. `SnowflakeCredentials`) it will not be marked as resolved even if all required fields are provided. previously some were resolving and some were not. #1489
- `parse_native_representation` never marks config as resolved. previously some were resolving and some were not. #1489
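The first two items above come down to telling "no value given, please inject one" apart from an explicit `None`. A generic way to do that in Python is a dedicated sentinel object, sketched below. This is only an illustration of the pattern, not dlt's code: `CONFIG_VALUE`, `with_config` and `fetch` are hypothetical names.

```python
import functools
import inspect
from typing import Any, Callable, Dict


class _ConfigValue:
    """Hypothetical sentinel type: marks arguments that should be injected."""

    def __repr__(self) -> str:
        return "<config value>"


CONFIG_VALUE = _ConfigValue()


def with_config(config: Dict[str, Any]) -> Callable:
    """Inject values from `config` only where an argument is the sentinel."""

    def decorator(func: Callable) -> Callable:
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            bound = sig.bind_partial(*args, **kwargs)
            bound.apply_defaults()
            for name, value in list(bound.arguments.items()):
                # An explicit None is left alone, exactly as in plain Python.
                if value is CONFIG_VALUE:
                    bound.arguments[name] = config[name]
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@with_config({"api_key": "from-config"})
def fetch(api_key: Any = CONFIG_VALUE) -> Any:
    return api_key


injected = fetch()
explicit_none = fetch(api_key=None)
```

Calling `fetch()` receives the configured value, while `fetch(api_key=None)` keeps `None`, because only the sentinel triggers injection.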
Core Library
- support `delta` tables with `delta-rs` on top of `filesystem` destination. (#1382)
- `LanceDB` destination and examples (#1375)
- external files may be imported and loaded without extraction and normalization (https://dlthub.com/devel/general-usage/resource#import-external-files) - includes jsonl, csv, and parquet
- pick the loader file format for particular resource (https://dlthub.com/devel/general-usage/resource#pick-loader-file-format-for-a-particular-resource)
- extended support for various csv formats (https://dlthub.com/devel/dlt-ecosystem/file-formats/csv#change-settings)
- csv support for snowflake (#1470 https://dlthub.com/devel/dlt-ecosystem/destinations/snowflake#custom-csv-formats)
- support case sensitive and insensitive modes for our destinations ie. snowflake, redshift, bigquery, mssql etc. may work in both modes (#998 https://dlthub.com/devel/general-usage/naming-convention)
- you'll be able to fully change naming convention ie. to have LATIN-1 character set or create collision-free names (https://dlthub.com/devel/general-usage/naming-convention#write-your-own-naming-convention)
- two new naming conventions: `sql_cs_v1` (case sensitive) and `sql_ci_v1` (case insensitive) to create SQL safe identifiers without snake case transformation (https://dlthub.com/devel/general-usage/naming-convention#available-naming-conventions)
- you'll be able to modify destination capabilities via destination factories (https://dlthub.com/devel/general-usage/destination#inspect-destination-capabilities)
- schemas will be reflected with a single SQL statement which will make schema migrations faster
- loader can handle many more jobs (files) than before. we tested with 30k jobs and it looks fine
- we are adding `refresh` modes to `pipeline.run` that allow to drop and recreate tables - with different granularity. (https://dlthub.com/devel/general-usage/pipeline#refresh-pipeline-data-and-state)
- when generating fingerprint for `filesystem` destination only the bucket component is taken into account #1516
- 1272 Support ClickHouse GCS S3 compatibility mode in filesystem destination by @Pipboyguy in #1423
- Ensure arrow field's nullable flag matches the schema column by @steinitzu in #1429
- Fix streamlit bug on chess example by @sh-rp in #1425
- Fix databricks pandas error by @steinitzu in #1443
- Extend orjson dependency allowed range with excluded versions by @steinitzu in #1501
- Fix/1465 fixes snowflake auth credentials by @rudolfix in #1489
- skips non resolvable fields from appearing in sample secrets.toml by @rudolfix in #1432
- RESTClient: pass environment settings to `requests.Session.send` by @burnash in #1452
- fix: service principal auth support for synapse copy job by @jorritsandbrink in #1472
- docs: Fixed markdown issue in duckdb.md by @PabloCastellano in #1528
- Loader parallelism strategies (destination can request the loading strategy ie. sequential or parallel) by @sh-rp in #1457
- Migrate to sentry sdk 2.0 by @sh-rp in #1477
- fix: allow loggeradapter in addition to logger in logcollector by @matsmhans1 in #1483
- Add load_id to arrow tables in extract step instead of normalize by @steinitzu in #1449
- #1356 implements OAuth2 Client Credentials flow by @willi-mueller in #1357
- Add LanceDB custom destination example code by @Pipboyguy in #1323
- fix(incremental): don't filter Arrow tables with empty filters by @IlyaFaer in #1480
- fix: `Pipeline.sql_client` credentials forwarding by @jorritsandbrink in #1499
- RESTClient: fix duplicate params in URL in JSONResponsePaginator by @burnash in #1515
- Update default log output to not have padding on log level by @sh-rp in #1517
- fix: remove obsolete `dremio` destination capabilities by @jorritsandbrink in #1527
- feat(filesystem): use only netloc and scheme for fingerprint by @IlyaFaer in #1516
- removes deprecated credentials argument from Pipeline by @rudolfix in #1537
- improves collision detection when naming convention changes by @rudolfix in #1536
- Fix/1542 rest client: makes request parameters optional by @willi-mueller in #1544
- RESTClient: add integrations tests for paginators by @burnash in #1509
- selects all tables from info schema if number of tables > threshold by @rudolfix in #1547
- configurable staging dataset name by @rudolfix in #1555
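The `filesystem` fingerprint change in this list (#1516) means that two bucket URLs differing only in their path identify the same destination. A minimal sketch of the idea follows; the `bucket_fingerprint` helper is hypothetical, and the SHA-256 digest is an assumption made for illustration, not necessarily the hash dlt uses.

```python
import hashlib
from urllib.parse import urlparse


def bucket_fingerprint(bucket_url: str) -> str:
    """Fingerprint a bucket URL from its scheme and netloc only, ignoring the path."""
    parsed = urlparse(bucket_url)
    location = f"{parsed.scheme}://{parsed.netloc}"
    # Assumption: any stable digest works for the illustration.
    return hashlib.sha256(location.encode("utf-8")).hexdigest()


a = bucket_fingerprint("s3://my-bucket/raw/2024")
b = bucket_fingerprint("s3://my-bucket/staging")
c = bucket_fingerprint("s3://other-bucket/raw/2024")
```

Here `a` and `b` are equal because both URLs point into `my-bucket`, while `c` differs because the bucket differs.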
Docs
- naming conventions documentation (https://dlthub.com/docs/general-usage/naming-convention)
- methods to manipulate schema settings (https://dlthub.com/docs/general-usage/schema#schema-settings)
- rest_api: add troubleshooting section by @burnash in #1371
- RESTClient: add docs for `init_request` by @burnash in #1442
- Example: fast postgres to postgres by @AstrakhantsevaAA in #1428
- Docs: Updated filesystem docs with explanations for bucket URLs by @dat-a-man in #1435
- docs for loading with contracts to existing tables by @sh-rp in #1441
- Add troubleshooting to incremental docs by @burnash in #1458
- Docs: cover custom authentication, rework paginators section by @burnash in #1493
- rest_api: add an example to the incremental load section by @burnash in #1502
- rest_api: add a quick example to rest_api docs by @burnash in #1531
- Update grouping-resources.md docs by @axellpadilla in #1538
- adds examples and step by step explanation for refresh modes by @rudolfix in #1560
Verified Sources
We worked intensively on `rest_api` and `sql_database`:
- Add fallback value for tz in row_tuples_to_arrow (sql_database helpers) @khoadaniel dlt-hub/verified-sources#493
- allows SqlAlchemy engine to be passed to sql_table by @rudolfix dlt-hub/verified-sources#498
- Feat/505 rest api hooks in response actions @willi-mueller dlt-hub/verified-sources#512
- Feat/507 transformation function for incremental cursor @willi-mueller dlt-hub/verified-sources#515
- Allows incremental loading to be configured per resource in `sql_database` @rudolfix dlt-hub/verified-sources#478
- Allows to set the reflection level for tables: minimal (names/nullability), full (data types) and full_with_precision (with ie. varchar length). @steinitzu https://github.com/dl...
0.4.12
Core Library
- feat(pipeline): add an ability to auto truncate staging dataset by @IlyaFaer in #1292
- Feat/1406 bumps duckdb 0.10 + dbt to <=1.8.x by @rudolfix in #1407
- Azure service principal credentials support by @steinitzu in #1377
- Support partitioning hints for athena iceberg by @steinitzu in #1403
- Add recommended_file_size cap to limit data writer file size and cap BigQuery to 4gb by @steinitzu in #1368
- limits mssql query size to fit network buffer to prevent errors on large inserts by @rudolfix in #1372
- allows to bubble up exceptions when standalone resource returns by @rudolfix in #1374
- Fix: use .get on column in mssql destination for cases where the yaml… by @Daniel-Vetter-Coverwhale in #1380
- Make path tests Windows compatible by @jorritsandbrink in #1384
- RESTClient: Added "values" to the data pattern of the rest_api helper by @francescomucio in #1399
- corrects single entity path detection by @rudolfix in #1394
- RESTClient: implement AuthConfigBase.bool + update docs by @burnash in #1413
- Fix: ensure custom session can be provided to rest client by @z3z1ma in #1396
Docs
- RESTClient: add an example for creating a custom POST paginator by @burnash in #1358
- Add rest_api verified source documentation by @burnash in #1308
- Fix typo in Slack Docs by @cybermaxs in #1369
- RESTClient: docs: add the troubleshooting section by @burnash in #1367
- Replace weather api example with github in create a pipeline walkthrough by @sultaniman in #1351
- RESTClient: docs: Fixed snippet definition by @burnash in #1373
- docs: destination tables: elaborate on example code by @burnash in #1386
- add naming rules to contributing by @sh-rp in #1291
- Added info about how to reorder the columns to adjust a schema by @dat-a-man in #1364
- rest_api: add response_actions documentation by @burnash in #1362
- Update the tutorial to use `rest_client.paginate` for pagination by @burnash in #1287
- fix command to install dlt by @Benjamin0313 in #1404
- improves sql database docs by @rudolfix in #1383
- add typing classifier and update maintainers in pyproject by @sh-rp in #1391
- Updated installation command in destination docs and a few others by @dat-a-man in #1410
- Update filesystem docs with auto mkdir config by @VioletM in #1416
- add page to docs for openapi generator by @sh-rp in #1417
New Contributors
- @cybermaxs made their first contribution in #1369
- @Daniel-Vetter-Coverwhale made their first contribution in #1380
- @francescomucio made their first contribution in #1399
- @Benjamin0313 made their first contribution in #1404
Full Changelog: 0.4.11...0.4.12
0.4.11
Core Library
- RESTClient: building blocks (auths, paginators, response extractors etc.) to write REST API pipelines by @burnash
- Enable `merge` write disposition for `athena` Iceberg by @jorritsandbrink in #1315
- adds std pipe iterator for stdout and stderr by @rudolfix in #1321
- adds _impl_cls to dlt.resource and dynamic config section to standalone resources with dynamic names by @rudolfix in #1324
- Accept :memory: mode for credentials parameter in duckdb factory by @sultaniman in #1297
- allows windows native, UNC and extended paths in filesystem source and destination by @rudolfix in #1335
- improves union validation: user friendly exceptions by @rudolfix in #1327
- improves instantiation and shutdown of thread pools for telemetry trackers by @rudolfix in #1340
- feat(airflow): pass data sources as callables and additional initializers for delayed source evaluation by @IlyaFaer in #1318
- Fix: ignores table options on ALTER TABLE in BigQuery by @rudolfix in #1306
- Fix: use correct check for column prop in column schema by @z3z1ma in #1347
- Streamlit caching and session state store fixes by @sultaniman in #1326
- implements method to merge columns in two table schemas by @rudolfix in #1348
- Extend motherduck client configuration to pass custom user agent by @sultaniman in #1284
- allows fsspec until 2023.1.0 by @rudolfix in #1305
Docs
- REST Client documentation by @burnash https://dlthub.com/docs/general-usage/http/rest-client
- REST API verified source documentation by @burnash @willi-mueller @francescomucio https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api
- Docs/google ads by @dat-a-man in #1313
- Docs: Freshdesk documentation by @dat-a-man in #1228
- Add instruction on installing dlt via pixi and conda by @sultaniman in #1332
Verified Sources
- rest_api verified source: quickly declare REST API endpoints and convert it into regular dlt source by @burnash @willi-mueller @francescomucio
- rest_api launch blog by @adrianbr in #1355
Full Changelog: 0.4.10...0.4.11
0.4.10
Core Library
- Clickhouse destination by @Pipboyguy in #1097
- fix(filesystem): UNC paths are supported on filesystem source and destination by @IlyaFaer in #1209
- `scd2` extension: pick your active record literal, defaults to NULL by @jorritsandbrink in #1275
- make missing keys warning conditional on merge strategy by @jorritsandbrink in #1290
- Fix filesystem layout timestamps with milliseconds by @sultaniman in #1286
- fallbacks to copy on any OSError when doing hardlink by @rudolfix in #1302
- configurable anonymous telemetry tracker by @rudolfix in #1301
- fix athena edge case and adds layout tests for athena by @sh-rp in #1289
- Streamlit app: do not show a notice if there is no resource state for schema by @sultaniman in #1300
Docs
- Docs: Google Ads documentation. by @dat-a-man in #1224
- explains how to pass explicit credentials + few mssql cases by @rudolfix in #1299
Full Changelog: 0.4.9...0.4.10
0.4.9
Core Library
- SCD2 support by @jorritsandbrink in #1168 https://dlthub.com/devel/general-usage/incremental-loading#scd2-strategy
- A fully configurable layout for filesystem files by @sultaniman in #1182 https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#files-layout
- picks file format matching item format to minimize number of rewrites during loading by @rudolfix in #1222
- fix athena iceberg's trailing location by @romanperesypkin in #1230
- Pass options to parse iso like strings by @VioletM in #1219
- pipeline state can be restored from filesystem destination by @sh-rp in #1184 - https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#syncing-of-dlt-state
- Remove `staging-optimized` replace strategy for `synapse` by @jorritsandbrink in #1231
- fixes bug, where configs were not injected for async functions by @sh-rp in #1241
- feat(transform): implement columns pivot map function by @IlyaFaer in #1152
- Add max_table_nesting to resource decorator by @sultaniman in #1242
- adds csv options to write headers, change delimiter, quotation style by @rudolfix in #1239
- Check for default schema and schema name in streamlit session by @sultaniman in #1155
- Add seconds and millisecond timestamps to filesystem date placeholders by @sultaniman in #1260
- send dlt telemetry wherever you want, not only segment by @zem360 in #1236
- Make merge write-disposition fall back to staging append if no primary or merge keys are specified by @sh-rp in #1225
- Add snowflake application parameter to configuration by @sultaniman in #1266
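For readers unfamiliar with SCD2 (slowly changing dimension, type 2), the first item in this list is about keeping history: when a record changes, the old version is closed with an end timestamp and a new active version is added. The sketch below shows that bookkeeping in memory. It is a simplified illustration, not dlt's implementation: the `scd2_merge` helper and the `valid_from` / `valid_to` column names are assumptions, and the `active` parameter stands in for a configurable "active record" marker that defaults to `None`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def scd2_merge(dimension: List[Row], snapshot: List[Row], key: str, now: str, active: Any = None) -> List[Row]:
    """Retire changed or missing active rows and append new versions from `snapshot`."""
    incoming = {row[key]: row for row in snapshot}
    result: List[Row] = []
    unchanged_keys = set()
    for row in dimension:
        row = dict(row)  # do not mutate the caller's rows
        if row["valid_to"] == active:
            attrs = {k: v for k, v in row.items() if k not in ("valid_from", "valid_to")}
            if incoming.get(row[key]) == attrs:
                unchanged_keys.add(row[key])  # same content: stays active
            else:
                row["valid_to"] = now  # changed or gone: close this version
        result.append(row)
    for k, row in incoming.items():
        if k not in unchanged_keys:
            # New key or new content: open a fresh active version.
            result.append({**row, "valid_from": now, "valid_to": active})
    return result


dimension = [
    {"id": 1, "city": "Berlin", "valid_from": "2024-01-01", "valid_to": None},
    {"id": 2, "city": "Paris", "valid_from": "2024-01-01", "valid_to": None},
]
snapshot = [{"id": 1, "city": "Munich"}, {"id": 2, "city": "Paris"}]
result = scd2_merge(dimension, snapshot, key="id", now="2024-02-01")
```

After the merge, the Berlin row is closed at `2024-02-01`, the Paris row is untouched, and a new active Munich row is appended.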
Docs
- Added docs for deploying dlt with Prefect. by @dat-a-man in #1138
- a note on scd2 incoming high ts change by @rudolfix in #1273
- adding images and wordsmithing to Prefect walkthrough by @WillRaphaelson in #1276
Verified Sources
- Use `pyarrow`, `pandas`, `connectorx` or `sqlalchemy` backends when reading tables with `sql_database`. See README for details. dlt-hub/verified-sources#425
- Google ads source is available dlt-hub/verified-sources#428
- Pages endpoint for notion dlt-hub/verified-sources#429
New Contributors
- @romanperesypkin made their first contribution in #1230
- @WillRaphaelson made their first contribution in #1276
Full Changelog: 0.4.8...0.4.9
0.4.9a2
A pre-release that lets you try out the following features and bugfixes:
- SCD2 support by @jorritsandbrink in #1168 (we are still working on BigQuery support) https://dlthub.com/devel/general-usage/incremental-loading#scd2-strategy
- A fully configurable layout for filesystem files by @sultaniman in #1182 https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#files-layout
- picks file format matching item format by @rudolfix in #1222
- fix athena iceberg's trailing location by @romanperesypkin in #1230
- Pass options to parse iso like strings by @VioletM in #1219
- filesystem state sync by @sh-rp in #1184 - https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#syncing-of-dlt-state
- Remove `staging-optimized` replace strategy for `synapse` by @jorritsandbrink in #1231
- fixes bug, where configs were not injected for async functions by @sh-rp in #1241
- adds options to write csv headers, change delimiter by @rudolfix in #1239
Final release is scheduled for next week
0.4.8
Core Library
- Add Dremio as a destination by @maxfirman in #1026
- adds a fast loading of arrow tables/pandas to postgres via COPY csv by @rudolfix in #1185
- adds a csv writer for filesystem and postgres by @rudolfix in #1185
- saves parquet with all logical types, `spark` flavor is not a default any longer by @rudolfix in #1185
- feat(bigquery): add streaming inserts support by @IlyaFaer in #1123
- Feat: parameterize pipeline class in the primary factory method by @z3z1ma in #1176
- Fix: check for typeddict before class or subclass checks which fail by @z3z1ma in #1160
- fixes column order and add hints table variants by @rudolfix in #1127
- fixes schema versioning by @rudolfix in #1140
- regular initializers for credentials / config specs are type checked like dataclasses by @rudolfix in #1142
- fix streamlit app state display: Add yaml representer for pendulum datetime by @sultaniman in #1192
- `synapse` and `mssql` bugfixes and improvements (INSERT VALUES UNION) by @jorritsandbrink in #1174
- various improvements to arrow table normalization by @rudolfix in #1185
- arrow tables without rows create tables in destination by @rudolfix in #1185
- fixes Motherduck configuration to use `my_db` default database and makes password / token mandatory by @rudolfix in
Docs
- docs: add typechecking to embedded snippets by @sh-rp in #1130
- Fix typo with switched column names in schema evolution docs page by @b-per in #1132
- Docs: deploy with Kestra by @dat-a-man in #1087
- Docs: Deploy dlt on dagster by @dat-a-man in #1086
- Update example connection string by @MiConnell in #1188
- Changed directory of all the blog images to google cloud storage. by @dat-a-man in #1156
Verified Sources
- postgres replication / CDC by @jorritsandbrink dlt-hub/verified-sources#392
New Contributors
- @b-per made their first contribution in #1132
- @MiConnell made their first contribution in #1188
- @maxfirman made their first contribution in #1026
Full Changelog: 0.4.7...0.4.8
0.4.7
Core Library
- Custom destinations with `@dlt.destination` decorator by @sh-rp in #1065
- A BigQuery custom destination supporting STRUCT data types by @sh-rp in #1107
- Built-in Streamlit rewrite, UI improvements, dark theme a by @sultaniman in #1060
- fixes various edge cases with Incremental data deduplication, for ordered and unordered results #971 by @rudolfix in #1062
- Adds new `dlt.mark` marker to materialize table schemas without data by @rudolfix in #1122
- validates class instances in typed dict by @rudolfix in #1082
- feat(airflow): allow re-using sources in airflow wrapper by @IlyaFaer in #1080
- feat(core): drop default value for write disposition by @IlyaFaer in #1057
- splits pandas and arrow imports to fix pyarrow.compute missing by @rudolfix in #1112
- improve no schema upgrade path exception by @sh-rp in #1125
Docs
- docs(airflow): add description of new decompose methods by @IlyaFaer in #1072
- check embedded code blocks by @sh-rp in #1093
- docs(kafka): describe the possible sync issues by @IlyaFaer in #1100
- Docs: schema evolution by @dat-a-man in #1078
- Add example link to the custom destination page by @VioletM in #1120
Full Changelog: 0.4.6...0.4.7
0.4.6
Core Library
- feat(airflow): expose the Airflow runner method to create custom DAGs by @IlyaFaer in #1014
- removes sql alchemy dependency and port parts of URL class by @rudolfix in #1028
- Parallelize decorator - run many regular generators in parallel by @steinitzu in #965
- Add main entry point to support calling dlt as python module by @sultaniman in #1023
Library Bugfixes
- fixes naive datetime bug in incremental by @rudolfix in #1020
- Import missing pyarrow compute for transforms on arrowitems by @sh-rp in #1010
- delete normalized package in case it already existed by @sh-rp in #1012
- fix(core): validation error with TTableHintTemplate by @IlyaFaer in #1039
- adds test case where payload data contains PUA unicode characters by @willi-mueller in #1053
- fix add_limit behavior in edge cases by @sh-rp in #1052
- adds row_order to Incremental - automatically stop taking data when out of range by @rudolfix in #1041
- Fix to serialize load metrics as list instead of a dictionary by @sultaniman in #1051
- fix import schema workflow by @sh-rp in #1013
- rollback all changes to live schemas when extraction fails by @sh-rp in #1013
Docs
- Fix zendesk example test by @VioletM in #1027
- Edit arrow-pandas.md and fix a typo by @Bl3f in #1001
- Added info about file compression to filesystem docs by @dat-a-man in #975
- Update "create destination" docs with new file layouts by @steinitzu in #1032
- Docs update on how to set query limits. by @dat-a-man in #973
- Docs/Updated for slack alerts. by @dat-a-man in #1042
Verified Sources
- scrape web sites with spiders and Scrapy and send data to dlt @sultaniman dlt-hub/verified-sources#332
- `sql_database` recognizes `end_value` and `row_order` to return rows in range and optionally ordered. backfill and proper Airflow intervals support @rudolfix dlt-hub/verified-sources#388
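The `end_value` and `row_order` options mentioned above make bounded backfills cheap: once rows are known to arrive in order, reading can stop at the first row past the range instead of scanning everything. The sketch below shows the idea on a plain iterator. It is not the `sql_database` or `Incremental` code: the `rows_in_range` helper is hypothetical, and treating the range as start-inclusive and end-exclusive is an assumption made for this sketch.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Any]


def rows_in_range(
    rows: Iterable[Row],
    cursor: str,
    initial_value: Any,
    end_value: Optional[Any] = None,
    row_order: Optional[str] = None,
) -> Iterator[Row]:
    """Yield rows whose cursor is in [initial_value, end_value), stopping early if ordered."""
    for row in rows:
        value = row[cursor]
        if end_value is not None and value >= end_value:
            if row_order == "asc":
                break  # ascending input: nothing later can be in range
            continue
        if value < initial_value:
            if row_order == "desc":
                break  # descending input: nothing later can be in range
            continue
        yield row


pulled: List[int] = []


def source() -> Iterator[Row]:
    # Records which rows were actually read from the (pretend) table.
    for i in range(1, 7):
        pulled.append(i)
        yield {"id": i}


selected = [r["id"] for r in rows_in_range(source(), "id", 2, 5, row_order="asc")]
```

With `row_order="asc"`, only ids 2 to 4 are selected and the source is never asked for row 6, since reading stops at the first row with `id >= 5`.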
Full Changelog: 0.4.5...0.4.6