Merge pull request #5 from OpenLineage/add-missing-release-notes

Add missing release notes.
OpenLineage · Jan 17, 2025 · 41eedcc
2 parents 8b3cdaf + 89e7333
commit 41eedcc

Showing 13 changed files with 402 additions and 0 deletions.
diff --git a/versioned_docs/version-1.21.1/releases/1_21_1.md b/versioned_docs/version-1.21.1/releases/1_21_1.md
@@ -0,0 +1,24 @@
+---
+title: 1.21.1
+sidebar_position: 9936
+---
+
+# 1.21.1 - 2024-08-29
+
+### Added
+* **Spec: add GCP Dataproc facet** [`#2987`](https://github.com/OpenLineage/OpenLineage/pull/2987) [@tnazarew](https://github.com/tnazarew)  
+    *Registers the Google Cloud Platform Dataproc run facet.*
+
+### Fixed
+* **Airflow: update SQL integration code to work with latest sqlparser-rs main** [`#2983`](https://github.com/OpenLineage/OpenLineage/pull/2983) [@kacpermuda](https://github.com/kacpermuda)  
+    *Adjusts the SQL integration after our sqlparser-rs fork has been updated to the latest main.*
+* **Spark: fix AWS Glue jobs naming for SQL events** [`#3001`](https://github.com/OpenLineage/OpenLineage/pull/3001) [@arturowczarek](https://github.com/arturowczarek)  
+    *SQL events now properly use the names of the jobs retrieved from AWS Glue.*
+* **Spark: fix issue with column lineage when using delta merge into command** [`#2986`](https://github.com/OpenLineage/OpenLineage/pull/2986) [@Imbruced](https://github.com/Imbruced)  
+    *A view instance of a node is now included when gathering data sources for input columns.*
+* **Spark: minor Spark filters refactor** [`#2990`](https://github.com/OpenLineage/OpenLineage/pull/2990) [@arturowczarek](https://github.com/arturowczarek)  
+    *Fixes a number of minor issues.*
+* **Spark: Iceberg tables in AWS Glue have slashes instead of dots in symlinks** [`#2984`](https://github.com/OpenLineage/OpenLineage/pull/2984) [@arturowczarek](https://github.com/arturowczarek)  
+    *They should use slashes and the prefix `table/`.*
+* **Spark: lineage for Iceberg datasets that are present outside of Spark's catalog is now present** [`#2937`](https://github.com/OpenLineage/OpenLineage/pull/2937) [@d-m-h](https://github.com/d-m-h)
+    *Previously, reading Iceberg datasets outside the configured Spark catalog prevented the datasets from being present in the `inputs` property of the `RunEvent`.*
diff --git a/versioned_docs/version-1.22.0/releases/1_21_1.md b/versioned_docs/version-1.22.0/releases/1_21_1.md
@@ -0,0 +1,24 @@
+---
+title: 1.21.1
+sidebar_position: 9936
+---
+
+# 1.21.1 - 2024-08-29
+
+### Added
+* **Spec: add GCP Dataproc facet** [`#2987`](https://github.com/OpenLineage/OpenLineage/pull/2987) [@tnazarew](https://github.com/tnazarew)  
+    *Registers the Google Cloud Platform Dataproc run facet.*
+
+### Fixed
+* **Airflow: update SQL integration code to work with latest sqlparser-rs main** [`#2983`](https://github.com/OpenLineage/OpenLineage/pull/2983) [@kacpermuda](https://github.com/kacpermuda)  
+    *Adjusts the SQL integration after our sqlparser-rs fork has been updated to the latest main.*
+* **Spark: fix AWS Glue jobs naming for SQL events** [`#3001`](https://github.com/OpenLineage/OpenLineage/pull/3001) [@arturowczarek](https://github.com/arturowczarek)  
+    *SQL events now properly use the names of the jobs retrieved from AWS Glue.*
+* **Spark: fix issue with column lineage when using delta merge into command** [`#2986`](https://github.com/OpenLineage/OpenLineage/pull/2986) [@Imbruced](https://github.com/Imbruced)  
+    *A view instance of a node is now included when gathering data sources for input columns.*
+* **Spark: minor Spark filters refactor** [`#2990`](https://github.com/OpenLineage/OpenLineage/pull/2990) [@arturowczarek](https://github.com/arturowczarek)  
+    *Fixes a number of minor issues.*
+* **Spark: Iceberg tables in AWS Glue have slashes instead of dots in symlinks** [`#2984`](https://github.com/OpenLineage/OpenLineage/pull/2984) [@arturowczarek](https://github.com/arturowczarek)  
+    *They should use slashes and the prefix `table/`.*
+* **Spark: lineage for Iceberg datasets that are present outside of Spark's catalog is now present** [`#2937`](https://github.com/OpenLineage/OpenLineage/pull/2937) [@d-m-h](https://github.com/d-m-h)
+    *Previously, reading Iceberg datasets outside the configured Spark catalog prevented the datasets from being present in the `inputs` property of the `RunEvent`.*
diff --git a/versioned_docs/version-1.22.0/releases/1_22_0.md b/versioned_docs/version-1.22.0/releases/1_22_0.md
@@ -0,0 +1,24 @@
+---
+title: 1.22.0
+sidebar_position: 9935
+---
+
+# 1.22.0 - 2024-09-05
+
+### Added
+* **SQL: add support for `USE` statement with different syntaxes** [`#2944`](https://github.com/OpenLineage/OpenLineage/pull/2944) [@kacpermuda](https://github.com/kacpermuda)  
+    *Adjusts our Context so that it can use the new support for this statement in the parser and pass it to a number of queries.*
+* **Spark: add script to build Spark dependencies** [`#3044`](https://github.com/OpenLineage/OpenLineage/pull/3044) [@arturowczarek](https://github.com/arturowczarek)  
+    *Adds a script to rebuild dependencies automatically following releases.*
+* **Website: versionable docs** [`#3007`](https://github.com/OpenLineage/OpenLineage/pull/3007) [`#3023`](https://github.com/OpenLineage/OpenLineage/pull/3023) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)  
+    *Adds a GitHub action that creates a new Docusaurus version on a tag push, verifiable using the openlineage-site repo. Implements a monorepo approach in a new `website` directory.*
+
+### Fixed
+* **SQL: add support for `SingleQuotedString` in `Identifier()`** [`#3035`](https://github.com/OpenLineage/OpenLineage/pull/3035) [@kacpermuda](https://github.com/kacpermuda)  
+    *Single quoted strings were being treated differently than strings with no quotes, double quotes, or backticks.*
+* **SQL: support `IDENTIFIER` function instead of treating it like table name** [`#2999`](https://github.com/OpenLineage/OpenLineage/pull/2999) [@kacpermuda](https://github.com/kacpermuda)  
+    *Adds support for this identifier in SELECT, MERGE, UPDATE, and DELETE statements. For now, only static identifiers are supported. When a variable is used, this table is removed from lineage to avoid emitting incorrect lineage.*
+* **Spark: fix issue with only one table in inputs from SQL query while reading from JDBC** [`#2918`](https://github.com/OpenLineage/OpenLineage/pull/2918) [@Imbruced](https://github.com/Imbruced)  
+    *Events created did not contain the correct input table when the query contained multiple tables.*
+* **Spark: fix AWS Glue jobs naming for RDD events** [`#3020`](https://github.com/OpenLineage/OpenLineage/pull/3020) [@arturowczarek](https://github.com/arturowczarek)  
+    *The naming for RDD jobs now uses the same code as SQL and Application events.*
diff --git a/versioned_docs/version-1.23.0/releases/1_23_0.md b/versioned_docs/version-1.23.0/releases/1_23_0.md
@@ -0,0 +1,40 @@
+---
+title: 1.23.0
+sidebar_position: 9934
+---
+
+# 1.23.0 - 2024-10-04
+
+### Added
+* **Java: added CompositeTransport** [`#3039`](https://github.com/OpenLineage/OpenLineage/pull/3039) [@JDarDagran](https://github.com/JDarDagran)  
+    *This allows users to specify multiple targets to which OpenLineage events will be emitted.*
+* **Spark extension interfaces: support table extended sources** [`#3062`](https://github.com/OpenLineage/OpenLineage/pull/3062) [@Imbruced](https://github.com/Imbruced)  
+    *Interfaces can now extract lineage from the Table interface, not only from RelationProvider.*
+* **Java: added GCP Dataplex transport** [`#3043`](https://github.com/OpenLineage/OpenLineage/pull/3043) [@ddebowczyk92](https://github.com/ddebowczyk92)  
+    *Dataplex transport is now available as a separate Maven package for users that want to send OL events to GCP Dataplex.*
+* **Java: added Google Cloud Storage transport** [`#3077`](https://github.com/OpenLineage/OpenLineage/pull/3077) [@ddebowczyk92](https://github.com/ddebowczyk92)  
+    *GCS transport is now available as a separate Maven package for users that want to send OL events to Google Cloud Storage.*
+* **Java: added S3 transport** [`#3129`](https://github.com/OpenLineage/OpenLineage/pull/3129) [@arturowczarek](https://github.com/arturowczarek)  
+    *S3 transport is now available as a separate Maven package for users that want to send OL events to S3.*
+* **Java: add option to configure client via environment variables** [`#3094`](https://github.com/OpenLineage/OpenLineage/pull/3094) [@JDarDagran](https://github.com/JDarDagran)  
+    *Specified variables are now autotranslated to configuration values.*
+* **Python: add option to configure client via environment variables** [`#3114`](https://github.com/OpenLineage/OpenLineage/pull/3114) [@JDarDagran](https://github.com/JDarDagran)  
+    *Specified variables are now autotranslated to configuration values.*
+* **Python: add option to add custom headers in HTTP transport** [`#3116`](https://github.com/OpenLineage/OpenLineage/pull/3116) [@JDarDagran](https://github.com/JDarDagran)  
+    *Allows users to add custom headers, for example for authentication.*
+* **Spec: add full dataset dependencies** [`#3097`](https://github.com/OpenLineage/OpenLineage/pull/3097) [`#3098`](https://github.com/OpenLineage/OpenLineage/pull/3098) [@arturowczarek](https://github.com/arturowczarek)  
+    *When `datasetLineageEnabled` is enabled and column-level lineage depends on the whole dataset, a dataset dependency is now added instead of listing every column field in that dataset.*
+* **Java: OpenLineageClient and Transports are now AutoCloseable** [`#3122`](https://github.com/OpenLineage/OpenLineage/pull/3122) [@ddebowczyk92](https://github.com/ddebowczyk92)  
+    *This prevents a number of issues that might be caused by not closing underlying transports.*
+
+### Fixed
+* **Python Facet generator does not validate optional arguments** [`#3054`](https://github.com/OpenLineage/OpenLineage/pull/3054) [@JDarDagran](https://github.com/JDarDagran)  
+    *This fixes an issue where the NominalTimeRunFacet breaks when nominalEndTime is None.*
+* **SQL: report only actually used tables from CTEs, rather than all** [`#2962`](https://github.com/OpenLineage/OpenLineage/pull/2962) [@Imbruced](https://github.com/Imbruced)  
+    *With this change, if a SQL query defines a CTE but does not use it in the final query, lineage for that CTE is no longer falsely reported.*
+* **Fluentd: Enhancing plugin's capabilities** [`#3068`](https://github.com/OpenLineage/OpenLineage/pull/3068) [@jonathanlbt1](https://github.com/jonathanlbt1)  
+    *This change enhances the performance and docs of the Fluentd proxy plugin.*
+* **SQL: fix parser to point to origin table instead of CTEs** [`#3107`](https://github.com/OpenLineage/OpenLineage/pull/3107) [@Imbruced](https://github.com/Imbruced)  
+    *For some complex CTEs, the parser emitted the CTE as a target table instead of the original table. This is now fixed.*
+* **Spark: column lineage correctly produces for merge into command** [`#3095`](https://github.com/OpenLineage/OpenLineage/pull/3095) [@Imbruced](https://github.com/Imbruced)  
+    *OL now produces CLL correctly when there is a view in the middle.*
diff --git a/versioned_docs/version-1.24.2/releases/1_23_0.md b/versioned_docs/version-1.24.2/releases/1_23_0.md
@@ -0,0 +1,40 @@
+---
+title: 1.23.0
+sidebar_position: 9934
+---
+
+# 1.23.0 - 2024-10-04
+
+### Added
+* **Java: added CompositeTransport** [`#3039`](https://github.com/OpenLineage/OpenLineage/pull/3039) [@JDarDagran](https://github.com/JDarDagran)  
+    *This allows users to specify multiple targets to which OpenLineage events will be emitted.*
+* **Spark extension interfaces: support table extended sources** [`#3062`](https://github.com/OpenLineage/OpenLineage/pull/3062) [@Imbruced](https://github.com/Imbruced)  
+    *Interfaces can now extract lineage from the Table interface, not only from RelationProvider.*
+* **Java: added GCP Dataplex transport** [`#3043`](https://github.com/OpenLineage/OpenLineage/pull/3043) [@ddebowczyk92](https://github.com/ddebowczyk92)  
+    *Dataplex transport is now available as a separate Maven package for users that want to send OL events to GCP Dataplex.*
+* **Java: added Google Cloud Storage transport** [`#3077`](https://github.com/OpenLineage/OpenLineage/pull/3077) [@ddebowczyk92](https://github.com/ddebowczyk92)  
+    *GCS transport is now available as a separate Maven package for users that want to send OL events to Google Cloud Storage.*
+* **Java: added S3 transport** [`#3129`](https://github.com/OpenLineage/OpenLineage/pull/3129) [@arturowczarek](https://github.com/arturowczarek)  
+    *S3 transport is now available as a separate Maven package for users that want to send OL events to S3.*
+* **Java: add option to configure client via environment variables** [`#3094`](https://github.com/OpenLineage/OpenLineage/pull/3094) [@JDarDagran](https://github.com/JDarDagran)  
+    *Specified variables are now autotranslated to configuration values.*
+* **Python: add option to configure client via environment variables** [`#3114`](https://github.com/OpenLineage/OpenLineage/pull/3114) [@JDarDagran](https://github.com/JDarDagran)  
+    *Specified variables are now autotranslated to configuration values.*
+* **Python: add option to add custom headers in HTTP transport** [`#3116`](https://github.com/OpenLineage/OpenLineage/pull/3116) [@JDarDagran](https://github.com/JDarDagran)  
+    *Allows users to add custom headers, for example for authentication.*
+* **Spec: add full dataset dependencies** [`#3097`](https://github.com/OpenLineage/OpenLineage/pull/3097) [`#3098`](https://github.com/OpenLineage/OpenLineage/pull/3098) [@arturowczarek](https://github.com/arturowczarek)  
+    *When `datasetLineageEnabled` is enabled and column-level lineage depends on the whole dataset, a dataset dependency is now added instead of listing every column field in that dataset.*
+* **Java: OpenLineageClient and Transports are now AutoCloseable** [`#3122`](https://github.com/OpenLineage/OpenLineage/pull/3122) [@ddebowczyk92](https://github.com/ddebowczyk92)  
+    *This prevents a number of issues that might be caused by not closing underlying transports.*
+
+### Fixed
+* **Python Facet generator does not validate optional arguments** [`#3054`](https://github.com/OpenLineage/OpenLineage/pull/3054) [@JDarDagran](https://github.com/JDarDagran)  
+    *This fixes an issue where the NominalTimeRunFacet breaks when nominalEndTime is None.*
+* **SQL: report only actually used tables from CTEs, rather than all** [`#2962`](https://github.com/OpenLineage/OpenLineage/pull/2962) [@Imbruced](https://github.com/Imbruced)  
+    *With this change, if a SQL query defines a CTE but does not use it in the final query, lineage for that CTE is no longer falsely reported.*
+* **Fluentd: Enhancing plugin's capabilities** [`#3068`](https://github.com/OpenLineage/OpenLineage/pull/3068) [@jonathanlbt1](https://github.com/jonathanlbt1)  
+    *This change enhances the performance and docs of the Fluentd proxy plugin.*
+* **SQL: fix parser to point to origin table instead of CTEs** [`#3107`](https://github.com/OpenLineage/OpenLineage/pull/3107) [@Imbruced](https://github.com/Imbruced)  
+    *For some complex CTEs, the parser emitted the CTE as a target table instead of the original table. This is now fixed.*
+* **Spark: column lineage correctly produces for merge into command** [`#3095`](https://github.com/OpenLineage/OpenLineage/pull/3095) [@Imbruced](https://github.com/Imbruced)  
+    *OL now produces CLL correctly when there is a view in the middle.*
diff --git a/versioned_docs/version-1.24.2/releases/1_24_2.md b/versioned_docs/version-1.24.2/releases/1_24_2.md
@@ -0,0 +1,29 @@
+---
+title: 1.24.2
+sidebar_position: 9933
+---
+
+# 1.24.2 - 2024-11-05
+
+### Added
+* **Spark: Add Dataproc run facet to include jobType property** [`#3167`](https://github.com/OpenLineage/OpenLineage/pull/3167) [@codelixir](https://github.com/codelixir)  
+    *Updates the GCP Dataproc run facet to include the jobType property.*
+* **Add EnvironmentVariablesRunFacet to core spec** [`#3186`](https://github.com/OpenLineage/OpenLineage/pull/3186) [@JDarDagran](https://github.com/JDarDagran)  
+    *Additionally, the Python client now uses EnvironmentVariablesRunFacet directly.*
+* **Add assertions for format in test events** [`#3221`](https://github.com/OpenLineage/OpenLineage/pull/3221) [@JDarDagran](https://github.com/JDarDagran)
+* **Spark: Add integration tests for EMR** [`#3142`](https://github.com/OpenLineage/OpenLineage/pull/3142) [@arturowczarek](https://github.com/arturowczarek)  
+    *The Spark integration now has integration tests for EMR.*
+
+### Changed
+* **Move Kinesis to separate module, migrate HTTP transport to httpclient5** [`#3205`](https://github.com/OpenLineage/OpenLineage/pull/3205) [@mobuchowski](https://github.com/mobuchowski)  
+    *Moves Kinesis integration to a separate module and updates HTTP transport to use HttpClient 5.x.*
+* **Docs: Upgrade docusaurus to 3.6** [`#3219`](https://github.com/OpenLineage/OpenLineage/pull/3219) [@arturowczarek](https://github.com/arturowczarek)
+* **Spark: Limit the Seq size in RddPathUtils::extract()** [`#3148`](https://github.com/OpenLineage/OpenLineage/pull/3148) [@codelixir](https://github.com/codelixir)  
+    *Adds a flag to limit the logs in RddPathUtils::extract() to avoid OutOfMemoryError for large jobs.*
+
+### Fixed
+* **Docs: Fix outdated Spark-related docs** [`#3215`](https://github.com/OpenLineage/OpenLineage/pull/3215) [@mobuchowski](https://github.com/mobuchowski)
+* **Fix docusaurus-mdx-checker errors** [`#3217`](https://github.com/OpenLineage/OpenLineage/pull/3217) [@arturowczarek](https://github.com/arturowczarek)
+* **[Integration/dbt] Parse dbt source tests** [`#3208`](https://github.com/OpenLineage/OpenLineage/pull/3208) [@MassyB](https://github.com/MassyB)  
+    *Consider dbt sources when looking for test results.*
+* **Avoid tests in configurable test** [`#3141`](https://github.com/OpenLineage/OpenLineage/pull/3141) [@pawel-leszczynski](https://github.com/pawel-leszczynski)