v1.4.1 #492
- conditionally load extensions based on environment variable `DUCKDB_USE_INSTALLED_EXTENSIONS`
- improves flexibility and avoids redundant installation when extensions are already present
- added SLING_DUCKDB_COMPUTE environment variable to allow disabling duckdb computation for tasks. This allows for easier testing and debugging scenarios where duckdb may not be desired or available.
- added pipeline configuration file support
- implemented pipeline execution logic
- added pipeline tests
- updated CLI to support pipeline configuration
- updated documentation to reflect pipeline functionality
- added pipeline state management
- added support for pipeline steps
- changed the top-level key from `pipeline` to `steps` in the YAML configuration for pipeline definitions.
- updated the `LoadPipelineConfig` function to correctly parse the `steps` key instead of `pipeline`.
- this ensures compatibility with the updated YAML structure and prevents errors during pipeline loading.
- Refactor runtime state handling to use a consistent `state` map instead of `hooks` map.
- Update pipeline and replication configurations to utilize the new `state` map.
- Modify `SetStateData` and `SetStateKeyValue` functions in `RuntimeState` interface.
- Adjust test YAML files to reflect the state changes.
- Enhance hook execution to incorporate the new state management.
- Updated the List method across all file system clients (Azure, FTP, Google, Local, S3, SFTP) to support glob pattern matching.
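A minimal sketch of glob filtering over a listing, assuming standard-library `path.Match` semantics. The helper `filterByGlob` is hypothetical; the glob library and matching rules the file system clients actually use may differ (for example, in whether `*` crosses `/`).

```go
package main

import (
	"fmt"
	"path"
)

// filterByGlob is a hypothetical helper, not the project's List method.
// It keeps only the paths that match pattern under path.Match rules,
// where '*' matches any run of characters except '/'.
func filterByGlob(pattern string, paths []string) ([]string, error) {
	var matched []string
	for _, p := range paths {
		ok, err := path.Match(pattern, p)
		if err != nil {
			// path.Match reports malformed patterns as an error.
			return nil, err
		}
		if ok {
			matched = append(matched, p)
		}
	}
	return matched, nil
}

func main() {
	listing := []string{"data/a.csv", "data/b.json", "data/sub/c.csv"}
	matched, err := filterByGlob("data/*.csv", listing)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(matched) // prints [data/a.csv]
}
```

Under these rules `data/sub/c.csv` is excluded because `*` does not span directory separators.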
- Fixed a bug in the `CopyRecursive` function where the destination path was incorrectly constructed, leading to potential issues when the `toPath` already ended with a `/`. The fix uses `strings.TrimSuffix` to remove any trailing `/` from `toPath` before appending the relative path.
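The trailing-slash fix described above can be sketched as follows. `joinDest` is a hypothetical helper written for illustration, not the body of `CopyRecursive`; only the use of `strings.TrimSuffix` on `toPath` comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// joinDest is a hypothetical helper. It trims a trailing "/" from toPath
// before appending the relative path, so the result never contains "//"
// at the join point.
func joinDest(toPath, relPath string) string {
	base := strings.TrimSuffix(toPath, "/")
	// Trimming a leading "/" from relPath is an extra assumption made here.
	rel := strings.TrimPrefix(relPath, "/")
	return base + "/" + rel
}

func main() {
	// Both calls print s3://bucket/out/a/b.csv
	fmt.Println(joinDest("s3://bucket/out/", "a/b.csv"))
	fmt.Println(joinDest("s3://bucket/out", "a/b.csv"))
}
```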
- Fixed a bug where relative paths were not correctly calculated during recursive file copy, leading to incorrect destination paths.
- Improved handling of both single files and nested files. The new logic correctly determines the relative path in all cases.
- Added checks to ensure correct destination path construction when `toPath` is a directory.
- Updated `ExecuteOnDone` in `Hook` and `Step` interfaces to return an `OnFailType` and an error.
- Modified `Hooks.Execute` and `Pipeline.Execute` to handle the returned `OnFailType` and error appropriately. This improves error handling and allows for more granular control over failure scenarios.
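One way a caller might act on the returned `OnFailType` is sketched below. Only the names `OnFailType`, `Step`, and `ExecuteOnDone` come from the commit message; the method signatures, the `abort`/`warn` values, `funcStep`, and `runSteps` are assumptions made for illustration and are not the project's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// OnFailType values here are assumed for the sketch.
type OnFailType string

const (
	OnFailAbort OnFailType = "abort"
	OnFailWarn  OnFailType = "warn"
)

// Step mirrors the idea from the commit: ExecuteOnDone reports how a
// failure should be treated, together with the error itself.
type Step interface {
	Execute() error
	ExecuteOnDone(err error) (OnFailType, error)
}

// funcStep is a hypothetical Step backed by a closure.
type funcStep struct {
	run    func() error
	onFail OnFailType
}

func (s funcStep) Execute() error { return s.run() }

func (s funcStep) ExecuteOnDone(err error) (OnFailType, error) {
	return s.onFail, err
}

var errBoom = errors.New("boom")

// runSteps runs steps in order. A failing step marked "warn" is recorded
// and execution continues; any other failure stops the run.
func runSteps(steps []Step) (warnings []error, err error) {
	for _, s := range steps {
		onFail, stepErr := s.ExecuteOnDone(s.Execute())
		if stepErr == nil {
			continue
		}
		if onFail == OnFailWarn {
			warnings = append(warnings, stepErr)
			continue
		}
		return warnings, stepErr
	}
	return warnings, nil
}

func main() {
	steps := []Step{
		funcStep{run: func() error { return nil }, onFail: OnFailAbort},
		funcStep{run: func() error { return errBoom }, onFail: OnFailWarn},
		funcStep{run: func() error { return errBoom }, onFail: OnFailAbort},
	}
	warnings, err := runSteps(steps)
	fmt.Println(len(warnings), err) // prints 1 boom
}
```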
- Added lower and upper case versions of schema and table names to the StreamState struct.
- Updated StateSet function to populate these new fields with the corresponding values from the format map.
- This enhancement provides more flexibility and consistency in handling schema and table names.
- The lower and upper case versions can be used for case-insensitive comparisons or other operations where case sensitivity is not required.
- Added specific handling for casting boolean strings ('true'/'false') to integers (1/0) in SQL Server queries. This addresses potential data type mismatch issues when selecting boolean columns into integer columns.
- Correctly handle cases where a string column is cast to an integer, including non-boolean values. The previous implementation only handled 'true' and 'false' strings, causing incorrect results for other string values. This change adds an `else {col}` clause to handle these cases.
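The two commits above describe a cast expression that maps `'true'`/`'false'` to `1`/`0` and falls back to the column itself. A sketch of how such an expression could be built is shown below; `boolStringToIntExpr` is a hypothetical helper, and the exact SQL template used for SQL Server is not shown in this pull request, so the text of the `case` expression is an assumption.

```go
package main

import "fmt"

// boolStringToIntExpr is a hypothetical helper that renders a CASE
// expression for a column. The final "else" branch passes other values
// through unchanged, which is the behavior the second commit adds.
// The column name is inserted as-is; quoting is the caller's job.
func boolStringToIntExpr(col string) string {
	return fmt.Sprintf(
		"case when %[1]s = 'true' then 1 when %[1]s = 'false' then 0 else %[1]s end",
		col,
	)
}

func main() {
	fmt.Println(boolStringToIntExpr("[active]"))
}
```

Without the `else` branch, a `case` expression yields `NULL` for any value that matches neither `'true'` nor `'false'`, which is consistent with the incorrect results the second commit mentions.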
- added `stream_table_lower` and `stream_table_upper` variables to support specifying a range of tables for replication
- added support for chunking large datasets using the `chunk_size` option in the replication configuration
- implemented `ChunkByColumnRange` function to generate chunk ranges based on the specified column and size
- updated replication process to handle chunked streams
- added tests for chunking functionality
- improved error handling and logging
- updated documentation to reflect the new chunking feature
- added new config option `parallel_chunks` to control how many chunks to run in parallel.
- adjusted test data and scripts to accommodate chunking
- optimized chunking process to reduce memory usage and improve performance
- updated test cases for improved coverage and accuracy
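To illustrate the idea of generating chunk ranges from a column's bounds and a chunk size, here is a minimal sketch for an integer column. The `chunkRange` type and the `chunkRanges` and `where` functions are hypothetical; the real `ChunkByColumnRange` signature, its supported column types, and its boundary conventions are not shown in this pull request.

```go
package main

import (
	"errors"
	"fmt"
)

// chunkRange is an inclusive [Start, End] range over an integer column.
type chunkRange struct {
	Start, End int64
}

// chunkRanges splits [lo, hi] into consecutive inclusive ranges of at most
// size values each. Overflow near the int64 limits is not handled here.
func chunkRanges(lo, hi, size int64) ([]chunkRange, error) {
	if size <= 0 {
		return nil, errors.New("chunk size must be positive")
	}
	if hi < lo {
		return nil, errors.New("upper bound is below lower bound")
	}
	var out []chunkRange
	for start := lo; start <= hi; start += size {
		end := start + size - 1
		if end > hi {
			end = hi // the last chunk may be shorter
		}
		out = append(out, chunkRange{Start: start, End: end})
	}
	return out, nil
}

// where renders a filter for one chunk; col is inserted unquoted.
func (c chunkRange) where(col string) string {
	return fmt.Sprintf("%s >= %d and %s <= %d", col, c.Start, col, c.End)
}

func main() {
	chunks, err := chunkRanges(1, 10, 4)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, c := range chunks {
		fmt.Println(c.where("id"))
	}
}
```

With bounds 1 to 10 and a size of 4, this yields the ranges 1-4, 5-8, and 9-10. Each range can then be processed as an independent stream, which is what makes a setting like `parallel_chunks` possible.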
- disable default pool behavior to avoid unnecessary connection buildup
- improve connection management for source/target options
- TODO: refactor metadata passing for better connection management
- Updated the expected output for the `sling run` command in the test suite.
- Fixed a discrepancy in the expected output string for test cases 69 and 70.
- Improved test case accuracy and reliability.
- Added a new test for sling pipeline 02, which includes sftp and aws s3 data transfers.
- Added a new test for sling pipeline 01 to improve test coverage.
Major Changes
Pipeline Implementation
- `Pipeline` type for executing sequential steps
State Management Refactoring
- `Hooks` to `State` in runtime state management
- `RuntimeState` interface
- `ReplicationState` and `PipelineState` implementations
Chunking Functionality
- `ProcessChunks()` method for handling data partitioning
Case Handling Improvements
- `stream_schema_lower/upper`, `stream_table_lower/upper`
File System Updates
Minor Changes