All notable changes to this project will be documented in this file.
Performance improvements, bugfixes and other minor improvements including one new feature.
- Hook.post_children_start() now returns: [{child_id(), result(), pid(), [node()]}]
- Fixed error handling on process startups and stoppings, returning more information about the failures and partial successes. Example: ProcessHub.start_children/3 and ProcessHub.stop_children/3 with async_wait: true now return {:error, list_of_failures(), list_of_partial_successes()} on failures. This does not affect successful operations.
- Added new option on_failure: :continue | :rollback to the ProcessHub.start_children/3 and ProcessHub.stop_children/3 functions. This option allows the user to specify what should happen if the operation fails.
- ProcessHub.Strategy.Redundancy.Replication was not properly updating the redundancy_signal value on some occasions due to a race condition.
- Fixed issue with ProcessHub.Janitor not purging the cache properly when using Gossip protocol.
- State passing on hotswap migration with graceful shutdown fixed.
- Timeout option was not properly used in some cases.
- Improved typespecs across the codebase.
- Improved overall performance of start/stop/hotswap migration operations which involved synchronization and large amount of message passing by using bulk operations.
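
The failure tuple described above can be handled with plain pattern matching. The following is a minimal sketch, not ProcessHub code: the module name StartResultSketch is hypothetical, the {:ok, results} success shape is an assumption made for illustration, and only the {:error, failures, partial_successes} shape comes from this changelog.

```elixir
defmodule StartResultSketch do
  # Summarizes the result of a bulk start/stop call.
  # ASSUMPTION: successful calls return {:ok, results}; this shape is illustrative only.
  def summarize({:ok, results}) when is_list(results) do
    {:ok, length(results)}
  end

  # Shape taken from the changelog entry: failures plus partial successes.
  def summarize({:error, failures, partial_successes})
      when is_list(failures) and is_list(partial_successes) do
    {:partial, length(failures), length(partial_successes)}
  end
end

# Hypothetical failure and success entries, used only to exercise the function.
StartResultSketch.summarize({:error, [{"child_2", :timeout}], [{"child_1", :started}]})
```

Matching on the three-element tuple lets a caller decide whether to retry only the failed children or to undo the partial successes.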
Added new configuration options for timeout and other static values.
- storage_purge_interval option added to the configuration to allow the user to set the interval for purging the storage table from old data (ttl based).
- migr_base_timeout option added to the configuration to allow the user to set the timeout for the migration process to complete before it is considered failed.
- dsup_max_restarts option added to the configuration to allow the user to set the maximum number of restarts for the distributed supervisor.
- dsup_max_seconds option added to the configuration to allow the user to set the maximum number of seconds for the distributed supervisor to be running before it is considered failed.
- dsup_shutdown_timeout option added to the configuration to allow the user to set the timeout for the distributed supervisor to shutdown before forcefully killing it when receiving a shutdown signal.
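
As a rough illustration, the options listed above could be collected as follows. This is a sketch only: a plain map is used so the snippet runs without the library, every value is a placeholder, and how the options are actually passed to ProcessHub (and in which units) should be checked against the library documentation.

```elixir
# Option names come from the changelog; all values are illustrative placeholders.
hub_options = %{
  storage_purge_interval: 15,
  migr_base_timeout: 15_000,
  dsup_max_restarts: 100,
  dsup_max_seconds: 4,
  dsup_shutdown_timeout: 60_000
}

# Print the option names so the fragment can be sanity-checked in IEx.
hub_options |> Map.keys() |> Enum.sort() |> IO.inspect()
```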
Bug fixes, minor documentation improvements.
- All tests are generating binary child_id's by default instead of atoms to avoid future issues with Keyword lists.
- Hotswap migration was not working properly when child_id was a binary type due to the Keyword.get/3 calls.
- Cluster update was throwing errors when child_id was a binary.
- Minor documentation improvements and logging.
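
The Keyword.get/3 problem mentioned above is easy to reproduce: Keyword functions only accept atom keys, so a binary child_id raises a FunctionClauseError. The sketch below uses a hypothetical module name and hypothetical data and shows the failure next to a key-agnostic lookup with List.keyfind/3. It illustrates the general issue and is not the fix applied in ProcessHub.

```elixir
defmodule ChildLookupSketch do
  # Keyword.get/3 is guarded with is_atom(key), so binary keys raise.
  def keyword_lookup(children, child_id) do
    try do
      {:ok, Keyword.get(children, child_id)}
    rescue
      FunctionClauseError -> :error
    end
  end

  # List.keyfind/3 compares any term, so binary child_ids work.
  def keyfind_lookup(children, child_id) do
    case List.keyfind(children, child_id, 0) do
      {^child_id, value} -> {:ok, value}
      nil -> :error
    end
  end
end

# Hypothetical child list keyed by a binary child_id.
children = [{"child_1", :node_a}]
ChildLookupSketch.keyword_lookup(children, "child_1")
ChildLookupSketch.keyfind_lookup(children, "child_1")
```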
Minor improvements on documentation.
Removed the requirement to start the Erlang node as a distributed node.
It is no longer needed since the blockade library now handles only one event queue per node.
Also added ability to start child processes statically.
- Ability to start child processes statically when starting up the ProcessHub.
- Removed the requirement to start the Erlang node as a distributed node, as it is no longer needed since the blockade library now handles only one event queue per node.
- Hex docs from 0.30.6 -> 0.34.2
This version includes bug fixes, new API functions, and minor improvements to the documentation.
- Replaced Cachex with our custom implementation to enhance performance.
- Updated the default values for max_restarts and max_seconds to 100 and 4, respectively.
- Storage module now accepts the table identifier as the first parameter, allowing it to be used with multiple tables.
- Introduced new API functions get_pids/2 and get_pid/2 to get the pid/s by child_id.
- New guide page for interacting with the process registry.
- Corrected an issue where local supervisor restarts were not properly updating the global registry.
- Fixed various typos and errors in the documentation.
Bugfixes and documentation improvements.
- Internal hook handler ID's.
- Guide page for creating custom strategies.
- More examples on how to use the hook system.
- Child process stopping when child_id was binary type. (Reported and fixed by Peaceful James)
- Documentation errors.
This release was focused on improving the hook system and adding new features to the system.
Some of the callbacks were removed from the strategies and the hooks were introduced to replace them.
All strategies will implement the init callback where they can define the hooks they want to use.
- Hooks are registered as structs with an ID so they are easier to add/remove at runtime.
- Renamed HookManager.register_hook_handlers/3 to HookManager.register_handlers/3.
- Distributor/child_terminate now handles operations in bulk to increase efficiency. The function is renamed to Distributor/children_terminate.
- All hook handlers can specify priority when registering them. Default is 0.
- Function to remove hook handler by its ID: HookManager.cancel_handler/3.
- ProcessHub.Strategy.PartitionTolerance.Base.toggle_lock?/3
- ProcessHub.Strategy.PartitionTolerance.Base.toggle_unlock?/3
- ProcessHub.Strategy.PartitionTolerance.Base.handle_node_up/3
- ProcessHub.Strategy.PartitionTolerance.Base.handle_node_down/3
- ProcessHub.Strategy.Migration.Base.handle_process_startups/3
- ProcessHub.Strategy.Distribution.Base.handle_node_join/2
- ProcessHub.Strategy.Distribution.Base.handle_node_leave/2
- ProcessHub.Strategy.Distribution.Base.handle_shutdown/1
- ProcessHub.Strategy.Redundancy.Base.handle_post_start/3
- ProcessHub.Strategy.Redundancy.Base.handle_post_update/3
- migration hotswap shutdown test case by synchronizing the state update.
- Test.Helper.TestServer.handle_call/3 function parameters by adding the missing _from parameter.
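
The hook handler changes in this release (handlers as structs with an ID, an optional priority defaulting to 0, and removal by ID) can be pictured with a small stand-alone model. Everything in this sketch is hypothetical: the HookSketch module, its functions, and the choice to run higher priorities first are assumptions for illustration, not the HookManager implementation.

```elixir
defmodule HookSketch do
  # Hypothetical handler struct: an ID for later removal, priority defaults to 0.
  defmodule Handler do
    @enforce_keys [:id, :fun]
    defstruct [:id, :fun, priority: 0]
  end

  # Convenience constructor for a handler.
  def handler(id, fun, priority \\ 0) when is_function(fun, 1) do
    %Handler{id: id, fun: fun, priority: priority}
  end

  # Adds a handler and keeps the list sorted, highest priority first.
  # ASSUMPTION: the ordering direction is chosen for illustration only.
  def register(handlers, %Handler{} = handler) do
    Enum.sort_by([handler | handlers], & &1.priority, :desc)
  end

  # Removes a handler by its ID.
  def cancel(handlers, id), do: Enum.reject(handlers, &(&1.id == id))

  # Calls every handler in order, in the caller's process.
  def dispatch(handlers, data), do: Enum.map(handlers, fn h -> h.fun.(data) end)
end

handlers =
  []
  |> HookSketch.register(HookSketch.handler(:log, fn data -> {:log, data} end))
  |> HookSketch.register(HookSketch.handler(:audit, fn data -> {:audit, data} end, 10))

HookSketch.dispatch(handlers, :child_started)
```

Keeping the ID on the struct is what makes removal at runtime a simple filter, without needing a reference to the handler function itself.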
Added new configuration options for timeout values and renamed LocalStorage -> Storage.
Depending on the number of nodes and processes, the default values might not be optimal. The new timeout configuration options enable the user to fine-tune the system for their specific use case.
- Renamed ProcessHub.Service.LocalStorage module to ProcessHub.Service.Storage.
- New timeout configuration for different operations:
  - :hubs_discover_interval
  - :deadlock_recovery_timeout
  - :child_migration_timeout
  - :handover_data_wait
This release introduces process state migration to the next node when the node is shutting down.
Note that it currently only supports the hotswap strategy with graceful shutdown of the node.
Added:
- New callback ProcessHub.Strategy.Migration.Base.handle_startup/3 which will be called once the processes are started on the local distributed supervisor.
- New callback ProcessHub.Strategy.Distribution.Base.handle_shutdown/2 which will be called when the coordinator process is shutting down.
- Hotswap process state is now migrated to the next node when the node is shutting down.
Changed:
- ProcessHub.process_list/2 when used with the local option now returns {child_id, pid} tuples and not {child_id, [pid]} as before.
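
Code that consumed the old local shape may need a small adjustment. The helper below is a hypothetical sketch (ProcessListSketch is not part of ProcessHub) that accepts entries in either shape and returns flat {child_id, pid} tuples, which may be useful while upgrading.

```elixir
defmodule ProcessListSketch do
  # Accepts the old {child_id, [pid]} entries or the new {child_id, pid} entries
  # and always returns a flat list of {child_id, pid} tuples.
  def normalize(entries) when is_list(entries) do
    Enum.flat_map(entries, fn
      {child_id, pids} when is_list(pids) -> Enum.map(pids, &{child_id, &1})
      {child_id, pid} when is_pid(pid) -> [{child_id, pid}]
    end)
  end
end

# The current process stands in for a child pid in this example.
ProcessListSketch.normalize([{"child_1", [self()]}, {"child_2", self()}])
```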
Fixed:
- Replication master node could differ from node to node in some cases.
This release adds some guides and documentation pages and minor documentation fixes.
This release brings many improvements and a few new features.
The main focus was on making the distribution strategy configurable. This has led to the introduction of new strategies and the ability to implement custom strategies for process distribution.
Secondary focus was on improving the performance and reliability of the system. Profiled and optimized the codebase to find the biggest inefficiencies and bottlenecks.
Includes also some bugfixes.
New guides and documentation pages added.
- Hook.registry_pid_inserted/0 no longer returns all node-pid pairs that are inserted but only the ones that are different from the previous state. This gives a better overview of the changes that are made to the registry.
- Integration tests now take into account the replication factor when waiting for the hook messages.
- Moved belongs_to function from redundancy to distribution strategy.
- Replication strategy is now selecting the active nodes with a different algorithm.
- Hooks are no longer spawning new processes and are executed in the context of the caller.
- Removed cluster_nodes parameter from ProcessHub.Strategy.PartitionTolerance.Base.handle_startup function because the strategy can access the nodes from the Cluster module itself when needed.
- Changed :cluster_join -> :post_cluster_join and :cluster_leave -> :post_cluster_leave
- Changed :child_migrated_hook -> :children_migrated_hook and the hook data to {node(), [child_spec()]}
- Improved hot swap migration performance by replacing multiple single operations with bulk operations.
- Support for configurable distribution strategy. This allows the user to switch between predefined strategies or implement their own.
- Added ProcessHub.Strategy.Distribution.Guided strategy which requires manual guidance from the user to decide which nodes should be used for process distribution.
- Created guides.
- Added new hooks: :pre_children_start_hook, :pre_cluster_join, :pre_cluster_leave
- Documentation pages for hooks, strategies and guides.
- New API function ProcessHub.process_list/2
- Reduced parameters on some strategy callbacks.
- Removed hotswap migration retention :none option in favour of integer value.
- Replication strategy :cluster_size option was not counting the local node.
- Tests were failing due to race conditions in some cases.
- The hotswap migration was having difficulties with migrating a large number of processes.
- Documentation fixes.
- Process redistribution on node leave was not cleaning up the local storage properly and some processes were not distributed to other nodes.
Replaced :ets with Cachex for local storage to improve reliability of the system and avoid potential race conditions.
Includes minor bugfixes and code improvements.
Introduced new hook forwarded_migration.
- Increased the timeout value for children redistribution task.
- Increased integration tests load by 10x to improve the reliability of the tests.
- Replaced vanilla :ets with cachex for local storage. This improved the reliability of the system by preventing race conditions in some scenarios.
- Increased the default process sync start timeout from 5000 -> 10000 ms.
- Added custom identifier for local storage table.
- Transactions for synchronization append function to reduce possibility of multiple processes writing to the same table keys at the same time.
- Locking process registry when doing bulk operations to prevent overwriting of the data.
- New hook forwarded_migration which is called when a process is migrated to another node and the migration is forwarded to the new node where the startup is handled.
- Synchronization caused multiple nodes to reply to the caller. This caused anomalies in integration tests.
- Node up and down handlers are no longer blocking operations for the coordinator, to avoid timeout errors when handling a large number of processes.
- The child start responses were in reverse order when a single child was started on multiple nodes.
- Documentation hook keys were not correct and missing the _hook suffix.
Fixed bugs, code improvements, added more documentation, improved tests performance by reusing peer nodes.
- Project description.
- All integration tests start static number of nodes which will be reused rather than start new nodes for each test case. This improved the performance of the test suites.
- Process registry dispatching :registry_pid_insert_hook no longer adds the child_spec to the hook data but rather the child_id.
- Gossip protocol synchronizes only after all the initial synchronization data is collected from all nodes. This improves the overall efficiency of the protocol.
- Redundancy strategy for replication now supports dynamic cluster size option. This can be used to replicate process always on all nodes.
- Examples on how to register hooks.
- WorkerQueue process whose job is to execute jobs. It is used to prevent race conditions and execute jobs in order.
- Process start/stop with binary child_id returned error when used with await/1 function.
- Fixed scenario where the hub was shut down but the running tasks were trying to call exiting processes.
- Dynamic quorum strategy was returning wrong quorum status in some cases.
- Fixed situation where the system was caught in a locked state for a moment due to a race condition in the state handling.
- Gossip protocol interval synchronization was not always taking the data that had the highest timestamp.
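
The idea behind the WorkerQueue item above, running jobs through a single process so they execute one at a time and in arrival order, can be sketched with a plain GenServer. The module and function names below are hypothetical and this is not ProcessHub's WorkerQueue. It only illustrates the serialization technique.

```elixir
defmodule SerialQueueSketch do
  use GenServer

  # Starts the queue process.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Runs the zero-arity function inside the queue process and returns its result.
  # Calls are handled one at a time, in the order the process receives them.
  def run(queue, fun) when is_function(fun, 0), do: GenServer.call(queue, {:run, fun})

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:run, fun}, _from, state), do: {:reply, fun.(), state}
end

{:ok, queue} = SerialQueueSketch.start_link()
SerialQueueSketch.run(queue, fn -> :job_done end)
```

Because a GenServer processes its mailbox sequentially, two jobs submitted through the same queue can never interleave, which is what removes the race.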
Contains minor bugfixes and improvements.
- ProcessHub.child_spec/1 switched static ProcessHub id with dynamic child id.
- Improved existing unit tests.
- Hotswap strategy provides handover callbacks using macro.
- Added partition mode check ups before process startup or stopping. This can prevent some errors where node moves to partition mode before the process starts or stops.
Elixir 1.13-1.15 support added. Includes minor bugfixes, test fixes and documentation updates.
- Added GitHub Actions for automated testing.
- Made sure that ProcessHub is compatible with Elixir 1.13-1.15.
- Added example usage section to the documentation.
- Updated ProcessHub documentation by adding a list of all available strategies.
- Removed unnecessary file .tool-version generated by asdf.
- README.md table of contents links fixed.
- Fixed ProcessHub await/1 function example code formatting.
- Fixed tests for Elixir 1.15 & OTP 26
- Fixed test case which was failing in some cases due to async call being executed before.