Releases: the4thdoctor/pg_chameleon

v2.0.1

13 Jan 23:17

The first maintenance release of pg_chameleon v2 adds a performance improvement in the read replica process when
the variables limit_tables or skip_tables are set.

Previously all the rows were read from the replica stream, as the BinLogStreamReader does not allow tables to be specified in the form
schema_name.table_name. This caused a large amount of useless data to hit the replica log tables, as reported in issue #58.

The private method __store_binlog_event now evaluates the row schema and table and returns a boolean value on whether the row or query
should be stored or not into the log table.
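
A minimal sketch of this kind of filter, assuming limit_tables and skip_tables are lists of schema_name.table_name strings. The function name should_store_event and its signature are illustrative and are not taken from the pg_chameleon code base; the actual private method may apply different rules.

```python
def should_store_event(schema, table, limit_tables=None, skip_tables=None):
    """Return True when a row image for schema.table should reach the log table."""
    qualified_name = "%s.%s" % (schema, table)
    # When limit_tables is set, only the listed tables are kept.
    if limit_tables and qualified_name not in limit_tables:
        return False
    # Assumption: skip_tables wins when a table appears in both lists.
    if skip_tables and qualified_name in skip_tables:
        return False
    return True


if __name__ == "__main__":
    print(should_store_event("sakila", "film", limit_tables=["sakila.film"]))
    print(should_store_event("sakila", "actor", limit_tables=["sakila.film"]))
```

Filtering on the fully qualified name is what lets two tables with the same name in different schemas be treated independently.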

The release also fixes a crash in read replica when an ALTER TABLE added a column of type character varying.

Changelog from v2.0.0

  • Fix for issue #58. Improve the read replica performance by filtering the row images when limit_tables/skip_tables are set.
  • Make the read_replica_stream method private.
  • Fix read replica crash if in alter table a column was defined as character varying

v2.0.0

01 Jan 00:22

This stable release consists of the same code as RC1 with a few usability improvements.

A new option is now available to set the maximum level for the messages to be sent to rollbar.
This is quite useful if we configure a periodic init_replica (e.g. pgsql source type refreshed every hour) and we don't want to fill rollbar with noise.
For example chameleon init_replica --source pgsql --rollbar-level critical will send to rollbar only messages marked as critical.
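
Going by the example above, the option behaves like a severity threshold. The sketch below illustrates that reading only. The names LEVELS and should_send are hypothetical, and the ordering of the four level names is an assumption based on their usual meaning.

```python
# Assumed severity order, lowest first.
LEVELS = ["info", "warning", "error", "critical"]


def should_send(message_level, rollbar_level="info"):
    """Return True when a message is severe enough to be sent to rollbar."""
    return LEVELS.index(message_level) >= LEVELS.index(rollbar_level)


if __name__ == "__main__":
    # With the threshold set to critical, an info message is dropped.
    print(should_send("info", rollbar_level="critical"))
    print(should_send("critical", rollbar_level="critical"))
```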

There is now a command line alias chameleon which is a wrapper for chameleon.py.

A new command enable_replica is now available to enable the source's replica if the source was not stopped cleanly.

Changelog from v2.0rc1

  • Add option --rollbar-level to set the maximum level for the messages to be sent to rollbar. Accepted values: "critical", "error", "warning", "info". The default is "info".
  • Add command enable_replica used to reset the replica status in case of error or unexpected crash
  • Add script alias chameleon along with chameleon.py

v2.0.0.rc1

24 Dec 09:11

This release candidate comes with a few bug fixes and a few usability improvements.

Previously, when adding a table with a replicated DDL having a unique key, the table's creation failed because the fields were
set as NULLable. Now the command works properly.

The system now checks if the MySQL configuration allows the replica when initialising or refreshing replicated entities.

A new class rollbar_notifier was added in order to simplify the message management within the source and engine classes.

Now the commands init_replica, refresh_schema and sync_tables send an info notification to rollbar when they complete successfully or
an error if they don't.

The command sync_tables now allows the special name --tables disabled to have all the tables with replica disabled
resynchronised at once.

Changelog from v2.0beta1

  • Fix for issue #52, when adding a unique key the table's creation fails because of the NULLable field
  • Add check for the MySQL configuration when initialising or refreshing replicated entities
  • Add class rollbar_notifier for simpler message management
  • Add end of init_replica, refresh_schema, sync_tables notification to rollbar
  • Allow --tables disabled when syncing the tables to resynchronise all the tables excluded from the replica

v2.0.0.beta1

10 Dec 13:54
Pre-release

The first beta for the milestone 2.0 fixes a long-standing bug in the replica process and adds more features to the postgresql support.

The race condition fixed was caused by a non-tokenised DDL preceded by row images, causing the collected binlog rows to be added several times to the log_table.
It was quite hard to debug as the only visible effect was a primary key violation on random tables.

The issue occurs if a set of rows smaller than the replica_batch_size is followed by a DDL that is not tokenised (e.g. CREATE TEMPORARY TABLE `foo`; )
which coincides with the end of read from the binary log.
In that case the batch is not closed and the next read replica attempt will restart from the previous position reading and storing again the same set of rows.
When the batch is closed the replay function will eventually fail because of a primary/unique key violation.
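
The effect can be reproduced with a toy model. This is not the project's code: read_batch, replay and DuplicateKey are hypothetical names, the binlog is a plain list of (primary key, value) inserts, and the close_batch flag simply stands in for whether the batch was closed at the end of the read.

```python
class DuplicateKey(Exception):
    """Stands in for a primary/unique key violation during the replay."""


def read_batch(binlog_rows, position, log_table, close_batch):
    """Store every row found after position and return the new start position."""
    log_table.extend(binlog_rows[position:])
    # If the batch is not closed the saved position does not move forward.
    return len(binlog_rows) if close_batch else position


def replay(log_table):
    """Apply the logged inserts to an empty target keyed by primary key."""
    target = {}
    for primary_key, value in log_table:
        if primary_key in target:
            raise DuplicateKey(primary_key)
        target[primary_key] = value
    return target


if __name__ == "__main__":
    binlog_rows = [(1, "a"), (2, "b")]
    log_table = []
    position = read_batch(binlog_rows, 0, log_table, close_batch=False)
    # The second attempt restarts from the same position and stores the rows again.
    position = read_batch(binlog_rows, position, log_table, close_batch=False)
    print(len(log_table))
```

Replaying the resulting log_table raises DuplicateKey on the first repeated key, which matches the primary key violation on seemingly random tables described above.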

The tokeniser now works properly when an ALTER TABLE ADD COLUMN's definition is surrounded by parentheses, e.g. ALTER TABLE foo ADD COLUMN(bar varchar(30));
There are now error handlers when wrong table names, wrong schema names, a wrong source name or wrong commands are specified to chameleon.py.
When running commands that require a source name the system checks if the source is registered.
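
For the parenthesised ALTER TABLE case, one possible way to handle the extra parentheses is sketched below. It is an illustration only and not the tokeniser shipped with pg_chameleon: the regular expression, strip_outer_parentheses and parse_add_column are all hypothetical, and only a single ADD COLUMN clause is handled.

```python
import re

# Matches ALTER TABLE <table> ADD [COLUMN] <definition>[;]
ADD_COLUMN = re.compile(
    r"^\s*ALTER\s+TABLE\s+(?P<table>`?\w+`?)\s+ADD\s+(?:COLUMN\b\s*)?"
    r"(?P<definition>.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def strip_outer_parentheses(text):
    """Remove one pair of parentheses only if it wraps the whole text."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        return text
    depth = 0
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0 and index != len(text) - 1:
                # The first parenthesis closes early, so it is not a wrapper.
                return text
    return text[1:-1].strip()


def parse_add_column(statement):
    """Return table, column and type for a single ADD COLUMN, or None."""
    match = ADD_COLUMN.match(statement)
    if match is None:
        return None
    definition = strip_outer_parentheses(match.group("definition"))
    name, _, column_type = definition.partition(" ")
    return {
        "table": match.group("table").strip("`"),
        "column": name.strip("`"),
        "type": column_type.strip(),
    }


if __name__ == "__main__":
    print(parse_add_column("ALTER TABLE foo ADD COLUMN(bar varchar(30));"))
```

Checking the nesting depth, rather than just trimming the first and last character, keeps the closing parenthesis of varchar(30) intact when the definition is not wrapped.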

The init_replica for source pgsql can now read from a hot standby, but the copy is not consistent as it's not possible to export a snapshot from a hot standby.
Also the init_replica for source pgsql adds the copied tables as fake "replicated tables" for better show_status display.

For the source type pgsql the following restrictions apply.

  • There is no support for real time replica
  • The data copy always happens with the file method
  • The copy_max_memory doesn't apply
  • The type override doesn't apply
  • Only init_replica is currently supported
  • The source connection string requires a database name

Changelog from v2.0alpha3

  • fix a race condition where an unrelated DDL can cause the collected binlog rows to be added several times to the log_table
  • fix regression in write ddl caused by the change of private method
  • fix wrong ddl parsing when a column definition is surrounded by parentheses e.g. ALTER TABLE foo ADD COLUMN(bar varchar(30));
  • error handling for wrong table names, wrong schema names, wrong source name and wrong commands
  • init_replica for source pgsql now can read from a hot standby but the copy is not consistent
  • init_replica for source pgsql adds "replicated tables" for better show_status display
  • check if the source is registered when running commands that require a source name

v2.0.0.alpha3

03 Dec 17:06
Pre-release

Please note this is not a production release. Do not use it in production.

The third and final alpha for the milestone 2.0 fixes some issues and adds more features to the system.

As there are changes in the replica catalog, upgrading from the alpha1 requires a drop_replica_schema
followed by a create_replica_schema. This will drop any existing replica and will require re-adding the sources and
re-initialising them with init_replica.

The system now supports a source type pgsql with the following limitations.

  • There is no support for real time replica
  • The data copy always happens with the file method
  • The copy_max_memory doesn't apply
  • The type override doesn't apply
  • Only init_replica is currently supported
  • The source connection string requires a database name
  • In the show_status detailed command the replicated tables counters are always zero

A stack trace capture is now added to the log and the rollbar message for better debugging.
A new parameter on_error_replay is available for the sources to set whether the replay process should skip the tables or exit on error.
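
The two behaviours can be pictured with the following sketch. It is not the actual replay code: replay_tables and apply_batch are hypothetical, and the values "continue" and "exit" are assumed here for illustration, so check the documentation for the values the parameter really accepts.

```python
def replay_tables(batches, apply_batch, on_error_replay="continue"):
    """Replay each table's rows; return the tables excluded after an error.

    batches maps a table name to its rows and apply_batch is any callable
    that raises an exception when the replay of a table fails.
    """
    excluded = []
    for table, rows in batches.items():
        try:
            apply_batch(table, rows)
        except Exception:
            if on_error_replay == "exit":
                # Stop the whole replay and let the caller see the error.
                raise
            # Otherwise skip the failing table and carry on with the rest.
            excluded.append(table)
    return excluded


if __name__ == "__main__":
    def apply_batch(table, rows):
        if table == "broken":
            raise ValueError("replay failed for %s" % table)

    print(replay_tables({"good": [1], "broken": [2]}, apply_batch))
```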

This release adds the command upgrade_replica_schema for upgrading the replica schema from the version 1.8 to the 2.0.

The upgrade procedure is described in the documentation.

Please read it carefully before any upgrade and back up the schema sch_chameleon before attempting any upgrade.

Changelog from v2.0alpha2

  • Remove limit_tables from binlogreader initialisation. As we can read from multiple schemas we should only exclude the tables, not limit them
  • Fix wrong formatting for default value when altering a field
  • Add upgrade procedure from version 1.8.2 to 2.0
  • Improve error logging and table exclusion in replay function
  • Add stack trace capture to the rollbar and log message when one of the replica daemons crashes
  • Add on_error_replay to set whether the replay process should skip the tables or exit on error
  • Add init_replica support for source type pgsql (EXPERIMENTAL)

v1.8.2

25 Nov 12:03

The version 1.8.2 is a bugfix release for the final release 1.8 of the branch v1.
There are a few bugfixes, and some backports from the version 2.0, which is currently in alpha.

This release upgrades the replica catalogue to the version 1.7, adding a new field t_source_schema to the table t_sources.
The field is used only for the migration to the version 2.0.0 and is updated every time a sourceid is requested from the class pg_engine.

The show_status command now displays the source schema as well.

Changelog from 1.8.1

  • Fix for issue #33 pg can't handle NUL characters in string
  • Fix exception in b64 conversion when saving a discarded row
  • Add t_source_schema in table t_sources, used for the upgrade to the upcoming version 2.0
  • change log line formatting inspired by the super clean look in pgbackrest (thank you guys)
  • Update show_status to display the source schema

v2.0.0.alpha2

18 Nov 11:24
Pre-release

The second alpha of the milestone 2.0 comes after a week of full debugging. This release is more usable and stable than the
alpha1. As there are changes in the replica catalog, upgrading from the alpha1 requires a drop_replica_schema
followed by a create_replica_schema. This will drop any existing replica and will require re-adding the sources and
re-initialising them with init_replica.

The full list of changes is in the CHANGELOG file. However there are a few notable remarks.

There is a detailed display of the show_status command when a source is specified. In particular the number of replicated and
not replicated tables is displayed. Also if any table has been pulled out from the replica it appears at the bottom.

From this release there is an error log which saves the exception's data during the replay phase.
The error log can be queried with the new command show_errors.

A new source parameter replay_max_rows has been added to set the amount of rows to replay.
Previously the value was set by the parameter replica_batch_size. If upgrading from alpha1 you may need to add
this parameter to your existing configuration.

Finally there is a new class called pgsql_source, not yet functional though.
This class will add a very basic support for the postgres source type.
More details will come in the alpha3.

Changelog from v2.0alpha1

  • Fix wrong position when determining the destination schema in read_replica_stream
  • Fix wrong log position stored in the source's high watermark
  • Fix wrong table inclusion/exclusion in read_replica_stream
  • Add source parameter replay_max_rows to set the amount of rows to replay. Previously the value was set by replica_batch_size
  • Fix crash when an alter table affected a table not replicated
  • Fixed issue with alter table during the drop/set default for the column (thanks to psycopg2's sql.Identifier)
  • add type display to source status
  • Add fix for issue #33 cleanup NUL markers from the rows before trying to insert them in PostgreSQL
  • Fix broken save_discarded_row
  • Add more detail to show_status when specifying the source with --source
  • Changed some methods to private
  • ensure the match for the alter table's commands are enclosed by word boundaries
  • add if exists when trying to drop the table in swap tables. previously adding a new table failed because the table wasn't there
  • fix wrong drop enum type when adding a new field
  • add log error for storing the errors generated during the replay
  • add not functional class pgsql_source for source type pgsql
  • allow type_override to be empty
  • add show_errors command for displaying the log error entries
  • add separate logs per source
  • change log line formatting inspired by the super clean look in pgbackrest (thank you guys)

v2.0alpha1

11 Nov 12:05
Pre-release

release notes

Please note this is not a production release. Do not use it in production.

The documentation is available at http://www.pgchameleon.org/documents_v2/index.html

This is the first alpha of the milestone 2.0.
The project has been restructured in many ways thanks to the users' feedback.
Hopefully this will make the system much simpler to use.

The main changes in the version 2 are the following.

The system is compatible with Python 3 only. Python 3 is the future and there is no reason to keep developing things in 2.7.

The system can now read from multiple MySQL schemas in the same database and replicate them into a target PostgreSQL database.
The source and target schema names can be different.
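
The idea can be pictured as a lookup from source schema to target schema. The sketch below is illustrative only: the schema_mappings dictionary, the schema names in it and the map_table helper are assumptions made for this example and are not taken from the pg_chameleon configuration.

```python
def map_table(schema_mappings, source_schema, table_name):
    """Return the qualified PostgreSQL name for a table read from MySQL."""
    try:
        target_schema = schema_mappings[source_schema]
    except KeyError:
        raise ValueError("schema %s is not replicated" % source_schema)
    return "%s.%s" % (target_schema, table_name)


if __name__ == "__main__":
    # Hypothetical mapping: two MySQL schemas land in differently named schemas.
    schema_mappings = {"sales": "pg_sales", "stock": "warehouse"}
    print(map_table(schema_mappings, "stock", "items"))
```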

The system now uses a conservative approach to the replica. The tables which generate errors during the replay are automatically excluded from the replica.

The init_replica process runs in background unless the logging is on the standard output or the debug option is passed to the command line.

The replica process now runs in background with two separate subprocesses, one for the read and one for the replay.
If the logging is on the standard output or the debug option is passed to the command line the main process stays in foreground though.

The system now uses a soft approach when initialising the replica.
The tables are locked only when copied. Their log coordinates will be used by the replica daemon to put the database in a consistent status gradually.

The system can now use the rollbar key and environment to set up the Rollbar integration, for a better error detection.

changelog from version 1.8

  • Python 3 only development
  • Add support for reading from multiple MySQL schemas and restore them into a target PostgreSQL database. The source and target schema names can be different.
  • Conservative approach to the replica. Tables which generate errors are automatically excluded from the replica.
  • Daemonised init_replica process.
  • Daemonised replica process with two separate subprocesses, one for the read and one for the replay.
  • Soft replica initialisation. The tables are locked when needed and stored with their log coordinates. The replica daemon will put the database in a consistent status gradually.
  • Rollbar integration for a simpler error detection.

v1.8.1

04 Nov 15:37

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+.
The system relies on the mysql-replication library to pull the changes from MySQL and convert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
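
As a rough illustration of the conversion step, a row change can be serialised to a JSON document that PostgreSQL can store as jsonb. The layout of the document, the row_to_json and encode_value helpers and the choice of base64 for binary values are assumptions made for this sketch. The format pg_chameleon actually writes may differ.

```python
import base64
import datetime
import decimal
import json


def encode_value(value):
    """Convert values the json module cannot serialise on its own."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()
    if isinstance(value, decimal.Decimal):
        return str(value)
    raise TypeError("cannot serialise %r" % type(value))


def row_to_json(schema, table, action, values):
    """Build the JSON text for one row change (hypothetical layout)."""
    event = {"schema": schema, "table": table, "action": action, "values": values}
    return json.dumps(event, default=encode_value, sort_keys=True)


if __name__ == "__main__":
    print(row_to_json("sakila", "film", "insert", {"id": 1, "title": "foo"}))
```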

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK;

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.8

  • Fix for issue #31, MySQL numeric password breaks pg_chameleon
  • Fix for issue #32, pg Schema name upper case is dropped to lowercase in init_replica
  • Fix for issue #34, Missing escape for table names

v1.8

08 Oct 08:47

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+.
The system relies on the mysql-replication library to pull the changes from MySQL and convert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup which pulls the data from MySQL in read only mode.
This is done by the tool running FLUSH TABLE WITH READ LOCK;

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Changelog from 1.7

  • Fix wrong check in thread alive when running with --thread option
  • Add support for RENAME statement