Release notes

0.9.9 (September 9, 2024)

Minor version release updating dependencies.

Notable version upgrades

The default Python version used in automated tests is now 3.12
Pandas updated to version 2.2.2
SQLAlchemy updated to the 2.x API
NumPy updated to the 2.x API

0.9.8 (November 20, 2023)

Enhancements

Added experimental support for using SQL to generate anonymising sets of values. This feature is available for all column types except numerical columns.
The make_distinct custom action now works on date columns.
You can now add a column containing the current date and time by using '@sysdate' as a derived column.
Pseudo-CHI numbers can now be generated by passing pseudo_chi as the anonymising set for UUID columns.
Numerical column weights for categorical values are now optional, which should make composing a specification by hand quicker.

Bug fixes

Minor bugs fixed in shift_distribution, make_outlier and make_distinct.
Fixed a bug in regex distribution where the target number of uniques wasn't respected.
Fixed date columns not being recognized when the source data had missing values.

Package version upgrades

Python version changed to 3.10
Pandas updated to version 2.x

0.9.7 (November 15, 2022)

Enhancements

Using Exhibit as an importable library is now easier. Please see the scripting recipe for more details and examples.
anon.db is now called exhibit.db. You can also now use third-party databases to store associated specification and aliasing data, as long as you have the required SQLAlchemy dialect installed. Set the EXHIBIT_DB_SCHEMA and EXHIBIT_DB_URL environment variables and Exhibit will use that database instead of the local exhibit.db.
New custom action & filter pairs: shift_distribution_right / left and COLUMN_NAME with_high / low_frequency
You can now save probabilities for columns you marked as linked in the CLI.
UUID columns can now be generated using incrementing integer values by setting anonymising_set to range. You can also set different seeds for each UUID column.
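A minimal sketch of pointing Exhibit at an external database from Python, for example when using it as an importable library. The connection string and schema name are hypothetical placeholders, and the assumption that EXHIBIT_DB_URL takes a SQLAlchemy database URL is inferred from the note about SQLAlchemy dialects. The resolve_db_url helper is hypothetical: it only mirrors the fallback to exhibit.db described above and is not Exhibit's own code.

```python
import os

# Hypothetical placeholder values: substitute your own server, credentials and schema.
# Assumption: EXHIBIT_DB_URL takes a SQLAlchemy database URL, so the matching
# dialect and driver (here PostgreSQL via psycopg2) must be installed.
os.environ["EXHIBIT_DB_URL"] = "postgresql+psycopg2://user:password@dbhost:5432/synthetic"
os.environ["EXHIBIT_DB_SCHEMA"] = "exhibit"


def resolve_db_url(env=None):
    """Hypothetical helper (not part of Exhibit) mirroring the behaviour described
    above: prefer the external database when EXHIBIT_DB_URL is set, otherwise
    fall back to the local exhibit.db."""
    env = os.environ if env is None else env
    url = env.get("EXHIBIT_DB_URL")
    if url:
        return url, env.get("EXHIBIT_DB_SCHEMA")
    # Illustrative stand-in for the local exhibit.db file.
    return "sqlite:///exhibit.db", None
```

Exporting the same two variables in the shell before launching the CLI should work equally well; setting them from Python is mainly convenient in scripts.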

Bug fixes

Improved the calculation of weights for numerical columns.
Various other minor bug fixes and improvements to error messages.

Package version upgrades

Added scipy and SQLAlchemy as dependencies.

0.9.6 (August 3, 2022)

Enhancements

Added experimental support for using pickled machine learning models as plug-ins. See the Create Exhibit-compatible ML model.ipynb recipe for details.
Added an option to save probabilities of values in columns that are put into the DB.
Added performance and memory benchmarking.
You can now reference custom lookups you added to the DB directly in the specification as long as specification columns and DB columns match.
Added a make_almost_same custom action.

Bug fixes

Fixed a bug where generating a dataset without any categorical columns would give an error.
Fixed a DB bug that gave missing data an equal chance of appearing in columns where the number of unique values exceeded the in-line limit.

Package version upgrades

Added dill as a dependency.

0.9.5 (June 26, 2022)

Enhancements

Added four new custom actions for manipulating time series: given a numerical column and a time series column, they create an artificial skew (left or right) or add a peak or valley.
generate_as_sequence custom action has a new variant that lets you generate repeated sequences of values in the order that they appear in the spec, regardless of the probability vector.
You can now apply a single custom action to multiple columns by providing them as a comma-separated target string. Multiple actions can be given the same way. Custom constraints are processed in the order in which the column names and actions are specified.

Bug fixes

Fixed an issue where custom constraints wouldn't always respect original column types (float or Int64).
Fixed an issue where column values generated from a regular expression pattern were inadvertently repeated under certain conditions.
Fixed a bug with missing values in user linked columns.
Fixed a bug that could result in linked column groups being in different order when re-running the generation of the same specification.

Package version upgrades

numpy bumped to 1.22.

0.9.4 (June 6, 2022)

Enhancements

Added experimental support for generating geospatial data. You can now generate point geometry with latitude and longitude coordinates sampled from H3 hexagons.
Additionally, you can create random but geographically valid regions to match the partitions in the data, using a new custom action called geo_make_regions.
You can now add noise to user linked column groups. Rather than mirroring the original relationships exactly, links can be formed between random column values based on a specified probability.
Added a new custom action: generate_as_sequence. This action is useful when generated values must follow a specific order in a partition, like vaccine doses administered to an individual: "full schedule" before "booster". This is different from sorting because sorting happens after the data has been generated, whereas generate_as_sequence ensures that "booster" is never generated by itself: only when preceded by "full schedule".
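To illustrate the guarantee generate_as_sequence gives, here is a conceptual sketch in plain Python. It is not Exhibit's implementation: the function name, the uniform choice of how many values each partition receives, and the fixed seed are all assumptions made for the example. Each partition receives a prefix of the ordered values, so a later value can never appear without the earlier ones.

```python
import random


def generate_as_sequence(partitions, order, seed=0):
    """Hypothetical sketch: for each partition (e.g. one individual), emit a
    prefix of `order`, so a later value such as "booster" never appears unless
    every earlier value ("full schedule") precedes it in that partition."""
    rng = random.Random(seed)
    rows = []
    for partition in partitions:
        # Assumption: the number of values per partition is drawn uniformly.
        length = rng.randint(1, len(order))
        for value in order[:length]:
            rows.append((partition, value))
    return rows


rows = generate_as_sequence(
    ["patient_1", "patient_2", "patient_3"], ["full schedule", "booster"]
)
```

Sorting the rows after generation could not give this guarantee: if a partition had independently been assigned only "booster", sorting would leave it there without a "full schedule" row.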

Bug fixes

Fixed an issue where, when generating missing data, missing values could appear in the same rows across different columns rather than being generated independently.
Fixed a number of issues around nullable integer type.
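The difference between correlated and independent missing values can be shown with a short sketch. This is a plain-Python illustration under assumed inputs (a dict of column lists and a per-column missing probability), not Exhibit's code; add_missing is a hypothetical name. The point is that every column gets its own random draws rather than reusing one shared mask.

```python
import random


def add_missing(table, miss_prob, seed=0):
    """Hypothetical sketch: blank out values in each column independently.
    `table` maps column name -> list of values; `miss_prob` maps column
    name -> probability that a cell in that column is missing."""
    rng = random.Random(seed)
    result = {}
    for column, values in table.items():
        p = miss_prob.get(column, 0.0)
        # A fresh draw for every cell of every column, so which rows are
        # missing in one column says nothing about another column.
        result[column] = [None if rng.random() < p else v for v in values]
    return result
```

Reusing a single mask for all columns would reproduce the old behaviour, with missing values lining up in the same rows.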

Package version upgrades

pandas bumped to 1.4.2.
numpy bumped to 1.21.5.
PyYAML bumped to 6.0.

0.9.3 (April 19, 2022)

Bug fixes

When asking Exhibit to generate a specification from a dataset that didn't contain any numerical columns, the resulting specification was missing probability information for categorical columns below the in-line limit.

Enhancements

Revised the specification of custom constraints (previously called conditional constraints). Now you can specify the subset (filter) of the data, the partition, and one or more columns to be affected by the constraints. In addition to make_null, make_not_null and make_outlier, there are 4 new constraints available: make_same, make_distinct, sort_ascending and sort_descending.
Added an option to generate UUID columns. If your source dataset includes record-level data with unique identifiers, you can exclude them from processing and generate them separately. UUID columns work differently from normal categorical columns: you specify the probability of each unique value appearing a given number of times in your synthetic dataset. See the uuid_demo.yml and uuid_anon.csv files for examples.
Added an option to designate numerical columns like age or dose number as categorical. List such columns after the --discrete_columns flag in the CLI.
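A sketch of what "probabilities of the frequency of each unique value" means in practice, in plain Python. The function name and the dict format for frequency probabilities are hypothetical and do not reflect the uuid_demo.yml syntax. Each new identifier is assigned a number of rows drawn from the frequency distribution, and identifiers are added until the requested row count is reached.

```python
import random
import uuid


def generate_uuid_column(num_rows, frequency_probs, seed=0):
    """Hypothetical sketch: `frequency_probs` maps "how many rows a UUID
    appears in" to the probability of that frequency, e.g. {1: 0.5, 2: 0.3, 3: 0.2}."""
    rng = random.Random(seed)
    frequencies = list(frequency_probs)
    weights = list(frequency_probs.values())
    column = []
    while len(column) < num_rows:
        # Draw how many rows this new identifier should occupy.
        times = rng.choices(frequencies, weights=weights)[0]
        # Seeded, reproducible version-4 UUID built from random bits.
        new_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        column.extend([new_id] * times)
    # The last identifier may be cut short to hit the exact row count.
    return column[:num_rows]
```

Here the repeated identifiers sit in adjacent rows; a real generator would presumably spread them across the dataset, for example with a final shuffle.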

0.9.2 (February 14, 2022)

Bug fixes

Fixed an RNG-related bug that could result in slightly different datasets being generated on Linux and Windows from the same specification.

Enhancements

You can now use Exhibit as an importable library, not just as a CLI program. See recipes/exhibit_scripting.py for examples of the basic API.
Exhibit now correctly handles columns composed entirely of boolean values. For the purposes of dataset generation they are treated as categorical rather than numerical values.

0.9.1 (December 5, 2021)

Hotfix for a Windows-specific bug related to SQLite3 type adaptors.

0.9.0 (December 4, 2021)

First beta release ready for limited use in production.
