v1.0.0rc1
Pre-release, Sep 17, 2021
Release candidate for version 1.0
What's New in this Release
For additional documentation, check out the 1.0 transition guide.
Adding Interesting Values
To add interesting values for a single dataframe, call EntitySet.add_interesting_values and pass the name of the dataframe for which interesting values should be added in the dataframe_name parameter.
>>> es.add_interesting_values(dataframe_name='log')
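Conceptually, interesting values are a small set of frequent values in a categorical column. DFS can use them to build conditional ("where") aggregation features. The sketch below is a simplified, hypothetical approximation of that idea. The helper name pick_interesting_values and its max_values/min_fraction thresholds are assumptions, not Featuretools' exact selection rule. It keeps the most common values of a column and skips values that are too rare to be useful as a filter.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def pick_interesting_values(
    values: Iterable[Hashable], max_values: int = 5, min_fraction: float = 0.05
) -> List[Hashable]:
    """Return up to max_values frequent values, most common first.

    Hypothetical helper: a value that makes up less than min_fraction of the
    non-missing rows is skipped as too rare to be a useful filter.
    """
    counts = Counter(v for v in values if v is not None)  # ignore missing values
    total = sum(counts.values())
    chosen = []
    for value, count in counts.most_common(max_values):
        if total and count / total >= min_fraction:
            chosen.append(value)
    return chosen


# Example: device types from a hypothetical 'log' table
devices = ["mobile", "desktop", "mobile", "tablet", "mobile", "desktop", None]
print(pick_interesting_values(devices, max_values=2))
```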
Setting a Secondary Time Index
To set a secondary time index for a specific dataframe, call EntitySet.set_secondary_time_index. Pass the name of the dataframe on which to set the secondary time index, along with a dictionary that maps each secondary time index column to the list of columns to which that secondary time index applies.
>>> customers_secondary_time_index = {'cancel_date': ['cancel_reason']}
>>> es.set_secondary_time_index(dataframe_name='customers', secondary_time_index=customers_secondary_time_index)
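The mapping above says that the value in cancel_reason only becomes known at the time stored in cancel_date. It therefore should not be used for cutoff times before that moment. The stdlib-only sketch below models that rule on a single row. The helper mask_before_secondary_time is hypothetical, and Featuretools' internal handling (including the exact boundary comparison) may differ.

```python
from datetime import datetime
from typing import Any, Dict, List


def mask_before_secondary_time(
    row: Dict[str, Any],
    secondary_time_index: Dict[str, List[str]],
    cutoff: datetime,
) -> Dict[str, Any]:
    """Return a copy of row with values that were not yet known at cutoff hidden.

    For each secondary time index column: if its timestamp is missing or later
    than the cutoff, that column and the columns mapped to it are set to None.
    """
    masked = dict(row)  # never mutate the caller's row
    for time_col, dependent_cols in secondary_time_index.items():
        known_at = row.get(time_col)
        if known_at is None or known_at > cutoff:
            for col in [time_col, *dependent_cols]:
                masked[col] = None
    return masked


customer = {"id": 1, "cancel_date": datetime(2021, 3, 1), "cancel_reason": "price"}
sti = {"cancel_date": ["cancel_reason"]}
print(mask_before_secondary_time(customer, sti, datetime(2021, 2, 1)))
```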
Creating a Relationship and Adding to an EntitySet
Relationships are now created by passing the EntitySet along with four strings naming the parent dataframe, parent column, child dataframe, and child column. Specifying parameter names is optional.
>>> from featuretools import Relationship
>>> new_relationship = Relationship(
... entityset=es,
... parent_dataframe_name='customers',
... parent_column_name='id',
... child_dataframe_name='sessions',
... child_column_name='customer_id'
... )
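In this example customers is the parent and sessions is the child. Each value in the parent column customers.id identifies one customer, while the child column sessions.customer_id may repeat that value across many sessions. The sketch below models the four names with a hypothetical RelationshipSpec dataclass (not Featuretools' Relationship class). It uses the spec to group made-up child rows under their parent rows, illustrating the one-to-many direction.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class RelationshipSpec:
    """Hypothetical stand-in holding the four names a Relationship takes."""
    parent_dataframe_name: str
    parent_column_name: str
    child_dataframe_name: str
    child_column_name: str


def group_children(
    spec: RelationshipSpec, tables: Dict[str, List[Dict[str, Any]]]
) -> Dict[Any, List[Dict[str, Any]]]:
    """Map each parent key to the child rows that reference it (one-to-many)."""
    parent_rows = tables[spec.parent_dataframe_name]
    parent_keys = [row[spec.parent_column_name] for row in parent_rows]
    if len(parent_keys) != len(set(parent_keys)):
        # The parent column identifies rows, so its values must be unique.
        raise ValueError("parent column values must be unique")
    children: Dict[Any, List[Dict[str, Any]]] = {key: [] for key in parent_keys}
    for row in tables[spec.child_dataframe_name]:
        key = row[spec.child_column_name]
        if key in children:  # child rows pointing at unknown parents are skipped
            children[key].append(row)
    return children


tables = {
    "customers": [{"id": 1}, {"id": 2}],
    "sessions": [
        {"id": 10, "customer_id": 1},
        {"id": 11, "customer_id": 1},
        {"id": 12, "customer_id": 2},
        {"id": 13, "customer_id": 99},
    ],
}
spec = RelationshipSpec("customers", "id", "sessions", "customer_id")
grouped = group_children(spec, tables)
print({k: [s["id"] for s in v] for k, v in grouped.items()})
```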
Relationships can now be added to EntitySets in one of two ways. The first approach is to pass in the names of the parent dataframe, parent column, child dataframe, and child column. Specifying parameter names is optional with this approach.
>>> es.add_relationship(
... parent_dataframe_name='customers',
... parent_column_name='id',
... child_dataframe_name='sessions',
... child_column_name='customer_id'
... )
Relationships can also be added by passing in a previously created Relationship object. When using this approach, the relationship parameter name must be included.
>>> es.add_relationship(relationship=new_relationship)
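One plausible reason the keyword is required is that, as we understand the 1.0 signature, the four name parameters come before relationship in add_relationship. A Relationship passed positionally would therefore be bound to parent_dataframe_name. The sketch below mimics that calling convention with a hypothetical function (add_relationship_sketch) and a hypothetical Rel tuple. It is not the Featuretools implementation.

```python
from typing import NamedTuple, Optional


class Rel(NamedTuple):
    """Hypothetical container for the four relationship names."""
    parent_dataframe_name: str
    parent_column_name: str
    child_dataframe_name: str
    child_column_name: str


def add_relationship_sketch(
    parent_dataframe_name: Optional[str] = None,
    parent_column_name: Optional[str] = None,
    child_dataframe_name: Optional[str] = None,
    child_column_name: Optional[str] = None,
    relationship: Optional[Rel] = None,
) -> Rel:
    """Accept either four names or a prebuilt object (by keyword), not both."""
    names = (parent_dataframe_name, parent_column_name,
             child_dataframe_name, child_column_name)
    if relationship is not None:
        if any(n is not None for n in names):
            raise ValueError("pass either the four names or relationship=, not both")
        return relationship
    if not all(isinstance(n, str) for n in names):
        # A Rel passed positionally lands in parent_dataframe_name and fails here.
        raise TypeError("all four names must be strings when relationship= is not used")
    return Rel(*names)


r = Rel("customers", "id", "sessions", "customer_id")
print(add_relationship_sketch(relationship=r) == add_relationship_sketch(*r))
```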
Replace DataFrame
To replace a dataframe in an EntitySet with a new dataframe, call EntitySet.replace_dataframe
and pass in the name of the dataframe to replace along with the new data.
>>> es.replace_dataframe(dataframe_name='log', df=df)
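Because replace_dataframe is meant to keep the existing typing information for the table, the new data is presumably expected to carry the same columns as the old one. The check below is a hypothetical, stdlib-only sketch of that kind of validation, not the library's code. The helper check_replacement_columns compares only the set of column names, not their order.

```python
from typing import Iterable, Sequence


def check_replacement_columns(old_columns: Sequence[str], new_columns: Iterable[str]) -> None:
    """Raise ValueError unless new_columns has exactly the names in old_columns.

    Hypothetical helper: column order is ignored; only names are compared.
    """
    new_columns = list(new_columns)
    missing = [c for c in old_columns if c not in new_columns]
    extra = [c for c in new_columns if c not in old_columns]
    if missing or extra:
        raise ValueError(f"missing columns: {missing}; unexpected columns: {extra}")


check_replacement_columns(["id", "value"], ["value", "id"])  # same names: no error
```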
List Logical Types and Semantic Tags
Logical types and semantic tags have replaced variable types for parsing and interpreting columns. You can list all the available logical types by calling featuretools.list_logical_types.
>>> import featuretools as ft
>>> ft.list_logical_types()
You can list all the available semantic tags by calling featuretools.list_semantic_tags.
>>> ft.list_semantic_tags()
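In Woodwork's type system, each column pairs one logical type with a set of semantic tags. Some tags come by default with the logical type (for example, numeric for Integer and Double, and category for Categorical). Others, such as index, time_index, or foreign_key, describe the column's role. The sketch below models that pairing with a hypothetical ColumnTyping dataclass and a small, deliberately incomplete STANDARD_TAGS table. It is not Woodwork's ColumnSchema.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set

# Hypothetical, incomplete table of default ("standard") tags per logical type.
STANDARD_TAGS: Dict[str, FrozenSet[str]] = {
    "Integer": frozenset({"numeric"}),
    "Double": frozenset({"numeric"}),
    "Categorical": frozenset({"category"}),
    "Boolean": frozenset(),
    "Datetime": frozenset(),
}


@dataclass
class ColumnTyping:
    """Hypothetical stand-in pairing one logical type with semantic tags."""
    logical_type: str
    extra_tags: Set[str] = field(default_factory=set)

    @property
    def semantic_tags(self) -> Set[str]:
        # Standard tags implied by the logical type plus any user-added tags.
        return set(STANDARD_TAGS.get(self.logical_type, frozenset())) | self.extra_tags


customer_id = ColumnTyping("Integer", {"foreign_key"})
print(sorted(customer_id.semantic_tags))
```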
Breaking Changes
- Entity.add_interesting_values has been removed. To add interesting values for a single dataframe, call EntitySet.add_interesting_values and pass the name of the dataframe for which to add interesting values in the dataframe_name parameter (#1405, #1370).
- Entity.set_secondary_time_index has been removed and replaced by EntitySet.set_secondary_time_index, with an added dataframe_name parameter to specify the dataframe on which to set the secondary time index (#1405, #1370).
- Relationship initialization has been updated to accept four name values for the parent dataframe, parent column, child dataframe, and child column instead of accepting two Variable objects (#1405, #1370).
- EntitySet.add_relationship has been updated to accept dataframe and column name values or a Relationship object. Adding a relationship from a Relationship object now requires passing the relationship as a keyword argument (#1405, #1370).
- Entity.update_data has been removed. To update the dataframe, call EntitySet.replace_dataframe and use the dataframe_name parameter (#1630, #1522).
- The data in an EntitySet is no longer stored in Entity objects. Instead, dataframes with Woodwork typing information are used. Accordingly, most language referring to “entities” now refers to “dataframes”, references to “variables” now refer to “columns”, and “variable types” are replaced by the Woodwork type system’s “logical types” and “semantic tags” (#1405).
- The dictionary of tuples passed to EntitySet.__init__ has replaced the variable_types element with separate logical_types and semantic_tags dictionaries (#1405).
- EntitySet.entity_from_dataframe no longer exists. To add new tables to an entityset, use EntitySet.add_dataframe (#1405).
- EntitySet.normalize_entity has been renamed to EntitySet.normalize_dataframe (#1405).
- Featuretools no longer raises an error at EntitySet.add_relationship when the dtypes of the parent and child columns do not match. Instead, it checks whether the Woodwork logical types of the parent and child columns match. If they do not, a warning is raised, and Featuretools attempts to update the logical type of the child column to match the parent’s (#1405).
- If no index is specified at EntitySet.add_dataframe, the first column will only be used as the index if Woodwork has not been initialized on the DataFrame. When adding a dataframe that already has Woodwork initialized, an error will be raised if no index is set (#1405).
- Featuretools will no longer re-order columns in DataFrames so that the index column is the first column of the DataFrame (#1405).
- Type inference can now be performed on Dask and Koalas dataframes, though a warning will be issued indicating that this may be computationally intensive (#1405).
- EntitySet.time_type is no longer stored as Variable objects. Instead, Woodwork typing is used: a numeric time type is indicated by the 'numeric' semantic tag string, and a datetime time type is indicated by the Datetime logical type (#1405).
- last_time_index, secondary_time_index, and interesting_values are no longer attributes of an entityset’s tables that can be accessed directly. Instead, they must be accessed through the metadata of the Woodwork DataFrame, which is a dictionary (#1405).
- The helper function list_variable_types will be removed in a future release and replaced by list_logical_types. In the meantime, list_variable_types will return the same output as list_logical_types (#1447).
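The add_relationship change above replaces a hard error with a warning plus an attempted update of the child column's logical type. The sketch below illustrates that control flow with Python's warnings module. The function name reconcile_child_type, the "table.column" keys, and the plain-string type names are hypothetical simplifications. The sketch only relabels the type and does not model converting the underlying data, which could fail in practice.

```python
import warnings
from typing import Dict


def reconcile_child_type(types: Dict[str, str], parent_col: str, child_col: str) -> Dict[str, str]:
    """Return a copy of types in which the child column takes the parent's logical type.

    Hypothetical sketch: types maps "table.column" keys to logical type names.
    A mismatch triggers a warning instead of an error; then the child is aligned.
    """
    updated = dict(types)  # leave the caller's mapping untouched
    parent_type, child_type = types[parent_col], types[child_col]
    if parent_type != child_type:
        warnings.warn(
            f"{child_col} is {child_type} but {parent_col} is {parent_type}; "
            f"updating {child_col} to {parent_type}",
            UserWarning,
        )
        updated[child_col] = parent_type
    return updated


types = {"customers.id": "Integer", "sessions.customer_id": "Double"}
print(reconcile_child_type(types, "customers.id", "sessions.customer_id"))
```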
Changelog
- Enhancements
- Add support for creating EntitySets from Woodwork DataTables (#1277)
- Add EntitySet.__deepcopy__ that retains Woodwork typing information (#1465)
- Add EntitySet.__getstate__ and EntitySet.__setstate__ to preserve typing when pickling (#1581)
- Returned feature matrix has Woodwork typing information (#1664)
- Fixes
- Fix DFSTransformer documentation for Featuretools 1.0 (#1605)
- Fix calculate_feature_matrix time type check and encode_features for synthesis tests (#1580)
- Revert reordering of categories in Equal and NotEqual primitives (#1640)
- Fix bug in EntitySet.add_relationship that caused the foreign_key tag to be lost (#1675)
- Update DFS to not build features on last time index columns in dataframes (#1695)
- Fix
- Changes
- Remove add_interesting_values from Entity (#1269)
- Move set_secondary_time_index method from Entity to EntitySet (#1280)
- Refactor Relationship creation process (#1370)
- Replaced Entity.update_data with EntitySet.update_dataframe (#1398)
- Move validation check for uniform time index to EntitySet (#1400)
- Replace Entity objects in EntitySet with Woodwork dataframes (#1405)
- Refactor EntitySet.plot to work with Woodwork dataframes (#1468)
- Move last_time_index to be a column on the DataFrame (#1456)
- Update serialization/deserialization to work with Woodwork (#1452)
- Refactor EntitySet.query_by_values to work with Woodwork dataframes (#1467)
- Replace list_variable_types with list_logical_types (#1477)
- Allow deep EntitySet equality check (#1480)
- Update EntitySet.concat to work with Woodwork DataFrames (#1490)
- Add function to list semantic tags (#1486)
- Initialize Woodwork on feature matrix in remove_highly_correlated_features if necessary (#1618)
- Remove categorical-encoding as an add-on library (will be added back later) (#1632)
- Remove autonormalize as an add-on library (will be added back later) (#1636)
- Remove tsfresh, nlp_primitives, and sklearn_transformer as add-on libraries (will be added back later) (#1638)
- Update input and return types for CumCount primitive (#1651)
- Standardize imports of Woodwork (#1526)
- Rename target entity to target dataframe (#1506)
- Replace entity_from_dataframe with add_dataframe (#1504)
- Create features from Woodwork columns (#1582)
- Move default variable description logic to generate_description (#1403)
- Update Woodwork to version 0.4.0 with LogicalType.transform and LogicalType instances (#1451)
- Update Woodwork to version 0.4.1 with Ordinal order values and whitespace serialization fix (#1478)
- Use ColumnSchema for primitive input and return types (#1411)
- Update features to use Woodwork and remove Entity and Variable classes (#1501)
- Re-add make_index functionality to EntitySet (#1507)
- Use ColumnSchema in DFS primitive matching (#1523)
- Updates from Featuretools v0.26.0 (#1539)
- Leverage Woodwork better in add_interesting_values (#1550)
- Update calculate_feature_matrix to use Woodwork (#1533)
- Update Woodwork to version 0.6.0 with changed categorical inference (#1597)
- Update nlp-primitives requirement for Featuretools 1.0 (#1609)
- Remove remaining references to Entity and Variable in code (#1612)
- Update Woodwork to version 0.7.1 with changed initialization (#1648)
- Remove outdated workaround code related to a since-resolved pandas issue (#1677)
- Remove unused _dataframes_equal and camel_to_snake functions (#1683)
- Update Woodwork to version 0.8.0 for improved performance (#1689)
- Remove redundant typecasting in encode_features (#1694)
- Speed up encode_features when not inplace, at some cost in space (#1699)
- Clean up comments and commented-out code (#1701)
- Update Woodwork to version 0.8.1 for improved performance (#1702)
- Remove
- Documentation Changes
- Add a Woodwork Typing in Featuretools guide (#1589)
- Add a resource guide for transitioning to Featuretools 1.0 (#1627)
- Update using_entitysets page to use Woodwork (#1532)
- Update FAQ page to use Woodwork integration (#1649)
- Update DFS page to be Jupyter notebook and use Woodwork integration (#1557)
- Update Feature Primitives page to be Jupyter notebook and use Woodwork integration (#1556)
- Update Handling Time page to be Jupyter notebook and use Woodwork integration (#1552)
- Update Advanced Custom Primitives page to be Jupyter notebook and use Woodwork integration (#1587)
- Update Deployment page to use Woodwork integration (#1588)
- Update Using Dask EntitySets page to be Jupyter notebook and use Woodwork integration (#1590)
- Update Specifying Primitive Options page to be Jupyter notebook and use Woodwork integration (#1593)
- Update API Reference to match Featuretools 1.0 API (#1600)
- Update Index page to be Jupyter notebook and use Woodwork integration (#1602)
- Update Feature Descriptions page to be Jupyter notebook and use Woodwork integration (#1603)
- Update Using Koalas EntitySets page to be Jupyter notebook and use Woodwork integration (#1604)
- Update Glossary to use Woodwork integration (#1608)
- Update Tuning DFS page to be Jupyter notebook and use Woodwork integration (#1610)
- Fix small formatting issues in Documentation (#1607)
- Remove Variables page and more references to variables (#1629)
- Update Feature Selection page to use Woodwork integration (#1618)
- Update Improving Performance page to be Jupyter notebook and use Woodwork integration (#1591)
- Fix typos in transition guide (#1672)
- Testing Changes
Thanks to the following people for contributing to this release:
@gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd