- Fixing bug: improving datasets download handling to cope with transmission errors.
- Fixing bug: solving failure when using the first column of a dataset as objective field in models and ensembles.
- Adding new bigmler analyze option, --random-fields, to analyze the performance of random forests when changing the number of random candidates.
- Fixing bug in reify subcommand for unordered reifications.
- Adding bigmler reify subcommand to script the resource creation.
- Fixing bug: changing the related Python bindings version to solve encoding problem when using Python 3 on Windows.
- Adding bigmler report subcommand to generate reports for cross-validation results in bigmler analyze.
- Fixing bug: bigmler analyze and filtering datasets failed when the origin dataset was a filtered one.
- Fixing bug: bigmler analyze --features could not analyze phi for a user-given category because the metric is called phi_coefficient.
- Modifying the output of bigmler analyze --features and --nodes to include the command to generate the best performing model and the command to clean all the generated resources.
- Fixing bug: dataset generation with a filter on a previous dataset was not working.
- Adding the --project-tag option to bigmler delete.
- Fixing the --test-dataset and related options so that they can be used in model evaluation.
- Fixing bug: bigmler anomalies for datasets with more than 1000 fields failed.
- Adding the --top-n, --forest-size and --anomalies-dataset options to the bigmler anomaly subcommand.
- Fixing bug: source upload failed when using arguments that contain unicodes.
- Fixing bug: bigmler analyze subcommand failed for datasets with more than 1000 fields.
- Supporting Python 3 and changing the test suite to nose.
- Adding --cluster-models option to generate the models related to cluster datasets.
- Adding --score flag to create batch anomaly scores for the training set.
- Allowing --median to be used also in ensembles predictions.
- Using --seed option also in ensembles.
- Adding --median flag to use median instead of mean in single models' predictions.
- Updating underlying BigML python bindings' version to 4.0.2 (Python 3 compatible).
- Fixing bug: resuming commands failed retrieving the output directory
- Fixing docs formatting errors.
- Adding --to-dataset and --no-csv flags causing batch predictions, batch centroids and batch anomaly scores to be stored in a new remote dataset and not in a local CSV respectively.
- Adding the sample subcommand to generate samples from datasets
- Fixing bug: using --model-fields with --max-categories failed.
- Fixing bug: Failed field retrieval for batch predictions starting from source or dataset test data.
- Adding the --project and --project-id options to manage projects and associate them to newly created sources.
- Adding the --cluster-seed and --anomaly-seed options to choose the seed for deterministic clusters and anomalies.
- Refactoring dataset processing to avoid setting the objective field when possible.
- Adding --optimize-category in bigmler analyze subcommands to select the category whose evaluations will be optimized.
- Fixing bug: k-fold cross-validation failed for ensembles.
- Fixing bug: ensembles' evaluations failed when using the ensemble id.
- Fixing bug: bigmler analyze lacked model configuration options (weight-field, objective-fields, pruning, model-attributes...)
- Adding k-fold cross-validation for ensembles in bigmler analyze.
- Adding the --model-file, --cluster-file, --anomaly-file and --ensemble-file options to produce entirely local predictions.
- Fixing bug: the bigmler delete subcommand was not using the --anomaly-tag, --anomaly-score-tag and --batch-anomaly-score-tag options.
- Fixing bug: the --no-test-header flag was not working.
- Fixing bug: --field-attributes was not working when used in addition to --types option.
- Adding the capability of creating a model/cluster/anomaly and its corresponding batch prediction from a train/test split using --test-split.
- Improving domain transformations for customized private settings.
- Fixing bug: model fields were not correctly set when the origin dataset was a new dataset generated by the --new-fields option.
- Refactoring predictions code, improving performance and memory usage in some cases.
- Adding the --fast option to speed prediction by not storing partial results in files.
- Adding the --optimize option to the bigmler analyze --features command.
- Improving performance in individual model predictions.
- Forcing garbage collection to lower memory usage in ensemble's predictions.
- Fixing bug: batch predictions were not adding confidence when --prediction-info full was used.
- Adding bigmler anomaly as new subcommand to generate anomaly detectors, anomaly scores and batch anomaly scores.
- Fixing bug: source updates failed when using --locale and --types flags together.
- Updating bindings version and fixing code accordingly.
- Adding --k option to bigmler cluster to change the number of centroids.
- Fixing bug: --source-attributes and --dataset-attributes were not updated.
- Fixing bug: bigmler analyze was needlessly sampling data to evaluate.
- Adding the new --missing-splits flag to control if missing values are included in tree branches.
- Fixing bug: handling unicode command parameters on Windows.
- Fixing bug: handling stdout writes of unicodes on Windows.
- Fixing bug for bigmler analyze: the subcommand failed when used with resources created in development mode.
- Fixing bug when many models are evaluated in k-fold cross-validations. The evaluation creation could fail when called with an unfinished model.
- Improving delete process. Promoting delete to a subcommand and filtering the type of resource to be deleted.
- Adding --dry-run option to delete.
- Adding --from-dir option to delete.
- Fixing bug when Gazibit report is used with personalized URL dashboards.
- Adding the --to-csv option to export datasets to a CSV file.
- Adding the --cluster-datasets option to generate the datasets related to the centroids in a cluster.
- Fixing bug for the --delete flag. Cluster, centroids and batch centroids could not be deleted.
- Documentation update.
- Adding cluster subcommand to generate clusters and centroid predictions.
- Fixing bug for the analyze subcommand. The --resume flag crashed when no --output-dir was used.
- Fixing bug for the analyze subcommand. The --features flag crashed when many long feature names were used.
- Fixing bug for --delete flag, broken by last fix.
- Fixing bug when field names contain commas and --model-fields tag is used.
- Fixing bug when deleting all resources by tag when ensembles were found.
- Adding --exclude-features flag to analyze.
- Fixing bug when utf8 characters were used in command lines.
- Adding the --balance flag to the analyze subcommand.
- Fixing bug for analyze. Some common flags allowed were not used.
- Fixing bug for analyze. User-given objective field was changed when using filtered datasets.
- Fixing bug for analyze. User-given objective field was not used.
- Docs update and test change to adapt to backend node threshold changes.
- Fixing bug in analyze --nodes. The default node steps could not be found.
- Setting dependency of new python bindings version 1.3.1.
- Fixing bug: --shared and --unshared should be considered only when set in the command line by the user. They were always updated, even when absent.
- Fixing bug: --remote predictions were not working when --model was used as training start point.
- Changing the Gazibit report for shared resources to include the model shared url in embedded format.
- Fixing bug: train and tests data could not be read from stdin.
- Adding the analyze subcommand. The subcommand presents new features, such as:
  - --cross-validation that performs k-fold cross-validation,
  - --features that selects the best features to increase accuracy (or any other evaluation metric) using a smart search algorithm and
  - --nodes that selects the node threshold that ensures best accuracy (or any other evaluation metric) in a user-defined range of nodes.
- Fixing bug: --no-upload flag was not really used.
- Adding the --reports option to generate Gazibit reports.
- Adding the --shared flag to share the created dataset, model and evaluation.
- Fixing bug for model building: when the objective field was specified and no --max-category was present, the user-given objective was not used.
- Fixing bug: max-category data stored even when --max-category was not used.
- Adding --missing-strategy option to allow different prediction strategies when a missing value is found in a split field. Available for local predictions, batch predictions and evaluations.
- Adding new --delete options: --newer-than and --older-than to delete lists of resources according to their creation date.
- Adding --multi-dataset flag to generate a new dataset from a list of equally structured datasets.
- Bug fixing: resume from multi-label processing from dataset was not working.
- Bug fixing: max parallel resource creation check did not check that all the older tasks ended, only the last of the slot. This caused more tasks than permitted to be sent in parallel.
- Improving multi-label training data uploads by zipping the extended file and transforming booleans from True/False to 1/0.
- Bug fixing: dataset objective field is not updated each time --objective is used, but only if it differs from the existing objective.
- Storing the --max-categories info (its number and the chosen other label) in user_metadata.
- Fix when using the combined method in --max-categories models. The combination function now uses confidence to choose the predicted category.
- Allowing full content text fields to be also used as --max-categories objective fields.
- Fix solving objective issues when its column number is zero.
- Adding the --objective-weights option to point to a CSV file containing the weights assigned to each class.
- Adding the --label-aggregates option to create new aggregate fields on the multi label fields such as count, first or last.
- Fix in local random forests' predictions. Sometimes the fields used in all the models were not correctly retrieved and some predictions could be erroneous.
- Fix to allow the input data for multi-label predictions to be expanded.
- Fix to retrieve from the models definition info the labels that were given by the user in its creation in multi-label models.
- Adding new --balance option to automatically balance all the classes evenly.
- Adding new --weight-field option to use the field contents as weights for the instances.
- Adding new --source-attributes, --ensemble-attributes, --evaluation-attributes and --batch-prediction-attributes options.
- Refactoring --multi-label resources to include its related info in the user_metadata attribute.
- Refactoring the main routine.
- Adding --batch-prediction-tag for delete operations.
- Fix to transmit --training-separator when creating remote sources.
- Fix for multiple multi-label fields: headers did not match rows contents in some cases.
- Fix for datasets generated using the --new-fields option. The new dataset was not used in model generation.
- Adding --multi-label-fields to provide a comma-separated list of multi-label fields in a file.
- Fix for ensembles' local predictions when order is used in tie break.
- Fix for duplicated model ids in models file.
- Adding new --node-threshold option to allow node limit in models.
- Adding new --model-attributes option pointing to a JSON file containing model attributes for model creation.
- Fix for missing modules during installation.
- Adding the --max-categories option to handle datasets with a high number of categories.
- Adding the --method combine option to produce predictions with the sets of datasets generated using --max-categories option.
- Fixing problem with --max-categories when the categorical field is not a preferred field of the dataset.
- Changing the --datasets option behaviour: it points to a file where dataset ids are stored, one per line, and now it reads all of them to be used in model and ensemble creation.
- Adding confidence to predictions output in full format
- Bug fixing: multi-label predictions failed when the --ensembles option is used to provide the ensemble information
- Bug fixing: --dataset-price could not be set.
- Adding the threshold combination method to the local ensemble.
- Bug fixing: --model-fields option with absolute field names was not compatible with multi-label classification models.
- Changing resource type checking function.
- Bug fixing: evaluations did not use the given combination method.
- Bug fixing: evaluation of an ensemble had turned into evaluations of its models.
- Adding pruning to the ensemble creation configuration options
- Changing fields_map column order: previously mapped dataset column number to model column number, now maps model column number to dataset column number.
- Adding evaluations to multi-label models.
- Bug fixing: unicode characters greater than ascii-127 caused crash in multi-label classification
- Adapting to predictions issued by the high performance prediction server and the 0.9.0 version of the python bindings.
- Support for shared models using the same version on python bindings.
- Support for different server names using environment variables.
- Adding ensembles' predictions for multi-label objective fields
- Bug fixing: in evaluation mode, evaluation for --dataset and --number-of-models > 1 did not select the 20% hold out instances to test the generated ensemble.
- Adding text analysis through the corresponding bindings
- Adding support for multi-label objective fields
- Adding --prediction-headers and --prediction-fields to improve --prediction-info formatting options for the predictions file
- Adding the ability to read --test input data from stdin
- Adding --seed option to generate different splits from a dataset
- Adding --test-separator flag
- Bug fixing: resume crash when remote predictions were not completed
- Bug fixing: Fields object for input data dict building lacked fields
- Bug fixing: test data was repeated in remote prediction function
- Bug fixing: Adding replacement=True as default for ensembles' creation
- Adding --max-parallel-evaluations flag
- Bug fixing: matching seeds in models and evaluations for cross validation
- Changing --model-fields and --dataset-fields flag to allow adding/removing fields with +/- prefix
- Refactoring local and remote prediction functions
- Adding 'full data' option to the --prediction-info flag to join test input data with prediction results in predictions file
- Fixing errors in documentation and adding install for windows info
- Adding new flag to control predictions file information
- Bug fixing: using default sample-rate in ensemble evaluations
- Adding standard deviation to evaluation measures in cross-validation
- Bug fixing: using only-model argument to download fields in models
- Adding delete for ensembles
- Creating ensembles when the number of models is greater than one
- Remote predictions using ensembles
- Adding cross-validation feature
- Using user locale to create new resources in BigML
- Adding --ensemble flag to use ensembles in predictions and evaluations
- Deep refactoring of main resources management
- Fixing bug in batch_predict for no headers test sets
- Fixing bug for wide datasets' models that need query-string to retrieve all fields
- Fixing bug in test asserts to catch subprocess raise
- Adding default missing tokens to models
- Adding stdin input for --train flag
- Fixing bug when reading descriptions in --field-attributes
- Refactoring to get status from api function
- Adding confidence to combined predictions
- Evaluations management
- console monitoring of process advance
- resume option
- user defaults
- Refactoring to improve readability
- Improved locale management.
- Adds progressive handling for large numbers of models.
- More options in field attributes update feature.
- New flag to combine local existing predictions.
- More methods in local predictions: plurality, confidence weighted.
- New flag for locale settings configuration.
- Filtering only finished resources.
- Fix to ensure windows compatibility.
- Initial release.