Implement set based aod verifier, support aod mining in fastod #468

Open: wants to merge 25 commits into base: main

Commits (25):
4267c23
Add ColumnIndexOption and move there ValidateIndex
polyntsov Sep 28, 2024
31f3c47
Add parameter to allow empty list of indices in IndicesOption
polyntsov Sep 28, 2024
1e76762
Move complex stripped partition swap and create definitions to cpp
polyntsov Sep 28, 2024
20592c8
Refactor Swap in complex stripped partition
polyntsov Sep 28, 2024
4d65979
Introduce od::Ordering enum and use it instead of bool Ascending
polyntsov Sep 28, 2024
f043b22
Introduce partition type for complex stripped partition as create param
polyntsov Sep 28, 2024
b452c28
Accept in CreateAttributeSet any range as list of attributes
polyntsov Sep 29, 2024
789c617
Store DataFrame in ComplexStrippedPartition as raw pointer
polyntsov Sep 29, 2024
0214e5d
Store DataFrame directly as value in Fastod
polyntsov Sep 29, 2024
eef5077
Move nd_verifier's VectorToString to general util and accept any range
polyntsov Sep 29, 2024
7cff09d
Add missing <vector> include to config/option.h
polyntsov Sep 30, 2024
cae91d2
Add method to convert fastod::AttributeSet to vector of column indices
polyntsov Sep 30, 2024
187a778
Add a callback to Option which is called before the option is set
polyntsov Sep 30, 2024
1f901a6
Implement getters for context and cols in canonical ods
polyntsov Sep 30, 2024
cdb4980
Implement a function to load algo data without configuring execute opts
polyntsov Sep 30, 2024
8b24e2f
Introduce is required callback to option
polyntsov Sep 30, 2024
d938c38
Allow absence of non-required options in algo factory
polyntsov Sep 30, 2024
fa7af04
Implement aod verifier and cover it with tests
polyntsov Sep 30, 2024
7d04aea
Implement python bindings to set based aod verifier
polyntsov Sep 30, 2024
60c2927
Implement error parameter for fastod and add tests for aod mining
polyntsov Sep 30, 2024
df33951
Specify in readme that we now support approximate set-based ODs
polyntsov Sep 30, 2024
4c350d0
Implement aod verification python example
polyntsov Sep 30, 2024
56a0f31
Avoid unnecessary copying of partitions in fastod partition cache
polyntsov Sep 30, 2024
a0e7af5
Don't store indices vectors via shared ptr in fastod complex partition
polyntsov Sep 30, 2024
3f6ec49
Fallback to split and swap validation when error is zero in canonical od
polyntsov Oct 1, 2024
5 changes: 4 additions & 1 deletion src/core/algorithms/od/fastod/fastod.h
Collaborator

Again, a commit description would be nice.

@@ -2,6 +2,7 @@

 #include <memory>
 #include <string>
+#include <initializer_list>
 #include <unordered_map>
 #include <unordered_set>
 #include <vector>
@@ -115,7 +116,9 @@ class Fastod : public Algorithm {
         for (model::ColumnIndex i = 0; i < data_->GetColumnCount(); i++) {
             for (model::ColumnIndex j = 0; j < data_->GetColumnCount(); j++) {
                 if (i == j) continue;
-                CSPut<Ordering>(fastod::CreateAttributeSet({i, j}, data_->GetColumnCount()),
+                CSPut<Ordering>(fastod::CreateAttributeSet(
+                                        std::initializer_list<model::ColumnIndex>{i, j},
+                                        data_->GetColumnCount()),
                                 AttributePair(i, j));
             }
         }
5 changes: 3 additions & 2 deletions src/core/algorithms/od/fastod/model/attribute_set.h
@@ -4,6 +4,7 @@
 #include <functional>
 #include <stdexcept>
 #include <string>
+#include <ranges>

 #include <boost/functional/hash.hpp>

@@ -159,11 +160,11 @@ struct boost::hash<algos::fastod::AttributeSet> {

 namespace algos::fastod {

-inline AttributeSet CreateAttributeSet(std::initializer_list<model::ColumnIndex> attributes,
+inline AttributeSet CreateAttributeSet(std::ranges::input_range auto const& attributes,
                                        model::ColumnIndex size) {
Comment on lines +165 to 166
Collaborator

Maybe std::span<const model::ColumnIndex>?
We only need a view of a sequence, and this way we explicitly say that we want model::ColumnIndex objects there.

     AttributeSet attr_set(size);

-    for (auto const attr : attributes) {
+    for (model::ColumnIndex const attr : attributes) {
         attr_set.Set(attr);
     }
