-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FastADC implementation #470
base: main
Are you sure you want to change the base?
Conversation
f7a5ca2
to
3cc3060
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
src/core/algorithms/dc/FastADC/misc/typed_column_data_value_differences.cpp
Outdated
Show resolved
Hide resolved
src/core/algorithms/dc/FastADC/misc/typed_column_data_value_differences.cpp
Show resolved
Hide resolved
src/core/algorithms/dc/FastADC/util/evidence_aux_structures_builder.cpp
Outdated
Show resolved
Hide resolved
src/core/algorithms/dc/FastADC/util/evidence_aux_structures_builder.h
Outdated
Show resolved
Hide resolved
928755d
to
f3a8b73
Compare
TypedColumData kInt type is int64_t, and FastADC algorithm uses 64-bit long types
FastADC algorithm for mining approximate Denial Constraints will be implemented here.
IndexProvider assigns unique indices to each distinct object of type T added to it. It will be used later for two main operations: 1. Map out all predicates to numbers to use dynamic bitsets for quick intersection/etc. 2. Hash all values in the table keeping their relative order (ignoring columns of DC-unsupported types. Only ints, doubles and strings are allowed). That is, the same values are substituited by the same integers, and higher values are replaced by larger integers
FastADC algotithm decides which column pairs to use to create predicate with by checking whether they are comparable with `==, !=, <, >, >=, <=` or with `==, !=`. In both cases when the columns are of expected type (string, int or double) but different, we need to assert some kind of similarity between them. Otherwise the predicate space will be too big and not really interesting from the DC finding stadpoint, since there will be predicates like `!=` in between two completely different data attributes. These metrics are: - "shared percentage" Measures the overlap between two columns by considering the frequency of each unique element. It calculates the frequency of each unique value in both columns and determines the ratio of the shared values to the total values. - "average ratio" Computes the average value of each column and then returns the ratio of the smaller average to the larger average.
Generates and categorizes predicates for the future evidence set construction
This test builds predicate space from the provided data (CSV file) and compares the list of predicates that will be used later for DC discovery with the expected one. The expected list of predicates was built manually from running the FastADC Java implementation. The next test check that inverse and mutex maps are being built correctly.
This commit introduces the Position List Indexes (Pli) building. It's working with hashed column data, such that that equal values are represented by identical keys, and values are sorted by their natural order. We also build a so-called PliShards, which are just Pli's for a specific segment of the dataset, splitting whole dataset into a bunch of shards. This will allow us to be more efficient later.
This class organizes predicates into packs and creates a correction map, which will be used for optimizing predicate comparisons in derived clasees, that will actually build clues from PLIs
Inherits from CommonClueSetBuilder and builds clues based from one PLI shard
Inherits from CommonClueSetBuilder and builds clues based from two PLI shards
Validates the number of bits in the clue, the structure of the predicate packs. And the correction map which stores predicate-to-bitset mappings
This is a class for constructing clues from PliShards.
The expected values are, once again, are taken from Java implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
For now this class builds necessary structures to build Evidences later. The structures are clue set, correction map and cardinality mask.
This is class that maps 1to1 with Clue. The ApproximateEvidenceInversion algorithm (AEI) that will build approximate denial constraints is using Evidences as it's input
EvidenceSet is basically just a vector of evidences. The only thing that's adding is a method to get total count (I probably can publically inherit from std::vector<Evidence>...?)
Add the building of evidences
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All my comments on this PR will address specific issues related to variations of Clang (see #507). You can also rebase to my branch (PR#507) and see CI failures.
I'm going to add a wiki page with detailed description of Clang issues, but it'll take some time, so now, if you have questions, you'll better reach me on Telegram or here.
Now I'm posting only first part of comments, those related to build phase. I'll post more comments, if there will be problems with tests.
@@ -0,0 +1,98 @@ | |||
#pragma once | |||
|
|||
#include "dc/FastADC/model/evidence_set.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "dc/FastADC/model/evidence_set.h" | |
#include "dc/FastADC/model/evidence_set.h" | |
#include "util/bitset_extensions.h" |
|
||
for (auto const& evidence : evidence_set_) { | ||
PredicateBitset bitset = evidence.evidence; | ||
for (size_t i = bitset._Find_first(); i != bitset.size(); i = bitset._Find_next(i)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (size_t i = bitset._Find_first(); i != bitset.size(); i = bitset._Find_next(i)) { | |
util::BitsetIterator<kPredicateBits> iter{bitset}; | |
for (size_t i = iter.Pos(); i != bitset.size(); iter.Next(), i = iter.Pos()) { |
for (auto const& dc : unhit_evi_dcs) { | ||
boost::dynamic_bitset<> unhit_cand = dc.cand & evi; | ||
if (unhit_cand.any()) | ||
new_candidates.Add(DCCandidate(dc.bitset, unhit_cand)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_candidates.Add(DCCandidate(dc.bitset, unhit_cand)); | |
new_candidates.Add(DCCandidate{dc.bitset, unhit_cand}); |
Parenthised aggregate initialization isn't currenlty supported by Apple Clang. Only braced one is allowed.
There's no other problems with Clang. |
|
||
#include <easylogging++.h> | ||
|
||
#include "builtin.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "builtin.h" | |
#include "model/types/builtin.h" |
#include <easylogging++.h> | ||
|
||
#include "builtin.h" | ||
#include "misc.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "misc.h" | |
#include "algorithms/dc/FastADC/misc/misc.h" |
|
||
#include "builtin.h" | ||
#include "misc.h" | ||
#include "table/column.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "table/column.h" | |
#include "model/table/column.h" |
#include "builtin.h" | ||
#include "misc.h" | ||
#include "table/column.h" | ||
#include "table/typed_column_data.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "table/typed_column_data.h" | |
#include "model/table/typed_column_data.h" |
#include "misc.h" | ||
#include "table/column.h" | ||
#include "table/typed_column_data.h" | ||
#include "type.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "type.h" | |
#include "model/types/type.h" |
sb << c_not << "{ "; | ||
bool first = true; | ||
for (PredicatePtr predicate : predicate_set_) { | ||
if (!first) { | ||
sb << c_and; | ||
} | ||
sb << predicate->ToString(); | ||
first = false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sb << c_not << "{ "; | |
bool first = true; | |
for (PredicatePtr predicate : predicate_set_) { | |
if (!first) { | |
sb << c_and; | |
} | |
sb << predicate->ToString(); | |
first = false; | |
} | |
sb << c_not << "{ "; | |
sb << predicate_set_.front().ToString(); | |
for (auto it = std::next(predicate_set_.begin()); it != predicate_set_.end(); ++it) { | |
sb << c_and << it->ToString(); | |
} |
size_t pos = 0; | ||
while (tmp.any()) { | ||
if (tmp.test(0)) { | ||
evidence ^= correctionMap[pos]; | ||
} | ||
tmp >>= 1; | ||
pos++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t pos = 0; | |
while (tmp.any()) { | |
if (tmp.test(0)) { | |
evidence ^= correctionMap[pos]; | |
} | |
tmp >>= 1; | |
pos++; | |
} | |
for (size_t pos = 0; tmp.any(); ++pos) { | |
if (tmp.test(0)) { | |
evidence ^= correctionMap[pos]; | |
} | |
tmp >>= 1; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic number btw
|
||
#include <easylogging++.h> | ||
|
||
#include "evidence.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use full name
#include "operator.h" | ||
|
||
#include <utility> | ||
|
||
#include "builtin.h" | ||
#include "type.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full names also here
Operator::OperatorMap<OperatorType> Operator::InitializeInverseMap() { | ||
return {{OperatorType::kEqual, OperatorType::kUnequal}, | ||
{OperatorType::kUnequal, OperatorType::kEqual}, | ||
{OperatorType::kGreater, OperatorType::kLessEqual}, | ||
{OperatorType::kLess, OperatorType::kGreaterEqual}, | ||
{OperatorType::kGreaterEqual, OperatorType::kLess}, | ||
{OperatorType::kLessEqual, OperatorType::kGreater}}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a result of a previous discussion #417 (comment), those static fields should be declared as constexpr and frozen::unordered_map should be utilized.
For inspiration look at "algorithms/dc/model/operator.h"
|
||
std::vector<PredicatePtr> const& Predicate::GetImplications(PredicateProvider* provider) const { | ||
if (implications_.empty()) { | ||
auto op_implications = op_.GetImplications(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto type is not obviuos
} | ||
|
||
std::vector<PredicatePtr> const& Predicate::GetImplications(PredicateProvider* provider) const { | ||
if (implications_.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the same style as above
if (implications_.empty()) { | |
if (!implications_) { |
#include "column_operand.h" | ||
#include "operator.h" | ||
#include "table/typed_column_data.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full names also here and everywhere else
plis.push_back( | ||
BuildPli(hashed_input[col], input[col].IsNumeric(), shard_beg, shard_end)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such complex syntax impairs readability. Declare a variable and then move
clusters[cluster_id].push_back(row); | ||
} | ||
|
||
return Pli(clusters, keys, key_to_cluster_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Pli(clusters, keys, key_to_cluster_id); | |
return {clusters, keys, key_to_cluster_id}; |
/** Create predicate object and return pointer to it or obtain it from cache */ | ||
PredicatePtr GetPredicate(Operator const& op, ColumnOperand const& left, | ||
ColumnOperand const& right) { | ||
auto [iter, _] = predicates_[op][left].try_emplace(right, op, left, right); | ||
return &iter->second; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Predicate
really such a complex structure that it's worth caching? Or an algorithm supposes creating a huge amount of predicates?
boost::dynamic_bitset<> result(lhs); | ||
result &= rhs; | ||
return result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply
boost::dynamic_bitset<> result(lhs); | |
result &= rhs; | |
return result; | |
return lhs & rhs; |
And then operator&
is already supported for these type
for (; e < evidences_.size(); ++e) { | ||
if (!(IsSubset(dc, evidences_[e].evidence))) { | ||
target -= evidences_[e].count; | ||
if (target <= 0) return true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems little bit more concise
for (; e < evidences_.size(); ++e) { | |
if (!(IsSubset(dc, evidences_[e].evidence))) { | |
target -= evidences_[e].count; | |
if (target <= 0) return true; | |
} | |
} | |
for (; e < evidences_.size(); ++e) { | |
if (IsSubset(dc, evidences_[e].evidence)) continue; | |
target -= evidences_[e].count; | |
if (target <= 0) return true; | |
} |
bool TransitivityStep() { | ||
std::unordered_set<PredicatePtr> additions; | ||
|
||
// Add implications and symmetric implications to additions | ||
std::for_each(closure_.begin(), closure_.end(), [&](PredicatePtr p) { | ||
if (p->GetSymmetric(provider_) != nullptr) { | ||
auto const& sym_implications = | ||
p->GetSymmetric(provider_)->GetImplications(provider_); | ||
additions.insert(sym_implications.begin(), sym_implications.end()); | ||
} | ||
auto const& implications = p->GetImplications(provider_); | ||
additions.insert(implications.begin(), implications.end()); | ||
}); | ||
|
||
for (auto const& [op, list] : grouped_) { | ||
for (Operator op_trans : op.GetTransitives()) { | ||
auto const p_trans_it = grouped_.find(op_trans); | ||
if (p_trans_it == grouped_.end()) continue; | ||
|
||
std::vector<PredicatePtr> const& p_trans = p_trans_it->second; | ||
|
||
for (PredicatePtr p : list) { | ||
for (PredicatePtr p2 : p_trans) { | ||
if (p == p2) continue; | ||
|
||
// Transitive inference: A -> B; B -> C | ||
if (p->GetRightOperand() == p2->GetLeftOperand()) { | ||
PredicatePtr new_pred = provider_->GetPredicate(op, p->GetLeftOperand(), | ||
p2->GetRightOperand()); | ||
additions.insert(new_pred); | ||
} | ||
|
||
// Transitive inference: C -> A; A -> B | ||
if (p2->GetRightOperand() == p->GetLeftOperand()) { | ||
PredicatePtr new_pred = provider_->GetPredicate( | ||
op, p2->GetLeftOperand(), p->GetRightOperand()); | ||
additions.insert(new_pred); | ||
} | ||
} | ||
} | ||
} | ||
} | ||
|
||
// Handle special cases for operators | ||
auto const& uneq_list_it = grouped_.find(OperatorType::kUnequal); | ||
if (uneq_list_it != grouped_.end()) { | ||
for (PredicatePtr p : uneq_list_it->second) { | ||
if (closure_.Contains(provider_->GetPredicate( | ||
OperatorType::kLessEqual, p->GetLeftOperand(), p->GetRightOperand()))) { | ||
additions.insert(provider_->GetPredicate( | ||
OperatorType::kLess, p->GetLeftOperand(), p->GetRightOperand())); | ||
} | ||
if (closure_.Contains(provider_->GetPredicate(OperatorType::kGreaterEqual, | ||
p->GetLeftOperand(), | ||
p->GetRightOperand()))) { | ||
additions.insert(provider_->GetPredicate( | ||
OperatorType::kGreater, p->GetLeftOperand(), p->GetRightOperand())); | ||
} | ||
} | ||
} | ||
|
||
auto const& leq_list_it = grouped_.find(OperatorType::kLessEqual); | ||
if (leq_list_it != grouped_.end()) { | ||
for (PredicatePtr p : leq_list_it->second) { | ||
if (closure_.Contains(provider_->GetPredicate(OperatorType::kGreaterEqual, | ||
p->GetLeftOperand(), | ||
p->GetRightOperand()))) { | ||
additions.insert(provider_->GetPredicate( | ||
OperatorType::kEqual, p->GetLeftOperand(), p->GetRightOperand())); | ||
} | ||
} | ||
} | ||
|
||
// Add all newly inferred predicates | ||
return AddAll(additions); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems too complex, definitely should be split into several functions
auto const& pivot_keys = pivotPli.GetKeys(); | ||
|
||
for (size_t i = 0; i < pivot_keys.size(); ++i) { | ||
size_t j; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicit initialization
auto const& pivot_clusters = pivotPli.GetClusters(); | ||
auto const& probe_clusters = probePli.GetClusters(); | ||
auto const& pivot_keys = pivotPli.GetKeys(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not obvious auto
type
auto const& pivot_keys = pivotPli.GetKeys(); | ||
|
||
for (size_t i = 0; i < pivot_keys.size(); ++i) { | ||
size_t j; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicit initialization
auto const& pivot_keys = pivotPli.GetKeys(); | ||
auto const& probe_keys = probePli.GetKeys(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not obvious auto
type
namespace { | ||
namespace py = pybind11; | ||
} // namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why anonymous namespace?
@@ -1,5 +1,6 @@ | |||
#pragma once | |||
|
|||
#include "descriptions.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it used anywhere?
#include "descriptions.h" | ||
#include "names.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "descriptions.h" | |
#include "names.h" | |
#include "config/names_and_descriptions.h" |
"\" is of unsupported type. Only numeric and string types are supported."); | ||
} | ||
|
||
for (std::size_t row_index = 0; row_index < rows_num; row_index++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (std::size_t row_index = 0; row_index < rows_num; row_index++) { | |
for (std::size_t row_index = 0; row_index < rows_num; ++row_index) { |
if (column.IsNull(row_index)) { | ||
throw std::runtime_error("Some of the value coordinates are nulls."); | ||
} | ||
if (column.IsEmpty(row_index)) { | ||
throw std::runtime_error("Some of the value coordinates are empty."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not important you may combine with column.IsNullOrEmpty()
if (column.IsNull(row_index)) { | |
throw std::runtime_error("Some of the value coordinates are nulls."); | |
} | |
if (column.IsEmpty(row_index)) { | |
throw std::runtime_error("Some of the value coordinates are empty."); | |
} | |
if (column.IsNullOrEmpty(row_index)) { | |
throw std::runtime_error("Some of the value coordinates are null or empty."); | |
} |
break; | ||
default: | ||
LOG(DEBUG) << "Column type " << c1.GetType().ToString() << " is not numeric"; | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return -1; | |
return -1.0; |
} | ||
|
||
double GetSharedPercentage(model::TypedColumnData const& c1, model::TypedColumnData const& c2) { | ||
if (c1.GetColumn() == c2.GetColumn()) return 1.; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (c1.GetColumn() == c2.GetColumn()) return 1.; | |
if (c1.GetColumn() == c2.GetColumn()) return 1.0; |
LOG(DEBUG) << "Column " << c1.GetColumn()->ToString() << " with type " | ||
<< c1.GetType().ToString() | ||
<< " is not supported for shared percentage calculation"; | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return -1; | |
return -1.0; |
} | ||
|
||
double GetAverageRatio(model::TypedColumnData const& c1, model::TypedColumnData const& c2) { | ||
if (c1.GetColumn() == c2.GetColumn()) return 1.; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (c1.GetColumn() == c2.GetColumn()) return 1.; | |
if (c1.GetColumn() == c2.GetColumn()) return 1.0; |
static int CompareBitsets(boost::dynamic_bitset<> const& lhs, | ||
boost::dynamic_bitset<> const& rhs) { | ||
size_t max_size = std::max(lhs.size(), rhs.size()); | ||
|
||
for (size_t i = 0; i < max_size; ++i) { | ||
bool lhs_bit = i < lhs.size() ? lhs[i] : false; | ||
bool rhs_bit = i < rhs.size() ? rhs[i] : false; | ||
|
||
if (lhs_bit != rhs_bit) { | ||
return lhs_bit ? 1 : -1; // lhs_bit is true and rhs_bit is false => lhs > rhs | ||
} | ||
} | ||
return 0; // bitsets are equal | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is worth using CompareResult
enum defined in model/types/builtin.h
std::string result = "{ "; | ||
for (PredicatePtr predicate : *this) { | ||
result += predicate->ToString() + " "; | ||
} | ||
result += "}"; | ||
return result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string result = "{ "; | |
for (PredicatePtr predicate : *this) { | |
result += predicate->ToString() + " "; | |
} | |
result += "}"; | |
return result; | |
std::stringstream ss; | |
ss << "{ " | |
for (PredicatePtr predicate : *this) { | |
ss << predicate->ToString() << ' '; | |
} | |
ss << '}'; | |
return ss.str(); |
#include <sstream> | ||
#include <string> | ||
|
||
#include "predicate_set.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use full name
|
||
#include <sstream> | ||
#include <string> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IWYU, add dynamic_bitset
header and predicate.h
#include <boost/move/utility_core.hpp> | ||
|
||
#include "dc/FastADC/providers/index_provider.h" | ||
#include "predicate.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full name
std::sort(evidences_.begin(), evidences_.end(), | ||
[](Evidence const& o1, Evidence const& o2) { return o2.count < o1.count; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare a lambda to improve readablitiy, change the order of operands in comparison so it's clear that the sequence is sorted in descending order.
std::sort(evidences_.begin(), evidences_.end(), | |
[](Evidence const& o1, Evidence const& o2) { return o2.count < o1.count; }); | |
auto cmp = [](Evidence const& o1, Evidence const& o2) { return o1.count > o2.count; }; | |
std::sort(evidences_.begin(), evidences_.end(), cmp); |
public: | ||
bool Add(boost::dynamic_bitset<> const& bs) { | ||
Add(bs, bs.find_first()); | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary if it always returns true
?
boost::dynamic_bitset<> const* res = | ||
it->second->GetSubset(add, add.find_next(next_bit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a variable for returned value, it will improve readablitiy
AddAndCategorizePredicate(ColumnOperand(input[i].GetColumn(), ColumnOperandTuple::t), | ||
ColumnOperand(input[j].GetColumn(), ColumnOperandTuple::s), | ||
comparable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AddAndCategorizePredicate(ColumnOperand(input[i].GetColumn(), ColumnOperandTuple::t), | |
ColumnOperand(input[j].GetColumn(), ColumnOperandTuple::s), | |
comparable); | |
auto t_col_op = ColumnOperand(input[i].GetColumn(), ColumnOperandTuple::t); | |
auto s_col_op = ColumnOperand(input[j].GetColumn(), ColumnOperandTuple::s); | |
AddAndCategorizePredicate(t_col_op, s_col_op, comparable); |
std::stable_sort(indexes.begin(), indexes.end(), | ||
[&coverages](int i, int j) { return coverages[i] < coverages[j]; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::stable_sort(indexes.begin(), indexes.end(), | |
[&coverages](int i, int j) { return coverages[i] < coverages[j]; }); | |
auto cmp = [&coverages](int i, int j) { return coverages[i] < coverages[j]; } | |
std::stable_sort(indexes.begin(), indexes.end(), cmp); |
void SingleClueSetBuilder::CorrectNumSingle(std::vector<Clue>& clues, Pli const& pli, | ||
Clue const& eqMask, Clue const& gtMask) { | ||
for (size_t i = 0; i < pli.Size(); ++i) { | ||
auto const& cluster = pli.Get(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not obvious auto
type
FastADC was introduced in this paper to efficiently mine Denial Constraints (DCs). A DC states that for any pair of rows, it should never be the case that some condition (e.g., t.A = s.A and t.B ≠ s.B) is satisfied.
This PR implements that algorithm.