Crowd Context #73

sjyk · 2015-08-26T04:23:52Z

Include the other columns in the task.

sjyk · 2015-08-27T05:56:56Z

This is actually hard to do, since the current code applies a count distinct first and then runs attrdedup.
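To make the problem concrete, here is a minimal sketch in plain Python (not the project's actual Spark code) of what a count-distinct step hands to the dedup stage. It assumes the step yields one (value, count) pair per distinct value of the attribute. The sample table, its columns, and the helper `count_distinct` are hypothetical. Only the attribute and its counts survive, so any other columns a worker could use as context are already gone when attrdedup runs.

```python
from collections import Counter

# Hypothetical input table: each row maps column name -> value.
rows = [
    {"name": "UC Berkeley", "city": "Berkeley", "zip": "94720"},
    {"name": "UC Berkeley", "city": "Berkeley", "zip": "94704"},
    {"name": "Univ. of California, Berkeley", "city": "Berkeley", "zip": "94720"},
]

def count_distinct(rows, attr):
    """Return sorted (value, count) pairs for one attribute,
    roughly SELECT attr, COUNT(*) FROM t GROUP BY attr."""
    counts = Counter(row[attr] for row in rows)
    return sorted(counts.items())

# "city" and "zip" do not appear in the result, so a later dedup
# step working from this output cannot show them.
print(count_distinct(rows, "name"))
# [('UC Berkeley', 2), ('Univ. of California, Berkeley', 1)]
```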

thisisdhaas · 2015-08-27T07:47:54Z

Hm. Could we rewrite the initial count distinct query as a group by?

e.g. `SELECT name, first(col1), first(col2), ... FROM t GROUP BY name`

This requires Spark SQL to have a `first` aggregate, or some other way of getting a value out of each group.
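A minimal sketch, in plain Python rather than Spark SQL, of what a `GROUP BY name` with `first(...)` on the other columns would produce: one row per distinct `name`, carrying the first value seen for every other column. The sample table and the helper `group_by_first` are hypothetical, and the per-group count is an added assumption (that attrdedup still needs it). "First" here means first in input order. Spark's `first` aggregate is non-deterministic unless the row order is fixed, so which value it picks after a shuffle can vary.

```python
# Hypothetical input table: each row maps column name -> value.
rows = [
    {"name": "UC Berkeley", "city": "Berkeley", "zip": "94720"},
    {"name": "UC Berkeley", "city": "Berkeley", "zip": "94704"},
    {"name": "Univ. of California, Berkeley", "city": "Berkeley", "zip": "94720"},
]

def group_by_first(rows, key):
    """Sketch of SELECT key, COUNT(*), first(c1), first(c2), ... GROUP BY key.

    Returns (value, count, context) tuples, where context holds the
    first-seen value of every non-key column for that key value.
    """
    counts = {}
    context = {}  # dicts keep insertion order, so groups stay in first-seen order
    for row in rows:
        k = row[key]
        if k not in context:
            # First row for this key: its other columns become the context.
            context[k] = {col: val for col, val in row.items() if col != key}
            counts[k] = 0
        counts[k] += 1
    return [(k, counts[k], context[k]) for k in context]

for value, count, ctx in group_by_first(rows, "name"):
    print(value, count, ctx)
```

The design choice is that deduplication still sees one row per distinct value, as with the count distinct, but each row now carries columns that can be shown to workers as context.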

sjyk self-assigned this Aug 27, 2015
