Ondřej Moravčík edited this page May 14, 2015 · 7 revisions

Data can be loaded from a single file.

sc.text_file(FILE, workers_num, serializer=nil)

Data can also be loaded from all files in a directory.

sc.whole_text_files(DIRECTORY, workers_num, serializer=nil)
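As a rough illustration of what loading a whole directory involves, here is a minimal plain-Ruby sketch that does not use ruby-spark. The helper `read_directory` is hypothetical. The `[path, content]` pair shape is an assumption borrowed from Apache Spark's `wholeTextFiles`; this page does not state what `whole_text_files` returns.

```ruby
require "tmpdir"

# Hypothetical helper, not part of ruby-spark: reads every regular file
# directly inside `directory` and returns [path, content] pairs.
# The pair shape is an assumption modeled on Spark's wholeTextFiles.
def read_directory(directory)
  Dir.children(directory).sort.map do |name|
    path = File.join(directory, name)
    File.file?(path) ? [path, File.read(path)] : nil
  end.compact
end

# Build a throwaway directory with two files and read it back.
pairs = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.txt"), "first\n")
  File.write(File.join(dir, "b.txt"), "second\n")
  read_directory(dir).map { |path, content| [File.basename(path), content] }
end
```

Unlike a per-line read of a single file, each element here stands for one whole file.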

Data can also be passed in directly. It must be iterable.

sc.parallelize(data, workers_num, serializer=nil)
sc.parallelize([1,2,3,4,5], workers_num, serializer=nil)
sc.parallelize(1..5, workers_num, serializer=nil)
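To make the "iterable" requirement concrete, here is a minimal plain-Ruby sketch of how an array or a range could be divided into slices for a given number of workers. The helper `split_for_workers` is hypothetical and is not how ruby-spark distributes data internally. It only shows that an `Array` and a `Range` can be handled the same way because both are enumerable.

```ruby
# Hypothetical helper, not part of ruby-spark: splits any iterable
# into at most `workers_num` slices of roughly equal size.
def split_for_workers(data, workers_num)
  items = data.to_a # works for Array, Range and other Enumerable objects
  return [] if items.empty?

  slice_size = (items.size.to_f / workers_num).ceil
  items.each_slice(slice_size).to_a
end

from_array = split_for_workers([1, 2, 3, 4, 5], 2)
from_range = split_for_workers(1..5, 2)
```

Both calls produce the same slices, which is why the array and range forms above can be used interchangeably.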

Options

workers_num
Minimum number of workers computing this task.
(This value can be overwritten by Spark.)
serializer