There will be only one! #1

Open

wants to merge 1,028 commits into base: master

Conversation

lucarosellini

No description provided.

ankurdave and others added 30 commits January 10, 2014 23:46
Upgrade Kafka dependency to 0.8.0 release version
We clone hadoop key and values by default and reuse objects if asked to.

We try to clone for the most common types of writables, and call WritableUtils.clone otherwise. The intention is to optimize: for example, for NullWritable there is no need to clone at all, and for Long, int and String, creating a new object with the value set would hopefully be faster than doing a copy on the object.

There is another way to do this PR, where we ask separately whether to clone keys and values, but I could not think of a use case for it except when either of them is actually a NullWritable, which I have already worked around. So I thought that would be unnecessary.
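
For illustration only, here is a rough sketch of the per-type cloning described above, assuming Hadoop's Writable classes and the generic WritableUtils.clone fallback (the cloneWritable helper is hypothetical, not code from this PR):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, NullWritable, Text, Writable, WritableUtils}

// Hypothetical helper: return a copy of a record read from Hadoop that is safe
// to keep around even though the RecordReader reuses the underlying object.
def cloneWritable[T <: Writable](value: T, conf: Configuration): T = value match {
  case n: NullWritable => n.asInstanceOf[T]                        // singleton, nothing to copy
  case l: LongWritable => new LongWritable(l.get).asInstanceOf[T]  // cheap value copy
  case t: Text         => new Text(t).asInstanceOf[T]              // cheap value copy
  case other           => WritableUtils.clone(other, conf)         // generic clone via serialization
}
```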
Fix a small problem in ALS where configure didn't work
Revert PR 381

This PR missed a bunch of test cases that require "spark.cleaner.ttl". I think it is what is causing test failures on Jenkins right now (though it's a bit hard to tell because the DNS for cs.berkeley.edu is down).

I'm submitting this to see if it fixes Jenkins. I did try just patching the various tests, but it was taking a really long time because there are a bunch of them, so for now I'm just seeing if a revert works.
Fix UI bug introduced in #244.

The 'duration' field was incorrectly renamed to 'task time' in the table that
lists stages.
pwendell and others added 30 commits January 18, 2014 16:23
Minor API usability changes

- Expose checkpoint directory - since it is autogenerated now
- null check for jars
- Expose SparkHadoopUtil: so that configuration creation is abstracted even from user code, to avoid duplicating functionality already in Spark.
Correct L2 regularized weight update with canonical form

Per thread on the user@ mailing list, and comments from Ameet, I believe the weight update for L2 regularization needs to be corrected. See http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/%3CCAH3_EVMetuQuhj3__NdUniDLc4P-FMmmrmxw9TS14or8nT4BNQ%40mail.gmail.com%3E
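
For reference, a minimal sketch of the canonical L2-regularized (ridge) update discussed in that thread, written as a plain SGD step on arrays (illustrative only, not the MLlib Updater code):

```scala
// One SGD step for the objective  L(w) + (lambda / 2) * ||w||^2.
// The regularizer's gradient is lambda * w, so the combined update is
//   w <- w - step * (grad + lambda * w) = w * (1 - step * lambda) - step * grad
def l2RegularizedStep(weights: Array[Double], gradient: Array[Double],
                      step: Double, lambda: Double): Array[Double] =
  weights.zip(gradient).map { case (w, g) => w * (1 - step * lambda) - step * g }
```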
Updated Java API docs for streaming, along with very minor changes in the code examples.

Docs updated for:
Scala: StreamingContext, DStream, PairDStreamFunctions
Java: JavaStreamingContext, JavaDStream, JavaPairDStream

Example updated:
JavaQueueStream: do not use a deprecated method
ActorWordCount: Use the public interface the right way.
Only log an error on a missing jar, to allow the Spark examples to run on YARN.

Right now, to run the Spark examples on YARN you have to use the --addJars option and put the jar in HDFS. To make that nicer, so the user doesn't have to specify the --addJars option, change it to simply log an error instead of throwing.
Restricting /lib to top level directory in .gitignore

This patch was proposed by Sean Mackrory.
Added StreamingContext.awaitTermination to streaming examples

StreamingContext.start() currently starts a non-daemon thread which prevents termination of a Spark Streaming program even after the main function has exited. Since the expected behavior of a streaming program is to run until explicitly killed, this was more or less fine when Spark Streaming applications were launched from the command line. However, when launched in yarn-standalone mode, this did not work: the driver effectively got terminated when the main function exited, so the Spark Streaming examples did not work on YARN.

This addition to the examples ensures that they work on YARN, and also makes it clear that StreamingContext.awaitTermination() is necessary for Spark Streaming programs to keep running.

The true bug fix, making sure all threads started by Spark Streaming are daemon threads, is left for post-0.9.
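
A minimal sketch of the pattern the examples now follow (this is not one of the bundled examples; the socket source on localhost:9999 is hypothetical):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

object AwaitTerminationSketch {
  def main(args: Array[String]) {
    val ssc = new StreamingContext("local[2]", "AwaitTerminationSketch", Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999) // hypothetical input source
    lines.count().print()
    ssc.start()
    // Without this call, main() returns immediately and, in yarn-standalone mode,
    // the driver can be torn down while the streaming job is still running.
    ssc.awaitTermination()
  }
}
```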
SPARK-1028 : fix "set MASTER automatically fails" bug.

spark-shell intends to set MASTER automatically if we do not provide the option when we start the shell, but there's a problem.
The condition is "if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]]". We will surely set SPARK_MASTER_IP explicitly, but we probably do not set SPARK_MASTER_PORT and just rely on Spark's default port 7077. So if we do not set SPARK_MASTER_PORT, the condition will never be true. I think we should just use the default port if users do not set one explicitly.
Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM.

The bin/run-example script was not passing Java properties set through SPARK_JAVA_OPTS to the example. This is important for examples like Twitter**, as the Twitter authentication information must be set through Java properties. Hence, added the same JAVA_OPTS code to run-example as in the bin/spark-class script.

Also added SPARK_MEM, in case someone wants to run the example with different amounts of memory. This can be removed if it is not in tune with the intended semantics of the run-example scripts.

@matei Please check this soon; I want this to go into 0.9-rc4.
Handful of 0.9 fixes

This patch addresses a few fixes for Spark 0.9.0 based on the last release candidate.

@mridulm gets credit for reporting most of the issues here. Many of the fixes here are based on his work in #477 and follow up discussion with him.
…ts-for-mllib

[MLlib] Use a LocalSparkContext trait in test suites

Replaces the 9 instances of

```scala
class XXXSuite extends FunSuite with BeforeAndAfterAll {
  @transient private var sc: SparkContext = _

  override def beforeAll() {
    sc = new SparkContext("local", "test")
  }

  override def afterAll() {
    sc.stop()
    System.clearProperty("spark.driver.port")
  }
```

with

```scala
class XXXSuite extends FunSuite with LocalSparkContext {
```
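
For context, a shared trait of this kind might look roughly like the following (a sketch assuming ScalaTest's BeforeAndAfterAll, not necessarily the exact trait added here):

```scala
import org.apache.spark.SparkContext
import org.scalatest.{BeforeAndAfterAll, Suite}

trait LocalSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient var sc: SparkContext = _

  override def beforeAll() {
    sc = new SparkContext("local", "test")
    super.beforeAll()
  }

  override def afterAll() {
    if (sc != null) sc.stop()
    System.clearProperty("spark.driver.port")
    super.afterAll()
  }
}
```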
Clarify spark.default.parallelism

It's the task count across the cluster, not per worker, per machine, per core, or anything else.
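
For example (the master URL and the value 48 are illustrative), the property can be set on the SparkConf when building the context, and the number applies to the cluster as a whole:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")        // hypothetical cluster URL
  .setAppName("parallelism-example")
  .set("spark.default.parallelism", "48")  // 48 tasks in total across the cluster, not per core
val sc = new SparkContext(conf)
```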
Synchronized with apache:master