
There will be only one! #1

Open
wants to merge 1,028 commits into base: master
1028 commits
2d7e8d8
Add GC note to GraphLab
ankurdave Jan 11, 2014
362b942
Soften wording about GraphX superseding Bagel
ankurdave Jan 11, 2014
b8a44f1
More edits.
jegonzal Jan 11, 2014
34496d6
Move Analytics to algorithms and fix doc
ankurdave Jan 11, 2014
0c9d39b
More organizational changes and dropping the benchmark plot.
jegonzal Jan 11, 2014
4f7ddf4
Optimize Edge.lexicographicOrdering
ankurdave Jan 11, 2014
56a245c
Addressing comment about Graph Processing in docs.
jegonzal Jan 11, 2014
cbfbc01
Fix configure didn't work small problem in ALS
jerryshao Jan 11, 2014
feaa078
algorithms -> lib
ankurdave Jan 11, 2014
1f45e4e
starting structural operator discussion.
jegonzal Jan 11, 2014
4216178
Merge pull request #373 from jerryshao/kafka-upgrade
pwendell Jan 11, 2014
b313e15
Fix UI bug introduced in #244.
pwendell Jan 11, 2014
fac44bb
Finished documenting structural operators and starting join operators.
jegonzal Jan 11, 2014
0b5c49e
Make nullValue and VertexSet package-private
ankurdave Jan 11, 2014
732333d
Remove GraphLab
ankurdave Jan 11, 2014
ee6e7f9
Merge pull request #359 from ScrapCodes/clone-writables
rxin Jan 11, 2014
64f73f7
Fix indent and use SparkConf in Analytics
ankurdave Jan 11, 2014
b0fbfcc
Minor update for clone writables and more documentation.
rxin Jan 11, 2014
55101f5
One-line Scaladoc comments in Edge and EdgeDirection
ankurdave Jan 11, 2014
574c0d2
Use SparkConf in GraphX tests (via LocalSparkContext)
ankurdave Jan 11, 2014
6510f04
Merge pull request #387 from jerryshao/conf-fix
rxin Jan 11, 2014
02771aa
Make EdgeDirection val instead of case object for Java compat.
ankurdave Jan 11, 2014
2180c87
Stop SparkListenerBus daemon thread when DAGScheduler is stopped.
rxin Jan 11, 2014
64c4593
Finished documenting join operators and revised some of the initial …
jegonzal Jan 11, 2014
22d4d62
Revert "Fix one unit test that was not setting spark.cleaner.ttl"
pwendell Jan 12, 2014
07b952e
Revert "Fix default TTL for metadata cleaner"
pwendell Jan 12, 2014
409866b
Merge pull request #393 from pwendell/revert-381
pwendell Jan 12, 2014
cf57b1b
Correcting typos in documentation.
jegonzal Jan 12, 2014
362cda1
Renamed cloneKeyValues to cloneRecords; updated docs.
rxin Jan 12, 2014
dbc11df
Merge pull request #388 from pwendell/master
rxin Jan 12, 2014
288a878
Merge pull request #389 from rxin/clone-writables
rxin Jan 12, 2014
9a0dfdf
Add Naive Bayes to Python MLlib, and some API fixes
mateiz Jan 10, 2014
4c28a2b
Update some Python MLlib parameters to use camelCase, and tweak docs
mateiz Jan 10, 2014
f00e949
Added Java unit test, data, and main method for Naive Bayes
mateiz Jan 11, 2014
f5108ff
Converted JobScheduler to use actors for event handling. Changed prot…
tdas Jan 12, 2014
4d9b0ab
Added waitForStop and stop to JavaStreamingContext.
tdas Jan 12, 2014
18f4889
Merge remote-tracking branch 'apache/master' into error-handling
tdas Jan 12, 2014
5741078
Log Python exceptions to stderr as well
mateiz Jan 12, 2014
224f1a7
Update Python required version to 2.7, and mention MLlib support
mateiz Jan 12, 2014
c5921e5
Fixed bugs.
tdas Jan 12, 2014
78d2d17
Merge pull request #4 from apache/master
Jan 12, 2014
93a65e5
Remove simple redundant return statement for Scala methods/functions:
hsaputra Jan 12, 2014
91a5636
Merge branch 'master' into remove_simpleredundantreturn_scala
hsaputra Jan 12, 2014
f1c5eca
Fix accidental comment modification.
hsaputra Jan 12, 2014
f096f4e
Link methods in programming guide; document VertexID
ankurdave Jan 12, 2014
448aef6
Moved DStream, DStreamCheckpointData and PairDStream from org.apache.…
tdas Jan 12, 2014
5e35d39
Add PageRank example and data
ankurdave Jan 12, 2014
cfb1e6c
Setting load defaults to true in executor
pwendell Jan 12, 2014
7883b8f
Fixed bugs to ensure better cleanup of JobScheduler, JobGenerator and…
tdas Jan 13, 2014
0d4886c
Remove now un-needed hostPort option
pwendell Jan 13, 2014
0bb3307
Removing mentions in tests
pwendell Jan 13, 2014
82e2b92
Merge pull request #392 from rxin/listenerbus
rxin Jan 13, 2014
7a4bb86
Add connected components example to doc
ankurdave Jan 13, 2014
074f502
Merge pull request #396 from pwendell/executor-env
pwendell Jan 13, 2014
f4d77f8
Rename DStream.foreach to DStream.foreachRDD
pwendell Jan 11, 2014
c7fabb7
Changed StreamingContext.stopForWait to awaitTermination.
tdas Jan 13, 2014
d1820fe
Merge branch 'error-handling' into dstream-move
tdas Jan 13, 2014
aa2c993
Merge remote-tracking branch 'apache/master' into error-handling
tdas Jan 13, 2014
54d3486
Fix Scala version in docs (it was printed as 2.1)
mateiz Jan 13, 2014
74d0126
Merge remote-tracking branch 'apache/master' into dstream-move
tdas Jan 13, 2014
e6e20ce
Adding deprecated versions of old code
pwendell Jan 13, 2014
034f89a
Fixed persistence logic of WindowedDStream, and fixed default persist…
tdas Jan 13, 2014
5a8abfb
Address code review concerns and comments.
hsaputra Jan 13, 2014
2802cc8
Disable shuffle file consolidation by default
pwendell Jan 13, 2014
28a6b0c
Merge pull request #398 from pwendell/streaming-api
pwendell Jan 13, 2014
405bfe8
Merge pull request #394 from tdas/error-handling
pwendell Jan 13, 2014
c787ff5
Documenting Pregel API
jegonzal Jan 13, 2014
2216319
adding Pregel as an operator in GraphOps and cleaning up documentatio…
jegonzal Jan 13, 2014
0ab505a
Merge pull request #395 from hsaputra/remove_simpleredundantreturn_scala
pwendell Jan 13, 2014
0b96d85
Merge pull request #399 from pwendell/consolidate-off
pwendell Jan 13, 2014
20c509b
Add TriangleCount example
ankurdave Jan 13, 2014
d691e9f
Move algorithms to GraphOps
ankurdave Jan 13, 2014
777c181
Merge remote-tracking branch 'apache/master' into dstream-move
tdas Jan 13, 2014
1efe78a
Use GraphLoader for algorithms examples in doc
ankurdave Jan 13, 2014
66c9d00
Tested and corrected all examples up to mask in the graphx-programmin…
jegonzal Jan 13, 2014
ffa1d38
Fixed import formatting.
tdas Jan 13, 2014
8d40e72
Get rid of spill map in SparkEnv
andrewor14 Jan 13, 2014
e6ed13f
Merge pull request #397 from pwendell/host-port
rxin Jan 13, 2014
69c9aeb
Enable external sorting by default
andrewor14 Jan 13, 2014
a1f0992
Report bytes spilled for both memory and disk on Web UI
andrewor14 Jan 13, 2014
8ca9773
Add LiveJournalPageRank example
ankurdave Jan 13, 2014
b93f9d4
Merge pull request #400 from tdas/dstream-move
pwendell Jan 13, 2014
5d61e05
Improvements to external sorting
pwendell Jan 13, 2014
ea69cff
Further improve VertexRDD scaladocs
ankurdave Jan 13, 2014
9fe8862
Improve EdgeRDD scaladoc
ankurdave Jan 13, 2014
c3816de
Changing option wording per discussion with Andrew
pwendell Jan 13, 2014
80e4d98
Improving documentation and identifying potential bug in CC calculation.
jegonzal Jan 13, 2014
15ca89b
Fix mapReduceTriplets links in doc
ankurdave Jan 13, 2014
97cd27e
Add graph loader links to doc
ankurdave Jan 13, 2014
27311b1
Added unpersisting and modified testsuite to better test out metadata…
tdas Jan 13, 2014
8038da2
Merge pull request #2 from jegonzal/GraphXCCIssue
ankurdave Jan 13, 2014
30328c3
Updated JavaStreamingContext to make scaladoc compile.
rxin Jan 13, 2014
e2d25d2
Merge branch 'master' into graphx
rxin Jan 14, 2014
01c0d72
Merge pull request #410 from rxin/scaladoc1
rxin Jan 14, 2014
dc041cd
Merge branch 'scaladoc1' of github.com:rxin/incubator-spark into graphx
rxin Jan 14, 2014
c0bb38e
Improved file input stream further.
tdas Jan 14, 2014
1bd5cef
Remove aggregateNeighbors
ankurdave Jan 14, 2014
ae4b75d
Add EdgeDirection.Either and use it to fix CC bug
ankurdave Jan 14, 2014
cfe4a29
Improvements in example code for the programming guide as well as add…
jegonzal Jan 14, 2014
1233b3d
Merge remote-tracking branch 'apache/master' into filestream-fix
tdas Jan 14, 2014
02a8f54
Miscel doc update.
rxin Jan 14, 2014
a4e12af
Merge branch 'graphx' of github.com:ankurdave/incubator-spark into gr…
rxin Jan 14, 2014
87f335d
Made more things private.
rxin Jan 14, 2014
ae06d2c
Updated GraphGenerator.
rxin Jan 14, 2014
1dce9ce
Moved PartitionStrategy's into an object.
rxin Jan 14, 2014
79a5ba3
Yarn Client refactor
colorant Jan 9, 2014
161ab93
Yarn workerRunnable refactor
colorant Jan 9, 2014
622b7f7
Minor changes in graphx programming guide.
jegonzal Jan 14, 2014
552de5d
Finished second pass on pregel docs.
jegonzal Jan 14, 2014
4c22c55
Address comments to fix code formats
colorant Jan 10, 2014
8e5c732
Moved SVDPlusPlusConf into SVDPlusPlus object itself.
rxin Jan 14, 2014
9317286
More cleanup.
rxin Jan 14, 2014
0b18bfb
Updated doc for PageRank.
rxin Jan 14, 2014
0fbc0b0
Merge branch 'graphx' of github.com:ankurdave/incubator-spark into gr…
rxin Jan 14, 2014
d4cd5de
Fix for Kryo Serializer
pwendell Jan 14, 2014
ee8931d
Finished documenting vertexrdd.
jegonzal Jan 14, 2014
9e84e70
Add default value for HadoopRDD's `cloneRecords` constructor arg, to …
harveyfeng Jan 14, 2014
a2fee38
Merge pull request #411 from tdas/filestream-fix
pwendell Jan 14, 2014
33022d6
Adjusted visibility of various components.
rxin Jan 14, 2014
b07bc02
Merge pull request #412 from harveyfeng/master
pwendell Jan 14, 2014
cc93c2a
Disable MLlib tests for now while Jenkins is still on Python 2.6
mateiz Jan 14, 2014
8399341
Wording changes per Patrick
andrewor14 Jan 14, 2014
d4d9ece
Remove Graph.statistics and GraphImpl.printLineage
ankurdave Jan 14, 2014
84d6af8
Make Graph{,Impl,Ops} serializable to work around capture
ankurdave Jan 14, 2014
c6023be
Fix infinite loop in GraphGenerators.generateRandomEdges
ankurdave Jan 14, 2014
59e4384
Fix Pregel SSSP example in programming guide
ankurdave Jan 14, 2014
c28e5a0
Improve scaladoc links
ankurdave Jan 14, 2014
e14a14b
Remove K-Core and LDA sections from guide; they are unimplemented
ankurdave Jan 14, 2014
67795db
Write Graph Builders section in guide
ankurdave Jan 14, 2014
6f6f8c9
Wrap methods in the appropriate class/object declaration
ankurdave Jan 14, 2014
c6dbfd1
Edge object must be public for Edge case class
ankurdave Jan 14, 2014
76ebdae
Fix bug in GraphLoader.edgeListFile that caused srcId > dstId
ankurdave Jan 14, 2014
08b9fec
Merge pull request #409 from tdas/unpersist
pwendell Jan 14, 2014
2cd9358
Finish 6f6f8c928ce493357d4d32e46971c5e401682ea8
ankurdave Jan 14, 2014
af645be
Fix all code examples in guide
ankurdave Jan 14, 2014
0ca0d4d
Merge pull request #401 from andrewor14/master
pwendell Jan 14, 2014
0d94d74
Code clean up for mllib
soulmachine Jan 14, 2014
12386b3
Since getLong() and getInt() have side effect, get back parentheses, …
soulmachine Jan 14, 2014
68641bc
Merge pull request #413 from rxin/scaladoc
pwendell Jan 14, 2014
4bafc4f
adding documentation about EdgeRDD
jegonzal Jan 14, 2014
945fe7a
Merge pull request #408 from pwendell/external-serializers
pwendell Jan 14, 2014
80e73ed
Adding minimal additional functionality to EdgeRDD
jegonzal Jan 14, 2014
4a805af
Merge pull request #367 from ankurdave/graphx
pwendell Jan 14, 2014
c2852cf
Indent two spaces
soulmachine Jan 14, 2014
fdaabdc
Merge pull request #380 from mateiz/py-bayes
pwendell Jan 14, 2014
4e497db
Removed StreamingContext.registerInputStream and registerOutputStream…
tdas Jan 14, 2014
0984647
Enable compression by default for spills
pwendell Jan 14, 2014
055be5c
Merge pull request #415 from pwendell/shuffle-compress
pwendell Jan 14, 2014
a3da468
Merge remote-tracking branch 'upstream/master' into code-style
soulmachine Jan 14, 2014
f8e239e
Merge remote-tracking branch 'apache/master' into filestream-fix
tdas Jan 14, 2014
f8bd828
Fixed loose ends in docs.
tdas Jan 14, 2014
980250b
Merge pull request #416 from tdas/filestream-fix
pwendell Jan 14, 2014
8fb3685
Merge pull request #5 from apache/master
Jan 14, 2014
2303479
Add missing header files
pwendell Jan 14, 2014
fa75e5e
Merge pull request #420 from pwendell/header-files
pwendell Jan 14, 2014
57fcfc7
Added parentheses for that getDouble() also has side effect
soulmachine Jan 14, 2014
486f37c
Improving the graphx-programming-guide.
jegonzal Jan 14, 2014
3fcc68b
Merge pull request #423 from jegonzal/GraphXProgrammingGuide
rxin Jan 14, 2014
0bba773
Additional edits for clarity in the graphx programming guide.
jegonzal Jan 14, 2014
71b3007
Broadcast variable visibility change & doc update.
rxin Jan 14, 2014
6a12b9e
Updated API doc for Accumulable and Accumulator.
rxin Jan 14, 2014
f8c12e9
Added package doc for the Java API.
rxin Jan 14, 2014
55db774
Added license header for package.scala in the Java API package.
rxin Jan 14, 2014
1b5623f
Maintain Serializable API compatibility by reverting back to java.io.…
rxin Jan 14, 2014
f12e506
Fixed a typo in JavaSparkContext's API doc.
rxin Jan 14, 2014
6f965a4
Don't clone records for text files
pwendell Jan 14, 2014
938e4a0
Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)
mateiz Jan 14, 2014
b683608
Deprecate rather than remove old combineValuesByKey function
pwendell Jan 14, 2014
5b3a3e2
Complain if Python and NumPy versions are too old for MLlib
mateiz Jan 14, 2014
2ce23a5
Merge pull request #425 from rxin/scaladoc
rxin Jan 14, 2014
8ea2cd5
Adding fix covering combineCombinersByKey as well
pwendell Jan 14, 2014
b1b22b7
Style fix
pwendell Jan 14, 2014
8ea056d
Add GraphX dependency to examples/pom.xml
ankurdave Jan 14, 2014
d601a76
Merge pull request #427 from pwendell/deprecate-aggregator
rxin Jan 14, 2014
193a075
Merge pull request #429 from ankurdave/graphx-examples-pom.xml
rxin Jan 14, 2014
74b46ac
Merge pull request #428 from pwendell/writeable-objects
rxin Jan 14, 2014
1210ec2
Describe GraphX caching and uncaching in guide
ankurdave Jan 15, 2014
ad294db
Merge pull request #431 from ankurdave/graphx-caching-doc
rxin Jan 15, 2014
3a386e2
Merge pull request #424 from jegonzal/GraphXProgrammingGuide
rxin Jan 15, 2014
148757e
Add deb profile to assembly/pom.xml
markhamstra Jan 15, 2014
f4d9019
VertexID -> VertexId
ankurdave Jan 15, 2014
147a943
Removed repl-bin and updated maven build doc.
markhamstra Jan 15, 2014
dfb1524
Fixed SVDPlusPlusSuite in Maven build.
rxin Jan 15, 2014
1f4718c
Changed SparkConf to not be serializable. And also fixed unit-test lo…
tdas Jan 15, 2014
0e15bd7
Merge remote-tracking branch 'apache/master' into filestream-fix
tdas Jan 15, 2014
087487e
Merge pull request #434 from rxin/graphxmaven
pwendell Jan 15, 2014
139c24e
Merge pull request #435 from tdas/filestream-fix
pwendell Jan 15, 2014
0aea33d
Expose method and class - so that we can use it from user code (parti…
mridulm Jan 15, 2014
3d9e66d
Merge pull request #436 from ankurdave/VertexId-case
rxin Jan 15, 2014
263933d
remove "-XX:+UseCompressedStrings" option
CrazyJvm Jan 15, 2014
cef2af9
Merge pull request #366 from colorant/yarn-dev
tgravescs Jan 15, 2014
494d3c0
Merge pull request #433 from markhamstra/debFix
pwendell Jan 15, 2014
9259d70
GraphX shouldn't list Spark as provided
pwendell Jan 15, 2014
00a3f7e
Workers should use working directory as spark home if it's not specified
pwendell Jan 15, 2014
5fecd25
Merge pull request #441 from pwendell/graphx-build
pwendell Jan 15, 2014
9e63753
Made some classes private[streaming] and deprecated a method in JavaS…
tdas Jan 15, 2014
2a05403
Merge pull request #443 from tdas/filestream-fix
pwendell Jan 15, 2014
59f475c
Merge pull request #442 from pwendell/standalone
pwendell Jan 15, 2014
2ffdaef
Clarify that Python 2.7 is only needed for MLlib
mateiz Jan 15, 2014
4f0c361
Merge pull request #444 from mateiz/py-version
pwendell Jan 15, 2014
a268d63
Fail rather than hanging if a task crashes the JVM.
kayousterhout Jan 16, 2014
0675ca5
Merge pull request #439 from CrazyJvm/master
rxin Jan 16, 2014
7a0c5b5
fix "set MASTER automatically fails" bug.
CrazyJvm Jan 16, 2014
8400536
fix some format problem.
CrazyJvm Jan 16, 2014
84595ea
Merge pull request #414 from soulmachine/code-style
rxin Jan 16, 2014
718a13c
Updated unit test comment
kayousterhout Jan 16, 2014
c06a307
Merge pull request #445 from kayousterhout/exec_lost
rxin Jan 16, 2014
4e510b0
Fixed Window spark shell launch script error.
Qiuzhuang Jan 16, 2014
1a0da89
Address review comments
mridulm Jan 16, 2014
edd82c5
Use method, not variable
mridulm Jan 16, 2014
11e6534
Updated java API docs for streaming, along with very minor changes in…
tdas Jan 16, 2014
fcb4fc6
adding clone records field to equivalent java apis
ScrapCodes Jan 14, 2014
d4fd89e
Merge pull request #438 from ScrapCodes/clone-records-java-api
pwendell Jan 17, 2014
d749d47
Merge pull request #451 from Qiuzhuang/master
pwendell Jan 17, 2014
b690e11
Address review comment
mridulm Jan 17, 2014
e91ad3f
Correct L2 regularized weight update with canonical form
srowen Jan 18, 2014
5316bca
Use renamed shuffle spill config in CoGroupedRDD.scala
pwendell Jan 18, 2014
aa981e4
Merge pull request #461 from pwendell/master
pwendell Jan 18, 2014
bf56995
Merge pull request #462 from mateiz/conf-file-fix
pwendell Jan 19, 2014
4c16f79
Merge pull request #426 from mateiz/py-ml-tests
pwendell Jan 19, 2014
73dfd42
Merge pull request #437 from mridulm/master
pwendell Jan 19, 2014
fe8a354
Merge pull request #459 from srowen/UpdaterL2Regularization
pwendell Jan 19, 2014
720836a
LocalSparkContext for MLlib
ajtulloch Jan 19, 2014
ceb79a3
Only log error on missing jar to allow spark examples to jar.
tgravescs Jan 19, 2014
dd56b21
update comment
tgravescs Jan 19, 2014
256a355
Merge pull request #458 from tdas/docs-update
pwendell Jan 19, 2014
792d908
Merge pull request #470 from tgravescs/fix_spark_examples_yarn
pwendell Jan 19, 2014
cdb003e
Removing docs on akka options
pwendell Jan 21, 2014
54867e9
Minor fixes
pwendell Jan 21, 2014
1b29914
Bug fix for reporting of spill output
pwendell Jan 21, 2014
c324ac1
Force use of LZF when spilling data
pwendell Jan 21, 2014
f84400e
Fixing speculation bug
pwendell Jan 21, 2014
de526ad
Remove shuffle files if they are still present on a machine.
pwendell Jan 21, 2014
d46df96
Avoid matching attempt files in the checkpoint
pwendell Jan 21, 2014
2e95174
Added StreamingContext.awaitTermination to streaming examples.
tdas Jan 21, 2014
e437069
Restricting /lib to top level directory in .gitignore
pwendell Jan 21, 2014
e0b741d
Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM.
tdas Jan 21, 2014
7373ffb
Merge pull request #483 from pwendell/gitignore
rxin Jan 21, 2014
0367981
Merge pull request #482 from tdas/streaming-example-fix
pwendell Jan 21, 2014
6b4eed7
Merge pull request #449 from CrazyJvm/master
rxin Jan 21, 2014
a917a87
Adding small code comment
pwendell Jan 21, 2014
65869f8
Removed SPARK_MEM from run-examples.
tdas Jan 21, 2014
c67d3d8
Merge pull request #484 from tdas/run-example-fix
pwendell Jan 21, 2014
a9bcc98
Style clean-up
pwendell Jan 21, 2014
77b986f
Merge pull request #480 from pwendell/0.9-fixes
pwendell Jan 21, 2014
3a067b4
Fixed import order
ajtulloch Jan 21, 2014
f854498
Merge pull request #469 from ajtulloch/use-local-spark-context-in-tes…
rxin Jan 21, 2014
069bb94
Clarify spark.default.parallelism
ash211 Jan 21, 2014
749f842
Merge pull request #489 from ash211/patch-6
rxin Jan 21, 2014
c205dc7
Merge pull request #6 from apache/master
Jan 22, 2014
2 changes: 1 addition & 1 deletion .gitignore
@@ -44,4 +44,4 @@ derby.log
dist/
spark-*-bin.tar.gz
unit-tests.log
lib/
/lib/
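The change above replaces `lib/` with `/lib/`, which (per standard gitignore pattern semantics) anchors the pattern to the repository root, so nested directories such as `python/lib` are no longer ignored. A minimal throwaway-repo sketch of the difference, assuming `git` is available — the repo, paths, and filenames here are illustrative, not part of this PR:

```shell
# Demonstrate that "/lib/" only ignores the top-level lib directory,
# while nested lib directories stay tracked.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
echo "/lib/" > .gitignore
mkdir -p lib python/lib
touch lib/a.jar python/lib/b.py

# check-ignore exits 0 when the path is ignored, non-zero otherwise
git check-ignore -q lib/a.jar && echo "top-level lib/ is ignored"
git check-ignore -q python/lib/b.py || echo "nested python/lib is still tracked"
```

With the old unanchored `lib/` pattern, both paths would have been ignored.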
2 changes: 1 addition & 1 deletion README.md
@@ -40,7 +40,7 @@ locally with one thread, or "local[N]" to run locally with N threads.

## Running tests

Testing first requires [Building](#Building) Spark. Once Spark is built, tests
Testing first requires [Building](#building) Spark. Once Spark is built, tests
can be run using:

`./sbt/sbt test`
122 changes: 118 additions & 4 deletions assembly/pom.xml
@@ -30,6 +30,13 @@
<name>Spark Project Assembly</name>
<url>http://spark.incubator.apache.org/</url>

<properties>
<spark.jar>${project.build.directory}/scala-${scala.binary.version}/${project.artifactId}-${project.version}-hadoop${hadoop.version}.jar</spark.jar>
<deb.pkg.name>spark</deb.pkg.name>
<deb.install.path>/usr/share/spark</deb.install.path>
<deb.user>root</deb.user>
</properties>

<repositories>
<!-- A repository in the local filesystem for the Py4J JAR, which is not in Maven central -->
<repository>
@@ -79,7 +86,7 @@
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
<outputFile>${project.build.directory}/scala-${scala.binary.version}/${project.artifactId}-${project.version}-hadoop${hadoop.version}.jar</outputFile>
<outputFile>${spark.jar}</outputFile>
<artifactSet>
<includes>
<include>*:*</include>
@@ -108,12 +115,12 @@
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
</transformer>
</transformers>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
<resource>log4j.properties</resource>
</transformer>
</transformers>
</configuration>
</execution>
@@ -171,5 +178,112 @@
</plugins>
</build>
</profile>
<profile>
<id>deb</id>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>buildnumber-maven-plugin</artifactId>
<version>1.1</version>
<executions>
<execution>
<phase>validate</phase>
<goals>
<goal>create</goal>
</goals>
<configuration>
<shortRevisionLength>8</shortRevisionLength>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.vafer</groupId>
<artifactId>jdeb</artifactId>
<version>0.11</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>jdeb</goal>
</goals>
<configuration>
<deb>${project.build.directory}/${deb.pkg.name}_${project.version}-${buildNumber}_all.deb</deb>
<attach>false</attach>
<compression>gzip</compression>
<dataSet>
<data>
<src>${spark.jar}</src>
<type>file</type>
<mapper>
<type>perm</type>
<user>${deb.user}</user>
<group>${deb.user}</group>
<prefix>${deb.install.path}/jars</prefix>
</mapper>
</data>
<data>
<src>${basedir}/src/deb/RELEASE</src>
<type>file</type>
<mapper>
<type>perm</type>
<user>${deb.user}</user>
<group>${deb.user}</group>
<prefix>${deb.install.path}</prefix>
</mapper>
</data>
<data>
<src>${basedir}/../conf</src>
<type>directory</type>
<mapper>
<type>perm</type>
<user>${deb.user}</user>
<group>${deb.user}</group>
<prefix>${deb.install.path}/conf</prefix>
<filemode>744</filemode>
</mapper>
</data>
<data>
<src>${basedir}/../bin</src>
<type>directory</type>
<mapper>
<type>perm</type>
<user>${deb.user}</user>
<group>${deb.user}</group>
<prefix>${deb.install.path}/bin</prefix>
<filemode>744</filemode>
</mapper>
</data>
<data>
<src>${basedir}/../sbin</src>
<type>directory</type>
<mapper>
<type>perm</type>
<user>${deb.user}</user>
<group>${deb.user}</group>
<prefix>${deb.install.path}/sbin</prefix>
<filemode>744</filemode>
</mapper>
</data>
<data>
<src>${basedir}/../python</src>
<type>directory</type>
<mapper>
<type>perm</type>
<user>${deb.user}</user>
<group>${deb.user}</group>
<prefix>${deb.install.path}/python</prefix>
<filemode>744</filemode>
</mapper>
</data>
</dataSet>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
2 changes: 2 additions & 0 deletions assembly/src/deb/RELEASE
@@ -0,0 +1,2 @@
compute-classpath.sh uses the existence of this file to decide whether to put the assembly jar on the
classpath or instead to use classfiles in the source tree.
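The RELEASE file added above is a marker whose mere existence switches `compute-classpath.sh` between the two layouts. A minimal sketch of that kind of check — the paths and variable names here are assumptions for illustration, not the actual `compute-classpath.sh` code:

```shell
# Illustrative marker-file check: if RELEASE exists at the distribution
# root, use the packaged assembly jar; otherwise use source-tree classfiles.
FWDIR="$(mktemp -d)"            # stand-in for the Spark distribution root
touch "$FWDIR/RELEASE"          # packaged releases ship this marker file

if [ -f "$FWDIR/RELEASE" ]; then
  # Release layout: put the prebuilt assembly jar on the classpath
  CLASSPATH="$FWDIR/jars/spark-assembly.jar"
else
  # Source-tree layout: use compiled classfiles instead
  CLASSPATH="$FWDIR/assembly/target/classes"
fi
echo "$CLASSPATH"
```

Since the sketch creates the marker file, it takes the release branch and prints the jar path.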
File renamed without changes.
@@ -38,7 +38,6 @@ class BagelSuite extends FunSuite with Assertions with BeforeAndAfter with Timeo
}
// To avoid Akka rebinding to the same port, since it doesn't unbind immediately on shutdown
System.clearProperty("spark.driver.port")
System.clearProperty("spark.hostPort")
}

test("halting by voting") {
2 changes: 2 additions & 0 deletions bin/compute-classpath.sh
@@ -39,6 +39,7 @@ if [ -f "$FWDIR"/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-dep
CLASSPATH="$CLASSPATH:$FWDIR/repl/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/mllib/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/bagel/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/graphx/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/streaming/target/scala-$SCALA_VERSION/classes"

DEPS_ASSEMBLY_JAR=`ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-deps.jar`
@@ -59,6 +60,7 @@ if [[ $SPARK_TESTING == 1 ]]; then
CLASSPATH="$CLASSPATH:$FWDIR/repl/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/mllib/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/bagel/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/graphx/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/streaming/target/scala-$SCALA_VERSION/test-classes"
fi

7 changes: 1 addition & 6 deletions bin/pyspark
@@ -59,12 +59,7 @@ if [ -n "$IPYTHON_OPTS" ]; then
fi

if [[ "$IPYTHON" = "1" ]] ; then
# IPython <1.0.0 doesn't honor PYTHONSTARTUP, while 1.0.0+ does.
# Hence we clear PYTHONSTARTUP and use the -c "%run $IPYTHONSTARTUP" command which works on all versions
# We also force interactive mode with "-i"
IPYTHONSTARTUP=$PYTHONSTARTUP
PYTHONSTARTUP=
exec ipython "$IPYTHON_OPTS" -i -c "%run $IPYTHONSTARTUP"
exec ipython $IPYTHON_OPTS
else
exec "$PYSPARK_PYTHON" "$@"
fi
20 changes: 12 additions & 8 deletions bin/run-example
@@ -45,20 +45,15 @@ fi
EXAMPLES_DIR="$FWDIR"/examples
SPARK_EXAMPLES_JAR=""
if [ -e "$EXAMPLES_DIR"/target/scala-$SCALA_VERSION/*assembly*[0-9Tg].jar ]; then
# Use the JAR from the SBT build
export SPARK_EXAMPLES_JAR=`ls "$EXAMPLES_DIR"/target/scala-$SCALA_VERSION/*assembly*[0-9Tg].jar`
fi
if [ -e "$EXAMPLES_DIR"/target/spark-examples*[0-9Tg].jar ]; then
# Use the JAR from the Maven build
# TODO: this also needs to become an assembly!
export SPARK_EXAMPLES_JAR=`ls "$EXAMPLES_DIR"/target/spark-examples*[0-9Tg].jar`
fi
if [[ -z $SPARK_EXAMPLES_JAR ]]; then
echo "Failed to find Spark examples assembly in $FWDIR/examples/target" >&2
echo "You need to build Spark with sbt/sbt assembly before running this program" >&2
exit 1
fi


# Since the examples JAR ideally shouldn't include spark-core (that dependency should be
# "provided"), also add our standard Spark classpath, built using compute-classpath.sh.
CLASSPATH=`$FWDIR/bin/compute-classpath.sh`
@@ -81,11 +76,20 @@ else
fi
fi

# Set JAVA_OPTS to be able to load native libraries and to set heap size
JAVA_OPTS="$SPARK_JAVA_OPTS"
JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
# Load extra JAVA_OPTS from conf/java-opts, if it exists
if [ -e "$FWDIR/conf/java-opts" ] ; then
JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"
fi
export JAVA_OPTS

if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
echo -n "Spark Command: "
echo "$RUNNER" -cp "$CLASSPATH" "$@"
echo "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
echo "========================================"
echo
fi

exec "$RUNNER" -cp "$CLASSPATH" "$@"
exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
2 changes: 1 addition & 1 deletion bin/spark-class2.cmd
100644 → 100755
@@ -73,7 +73,7 @@ for %%d in ("%TOOLS_DIR%\target\scala-%SCALA_VERSION%\spark-tools*assembly*.jar"

rem Compute classpath using external script
set DONT_PRINT_CLASSPATH=1
call "%FWDIR%sbin\compute-classpath.cmd"
call "%FWDIR%bin\compute-classpath.cmd"
set DONT_PRINT_CLASSPATH=0
set CLASSPATH=%CLASSPATH%;%SPARK_TOOLS_JAR%

11 changes: 8 additions & 3 deletions bin/spark-shell
@@ -45,13 +45,18 @@ for o in "$@"; do
done

# Set MASTER from spark-env if possible
DEFAULT_SPARK_MASTER_PORT=7077
if [ -z "$MASTER" ]; then
if [ -e "$FWDIR/conf/spark-env.sh" ]; then
. "$FWDIR/conf/spark-env.sh"
fi
if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]]; then
MASTER="spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}"
export MASTER
if [ "x" != "x$SPARK_MASTER_IP" ]; then
if [ "y" != "y$SPARK_MASTER_PORT" ]; then
SPARK_MASTER_PORT="${SPARK_MASTER_PORT}"
else
SPARK_MASTER_PORT=$DEFAULT_SPARK_MASTER_PORT
fi
export MASTER="spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}"
fi
fi

4 changes: 2 additions & 2 deletions bin/spark-shell.cmd
100644 → 100755
@@ -18,6 +18,6 @@ rem limitations under the License.
rem

rem Find the path of sbin
set SBIN=%~dp0..\sbin\
set BIN=%~dp0..\bin\

cmd /V /E /C %SBIN%spark-class2.cmd org.apache.spark.repl.Main %*
cmd /V /E /C %BIN%spark-class2.cmd org.apache.spark.repl.Main %*
5 changes: 4 additions & 1 deletion conf/log4j.properties.template
@@ -1,8 +1,11 @@
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
2 changes: 1 addition & 1 deletion conf/spark-env.sh.template
@@ -18,4 +18,4 @@
# - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node

# - SPARK_WORKER_DIR, to set the working directory of worker processes
10 changes: 10 additions & 0 deletions core/pom.xml
@@ -98,6 +98,11 @@
<groupId>${akka.group}</groupId>
<artifactId>akka-slf4j_${scala.binary.version}</artifactId>
</dependency>
<dependency>
<groupId>${akka.group}</groupId>
<artifactId>akka-testkit_${scala.binary.version}</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
@@ -165,6 +170,11 @@
<artifactId>scalatest_${scala.binary.version}</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-all</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalacheck</groupId>
<artifactId>scalacheck_${scala.binary.version}</artifactId>
@@ -31,15 +31,16 @@

class FileClient {

private Logger LOG = LoggerFactory.getLogger(this.getClass().getName());
private static final Logger LOG = LoggerFactory.getLogger(FileClient.class.getName());

private final FileClientHandler handler;
private Channel channel = null;
private Bootstrap bootstrap = null;
private EventLoopGroup group = null;
private final int connectTimeout;
private final int sendTimeout = 60; // 1 min

public FileClient(FileClientHandler handler, int connectTimeout) {
FileClient(FileClientHandler handler, int connectTimeout) {
this.handler = handler;
this.connectTimeout = connectTimeout;
}
@@ -25,7 +25,7 @@ class FileClientChannelInitializer extends ChannelInitializer<SocketChannel> {

private final FileClientHandler fhandler;

public FileClientChannelInitializer(FileClientHandler handler) {
FileClientChannelInitializer(FileClientHandler handler) {
fhandler = handler;
}

@@ -33,15 +33,14 @@
*/
class FileServer {

private Logger LOG = LoggerFactory.getLogger(this.getClass().getName());
private static final Logger LOG = LoggerFactory.getLogger(FileServer.class.getName());

private EventLoopGroup bossGroup = null;
private EventLoopGroup workerGroup = null;
private ChannelFuture channelFuture = null;
private int port = 0;
private Thread blockingThread = null;

public FileServer(PathResolver pResolver, int port) {
FileServer(PathResolver pResolver, int port) {
InetSocketAddress addr = new InetSocketAddress(port);

// Configure the server.
@@ -70,7 +69,8 @@ public FileServer(PathResolver pResolver, int port) {
* Start the file server asynchronously in a new thread.
*/
public void start() {
blockingThread = new Thread() {
Thread blockingThread = new Thread() {
@Override
public void run() {
try {
channelFuture.channel().closeFuture().sync();