IntelliJ IDEA is a good choice for reading the spark source code. This tutorial shows how to set it up.
Get spark repository
- Fork the apache spark project to your GitHub account
- Clone spark to local:

      $ git clone git@github.com:username/spark.git
      $ cd spark/

- Add the apache spark remote (to keep up-to-date with the apache spark repo):

      $ git remote add apache https://github.com/apache/spark.git
      # check remote accounts
      $ git remote -v

- Sync your repo with apache spark:

      # Fetch the branches and their respective commits from the apache repo
      $ git fetch apache
      # Update code
      $ git pull apache master

- Push new updates to your own GitHub repo:

      $ git push origin master

- Create a new develop branch for development:

      $ git checkout -b develop

- Push the develop branch to your GitHub repo:

      $ git push -u origin develop
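Before moving on, it can help to confirm that both remotes from the steps above are in place. The following is a minimal sketch, not part of spark or git: `check_remotes` is a hypothetical helper name, and it assumes the remotes are called `origin` and `apache` as in this tutorial and that your git is recent enough to support `git remote get-url`.

```shell
# Hypothetical helper: succeed only if the repository in directory $1
# (default: current directory) has both an "origin" and an "apache" remote.
check_remotes() {
    repo_dir="${1:-.}"
    for name in origin apache; do
        # `git remote get-url` exits non-zero when the remote does not exist.
        if ! git -C "$repo_dir" remote get-url "$name" >/dev/null 2>&1; then
            echo "missing remote: $name"
            return 1
        fi
    done
    echo "remotes ok"
}
```

Calling `check_remotes` inside the cloned `spark/` directory reports the first missing remote, if any.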
Build spark in IntelliJ IDEA 15
- Install IntelliJ IDEA 15 as well as the IDEA Scala Plugin
- Make sure you are on your own develop branch:

      $ git checkout develop

- Open the spark project in IDEA (directly open the pom.xml file):

      Menu -> File -> Open -> {spark}/pom.xml

- Modify java.version inside pom.xml to match your java version:

      # pom.xml
      <java.version>1.8</java.version>

- Build spark with sbt:

      $ build/sbt assembly

- Validate that spark was built successfully:

      $ ./bin/spark-shell
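To double-check the java.version step before starting a long build, the value in pom.xml can be read back from the command line. This is a rough sketch: `pom_java_version` is a hypothetical helper name, and it assumes the property sits on a single line in the form shown above.

```shell
# Hypothetical helper: print the value of the first <java.version> element
# found in the given pom.xml (default: ./pom.xml).
pom_java_version() {
    # ':' is used as the sed delimiter so the '/' in the closing tag
    # does not need escaping.
    sed -n 's:.*<java\.version>\(.*\)</java\.version>.*:\1:p' "${1:-pom.xml}" | head -n 1
}
```

Running `pom_java_version` in the spark directory and comparing the result with what `java -version` reports is a quick way to spot a mismatch.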
Reading spark code
It’s better to read and change code on your develop branch and to use the master branch only for syncing with the apache spark repo. So normally, after each sync, you update your develop branch with the new commits from master.
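One possible way to carry out that update is sketched below. It is an assumption, not the only workflow: it presumes the remote and branch names used earlier in this tutorial (`apache`, `master`, `develop`) and that merging master into develop, rather than rebasing, suits your branch. `update_develop` is a hypothetical function name.

```shell
# Sketch only: sync master with the apache remote, then merge it into develop.
# Each step runs only if the previous one succeeded.
update_develop() {
    git checkout master &&
    git pull apache master &&
    git checkout develop &&
    git merge master
}
```

Run `update_develop` from inside the spark directory. If develop has local commits that conflict with upstream changes, the final merge stops and leaves the conflicts for you to resolve.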
Useful IDEA Shortcuts:
command + o : search classes
command + b : go to declaration (option + command + b : go to implementation)
command + [ : go back to the previous location
shift + command + F : search for text in files (Find in Path)
Several important classes:
SparkContext.scala
DAGScheduler.scala
TaskSchedulerImpl.scala
BlockManager.scala
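Outside the IDE, the files listed above can also be located from the command line. The snippet below is a small sketch: `find_spark_classes` is a hypothetical helper name, and it simply searches the checkout by file name rather than assuming any particular directory layout.

```shell
# Hypothetical helper: print the path of each source file listed above
# that exists under the given spark checkout (default: current directory).
find_spark_classes() {
    root="${1:-.}"
    for f in SparkContext.scala DAGScheduler.scala TaskSchedulerImpl.scala BlockManager.scala; do
        find "$root" -type f -name "$f"
    done
}
```

Running `find_spark_classes` in the spark directory prints one path per file found, which you can then open directly in IDEA.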
Ref: Building Spark: http://spark.apache.org/docs/latest/building-spark.html