AdStax Spark Job Manager
The AdStax Spark Job Manager is a gem to manage Spark jobs running in an AdStax cluster.
Installation
From RubyGems
Make sure you have Ruby (at least v2.0.0) installed, and just run:
$ gem install adstax-spark-job-manager
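If you are not sure which Ruby version you have installed, you can check it first:

$ ruby -v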
From source
Clone this repo and build the gem:
$ git clone git://github.com/ShiftForward/adstax-spark-job-manager.git
$ gem build adstax-spark-job-manager.gemspec
$ gem install adstax-spark-job-manager-0.2.0.gem
Usage
The AdStax Spark Job Manager publishes an adstax-spark-job-manager binary, which provides a set of utilities to submit, kill and query the status of Spark jobs running on an AdStax cluster. See the command's help (by running it with -h) for more details.
The available commands are submit, kill, status and log. To submit a job, you must provide the --adstax-host parameter (pointing to where the AdStax instance is running) and a --jar parameter pointing to a jar that bundles your application with all of its required dependencies and includes at least one implementation of eu.shiftforward.adstax.spark.SparkJob. Note that you don’t need to bundle the spark-core dependency, as it is provided at runtime. The --job parameter should be the fully qualified name of the class extending eu.shiftforward.adstax.spark.SparkJob that is to be run as the Spark job. Everything following the required parameters is passed as arguments to the SparkJob. For example, in order to submit the SparkPi example, one can use the following command:
$ adstax-spark-job-manager submit --adstax-host sample-adstax-instance.dev.adstax.io --jar http://s3.amazonaws.com/shiftforward-public/bin/spark/adstax-spark-examples-1.0.jar --job eu.shiftforward.adstax.spark.examples.SparkPi 100
This command should return information about the submission, for example:
{ "action" : "CreateSubmissionResponse", "serverSparkVersion" : "2.0.0-SNAPSHOT", "submissionId" : "driver-20160713161243-0002", "success" : true }
You can now use the returned submission id to query the status of the job, as well as list its standard output. In order to query the status of the job, use the status command:
$ adstax-spark-job-manager status --adstax-host sample-adstax-instance.dev.adstax.io --submission-id driver-20160713161243-0002
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "message" : "task_id {\n value: \"driver-20160713161243-0002\"\n}\nstate: TASK_FINISHED\nmessage: \"Command exited with status 0\"\nslave_id {\n value: \"9f18159e-ebe9-4a70-89e1-9774adf2cdd6-S9\"\n}\ntimestamp: 1.468426400438861E9\nexecutor_id {\n value: \"driver-20160713161243-0002\"\n}\nsource: SOURCE_EXECUTOR\n11: \"A\\371\\330\\365+\\027Ds\\237\\243\\\"\\317\\276\\353\\363\\367\"\n13: \"\\n\\036\\022\\f10.0.174.173*\\016\\022\\f10.0.174.173\"\n",
  "serverSparkVersion" : "2.0.0-SNAPSHOT",
  "submissionId" : "driver-20160713161243-0002",
  "success" : true
}
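When scripting around the job manager, the driverState field can be polled until the driver reaches a terminal state. The loop below is only a sketch: it assumes jq is installed, reuses the SUBMISSION_ID variable from the earlier sketch, and assumes the driver states follow Spark's usual cluster-mode values (e.g. RUNNING, FINISHED, FAILED, KILLED, ERROR):

# poll the job status every 10 seconds until the driver reaches a terminal state (sketch)
while :; do
  STATE=$(adstax-spark-job-manager status --adstax-host sample-adstax-instance.dev.adstax.io --submission-id $SUBMISSION_ID | jq -r .driverState)
  echo "driver state: $STATE"
  case "$STATE" in
    FINISHED|FAILED|KILLED|ERROR) break ;;
  esac
  sleep 10
done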
The log command allows you to output the stdout and stderr of the job’s driver. You can hide the stderr with the --hide-stderr flag and keep tailing the output with the --follow flag:
$ adstax-spark-job-manager log --adstax-host sample-adstax-instance.dev.adstax.io --submission-id driver-20160713161243-0002 --hide-stderr --follow
Registered executor on ec2-54-87-240-29.compute-1.amazonaws.com
Starting task driver-20160713161243-0002
Forked command at 22260
sh -c 'cd spark-2*; bin/spark-submit --name eu.shiftforward.adstax.spark.SparkJobRunner --master mesos://zk://zk.sample-adstax-instance.dev.adstax.io:2181/mesos --driver-cores 1.0 --driver-memory 1024M --class eu.shiftforward.adstax.spark.SparkJobRunner --conf spark.driver.supervise=false --conf spark.app.name=eu.shiftforward.adstax.spark.SparkJobRunner --conf spark.es.port=49200 --conf spark.es.nodes=localhost --conf spark.mesos.coarse=false --conf spark.executor.uri=https://s3.amazonaws.com/shiftforward-public/bin/spark/spark-2.0.0-SNAPSHOT-bin-2.4.0.tgz ../adstax-spark-examples-1.0.jar --job eu.shiftforward.adstax.spark.examples.SparkPi 100'
Pi is roughly 3.1407
Command exited with status 0 (pid: 22260)
The kill command allows you to cancel an ongoing job. Killing already finished jobs has no effect:
$ adstax-spark-job-manager kill --adstax-host sample-adstax-instance.dev.adstax.io --submission-id driver-20160713161243-0002
{
  "action" : "KillSubmissionResponse",
  "message" : "Driver already terminated",
  "serverSparkVersion" : "2.0.0-SNAPSHOT",
  "submissionId" : "driver-20160713161243-0002",
  "success" : false
}