Apache Spark – PySpark kernel in JupyterHub, access to master remotely

I have two kernels for Spark, one that runs locally and one that runs against a cluster. Is there a way to set an environment variable pointing at my Spark master, so that users of the cluster kernel don't have to define the master in the SparkContext themselves? I tried export SPARK_MASTER_HOST='spark://my-server.domain.com:7077', but that did not work.
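One approach that often works for Jupyter kernels (a sketch, not verified against this exact setup) is to export PYSPARK_SUBMIT_ARGS for the cluster kernel, for example through the "env" block of that kernel's kernel.json. PySpark's launcher reads that variable when it starts the JVM gateway, so the notebook code no longer needs to pass a master. The value shown below is illustrative and assumes the standalone master URL uses the spark:// scheme.

import os
from pyspark import SparkContext

# Assumed to be set in the cluster kernel's environment (e.g. the "env"
# block of its kernel.json); setting it here works too, as long as it
# happens before the first SparkContext is created in this process.
os.environ.setdefault(
    "PYSPARK_SUBMIT_ARGS",
    "--master spark://my-server.domain.com:7077 pyspark-shell",
)

# No master argument needed: the launcher picks it up from the environment.
sc = SparkContext(appName="test")
print(sc.master)
sc.stop()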

Code

from pyspark import SparkContext
import random

def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1


# Standalone master URLs use the spark:// scheme (two slashes).
sc = SparkContext(master="spark://my-server.domain.com:7077", appName="test")

num_samples = 2

count = sc.parallelize(range(0, num_samples)).filter(inside).count()

pi = 4 * count / num_samples
print(pi)

sc.stop()
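If setting PYSPARK_SUBMIT_ARGS is not an option, the kernel code itself can read the master from an environment variable and fall back to local mode when it is unset; the variable name below simply mirrors the one tried above and is otherwise arbitrary.

import os
from pyspark import SparkContext

# Use the cluster master when the variable is exported for this kernel,
# otherwise fall back to running locally.
master = os.environ.get("SPARK_MASTER_HOST", "local[*]")
sc = SparkContext(master=master, appName="test")
print(sc.master)
sc.stop()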
