I have two kernels for Spark: one that runs locally and one that runs against a cluster. Is there a way to set an environment variable for my Spark master, so that users of the cluster kernel don't have to pass `master` to `SparkContext` themselves? I tried `export SPARK_MASTER_HOST='spark://my-server.domain.com:7077'`, but that did not work.
from pyspark import SparkContext
import random

def inside(p):
    # Monte Carlo test: is a random point inside the unit quarter-circle?
    x, y = random.random(), random.random()
    return x*x + y*y < 1

# Master is currently hard-coded here, which is what I want to avoid
sc = SparkContext(master="spark://my-server.domain.com:7077", appName="test")
num_samples = 2
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
print(pi)
sc.stop()
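For context, `SPARK_MASTER_HOST` is read by the standalone master daemon itself (to set its bind address), not by client sessions, which is likely why the export had no effect. One common approach is to set the master in the kernel spec's `env` block via `PYSPARK_SUBMIT_ARGS`, which PySpark reads at startup. A sketch of such a `kernel.json`, assuming an ipykernel-based kernel and reusing the server name from the question (the `SPARK_HOME` path is a placeholder):

```json
{
  "display_name": "PySpark (cluster)",
  "language": "python",
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/opt/spark",
    "PYSPARK_SUBMIT_ARGS": "--master spark://my-server.domain.com:7077 pyspark-shell"
  }
}
```

With something like this in place, the notebook code could create the context as `SparkContext(appName="test")` with no `master` argument in that kernel, while the local kernel omits the variable.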