Printing the Spark, Hadoop, and system configs



I changed the Google Cloud Storage connector's write buffer size when starting up an experimental Dataproc cluster with

  --properties 'core:fs.gs.io.buffersize.write=1048576'

and found these commands (adapted from a Stack Overflow post) useful for verifying that such changes actually took effect. You'll need access to the Hail context (hc) via Env:

from hail.utils.java import Env

To print the Spark config:

print(Env().hc().sc.getConf().getAll())

To print the Hadoop config:

hadoopConf = {}
iterator = Env().hc().sc._jsc.hadoopConfiguration().iterator()
while iterator.hasNext():
    prop = iterator.next()
    hadoopConf[prop.getKey()] = prop.getValue()
for item in sorted(hadoopConf.items()): print(item)
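Since the Hadoop configuration is only reachable on a live cluster, here is a runnable sketch of the same iterator-draining pattern using stand-in objects (FakeEntry and FakeJavaIterator are mocks for the py4j-wrapped Java objects, not Hail or Spark APIs; the property values are made up for illustration):

```python
# Mocks for the Java Map.Entry and Iterator objects that py4j would return
# from hadoopConfiguration().iterator(); not part of Hail or Spark.
class FakeEntry:
    def __init__(self, key, value):
        self._key, self._value = key, value
    def getKey(self):
        return self._key
    def getValue(self):
        return self._value

class FakeJavaIterator:
    def __init__(self, entries):
        self._entries = list(entries)
        self._i = 0
    def hasNext(self):
        return self._i < len(self._entries)
    def next(self):
        entry = self._entries[self._i]
        self._i += 1
        return entry

iterator = FakeJavaIterator([
    FakeEntry("fs.gs.io.buffersize.write", "1048576"),
    FakeEntry("fs.defaultFS", "file:///"),
])

# Same drain-into-a-dict loop as above, then print sorted by key.
hadoopConf = {}
while iterator.hasNext():
    prop = iterator.next()
    hadoopConf[prop.getKey()] = prop.getValue()
for item in sorted(hadoopConf.items()):
    print(item)
```

On a real cluster, swapping FakeJavaIterator for Env().hc().sc._jsc.hadoopConfiguration().iterator() gives exactly the loop shown above.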

To print the shell environment variables (os.environ gives environment variables, not JVM system properties):

import os
for item in sorted(os.environ.items()): print(item)
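On a busy cluster node the full environment listing is long, so filtering by name prefix can help. A small sketch (the SPARK_ prefix and the env_with_prefix helper are just examples, not anything from Hail):

```python
import os

def env_with_prefix(prefix):
    # Collect environment variables whose names start with the given prefix.
    return {k: v for k, v in os.environ.items() if k.startswith(prefix)}

# Print only Spark-related environment variables, sorted by name.
for name, value in sorted(env_with_prefix("SPARK_").items()):
    print(f"{name}={value}")
```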

The GCS connector's default settings are listed here: https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/conf/gcs-core-default.xml