Spark Important Configurations

Memory Configurations

yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb

spark.driver.memory
spark.driver.memoryOverhead

spark.executor.memoryOverhead
spark.executor.memory
spark.memory.fraction
spark.memory.storageFraction
spark.executor.cores
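
To make the two fraction settings concrete (Spark's documentation names them spark.memory.fraction and spark.memory.storageFraction), here is a rough sketch of how the executor heap is divided. It assumes the documented defaults of 300 MiB reserved memory, a fraction of 0.6 and a storage fraction of 0.5. The function name and the dictionary keys are illustrative only and are not part of any Spark API.

```python
RESERVED_MIB = 300  # heap Spark sets aside before the fractions are applied


def unified_memory_regions(executor_memory_mib,
                           memory_fraction=0.6,
                           storage_fraction=0.5):
    """Approximate split of the executor heap (all values in MiB)."""
    usable = executor_memory_mib - RESERVED_MIB
    if usable <= 0:
        raise ValueError("executor memory must exceed the reserved 300 MiB")
    unified = usable * memory_fraction    # shared by execution and storage
    storage = unified * storage_fraction  # storage share protected from eviction
    return {
        "unified_mib": unified,
        "storage_mib": storage,
        "execution_mib": unified - storage,
        "user_mib": usable - unified,     # user data structures and metadata
    }


# Example: spark.executor.memory=4g
regions = unified_memory_regions(4096)
for name, value in regions.items():
    print(f"{name}: {value:.1f}")
```

Execution and storage can borrow from each other at runtime, so these numbers are starting boundaries rather than hard limits.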

spark.memory.offHeap.enabled
spark.memory.offHeap.size
spark.executor.pyspark.memory
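
The settings above add up to the size of the container that YARN must grant for each executor. The sketch below shows that arithmetic, assuming the documented default overhead of max(384 MiB, 10% of executor memory) and ignoring YARN's rounding to its minimum allocation increment. The helper names are hypothetical.

```python
MIN_OVERHEAD_MIB = 384   # documented floor for the default memoryOverhead
OVERHEAD_FACTOR = 0.10   # documented default overhead factor


def executor_container_mib(executor_memory_mib,
                           memory_overhead_mib=None,
                           offheap_enabled=False,
                           offheap_size_mib=0,
                           pyspark_memory_mib=0):
    """Approximate YARN container request for one executor, in MiB."""
    if memory_overhead_mib is None:
        # Default used when spark.executor.memoryOverhead is not set
        memory_overhead_mib = max(MIN_OVERHEAD_MIB,
                                  int(executor_memory_mib * OVERHEAD_FACTOR))
    # Off-heap size only counts when spark.memory.offHeap.enabled=true
    offheap = offheap_size_mib if offheap_enabled else 0
    return executor_memory_mib + memory_overhead_mib + offheap + pyspark_memory_mib


def executors_per_node(container_mib, nodemanager_memory_mb):
    """How many such containers fit in yarn.nodemanager.resource.memory-mb."""
    return nodemanager_memory_mb // container_mib


# Example: spark.executor.memory=8g, overhead left at its default
container = executor_container_mib(8192)
print(container)                             # 9011
print(container <= 16384)                    # fits yarn.scheduler.maximum-allocation-mb=16384
print(executors_per_node(container, 65536))  # 7
```

If the container exceeds yarn.scheduler.maximum-allocation-mb, YARN rejects the request, so the sum is worth checking before raising any single setting.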

Adaptive Query Execution

spark.sql.adaptive.enabled=true
spark.sql.shuffle.partitions=10
spark.sql.autoBroadcastJoinThreshold=10MB

spark.sql.adaptive.coalescePartitions.enabled
spark.sql.adaptive.coalescePartitions.initialPartitionNum
spark.sql.adaptive.coalescePartitions.minPartitionNum

spark.sql.adaptive.localShuffleReader.enabled=true
spark.sql.adaptive.advisoryPartitionSizeInBytes
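
As a rough illustration of what partition coalescing does with the advisory size, the sketch below greedily merges adjacent shuffle partitions until adding the next one would exceed the target. This is a simplified model, not Spark's actual implementation, which also takes minimum partition sizes and multiple shuffle inputs into account. The function name is hypothetical.

```python
def coalesce_partitions(sizes, advisory_size):
    """Group contiguous partition indices so each group stays near advisory_size."""
    groups = []
    current, current_size = [], 0
    for index, size in enumerate(sizes):
        # Close the current group if this partition would push it over the target
        if current and current_size + size > advisory_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Partition sizes in MB, advisory size of 64 MB
print(coalesce_partitions([10, 20, 40, 5, 70, 8, 8], 64))
# [[0, 1], [2, 3], [4], [5, 6]]
```

Only neighbouring partitions are merged, which keeps each reducer reading a contiguous range of shuffle output.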

spark.sql.adaptive.skewJoin.enabled=true
spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=256MB
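
The factor and threshold settings above work together: per Spark's documentation, a partition is treated as skewed only when it is larger than the factor times the median partition size and also larger than the byte threshold. A minimal sketch of that rule follows, with a hypothetical function name and made-up partition sizes.

```python
from statistics import median

MB = 1024 * 1024


def skewed_partitions(sizes_bytes, factor=5, threshold_bytes=256 * MB):
    """Return indices of partitions that satisfy both skew conditions."""
    med = median(sizes_bytes)
    return [
        index
        for index, size in enumerate(sizes_bytes)
        if size > factor * med and size > threshold_bytes
    ]


# One 400 MB partition among ~55 MB partitions: both conditions hold
print(skewed_partitions([50 * MB, 60 * MB, 55 * MB, 400 * MB, 58 * MB]))  # [3]

# A 200 MB partition is far above the median but below the 256 MB threshold
print(skewed_partitions([10 * MB, 12 * MB, 11 * MB, 200 * MB, 9 * MB]))   # []
```

The second case shows why lowering only the factor has no effect on small datasets: the byte threshold still has to be crossed.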

Partitioning & Caching

//Pruning
spark.sql.optimizer.dynamicPartitionPruning.enabled

Cache Vs Persist

Hints & Accumulators

Partitioning Hints: COALESCE, REPARTITION, REPARTITION_BY_RANGE, REBALANCE

Join Hints: BROADCAST, MERGE, SHUFFLE_HASH, SHUFFLE_REPLICATE_NL

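Accumulators – Spark guarantees accuracy only for updates made inside actions (each task's update is applied exactly once); updates made inside transformations may be applied more than once if a task is re-executed.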

Speculative Execution

spark.speculation=true
spark.speculation.interval=100ms
spark.speculation.multiplier=1.5
spark.speculation.quantile=0.75
spark.speculation.minTaskRuntime=100ms
spark.speculation.task.duration.threshold=None
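
The quantile, multiplier and minimum runtime settings combine into a single check that the scheduler repeats every spark.speculation.interval. The sketch below models that check under the values listed above: once the quantile of tasks in a stage has finished, any running task slower than max(multiplier x median finished duration, minTaskRuntime) becomes a candidate for a speculative copy. The function and its argument names are hypothetical, and the real scheduler tracks more state than this.

```python
from statistics import median


def speculatable_tasks(finished_ms, running_ms,
                       quantile=0.75, multiplier=1.5, min_task_runtime_ms=100):
    """finished_ms: durations of finished tasks; running_ms: {task_id: elapsed ms}."""
    total = len(finished_ms) + len(running_ms)
    min_finished = max(int(quantile * total), 1)
    if len(finished_ms) < min_finished:
        return []  # not enough of the stage has completed yet
    threshold = max(multiplier * median(finished_ms), min_task_runtime_ms)
    return [task_id for task_id, elapsed in running_ms.items()
            if elapsed > threshold]


finished = [1000, 1100, 900, 1200, 1000, 950]  # 6 of 8 tasks are done
running = {"t6": 2000, "t7": 1200}
print(speculatable_tasks(finished, running))   # ['t6']
```

Here the median finished duration is 1000 ms, so the cutoff is 1500 ms and only t6 qualifies.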

Dynamic Resource Allocation

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60s
spark.dynamicAllocation.schedulerBacklogTimeout=1s
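
With dynamic allocation enabled, Spark's documentation describes executor requests as growing exponentially for as long as tasks stay backlogged: 1 in the first round, then 2, 4, 8 and so on. The sketch below models only that ramp-up. It assumes an upper bound corresponding to spark.dynamicAllocation.maxExecutors, which is not listed above, and ignores the cap imposed by the number of pending tasks. The function name is hypothetical.

```python
def executors_requested_per_round(rounds, max_executors):
    """Executors requested in each backlog round, doubling until the cap is hit."""
    requested = []
    total, step = 0, 1
    for _ in range(rounds):
        grant = min(step, max_executors - total)
        if grant <= 0:
            break  # cap reached, nothing more to request
        requested.append(grant)
        total += grant
        step *= 2
    return requested


# Five backlog rounds with at most 10 executors
print(executors_requested_per_round(5, 10))  # [1, 2, 4, 3]
```

The slow start avoids over-requesting for short bursts, while the doubling lets a sustained backlog reach the cap within a few rounds.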

Spark Schedulers

spark.scheduler.mode=FAIR