What is spark.driver.maxResultSize?
Assuming that a worker wants to send 4G of data to the driver, will setting spark.driver.maxResultSize=1G cause the worker to send 4 messages (instead of 1 with an unlimited spark.driver.maxResultSize)?
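No. spark.driver.maxResultSize limits the total size of the serialized results of all partitions for a single Spark action (for example collect). Results are not split into smaller messages: if the estimated size of the data is larger than maxResultSize, the given job is aborted. The goal here is to protect your application from driver loss (typically an out-of-memory error on the driver), nothing more.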
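To make this concrete, here is a minimal Python sketch of the check under simplifying assumptions: each finished task reports the serialized size of its partition's result, and the driver keeps a running total per action. The names `ResultSizeExceeded` and `fetch_results` are hypothetical. This models the documented behaviour (abort once the total exceeds the limit, `0` meaning unlimited) and is not Spark's actual code.

```python
class ResultSizeExceeded(Exception):
    """Hypothetical stand-in for Spark aborting the job."""


def fetch_results(partition_sizes, max_result_size):
    """Accumulate serialized result sizes (in bytes) as tasks finish.

    max_result_size == 0 means unlimited. Nothing is chunked: as soon as
    the running total exceeds the limit, the whole job is aborted.
    """
    total = 0
    for size in partition_sizes:
        total += size
        if max_result_size > 0 and total > max_result_size:
            raise ResultSizeExceeded(
                f"total {total} bytes exceeds limit {max_result_size} bytes"
            )
    return total


GiB = 1024 ** 3

# 4 GiB of results with a 1 GiB limit: the job is aborted, not sent in 4 pieces.
try:
    fetch_results([GiB] * 4, max_result_size=1 * GiB)
except ResultSizeExceeded as e:
    print("aborted:", e)

# Same data with the limit disabled (0): all 4 GiB reach the driver.
print(fetch_results([GiB] * 4, max_result_size=0))  # → 4294967296
```

The abort happens as soon as the running total crosses the limit, so with four 1 GiB partitions the sketch gives up after the second one rather than transferring the data in installments.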
If I set it to 1M (the minimum), will that be the most protective approach?
In a sense, yes, but obviously it is not useful in practice, because many ordinary actions that return results to the driver would be aborted. A good value should allow the application to proceed normally while still protecting it from unexpected conditions.
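The trade-off can be sketched in a few lines of Python. The workload sizes below are hypothetical (a routine ~2 MiB `collect()` and an accidental ~4 GiB one), and `would_abort` is an illustrative helper, not a Spark API. The limits compared are the 1m minimum, Spark's documented default of 1g, and 0 (unlimited).

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def would_abort(total_result_bytes, max_result_size):
    """True if an action whose serialized results total this many bytes
    would exceed the limit (0 means unlimited)."""
    return max_result_size > 0 and total_result_bytes > max_result_size


routine = 2 * MiB  # hypothetical: a normal collect() of aggregated results
runaway = 4 * GiB  # hypothetical: accidentally collecting a large table

for label, limit in [("1m (minimum)", 1 * MiB), ("1g (default)", 1 * GiB), ("0 (unlimited)", 0)]:
    print(f"{label:>14}: routine aborted={would_abort(routine, limit)}, "
          f"runaway aborted={would_abort(runaway, limit)}")
```

With 1m both jobs are aborted, with 0 neither is, and only a limit between the two sizes lets the routine job through while stopping the runaway one. That middle ground is what "a good value" means here; the right number depends on the driver's memory and the application's real result sizes.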