Mastering Apache Spark 2 (Spark 2.2+)


Srinivas Reddy (@mrsrinivas) started discussion #96

a year ago · 0 comments


Spark relies on data locality (also known as data placement, or proximity to the data source), which makes Spark jobs sensitive to where the data is located. It is therefore important to run Spark on a Hadoop YARN cluster if the data comes from HDFS.
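How aggressively the scheduler holds out for a local slot before falling back to a less local one is configurable. A minimal sketch of the relevant `spark-defaults.conf` properties, assuming Spark's documented defaults (shown here as illustrative values, not tuning advice):

```
# spark-defaults.conf (sketch)
# How long to wait for a preferred-locality slot before degrading
# one level (PROCESS_LOCAL -> NODE_LOCAL -> RACK_LOCAL -> ANY).
spark.locality.wait        3s
# Per-level overrides; each falls back to spark.locality.wait if unset.
spark.locality.wait.node   3s
spark.locality.wait.rack   3s
```

Raising these values makes the scheduler wait longer for data-local executors; setting them to `0` disables the wait and schedules tasks wherever capacity is free.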

Data Locality

Won't Spark standalone also schedule tasks in a NODE_LOCAL way?



