jaceklaskowski
Mastering Apache Spark 2 (Spark 2.2+)


Srinivas Reddy (@mrsrinivas) started discussion #95

a year ago

Open

Spark relies on data locality (aka data placement, or proximity to the data source), which makes Spark jobs sensitive to where the data is located. It is therefore important to run Spark on a Hadoop YARN cluster if the data comes from HDFS.

Data Locality
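For context, the quoted passage is about the scheduler's placement preferences. As a minimal sketch of how those preferences surface in the API, assuming a Spark 2.x session and a hypothetical `hdfs://` path, the snippet below prints each partition's preferred hosts (the HDFS block locations the scheduler tries to honour for NODE_LOCAL placement) and sets `spark.locality.wait`, which controls how long the scheduler waits before downgrading a task's locality level:

```scala
import org.apache.spark.sql.SparkSession

object LocalityDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("locality-demo")
      // How long to wait for a locality-preferred slot before falling back
      // to a less local level (NODE_LOCAL -> RACK_LOCAL -> ANY); default 3s.
      .config("spark.locality.wait", "3s")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical HDFS path; HadoopRDD derives preferred locations
    // from the HDFS block locations reported by the NameNode.
    val rdd = sc.textFile("hdfs:///data/events.log")

    // Preferred hosts per partition, i.e. where the scheduler
    // would attempt NODE_LOCAL task placement.
    rdd.partitions.take(3).foreach { p =>
      println(s"partition ${p.index}: ${rdd.preferredLocations(p).mkString(", ")}")
    }

    spark.stop()
  }
}
```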

Won't Spark standalone schedule tasks in a NODE_LOCAL way?

Srinivas Reddy @mrsrinivas commented a year ago

Do we really have to go for Spark on YARN for it?

