jaceklaskowski
Mastering Apache Spark 2 (Spark 2.2+)

Updated 2 months ago

ryan-factual (@ryan-factual) started discussion #145

10 months ago · 0 comments

Open

Can the number of partitions be fewer than the number of blocks?

The [Spark programming guide](http://spark.apache.org/docs/latest/programming-guide.html#external-datasets) says:

> Note that you cannot have fewer partitions than blocks.

But [Partitions and Partitioning](https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-rdd-partitions.html) says:

> Ideally, you would get the same number of blocks as you see in HDFS, but if the lines in your file are too long (longer than the block size), there will be fewer partitions.

These two statements contradict each other.
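For reference, the partition count of `sc.textFile` comes from Hadoop's `FileInputFormat.getSplits`. Below is a rough Python sketch of that split computation (a simplification for illustration; the function name and parameter defaults are mine, not Spark's or Hadoop's API):

```python
def num_splits(total_size, block_size, min_partitions=2, min_split_size=1):
    """Approximate FileInputFormat.getSplits() as used by sc.textFile().

    total_size      -- file size in bytes
    block_size      -- HDFS block size in bytes
    min_partitions  -- the minPartitions hint passed to textFile()
    min_split_size  -- mapreduce.input.fileinputformat.split.minsize
    """
    # Aim for min_partitions splits, but never make a split larger
    # than one HDFS block.
    goal_size = total_size // max(1, min_partitions)
    split_size = max(min_split_size, min(goal_size, block_size))

    # Carve off full-size splits while the remainder is comfortably
    # larger than one split (SPLIT_SLOP avoids a tiny trailing split).
    SPLIT_SLOP = 1.1
    splits, remaining = 0, total_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits

MB = 2**20
# 256 MB file, 128 MB blocks: even minPartitions=1 yields 2 splits,
# because split_size is capped at block_size.
print(num_splits(256 * MB, 128 * MB, min_partitions=1))  # 2
# Asking for more partitions than blocks is honored: goal_size shrinks.
print(num_splits(256 * MB, 128 * MB, min_partitions=4))  # 4
```

Since `split_size` is capped at `block_size`, this math alone can never produce fewer splits than blocks, which matches the programming guide's claim; if that cap is the whole story, the long-lines case would at worst yield some empty partitions (a record reader reads past its split boundary to finish a line), not fewer partitions.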


