
python spark: computing the max, min, and mean

2017-07-12 10:15
First, create a JavaDoubleRDD from the sample values (here sc is a JavaSparkContext and testData a List&lt;Double&gt;):

JavaDoubleRDD rdd = sc.parallelizeDoubles(testData);

Now we’ll calculate the mean of our dataset.

LOGGER.info("Mean: " + rdd.mean());

There are similar methods for the other statistics, such as max(), min(), and stdev().
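
The snippets quoted above are Java (the source linked at the end of this post is a Java Spark blog). Since the title mentions Python, here is a minimal PySpark sketch of the same calls; the local master setting and the sample list are assumptions for illustration:

from pyspark import SparkContext

# Assumptions for illustration: a local master and a small sample list.
sc = SparkContext("local", "rdd-stats")
test_data = [1.0, 2.0, 3.0, 4.0, 5.0]

rdd = sc.parallelize(test_data)
print("Mean:", rdd.mean())
print("Max:", rdd.max())
print("Min:", rdd.min())
print("Stdev:", rdd.stdev())

Note that each of these calls is a separate action, so each one scans the RDD on its own; that is exactly the inefficiency addressed next.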

Every time one of these methods is invoked, Spark performs the operation over the entire RDD. If several statistics are needed, the data is scanned again and again, which is very inefficient. To avoid this, Spark provides the StatCounter class, which makes a single pass over the data and returns the results of all the basic statistics operations at once.

StatCounter statCounter = rdd.stats();

The results can then be accessed as follows:

LOGGER.info("Count: " + statCounter.count());
LOGGER.info("Min: " + statCounter.min());
LOGGER.info("Max: " + statCounter.max());
LOGGER.info("Sum: " + statCounter.sum());
LOGGER.info("Mean: " + statCounter.mean());
LOGGER.info("Variance: " + statCounter.variance());
LOGGER.info("Stdev: " + statCounter.stdev());

Source: http://www.sparkexpert.com/tag/rdd/