用IDEA写spark单词统计
2018-04-07 20:38
288 查看
1.创建一个项目
2.选择Maven项目,然后点击next
3.填写maven的GAV,然后点击next
4.填写项目名称,然后点击finish
5.创建好maven项目后,点击Enable Auto-Import
创建maven后不能创建scala 按照下面的来
注意自己的版本号
注意!!!记得把以下几个选项勾上 不然打不了jar包
这个不用选 因为自己选择的是默认的的c:\\user\..\.m2\settings.xml
6.配置Maven的pom.xml<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion>
<groupId>cn.bw.spark</groupId>
<artifactId>WordCount2</artifactId>
<version>2.0</version>
<properties>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.10.6</scala.version>
<scala.compat.version>2.10</scala.compat.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.4</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.38</version>
</dependency>
</dependencies>
<build>
&
d267
lt;sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
<configuration>
<args>
<arg>-make:transitive</arg>
<arg>-dependencyfile</arg>
<arg>${project.build.directory}/.scala_dependencies</arg>
</args>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.18.1</version>
<configuration>
<useFile>false</useFile>
<disableXmlReport>true</disableXmlReport>
<includes>
<include>**/*Test.*</include>
<include>**/*Suite.*</include>
</includes>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>cn.bw.spark.WordCountSpark</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
7.将src/main/java和src/test/java分别修改成src/main/scala和src/test/scala,与pom.xml中的配置保持一致
8.新建一个scala class,类型为Object
9.编写spark程序
package cn.bw.spark
import org.apache.spark.{SparkContext, SparkConf}
object WordCount {
def main(args: Array[String]) {
//创建SparkConf()并设置App名称
val conf = new SparkConf().setAppName("WC")
//创建SparkContext,该对象是提交spark App的入口
val sc = new SparkContext(conf)
//使用sc创建RDD并执行相应的transformation和action
sc.textFile(args(0)).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_, 1).sortBy(_._2, false).saveAsTextFile(args(1))
//停止sc,结束该任务
sc.stop()
}
}
10.使用Maven打包:首先修改pom.xml中的main class
点击idea右侧的Maven Project选项
点击Lifecycle,选择clean和package,然后点击Run Maven Build
11.选择编译成功的jar包,并将该jar上传到Spark集群中的某个节点上
12.首先启动hdfs和Spark集群
13.使用spark-submit命令提交Spark应用(注意参数的顺序)
/usr/local/spark-1.5.2-bin-hadoop2.6/bin/spark-submit \ //注意是在submit下
--class cn.bw.spark.WordCount \ //写的是你类的全路径
2.选择Maven项目,然后点击next
3.填写maven的GAV,然后点击next
4.填写项目名称,然后点击finish
5.创建好maven项目后,点击Enable Auto-Import
创建maven后不能创建scala 按照下面的来
注意自己的版本号
注意!!!记得把以下几个选项勾上 不然打不了jar包
这个不用选 因为自己选择的是默认的的c:\\user\..\.m2\settings.xml
6.配置Maven的pom.xml<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion>
<groupId>cn.bw.spark</groupId>
<artifactId>WordCount2</artifactId>
<version>2.0</version>
<properties>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.10.6</scala.version>
<scala.compat.version>2.10</scala.compat.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.4</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.38</version>
</dependency>
</dependencies>
<build>
&
d267
lt;sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
<configuration>
<args>
<arg>-make:transitive</arg>
<arg>-dependencyfile</arg>
<arg>${project.build.directory}/.scala_dependencies</arg>
</args>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.18.1</version>
<configuration>
<useFile>false</useFile>
<disableXmlReport>true</disableXmlReport>
<includes>
<include>**/*Test.*</include>
<include>**/*Suite.*</include>
</includes>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>cn.bw.spark.WordCountSpark</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
7.将src/main/java和src/test/java分别修改成src/main/scala和src/test/scala,与pom.xml中的配置保持一致
8.新建一个scala class,类型为Object
9.编写spark程序
package cn.bw.spark
import org.apache.spark.{SparkContext, SparkConf}
object WordCount {
def main(args: Array[String]) {
//创建SparkConf()并设置App名称
val conf = new SparkConf().setAppName("WC")
//创建SparkContext,该对象是提交spark App的入口
val sc = new SparkContext(conf)
//使用sc创建RDD并执行相应的transformation和action
sc.textFile(args(0)).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_, 1).sortBy(_._2, false).saveAsTextFile(args(1))
//停止sc,结束该任务
sc.stop()
}
}
10.使用Maven打包:首先修改pom.xml中的main class
点击idea右侧的Maven Project选项
点击Lifecycle,选择clean和package,然后点击Run Maven Build
11.选择编译成功的jar包,并将该jar上传到Spark集群中的某个节点上
12.首先启动hdfs和Spark集群
13.使用spark-submit命令提交Spark应用(注意参数的顺序)
/usr/local/spark-1.5.2-bin-hadoop2.6/bin/spark-submit \ //注意是在submit下
--class cn.bw.spark.WordCount \ //写的是你类的全路径
--master spark://node1.bw.cn:7077 \ --executor-memory 512m \ //指定运行内存 --total-executor-cores 1 \ //指定核数 /root/spark-mvn-1.0-SNAPSHOT.jar \ //jar的路径 hdfs://node1.bw.cn:9000/words.txt \ //文件的路径 hdfs://node1.bw.cn:9000/out //输出的路径查看程序执行结果hdfs dfs -cat hdfs://node1.bw.cn:9000/out/part-00000
相关文章推荐
- 启动Spark Shell,在Spark Shell中编写WordCount程序,在IDEA中编写WordCount的Maven程序,spark-submit使用spark的jar来做单词统计
- 2-1、Spark的单词统计WC
- Spark学习—统计文件单词出现次数
- spark on yarn运行scala单词统计程序出错
- 利用Java的Spark做单词统计并排序
- Spark案例:Scala版统计单词个数
- Spark案例:Java版统计单词个数
- HADOOP和Spark统计SRS的代码的单词频率
- Spark案例:Python版统计单词个数
- spark 单词统计
- [置顶] 【spark 词频统计】spark单词进行计数升级版
- spark:学习杂记+wordcount(单词统计)--22
- 使用spark的dataframe实现单词统计
- spark 统计单词个数
- spark统计文献中每个英文单词出现的次数
- spark下统计单词频次
- spark下统计单词频次
- java,scala之spark streaming 版本的单词统计(通过监听端口)
- SparkStreaming的实时单词统计小例子
- 用spark建立一个单词统计的应用