Spark GraphX 入门实例完整scala代码
2014-12-20 20:49
363 查看
由于天然符合互联网中很多场景的需求,图计算正受到越来越多的青睐。Spark GraphX 是作为 Spark 技术堆栈中的一员,担负起了 Spark 在图计算领域中的重任。网络上已经有很多图计算和 Spark GraphX 的概念介绍,此处就不再赘述。 本文将一篇很好的 Spark GraphX 入门文章中代码块整合为一个完整的可执行类,并加上必要注释以及执行结果,以方便有兴趣的朋友快速从 API 角度了解 Spark GraphX。
本文引用的代码块和多数文字描述均摘引自网文 graph-analytics-with-graphx, 在此特向作者表以感谢!
[1] 完整可执行scala 代码:
[2] 执行结果:
本文引用的代码块和多数文字描述均摘引自网文 graph-analytics-with-graphx, 在此特向作者表以感谢!
[1] 完整可执行scala 代码:
package scala.spark.graphx import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD import org.apache.spark._ import org.apache.spark.SparkContext._ object GraphXExample { def main(args: Array[String]) { val conf = new SparkConf().setAppName("GraphXExample") val sc = new SparkContext(conf) // [A] creating the Property Graph from arrays of vertices and edges println("[A] creating the Property Graph from arrays of vertices and edges"); // Each vertex is keyed by a unique 64-bit long identifier (VertexID), like '1L' val vertexArray = Array( (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)), (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50))) // the Edge class stores a srcId, a dstId and the edge property val edgeArray = Array( Edge(2L, 1L, 7), Edge(2L, 4L, 2), Edge(3L, 2L, 4), Edge(3L, 6L, 3), Edge(4L, 1L, 1), Edge(5L, 2L, 2), Edge(5L, 3L, 8), Edge(5L, 6L, 3)) // construct the following RDDs from the vertexArray and edgeArray variables. val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(vertexArray) val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray) // build a Property Graph val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD) // [B] Extract the vertex and edge RDD views of a graph println("[B] Extract the vertex and edge RDD views of a graph"); // Solution 1 println("Solution 1:============") graph.vertices.filter { case (id, (name, age)) => age > 30 }.collect.foreach { case (id, (name, age)) => println(s"$name is $age") } // Solution 2 println("Solution 2:============") graph.vertices.filter(v => v._2._2 > 30).collect.foreach(v => println(s"${v._2._1} is ${v._2._2}")) // Solution 3 println("Solution 3:============") for ((id, (name, age)) <- graph.vertices.filter { case (id, (name, age)) => age > 30 }.collect) { println(s"$name is $age") } // [C] Exposes a triplet view which logically joins the vertex and edge properties yielding an RDD[EdgeTriplet[VD, ED]] println("[C] Exposes a triplet view which logically joins the vertex and edge properties yielding an RDD[EdgeTriplet[VD, ED]]"); println("Use the graph.triplets view to display who likes who: ") for (triplet <- graph.triplets.collect) { println(s"${triplet.srcAttr._1} likes ${triplet.dstAttr._1}") } // For extra credit, find the lovers. // If someone likes someone else more than 5 times than that relationship is getting pretty serious. println("For extra credit, find the lovers if has:============") for (triplet <- graph.triplets.filter(t => t.attr > 5).collect) { println(s"${triplet.srcAttr._1} loves ${triplet.dstAttr._1}") } // [D] Graph Operators // Property Graphs also have a collection of basic operations println("[D] Graph Operators") // compute the in-degree of each vertex val inDegrees: VertexRDD[Int] = graph.inDegrees // Define a class to more clearly model the user property case class User(name: String, age: Int, inDeg: Int, outDeg: Int) // Create a user Graph val initialUserGraph: Graph[User, Int] = graph.mapVertices { case (id, (name, age)) => User(name, age, 0, 0) } // Fill in the degree information val userGraph = initialUserGraph.outerJoinVertices(initialUserGraph.inDegrees) { case (id, u, inDegOpt) => User(u.name, u.age, inDegOpt.getOrElse(0), u.outDeg) }.outerJoinVertices(initialUserGraph.outDegrees) { case (id, u, outDegOpt) => User(u.name, u.age, u.inDeg, outDegOpt.getOrElse(0)) } // Here we use the outerJoinVertices method of Graph which has the following (confusing) type signature: // def outerJoinVertices[U, VD2](other: RDD[(VertexID, U)])(mapFunc: (VertexID, VD, Option[U]) => VD2): Graph[VD2, ED] // Using the degreeGraph print the number of people who like each user: println("Using the degreeGraph print the number of people who like each user:============") for ((id, property) <- userGraph.vertices.collect) { println(s"User $id is called ${property.name} and is liked by ${property.inDeg} people.") } // Print the names of the users who are liked by the same number of people they like. userGraph.vertices.filter { case (id, u) => u.inDeg == u.outDeg }.collect.foreach { case (id, property) => println(property.name) } // [D.1] The Map Reduce Triplets Operator // The mapReduceTriplets operator enables neighborhood aggregation and find the oldest follower of each user println("[D.1] The Map Reduce Triplets Operator") // Find the oldest follower for each user println("Find the oldest follower for each user:============") val oldestFollower: VertexRDD[(String, Int)] = userGraph.mapReduceTriplets[(String, Int)]( // For each edge send a message to the destination vertex with the attribute of the source vertex edge => Iterator((edge.dstId, (edge.srcAttr.name, edge.srcAttr.age))), // To combine messages take the message for the older follower (a, b) => if (a._2 > b._2) a else b) userGraph.vertices.leftJoin(oldestFollower) { (id, user, optOldestFollower) => optOldestFollower match { case None => s"${user.name} does not have any followers." case Some((name, age)) => s"${name} is the oldest follower of ${user.name}." } }.collect.foreach { case (id, str) => println(str) } // Try finding the average follower age of the followers of each user println("Try finding the average follower age of the followers of each user:============") val averageAge: VertexRDD[Double] = userGraph.mapReduceTriplets[(Int, Double)]( // map function returns a tuple of (1, Age) edge => Iterator((edge.dstId, (1, edge.srcAttr.age.toDouble))), // reduce function combines (sumOfFollowers, sumOfAge) (a, b) => ((a._1 + b._1), (a._2 + b._2))).mapValues((id, p) => p._2 / p._1) // Display the results userGraph.vertices.leftJoin(averageAge) { (id, user, optAverageAge) => optAverageAge match { case None => s"${user.name} does not have any followers." case Some(avgAge) => s"The average age of ${user.name}\'s followers is $avgAge." } }.collect.foreach { case (id, str) => println(str) } // [D.2] Subgraph // The subgraph operator that takes vertex and edge predicates and returns the graph // containing only the vertices that satisfy the vertex predicate (evaluate to true) // and edges that satisfy the edge predicate and connect vertices that satisfy the // vertex predicate. println("[D.2] Subgraph") // restrict our graph to the users that are 30 or older println("restrict our graph to the users that are 30 or older:============") val olderGraph = userGraph.subgraph(vpred = (id, user) => user.age >= 30) // compute the connected components val cc = olderGraph.connectedComponents // display the component id of each user: olderGraph.vertices.leftJoin(cc.vertices) { case (id, user, comp) => s"${user.name} is in component ${comp.get}" }.collect.foreach { case (id, str) => println(str) } } }
[2] 执行结果:
[A] creating the Property Graph from arrays of vertices and edges [B] Extract the vertex and edge RDD views of a graph Solution 1:============ David is 42 Fran is 50 Charlie is 65 Ed is 55 Solution 2:============ David is 42 Fran is 50 Charlie is 65 Ed is 55 Solution 3:============ David is 42 Fran is 50 Charlie is 65 Ed is 55 [C] Exposes a triplet view which logically joins the vertex and edge properties yielding an RDD[EdgeTriplet[VD, ED]] Use the graph.triplets view to display who likes who: Bob likes Alice Bob likes David Charlie likes Bob Charlie likes Fran David likes Alice Ed likes Bob Ed likes Charlie Ed likes Fran For extra credit, find the lovers if has:============ Bob loves Alice Ed loves Charlie [D] Graph Operators Using the degreeGraph print the number of people who like each user:============ User 4 is called David and is liked by 1 people. User 6 is called Fran and is liked by 2 people. User 2 is called Bob and is liked by 2 people. User 1 is called Alice and is liked by 2 people. User 3 is called Charlie and is liked by 1 people. User 5 is called Ed and is liked by 0 people. David Bob [D.1] The Map Reduce Triplets Operator Find the oldest follower for each user:============ Bob is the oldest follower of David. Charlie is the oldest follower of Fran. Charlie is the oldest follower of Bob. David is the oldest follower of Alice. Ed is the oldest follower of Charlie. Ed does not have any followers. Try finding the average follower age of the followers of each user:============ The average age of David's followers is 27.0. The average age of Fran's followers is 60.0. The average age of Bob's followers is 60.0. The average age of Alice's followers is 34.5. The average age of Charlie's followers is 55.0. Ed does not have any followers. [D.2] Subgraph restrict our graph to the users that are 30 or older:============ David is in component 4 Fran is in component 3 Charlie is in component 3 Ed is in component 3 |
相关文章推荐
- Spark GraphX 入门实例完整Scala代码
- Spark+ECLIPSE+JAVA+MAVEN windows开发环境搭建及入门实例【附详细代码】
- webpack+vue+vueRouter模块化构建完整项目实例超详细步骤(附截图、代码、入门篇)
- Spark+ECLIPSE+JAVA+MAVEN windows开发环境搭建及入门实例【附详细代码】
- highcharts图表中级入门:图表手动放大缩小自动切换图表标题(title/subtitle)以及图例(legend)、credits字体示例呈现3 6、完整实例代码:
- Spark入门实战系列--9.Spark GraphX介绍及实例
- Spark入门实战系列--9.Spark GraphX介绍及实例
- webpack+vue+vueRouter模块化构建完整项目实例超详细步骤(附截图、代码、入门篇)
- spark入门知识讲解和基础数据操作编程(统一用scala编程实例)
- MVC入门——经典MVC登录实例详解--附完整代码
- 存储过程从入门到熟练(多个存储过程完整实例及调用方法)_AX
- 使用C语言进行面向对象的开发--GObject入门[实例代码]
- C#完整执行存储过程的代码加实例
- jQuery 学习入门篇附实例代码
- 一个完整的定向小虫,方便大家看小说,呵呵,有完整代码,实例,文档。。还可定制,这仅仅是一个小小的演试.
- ajax实例入门代码
- SDL入门教程(九):2、显示文本的完整代码
- 实例讲解PDA入门开发的过程,附代码
- 存储过程从入门到熟练(多个存储过程完整实例及调用方法)(转)
- Linux下获取进程完整路径(包括进程名)——完整实例代码实现