How to extract data from XML nodes in Scala
2016-12-01 16:41
By Alvin Alexander. Last updated: June 3 2016
Problem: In a Scala application, you want to extract information from XML you receive, so you can use the data in your application.
Solution
Use the methods of the Scala Elem and NodeSeq classes to extract the data. The most commonly used methods of the Elem class are shown here:

Commonly used methods of the Elem class

Method                  Description
------                  -----------
x \ "div"               Searches the XML literal x for elements of type <div>. Only searches immediate child nodes (no grandchild or “descendant” nodes).
x \\ "div"              Searches the XML literal x for elements of type <div>. Returns matching elements from child nodes at any depth of the XML tree.
x.attribute("class")    Returns the value of the given attribute in the current node.
                        <a x="10" y="20">foo</a>.attribute("x") // returns Some(10).
x.attributes            Returns all attributes of the current node, prefixed and unprefixed, in no particular order.
                        scala> <a x="10" y="20">foo</a>.attributes
                        res0: scala.xml.MetaData = x="10" y="20"
x.child                 Returns the children of the current node.
                        <a><b>foo</b></a>.child // returns <b>foo</b>.
x.copy(...)             Returns a copy of the element, letting you replace data during the copy process.
x.label                 The name of the current element.
                        <a><b>foo</b></a>.label // returns a.
x.text                  Returns a concatenation of text(n) for each child n.
x.toString              Emits the XML literal as a String. Use scala.xml.PrettyPrinter to format the output, if desired.
Examples
The following examples demonstrate most of the methods just shown. Given this XML literal:
    scala> val x = <div class="content"><p>Hello</p><p>world</p></div>
    x: scala.xml.Elem = <div class="content"><p>Hello</p><p>world</p></div>
you can search for and extract subelements with the \ and \\ XPath methods:
    scala> x \ "p"
    res0: scala.xml.NodeSeq = NodeSeq(<p>Hello</p>, <p>world</p>)

    scala> x \\ "p"
    res1: scala.xml.NodeSeq = NodeSeq(<p>Hello</p>, <p>world</p>)
These methods will be demonstrated more in subsequent recipes.
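In the example above both calls return the same nodes, because every <p> is an immediate child of the <div>. The following minimal sketch uses a hypothetical nested document (not from the original recipe) to show where the two methods differ. It assumes the scala-xml library is on the classpath.

```scala
import scala.xml.Elem

object DescendantSearch {
  // Hypothetical sample: one <p> is a direct child, the other sits inside a <section>.
  val doc: Elem = <div><p>top</p><section><p>nested</p></section></div>

  // The single-backslash method only inspects immediate children of doc.
  val direct: Seq[String] = (doc \ "p").map(_.text)

  // The double-backslash method walks all descendants, in document order.
  val all: Seq[String] = (doc \\ "p").map(_.text)
}
```

Here DescendantSearch.direct contains only "top", while DescendantSearch.all contains both "top" and "nested".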
The label method returns the name of the current element. A <p> tag returns p, a <div> tag returns div, etc.:
    scala> x.label
    res2: String = div

    scala> <name>Joe</name>.label
    res3: String = name
The text method returns the text from all subelements, which the Scaladoc describes as “a concatenation of all text(n) for each child n”:
    scala> x.text
    res4: String = Helloworld
Later examples will demonstrate how to improve on this result.
Element attributes are extracted with the attribute or attributes methods. The following examples demonstrate how to call these methods, and the values they return:
    scala> x.attribute("class")
    res5: Option[Seq[scala.xml.Node]] = Some(content)

    scala> x.attributes("class")
    res6: Seq[scala.xml.Node] = content

    scala> x.attributes.get("class")
    res7: Option[Seq[scala.xml.Node]] = Some(content)
The following examples demonstrate how those same method calls behave when you search for an attribute that doesn’t exist:
    scala> x.attribute("foo")
    res8: Option[Seq[scala.xml.Node]] = None

    scala> x.attributes("foo")
    res9: Seq[scala.xml.Node] = null

    scala> x.attributes.get("foo")
    res10: Option[Seq[scala.xml.Node]] = None

    scala> x.attributes.get("foo").getOrElse("N/A")
    res11: Object = N/A
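Two details in the lookups above are easy to trip over: attributes("foo") returns null for a missing attribute, and getOrElse("N/A") widens the result to Object because the Option holds a Seq[Node] rather than a String. One possible way around both is a small helper that converts the nodes to text first. The helper name attr is hypothetical, and the sketch assumes scala-xml is on the classpath.

```scala
import scala.xml.Elem

object AttrLookup {
  val x: Elem = <div class="content"><p>Hello</p><p>world</p></div>

  // attribute(...) yields Option[Seq[Node]]; turn the nodes into their text
  // so callers get an Option[String] and never see null.
  def attr(e: Elem, name: String): Option[String] =
    e.attribute(name).map(nodes => nodes.map(_.text).mkString)

  val present: String = attr(x, "class").getOrElse("N/A")
  val missing: String = attr(x, "foo").getOrElse("N/A")
}
```

With this helper, both results are typed as String: present is "content" and missing is "N/A".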
To demonstrate more ways to work with element attributes, let’s create a new element:
    scala> val w = <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />
    w: scala.xml.Elem = <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />
These examples show how attribute and attributes work with multiple attributes:
    scala> w.attribute("day")
    res0: Option[Seq[scala.xml.Node]] = Some(Thu)

    scala> w.attributes("day")
    res1: Seq[scala.xml.Node] = Thu

    scala> w.attributes
    res2: scala.xml.MetaData = day="Thu" date="10 Nov 2011" low="37" high="58"
These examples show how to iterate over a set of attributes:
    scala> for (a <- w.attributes) println(s"key: ${a.key}, value: ${a.value}")
    key: day, value: Thu
    key: date, value: 10 Nov 2011
    key: low, value: 37
    key: high, value: 58

    scala> w.attributes.asAttrMap
    res3: Map[String,String] = Map(low -> 37, date -> 10 Nov 2011, day -> Thu, high -> 58)
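Once the attributes are available as a Map[String, String], they can be turned into application data. The sketch below is one way to do that. The Forecast case class and the parse function are hypothetical names introduced for illustration, and scala-xml is assumed to be on the classpath.

```scala
import scala.util.Try
import scala.xml.Elem

object ForecastParser {
  // Hypothetical model for the <forecast> element shown above.
  final case class Forecast(day: String, date: String, low: Int, high: Int)

  // Returns None if an attribute is missing or a temperature is not an integer.
  def parse(e: Elem): Option[Forecast] = {
    val attrs = e.attributes.asAttrMap
    for {
      day  <- attrs.get("day")
      date <- attrs.get("date")
      low  <- attrs.get("low").flatMap(s => Try(s.toInt).toOption)
      high <- attrs.get("high").flatMap(s => Try(s.toInt).toOption)
    } yield Forecast(day, date, low, high)
  }

  val w: Elem = <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />
  val forecast: Option[Forecast] = parse(w)
}
```

Returning an Option keeps a malformed or incomplete element from throwing in the middle of processing; the caller decides how to handle None.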
Child elements
The child method returns all child nodes of the current element. To demonstrate this, let’s create a new XML variable:
    scala> val p = <person><name>Ken</name><age>23</age></person>
    p: scala.xml.Elem = <person><name>Ken</name><age>23</age></person>
The child method returns immediate child nodes:
    scala> p.child
    res0: Seq[scala.xml.Node] = ArrayBuffer(<name>Ken</name>, <age>23</age>)
You can use child to iterate over all the children:
    scala> for (n <- p.child) println(n)
    <name>Ken</name>
    <age>23</age>
Because child returns a sequence, you can also access the child elements like this:
    scala> p.child(0)
    res1: scala.xml.Node = <name>Ken</name>

    scala> p.child(0).label
    res2: String = name

    scala> p.child(0).text
    res3: String = Ken

    scala> p.child(1)
    res4: scala.xml.Node = <age>23</age>

    scala> p.child(1).text.toInt
    res5: Int = 23
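Indexing into child works here because p is written on a single line. When the same XML is indented, the whitespace between tags becomes text nodes in child, so the positions shift. The following sketch shows one way to keep only the element children. The helper name elementChildren is hypothetical, and scala-xml is assumed to be on the classpath.

```scala
import scala.xml.Elem

object ElementChildren {
  // Same data as p above, but indented, so child also holds whitespace text nodes.
  val formatted: Elem =
    <person>
      <name>Ken</name>
      <age>23</age>
    </person>

  // Keep only element children, dropping text and other node kinds.
  def elementChildren(e: Elem): Seq[Elem] =
    e.child.collect { case el: Elem => el }

  val labels: Seq[String] = elementChildren(formatted).map(_.label)
}
```

After filtering, elementChildren(formatted)(0) is the <name> element again, regardless of how the source XML was laid out.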
Text and strings
The toString method returns the XML structure as a String:
    scala> p.toString
    res6: String = <person><name>Ken</name><age>23</age></person>
You can improve this result with the PrettyPrinter class.
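As a minimal sketch of that suggestion, scala.xml.PrettyPrinter takes a maximum line width and an indentation step, and its format method returns the formatted String. The values 80 and 2 below are arbitrary choices for illustration, and scala-xml is assumed to be on the classpath.

```scala
import scala.xml.{Elem, PrettyPrinter}

object PrettyPrintExample {
  val p: Elem = <person><name>Ken</name><age>23</age></person>

  // Assumed settings: wrap at 80 characters, indent nested elements by 2 spaces.
  val printer = new PrettyPrinter(80, 2)

  // format returns a String with each child element on its own line.
  val formatted: String = printer.format(p)

  def main(args: Array[String]): Unit = println(formatted)
}
```

Running the object prints the <person> element with <name> and <age> on separate, indented lines instead of one long line.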
This approach shows another way to extract the text from the elements:
    scala> for (n <- p.child) yield n.text
    res7: Seq[String] = ArrayBuffer(Ken, 23)
There are more ways to tackle these problems using XPath methods, which will be shown in subsequent chapters.
As a word of caution, be careful with the text method. It returns different results depending on how the XML is formatted, which can be a particular problem when extracting XHTML data. To demonstrate this, the following examples show the output when there is a space before the <br> tag, and when there is no space:
    scala> <div><p>Hello, world, <br/>it's me.</p></div>.text
    res0: String = Hello, world, it's me.

    scala> <div><p>Hello, world,<br/>it's me.</p></div>.text
    res1: String = Hello, world,it's me.
In the next examples the same XML, formatted in different ways, yields different results:
    scala> <div><p>Is 2 > 1?</p><p>Why do you ask?</p></div>.text
    res2: String = Is 2 > 1?Why do you ask?

    scala> <div>
         | <p>Is 2 > 1?</p>
         | <p>Why do you ask?</p>
         | </div>.text
    res3: String = " Is 2 > 1? Why do you ask? "
If you need to extract text in this manner, a workaround is to extract the text components individually into a sequence, and then re-combine the sequence as desired. The following example demonstrates how to accomplish this with the child, label, and text methods. Given this XML literal:
    val xml = <div><p>Is 2 > 1?</p><p>Why do you ask?</p></div>
the child method returns the elements as a sequence:
    scala> xml.child
    res0: Seq[scala.xml.Node] = ArrayBuffer(<p>Is 2 > 1?</p>, <p>Why do you ask?</p>)
This lets you write the following code, which creates a sequence of strings from the <p> tags:
    val strings = for {
      e <- xml.child
      if e.label == "p"
    } yield e.text
The REPL shows that the resulting variable strings has the following type and data:
    strings: Seq[String] = ArrayBuffer(Is 2 > 1?, Why do you ask?)
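The re-combining step is not shown above, so here is one possible version. It filters child by label as in the previous example, trims each piece, and joins the pieces with a separator. The function name paragraphs and the default single-space separator are assumptions made for this sketch, and scala-xml is assumed to be on the classpath.

```scala
import scala.xml.Elem

object ParagraphText {
  // Collect the trimmed text of each direct <p> child, then join with sep.
  def paragraphs(e: Elem, sep: String = " "): String =
    e.child.filter(_.label == "p").map(_.text.trim).mkString(sep)

  val compact: Elem = <div><p>Is 2 > 1?</p><p>Why do you ask?</p></div>

  // Same content, formatted across several lines.
  val indented: Elem =
    <div>
      <p>Is 2 > 1?</p>
      <p>Why do you ask?</p>
    </div>
}
```

Both ParagraphText.paragraphs(ParagraphText.compact) and ParagraphText.paragraphs(ParagraphText.indented) produce the same string, because the whitespace between the tags is never part of a <p> element.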
In the XPath recipes in this chapter you’ll see how to accomplish some of the same tasks using the \ and \\ methods.
Example data sets and REPL memory errors
If you want to test these commands against large data sets, this URL maintains a nice collection of sample XML data:
http://www.cs.washington.edu/research/xmldatasets/
The NASA data set is 23 MB, and causes the Scala REPL to crash with a Java heap space error:
    scala> val xml = scala.xml.XML.loadFile("nasa.xml")
    java.lang.OutOfMemoryError: Java heap space
    ...
To get around this problem, you can allocate more heap space when starting the REPL with this command:
    $ scala -J-Xms256m -J-Xmx512m
or this command:
    $ env JAVA_OPTS="-Xms256m -Xmx512m" scala