Spark VectorAssembler: IllegalArgumentException: Data type StringType is not supported
2018-01-02 11:41
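How I worked through it:
In another Scala file I had already split the CSV into a training set and a test set, and I had converted every field to Double there so that I could later compute the correlation coefficients between fields. So it did not occur to me that the data itself could be the problem. Instead I suspected the column names in the first line of the file, and I removed that line. The result: org.apache.hadoop.fs.ChecksumException: Checksum error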
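When reading a file that was written through the Hadoop API, the system checks whether the file is "corrupted". Hadoop's local file system keeps a hidden .crc checksum file next to the data file, so editing the data file by hand makes the checksum stop matching.
I copied the contents into a new file, deleted the first line, and tried again. It still failed.
So I read the source. VectorAssembler pattern-matches on each column's data type and does not look at the first row up front; the first row only comes into play in a fallback case. That left me no choice but to suspect the data types of the fields.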
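printSchema showed that every field had turned back into string. Seriously? Is Spark really that naive, or did I do something wrong? I would have expected Spark to detect the data types on its own (are 194201.0, 378.0, 831.0, 2.0, 2.0, 3.0, 6.0 really all being treated as String?).
I was not going to convert everything a second time; that would be silly. There had to be a better way.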
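The check described above can be reproduced with a small sketch. The sample file and its column names (price, area, rooms) are made up for illustration and are not from the original data; the local SparkSession setup is also an assumption.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SchemaCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is good enough for this check.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-check")
      .getOrCreate()

    // Hypothetical sample file with a header line and one numeric-looking row.
    val path = Files.createTempFile("sample", ".csv")
    Files.write(path, "price,area,rooms\n194201.0,378.0,2.0\n".getBytes("UTF-8"))

    // No inferSchema here, so the CSV reader reports every column as string,
    // even though the values look like doubles.
    val df = spark.read.option("header", true).csv(path.toString)
    df.printSchema()

    spark.stop()
  }
}
```

Running printSchema right after loading is a cheap way to confirm what VectorAssembler will actually see.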
Add the marked part to the read call:
val rawTrainDf = spark.read.format("csv").option("header", true).option("inferSchema", true).load(myTestCsvPath)
Fixed.
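For context, here is a minimal end-to-end sketch of the fix feeding into VectorAssembler. It assumes the CSV path is passed as the first program argument and that every column in the file is numeric; the output column name "features" is chosen for illustration.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object AssembleFeatures {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("assemble-features")
      .getOrCreate()

    // Assumption: the CSV path is passed on the command line.
    val myTestCsvPath = args(0)

    // header: take column names from the first line.
    // inferSchema: scan the data and infer column types instead of using string.
    val rawTrainDf = spark.read
      .format("csv")
      .option("header", true)
      .option("inferSchema", true)
      .load(myTestCsvPath)
    rawTrainDf.printSchema()

    // Assumption: all columns are numeric, so all of them can be assembled.
    val assembler = new VectorAssembler()
      .setInputCols(rawTrainDf.columns)
      .setOutputCol("features")

    assembler.transform(rawTrainDf).select("features").show(5, truncate = false)

    spark.stop()
  }
}
```

If a real dataset has a label column or non-numeric columns, those would need to be left out of setInputCols.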
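Notes:
I. option("header", true) tells Spark that the first line of my CSV file already holds the column names, so it should use them instead of generating _c0, _c1, and so on.
II. option("inferSchema", true) tells Spark not to fall back on its default of reading every column as a string, but to scan the data and infer each column's type. CSV is plain text and carries no type information, which is why the Double types I set in the earlier job did not survive being written out and read back. [This is my understanding for now.]
III. Open question: which options can actually be set? I could not find where the source code for these options lives. If you know, please point me in the right direction. Thanks!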
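On the question of which options exist: as far as I know, the CSV options are listed in the Scaladoc of DataFrameReader.csv. The sketch below shows a few of them (sep, nullValue, mode) and, as an alternative to inferSchema, an explicit schema. The column names, the "NA" null marker, and the path argument are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

object CsvReaderOptions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-reader-options")
      .getOrCreate()

    // Assumption: the CSV path is passed on the command line.
    val csvPath = args(0)

    // A few other CSV reader options alongside header and inferSchema.
    val withOptions = spark.read
      .option("header", true)
      .option("inferSchema", true)
      .option("sep", ",")              // field delimiter
      .option("nullValue", "NA")       // hypothetical marker to read as null
      .option("mode", "DROPMALFORMED") // drop rows that fail to parse
      .csv(csvPath)
    withOptions.printSchema()

    // Alternative: declare the schema up front, which avoids the extra pass
    // over the data that inferSchema needs. Column names are hypothetical.
    val schema = StructType(
      Seq("price", "area", "rooms").map(name => StructField(name, DoubleType, nullable = true))
    )
    val withSchema = spark.read
      .option("header", true)
      .schema(schema)
      .csv(csvPath)
    withSchema.printSchema()

    spark.stop()
  }
}
```

An explicit schema is more typing, but it makes the column types independent of whatever values happen to be in the file.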