PySpark: error reading/writing Elasticsearch, need help

2017-10-20 21:11:01 +08:00
 SlipStupig

When I read from and write to ES with Spark, I get this error:

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.LinkedMapWritable
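A ClassNotFoundException like this means the JVM behind PySpark cannot find org.elasticsearch.hadoop.mr.LinkedMapWritable in any jar on its classpath. Below is a minimal sketch for checking this by hand. It assumes the candidate jars sit in one directory, such as Spark's jars/ folder. The helper names class_entry and jars_providing are hypothetical. A jar is a zip archive, so the class shows up as an entry whose path mirrors the package name.

```python
import os
import zipfile


def class_entry(class_name):
    """Map a fully qualified Java class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_providing(class_name, jars_dir):
    """Return the sorted names of jars in jars_dir that contain class_name.

    Hypothetical diagnostic helper: an empty list means the class is missing
    (hence ClassNotFoundException); several hits suggest duplicate libraries.
    """
    entry = class_entry(class_name)
    hits = []
    for name in sorted(os.listdir(jars_dir)):
        if not name.endswith(".jar"):
            continue
        try:
            with zipfile.ZipFile(os.path.join(jars_dir, name)) as jar:
                if entry in jar.namelist():
                    hits.append(name)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid zip archives.
            continue
    return hits
```

For example, jars_providing("org.elasticsearch.hadoop.mr.LinkedMapWritable", "/path/to/spark/jars") should return an empty list in the situation shown above. The directory path here is a placeholder.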

Here is the code:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("elasticsearch-hadoop")
sc = SparkContext(conf=conf)

# read documents from the ES index "products"
es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={ "es.resource" : "products" })
print(es_rdd.first())

kcosmetics_availability = es_rdd.map(lambda item: ("key",{
    'id': item[0],  # _id of the source document
    'availability': item[1]['availability']
}))

# write the results to "products/kcosmetics_stocks"
kcosmetics_availability.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.index.auto.create": "true", # create the index if it does not exist yet
        "es.mapping.id": "id",          # use each document's "id" field as its _id
        "es.resource" : "products/kcosmetics_stocks" })

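With EsInputFormat, each RDD element reaches Python as a (document_id, source_dict) pair. That is why the lambda in the map step reads item[0] and item[1]['availability']. The same transformation can be written as a plain function and tried without a cluster. The function name and the sample document below are made up for illustration.

```python
def to_availability_record(item):
    """Plain-function version of the lambda passed to es_rdd.map().

    Assumes item is a (document_id, source_dict) pair as produced by
    EsInputFormat; returns a (key, document) pair for EsOutputFormat.
    """
    doc_id, source = item
    return ("key", {
        "id": doc_id,                           # later picked up via es.mapping.id
        "availability": source["availability"],
    })


# Hypothetical sample element, shaped like what es_rdd.first() returns.
sample = ("AV9xAbC123", {"name": "lipstick", "availability": "in_stock"})
print(to_availability_record(sample))
```

In the job itself this would become es_rdd.map(to_availability_record). Documents without an "availability" field would raise a KeyError here, just as they would in the original lambda.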
Going by the error message, I then installed elasticsearch-hadoop, and now it tells me:

java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars/elasticsearch-spark-20_2.11-5.6.3.jar
jar:file:/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars/elasticsearch-hadoop-mr-5.6.3.jar

Spark version: 2.2; elasticsearch-spark version: 5.6.3
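The second error appears because two ES-Hadoop jars are on the classpath at once. Both are in Spark's jars/ directory, and ES-Hadoop complains even though, as here, both come from the same 5.6.3 release. Below is a rough sketch that lists such jars by file name. It assumes every ES-Hadoop artifact's name starts with elasticsearch-hadoop or elasticsearch-spark, which is a naming heuristic rather than an official rule. The helpers es_hadoop_jars and check_single_es_hadoop are hypothetical.

```python
import os

# Assumed name prefixes of ES-Hadoop artifacts (heuristic, not exhaustive).
ES_HADOOP_PREFIXES = ("elasticsearch-hadoop", "elasticsearch-spark")


def es_hadoop_jars(jars_dir):
    """Return the sorted jar file names in jars_dir that look like ES-Hadoop artifacts."""
    return sorted(
        name for name in os.listdir(jars_dir)
        if name.endswith(".jar") and name.startswith(ES_HADOOP_PREFIXES)
    )


def check_single_es_hadoop(jars_dir):
    """Raise if more than one ES-Hadoop jar is present; otherwise return the list."""
    found = es_hadoop_jars(jars_dir)
    if len(found) > 1:
        raise RuntimeError(
            "multiple ES-Hadoop jars in %s: %s" % (jars_dir, ", ".join(found)))
    return found
```

Run against the spark-2.2.0-bin-hadoop2.7/jars directory from the log above, this should flag both elasticsearch-hadoop-mr-5.6.3.jar and elasticsearch-spark-20_2.11-5.6.3.jar.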

3 replies
badttt
2017-10-20 21:39:58 +08:00
It's a jar version conflict. Delete the elasticsearch-spark jar.
ligyxy
2017-10-21 03:40:42 +08:00
Out of those, you only need to install elasticsearch-hadoop-5.6.3.jar.
SlipStupig
2017-10-21 08:40:03 +08:00
@ligyxy That worked.
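The fix that worked was to keep exactly one ES-Hadoop jar, elasticsearch-hadoop-5.6.3.jar. Instead of copying it into Spark's jars/ directory, it can also be handed to Spark at launch time with the --jars option of spark-submit, which takes a comma-separated list. Below is a small sketch that builds such a command line. The helper build_submit_cmd, the jar path and the script name es_job.py are all placeholders.

```python
import os

ES_HADOOP_PREFIXES = ("elasticsearch-hadoop", "elasticsearch-spark")


def build_submit_cmd(script, jars):
    """Build a spark-submit argv that ships the given jars with the job.

    Hypothetical helper: refuses anything other than exactly one ES-Hadoop
    jar, following the advice to keep only elasticsearch-hadoop-5.6.3.jar.
    """
    es_jars = [j for j in jars if os.path.basename(j).startswith(ES_HADOOP_PREFIXES)]
    if len(es_jars) != 1:
        raise ValueError("expected exactly one ES-Hadoop jar, got %d" % len(es_jars))
    # spark-submit options come before the application file.
    return ["spark-submit", "--jars", ",".join(jars), script]


cmd = build_submit_cmd("es_job.py", ["/opt/jars/elasticsearch-hadoop-5.6.3.jar"])
# On a machine with Spark installed: subprocess.run(cmd, check=True)
```

--jars adds to the classpath rather than replacing it. Any conflicting copies left in Spark's jars/ directory would still need to be removed.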


https://www.v2ex.com/t/399369


© 2021 V2EX