2016-02-20 10 views
12

Próbuję prosty przykład liczenia słów na iskrze w trybie klastrów i tryb lokalny działa dobrze w trybie lokalnym, ale wyrzucanie wyjątku rzucania klasy w trybie klastrów jest tutaj fragment kodu ...Spark 1.6.0 rzucający wyjątek classcast w trybie klastrowania działa dobrze w trybie lokalnym

package com.example 

import com.typesafe.config.ConfigFactory 
import org.apache.spark.{SparkConf, SparkContext} 

/** 
    * Created by Rahul Shukla on 20/2/16. 
    */ 
object SampleFile extends App { 
    val config = ConfigFactory.load() 
    val conf = new SparkConf() 
    .setAppName("WordCount") 
    //.setMaster("spark://ULTP:7077") 
    .setMaster("local") 
    .setSparkHome(config.getString("example.spark.home")) 
    .set("spark.cleaner.ttl", "30s") 
    .set("spark.app.id", "KnowledgeBase") 
    .set("spark.driver.allowMultipleContexts", "true") 
    val sc = new SparkContext(conf) 
    sc.textFile("build.sbt").flatMap(row => row.split(" ")).map((_, 1)).reduceByKey(_ + _).foreach(println) 
} 

Środowisko Spark 1.6 build przeciwko Scala 2.11.7

Wyjątek:

java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
    at org.apache.spark.scheduler.Task.run(Task.scala:89) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
16/02/21 03:18:05 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1) 
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
    at org.apache.spark.scheduler.Task.run(Task.scala:89) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
16/02/21 03:18:05 ERROR Executor: Exception in task 1.1 in stage 0.0 (TID 2) 
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
    at org.apache.spark.scheduler.Task.run(Task.scala:89) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
16/02/21 03:18:05 ERROR Executor: Exception in task 0.1 in stage 0.0 (TID 3) 
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
    at org.apache.spark.scheduler.Task.run(Task.scala:89) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
16/02/21 03:18:06 ERROR Executor: Exception in task 1.2 in stage 0.0 (TID 4) 
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
    at org.apache.spark.scheduler.Task.run(Task.scala:89) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
16/02/21 03:18:06 ERROR Executor: Exception in task 0.2 in stage 0.0 (TID 5) 
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
    at org.apache.spark.scheduler.Task.run(Task.scala:89) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
16/02/21 03:18:06 ERROR Executor: Exception in task 1.3 in stage 0.0 (TID 6) 
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
    at org.apache.spark.scheduler.Task.run(Task.scala:89) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
16/02/21 03:18:06 ERROR Executor: Exception in task 0.3 in stage 0.0 (TID 7) 
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) 
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
    at org.apache.spark.scheduler.Task.run(Task.scala:89) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 

Spark Shell Wyjście:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). 
log4j:WARN Please initialize the log4j system properly. 
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. 
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties 
To adjust logging level use sc.setLogLevel("INFO") 
16/02/22 01:23:51 WARN Utils: Your hostname, ULTP resolves to a loopback address: 127.0.1.1; using 192.168.1.104 instead (on interface wlan1) 
16/02/22 01:23:51 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
Spark context available as sc. 
SQL context available as sqlContext. 
Welcome to 
     ____    __ 
    /__/__ ___ _____/ /__ 
    _\ \/ _ \/ _ `/ __/ '_/ 
    /___/ .__/\_,_/_/ /_/\_\ version 1.6.0 
     /_/ 

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_72) 
Type in expressions to have them evaluated. 
Type :help for more information. 
+0

Kod źródłowy: https://github.com/shukla2009/spark-example.git –

+1

Wygląda na wersję Scala twojego klastra Spark 2.10, ale używasz Scala 2.11 do kompilowania aplikacji. Czy możesz to sprawdzić? – zsxwing

+0

Sugestia: zmień 'local' na' local [2] 'i spróbuj ponownie. "local [n]" oznacza uruchamianie lokalne, ale z n wątkami roboczymi, które lepiej symulują prawdziwe działania klastra (np. uruchamia serializację), więc może wypróżnić błąd lokalnie, gdzie jest łatwiejszy do wykrycia. Jeśli się nie powiedzie - to nie jest problem z konfiguracją klastra. –

Odpowiedz

2

Ok, miałem dokładnie taki sam wyjątek. Mój problem okazał się totalnym smackerem na czole. Zaczynałem powłokę z -cp dla słoików zamiast --jars. DOH! Nie jest dla mnie jasne, czy to zrobiłeś, czy nie, ale wyobrażam sobie, że może się to łatwo przytrafić komuś, kto przywykł do scala REPL-cp :) Ślad stosu jest tak niewidoczny, mam nadzieję, że pomoże to biednej duszy.

2

Miałem ten sam błąd raz rozwiązany przez dodanie spark.jars do SparkConf jak ten

set("spark.jars", <pathToYourSparkAppJar>) 

Ale uruchomiła Spark aplikacji Jar jako moduł z innej aplikacji Java. Używając powłoki Spark, możesz użyć opcji wiersza poleceń --jars <pathToYourSparkAppJar>.

0

Dokładnie ten sam problem z tobą, i pomyślnie rozwiązany przez odpowiedź @Emanuel Seidinger. Wielkie dzięki.

i dla ludzi, którzy są z wykorzystaniem iskrę 2.0+, SparkSession jest oficjalnym wejście stosowania iskrowym, w tym przypadku można odwoływać się do tego przykładu:

val sparkSession: SparkSession = SparkSession.builder 
    .appName("<your_app_name>") 
    .master("<your_master>") 
    .config("spark.executor.uri", "<your_executor_uri>") 
    .config("spark.executor.memory", "<your_conf>") 
    .config("spark.jars", "<path_to_your_assembly_jar>") 
    .getOrCreate 

Następnie można użyć tego, aby rozpocząć nowy sterownik za pomocą na przykład sbt-jmh lub inne narzędzia, aby rozpocząć sesję iskry, zamiast przesyłać aplikację za pomocą "spark-submit".

Powiązane problemy