
Apache Spark - MLlib - Collaborative filtering

I am trying to use MLlib for collaborative filtering.

I get the following error in my Scala program when I run it on Apache Spark 1.0.0.

    14/07/15 16:16:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    14/07/15 16:16:31 WARN LoadSnappy: Snappy native library not loaded 
    14/07/15 16:16:31 INFO FileInputFormat: Total input paths to process : 1 
    14/07/15 16:16:38 WARN TaskSetManager: Lost TID 10 (task 80.0:0) 
    14/07/15 16:16:38 WARN TaskSetManager: Loss was due to java.lang.UnsatisfiedLinkError 
    java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dposv(CII[DII[DII)I 
     at org.jblas.NativeBlas.dposv(Native Method) 
     at org.jblas.SimpleBlas.posv(SimpleBlas.java:369) 
     at org.jblas.Solve.solvePositive(Solve.java:68) 
     at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$2.apply(ALS.scala:522) 
     at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$2.apply(ALS.scala:509) 
     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) 
     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) 
     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) 
     at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:156) 
     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
     at scala.collection.mutable.ArrayOps$ofInt.map(ArrayOps.scala:156) 
     at org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:509) 
     at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:445) 
     at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:444) 
     at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31) 
     at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31) 
     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
     at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:156) 
     at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154) 
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
     at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) 
     at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) 
     at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) 
     at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) 
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158) 
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) 
     at org.apache.spark.scheduler.Task.run(Task.scala:51) 
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:744) 
    14/07/15 16:16:38 ERROR TaskSchedulerImpl: Lost executor 0 on maroki.office.mkechinov.ru: Uncaught exception 
    14/07/15 16:16:38 WARN TaskSetManager: Lost TID 12 (task 80.0:0) 
    14/07/15 16:16:42 WARN TaskSetManager: Lost TID 18 (task 80.0:1) 
    14/07/15 16:16:42 WARN TaskSetManager: Loss was due to fetch failure from null 
    14/07/15 16:16:42 WARN TaskSetManager: Loss was due to fetch failure from null 
    14/07/15 16:16:43 WARN TaskSetManager: Lost TID 25 (task 80.1:0) 
    14/07/15 16:16:43 WARN TaskSetManager: Loss was due to java.lang.UnsatisfiedLinkError 

How can I fix this error?

Answer


The Spark documentation explicitly mentions that MLlib uses native libraries, which must be present on the nodes. (That is, they do not come with the Spark installation.)

MLlib uses the jblas linear algebra library, which itself depends on native Fortran routines. You may need to install the gfortran runtime library if it is not already present on your nodes. MLlib will throw a linking error if it cannot detect these libraries automatically.

You need to make sure that the libgfortran library is present on all nodes.

On Debian/Ubuntu, use: sudo apt-get install libgfortran3

On CentOS, use: sudo yum install gcc-gfortran
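To confirm that a node can actually see the library, one option is to look for it in the dynamic linker cache. The following is only a sketch, assuming a Linux node where `ldconfig` is available; `has_libgfortran` is a hypothetical helper name chosen for this example and is not part of Spark or jblas.

```shell
#!/bin/sh
# Hypothetical helper: reads a shared-library listing on stdin
# (for example the output of `ldconfig -p`) and succeeds only if
# it contains a libgfortran entry.
has_libgfortran() {
    grep -q 'libgfortran\.so'
}

# Check the local node. The guard skips the check when ldconfig
# is not on the PATH (assumption: it usually lives in /sbin).
if command -v ldconfig >/dev/null 2>&1; then
    if ldconfig -p | has_libgfortran; then
        echo "libgfortran found"
    else
        echo "libgfortran missing"
    fi
fi
```

The check has to pass on every worker node, not just the driver, because the failing jblas call in the stack trace above runs inside an executor.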
