One way is to put

# point R at the Spark installation and its bundled SparkR package
Sys.setenv(SPARK_HOME = "/home/shige/bin/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master = "local")

into the .Rprofile file. This, however, has the undesirable side effect of adding yet another directory to which R packages can be installed.
My solution is:
1. Create a soft link to the SparkR directory inside the directory where your other R packages are installed (ln -s /home/shige/bin/spark/R/lib/SparkR <your-R-library-path>; see the example below).
2. Add Sys.setenv(SPARK_HOME = "/home/shige/bin/spark") to the .Rprofile file.
3. Add Sys.setenv(SPARKR_SUBMIT_ARGS = '"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"') to the .Rprofile.
All set.
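Putting the pieces together: assuming Spark lives in /home/shige/bin/spark and that ~/R/library is where your other R packages go (a placeholder; run .libPaths() in R to see yours), the whole setup looks like this:

# one-time: symlink the bundled SparkR package into your R library
ln -s /home/shige/bin/spark/R/lib/SparkR ~/R/library/SparkR

# in ~/.Rprofile
Sys.setenv(SPARK_HOME = "/home/shige/bin/spark")
Sys.setenv(SPARKR_SUBMIT_ARGS = '"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')

# then, in any R session (including RStudio)
library(SparkR)
sc <- sparkR.init(master = "local")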
10 comments:
How do I set this up in a Mac environment?
Thanks a lot.
I assume it'll also work on Windows and Mac. Give it a try and let me know.
Hi, thanks for replying.
I have downloaded "spark-1.4.0-bin-hadoop2.6.tgz" and unzipped it.
ln -s /Users/frankie/Desktop/spark-1.4.0-bin-without-hadoop/R/lib/SparkR /????? (I don't know what this path should be?)
Thanks
I got SparkR running on RStudio, but I have no idea how to read a .csv file from RStudio.
When launching the SparkR shell with
./bin/sparkR --master local[7] --packages com.databricks:spark-csv_2.10:1.0.3
I can read a file as follows:
flights <- read.df(sqlContext, "./nycflights13.csv", "com.databricks.spark.csv", header="true")
But in RStudio this is no longer true, I get the following error:
Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
Do you have any idea how to solve this? Can I import com.databricks:spark-csv_2.10:1.0.3 in .Rprofile or somewhere else? You can also check my question on Stack Overflow: http://stackoverflow.com/questions/30870379/loading-com-databricks-spark-csv-via-rstudio
Hi Wannes,
Thanks for the post. Unfortunately, I do not have an answer. Hopefully someone can answer your question on StackOverflow.
Shige
Hi Shige,
I've found a solution, if you're interested, check stack overflow.
Thanks, I'll give it a try.
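For anyone hitting the same error: the fix is, in essence, step 3 above. SPARKR_SUBMIT_ARGS has to tell the backend to fetch spark-csv before SparkR is initialized. A minimal sketch, assuming Spark 1.4 and the package version used in this post:

# must run before SparkR starts its backend
Sys.setenv(SPARKR_SUBMIT_ARGS = '"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')
library(SparkR)
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)
flights <- read.df(sqlContext, "./nycflights13.csv", "com.databricks.spark.csv", header = "true")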
On a Mac, you can install using Homebrew. You do have to adjust the path a little:
https://gist.github.com/kenahoo/0f4c08fe10337a53836d
I got an error when running sc <- sparkR.init():
Error in socketConnection(port = monitorPort) :
cannot open the connection
In addition: Warning message:
In socketConnection(port = monitorPort) : localhost:64143 cannot be opened
Here's the info of hadoop and spark. Any idea? Thanks!
$ brew info hadoop
hadoop: stable 2.7.2
Framework for distributed processing of large data sets
https://hadoop.apache.org/
/usr/local/Cellar/hadoop/2.7.2 (6,304 files, 310M) *
Built from source on 2016-05-27 at 02:34:15
From: https://github.com/Homebrew/ho...
==> Caveats
In Hadoop's config file:
/usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh,
/usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-env.sh and
/usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-env.sh
$JAVA_HOME has been set to be the output of:
/usr/libexec/java_home
$ brew info apache-spark
apache-spark: stable 1.6.1, HEAD
Engine for large-scale data processing
https://spark.apache.org/
/usr/local/Cellar/apache-spark/1.6.1 (736 files, 372M) *
Built from source on 2016-05-27 at 02:24:45
From: https://github.com/Homebrew/ho...
A much easier solution is provided by RStudio in the form of a new package called "sparklyr". It lets you download and set up Spark on your local machine automatically, without manually tweaking any settings.
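A minimal sketch of that route (spark_install() downloads a default Spark version unless you pass one explicitly):

install.packages("sparklyr")
library(sparklyr)
spark_install()                       # download and install a local copy of Spark
sc <- spark_connect(master = "local") # connect to it from R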