Spark local mode operation


1. Spark Introduction
Spark is a distributed computing framework based on the MapReduce model, and it retains the advantages of Hadoop MapReduce. Unlike MapReduce, however, the intermediate outputs and results of a job can be kept in memory, so there is no need to read and write HDFS between steps. Spark is therefore a better fit for algorithms that iterate over the same data, such as data mining and machine learning.
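To make the in-memory point concrete, here is a minimal sketch (not part of the original post) of an iterative job that caches an RDD so repeated passes are served from memory instead of re-reading the input file; the class name, file path, and number of passes are assumptions for illustration only, using the same Spark 1.x Java API as the examples later in this article:

 package com.spark;

 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;

 public class CacheSketch {
     public static void main(String[] args) {
         SparkConf conf = new SparkConf().setMaster("local").setAppName("CacheSketch");
         JavaSparkContext ctx = new JavaSparkContext(conf);

         // cache() keeps the RDD in memory after the first action,
         // so later passes reuse it instead of reading the file from disk again.
         JavaRDD<String> lines = ctx.textFile("D:/systemInfo.log", 1).cache();

         for (int i = 0; i < 3; i++) {
             System.out.println("pass " + i + ", lines = " + lines.count());
         }
         ctx.stop();
     }
 }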

2. Run modes
Local mode
Standalone mode
Mesos mode
YARN mode

Let's test the "Local mode". The test environment here is Windows 7.

1. Environment preparation

jdk 1.6
Spark 1.2.0   
Official website: http://spark.apache.org/downloads.html
Or: http://archive.apache.org/dist/spark/
We can download the source package [spark-1.2.0.tgz] and compile it ourselves, or directly download the prebuilt [spark-1.2.0-bin-hadoop2.4.tgz]
Note: JDK 1.6 is installed locally here; Spark 1.5+ only runs on JDK 7 or 8
Scala  2.11.7   http://www.scala-lang.org/download/
Python 2.7.7   https://www.python.org/downloads/windows/

2. Directory contents

Extract the downloaded [spark-1.2.0-bin-hadoop2.4.tgz]:

bin: scripts that make local-mode testing convenient
conf: configuration templates; copy an xxxx.template file, drop the .template suffix, and edit the configuration

By default, Scala and Python are supported out of the box; run spark-shell.cmd and pyspark.cmd respectively.

Run spark-shell.cmd:

Run pyspark.cmd:

1. Of course, Spark also supports the Java language. We can run Java programs through spark-submit.cmd.

Create a Maven project to test:

 <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-core_2.10</artifactId>
     <version>1.2.0</version>
 </dependency>

Maven will automatically download all the dependent jar packages. There are quite a few, so it takes a little while.

 package com.spark;

 import java.util.Arrays;
 import java.util.regex.Pattern;

 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.FlatMapFunction;

 public class WordCount {

     private static final Pattern SPACE = Pattern.compile(" ");

     public static void main(String[] args) {
         // No master is set here; it is supplied on the command line via spark-submit.
         SparkConf conf = new SparkConf().setAppName("JavaWordCount");
         JavaSparkContext ctx = new JavaSparkContext(conf);

         String filePath = "D:/systemInfo.log";
         JavaRDD<String> lines = ctx.textFile(filePath, 1);

         // Split each line on spaces to get the individual words.
         JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
             @Override
             public Iterable<String> call(String s) {
                 return Arrays.asList(SPACE.split(s));
             }
         });

         System.out.println("wordCount:" + words.count());
     }
 }
Package the project into a jar with Maven:


spark-submit parameter description: Usage: spark-submit [options] <app jar | python file> [app options]
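As an illustration (not from the original post), submitting the WordCount class above in local mode might look like the command below; the jar name and path are assumptions based on a default Maven build:

 bin\spark-submit.cmd --class com.spark.WordCount --master local D:\yourproject\target\yourproject-1.0.jar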

See this for details: http://my.oschina.net/u/140462/blog/519409


While the application is running, you can view Spark's job information through port 4040: http://localhost:4040

2. A Java program can also be run locally without spark-submit:

 package com.spark;

 import java.io.BufferedReader;
 import java.io.InputStreamReader;
 import java.util.Arrays;
 import java.util.regex.Pattern;

 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.FlatMapFunction;

 public final class JavaWordCount {

     private static final Pattern SPACE = Pattern.compile(" ");

     public static void main(String[] args) throws Exception {
         // The master is set in code, so no spark-submit is needed.
         SparkConf conf = new SparkConf().setMaster("local").setAppName("JavaWordCount");
         JavaSparkContext ctx = new JavaSparkContext(conf);

         String filePath = "";
         BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
         System.out.println("Enter FilePath:");
         System.out.println("e.g. D:/systemInfo.log");

         while (true) {
             System.out.println("> ");
             filePath = reader.readLine();
             if (filePath.equalsIgnoreCase("exit")) {
                 // Stop the context and leave the input loop.
                 ctx.stop();
                 break;
             } else {
                 JavaRDD<String> lines = ctx.textFile(filePath, 1);
                 JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
                     @Override
                     public Iterable<String> call(String s) {
                         return Arrays.asList(SPACE.split(s));
                     }
                 });
                 System.out.println("wordCount:" + words.count());
             }
         }
     }
 }
The results are as follows:
 Enter FilePath:
 e.g. D:/systemInfo.log
 >
 D:/systemInfo.log
 wordCount:48050
 >

Here setMaster is "local"; other master URL values select the other run modes (the original post listed them in a figure).
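As a stand-in for that figure, here is a hedged summary of the master URL values commonly documented for Spark 1.x, written as SparkConf calls; the host names and ports are placeholders:

 // Run locally with one worker thread (the mode used in this article).
 new SparkConf().setMaster("local");
 // Run locally with N worker threads, or one thread per CPU core.
 new SparkConf().setMaster("local[4]");
 new SparkConf().setMaster("local[*]");
 // Connect to a Spark standalone cluster (default port 7077).
 new SparkConf().setMaster("spark://HOST:7077");
 // Connect to a Mesos cluster (default port 5050).
 new SparkConf().setMaster("mesos://HOST:5050");
 // Run on YARN (Spark 1.x style master values).
 new SparkConf().setMaster("yarn-client");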

Reference: http://www.infoq.com/cn/articles/apache-spark-introduction
