A First Look at MapReduce Use Cases (with Java and Python Code)

2019-03-01 08:34

Author: Python進階學習交流

Java version

First, prepare a data set of words that have already been tokenized. Here the file is a plain-text file named WordMRDemo.txt, and it contains the short sentence below, with words separated by spaces:

hello my name is spacedong welcome to the spacedong thank you

Add the Hadoop dependencies

<!-- This example uses version 2.6.5; other versions can be used as well -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.5</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.5</version>
</dependency>

Create WordMapper.java. It splits each input line into words on whitespace.

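import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // StringTokenizer splits on whitespace by default
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            // Map output: one (word, 1) pair per token
            context.write(new Text(word), new IntWritable(1));
        }
    }
}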
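To make the mapper's output concrete, here is a minimal Python sketch of the same step, run on the sample sentence. It is only an illustration of the (word, 1) pairs the mapper emits, not Hadoop code; the function name `map_line` is made up for this example.

```python
def map_line(line):
    """Mimic WordMapper.map: emit a (word, 1) pair for each token."""
    # str.split() with no argument splits on runs of whitespace,
    # much like StringTokenizer's default delimiters.
    return [(word, 1) for word in line.split()]


sample = "hello my name is spacedong welcome to the spacedong thank you"
pairs = map_line(sample)
for word, count in pairs:
    print(word, count)
```

Note that the mapper does no counting of its own: "spacedong" simply appears twice in the output, each time with the value 1.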

Create WordReduce.java. It counts the occurrences of each word.

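import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts emitted for this word
        int sum = 0;
        for (IntWritable i : values) {
            sum += i.get();
        }
        context.write(key, new IntWritable(sum));
    }
}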
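Between the map and reduce phases, the framework groups all values that share a key and hands each group to the reducer. The Python sketch below imitates that grouping and the summing done in the reducer. It is a simplified, in-memory illustration; `shuffle`, `reduce_word`, and the three-pair input are made up for this example.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group map output by key, as the framework does before reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Hadoop presents keys to the reducer in sorted order.
    return sorted(grouped.items())


def reduce_word(key, values):
    """Mimic WordReduce.reduce: sum the counts collected for one word."""
    return key, sum(values)


# Hypothetical map output for a three-word input.
map_output = [("the", 1), ("spacedong", 1), ("spacedong", 1)]
results = [reduce_word(key, values) for key, values in shuffle(map_output)]
print(results)
```

The reducer sees one call per distinct word, with every 1 emitted for that word collected into a single iterable.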

Create WordMRDemo.java. It configures and runs the job that analyzes the sentence.

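import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordMRDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // MapReduce setting, i.e. what would otherwise go in hadoop/conf/mapred-site.xml
        conf.set("mapred.job.tracker", "hadoop:9000");
        try {
            // Create the job (Job.getInstance replaces the deprecated new Job(conf))
            Job job = Job.getInstance(conf);
            // Class used to locate the job jar
            job.setJarByClass(WordMRDemo.class);
            // Mapper class to run
            job.setMapperClass(WordMapper.class);
            // Reducer class to run
            job.setReducerClass(WordReduce.class);
            // Types of the map output key and value
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            // Types of the final (reduce) output key and value
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Number of reduce tasks; the default is one. More reducers can raise
            // parallelism on large inputs, at the cost of per-task overhead.
            // job.setNumReduceTasks(2);
            // Input file or directory (a directory is allowed); adjust to your own layout
            FileInputFormat.addInputPath(job, new Path("F:/BigDataWorkPlace/data/input"));
            // Output directory; it must not already exist, or the job fails
            FileOutputFormat.setOutputPath(job, new Path("F:/BigDataWorkPlace/data/out"));
            // Exit once the job has finished
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}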

Finally, run WordMRDemo.java. The results are written to the out folder, which looks like this:

[Figure: listing of the out directory]

The contents of the part-r-00000 file are as follows:

[Figure: contents of part-r-00000]
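As a cross-check on the job's output, the expected counts for the sample sentence can be computed directly. This is a minimal Python sketch, assuming the default text output layout of one word, a tab, and its count per line, sorted by word; it uses `collections.Counter` rather than MapReduce.

```python
from collections import Counter

sentence = "hello my name is spacedong welcome to the spacedong thank you"
counts = Counter(sentence.split())

# One "word<TAB>count" line per distinct word, sorted by word.
lines = ["{}\t{}".format(word, counts[word]) for word in sorted(counts)]
print("\n".join(lines))
```

Every word in the sample occurs once except "spacedong", which occurs twice, so the output has ten lines.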

Python version

Create map.py, which splits each line into words.

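import sys

for line in sys.stdin:
    # split() with no argument splits on whitespace and skips empty tokens
    for word in line.strip().split():
        # Emit "word<TAB>1"; Hadoop Streaming treats the text before the first tab as the key
        print('\t'.join([word, '1']))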

Create red.py, which counts the words.

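import sys

cur_word = None
total = 0
for line in sys.stdin:
    ss = line.strip().split('\t')
    if len(ss) != 2:
        continue  # skip malformed lines
    word, cnt = ss
    if cur_word is None:
        cur_word = word
    if cur_word != word:
        # Input arrives sorted by key, so a new word means the previous one is complete
        print('\t'.join([cur_word, str(total)]))
        cur_word = word
        total = 0
    total += int(cnt)
# Flush the last word, if there was any input at all
if cur_word is not None:
    print('\t'.join([cur_word, str(total)]))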
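The reducer relies on its input arriving sorted by key, which Hadoop Streaming guarantees between the two scripts. The sketch below runs the same text protocol in memory to show how the pieces fit together. It is an illustration only: `stream_map` and `stream_reduce` are made-up names, `sorted()` stands in for the framework's sort, and `itertools.groupby` replaces the manual `cur_word` bookkeeping.

```python
from itertools import groupby


def stream_map(lines):
    """Yield "word<TAB>1" lines, as map.py writes to stdout."""
    for line in lines:
        for word in line.split():
            yield "{}\t1".format(word)


def stream_reduce(sorted_lines):
    """Yield "word<TAB>total" lines from key-sorted mapper output."""
    parsed = (line.split("\t") for line in sorted_lines)
    # groupby only merges adjacent equal keys, hence the sorted input.
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield "{}\t{}".format(word, sum(int(cnt) for _, cnt in group))


text = ["hello my name is spacedong welcome to the spacedong thank you"]
# sorted() stands in for the sort Hadoop Streaming performs between phases.
output = list(stream_reduce(sorted(stream_map(text))))
print("\n".join(output))
```

If the sort step is skipped, repeated words that are not adjacent are reported more than once, which is the same failure red.py would show on unsorted input.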

Create run.sh and run it directly.

HADOOP_CMD="/usr/local/src/hadoop-2.6.5/bin/hadoop"
STREAM_JAR_PATH="/usr/local/src/hadoop-2.6.5/share/hadoop/tools/lib/hadoop-streaming-2.6.5.jar"
INPUT_FILE_PATH_1="/test.txt"
OUTPUT_PATH="/output"

# Remove any previous output; the job fails if the output directory already exists
$HADOOP_CMD fs -rm -r -skipTrash $OUTPUT_PATH

# Step 1.
$HADOOP_CMD jar $STREAM_JAR_PATH \
    -input $INPUT_FILE_PATH_1 \
    -output $OUTPUT_PATH \
    -mapper "python map.py" \
    -reducer "python red.py" \
    -file ./map.py \
    -file ./red.py

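The code above is the core of the demo. The complete code is available in the GitHub repository.

GitHub: http://github.com/cassieeric/bigDaaNotes

This article is the first in a MapReduce series. The next one will cover the MapReduce programming model, so stay tuned!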

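Bonus

Having read this far, do you now have a basic picture of MapReduce? As a parting gift, the e-book "Hadoop Internals: An In-Depth Analysis of MapReduce Architecture Design and Implementation Principles" (Hadoop的技術內幕：深入解析MapReduce架構設計及實現原理) is available: reply with the keyword MapReduce in the public account's message backend to get it.

References:

Hadoop Internals: An In-Depth Analysis of MapReduce Architecture Design and Implementation Principles (Hadoop的技術內幕：深入解析MapReduce架構設計及實現原理)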

Cover image: cosmin Paduraru
