IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________ Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE pdf file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you only use the Google Books 1-grams. Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978(1979), the word "circumvallate" occurred 335(261) times overall, from 91(95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 over the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams in the last 100 years, and the above records are the only records for the word ‘circumvallate’. Then the average value is (335 + 261) / 2 = 298 instead of (335 + 261) / 100 = 5.96.
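The denominator rule can be checked with a few lines of plain Python (assuming, as above, that these two records are the only ones for the word):

```python
# match_count values for the only two years 'circumvallate' appears in
counts = [335, 261]

# Average over the years the word actually appeared (2), not over all 100 years
avg_per_year = sum(counts) / len(counts)    # (335 + 261) / 2 = 298.0
wrong_avg = sum(counts) / 100               # (335 + 261) / 100 = 5.96
```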
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year along with their corresponding average values, sorted in descending order. If multiple bigrams have the same average value, write down any one you like (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word-counting example shown in the lecture notes of Pig. You can use the code there and just make some minor changes to perform this task.
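A hedged sketch of one possible Pig flow for (b)–(d) follows; all HDFS paths and alias names are placeholders, and it assumes the dataset has at most one record per (word, year) pair, so that COUNT gives the number of years a word appeared:

```pig
-- (b) Load the two TAB-separated files and combine them into one relation
g1 = LOAD '/data/1grams-part1' USING PigStorage('\t')
     AS (word:chararray, year:int, match_count:long, volume_count:long);
g2 = LOAD '/data/1grams-part2' USING PigStorage('\t')
     AS (word:chararray, year:int, match_count:long, volume_count:long);
grams = UNION g1, g2;

-- (c) Average occurrences per year: total match_count / number of years appeared
by_word = GROUP grams BY word;
avgs = FOREACH by_word GENERATE
       group AS word,
       (double)SUM(grams.match_count) / COUNT(grams) AS avg_per_year;

-- (d) Top 20 by average, descending
ordered = ORDER avgs BY avg_per_year DESC;
top20 = LIMIT ordered 20;
STORE top20 INTO '/output/q1_top20' USING PigStorage('\t');
```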
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 over the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 on the same datasets stored in HDFS. Rerun the Pig script on this cluster, compare the performance of Pig and Hive in terms of overall run-time, and explain your observations.
Hints:
● Hive will store its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir /tmp
$ hdfs dfs -mkdir /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
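For reference, the same task as Q1 can be sketched in HiveQL along the following lines (table name, paths, and output directory are placeholders; this is a sketch under the same one-record-per-(word, year) assumption, not a verified solution):

```sql
-- Create a table matching the TAB-separated 1-gram format and load both files
CREATE TABLE grams (word STRING, year INT, match_count BIGINT, volume_count BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA INPATH '/data/1grams-part1' INTO TABLE grams;
LOAD DATA INPATH '/data/1grams-part2' INTO TABLE grams;

-- Top 20 words by average occurrences per year (denominator = years appeared)
INSERT OVERWRITE DIRECTORY '/output/q2_top20'
SELECT word, SUM(match_count) / COUNT(*) AS avg_per_year
FROM grams
GROUP BY word
ORDER BY avg_per_year DESC
LIMIT 20;
```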
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection has drawn much attention in the machine learning field; it aims to group users with similar interests, behaviors, actions, or general patterns. In this homework, you will implement a similar-users-detection algorithm for an online movie rating system. Basically, users who rate the same movies with similar scores may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
Similarity(A, B) = |M(A) ∩ M(B)| / |M(A) ∪ M(B)| ...........(**)
where |S| means the cardinality of set S.
(Note: if |𝑀(𝐴)∪𝑀(𝐵)| = 0, we set the similarity to be 0.)
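Equation (**) is the Jaccard similarity of the two movie sets; as a quick sanity check, it can be expressed in a few lines of Python (the user data below is made up for illustration):

```python
def similarity(movies_a, movies_b):
    """Similarity(A, B) = |M(A) ∩ M(B)| / |M(A) ∪ M(B)|, per Equation (**)."""
    union = movies_a | movies_b
    if not union:                 # |M(A) ∪ M(B)| = 0  ->  similarity is 0
        return 0.0
    return len(movies_a & movies_b) / len(union)

# Hypothetical users: A watched movies {1, 2, 3}, B watched {2, 3, 4}
sim_ab = similarity({1, 2, 3}, {2, 3, 4})   # intersection 2, union 4 -> 0.5
```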
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2, or you can use the IE DIC or one provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in datasets [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
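One common way to obtain the co-watch counts of part (a) in Pig is a self-join on movieID; the sketch below is a starting point under that approach (paths and alias names are placeholders), keeping only pairs with userA < userB so each pair is counted once:

```pig
-- Each input line: <userID>,<movieID>
r1 = LOAD '/data/movielens' USING PigStorage(',') AS (user:long, movie:long);
r2 = LOAD '/data/movielens' USING PigStorage(',') AS (user:long, movie:long);

-- Self-join on movie, then keep each unordered user pair exactly once
j = JOIN r1 BY movie, r2 BY movie;
pairs = FILTER j BY r1::user < r2::user;

-- Count the movies each pair has in common
by_pair = GROUP pairs BY (r1::user, r2::user);
counts = FOREACH by_pair GENERATE
         FLATTEN(group) AS (userA, userB),
         COUNT(pairs) AS co_watched;

-- Top 10 pairs for the submission
ordered = ORDER counts BY co_watched DESC;
top10 = LIMIT ordered 10;
DUMP top10;
```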
(b) [20 marks] By modifying/ extending part of your codes in part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in the datasets [3], [4]. If multiple users have the same similarity, you can just pick any three of them.
Hint:
1. In part (b), to facilitate the computation of the similarity measure as defined in (**), you can use the inclusion-exclusion principle, i.e. |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|.
