Hadoop 3.0 Big Data Processing, Part 4 (Case Study: Data Cleaning, Metric Statistics, Job Script Wrapping, Sqoop Export to MySQL)
Case requirements analysis
A live-streaming company produces massive amounts of streaming data every day. To better serve streamers and viewers and to improve stream quality and user retention, this data is analyzed and aggregated to extract business value. In this hands-on case we use Hadoop to compute statistics over the streaming data. Below is a simplified log file; the full version is kept in the Gitee repo hadoop_study/hadoopDemo1 · Huathy/study-all/.
```
{id:1580089010000,uid:12001002543,nickname:jack2543,gold:561,watchnumpv:1697,follower:1509,gifter:2920,watchnumuv:5410,length:3542,exp:183}
{id:1580089010001,uid:12001001853,nickname:jack1853,gold:660,watchnumpv:8160,follower:1781,gifter:551,watchnumuv:4798,length:189,exp:89}
{id:1580089010002,uid:12001003786,nickname:jack3786,gold:14,watchnumpv:577,follower:1759,gifter:2643,watchnumuv:8910,length:1203,exp:54}
```
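Before the cleaning job can run, the raw log has to sit in HDFS under a date-named directory (the run below reads hdfs://cent7-1:9000/data/videoinfo/231022). A minimal sketch of that upload step; the local file name video_info_231022.log is an assumption, substitute your own:

```bash
# Create the date-named input directory and upload the raw log into it.
# The local file name video_info_231022.log is hypothetical.
hdfs dfs -mkdir -p hdfs://cent7-1:9000/data/videoinfo/231022
hdfs dfs -put video_info_231022.log hdfs://cent7-1:9000/data/videoinfo/231022/
# Sanity check: the file should now be listed
hdfs dfs -ls hdfs://cent7-1:9000/data/videoinfo/231022
```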
Raw data cleaning code

Discard invalid records: the raw data is written as logs and collected into HDFS by a log collection tool, so it still needs to be cleaned — records with missing fields are dropped and abnormal field values are normalized. Unused fields are also removed, since the computations do not need every field.

DataCleanMap

```java
package dataClean;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @author Huathy
 * @date 2023-10-22 22:15
 * @description Custom Mapper that implements the actual cleaning logic
 */
public class DataCleanMap extends Mapper<LongWritable, Text, Text, Text> {
    /**
     * 1. Extract the required fields from the raw data
     * 2. Check the core fields for abnormal values
     *
     * @param key
     * @param value
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String valStr = value.toString();
        // Parse the JSON string into an object
        JSONObject jsonObj = JSON.parseObject(valStr);
        String uid = jsonObj.getString("uid");
        // getIntValue is recommended here: it returns 0 for a missing field instead of failing
        int gold = jsonObj.getIntValue("gold");
        int watchnumpv = jsonObj.getIntValue("watchnumpv");
        int follower = jsonObj.getIntValue("follower");
        int length = jsonObj.getIntValue("length");
        // Filter out abnormal records
        if (StringUtils.isNotBlank(valStr) && (gold * watchnumpv * follower * length) > 0) {
            // Assemble k2/v2
            Text k2 = new Text();
            k2.set(uid);
            Text v2 = new Text();
            v2.set(gold + "\t" + watchnumpv + "\t" + follower + "\t" + length);
            context.write(k2, v2);
        }
    }
}
```
DataCleanJob

```java
package dataClean;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author Huathy
 * @date 2023-10-22 22:02
 * @description Data cleaning job
 * 1. Extract the required fields from the raw data:
 *    uid, gold, watchnumpv (total views), follower (fan count), length (total stream duration)
 * 2. None of these five fields may be missing or empty; otherwise the record is abnormal and is dropped.
 *    An individual missing numeric field is set to 0.
 * <p>
 * Analysis:
 * 1. The raw data is JSON, so fastjson can parse it and extract the required fields.
 * 2. Each record is then checked and only records that satisfy the conditions are kept.
 * 3. No aggregation is needed, only a simple filter, so a map phase is enough and no reduce phase is required.
 * 4. The map-phase <k1,v1> types are fixed as <LongWritable,Text>; <k2,v2> is <Text,Text>, where k2 holds the
 *    streamer ID and v2 holds the core fields separated by "\t".
 */
public class DataCleanJob {
    public static void main(String[] args) throws Exception {
        System.out.println("inputPath => " + args[0]);
        System.out.println("outputPath => " + args[1]);
        String path = args[0];
        String path2 = args[1];

        // Configuration for the job
        Configuration configuration = new Configuration();
        // Create the job
        Job job = Job.getInstance(configuration, "wordCountJob");
        // This line is required; otherwise the Job class cannot be found when running on the cluster
        job.setJarByClass(DataCleanJob.class);

        // Input and output paths
        FileInputFormat.setInputPaths(job, new Path(path));
        FileOutputFormat.setOutputPath(job, new Path(path2));

        // Map configuration
        job.setMapperClass(DataCleanMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // Number of reducers: 0 disables the reduce phase
        job.setNumReduceTasks(0);

        // Submit the job
        job.waitForCompletion(true);
    }
}
```
Run

```
## Run command
[root@cent7-1 hadoop-3.2.4]# hadoop jar hadoopDemo1-0.0.1-SNAPSHOT-jar-with-dependencies.jar dataClean.DataCleanJob hdfs://cent7-1:9000/data/videoinfo/231022 hdfs://cent7-1:9000/data/res231022
inputPath => hdfs://cent7-1:9000/data/videoinfo/231022
outputPath => hdfs://cent7-1:9000/data/res231022
2023-10-22 23:16:15,845 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2023-10-22 23:16:16,856 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2023-10-22 23:16:17,041 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1697985525421_0002
2023-10-22 23:16:17,967 INFO input.FileInputFormat: Total input files to process : 1
2023-10-22 23:16:18,167 INFO mapreduce.JobSubmitter: number of splits:1
2023-10-22 23:16:18,873 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1697985525421_0002
2023-10-22 23:16:18,874 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-10-22 23:16:19,157 INFO conf.Configuration: resource-types.xml not found
2023-10-22 23:16:19,158 INFO resource.ResourceUtils: Unable to find resource-types.xml.
2023-10-22 23:16:19,285 INFO impl.YarnClientImpl: Submitted application application_1697985525421_0002
2023-10-22 23:16:19,345 INFO mapreduce.Job: The url to track the job:
2023-10-22 23:16:19,346 INFO mapreduce.Job: Running job: job_1697985525421_0002
2023-10-22 23:16:31,683 INFO mapreduce.Job: Job job_1697985525421_0002 running in uber mode : false
2023-10-22 23:16:31,689 INFO mapreduce.Job:  map 0% reduce 0%
2023-10-22 23:16:40,955 INFO mapreduce.Job:  map 100% reduce 0%
2023-10-22 23:16:43,012 INFO mapreduce.Job: Job job_1697985525421_0002 completed successfully
2023-10-22 23:16:43,153 INFO mapreduce.Job: Counters: 33
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=238970
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=24410767
		HDFS: Number of bytes written=1455064
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
		HDFS: Number of bytes read erasure-coded=0
	Job Counters
		Launched map tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=7678
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=7678
		Total vcore-milliseconds taken by all map tasks=7678
		Total megabyte-milliseconds taken by all map tasks=7862272
	Map-Reduce Framework
		Map input records=90000
		Map output records=46990
		Input split bytes=123
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=195
		CPU time spent (ms)=5360
		Physical memory (bytes) snapshot=302153728
		Virtual memory (bytes) snapshot=2588925952
		Total committed heap usage (bytes)=214958080
		Peak Map Physical memory (bytes)=302153728
		Peak Map Virtual memory (bytes)=2588925952
	File Input Format Counters
		Bytes Read=24410644
	File Output Format Counters
		Bytes Written=1455064
[root@cent7-1 hadoop-3.2.4]#
## Count the lines in the output files
[root@cent7-1 hadoop-3.2.4]# hdfs dfs -cat hdfs://cent7-1:9000/data/res231022/* | wc -l
46990
## Count the records in the raw data
[root@cent7-1 hadoop-3.2.4]# hdfs dfs -cat hdfs://cent7-1:9000/data/videoinfo/231022/* | wc -l
90000
```
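To see what the cleaned records actually look like (streamer ID plus the four core fields, tab-separated), a quick spot check of the output directory helps; a small sketch reusing the output path from the run above:

```bash
# Print the first few cleaned records: uid \t gold \t watchnumpv \t follower \t length
hdfs dfs -cat hdfs://cent7-1:9000/data/res231022/* | head -n 3
```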
Metric statistics

Compute statistics for the gold (coin) count, total view PV, follower count, and total stream duration (four fields are involved, so a custom Writable keeps later work convenient), and compute the 10 streamers with the longest daily stream duration together with their durations.

Custom Writable implementation: since several fields have to be aggregated, it is convenient to record them all in one custom data type.

```java
package videoinfo;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * @author Huathy
 * @date 2023-10-22 23:32
 * @description Custom data type that holds the streamer's core fields and keeps later maintenance simple
 */
public class VideoInfoWriteable implements Writable {
    private long gold;
    private long watchnumpv;
    private long follower;
    private long length;

    public void set(long gold, long watchnumpv, long follower, long length) {
        this.gold = gold;
        this.watchnumpv = watchnumpv;
        this.follower = follower;
        this.length = length;
    }

    public long getGold() {
        return gold;
    }

    public long getWatchnumpv() {
        return watchnumpv;
    }

    public long getFollower() {
        return follower;
    }

    public long getLength() {
        return length;
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeLong(gold);
        dataOutput.writeLong(watchnumpv);
        dataOutput.writeLong(follower);
        dataOutput.writeLong(length);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.gold = dataInput.readLong();
        this.watchnumpv = dataInput.readLong();
        this.follower = dataInput.readLong();
        this.length = dataInput.readLong();
    }

    @Override
    public String toString() {
        return gold + "\t" + watchnumpv + "\t" + follower + "\t" + length;
    }
}
```
Per-streamer statistics (package videoinfo)

VideoInfoJob

```java
package videoinfo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author Huathy
 * @date 2023-10-22 23:27
 * @description Metric statistics job
 * 1. Aggregate per streamer: for each streamer, compute the day's total gold, total view PV,
 *    total new followers, and total stream duration.
 * Analysis:
 * 1. To keep the per-streamer metrics easy to maintain, the fields are wrapped in one object,
 *    which requires a custom Writable.
 * 2. The data has to be aggregated per streamer, so the streamer ID is used as the key.
 * 3. The map-phase <k2,v2> is therefore <Text, custom Writable>.
 * 4. Aggregation is required, so a reduce phase is needed as well.
 */
public class VideoInfoJob {
    public static void main(String[] args) throws Exception {
        System.out.println("inputPath => " + args[0]);
        System.out.println("outputPath => " + args[1]);
        String path = args[0];
        String path2 = args[1];

        // Configuration for the job
        Configuration configuration = new Configuration();
        // Create the job
        Job job = Job.getInstance(configuration, "VideoInfoJob");
        // This line is required; otherwise the Job class cannot be found when running on the cluster
        job.setJarByClass(VideoInfoJob.class);

        // Input and output paths
        FileInputFormat.setInputPaths(job, new Path(path));
        FileOutputFormat.setOutputPath(job, new Path(path2));

        // Map configuration (the map output value is the custom Writable)
        job.setMapperClass(VideoInfoMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(VideoInfoWriteable.class);

        // Reduce configuration
        job.setReducerClass(VideoInfoReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(VideoInfoWriteable.class);

        // Submit the job
        job.waitForCompletion(true);
    }
}
```
VideoInfoMap

```java
package videoinfo;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @author Huathy
 * @date 2023-10-22 23:31
 * @description Custom Mapper that parses the cleaned core fields
 */
public class VideoInfoMap extends Mapper<LongWritable, Text, Text, VideoInfoWriteable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Read one line of the cleaned data: uid \t gold \t watchnumpv \t follower \t length
        String line = value.toString();
        String[] fields = line.split("\t");
        String uid = fields[0];
        long gold = Long.parseLong(fields[1]);
        long watchnumpv = Long.parseLong(fields[2]);
        long follower = Long.parseLong(fields[3]);
        long length = Long.parseLong(fields[4]);
        // Assemble k2/v2
        Text k2 = new Text();
        k2.set(uid);
        VideoInfoWriteable v2 = new VideoInfoWriteable();
        v2.set(gold, watchnumpv, follower, length);
        context.write(k2, v2);
    }
}
```
VideoInfoReduce

```java
package videoinfo;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * @author Huathy
 * @date 2023-10-22 23:31
 * @description Custom Reducer that sums the core fields for each streamer
 */
public class VideoInfoReduce extends Reducer<Text, VideoInfoWriteable, Text, VideoInfoWriteable> {
    @Override
    protected void reduce(Text key, Iterable<VideoInfoWriteable> values, Context context) throws IOException, InterruptedException {
        // Sum all values that share the same key
        long goldSum = 0;
        long watchNumPvSum = 0;
        long followerSum = 0;
        long lengthSum = 0;
        for (VideoInfoWriteable v2 : values) {
            goldSum += v2.getGold();
            watchNumPvSum += v2.getWatchnumpv();
            followerSum += v2.getFollower();
            lengthSum += v2.getLength();
        }
        // Assemble k3/v3
        VideoInfoWriteable videoInfoWriteable = new VideoInfoWriteable();
        videoInfoWriteable.set(goldSum, watchNumPvSum, followerSum, lengthSum);
        context.write(key, videoInfoWriteable);
    }
}
```
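A sketch of how this per-streamer job might be submitted by hand, using the cleaned output from the earlier run as input; the result directory name is illustrative (the wrapper script later in this post uses its own path layout):

```bash
# Run the per-streamer statistics job on the cleaned data.
# The output directory /res/videoinfoJob/231022 is an assumed, illustrative path.
hadoop jar hadoopDemo1-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  videoinfo.VideoInfoJob \
  hdfs://cent7-1:9000/data/res231022 \
  hdfs://cent7-1:9000/res/videoinfoJob/231022
```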
Per-streamer Top N computation

VideoInfoTop10Job

```java
package top10;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author Huathy
 * @date 2023-10-23 21:27
 * @description Metric statistics job
 * Requirement: compute each day's top 10 streamers by stream duration, together with their durations.
 * Analysis:
 * 1. The map phase extracts each streamer's ID and stream duration from the cleaned data.
 * 2. The map-phase <k2,v2> is therefore <Text, LongWritable>.
 * 3. The reduce phase sums the durations per streamer and stores them in a temporary in-memory map.
 * 4. The reducer's cleanup() method (which runs once at the end) sorts that map.
 * 5. cleanup() then writes out the top 10 streamers by stream duration.
 * setup() runs once before reduce() starts; cleanup() runs once after it finishes.
 */
public class VideoInfoTop10Job {
    public static void main(String[] args) throws Exception {
        System.out.println("inputPath => " + args[0]);
        System.out.println("outputPath => " + args[1]);
        String path = args[0];
        String path2 = args[1];

        // Configuration for the job
        Configuration configuration = new Configuration();
        // Derive the date from the input path, e.g. .../231022
        String[] fields = path.split("/");
        String tmpdt = fields[fields.length - 1];
        System.out.println("date => " + tmpdt);
        // Pass the date to the tasks through the job configuration
        configuration.set("dt", tmpdt);

        // Create the job
        Job job = Job.getInstance(configuration, "VideoInfoTop10Job");
        // This line is required; otherwise the Job class cannot be found when running on the cluster
        job.setJarByClass(VideoInfoTop10Job.class);

        // Input and output paths
        FileInputFormat.setInputPaths(job, new Path(path));
        FileOutputFormat.setOutputPath(job, new Path(path2));

        job.setMapperClass(VideoInfoTop10Map.class);
        job.setReducerClass(VideoInfoTop10Reduce.class);
        // Map output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Reduce output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Submit the job
        job.waitForCompletion(true);
    }
}
```
VideoInfoTop10Map

```java
package top10;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @author Huathy
 * @date 2023-10-23 21:32
 * @description Custom Mapper that extracts the streamer ID and stream duration
 */
public class VideoInfoTop10Map extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Read one line of the cleaned data (note: the record is in value, not key)
        String line = value.toString();
        String[] fields = line.split("\t");
        String uid = fields[0];
        long length = Long.parseLong(fields[4]);
        Text k2 = new Text();
        k2.set(uid);
        LongWritable v2 = new LongWritable();
        v2.set(length);
        context.write(k2, v2);
    }
}
```
VideoInfoTop10Reduce

```java
package top10;

import cn.hutool.core.collection.CollUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.*;

/**
 * @author Huathy
 * @date 2023-10-23 21:37
 * @description Reducer that sums durations per streamer and emits the top 10 in cleanup()
 */
public class VideoInfoTop10Reduce extends Reducer<Text, LongWritable, Text, LongWritable> {
    // Holds streamer ID -> total stream duration
    Map<String, Long> map = new HashMap<>();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        String k2 = key.toString();
        long lengthSum = 0;
        for (LongWritable v2 : values) {
            lengthSum += v2.get();
        }
        map.put(k2, lengthSum);
    }

    /**
     * Runs once when the task starts; typically used to open resources such as MySQL or Redis connections.
     *
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        System.out.println("setup method running...");
        System.out.println("context: " + context);
        super.setup(context);
    }

    /**
     * Runs once when the task finishes; typically used to close resources.
     *
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Get the date from the job configuration
        Configuration configuration = context.getConfiguration();
        String date = configuration.get("dt");

        // Sort by duration, descending
        LinkedHashMap<String, Long> sortMap = CollUtil.sortByEntry(map, new Comparator<Map.Entry<String, Long>>() {
            @Override
            public int compare(Map.Entry<String, Long> o1, Map.Entry<String, Long> o2) {
                return -o1.getValue().compareTo(o2.getValue());
            }
        });
        Set<Map.Entry<String, Long>> entries = sortMap.entrySet();
        Iterator<Map.Entry<String, Long>> iterator = entries.iterator();

        // Write out the top 10
        int count = 1;
        while (count <= 10 && iterator.hasNext()) {
            Map.Entry<String, Long> entry = iterator.next();
            String key = entry.getKey();
            Long value = entry.getValue();
            // Assemble k3/v3
            Text k3 = new Text(date + "\t" + key);
            LongWritable v3 = new LongWritable(value);
            // The date comes from the job configuration rather than the current time,
            // so historical days can be recomputed correctly
            context.write(k3, v3);
            count++;
        }
    }
}
```
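Because the Top10 job derives the date from the last segment of its input path, the input directory has to be the date-named cleaned-data directory. A sketch of running it and eyeballing the result before the Sqoop export later on; the /data/videoinfo_clean/231022 input path is an assumption matching the wrapper script below, and /res/top10/231022 matches the export directory used in the Sqoop section:

```bash
# The trailing /231022 of the input path becomes the "dt" value written into every output record.
hadoop jar hadoopDemo1-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  top10.VideoInfoTop10Job \
  hdfs://cent7-1:9000/data/videoinfo_clean/231022 \
  hdfs://cent7-1:9000/res/top10/231022
# Expected layout per output line: date \t uid \t total length
hdfs dfs -cat hdfs://cent7-1:9000/res/top10/231022/part-r-00000
```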
Wrapping the jobs in a scheduled script

Job dependencies: the metric statistics jobs (the Top10 statistics and the playback statistics) both depend on the data cleaning job.
Wrap the job submission commands in a single script (data_clean.sh) so they are easy to invoke and to hook into a scheduler. Run it with:

```bash
sh -x data_clean.sh
```
Monitoring the job results: check whether each job succeeded; if one failed, retry it and send an alert.

```bash
#!/bin/bash
# Prefer the /bin/bash shebang form.
# If the user supplies a date argument, use it; otherwise default to yesterday.
# This makes it easy to re-run the pipeline for an arbitrary day.
if [ "x$1" = "x" ]; then
    yes_time=$(date +%y%m%d --date="1 days ago")
else
    yes_time=$1
fi

jobs_home=/home/jobs
cleanjob_input=hdfs://cent7-1:9000/data/videoinfo/${yes_time}
cleanjob_output=hdfs://cent7-1:9000/data/videoinfo_clean/${yes_time}
videoinfojob_input=${cleanjob_output}
videoinfojob_output=hdfs://cent7-1:9000/res/videoinfoJob/${yes_time}
top10job_input=${cleanjob_output}
top10job_output=hdfs://cent7-1:9000/res/top10/${yes_time}

# Remove the output directory so the script can be re-run
hdfs dfs -rm -r ${cleanjob_output}

# Run the data cleaning job
hadoop jar ${jobs_home}/hadoopDemo1-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
    dataClean.DataCleanJob \
    ${cleanjob_input} ${cleanjob_output}

# Check whether the cleaning job succeeded
hdfs dfs -ls ${cleanjob_output}/_SUCCESS
# $? holds the exit code of the previous command: 0 means success, anything else means failure
if [ $? = 0 ]; then
    echo "clean job execute success ...."

    # Remove the output directories so the script can be re-run
    hdfs dfs -rm -r ${videoinfojob_output}
    hdfs dfs -rm -r ${top10job_output}

    # Run metric statistics job 1
    echo "execute VideoInfoJob ...."
    hadoop jar ${jobs_home}/hadoopDemo1-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
        videoinfo.VideoInfoJob \
        ${videoinfojob_input} ${videoinfojob_output}
    hdfs dfs -ls ${videoinfojob_output}/_SUCCESS
    if [ $? != 0 ]
    then
        echo "VideoInfoJob execute failed ...."
    fi

    # Run metric statistics job 2
    echo "execute VideoInfoTop10Job ...."
    hadoop jar ${jobs_home}/hadoopDemo1-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
        top10.VideoInfoTop10Job \
        ${top10job_input} ${top10job_output}
    hdfs dfs -ls ${top10job_output}/_SUCCESS
    if [ $? != 0 ]
    then
        echo "VideoInfoTop10Job execute failed ...."
    fi
else
    echo "clean job execute failed ... date time is ${yes_time}"
    # Send an SMS / email to the administrator here
    # A while loop could be added here to retry
fi
```
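To actually schedule the script, a cron entry is the simplest option. A minimal sketch, assuming the script lives at /home/jobs/data_clean.sh and should run shortly after midnight (both the path and the time are assumptions):

```bash
# Edit root's crontab with: crontab -e
# Run the pipeline at 00:30 every day and keep a per-day log for troubleshooting.
# Note that % must be escaped as \% inside a crontab line.
30 0 * * * sh /home/jobs/data_clean.sh >> /home/jobs/logs/data_clean_$(date +\%y\%m\%d).log 2>&1
```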
Exporting the results to MySQL with Sqoop

Sqoop makes it easy to move data between HDFS and MySQL in both directions.
Install the Sqoop tool, then use Sqoop to export the MapReduce results to MySQL. Export command:

```bash
sqoop export \
  --connect "jdbc:mysql://192.168.56.101:3306/data?serverTimezone=UTC&useSSL=false" \
  --username hdp \
  --password admin \
  --table top10 \
  --export-dir /res/top10/231022 \
  --input-fields-terminated-by "\t"
```
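sqoop export requires the target table to already exist in MySQL. A minimal sketch of creating it with the mysql client; the column names and types are assumptions inferred from the Top10 output layout (date, uid, total length, tab-separated), so adjust them to your own schema:

```bash
# Create the export target table (the schema below is an assumption, not the post's definitive DDL).
mysql -h 192.168.56.101 -u hdp -padmin data <<'SQL'
CREATE TABLE IF NOT EXISTS top10 (
    dt     VARCHAR(10) NOT NULL,   -- date taken from the input path, e.g. 231022
    uid    VARCHAR(20) NOT NULL,   -- streamer ID
    length BIGINT      NOT NULL    -- total stream duration for the day
);
SQL
```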
Export log

```
[root@cent7-1 sqoop-1.4.7.bin_hadoop-2.6.0]# sqoop export \
> --connect "jdbc:mysql://192.168.56.101:3306/data?serverTimezone=UTC&useSSL=false" \
> --username hdp \
> --password admin \
> --table top10 \
> --export-dir /res/top10/231022 \
> --input-fields-terminated-by "\t"
Warning: /home/sqoop-1.4.7.bin_hadoop-2.6.0//../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.7.bin_hadoop-2.6.0//../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
2023-10-24 23:42:09,452 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2023-10-24 23:42:09,684 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2023-10-24 23:42:09,997 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2023-10-24 23:42:10,022 INFO tool.CodeGenTool: Beginning code generation
Loading class com.mysql.jdbc.Driver. This is deprecated. The new driver class is com.mysql.cj.jdbc.Driver. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-10-24 23:42:10,921 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM top10 AS t LIMIT 1
2023-10-24 23:42:11,061 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM top10 AS t LIMIT 1
2023-10-24 23:42:11,084 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop-3.2.4
注: /tmp/sqoop-root/compile/6d507cd9a1a751990abfd7eef20a60c2/top10.java使用或覆盖了已过时的 API。
注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。
2023-10-24 23:42:23,932 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/6d507cd9a1a751990abfd7eef20a60c2/top10.jar
2023-10-24 23:42:23,972 INFO mapreduce.ExportJobBase: Beginning export of top10
2023-10-24 23:42:23,972 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2023-10-24 23:42:24,237 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2023-10-24 23:42:27,318 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2023-10-24 23:42:27,325 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2023-10-24 23:42:27,326 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2023-10-24 23:42:27,641 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2023-10-24 23:42:29,161 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1698153196891_0015
2023-10-24 23:42:39,216 INFO input.FileInputFormat: Total input files to process : 1
2023-10-24 23:42:39,231 INFO input.FileInputFormat: Total input files to process : 1
2023-10-24 23:42:39,387 INFO mapreduce.JobSubmitter: number of splits:4
2023-10-24 23:42:39,475 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2023-10-24 23:42:40,171 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1698153196891_0015
2023-10-24 23:42:40,173 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-10-24 23:42:40,660 INFO conf.Configuration: resource-types.xml not found
2023-10-24 23:42:40,660 INFO resource.ResourceUtils: Unable to find resource-types.xml.
2023-10-24 23:42:41,073 INFO impl.YarnClientImpl: Submitted application application_1698153196891_0015
2023-10-24 23:42:41,163 INFO mapreduce.Job: The url to track the job:
2023-10-24 23:42:41,164 INFO mapreduce.Job: Running job: job_1698153196891_0015
2023-10-24 23:43:02,755 INFO mapreduce.Job: Job job_1698153196891_0015 running in uber mode : false
2023-10-24 23:43:02,760 INFO mapreduce.Job:  map 0% reduce 0%
2023-10-24 23:43:23,821 INFO mapreduce.Job:  map 25% reduce 0%
2023-10-24 23:43:25,047 INFO mapreduce.Job:  map 50% reduce 0%
2023-10-24 23:43:26,069 INFO mapreduce.Job:  map 75% reduce 0%
2023-10-24 23:43:27,088 INFO mapreduce.Job:  map 100% reduce 0%
2023-10-24 23:43:28,112 INFO mapreduce.Job: Job job_1698153196891_0015 completed successfully
2023-10-24 23:43:28,266 INFO mapreduce.Job: Counters: 33
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=993808
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1297
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=19
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
		HDFS: Number of bytes read erasure-coded=0
	Job Counters
		Launched map tasks=4
		Data-local map tasks=4
		Total time spent by all maps in occupied slots (ms)=79661
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=79661
		Total vcore-milliseconds taken by all map tasks=79661
		Total megabyte-milliseconds taken by all map tasks=81572864
	Map-Reduce Framework
		Map input records=10
		Map output records=10
		Input split bytes=586
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=3053
		CPU time spent (ms)=11530
		Physical memory (bytes) snapshot=911597568
		Virtual memory (bytes) snapshot=10326462464
		Total committed heap usage (bytes)=584056832
		Peak Map Physical memory (bytes)=238632960
		Peak Map Virtual memory (bytes)=2584969216
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=0
2023-10-24 23:43:28,282 INFO mapreduce.ExportJobBase: Transferred 1.2666 KB in 60.9011 seconds (21.2968 bytes/sec)
2023-10-24 23:43:28,291 INFO mapreduce.ExportJobBase: Exported 10 records.
```