Hadoop Comprehensive Project: Second-Hand Housing Statistical Analysis (MapReduce Part)

Contents
  • Hadoop Comprehensive Project: Second-Hand Housing Statistical Analysis (MapReduce Part)
    • 0. Preface
    • 1. Statistical analysis with MapReduce
      • 1.1 Maximum and minimum house prices in the four first-tier cities
      • 1.2 Counting second-hand houses with one partition per city
      • 1.3 Counting listings by publication date, sorted
      • 1.4 Top 5 total prices in each of the four first-tier cities
      • 1.5 Total-order sort on total price with custom partitioning
      • 1.6 Secondary sort on construction year and total price
      • 1.7 Counting houses per location with a custom class
      • 1.8 Share of each listing tag
    • 2. Data and source code
    • 3. Summary


0. Preface
  • Windows version: Windows 10
  • Linux version: Ubuntu Kylin 16.04
  • JDK version: Java 8
  • Hadoop version: Hadoop 2.7.1
  • Hive version: Hive 1.2.2
  • IDE: IDEA 2020.2.3
  • IDE: PyCharm 2021.1.3
  • IDE: Eclipse 3.8
1. Statistical analysis with MapReduce

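This part uses MapReduce for eight statistical analyses: extreme values (max/min), per-city partitioned counting, sorting, TopN, total-order sorting with custom partitioning, secondary sorting, a custom class, and proportions.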

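1.1 Maximum and minimum house prices in the four first-tier cities
  • Purpose of the analysis:

The highest and lowest second-hand house prices are an important indicator of a city's economy, and one of the factors buyers weigh before purchasing.

  • Code:

Driver: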

public class MaxMinTotalPriceByCityDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "MaxMinTotalPriceByCity");
        job.setJarByClass(MaxMinTotalPriceByCityDriver.class);
        job.setMapperClass(MaxMinTotalPriceByCityMapper.class);
        job.setReducerClass(MaxMinTotalPriceByCityReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path("datas/tb_house.txt"));
        FileOutputFormat.setOutputPath(job, new Path("MapReduce/out/MaxMinTotalPriceByCity"));
        job.waitForCompletion(true);
    }
}

Mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxMinTotalPriceByCityMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable();
    @Override
    protected void map(Object key, Text value, Context out) throws IOException, InterruptedException {
        String line = value.toString();
        String[] data = line.split("\t");
        outk.set(data[1]);      // city
        outv.set(Integer.parseInt(data[6]));        // total price
        out.write(outk, outv);
    }
}

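Reducer:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxMinTotalPriceByCityReducer extends Reducer<Text, IntWritable, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Collect every total price reported for this city
        List<Integer> totalList = new ArrayList<>();
        for (IntWritable value : values) {
            totalList.add(value.get());
        }
        // After an ascending sort the first element is the minimum and the last is the maximum
        Collections.sort(totalList);
        int max = totalList.get(totalList.size() - 1);
        int min = totalList.get(0);
        Text outv = new Text();
        outv.set("Max and min total price: " + max + " and " + min + " (unit: 10,000 yuan)");
        context.write(key, outv);
    }
}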
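
Sorting the whole list just to read its two ends costs O(n log n) time and keeps every price in memory. As a hedged alternative, the sketch below tracks the minimum and maximum in a single pass. It is plain Java with no Hadoop dependency; the class name `MinMaxSketch` and the sample prices are made up for illustration, and inside a real reducer the loop would run over the `IntWritable` values instead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original project
final class MinMaxSketch {
    /** Returns {min, max} of the prices in a single pass; prices must be non-empty. */
    static int[] minMax(Iterable<Integer> prices) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        boolean seen = false;
        for (int p : prices) {
            seen = true;
            if (p < min) min = p;
            if (p > max) max = p;
        }
        if (!seen) throw new IllegalArgumentException("no prices");
        return new int[] { min, max };
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(320, 1500, 87, 640); // made-up totals
        int[] mm = minMax(sample);
        System.out.println("min=" + mm[0] + ", max=" + mm[1]);
    }
}
```

Because only two integers are kept per city, memory use no longer grows with the number of listings.
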
  • Run: (screenshot in the original post, not reproduced)

  • Result: (screenshot in the original post, not reproduced)

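1.2 Counting second-hand houses with one partition per city
  • Purpose of the analysis:

The number of second-hand houses on offer is one of the basic dimensions for understanding the housing stock; to some extent it also reflects how popular the houses are.

  • Code:

Driver: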

public class HouseCntByCityDriver {
    public static void main(String[] args) throws Exception {
        args = new String[] { "/input/datas/tb_house.txt", "/output/HouseCntByCity" };
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://node01:9000");
        Job job = Job.getInstance(conf, "HouseCntByCity");
        job.setJarByClass(HouseCntByCityDriver.class);
        job.setMapperClass(HouseCntByCityMapper.class);
        job.setReducerClass(HouseCntByCityReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setPartitionerClass(CityPartitioner.class);
        job.setNumReduceTasks(4);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}

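Mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HouseCntByCityMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable(1);
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] data = line.split("\t");
        outk.set(data[1]);      // city; emit (city, 1) for every listing
        context.write(outk, outv);
    }
}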

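Reducer:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class HouseCntByCityReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper to get the listing count per city
        int sum = 0;
        for (IntWritable val : values) sum += val.get();
        context.write(key, new IntWritable(sum));
    }
}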
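
The driver registers `CityPartitioner` and four reduce tasks, but the partitioner class itself is not listed in the post. The sketch below shows one way the city-to-partition mapping could look. It is an assumption, not the author's code: the city order is arbitrary, the fourth city is not named in the source so every other key falls into the last partition, and the names `CityPartitionSketch` and `partitionFor` are hypothetical. City names are written as Unicode escapes so the file compiles under any source encoding.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Hadoop-free illustration of a per-city partition function
final class CityPartitionSketch {
    // Assumed order: Beijing, Shanghai, Guangzhou (as spelled in the listings above)
    private static final List<String> CITIES = Arrays.asList(
            "\u5317\u4eac",   // Beijing
            "\u4e0a\u6d77",   // Shanghai
            "\u5ee3\u5dde");  // Guangzhou

    /** Maps a city name to a partition index in [0, numPartitions). */
    static int partitionFor(String city, int numPartitions) {
        int idx = CITIES.indexOf(city);
        if (idx < 0) idx = CITIES.size();   // any other city shares one partition
        return idx % numPartitions;         // stay in range even with fewer reducers
    }
}
```

In a real job this logic would sit in `getPartition(Text key, IntWritable value, int numPartitions)` of a class extending `org.apache.hadoop.mapreduce.Partitioner<Text, IntWritable>`, so that each reducer writes the count for one city to its own output file.
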
  • Run: (screenshot in the original post, not reproduced)

  • Result: (screenshot in the original post, not reproduced)

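1.3 Counting listings by publication date, sorted
  • Purpose of the analysis:

The publication date of a listing is another basic dimension of the data; to some extent, buyers prefer the most recent listings.

  • Code:

Driver: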

public class AcessHousePubTimeSortDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "AcessHousePubTimeSort");
        job.setJarByClass(AcessHousePubTimeSortDriver.class);
        job.setMapperClass(AcessHousePubTimeSortMapper.class);
        job.setReducerClass(AcessHousePubTimeSortReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path("datas/tb_house.txt"));
        FileOutputFormat.setOutputPath(job, new Path("MapReduce/out/AcessHousePubTimeSort"));
        job.waitForCompletion(true);
    }
}

Mapper:

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AcessHousePubTimeSortMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable(1);
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String lines = value.toString();
        String[] data = lines.split("\t");
        String crawler_time = data[9], followInfo = data[4];
        // Keep only the yyyy-MM-dd part of the crawl timestamp
        String ct = crawler_time.substring(0, 10);
        // The relative publication time sits between "|" and "發" in the follow-info field
        int idx1 = followInfo.indexOf("|"), idx2 = followInfo.indexOf("發");
        String timeStr = followInfo.substring(idx1 + 1, idx2);
        String pubDate = "";
        try {
            pubDate = getPubDate(ct, timeStr);
        } catch (ParseException e) {
            e.printStackTrace();
        }
        outk.set(pubDate);
        context.write(outk, outv);
    }
    // Resolve a relative offset against the crawl date:
    // "今天" means today, "N天..." means N days ago, anything else is treated as N months ago
    public String getPubDate(String ct, String timeStr) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        Date getTime = sdf.parse(ct);
        if (timeStr.equals("今天")) {
            return sdf.format(getTime);     // published on the crawl day, no offset
        }
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(getTime);
        // Parse the leading digits as the size of the offset
        int i = 0;
        while (Character.isDigit(timeStr.charAt(i))) i++;
        int size = Integer.parseInt(timeStr.substring(0, i));
        if (timeStr.contains("天")) {
            calendar.add(Calendar.DAY_OF_MONTH, -size);
        } else {
            calendar.add(Calendar.MONTH, -size);
        }
        return sdf.format(calendar.getTime());
    }
}

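Reducer:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AcessHousePubTimeSortReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Keys arrive sorted by date string, so summing per key yields a date-ordered count
        int sum = 0;
        for (IntWritable val : values) sum += val.get();
        context.write(key, new IntWritable(sum));
    }
}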
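
The date arithmetic in the mapper can also be written with `java.time`, which avoids the mutable `Calendar` and the checked `ParseException`. The sketch below is a hedged, Hadoop-free illustration that follows the same rule as the mapper (a "today" marker, a day unit, otherwise months). The class and method names are hypothetical, the input is trimmed as a precaution, and the Chinese markers are written as Unicode escapes so the file compiles under any source encoding.

```java
import java.time.LocalDate;

// Hypothetical helper illustrating the relative-date rule with java.time
final class PubDateSketch {
    private static final String TODAY = "\u4eca\u5929"; // the "today" marker matched by the mapper
    private static final String DAY = "\u5929";         // the "day" unit character

    /** Resolves a relative offset such as "7" + DAY against a crawl date given as yyyy-MM-dd. */
    static String pubDate(String crawlDate, String relative) {
        LocalDate base = LocalDate.parse(crawlDate);    // ISO-8601 date
        String r = relative.trim();
        if (r.equals(TODAY)) return base.toString();
        int i = 0;
        while (i < r.length() && Character.isDigit(r.charAt(i))) i++;
        if (i == 0) throw new IllegalArgumentException("no leading number: " + relative);
        int n = Integer.parseInt(r.substring(0, i));
        // Same rule as the mapper: a day unit means days, anything else is treated as months
        LocalDate result = r.contains(DAY) ? base.minusDays(n) : base.minusMonths(n);
        return result.toString();
    }
}
```

Unlike the loop in the mapper, the bounds check on `i` turns an offset without a leading number into a clear exception instead of an index error.
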
  • Run: (screenshot in the original post, not reproduced)

  • Result: (screenshot in the original post, not reproduced)

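1.4 Top 5 total prices in each of the four first-tier cities
  • Purpose of the analysis:

TopN is one of the most common, almost obligatory, examples of MapReduce analysis.

  • Code:

Driver: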

public class TotalPriceTop5ByCityDriver {
    public static void main(String[] args) throws Exception {
        args = new String[] {  "datas/tb_house.txt", "MapReduce/out/TotalPriceTop5ByCity" };
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: TotalPriceTop5ByCity");
            System.exit(2);
        }
        Job job = Job.getInstance(conf);
        job.setJarByClass(TotalPriceTop5ByCityDriver.class);
        job.setMapperClass(TotalPriceTop5ByCityMapper.class);
        job.setReducerClass(TotalPriceTop5ByCityReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(1);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TotalPriceTop5ByCityMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable();
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] data = line.split("\t");
        String city = data[1], totalPrice = data[6];
        outk.set(city);
        outv.set(Integer.parseInt(totalPrice));
        context.write(outk, outv);
    }
}

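Reducer:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TotalPriceTop5ByCityReducer extends Reducer<Text, IntWritable, Text, Text> {
    private Text outv = new Text();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        List<Integer> totalPriceList = new ArrayList<>();
        for (IntWritable value : values) {
            totalPriceList.add(value.get());
        }
        // Ascending sort, then read the largest values from the end of the list
        Collections.sort(totalPriceList);
        int size = totalPriceList.size();
        int n = Math.min(5, size);      // guard against cities with fewer than five listings
        StringBuilder top5Str = new StringBuilder("Top 5 total prices (unit: 10,000 yuan): ");
        for (int i = 1; i <= n; i++) {
            top5Str.append(totalPriceList.get(size - i));
            if (i < n) top5Str.append(", ");
        }
        outv.set(top5Str.toString());
        context.write(key, outv);
    }
}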
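
A common alternative to sorting every value is a bounded min-heap that never holds more than N elements. The sketch below is a plain-Java illustration of that technique, not the author's reducer; `TopNSketch` and the sample prices are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper showing TopN with a size-bounded min-heap
final class TopNSketch {
    /** Returns up to n largest values in descending order, holding at most n values at a time. */
    static List<Integer> topN(Iterable<Integer> values, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) return result;
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap: smallest kept value on top
        for (int v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll();        // evict the smallest of the current top n
                heap.add(v);
            }
        }
        result.addAll(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        // made-up total prices
        System.out.println(topN(Arrays.asList(300, 5200, 880, 1500, 95, 2600, 410), 5));
    }
}
```

The heap keeps memory use at O(N) per city regardless of how many listings the city has, and a city with fewer than N listings simply yields a shorter list.
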
  • Run: (screenshot in the original post, not reproduced)

  • Result: (screenshot in the original post, not reproduced)

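1.5 Total-order sort on total price with custom partitioning
  • Purpose of the analysis:

A total-order sort with custom partitioning sorts differently from the earlier examples, and its output differs noticeably from that of the default total sort.

  • Code: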
public class TotalOrderingPartition extends Configured implements Tool {
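    // KeyValueTextInputFormat hands the first tab-separated field to the mapper as a Text key
    static class SimpleMapper extends Mapper<Text, Text, Text, IntWritable> {
        @Override
        protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            IntWritable intWritable = new IntWritable(Integer.parseInt(key.toString()));
            context.write(key, intWritable);
        }
    }
    static class SimpleReducer extends Reducer<Text, IntWritable, IntWritable, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Emit only the numeric value; the key was needed just for partitioning and sorting
            for (IntWritable value : values) {
                context.write(value, NullWritable.get());
            }
        }
    }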
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "Total Order Sorting");
        job.setJarByClass(TotalOrderingPartition.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setNumReduceTasks(3);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(NullWritable.class);
        TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path(args[2]));
        InputSampler.Sampler<Text, Text> sampler = new InputSampler.SplitSampler<>(5000, 10);
        InputSampler.writePartitionFile(job, sampler);
        job.setPartitionerClass(TotalOrderPartitioner.class);
        job.setMapperClass(SimpleMapper.class);
        job.setReducerClass(SimpleReducer.class);
        job.setJobName("TotalOrderingPartition");
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        args = new String[] { "datas/tb_house.txt", "MapReduce/out/TotalOrderingPartition/outPartition1", "MapReduce/out/TotalOrderingPartition/outPartition2" };
        int exitCode = ToolRunner.run(new TotalOrderingPartition(), args);
        System.exit(exitCode);
    }
}
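
The idea behind a total-order partitioner is range partitioning: sampled keys yield sorted split points (two of them for the three reduce tasks configured above), and each key is routed to the partition whose range contains it, so that the reducer outputs read in partition order form one globally sorted sequence. The sketch below illustrates that routing step with integer keys and a binary search. It is a simplified illustration under those assumptions, not Hadoop's implementation, and the names and split values are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of range partitioning by sorted split points
final class RangePartitionSketch {
    /** splitPoints must be sorted ascending; returns an index in [0, splitPoints.length]. */
    static int partitionFor(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found at index i: the key equals split point i and goes to the partition on its right.
        // Not found: binarySearch returns -(insertionPoint) - 1, and the insertion point is the partition.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int[] splits = { 200, 500 };   // made-up split points for three partitions
        for (int key : new int[] { 100, 200, 350, 500, 900 }) {
            System.out.println(key + " -> partition " + partitionFor(key, splits));
        }
    }
}
```

Because the mapping is monotonic in the key, every key in partition 0 is smaller than every key in partition 1, and so on.
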
  • Run: (screenshot in the original post, not reproduced)

  • Result: (screenshots in the original post, not reproduced)

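1.6 Secondary sort on construction year and total price
  • Purpose of the analysis:

Sorting on a single field is sometimes not enough; a secondary sort, which orders records by a second field when the first one ties, is one way to solve this.

  • Code:

Driver: (screenshot in the original post, not reproduced)

Mapper: (screenshot in the original post, not reproduced)

Reducer: (screenshot in the original post, not reproduced)

  • Run: (screenshot in the original post, not reproduced)

  • Result: (screenshot in the original post, not reproduced)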
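
The code for this section is not available as text, so the following is only a sketch of the core of a secondary sort: a composite key whose comparison uses the construction year first and the total price second. The class name, field names, and sort directions (year ascending, price descending) are assumptions made for illustration. The sketch is plain Java; in a Hadoop job the key would implement `WritableComparable` and also serialize both fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical composite key; the original class is not shown in the post
final class YearPriceKey implements Comparable<YearPriceKey> {
    final int buildYear;
    final int totalPrice;

    YearPriceKey(int buildYear, int totalPrice) {
        this.buildYear = buildYear;
        this.totalPrice = totalPrice;
    }

    @Override
    public int compareTo(YearPriceKey other) {
        // Primary order: construction year ascending
        int byYear = Integer.compare(buildYear, other.buildYear);
        if (byYear != 0) return byYear;
        // Tie-break: total price descending (an assumed direction)
        return Integer.compare(other.totalPrice, totalPrice);
    }

    @Override
    public String toString() {
        return buildYear + "\t" + totalPrice;
    }

    /** Returns the given keys sorted by year, then by price. */
    static List<YearPriceKey> sorted(YearPriceKey... keys) {
        List<YearPriceKey> list = new ArrayList<>(Arrays.asList(keys));
        Collections.sort(list);
        return list;
    }
}
```

When such a key is used as the map output key, the shuffle phase sorts by `compareTo`, so the reducer receives records already ordered on both fields.
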

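1.7 Counting houses per location with a custom class
  • Purpose of the analysis:

Some fields cannot be aggregated directly with the built-in MapReduce types; a custom class makes this possible.

  • Code:

Custom class: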
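import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class HouseCntByPositionTopListBean implements Writable {
    // Initialised so that readFields() can fill them on a freshly constructed bean
    private Text info = new Text();
    private IntWritable cnt = new IntWritable();
    public Text getInfo() {
        return info;
    }
    public void setInfo(Text info) {
        this.info = info;
    }
    public IntWritable getCnt() {
        return cnt;
    }
    public void setCnt(IntWritable cnt) {
        this.cnt = cnt;
    }
    @Override
    public void readFields(DataInput in) throws IOException {
        // Read both fields back in the order write() emitted them
        info.readFields(in);
        cnt.readFields(in);
    }
    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize info as well as cnt, since toString() needs both after deserialization
        info.write(out);
        cnt.write(out);
    }
    @Override
    public String toString() {
        // info has the form "city-position"; the output is city#[position]#count
        String infoStr = info.toString();
        int idx = infoStr.indexOf("-");
        String city = infoStr.substring(0, idx);
        String position = infoStr.substring(idx + 1);
        return city + "#" + "[" + position + "]" + "#" + cnt;
    }
}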
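
A `Writable` has to write every field it will need after deserialization and read the fields back in the same order. The round trip can be checked without a cluster. The sketch below is a Hadoop-free stand-in that uses `String` and `int` in place of `Text` and `IntWritable`; the class name and sample values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the bean, using String/int instead of Text/IntWritable
final class PositionCountSketch {
    String info = "";
    int cnt;

    void write(DataOutput out) throws IOException {
        out.writeUTF(info);
        out.writeInt(cnt);
    }

    void readFields(DataInput in) throws IOException {
        // Fields must be read in exactly the order they were written
        info = in.readUTF();
        cnt = in.readInt();
    }

    /** Serializes the bean to bytes and reads it back into a fresh instance. */
    static PositionCountSketch roundTrip(PositionCountSketch src) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                src.write(out);
            }
            PositionCountSketch copy = new PositionCountSketch();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If a field is left out of `write`, the copy comes back with that field at its default value, which is the kind of bug this round-trip check is meant to catch.
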

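Driver: (screenshot in the original post, not reproduced)

Mapper: (screenshot in the original post, not reproduced)

Reducer: (screenshot in the original post, not reproduced)

  • Run: (screenshot in the original post, not reproduced)

  • Result: (screenshots in the original post, not reproduced)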

1.8 Share of each listing tag
  • Purpose of the analysis:

Proportion analysis is another common form of statistical analysis with MapReduce.

  • Code:

Driver:

public class TagRatioByCityDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        args = new String[] {"datas/tb_house.txt", "MapReduce/out/TagRatioByCity" };
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(TagRatioByCityDriver.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(TagRatioByCityMapper.class);
        job.setReducerClass(TagRatioByCityReducer.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}

Mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TagRatioByCityMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable(1);
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] data = line.split("\t");
        String city = data[1], tag = data[8];
        if ("".equals(tag))  tag = "unknown tag";
        outk.set(city + "-" + tag);     // composite key: city-tag
        context.write(outk, outv);
    }
}

Reducer:

import java.io.IOException;
import java.text.DecimalFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TagRatioByCityReducer extends Reducer<Text, IntWritable, Text, Text> {
    private Text outv = new Text();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        DecimalFormat df = new DecimalFormat("0.00");
        int cnt = 0;
        for (IntWritable value : values) {
            cnt += value.get();
        }
        // The key has the form "city-tag"; the per-city listing totals are hard-coded
        String s = key.toString();
        int sum;
        if (s.contains("上海")) {           // Shanghai
            sum = 2995;
        } else if (s.contains("北京")) {    // Beijing
            sum = 2972;
        } else if (s.contains("廣州")) {    // Guangzhou
            sum = 2699;
        } else {                            // the remaining city
            sum = 2982;
        }
        // Share of this tag among all listings of its city, as a percentage
        outv.set(df.format((double) cnt / sum * 100) + "%");
        context.write(key, outv);
    }
}
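
The hard-coded city totals tie the reducer to one particular snapshot of the data. As a hedged sketch of an alternative, the totals can be derived from the per-tag counts themselves: one pass sums the counts per city, a second pass divides. The code below is plain Java operating on an in-memory map keyed by "city-tag"; the names and sample data are hypothetical, and in MapReduce the same idea would typically take a second job or an in-reducer aggregation.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: tag shares per city without hard-coded totals
final class TagRatioSketch {
    /** Input: counts keyed by "city-tag". Output: the same keys mapped to a percentage string. */
    static Map<String, String> ratios(Map<String, Integer> countsByCityTag) {
        // First pass: total number of listings per city
        Map<String, Integer> totalByCity = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : countsByCityTag.entrySet()) {
            totalByCity.merge(cityOf(e.getKey()), e.getValue(), Integer::sum);
        }
        // Second pass: share of each tag within its city
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : countsByCityTag.entrySet()) {
            int total = totalByCity.get(cityOf(e.getKey()));
            double pct = 100.0 * e.getValue() / total;
            result.put(e.getKey(), String.format(Locale.ROOT, "%.2f%%", pct));
        }
        return result;
    }

    // The city is everything before the first "-" of the composite key
    private static String cityOf(String cityTag) {
        int idx = cityTag.indexOf('-');
        return idx < 0 ? cityTag : cityTag.substring(0, idx);
    }
}
```

This assumes each listing contributes exactly one "city-tag" record, as in the mapper above, so the per-city sum of tag counts equals the number of listings in that city.
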
  • Run: (screenshot in the original post, not reproduced)

  • Result: (screenshot in the original post, not reproduced)

2. Data and source code
  • GitHub

  • Gitee

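3. Summary

Statistical analysis with MapReduce calls for care. "Counting listings by publication date, sorted" relies on Java's date classes SimpleDateFormat and Date, and takes patient debugging to get right. Computing the extreme values and the proportions is not difficult: for the proportions, the key is to obtain the count for each category and the overall total, then divide the former by the latter. The secondary sort and the custom class are harder, but can be worked out step by step.

The end!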


Article title: Hadoop Comprehensive Project: Second-Hand Housing Statistical Analysis (MapReduce Part) - 創新互聯
Link: http://m.newbst.com/article12/dcjddc.html
