[关闭]
@zhangyy 2018-03-29T14:49:44.000000Z 字数 1817 阅读 179

hive 的日志案例分析

hive的部分


依据日志文件,参考字段,统计分析每日各时段的pv和uv

  • 建立Hive表,表列分隔符需要与原文件保持一致
  • Load加载数据到Hive表
  • 写Hive sql统计,结果落地到Hive表2
  • 从Hive表2导出结果到mysql表

一: 在hive上面进行规划数据库:

  1. create database db_track;
  2. use db_track;

image_1ak34fc461le01an61l0f1bf4iie9.png-11.1kB


  1. create table track_log(
  2. id string,
  3. url string,
  4. referer string,
  5. keyword string,
  6. type string,
  7. guid string,
  8. pageId string,
  9. moduleId string,
  10. linkId string,
  11. attachedInfo string,
  12. sessionId string,
  13. trackerU string,
  14. trackerType string,
  15. ip string,
  16. trackerSrc string,
  17. cookie string,
  18. orderCode string,
  19. trackTime string,
  20. endUserId string,
  21. firstLink string,
  22. sessionViewNo string,
  23. productId string,
  24. curMerchantId string,
  25. provinceId string,
  26. cityId string,
  27. fee string,
  28. edmActivity string,
  29. edmEmail string,
  30. edmJobId string,
  31. ieVersion string,
  32. platform string,
  33. internalKeyword string,
  34. resultSum string,
  35. currentPage string,
  36. linkPosition string,
  37. buttonPosition string
  38. )
  39. partitioned by (date string,hour string)
  40. row format delimited fields terminated by '\t' ;

  1. load data local inpath '/home/hadoop/2015082818' into table track_log partition(date='20150828',hour='18') ;
  2. load data local inpath '/home/hadoop/2015082819' into table track_log partition(date='20150828',hour='19') ;

  1. select date,hour,count(url),count(distinct guid) from track_log where date='20150828' group by date,hour ;

image_1ak5emolor706pt101a15ca84em.png-16.3kB


  1. create table track_log_tmp(
  2. date string,
  3. hour string,
  4. pv string,
  5. uv string
  6. )
  7. row format delimited fields terminated by '\t';

image_1ak5eomfu18v31vukm4sba1bvc13.png-12.2kB


  1. insert into table track_log_tmp select date,hour,count(url),count(distinct guid) from track_log where date='20150828' group by date,hour ;

image_1ak5et3kv1cm8161015as1heg23n1g.png-17kB


  1. create table track_log(
  2. date varchar(20),
  3. hour varchar(10),
  4. pv varchar(50),
  5. uv varchar(50),
  6. primary key (date,hour)
  7. ) ;

image_1ak5f38v21pbi1dincr1okojbu1t.png-11.1kB


  1. bin/sqoop export \
  2. --connect jdbc:mysql://namenode01.hadoop.com:3306/yangyang \
  3. --username root \
  4. --password 123456 \
  5. --table track_log \
  6. --export-dir /user/hive/warehouse/db_track.db/track_log_tmp \
  7. --input-fields-terminated-by '\t'

image_1ak5f8d0i14jq1bf8163a1ab4in52d.png-49.2kB
image_1ak5f8ud415ji1npfc021c5gnq22q.png-50.7kB


image_1ak5fbn1inlmuj428619hv1sfq37.png-8.8kB


添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注