To query an HBase table from Hive, create an external table that maps onto it.
CREATE EXTERNAL TABLE webpage_hive (
  key string,
  baseUrl string,
  status int,
  prevFetchTime bigint,
  fetchTime bigint,
  fetchInterval bigint,
  retriesSinceFetch int,
  reprUrl string,
  content string,
  contentType string,
  protocolStatus string,
  modifiedTime bigint,
  prevModifiedTime bigint,
  batchId string,
  title string,
  text string,
  parseStatus int,
  signature string,
  prevSignature string,
  score int,
  headers map<string,string>,
  inlinks map<string,string>,
  outlinks map<string,string>,
  metadata map<string,string>,
  markers map<string,string>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:bas,f:st,f:pts#b,f:ts#b,f:fi#b,f:rsf,f:rpr,f:cnt,f:typ,f:prot,f:mod#b,f:pmod#b,f:bid,p:t,p:c,p:st,p:sig,p:psig,s:s,h:,il:,ol:,mtdt:,mk:")
TBLPROPERTIES ("hbase.table.name" = "webpage");
After executing this statement, the following columns are created (Hive stores column names in lowercase):
baseurl | string | from deserializer |
batchid | string | from deserializer |
content | string | from deserializer |
contenttype | string | from deserializer |
fetchinterval | bigint | from deserializer |
fetchtime | bigint | from deserializer |
headers | map<string,string> | from deserializer |
inlinks | map<string,string> | from deserializer |
key | string | from deserializer |
markers | map<string,string> | from deserializer |
metadata | map<string,string> | from deserializer |
modifiedtime | bigint | from deserializer |
outlinks | map<string,string> | from deserializer |
parsestatus | int | from deserializer |
prevfetchtime | bigint | from deserializer |
prevmodifiedtime | bigint | from deserializer |
prevsignature | string | from deserializer |
protocolstatus | string | from deserializer |
reprurl | string | from deserializer |
retriessincefetch | int | from deserializer |
score | int | from deserializer |
signature | string | from deserializer |
status | int | from deserializer |
text | string | from deserializer |
title | string | from deserializer |
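The mapping string in the CREATE statement follows three conventions of the HBase storage handler: `:key` binds the HBase row key, a `#b` suffix tells Hive the cell is stored in binary rather than as a string, and a family name with an empty qualifier (such as `ol:`) maps the whole column family to a Hive map keyed by qualifier. A reduced sketch makes this easier to see. The table name `webpage_hive_min` is hypothetical and is not part of the original setup; the three columns are taken from the full statement.

```sql
-- Hypothetical reduced table over the same HBase table "webpage".
-- Each Hive column pairs, in order, with one entry of hbase.columns.mapping.
CREATE EXTERNAL TABLE webpage_hive_min (
  key string,                   -- ":key"   : the HBase row key
  fetchTime bigint,             -- "f:ts#b" : family f, qualifier ts, binary-encoded
  outlinks map<string,string>   -- "ol:"    : every qualifier in family ol as a map entry
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:ts#b,ol:")
TBLPROPERTIES ("hbase.table.name" = "webpage");

-- Lists the columns Hive registered; the order may differ from the listing above.
DESCRIBE webpage_hive_min;
```

Because the table is external, dropping it in Hive removes only the Hive metadata and leaves the HBase data in place.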
Some example queries:
The following query converts a bigint epoch value to a readable date format (from_unixtime expects seconds since the epoch):
SELECT baseurl, from_unixtime(fetchtime, "[dd/MM/yyyy:HH:mm:ss Z]") AS ft FROM webpage_hive ORDER BY baseurl DESC;
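If the stored timestamps turn out to be in milliseconds rather than seconds (an assumption here, since the source does not state the unit; dates far in the future in the output are a typical symptom), the value can be scaled down before formatting. A minimal sketch of that variant:

```sql
-- Assumes fetchtime holds epoch milliseconds; drop the division if it is already in seconds.
SELECT baseurl,
       from_unixtime(CAST(fetchtime / 1000 AS BIGINT), 'dd/MM/yyyy HH:mm:ss') AS ft
FROM webpage_hive
ORDER BY baseurl DESC
LIMIT 10;
```

The CAST is needed because `/` yields a double in Hive, while from_unixtime takes a bigint.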
The following query explodes the outlinks map in a lateral view and displays its entries as key/value pairs:
SELECT baseurl, outl_key, outl_value FROM webpage_hive LATERAL VIEW explode(outlinks) olTable AS outl_key, outl_value;
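Once the map is exploded, each outlink is an ordinary row, so the usual aggregations apply. As an illustrative sketch (the alias `outlink_count` and the LIMIT are arbitrary choices, not from the original), the pages with the most outlinks could be listed like this:

```sql
-- One row per (page, outlink) after the explode, then counted per page.
SELECT baseurl, COUNT(*) AS outlink_count
FROM webpage_hive
LATERAL VIEW explode(outlinks) olTable AS outl_key, outl_value
GROUP BY baseurl
ORDER BY outlink_count DESC
LIMIT 20;
```

Pages with an empty or NULL outlinks map produce no rows from the lateral view and therefore do not appear in this result.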