Posted by: Wildan Maulana | January 29, 2009

HBase

HBase is a database built on Hadoop. It is open-source, distributed, and column-oriented, and is modeled after a Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable relies on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of the Hadoop infrastructure.

The goal of HBase is to store very large tables (billions of rows by billions of columns) on clusters built from commodity hardware [2]. Try HBase if you plan to store and process large amounts of data.
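To make "column-oriented" a bit more concrete, here is a minimal in-memory sketch of the data model described in the Bigtable paper: a sparse, sorted map from row key and column key ("family:qualifier") to timestamped values. The class name TinyTable and the sample rows are made up for illustration; this is not the HBase API and says nothing about how HBase stores data on disk.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a Bigtable-style table (illustration only, not HBase)."""

    def __init__(self):
        # row key -> "family:qualifier" -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        """Store one version of a cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column):
        """Return the newest value of a cell, or None if the cell is empty."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Yield (row, column, newest value) in sorted row-key order."""
        for row in sorted(self._rows):
            for column in sorted(self._rows[row]):
                yield row, column, self.get(row, column)


# Hypothetical sample data: rows only hold the columns they actually use.
table = TinyTable()
table.put("com.example/index", "anchor:home", "Example", timestamp=1)
table.put("com.example/index", "anchor:home", "Example Home", timestamp=2)
table.put("com.apache/hbase", "contents:html", "<html>...</html>", timestamp=1)

for row, column, value in table.scan():
    print(row, column, value)
```

Each row stores only the cells that were written to it, which is why a table can have a huge number of possible columns without every row paying for them.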

Quick Installation Guide

  1. Check out the latest HBase from its Subversion repository, using either the Eclipse svn plugin or the command line:
    svn co http://svn.apache.org/repos/asf/hadoop/hbase/trunk hbase
  2. Create a build.properties file containing:
    dist.dir=/opt/hbase-trunk
  3. Build HBase:
    ant package
  4. Build the docs:
    ant docs -Djava5.home=/usr/lib/jvm/java-1.5.0-sun -Dforrest.home=/opt/apache-forrest-0.8/
  5. Create a symlink:
    ln -s /opt/hbase-trunk/ /opt/hbase
  6. Next, read [5] 🙂
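The steps above can also be written down as a small dry-run script. The sketch below only prints the build.properties content and the commands from steps 1, 3 and 5; it does not run anything. The function names are hypothetical, and the paths and URL are simply the ones used in this post, so adjust them for your own machine.

```python
import shlex

# Values copied from the steps above; adjust them for your own setup.
SVN_URL = "http://svn.apache.org/repos/asf/hadoop/hbase/trunk"
DIST_DIR = "/opt/hbase-trunk"


def build_properties(dist_dir=DIST_DIR):
    """Return the contents of build.properties from step 2."""
    return f"dist.dir={dist_dir}\n"


def install_commands(dist_dir=DIST_DIR, link="/opt/hbase"):
    """Return the commands from steps 1, 3 and 5 as argument lists."""
    return [
        ["svn", "co", SVN_URL, "hbase"],
        ["ant", "package"],
        ["ln", "-s", dist_dir + "/", link],
    ]


if __name__ == "__main__":
    # Dry run: show what would be written and executed.
    print(build_properties(), end="")
    for argv in install_commands():
        print(shlex.join(argv))
```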

Update:

%%%%%%%%%%%%%%%%%%%%%%%%%%%SOLVED- 3-Feb-09%%%%%%%%%%%%%%

If I compile HBase from trunk (the default Hadoop used is version 0.19), starting HBase in a distributed environment fails with the following error:

---------- cut ----------

2009-01-29 16:58:59,301 INFO org.apache.hadoop.hbase.master.HMaster: vmName=Java HotSpot(TM) Server VM, vmVendor=Sun Microsystems Inc., vmVersion=11.0-b15
2009-01-29 16:58:59,302 INFO org.apache.hadoop.hbase.master.HMaster: vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError, -Dhbase.log.dir=/opt/hbase/bin/../logs, -Dhbase.log.file=hbase-hadoop-master-tobeThink.log, -Dhbase.home.dir=/opt/hbase/bin/.., -Dhbase.id.str=hadoop, -Dhbase.root.logger=INFO,DRFA, -Djava.library.path=/opt/hbase/bin/../lib/native/Linux-i386-32]
2009-01-29 16:58:59,744 ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
java.io.IOException: Call to tobethink.pappiptek.lipi.go.id/192.168.107.119:54310 failed on local exception: null
at org.apache.hadoop.ipc.Client.call(Client.java:699)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:186)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:156)
at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:96)
at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:78)
at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:978)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1022)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:493)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:438)

---------- cut ----------

The datanode log contains an entry telling us that a version mismatch occurred:

---------- cut ----------

2009-01-29 16:58:59,739 WARN org.apache.hadoop.ipc.Server: Incorrect header or version mismatch from 192.168.107.119:40175 got version 2 expected version 3
---------- cut ----------
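When the logs get long, a line like the one above is easy to miss. The following is a small sketch that scans log lines for this message and pulls out the client address and the two version numbers. The regular expression is written against the exact wording of the line shown here; other Hadoop versions may word the message differently, and the function name is made up for this example.

```python
import re

# Pattern based on the log line quoted in this post (wording may vary by version).
MISMATCH = re.compile(
    r"version mismatch from (?P<client>\S+) "
    r"got version (?P<got>\d+) expected version (?P<expected>\d+)"
)


def find_version_mismatches(lines):
    """Return (client, got, expected) for every mismatch line found."""
    found = []
    for line in lines:
        match = MISMATCH.search(line)
        if match:
            found.append(
                (match["client"], int(match["got"]), int(match["expected"]))
            )
    return found


sample = [
    "2009-01-29 16:58:59,739 WARN org.apache.hadoop.ipc.Server: "
    "Incorrect header or version mismatch from 192.168.107.119:40175 "
    "got version 2 expected version 3",
    "2009-01-29 16:58:59,744 ERROR org.apache.hadoop.hbase.master.HMaster: "
    "Can not start master",
]

for client, got, expected in find_version_mismatches(sample):
    print(f"{client} sent version {got}, server expected version {expected}")
```

Read literally, the message says the connecting client announced version 2 while the server expected version 3, which points to the two sides having been built against different Hadoop releases.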

Now I'm trying to compile it against hadoop-0.21.0dev ...

I'll continue this effort next week, on Monday, because Friday through Sunday are tobeThink! days! 🙂

################################

OK, after asking on the hbase mailing list, it turns out that hbase trunk will not be stable for the next few weeks, so like it or not I have to use hbase 0.19 together with hadoop 0.19.

Result of a fresh install of hadoop 0.19 and hbase 0.19:

hadoop@tobeThink:/opt/hbase$ ./bin/start-hbase.sh
starting master, logging to /opt/hbase/bin/../logs/hbase-hadoop-master-tobeThink.out
WARNING! File system needs to be upgraded. Run the '${HBASE_HOME}/bin/hbase migrate' script.
localhost: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-hadoop-regionserver-tobeThink.out

hadoop@tobeThink:/opt/hbase$ ./bin/hbase migrate upgrade
09/02/02 16:33:21 INFO util.Migrate: Verifying that file system is available..
09/02/02 16:33:21 INFO util.Migrate: Verifying that HBase is not running….Trys ten times  to connect to running master
09/02/02 16:33:22 INFO ipc.HBaseClass: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).
……………
09/02/02 16:33:31 INFO ipc.HBaseClass: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 9 time(s).
09/02/02 16:33:31 INFO util.Migrate: Starting upgrade
09/02/02 16:33:31 FATAL util.Migrate: Upgrade failed
java.io.IOException: File system version file hbase.version does not exist. No upgrade possible. See http://wiki.apache.org/hadoop/Hbase/HowToMigrate for more information.
at org.apache.hadoop.hbase.util.Migrate.run(Migrate.java:175)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.hbase.util.Migrate.main(Migrate.java:357)
----------

I still don't know why this happens, but I tried a practical workaround by using the svn 0.19 branch (that failed too).

By the way, there is an interesting discussion of this problem in the 0.2 release candidate announcement:

http://osdir.com/ml/java.hadoop.hbase.devel/2008-07/msg00497.html

TODO:

Tomorrow, read http://wiki.apache.org/hadoop/Hadoop%20Upgrade and try again ...

Time to go home ...

%%%%%%%%%%%%%%%%%%%%%%%%%%%SOLVED%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I got an answer on the mailing list from Jean-Daniel: the directory used by hbase must not be created beforehand, because this hbase directory is created automatically when the hbase master is started (see [5]).

Here is the hbase-site.xml (in pseudo-distributed mode) that I used:

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://tobethink.pappiptek.lipi.go.id:54310/hbase</value>
<description>The directory shared by region servers.
Should be fully-qualified to include the filesystem to use.
E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
</description>
</property>
</configuration>
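As a quick sanity check on a configuration like this, the sketch below reads hbase.rootdir out of the XML and splits it into scheme, host, port and path using only the Python standard library. The helper name read_property is hypothetical, and the XML is a trimmed copy of the file above; nothing here talks to HDFS or HBase.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Trimmed copy of the hbase-site.xml shown above.
HBASE_SITE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://tobethink.pappiptek.lipi.go.id:54310/hbase</value>
  </property>
</configuration>"""


def read_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


rootdir = urlparse(read_property(HBASE_SITE, "hbase.rootdir"))
print("filesystem:", rootdir.scheme)
print("namenode:  ", rootdir.hostname, rootdir.port)
print("directory: ", rootdir.path)
```

The host and port should be the same namenode address that appears in the master's error message earlier in this post, and the directory part (/hbase here) is the one that should be left for the hbase master to create.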

Next, I'll play around with the hbase shell and so on, and look into the detailed differences between pseudo-distributed and fully distributed operation, before moving on to Hama ...

Happy learning!

Thanks to the Apache Foundation folks for these great pieces of software.
----------

[1] http://hadoop.apache.org/hbase/
[2] What kind of hardware scales best for Hadoop?, http://wiki.apache.org/hadoop/FAQ#18
[3] HBase Version Control, http://hadoop.apache.org/hbase/version_control.html
[4] HBase FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ
[5] Getting Started, http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description
[6] Structured Data – Everything in Its Place, http://www.seobook.com/lsi/structured_data.htm
[7] Matching Impedance: When to use HBase, http://blog.rapleaf.com/dev/?p=26

