Talk:Understanding HBase and BigTable

From Jimbojw.com

Comments on Understanding HBase and BigTable

Bryan Duxbury said ...

This is a great primer. We should incorporate something like this into our official documentation/wiki.

--Bryan Duxbury 10:42, 18 May 2008 (MST)

Jimbojw said ...

Thanks Bryan,

I'm glad you think this is valuable. The hardest part for me learning Hbase was unlearning what I already knew about relational databases.

After several conversations with friends about Hbase, it became clear to me that what really was needed was a lay-programmer's conceptual primer, presenting the concepts in the right order to minimize confusion.

--Jimbojw 07:17, 19 May 2008 (MST)

stack said ...

Excellent! (I added it to our little list of articles linked from the main HBase wiki page.)

P.S. A vote in a room full of hbasistas said it's 'HBase', not 'Hbase'.

--stack 10:05, 20 May 2008 (MST)

Jimbojw said ...

Thanks stack!

I updated the URL and all references in the article from 'Hbase' to 'HBase' per your suggestion.

--Jimbojw 11:21, 20 May 2008 (MST)

Acure said ...

Great Job.

--Acure 14:39, 20 May 2008 (MST)

Jamie Penney said ...

Great explanation! I think I actually understand this stuff now. Thanks a lot for writing this in a clear, understandable fashion for us all.

--Jamie Penney 19:53, 20 May 2008 (MST)

Drew said ...

This is an excellent primer, thanks for writing it up.

Are there any performance implications associated with column families? For example, if I needed to store a column 'foo' in a certain application, what rationale would I use to put it in column family 'A' or 'B'? Would I group columns into families based on how they're used in a particular application?

--Drew 13:26, 27 May 2008 (MST)

Jimbojw said ...

Hi Drew,

I started writing a response, but it got to be so long that I decided to put it into its own blog post.

See Understanding HBase column-family performance options

--Jimbojw 16:32, 27 May 2008 (MST)

Josh Ma said ...

> This is important when choosing a row key convention. For example, consider a table whose keys are domain names. It makes the most sense to list them in reverse notation (so "com.jimbojw.www" rather than "www.jimbojw.com") so that rows about a subdomain will be near the parent domain row.

> Continuing the domain example, the row for the domain "mail.jimbojw.com" would be right next to the row for "www.jimbojw.com" rather than say "mail.xyz.com" which would happen if the keys were regular domain notation.

Jimbojw, is that right? Maybe the version below would be clearer for readers:

Continuing the domain example, the row for the domain "com.jimbojw.mail" would be right next to the row for "com.jimbojw.www" rather than say "com.xyz.mail" which would happen if the keys were regular domain notation.

--Josh Ma 03:42, 10 June 2008 (MST)

Jimbojw said ...

Hi Josh,

Thanks for taking the time to comment.

I see what you're saying. Perhaps reversing the domain notation in the second example would have been better for comprehension. :/

--Jimbojw 10:01, 10 June 2008 (MST)
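
The row key point discussed in this exchange can be illustrated with a minimal Python sketch. The helper name reverse_domain and the sample list are illustrative assumptions, and plain lexicographic sorting stands in here for the sorted row order that the quoted passage describes.

```python
def reverse_domain(domain: str) -> str:
    """Turn 'www.jimbojw.com' into 'com.jimbojw.www' (hypothetical helper)."""
    return ".".join(reversed(domain.split(".")))


# Sample keys taken from the domain example quoted above.
domains = ["www.jimbojw.com", "mail.xyz.com", "mail.jimbojw.com"]

# Regular notation: the two jimbojw.com hosts are separated by mail.xyz.com.
regular_order = sorted(domains)

# Reverse notation: rows for the same parent domain sort next to each other.
reversed_order = sorted(reverse_domain(d) for d in domains)

print(regular_order)
print(reversed_order)
```

With regular notation, "mail.xyz.com" lands between the two jimbojw.com rows; with reverse notation, "com.jimbojw.mail" and "com.jimbojw.www" are adjacent.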

Anand said ...

This article helped a lot. Thanks a tonne. I would love to read an article or discussion about where an RDBMS-like setup makes more or less sense, compared with BigTable/HBase. This would help people recognise the differences from a real-world perspective. Is there already something like that?

--Anand 04:32, 20 July 2008 (MST)

nontster said ...

Thanks Jimbojw, this helped me a lot.

--nontster 00:07, 22 July 2008 (MST)

Marcus Herou said ...

Thanks man! You've helped a bunch of people.

--Marcus Herou 10:23, 29 July 2008 (MST)

Igor Minar said ...

Thanks a lot for putting together this excellent article.

The only thing I miss is a brief mention of things that people are used to in an RDBMS but that are not available in column-oriented databases (joins, etc.).

--Igor Minar 20:25, 24 August 2008 (MST)

Toucan said ...

I enjoyed reading this article and if anyone is interested in sparse, distributed, persistent multidimensional databases they may be interested in Caché, see links below.

--Tony 21:53, 2 October 2008 (GMT)

--Toucan 13:51, 2 October 2008 (MST)

yossi said ...

Great article, it's been a lot of help, thank you!

--yossi 01:39, 15 October 2008 (MST)

Manu said ...

Thank you for the lucid explanation. I would like to know more about which kinds of applications this fits, compared with a traditional RDBMS.

--Manu 17:19, 19 December 2008 (MST)

Ole-Martin Mørk said ...

Great article! I have written a hands-on post on how to use HBase's shell to accomplish this. Check it out: http://ole-martin.net/hbase-tutorial-for-beginners/

--Ole-Martin Mørk 09:21, 22 December 2008 (MST)

Stuart said ...

Really nice article. One point of confusion: Towards the end, it says

'Using our imaginary HBase table, querying for the row/column of "aaaaa"/"A:foo" will return "y" while querying for the row/column/timestamp of "aaaaa"/"A:foo"/10 will return "m". Querying for a row/column/timestamp of "aaaaa"/"A:foo"/2 will return a null result.'

It looks to me like "aaaaa"/"A:foo"/10 should be "aaaaa"/"A:foo"/4, since that's m's timestamp - is that correct?

--Stuart 10:20, 4 March 2009 (MST)
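
One reading that makes the quoted passage self-consistent is that a lookup at timestamp T returns the newest version written at or before T. The Python sketch below models only that reading; it is an assumption about the article's intent, not a statement about the HBase API. The timestamp 4 for "m" comes from the comment above, while the timestamp 15 for "y" and the function name get_cell are made up for illustration.

```python
from typing import Dict, Optional

# Versions of the cell "aaaaa"/"A:foo", keyed by integer timestamp.
# 4 -> "m" is taken from the comment above; 15 -> "y" is an assumed value
# (any timestamp greater than 10 would behave the same way here).
cell_versions: Dict[int, str] = {15: "y", 4: "m"}


def get_cell(versions: Dict[int, str], timestamp: Optional[int] = None) -> Optional[str]:
    """Return the newest value at or before `timestamp` (hypothetical helper).

    With no timestamp, the most recent version is returned.
    """
    candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
    if not candidates:
        return None  # nothing had been written yet at that time
    return versions[max(candidates)]


print(get_cell(cell_versions))      # no timestamp: most recent version
print(get_cell(cell_versions, 10))  # newest version at or before 10
print(get_cell(cell_versions, 4))   # exact timestamp of "m"
print(get_cell(cell_versions, 2))   # before any version exists
```

Under this reading, asking for timestamp 10 and asking for timestamp 4 both give "m", and asking for timestamp 2 gives nothing, which matches all three results in the quoted text.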

Glenn Gillen said ...

Great article. I was on the right path, but this has really helped cement my understanding.

--Glenn Gillen 05:45, 24 March 2009 (MST)

Sumanth said ...

Thank you very much. The domain example has been very illustrative.

--Sumanth 23:41, 2 April 2009 (MST)