Posted by: Wildan Maulana | February 19, 2009

Index ratio Nutch Index

Just copy paste 🙂 :

As a rule of thumb, ~1M pages = ~1.5 GB of RAM for just the index (not
segments, linkdb, etc.). So on a 4 GB box you can fit ~2M pages, and on an
8 GB box ~4.5M pages. That said, it all depends on the amount of content
you index per page (max content size).
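
To make the arithmetic easier to reuse, here is a rough sketch of that rule of thumb in Python. The 1.5 GB per million pages figure comes from the reply above; the function name and the `reserved_gb` headroom (1 GB set aside for the OS and JVM) are my own assumptions, chosen only because they roughly reproduce the ~2M and ~4.5M figures quoted. Treat the output as a ballpark, since the real ratio depends on how much content you index per page.

```python
# Rule of thumb from the reply above: ~1.5 GB of RAM per 1M indexed pages
# (index only, not segments or linkdb).
GB_PER_MILLION_PAGES = 1.5


def estimate_max_pages(ram_gb: float, reserved_gb: float = 1.0) -> int:
    """Estimate how many pages fit in an in-memory index.

    reserved_gb is a hypothetical headroom for the OS and JVM; it is an
    assumption, not a number from the original reply.
    """
    usable_gb = max(ram_gb - reserved_gb, 0.0)
    return int(usable_gb / GB_PER_MILLION_PAGES * 1_000_000)


if __name__ == "__main__":
    for ram in (4, 8, 16):
        pages = estimate_max_pages(ram)
        print(f"{ram} GB RAM -> ~{pages / 1e6:.1f}M pages")
```

With 1 GB of headroom, a 4 GB box comes out at 2M pages and an 8 GB box at a bit over 4.6M, which is in the same range as the numbers quoted above.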


Miguel Costa wrote:
> Hi,
> I would like to know the ratio between (index size)/(collection size) for
> collections larger than 1 TB.
> My objective is to keep the whole index in memory, so given x GB of memory,
> what is the maximum size of a collection I can index?
> Can anyone give me some numbers from their own indexing runs?
> Regards,
> —
> Miguel Costa

