Index size ratio for a Nutch index

Just a copy-paste 🙂:

It is roughly 1.5 GB of RAM per ~1M pages for the index alone (not the
segments, linkdb, etc.). So a box with 4 GB of RAM can hold ~2M pages, and
an 8 GB box ~4.5M pages. That said, it all depends on how much content you
index per page (the max content size).
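Here is a back-of-the-envelope sketch of that arithmetic in Python. It assumes that index RAM scales linearly with page count at ~1.5 GB per million pages. The `reserved_gb` allowance for the OS/JVM is a hypothetical parameter and is not stated above. With 1 GB reserved, the sketch reproduces the 4 GB figure. The quoted 8 GB figure (~4.5M) is a little more conservative than this linear estimate.

```python
# Rule of thumb from the reply above: ~1.5 GB of RAM per 1M pages,
# for the index alone (no segments, linkdb, etc.).
# Assumption: linear scaling; real numbers depend on max content size per page.
GB_PER_MILLION_PAGES = 1.5


def max_pages(ram_gb: float, reserved_gb: float = 0.0) -> float:
    """Estimate how many pages' index fits in RAM.

    reserved_gb is a hypothetical allowance for the OS, JVM, etc.
    """
    usable = max(ram_gb - reserved_gb, 0.0)
    return usable / GB_PER_MILLION_PAGES * 1_000_000


def ram_needed_gb(pages: int) -> float:
    """Estimate the RAM (GB) needed to hold the index for `pages` pages."""
    return pages / 1_000_000 * GB_PER_MILLION_PAGES


if __name__ == "__main__":
    for ram in (4, 8):
        print(f"{ram} GB -> ~{max_pages(ram, reserved_gb=1.0) / 1e6:.1f}M pages")
    # 4 GB -> ~2.0M pages
    # 8 GB -> ~4.7M pages
```

To answer the "X GB of memory" question, call `max_pages(X)`. To size a box for a target crawl, call `ram_needed_gb(pages)`. Both only hold as long as the per-page content size stays similar to the setup the 1.5 GB figure came from.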


Miguel Costa wrote:
> Hi,
> I would like to know the ratio between (index size)/(collection size) for
> collections larger than 1 TB.
> My objective is to keep the whole index in memory, so with X GB of memory,
> what is the maximum size of collection I can index?
> Can anyone give me some numbers from your own indexing runs?
> Regards,
> --
> Miguel Costa


