Monday, March 7, 2011

Would you keep your index in internal memory or external memory?

Consider this a classic question in IR and a source of endless discussion in the community. What is the most reasonable answer?


  1. This comment has been removed by the author.

  2. The answer really depends on the specifics of the business problem you are trying to solve, doesn't it?

    How much latency will users tolerate in the application? Is most data access random or sequential? How much do the additional servers needed to keep the index in internal memory cost, compared with the benefit of reduced latency? How large is the working set, and is the problem amenable to partial solutions like keeping part of the index in memory? How much is the design simplified, reliability improved, and code complexity reduced if you add more hardware?

    Generally, I lean toward keeping as much data as possible in memory. The cost of latency is high and often underestimated, as is the cost of the complexity that comes from trying to make do with too little hardware. But the answer depends on the specifics of the question, that is, on the specific needs of the business and the application.
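
To make the partial-index question above concrete, here is a rough back-of-the-envelope sketch in Python. It estimates average lookup latency when only a fraction of lookups are served from memory. The function name `expected_latency_us` and the default latency figures are assumptions chosen for illustration, not measurements from any real system.

```python
def expected_latency_us(hit_rate, mem_latency_us=0.1, disk_latency_us=10_000.0):
    """Average lookup latency when a fraction `hit_rate` of lookups hit memory.

    The default latencies are illustrative assumptions (about 0.1 us for a
    memory access, about 10 ms for a random disk seek), not measured values.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Weighted average of the in-memory path and the on-disk path.
    return hit_rate * mem_latency_us + (1.0 - hit_rate) * disk_latency_us


if __name__ == "__main__":
    # Compare a fully on-disk index, two partial setups, and a fully in-memory one.
    for hit_rate in (0.0, 0.9, 0.99, 1.0):
        latency = expected_latency_us(hit_rate)
        print(f"hit rate {hit_rate:.2f}: {latency:,.1f} us per lookup")
```

Under these assumed figures, even a 99% in-memory hit rate leaves the average lookup at roughly 100 microseconds, about a thousand times the fully in-memory case, because the rare disk accesses dominate. Whether that gap matters is exactly the business question the comment raises.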