Lucene and Elasticsearch (TODO)

This chapter discusses the relationship between Lucene and Elasticsearch, and explains some of the Lucene internals that the advanced user should be aware of.

Introduction

  • Inverted index distilled

  • Information Retrieval Basics
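
As a rough illustration of the inverted index idea listed above, the following minimal sketch maps each term to the sorted list of document IDs that contain it. The whitespace tokenizer, the `build_inverted_index` name, and the sample documents are assumptions made for illustration only; Lucene's actual analysis chain and on-disk postings are considerably more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it.

    `docs` is a dict of {doc_id: text}. Tokenization is a naive
    lowercase + whitespace split (an assumption, not Lucene's analyzer).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorted postings lists make merging (AND/OR) straightforward.
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample corpus.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_inverted_index(docs)
```

Looking up `index["quick"]` then answers "which documents contain this term?" without scanning any document text. Note that `dog` and `dogs` remain separate terms here because no stemming is applied.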

Lucene in a Nutshell

  • A conceptual overview

    • Documents, Fields

    • 4-Dimension API (Fields → Terms → Docs → Positions)

  • Lifetime of a Document

  • Lifetime of a Query

    • Query Rewriting

    • Weight & Scorer

    • Query Types

    • MultiTermQueries vs. “common” queries

  • Data structures explained

    • File formats at a high level (i.e., the basic data structures)
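
The Fields → Terms → Docs → Positions navigation mentioned above can be pictured as four nested lookups. The sketch below models that nesting with plain dictionaries; it is a conceptual aid under stated assumptions (naive whitespace tokenization, a hypothetical `build_positional_index` helper, made-up documents), not a rendering of Lucene's actual classes or file formats.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Build a nested mapping: field -> term -> doc_id -> [positions].

    `docs` is a dict of {doc_id: {field_name: text}}.
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            # Position = zero-based token offset within the field.
            for position, term in enumerate(text.lower().split()):
                index[field][term][doc_id].append(position)
    return index


# Hypothetical sample documents with two fields each.
docs = {
    1: {"title": "quick fox", "body": "the quick brown fox jumps"},
    2: {"title": "lazy dog", "body": "the fox and the lazy dog"},
}
pos_index = build_positional_index(docs)
```

Each level narrows the search: pick a field, then a term within it, then the documents containing that term, then the token positions within each document. Positions are what make phrase and proximity matching possible.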

Introduction to Retrieval Models

  • Document-at-a-time retrieval

  • Scoring Models

    • TF-IDF, BM25, etc.

    • Similarity & Per-Field Scoring

  • Proximity Scoring

    • PhraseQueries and their impact

  • Custom Scoring
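
To make the scoring models above a little more concrete, here is a minimal sketch of a common textbook form of BM25. The function name, the parameter defaults (`k1=1.2`, `b=0.75`), and the sample corpus are assumptions for illustration; Lucene's own similarity implementation differs in details such as how document length is encoded, so the numbers will not match Elasticsearch's output.

```python
import math


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query using textbook BM25.

    `corpus` is a list of tokenized documents (lists of terms) and is used
    for document frequency and average document length.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)  # term frequency in this document
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)  # document frequency
        # Rarer terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: longer-than-average documents are penalized.
        norm = tf + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score


# Hypothetical sample corpus, already tokenized.
corpus = [
    "the quick brown fox".split(),
    "the lazy dog".split(),
    "quick quick fox".split(),
]
scores = [bm25_score(["quick"], doc, corpus) for doc in corpus]
```

For the query `quick`, the third document scores highest because it is shorter and contains the term twice, while the second document scores zero because the term is absent. Unlike raw TF-IDF, the contribution of term frequency saturates as `tf` grows, which is controlled by `k1`.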

Performance Factors & Caveats

  • Thinking Lucene

    • Concepts revisited with performance implications in mind

  • Design for Speed

    • How to lay out your index for fast retrieval

    • How Lucene utilizes its environment (OS, Disk Cache, DirectMemory)

    • Sorting, FieldCache and friends

  • Search performance characteristics