Rank documents based on their relevance to a query using the Llama provider. This is useful for improving search results and implementing retrieval-augmented generation (RAG) systems.
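Since the provider's exact call signature is not shown here, the following is a self-contained sketch of the idea (the `toyRerank` function and its term-overlap scoring are illustrative stand-ins, not the provider's API): a query and a list of documents go in, and a ranking of `{ relevanceScore, index }` entries comes out.

```typescript
// Toy stand-in for a reranker: scores each document by query-term overlap.
// A real rerank model scores semantic relevance; this only illustrates the
// input/output shape described in this section.
interface RankEntry {
  relevanceScore: number;
  index: number;
}

function toyRerank(query: string, documents: string[], topN?: number): RankEntry[] {
  const terms = query.toLowerCase().split(/\s+/);
  const ranking = documents.map((doc, index) => {
    const text = doc.toLowerCase();
    const hits = terms.filter((t) => text.includes(t)).length;
    return { relevanceScore: hits / terms.length, index };
  });
  // Sort highest score first, like a rerank result.
  ranking.sort((a, b) => b.relevanceScore - a.relevanceScore);
  return topN === undefined ? ranking : ranking.slice(0, topN);
}

const docs = [
  "The capital of France is Paris.",
  "Bananas are rich in potassium.",
  "Paris hosts the Louvre museum.",
];
const ranking = toyRerank("capital of France", docs);
console.log(ranking[0]); // the most relevant document's entry
```

In a RAG pipeline, this step typically sits between retrieval (which over-fetches candidates) and generation (which receives only the top-ranked documents).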
Use `topN` to limit the number of returned documents:
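In the absence of the provider's exact call, here is a plain sketch of what `topN` does to an already-scored ranking (the scores below are made up for illustration):

```typescript
interface RankEntry {
  relevanceScore: number;
  index: number;
}

// A hypothetical ranking, already sorted by score (highest first).
const ranking: RankEntry[] = [
  { relevanceScore: 0.92, index: 2 },
  { relevanceScore: 0.41, index: 0 },
  { relevanceScore: 0.07, index: 1 },
];

// topN keeps only the N highest-scoring entries.
const topN = 2;
const top = ranking.slice(0, topN);
console.log(top.map((e) => e.index)); // indices of the two best documents
```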
Configure the rerank model with specific options:
| Option | Type | Default | Description |
|---|---|---|---|
| `normalize` | number | model default | Score normalization mode |
| `contextParams.n_ctx` | number | 2048 | Context window size |
| `contextParams.n_gpu_layers` | number | 99 | Number of layers offloaded to the GPU for acceleration |
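The table above maps onto an options object along these lines (a sketch only; the exact shape of the provider's options is assumed, not documented here):

```typescript
// Hypothetical options object mirroring the table above.
const rerankOptions = {
  normalize: 1, // score normalization mode; omit to use the model default
  contextParams: {
    n_ctx: 2048, // context window size (the default shown in the table)
    n_gpu_layers: 99, // offload (up to) this many layers to the GPU
  },
};
```

Setting `n_gpu_layers` high (such as the default 99) effectively offloads the whole model when it fits in GPU memory.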
Each result in the ranking array contains:
| Property | Type | Description |
|---|---|---|
| `relevanceScore` | number | Relevance score (higher = more relevant) |
| `index` | number | Index of the document in the original input array |
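Because each entry carries the original `index`, results can be mapped back to the input documents. A self-contained sketch (the ranking values here are made up):

```typescript
interface RankEntry {
  relevanceScore: number;
  index: number;
}

const documents = ["doc A", "doc B", "doc C"];

// Hypothetical result of a rerank call, sorted by score (highest first).
const ranking: RankEntry[] = [
  { relevanceScore: 0.88, index: 2 },
  { relevanceScore: 0.35, index: 0 },
  { relevanceScore: 0.02, index: 1 },
];

// Use `index` to recover each original document's text, in relevance order.
const ordered = ranking.map((e) => documents[e.index]);
// ordered = ["doc C", "doc A", "doc B"]
```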
Release resources when done:
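The exact cleanup call depends on the provider's API; `release` below is a placeholder name for whatever method it exposes (commonly `dispose()` or `release()`). A generic pattern that guarantees cleanup even when an error is thrown:

```typescript
// Generic cleanup pattern: `release` is a placeholder for the provider's
// actual resource-freeing method, not a documented API.
interface Releasable {
  release(): Promise<void>;
}

async function withModel<T>(model: Releasable, fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } finally {
    // Always free native memory (model weights, context), even on error.
    await model.release();
  }
}
```

Freeing the model matters here because the underlying weights and context live in native (and possibly GPU) memory that the JavaScript garbage collector does not reclaim.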