You can generate responses using Llama models with the Vercel AI SDK's `generateText` or `streamText` functions:
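A minimal sketch, assuming a provider factory named `createLlamaModel` exported by the llama.rn provider package (the actual import path and export name may differ; check the provider's documentation):

```typescript
import { generateText } from 'ai';
// Assumed provider factory; substitute the actual export from your
// llama.rn AI SDK provider package.
import { createLlamaModel } from 'llama.rn';

const model = createLlamaModel({ model: '/path/to/model.gguf' });

// generateText resolves once the full response has been produced.
const { text } = await generateText({
  model,
  prompt: 'Explain on-device inference in one paragraph.',
});

console.log(text);
```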
Stream responses for real-time output:
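A sketch using the same assumed `createLlamaModel` factory; `streamText` returns immediately and exposes the generated tokens through an async-iterable `textStream`:

```typescript
import { streamText } from 'ai';
import { createLlamaModel } from 'llama.rn'; // assumed factory, as above

const result = streamText({
  model: createLlamaModel({ model: '/path/to/model.gguf' }),
  prompt: 'Write a haiku about local LLMs.',
});

// Consume tokens as they arrive instead of waiting for the full reply.
let output = '';
for await (const chunk of result.textStream) {
  output += chunk;
  console.log(chunk); // e.g. update UI state with the partial text
}
```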
Configure model behavior with generation options:
- `temperature` (0-1): Controls randomness. Higher values = more creative, lower = more focused
- `maxTokens`: Maximum number of tokens to generate
- `topP` (0-1): Nucleus sampling threshold
- `topK`: Top-K sampling parameter

You can pass selected options to `generateText` or `streamText` as follows:
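For instance (again using the assumed `createLlamaModel` factory), the options sit alongside `model` and `prompt` in the call:

```typescript
import { generateText } from 'ai';
import { createLlamaModel } from 'llama.rn'; // assumed factory, as above

const { text } = await generateText({
  model: createLlamaModel({ model: '/path/to/model.gguf' }),
  prompt: 'Suggest three names for a hiking app.',
  temperature: 0.8, // lean toward more varied output
  maxTokens: 256,   // cap the response length
  topP: 0.9,        // nucleus sampling threshold
  topK: 40,         // sample only from the 40 most likely tokens
});
```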
When creating a model instance, you can configure llama.rn-specific options:
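A sketch of what that might look like. The option names below mirror llama.rn's context parameters (`n_ctx`, `n_gpu_layers`, `use_mlock`); whether the provider factory forwards them under these exact names is an assumption to verify against its docs:

```typescript
import { createLlamaModel } from 'llama.rn'; // assumed factory, as above

const model = createLlamaModel({
  model: '/path/to/model.gguf',
  n_ctx: 4096,      // context window size in tokens
  n_gpu_layers: 99, // offload as many layers as possible to the GPU
  use_mlock: true,  // lock model weights in RAM to avoid paging
});
```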