Models
Overview
Nstream AI uses three types of models:
- MegaModel: The foundational model used for ground truth generation.
- BaseModel: A smaller model fine-tuned on the ground truth generated by MegaModel, which adapts it to a specific domain or application.
- EmbeddingModel: Produces the vector embeddings used to build the knowledge base for similarity search and retrieval.
MegaModel
A large-scale foundational model (e.g., OpenAI GPT, Google Gemini, or Anthropic Claude) used for ground truth generation. These models are typically pre-trained on vast datasets and provide powerful natural language understanding and generation capabilities. They serve as a backbone for high-performance AI applications.

    apiVersion: llm.nstream.ai/v1
    kind: MegaModel
    metadata:
      name: "NAME"
      namespace: "NAMESPACE"
    spec:
      modelProvider: "MODEL_PROVIDER"
      modelTemplate:
        azureOpenAI:
          apiVersion: "PREVIEW"
          apikey: "APIKEY"
          deployment: "DEPLOYMENT"
          endpoint: "ENDPOINT"

| Key | Description | Example |
|---|---|---|
| apiVersion | Defines the API version for model deployment | llm.nstream.ai/v1 |
| kind | Specifies the type of model being deployed | MegaModel |
| name | The unique name of the model instance | gpt-4-mega |
| namespace | The namespace under which the model is deployed | nstream |
| modelProvider | The provider of the model | AzureOpenAI, OpenAI, or Anthropic |
| azureOpenAI.apiVersion | API version for the provider service | 2023-10-01-preview |
| azureOpenAI.apikey | API key required for authentication | SECRET_KEY |
| azureOpenAI.deployment | Deployment name for the model | gpt-4-deployment |
| azureOpenAI.endpoint | Endpoint URL for the model | https://api.openai.com/v1 |
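
As a worked illustration, the template and the example values from the table above can be combined into a single manifest. This is a sketch, not a verified configuration: the nesting is inferred from the template, and every value is a sample value from the table that should be replaced with your own provider details.

```yaml
# Hypothetical MegaModel manifest assembled from the Example column above.
apiVersion: llm.nstream.ai/v1
kind: MegaModel
metadata:
  name: "gpt-4-mega"
  namespace: "nstream"
spec:
  modelProvider: "AzureOpenAI"
  modelTemplate:
    azureOpenAI:
      apiVersion: "2023-10-01-preview"       # provider API version, distinct from the top-level apiVersion
      apikey: "SECRET_KEY"                   # placeholder; do not commit a real key
      deployment: "gpt-4-deployment"
      endpoint: "https://api.openai.com/v1"  # sample value from the table; substitute your own endpoint
```

Note that `apiVersion` appears twice with different meanings: the top-level key identifies the Nstream resource version, while the one under `azureOpenAI` is passed to the provider.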
BaseModel
A smaller pre-trained model that serves as a checkpoint for fine-tuning and domain adaptation. Base models are designed to be customized for specific tasks, reducing training costs while maintaining high accuracy.

    apiVersion: llm.nstream.ai/v1
    kind: BaseModel
    metadata:
      name: "NAME"
      namespace: "NAMESPACE"
    spec:
      baseModelName: "NAME"
      baseModelServingInfo:
        hfModelId: "HF_MODEL_ID"
        modelProvider: "HUGGINGFACE"
        modelProviderToken: "HF_TOKEN"
      baseModelServingTemplate:
        replicas: "REPLICATION"
        resourceLimit:
          cpu: "CPU"
          memory: "MEMORY"
          gpu: "GPU"
        resourceRequest:
          cpu: "CPU"
          memory: "MEMORY"
          gpu: "GPU"

| Key | Description | Example |
|---|---|---|
| apiVersion | Defines the API version for model deployment | llm.nstream.ai/v1 |
| kind | Specifies the type of model being deployed | BaseModel |
| name | The unique name of the model instance | llama3-8b-instruct |
| namespace | The namespace under which the model is deployed | nstream |
| baseModelName | Name of the base model used for fine-tuning | llama3-8b-instruct |
| hfModelId | Hugging Face Model ID for retrieval | meta-llama/Llama-3-8B |
| modelProvider | Provider of the base model | HUGGINGFACE |
| modelProviderToken | API token for authentication | HF_SECRET_TOKEN |
| replicas | Number of instances to deploy | 1 |
| resourceLimit.cpu | Maximum CPU allocation | 8 |
| resourceLimit.memory | Maximum memory allocation | 24Gi |
| resourceLimit.gpu | Maximum GPU allocation | 1 |
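
For illustration, the BaseModel template can be filled in with the example values from the table above. This is a sketch under stated assumptions: the nesting is inferred from the template, the table gives no sample values for `resourceRequest`, so the requests below simply mirror the limits, and values are quoted as in the template even though the schema may expect an integer for `replicas`.

```yaml
# Hypothetical BaseModel manifest assembled from the Example column above.
apiVersion: llm.nstream.ai/v1
kind: BaseModel
metadata:
  name: "llama3-8b-instruct"
  namespace: "nstream"
spec:
  baseModelName: "llama3-8b-instruct"
  baseModelServingInfo:
    hfModelId: "meta-llama/Llama-3-8B"     # sample ID from the table
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_SECRET_TOKEN"  # placeholder; do not commit a real token
  baseModelServingTemplate:
    replicas: "1"
    resourceLimit:
      cpu: "8"
      memory: "24Gi"
      gpu: "1"
    resourceRequest:                       # assumption: requests set equal to limits
      cpu: "8"
      memory: "24Gi"
      gpu: "1"
```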
EmbeddingModel
This model converts text into high-dimensional vectors for similarity search. Embedding models are crucial in recommendation systems, information retrieval, and document ranking.

    apiVersion: llm.nstream.ai/v1
    kind: EmbeddingModel
    metadata:
      name: "NAME"
      namespace: "NAMESPACE"
    spec:
      embeddingModelName: "EMBEDDING_MODEL"
      embeddingModelServingInfo:
        hfModelId: "HF_MODEL_ID"
        modelProvider: "HUGGINGFACE"
        modelProviderToken: "HF_TOKEN"
      embeddingModelServingTemplate:
        replicas: "REPLICATION"
        resourceLimit:
          cpu: "CPU"
          memory: "MEMORY"
        resourceRequest:
          cpu: "CPU"
          memory: "MEMORY"

| Key | Description | Example |
|---|---|---|
| apiVersion | Defines the API version for model deployment | llm.nstream.ai/v1 |
| kind | Specifies the type of model being deployed | EmbeddingModel |
| name | The unique name of the model instance | all-minilm-l6-v2 |
| namespace | The namespace under which the model is deployed | nstream |
| embeddingModelName | Name of the embedding model | all-MiniLM-L6-v2 |
| hfModelId | Hugging Face Model ID for embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| modelProvider | Provider of the embedding model | HUGGINGFACE |
| modelProviderToken | API token for authentication | HF_SECRET_TOKEN |
| replicas | Number of instances to deploy | 1 |
| resourceLimit.cpu | Maximum CPU allocation | 10 |
| resourceLimit.memory | Maximum memory allocation | 24Gi |
| resourceRequest.cpu | Requested CPU allocation | 2 |
| resourceRequest.memory | Requested memory allocation | 8Gi |
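
For illustration, the EmbeddingModel template can be filled in with the example values from the table above. This is a sketch rather than a verified configuration: the nesting is inferred from the template, and values are quoted as in the template even though the schema may expect an integer for `replicas`.

```yaml
# Hypothetical EmbeddingModel manifest assembled from the Example column above.
apiVersion: llm.nstream.ai/v1
kind: EmbeddingModel
metadata:
  name: "all-minilm-l6-v2"
  namespace: "nstream"
spec:
  embeddingModelName: "all-MiniLM-L6-v2"
  embeddingModelServingInfo:
    hfModelId: "sentence-transformers/all-MiniLM-L6-v2"
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_SECRET_TOKEN"  # placeholder; do not commit a real token
  embeddingModelServingTemplate:
    replicas: "1"
    resourceLimit:      # upper bound on what each replica may consume
      cpu: "10"
      memory: "24Gi"
    resourceRequest:    # amount requested for each replica
      cpu: "2"
      memory: "8Gi"
```

Unlike the BaseModel template, this template has no `gpu` entry under `resourceLimit` or `resourceRequest`.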