Models

Overview

Nstream AI uses three types of models to power different AI-driven use cases:

  • MegaModel: Serves as the foundational model for ground truth generation, providing a strong baseline for AI tasks.
  • BaseModel: Fine-tuned using the ground truth generated by MegaModel, enabling precise domain adaptation for specific applications.
  • EmbeddingModel: Used to construct a robust knowledge base, ensuring high-quality vector embeddings for improved similarity search and retrieval tasks.

MegaModel

A large-scale foundational model (e.g., OpenAI GPT, Google Gemini, or Anthropic Claude) used for ground truth generation. These models are typically pre-trained on vast datasets and provide powerful natural language understanding and generation capabilities. They serve as a backbone for high-performance AI applications.

apiVersion: llm.nstream.ai/v1
kind: MegaModel
metadata:
  name: "NAME"
  namespace: "NAMESPACE"
spec:
  modelProvider: "MODEL_PROVIDER"
  modelTemplate:
    azureOpenAI:
      apiVersion: "PREVIEW"
      apikey: "APIKEY"
      deployment: "DEPLOYMENT"
      endpoint: "ENDPOINT"
| Key | Description | Example |
| --- | --- | --- |
| apiVersion | Defines the API version for model deployment | llm.nstream.ai/v1 |
| kind | Specifies the type of model being deployed | MegaModel |
| name | The unique name of the model instance | gpt-4-mega |
| namespace | The namespace under which the model is deployed | nstream |
| modelProvider | The provider of the model | AzureOpenAI, OpenAI, Anthropic |
| apiVersion (provider) | API version for the provider service | 2023-10-01-preview |
| apikey | API key required for authentication | SECRET_KEY |
| deployment | Deployment name for the model | gpt-4-deployment |
| endpoint | Endpoint URL for the model | https://api.openai.com/v1 |
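Putting the keys together, a filled-in MegaModel manifest might look like the sketch below. All values are illustrative, taken from the example column above; substitute your own deployment name, endpoint, and API key.

```yaml
apiVersion: llm.nstream.ai/v1
kind: MegaModel
metadata:
  name: "gpt-4-mega"
  namespace: "nstream"
spec:
  modelProvider: "AzureOpenAI"
  modelTemplate:
    azureOpenAI:
      apiVersion: "2023-10-01-preview"
      apikey: "SECRET_KEY"              # placeholder; inject your real key securely
      deployment: "gpt-4-deployment"
      endpoint: "https://api.openai.com/v1"
```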

BaseModel

A smaller pre-trained model that serves as a checkpoint for fine-tuning and domain adaptation. Base models are designed to be customized for specific tasks, reducing training costs while maintaining high accuracy.

apiVersion: llm.nstream.ai/v1
kind: BaseModel
metadata:
  name: "NAME"
  namespace: "NAMESPACE"
spec:
  baseModelName: "NAME"
  baseModelServingInfo:
    hfModelId: "HF_MODEL_ID"
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_TOKEN"
  baseModelServingTemplate:
    replicas: "REPLICATION"
    resourceLimit:
      cpu: "CPU"
      memory: "MEMORY"
      gpu: "GPU"
    resourceRequest:
      cpu: "CPU"
      memory: "MEMORY"
      gpu: "GPU"
| Key | Description | Example |
| --- | --- | --- |
| apiVersion | Defines the API version for model deployment | llm.nstream.ai/v1 |
| kind | Specifies the type of model being deployed | BaseModel |
| name | The unique name of the model instance | llama3-8b-instruct |
| namespace | The namespace under which the model is deployed | nstream |
| baseModelName | Name of the base model used for fine-tuning | llama3-8b-instruct |
| hfModelId | Hugging Face Model ID for retrieval | meta-llama/Llama-3-8B |
| modelProvider | Provider of the base model | HUGGINGFACE |
| modelProviderToken | API token for authentication | HF_SECRET_TOKEN |
| replicas | Number of instances to deploy | 1 |
| cpu | Maximum CPU allocation | 8 |
| memory | Maximum memory allocation | 24Gi |
| gpu | Maximum GPU allocation | 1 |
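As a sketch, a complete BaseModel manifest built from the example values above could look like this. The `resourceRequest` values are illustrative assumptions (the table only lists maximums); set them to what your cluster can reserve per replica.

```yaml
apiVersion: llm.nstream.ai/v1
kind: BaseModel
metadata:
  name: "llama3-8b-instruct"
  namespace: "nstream"
spec:
  baseModelName: "llama3-8b-instruct"
  baseModelServingInfo:
    hfModelId: "meta-llama/Llama-3-8B"
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_SECRET_TOKEN"   # placeholder; use your Hugging Face token
  baseModelServingTemplate:
    replicas: "1"
    resourceLimit:
      cpu: "8"
      memory: "24Gi"
      gpu: "1"
    resourceRequest:        # illustrative; not specified in the table above
      cpu: "2"
      memory: "8Gi"
      gpu: "1"
```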

EmbeddingModel

This model converts text into high-dimensional vectors for similarity search. Embedding models are crucial in recommendation systems, information retrieval, and document ranking.

apiVersion: llm.nstream.ai/v1
kind: EmbeddingModel
metadata:
  name: "NAME"
  namespace: "NAMESPACE"
spec:
  embeddingModelName: "EMBEDDING_MODEL"
  embeddingModelServingInfo:
    hfModelId: "HF_MODEL_ID"
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_TOKEN"
  embeddingModelServingTemplate:
    replicas: "REPLICATION"
    resourceLimit:
      cpu: "CPU"
      memory: "MEMORY"
    resourceRequest:
      cpu: "CPU"
      memory: "MEMORY"
| Key | Description | Example |
| --- | --- | --- |
| apiVersion | Defines the API version for model deployment | llm.nstream.ai/v1 |
| kind | Specifies the type of model being deployed | EmbeddingModel |
| name | The unique name of the model instance | all-minilm-l6-v2 |
| namespace | The namespace under which the model is deployed | nstream |
| embeddingModelName | Name of the embedding model | all-MiniLM-L6-v2 |
| hfModelId | Hugging Face Model ID for embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| modelProvider | Provider of the embedding model | HUGGINGFACE |
| modelProviderToken | API token for authentication | HF_SECRET_TOKEN |
| replicas | Number of instances to deploy | 1 |
| cpu (limit) | Maximum CPU allocation | 10 |
| memory (limit) | Maximum memory allocation | 24Gi |
| cpu (request) | Requested CPU allocation | 2 |
| memory (request) | Requested memory allocation | 8Gi |
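For reference, a complete EmbeddingModel manifest assembled from the example values above might look like this (values are illustrative; note that, unlike BaseModel, no GPU allocation appears in this spec):

```yaml
apiVersion: llm.nstream.ai/v1
kind: EmbeddingModel
metadata:
  name: "all-minilm-l6-v2"
  namespace: "nstream"
spec:
  embeddingModelName: "all-MiniLM-L6-v2"
  embeddingModelServingInfo:
    hfModelId: "sentence-transformers/all-MiniLM-L6-v2"
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_SECRET_TOKEN"   # placeholder; use your Hugging Face token
  embeddingModelServingTemplate:
    replicas: "1"
    resourceLimit:
      cpu: "10"
      memory: "24Gi"
    resourceRequest:
      cpu: "2"
      memory: "8Gi"
```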