Models

Overview

Nstream AI uses three types of models to power different AI-driven use cases:

  • MegaModel: Serves as the foundational model for ground truth generation, providing a strong baseline for AI tasks.
  • BaseModel: Fine-tuned using the ground truth generated by MegaModel, enabling precise domain adaptation for specific applications.
  • EmbeddingModel: Used to construct a robust knowledge base, ensuring high-quality vector embeddings for improved similarity search and retrieval tasks.

MegaModel

A large-scale foundational model (e.g., OpenAI GPT, Google Gemini, or Anthropic Claude) used for ground truth generation. These models are typically pre-trained on vast datasets and provide powerful natural language understanding and generation capabilities. They serve as a backbone for high-performance AI applications.

apiVersion: llm.nstream.ai/v1
kind: MegaModel
metadata:
  name: "NAME"
  namespace: "NAMESPACE"
spec:
  modelProvider: "MODEL_PROVIDER"
  modelTemplate:
    azureOpenAI:
      apiVersion: "PREVIEW"
      apikey: "APIKEY"
      deployment: "DEPLOYMENT"
      endpoint: "ENDPOINT"
| Key | Description | Example |
| --- | --- | --- |
| apiVersion | Defines the API version for model deployment | llm.nstream.ai/v1 |
| kind | Specifies the type of model being deployed | MegaModel |
| name | The unique name of the model instance | gpt-4-mega |
| namespace | The namespace under which the model is deployed | nstream |
| modelProvider | The provider of the model | AzureOpenAI, OpenAI, Anthropic |
| apiVersion (provider) | API version for the provider service | 2023-10-01-preview |
| apikey | API key required for authentication | SECRET_KEY |
| deployment | Deployment name for the model | gpt-4-deployment |
| endpoint | Endpoint URL for the model | https://api.openai.com/v1 |
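Putting the keys together, a filled-in MegaModel manifest might look like the sketch below. All values are illustrative, taken from the example column above; substitute your own deployment name, endpoint, and API key.

```yaml
apiVersion: llm.nstream.ai/v1
kind: MegaModel
metadata:
  name: "gpt-4-mega"
  namespace: "nstream"
spec:
  modelProvider: "AzureOpenAI"
  modelTemplate:
    azureOpenAI:
      apiVersion: "2023-10-01-preview"
      apikey: "SECRET_KEY"              # placeholder; inject your real key securely
      deployment: "gpt-4-deployment"
      endpoint: "https://api.openai.com/v1"
```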

BaseModel

A smaller pre-trained model that serves as a checkpoint for fine-tuning and domain adaptation. Base models are designed to be customized for specific tasks, reducing training costs while maintaining high accuracy.

apiVersion: llm.nstream.ai/v1
kind: BaseModel
metadata:
  name: "NAME"
  namespace: "NAMESPACE"
spec:
  baseModelName: "NAME"
  baseModelServingInfo:
    hfModelId: "HF_MODEL_ID"
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_TOKEN"
  baseModelServingTemplate:
    replicas: "REPLICATION"
    resourceLimit:
      cpu: "CPU"
      memory: "MEMORY"
      gpu: "GPU"
    resourceRequest:
      cpu: "CPU"
      memory: "MEMORY"
      gpu: "GPU"
| Key | Description | Example |
| --- | --- | --- |
| apiVersion | Defines the API version for model deployment | llm.nstream.ai/v1 |
| kind | Specifies the type of model being deployed | BaseModel |
| name | The unique name of the model instance | llama3-8b-instruct |
| namespace | The namespace under which the model is deployed | nstream |
| baseModelName | Name of the base model used for fine-tuning | llama3-8b-instruct |
| hfModelId | Hugging Face Model ID for retrieval | meta-llama/Llama-3-8B |
| modelProvider | Provider of the base model | HUGGINGFACE |
| modelProviderToken | API token for authentication | HF_SECRET_TOKEN |
| replicas | Number of instances to deploy | 1 |
| cpu | Maximum CPU allocation | 8 |
| memory | Maximum memory allocation | 24Gi |
| gpu | Maximum GPU allocation | 1 |
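As a sketch, a complete BaseModel manifest built from the example values above could look like this. The `resourceRequest` values are illustrative assumptions (the table only lists maximums); set them to what your cluster can reserve per replica.

```yaml
apiVersion: llm.nstream.ai/v1
kind: BaseModel
metadata:
  name: "llama3-8b-instruct"
  namespace: "nstream"
spec:
  baseModelName: "llama3-8b-instruct"
  baseModelServingInfo:
    hfModelId: "meta-llama/Llama-3-8B"
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_SECRET_TOKEN"   # placeholder; use your Hugging Face token
  baseModelServingTemplate:
    replicas: "1"
    resourceLimit:
      cpu: "8"
      memory: "24Gi"
      gpu: "1"
    resourceRequest:        # illustrative; not specified in the table above
      cpu: "2"
      memory: "8Gi"
      gpu: "1"
```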

EmbeddingModel

This model converts text into high-dimensional vectors for similarity search. Embedding models are crucial in recommendation systems, information retrieval, and document ranking.

apiVersion: llm.nstream.ai/v1
kind: EmbeddingModel
metadata:
  name: "NAME"
  namespace: "NAMESPACE"
spec:
  embeddingModelName: "EMBEDDING_MODEL"
  embeddingModelServingInfo:
    hfModelId: "HF_MODEL_ID"
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_TOKEN"
  embeddingModelServingTemplate:
    replicas: "REPLICATION"
    resourceLimit:
      cpu: "CPU"
      memory: "MEMORY"
    resourceRequest:
      cpu: "CPU"
      memory: "MEMORY"
| Key | Description | Example |
| --- | --- | --- |
| apiVersion | Defines the API version for model deployment | llm.nstream.ai/v1 |
| kind | Specifies the type of model being deployed | EmbeddingModel |
| name | The unique name of the model instance | all-minilm-l6-v2 |
| namespace | The namespace under which the model is deployed | nstream |
| embeddingModelName | Name of the embedding model | all-MiniLM-L6-v2 |
| hfModelId | Hugging Face Model ID for embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| modelProvider | Provider of the embedding model | HUGGINGFACE |
| modelProviderToken | API token for authentication | HF_SECRET_TOKEN |
| replicas | Number of instances to deploy | 1 |
| cpu (limit) | Maximum CPU allocation | 10 |
| memory (limit) | Maximum memory allocation | 24Gi |
| cpu (request) | Requested CPU allocation | 2 |
| memory (request) | Requested memory allocation | 8Gi |
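For reference, a complete EmbeddingModel manifest assembled from the example values above might look like this (values are illustrative; note that, unlike BaseModel, no GPU allocation appears in this spec):

```yaml
apiVersion: llm.nstream.ai/v1
kind: EmbeddingModel
metadata:
  name: "all-minilm-l6-v2"
  namespace: "nstream"
spec:
  embeddingModelName: "all-MiniLM-L6-v2"
  embeddingModelServingInfo:
    hfModelId: "sentence-transformers/all-MiniLM-L6-v2"
    modelProvider: "HUGGINGFACE"
    modelProviderToken: "HF_SECRET_TOKEN"   # placeholder; use your Hugging Face token
  embeddingModelServingTemplate:
    replicas: "1"
    resourceLimit:
      cpu: "10"
      memory: "24Gi"
    resourceRequest:
      cpu: "2"
      memory: "8Gi"
```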