installArchitecture

Architecture

MOSTLY AI runs as a set of containerized applications and services that you can deploy in a Kubernetes cluster and maintain a fault-tolerant and highly available application.

MOSTLY AI architecture diagram

Application Nodes

Image name / Pod name
DescriptionPod Lifecycle
mostly-core

mostly-core-api
Component that reads metadata and analyzes data sources and destinations.Service
mostly-app-v2

mostly-app
Contains the backend and public APIs of MOSTLY AI.Service
mostly-keycloak

mostly-keycloak
Keycloak is an open-source identity management, authentication, and authorization tool. This container has a pre-configured Keycloak instance for MOSTLY AI.Service
mostly-coordinator

mostly-coordinator
Component that takes all requests from the web application and coordinates execution of tasks on the main AI engine.Service
mostly-core-gpu

mostly-core-gpu
Service for running AI tasks on GPU nodes.Service
mostly-postgresql

mostly-pqsl
Database instance. Contains databases for app, coordinator, and Keycloak.Service
mostly-app-ui

mostly-ui
Contains the frontend of MOSTLY AI. Reachable over port 8080.Service
mostly-assistant-broker

mostly-assistant-broker
Service for managing the communication between the MOSTLY AI Assistant, an LLM, JEG Kernels, and MOSTLY AI application.Service
mostly-enterprise-gateway

mostly-jeg
Jupyter Enterprise Gateway. Supports the management of multiple Assistant Python sessions for multiple users.Service
mostly-jeg-kernel

mostly-jeg-kernel
Jupyter Python Kernel. The base image for each Python session in the MOSTLY AI Assistant.Service
mostly-troubleshooterTroubleshooting tool for MOSTLY AI deployments.Service
mostly-backup

mostly-backup
Backup service for MOSTLY AI.Service
docs

mostly-docs
Documentation service that provides guides and API documentation.Service
mostly-ai-haproxyAutomatic load balancer and ingress management service.Service
mostly-ai-minioShared storage service.Service
mostly-core-probe-apiService for live-probing generators.Service

AI Worker nodes

Image name / Pod nameDescriptionPod Lifecycle
AI jobengine-task-<task-id>

mostly-core
• Reads from data sources and writes into data destinations
• Performs AI training and data generation.
• Creates Model and Data reports
Job

Third-party integrations and connections

Active Directory is an optional integration that can help you manage the authentication of users to MOSTLY AI. With this integration, end users do not need to create new credentials to log in to MOSTLY AI.

Image repository

The MOSTLY AI image repository contains the deployment images of all containers and makes it easy to deploy MOSTLY AI to various types of Kubernetes clusters.

Corporate databases

MOSTLY AI can connect to your internal databases (with the help of connectors) and read original data or deliver the generated synthetic data in the same or another database.

Cloud storage buckets

In addition to databases, you can also read original data and deliver synthetic data from and to cloud storage buckets (AWS S3, Azure blob storage, Google Cloud storage buckets).

LLM services

The Assistant Broker can connect to any Large Language Model (LLM) service that supports tool calling via LiteLLM. This includes publicly available LLM services such as Claude or OpenAI, or privately hosted LLM services deployed within your customer infrastructure. This integration enables the assistant service to provide intelligent responses and coordinate code execution through the internal Jupyter Hub nodes.

Storage requirements

MOSTLY AI uses two types of storage.

  • Block Storage. Used by single pods, such as PostgreSQL. Typically, this is automatically provisioned by the Kubernetes cluster.
  • Shared Storage. Shared by various pods to store models, synthetic data, and others. Two options exist for using shared storage:
    • MinIO. MOSTLY AI provides shared storage via MinIO. This uses the block storage provisioned by the Kubernetes cluster.
    • External S3 storage. You can use an external S3 storage solution instead of MinIO.