Agent Pack Schema

Vendored reference

This is a mirror of hologram-generic-ai-agent-vs/docs/agent-pack-schema.md, the canonical source of truth for the agent-pack.yaml schema. When the upstream file changes, refresh this page as part of the same PR (see update.md §10).

For the tour, start with the Agent Pack overview.

Agent packs make the AI agent chatbot fully configurable so the same binary can be reused for different conversational agents.

Goals

  • Single source of truth: prompts, welcome messages, flows, tools, MCP servers, RAG/memory settings, and integrations live inside one manifest.
  • Backward compatibility: if AGENT_PACK_PATH is not defined or the manifest is invalid, the application falls back to legacy environment variables.
  • Environment overrides: any string value can reference ${VAR_NAME} so deployments can override values at runtime.
  • Early validation: the loader rejects malformed manifests during startup and surfaces warnings.
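
As a sketch of the override mechanism, any string value in the manifest may embed an environment reference. The variable name below is hypothetical, standing in for whatever a deployment defines:

```yaml
# Illustration of ${VAR_NAME} expansion. MY_MODEL_NAME is not a predefined
# variable, just an example of a deployment-specific override.
llm:
  model: ${MY_MODEL_NAME}   # replaced with process.env.MY_MODEL_NAME at load time
```

If the referenced variable is not set at startup, resolution follows the order described in "Value resolution order" at the end of this page.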

Location and structure

```
agent-packs/
  <agent-id>/
    agent-pack.yaml   # or agent-pack.yml / agent-pack.json
```

The service reads the path provided via AGENT_PACK_PATH. If not set, it defaults to agent-packs/ in the current working directory. Accepted filenames: agent-pack.yaml, agent-pack.yml, agent-pack.json.

Manifest fields (agent-pack.yaml)

| Field | Type | Required | Description |
|---|---|---|---|
| metadata | object | no | Agent identifiers and descriptive data. |
| languages | map | no | Per-language prompts/messages (en, es, etc.). |
| llm | object | no | LLM provider and model parameters. |
| rag | object | no | RAG and vector store settings. |
| memory | object | no | Memory backend and session window config. |
| flows | object | no | Welcome, authentication, and menu behavior. |
| tools | object | no | Dynamic tool JSON and bundled tool settings. |
| mcp | object | no | MCP (Model Context Protocol) server connections. |
| imageGeneration | object | no | Image generation providers and MinIO storage. |
| speechToText | object | no | Speech-to-text (voice note transcription) configuration. |
| vision | object | no | Image description (vision) configuration. |
| integrations | object | no | External service configuration (VS Agent, DB, etc.). |

All top-level fields are optional.
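
A minimal skeleton showing the top-level keys side by side (values elided; since every field is optional, any subset is a valid manifest):

```yaml
metadata: { id: my-agent }
languages: {}
llm: {}
rag: {}
memory: {}
flows: {}
tools: {}
mcp: {}
imageGeneration: {}
speechToText: {}
vision: {}
integrations: {}
```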


metadata

```yaml
metadata:
  id: my-agent
  displayName: My Agent
  description: >-
    A conversational agent for managing Wise accounts.
  defaultLanguage: en
  tags: [wise, finance]
```

| Field | Type | Default | Description |
|---|---|---|---|
| id | string | | Unique agent identifier. |
| displayName | string | | Human-readable agent name. |
| description | string | | Optional description. |
| defaultLanguage | string | en | Default language code. |
| tags | string[] | | Optional tags for categorization. |

languages

A map keyed by language code (e.g., en, es, fr). Each entry can define:

| Field | Type | Description |
|---|---|---|
| greetingMessage | string | Short greeting sent on connection. Supports {userName} placeholder. |
| welcomeMessage | string | Alias for greetingMessage (deprecated, use greetingMessage). |
| systemPrompt | string | LLM system prompt / persona for this language. |
| strings | map<string, string> | Localized literals (menu labels, auth messages, etc.). |

```yaml
languages:
  en:
    greetingMessage: "Hello {userName}! How can I help you today?"
    systemPrompt: |
      You are a helpful financial assistant...
    strings:
      CREDENTIAL: "Authenticate"
      LOGOUT: "Logout"
  es:
    greetingMessage: "¡Hola {userName}! ¿En qué puedo ayudarte?"
    systemPrompt: |
      Eres un asistente financiero...
    strings:
      CREDENTIAL: "Autenticarse"
      LOGOUT: "Cerrar sesión"
```

llm

| Field | Type | Env override | Default | Description |
|---|---|---|---|---|
| provider | string | LLM_PROVIDER | openai | LLM provider (openai, ollama, anthropic). Use openai for any OpenAI-compatible API. |
| model | string | OPENAI_MODEL | gpt-4o-mini | Model name. |
| temperature | number/string | OPENAI_TEMPERATURE | 0.3 | Sampling temperature (0–1). |
| maxTokens | number/string | OPENAI_MAX_TOKENS | 512 | Max tokens per completion. |
| baseUrl | string | OPENAI_BASE_URL | | Base URL for OpenAI-compatible APIs (e.g., Kimi, DeepSeek, Groq, Together AI). |
| agentPrompt | string | AGENT_PROMPT | | Default agent prompt / persona. |
| verbose | boolean/string | | | Enable verbose LLM logging. |

```yaml
llm:
  provider: openai
  model: gpt-4o-mini
  temperature: 0.3
  maxTokens: 1000
  agentPrompt: |
    You are an AI financial assistant...
```

Using OpenAI-compatible providers

Any API that follows the OpenAI chat completions format can be used by setting provider: openai and providing a baseUrl:

```yaml
# Kimi (Moonshot AI)
llm:
  provider: openai
  model: moonshot-v1-8k
  baseUrl: https://api.moonshot.cn/v1

# DeepSeek
llm:
  provider: openai
  model: deepseek-chat
  baseUrl: https://api.deepseek.com

# Groq
llm:
  provider: openai
  model: llama-3.3-70b-versatile
  baseUrl: https://api.groq.com/openai/v1

# Together AI
llm:
  provider: openai
  model: meta-llama/Llama-3-70b-chat-hf
  baseUrl: https://api.together.xyz/v1
```

Set the corresponding API key via OPENAI_API_KEY (or the agent pack's environment variable resolution).


rag

| Field | Type | Env override | Default | Description |
|---|---|---|---|---|
| provider | string | RAG_PROVIDER | vectorstore | RAG provider (vectorstore, langchain). |
| docsPath | string | RAG_DOCS_PATH | ./docs | Local directory for RAG documents. |
| remoteUrls | string[] | RAG_REMOTE_URLS | [] | Remote document URLs (.txt, .md, .pdf, .csv). |
| chunkSize | number/string | RAG_CHUNK_SIZE | 1000 | Document chunk size (characters). |
| chunkOverlap | number/string | RAG_CHUNK_OVERLAP | 200 | Overlap between chunks (characters). |
| vectorStore.type | string | VECTOR_STORE | redis | Vector store provider (redis, pinecone). |
| vectorStore.indexName | string | VECTOR_INDEX_NAME | agent-ia | Index name for the vector store. |
| pinecone.apiKey | string | PINECONE_API_KEY | | Pinecone API key (if using Pinecone). |

```yaml
rag:
  provider: langchain
  docsPath: ./docs
  remoteUrls:
    - https://example.com/docs/guide.md
  chunkSize: 1000
  chunkOverlap: 200
  vectorStore:
    type: redis
    indexName: my-agent
  pinecone:
    apiKey: ${PINECONE_API_KEY}
```

memory

| Field | Type | Env override | Default | Description |
|---|---|---|---|---|
| backend | string | AGENT_MEMORY_BACKEND | memory | memory (in-memory) or redis. |
| window | number/string | AGENT_MEMORY_WINDOW | 8 | Session memory window size. |
| redisUrl | string | REDIS_URL | redis://localhost:6379 | Redis URL for persistent storage. |

```yaml
memory:
  backend: redis
  window: 20
  redisUrl: redis://redis:6379
```

flows

flows.welcome

| Field | Type | Description |
|---|---|---|
| enabled | boolean/string | Enable the welcome flow. |
| sendOnProfile | boolean/string | Send greeting when user profile is received. |
| templateKey | string | Key in languages.<lang> to use as greeting template. |

flows.authentication

| Field | Type | Env override | Description |
|---|---|---|---|
| enabled | boolean/string | | Enable credential-based authentication. |
| required | boolean/string | AUTH_REQUIRED | Block guest (unauthenticated) users from chatting. |
| credentialDefinitionId | string | CREDENTIAL_DEFINITION_ID | Verifiable credential definition ID for authentication. |
| issuerServiceDid | string | ISSUER_SERVICE_DID | DID of the service that issues the required credential. When set, users who lack the credential receive an invitation to this service. |
| userIdentityAttribute | string | USER_IDENTITY_ATTRIBUTE | Credential attribute used as unique user identity (e.g., email, login). Default: name. |
| rolesAttribute | string | ROLES_ATTRIBUTE | Credential attribute containing user roles (string, CSV, or JSON array). |
| defaultRole | string | DEFAULT_ROLE | Fallback role when credential lacks the roles attribute. Default: user. |
| adminUsers | string[] | ADMIN_USERS (CSV) | User identities that bypass all RBAC checks. Replaces legacy adminAvatars. |
| adminAvatars | string[] | ADMIN_AVATARS (CSV) | (Legacy) Avatar names with admin privileges. Use adminUsers instead. |

flows.menu

| Field | Type | Description |
|---|---|---|
| items | array | List of menu item objects. |

Each menu item:

| Field | Type | Description |
|---|---|---|
| id | string | Unique menu item identifier. |
| labelKey | string | (Optional) Key into languages.<lang>.strings for the display label. |
| label | string | (Optional) Static label text. Used if labelKey is not set. |
| action | string | Action to trigger (e.g., authenticate, logout, mcp-config, abort-config, my-approval-requests, pending-approvals). |
| visibleWhen | enum | always, authenticated, unauthenticated, configuring, notConfiguring, hasApprovalRequests, hasPendingApprovals. |
| badge | string | (Optional) Dynamic badge key. The agent resolves this to a count shown next to the label. Values: approvalRequestCount, pendingApprovalCount. |

```yaml
flows:
  welcome:
    enabled: true
    sendOnProfile: true
    templateKey: greetingMessage
  authentication:
    enabled: true
    required: true
    credentialDefinitionId: ${CREDENTIAL_DEFINITION_ID}
    issuerServiceDid: ${ISSUER_SERVICE_DID}
    userIdentityAttribute: employeeLogin
    rolesAttribute: roles
    defaultRole: employee
    adminUsers:
      - alice@acme.corp
    adminAvatars: # legacy, prefer adminUsers
      - bob
  menu:
    items:
      - id: authenticate
        labelKey: CREDENTIAL
        action: authenticate
        visibleWhen: unauthenticated
      - id: logout
        labelKey: LOGOUT
        action: logout
        visibleWhen: authenticated
      - id: mcp-config
        labelKey: MCP_CONFIG_MENU
        action: mcp-config
        visibleWhen: notConfiguring
      - id: abort-config
        labelKey: MCP_CONFIG_ABORT
        action: abort-config
        visibleWhen: configuring
      - id: my-approval-requests
        labelKey: MY_APPROVAL_REQUESTS
        action: my-approval-requests
        visibleWhen: hasApprovalRequests
        badge: approvalRequestCount
      - id: pending-approvals
        labelKey: PENDING_APPROVALS
        action: pending-approvals
        visibleWhen: hasPendingApprovals
        badge: pendingApprovalCount
```

tools

| Field | Type | Env override | Description |
|---|---|---|---|
| dynamicConfig | any | LLM_TOOLS_CONFIG | JSON string or object defining external LLM tools. |
| bundled | map | | Settings for built-in tools (keyed by tool name). |

bundled.statisticsFetcher

| Field | Type | Env override | Default | Description |
|---|---|---|---|---|
| enabled | boolean | STATISTICS_TOOL_ENABLED | true | Enable the statistics tool. |
| endpoint | string | STATISTICS_API_URL | | Statistics API endpoint URL. |
| requiresAuth | boolean | STATISTICS_REQUIRE_AUTH | false | Require authentication for stats. |
| defaultStatClass | string | | USER_CONNECTED | Default statistics class. |
| defaultStatEnums | array | | | Default enum values for stats. |

```yaml
tools:
  dynamicConfig: ${LLM_TOOLS_CONFIG}
  bundled:
    statisticsFetcher:
      enabled: true
      endpoint: ${STATISTICS_API_URL}
      requiresAuth: false
      defaultStatClass: USER_CONNECTED
```

mcp

MCP (Model Context Protocol) server configuration. Env override: MCP_SERVERS_CONFIG (JSON array string).

```yaml
mcp:
  servers:
    - name: wise
      transport: streamable-http
      url: ${WISE_MCP_URL}
      ...
```
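
Equivalently, the same servers can be supplied through the MCP_SERVERS_CONFIG override as a JSON array string. A hypothetical docker-compose fragment (the service name and URL are placeholders; the JSON field names are assumed to mirror the mcp.servers keys):

```yaml
# Hypothetical compose fragment illustrating the env override form.
services:
  agent:
    environment:
      MCP_SERVERS_CONFIG: '[{"name":"wise","transport":"streamable-http","url":"https://wise-mcp.example.com/mcp"}]'
```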

Each server entry:

| Field | Type | Description |
|---|---|---|
| name | string | Unique server name. |
| transport | enum | stdio, sse, or streamable-http. |
| url | string | Server URL (required for sse and streamable-http). |
| command | string | Executable command (required for stdio). |
| args | string[] or string | Command arguments (for stdio). |
| env | map<string, string> | Environment variables passed to the stdio process. |
| headers | map<string, string> | HTTP headers sent with every request (for sse/streamable-http). |
| reconnect | boolean/string | Auto-reconnect on disconnect. |
| accessMode | enum | admin-controlled (default) or user-controlled. |
| userConfig | object | User-facing configuration (only when accessMode: user-controlled). |
| toolAccess | object | Tool-level access control. |

mcp.servers[].userConfig

When accessMode is user-controlled, each user is prompted to provide configuration values (e.g., API tokens) through the chat interface.

| Field | Type | Description |
|---|---|---|
| fields | array | List of user config fields. |

Each field:

| Field | Type | Description |
|---|---|---|
| name | string | Internal field name (e.g., token). |
| type | enum | secret (masked, never logged) or text. |
| label | string or map<string, string> | Localized prompt label. A map keyed by language code, or a plain string. |
| headerTemplate | string | Maps the value into a header, e.g. "Bearer {value}". |
| headerName | string | HTTP header name to set. Defaults to Authorization if omitted. |

mcp.servers[].toolAccess

Two models are supported. When roles is defined, the RBAC model is active; otherwise the legacy model applies.

Legacy model:

| Field | Type | Description |
|---|---|---|
| default | enum | public (all tools available to all users) or admin (admin-only by default). |
| public | string[] | Tools explicitly available to all users (when default: admin). |

RBAC model:

| Field | Type | Description |
|---|---|---|
| default | enum | none (deny unlisted tools), all (allow unlisted tools), or legacy values. |
| roles | map<string, string[]> | Maps role names to lists of tool names accessible by that role. |
| approval | array | List of approval policies (see below). |

Each approval policy:

| Field | Type | Description |
|---|---|---|
| tools | string[] | Tool names that require approval. |
| approvers | string[] | Role names that can approve requests for these tools. |
| timeoutMinutes | number | Minutes before a pending request expires. Default: 60. |

```yaml
toolAccess:
  default: none
  roles:
    guest: [get_exchange_rate]
    employee: [list_profiles, get_balances, list_transfers]
    finance: [send_money, create_invoice, list_recipients]
  approval:
    - tools: [send_money]
      approvers: [finance-manager, cfo]
      timeoutMinutes: 60
    - tools: [create_invoice]
      approvers: [finance-manager]
      timeoutMinutes: 120
```

When RBAC is active:

  • Tools are filtered per user — the LLM only sees tools the user's roles can access
  • adminUsers bypass all RBAC checks and see all tools
  • Users holding both a tool role and an approver role get self-approval (immediate execution)
  • Stale approval requests are automatically expired via a periodic task

Example: End-user mode (each user provides their own token)

```yaml
mcp:
  servers:
    - name: wise
      transport: streamable-http
      url: ${WISE_MCP_URL}
      accessMode: user-controlled
      userConfig:
        fields:
          - name: token
            type: secret
            label:
              en: "Please enter your Wise API Token:"
              es: "Por favor, ingresa tu Token de API de Wise:"
            headerTemplate: "Bearer {value}"
      toolAccess:
        default: public
```

Example: Corporate mode with RBAC (shared token, role-based access)

When accessMode is omitted, it defaults to admin-controlled: a shared connection is established at startup using the global headers, and all users share it.

```yaml
mcp:
  servers:
    - name: wise
      transport: streamable-http
      url: ${WISE_MCP_URL}
      accessMode: admin-controlled
      headers:
        Authorization: "Bearer ${WISE_API_TOKEN}"
      toolAccess:
        default: none
        roles:
          guest: [get_exchange_rate]
          employee: [list_profiles, get_balances]
          finance: [send_money, create_invoice]
        approval:
          - tools: [send_money]
            approvers: [finance-manager]
            timeoutMinutes: 60
```

Example: Corporate mode without RBAC (legacy)

```yaml
mcp:
  servers:
    - name: wise
      transport: streamable-http
      url: ${WISE_MCP_URL}
      accessMode: admin-controlled
      headers:
        Authorization: "Bearer ${WISE_API_TOKEN}"
      toolAccess:
        default: public
```

imageGeneration

Configures AI image generation. The agent exposes two built-in LangChain tools (generate_image and upload_media_to_mcp) when at least one provider is configured and MinIO storage is available.

Architecture:

  1. The LLM calls generate_image with a prompt and optional target specs.
  2. The provider generates the raw image (e.g. DALL-E API).
  3. The image is converted (resized, format, quality) via sharp.
  4. A 128×128 thumbnail preview is generated for the chat UI.
  5. The converted image is uploaded to MinIO and a presigned URL is returned.
  6. A MediaMessage with preview, width, and height is sent to the user.
  7. The image ref is stored in-memory (1h TTL) for optional MCP bridging via upload_media_to_mcp.

| Field | Type | Description |
|---|---|---|
| providers | array | List of image generation provider configurations. |

imageGeneration.providers[]

| Field | Type | Default | Description |
|---|---|---|---|
| name | string | | Unique provider name, referenced by the LLM in generate_image tool calls. |
| type | string | | Provider type. Supported: openai-dalle, openai-gpt-image. |
| model | string | depends on type | Model name (e.g. dall-e-3, dall-e-2, gpt-image-1, gpt-image-1.5). |
| apiKeyEnv | string | OPENAI_API_KEY | Environment variable name holding the API key. |
| defaultSize | string | 1024x1024 | Default image size. Provider interprets this; see per-type notes below. |
| quality | enum | provider default | openai-gpt-image only: low \| medium \| high \| auto. |
| outputFormat | enum | png | openai-gpt-image only: png \| jpeg \| webp. Native format returned by the API. |
| background | enum | provider default | openai-gpt-image only: transparent \| opaque \| auto. |

Type-specific notes
  • openai-dalle — sizes 1024x1024, 1792x1024, 1024x1792 (dall-e-3) or 256x256 / 512x512 / 1024x1024 (dall-e-2). Always returns PNG.
  • openai-gpt-image — sizes 1024x1024, 1024x1536, 1536x1024, or auto. Does not accept response_format; native format is controlled by outputFormat. Supports up to n=10 per call.
```yaml
# DALL-E 3
imageGeneration:
  providers:
    - name: dalle
      type: openai-dalle
      model: dall-e-3
      # apiKeyEnv: OPENAI_API_KEY # default, reuses the LLM key
      defaultSize: 1024x1024

# gpt-image-1.5
imageGeneration:
  providers:
    - name: gpt-image
      type: openai-gpt-image
      model: gpt-image-1.5
      defaultSize: 1024x1024
      quality: high
      outputFormat: png
      # background: transparent # optional
```

MinIO storage (environment variables)

Image generation requires MinIO for storing generated images and serving presigned URLs. Configure via environment variables:

| Variable | Default | Description |
|---|---|---|
| MINIO_ENDPOINT | | MinIO server hostname (e.g. minio). Required to enable image generation. |
| MINIO_PORT | 9000 | MinIO server port. |
| MINIO_ACCESS_KEY | minioadmin | MinIO access key. |
| MINIO_SECRET_KEY | minioadmin | MinIO secret key. |
| MINIO_USE_SSL | false | Use HTTPS for MinIO connection. |
| MINIO_PUBLIC_URL | auto-derived | Public base URL for presigned URLs (e.g. https://minio.example.com:9000). |
| MINIO_BUCKET | image-gen | Bucket name for storing generated images. |

If MINIO_ENDPOINT is not set, the media store is disabled and image generation tools will not be registered.
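
As an illustration, a docker-compose fragment enabling the media store might look like the following. The service names and credential handling are assumptions to be adapted to your deployment:

```yaml
services:
  agent:
    environment:
      MINIO_ENDPOINT: minio          # required; enables the image generation tools
      MINIO_PORT: "9000"
      MINIO_ACCESS_KEY: ${MINIO_ACCESS_KEY}
      MINIO_SECRET_KEY: ${MINIO_SECRET_KEY}
      MINIO_BUCKET: image-gen
  minio:
    image: minio/minio
    command: server /data
```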

Built-in tools

When image generation is enabled, two tools are automatically registered with the LLM:

generate_image — Generates images via the configured provider, converts them, uploads to MinIO, and sends a preview to the user.

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | yes | Provider name (from imageGeneration.providers[].name). |
| prompt | string | yes | Text prompt describing the desired image. |
| n | number | no | Number of images to generate (default: 1). |
| target_format | string | no | Output format: jpeg, png, or webp. |
| target_max_width | number | no | Max width in pixels. |
| target_max_height | number | no | Max height in pixels. |
| target_max_size_kb | number | no | Max file size in KB (quality is reduced to fit). |

upload_media_to_mcp — Bridges a generated image to an MCP server by sending it as base64 via an internal MCP tool call, bypassing LLM context.

| Parameter | Type | Required | Description |
|---|---|---|---|
| ref_id | string | yes | Image reference ID returned by generate_image. |
| server | string | yes | MCP server name. |
| tool | string | yes | MCP tool name to invoke with the image data. |
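
A hypothetical set of arguments for such a bridging call, shown as YAML for readability (the ref ID and target tool name are illustrative, not real values):

```yaml
# upload_media_to_mcp arguments (illustrative)
ref_id: img_abc123       # hypothetical ref returned by a prior generate_image call
server: wise             # must match an mcp.servers[].name
tool: attach_receipt     # hypothetical MCP tool that accepts the base64 image
```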

speechToText

Configures voice note transcription (speech-to-text). When enabled, incoming audio MediaMessage items are transcribed and fed to the LLM as text input.

| Field | Type | Default | Description |
|---|---|---|---|
| requireAuth | boolean/string | false | When true, voice notes from unauthenticated users are rejected. |
| provider | object | | STT provider configuration (see below). If omitted, STT is disabled. |

speechToText.provider

| Field | Type | Default | Description |
|---|---|---|---|
| name | string | | Unique provider name (for logging). |
| type | string | | Provider type: openai-whisper (OpenAI cloud) or whisper-compatible (self-hosted endpoint). |
| model | string | whisper-1 | Whisper model name. Use whisper-1 for OpenAI, or e.g. large-v3 for self-hosted. |
| apiKeyEnv | string | OPENAI_API_KEY | Environment variable name containing the API key. |
| baseUrl | string | | Base URL for self-hosted Whisper-compatible endpoints (e.g. https://my-whisper.example.com/v1). |
| language | string | | Optional language hint (ISO 639-1 code, e.g. en, es). |

Both openai-whisper and whisper-compatible use the same OpenAI-compatible /v1/audio/transcriptions API. The difference is that whisper-compatible is intended for self-hosted endpoints where baseUrl is required and apiKeyEnv may not be needed.

```yaml
# Example: OpenAI cloud Whisper
speechToText:
  requireAuth: true
  provider:
    name: whisper
    type: openai-whisper
    model: whisper-1
    # apiKeyEnv: OPENAI_API_KEY # default

# Example: Self-hosted Whisper-compatible endpoint
speechToText:
  requireAuth: false
  provider:
    name: local-whisper
    type: whisper-compatible
    model: large-v3
    baseUrl: https://my-whisper.example.com/v1
```

Voice authentication message

When requireAuth: true and an unauthenticated user sends a voice note, the agent sends a VOICE_AUTH_REQUIRED message. This message is configurable per language via the strings map:

```yaml
languages:
  en:
    strings:
      VOICE_AUTH_REQUIRED: "Please log in before sending voice messages."
  es:
    strings:
      VOICE_AUTH_REQUIRED: "Inicia sesión antes de enviar mensajes de voz."
```

If not overridden, the following defaults are used:

| Language | Default message |
|---|---|
| en | Voice messages require authentication. Please authenticate first to use this feature. |
| es | Los mensajes de voz requieren autenticación. Por favor, autentícate primero para usar esta función. |
| fr | Les messages vocaux nécessitent une authentification. Veuillez vous authentifier d'abord pour utiliser cette fonctionnalité. |

vision

Configures image-to-text description (vision). When enabled, incoming image MediaMessage items are described by a vision-capable LLM and injected into the chat flow as [Image] <description> so the main LLM has textual context to reason over. Non-image media (documents, etc.) is unaffected.

| Field | Type | Default | Description |
|---|---|---|---|
| requireAuth | boolean/string | false | When true, images from unauthenticated users are rejected. |
| provider | object | | Vision provider configuration (see below). If omitted, vision is disabled. |

vision.provider

| Field | Type | Default | Description |
|---|---|---|---|
| name | string | | Unique provider name (for logging). |
| type | string | | Provider type: openai-vision (OpenAI cloud) or openai-compatible-vision (self-hosted endpoint). |
| model | string | gpt-4o-mini | Vision-capable model name (e.g. gpt-4o, gpt-4o-mini, gpt-4.1-mini). |
| apiKeyEnv | string | OPENAI_API_KEY | Environment variable name containing the API key. |
| baseUrl | string | | Base URL for OpenAI-compatible endpoints (e.g. https://my-llm.example.com/v1). |
| prompt | string | built-in concise-description prompt | Prompt used when asking the model to describe the image. Useful to tune verbosity or domain focus. |
| maxTokens | number | 300 | Max tokens for the description output. |
| detail | enum | auto | OpenAI image detail hint: auto, low, or high. low is cheaper; high is more accurate. |
| language | string | | Optional language hint for the description (ISO 639-1, e.g. en, es). |

Both openai-vision and openai-compatible-vision use the same OpenAI-compatible /v1/chat/completions API with an image_url content block containing a data: URI. The difference is that openai-compatible-vision is intended for self-hosted endpoints where baseUrl is required and apiKeyEnv may not be needed.

```yaml
# Example: OpenAI cloud vision
vision:
  requireAuth: true
  provider:
    name: vision
    type: openai-vision
    model: gpt-4o-mini
    detail: auto
    # apiKeyEnv: OPENAI_API_KEY # default

# Example: Self-hosted OpenAI-compatible vision endpoint
vision:
  requireAuth: false
  provider:
    name: local-vision
    type: openai-compatible-vision
    model: llava-next
    baseUrl: https://my-llm.example.com/v1
    prompt: "Describe the key visual elements of this image in one short paragraph."
    maxTokens: 200
```

Image authentication message

When requireAuth: true and an unauthenticated user sends an image, the agent sends an IMAGE_AUTH_REQUIRED message. This message is configurable per language via the strings map:

```yaml
languages:
  en:
    strings:
      IMAGE_AUTH_REQUIRED: "Please log in before sending images."
  es:
    strings:
      IMAGE_AUTH_REQUIRED: "Inicia sesión antes de enviar imágenes."
```

If not overridden, the following defaults are used:

| Language | Default message |
|---|---|
| en | Images require authentication. Please authenticate first to use this feature. |
| es | Las imágenes requieren autenticación. Por favor, autentícate primero para usar esta función. |
| fr | Les images nécessitent une authentification. Veuillez vous authentifier d'abord pour utiliser cette fonctionnalité. |

integrations

Free-form configuration for external services. The schema accepts any structure under vsAgent and postgres.

```yaml
integrations:
  vsAgent:
    adminUrl: ${VS_AGENT_ADMIN_URL}
    stats:
      enabled: ${VS_AGENT_STATS_ENABLED}
      host: ${VS_AGENT_STATS_HOST}
      port: ${VS_AGENT_STATS_PORT}
      queue: ${VS_AGENT_STATS_QUEUE}
      username: ${VS_AGENT_STATS_USER}
      password: ${VS_AGENT_STATS_PASSWORD}
  postgres:
    host: ${POSTGRES_HOST}
    user: ${POSTGRES_USER}
    password: ${POSTGRES_PASSWORD}
    dbName: ${POSTGRES_DB_NAME}
```

Value resolution order

  1. Load agent-pack.yaml (or .yml / .json).
  2. Replace ${VAR} placeholders with process.env.VAR when available.
  3. Explicit environment variables (e.g., AGENT_PROMPT, LLM_TOOLS_CONFIG, MCP_SERVERS_CONFIG) override the resolved pack values.
  4. Fall back to hard-coded defaults when neither pack nor env provides a value.
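
The steps above can be traced through a worked example (the variable name MY_MODEL is hypothetical; the OPENAI_MODEL override and gpt-4o-mini default come from the llm table):

```yaml
llm:
  model: ${MY_MODEL}   # step 2: replaced with process.env.MY_MODEL when it is set
# step 3: if OPENAI_MODEL is also set, it overrides the resolved pack value
# step 4: if neither variable is set, the hard-coded default gpt-4o-mini applies
```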

Compatibility

  • If AGENT_PACK_PATH is missing or invalid, a warning is logged and the app continues with legacy env-only configuration.
  • Packs can be swapped by mounting a different directory and pointing AGENT_PACK_PATH to it (Docker, Kubernetes, etc.).
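
For instance, swapping a pack in Docker amounts to mounting a different directory and pointing AGENT_PACK_PATH at the mount. A minimal sketch (service name and paths are hypothetical):

```yaml
services:
  agent:
    environment:
      AGENT_PACK_PATH: /packs/my-agent
    volumes:
      - ./agent-packs/my-agent:/packs/my-agent:ro
```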