Metasearch for AI-Powered Organizations#
A Guide to SearXNG Federated Search on the Cluster#
Authors: Snit Sanghlao, Claude Sonnet 4.6 (Anthropic)
Date: March 2026
Instance: https://aicenter.mahidol.ac.th/metasearch/
What is Metasearch?#
A metasearch engine (also called a federated search engine) queries multiple search engines simultaneously, aggregates and deduplicates the results, and returns a unified ranked list, without storing user data or tracking queries.
SearXNG is an open-source, self-hosted metasearch engine. A single query to SearXNG can simultaneously retrieve results from:
| Category | Sources |
|---|---|
| General Web | DuckDuckGo, Brave, Bing |
| Academic | Google Scholar, arXiv, PubMed, Semantic Scholar, Crossref |
| Reference | Wikipedia |
| Code | GitHub, StackOverflow |
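The aggregate-and-deduplicate step described at the start of this section can be illustrated with a minimal Python sketch. It is not SearXNG's actual implementation: the function names (`normalize_url`, `merge_results`) and the scoring rule (summing reciprocal ranks across engines) are assumptions chosen only to show the idea.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Build a comparison key: lowercase host, no scheme, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(per_engine: dict) -> list:
    """Merge per-engine result lists into one deduplicated, ranked list.

    Each input result is a dict with at least 'url' and 'title'.
    A result's score is the sum of 1/rank over every engine that returned it
    (an illustrative rule), so hits found by several engines rise to the top.
    """
    merged = {}
    for engine, results in per_engine.items():
        for rank, result in enumerate(results, start=1):
            key = normalize_url(result["url"])
            entry = merged.setdefault(
                key,
                {"url": result["url"], "title": result["title"], "engines": [], "score": 0.0},
            )
            entry["engines"].append(engine)
            entry["score"] += 1.0 / rank
    return sorted(merged.values(), key=lambda e: e["score"], reverse=True)
```

With this rule, a page returned by both DuckDuckGo and Brave appears once in the merged list and outranks a page returned by only one of them.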
Why Metasearch Matters for Organizations#
| Concern | Problem with Public Search Engines | Metasearch Solution |
|---|---|---|
| Privacy | Google/Bing log every query and link it to user identity | No tracking, no logs, self-hosted |
| Data Leakage | Sensitive queries are sent to external commercial companies | All queries remain within organizational infrastructure |
| Compliance | PDPA / GDPR risk when employees search sensitive topics | On-premise, auditable, policy-controlled |
| Coverage | Single engine = single index = blind spots in results | Multiple sources = broader, more complete results |
| Cost | Enterprise search APIs charge per query at scale | Free, self-hosted, unlimited queries |
| Control | Cannot restrict or customize engines used | Full control over enabled engines, categories, and filters |
Why Metasearch is Critical for AI Agentic Systems#
AI agents need reliable, real-time, broad information access to reason and act effectively. Metasearch is a foundational infrastructure component for agentic AI.
1. Grounding: Reducing Hallucination#
Language models have a knowledge cutoff. Agents that can search retrieve current, factual information and use it as context, grounding answers in reality rather than memorized training data.
2. Clean API for Tool Use#
AI agents call APIs, not browsers. SearXNG exposes a standard JSON REST API:
GET /search?q={query}&format=json
Any agent, LLM framework, or RAG pipeline can call this directly, with no scraping or browser automation required.
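As a rough illustration of calling this endpoint from an agent, the sketch below uses only the Python standard library. The helper names (`build_search_url`, `search`) are hypothetical, and the optional `categories` and `pageno` parameters are assumed to be accepted by this instance. Note also that SearXNG only answers `format=json` requests when the administrator has enabled JSON output in the instance settings.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Instance URL taken from this guide.
BASE_URL = "https://aicenter.mahidol.ac.th/metasearch/"


def build_search_url(query, base_url=BASE_URL, categories=None, pageno=1):
    """Return the GET URL for a JSON search request."""
    params = {"q": query, "format": "json", "pageno": pageno}
    if categories:
        # Assumption: categories are passed as a comma-separated list.
        params["categories"] = ",".join(categories)
    return base_url.rstrip("/") + "/search?" + urlencode(params)


def search(query, timeout=10.0, **kwargs):
    """Run the query and return the list under the 'results' key (empty if absent)."""
    with urlopen(build_search_url(query, **kwargs), timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("results", [])
```

An agent would call `search("transformer architecture survey")` and pass the returned hits to the model as context.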
3. Multi-Source in One Request#
A single SearXNG query returns results from PubMed, arXiv, Semantic Scholar, and news sources simultaneously, without requiring separate API keys or integrations for each source.
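The sketch below shows one way a client might restrict a request to specific sources and then see which engine contributed each hit. The engine identifiers (`arxiv`, `pubmed`) and the per-result `engine` / `engines` fields are assumptions about this instance's configuration and response format, and the helper names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Instance URL taken from this guide.
BASE_URL = "https://aicenter.mahidol.ac.th/metasearch/"


def build_multi_source_url(query, engines, base_url=BASE_URL):
    """Build one request URL that targets several engines at once."""
    params = {"q": query, "format": "json", "engines": ",".join(engines)}
    return base_url.rstrip("/") + "/search?" + urlencode(params)


def titles_by_engine(results):
    """Group result titles by the engine(s) that reported them."""
    grouped = defaultdict(list)
    for hit in results:
        # Assumed fields: 'engines' (list) or, failing that, 'engine' (string).
        engines = hit.get("engines") or [hit.get("engine", "unknown")]
        for engine in engines:
            grouped[engine].append(hit.get("title", ""))
    return dict(grouped)
```

Grouping this way makes it easy to confirm that a single request really did draw on more than one source.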
4. Private Agentic Workflows#
When an AI agent performs sensitive tasks (patient data analysis, competitor research, legal document review), those queries must not leak to external companies. A private metasearch instance ensures all agent activity stays internal.
5. RAG Pipeline Integration#
In Retrieval-Augmented Generation (RAG), SearXNG acts as the live web retrieval layer, complementing internal vector databases with fresh knowledge:
User Query
    │
    ▼
  Agent
    │
    ├──► SearXNG (live web) ──► Top N results ────────┐
    │                                                 ├──► LLM Context ──► Answer
    └──► Vector DB (internal docs) ──► Top K chunks ──┘
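The merge step in the flow above, where web results and internal chunks are combined into one LLM context, might look like the following sketch. The function names, the `source` and `text` keys on internal chunks, the character budget, and the prompt wording are all illustrative assumptions. Web hits are assumed to carry `title`, `url`, and `content` fields.

```python
def build_context(web_results, doc_chunks, top_n=3, top_k=3, max_chars=4000):
    """Combine the top N web hits and top K internal chunks into one context string."""
    sections = []
    for i, hit in enumerate(web_results[:top_n], start=1):
        sections.append(f"[web {i}] {hit['title']} ({hit['url']})\n{hit.get('content', '')}")
    for i, chunk in enumerate(doc_chunks[:top_k], start=1):
        sections.append(f"[doc {i}] {chunk['source']}\n{chunk['text']}")
    # Crude budget: truncate by characters rather than tokens.
    return "\n\n".join(sections)[:max_chars]


def build_prompt(question, context):
    """Wrap the merged context and the user question in a single prompt."""
    return (
        "Answer using only the sources below and cite them by label.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```

Labeling each section (`[web 1]`, `[doc 1]`) lets the model cite where a statement came from, which helps when reviewing answers for grounding.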
6. Compatible with Major AI Frameworks#
| Framework / Tool | Integration |
|---|---|
| OpenWebUI | Built-in web search toggle |
| Continue (VS Code) | |
| LangChain | SearXNG search tool |
| LlamaIndex | Web retrieval node |
| AutoGen / CrewAI | Custom agent tool via REST API |
| Any custom agent | Direct HTTP JSON API |
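For the "custom agent tool" rows, a framework-neutral sketch is shown below: a tool description in the JSON-schema style that many function-calling APIs accept, plus a dispatcher that runs the search. The tool name `web_search`, its parameters, and `handle_tool_call` are hypothetical. The search function is injected, so any client that returns a list of result dicts can be plugged in.

```python
import json

# Hypothetical tool description in a common function-calling layout.
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web through the organization's SearXNG instance.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query text."},
                "max_results": {"type": "integer", "description": "Maximum hits to return."},
            },
            "required": ["query"],
        },
    },
}


def handle_tool_call(name, arguments_json, search_fn):
    """Execute a tool call emitted by the model and return a JSON string for it."""
    if name != "web_search":
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    hits = search_fn(args["query"])[: args.get("max_results", 5)]
    # Return only the fields the model needs, to keep the context small.
    trimmed = [
        {"title": h.get("title"), "url": h.get("url"), "snippet": h.get("content", "")}
        for h in hits
    ]
    return json.dumps(trimmed)
```

Keeping the HTTP client separate from the dispatcher means the same tool can be tested offline with a stub and pointed at the real instance in production.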
Using SearXNG on This Cluster#
Option 1: OpenWebUI#
Setup (Admin only)
Go to: Admin Panel → Settings → Web Search
Web Search Engine: searxng
SearXNG Query URL: https://aicenter.mahidol.ac.th/metasearch/search?q=<query>
How to Use (All users)
Open a new chat in OpenWebUI
Click the globe icon at the bottom of the chat input bar
The icon highlights when web search is active
Type your question and send; the model will search and answer using live results
Web search is toggled per message. You can enable or disable it for each individual message.
Option 2: Continue VS Code Extension#
Continue uses SearXNG as a context provider. When you type @web, Continue fetches results from SearXNG and injects them into the model's prompt before sending. The model itself does not browse the web.
Step 1: Configure Continue#
PC (Local Windows)
Config file: C:\Users\<your-username>\.continue\config.yaml
name: Local Config
version: 1.0.0
schema: v1
models:
  - name: Qwen3.5
    provider: openai
    model: cyankiwi/Qwen3.5-122B-A10B-AWQ-4bit
    apiBase: https://aicenter.mahidol.ac.th/vllm/v1
    systemMessage: "You are a helpful assistant."
    apiKey: "sk-xxxx"
    requestOptions:
      extraBodyProperties:
        chat_template_kwargs:
          enable_thinking: false
context:
  - provider: web
    params:
      engine: "searxng"
      query: ""
      searxngBaseUrl: https://aicenter.mahidol.ac.th/metasearch/
      n: 5
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase
Remote SSH (Linux Cluster)
Connect to the cluster via VS Code Remote SSH, then run:
mkdir -p ~/.continue
nano ~/.continue/config.yaml
Paste the same config as above, then save and exit (Ctrl+O → Enter → Ctrl+X).
Step 2: Reload VS Code#
Press Ctrl+Shift+P → Developer: Reload Window
Step 3: Use @web in Chat#
Open the Continue chat panel
Type @ and a dropdown appears
Select web from the list
Type your query followed by your question:
@web transformer architecture survey 2024
Summarize the key improvements in recent transformer models.
Press Enter. Continue fetches results from SearXNG and the model answers using those results as context
Troubleshooting#
| Problem | Solution |
|---|---|
| | Reload VS Code: |
| Dropdown appears but no | Check |
| Globe icon missing in OpenWebUI | Ask admin to enable Web Search in Admin Panel |
| SearXNG not reachable | Open |
| Model says it cannot search | Use |
| Remote SSH config not loading | Ensure config is at |
Verify SearXNG API is working:
curl "https://aicenter.mahidol.ac.th/metasearch/search?q=test&format=json"
Expected: a JSON object with a results array containing search hits.
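For scripted health checks, the expected shape described above can be verified with a small helper like the sketch below. The function name `check_search_payload` and the choice of required per-hit fields (`url`, `title`) are assumptions. It takes the already-decoded JSON, so it can be paired with any HTTP client.

```python
def check_search_payload(payload):
    """Return a list of human-readable problems; an empty list means the payload looks healthy."""
    if not isinstance(payload, dict):
        return ["top-level JSON value is not an object"]
    results = payload.get("results")
    if not isinstance(results, list):
        return ["missing 'results' array"]
    if not results:
        return ["'results' array is empty (engines may be failing or rate-limited)"]
    problems = []
    for index, hit in enumerate(results):
        # Assumed minimum for a usable hit: a URL and a title.
        if not isinstance(hit, dict) or "url" not in hit or "title" not in hit:
            problems.append(f"result {index} lacks 'url' or 'title'")
    return problems
```

For example, piping the `curl` output into `json.loads` and then into `check_search_payload` gives a pass/fail signal suitable for a monitoring job.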
Summary#
Metasearch is the search infrastructure layer for AI: it gives agents and users access to the open web privately, broadly, and without per-query cost. For organizations handling sensitive data, a self-hosted metasearch instance is essential infrastructure for any AI workload that requires real-world knowledge.
References#
SearXNG Documentation. SearXNG: A privacy-respecting, hackable metasearch engine. https://docs.searxng.org/
Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401. https://arxiv.org/abs/2005.11401
Continue. Context Providers - Search. https://docs.continue.dev/customize/context-providers
OpenWebUI. Web Search Integration. https://docs.openwebui.com/features/web_search
Anthropic. (2025). Claude Sonnet 4.6 Model Card. https://www.anthropic.com/
This document was co-authored by Snit Sanghlao and Claude Sonnet 4.6 (Anthropic), March 2026.