ClickHouse Endpoint Documentation
Overview
ClickHouse endpoints in Eden provide a high-performance interface for interacting with ClickHouse, an open-source column-oriented database management system that excels at real-time analytics on large datasets. These endpoints are specifically optimized for the analytical workloads and unique performance characteristics of ClickHouse.
Key Features
- High-Performance Analytics: Optimized for analytical queries on large datasets
- Column-Oriented Storage: Takes advantage of ClickHouse's column storage format
- Vectorized Query Execution: Support for ClickHouse's vectorized query processing
- Real-Time Data Processing: Enables low-latency analytics on fresh data
- Advanced Aggregation: Efficient implementation of complex analytical functions
- Distributed Query Processing: Support for ClickHouse's sharded architecture
API Request Types
ClickHouse endpoints currently support a single primary operation type:
Read Operation
Used for executing SQL queries against ClickHouse to retrieve data.
Required Properties:
- kind: Must be "clickhouse"
- type: Must be "read"
- body: A SQL query string to be executed against ClickHouse
Optional Properties:
- settings: Object containing ClickHouse-specific query settings
- parameters: Named parameters to be substituted in the query
- format: Desired output format (default is "JSON")
Example Request:
{
  "kind": "clickhouse",
  "type": "read",
  "body": "SELECT date, count() AS hits, uniq(user_id) AS unique_users FROM web_traffic WHERE site_id = {site_id:UInt32} AND date >= {start_date:Date} GROUP BY date ORDER BY date DESC LIMIT 10",
  "parameters": {
    "site_id": 42,
    "start_date": "2025-04-01"
  },
  "settings": {
    "max_execution_time": 30,
    "max_threads": 4
  },
  "format": "JSON"
}
Response:
{
  "data": [
    {
      "date": "2025-05-10",
      "hits": 145289,
      "unique_users": 28376
    },
    {
      "date": "2025-05-09",
      "hits": 152843,
      "unique_users": 30128
    },
    // ...more rows
  ],
  "metadata": {
    "rows": 10,
    "bytes": 1240,
    "executionTime": 0.042,
    "statistics": {
      "rows_read": 24563789,
      "bytes_read": 982543560,
      "elapsed_seconds": 0.038
    }
  }
}
Use Cases:
- High-performance analytical queries
- Time-series data analysis
- Real-time dashboards and reporting
- Complex aggregations on large datasets
- User behavior analytics
- Log analysis and monitoring
Endpoint Configuration
Basic Configuration Example
{
  "name": "AnalyticsClickhouse",
  "type": "clickhouse",
  "description": "Analytics data warehouse",
  "connection": {
    "host": "clickhouse.example.com",
    "port": 8123,
    "database": "analytics",
    "protocol": "http"
  },
  "authentication": {
    "type": "username_password",
    "username": "${ENV_CLICKHOUSE_USER}",
    "password": "${ENV_CLICKHOUSE_PASSWORD}"
  },
  "settings": {
    "maxExecutionTime": 60,
    "compressionEnabled": true,
    "connectionTimeout": 30,
    "socketTimeout": 300
  }
}
Advanced Configuration Example
{
  "name": "EnterpriseClickhouse",
  "type": "clickhouse",
  "description": "Enterprise data analytics platform",
  "connection": {
    "host": "clickhouse-cluster.example.com",
    "port": 8123,
    "database": "enterprise_analytics",
    "protocol": "https",
    "secure": true,
    "clusterName": "analytics_cluster"
  },
  "authentication": {
    "type": "username_password",
    "username": "${ENV_CLICKHOUSE_USER}",
    "password": "${ENV_CLICKHOUSE_PASSWORD}"
  },
  "settings": {
    "maxExecutionTime": 300,
    "compressionEnabled": true,
    "connectionTimeout": 30,
    "socketTimeout": 600
  },
  "advanced": {
    "querySettings": {
      "maxThreads": 8,
      "maxMemoryUsage": "20G",
      "maxBytesBeforeExternal": "10G",
      "joinAlgorithm": "auto",
      "maxInsertBlockSize": 1000000,
      "prefer_column_name_to_alias": 0
    },
    "connection": {
      "tcpKeepAlive": true,
      "loadBalancing": "round_robin",
      "maxParallelReplicas": 2
    },
    "retries": 3,
    "retryBackoffMs": 1000,
    "caching": {
      "enabled": true,
      "ttlSeconds": 300,
      "maxCacheSize": "1G"
    },
    "metrics": {
      "enabled": true,
      "collectQueryStatistics": true
    }
  }
}
Configuration Options Reference
Connection Options

| Option | Type | Description |
|---|---|---|
| host | String | ClickHouse server hostname or IP |
| port | Number | Port to connect to (default: 8123 for HTTP, 9000 for native) |
| database | String | Default database to use |
| protocol | String | "http", "https", or "native" |
| secure | Boolean | Whether to use TLS/SSL for connections |
| clusterName | String | ClickHouse cluster name for distributed operations |
Authentication Options

| Option | Type | Description |
|---|---|---|
| type | String | Authentication type: "username_password", "none", or "ldap" |
| username | String | Username for authentication |
| password | String | Password for authentication |
| ldapServer | String | LDAP server (for LDAP authentication) |
Settings Options

| Option | Type | Description |
|---|---|---|
| maxExecutionTime | Number | Maximum query execution time in seconds |
| compressionEnabled | Boolean | Whether to use compression for data transfer |
| connectionTimeout | Number | Connection timeout in seconds |
| socketTimeout | Number | Socket operation timeout in seconds |
Advanced Options

| Option | Type | Description |
|---|---|---|
| querySettings | Object | ClickHouse-specific query settings |
| connection | Object | Advanced connection parameters |
| retries | Number | Number of retry attempts for failed queries |
| retryBackoffMs | Number | Backoff time between retries in milliseconds |
| caching | Object | Result caching configuration |
| metrics | Object | Metrics collection configuration |
Query Features
Parameter Binding
ClickHouse endpoints support named parameter binding with type hints:
SELECT * FROM users WHERE id = {user_id:UInt64} AND active = {status:UInt8}
The following types are supported:
- UInt8, UInt16, UInt32, UInt64
- Int8, Int16, Int32, Int64
- Float32, Float64
- String, FixedString
- Date, DateTime, DateTime64
- UUID, IPv4, IPv6
- Array(T), where T is one of the above types
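Array parameters use the same syntax. A minimal sketch, assuming an events table with an event_type column (as in the funnel example later on this page) and a request whose parameters object supplies, say, "types": ["view", "click"]:

SELECT event_type, count() AS events
FROM events
WHERE event_type IN {types:Array(String)}
GROUP BY event_type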
Output Formats
The endpoint supports multiple output formats:
- JSON (default): Returns data as JSON objects
- JSONEachRow: JSON format with one row per line
- CSVWithNames: CSV format with column headers
- TabSeparated: Tab-separated values
- Pretty: Human-readable text format
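Through the endpoint, the format is selected with the top-level format property shown earlier. These names are ClickHouse's native output formats, so when querying the server directly the equivalent is a trailing FORMAT clause; a minimal sketch using the web_traffic table from the earlier request example:

SELECT date, count() AS hits
FROM web_traffic
GROUP BY date
FORMAT JSONEachRow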
Query Settings
Key ClickHouse query settings that can be specified:
- max_execution_time: Maximum query execution time in seconds
- max_threads: Maximum number of threads to use for query execution
- max_memory_usage: Maximum memory usage for query execution
- priority: Query priority (lower values indicate higher priority)
- distributed_product_mode: Mode for distributed joins
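These are normally passed through the request's settings object. ClickHouse itself also accepts them as a trailing SETTINGS clause on the query; whether the endpoint forwards such a clause unchanged is an assumption worth verifying. A minimal sketch:

SELECT count() AS total_rows
FROM web_traffic
SETTINGS max_threads = 4, max_execution_time = 30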
Best Practices
Query Optimization
- Schema Design:
  - Design tables with the MergeTree engine family
  - Choose appropriate sorting keys based on query patterns
  - Use materialized columns for frequently computed values
  - Consider data skipping indexes for frequently filtered columns
- Query Writing:
  - Filter early and on indexed columns
  - Use PREWHERE for preliminary filtering (see the sketch after this list)
  - Leverage the FINAL keyword judiciously (it can be expensive)
  - Use subqueries with WITH for readability and optimization
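A minimal sketch of the PREWHERE and WITH suggestions above, assuming the web_traffic table from the earlier request example; PREWHERE reads only the filter column first and fetches the remaining columns just for the matching rows:

WITH toDate('2025-04-01') AS cutoff
SELECT date, count() AS hits
FROM web_traffic
PREWHERE site_id = 42  -- cheap, selective filter evaluated before other columns are read
WHERE date >= cutoff   -- remaining filters applied afterwards
GROUP BY date
ORDER BY date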
- Aggregation Optimization:
  - Use approximate functions (uniqHLL12, quantileTDigest) for large datasets
  - Consider two-level aggregation for distributed tables
  - Use SimpleAggregateFunction when applicable
  - Pre-aggregate data with materialized views when possible (see the sketch after this list)
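A minimal sketch of pre-aggregation with a materialized view, again assuming the web_traffic table; a SummingMergeTree target keeps daily counts updated incrementally as new rows arrive:

CREATE MATERIALIZED VIEW web_traffic_daily_mv
ENGINE = SummingMergeTree
ORDER BY (site_id, date)
AS SELECT
  site_id,
  date,
  count() AS hits
FROM web_traffic
GROUP BY site_id, date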
Performance Tuning
- Resource Management:
  - Set appropriate max_threads based on query complexity
  - Set reasonable memory limits with max_memory_usage
  - Use max_execution_time to prevent runaway queries
  - Consider priority settings for important queries
- Data Loading:
  - Use bulk inserts with large batches
  - Take advantage of ClickHouse's parallel insertion capabilities
  - Consider using the Buffer engine for write-heavy workloads
  - Use asynchronous inserts for non-critical data (see the sketch after this list)
- Caching Strategy:
  - Enable result caching for frequently executed queries
  - Set appropriate TTL values based on data freshness requirements
  - Monitor cache hit rates and adjust accordingly
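A minimal sketch of an asynchronous insert; the table and column list are illustrative, borrowing user_events from the examples below, and writes go to ClickHouse directly, since these endpoints currently expose only read operations:

INSERT INTO user_events (timestamp, user_id, event_type, duration_ms)
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES ('2025-05-10 12:00:00', 42, 'click', 135)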
Operational Considerations
- Monitoring:
  - Track query performance metrics
  - Monitor system tables for resource usage
  - Set up alerts for slow queries
  - Track disk space usage (see the sketch after this list)
- High Availability:
  - Configure multiple contact points for ClickHouse clusters
  - Implement retry logic for transient failures
  - Use load balancing across replicas
  - Consider read/write splitting for heavy workloads
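For the disk-usage item above, a minimal sketch against ClickHouse's system.parts table (column names as in recent ClickHouse releases):

SELECT
  database,
  table,
  formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC
LIMIT 20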
Troubleshooting
Common Issues and Solutions
- Memory Issues:
  - Adjust max_memory_usage in query settings
  - Optimize queries to reduce memory consumption
  - Consider breaking large queries into smaller chunks
  - Use streaming for large result sets
- Performance Problems:
  - Check system.query_log for slow queries
  - Verify proper index usage with EXPLAIN
  - Look for distributed query bottlenecks
  - Check for proper sorting key usage
- Connection Failures:
  - Verify network connectivity
  - Check authentication credentials
  - Ensure the database exists
  - Verify server resource availability
Advanced Diagnostics
- Query Analysis:
  - Use EXPLAIN to understand query execution
  - Check system.query_log for detailed query metrics (see the sketch after this list)
  - Analyze system.metrics and system.asynchronous_metrics
  - Review system.processes for currently running queries
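A minimal sketch of the system.query_log check referenced above (query logging must be enabled on the server; column names as in recent ClickHouse releases):

SELECT
  event_time,
  query_duration_ms,
  read_rows,
  formatReadableSize(memory_usage) AS peak_memory,
  query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_duration_ms > 1000
ORDER BY event_time DESC
LIMIT 20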
- Profile Queries:
  EXPLAIN PIPELINE SELECT count() FROM your_table WHERE condition
- Identify Bottlenecks:
  - Check CPU usage on ClickHouse nodes
  - Monitor network throughput
  - Check disk I/O performance
  - Review memory usage patterns
Example Queries
Time-Series Analytics
SELECT
  toStartOfHour(timestamp) AS hour,
  count() AS events,
  uniq(user_id) AS unique_users,
  avg(duration_ms) AS avg_duration
FROM user_events
WHERE date >= {start_date:Date} AND date <= {end_date:Date}
GROUP BY hour
ORDER BY hour DESC
Funnel Analysis
SELECT
  date,
  countIf(event_type = 'view') AS views,
  countIf(event_type = 'click') AS clicks,
  countIf(event_type = 'purchase') AS purchases,
  round(countIf(event_type = 'click') / countIf(event_type = 'view'), 3) AS ctr,
  round(countIf(event_type = 'purchase') / countIf(event_type = 'click'), 3) AS conversion
FROM events
WHERE date BETWEEN {start_date:Date} AND {end_date:Date}
GROUP BY date
ORDER BY date
Cohort Analysis
WITH first_purchase AS (
  SELECT
    user_id,
    min(toStartOfMonth(order_date)) AS cohort_month
  FROM orders
  GROUP BY user_id
)
SELECT
  cohort_month,
  toRelativeMonthNum(order_date) - toRelativeMonthNum(cohort_month) AS month_number,
  count(DISTINCT orders.user_id) AS active_users
FROM orders
JOIN first_purchase ON orders.user_id = first_purchase.user_id
GROUP BY cohort_month, month_number
ORDER BY cohort_month, month_number