Kunall Banerjee

Running Metabase in production on Fly.io

17th April, 2024

I managed to deploy and run Metabase (backed by Fly Postgres) on Fly.io in a production setting, using ClickHouse as the data warehouse. There are tutorials online, but they only deploy Metabase in its default configuration (with H2 as the internal database). They don’t provide cost estimates or metrics, and they don’t explain how to add external dependencies or plugins to Metabase. I needed all 3 answered for my use case. Here’s what I have found (so far).

This solution is currently being tested by 2 people for internal reporting at a company, and will later be used by ~20 people. They didn’t want to spend too much on running this setup, so some decisions here were driven by cost. If you can suggest further cost savings, I’d be extremely grateful!

Also, I should add:

This post may serve as a guide, but that’s not what it is intended for. This is reference material that I plan on updating over time. If you continue reading, my assumption is that you already know about ClickHouse, Fly and Metabase, and you are just curious how I’ve set things up.

I’m going to skip over setting up and deploying ClickHouse, and move directly to setting up the Postgres instance required to run Metabase in production. For brevity, I’ll only mention that the ClickHouse instance is also running on Fly, following the recommendations for self-managed ClickHouse instances.

  1. Preparing the Postgres instance required to move Metabase to production
    fly pg create --name <metabase-db-name> --initial-cluster-size 1 --region <closest-to-you> --vm-size 1024 --volume-size 1

When setting up the Postgres app, I followed the application database server size recommended by Metabase. I opted to keep the initial cluster size at 1 and the volume size at 1G. Both can be scaled up later at will. I deployed this first with no issues, but right now it’s not attached to a Fly app, so there’s not much to it. I end step 2 by attaching the Postgres app to the Metabase app.

  2. Preparing the custom Dockerfile to deploy Metabase on Fly

Ideally, I wouldn’t need a custom Dockerfile at all, and could just reference the Metabase image directly in fly.toml. But because I went with ClickHouse, I had to accommodate its driver somehow. Fortunately, you can deploy apps via a Dockerfile on Fly, so there’s that.

ClickHouse is not an officially supported driver. However, it is listed as a partner driver, which means there’s a community-supported driver for it.

Being able to deploy Metabase as a JAR was one of its selling points for me, along with the fact that you can add external dependencies or plugins simply by dropping them in a dedicated “plugins” directory. These plugins are generally self-contained as JAR files, too.

But here’s where I hit a major roadblock. From the Metabase docs on the plugins directory:

“Note that Metabase will use this directory to extract plugins bundled with the default Metabase distribution (such as drivers for various databases such as SQLite), thus it must be readable and writable by Docker.”

Fly supports Dockerfiles, but ultimately your app is deployed stand-alone, directly to a VM. There is no Docker if you fly ssh console into your Metabase instance. On top of that, Fly Volumes aren’t available during builds.

Keeping all of that in mind, this is what I ended up with:

Dockerfile
    # Use the official OpenJDK 11 image as the base image
    FROM adoptopenjdk/openjdk11:alpine

    ENV MB_VERSION=v0.49.3 \
        MB_APP_PORT=3000 \
        MB_JETTY_HOST=0.0.0.0 \
        METABASE_CLICKHOUSE_DRIVER_VERSION=1.4.0

    RUN apk add --no-cache bash wget ca-certificates
    RUN mkdir -p /metabase-data /plugins
    RUN wget -O metabase.jar "http://downloads.metabase.com/${MB_VERSION}/metabase.jar"
    RUN wget -O plugins/ch.jar "https://github.com/ClickHouse/metabase-clickhouse-driver/releases/download/${METABASE_CLICKHOUSE_DRIVER_VERSION}/clickhouse.metabase-driver.jar"

    EXPOSE $MB_APP_PORT

    CMD ["java", "-jar", "metabase.jar"]

You can just use the metabase image from Docker Hub if you don’t want (or need) to add external dependencies or plugins.

I could have also exposed the database URL when I deployed the Postgres app in step 1. After all, the Postgres app is not exposed to the public Internet, and all 3 apps are deployed on a private network within the same org, so it wouldn’t matter.

Next, I attached the Postgres app created in step 1 to the Metabase app:

    fly pg attach <postgres-app-name> --app <your-metabase-app> --variable-name MB_DB_CONNECTION_URI

If you don’t pass MB_DB_CONNECTION_URI as the variable name, Fly stores the connection string under its default name (DATABASE_URL), and you will have to take an extra step to create the MB_DB_CONNECTION_URI secret manually and assign the database connection URI string to it.
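For reference, the manual route could look roughly like the sketch below. The helper name `build_mb_db_uri`, the placeholder credentials and the `.internal` host are assumptions for illustration only; `fly secrets set` is the usual flyctl command for storing a secret. Credentials containing special characters would need URL-encoding, which this sketch does not handle.

```shell
# Hypothetical helper: assemble a Postgres connection URI for Metabase.
# Arguments: user, password, host, database name.
# 5432 is the default Postgres port; adjust if your setup differs.
build_mb_db_uri() {
  printf 'postgres://%s:%s@%s:5432/%s\n' "$1" "$2" "$3" "$4"
}

# Example with placeholder values (not real credentials):
#   fly secrets set \
#     MB_DB_CONNECTION_URI="$(build_mb_db_uri <user> <password> <postgres-app-name>.internal <db-name>)" \
#     --app <your-metabase-app>
```

Passing the variable name at attach time avoids all of this, which is why I went that way.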

  3. Create a volume for the Metabase app

I created the volume that Metabase would mount to once deployed to the VM.

    fly volumes create metabase_data --region <closest-to-you> --size 1

If you skip this step before deploying the configuration in step 4, then Fly will assign 3G (the max free space available per organization) to the volume automatically.

  4. Create the fly.toml config file

Now, referencing the Dockerfile created in step 2, we end up with this fly.toml configuration for the Metabase app.

fly.toml
    app = "<your-metabase-app>"
    kill_signal = "SIGTERM"
    kill_timeout = 5

    [build]
    dockerfile = "Dockerfile"

    [mounts]
    destination = "/metabase-data"
    source = "metabase_data"

    [http_service]
    auto_start_machines = true
    auto_stop_machines = true
    force_https = true
    internal_port = 3000
    min_machines_running = 0

    [http_service.concurrency]
    hard_limit = 150
    soft_limit = 100
    type = "requests"

    [[http_service.checks]]
    grace_period = "120s"
    interval = "30s"
    method = "GET"
    path = "/api/health"
    timeout = "5s"

  5. First deployment of Metabase

This will take longer the first time you do it as Metabase will need to initialize and run migrations on its database.

    fly deploy --vm-memory 1024

I set the VM memory to 1024MB after I saw OOM exceptions in the remote builder logs.

Time to tail the logs of the Metabase app to ensure everything works as expected:

    2024-04-17T12:02:18Z app[<REDACTED>] yul [info]2024-04-17 12:02:18,585 DEBUG plugins.lazy-loaded-driver :: Registering lazy loading driver :clickhouse...
    2024-04-17T12:02:18Z app[<REDACTED>] yul [info]2024-04-17 12:02:18,591 INFO driver.impl :: Registered driver :clickhouse (parents: [:sql-jdbc]) 🚚
    2024-04-17T12:02:21Z app[<REDACTED>] yul [info]2024-04-17 12:02:21,682 INFO driver.impl :: Initializing driver :clickhouse...
    2024-04-17T12:02:21Z app[<REDACTED>] yul [info]2024-04-17 12:02:21,683 INFO plugins.classloader :: Added URL file:/plugins/ch.jar to classpath
    2024-04-17T12:02:21Z app[<REDACTED>] yul [info]2024-04-17 12:02:21,685 DEBUG plugins.init-steps :: Loading plugin namespace metabase.driver.clickhouse...
    2024-04-17T12:02:21Z app[<REDACTED>] yul [info]2024-04-17 12:02:21,778 INFO driver.impl :: Registered driver :clickhouse (parents: [:sql-jdbc]) 🚚
    2024-04-17T12:02:21Z app[<REDACTED>] yul [info]2024-04-17 12:02:21,819 DEBUG plugins.jdbc-proxy :: Registering JDBC proxy driver for com.clickhouse.jdbc.ClickHouseDriver...
    2024-04-17T12:02:21Z app[<REDACTED>] yul [info]2024-04-17 12:02:21,821 INFO metabase.util :: Load lazy loading driver :clickhouse took 136.2 ms

If you see that Metabase successfully ran its migrations, then things are most likely working. If you really care to check, you’ll also see logs showing that the ClickHouse driver was picked up from /plugins (the “Added URL file:/plugins/ch.jar to classpath” line) and registered.
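If tailing logs feels too manual, the same /api/health path that fly.toml uses for its checks can be polled from a terminal. This is a minimal sketch, assuming the endpoint answers with a JSON body whose status field is "ok" once startup has finished; the function names, attempt count and delay are my own choices and not anything Fly or Metabase prescribe.

```shell
# Succeeds when a /api/health response body reports status "ok".
# Whitespace is stripped first so both compact and pretty-printed JSON match.
is_metabase_healthy() {
  compact="$(printf '%s' "$1" | tr -d '[:space:]')"
  case "$compact" in
    *'"status":"ok"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll the health endpoint until it is healthy or attempts run out.
# Arguments: base URL, max attempts (default 30), seconds between attempts (default 10).
wait_for_metabase() {
  url="$1/api/health"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # A failed request yields an empty body, which counts as "not healthy yet".
    body="$(curl -fsS --max-time 5 "$url" 2>/dev/null || true)"
    if is_metabase_healthy "$body"; then
      echo "Metabase is healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Metabase did not become healthy after $attempts attempts" >&2
  return 1
}

# Example: wait_for_metabase "https://<your-metabase-app>.fly.dev"
```

Since the machines are allowed to stop when idle, the first request may also have to wait for a machine to start, so a generous attempt count makes sense here.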

Post-production deployment

I just had to go to the Metabase admin settings and add ClickHouse as a database. Because ClickHouse is deployed to Fly, the host name is set to <clickhouse-app-name>.internal, and the rest of the fields are left at their default values. This may change at a later date.
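If the connection fails, a quick reachability check from inside the Metabase machine can help narrow things down. The sketch below is an assumption-laden example: the helper name is hypothetical, and it presumes ClickHouse is listening on its default HTTP port (8123) on the private network. A healthy server normally answers the /ping path with `Ok.`.

```shell
# Hypothetical helper: build the ClickHouse HTTP ping URL for a Fly app
# on the private network. 8123 is ClickHouse's default HTTP port;
# pass a second argument to override it.
clickhouse_ping_url() {
  printf 'http://%s.internal:%s/ping\n' "$1" "${2:-8123}"
}

# Example, from a shell inside the Metabase machine (fly ssh console):
#   wget -qO- "$(clickhouse_ping_url <clickhouse-app-name>)"
```

wget is used in the example because the Dockerfile above installs it; the .internal name only resolves from inside the organization's private network.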

I’ll be writing about the cost to run this setup, as well as other metrics over time. It’s too soon to tell anything.

As always, reliability is a concern when deploying to Fly.io. A year after the post by Kurt (CEO of Fly.io), I can’t say things have improved much. The only improvement I’ve noticed is that the remote builder no longer errors out or times out as often.

As I was writing this post, Fly had another outage. They have had 16 incidents this month alone, and there are still ~15 days left in the month of April.

More coming soon…


You can contact me if you wish to discuss how to improve this setup, or if you want this exact setup (ClickHouse + Metabase) for your organization.