When you first launch your Elixir app, everything seems effortless: lightweight processes, fast routing, fantastic WebSocket support. But when the app becomes a real, revenue-generating system, you suddenly need answers to difficult questions.

    Why Observe?

    Why is registration latency increasing? Which endpoint is overloading the Erlang VM? This is where good old observability - metrics, logs, and traces - pays off.

    Prometheus Metrics: The De Facto Currency of Insight

    The primary choice for numerical data points in modern back-ends is Prometheus. A running Prometheus instance scrapes your app at predictable intervals and stores captured metrics as time series. Those series power alert rules, and, of course, all the fancy Grafana dashboards.

    Prometheus works brilliantly with Elixir because both share a “let it crash, restart fast” philosophy. So, leaning on Prometheus metrics in Phoenix feels natural.
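    Concretely, pointing Prometheus at a Phoenix node is one scrape job in prometheus.yml. A minimal sketch, assuming the app serves its metrics on port 4000 at /metrics (adjust both to your setup):

    ```yaml
    scrape_configs:
      - job_name: "my_phoenix_app"
        scrape_interval: 15s
        metrics_path: /metrics
        static_configs:
          - targets: ["app:4000"]
    ```

    Prometheus pulls this target every 15 seconds and stores each sample as a time series keyed by metric name and labels.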

    Meet the PromEx Library

    Enter PromEx, an Elixir library that hides a mountain of boilerplate. The PromEx library exposes a clean, consistent interface for registering plugins, emitting application metrics, and even provisioning dashboards automatically. With a single PromEx module, you can cover everything from VM statistics to LiveView latency and Ecto query timings.

    Under the hood, each plugin returns a list of structs describing which metrics to capture. PromEx turns those into collectors, attaches telemetry handlers, and groups everything into metrics collection cycles that run at your chosen interval.

    PromEx and Its Friends

    Add three lines to mix.exs, run mix deps.get, copy-paste the sample PromEx module from the docs, enable it in the supervision tree, and you’re off to the races. Or, if you prefer a one-liner:

    mix prom_ex.gen.config --datasource YOUR_PROMETHEUS_DATASOURCE_ID

    That single line wires in default VM metrics, BEAM memory stats, and endpoint timings collected via telemetry.
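    In the manual route, the pieces are small: one dependency and one supervision-tree entry. A sketch, assuming the generated module is MyApp.PromEx (the version requirement is illustrative; check Hex for the current release):

    ```elixir
    # mix.exs
    defp deps do
      [
        {:prom_ex, "~> 1.9"}
      ]
    end

    # lib/my_app/application.ex
    # Start PromEx early so metrics are captured from the first request.
    children = [
      MyApp.PromEx,
      MyApp.Repo,
      MyAppWeb.Endpoint
    ]
    ```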

    Digging Into Built-in PromEx Plugins and Dashboards

    PromEx ships a set of production-ready plugins and ready-to-import dashboards right out of the box. Mix and match, or just slam them all into plugins/0 and let PromEx handle the rest.


    Plugin line‑up

    • PromEx.Plugins.Application – dependency counts, Git info, uptime;

    • PromEx.Plugins.Beam – schedulers, GC pauses, run‑queue lengths;

    • PromEx.Plugins.Phoenix – HTTP request counts, durations, channel joins;

    • PromEx.Plugins.PhoenixLiveView – mount/handle_event/handle_params timings and error counts;

    • PromEx.Plugins.Ecto – query timings, pool checkout waits, result counts;

    • PromEx.Plugins.Oban – job queue depth, execution and failure rates;

    • PromEx.Plugins.Absinthe – GraphQL execution timings, query complexity, subscription fan‑out;

    • PromEx.Plugins.Broadway – message throughput, batch processing latency.


    Dashboards cheat‑sheet

    • Application – know exactly what version and SHA are running, plus dependency bloat.

    • BEAM – memory leaks, scheduler starvation, or process explosions.

    • Ecto – spot N+1 queries and slow migrations.

    • Oban – watch your background jobs back up in real time.

    • Phoenix – slice request latency by route or method.

    • Phoenix LiveView – catch long-running handle_event callbacks.

    • Broadway – validate batch sizes and back-pressure behaviour.

    Writing Your Own Plugin (Yes, It’s Easy)

    Need your own metrics? Make your own plugin. Define a module, use PromEx.Plugin, and implement one or more of its callbacks:

    1. event_metrics/1 – attach to existing :telemetry events and turn them into counters, distributions, and gauges.

    2. polling_metrics/1 – periodically call a function you provide and record the results; ideal for gauges sampled from live state.

    3. manual_metrics/1 – metrics refreshed explicitly, for example once at application start.

    To ship a Grafana dashboard alongside the plugin, add its JSON definition to your application's priv directory and list it in your PromEx module's dashboards/0 callback.

    Because you’re in plain Elixir, you have complete freedom in collecting metrics. You can define gauges dynamically based on real-time data, query local ETS tables, or even execute HTTP calls to external services to gather metrics.

    Here’s a practical example: a polling plugin that samples MQTT connection count and messages processed. (MyApp.Mqtt.connection_count/0 and MyApp.Mqtt.total_messages/0 are hypothetical helpers standing in for whatever your broker client exposes.)

    defmodule MyApp.MqttMetrics do
      use PromEx.Plugin

      @impl true
      def polling_metrics(opts) do
        poll_rate = Keyword.get(opts, :poll_rate, 5_000)

        Polling.build(
          :my_app_mqtt_polling_metrics,
          poll_rate,
          {__MODULE__, :execute_mqtt_metrics, []},
          [
            last_value(
              [:my_app, :mqtt, :active_connections],
              event_name: [:my_app, :mqtt],
              measurement: :connections,
              description: "Current active MQTT connections",
              tags: [:broker]
            ),
            last_value(
              [:my_app, :mqtt, :messages_processed_total],
              event_name: [:my_app, :mqtt],
              measurement: :messages_processed,
              description: "Total number of MQTT messages processed",
              tags: [:broker]
            )
          ]
        )
      end

      @doc false
      def execute_mqtt_metrics do
        # Emit a telemetry event; the last_value metrics above capture it.
        :telemetry.execute(
          [:my_app, :mqtt],
          %{
            connections: MyApp.Mqtt.connection_count(),
            messages_processed: MyApp.Mqtt.total_messages()
          },
          %{broker: "mqtt-primary"}
        )
      end
    end

    Integrating custom plugins with dashboards is straightforward. Provide a JSON dashboard definition, reference it from your PromEx module's dashboards/0 callback, and PromEx automatically provisions it in Grafana. Your new metrics immediately become visual and actionable, significantly enhancing your application's observability.
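    Registering the custom plugin and its dashboard happens in your main PromEx module. A sketch with illustrative module and file names (the custom JSON file lives in your app's priv directory):

    ```elixir
    defmodule MyApp.PromEx do
      use PromEx, otp_app: :my_app

      @impl true
      def plugins do
        [
          PromEx.Plugins.Beam,
          {PromEx.Plugins.Phoenix, router: MyAppWeb.Router, endpoint: MyAppWeb.Endpoint},
          # Our custom MQTT plugin
          MyApp.MqttMetrics
        ]
      end

      @impl true
      def dashboards do
        [
          {:prom_ex, "beam.json"},
          {:prom_ex, "phoenix.json"},
          # Custom dashboard JSON shipped with the app
          {:my_app, "mqtt.json"}
        ]
      end
    end
    ```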

    Feel empowered to use these tools creatively—monitor MQTT brokers, external API latencies, user behavior metrics, or even business KPIs. The flexibility provided by PromEx makes monitoring tailored precisely to your application's needs.

    Prometheus and Grafana: Better Together

    Prometheus is the metrics engine: it scrapes (pulls) data from your services, stores it in its own time-series database (TSDB), evaluates PromQL recording and alerting rules, and, together with Alertmanager, routes alerts. Grafana is the visualization and orchestration layer: it connects to Prometheus (and many other data sources), renders dashboards, manages users, teams, and folders, adds annotations, and offers UI-driven alerting and correlations (e.g., Prometheus metrics + Loki logs + Tempo traces).

    In short: Prometheus owns collection, labeling, retention, and rule evaluation; Grafana owns the querying UX, panel transformations, permissions, and cross-source storytelling. A handy rule of thumb: collectors, scrape configs, and recording/alert rules belong to Prometheus; dashboards, exploration, alert presentation, and cross-data-source links belong to Grafana.

    The combo of Prometheus metrics and Grafana is a cliché for a reason. Metrics land within seconds of being scraped, alerts fire shortly after, and you can reconstruct an outage after the fact. Metrics are the facts; dashboards tell the stories.

    Once PromEx pushes its JSON to Grafana, you get:

    • Dashboards with panels grouped by plugin.

    • Pre-wired graph annotations when you deploy.

    • Prometheus-style ad-hoc queries in Explore to, well, explore.
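    As a taste of what Explore enables, a p95 request-latency query per route might look roughly like this; the exact metric name depends on your PromEx version and configured prefix, so treat it as a template:

    ```
    histogram_quantile(
      0.95,
      sum by (le, route) (
        rate(my_app_prom_ex_phoenix_http_request_duration_milliseconds_bucket[5m])
      )
    )
    ```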


    Loki, Promtail, and Log Structure Side Quest

    Logs are often your first line of defense when debugging incidents or exploring unexpected behavior. Loki, part of the Grafana ecosystem, offers a modern log aggregation solution that integrates seamlessly with your Elixir/Phoenix stack.

    To leverage Loki effectively in your Elixir application, structured logging is crucial. Structured logging means that your logs are formatted consistently, typically as JSON, and include essential context, such as request IDs, user identifiers, or error metadata. This format allows Loki to index logs efficiently, supporting powerful queries without the overhead of full-text indexing.

    The popular LoggerJSON library integrates beautifully with Loki, automatically formatting Elixir logs as structured JSON. Here’s a quick example of a structured log format:

    {
      "timestamp": "2024-06-16T12:34:56.789Z",
      "level": "error",
      "message": "Failed to process payment",
      "metadata": {
        "request_id": "12345-abcd",
        "user_id": 789,
        "error_reason": "Insufficient funds"
      }
    }
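    Producing logs in that shape is mostly configuration. With recent LoggerJSON releases (v6+) the library plugs into Erlang's logger as a formatter; the exact setup varies by version, so treat this as a sketch and consult the LoggerJSON docs:

    ```elixir
    # config/config.exs
    config :logger, :default_handler,
      formatter: LoggerJSON.Formatters.Basic.new(metadata: [:request_id, :user_id])
    ```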

    To start sending logs to Loki, use Promtail, the recommended log shipper. Promtail runs alongside your Phoenix app, collecting logs from standard output or log files and forwarding them directly to Loki. Promtail’s lightweight design and configurable pipelines mean minimal resource overhead.

    Configuring Promtail is straightforward—just point it at your logs and specify your Loki instance. With structured logs, you can perform advanced queries in Grafana, correlating log events with specific metrics or incidents. For instance, you can easily trace errors related to specific requests by filtering on request IDs or quickly detect anomalous user behavior by filtering on user IDs.
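    A bare-bones promtail-config.yaml for tailing a Phoenix app's log file could look like this (paths, labels, and the Loki URL are placeholders for your environment):

    ```yaml
    server:
      http_listen_port: 9080

    positions:
      filename: /tmp/positions.yaml

    clients:
      - url: http://loki:3100/loki/api/v1/push

    scrape_configs:
      - job_name: phoenix
        static_configs:
          - targets: [localhost]
            labels:
              job: my_app
              __path__: /var/log/my_app/*.log
    ```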

    Integrating structured logging with Loki dramatically enhances your observability, enabling quicker root-cause analysis and streamlined debugging. Embrace structured logging patterns in your Phoenix applications, and let Loki’s powerful querying capabilities make troubleshooting simpler and faster.

    Monitoring the Monitor (Yes, It’s Turtles)

    Even Prometheus can melt. It's critical to ensure your monitoring infrastructure remains reliable—if Prometheus itself becomes overloaded or crashes, you might miss crucial signals about your system’s health. This is where meta-monitoring comes into play.

    Scrape Prometheus's own /metrics endpoint with a secondary instance dedicated specifically to monitoring Prometheus. Track key metrics like:

    • prometheus_tsdb_head_series: This indicates the number of active time series in Prometheus. A rapidly growing number can signal potential memory exhaustion, risking an Out-Of-Memory (OOM) scenario.

    • prometheus_remote_storage_queue_highest_sent_timestamp_seconds: Monitoring this helps detect ingest lag. An increasing gap between the highest sent timestamp and the current time means your Prometheus might be struggling to keep up with incoming data.

    • prometheus_rule_evaluation_failures_total: This metric is essential to detect silently failing alerting rules. An increasing count here indicates that Prometheus can't reliably evaluate your alerts.

    • process_resident_memory_bytes: Track Prometheus' RAM usage explicitly to ensure it's not exceeding available resources.

    • prometheus_http_requests_total: Monitor HTTP request rates to Prometheus to spot unusual load spikes or configuration issues.
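    These checks translate into ordinary Prometheus alerting rules. A sketch for the rule-evaluation case (the threshold and durations are examples; tune them to your environment):

    ```yaml
    groups:
      - name: meta-monitoring
        rules:
          - alert: PrometheusRuleEvaluationFailures
            expr: increase(prometheus_rule_evaluation_failures_total[15m]) > 0
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Prometheus is failing to evaluate recording/alerting rules"
    ```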

    Implementing meta-monitoring safeguards your observability infrastructure, ensuring you don't lose visibility precisely when you need it the most. Meta-monitoring is boring until it’s not.

    One Compose to Rule Them All: PromEx, Prometheus, Grafana

    The easiest way to stand up Prometheus metrics and Grafana for a Phoenix application is with Docker Compose plus the prom_ex library:

    • Spin up a Prometheus instance and a Grafana instance in one file.

    • Add a PromEx module to your Elixir application so the instrumented app exposes application metrics, via telemetry events and handlers, at the endpoint and router boundaries.

    • With a single mix task (mix prom_ex.gen.config), you generate that module pre-wired with default plugins; point Prometheus at the app's metrics endpoint, and PromEx auto-provisions the Grafana dashboards.

    • This Compose-based setup is minimal code for high value: PromEx plugins hook into the Phoenix ecosystem to monitor performance across your app's whole lifecycle, Grafana lets you visualize, explore, and alert on those metrics, and together Prometheus and PromEx give you a production-grade observability stack built from first-class tools.
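    A bare-bones docker-compose.yml for that stack might look like this (image tags, ports, and volume paths are illustrative; pin versions for production):

    ```yaml
    services:
      prometheus:
        image: prom/prometheus:latest
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
        ports:
          - "9090:9090"

      grafana:
        image: grafana/grafana:latest
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=admin
        ports:
          - "3000:3000"
        depends_on:
          - prometheus
    ```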

    We’ve walked through why observability matters, how to create metrics with PromEx plugins, and where to visualize them with Grafana dashboards—all while sprinkling in log aggregation via Loki. The result is a single‑pane‑of‑glass view of your Phoenix app that lets you manage performance, debug incidents, and prove value to stakeholders.

    FAQ

    1. What is the importance of observability for Phoenix applications?

    Observability, encompassing metrics, logs, and traces, helps developers understand performance bottlenecks—such as increasing registration latency or overloaded VM processes—in production-ready Phoenix applications.

    2. Why is Prometheus considered the go-to solution for capturing metrics in Elixir apps?

    Prometheus excels at scraping time-series metrics at reliable intervals. Its pull-based model aligns naturally with Elixir’s fault-tolerant philosophy, making it a solid choice for gathering real-time insights from Phoenix backends.

    3. What role does the PromEx library play in integrating metrics collection?

    PromEx acts as an Elixir-friendly wrapper that streamlines the boilerplate involved in metrics instrumentation and telemetry. It simplifies registering plugins and auto-provisioning Grafana dashboards for metrics like VM statistics, HTTP request timings, and more.

    4. What built-in plugins does PromEx provide out of the box?

    PromEx includes a robust lineup of plugins to monitor various aspects:

    • PromEx.Plugins.Application: dependency counts, uptime, and version info

    • PromEx.Plugins.Beam: BEAM VM metrics (GC pauses, scheduler queues, memory)

    • PromEx.Plugins.Phoenix: HTTP request volumes, durations, channel joins

    • PromEx.Plugins.PhoenixLiveView: timings and errors for LiveView callbacks

    • PromEx.Plugins.Ecto: query durations, result sizes, checkout wait times

    • PromEx.Plugins.Oban: job queue depth and processing metrics

    • PromEx.Plugins.Absinthe: GraphQL request complexity and execution times

    • PromEx.Plugins.Broadway: throughput and latency for batch processing

    5. What types of metrics dashboards does PromEx automatically create?

    PromEx can auto-provision dashboards that correspond to its plugins, including:

    • Application Dashboard: shows version/commit info and dependency footprint

    • BEAM Dashboard: monitors memory usage, process and scheduler stats

    • Ecto Dashboard: visualizes slow queries and database interactions

    • Oban Dashboard: tracks job queues and failure rates

    • Phoenix Dashboard: breaks down HTTP timing by endpoint/method

    • Phoenix LiveView Dashboard: monitors callback duration and errors

    • Broadway Dashboard: shows batch size, throughput, and back-pressure behavior

    6. How does PromEx streamline observability configuration with Grafana?

    PromEx handles the heavy lifting—registering telemetry handlers, exposing metrics, and uploading dashboards—so developers can define observability behavior declaratively. In just three lines and a mix command, PromEx connects to Prometheus and Grafana, simplifying setup and reducing manual effort.

    7. Can PromEx automatically upload dashboards to Grafana?

    Yes: PromEx integrates with Grafana's API to auto-upload dashboards upon app startup and manages lifecycle annotations. This ensures dashboards track deployment timelines and stay in sync with your application’s versions and environments.

    8. Why is having automated dashboards beneficial for production observability?

    Automated dashboards reduce manual toil and streamline deployment pipelines. They enable version-controlled visuals, faster follow-up when app topology changes, and better alignment between application versions and observability tools.

    Krzysztof Janiec Elixir & React Developer
