When you first launch your Elixir app, everything seems effortless: lightweight processes, fast routing, fantastic WebSocket support. But when the app becomes a real, revenue-generating system, you suddenly need answers to difficult questions.

    Why Observe?

    Why is registration latency increasing? Which endpoint is overloading the Erlang VM? This is where good old observability - metrics, logs, and traces - pays off.

    Prometheus Metrics: The De Facto Currency of Insight

    The primary choice for numerical data points in modern back-ends is Prometheus. A running Prometheus instance scrapes your app at predictable intervals and stores captured metrics as time series. Those series power alert rules, and, of course, all the fancy Grafana dashboards.

    Prometheus works brilliantly with Elixir because both share a “let it crash, restart fast” philosophy. So, leaning on Prometheus metrics in Phoenix feels natural.
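    Concretely, pointing Prometheus at a Phoenix node is one scrape job in prometheus.yml. A minimal sketch, assuming the app serves its metrics on port 4000 at /metrics (adjust both to your setup):

    ```yaml
    scrape_configs:
      - job_name: "my_phoenix_app"
        scrape_interval: 15s
        metrics_path: /metrics
        static_configs:
          - targets: ["app:4000"]
    ```

    Prometheus pulls this target every 15 seconds and stores each sample as a time series keyed by metric name and labels.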

    Meet the PromEx Library

    Enter PromEx, an Elixir library that hides a mountain of boilerplate. The PromEx library exposes a clean, consistent interface for registering plugins, emitting application metrics, and even provisioning dashboards automatically. With a single PromEx module, you can cover everything from VM statistics to LiveView latency and Ecto query timings.

    Under the hood, each plugin returns a list of structs describing which metrics to capture. PromEx turns those into collectors, attaches telemetry handlers, and groups everything into metrics collection cycles that run at your chosen interval.

    PromEx and Its Friends

    Add three lines to mix.exs, run mix deps.get, copy-paste the sample PromEx module from the docs, enable it in the supervision tree, and you’re off to the races. Or, if you prefer a one-liner:

    mix prom_ex.gen.config --datasource YOUR_PROMETHEUS_DATASOURCE_ID

    That single line wires in default VM metrics, BEAM memory stats, and endpoint timings collected via telemetry.
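    In the manual route, the pieces are small: one dependency and one supervision-tree entry. A sketch, assuming the generated module is MyApp.PromEx (the version requirement is illustrative; check Hex for the current release):

    ```elixir
    # mix.exs
    defp deps do
      [
        {:prom_ex, "~> 1.9"}
      ]
    end

    # lib/my_app/application.ex
    # Start PromEx early so metrics are captured from the first request.
    children = [
      MyApp.PromEx,
      MyApp.Repo,
      MyAppWeb.Endpoint
    ]
    ```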

    Digging Into Built-in PromEx Plugins and Dashboards

    PromEx ships a set of production-ready plugins and ready-to-import dashboards right out of the box. Mix and match, or just slam them all into plugins/0 and let PromEx handle the rest.


    Plugin line‑up

    • PromEx.Plugins.Application – dependency counts, Git info, uptime;

    • PromEx.Plugins.Beam – schedulers, GC pauses, run‑queue lengths;

    • PromEx.Plugins.Phoenix – HTTP request counts, durations, channel joins;

    • PromEx.Plugins.PhoenixLiveView – mount/handle_event/handle_params timings and error counts;

    • PromEx.Plugins.Ecto – query timings, pool checkout waits, result counts;

    • PromEx.Plugins.Oban – job queue depth, execution and failure rates;

    • PromEx.Plugins.Absinthe – GraphQL execution timings, query complexity, subscription fan‑out;

    • PromEx.Plugins.Broadway – message throughput, batch processing latency.


    Dashboards cheat‑sheet

    • Application – know exactly what version and SHA are running, plus dependency bloat.

    • BEAM – memory leaks, scheduler starvation, or process explosions.

    • Ecto – spot N+1 queries and slow migrations.

    • Oban – watch your background jobs back up in real time.

    • Phoenix – slice request latency by route or method.

    • Phoenix LiveView – catch long-running handle_event callbacks.

    • Broadway – validate batch sizes and back-pressure behaviour.

    Writing Your Own Plugin (Yes, It’s Easy)

    Need your own metrics? Make your own plugin. Define a module, use PromEx.Plugin, and implement one or more of its callbacks:

    1. event_metrics/1 – attach to existing :telemetry events and turn them into counters, distributions, and gauges.

    2. polling_metrics/1 – periodically call a function you provide and record the results; ideal for gauges sampled from live state.

    3. manual_metrics/1 – metrics refreshed explicitly, for example once at application start.

    To ship a Grafana dashboard alongside the plugin, add its JSON definition to your application's priv directory and list it in your PromEx module's dashboards/0 callback.

    Because you’re in plain Elixir, you have complete freedom in collecting metrics. You can define gauges dynamically based on real-time data, query local ETS tables, or even execute HTTP calls to external services to gather metrics.

    Here’s a practical example: a polling plugin that samples MQTT connection count and messages processed. (MyApp.Mqtt.connection_count/0 and MyApp.Mqtt.total_messages/0 are hypothetical helpers standing in for whatever your broker client exposes.)

    defmodule MyApp.MqttMetrics do
      use PromEx.Plugin

      @impl true
      def polling_metrics(opts) do
        poll_rate = Keyword.get(opts, :poll_rate, 5_000)

        Polling.build(
          :my_app_mqtt_polling_metrics,
          poll_rate,
          {__MODULE__, :execute_mqtt_metrics, []},
          [
            last_value(
              [:my_app, :mqtt, :active_connections],
              event_name: [:my_app, :mqtt],
              measurement: :connections,
              description: "Current active MQTT connections",
              tags: [:broker]
            ),
            last_value(
              [:my_app, :mqtt, :messages_processed_total],
              event_name: [:my_app, :mqtt],
              measurement: :messages_processed,
              description: "Total number of MQTT messages processed",
              tags: [:broker]
            )
          ]
        )
      end

      @doc false
      def execute_mqtt_metrics do
        # Emit a telemetry event; the last_value metrics above capture it.
        :telemetry.execute(
          [:my_app, :mqtt],
          %{
            connections: MyApp.Mqtt.connection_count(),
            messages_processed: MyApp.Mqtt.total_messages()
          },
          %{broker: "mqtt-primary"}
        )
      end
    end

    Integrating custom plugins with dashboards is straightforward. Provide a JSON dashboard definition, reference it from your PromEx module's dashboards/0 callback, and PromEx automatically provisions it in Grafana. Your new metrics immediately become visual and actionable, significantly enhancing your application's observability.
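    Registering the custom plugin and its dashboard happens in your main PromEx module. A sketch with illustrative module and file names (the custom JSON file lives in your app's priv directory):

    ```elixir
    defmodule MyApp.PromEx do
      use PromEx, otp_app: :my_app

      @impl true
      def plugins do
        [
          PromEx.Plugins.Beam,
          {PromEx.Plugins.Phoenix, router: MyAppWeb.Router, endpoint: MyAppWeb.Endpoint},
          # Our custom MQTT plugin
          MyApp.MqttMetrics
        ]
      end

      @impl true
      def dashboards do
        [
          {:prom_ex, "beam.json"},
          {:prom_ex, "phoenix.json"},
          # Custom dashboard JSON shipped with the app
          {:my_app, "mqtt.json"}
        ]
      end
    end
    ```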

    Feel empowered to use these tools creatively—monitor MQTT brokers, external API latencies, user behavior metrics, or even business KPIs. The flexibility provided by PromEx makes monitoring tailored precisely to your application's needs.

    Prometheus and Grafana: Better Together

    Prometheus is the metrics engine: it scrapes (pulls) data from your services, stores it in its own time-series database (TSDB), evaluates PromQL recording and alerting rules, and, together with Alertmanager, routes alerts. Grafana is the visualization and orchestration layer: it connects to Prometheus (and many other data sources), renders dashboards, manages users, teams, and folders, adds annotations, and offers UI-driven alerting and correlations (e.g., Prometheus metrics + Loki logs + Tempo traces).

    In short: Prometheus owns collection, labeling, retention, and rule evaluation; Grafana owns the querying UX, panel transformations, permissions, and cross-source storytelling. A handy rule of thumb: collectors, scrape configs, and recording/alert rules belong to Prometheus; dashboards, exploration, alert presentation, and cross-data-source links belong to Grafana.

    The combo of Prometheus metrics and Grafana is a cliché for a reason. Metrics land within seconds of being scraped, alerts fire shortly after, and you can reconstruct an outage after the fact. Metrics are the facts; dashboards tell the stories.

    Once PromEx pushes its JSON to Grafana, you get:

    • Dashboards with panels grouped by plugin.

    • Pre-wired graph annotations when you deploy.

    • Prometheus-style ad-hoc queries in Explore to, well, explore.
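    As a taste of what Explore enables, a p95 request-latency query per route might look roughly like this; the exact metric name depends on your PromEx version and configured prefix, so treat it as a template:

    ```
    histogram_quantile(
      0.95,
      sum by (le, route) (
        rate(my_app_prom_ex_phoenix_http_request_duration_milliseconds_bucket[5m])
      )
    )
    ```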


    Loki, Promtail, and Log Structure Side Quest

    Logs are often your first line of defense when debugging incidents or exploring unexpected behavior. Loki, part of the Grafana ecosystem, offers a modern log aggregation solution that integrates seamlessly with your Elixir/Phoenix stack.

    To leverage Loki effectively in your Elixir application, structured logging is crucial. Structured logging means that your logs are formatted consistently, typically as JSON, and include essential context, such as request IDs, user identifiers, or error metadata. This format allows Loki to index logs efficiently, supporting powerful queries without the overhead of full-text indexing.

    The popular LoggerJSON library integrates beautifully with Loki, automatically formatting Elixir logs as structured JSON. Here’s a quick example of a structured log format:

    {
      "timestamp": "2024-06-16T12:34:56.789Z",
      "level": "error",
      "message": "Failed to process payment",
      "metadata": {
        "request_id": "12345-abcd",
        "user_id": 789,
        "error_reason": "Insufficient funds"
      }
    }
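    Producing logs in that shape is mostly configuration. With recent LoggerJSON releases (v6+) the library plugs into Erlang's logger as a formatter; the exact setup varies by version, so treat this as a sketch and consult the LoggerJSON docs:

    ```elixir
    # config/config.exs
    config :logger, :default_handler,
      formatter: LoggerJSON.Formatters.Basic.new(metadata: [:request_id, :user_id])
    ```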

    To start sending logs to Loki, use Promtail, the recommended log shipper. Promtail runs alongside your Phoenix app, collecting logs from standard output or log files and forwarding them directly to Loki. Promtail’s lightweight design and configurable pipelines mean minimal resource overhead.

    Configuring Promtail is straightforward—just point it at your logs and specify your Loki instance. With structured logs, you can perform advanced queries in Grafana, correlating log events with specific metrics or incidents. For instance, you can easily trace errors related to specific requests by filtering on request IDs or quickly detect anomalous user behavior by filtering on user IDs.
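    A bare-bones promtail-config.yaml for tailing a Phoenix app's log file could look like this (paths, labels, and the Loki URL are placeholders for your environment):

    ```yaml
    server:
      http_listen_port: 9080

    positions:
      filename: /tmp/positions.yaml

    clients:
      - url: http://loki:3100/loki/api/v1/push

    scrape_configs:
      - job_name: phoenix
        static_configs:
          - targets: [localhost]
            labels:
              job: my_app
              __path__: /var/log/my_app/*.log
    ```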

    Integrating structured logging with Loki dramatically enhances your observability, enabling quicker root-cause analysis and streamlined debugging. Embrace structured logging patterns in your Phoenix applications, and let Loki’s powerful querying capabilities make troubleshooting simpler and faster.

    Monitoring the Monitor (Yes, It’s Turtles)

    Even Prometheus can melt. It's critical to ensure your monitoring infrastructure remains reliable—if Prometheus itself becomes overloaded or crashes, you might miss crucial signals about your system’s health. This is where meta-monitoring comes into play.

    Scrape Prometheus's own /metrics endpoint with a secondary instance dedicated specifically to monitoring Prometheus. Track key metrics like:

    • prometheus_tsdb_head_series: This indicates the number of active time series in Prometheus. A rapidly growing number can signal potential memory exhaustion, risking an Out-Of-Memory (OOM) scenario.

    • prometheus_remote_storage_queue_highest_sent_timestamp_seconds: Monitoring this helps detect ingest lag. An increasing gap between the highest sent timestamp and the current time means your Prometheus might be struggling to keep up with incoming data.

    • prometheus_rule_evaluation_failures_total: This metric is essential to detect silently failing alerting rules. An increasing count here indicates that Prometheus can't reliably evaluate your alerts.

    • process_resident_memory_bytes: Track Prometheus' RAM usage explicitly to ensure it's not exceeding available resources.

    • prometheus_http_requests_total: Monitor HTTP request rates to Prometheus to spot unusual load spikes or configuration issues.
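    These checks translate into ordinary Prometheus alerting rules. A sketch for the rule-evaluation case (the threshold and durations are examples; tune them to your environment):

    ```yaml
    groups:
      - name: meta-monitoring
        rules:
          - alert: PrometheusRuleEvaluationFailures
            expr: increase(prometheus_rule_evaluation_failures_total[15m]) > 0
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Prometheus is failing to evaluate recording/alerting rules"
    ```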

    Implementing meta-monitoring safeguards your observability infrastructure, ensuring you don't lose visibility precisely when you need it the most. Meta-monitoring is boring until it’s not.

    One Compose to Rule Them All: PromEx, Prometheus, Grafana

    The easiest way to stand up Prometheus metrics and Grafana for a Phoenix application is with Docker Compose plus the prom_ex library:

    • Spin up a Prometheus instance and a Grafana instance in one file.

    • Add a PromEx module to your Elixir application so the instrumented app exposes application metrics, via telemetry events and handlers, at the endpoint and router boundaries.

    • With a single mix task (mix prom_ex.gen.config), you generate that module pre-wired with default plugins; point Prometheus at the app's metrics endpoint, and PromEx auto-provisions the Grafana dashboards.

    • This Compose-based setup is minimal code for high value: PromEx plugins hook into the Phoenix ecosystem to monitor performance across your app's whole lifecycle, Grafana lets you visualize, explore, and alert on those metrics, and together Prometheus and PromEx give you a production-grade observability stack built from first-class tools.
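    A bare-bones docker-compose.yml for that stack might look like this (image tags, ports, and volume paths are illustrative; pin versions for production):

    ```yaml
    services:
      prometheus:
        image: prom/prometheus:latest
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
        ports:
          - "9090:9090"

      grafana:
        image: grafana/grafana:latest
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=admin
        ports:
          - "3000:3000"
        depends_on:
          - prometheus
    ```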

    We’ve walked through why observability matters, how to create metrics with PromEx plugins, and where to visualize them with Grafana dashboards—all while sprinkling in log aggregation via Loki. The result is a single‑pane‑of‑glass view of your Phoenix app that lets you manage performance, debug incidents, and prove value to stakeholders.

    FAQ

    1. What is the importance of observability for Phoenix applications?

    Observability, encompassing metrics, logs, and traces, helps developers understand performance bottlenecks—such as increasing registration latency or overloaded VM processes—in production-ready Phoenix applications.

    2. Why is Prometheus considered the go-to solution for capturing metrics in Elixir apps?

    Prometheus excels at scraping time-series metrics at reliable intervals. Its pull-based model aligns naturally with Elixir’s fault-tolerant philosophy, making it a solid choice for gathering real-time insights from Phoenix backends.

    3. What role does the PromEx library play in integrating metrics collection?

    PromEx acts as an Elixir-friendly wrapper that streamlines the boilerplate involved in metrics instrumentation and telemetry. It simplifies registering plugins and auto-provisioning Grafana dashboards for metrics like VM statistics, HTTP request timings, and more.

    4. What built-in plugins does PromEx provide out of the box?

    PromEx includes a robust lineup of plugins to monitor various aspects:

    • PromEx.Plugins.Application: dependency counts, uptime, and version info

    • PromEx.Plugins.Beam: BEAM VM metrics (GC pauses, scheduler queues, memory)

    • PromEx.Plugins.Phoenix: HTTP request volumes, durations, channel joins

    • PromEx.Plugins.PhoenixLiveView: timings and errors for LiveView callbacks

    • PromEx.Plugins.Ecto: query durations, result sizes, checkout wait times

    • PromEx.Plugins.Oban: job queue depth and processing metrics

    • PromEx.Plugins.Absinthe: GraphQL request complexity and execution times

    • PromEx.Plugins.Broadway: throughput and latency for batch processing

    5. What types of metrics dashboards does PromEx automatically create?

    PromEx can auto-provision dashboards that correspond to its plugins, including:

    • Application Dashboard: shows version/commit info and dependency footprint

    • BEAM Dashboard: monitors memory usage, process and scheduler stats

    • Ecto Dashboard: visualizes slow queries and database interactions

    • Oban Dashboard: tracks job queues and failure rates

    • Phoenix Dashboard: breaks down HTTP timing by endpoint/method

    • Phoenix LiveView Dashboard: monitors callback duration and errors

    • Broadway Dashboard: shows batch size, throughput, and back-pressure behavior

    6. How does PromEx streamline observability configuration with Grafana?

    PromEx handles the heavy lifting—registering telemetry handlers, exposing metrics, and uploading dashboards—so developers can define observability behavior declaratively. In just three lines and a mix command, PromEx connects to Prometheus and Grafana, simplifying setup and reducing manual effort.

    7. Can PromEx automatically upload dashboards to Grafana?

    Yes: PromEx integrates with Grafana's API to auto-upload dashboards upon app startup and manages lifecycle annotations. This ensures dashboards track deployment timelines and stay in sync with your application’s versions and environments.

    8. Why is having automated dashboards beneficial for production observability?

    Automated dashboards reduce manual toil and streamline deployment pipelines. They enable version-controlled visuals, faster follow-up when app topology changes, and better alignment between application versions and observability tools.

    Krzysztof Janiec Elixir & React Developer
