Integrate Flower, Prometheus, and Grafana for Celery task management and visualization.
In our previous article, we got a basic idea of Celery as a tool for fulfilling async requests in Django. In this article, we will try to understand a monitoring tool for Celery (and, in the bigger picture, for complex applications).
Flower is a web-based tool for monitoring and administering Celery clusters.
Features of Flower:
- Real-time monitoring using Celery Events
- Remote Control
- Shutdown and restart worker instances
- Control worker pool size and autoscale settings
- View and modify the queues a worker instance consumes from
- Apply time and rate limits
- Revoke and terminate tasks
- Broker monitoring (In our case broker will be Redis)
- HTTP API
- Integration support for Prometheus
Installation:
Please follow the steps mentioned here: Installation — Flower 1.0.1 documentation
Docker-compose file changes:
version: x.y
services:
  ...
  flower:
    networks:
      - network_name
    build:
      context: .
    command: celery -A project_name flower --port=5555
    env_file: .env
    ports:
      - 5555:5555
    depends_on:
      - celery

networks:
  network_name:
    driver: bridge
Note: the network needs to be defined because the internal services read data from each other (e.g. Celery depends on Redis (the message broker), Flower depends on Celery, Prometheus on Flower, and so on).
Running it locally, you will be able to see the Flower dashboard at http://localhost:5555.
Also, you can check metrics by routing to base_url/metrics endpoint.
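To get a feel for what the /metrics endpoint returns, here is a small pure-Python sketch that parses a few lines of Prometheus's text exposition format. The sample payload below is illustrative only; check your own /metrics page for the exact metric names Flower exports.

```python
# Parse lines of the Prometheus text exposition format into
# (metric name, labels, value) tuples. The sample payload is made up
# for illustration; inspect your own /metrics endpoint for real names.
import re

SAMPLE = """\
# HELP flower_events_total Number of events
# TYPE flower_events_total counter
flower_events_total{task="add",type="task-succeeded",worker="celery@host"} 18.0
flower_events_total{task="add",type="task-failed",worker="celery@host"} 2.0
"""

LINE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+([0-9.eE+-]+)$')

def parse(text):
    samples = []
    for line in text.splitlines():
        # Skip HELP/TYPE comment lines and blanks.
        if line.startswith("#") or not line.strip():
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(kv.split("=", 1) for kv in raw_labels.split(",")) if raw_labels else {}
        labels = {k: v.strip('"') for k, v in labels.items()}
        samples.append((name, labels, float(value)))
    return samples

for name, labels, value in parse(SAMPLE):
    print(name, labels, value)
```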
Now the real work begins with the question of how these metrics can be useful to us. That's where Prometheus comes into the picture. Before that, let's first understand another basic concept.
Introduction to Time-Series Database
What is time-series data?
A sequence of data points collected over time intervals.
Time-series data gives us the ability to track changes over time; small changes that would otherwise be missed can give us more insight. For example, stock values over a period of time, or inflation-rate charts for a particular duration.
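As a toy illustration (the data here is made up), a time series is just an append-only sequence of (timestamp, value) points, and the "small changes" are the deltas between consecutive points:

```python
# A time series as an append-only sequence of (timestamp, value) points.
# Example data is invented: a temperature sampled once per minute.
from datetime import datetime, timedelta

start = datetime(2024, 1, 1, 12, 0)
series = [
    (start + timedelta(minutes=i), temp)
    for i, temp in enumerate([54.0, 54.2, 54.1, 57.9, 62.3])
]

# Tracking change over time: the delta between consecutive samples
# reveals the sudden jump that raw values alone might hide.
deltas = [
    (t2, round(v2 - v1, 2))
    for (t1, v1), (t2, v2) in zip(series, series[1:])
]
for ts, d in deltas:
    print(ts.strftime("%H:%M"), f"{d:+.2f}")
```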
Why time-series database?
A time-series database (TSDB) is a system designed to store and retrieve data records that are part of a "time series," i.e. a set of data points associated with timestamps. The timestamps provide critical context for how each data point relates to the others. Such data consists of insert events only (no updates).
By focusing on change, we can identify time-series datasets that we aren’t collecting today and identify opportunities to start collecting that data now, so that we can harness its value later.
Why can’t we just use a ‘normal’ database?
- Scaling issues as raw data can become a huge pile of chunks over time.
- Usability — time-series databases (TSDBs) provide better tooling and GUIs, which can make the data easier to understand.
Our Use Case?
- By storing time series data we can identify what, when and where things went wrong (Monitoring).
- Get a better understanding of the system and what's actually happening in it; technical and business decisions can then be driven by that understanding.
Now, let’s jump back to our original discussion.
What is Prometheus?
Prometheus is an open-source tool used for metrics-based monitoring and alerting. It is a popular and powerful solution for Kubernetes monitoring. Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.
Important notes:
- Stores metrics in memory and local disk in a custom, efficient format.
- Supports flexible query language — PromQL.
- Prometheus doesn’t require our applications to use CPU cycles pushing metrics to some centralized collector.
- A very powerful tool for collecting and querying metric data.
- It works by pulling (collecting from application services and hosts) real-time metrics from applications on a regular cadence by sending HTTP requests to their metrics endpoints. It also supports a push mechanism.
- For situations where pulling metrics is not feasible (e.g. short-lived jobs) Prometheus provides a Push gateway that allows applications to still push metric data if required.
- Prometheus is usually used alongside Grafana. Grafana is a visualization tool that pulls Prometheus metrics and makes it easier to monitor.
- Memory usage: Prometheus keeps all the currently used chunks in memory. In addition, it keeps as many most recently used chunks in memory as possible. You have to tell Prometheus how much memory it may use for this caching.
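The pull model described above can be sketched with only the Python standard library: the application exposes its current metric values at /metrics in the text exposition format, and Prometheus (simulated here by a plain HTTP GET) scrapes them on a schedule. The metric names are hypothetical.

```python
# Minimal sketch of Prometheus's pull model using only the stdlib.
# A real application would use a Prometheus client library instead.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical application counters.
METRICS = {"app_requests_total": 42, "app_errors_total": 3}

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Render each counter in the Prometheus text exposition format.
        body = "".join(
            f"# TYPE {name} counter\n{name} {value}\n"
            for name, value in METRICS.items()
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One "scrape", as Prometheus would perform every scrape_interval:
url = f"http://127.0.0.1:{server.server_port}/metrics"
scraped = urllib.request.urlopen(url).read().decode()
print(scraped)
server.shutdown()
```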
Prometheus has a sophisticated local storage subsystem. For indexes, it uses LevelDB. For the bulk sample data, it has its own custom storage layer, which organizes sample data in chunks of constant size (1024 bytes payload). These chunks are then stored on disk in one file per time series.
Components
The Prometheus ecosystem consists of multiple components, many of which are optional:
- the main Prometheus server which scrapes and stores time series data
- client libraries for instrumenting application code
- a push gateway for supporting short-lived jobs
- special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
- an alert manager to handle alerts
- various support tools
Architecture
Installation
Add a prometheus.yml file to your project directory.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: flower
    static_configs:
      - targets: ['flower:5555'] # if running Flower via Docker; otherwise use localhost:5555
There are three blocks of configuration in the example configuration file: global, rule_files, and scrape_configs.

The global block controls the Prometheus server's global configuration. The scrape_interval setting controls how often Prometheus will scrape targets. The evaluation_interval option controls how often Prometheus will evaluate rules; Prometheus uses rules to create new time series and to generate alerts.

The scrape_configs block controls what resources Prometheus monitors. Since Prometheus also exposes data about itself as an HTTP endpoint, it can scrape and monitor its own health. Prometheus expects metrics to be available on targets on a path of /metrics, so the default job above scrapes the URL http://localhost:9090/metrics.
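The example configuration above omits a rule_files block; a hypothetical one would look like the following (the file name rules.yml and the metric name are illustrative — adjust them to the metrics your /metrics page actually exposes):

```yaml
# prometheus.yml (addition)
rule_files:
  - rules.yml
```

```yaml
# rules.yml — a recording rule that creates a new time series,
# evaluated every evaluation_interval
groups:
  - name: flower
    rules:
      - record: job:flower_events:rate5m
        expr: rate(flower_events_total[5m])
```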
Add the changes below to your docker-compose.yml file:
prometheus:
  networks: # as mentioned above, this component is required
    - inref
  image: prom/prometheus:latest
  container_name: prometheus
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml # absolute path will be required
    - ~/inref-volume/prometheus_data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--web.console.libraries=/etc/prometheus/console_libraries'
    - '--web.console.templates=/etc/prometheus/consoles'
    - '--web.enable-lifecycle'
  restart: unless-stopped
  ports:
    - 9090:9090
  labels:
    org.label-schema.group: "monitoring"
Note: an absolute path will be needed under the volume section.
How does Prometheus work?
Collects metrics from monitored targets by scraping metrics HTTP endpoints. Allows multiple Prometheus servers to scrape services independently. Each Prometheus server is standalone, not depending on network storage or other remote services.
Data writes happen in bulk/batch at periodic intervals.
When series are deleted via the API, deletion records are stored in separate tombstone files.
What does it not do?
- Logging or tracing.
- Automatic anomaly detection.
- Long-term durable storage.
Prometheus values reliability: you can always view what statistics are available about your system, even under failure conditions. However, if you need 100% accuracy, such as for per-request billing, Prometheus is not a good choice.
Types of metrics:
The type of each exposed metric can be found in its # TYPE line in the /metrics output.
Metric types:
- Counter: any value that only increases, e.g. count of requests, errors, API calls, etc. It resets to zero on restart.
- Gauge: a value that varies bidirectionally (it can be incremented and decremented).
- Histogram: samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also records the count of observations and their sum, which allows the calculation of averages and percentiles. When operating on buckets, remember that the histogram is cumulative.
- Summary: samples observations and calculates configurable quantiles over a sliding time window.
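To make these types concrete, here is a small pure-Python sketch of their semantics. This mimics, but is not, the official prometheus_client API; the Summary type is omitted since sliding-window quantiles need considerably more machinery.

```python
# Toy implementations of Prometheus metric semantics — an illustration,
# not the real client library.
import bisect

class Counter:
    """Monotonically increasing; resets to zero when the process restarts."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount

class Gauge:
    """A value that can move in both directions, e.g. queue length."""
    def __init__(self):
        self.value = 0.0
    def set(self, v): self.value = v
    def inc(self, amount=1.0): self.value += amount
    def dec(self, amount=1.0): self.value -= amount

class Histogram:
    """Counts observations into cumulative buckets, plus sum and count."""
    def __init__(self, buckets=(0.1, 0.5, 1.0, 5.0)):
        self.uppers = list(buckets)
        self.bucket_counts = [0] * len(self.uppers)
        self.sum = 0.0
        self.count = 0
    def observe(self, v):
        self.sum += v
        self.count += 1
        # Cumulative: every bucket whose upper bound >= v is incremented.
        for i in range(bisect.bisect_left(self.uppers, v), len(self.uppers)):
            self.bucket_counts[i] += 1

requests = Counter()
for _ in range(3):
    requests.inc()

in_progress = Gauge()
in_progress.inc(); in_progress.inc(); in_progress.dec()

latency = Histogram()
for seconds in (0.05, 0.3, 0.3, 2.0):
    latency.observe(seconds)

print(requests.value)              # 3.0
print(in_progress.value)           # 1.0
print(latency.bucket_counts)       # [1, 3, 3, 4] — cumulative buckets
print(latency.sum, latency.count)  # average = sum / count
```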
Setting up Grafana For Prometheus:
Grafana open source software enables you to query, visualize, alert on, and explore your metrics, logs, and traces wherever they are stored. Grafana OSS provides you with tools to turn your time-series database (TSDB) data into insightful graphs and visualizations.
Install Grafana using docker-compose by adding the section below:
grafana:
  networks:
    - inref
  image: grafana/grafana:6.7.2
  container_name: grafana
  restart: unless-stopped
  ports:
    - 3000:3000
  labels:
    org.label-schema.group: "monitoring"
After running the containers, you can visit http://localhost:3000. The default credentials are admin/admin.
Add a Prometheus data source:
- Click on the Grafana logo to open the sidebar.
- Click on “Data Sources” in the sidebar.
- Choose “Add New”.
- Select “Prometheus” as the data source.
- Set the Prometheus server URL. Since Grafana and Prometheus share a Docker network here, use http://prometheus:9090/ (use http://localhost:9090/ if Grafana runs directly on the host).
- Click “Add” to test the connection and to save the new data source.
The dashboard is now integrated, and you can build queries in the UI to analyse different graphs. Query documentation can be found here.
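As a starting point, a few example PromQL queries against the Flower metrics (the metric names are assumed — verify them against your own /metrics output):

```
rate(flower_events_total{type="task-succeeded"}[5m])    # task throughput per second
flower_events_total{type="task-failed"}                 # cumulative task failures
histogram_quantile(0.95, rate(flower_task_runtime_seconds_bucket[5m]))  # p95 task runtime
```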