Integrate Flower, Prometheus, and Grafana for Celery task management and visualization.
In our previous article, we got a basic idea of Celery as a tool for fulfilling async requests in Django. In this article, we will try to understand a monitoring tool for Celery (and, in the bigger picture, for complex applications).
Flower is a web-based tool for monitoring and administering Celery clusters.
Features of Flower:
- Real-time monitoring using Celery Events
- Remote Control
- Shutdown and restart worker instances
- Control worker pool size and autoscale settings
- View and modify the queues a worker instance consumes from
- Apply time and rate limits
- Revoke and terminate tasks
- Broker monitoring (In our case broker will be Redis)
- HTTP API
- Integration support for Prometheus
Installation:
Please follow the steps mentioned here: Installation — Flower 1.0.1 documentation
Docker-compose file changes:
version: x.y
services:
  ...
  flower:
    networks:
      - network_name
    build:
      context: .
    command: celery -A project_name flower --port=5555
    env_file: .env
    ports:
      - 5555:5555
    depends_on:
      - celery

networks:
  network_name:
    driver: bridge
Note: the network needs to be defined because the internal services read data from each other (e.g. Celery depends on Redis (the message broker), Flower depends on Celery, Prometheus on Flower, and so on).
Running it locally, you will be able to see the Flower dashboard at http://localhost:5555.
Also, you can check metrics by routing to base_url/metrics endpoint.
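To get a feel for what the /metrics endpoint returns, here is a small pure-Python sketch that parses a few lines of Prometheus's text exposition format. The sample payload below is illustrative only; check your own /metrics page for the exact metric names Flower exports.

```python
# Parse lines of the Prometheus text exposition format into
# (metric name, labels, value) tuples. The sample payload is made up
# for illustration; inspect your own /metrics endpoint for real names.
import re

SAMPLE = """\
# HELP flower_events_total Number of events
# TYPE flower_events_total counter
flower_events_total{task="add",type="task-succeeded",worker="celery@host"} 18.0
flower_events_total{task="add",type="task-failed",worker="celery@host"} 2.0
"""

LINE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+([0-9.eE+-]+)$')

def parse(text):
    samples = []
    for line in text.splitlines():
        # Skip HELP/TYPE comment lines and blanks.
        if line.startswith("#") or not line.strip():
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(kv.split("=", 1) for kv in raw_labels.split(",")) if raw_labels else {}
        labels = {k: v.strip('"') for k, v in labels.items()}
        samples.append((name, labels, float(value)))
    return samples

for name, labels, value in parse(SAMPLE):
    print(name, labels, value)
```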
Now the real work begins with the question of how these metrics can be useful to us. That's where Prometheus comes into the picture. Before that, let's first understand another basic concept.
Introduction to Time-Series Database
What is time-series data?
A sequence of data points collected over time intervals.
Time-series data gives us the ability to track changes over time; small changes that would otherwise be missed can give us more insight. For example, stock values over a period of time, or inflation-rate charts for a particular duration.
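As a toy illustration (the data here is made up), a time series is just an append-only sequence of (timestamp, value) points, and the "small changes" are the deltas between consecutive points:

```python
# A time series as an append-only sequence of (timestamp, value) points.
# Example data is invented: a temperature sampled once per minute.
from datetime import datetime, timedelta

start = datetime(2024, 1, 1, 12, 0)
series = [
    (start + timedelta(minutes=i), temp)
    for i, temp in enumerate([54.0, 54.2, 54.1, 57.9, 62.3])
]

# Tracking change over time: the delta between consecutive samples
# reveals the sudden jump that raw values alone might hide.
deltas = [
    (t2, round(v2 - v1, 2))
    for (t1, v1), (t2, v2) in zip(series, series[1:])
]
for ts, d in deltas:
    print(ts.strftime("%H:%M"), f"{d:+.2f}")
```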
Why time-series database?
A time-series database (TSDB) is a system designed to store and retrieve data records that are part of a "time series," i.e. a set of data points associated with timestamps. The timestamps provide critical context for how each data point relates to the others. Such data consists of insert events only (no updates).
By focusing on change, we can identify time-series datasets that we aren’t collecting today and identify opportunities to start collecting that data now, so that we can harness its value later.
Why can’t we just use a ‘normal’ database?
- Scaling issues as raw data can become a huge pile of chunks over time.
- Usability — time-series databases (TSDBs) provide better tooling and GUIs, which can make the data easier to understand.
Our Use Case?
- By storing time series data we can identify what, when and where things went wrong (Monitoring).
- Get a better understanding of the system and what's actually happening in it; technical and business decisions can then be driven by that understanding.
Now, let’s jump back to our original discussion.
What is Prometheus?
Prometheus is an open-source tool used for metrics-based monitoring and alerting. It is a popular and powerful solution for Kubernetes monitoring. Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.
Important notes:
- Stores metrics in memory and local disk in a custom, efficient format.
- Supports flexible query language — PromQL.
- Prometheus doesn’t require our applications to use CPU cycles pushing metrics to some centralized collector.
- A very powerful tool for collecting and querying metric data.
- It works by pulling (collecting from application services and hosts) real-time metrics from applications on a regular cadence by sending HTTP requests to their metrics endpoints. It also supports a push mechanism.
- For situations where pulling metrics is not feasible (e.g. short-lived jobs) Prometheus provides a Push gateway that allows applications to still push metric data if required.
- Prometheus is usually used alongside Grafana. Grafana is a visualization tool that pulls Prometheus metrics and makes it easier to monitor.
- Memory usage: Prometheus keeps all the currently used chunks in memory. In addition, it keeps as many most recently used chunks in memory as possible. You have to tell Prometheus how much memory it may use for this caching.
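The pull model described above can be sketched with only the Python standard library: the application exposes its current metric values at /metrics in the text exposition format, and Prometheus (simulated here by a plain HTTP GET) scrapes them on a schedule. The metric names are hypothetical.

```python
# Minimal sketch of Prometheus's pull model using only the stdlib.
# A real application would use a Prometheus client library instead.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical application counters.
METRICS = {"app_requests_total": 42, "app_errors_total": 3}

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Render each counter in the Prometheus text exposition format.
        body = "".join(
            f"# TYPE {name} counter\n{name} {value}\n"
            for name, value in METRICS.items()
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One "scrape", as Prometheus would perform every scrape_interval:
url = f"http://127.0.0.1:{server.server_port}/metrics"
scraped = urllib.request.urlopen(url).read().decode()
print(scraped)
server.shutdown()
```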
Prometheus has a sophisticated local storage subsystem. For indexes, it uses LevelDB. For the bulk sample data, it has its own custom storage layer, which organizes sample data in chunks of constant size (1024 bytes payload). These chunks are then stored on disk in one file per time series.
Components
The Prometheus ecosystem consists of multiple components, many of which are optional:
- the main Prometheus server which scrapes and stores time series data
- client libraries for instrumenting application code
- a push gateway for supporting short-lived jobs
- special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
- an alert manager to handle alerts
- various support tools
Architecture
Installation
Add a prometheus.yml file to your project directory.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: flower
    static_configs:
      - targets: ['flower:5555'] # if running Flower via Docker; otherwise use localhost:5555
There are three blocks of configuration in the example configuration file: global, rule_files, and scrape_configs.

The global block controls the Prometheus server's global configuration. The scrape_interval setting controls how often Prometheus will scrape targets. The evaluation_interval option controls how often Prometheus will evaluate rules; Prometheus uses rules to create new time series and to generate alerts.

The scrape_configs block controls what resources Prometheus monitors. Since Prometheus also exposes data about itself as an HTTP endpoint, it can scrape and monitor its own health. Prometheus expects metrics to be available on targets on a path of /metrics, so the default job above scrapes the URL http://localhost:9090/metrics.
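The example configuration above omits a rule_files block; a hypothetical one would look like the following (the file name rules.yml and the metric name are illustrative — adjust them to the metrics your /metrics page actually exposes):

```yaml
# prometheus.yml (addition)
rule_files:
  - rules.yml
```

```yaml
# rules.yml — a recording rule that creates a new time series,
# evaluated every evaluation_interval
groups:
  - name: flower
    rules:
      - record: job:flower_events:rate5m
        expr: rate(flower_events_total[5m])
```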
Add the changes below to your docker-compose.yml file:
prometheus:
  networks: # as mentioned above, this component is required
    - inref
  image: prom/prometheus:latest
  container_name: prometheus
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml # absolute path will be required
    - ~/inref-volume/prometheus_data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--web.console.libraries=/etc/prometheus/console_libraries'
    - '--web.console.templates=/etc/prometheus/consoles'
    - '--web.enable-lifecycle'
  restart: unless-stopped
  ports:
    - 9090:9090
  labels:
    org.label-schema.group: "monitoring"
Note: an absolute path will be needed under the volume section.
How does Prometheus work?
Collects metrics from monitored targets by scraping metrics HTTP endpoints. Allows multiple Prometheus servers to scrape services independently. Each Prometheus server is standalone, not depending on network storage or other remote services.
Data writes happen in bulk/batch at periodic intervals.
When series are deleted via the API, deletion records are stored in separate tombstone files.
What does it not do?
- Logging or tracing.
- Automatic anomaly detection.
- Long-term durable storage.
Prometheus values reliability: you can always view what statistics are available about your system, even under failure conditions. However, if you need 100% accuracy, such as for per-request billing, Prometheus is not a good choice.
Types of metrics:
The type of each exposed metric can be found in its # TYPE line in the /metrics output.
Metric types:
- Counter: any value that only increases, e.g. count of requests, errors, API calls, etc. It resets to zero on restart.
- Gauge: a value that varies bidirectionally (it can be incremented and decremented).
- Histogram: samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also records the count of observations and their sum, which allows the calculation of averages and percentiles. When operating on buckets, remember that the histogram is cumulative.
- Summary: samples observations and calculates configurable quantiles over a sliding time window.
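To make these types concrete, here is a small pure-Python sketch of their semantics. This mimics, but is not, the official prometheus_client API; the Summary type is omitted since sliding-window quantiles need considerably more machinery.

```python
# Toy implementations of Prometheus metric semantics — an illustration,
# not the real client library.
import bisect

class Counter:
    """Monotonically increasing; resets to zero when the process restarts."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount

class Gauge:
    """A value that can move in both directions, e.g. queue length."""
    def __init__(self):
        self.value = 0.0
    def set(self, v): self.value = v
    def inc(self, amount=1.0): self.value += amount
    def dec(self, amount=1.0): self.value -= amount

class Histogram:
    """Counts observations into cumulative buckets, plus sum and count."""
    def __init__(self, buckets=(0.1, 0.5, 1.0, 5.0)):
        self.uppers = list(buckets)
        self.bucket_counts = [0] * len(self.uppers)
        self.sum = 0.0
        self.count = 0
    def observe(self, v):
        self.sum += v
        self.count += 1
        # Cumulative: every bucket whose upper bound >= v is incremented.
        for i in range(bisect.bisect_left(self.uppers, v), len(self.uppers)):
            self.bucket_counts[i] += 1

requests = Counter()
for _ in range(3):
    requests.inc()

in_progress = Gauge()
in_progress.inc(); in_progress.inc(); in_progress.dec()

latency = Histogram()
for seconds in (0.05, 0.3, 0.3, 2.0):
    latency.observe(seconds)

print(requests.value)              # 3.0
print(in_progress.value)           # 1.0
print(latency.bucket_counts)       # [1, 3, 3, 4] — cumulative buckets
print(latency.sum, latency.count)  # average = sum / count
```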
Setting up Grafana For Prometheus:
Grafana open source software enables you to query, visualize, alert on, and explore your metrics, logs, and traces wherever they are stored. Grafana OSS provides you with tools to turn your time-series database (TSDB) data into insightful graphs and visualizations.
Install Grafana using docker-compose by adding the section below:
grafana:
  networks:
    - inref
  image: grafana/grafana:6.7.2
  container_name: grafana
  restart: unless-stopped
  ports:
    - 3000:3000
  labels:
    org.label-schema.group: "monitoring"
After running the containers, you can visit http://localhost:3000. The default credentials are admin/admin.
Add a Prometheus data source:
- Click on the Grafana logo to open the sidebar.
- Click on “Data Sources” in the sidebar.
- Choose “Add New”.
- Select “Prometheus” as the data source.
- Set the Prometheus server URL. Since Grafana and Prometheus share a Docker network here, use http://prometheus:9090/ (use http://localhost:9090/ if Grafana runs directly on the host).
- Click “Add” to test the connection and to save the new data source.
The dashboard is now integrated, and you can build queries in the UI to analyse different graphs. Query documentation can be found here.
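As a starting point, a few example PromQL queries against the Flower metrics (the metric names are assumed — verify them against your own /metrics output):

```
rate(flower_events_total{type="task-succeeded"}[5m])    # task throughput per second
flower_events_total{type="task-failed"}                 # cumulative task failures
histogram_quantile(0.95, rate(flower_task_runtime_seconds_bucket[5m]))  # p95 task runtime
```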