# Reporting engine

The reporting engine is a key component for querying data from the Fyma platform. For request and response formats, see the API reference.
## Under the hood

Essentially, the engine is a facade for Elasticsearch; if you're familiar with Elasticsearch, you will understand the engine in no time. We handle multi-tenancy so that customers can access only their own data, and we expose only a subset of Elasticsearch filters and aggregations.
## Key concepts

Before we get started, let's establish some key concepts that will help you understand how to query your data.
In Elasticsearch, data is stored in indices. Think of them as tables. In indices, there are documents. Think of them as table rows.
To get something meaningful out of your data, we aggregate documents. An excerpt from the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html):

> An aggregation summarizes your data as metrics, statistics, or other analytics. Aggregations help you answer questions like:
>
> - What’s the average load time for my website?
> - Who are my most valuable customers based on transaction volume?
> - What would be considered a large file on my network?
> - How many products are in each product category?
## Types of data

There are two types of data in the Fyma platform: measurements and events. A few real-world parallels:

- A temperature sensor. We would check it at a fixed interval, say every minute, and store a measurement.
- A door chime. It notifies us every time someone opens the door. We would listen for these notifications and store an event for each one.
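The distinction can be sketched with two hypothetical documents (as Python dicts; the field names here are illustrative, not the platform's actual schema):

```python
# Hypothetical examples of the two data types; field names are illustrative.

# A measurement: sampled at a fixed interval, always carries a value.
temperature_measurement = {
    "sensorId": "sensor-1",
    "celsius": 21.5,
    "timestamp": "2023-01-03T07:52:00.000Z",
}

# An event: stored only when something actually happens.
door_chime_event = {
    "doorId": "front-door",
    "timestamp": "2023-01-03T07:53:12.000Z",
}
```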
### Parking lot stats

Parking cameras generate measurements at a fixed interval. We listen for these measurements, and once every camera has submitted one, we aggregate them and insert a new document into Elasticsearch. These documents roughly look like this:
Parking lot ID | Free | Occupied | Total | Timestamp |
---|---|---|---|---|
123 | 10 | 20 | 30 | 2023-01-03T07:52:47.130Z |
456 | 0 | 85 | 85 | 2023-01-03T07:52:47.130Z |
123 | 15 | 15 | 30 | 2023-01-03T07:54:18.820Z |
456 | 5 | 80 | 85 | 2023-01-03T07:54:18.820Z |
123 | 16 | 14 | 30 | 2023-01-03T07:54:41.448Z |
456 | 30 | 55 | 85 | 2023-01-03T07:54:41.448Z |
### Movement events

Movement events are generated by live cameras, which constantly process a video stream (as opposed to parking cameras, which sample the video stream every 15 seconds). We keep track of all the objects we detect across frames. Once an object has disappeared, we insert a document into Elasticsearch.
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | car | road A->B | - | 2023-01-03T07:52:47.130Z |
123 | person | crosswalk B->A; road B->A | kerbside (4 seconds) | 2023-01-03T07:54:18.820Z |
456 | person | entrance A->B | walkway (10 seconds) | 2023-01-03T07:54:41.448Z |
- Label shows what kind of object it was.
- Crossed lines shows which lines the object crossed (and in which direction).
- Polygons shows which polygons the object visited (and how much time it spent in each).
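Putting the columns together, a single movement-event document could be sketched as a Python dict, here based on the second table row (the field names are assumptions, not the exact Elasticsearch mapping):

```python
# A sketch of one movement-event document.
# Field names are assumptions, not the exact Elasticsearch mapping.
movement_event = {
    "cameraId": 123,
    "label": "person",
    "crossedLines": [
        {"line": "crosswalk", "direction": "B->A"},
        {"line": "road", "direction": "B->A"},
    ],
    "polygons": [
        {"polygon": "kerbside", "secondsSpent": 4},
    ],
    "timestamp": "2023-01-03T07:54:18.820Z",
}
```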
## Aggregating data
To aggregate data, we need to provide an aggregation pipeline. In this pipeline we can combine different aggregations.
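As a sketch, a pipeline combining two aggregations could be written in Elasticsearch's query DSL like this (shown as a Python dict; the reporting engine accepts only a subset of this DSL, and its own request format is described in the API reference):

```python
# A sketch of an aggregation pipeline in Elasticsearch's query DSL:
# an outer date histogram with a nested stats aggregation.
# The engine exposes only a subset of this; see the API reference.
pipeline = {
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "timestamp", "calendar_interval": "hour"},
            "aggs": {
                "occupancy": {"stats": {"field": "occupied"}},
            },
        }
    }
}
```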
### Date histogram aggregation

Groups data hourly, daily, monthly, etc. by timestamp.
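The idea can be sketched in Python, as a simplified stand-in for Elasticsearch's date histogram with an hourly interval:

```python
from collections import defaultdict
from datetime import datetime

def date_histogram_by_hour(documents):
    """Group documents into hourly buckets by their ISO timestamp.
    A simplified stand-in for Elasticsearch's date_histogram."""
    buckets = defaultdict(list)
    for doc in documents:
        ts = datetime.fromisoformat(doc["timestamp"].replace("Z", "+00:00"))
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(doc)
    return buckets

docs = [
    {"occupied": 20, "timestamp": "2023-01-03T07:52:47.130Z"},
    {"occupied": 15, "timestamp": "2023-01-03T07:54:18.820Z"},
    {"occupied": 14, "timestamp": "2023-01-03T08:01:00.000Z"},
]
buckets = date_histogram_by_hour(docs)
# Two buckets: 07:00 (two documents) and 08:00 (one document).
```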
### Terms aggregation
Groups data by term.
Let's take the data from movement events as an example and aggregate it by label. Then we would have two buckets - car and person:
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | car | road A->B | - | 2023-01-03T07:52:47.130Z |
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | person | crosswalk B->A; road B->A | kerbside (4 seconds) | 2023-01-03T07:54:18.820Z |
456 | person | entrance A->B | walkway (10 seconds) | 2023-01-03T07:54:41.448Z |
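The bucketing by label above can be sketched in Python, as a simplified stand-in for Elasticsearch's terms aggregation:

```python
from collections import defaultdict

def terms_aggregation(documents, field):
    """Group documents into buckets keyed by the value of `field`,
    a simplified stand-in for Elasticsearch's terms aggregation."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return buckets

events = [
    {"cameraId": 123, "label": "car"},
    {"cameraId": 123, "label": "person"},
    {"cameraId": 456, "label": "person"},
]
buckets = terms_aggregation(events, "label")
# Two buckets: "car" (1 document) and "person" (2 documents).
```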
We could also aggregate it by crossed lines. Then we would have three buckets - road, crosswalk and entrance:
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | car | road A->B | - | 2023-01-03T07:52:47.130Z |
123 | person | crosswalk B->A; road B->A | kerbside (4 seconds) | 2023-01-03T07:54:18.820Z |
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | person | crosswalk B->A; road B->A | kerbside (4 seconds) | 2023-01-03T07:54:18.820Z |
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
456 | person | entrance A->B | walkway (10 seconds) | 2023-01-03T07:54:41.448Z |
### Stats aggregation
Calculates min/max/average/sum over a numerical field.
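A minimal Python sketch of what such an aggregation computes over a numerical field:

```python
def stats_aggregation(documents, field):
    """Compute count/min/max/sum/avg over a numerical field, mirroring
    what Elasticsearch's stats aggregation returns for a bucket."""
    values = [doc[field] for doc in documents]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "avg": sum(values) / len(values),
    }

measurements = [{"occupied": 20}, {"occupied": 18}, {"occupied": 10}]
stats = stats_aggregation(measurements, "occupied")
# → {"count": 3, "min": 10, "max": 20, "sum": 48, "avg": 16.0}
```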
## Examples

### Parking lot example
Let's say we want to understand how many cars occupy our parking lot every hour. Let's assume that we have

- a parking lot
- a single camera in said parking lot
- a day's worth of data
To better illustrate, let's visualise some of the data in JSON:

```json
[
  {
    "parkingLotId": "<parking lot id>",
    "free": 10,
    "occupied": 20,
    "total": 30,
    "activations": [...],
    "timestamp": 0
  },
  {
    "parkingLotId": "<parking lot id>",
    "free": 12,
    "occupied": 18,
    "total": 30,
    "activations": [...],
    "timestamp": 600000 // 10 minutes
  },
  {
    "parkingLotId": "<parking lot id>",
    "free": 15,
    "occupied": 18,
    "total": 30,
    "activations": [...],
    "timestamp": 1200000 // 20 minutes
  }
]
```
Let's pretend that there's actually much more data: one document every 15 seconds, as it's supposed to be. Doing the math, we see that there are a lot of measurements:

24 hours * 60 minutes * 4 datapoints/minute = 5760 measurements

And that's for a single parking lot every day! Luckily, that's where aggregations come in. Coming back to our original question (how many cars occupy the parking lot every hour?) we can use the following aggregation pipeline:
Firstly, divide the measurements into hour-sized buckets. Given that we have a day's worth of data, that's 24 buckets, each containing 60 minutes * 4 datapoints/minute = 240 measurements. A measurement belongs to the bucket covering its hour, e.g. a measurement with a timestamp of 14:33:15 belongs to the bucket with a key of 14 (as in the 14th hour). To illustrate with JSON, it might look something like this (documents omitted for brevity):
```json
{
  "buckets": [
    { "key": 0, "documents": [] },
    { "key": 1, "documents": [] },
    { "key": 2, "documents": [] },
    /* ... */
    { "key": 21, "documents": [] },
    { "key": 22, "documents": [] },
    { "key": 23, "documents": [] }
  ]
}
```
Secondly, go over every bucket and calculate the average value of the `occupied` property shown above:
```json
{
  "buckets": [
    { "key": 0, "average": 15 },
    { "key": 1, "average": 20 },
    { "key": 2, "average": 22 },
    /* ... */
    { "key": 21, "average": 3 },
    { "key": 22, "average": 3 },
    { "key": 23, "average": 0 }
  ]
}
```
Finally, we have 24 datapoints instead of the original 5760.
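The whole two-step pipeline can be sketched in Python, assuming measurements with millisecond timestamps like in the JSON above:

```python
from collections import defaultdict

def hourly_average_occupancy(measurements):
    """Step 1: bucket measurements by the hour of their millisecond timestamp.
    Step 2: average the `occupied` value within each bucket."""
    buckets = defaultdict(list)
    for m in measurements:
        hour = m["timestamp"] // 3_600_000  # milliseconds per hour
        buckets[hour].append(m["occupied"])
    return [
        {"key": hour, "average": sum(values) / len(values)}
        for hour, values in sorted(buckets.items())
    ]

# One simulated day: a measurement every 15 seconds (5760 in total).
measurements = [
    {"occupied": i % 30, "timestamp": i * 15_000}
    for i in range(24 * 60 * 4)
]
result = hourly_average_occupancy(measurements)
# 24 datapoints instead of the original 5760.
```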