# Reporting engine

The reporting engine is a key component for querying data from the Fyma platform. For request and response formats, see the API reference.
## Under the hood

Essentially, the engine is a facade for Elasticsearch; if you're familiar with Elasticsearch, you will understand the engine in no time. We handle multi-tenancy so that customers can access only their own data, and we expose only a subset of Elasticsearch filters and aggregations.
## Key concepts

Before we get started, let's establish some key concepts that will help you understand how to query your data.
In Elasticsearch, data is stored in indices. Think of them as tables. In indices, there are documents. Think of them as table rows.
To get something meaningful out of your data, we aggregate documents. An excerpt from the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html):

> An aggregation summarizes your data as metrics, statistics, or other analytics. Aggregations help you answer questions like:
>
> - What’s the average load time for my website?
> - Who are my most valuable customers based on transaction volume?
> - What would be considered a large file on my network?
> - How many products are in each product category?
## Types of data

There are two types of data in the Fyma platform: measurements and events. A few real-world parallels:

- A temperature sensor. We would check it at a fixed interval, say every minute, and store a measurement.
- A door chime. It notifies us every time someone opens the door. We would listen for these notifications and store an event for each one.
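The distinction can be sketched with two hypothetical documents (as Python dicts; the field names here are illustrative, not the platform's actual schema):

```python
# Hypothetical examples of the two data types; field names are illustrative.

# A measurement: sampled at a fixed interval, always carries a value.
temperature_measurement = {
    "sensorId": "sensor-1",
    "celsius": 21.5,
    "timestamp": "2023-01-03T07:52:00.000Z",
}

# An event: stored only when something actually happens.
door_chime_event = {
    "doorId": "front-door",
    "timestamp": "2023-01-03T07:53:12.000Z",
}
```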
### Parking lot stats

Parking cameras generate measurements at a fixed interval. We listen for these measurements, and once every camera has submitted one, we aggregate them and insert a new document into Elasticsearch. These documents roughly look like this:
Parking lot ID | Free | Occupied | Total | Timestamp |
---|---|---|---|---|
123 | 10 | 20 | 30 | 2023-01-03T07:52:47.130Z |
456 | 0 | 85 | 85 | 2023-01-03T07:52:47.130Z |
123 | 15 | 15 | 30 | 2023-01-03T07:54:18.820Z |
456 | 5 | 80 | 85 | 2023-01-03T07:54:18.820Z |
123 | 16 | 14 | 30 | 2023-01-03T07:54:41.448Z |
456 | 30 | 55 | 85 | 2023-01-03T07:54:41.448Z |
### Movement events

Movement events are generated by live cameras, which constantly process a video stream (as opposed to parking cameras, which sample the video stream every 15 seconds). We keep track of all the objects we detect across frames. Once an object has disappeared, we insert a document into Elasticsearch.
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | car | road A->B | - | 2023-01-03T07:52:47.130Z |
123 | person | crosswalk B->A; road B->A | kerbside (4 seconds) | 2023-01-03T07:54:18.820Z |
456 | person | entrance A->B | walkway (10 seconds) | 2023-01-03T07:54:41.448Z |
- Label shows what kind of object it was.
- Crossed lines shows which lines the object crossed (and in which direction).
- Polygons shows which polygons the object visited (and how much time it spent in each).
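Putting the columns together, a single movement-event document could be sketched as a Python dict, here based on the second table row (the field names are assumptions, not the exact Elasticsearch mapping):

```python
# A sketch of one movement-event document.
# Field names are assumptions, not the exact Elasticsearch mapping.
movement_event = {
    "cameraId": 123,
    "label": "person",
    "crossedLines": [
        {"line": "crosswalk", "direction": "B->A"},
        {"line": "road", "direction": "B->A"},
    ],
    "polygons": [
        {"polygon": "kerbside", "secondsSpent": 4},
    ],
    "timestamp": "2023-01-03T07:54:18.820Z",
}
```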
## Aggregating data
To aggregate data, we need to provide an aggregation pipeline. In this pipeline we can combine different aggregations.
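As a sketch, a pipeline combining two aggregations could be written in Elasticsearch's query DSL like this (shown as a Python dict; the reporting engine accepts only a subset of this DSL, and its own request format is described in the API reference):

```python
# A sketch of an aggregation pipeline in Elasticsearch's query DSL:
# an outer date histogram with a nested stats aggregation.
# The engine exposes only a subset of this; see the API reference.
pipeline = {
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "timestamp", "calendar_interval": "hour"},
            "aggs": {
                "occupancy": {"stats": {"field": "occupied"}},
            },
        }
    }
}
```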
### Date histogram aggregation

Groups data hourly, daily, monthly, etc. by timestamp.
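The idea can be sketched in Python, as a simplified stand-in for Elasticsearch's date histogram with an hourly interval:

```python
from collections import defaultdict
from datetime import datetime

def date_histogram_by_hour(documents):
    """Group documents into hourly buckets by their ISO timestamp.
    A simplified stand-in for Elasticsearch's date_histogram."""
    buckets = defaultdict(list)
    for doc in documents:
        ts = datetime.fromisoformat(doc["timestamp"].replace("Z", "+00:00"))
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(doc)
    return buckets

docs = [
    {"occupied": 20, "timestamp": "2023-01-03T07:52:47.130Z"},
    {"occupied": 15, "timestamp": "2023-01-03T07:54:18.820Z"},
    {"occupied": 14, "timestamp": "2023-01-03T08:01:00.000Z"},
]
buckets = date_histogram_by_hour(docs)
# Two buckets: 07:00 (two documents) and 08:00 (one document).
```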
### Terms aggregation
Groups data by term.
Let's take the data from movement events as an example and aggregate it by label. Then we would have two buckets - car and person:
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | car | road A->B | - | 2023-01-03T07:52:47.130Z |
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | person | crosswalk B->A; road B->A | kerbside (4 seconds) | 2023-01-03T07:54:18.820Z |
456 | person | entrance A->B | walkway (10 seconds) | 2023-01-03T07:54:41.448Z |
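The bucketing by label above can be sketched in Python, as a simplified stand-in for Elasticsearch's terms aggregation:

```python
from collections import defaultdict

def terms_aggregation(documents, field):
    """Group documents into buckets keyed by the value of `field`,
    a simplified stand-in for Elasticsearch's terms aggregation."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return buckets

events = [
    {"cameraId": 123, "label": "car"},
    {"cameraId": 123, "label": "person"},
    {"cameraId": 456, "label": "person"},
]
buckets = terms_aggregation(events, "label")
# Two buckets: "car" (1 document) and "person" (2 documents).
```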
We could also aggregate it by crossed lines. Then we would have three buckets - road, crosswalk and entrance:
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | car | road A->B | - | 2023-01-03T07:52:47.130Z |
123 | person | crosswalk B->A; road B->A | kerbside (4 seconds) | 2023-01-03T07:54:18.820Z |
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
123 | person | crosswalk B->A; road B->A | kerbside (4 seconds) | 2023-01-03T07:54:18.820Z |
Camera ID | Label | Crossed lines | Polygons | Timestamp |
---|---|---|---|---|
456 | person | entrance A->B | walkway (10 seconds) | 2023-01-03T07:54:41.448Z |
### Stats aggregation
Calculates min/max/average/sum over a numerical field.
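A minimal Python sketch of what such an aggregation computes over a numerical field:

```python
def stats_aggregation(documents, field):
    """Compute count/min/max/sum/avg over a numerical field, mirroring
    what Elasticsearch's stats aggregation returns for a bucket."""
    values = [doc[field] for doc in documents]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "avg": sum(values) / len(values),
    }

measurements = [{"occupied": 20}, {"occupied": 18}, {"occupied": 10}]
stats = stats_aggregation(measurements, "occupied")
# → {"count": 3, "min": 10, "max": 20, "sum": 48, "avg": 16.0}
```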
## Examples

### Parking lot example
Let's say we want to understand how many cars occupy our parking lot every hour. Let's assume that we have

- a parking lot
- a single camera in said parking lot
- a day's worth of data
To better illustrate, let's visualise some of the data in JSON:

```json
[
  {
    "parkingLotId": "<parking lot id>",
    "free": 10,
    "occupied": 20,
    "total": 30,
    "activations": [...],
    "timestamp": 0
  },
  {
    "parkingLotId": "<parking lot id>",
    "free": 12,
    "occupied": 18,
    "total": 30,
    "activations": [...],
    "timestamp": 600000 // 10 minutes
  },
  {
    "parkingLotId": "<parking lot id>",
    "free": 15,
    "occupied": 18,
    "total": 30,
    "activations": [...],
    "timestamp": 1200000 // 20 minutes
  }
]
```
Let's pretend that there's actually much more data: one document every 15 seconds, as it's supposed to be. Doing the math, we see that there are a lot of measurements:

24 hours * 60 minutes * 4 datapoints/minute = 5760 measurements

And that's for a single parking lot every day! Luckily, that's where aggregations come in. Coming back to our original question (how many cars occupy the parking lot every hour?) we can use the following aggregation pipeline:
Firstly, divide the measurements into hour-sized buckets. Given that we have a day's worth of data, that's 24 buckets, each containing 60 minutes * 4 datapoints/minute = 240 measurements. A measurement belongs to the bucket covering its hour, e.g. a measurement with a timestamp of 14:33:15 belongs to the bucket with a key of 14 (as in the 14th hour). To illustrate with JSON, it might look something like this (documents omitted for brevity):
```json
{
  "buckets": [
    { "key": 0, "documents": [] },
    { "key": 1, "documents": [] },
    { "key": 2, "documents": [] },
    /* ... */
    { "key": 21, "documents": [] },
    { "key": 22, "documents": [] },
    { "key": 23, "documents": [] }
  ]
}
```
Secondly, go over every bucket and calculate the average value of the `occupied` property shown above:
```json
{
  "buckets": [
    { "key": 0, "average": 15 },
    { "key": 1, "average": 20 },
    { "key": 2, "average": 22 },
    /* ... */
    { "key": 21, "average": 3 },
    { "key": 22, "average": 3 },
    { "key": 23, "average": 0 }
  ]
}
```
Finally, we have 24 datapoints instead of the original 5760.
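The whole two-step pipeline can be sketched in Python, assuming measurements with millisecond timestamps like in the JSON above:

```python
from collections import defaultdict

def hourly_average_occupancy(measurements):
    """Step 1: bucket measurements by the hour of their millisecond timestamp.
    Step 2: average the `occupied` value within each bucket."""
    buckets = defaultdict(list)
    for m in measurements:
        hour = m["timestamp"] // 3_600_000  # milliseconds per hour
        buckets[hour].append(m["occupied"])
    return [
        {"key": hour, "average": sum(values) / len(values)}
        for hour, values in sorted(buckets.items())
    ]

# One simulated day: a measurement every 15 seconds (5760 in total).
measurements = [
    {"occupied": i % 30, "timestamp": i * 15_000}
    for i in range(24 * 60 * 4)
]
result = hourly_average_occupancy(measurements)
# 24 datapoints instead of the original 5760.
```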