Data Analytics: OLAP vs OLTP
Mar 7, 2024
In this article, you will learn about the difference between OLAP and OLTP for data analyst.
OLAP and OLTP are two different approaches to handling data.
OLAP
Well, OLAP stands for Online Analytical Processing and used for business intelligence and data analysis.
More simply put, it is used to cost and time efficiently analyze huge data from multiple sources, perform complex analyses, and generate reports.
Examples of OLAP data warehouses:
Amazon Redshift, Snowflake, Microsoft Azure Synapse Analytics, and Google BigQuery.
OLTP
OLTP stands for Online Transaction Processing.
OLTP is used for transaction processing in real-time.
In simple terms, OLTP is a production database (connected directly to the application/website), which means it manages things like user orders, subscriptions, inventory, and so on.
It needed to be fast [and furious] to write and read, get and show data to the user.
Examples of OLTP databases:
Oracle Database, Microsoft SQL Server, MySQL, and PostgreSQL.
A lot of them are open-source.
OLTP vs OLAP for Data Analyst
OLAP is not better than OLTP, and OLTP is not better than OLAP.
They are just tools that are used for separate tasks.
AND, mostly, they work together.
BI, Data Engineers, or even Software Developers are constantly loading data from OLTP DB to OLAP DWH to make analytics there possible.
Simply, why not do analytics in OLTP?
Considering the fact that a lot of OLTP DBs are open-source this is the right question.
OLTP is connected to the production version of your product.
It has to be as fast as possible.
Storing history data there is no way to achieve relevant speed.
OLTP scales vertically
You have only one instance.
And it means that to improve the work of your transactional DB, you have to add additional memory, processor cores, and related to that instance.
Then, imagine a case when someone writes "select * from ...".
It will take all the available computing slots and put other production queries into the queue.
Which will make users wait until the server responds.
Why use OLAP for analytics?
OLAP is designed for analytics.
In most cases, OLAP is NoSQL and column-based.
BTW: NoSQL means "Not only SQL", not "No SQL".
OLAP scales horizontally
This means that it adds simply more instances to fulfill the needs of the user's query.
And, what is more important, it does it without any control, automatically (serverless).
This means you do not have to manage the number of instances, OLAP will analyze the query and allocate the needed amount.
Great support of BI tools, analytics functions, storing different types of data, and a lot of others
But, OLAP is expensive.
Really expensive.
So, it is better to understand how to write optimised queries.
Why it is important to Data Analysts?
Data analysts, most of time use an OLAP Datawarehouse.
Which means querying huge volumes of data.
The cost of the query is tied to the amount of scanned data. More data scanned -> more money your company needs to pay.
Example for BigQuery:
There are no real entity like tables in BigQuery as we get used in OLTP DBs, only columns.
UI simply shows us tables as we are used to them.
But, in fact, BigQuery stores columns separately.
(Columns are connected to each other).
This means that if you write “select *”, it will scan every column in the table.
However, if you select only a few columns, it will scan only them and you will pay only for them.
Imagine having 30 columns in a table with 100M of rows.
Querying 3 of them and querying 30 of them is a huge difference in this case.
It is a very beneficial skill for both analysts and business to know how to optimise for costs SQL queries.
Cheers,
Stan.