At Kudi, we believe that people of all classes should have access to financial services and be able to perform transactions seamlessly. We make this possible through our agency banking network.
What is the point of a business if you are not analyzing historical data?
We have tried to democratize both data access and data analysis. For analytics we use Google BigQuery: it is fully managed, fast, and scales seamlessly no matter what questions our queries throw at it.
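To make that concrete, here is a hypothetical sketch of what a thirty-day aggregation might look like through BigQuery's official Python client. The project, dataset, and column names are invented for this illustration, not our actual schema:

```python
from google.cloud import bigquery

# Hypothetical project id, for illustration only.
client = bigquery.Client(project="kudi-analytics")

query = """
    SELECT agent_id, SUM(amount) AS total_volume
    FROM `kudi-analytics.transactions.payments`  -- hypothetical table
    WHERE DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY agent_id
    ORDER BY total_volume DESC
    LIMIT 10
"""

# client.query() submits the job; result() blocks until the job
# completes and returns an iterator over the result rows.
for row in client.query(query).result():
    print(row.agent_id, row.total_volume)
```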
Unfortunately, not everyone writes queries, and not every question can be answered with a table of figures. There is a reason why data storytelling is a thing.
We use Tableau Desktop and Tableau Online to create and publish reports and dashboards. This works very well, right up until things start to slow down. The image below shows the architecture I am describing.
If you are wondering why it would slow down, it's because data can be a beast. Big Data is characterized by three properties:
Volume: the amount of data that gets produced and stored
Variety: the different types of data that get produced or changed
Velocity: the rate at which the data gets produced
We won't even go into the dimensionality problem: the various states that the data exists in. If you are wondering how we manage consistency, we will go into that in a future post.
For now, let's just say that if you tried to look at the most recent state of transactions data over a period of thirty days, you could very easily be dealing with 3 million records and anywhere between 3 GB and 9 GB of data, depending on the number of columns involved. All of that information needs to be sent to the dashboard to preserve granularity, which in turn is what makes it possible to analyze the data and extract insights from it.
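To put those numbers in perspective: 3 GB spread across 3 million records works out to roughly 1 KB per record, and a wider schema at around 3 KB per record puts the same thirty-day window at 9 GB.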
The professional version of Tableau Desktop is equipped to handle up to 65 million records, so the size of the data isn't the problem. However, if you are fetching that data from a server, you have a bottleneck on your hands.
This bottleneck isn't something that users are happy about, and the experience only degrades further as the data grows over time.
Even after publishing the report, it still takes a while to move all of that data between the two cloud providers.
You can speed things up by using extracts to cache the data on Tableau Online or Tableau Server (there is a sketch of what an extract actually is after the list below). That is exactly what we decided to do. This has a number of advantages:
Speed up access to the data
Reduce the cost of querying the data on GCP
Make the data available even when there is a network outage between the two clouds
Bonus advantage: embed access credentials in one place
Bonus advantage: make the extracted data sources available to users on Tableau Explorer licenses, either for data analysis using Ask Data (an NLP approach to data analysis) or for building their own reports in the web interface.
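An extract is, at its core, a file in Tableau's Hyper format. We won't go into how Tableau builds them internally, but as a rough illustration of what one contains, here is a minimal sketch that writes a tiny .hyper file with Tableau's tableauhyperapi package; the schema and the single row are invented for this example:

```python
from tableauhyperapi import (
    Connection, CreateMode, HyperProcess, Inserter,
    SqlType, TableDefinition, TableName, Telemetry,
)

# Hypothetical schema for a transactions extract.
table = TableDefinition(
    table_name=TableName("Extract", "Transactions"),
    columns=[
        TableDefinition.Column("transaction_id", SqlType.text()),
        TableDefinition.Column("agent_id", SqlType.text()),
        TableDefinition.Column("amount", SqlType.double()),
    ],
)

# Spin up a local Hyper process and write the rows into a .hyper file.
with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint,
                    database="transactions.hyper",
                    create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        connection.catalog.create_schema(table.table_name.schema_name)
        connection.catalog.create_table(table)
        with Inserter(connection, table) as inserter:
            inserter.add_rows([("txn-001", "agent-42", 1500.0)])
            inserter.execute()
```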
Tableau Data Extracts have made it possible for us to publish reports that load much faster. We schedule refreshes so that the data doesn't get too stale, and we hide unused fields so that the refresh queries only fetch the columns we are interested in.
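The refresh schedules themselves live in Tableau Online, but refreshes can also be triggered programmatically through the REST API. Here is a minimal sketch using the tableauserverclient package; the token, site, pod URL, and data source name are all placeholders, not our actual setup:

```python
import tableauserverclient as TSC

# Placeholder credentials and site; use a real personal access token in practice.
auth = TSC.PersonalAccessTokenAuth("refresh-bot", "TOKEN_VALUE", site_id="kudi")
server = TSC.Server("https://online.tableau.com", use_server_version=True)

with server.auth.sign_in(auth):
    # Find the published extract-backed data source by name.
    all_datasources, _ = server.datasources.get()
    target = next(ds for ds in all_datasources
                  if ds.name == "Transactions Extract")

    # Queue an extract refresh job on Tableau Online.
    job = server.datasources.refresh(target)
    print(f"Refresh job queued: {job.id}")
```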
If you find our use of technology to be of interest and would like to join our team, please take a look at our careers page here, or drop us a note at engineering@kudi.com. Also, please subscribe to receive updates from us on new posts and job openings.