PostgreSQL DBMS: How a 1980’s Database is Better than a New Key-Value Store

CATEGORIES
Aug 2018
comment-big-icon.png   0
Share
Author : Pete Peranzo
postgresql vs mysql, scalability & benefits

Voiced by Amazon Polly

Data is the backbone of many of our work processes today. You store and access data about your clients, your products, your team. All that information has moved from paper filing cabinets to computers, which makes accessing it easier and analyzing it possible. To do that, you rely on software programs that store this information, or data, in ways that help you keep it organized.

Progress often means improvements to the tools we use. But in some cases, the old one turns out to be better. In the case of databases, we vote for the old school, hence comes into picture PostgreSQL DBMS.

Relational Database Background

There are a few types of databases including flat and hierarchical. But in many cases today, you need something stronger. Enter the relational database, a collection of data organized in tables, records, and columns so the data can be retrieved and reassembled in many different ways. A relational database is based on the mathematical concept of mapping out data sets and the idea that elements are all related to one another. This approach to computing makes data searchable and organized. For example, you might relate something like color to a physical item you sell such as a pair of pants.

This approach to computing was invented by Edgar F. Codd at IBM in 1970. For many years, only the CIA and other large organizations had access to such databases, but in 1979, Oracle commercialized it, introducing the Relational Database to personal computing.

Before that, personal computers used tables like spreadsheets and comma separated files to keep all data. With that approach, a user cannot query for such things as “what’s the best-selling product” without pulling data from all the tables and writing a program to sort and calculate the answer.

Relational databases, also called RDBs, use Structured Query Language (SQL), to query data placed in multiple tables into a single results table that did not exist prior. Thus, relational databases can create multiple views of data without the need for third-party applications. SQL is an English-based programming language and provides an easy interface for users to interact with the chosen database.

Key-Value Stores

Of course, since the 1980s, we have come up with other ways to store and retrieve data, especially since the Internet was born.

One of those is a key-value database, also called a key-value store. The data stored in this way are commonly known today as a dictionary or hash, or you might hear the term “associative array.” Whereas relational databases store information in tables, key-value databases treat the tables as a collection with many fields, making it easier to identify and extract single pieces. The data is stored as a collection of key-value pairs. The key is unique so you can search for it, but different databases have different conventions for keys so one might use numbers while another uses a combination of letters, numbers, and symbols.

Storing data this way is simple, and it scales well.

While Oracle and Microsoft’s programs worked well, SQL queries are resource intensive and must be executed every time a user needs to access data. SQL works fine for a few thousand users, but you can’t scale it.

You might also hear about MySQL, which was the first open source relational database management system (RDBMS). While similar, it has fewer features than MS SQL and Oracle offerings. This offered a low-cost solution so companies could power their website without the high cost of software licenses. This did and still works great for small traffic websites. Users can shard the database, sharing tables across multiple servers but with more complex query requirements. In addition, they can cache popular queries into a faster technology similar to key-value but at far greater cost and complexity. However as the amount of data required for modern web applications along with larger user demands, MySQL hit a wall.

Google popularized the concept of storing all data in key-value pairs. This made running queries difficult, but made accessing and storing data much faster. In this scenario, one can store all the attributes of an object by giving the item a unique key such as a number and a list of attributes as its value. Further, Google made running queries faster and easier with a paradigm called MapReduce, in which multiple computers query data at the same time and the result is stored for later access.

How We Still Use SQL & PostGreSQL vs MySQL

That all sounds fantastic, right? Many people think so. In fact, many programmers and startups developed open-source products with similar technologies, and the SQL-based database began to decline in favor of MySQL, though today you’ll find a mix of both. But key-value stores aren’t as useful when you need to query your data set.

That’s why we have found a better solution. During the time of MySQL, there was a research product called Postgre. The program started as a non-SQL structured database but later added SQL support and most of the features of a full Oracle install, changing its name to PostgreSQL. Postgre was offered as a free download under an open source license. While it was always the most advanced option, it was not always the most popular.

However, PostgreSQL released an add-on called hstore. This enables users to create SQL queries of key-value pairs while including all the benefits of a relational database. This tool has other advantages, including: 

  • No risk of over-deployment/breaking your license agreement for a proprietary program.
  • No licensing cost for the software.
  • Lower maintenance and tuning requirements while remaining stable.
  • Customizable.
  • Compatible with Unix and Windows.
  • Can handle high volume.

PostgreSQL has some of the same performance limitations. With a few million rows of data in a table or millions of users it will have issues, but with hstore the program is more flexible in the beginning. That makes moving toward a cache system to offset performance limitations easier and far more cost effective with development labor. With this approach, the data is already in key-value pairs with the added benefits of indexing. Therefore, the cache can contain the same key-value pairs as the hstore field instead of a forced structure that requires a larger amount of processing to move to the key/value schema for cache.

In addition, one of the benefits of PostgreSQL is it can easily be expanded by developers to offer greater performance. For one project, we were able to keep a system with firehose of continuous data on PostgreSQL while lowering their hosting bill due to writing code at the database level for specific queries then calling these functions instead of the slower SQL code. The result meant the system operated much more efficiently and no longer required additional resources to perform.

At Imaginovation, this is what we use for most of our client work. This combination is the best of both worlds, giving clients the freedom and scalability of key-value pairs with the power of SQL.

Talk to us about building your database.

 

Write your comments

Your email address will not be published. Required fields are marked *