Relational databases are so entrenched and ubiquitous that we reflexively use them for new application requirements. However, graph databases are better for applications with specific processing requirements and data structures.
When relational databases handle application processing and data structures poorly, the consequences include:
- Poor online query performance.
- Long elapsed times for batch tasks.
- The inability to perform some queries, meaning no result is ever returned.
- High cost to implement schema and application changes when new business requirements trigger schema changes.
- The imposition of data volume maximums to ensure acceptable performance.
- Application outages to implement schema and application changes.
There are many differences between relational databases and graph databases. The most significant difference is that relational databases determine relationships through join processing based on primary and foreign key values. Graph databases take a much different approach to handling relationships. They store relationships as physical pointers in the database.
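The contrast between join processing and stored pointers can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's storage engine: the `customer` and `purchase` tables, the `Node` class and the `PURCHASED` relationship name are all hypothetical, and in-memory object references stand in for the physical pointers a graph database would store.

```python
import sqlite3

# Relational style: the relationship is recomputed at query time by
# matching foreign key values against primary key values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        item TEXT
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO purchase VALUES
        (10, 1, 'laptop'), (11, 1, 'monitor'), (12, 2, 'phone');
""")


def items_via_join(name):
    """Find a customer's items by joining on key values."""
    rows = conn.execute(
        "SELECT p.item FROM customer c "
        "JOIN purchase p ON p.customer_id = c.id "
        "WHERE c.name = ? ORDER BY p.id",
        (name,),
    )
    return [item for (item,) in rows]


# Graph style (hypothetical Node class): each node keeps direct
# references to its related nodes, so no key matching is needed.
class Node:
    def __init__(self, label, **props):
        self.label = label
        self.props = props
        self.out = []  # list of (relationship type, target Node)

    def relate(self, rel_type, target):
        self.out.append((rel_type, target))


ada = Node("Customer", name="Ada")
for item_name in ("laptop", "monitor"):
    ada.relate("PURCHASED", Node("Item", name=item_name))


def items_via_pointers(customer):
    """Find a customer's items by following stored references."""
    return [t.props["name"] for rel, t in customer.out if rel == "PURCHASED"]
```

Both functions return the same items for Ada. The difference is where the work happens: the join matches key values on every query, while the pointer version only walks references that were stored when the relationship was created.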
Data structures where graph databases are the better choice are as follows.
Complex data relationships
The following data structures express complex data relationships:
- Many-to-many relationships between entities.
- One-to-many relationships involving many entities.
- Complex schemas with many entities and relationships.
- Indirect relationships among entities.
Applications that process complex data relationships include:
- Supply chains. Complexity arises from multiple and alternative suppliers, many parts in components and products, alternative parts, and various shipping methods and routes.
- Recommendation engines. Consumer recommendations are based on consumer profile, purchase, search, view and comment history, and similar or complementary product relationships.
- Financial fraud. Fraud relies on deliberately complex relationships to obscure crime or ownership.
- Semantic search. These searches go beyond text string matching to consider other attributes such as synonyms, search context, geographic location and query intent.
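Indirect relationships, such as the deliberately layered ownership used in financial fraud, come down to finding a chain of direct links between two entities. The sketch below shows one way to do that with a breadth-first search. The entity names and the `links` structure are invented for illustration, and a real graph database would run this kind of traversal inside its own query engine.

```python
from collections import deque

# Hypothetical ownership graph: each entity maps to the entities it is
# directly linked to.
links = {
    "Alice": ["Shell Co A"],
    "Shell Co A": ["Shell Co B"],
    "Shell Co B": ["Offshore Account"],
    "Bob": ["Offshore Account"],
    "Offshore Account": [],
}


def find_path(graph, start, goal):
    """Return the shortest chain of direct links from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

Calling `find_path(links, "Alice", "Offshore Account")` walks through both shell companies to reach the account. In a relational design, each extra hop in that chain would typically mean another join.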
Data hierarchies
Data hierarchies include the following data structures:
- Tree structures or parent-child relationships with multiple levels.
- Hierarchies with many levels.
- Recursions requiring self-joins.
Applications that process data hierarchies include:
- Decision support.
- Network routing for trucks and aircraft.
- Network management for infrastructure such as telecom, electrical power grids, and water distribution.
- Bill of materials in manufacturing.
- Folder structures and file systems for storing data. Examples include File Manager, SharePoint and OneDrive from Microsoft or their equivalents from Google.
- Performance reporting applications with hierarchies. For example, company sales data can be broken down into sales by region and then by individual store. Alternatively, sales data can be broken down by product category and then into individual transaction data by store.
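To make the "recursions requiring self-joins" point concrete, here is a small bill-of-materials sketch. It assumes a simplified hierarchy in which each part has a single parent, and the `part` table, the part names and the `children` mapping are hypothetical. The relational version uses a recursive common table expression that joins the table to itself once per level. The graph-style version follows child references directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (
        id INTEGER PRIMARY KEY,
        name TEXT,
        parent_id INTEGER REFERENCES part(id)
    );
    INSERT INTO part VALUES
        (1, 'bicycle', NULL),
        (2, 'wheel', 1),
        (3, 'frame', 1),
        (4, 'spoke', 2),
        (5, 'rim', 2);
""")


def descendants_sql(root_name):
    """All parts below root_name, found by a recursive self-join."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id, name) AS (
            SELECT id, name FROM part WHERE name = ?
            UNION ALL
            SELECT p.id, p.name FROM part p JOIN sub s ON p.parent_id = s.id
        )
        SELECT name FROM sub WHERE name <> ? ORDER BY name
        """,
        (root_name, root_name),
    )
    return [name for (name,) in rows]


# Graph style: each part refers directly to its child parts.
children = {
    "bicycle": ["wheel", "frame"],
    "wheel": ["spoke", "rim"],
    "frame": [],
    "spoke": [],
    "rim": [],
}


def descendants_graph(root):
    """All parts below root, found by following child references."""
    found = []
    stack = list(children[root])
    while stack:
        part = stack.pop()
        found.append(part)
        stack.extend(children[part])
    return sorted(found)
```

Both functions list the same parts for any root. The SQL version has to be told how to recurse, while the traversal simply follows the structure as stored.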
Highly connected data
Highly connected data consists of data structures where entity values are connected to multiple related entity values via one or more intersection tables.
Applications that process highly connected data include:
- Digital Asset Management (DAM). For example, content streaming companies track which movies each viewer has already watched and which movies they can watch.
- Social media marketing, to measure advertising response and to identify friend relationships and influencer effectiveness.
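The streaming example can be sketched with each viewer holding its connections directly, rather than looking them up through intersection tables. The viewer names, movie titles and the `watched` and `licensed` mappings below are invented for illustration.

```python
# Hypothetical data: each viewer maps to the set of movies it is connected to.
watched = {
    "viewer1": {"Alien", "Heat"},
    "viewer2": {"Heat"},
}
licensed = {
    "viewer1": {"Alien", "Heat", "Ronin"},
    "viewer2": {"Heat", "Ronin"},
}


def unwatched(viewer):
    """Movies the viewer can watch but has not watched yet."""
    return sorted(licensed[viewer] - watched[viewer])


def viewers_with_shared_history(viewer):
    """Other viewers who have watched at least one of the same movies."""
    return sorted(
        other
        for other, movies in watched.items()
        if other != viewer and movies & watched[viewer]
    )
```

In a relational schema, `watched` and `licensed` would usually be two intersection tables, and each of these questions would join through them.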
Real-time insights
Some applications must produce real-time insights to encourage an action or prevent an outcome. Examples include:
- Product recommendations.
- Financial transaction approvals or denials to avoid fraud.
- Money transfer monitoring to prevent fraud and money laundering.
- Traffic light changes to avoid or reduce gridlock.
- Industrial process control to improve product quality, maintain consistency or avoid catastrophic facility destruction.
In these applications, if the insights are not real-time, the business cannot capture an opportunity, or adverse consequences can occur. Graph databases can achieve the fast response required. Solutions that involve relational databases and ETL are not quick enough.
Evolving data schemas
Our systems development and enhancement work routinely encounters continually evolving business requirements. These changes typically trigger changes to database schemas.
Relational databases expect a well-defined, preferably static, schema. Changes require well-planned schema and application migrations. Often, the implementation of changes requires an application outage. Applications that do not require the new data are still affected by it.
The opposite extreme is no schema. The schema is inferred from the data, and the application enforces it. Changes are implemented by simply loading the new data, without planning or an application outage. However, other applications that depend on a defined schema cannot access the database.
Graph databases operate between these two extremes. There is a prescribed schema. Constraints enforce it. Changes are implemented by changing the schema while the database and applications are active. Only applications coded to access the new data see it. Older applications do not see the new data and are unaffected by it.
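The middle ground described here can be illustrated with a toy in-memory property graph. It is a sketch under stated assumptions, not any product's API: the `Graph` class, its `add_constraint` and `add_node` methods, and the `Store` and `Region` labels are all hypothetical. Constraints are checked when nodes are created, and a new label and property are introduced while the older report keeps working unchanged.

```python
class Graph:
    """Toy property graph with per-label required properties (hypothetical)."""

    def __init__(self):
        self.nodes = []
        self.required = {}  # label -> set of required property names

    def add_constraint(self, label, prop):
        """Require `prop` on nodes with `label` that are created from now on."""
        self.required.setdefault(label, set()).add(prop)

    def add_node(self, label, **props):
        missing = self.required.get(label, set()) - set(props)
        if missing:
            raise ValueError(f"{label} is missing {sorted(missing)}")
        node = {"label": label, **props}
        self.nodes.append(node)
        return node


def legacy_report(graph):
    """Older application: knows only Store nodes with name and city."""
    return [(n["name"], n["city"]) for n in graph.nodes if n["label"] == "Store"]


def region_report(graph):
    """Newer application: also reads the optional region property."""
    return [(n["name"], n.get("region")) for n in graph.nodes if n["label"] == "Store"]


g = Graph()
g.add_constraint("Store", "name")
g.add_node("Store", name="Downtown", city="Leeds")

# Schema change while the "database" stays live: a new label with its own
# constraint, and a new optional property on Store nodes.
g.add_constraint("Region", "code")
g.add_node("Region", code="NORTH")
g.add_node("Store", name="Harbour", city="Hull", region="NORTH")
```

After the change, `legacy_report(g)` still returns only store names and cities and never sees the `Region` node or the `region` property. `region_report(g)` picks up the new data, and `g.add_node("Region")` without a `code` is rejected.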
How graph databases deliver fast performance
In all these data situations, applications interacting with relational databases process primary key values, foreign key values and multiple indices. Applications interacting with graph databases only follow the physical pointers. This difference creates a significant performance advantage in favour of graph databases. The advantage is often an order of magnitude and can be multiple orders of magnitude in particularly complex situations.
You may wonder if there’s a cost associated with achieving this fast performance. There is. Inserts are slower and consume more resources in graph databases than in relational databases. However, in most applications, records are read many more times than they are inserted or updated. That reality offsets the higher cost of inserts.
Impediments to implementing graph databases
If graph databases are this impressive, why isn’t everyone using them? Here are some reasons:
- Simple transactions. Relational databases perform well in many situations that require comparatively simple transaction processing and involve modest data volumes.
- Habit. Relational databases are so entrenched and ubiquitous that we reflexively use them for all application requirements.
- Available expertise. Most organizations have accumulated deep expertise in designing, building and operating relational databases, while graph databases are comparatively new to them.
- Lack of awareness. Many organizations are still largely unaware of graph databases despite the efforts of vendors to promote their use.
- Product maturity. Graph databases have only evolved to become mature products in recent years.
Despite these reasons, organizations implement graph databases when the data exhibits the characteristics described above, or when data volumes, regardless of data structure, reach billions of rows.
What ideas can you contribute to help organizations recognize data structures that can lead to superior graph database applications? We’d love to hear your opinion.