Every organization thinks that online response times should be faster, and batch elapsed times should be shorter. Performance expectations, no matter how unrealistic or even ridiculous, exert a lot of pressure on IT management to work miracles.
Graph databases and Field Programmable Gate Arrays (FPGAs) can each improve application performance by one to two orders of magnitude, which helps IT respond to these high expectations and ever more demanding systems.
Challenging information technology trends
Many CIOs are expected to respond to these challenging trends:
- Exploding data volumes.
- Increasing number of active end-users.
- Longer average end-user sessions.
- Expanding number of applications.
Also, many organizations are pursuing major information technology initiatives like:
- Data analytics and data visualization that consume significant computing resources.
- Digital transformation that increases the number of applications and improves the integration across applications.
- Data warehouses and data lakes that consume significant storage.
- IIoT that creates huge volumes of time series data.
- Artificial intelligence (AI) and machine learning (ML) that exhibit a voracious appetite for data and computing resources.
- The data-driven organization concept that requires lots of data integration among diverse data sources.
No amount of upgrading, tuning, and optimizing of the computing environment can keep up with this growth in the consumption of computing resources. Moving applications to the cloud can help significantly but only up to a point.
Graph databases and FPGAs have emerged as effective information technology that CIOs can implement to respond to these demanding trends. Let’s explore how they help.
Graph databases
Dr. Victor Lee, the Head of Product Strategy and Developer Relations at TigerGraph, a leading graph database vendor, presented at the recent Graph+AI World conference. He said, “Graph databases, like TigerGraph, offer significant advantages for applications that must quickly process large volumes of data that exhibits considerable connectedness among multiple entities in the database schema.”
The following features of graph databases contribute to successful applications that must manage huge data volumes and still deliver excellent performance at a scale that relational databases cannot manage.
Fast query speed
Graph databases deliver extraordinarily fast query response times because queries only process the relevant relationships and not the total data volume in the database.
Graph databases routinely reduce query completion times by one to two orders of magnitude compared to the same application running on a relational database. As the number of entity instances increases, the difference in performance grows further.
This speed is essential to the successful operation of the data analytics-oriented applications listed at the end of this article. A good example is financial fraud detection, which frequently requires querying millions of accounts and billions of transactions.
An important caveat about query speed is that only a native graph database can achieve these fast query response times. Some graph database software packages are only wrappers running on top of a tabular database, and they can only run as fast as the underlying tabular database.
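To illustrate why a query that follows relationships touches only a small part of the data, here is a minimal sketch in plain Python. It is not TigerGraph code; the `neighbors_within` function and the toy account data are hypothetical, and the adjacency dictionary only stands in for the way a native graph database stores relationships.

```python
from collections import deque

def neighbors_within(adjacency, start, max_hops):
    """Return {vertex: hop distance} for every vertex reachable
    from `start` in at most `max_hops` hops (breadth-first search)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if seen[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[vertex] + 1
                queue.append(neighbor)
    return seen

# Hypothetical toy data: accounts linked by money transfers.
adjacency = {
    "acct_A": ["acct_B", "acct_C"],
    "acct_B": ["acct_D"],
    "acct_C": [],
    "acct_D": ["acct_E"],
    "acct_E": [],
    # Unrelated accounts that the query never visits.
    "acct_X": ["acct_Y"],
    "acct_Y": [],
}

reached = neighbors_within(adjacency, "acct_A", 2)
```

The work done depends on the number of relationships followed from the starting account, not on the total number of accounts stored.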
Entity relationships stored as data
Graph databases explicitly store the relationships among entities as data alongside the attribute data. By contrast, relational databases derive relationships at query time by performing more expensive and time-consuming joins. This simple distinction encapsulates a huge difference between the two kinds of database.
This relationship storage in graph databases:
- Makes the database schema much easier to understand for software developers and business analysts.
- Results in super-fast queries, even for complex queries or large data volumes.
A good example application is supply chain management, which requires representing the complex relationships that exist among the thousands of components and parts suppliers associated with aircraft or automobile manufacturing.
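The contrast between joining and following stored relationships can be sketched in a few lines of Python. This is an illustration only: the part and supplier names are made up, the list scan stands in for an unindexed join, and real relational databases would normally use indexes to speed up the join.

```python
# Hypothetical supplier master data shared by both approaches.
supplier_names = {"S1": "Acme", "S2": "Borealis", "S3": "Cobalt"}

# Relational style: relationships live in a cross-reference table
# and are resolved at query time by matching keys (a join).
part_supplier_rows = [
    ("wing", "S1"), ("wing", "S2"),
    ("engine", "S3"), ("engine", "S1"),
]

def suppliers_via_join(part):
    # Scans every cross-reference row to find the matching ones.
    return sorted(supplier_names[s] for p, s in part_supplier_rows if p == part)

# Graph style: each part vertex directly holds its outgoing
# "supplied by" relationships, so no scan is needed.
supplied_by = {"wing": ["S1", "S2"], "engine": ["S3", "S1"]}

def suppliers_via_edges(part):
    return sorted(supplier_names[s] for s in supplied_by.get(part, []))

wing_join = suppliers_via_join("wing")
wing_edges = suppliers_via_edges("wing")
```

Both functions return the same suppliers; the difference is that the second one reads only the relationships attached to the requested part.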
Entity relationships easy to understand
Whenever a DBMS can represent real-world relationships accurately and avoid kluges or workarounds such as cross-reference tables or composite keys, it’s easier for software developers to understand the organization of the data in the database. That ease-of-understanding leads to:
- More accurate, reliable solutions with less development effort.
- Reduced effort and elapsed time to implement future enhancements.
A good example application is computing infrastructure problem analysis, where a complex computing environment with many components must be represented in the database schema in an easy-to-understand way.
Data structures responsive to change
Whenever a DBMS can represent real-world data structures accurately, more of the benefits listed under “Entity relationships easy to understand” above can be realized.
In graph databases, data structures are more flexible and multiple data types are more easily combined. While data is still organized under a defined schema (vertex and edge types rather than relational tables), these definitions and their relationship definitions can be altered dynamically.
These graph database capabilities are particularly important when the application data includes many data types. A good example is Facebook comments or posts, which can consist of any combination of text, images, videos, links, and geographic coordinates.
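As a rough sketch of this flexibility, the toy property graph below lets each vertex type carry its own attributes and lets new vertex and edge types appear at any time. The `PropertyGraph` class and all the names in it are hypothetical and are not the API of any graph database product.

```python
from collections import defaultdict

class PropertyGraph:
    """Toy in-memory property graph for illustration only."""

    def __init__(self):
        self.vertices = {}               # vertex id -> (vertex type, properties)
        self.edges = defaultdict(list)   # (source id, edge type) -> target ids

    def add_vertex(self, vertex_id, vertex_type, **properties):
        self.vertices[vertex_id] = (vertex_type, properties)

    def add_edge(self, source, edge_type, target):
        self.edges[(source, edge_type)].append(target)

    def targets(self, source, edge_type):
        return list(self.edges.get((source, edge_type), []))

graph = PropertyGraph()
graph.add_vertex("post1", "Post", text="Trip photos")
graph.add_vertex("img1", "Image", url="https://example.com/a.jpg", width=800)
graph.add_vertex("geo1", "Location", lat=48.85, lon=2.35)
graph.add_edge("post1", "CONTAINS", "img1")
graph.add_edge("post1", "TAGGED_AT", "geo1")

# In this toy model, a new vertex type can be introduced later
# without changing any existing definitions.
graph.add_vertex("vid1", "Video", duration_seconds=42)
graph.add_edge("post1", "CONTAINS", "vid1")

attached = graph.targets("post1", "CONTAINS")
```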
Field Programmable Gate Arrays (FPGAs)
Kumar Deepak, a Distinguished Engineer at Xilinx, a leading vendor of FPGA hardware and related software, presented at the recent Graph+AI World conference. He said, “Our graph database customers experience significant performance gains when they add Xilinx FPGAs to their computing infrastructure.”
The following features of FPGAs deliver excellent performance for graph database applications that operate with data volumes at a scale that even a multi-CPU server cluster cannot manage.
Capacity scaling
There is a limit to what adding more CPUs to a server can achieve. Each additional CPU produces a smaller performance increment, both because of the limitation described by Amdahl’s law and because of limited memory-to-CPU bandwidth.
FPGAs address this scalability limitation of CPUs by pipelining parallel computations and by offering much higher memory bandwidth with low latency.
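Amdahl’s law can be worked through with a short calculation. The sketch below assumes a workload in which a fixed fraction can be parallelized perfectly; the function name and the 90% figure are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Theoretical speedup under Amdahl's law:
    1 / ((1 - p) + p / n), where p is the parallelizable
    fraction of the work and n is the number of processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Assumed workload: 90% of the work can run in parallel.
speedups = {n: amdahl_speedup(0.9, n) for n in (2, 8, 64, 1024)}
```

With 90% of the work parallelizable, the speedup can never exceed 10 times, no matter how many CPUs are added, because the remaining 10% still runs serially.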
This FPGA capability significantly raises:
- The data volumes that a graph database application can process while still delivering excellent performance.
- The number of concurrent tasks that a graph database can process.
Fast execution speed
General-purpose server CPUs are designed to handle widely varying workloads, and their sequential instruction processing architecture limits their execution speed. Increasing the clock speed helps, but that approach is also limited by other constraints.
FPGAs address this speed limitation of CPUs by offering a massively parallel processing architecture that performs a focused set of functions extremely fast. Further, the parallel processing elements of FPGAs are typically pipelined to process even more data per clock cycle than CPUs can. Parallel processing and pipelining can apply to any of the following situations:
- Instructions – perform multiple instructions at the same time.
- Tasks – perform different tasks on a single set of data at the same time.
- Data – perform the same instruction for different blocks of data at the same time.
This FPGA capability reduces query completion times by one to two orders of magnitude compared to the same application running without the FPGA.
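The benefit of pipelining can be seen with simple cycle counting. The sketch below is an idealized model, assuming every stage takes exactly one clock cycle and the pipeline never stalls; the function names and the figures are hypothetical.

```python
def sequential_cycles(items, stages):
    """Each item passes through all stages before the next item starts."""
    return items * stages

def pipelined_cycles(items, stages):
    """Once the pipeline is full, one item completes every cycle:
    `stages` cycles for the first item, then one more per remaining item."""
    if items == 0:
        return 0
    return stages + (items - 1)

# Assumed workload: 1,000 data items through a 5-stage computation.
without_pipeline = sequential_cycles(1000, 5)
with_pipeline = pipelined_cycles(1000, 5)
```

Under these assumptions, the pipelined version needs 1,004 cycles instead of 5,000, and the advantage approaches the number of stages as the number of items grows.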
Architected for algorithms
Server CPUs are architected for the instructions associated with transaction processing. That’s the right choice for many applications but not the ones listed below.
By contrast, FPGAs can be architected for graph algorithms by configuring them with just the instructions associated with graph algorithms. To support the effective use of FPGAs, graph database software package vendors include or license software routines that perform many of the frequently used graph algorithms.
This FPGA capability significantly raises the complexity of algorithms that a graph database application can process while still delivering excellent performance.
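For readers unfamiliar with graph algorithms, here is a minimal pure-Python sketch of one frequently used example, connected components, which groups vertices that are linked to each other directly or indirectly. It uses a union-find structure and runs on a CPU; it is not a vendor routine or FPGA code, and the edge data is made up.

```python
def connected_components(edges):
    """Group vertices into connected components using union-find.
    `edges` is an iterable of (vertex, vertex) pairs, treated as undirected."""
    parent = {}

    def find(vertex):
        parent.setdefault(vertex, vertex)
        while parent[vertex] != vertex:
            parent[vertex] = parent[parent[vertex]]  # path halving
            vertex = parent[vertex]
        return vertex

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for vertex in list(parent):
        groups.setdefault(find(vertex), set()).add(vertex)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical edges, e.g. accounts that share a device or an address.
components = connected_components([("a", "b"), ("b", "c"), ("d", "e")])
```

In a fraud detection setting, each resulting group could be a candidate ring of related accounts to investigate.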
Well-suited applications
The applications where graph databases and FPGAs provide significant performance, software development, and manageability improvements over relational databases include:
- Fraud and money laundering detection.
- Supply chain optimization.
- Customer 360 interaction analysis.
- Product recommendations.
- Bioinformatics, drug discovery.
- Social network monitoring.
- Risk management.
- Identity and access management.
- Computing infrastructure monitoring.
What graph database capabilities are you looking for to add value to your organization’s data? Let us know in the comments below.