Graph Databases: The Path from Relational Big Data to the Future

Cruce
Let's talk briefly about graph databases. We know that it's not going to be a small part of the future data landscape.

How do you see graph databases fitting into the evolving data ecosystem? How can large corporate users that are already managing massive relational data stores consider graph structures for their data? So, how do you see the market and how do you see the pathway there?

Gaja
I think the market is evolving and there are a lot of graph database vendors coming on board, and each of them has its own strengths and weaknesses.

Eventually there will be market consolidation, and you will probably know who the leaders are, and you will say, "hey, you know what? I want to work with these people." So let's say that you have picked your vendor. From a practitioner's perspective, for me, graph databases allow us to ascertain multilevel transitive relationships.

So what I mean by that is, we all know the theory of transitivity, right? If A implies B and B implies C, then A implies C. With graph databases, we can take this to a completely new level, because if there are relationships across, say, our alphabet, in the same example that I just used, you will be able to say that A implies Z, right? Because there are intermediary relationships among those twenty-six letters.
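As a rough illustration of the transitivity idea above, the following sketch treats each letter as a node and each "implies" as an edge, then searches for a chain from A to Z. The data and the `find_path` helper are made up for this example; a graph database would run this kind of traversal natively rather than in application code.

```python
from collections import deque
from string import ascii_uppercase

# Hypothetical data: each letter "implies" the next one (A -> B, B -> C, ..., Y -> Z).
edges = {a: [b] for a, b in zip(ascii_uppercase, ascii_uppercase[1:])}


def find_path(graph, start, goal):
    """Breadth-first search; returns the chain of nodes from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = find_path(edges, "A", "Z")
print(" -> ".join(path))
print(len(path) - 1)  # 25 hops from A to Z
```

The relationship from A to Z is never stored directly; it only shows up by following the intermediate edges.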

That is very, very important when you're looking at understanding customer behavior, because when you're trying to find out, and I'll give you a very specific example, why does a customer always buy a thousand shares of Alibaba, followed by a thousand shares of Amazon, for example? That may not be readily apparent. So when you start looking at the relationship of data from the perspective of the ticker symbol, the stock symbol, which is Amazon or Alibaba, you will start seeing relationships between them, that one is following the other.

And there are 10 cases of that that happened in the last three months within your transactions. So those are insights that don't necessarily surface in a normal relational world because, you know, it's very difficult to get those when what a transaction is and what the entity is are kept separate. Now, when we go through this whole exercise of saying, "hey, we want to go down the path of graph databases," again, we start with models.
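The "one is following the other" pattern can be sketched as counting ordered pairs of consecutive purchases per customer. The transaction rows below are invented for illustration (BABA and AMZN are the ticker symbols for Alibaba and Amazon), and `follow_counts` is a hypothetical helper, not anything named in the interview.

```python
from collections import Counter, defaultdict

# Hypothetical transaction log: (customer_id, ISO date, ticker). Invented data.
transactions = [
    ("c1", "2021-01-04", "BABA"),
    ("c1", "2021-01-05", "AMZN"),
    ("c1", "2021-02-01", "BABA"),
    ("c1", "2021-02-02", "AMZN"),
    ("c2", "2021-01-10", "AMZN"),
    ("c2", "2021-01-11", "MSFT"),
]


def follow_counts(rows):
    """Count how often ticker X is immediately followed by ticker Y for the same customer."""
    by_customer = defaultdict(list)
    # ISO dates sort correctly as strings, so sorting by (customer, date) orders each history.
    for customer, _date, ticker in sorted(rows, key=lambda r: (r[0], r[1])):
        by_customer[customer].append(ticker)
    counts = Counter()
    for tickers in by_customer.values():
        for first, second in zip(tickers, tickers[1:]):
            counts[(first, second)] += 1
    return counts


counts = follow_counts(transactions)
print(counts[("BABA", "AMZN")])  # 2
```

Each counted pair is effectively an edge between two ticker nodes, which is the shape of question a graph model makes easy to ask.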

We want to make sure that you build a simple model first. Start small. Say, you want to start with customer and transactions, or customer and products. When you come from a predominantly relational world, and this is different for different cloud vendors, but say, for example, on AWS, you have a lot of relational data and you want to bring it into a graph database, and you say you have your model built. You may have to sort of offload your data to object storage like S3 and then load the data from there into the graph model.
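A minimal sketch of the "start small" customer-and-transactions model: relational rows are turned into nodes and edges, with the foreign key becoming an explicit relationship. The row layout, the `MADE` edge label, and the `to_graph` helper are all assumptions for illustration; an actual load would follow the chosen vendor's bulk-load format.

```python
# Hypothetical relational rows, as they might look after being exported from two tables.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bo"},
]
transactions = [
    {"txn_id": 100, "customer_id": 1, "ticker": "BABA", "shares": 1000},
    {"txn_id": 101, "customer_id": 1, "ticker": "AMZN", "shares": 1000},
    {"txn_id": 102, "customer_id": 2, "ticker": "AMZN", "shares": 50},
]


def to_graph(customer_rows, transaction_rows):
    """Convert rows into (nodes, edges); the customer_id foreign key becomes a MADE edge."""
    nodes = {}
    edges = []
    for c in customer_rows:
        nodes[("Customer", c["customer_id"])] = {"name": c["name"]}
    for t in transaction_rows:
        txn_key = ("Transaction", t["txn_id"])
        nodes[txn_key] = {"ticker": t["ticker"], "shares": t["shares"]}
        edges.append((("Customer", t["customer_id"]), "MADE", txn_key))
    return nodes, edges


nodes, edges = to_graph(customers, transactions)
print(len(nodes), len(edges))  # 5 3
```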

And that's OK. I mean, you can do that. But I would say in anything that we do new, we start small, have a nice, well-defined proof of concept. Then you have a much higher degree of success. You get the job done. You see the value. And then you can start building on it iteratively.

So I'm a big believer in this whole iterative model where I say, you know, "what can I show you in two or three weeks?" A very small, well-defined, narrow scope. And then once we're done with that, then we can constantly build and evolve and expand the scope. And that's how eventually the system gets hardened and the system becomes stable and consistent. That's sort of my take on graph databases.

Cruce
It's interesting, I see these trends towards NoSQL happening as well. So there's this evolution that seems to be happening on the enterprise data warehouse, where there is a NoSQL representation, and there might be a graph representation and we may be querying and using different views or assemblies, data catalogues out of our core datasets.

I'm curious, within that diverse environment, do you see enterprises embracing an "all of the above" strategy, or do you think most organizations need to pick and stick with one approach to their data warehousing, data aggregation, data hub approach and architecture?

What's the right way forward for an enterprise leader trying to figure out, out of the different possibilities for the data future, where do we head?

Gaja
OK, I'm going to say something which probably I wouldn't have said 20 years ago. I'm going to say keep it wide open. Open slate. You want to make sure that the architecture is built in such a way that you can change the backend database from relational to NoSQL to whatever else comes tomorrow. And that should not matter. That's the level of abstraction that you have to build into the code, into the software that you build.

Now, like I mentioned earlier during this podcast, I believe that the time has come for us to use what is termed fit-for-purpose data persistence, which means we have to get out of this mindset that I'm just going to pick one vendor, one database, and I'm going to put everything in it.

Well, that's not going to work in the future. And the good thing about cloud computing is that a lot of the classic database vendors' functionality has been democratized, which means that you have a choice. This is not a relationship where you are handcuffed for 10 years and you don't have a choice, where you can't get out if it doesn't work.

You can build your architecture in such a way that you use the right persistence layer for the right things, because you want it fit for purpose. If you're going to do relationships, and let's say that you want to do relationships between customers and transactions, then by all means put it into a graph database. Because you need a graph database for that. And if you need something where you have to deliver business objects for a mobile application, to reduce the elapsed time of how long it takes for the application to render, then use a NoSQL database and render a business object in the form of JSON to the application.

If it's a matter of going back to system-of-record transactions, rows and columns, dates, characters, strings and numbers, then of course, use a relational database. But don't try to say, "OK, I'm going to have a bunch of audio files and I'm gonna go stick them into a relational database." Will it work? Yeah, it'll work. Yeah. You can do it. But is it the right data persistence layer? Probably not.
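The fit-for-purpose choices described above can be written down as a small routing table. This is only a sketch of the three pairings mentioned in the conversation; the workload names, the `Store` enum, and `choose_store` are hypothetical labels, and a real decision would weigh far more than a single keyword.

```python
from enum import Enum


class Store(Enum):
    GRAPH = "graph database"
    NOSQL = "NoSQL database"
    RELATIONAL = "relational database"


# Hypothetical workload labels mapped to the persistence layer suggested for each.
ROUTING = {
    "relationships": Store.GRAPH,           # e.g. customers linked to transactions
    "business_objects": Store.NOSQL,        # e.g. JSON objects served to a mobile app
    "system_of_record": Store.RELATIONAL,   # rows, columns, dates, strings, numbers
}


def choose_store(workload):
    """Return the store for a known workload; unknown workloads need a deliberate decision."""
    try:
        return ROUTING[workload]
    except KeyError:
        raise ValueError(f"no persistence layer chosen yet for workload {workload!r}")


print(choose_store("relationships").value)  # graph database
```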

And I think that's where fundamentally we have to start looking at it. Two things. One is, how do we create a decoupled architecture and an abstracted software layer that is not dependent on any one backend? There's no hard-wired connection to any database vendor or layer or whatever. And the other thing is, how much can I use a backend service without having to manage servers? And how can I get the job done quickly? Whereby the persistence layer is durable, it is scalable, it is highly available, and if I need to go global, I can go global.

So I think that is how I would probably approach the problem. So keep an open, wide, clean slate with a whiteboard and an erasable marker, and say "today it is this database X, tomorrow we can go to database Y." Of course. Absolutely can. There shouldn't be anything in the architecture that prevents you from doing that. So it's a very flexible architecture, if you will.

The architecture for today, the architecture du jour, uses NoSQL. Tomorrow it could be using something else. And that's OK. We should be able to transition from NoSQL to that other database if we need to. And I think that's sort of the mindset that I have in building those kinds of architectures.
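One way to picture the "database X today, database Y tomorrow" idea is a small interface that application code depends on, with interchangeable backends behind it. The sketch below is an assumption-laden illustration, not a description of any system discussed here: `KeyValueStore`, `DictStore`, `SqliteStore`, and `register_customer` are hypothetical names, and SQLite stands in for whatever backend is actually chosen.

```python
import json
import sqlite3
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """The only contract application code is allowed to rely on."""

    def put(self, key: str, value: dict) -> None: ...
    def get(self, key: str) -> Optional[dict]: ...


class DictStore:
    """In-memory backend, standing in for 'database X'."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SqliteStore:
    """SQLite backend, standing in for 'database Y'; values are stored as JSON text."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, json.dumps(value))
        )
        self._conn.commit()

    def get(self, key):
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None


def register_customer(store: KeyValueStore, customer_id: str, name: str) -> Optional[dict]:
    """Application logic: it never learns which backend it is talking to."""
    store.put(customer_id, {"id": customer_id, "name": name})
    return store.get(customer_id)


# Swapping the backend is a one-line change at the call site.
for backend in (DictStore(), SqliteStore()):
    print(register_customer(backend, "c1", "Ann"))
```

Because `register_customer` only sees the interface, moving from one backend to the other does not touch the application logic.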

Cruce
It both limits risk and opens up possibilities at the same time. It limits the blast radius, contains the possibility that any new vendor change will blow up data operations, and also creates a whole new set of capabilities that wouldn't be possible without that modular and abstracted architecture.

Gaja
Absolutely. 20 years ago, in a relational setup, saying "I need high availability and disaster recovery" was a big deal. We had to build a secondary instance. We had to make sure that DR was built in a location a thousand kilometers away. We had to make sure that we shipped the logs. We had to make sure that the logs got applied on the disaster recovery site.

Today, it's a simple thing. It's a switch inside your create table statement. You can create a NoSQL table, a global table, and that table can instantly get replicated to multiple parts of the world. And you don't have to do a thing, and rightfully so.

Your core competency is not handholding replication. Your core competency should be building functionality that enhances your customer experience in your industry. And that's where I think cloud computing has come a long way in taking all those things and doing the heavy lifting for us.

I do not want to manage replication. I do not want to worry about disaster recovery. As long as I have set up the global tables, I'm done. And I think that's why fundamentally we have changed in how even data persistence occurs. We don't worry about "is my system gonna be up or down?" Or what happens if the entire eastern seaboard loses power. Not a big deal. Because you have the other site coming up on the West Coast and it's up and running in seconds.

Graph Databases: The Path from Relational Big Data to the Future

[A] Podcast #

Interview With Gaja Vaidyanatha

Bio

Resources

Follow Gaja Vaidyanatha on social media:

And follow the latest from CloudData LLC:

Podcast Bonus Material

Transcript

Highlighted Quotes

Related Resources

[A] Treasury
