Dynamic Data-Driven Customer Interactions with Big Data

Cruce
Welcome to Towards a Smarter World. This is your host, Cruce Saunders, and I'm pleased to be joined today by Gaja Vaidyanatha, the evolved data practitioner with a 27-year track record of managing and integrating large data footprints on premises and on the cloud. [He is] a senior enterprise leader passionate about delivering high-quality data for analytics, machine learning and AI. Gaja has extensive experience in industry and also has a lot of exposure to really high-end architectures for delivering customer experience and data integration at massive scale.

His data philosophy is that good data creates good business insights, which lead to great business outcomes. He's the principal of Cloud Data LLC, a consulting firm. Cloud Data enables customers with the design, architecture and building of serverless data integration hubs in the cloud. So today's conversation will be on the data side of the content and customer experience continuum. Welcome, Gaja.

Gaja
Thank you so much, Cruce. That was an awesome introduction.

Cruce
I'm really glad we can bridge this conversation within the content community because content and data are often two sides of the coin that create customer experience. A lot of times the customer experience practitioners focused on content and delivery of environments that are supported by content don't always think through the data side of how we get our customer data into interaction with content. - @mrcruce

So this will hopefully help to plant some ideas within the enterprise leaders listening about how data and content work together. Because our audience cuts across a lot of business roles involved with customer experience, why don't we start with some basics? What do you see as the role of data in delivering customer experiences?

Gaja
That's a great question. It's a great point to start this conversation. I believe that the role of data is huge. A lot of people talk about the 360-degree view of the business. And in my humble opinion, "we need to start with a 360-degree view of the customer given that the customer is the most important dimension of the business." - @dbperfman As we all know, a customer interacts with the business through many touch points, through the website, mobile or otherwise.

They view a lot of pages on that website. They transact business with us. They get on audio calls, video calls. They send us images, documents, maybe during onboarding. And then, of course, they also interact with us in the realm of social media. And these days, depending on the type of business, they may even interact with us using telemetry data. So "providing a world-class customer experience begins with integrating all of the aforementioned touchpoints of the customer in a single location." - @dbperfman

My belief is that it is only then that one truly understands the customer and the full picture of the customer and how they interact with us. And when you do that now, you can bring the service to a completely new level.

Cruce
That is very descriptive. We are constantly collecting data in so many different touch points and it becomes a large volume very quickly. What is your rubric for what kind of data is important to collect versus what kind of data would be just taking up space on a server as you're prioritizing data collection? How do you sift through that decision?

Gaja
As a data practitioner, for me, all data is great data. I do not treat some data as more important than other data. I mean, of course, there is some sense of that, where you can say that someone's Facebook update is less important than the transactional data that they created when they interacted with our website. But having said that, I believe that the whole idea of building that 360-degree view of the customer begins with the transactional data, the so-called relational data in the classic sense.

And then we need to slowly start incorporating the non-relational data components, be it documents, be it images, and then start incorporating social media and web logs and system logs. So my belief is that at every customer touchpoint there is an activity happening, and that activity needs to be captured, obviously with the consent of the customer.

And once that is captured, then it's easy for you to build the entire picture of the customer. And let me give you a very specific example. Suppose someone were to ask you the question, "So how is this customer doing?" Say a business leader asked that question of another business leader. That other business leader should be able to say, "Yeah, I think it looks like he or she is doing great. We had a great call with them two days ago. The sentiment analysis on the audio part was great. They've been interacting with us really well on social media channels, also very positive about our product and service. They have generated a lot of business, this is the amount of business they generated for us in the last 90 days. And by the way, they seem to be doing some very interesting things with the way they generate and transact with us."

So I think that sort of entire picture needs to be built. It's not enough to say, "Yeah, I had a call with the customer and everything was fine." You have to sort of build all the pieces together whether it's a call, whether it's a website interaction, whether it's a video interaction, chatbot doesn't matter. You have to make sure you get the whole picture. - @dbperfman

Cruce
You know, it's reminding me of the in-session data that we are working with within customer experience management platforms. And that same approach plays out within real time experience optimization, where a customer's onsite search queries or their chatbot intents that we've been able to capture or their browse behavior, the sequence of what they're looking at and when. All of that becomes useful to helping deliver the next best action, the next best content for that customer.

Gaja
Yeah, I completely agree. On that point, I want to just add one thing, and that is, as a data practitioner, I don't classify data as structured and unstructured data. I know the market does that, and it troubles me at various levels, because the whole idea that data is structured inside a relational table and not structured when it's outside a relational table makes no sense. You could have a business object that is delivered as a JSON document from a document database.

It's very, very well-structured data. The same is true of weblogs. So in my definition, that whole classification of data is more a question of: Is it relational? Is it something that we can put in rows and columns? Or is it something that needs another persistence layer, another way to store our data, which we can just classify as non-relational?
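To make that point concrete, here is a minimal Python sketch. The customer document and all of its field names are made up for illustration; they do not come from any real system discussed here. It only shows that a JSON business object has named, nested fields that can be navigated and checked, even though it is not rows and columns.

```python
import json

# Hypothetical business object, as a document database might return it.
raw = """
{
  "customer_id": "C-1001",
  "name": "Example Customer",
  "orders": [
    {"order_id": "O-1", "total": 120.50, "items": ["widget", "gadget"]},
    {"order_id": "O-2", "total": 79.50, "items": ["widget"]}
  ]
}
"""

doc = json.loads(raw)

# The document has a predictable shape, so fields can be addressed by name.
order_totals = [order["total"] for order in doc["orders"]]
lifetime_value = sum(order_totals)


def has_expected_shape(d):
    """Check the document's structure, much as a table schema would."""
    return (
        isinstance(d.get("customer_id"), str)
        and isinstance(d.get("orders"), list)
        and all({"order_id", "total", "items"} <= set(o) for o in d["orders"])
    )


print(lifetime_value)           # 200.0
print(has_expected_shape(doc))  # True
```

The structure lives in the document itself rather than in a table definition, which is why "relational versus non-relational" describes the difference better than "structured versus unstructured".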

Cruce
Yeah, that makes sense to me. The structured conversation within the content realm is something we have all of the time because we are working towards modular components of content and the modularity really matters when it comes to being able to take, for example, an article. And instead of seeing it as just a page, take the title and the description and a short summary set of takeaway points and separate them from the related main article.

And present those out via mobile application or in a chatbot stream or in some other more micro format and then be able to all roll that up into a single experience when somebody is on the long article format. So we're taking something that used to be chunky, like an article in a WordPress template that just had a few fields in a relational database representing the content. We're turning that into something much more faceted and much more manageable.

And I think the challenge is putting those manageable pieces, those chunks, those modules into relationship with modular data. Getting those things to work cleanly together so that the right chunks can be related to the right customer data, contextual pieces. - @mrcruce
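As a rough illustration of the modular article described above, the following Python sketch models one article as separate components and assembles them differently per channel. The class name, field names, and the three renderers are assumptions chosen for this example, not a description of any particular content platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    """One piece of content, stored as components instead of one page blob."""
    title: str
    description: str
    summary: str
    takeaways: List[str] = field(default_factory=list)
    body: str = ""


def render_for_chatbot(article):
    # A chatbot stream only needs the shortest components.
    return f"{article.title}: {article.summary}"


def render_for_mobile_card(article):
    # A mobile card uses a different subset of the same components.
    return {
        "headline": article.title,
        "teaser": article.description,
        "points": article.takeaways[:2],
    }


def render_full_page(article):
    # The long-form page rolls every component back into one experience.
    bullets = "\n".join(f"- {point}" for point in article.takeaways)
    return f"{article.title}\n\n{article.description}\n\n{bullets}\n\n{article.body}"


article = Article(
    title="Data and Content",
    description="Why customer data and content belong together.",
    summary="Content and data are two sides of one coin.",
    takeaways=["Model content as components", "Relate components to customer data", "Reuse everywhere"],
    body="Full article text goes here.",
)

print(render_for_chatbot(article))
print(render_for_mobile_card(article))
```

Because each component is addressable on its own, relating "the right chunk" to a piece of customer context becomes a lookup rather than a page rewrite.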

Gaja
Right, makes sense.

Cruce
How do you do that in a large enterprise? I know this is something that [A] works on trying to solve all the time, but I'd love to get your thoughts. And as you've progressed through your career, as everybody's moved towards recommender systems, how this evolution is happening. It kind of goes to the bigger conversation about how the landscape for data has been changing and where enterprises are focused today.

Gaja
Right. So if you dial back the clock 15, 20 years ago, the idea of customer service or content management was completely different. And we never, ever thought of mining non-relational data sources like weblogs or telemetry or social media to do anything related with customer service. So for me, what has actually happened is the inflow of non-relational data has been absolutely incredible, unprecedented. And it continues to surprise me every year that we go – how much more data that we are collecting, be it with smartphones, be it every time we touch a button in a smartphone, in an app, whatever it is, right?

Every interaction is now getting traced. So I think the classical, historical, however you want to call it, mindset was the relational mindset. And this is something I can speak to because I was in the relational space for over two decades. And during those days you couldn't have convinced me to store data in a non-relational format because I would always find a way to do it in the relational space.

But then things changed, right? The volume of data increased by orders of magnitude. The types of data that we collect also changed. And it's no longer the right place to put all of those types of data inside a relational table.

So what's happened is as cloud computing evolved, all of the cloud vendors started supporting services that allowed one to store this non-relational data, or what the market calls unstructured data, in different types of data stores. And I think that's been a great technological improvement for us, an enablement that allows us to maintain and manage that data differently, because in the prior world, it was all very, very clunky when you put everything inside the same relational database where the transaction data was.

So enterprises these days are collecting large amounts of data, they're collecting large amounts of non-relational data. We need to sort of get the whole picture of the customer interaction. We can't get slivers of the interaction. "We have to get the whole picture. If you don't use some of these new fit-for-purpose data persistence mechanisms, then I don't think you'll be well served in the area of customer service and customer experience." - @dbperfman

Cruce
You know, relational databases over the last few decades have traditionally required a lot of very specialized skills. The user interfaces are not known for being particularly accessible to business users. It's one thing to know some basic SQL statements and be able to query a database. It's another thing to be a business user working entirely in a UI. It's another thing to be different levels of data engineer.

I'm really curious to see how you think the movement toward different kinds of non-relational architectures and cloud-based solutions and BI solutions are helping to make data more accessible to business users who are not necessarily highly trained. Are we getting closer to an era where we can use our data with more facility and less knowledge friction around managing the data?

Gaja
I believe we have. In my own experience in the past, I would say year and a half in one of the projects that we delivered, there was this concept of making the data available to everyone. We understood going in that our user population was very varied. They came across different segments of the business and different levels of technological experience and expertise.

So when the data story was sorted in the sense that we got the data cleansed and integrated and all that, we basically said that the UI is going to be of three levels. And let's call it for the lack of a better word, a novice user, a medium user and an expert user. And the way we built the interface was that the novice user basically wanted to see more of insights, which means that if that user was able to click and navigate through the UI, then that was good enough.

They did not necessarily want to go and start doing whatever analysis and start meshing two different types of data to see what comes out of it. So in those cases, the visualization layer lent itself reasonably well. We then took the medium user. The medium user basically has a very good understanding of the business, but also knows enough on the technology side, on the data side. So they have some tribal knowledge about the data.

In that scenario, what we did was we actually built a very simple portal whereby we say, "Hey, these are the datasets available to you. What do you want to do today?" So they say, "OK, I want to pick a customer and then I want to pick transactions." And we behind the scenes sort of did the relationships and the so-called join, if you will.

Because we visually said, "You tell us what you want and we'll tie it together and you pick which columns you want." Because for them we didn't say columns, we said these are the data elements that you have in a given asset and you can put them together and click on submit. And lo and behold, now you have a dashboard that you have built which is customized to your needs.

And the third type of user, which is the so-called expert user, for them, it was more of, you know, "Hey, give me something that is equivalent to a SQL interface where I can write the queries. And then once I write the queries I push a separate button, then go ahead, use the same visualization layer to expose that."

And so we used three layers of visualization, where we said, we have a canned set of dashboards that's for the novice user. We have a set of dashboards that can be generated on the fly for the medium user, using just business language and data glossary terms. And then the third user basically was talking about rows and columns and WHERE clause predicates and aggregate functions and join conditions and stuff like that.

So I think that's how one has to look at it, because when you deliver something, the consumption layer, how you build that consumption layer, and the ease with which people can actually use your system and use the data are extremely important.
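The "medium user" portal described above, where the user picks data elements in business language and the system builds the join behind the scenes, could be sketched roughly as follows. This is a minimal illustration using Python's built-in `sqlite3` module; the glossary terms, table names, and the single hard-coded relationship are all assumptions for the example, not details of the project Gaja describes.

```python
import sqlite3

# Hypothetical data glossary: business term -> (table, column).
GLOSSARY = {
    "Customer Name": ("customers", "name"),
    "Transaction Amount": ("transactions", "amount"),
    "Transaction Date": ("transactions", "txn_date"),
}

# Hypothetical relationship the portal already knows, so the user never writes a join.
JOIN = "customers JOIN transactions ON customers.id = transactions.customer_id"


def build_query(selected_terms):
    """Translate glossary terms picked in the UI into a SQL query."""
    columns = []
    for term in selected_terms:
        table, column = GLOSSARY[term]  # unknown terms raise KeyError
        columns.append(f'{table}.{column} AS "{term}"')
    return f"SELECT {', '.join(columns)} FROM {JOIN} ORDER BY transactions.id"


# Tiny in-memory dataset standing in for the integrated data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, txn_date TEXT
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO transactions VALUES
        (1, 1, 250.0, '2020-01-05'),
        (2, 2, 99.0, '2020-01-06'),
        (3, 1, 40.0, '2020-01-07');
""")

query = build_query(["Customer Name", "Transaction Amount"])
rows = conn.execute(query).fetchall()
print(rows)  # [('Acme', 250.0), ('Globex', 99.0), ('Acme', 40.0)]
```

Only terms from the glossary can reach the query, so the user works in business language while the SQL stays an implementation detail. The expert tier would simply skip `build_query` and submit SQL directly to the same visualization layer.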

Cruce
Yeah, I am very interested in this multi-tiered approach to user interfaces and I think our listeners will be as well. I think we can certainly learn from that in the software business in general. Regardless of whether it's a data application, a lot of our clients in the technology space have user experience layers that are fairly monolithic.

And what I think you're proposing here is almost customized or personalized UI. I really like that because it moves kind of closer towards what I would call an intelligent customer experience where the actual user interface is adaptive to the customer's context, which could be a maturity level within their own ability to work with the application. So the application itself may evolve based on their needs for that UI.

Gaja
Absolutely. And I think there is one additional point there, which you sort of brought up, and that is how do we deal with the user populace that has certain constraints? Which means that if someone is visually challenged, how do you cater to that person versus somebody who is visually challenged from the perspective of they can't see color? So they can see only different shades of grey.

It's very important to factor all of that in in your UI design. And the good news is these days in the cloud space, there are many, many options for us to actually take text translated into voice, do a sort of runtime rendition of what is on the screen dynamically, which helps different types of users utilizing that data. And I think that's equally important.

Cruce
The idea that accessibility is a function of modular architecture is I think kind of one of the more important aspects of this movement towards intelligent systems. We have to have modular components in order to make content more relevant and accessible.- @mrcruce The accessibility is a broad term, it can mean accessibility for business levels or interaction levels.

So it could be the demand for that UI to do things based on expertise, or it could be a UI adaptation based on the needs of the individual consumer or user and their abilities to consume that information in either visual form, auditory form or adapted to different kinds of screen reader applications. All of that becomes more modular and more consumable when we take our interfaces and look at them as an assembly built out of components.

Gaja
Yes, indeed.

Cruce
As opposed to monolithic renderings in a fixed space.

Gaja
Yes, absolutely. I absolutely subscribe to your point of view. The days of creating monolithic applications should be done and gone and dusted. We have to create components and make sure these components come together and mesh differently for different people, for different users. - @dbperfman We cannot keep reinventing the UI wheel every single time. You're absolutely on point on that.

Cruce
There's a lot of compatibility between the content and data worlds, and I'm glad we're exploring some of those dimensions in this conversation. Switching gears a little bit, you've written extensively and spoken about serverless data hubs. I know it's an area you're certainly very passionate about.

Can you describe for listeners not familiar with what a serverless data hub is, what that is, and what opportunities are you seeing within the serverless space? What do they allow that was not allowed before them?

Gaja
I think I'll talk about what serverless is and then we can talk about the data hub part as a follow up to that. So basically serverless computing doesn't mean that there are no servers, it just means that there are no servers when your system is idle. So the concept is pretty simple. If you have a piece of a program that needs to run, there are two ways you can deploy it today. You can deploy it on a regular server, on a virtual machine, for example.

It'll run, and let's say it runs for 10 minutes, and then for the rest of the hour it doesn't do anything because there's no activity. For that hour you've paid for 50 minutes of idle time for that 10-minute run. In the serverless world, what happens is that when you call the program, a micro container, what you might call a dynamic virtual machine, gets launched with the code. The code runs; again, it runs for 10 minutes.

And then once it's done, the container goes away. So what that means is you don't pay for that 50 minutes of idle time. That is one of the most important aspects of serverless computing. The other aspect is that from an operational cost perspective, there are no servers to manage. So basically what that means is you can focus more on your core competencies. If you are in the financial services industry and you're a bank, you focus on banking, not on managing servers. If you are in healthcare, you focus on healthcare-related issues rather than managing servers.

Let me back up a little bit. When cloud computing came about, everybody was talking about how it will reduce capital costs and replace them with operational costs. You know, the whole buzzword of CapEx replaced by OpEx. Well, that's great if you're starting on the cloud journey, but in the long run, you want to reduce your operational costs also. And the only way you can reduce your operational costs is if you don't pay for idle time.

And if you don't pay for the management aspect of hundreds of servers in your enterprise. And the only way to do that is with serverless computing. Let's ask the question then. Why don't we put everything on serverless? Well, the problem is that the serverless computing realm is relevant for workloads that have burst activity for some time with pockets of idle time splattered across the day. Which means that if you have an application that is busy 24/7 you'll be ill served by putting it in a serverless platform because you will pay way more for it.

You're better off just doing the traditional thing: getting a virtual machine, installing, patching, upgrading and maintaining it, and paying for the operational costs. But if you have a workload where out of a 24-hour time period the workload is really active only for eight hours, then that's a very good use case for serverless because then you can say, "I do not have to pay for the remaining 16 hours."
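The arithmetic behind that trade-off can be sketched in a few lines of Python. The two prices below are invented round numbers, not any cloud vendor's rates, and the model deliberately ignores staffing and management costs; it only illustrates why a bursty workload favors serverless while a 24/7 workload does not.

```python
# Assumed, illustrative prices in cents (not real vendor pricing).
SERVER_CENTS_PER_HOUR = 10             # billed every hour, busy or idle
SERVERLESS_CENTS_PER_ACTIVE_HOUR = 25  # billed only while code is running


def daily_cost_server():
    """An always-on server is billed for all 24 hours."""
    return SERVER_CENTS_PER_HOUR * 24


def daily_cost_serverless(active_hours):
    """Serverless is billed only for the hours the workload actually runs."""
    return SERVERLESS_CENTS_PER_ACTIVE_HOUR * active_hours


def break_even_active_hours():
    """Active hours per day above which the always-on server becomes cheaper."""
    return daily_cost_server() / SERVERLESS_CENTS_PER_ACTIVE_HOUR


print(daily_cost_server())           # 240
print(daily_cost_serverless(8))      # 200: bursty workload, serverless is cheaper
print(daily_cost_serverless(24))     # 600: busy 24/7, serverless costs far more
print(break_even_active_hours())
```

With these assumed rates the crossover sits a little under ten active hours a day. The real crossover depends entirely on actual pricing and on how much server management costs the organization, which this sketch leaves out.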

So the workload defines whether or not we want to go serverless. Now, why am I so passionate about serverless data hubs? Let's back up that truck a little bit and say, I'm very passionate about data integration. I believe data integration is analogous to the motto of our great nation, e pluribus unum, which means out of many, one. - @dbperfman

I feel that the need to bring different sources of data together and provide a single source of the truth is fundamental for every enterprise. - @dbperfman Whether you're for profit or non-profit, it doesn't really matter. If you're dealing with data, you need to make sure that you have a single source of the data truth. So a data integration hub is a single location where data is harmonized, it's canonicalized, cleansed, standardized and checked for adherence to business rules, compliance rules.

And you want this in a single location because what you do not want is a customer experience where they talk to the transactional side of the business and they get one story and then they talk to another business unit and they get a different story. And this has happened many, many times with many people. I believe that more and more organizations will start utilizing serverless for the workload characteristics that I just mentioned before.

But I think it's important that we start looking at it for data integration hubs, because data integration hubs are a classic case. They're almost, what do you call it, a poster child for serverless because they are not something that's busy 24/7 having thousands of API calls. They basically have burst activity, pockets of quiet time or idle time, some more burst activity, pockets of idle time. And that sort of pattern just continues day after day.
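As a small illustration of what "harmonized, canonicalized, cleansed, standardized and checked for adherence to business rules" can mean in practice, here is a hedged Python sketch. The two source systems (a CRM and a billing system), their field names, and the two business rules are all hypothetical; a real hub would carry far more rules and sources.

```python
# Hypothetical lookup used to standardize country names to codes.
COUNTRY_CODES = {"united states": "US", "canada": "CA"}


def from_crm(rec):
    """Canonicalize a record from an assumed CRM export."""
    return {
        "customer_id": rec["CustID"].strip().upper(),
        "email": rec["Email"].strip().lower(),
        "country": rec["Country"].strip().upper(),
    }


def from_billing(rec):
    """Canonicalize a record from an assumed billing system."""
    return {
        "customer_id": rec["customer"].strip().upper(),
        "email": rec["contact_email"].strip().lower(),
        "country": COUNTRY_CODES.get(rec["country_name"].strip().lower(), "UNKNOWN"),
    }


def passes_rules(rec):
    """Illustrative business rules: plausible email and a known country."""
    return "@" in rec["email"] and rec["country"] != "UNKNOWN"


def integrate(crm_records, billing_records):
    """Build a single source of truth keyed by customer_id."""
    hub, rejected = {}, []
    canonical = [from_crm(r) for r in crm_records] + [from_billing(r) for r in billing_records]
    for rec in canonical:
        if not passes_rules(rec):
            rejected.append(rec)  # set aside for review rather than silently dropped
            continue
        hub.setdefault(rec["customer_id"], rec)  # first valid record wins
    return hub, rejected


crm = [{"CustID": " c-1001 ", "Email": "Pat@Example.com ", "Country": "us"}]
billing = [
    {"customer": "C-1001", "contact_email": "pat@example.com", "country_name": "United States"},
    {"customer": "c-2002", "contact_email": "not-an-email", "country_name": "Canada"},
]

hub, rejected = integrate(crm, billing)
print(hub)
print(rejected)
```

After canonicalization the CRM and billing records for the same customer are identical, so whichever business unit the customer talks to is reading the same story.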

Cruce
So any kind of persistent API service would not run on a serverless environment?

Gaja
You can run it. But my point is, if you look at the cost over a period of time, you may actually see that it actually costs more to do that than to go the server route. Now, you may do it still just because of the fact that you do not have enough staff to manage servers or to patch operating systems. But that's a decision point where you basically said it's more for the ease of operation, not necessarily for saving costs.

And that's a valid reason to do that, too, because hey, there's nothing to manage. Once your program is finished, it's gone. But if you have an application that's sort of generating 500 API calls a second 24/7 by 365, you will see that if you deploy that on serverless, that your costs are going to be much, much higher than on a server platform.

Cruce
Yeah. Architectures are just in a state of major change right now from what seemed to be a fairly steady state for some time after cloud infrastructures became more the norm. Can you tell me a little bit about your vision for the most important aspects of changes in cloud architecture today and the ones you anticipate coming up soon? Let's give our audience a little bit of your vision there.

Gaja
Well, I think it goes back to your previous comment about making sure that we architect for components, and that falls really in line with this whole concept of microservices architecture. And microservices, if you look at it, I mean, we've done the concept of microservices for decades. Right?

Those of us who went through any sort of standardized software engineering discipline, be it a computer science degree in school or whatever else, always knew that we took a big problem and broke it down into smaller problems. We made sure that we had subroutines, we had functions that did a specific task.

So we did that. Now, what cloud computing has done is take that to a different level, because in this whole serverless realm, you have to architect it in such a way that it is completely decoupled and asynchronous. The container goes away after your program is finished, which means that you somehow have to communicate with the next part of the program without a direct connection between the two containers, because you can't have one. Nobody knows which container is going to be where and when, because these containers don't have any names that you can access them with. So they're not like regular servers.

So the whole concept of an event-driven architecture comes into play. And that is incredibly powerful because if you ask anyone who's worked in the space of performance, they'll tell you that coupling is bad. Anytime you tie two things together, you're creating a weak point in the system because now when either one of these two things starts to misbehave, it affects both. - @dbperfman So the whole idea of decoupled architecture is possible with microservices and, in the serverless realm, dynamically launched micro containers.

So this whole idea of when do I want a piece of code to run? I want a piece of code to run on a trigger, and that trigger is an event. So rather than me saying I'm going to call this program, we are not going to do that anymore. Program 1 doesn't call program 2, which calls program 3 and function 4. That doesn't happen anymore.

Program 1 does something and the act of doing that is captured as an event. And that event triggers program 2 and so on and so forth. So from an architecture standpoint, that needs to be the way we do it. And if you look at the data space itself, and I had this conversation just this week, actually, somebody was saying, "Oh, we are working with NoSQL databases. Someone told us that we don't have to do any modeling with NoSQL databases." That is the farthest thing from the truth.

The way we modeled data in relational was different from the way we model data in NoSQL, but that doesn't obviate the need for modeling. So all of those practices still have to come into play. We still have to do the basics, what we did for many, many years. Of course, we did modeling a different way in the relational space, but that's OK. We just do it differently now.

So I would say: spend time on sound principles of architecture, making sure that we follow the practices that have been established by a specific cloud vendor, making sure you think in components. And I like the quote from the CTO of Amazon, Dr. Werner Vogels, and I'm paraphrasing: "Everything that can fail will fail. Your job is to contain the blast radius." And that is a very significant part. If you build coupled systems, your blast radius is much bigger.

I think it's almost imperative that we all look at how do I build a system where if this one component fails, the system still goes on? And I think that fundamentally is how we will design and deploy applications in the future. - @dbperfman
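The two ideas in this answer, programs that trigger one another through events rather than direct calls and a failure that stays contained to one component, can be sketched together in plain Python. This is an in-process toy under stated assumptions: the event names and the four "programs" are hypothetical, and the in-memory queue stands in for whatever managed queue or event service a real cloud deployment would use.

```python
from collections import defaultdict, deque

subscribers = defaultdict(list)  # event name -> handlers interested in it
event_queue = deque()            # stand-in for a managed queue or event service
log = []
failures = []


def subscribe(event_name, handler):
    subscribers[event_name].append(handler)


def publish(event_name, payload):
    # Publishing only records that something happened; nothing is called directly.
    event_queue.append((event_name, payload))


def program_1(order_id):
    log.append(f"program_1 stored {order_id}")
    publish("order_stored", {"order_id": order_id})


def program_2(payload):
    log.append(f"program_2 validated {payload['order_id']}")
    publish("order_validated", payload)


def program_3(payload):
    log.append(f"program_3 notified customer for {payload['order_id']}")


def flaky_analytics(payload):
    # A component that is down; it should not take the others with it.
    raise RuntimeError("analytics store unavailable")


subscribe("order_stored", flaky_analytics)
subscribe("order_stored", program_2)
subscribe("order_validated", program_3)


def run_until_idle():
    """Deliver queued events; one failing handler never blocks the rest."""
    while event_queue:
        name, payload = event_queue.popleft()
        for handler in subscribers[name]:
            try:
                handler(payload)
            except Exception as exc:
                # Contain the blast radius: record the failure and keep going.
                failures.append((handler.__name__, str(exc)))


program_1("O-1")
run_until_idle()
print(log)
print(failures)
```

Program 1 never references program 2 or 3; it only emits an event. The analytics handler fails, yet validation and notification still complete, which is the behavior a tightly coupled call chain would not give you.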

Cruce
This conversation is just really reminding me how much there is to learn from data systems when we're designing customer experience delivery environments.- @mrcruce The movement towards microservices is happening everywhere at every level of the architectural lifecycle within customer experience. And it's well-established in the data world. It is not well-established yet within the content world, and that is really at the essence of the movement [A] is working to lead in the industry.

It is this discussion of how we move content into a microservice mode along with data, because there's just not wide awareness that content systems need to function in this independent, interdependent modular way, allowing the transit of content objects through an entire ecosystem. I think within the data world, as you've been describing so eloquently, data objects are already today moving in more containerized, modular, microservice-oriented architectures, now with serverless environments and integration data hubs.

I see the content side of the stack also moving in that same direction. It just is going to take moving a lot of authoring management and publishing processes along with it because content has traditionally been done in a big, chunky, ugly way for a long time. I mean, for decades we've been building document-oriented content objects.

Now we're asking people to build modular-oriented content objects. We're asking IT and content technology stakeholders to move the mindset away from documents towards modular components that are representable and reusable at every layer of the stack, including all the CSS classes and interface pieces that have to consume them. They all have to follow the same modular component structure. And so that whole modeling conversation becomes really key to orchestration, doesn't it?

Gaja
Absolutely does. And I think that's where content is different from what it used to be 15, 20 years ago, because the data sources have changed, and the definition of what content is has also changed. Content doesn't come in one shape anymore. It used to come in one shape, and that was called the document. But these days, you have to make sure that you pick up all of the touch points of the customer, all of the various pieces of data.

Data that's coming from an IoT device at a remote drilling station of an oil rig, for example. I think that's where the integration hub comes into play in a big way, because it actually brings all of those disparate sources together and creates a unified object with components at the data layer, which you can then consume with the relevant objects in the rendering layer or the visualization layer, which then gives you the whole picture, the 360-degree view. So yeah, I absolutely agree with you.

Cruce
Wow. It's really a sea change. And what's interesting to me is that data also can be content. I mean, we already consume content that is data all of the time when we're dealing with, for example, election results or the spread of a particularly impactful virus that is impacting our world. All of those things are data-related items that become content.

And so our content experience very much involves data and the interplay between data and content is getting to be more and more integrated from just a customer experience design standpoint.

Gaja
Absolutely. And this is what I mean. I think you and I are coming from different angles to the same exact point. And we are in sync right here. It's a beautiful feeling because there is this harmony and synchronicity here. And that's what I'm trying to tell people. I say, look, when a customer interfaces with you, interacts with you through a telephone call, a video call, a chatbot, a document, image, a photograph maybe they sent during onboarding, the transactions, how they use the web app, the mobile app. There are so many bits and pieces and all of those bits and pieces have content.

I'm so glad that you actually said that, because that is something that most people don't get because they look at data as just a bunch of numbers and characters and they don't look at [data] from the perspective that it's an asset. It's something that has got such valuable insights hidden in it, that unless you bring it together you won't be able to harness it. - @dbperfman

I think gone are the days where IT is looked at as a cost center. I mean, that transition has already started many years ago where people look at IT as a revenue center because most companies today have a competitive edge because of their IT and their software and their programs. But from the data standpoint, that similar amount of focus needs to be brought forth. And there has to be a complete cultural change within every organization to say, hey, guys, data is not something that lives in a database. It's something that drives our business. And we need to treat it as an asset. - @dbperfman

Cruce
Yes. And the supply chains related to that content and that data need to be attended to. I mean, there's just so much waste in those supply chains. I mean, if it's an asset, we have to be able to pay attention to the way that asset is formed, managed, maintained, organized. - @mrcruce And a lot of times those processes are accidental or they're baked into some kind of afterthought P&L line item. And it's not necessarily considered as an organizational competency. Right? The supply lifecycle around content or data.

I'd love to hear your thoughts on how to reduce waste when it comes to wrangling data. I read a blog article you wrote that talked about how data scientists have been spending 50% or more of their time wrangling the data, like transforming it. And that's the exact same thing that [A] sees when it comes to many on the content side of the industry.

There is so much transformation energy from the time it goes from somebody's sketchbook to Microsoft Word to a cloud-based collaboration to eventually an EM instance or a content management system. And then it's represented in three or four different channels or sometimes dozens of different channels, sometimes dozens of variations, all of it manual.

It's insane. We see that same thing in content: more than 50% of valuable creative human beings' time being spent in overhead process. How can we address the resource drain from your perspective, especially in data?

Gaja
Absolutely. I am glad (a) you read the article and (b) you picked up on that point. I was very conservative when I put that 50%, by the way. I talked to one of my customers a couple of weeks ago and literally flat out asked them, "If you were to draw a pie chart, how much time do your data scientists spend cleansing data?" And the answer was more than 75%. - @dbperfman

And I was thinking, OK, this is the right place for me to be because that's the kind of environment where I can add value. If you look at it from the AI standpoint, right. And this is true, whether it's AI or anything else. It doesn't matter. Let's look at it from the perspective of enterprise data consumption, regardless of whether it's for AI or whatever.

And when I wrote the article, I used what's called the runner's analogy. If you're just sitting on your couch and suddenly one day you decide to run the ultramarathon, which is the longest race that a human being has run, you're not going to be very successful because you've had a very sedentary lifestyle. There's not been much management of the various components.

So you need to build a program together. You need to make sure that you get healthy. You have to make sure that you eat right, you hydrate, you rest well and you train. So out of the couch. And literally, there are people who have written blog articles in the running world as to how you can get off the couch and still run a marathon. And that was awesome because I saw this and I said, "This is exactly what's happening in the AI space."

People want to go run the ultramarathon, i.e. go start an AI project, but they don't get the prerequisites done first. It's a classic case of putting the cart before the horse. You have to get your data story right. You have to make sure there is a single version of the data truth in your enterprise. And you don't have 42 different versions of the customer table floating around in 42 different systems which give 42 different impressions of who the customer is.

All of that has to get sorted out first, which means that you have to provision the data, cleanse it, standardize it, harmonize it, and then deliver that data for analytics programs, machine learning, AI, content management, whatever it is. But people don't do that. People basically say, "I need to get this project done. I cannot wait for anything else. I'm going to take the data from the source system and I'm going to go and have at it."

And what they don't realize is they end up spending so much time cleansing the data and that process gets repeated as a one off in every single project and that is where the wastage comes. - @dbperfman So there are two things that happen. You're wasting valuable time that your employees could have spent doing something else, i.e. data scientists should be building models and algorithms, not cleansing data. And the other thing is that you sort of reinvent the wheel for every project.

And I think that is where a colossal amount of wastage comes from, and also the customer experience is bad. OK, let's assume that you did all of that and everything was great. Then, wonderful. But the net result, the customer experience, is also bad because the customer is interacting with one business unit, getting one picture of who he is or who she is. And with another business unit, it's a completely different picture because there's no consistency. There is so much redundancy and duplication.

And I think fundamentally, if you look at it, if you create a single source of the data truth, i.e. a data integration hub, and if you do it in the realm of cloud computing with the serverless model, you've got a lot to gain. And this is not something where you say, "OK, you know what, let's sign an 18-month deal and we'll show you something at the end of 18 months." This is something where we can demonstrate value on a week-by-week basis because it's an iterative model.

You can bring the customer data first. We can say, "Hey, let's run version one of the cleansing." And then we'll show you what it is after the version one of cleansing. And then we do version two and then we bring in transactions or products or whatever. So that kind of methodical evolutionary approach to data and making sure that any time anybody is reading customer data, they're reading it from this hub, because people go through so much of application level integration. They spend millions of dollars on it.

And I tell people, "What is the point if you have 100 applications and you cannot tell me that you have the name of the customer if I give you an identifier and that same identifier is the same customer across the 100 applications?" You cannot do that. Until and unless you get your data story right, that level of integration will not be possible. If you don't get your data integration done, then application-level integration becomes more point-to-point, hardwired connections, and that will not scale in the long run.

So I think that's where fundamentally we have to change. We have to basically say, "Yes, we want to do AI, yes, it's important, but we need to get the prerequisites done first." Let's cleanse our data, standardize it, store it in one place and deliver it to the enterprise from that single location, with the data sourced from multiple systems on the backend. All of a sudden, now you're looking at the data truth.
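
As an illustration of the cleanse, standardize, and store-in-one-place steps Gaja describes, here is a minimal Python sketch. It is not his implementation or any Cloud Data product: the field names (`customer_id`, `name`, `email`), the source system names, and the "first non-empty value wins" merge rule are all assumptions chosen for the example.

```python
from collections import defaultdict


def standardize(record):
    """Cleanse one raw customer record (hypothetical fields):
    trim whitespace, collapse spaces, and normalize case."""
    return {
        "customer_id": str(record["customer_id"]).strip(),
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
    }


def build_hub(sources):
    """Merge records from several source systems into one record
    per customer_id. Assumed rule: the first non-empty value seen
    for a field is kept; later sources only fill in gaps."""
    hub = defaultdict(dict)
    for system_name, records in sources.items():
        for raw in records:
            clean = standardize(raw)
            merged = hub[clean["customer_id"]]
            for field, value in clean.items():
                if value and not merged.get(field):
                    merged[field] = value
            # Track which systems know this customer.
            merged.setdefault("seen_in", []).append(system_name)
    return dict(hub)


# Two hypothetical systems holding different versions of the same customer.
sources = {
    "billing": [{"customer_id": " 42", "name": "ada  LOVELACE", "email": ""}],
    "support": [{"customer_id": "42", "name": "Ada Lovelace", "email": "ADA@Example.com "}],
}

hub = build_hub(sources)
print(hub["42"])
```

Every consumer then reads customer `"42"` from `hub` rather than from billing or support directly, which is the "single version of the data truth" idea in miniature. A real hub would also need identity matching across systems that do not share an identifier, which this sketch deliberately leaves out.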

Cruce
It is a good reminder that this is an ongoing practice. When we're dealing with customer data, any kind of data, we're dealing with customer experiences and any kind of content that powers them. - @mrcruce These are not one-time, push-button, get-off-the-couch-and-run-a-marathon efforts. These are practices that require a set of disciplines around them in order to become valuable, in order to be able to accomplish those goals.

Gaja
Exactly. And the thing is that if you get off the couch, you first should try running around the block, then maybe run a 1k, a 5k, a 10k, a half marathon, then a marathon. Then you try a few marathons and then you go to an ultramarathon. But no, that's not what we want to do. We want to just go run the ultramarathon even though we've been sitting on the couch for years and we expect positive outcomes out of it and that's not going to happen.

And the more people I talk to, they understand it. They say, "Hey, this is logical. I mean, what you're saying is not rocket science." But my question is, why is common sense not common practice? Why aren't we doing this? And I think we need to move towards that, because at the end of the day, I believe that when we do AI, we need to make sure that we engage in responsible AI.

And "responsible AI will not be possible unless you have high data quality." - @dbperfman And so fundamentally, we have to change that mindset that we'll just do broad brush strokes on our data and launch this project because that's our MBO and we get paid our bonus for that. No, no, no. We have to look at it in the long run for the good, for the better good of the universe of humankind and our businesses. We need to do this right.

Cruce
We have covered so much, Gaja. Thank you for your wonderful insights today. I know that because we covered a lot, some of this we're going to turn into bonus material. And I think if anybody's interested in that, take a look at architecture and microservices, abstractions, graph databases and the paths that data organizations can take towards them, AI and the future of machine learning, and how a data consumer and publisher can use and leverage AI to really amplify, accelerate and simplify a lot of the operations around data.

All of those topics and more can be seen in a follow-up set of bonus materials linked in this podcast description on simplea.com. Gaja, thank you so much for your time today. I really appreciate the insight, vision and clarity you've brought to this connection between content, data, customer experience, architecture and the future of this world we are all participating in and co-developing at the same time, making it a little bit smarter, one step at a time.

I'd love to ask you to give our listeners some insight into how to learn more about your work, your company, any kind of articles that they might be interested in, and we'll make sure that we link to those in the podcast description as well.

Gaja
Cruce, thank you first of all for having me on this podcast, and I truly appreciate all the kind words. In Star Wars parlance, I'm still a padawan. I'm still learning the craft after 28 years, still learning something new every day. I'm very humbled that I have had the opportunity to share some of my thoughts with you. Yes, we had a very, very interesting conversation touching all sorts of topics. With regard to what people can find, well, you can connect with me on LinkedIn, that's Gaja Vaidyanatha.

And I have published a few LinkedIn articles about the subject of data integration, about how I myself made my journey from Oracle to NoSQL, and how to architect a data integration hub on AWS, and I recently published an article on the cost savings of running on a serverless platform versus a server-based platform. You can also visit us at cloud-data.biz, and that's the website of the company.

And we'd love to hear from you, we'd love to interact with you. And if there is any way we can be of service, we cherish that opportunity. And our goal at the end of the day is to make sure that you get value from your data and you get a single version of the data truth in your organization. Thanks again for inviting me on this podcast and I appreciate your time and effort in this.

Cruce
Thank you, everybody, for listening in on this wonderful conversation with Gaja Vaidyanatha, which is spelled V A I D Y A N A T H A so you can find him. So look up Gaja's work and find him on Twitter, on LinkedIn and connect via the website. Thank you, Gaja, very much for your time today and to all of our listeners. One step at a time towards a smarter world.

Gaja
Thanks so much.

Dynamic Data-Driven Customer Interactions with Big Data

[A] Podcast #19

Interview With Gaja Vaidyanatha

Bio

Resources

Follow Gaja Vaidyanatha on social media:

And follow the latest from CloudData LLC:

Podcast Bonus Material

Transcript

Highlighted Quotes

Related Resources

[A] Treasury

Whitepapers & Articles by Gaja Vaidyanatha: