With all of the data that companies accumulate, it's a struggle to find an effective cloud storage repository to not only hold and manage all of that information, but to enable search and security capabilities as well. Fortunately, cloud platform vendors such as IBM, which offers IBM Cloud for Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) scenarios, are actively working on new ways to manage data in multicloud architectures.
What Is a Multicloud Architecture?
A multicloud architecture consists of data and code stored in multiple cloud environments within a single architecture. Imagine an application that uses code and resources across multiple clouds, such as Amazon Web Services (AWS), IBM Cloud, and Microsoft Azure. Using interoperability standards that are still evolving, multicloud architectures let software services work together no matter which clouds they run on. This lets you tailor your cloud resources so they more specifically target your workloads.
Small to midsize businesses (SMBs) should consider a provider that can help manage the infrastructure of multiple cloud services and keep them secure and organized in a single console. Even better is one that can combine third-party cloud services, such as Microsoft Office 365, with resources you have running on your own virtual servers in another cloud. A public cloud may be appropriate for one app and a private cloud for another. SMBs will benefit from the cost-effectiveness and agility that a multicloud architecture provides.
Multicloud and IBM
From a multicloud standpoint, it's been a busy year for IBM. In May, it launched IBM Cloud Private for Data to let companies extract hidden insights from their data across disciplines such as data engineering, data science, and development as well as their apps and databases. Then, on September 10, the company announced that IBM Cloud Private for Data would integrate with Red Hat OpenShift, the open-source container and Kubernetes app platform. Kubernetes is an open-source platform for running containers across clusters of servers. This integration with Red Hat gives more options to companies when running cloud-native workloads so they can run on-premises, in public and private clouds, and in the open-source Red Hat OpenShift environment. IBM will also extend its partnership with Hortonworks, a Big Data software pioneer, to integrate services in Hortonworks DataPlane with IBM Cloud Private for Data.
Finally, on September 13, IBM announced that it would let users query analytics across the enterprise by using a tool called Queryplex, which is a single console for searching across clouds. That same day, IBM held an event at Terminal 5 in New York City hosted by ESPN's Hannah Storm to spotlight customers that are taking on the artificial intelligence (AI) challenge. Shortly before the event, PCMag caught up with Rob Thomas, General Manager of IBM Analytics, to get his take on how the new cloud search capability works, IBM's work with Red Hat, and some winning strategies in AI.
PCMag (PCM): How does IBM Cloud Private for Data let you see all of your data?
Rob Thomas (RT): Think about it as the console for how a client manages data anywhere across any cloud. If clients are using that, then they can see all the data they have on premise, in a private cloud container architecture, or they can see data they have on AWS, Microsoft Azure, Google Cloud Platform, or IBM Cloud. It's a single console for understanding all your data: where it is, cataloging your data and organizing it.
PCM: What is Queryplex and how can SMBs use something like that to search across clouds?
RT: Queryplex gives you the ability to really write a Structured Query Language (SQL) query and find data anywhere in the world and do analytics. With this wide-angle SQL capability, you don't have to move the data. We'll find the data wherever it is and we'll enable it. We can use the processing power on the edge and then provide the analytics back to a single place. So, those are two sides of the same coin. One is a console for managing all your data. The second piece is about how do you actually do analytics on data that's anywhere without having to move the data as Step 1, because moving the data is costly; it's time-consuming. So, we basically eliminated the need for data movement, which is super powerful.
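To make the "don't move the data" idea concrete, here is a minimal Python sketch of a federated aggregate. It is not Queryplex or any IBM API. The in-memory SQLite databases, the `readings` table, and the function names are all assumptions made up for illustration. Each stand-in "site" runs the SQL where its data lives and returns only a sum and a count, which a coordinator then combines.

```python
import sqlite3

def make_site(rows):
    """Create an in-memory SQLite database standing in for one remote data source.

    Hypothetical schema: readings(vehicle_id, engine_temp).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (vehicle_id TEXT, engine_temp REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn

def local_partial(conn):
    """Run the aggregate where the data lives; only two numbers leave the site."""
    total, count = conn.execute(
        "SELECT COALESCE(SUM(engine_temp), 0.0), COUNT(*) FROM readings"
    ).fetchone()
    return total, count

def federated_average(sites):
    """Combine per-site partial results into one global average."""
    partials = [local_partial(conn) for conn in sites]
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

# Two stand-in sites, e.g. data held in two different clouds.
sites = [
    make_site([("car-1", 90.0), ("car-2", 94.0)]),
    make_site([("car-3", 101.0)]),
]
average_temp = federated_average(sites)
print(average_temp)
```

The point of the design is that the raw rows never cross the network in this sketch. Only the small partial results do, which is why skipping the data-movement step can save time and cost.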
PCM: What would be a day-to-day example of a company using this type of query capability?
RT: A good one would be an automotive company that's doing telematics to do predictive maintenance on an automobile or [to see] how it's performing. Today, the approach would be to connect to the car and then bring data back to a central location. Running the query where the data lives gives you the real-time capability instead. So, what was 30 days before is now 30 seconds. That's the power of doing this; it just totally changes the nature and the process of analytics.
PCM: What are the security implications of searching across multiple clouds? How do you opt in to allow that type of search?
RT: We designed Queryplex as an enterprise product that will take advantage of whatever an organization has established around Lightweight Directory Access Protocol (LDAP) security and identity management protocols or data-governance policies. Let me give you an example: If your company policy is that anytime you do federated queries that you don't want to touch any Personally Identifiable Information (PII), then we could mask that data as part of this capability so that it wasn't part of it. We really designed it to integrate into the security architecture of a company.
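The masking policy Thomas describes can be pictured with a short Python sketch. This is an illustration under stated assumptions, not how Queryplex implements it. The `PII_COLUMNS` policy, the column names, and both functions are hypothetical. The idea is simply that a governance rule is applied to every row before results from any source are returned.

```python
# Hypothetical governance policy: columns treated as PII in federated results.
PII_COLUMNS = frozenset({"name", "email"})

def mask_row(row, pii_columns=PII_COLUMNS, mask="***"):
    """Return a copy of the row with every PII column replaced by a mask."""
    return {
        column: (mask if column in pii_columns else value)
        for column, value in row.items()
    }

def run_federated_query(sources, pii_columns=PII_COLUMNS):
    """Gather rows from every source, masking PII before anything is returned."""
    results = []
    for rows in sources:
        results.extend(mask_row(row, pii_columns) for row in rows)
    return results

# Two stand-in sources, e.g. result sets from two different clouds.
cloud_a = [{"name": "Ada", "email": "ada@example.com", "region": "EU"}]
cloud_b = [{"name": "Lin", "email": "lin@example.com", "region": "US"}]

masked = run_federated_query([cloud_a, cloud_b])
print(masked)
```

Applying the mask inside the query path, rather than leaving it to each caller, means the policy holds no matter which cloud a row came from.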
PCM: What would a company need to do to allow access to different clouds?
RT: When you're in IBM Cloud Private for Data, you get installed very quickly. In terms of connecting to a different cloud, it's just knowing the IP address. That's pretty straightforward; you can do that. So the connectivity piece is not hard. Where I think it gets harder for companies is that, as you're advancing more toward AI or data science-type use cases, you need to build a model for that. You need to train that model, and we're able to help you organize the data to do that.
PCM: What are a couple of key strategies for companies to implement AI or machine learning (ML)?
RT: A few different things. I see some clients that establish data science Centers of Excellence (COE). I think that could be a good way to energize the organization on the topic and get things moving. I think that's one good approach.
We see other clients that hire a Chief Data Officer (CDO) and give that person the mission of driving the company in this direction. I think that's good, too.
Third, I see a lot of companies that rely on this to come from lines of business, meaning the line of business finds the use case, and then that drives the technology innovation. I think any of those can work.
I think the biggest gap and what I encourage clients to do is to have a data strategy. Part of a data strategy is knowing where you are today. Meaning, are you really just doing business intelligence (BI) and data warehousing or are you actually doing self-service analytics? Understand where you are and then understand the end point. If you get clarity on those two points, then you can launch experiments through data science COEs, a CDO, or through a line of business, knowing that you'll get a level of repeatability out of those, which is important.
PCM: What led IBM to work with Red Hat?
RT: If you go back to 2000, IBM's been a pretty huge proponent of Linux. I'd argue that Linux probably wouldn't be where it is today without IBM's support. Because of that, we've always had an ongoing dialogue with Red Hat around innovation and how we support the ecosystem. We've been watching what Red Hat has done with OpenShift.
We're huge believers in containers, and Kubernetes as a way to help clients modernize apps and data estates. If you look at Red Hat with OpenShift, they built a great container platform that's focused on modernization. But they don't have anything for data, and it's hard to modernize apps without modernizing data at the same time.
Where we can bring what we've done in terms of modernizing data services with IBM Cloud Private for Data is to run that natively right on OpenShift, so those clients that are on an application modernization journey can do the same thing with data, and they can turn that project into outcomes for AI.
Hadoop has not yet moved to a microservice architecture, so that's the other piece of the puzzle. We're working with Hortonworks to help modernize Hadoop and create microservices from it that can play along with IBM Cloud Private for Data and OpenShift.
PCM: How do companies use that type of microservice architecture?
RT: I think it all comes back to AI and data science. Whatever you're doing with data is typically driven around a business outcome. You're looking for some advantage in terms of how you're using analytics.
So, if you've got a lot of your data in Hadoop, if you're not able to use that for predictive analytics, ML, or data science, then it's not very valuable to the organization. That's how I connect the dots. With Hadoop as a microservice, it's a lot more composable, a lot more flexible. It's easier to work with the data, and it's easier to make it available to a large data science team. And that enables you to get more value out of your Hadoop implementation.
PCM: Where do you see things going in the future as far as AI and ML?
RT: We're going to slowly enter the mainstream. A year ago, the discussion was, "Could I do anything?" I would say this has been the year of increased experimentation. I think next year we get into mass experimentation and hopefully, by the end of next year, we're at a point where this becomes more mainstream. People are using AI and models to automate a lot of basic business processes, to automate a lot of decision making. So, we're clearly on that journey. You can see the progression. I feel like we're getting close to a tipping point, if you will, but we're not quite there yet.