Great Cloud Migration Opens Data Opportunities

Big Data IoT Forum | June 16, 2019

By Alex Woodie

We’re in the midst of a massive cloud migration at the moment, as companies look to take advantage of the scalability and simplicity of storing and processing data in the cloud. But connecting the dots between on-premise data sources and AWS, Google Cloud, and Azure repositories is easier said than done. That’s where a handful of vendors, from Attunity to WANdisco, are looking to make their mark.

Attunity’s specialty is change data capture (CDC), the process of capturing changes to databases and other sources in real time and moving them somewhere else. Several years ago, Attunity’s CDC tool was primarily used to move large amounts of transactional data housed in relational databases into big analytical databases, like Netezza, Teradata, and Vertica.
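To make the idea of a change record concrete, here is a minimal Python sketch. It is not how Attunity's product works: production CDC tools typically read the database's transaction log, whereas this example swaps in a much simpler snapshot comparison. The `ChangeEvent` type and the `diff_snapshots` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    op: str                 # "insert", "update", or "delete"
    key: Any                # primary key of the affected row
    before: Optional[Row]   # row image before the change, if any
    after: Optional[Row]    # row image after the change, if any


def diff_snapshots(old: Dict[Any, Row], new: Dict[Any, Row]) -> List[ChangeEvent]:
    """Derive change events by comparing two keyed snapshots of a table.

    This is a simplification: it misses intermediate changes that happen
    between snapshots, which log-based capture would see.
    """
    events: List[ChangeEvent] = []
    for key, row in new.items():
        if key not in old:
            events.append(ChangeEvent("insert", key, None, row))
        elif old[key] != row:
            events.append(ChangeEvent("update", key, old[key], row))
    for key, row in old.items():
        if key not in new:
            events.append(ChangeEvent("delete", key, row, None))
    return events
```

Each event carries enough information (operation, key, before and after images) for a downstream system to replay the change against its own copy of the data.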

But these days, a lot of the company’s business is moving data into cloud data warehouses from the Big Three – Amazon’s Redshift and Athena, Google BigQuery, and Microsoft Azure Cloud SQL Server – not to mention upstarts like Snowflake and Databricks, says Dan Potter, a VP with the company.

“We see the bulk of our business in the cloud space,” Potter tells Datanami in a recent interview. “Those environments are the vast majority of where people are moving data and processing data.”

In many cases, companies already have an on-premise data lake, perhaps running on Apache Hadoop, but they’re looking to expand their data storage and processing capabilities out to the clouds in a hybrid manner, Potter says. That’s where Attunity, which was acquired by BI vendor Qlik for $560 million in February, comes in.

“As people move to things like Snowflake, they realize that they still have the issue of doing all of the ETL, all of the modeling of the warehouse, all the creation of the warehouse scripts, maintaining that warehouse,” Potter says. “We fully automate that end-to-end.”

While the cloud vendors provide the various pieces of the big data puzzle – the object stores for cheap and massive storage, the distributed SQL query engines for advanced analytics, and various machine learning and AI capabilities for building predictive and intelligent applications – it’s still largely up to the customer to put it all together.

“All of these things are multi-stage processes,” Potter says. “The first stage is, I need to capture changed data from the source system. It may be mainframe or SAP or something else. I move that stuff into the cloud, move it into cloud object store. Then I need to assemble all of those change files in the data lake so it’s available for direct analytics from that repository. Or from that, I might take a snapshot or build out historic data sets that I can then move into a cloud data warehouse or move into Databricks for processing.”
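The "assemble the change files" stage Potter describes can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Attunity's implementation: the record layout (`seq`, `op`, `key`, `row`) and the `assemble` function are hypothetical, and in-memory lists stand in for change files landed in a cloud object store.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def assemble(change_files: Iterable[List[dict]]) -> Dict[Any, Row]:
    """Replay change records from several files into one keyed snapshot.

    Assumed record layout (hypothetical): {"seq", "op", "key", "row"},
    where "seq" is a sequence number that is unique across all files.
    """
    # Flatten all files, then restore the original order of the changes.
    records = [rec for batch in change_files for rec in batch]
    records.sort(key=lambda rec: rec["seq"])

    snapshot: Dict[Any, Row] = {}
    for rec in records:
        if rec["op"] == "delete":
            snapshot.pop(rec["key"], None)
        else:
            # Insert or update: the latest row image for a key wins.
            snapshot[rec["key"]] = rec["row"]
    return snapshot
```

Ordering by sequence number matters because change files may arrive or be listed out of order. The resulting snapshot is the kind of current-state data set that could then be queried directly or loaded into a warehouse.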

The cloud vendors do provide some tools for moving data. Amazon provides Kinesis, a Kafka clone, as well as Glue, for writing ETL scripts. But Amazon doesn’t make it easy for companies to utilize other vendors’ solutions, such as Databricks (which runs on Azure and AWS) or Snowflake (which also runs on Azure and AWS and soon Google Cloud). Attunity wants to be the trusted Switzerland to help companies connect the dots in the cloud.

“All that orchestration needs to be handled by someone,” Potter says. “Being an independent provider like we are, we’re the ones that can go in and stitch it all together.”

WANdisco is also aiming to get a piece of the emerging cloud migration pie. Today the San Ramon, California, company announced LiveMigrator, which it says will allow companies to migrate data from on-premise sources to the cloud in a single pass, while maintaining data availability and cohesiveness.

The company, which has its roots in Hadoop disaster recovery and Hadoop data replication with its NonstopHadoop offering, built its new LiveMigrator offering on its existing Fusion technology. That software was launched four years ago as a higher-layer abstraction level to enable companies to replicate data housed in Cloudera and Hortonworks clusters to other data stores, including S3, EMC Isilon, MapR, and Teradata.
