Fivetran launches a managed data lake service

Fivetran, the company best known for helping businesses build their data pipelines, on Tuesday announced the public availability of its Managed Data Lake Service.

The new service promises to take the tedious work of maintaining data lakes off customers' hands by automating and simplifying it, freeing them to concentrate on building products on top of this data. The service currently supports Amazon S3, Azure Data Lake Storage (ADLS), and Microsoft OneLake, with Google Cloud support on the roadmap.

Fivetran has historically supported only data warehouses, which are usually used to store structured, relational data for analytics and business intelligence (BI) applications. Data lakes, on the other hand, store both structured and unstructured data from diverse sources, primarily for real-time analytics and machine learning tasks. Databricks has helped to popularise the idea of the lakehouse, which seeks to create a unified data repository combining the best of both worlds.

“The idea is that we’re bringing the scalable infrastructure that we’ve delivered to BI for the last nine years to AI and the whole workload environment,” Fivetran co-founder and COO Taylor Brown told me.

The Managed Data Lake Service takes data from Fivetran’s existing 500+ connectors, normalizes and deduplicates it, and writes it to one of the supported data lakes in either the Delta Lake or Apache Iceberg table format. Once the data is in the lake, users can work with it using their preferred compute engine, such as Databricks, Snowflake, Starburst, or Redshift, to operationalize it or move it to a machine learning platform to power their new AI applications.
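To make the normalize-and-deduplicate step concrete, here is a minimal Python sketch. It is an illustration only, not Fivetran's implementation: the function names, the `id` key column, the `updated_at` version column, and the sample rows are all hypothetical.

```python
from typing import Any


def normalize(record: dict[str, Any]) -> dict[str, Any]:
    """Trim and lower-case column names so records from different sources line up."""
    return {key.strip().lower(): value for key, value in record.items()}


def deduplicate(records, key="id", version="updated_at"):
    """Keep only the most recent version of each record, identified by `key`.

    Assumes `version` holds values that sort chronologically,
    such as ISO 8601 date strings.
    """
    latest: dict[Any, dict[str, Any]] = {}
    for record in map(normalize, records):
        current = latest.get(record[key])
        if current is None or record[version] >= current[version]:
            latest[record[key]] = record
    return list(latest.values())


# Hypothetical rows: the first two describe the same customer at different times.
rows = [
    {"ID": 1, "Updated_At": "2024-06-01", "Plan": "free"},
    {"id": 1, "updated_at": "2024-06-03", "plan": "pro"},
    {"id": 2, "updated_at": "2024-06-02", "plan": "free"},
]
clean = deduplicate(rows)
```

In a real pipeline, the cleaned rows would then be written out in a table format such as Delta Lake or Apache Iceberg rather than kept in memory.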

“Fivetran has only really supported the data warehouses, and certainly some customers use those tools as data lakes, but we have a lot of customers requesting that we support more of the Iceberg and Delta Lake formats in data lakes, particularly the larger customers,” Brown said.

Brown told me that many of the customers who used the new managed service during its trial period had found they were building identical pipelines to load their data into both data lakes and data warehouses.

One issue with data lakes is that it can be challenging to ensure users access only the data they need. Fivetran stressed in Tuesday’s announcement that the service integrates with existing data catalogs and governance systems such as AWS Glue, Databricks Unity Catalog, and Microsoft Purview.

“Fivetran’s support for Delta Lake as a direct destination excites us greatly,” said Databricks Product Director Himanshu Raja. “With this new functionality, customers can now use Fivetran to power the Databricks Data Intelligence Platform to create an open lakehouse. Delta Lake is built using [...] The forthcoming Fivetran integration with Unity Catalog also excites us, as it will provide out-of-the-box governance and security for all Fivetran-generated tables.”

Fivetran is offering the new service for free (up to $10,000 per customer) until the end of August. After that, it will charge for the service based on its existing usage-based pricing model. “Using Fivetran’s Managed Data Lake Service has one advantage: the ingestion is free,” Brown said. “You must actually ingest the data using the warehouse compute if you are loading within Snowflake, Databricks, or the other downstream consumers; this can be quite expensive in some cases.”
