AWS Launches Glue To Automate ETL Jobs
A new service from Amazon Web Services (AWS) promises to cut down the time organizations need to prepare their data for analytics projects.
AWS Glue became generally available Monday, announced at the AWS Summit in New York City. Matt Wood, AWS’ general manager of artificial intelligence, presented the new service during the keynote. It is a fully managed, serverless extract, transform and load (ETL) service.
Glue is designed to dramatically reduce the time organizations spend refining their data before they can analyze it. AWS said Glue can turn what might take months of work into a matter of minutes.
According to the company, data integration, which involves extracting data from different sources, normalizing it and loading it into data storage, can take up to 75 percent of the time needed to implement an analytics project. Customers can spend months manually coding and editing ETL programs, which often become more complicated and more error-prone as new data sources are added and data volumes grow.
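To make the manual approach concrete, here is a minimal sketch of a hand-written ETL program in plain Python. It does not use Glue or any AWS API. The two sample sources, their field names and the `orders` table are all hypothetical, chosen only to show why each new source shape adds more code to maintain.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample inputs standing in for two differently shaped sources.
CSV_SOURCE = "id,name,amount\n1,Alice,10.50\n2,Bob,7\n"
JSON_SOURCE = '[{"customer_id": 3, "full_name": "Carol", "total": "12.25"}]'


def extract():
    """Read raw records from each source in its native shape."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Normalize both sources to one (id, name, amount) schema."""
    normalized = []
    for row in csv_rows:
        normalized.append((int(row["id"]), row["name"], float(row["amount"])))
    # Every additional source needs its own hand-written mapping like this one.
    for row in json_rows:
        normalized.append(
            (int(row["customer_id"]), row["full_name"], float(row["total"]))
        )
    return normalized


def load(rows, conn):
    """Write the normalized rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run extract, transform and load in order; return the row count."""
    csv_rows, json_rows = extract()
    rows = transform(csv_rows, json_rows)
    load(rows, conn)
    return len(rows)
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads all three sample records into an in-memory database. Even in this toy version, the per-source mapping in `transform` is the part that grows with each new data source.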
Glue simplifies data integration by automating it. It works by first creating “crawlers,” which are deployed across an organization's AWS resources to discover and categorize its data and metadata. Based on the collected information, Glue creates an editable and sortable data catalogue.
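The crawl-then-catalogue idea can be illustrated with a short sketch. This is not Glue's implementation or API. It is a plain-Python approximation that scans sample records and records an inferred type per column. The function names and the type labels (`bigint`, `double`, `string`, `boolean`) are assumptions made for illustration.

```python
from collections import defaultdict


def infer_type(value):
    """Map a Python value to a hypothetical catalogue type label."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def merge_types(types):
    """Collapse the set of types seen for one column into a single label."""
    if len(types) == 1:
        return next(iter(types))
    if types == {"bigint", "double"}:
        return "double"  # widen mixed integers and floats
    return "string"  # fall back to string for any other mixture


def crawl(sources):
    """Build a catalogue {table: {column: type}} from sample records.

    `sources` maps a table name to a list of record dicts.
    """
    catalog = {}
    for table, records in sources.items():
        seen = defaultdict(set)
        for record in records:
            for column, value in record.items():
                seen[column].add(infer_type(value))
        catalog[table] = {
            column: merge_types(types) for column, types in seen.items()
        }
    return catalog
```

For example, crawling `{"orders": [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 7, "note": "rush"}]}` yields a catalogue entry for `orders` with `id` as `bigint`, `amount` as `double` and `note` as `string`.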
Next, the service generates custom transformation code for that data. According to a blog post by Randall Hunt, AWS developer evangelist, Glue can automatically generate ETL scripts in Python.
Users can then schedule one or more ETL jobs to run consecutively, on a recurring basis or on demand. Glue scales resources automatically according to the workload.
Pricing information for Glue is available from AWS.