Ever considered automating data extraction and transformation workloads? Here’s why it’s a bright idea

April 20th, 2021

Have you ever met someone who’s obsessed with automation? You know, the type of person who sets up Apple Home Automation with Siri to open their garage door, turn on the air conditioner when it’s hotter than 26 degrees inside the house and switch on the lights when they arrive home.

That’s me. I’ve worked in the application and data space for most of my career focusing on automation, and I get excited about finding new opportunities to automate in all areas of my life – both personally and professionally.

Lately, I’ve noticed a pretty significant automation opportunity that many companies seem to be overlooking: automating Azure Data Factory (ADF), Databricks, and DevOps on reporting and data projects. It can be relatively easy to do, with a huge potential payoff.

Repetitive work makes a perfect target for automation

Any developer who’s worked on a data or reporting project knows how tedious it can be to create ADF pipelines. Tapping into various systems and extracting data can be complex work. Writing an ADF pipeline – a set of actions that takes raw data and moves it to a destination for storage and analysis – for each data source is an essential first step.

ADF pipelines are critical for undertaking data analysis and maintaining data quality, but they’re time-consuming to build. An ADF pipeline usually takes a day of developer effort to write. What’s more, the work can be mundane and repetitive, since it’s not unusual to have 200 or 300 pipelines in a project.

The good news is that mundane and repetitive is what makes ADF pipelines a perfect candidate for automation. Because they involve identical actions time and time again, ADF pipelines can be easily automated once you find the pattern. You just populate a single table and let the automation perform the task of transferring data. We do this at Antares with a custom framework that runs over 200 pipelines extracting from various source systems.
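To make the “populate a single table” idea a little more concrete, here’s a minimal Python sketch of a table-driven approach. It isn’t the Antares framework and it doesn’t call ADF at all; the control table columns, pattern names and pipeline names are all hypothetical, chosen only to show how one row per data source can be mapped to a handful of generic, parameterised pipelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    """One row of a hypothetical control table: one row per data source."""
    source_system: str
    source_object: str
    pattern: str        # which generic pipeline should handle this row
    destination: str


# Hypothetical pattern names mapped to hypothetical generic pipeline names.
# The point is that a few generic pipelines can serve many sources.
PATTERN_PIPELINES = {
    "sql_full_load": "pl_generic_sql_full",
    "sql_incremental": "pl_generic_sql_incremental",
    "rest_api": "pl_generic_rest",
    "flat_file": "pl_generic_file",
}


def plan_runs(control_table):
    """Turn control-table rows into (pipeline_name, parameters) pairs."""
    runs = []
    for entry in control_table:
        pipeline = PATTERN_PIPELINES.get(entry.pattern)
        if pipeline is None:
            raise ValueError(f"No generic pipeline for pattern {entry.pattern!r}")
        # Illustrative parameter names; a real pipeline defines its own.
        parameters = {
            "sourceSystem": entry.source_system,
            "sourceObject": entry.source_object,
            "destinationPath": (
                f"{entry.destination}/{entry.source_system}/{entry.source_object}"
            ),
        }
        runs.append((pipeline, parameters))
    return runs


# Adding a new source means adding a row here, not writing a new pipeline.
CONTROL_TABLE = [
    SourceEntry("crm", "customers", "sql_full_load", "raw"),
    SourceEntry("crm", "orders", "sql_incremental", "raw"),
    SourceEntry("billing", "invoices", "rest_api", "raw"),
]

if __name__ == "__main__":
    for pipeline, parameters in plan_runs(CONTROL_TABLE):
        print(pipeline, parameters)
```

In a sketch like this, a change to how incremental loads work is made once, in the generic pipeline for that pattern, and every row that uses the pattern picks it up. How the planned runs are actually triggered is left out on purpose, because that depends on how your own environment is set up.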

How much time and effort will it save me?

Automating ADF pipelines substantially reduces workload pressures on the development team, particularly in projects with complex data sources.

I recently worked on a project with over 200 handwritten pipelines and I found a pattern that reduced those to just four. That meant that instead of making 200 changes, I could make four and automate the remaining pipelines to inherit them.

This significantly reduced the development time on the project – from 180 days of effort to 20 days in this case – and allowed us to refocus our developers on other high-priority work. It also had flow-on benefits for the remainder of the project because we had a repeatable and low-effort way of accommodating changes or adding new pipelines.

Of course, every project is different and the amount of time you can save by automating ADF pipelines will vary. But if your project has a lot of ADF pipelines and you can see patterns that emerge, you can expect to save days – if not weeks – of effort through automation.

If automating ADF pipelines is so great, why isn’t everyone doing it?

The classic barrier to automation is that teams are too busy focusing on completing a project and don’t have time (or make time) to explore opportunities to refine their approach. It can be difficult to convince a project manager to pause development work in favour of taking a couple of weeks to set up ADF pipeline automation, especially if it affects project timelines or can’t be tied to a billable activity.

So, how do I overcome resistance?

Your best chance of success is to secure agreement to invest in ADF pipeline automation at the very beginning of a project so it can be tested, validated, and built into the approach.

You’ll be more likely to persuade project teams to pursue automation if you can demonstrate how it will work – build a minimum viable product and say something like “Hey, I’ve got an example of where we’ve automated ten ADF pipelines and I can see it working really well on this project. Here’s how we could build on that and here’s how much time I think it can save us.”

Decision makers need to see a tangible outcome and real value for the project, especially if you’re eating into developer time to focus on something that may not be strictly billable.

Tips for success

If you’re keen to try out ADF pipeline automation for the first time, it’ll serve you well to keep these tips in mind:

  • Identify patterns, then act on them – If you’re a developer and you’ve noticed that you’ve repeated the same pattern five or six times, stop what you’re doing. Realise that there’s a pattern emerging and start developing that way going forward – potentially through automation – rather than trucking on.
  • Don’t be afraid to leave your comfort zone – A lot of developers become so accustomed to repetition that they feel comfortable with it, they accept it, and they may not consider that there’s a better way of managing ADF pipelines. Working in new ways can feel uncomfortable, particularly when you’re under a lot of pressure, but it’s absolutely short-term pain for long-term gain.
  • Empower the development team – As with any other type of automation, the person best placed to identify automation opportunities is usually the one who is carrying out the task manually. Empower the developers to identify opportunities to automate pipelines and challenge the norm by saying “I see this pattern, how about we try this with automation?” The onus is on developers to spot patterns and push for automation because the project manager probably isn’t aware of the emerging patterns.

Where to from here?

There’s no feeling quite like coming home in the evening to an automated smart home that’s already cool on a hot day and lit when I walk through the door. Automation is not only intrinsically cool, it also has huge potential to save you time and make your life easier, whether you’re arriving home after work or showing up for a long week of building data pipelines.

When it comes to data workloads, the potential value of automation for development teams is immense. If you’re keen to find out more about how it can help your organisation, I’d love to talk.

By Kelvin Hong, Principal Consultant in Data & AI, Antares Solutions