Achieving DataOps


Gartner defines DataOps as a “collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization” (1).

Delphix defines DataOps as the “alignment of people, process, and technology to enable the rapid, automated, and secure management of data. Its goal is to improve outcomes by bringing together those that need data with those that provide it, eliminating friction throughout the data lifecycle” (2).

Both definitions point to the same problem: DevOps aims to shorten the time between development being finished and operations delivering it, but data lags behind because of regulations such as GDPR and CCPA, as well as its size and operational cost.

People have been managing data for a long time, but we’re at a point now where the quantity, velocity and variety of data available to a modern enterprise can no longer be managed without a significant change in the fundamental infrastructure (3). We are all aware that data is growing exponentially, and companies demand that data be available in more environments for analysis, development and testing.

There are several ways to enable this data flow. Drawing on Delphix Data Company Magazine (4) and on customer experiences, I will explain the steps to achieving DataOps using Delphix as the DataOps platform.

1. Accelerate Data Delivery

Data delivery means providing the latest version of secured data to Development, Test and every other team that asks for it. In the legacy approach, a team usually has to open a ticket and then wait for all the required steps to be completed before it gets the data.

Below are the general steps required for a team to get data:

  1. A developer opens a ticket to refresh a Dev, Test or UAT database. The ticket can be opened against a Test Support Team or a team that covers similar responsibilities.
  2. The ticket requires a database backup and restore operation.
  3. The backup operation may require additional storage, depending on the size of the database.
  4. If storage is required, the DBA pauses their work and opens another ticket for the Storage Team.
  5. The ticket goes to the Storage Team, storage is provisioned to the server and the ticket is closed.
  6. The DBA team resumes work and starts the backup procedure for the database. Depending on the size of the database, the operation might take a couple of hours or more.
  7. After the backup operation, the database backup needs to be moved to a shared location where it is available for the restore operation. In some cases another storage ticket has to be opened, and the DBA team again waits for that operation to complete.
  8. After the storage is provisioned, or if no extra storage is needed once the backup has been moved, the DBA team shuts down the target database and restores it from the newest backup. Depending on the size, the operation might take a couple of hours or more. If there are errors, the operation has to be restarted from scratch, which can stretch the whole process to days; I have seen this simple ticket take a week or more at many of my customers.
  9. After the database is restored, it is time for the Security team to secure it. Depending on what is available, a masking tool or masking scripts will be used, which in some cases again takes days or weeks. One of my customers spent more than 30 days completing the masking operations on their 40TB customer database that was restored with legacy methods.
  10. We are at Step 10 of the ticket, and hopefully the database is now ready for the developers who submitted the ticket 5 or more days ago.

All of the above steps are for just one database. Things get more complicated when there are more databases, whether from the same vendor or from different vendors. Refreshing 10 databases ranging in size from 1TB to 100TB can become a small project of its own.
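
To make the cost of this serial, ticket-driven flow concrete, here is a minimal sketch that adds up the steps above. Every duration in it is an assumed placeholder chosen for illustration, not a measurement from any customer; only the order of the steps comes from the list above.

```python
# Illustrative only: every duration below is an assumed placeholder,
# not a figure taken from a real refresh.
STEPS_HOURS = [
    ("open refresh ticket and wait for pickup", 8),
    ("storage ticket for backup space", 24),
    ("database backup", 6),
    ("move backup to shared location", 4),
    ("restore on target", 6),
    ("mask sensitive data", 72),
]


def total_hours(steps, databases=1):
    """Elapsed hours if every step runs one after another for each database."""
    per_database = sum(hours for _, hours in steps)
    return per_database * databases


one = total_hours(STEPS_HOURS)
ten = total_hours(STEPS_HOURS, databases=10)
print(f"1 database: {one} h (~{one / 24:.1f} days)")
print(f"10 databases, done serially: {ten} h (~{ten / 24:.1f} days)")
```

With these assumed numbers a single refresh already lands at about five days, and the total grows linearly with every additional database that has to wait in the same queue.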

“Adopting a platform-based approach is critical: It helps enterprises automate the rapid provisioning of different test data based on developer needs while still observing modern data security practices with capabilities like masking non-production data, which accelerates data delivery in the DevOps life cycle” (4).

2. Reduce Data Friction

“Friction is the force resisting the relative motion of solid surfaces…” (5) is how Wikipedia explains it. Data friction is the slowness of data-related operations caused by data volume and other data-related factors. As companies demand more data to be made available in more places, data friction emerges because the demands of data consumers aren’t met by data operators (4).

In today’s fast-growing, highly demanding business environment, making data available for testing, reporting or other purposes is a game changer.

DataOps brings the two audiences, data operators and data consumers, closer together. Simplifying and accelerating the delivery of data to data consumers, and giving them control over the right data in the right place, eliminates unnecessary delays in working with data.

Implementing a DataOps platform is essential in this case, and for enabling DataOps in general, because every other option depends either on a vendor or on people such as DBAs, security teams, storage teams and sysadmins. With the right DataOps platform it is possible to eliminate data friction by providing virtual datasets with high-quality, secured data to development, test and data analytics teams.

3. Eliminate manual work (Automation everywhere)

As with DevOps, the goal of DataOps is to reduce manual work through more automation. Given how difficult it often is to move data around, having a DataOps platform and automating the data delivery process is crucially important.

Keeping the data in different environments in sync with production can become an extremely resource-intensive process that takes days or even weeks. I remember a customer DBA telling me: “I have more than 500 databases in test environments that my team and I are responsible for; if I check 1 database a day it will take more than a year to check all the databases.” In this example he was only talking about copies of 5 different databases, all of them Oracle.
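
The DBA's estimate is easy to check. The short sketch below uses only the two figures from the quote (500 databases, one check per day); the 365-day year is the only added assumption.

```python
import math

databases = 500      # figure quoted by the DBA
checks_per_day = 1   # one manual check per day, as in the quote

# Days needed to look at every database once, then the same span in years.
days_needed = math.ceil(databases / checks_per_day)
years_needed = days_needed / 365  # assumes checks happen every calendar day

print(f"{days_needed} days, about {years_needed:.2f} years")
```

Even with checks on every single calendar day, one pass over the test estate takes well over a year, which is why manual checking does not scale.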

Deliver data to all stages of the CI/CD pipeline

A DataOps platform that gives users Self-Service capabilities to manage, refresh and rewind their own data raises the level of DataOps maturity. Adding automation to these Self-Service capabilities can eliminate most of the tickets that data consumers open with data operators. The idea behind automation is to turn ticket-driven IT requests into an automated, on-demand, ticket-free environment.

The key, therefore, is to provide a programmable data interface, so that data provisioning is automated and synchronised across all of your most important data sources. That is quite a big undertaking when you have many different database and data source types. If you can simplify or homogenise the data interface so that a standard API manipulates and synchronises controls across that data, your automation and scripting can deliver value across all of your data sources.
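
As a minimal sketch of what such a homogenised interface could look like, the example below defines one abstract class and two adapters. All class and method names here are hypothetical and chosen for illustration; they are not the API of Delphix or of any other product, and the adapters only return strings instead of touching a real database.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical common interface; not the API of any real product."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def provision(self, environment: str) -> str:
        """Create a copy of this source for the given environment."""

    @abstractmethod
    def refresh(self, environment: str) -> str:
        """Bring the copy in the given environment up to date."""


class OracleSource(DataSource):
    """Stub adapter: a real one would call Oracle-specific tooling."""

    def provision(self, environment: str) -> str:
        return f"oracle:{self.name} provisioned to {environment}"

    def refresh(self, environment: str) -> str:
        return f"oracle:{self.name} refreshed in {environment}"


class PostgresSource(DataSource):
    """Stub adapter: a real one would call PostgreSQL-specific tooling."""

    def provision(self, environment: str) -> str:
        return f"postgres:{self.name} provisioned to {environment}"

    def refresh(self, environment: str) -> str:
        return f"postgres:{self.name} refreshed in {environment}"


def refresh_all(sources, environment):
    """One automation script works for every source type."""
    return [source.refresh(environment) for source in sources]


for line in refresh_all([OracleSource("crm"), PostgresSource("billing")], "test"):
    print(line)
```

The point of the design is that `refresh_all` never needs to know which vendor sits behind a source, so a new database type only requires a new adapter rather than a new set of scripts.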

4. Simplify data collaboration with data protection

Providing data to data consumers on demand is a great feature, but without secure data distribution the whole process is in vain. On the other hand, provisioning multiple databases in minutes through a data platform brings the added complexity of securing all the data that will be in circulation.

A DataOps platform that takes a comprehensive approach to data security simplifies these complex processes and makes data available almost instantly. The idea behind this architecture is to secure the data in one place and then distribute secured copies of it to all data consumers.
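
The "secure once, distribute many" idea can be sketched in a few lines. The function names and the sample row are hypothetical, and the truncated SHA-256 token only stands in for a real masking algorithm; an unsalted hash like this is not production-grade masking.

```python
import copy
import hashlib


def mask_value(value) -> str:
    """Replace a value with a short deterministic token (illustration only)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def mask_rows(rows, sensitive_columns):
    """Return new rows with the sensitive columns masked; input is untouched."""
    return [
        {col: mask_value(val) if col in sensitive_columns else val
         for col, val in row.items()}
        for row in rows
    ]


def distribute(masked_rows, consumers):
    """Give every consumer an independent copy of the already masked data."""
    return {consumer: copy.deepcopy(masked_rows) for consumer in consumers}


production = [{"id": 1, "email": "jane@example.com"}]  # made-up sample row
masked = mask_rows(production, {"email"})              # masking runs once
copies = distribute(masked, ["dev", "test", "analytics"])
print(copies["dev"])
```

Masking runs a single time, and every consumer receives the same secured values, so there is no per-environment masking job to schedule or audit.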

5. Provide Self Service

Bringing secure, on-demand data in multiple copies to data consumers can introduce other problems, such as an overwhelming workload for data operators. For DataOps practices and culture to take hold and for DataOps to mature, Self-Service capabilities are a must in every DataOps platform.

Self-Service enables data consumers to become more involved in the process and to manage their own data in any way they see fit. Being able to refresh their own data in minutes from the latest secured, high-quality Production data is the dream of all data consumers. What about restoring data to an earlier state, from a minute, an hour or a day ago, without any ticket or intervention from DBAs, and doing this in just minutes?
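
A minimal sketch of the refresh and rewind operations described above, assuming a dataset that keeps timestamped snapshots. The `VirtualDataset` class, its methods and the integer timestamps are hypothetical illustrations, not how any particular platform implements this.

```python
import bisect


class VirtualDataset:
    """Hypothetical self-service dataset that keeps timestamped snapshots."""

    def __init__(self, initial_state, timestamp=0):
        self._times = [timestamp]
        self._states = [initial_state]

    @property
    def current(self):
        """The most recent state the consumer is working with."""
        return self._states[-1]

    def refresh(self, new_state, timestamp):
        """Record the latest secured state; timestamps must increase."""
        if timestamp <= self._times[-1]:
            raise ValueError("timestamp must be later than the last snapshot")
        self._times.append(timestamp)
        self._states.append(new_state)

    def rewind(self, timestamp):
        """Drop every snapshot taken after `timestamp` and return the state."""
        index = bisect.bisect_right(self._times, timestamp)
        if index == 0:
            raise ValueError("no snapshot at or before that time")
        del self._times[index:]
        del self._states[index:]
        return self.current


dataset = VirtualDataset("baseline", timestamp=0)
dataset.refresh("after load test", timestamp=60)
dataset.refresh("after bad migration", timestamp=120)
print(dataset.rewind(90))  # back to the state recorded at t=60
```

The consumer calls `refresh` and `rewind` directly, so neither operation needs a ticket or a DBA in the loop.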

Any DataOps initiative must include Self-Service capabilities for data consumers; otherwise the initiative itself will make already overburdened data operators even more overburdened, something that might even be called a data nightmare.

6. Single Point of Control

Today it is almost impossible to find an enterprise that uses databases and software from only one vendor. Even the best-case scenario includes products from multiple vendors.

From the data operators’ point of view, managing many data sources, along with the lower environments that data consumers require, is both complex and tiring. In ticket-managed enterprises the effects of this complexity can be felt everywhere: an environment refresh takes weeks or months; UAT, Preprod and similar end-to-end integrated systems are either never refreshed, or any operation on them becomes a huge project that takes up the time of many experts for long periods; every project that depends on data slows down or is tested poorly; shift left may stall; and so on.

Enterprises can only address this challenge by implementing a standardized approach to managing, securing and distributing heterogeneous data, that is, by adopting a DataOps platform that works with all data sources.

7. To the Cloud, to all Clouds

The word of the day is the Cloud! Organizations are excited about the cloud and how it can cut costs and bring many other benefits. Many organizations also believe that the cloud might be the enabler of their digital transformation. Moving from on-premise systems and legacy ways of operating to a new chapter in the cloud can benefit everyone involved in the process.

Still, moving to the cloud requires planning and strategy. Any DataOps plan will require a platform that fully supports the cloud, preferably all clouds. A DataOps platform that supports synchronization and replication from on-premise data sources to cloud instances, and provides all of its benefits everywhere, will help immensely.

In conclusion

Data is an organization’s most important asset if it is used properly, securely and without delay. Data lags behind the code, which causes many delays and unwanted business outcomes. To solve these issues, the Delphix DataOps Platform is a solution that can support any organization’s journey to DataOps.

Resources

  1. Gartner DataOps Definition
  2. The Power of DataOps
  3. From DevOps to DataOps, By Andy Palmer
  4. Delphix Data Company Magazine Issue #1
  5. Friction definition from Wikipedia
