As we put 2018 behind us, Joe Hellerstein, co-founder & CSO at Trifacta, calls out six emerging trends in data science and engineering that we can expect to see in 2019. However, without the right tools, knowledge and people power, it is very possible that not all tech trends will live up to the hype.
Machine learning potential will run into unpleasant realities without high-quality data
Investors and the tech press are all abuzz about machine learning (ML), but those neural networks are only as good as the training data from which they learn. High-quality datasets – typically the bigger the better – yield more accurate models.
As ML initiatives scale, we expect that the pain of cleansing and preparing high-quality data for ML models will become more apparent in 2019.
Data preparation is still widely regarded as the biggest bottleneck in any data project, which means that data scientists often spend more time preparing data than actually building and tuning machine learning systems.
In order for ML to make an impact at scale, organisations will need to first accelerate their data preparation processes.
DataOps will be the new DevOps
As organisations have shifted toward self-service, data analysts now have the right tools to wrangle and analyse their own data instead of endlessly iterating with IT. But once this shift occurs, the question becomes: how do you make such operations scalable, efficient, and repeatable? Enter DataOps.
As an adaptation of the software development methodology DevOps, DataOps refers to the tools, methodology and organisational structures that businesses must adopt to improve the velocity, quality, and reliability of analytics.
Data engineers fill the roles that power DataOps and, as these practices become commonplace, they will become critical resources. In fact, 73% of organisations polled said they planned to invest in DataOps this year. Just as DevOps engineers are highly sought after today, we predict that data engineers will be in similar demand in the near future.
The cloud transition remains both unstoppable and partial
The move toward cloud has been under way for a while – and we saw proof of this in 2018 – but the migration wasn't anywhere near as fast or as total as predicted. Transitioning to cloud platforms can require a costly, time-consuming architectural overhaul, which can’t be done in one fell swoop.
The journey toward cloud isn’t nearly over for many organisations, and we predict that in 2019, we’ll continue to see a lot more progress. Still, some data and processes will remain on-premises for many organisations for the foreseeable future, largely due to regulatory concerns. The final destination for many organisations will be a hybrid cloud approach.
Data lakes aren’t going anywhere
The merger between Hortonworks and Cloudera prompted a lot of chatter about an end to data lakes. This prognosis will prove to be off-target, especially as the cloud migration continues.
In fact, storage offerings like AWS S3 make data lakes easier to maintain and use. In 2019, we predict that the data lake will continue to be a sound architectural strategy for many organisations.
Autoscaling serverless solutions will become increasingly common
Data is everywhere, and even small businesses and individuals want to roll up their sleeves and wrangle datasets alongside the Fortune 500. One size doesn't fit all, however, which means serverless, pay-as-you-go solutions for DataOps will become a hot commodity for fledgling companies that are uninterested in setting up their own DataOps infrastructure right away.
Larger companies will seek out technologies whose costs scale automatically, allowing for surges at usage peaks and lower maintenance-level fees during idle periods. Above all else, convenience and flexibility will be key selection factors, regardless of company size.
Self-service technologies without governance will hit their limits
As self-service solutions grow and adoption is no longer the primary metric of success, organisations will increasingly question whether these solutions are efficient, scalable, and secure.
Without governance in place, IT organisations in particular will feel increasing pressure as the technologies to maintain and the processes to schedule multiply unchecked. Maturing DataOps practices will offer new guidance on self-service technologies, and we predict that in 2019, self-service products without governance will hit their limits.
There are undoubtedly surprises waiting for us in the year ahead, but these six trends should help you to build a framework for anticipating, interpreting, and even leveraging whatever data wrangling challenges come your way in 2019.