The rise of generative AI has put renewed focus on the importance of data. Without high-quality, up-to-date data, an AI model simply will not generate output as good as that of a model that has it. That’s one of the lessons that data pipeline purveyor Matillion will be hammering home during its one-day Data Unlocked conference on November 15.
As an extract, transform, and load (ETL) software and service provider, Matillion is best known for helping companies move transactional data from operational data stores into cloud-based data warehouses such as AWS’s Amazon Redshift and Snowflake as quickly and efficiently as they can.
While the big data analytics use case is still strong, the extraordinary interest in AI fueled by ChatGPT’s launch nearly a year ago has led Matillion to deliver innovations to help customers leverage the latest AI tech. The company will be discussing these at the Data Unlocked virtual summit that’s taking place on Wednesday.
At the show, Matillion will be unveiling new capabilities for working with unstructured data for AI use cases, according to Laura Malins, Matillion’s vice president of product. It’s all about helping customers get better output from AI models by feeding them better data, she says.
“We already have components that will take semi-structured data and flatten it out into a tabular format, but we think there’s a lot more opportunities and there will be a lot more demand to take unstructured data–video data, call log data, and we’re even looking at scraped Web content data–and bring that into a warehouse or data lake,” she says. “Then use AI to give you some kind of summary or intelligence or a feedback score on that data, and then put that into your warehouse and use that to embellish your structured data.”
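To make the flattening step Malins mentions concrete, here is a minimal sketch in Python. It is not Matillion's component or API; the `flatten` helper, the dotted column naming, and the sample call record are all assumptions made for illustration.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, using dotted column names.

    Hypothetical helper for illustration only; not a Matillion API.
    """
    row: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column prefix.
            row.update(flatten(value, prefix=f"{column}."))
        else:
            row[column] = value
    return row


# Made-up semi-structured record, such as one entry from a call log export.
call = {
    "call_id": 17,
    "customer": {"id": "C-42", "region": "EMEA"},
    "duration_s": 312,
}

row = flatten(call)
print(row)
```

Each nested key becomes its own column, so the result can be loaded into a warehouse table alongside existing structured data. Lists and arrays, which a real pipeline would also need to handle, are left as-is in this sketch.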
Use cases such as churn prediction and customer sentiment analysis are not new to data science. But the advent of very powerful large language models (LLMs) like GPT-4 has dramatically lowered the amount of effort required before a business can enjoy good results with these projects. As an established ETL/ELT data pipeline provider, Matillion is in a unique position to funnel high-quality, trusted data into AI models, Malins says.
For instance, Matillion could help a call center client by pulling unstructured call log data through an LLM to provide a summary of the information or a feedback score for a particular customer, she says.
“Give me an indication on this data whether this customer is happy, whether this customer is sad, whether this customer is neutral, and why,” Malins says. “And then we can start to be more intelligent around what data sources we have and what we do with the data. We can get meaningful and quantitative data from really unstructured, big data sources in a way that’s never been done before.”
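The enrichment pattern Malins describes, where a label and a reason derived from unstructured text are attached to a structured customer record, can be sketched roughly as follows. The `classify_call` function is a hypothetical keyword-based stand-in for an LLM call, and the field names are assumptions; none of this reflects Matillion's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Sentiment:
    label: str   # "happy", "sad", or "neutral"
    reason: str  # short explanation of why the label was chosen


def classify_call(transcript: str) -> Sentiment:
    """Hypothetical stand-in for an LLM call: simple keyword rules, not a model."""
    text = transcript.lower()
    for word in ("cancel", "frustrated", "refund"):
        if word in text:
            return Sentiment("sad", f"mentions '{word}'")
    for word in ("thank", "great", "love"):
        if word in text:
            return Sentiment("happy", f"mentions '{word}'")
    return Sentiment("neutral", "no strong signal found")


def enrich(customer: dict, transcript: str) -> dict:
    """Return a copy of the structured record with sentiment columns added."""
    sentiment = classify_call(transcript)
    return {
        **customer,
        "sentiment": sentiment.label,
        "sentiment_reason": sentiment.reason,
    }


# Made-up customer record and call transcript.
record = enrich(
    {"customer_id": "C-42", "plan": "pro"},
    "I am frustrated and want a refund.",
)
print(record)
```

In a real pipeline the stand-in would be replaced by a call to a model, but the shape stays the same: unstructured text goes in, and a small set of quantifiable columns comes out next to the existing structured fields.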
The goal is to build on Matillion’s role as a trusted provider of structured data to help customers take the next step into AI with their less structured data, Malins says. There’s a noted lack of trust in data right now.
“One thing that we’re seeing a lot of at the minute is a lot of fear around external models and who has access to that and could I lose my data, etc.,” the VP of product says. “Matillion would give you that traceability, that lineage around it, that data governance. So you’ve got the repeatability around the process. You understand where that information is coming from, so if something is wrong, you can tweak it and tune it for the next time around.”
There’s a lot of experimentation going on with GenAI and LLMs at the moment, and one of the things companies are doing is building ensembles of models, where the output from one model becomes the input for another. There are risks inherent in doing that, and that is another area where Matillion may be able to provide some benefits to customers, Malins says.
“There’s a phrase in the data industry that’s always existed: Rubbish in, rubbish out,” she says. “That’s just more rubbish effectively going into it. It’s not being validated and verified. And AI speeds things up, so it’s just propagating the rubbish out and that inability to distinguish between what’s actually good and what’s not good.”
If taken to the extreme, this can lead to AI model collapse, where the output of AI models is essentially worthless. That possibility is leading to a recognition that greater data lineage and governance are needed, Malins says.
“A key trend in the industry going forward [will be] around traceability: Who puts what into the models, and who owns what outputs,” she says. “I think AI’s been a bit of a free for all in 2023, and I think it will be for some elements of 2024. A lot of it’s about companies getting their hands on it and learning and what to learn and understand more about kind of what you can get from it, what you put into it, and what you get out for it.”
Data Unlocked will feature keynotes by Matillion executives, such as Malins, CEO Matthew Scullion, and CPO Ciaran Dynes, as well as Snowflake CEO Frank Slootman; Mo Gawdat, the former Chief Business Officer of Google; and David Coulthard, a Formula 1 Grand Prix driver. Attendance at the virtual event is free. You can register here.