“Short cuts make long delays.”
― J.R.R. Tolkien, The Fellowship of the Ring
The lakehouse pattern, in which you store all of your structured and unstructured data in a lake and get warehouse performance and semantics on it, has become the foremost pattern for data and AI at scale. The pattern requires two fundamental layers: lakehouse storage (such as Delta) and lakehouse governance (such as Unity Catalog).
The criticality of governance is well established: a near-zero-copy data strategy only works with strong governance. Otherwise, your strategy reduces to everyone having access to everything, which is not only untenable; in many cases, it is illegal. In addition, governing access in a unified way has many less obvious benefits:
- Auto-capture of lineage between data assets
- Audit logs for compliance
- Emergent semantics (discovering business terminology through usage, which in turn helps other users)
- Statistics for auto-tuning performance
In total, these capabilities make data applications, and AI, much simpler and more efficient.
In Azure Databricks, Unity Catalog (UC) is the governance platform that delivers these capabilities. The general setup is that you store all of your data in a lake (e.g. Azure Data Lake Storage, aka ADLS) but access it only through UC, which provides all of the benefits above. This is the default setup and it covers all compliance regimes for all industries.
In 2023, Microsoft announced Fabric, the next step in the evolution of its Data and AI strategy. Databricks works closely with the Fabric team and is really excited about the path forward: all of your data in a Delta Lake, and seamless interoperability of all of your tooling.
It’s awesome. Except for the current state of shortcuts.
Fabric co-opted the zero-copy philosophy, which is great. One technique for that is what they call shortcuts, which are essentially pointers, or symlinks, to the files stored in ADLS. That way, a Fabric engine doesn’t need a copy of the data; it can just point to the data. Yay! Zero copy!
But get this: a shortcut just points to the file directly in ADLS, without any consultation with Unity Catalog. That means all of the governance benefits disappear. What’s more, it requires giving the user direct access to the underlying storage, a worst practice for managing data at scale. Our large customers that started down the path of granting user permissions at the file level have all reverted because it was too difficult to manage.
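To make the difference concrete, here is a minimal toy sketch in Python. It is not the Unity Catalog or OneLake API; `ToyCatalog`, `read_table`, and `read_via_path` are hypothetical names invented for illustration. It only shows the shape of the problem: a read that goes through a governance layer is checked and logged, while a read that follows a pointer straight to the file is neither.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a governance layer; not the Unity Catalog API."""

    storage: dict  # path -> rows, standing in for files in a lake
    tables: dict   # table name -> path
    grants: dict   # table name -> set of users allowed to read
    audit_log: list = field(default_factory=list)

    def read_table(self, user: str, table: str) -> list:
        allowed = user in self.grants.get(table, set())
        # Every attempt is recorded, whether or not it succeeds.
        self.audit_log.append((user, table, "ALLOWED" if allowed else "DENIED"))
        if not allowed:
            raise PermissionError(f"{user} may not read {table}")
        return self.storage[self.tables[table]]


def read_via_path(storage: dict, path: str) -> list:
    """Path-based read, analogous to following a pointer straight to the files."""
    return storage[path]


storage = {"/lake/sales/part-0": [{"region": "EU", "amount": 10}]}
catalog = ToyCatalog(
    storage=storage,
    tables={"sales": "/lake/sales/part-0"},
    grants={"sales": {"alice"}},
)

rows_governed = catalog.read_table("alice", "sales")
try:
    catalog.read_table("bob", "sales")  # denied, but still audited
except PermissionError:
    pass

# Same data, but no permission check and no audit entry.
rows_direct = read_via_path(storage, "/lake/sales/part-0")
```

In this toy, the audit log ends up with two entries (one allowed, one denied) and no trace of the direct read, even though it returned exactly the same rows.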
But wait… you can just represent all of the UC permissions in ADLS, right? Maybe using Microsoft Purview? Well, no. There are a few reasons why:
- ADLS is file-based, and a lot of things you want to permission in Unity Catalog are “above the files”, like column masks, views, or models
- Replicating the permissions of UC in ADLS is essentially replicating UC. Microsoft’s One Security will have these capabilities over time, but it will be a multi-year journey
- Myriad security primitives, like network security (such as Private Link), depend on blocking direct user access to ADLS files, and these are not yet available through shortcuts
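The first point above is easy to see in a small Python sketch. Again, this is a toy under stated assumptions, not how UC or ADLS implement anything: `mask_email`, `read_with_column_mask`, and `read_with_file_acl` are hypothetical helpers. A column mask is a rule applied per user at query time, while a file permission can only answer "may this user read the whole file or not".

```python
def mask_email(value: str) -> str:
    """Hypothetical masking rule: hide the local part of an email address."""
    _, _, domain = value.partition("@")
    return "***@" + domain


def read_with_column_mask(rows, user, unmasked_users, masks):
    """Apply per-column masks at query time unless the user is exempt."""
    if user in unmasked_users:
        return [dict(row) for row in rows]
    return [
        {col: masks[col](val) if col in masks else val for col, val in row.items()}
        for row in rows
    ]


def read_with_file_acl(rows, user, file_readers):
    """File-level permission: the user gets every column or nothing."""
    if user not in file_readers:
        raise PermissionError(f"{user} cannot read the file")
    return [dict(row) for row in rows]


rows = [{"id": 1, "email": "ana@example.com"}]
masks = {"email": mask_email}

masked = read_with_column_mask(rows, "analyst", {"admin"}, masks)
unmasked = read_with_column_mask(rows, "admin", {"admin"}, masks)
all_or_nothing = read_with_file_acl(rows, "analyst", {"analyst"})
```

With the mask, the analyst sees the row but not the raw email. With only a file-level grant, the analyst either sees the raw email or cannot read the table at all; there is no place in a file ACL to hang the masking rule.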
Due to these inherent limitations, Databricks and Microsoft are working on a governance-respecting implementation of shortcuts for Azure Databricks, wherein the concept will remain the same (you will have shortcuts to Databricks objects in OneLake), but it will be coherent with the governance rules you have established.
OK, this is all pretty confusing. Let me illustrate with a quick story.
I’m a bit of a data fiend. I build a lot of my own dashboards, several of which are popular within Databricks. I was checking on one of them this morning when it failed with a permissions error on one of its tables.
This was an internal table from our data team that I was using, but the data team wants consumers to use a downstream table, so they tightened the permissions in UC over the holidays. It was frustrating for me, but it was by design. They had sent out a PSA to all of the downstream consumers, whom they found through the lineage report. That included me, but I don’t always read my email (haha).
So I switched to the new table they recommended (which has a production SLA, monitoring, etc.). It is actually a view derived from multiple tables, with things like row-based access control enforced. Now the dashboard hums again. More importantly, the data team is free to refactor the upstream tables without breaking any consumers.
What if I were just using a shortcut to that initial table (by pointing directly at the files in ADLS)? Ignoring the governance issues, there would be the following problems:
- Higher level constructs (above the files) couldn’t be leveraged by the data team
- They wouldn’t have been able to block me without replicating all of the governance in ADLS
- My report would depend on a non-SLA table that would break unexpectedly
- Perhaps most importantly, they wouldn’t have known to notify me at all without the insight into lineage provided by UC
But sure, shortcuts make a nice demo 🙂
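The lineage point in the story is worth a quick sketch, too. Assuming lineage is available as a simple mapping from each asset to its direct downstream assets (the table names, owner addresses, and the `downstream_assets` and `owners_to_notify` helpers below are all made up for illustration, not a UC API), finding everyone to notify before changing a table is just a graph walk. The walk only works if the reads were captured in the first place, which is exactly what a direct file read skips.

```python
from collections import deque


def downstream_assets(lineage, source):
    """Breadth-first walk over lineage edges (asset -> direct downstream assets)."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def owners_to_notify(lineage, owners, source):
    """Owners of every asset downstream of `source`, deduplicated and sorted."""
    affected = downstream_assets(lineage, source)
    return sorted({owners[asset] for asset in affected if asset in owners})


# Hypothetical lineage graph: an internal table feeds a view and a dashboard.
lineage = {
    "raw.events": ["internal.sessions"],
    "internal.sessions": ["prod.sessions_view", "dash.usage"],
    "prod.sessions_view": ["dash.revenue"],
}
owners = {
    "dash.usage": "me@example.com",
    "dash.revenue": "finance@example.com",
}

to_notify = owners_to_notify(lineage, owners, "internal.sessions")
```

If my dashboard had been reading the files through a shortcut, there would be no `dash.usage` edge in that graph, and I would not be on the list.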
Azure Databricks and Microsoft Fabric are based on many similar design principles, the teams work very closely together, and the many thousands of customers that run their business on Azure Databricks will get a lot of benefit from this tighter integration. Customers already run Power BI directly on the lakehouse through UC and this will keep getting better. In fact, publishing anything in UC directly to Power BI has been made seamless.
Shortcuts are a compelling way to see how this can become even easier. But shortcuts, today, are simply not ready for production use cases. If you want to employ them in the near term, be sure to understand the downstream implications for the governance and stability of your systems, and budget significant clean-up work to untangle the permissions on your data once governance-coherent shortcuts arrive.
In 2024 (hopefully early 2024), we will deliver the governance-coherent shortcuts, and we are very excited for that day! This solution will provide shortcuts in OneLake that respect UC policies, and provide all of the governance benefits mentioned above.




