Data Exfiltration in Power Query - Understanding the Risk and Protections
What is Data Exfiltration in Power Query?
Data exfiltration is the act of moving sensitive data outside a trusted environment without authorization. In the context of Power Query (the data transformation engine behind Excel, Power BI, dataflows, etc.), this means an insider could use a Power Query script to siphon data from secure sources (like databases) out to an external destination. Microsoft defines data exfiltration as occurring when sensitive business data is accidentally or intentionally moved outside of a trusted boundary.1
Power Query is powerful â it can connect to hundreds of data sources and, importantly, it can make web requests. By default, when Power Query runs in cloud services (e.g. Power BI dataflows or datamarts), it executes in the Power Query Online Mashup Engine, which operates in Microsoftâs cloud with unrestricted internet access.2 In other words, if a userâs Power Query code is executed in the cloud, it can potentially send data to any external endpoint on the internet. This creates a potential security blind spot for organizations: a savvy user with data access can craft a query that exfiltrates (sends out) sensitive data, bypassing typical network protections.
Mashup Query Engine
The Mashup Engine is the core component of Power Query responsible for executing data extraction, transformation, and loading (ETL) processes. It interprets the M scriptsâPower Queryâs formula languageâand performs the operations defined within them.3
Mashup Execution Environment
The Mashup Engine operates within different environments depending on where Power Query is utilized:
- Desktop Applications: In tools like Excel and Power BI Desktop, the Mashup Engine runs locally on the userâs machine, leveraging local resources for data processing.
- Cloud Services: In the Power BI service, the Mashup Engine executes in the cloud, utilizing Microsoftâs infrastructure to process data.
- On-Premises Data Gateway: When accessing on-premises data sources from cloud services, the Mashup Engine operates through the data gateway, ensuring secure data transmission between on-premises sources and cloud services.
How Can Power Query M Code Exfiltrate Data?
Power Queryâs M language allows combining data from multiple sources and calling web endpoints and Power Query can query a secure database and then silently send that data to an external web service. Letâs illustrate with a simplified snippet of Power Query M code that demonstrates the core concept of data exfiltration:
let // Step 1 SourceData = Read data from a secure SQL database, // Step 2 Convert = Convert the data for transmission, // Step 3 Result = Send the data out to an external API endpointin Result
Figure: Simplified snippet of Power Query M code
In this example, the query connects to a Secure SQL Database (e.g. an internal FinanceDB) and retrieves data. Then it converts that data and uses a function to send the data to an external web service. The Result
of the web call (if any) is returned by the query, but the key point is that the sensitive data has now left the organization â itâs been sent to an external server controlled by the attacker.
Microsoftâs own documentation explicitly notes this risk: âBy using Web.Contents, users can make web requests with sensitive data in the body of the requestâ.4
Figure: Data Exfiltration Path via Power Query â The diagram above shows how an insiderâs Power Query can extract data from a secure database and send it to an outside API, all within a single Power Query execution.
In the above flow, the Power Query Mashup Engine (running in the cloud) has access to the secure SQL database (using provided credentials). It queries the data and then loads it into a semantic model and out over the internet to an external endpoint. Notice that this exfiltration can occur even if the data source is inside the corporate network â the Mashup Engine runs outside the network by default, so it isnât subject to the organizationâs internal firewalls or DLP scanners in the usual way.5 Effectively, the Power Query is acting as a bridge between the secure data source and the attackerâs server.
Why Is It Hard to Prevent This? (No Foolproof Safeguard)
It may be startling, but there is currently no foolproof way to completely prevent data exfiltration from a Power Query file under Microsoftâs current model. If a user has access to create or edit Power Query queries (in Excel, Power BI, dataflows, etc.) and they have access to some sensitive data source, they can potentially craft M code to leak that data. The Power Query runtime will happily execute the code as long as the user is authorized to access the sources and run queries.
By design, the default Power Query Online environment trusts the userâs code and has no fine-grained content inspection or egress filtering of its own. Microsoft warns that the cloud-based Mashup Engine runs outside the tenantâs network and is not subject to corporate firewalls or DLP policies, meaning data can flow to any destination the user has access to.6 In essence, once the query is running, the service will perform whatever data connections the code specifies â whether thatâs pulling from a database or pushing to an external URL. There is no built-in setting like âdisable web access in queriesâ or âallow queries to only connect to approved endpointsâ at the Power Query level today. This is why the problem is often described as an insider threat scenario; it assumes the person executing the query is authorized to access the data in the first place, and then abuses that privilege.
To reiterate, no native Power Query setting can outright stop a determined user from embedding an exfiltration step in their M code. Organizations must instead rely on broader security strategies to mitigate this risk. Microsoftâs guidance (as of early 2024) focuses on external controls â essentially making it harder for that malicious query to succeed by constraining where and how Power Query code runs. Below weâll examine the current best practices and protections Microsoft recommends, and then why those measures still have limitations.
Microsoftâs Recommended Protections and Best Practices
Microsoft has acknowledged the data exfiltration risk in official documentation and outlines several best practices to help protect against it.78
These protections mainly involve network-level controls and execution policies rather than application-level restrictions in Power Query itself. The key strategies include:
- Network Isolation of Data Sources: Lock down your sensitive data sources so that they can only be accessed from approved network locations. For example, use firewall rules, Azure Virtual Network service endpoints, Private Links, or network security groups to ensure that the database or data source cannot be reached directly from the public Internet.9 The idea is to obstruct line-of-sight from the cloud-based Power Query engine to your data. If the Power Query Mashup Engine (running in the cloud) tries to connect to a protected database that isnât reachable outside the corporate network, the connection will fail.10 In practice, this often means forcing the query to go through an on-premises data gateway or a VNet gateway that resides in your network. Microsoft notes that when a dataflow or query is bound to a gateway (due to the source being isolated), all data access in that query will occur via that gateway â cloud and gateway execution are never mixed in a single query.11 Because the gateway runs within your controlled environment, it will be subject to your organizationâs firewalls, network monitoring, and DLP solutions.12 Essentially, network isolation + gateway enforcement makes the Power Query run as if it were on a corporate machine, where you can apply your usual outbound restrictions (for instance, blocking it from calling unknown external websites).
-
Force On-Premises (Gateway) Execution: If truly necessary, an organization can disable cloud execution of Power Query entirely, forcing all queries to run through on-premises gateways. Currently, Microsoft offers this as a tenant-wide policy (set via a support ticket) that turns off the Power Query Online Mashup Engine for the entire tenant.1314 After this policy is applied, any attempt at cloud evaluation will fail with an error like âCloud evaluation request denied based on tenant policies. Please use a data gateway and try again.â15 This effectively guarantees that none of your Power Query code runs in Microsoftâs cloud environment â everything has to go through a gateway that you control. However, this is a very blunt measure: itâs all or nothing (more on the drawbacks later). Microsoft notes that this policy impacts all dataflow-like operations (Power BI dataflows, Fabric dataflows, Power Platform dataflows, datamarts, etc.), and it is applied tenant-wide, not to specific workspaces or users.16 In other words, you canât selectively allow some cloud executions and block others with this policy; turning it on will break any queries that were not already using a gateway, until they are reconfigured.
-
Tenant and Identity Policies: Another aspect of Microsoftâs guidance is to use identity-driven controls to prevent data from leaving the organizationâs tenant or being accessed by unauthorized users. For cloud SaaS data stores (like Dataverse, Fabric Lakehouse, or even Power BI itself as a data source), pure network isolation might not be feasible because these services have multi-tenant endpoints. Instead, Microsoft recommends tenant isolation policies at the Azure AD (Entra ID) level.1718 Examples include: only allow authentication via your organizationâs Azure AD (no using generic usernames/passwords or anonymous access to data sources), enforce location-based conditional access so that only users on managed devices (e.g. corporate laptops or network) can access the Power BI/Fabric service, and use tenant restrictions to block your users from signing into other tenantsâ Power BI environments (and vice versa).19 These measures help ensure that a user canât, say, use their credentials on an external Power BI tenant or tool to pull data out, and that only approved devices within your org can run queries. While these identity-centric controls donât directly stop something like
Web.Contents
calls, they do guard against certain exfiltration paths (such as an employee trying to shuttle data to another cloud tenant or personal account). -
Governance and Monitoring: In addition to preventive controls, Microsoft is adding auditing and governance features to address this gap. Administrators can leverage Power BI/Fabric audit logs (via Microsoft Purview) to monitor data connection activities.2021 Every time a Power Query data source connection is created or used, it can be logged with details like who ran it, what data source was accessed, and whether a gateway was involved.2223 This doesnât stop an exfiltration attempt in real-time, but it provides a trail that could identify suspicious behavior (for example, a user suddenly using a Web data source or an unknown API in their query). Microsoftâs roadmap also includes features like data source allowlisting â which will let tenant admins restrict what connectors or endpoints can be used in Power Query â and enhanced auditing of query activities.24 These are not yet fully available as of this writing, but they indicate that finer-grained controls are on the way.
In summary, Microsoftâs current best practices to mitigate Power Query exfiltration risks involve shifting execution into controlled environments (gateways or isolated networks), and layering on network/identity policies and monitoring. All of these assume the organization takes explicit action to constrain how Power Query runs. If you do nothing, the default is that the Power Query Online engine will run your code in the cloud with open internet access â a scenario in which an insider could potentially send data anywhere.2526
Limitations and Shortcomings of the Current Approach
While the above measures can reduce the risk, they come with significant limitations. Itâs important to understand these shortcomings:
-
Heavy Reliance on Network Layer Security: The primary defense is essentially âdonât let the cloud engine talk to anything it shouldnâtâ by using network isolation. This puts a lot of onus on infrastructure setup â every sensitive data source must be properly firewalled or put behind private networks. Any gap (for instance, a database that allows public IP access or an allowed outbound route to the internet) can be exploited by a malicious query. Microsoft explicitly notes that higher-layer controls like app-level policies or conditional access are not sufficient; you really need network-level or lower isolation for sensitive data.27 Not every organization is prepared to enforce strict network rules for all data sources, and doing so can be complex and costly (especially if you have many data systems). In short, the current model leans on traditional network security â if thatâs not airtight, Power Query can punch through it.
-
No Fine-Grained Power Query Controls: Today, there is a lack of fine-grained controls within Power Query itself. If you allow people to author queries, you are inherently allowing them to use the full M language power. Thereâs no built-in way to say âuser Xâs queries cannot call Web.Contentsâ or âonly allow queries to connect to domain Yâ in the Power BI or Excel UI. The platform doesnât let an admin easily restrict specific functions or external calls on a per-file or per-workspace basis. The all-or-nothing tenant policy to disable cloud execution is basically the only switch, and it doesnât discriminate between âgoodâ and âbadâ queries. Microsoft has recognized this gap â hence the planned feature to allowlist or block specific connectors/endpoints.28 â but until that arrives, organizations have limited options. This means a determined insider with edit rights can use any connector or function at their disposal in M. Even if you only give someone access to one data source, they could still combine it with a Web connector in the query to send data out. There is also no content inspection of the M script by the service that could catch something like âhey, this query is posting data to an unknown URLâ â it will just execute it.
-
Tenant-Level âKill Switchâ is All or Nothing: The strongest preventive measure available â turning off the cloud Mashup Engine â applies to the entire tenant and all Power Query workloads.29 This is a very coarse control. In a large organization, completely forcing gateway execution can be disruptive: all dataflows, datamarts, and even certain Power BI features (like Quick Import from SharePoint or some transformations) will stop running in the cloud and require a gateway or will break.30 Thereâs no way to enforce the restriction for only a subset of workspaces or only for certain users. Itâs also not a toggle you can self-serve in the admin portal as of now â it requires filing a support ticket to get it enabled.31 Because itâs tenant-wide, if you enable it, you need to have gateways set up for every scenario, and sufficient capacity on those gateways to handle the load.32 This heavy-handed approach might be unacceptable for organizations that rely on cloud dataflows for convenience or have many existing cloud refreshes. In practice, many companies hesitate to use this nuclear option, leaving them to rely on the network and policy approaches above â which, as noted, are not foolproof.
-
Insider Threat Still Exists: Even with all the controls, remember that these measures primarily make it harder (or more detectable) to exfiltrate data â they donât eliminate the possibility entirely. For example, if an organization allows Power Query to run in the cloud for convenience, and just trusts network isolation, a savvy insider might still find an endpoint that is allowed. Or if an on-premises gateway is used, but that gateway machine itself has unrestricted internet access, the queryâs
Web.Contents
call could run through the gateway and out to the internet (unless the gatewayâs environment is locked down by network rules). Identity policies (like tenant restrictions or managed device requirements) help prevent certain routes of exfiltration (such as using an external tenant or unmanaged PC), but they donât stop a query from sending data to an anonymous external API from within an allowed session. Ultimately, an authorized user abusing their data access via Power Query is difficult to stop without either removing their access or implementing very restrictive infrastructure controls. -
Lack of Visibility and Granular Auditing: Although auditing capabilities are improving, it can still be non-trivial to spot what exactly a query is doing. The audit logs will tell you that a data source was accessed or maybe that a Web data source was used, but they wonât necessarily flag âthis query sent 500 customer records to example.com.â Thereâs no content inspection of the payload in logs. So detection of malicious use might require correlating logs (e.g. noticing that right after someone accessed a sensitive SQL data source, they also made a series of web calls). This is more of a SOC (Security Operations) challenge â and many orgs might not have the resources to deeply analyze Power BI/PQ usage logs. In short, current protections are largely preventive at the network/identity level, and any detective controls are still rudimentary.
Conclusion: Raising Awareness and Next Steps
The ability to perform complex data mashups is one of Power Queryâs greatest strengths â but as weâve seen, it also introduces a serious security concern. An insider with the know-how can leverage Power Query to bypass traditional data loss prevention, making it a potential data exfiltration pipeline if left unchecked. The current Microsoft model does not offer an out-of-the-box, foolproof block against this behavior. Instead, organizations must use a combination of network isolation, gateway enforcement, and strict tenant-level policies to mitigate the risk, and even then, gaps remain. For IT administrators and data platform owners, the key takeaway is to be aware of this blind spot. If your environment allows Power Query to run code in the cloud (as is the case with Power BI dataflows, datamarts, Excel Online, etc.), you should assume that any data accessible by those queries could potentially be sent to an external site. In high-security environments, strongly consider disabling cloud-based Power Query execution (even though itâs an extreme step) or tightly constrain it to whatâs absolutely necessary. At the very least, implement the network isolation strategies â ensure that sensitive databases can only be reached via your controlled networks or gateways, not directly from the internet.3334
Moving forward, we can anticipate better controls from Microsoft (such as connector allowlisting and improved auditing),35 but until then, we have to rely on these broader security measures. Educate your teams about this risk so that power users and analysts understand why certain restrictions (like using gateways or blocking unknown connectors) might be in place. If youâre a Power BI or IT admin, keep an eye on tenant settings and the roadmap for any new data exfiltration protection features. This is a nuanced security hole â itâs not a traditional âbugâ that can be patched, but rather an inherent capability being misused. Closing it completely is non-trivial, which is why awareness is crucial. By knowing the limits of current protections, you can better harden your environment and avoid unpleasant surprises. The bottom line is that data exfiltration via Power Query is possible and challenging to prevent, especially in cloud-enabled scenarios36 â so itâs on us to be vigilant and put layers of defense around it.
More resources:
- Power BI dataflows enhanced compute engine
- Query evaluation in Power Query
- On-premises data gateway January update is now available
Footnotes
-
https://learn.microsoft.com/en-us/data-integration/gateway/data-connection-auditing#:~:text=Data%20exfiltration%20occurs%20when%20sensitive,or%20Power%20BI%20semantic%20models â©
-
https://learn.microsoft.com/en-us/data-integration/gateway/data-connection-auditing#:~:text=data%20across%20hundreds%20of%20supported,accidental%20or%20malicious%20data%20exfiltration â©
-
https://learn.microsoft.com/en-us/power-query/query-folding-basics â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=,against%20another%20untrusted%20data%20source â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=A%20trusted%20user%20who%20has,programs%20can%20exfiltrate%20sensitive%20data â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=A%20trusted%20user%20who%20has,programs%20can%20exfiltrate%20sensitive%20data â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=,and%20best%20practices%20for%20safeguards â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=unrestricted%20internet%20access,to%20flow%20between%20these%20sources â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=We%20recommend%20that%20all%20data,considered%20insufficient%20for%20this%20purpose â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=These%20network%20isolation%20policies%20must,Additionally%2C%20since â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=the%20public%20internet%29,as%20firewalls%20and%20DLP%20protection â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=access%20is%20then%20required%20to,execution%20in%20a%20cloud%20environment â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=The%20goal%20for%20isolating%20sensitive,Impacted%20capabilities%20include â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=After%20application%20of%20the%20policy%2C,to%20accommodate%20all%20their%20workloads â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=After%20application%20of%20the%20policy%2C,to%20accommodate%20all%20their%20workloads â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=a%20full%20network%20isolation%20solution,to%20accommodate%20all%20their%20workloads â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=For%20most%20software,provided%20by%20Microsoft%20Entra%20ID â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=,14 â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=,14 â©
-
https://learn.microsoft.com/en-us/data-integration/gateway/data-connection-auditing#:~:text=How%20to%20use%20data%20connection,auditing â©
-
https://learn.microsoft.com/en-us/data-integration/gateway/data-connection-auditing#:~:text=Shortcuts,find%20all%20connections%20by%20querying â©
-
https://learn.microsoft.com/en-us/data-integration/gateway/data-connection-auditing#:~:text=Auditing%20for%20data%20source%20connections,find%20all%20connections%20by%20querying â©
-
https://learn.microsoft.com/en-us/data-integration/gateway/data-connection-auditing#:~:text=For%20each%20attempted%20connection%2C%20we,include%20the%20following%20information â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=The%20following%20list%20contains%20some,data%20exfiltration%20risks%20in%20Fabric â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=unrestricted%20internet%20access,to%20flow%20between%20these%20sources â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=A%20trusted%20user%20who%20has,programs%20can%20exfiltrate%20sensitive%20data â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=We%20recommend%20that%20all%20data,considered%20insufficient%20for%20this%20purpose â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=,creation%2C%20updating%2C%20deletion%2C%20and%20usage â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=a%20full%20network%20isolation%20solution,to%20accommodate%20all%20their%20workloads â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=After%20application%20of%20the%20policy%2C,to%20accommodate%20all%20their%20workloads â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=The%20goal%20for%20isolating%20sensitive,Impacted%20capabilities%20include â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=a%20full%20network%20isolation%20solution,to%20accommodate%20all%20their%20workloads â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=These%20network%20isolation%20policies%20must,Additionally%2C%20since â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=connections%20to%20an%20on,risks%20associated%20with%20arbitrary%20code â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=The%20following%20list%20contains%20some,data%20exfiltration%20risks%20in%20Fabric â©
-
https://learn.microsoft.com/en-us/power-query/dataflows/data-exfiltration-best-practices#:~:text=A%20trusted%20user%20who%20has,programs%20can%20exfiltrate%20sensitive%20data â©