“Resources exceeded during query execution”, an article on the Memory limit problem faced in BQ.
In some cases you will get an “Resources exceeded during query execution” error when trying to build a large derived table in BQ(Big Query). This generally happen for queries that require huge amount of memory to run, or when a inner/derived query needs huge amount of memory to store temporarily while more operations is made on it.
Why does this happen?
This happens because even though BQ is a distributed Relational database and most queries are executed on multiple nodes, there are certain operations that requires all data to be processed on a single node due to calculation needs, which means that all the data needed for the calculation needs to exist on the same node. This is needed for those calculations even though BQ is a "distributed DB" and since BQ is a database running with a "serverless paradigm", it has several limitations for single node computation requests.
When your query requests for too much data to be processed on a single node and can no longer fit onto that node, you will get the “Resources exceeded during query execution” error and the whole query fails.
In the case of failure, even if your query already processed huge amount of data or took a really long time to run, your account will not be billed by Google for this query. Since all failed queries are not billed (Refer to BQ billing).
What are the Operations that Operate on a Single Node?
Operations that need to see all the data in the resulting table at once have to operate on a single node.
Un-partitioned window functions like RANK() OVER() or ROW_NUMBER() OVER() will operate on a single node.
Another operation that is mentioned to face a lot of problems is the ORDER BY clause. It is an operation that requires a single node to see all data, so it may hit the memory limit of a single.
ORDER BY clause:
- If your query contains an ORDER BY clause, all the data is still going to be passed to a single node and then sorted.
- If it doesn’t fit on the node then it fails. Since BQ is a scanning database (it scans the whole column for each query) sorting the data is unnecessary.
- So don’t use ORDER BY in ordering derived table results.
When will it matter?
When you want to generate a primary key for a derived table. A common way to do this is to use ROW_NUMBER() OVER() as id to generate a sequence number for a each row in your table. If the table is big we fail with “Resources exceeded” because this function tries to operate on a single node.
Other situations is when ordering table that requires all data to be present in order to do proper sorting.
Or in fact any other tasks that is not based on grouping data, or tasks that cannot be easily split up to work in parallel will have a chance of facing this type of issue.
How to avoid operating on a single node, to prevent query from failing?
Unfortunately, there seems to be no way to increase the per node memory limit of BQ and it is not likely that Google would do something about it soon. At least at the time of writing, there seems to be no known way of customizing the node BQ uses for its calculation too. Thus in this case, it is better if we choose to work with this limitation in mind and design solutions that does not rely heavily on operations that will overload single node processes. Below are some ways to deal with the issue:
Partitioning using a generated key
Luckily, there’s an easy way around this. You can PARTITION the the window function by date and build a string as the primary key. For example:
CONCAT(CAST(ROWNUMBER() OVER(PARTITION BY eventdate) AS STRING),
'|',(CAST(event_date AS STRING)) as id
Splitting up the query and using "temporary tables"
You can try to split your query up.
Write the results of every individual sub/inner query to another table as a temporary storage space for further processing. Since Big Query also has a limitation for the maximum response size that can be processed.
You can do that either if you are using Legacy or Standard SQL, and you can follow the steps to do it in the documentation pretty easily.
By doing this, you effectively reduce the load of per query data scanning and may help to solve your memory limit issue.
More reading resources:
Some other resources on the internet that answers this question too. Can take a look at all of them to get different perspectives on this.