WebGPU shader limits

The only limits are the ones you set yourself (or so they said).

Limits can be a fun thing to look for when exploring unconventional ways to use something in a safe, consensual way.

A while ago I got it into my mind that it would be cool to try to dump data directly into the shader code string. Engrave the data in the code instead of going with the common approach of providing it through some external mechanism (textures, buffers, etc.). This manipulation of the code string would happen before compilation, by making use of some sort of pre-processor that could update the shader code in unusual ways. A Lispy meta shader code as a service, or something like that.

That road led me to over-engineer a demoscene engine that still grabs my attention to this day, and hopefully it will bear its first fruits at Inercia 2025 or 2026. But that will be the subject of a future post.

In the meantime, I found some learnings that might be cool enough to be worth your attention. All shader code below is WGSL.

By-the-book limits

Querying the WebGPU device is the regular way to learn about some of the limits of the adapter/device we have running. It gives developers the device-reported maximum values for things like the maximum storage buffer binding size, the maximum invocations per workgroup, and so on.

These queried limits are the ones that change between devices, and naturally they do not cover all the limits we are bound to. For instance, a lot of the limits come from the WebGPU and WGSL specs, and these are often the minimum limits that implementations must support.
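As a concrete reference, the query looks roughly like this. This is a minimal sketch: pickLimits and reportDeviceLimits are helper names I made up, and the WebGPU part only runs in a browser.

```typescript
// Illustration helper: pick a subset of keys from a limits object.
const pickLimits = (limits: Record<string, number>, keys: string[]) =>
  Object.fromEntries(keys.map((k) => [k, limits[k]]));

// Browser-only: ask the adapter for a few of the limits mentioned above.
async function reportDeviceLimits() {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) throw new Error("WebGPU not available");
  const adapter = await gpu.requestAdapter();
  if (!adapter) throw new Error("No adapter");
  return pickLimits(adapter.limits, [
    "maxStorageBufferBindingSize",
    "maxComputeInvocationsPerWorkgroup",
    "maxComputeWorkgroupStorageSize",
  ]);
}
```

The same shape works on the device returned by adapter.requestDevice(), whose limits reflect whatever you asked for in requiredLimits.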

For this blog post I am interested in the WGSL shader code limits, which the spec lays out in a nice table:

"WGSL Limits from the spec" (image from the WGSL 9 Sep 2025 spec limits section)

To explore these, let's start with a simple piece of shader code: the triangle.

You can’t spell Triskel without TRI.

The simple WGSL triangle can be done with the vertex code:

@vertex
fn vertexMain(
  @builtin(vertex_index) VertexIndex : u32
) -> @builtin(position) vec4f {

  // A static array with the coordinates of this triangle
  var pos = array<vec2f, 3>(
    vec2(0.5, 0.5), vec2(-0.5, -0.5), vec2(0.5, -0.5),
  );

  return vec4f(pos[VertexIndex], 0.0, 1.0);
}

How many vertices can we fit into that var pos = array<vec2f, 3> array?

The last row of the WGSL limits table tells us that the minimum value WebGPU has to guarantee for the "Maximum number of elements in value constructor expression of array type" is 2047.

To test this, let's introduce a bit of JS into this shader module string, repeating the last vertex to guarantee that the array is filled with n elements:

const preprocessWGSL = (n: number) => `
@vertex
fn vertexMain(
  @builtin(vertex_index) VertexIndex : u32
) -> @builtin(position) vec4f {

  var pos = array<vec2f, ${n}>(
    ${
      "vec2(0.0, 0.5), vec2(-0.5, -0.5), " +
      // Repeat the last vertex until the array holds n elements:
      "vec2(0.5, -0.5), ".repeat(n - 2).slice(0, -2)
    });

  return vec4f(pos[VertexIndex], 0.0, 1.0);
}`;

Procedural generation of limits

Now I want to replicate this kind of logic, declaring a super big thing in the WGSL code, to check the other limits in that spec table.

The idea is to do this in three steps:

  1. Create a shader module with the offending code, and check for compilation errors. If it passes:
  2. Create a WebGPU pipeline with the offending code, and check if it doesn't throw. If it passes:
  3. Go back to 1 with a bigger number, until a value is found that breaks either 1. or 2.
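The search in step 3 can be sketched as a tiny loop. In the real harness the predicate wraps device.createShaderModule() plus getCompilationInfo() and device.createComputePipeline() in a try/catch; findFirstFailure is just the name I picked for this sketch.

```typescript
// Returns the first n in [start, max] for which `works` reports failure,
// or null if nothing in that range breaks.
const findFirstFailure = (
  works: (n: number) => boolean,
  start: number,
  max: number,
): number | null => {
  for (let n = start; n <= max; n++) {
    if (!works(n)) return n;
  }
  return null;
};

// Example: a fake "does it compile?" predicate with a 2047-element ceiling.
const firstBroken = findFirstFailure((n) => n <= 2047, 2040, 2060);
```

In practice the predicate is async (shader module creation and compilation info are promises), but the shape of the search is the same.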

Create shader and pipeline

I am testing both createShaderModule() and pipeline creation, since some things are only checked when the pipeline is created (binding layouts, effective workgroup storage limits, etc.).

I expect the createShaderModule() compilation info to be more about syntax and typing, while pipeline creation is more about storage or byte-limit errors.

Setup

All values provided are for my machine, an Intel macOS laptop with an Intel and an AMD GPU. The reported adapter vendor is "Intel" and the architecture is "gen-9".

All code is run in a Web Worker with an OffscreenCanvas.

Let's go bottom-up through the WGSL table limits and do this!

Maximum number of elements in value constructor expression of array

The table says WGSL should support at least 2047 elements.

Using the code and recipe above, I get the following on my Intel Apple MacBook:

Safari Technology Preview 227

Compile: error at 2048 Pipeline: —
“constant array cannot have more than 2047 elements”
Limit hit: 2047 (WGSL floor)

Chrome 140.0.7339.133

Compile: OK until 2047 Pipeline: error at 2048
It stalls for about 10 seconds and then the pipeline throws with a long message containing C++ code. The browser becomes unresponsive with much bigger values.
Limit hit: 2047 (WGSL floor)

Maximum combined byte-size of all variables instantiated in the workgroup address space

The WGSL table says we should be able to have at least 16384 bytes.

This value is the same as the one reported by the hardware device limits as maxComputeWorkgroupStorageSize.
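Since the limit is expressed in bytes but the array size in the shader is expressed in elements, a tiny helper keeps the conversion straight (my own convenience function, with f32's 4 bytes as the default):

```typescript
// How many elements of a given scalar size fit in a byte budget.
// With the defaults, the 16384-byte workgroup floor fits 4096 f32 values.
const elementsForBudget = (budgetBytes: number, bytesPerElement = 4) =>
  Math.floor(budgetBytes / bytesPerElement);
```

The same helper applies to the function and private address space floors below: 8192 bytes is 2048 f32 elements.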

To test this, I created a simple compute shader with another really big array. I am not sure if a barrier is necessary to prevent some sort of memory layout optimization or dead code elimination here, so I added one anyway.

// output, this is just to enforce the usage of the workgroup data
@group(0) @binding(0) var<storage, read_write> output: array<f32>;

// the ${n} here is coming from the JS preprocessor
var<workgroup> sharedData: array<f32, ${n}>;

@compute @workgroup_size(64)
fn computeMain(@builtin(local_invocation_id) localId: vec3<u32>) {
  let idx = localId.x;

  // fill it up! (the ${n} here is coming from the JS preprocessor)
  if (idx < ${n}) {
    sharedData[idx] = f32(idx);
  }
  workgroupBarrier();

  // compiler might be smart to just ignore the variable if not used, so
  // lets use it here just to avoid dead code elimination:
  if (idx == 0) {
    var sum = 0.0;
    // the ${n} here is coming from the JS preprocessor
    for (var i = 0u; i < ${n}; i++) {
      sum += sharedData[i];
    }
    output[0] = sum;
  }
}

Safari Technology Preview 227

Compile: - Pipeline: error above 16384
“The combined byte size of all variables in the workgroup address space exceeds 16384 bytes”
Limit hit: 16384 (WGSL floor)

Chrome 140.0.7339.133

Compile: - Pipeline: error above 16384 bytes
“The total use of workgroup storage (16400 bytes) is larger than the maximum allowed (16384 bytes). This adapter supports a higher maxComputeWorkgroupStorageSize of 32768, which can be specified in requiredLimits when calling requestDevice(). Limits differ by hardware, so always check the adapter limits prior to requesting a higher limit.”
Limit hit: 16384 (WGSL floor)

We could try to increase the hardware limits by requesting a bit more from the device, but I am just looking at what happens at the limit for creative purposes.

Let's move on to the next row of the WGSL spec limits table.

Maximum combined byte-size of all variables instantiated in the function address space

The spec says we should have at least 8192 bytes. The thing here is that function address space variables are unique to each invocation; they are not shared, so there is no need for the barrier as above.

Also, this limit is not communicated by the hardware device when querying for limits, so let's see.

@group(0) @binding(0) var<storage, read_write> output: array<f32>;

fn someFunctionWithLocalMemory() -> f32 {
  // Same thing as before, ${n} is the JS preprocessor input:
  var localMem: array<f32, ${n}>;
  
  // ${n} is the JS preprocessor
  for (var i = 0u; i < ${n}; i++) {
    localMem[i] = f32(i);
  }
  
  var sum = 0.0;
  // ${n} is the JS preprocessor
  for (var i = 0u; i < ${n}; i++) {
    sum += localMem[i];
  }
  
  return sum;
}

@compute @workgroup_size(1)
fn computeMain() {
  output[0] = someFunctionWithLocalMemory();
}

Safari Technology Preview 227

Compile: - Pipeline: error above 8192
“The combined byte size of all variables in this function exceeds 8192 bytes”
Limit hit: 8192 (WGSL floor)

Chrome 140.0.7339.133

Compile: - Pipeline: no error
No message; the limit is never reached. At 262144 bytes the compilation fails with the message _“array count (65536) must be less than 65536”_, which is not a limit I could find in the spec.
Limit hit: never; the array count reached 65536 while using f32 (4 bytes each)

This ceiling of 65536 elements seems to be a practical limit of the toolchain (Chrome, or maybe even the driver). This is OK, since the floor established by the spec is way below it.

Maximum combined byte-size of all variables instantiated in the private address space

The private address space is cool for module-scope variables that are unique to each invocation but persist across function calls within it.

The limit in the spec table is again 8192 bytes. To test this, I will declare a big array as var<private> (which is the default when declaring variables in the module scope) and see how big it can get. Again, this is not a limit exposed through the WebGPU API hardware device limits.

@group(0) @binding(0) var<storage, read_write> output: array<f32>;

// Same as before, a big array:
var<private> privateData: array<f32, ${n}>;

@compute @workgroup_size(1)
fn computeMain() {

  // Initialize it, the ${n} is set by this test shader preprocessor
  for (var i = 0u; i < ${n}; i++) {
    privateData[i] = f32(i);
  }
  
  var sum = 0.0;
  // Use all the data to avoid it being optimized away, like above
  // the ${n} is set by this test shader preprocessor
  for (var i = 0u; i < ${n}; i++) {
    sum += privateData[i];
  }
  
  output[0] = sum;
}

Safari Technology Preview 227

Compile: - Pipeline: error above 8192
“The combined byte size of all variables in the private address space exceeds 8192 bytes”
Limit hit: 8192 (WGSL floor)

Chrome 140.0.7339.133

Compile: - Pipeline: no error
Same as the previous test. This specific limit is never reached; at 262144 bytes the compilation fails with the message _“array count (65536) must be less than 65536”_.
Limit hit: never; the array count reached 65536 while using f32 (4 bytes each)

And let's move on to the next one.

Maximum number of case selector values in a switch statement

This is an interesting one; it certainly never crossed my mind to explore the number of case selector values. WGSL switch statements must always carry a default clause, which always counts as 1 selector regardless of whether it is empty.

I'm going with just a single value per case, since my purpose is to hit the desired limit and not so much to test the nuances of the switch implementation:

@group(0) @binding(0) var<storage, read_write> output: array<f32>;

@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let testValue = i32(id.x) % ${numCases};
  var result: f32 = -1.0;
  
  switch testValue {
${cases.join("\n")}
    default: {
      result = 999.0;
    }
  }
  
  output[0] = result;
}
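The ${numCases} and ${cases} interpolations above come from the JS preprocessor. The case list can be generated more or less like this (a sketch; makeCases is just what I call it here):

```typescript
// One selector value per case clause, as simple as possible.
const makeCases = (numCases: number): string[] =>
  Array.from({ length: numCases }, (_, i) => `    case ${i}: { result = ${i}.0; }`);
```

makeCases(3) yields the clauses for selectors 0, 1 and 2, and the shader template joins them with newlines.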

Safari Technology Preview 227

Compile: error above 1023 Pipeline -
“switch statement cannot have more than 1023 case selector values”
Limit hit: 1023 (WGSL floor)

Chrome 140.0.7339.133

Compile: error above 16383 Pipeline: -
“switch statement has 16384 case selectors, max is 16383”
Limit hit: 16383

Maximum number of parameters for a function

This is a cool way for a preprocessor to pass extra data down into the shader code string. The spec says we get at least 255 parameters per function, so let's create a preprocessor that uses structs and see how WebGPU behaves.

@group(0) @binding(0) var<storage, read_write> output: array<f32>;

struct MyStruct {
  value: f32,
  flag: bool
}

fn someFunction(${
  Array.from({ length: n }, (_, i) => `s${i}: MyStruct`).join(", ")
}) -> f32 {
  var sum = 0.0;
  ${Array.from({ length: n }, (_, i) => `sum += s${i}.value;`).join("\n  ")}
  return sum;
}

@compute @workgroup_size(1)
fn main() {
  let s = MyStruct(1.0, true);
  output[0] = someFunction(${Array.from({ length: n }, () => "s").join(", ")});
}

Safari Technology Preview 227

Compile: error above 255 Pipeline -
“function cannot have more than 255 parameters”
Limit hit: 255 (WGSL floor)

Chrome 140.0.7339.133

Compile: error above 255 Pipeline: -
“function declares 256 parameters, maximum is 255”
Limit hit: 255 (WGSL floor)

This limit is per function, and structs are treated as just another parameter. It could maybe be used creatively as a way to dump extra data into a shader code string when other limits are reached, though I imagine compilation time would become a factor at some point.

Maximum nesting depth of brace-enclosed statements in a function

To test this limit, I want to dump a lot of nested ifs into the shader code string. The spec says we can have at least 127 nested brace-enclosed statements.

Here is the full code for it, including the #computePipeline directive from the WebGPU engine I am working on, where I moved the WebGPU spec into a fully declarative preprocessor (more about this in the next post).

export const limitsNestedIfShaderModule = (depth: number) => {
  let opening = "";
  let closing = "";
  let indent = "";

  // Create a bunch of nested ifs
  for (let i = 0; i < depth; i++) {
    opening += `${indent}if (${i}.0 < ${depth}.0) {\n`;
    indent += "  ";
    closing = `${indent}output[0] = ${i}.0;\n` + closing;
    if (i < depth - 1) {
      closing = "  ".repeat(depth - i - 1) + "}\n" + closing;
    } else {
      closing = indent.slice(0, -2) + "}\n" + closing;
    }
  }

  const shaderCode = `
#computePipeline someComputation {
  layout=auto
  compute={
    entryPoint=computeMain
    module=code 
  }
}

#shaderModule code {
  code="
@group(0) @binding(0) var<storage, read_write> output: array<f32>;

@compute @workgroup_size(1)
fn computeMain() {
${opening}${closing}
}

"}
  `;

  return shaderCode;
};

(As a way to treat shader code as data, I brought the WebGPU spec into a declarative space, as a high-level abstraction preprocessor for the demoscene engine I'm working on.)

The results are in:

Safari Technology Preview 227

Compile: error above 127 Pipeline -
“maximum parser recursive depth reached”
Limit hit: 127 (WGSL floor)

Chrome 140.0.7339.133

Compile: error above 63 Pipeline: -
“statement nesting depth / chaining length exceeds limit of 127”
Limit hit: 64 (below the WGSL floor)

Now this is strange. Safari is doing the right thing: "brace-enclosed statements" include the function block and beyond. Chrome, however, appears to be counting something entirely different, maybe closing braces as well? Not really sure what is happening here.

This got me wondering about the behaviour of nested composite types, which is up next.

Maximum nesting depth of a composite type

The spec says the limit is 15. I am going to generate a big nested array like this:

array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<f32, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>, 2>
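That type string can be produced with a small recursive helper (a sketch of what my preprocessor does; nestedArrayType is my name for it):

```typescript
// Wraps f32 in `depth` levels of array<..., 2>.
const nestedArrayType = (depth: number): string =>
  depth === 0 ? "f32" : `array<${nestedArrayType(depth - 1)}, 2>`;
```

For example, nestedArrayType(2) gives "array<array<f32, 2>, 2>".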

My idea here is to push the nesting limit of a composite type in a single expression.

These are the results:

Safari Technology Preview 227

Compile: error above 15 Pipeline -
“composite type may not be nested more than 15 levels”
Limit hit: 15 (WGSL floor)

Chrome 140.0.7339.133

Compile: error above 29 Pipeline: -
Chrome lets it go big until it hits another limit, such as: “array byte size (0x100000000) must not exceed 0xffffffff bytes”
Limit hit: 29 for f32 (above WGSL floor)

This suggests that Chrome maybeeee is not explicitly checking nesting depth at all (just maybe) and just lets us hit whatever other limit exists in its parsing stack. This is again OK, because the floor established by the spec is also fulfilled.

Maximum number of members in a structure type

The final limit that I think is potentially interesting to test as a shader developer is the maximum number of members in a struct.

The spec says we can have capacity for at least 1023 members in a struct.

@group(0) @binding(0) var<storage, read_write> output: array<f32>;

// Preprocessor creates n members for this struct:
struct BigStruct {
${members}
}

// Create and use them here to avoid them from being optimized:
@compute @workgroup_size(1)
fn computeMain() {
  var s = BigStruct(${initValues});
  
  ${sumLoop}
  output[0] = result;
}
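The ${members}, ${initValues} and ${sumLoop} placeholders are filled by the preprocessor. A sketch of generators that would do it (the names and exact shapes here are mine):

```typescript
// n struct members: m0, m1, ... each an f32.
const members = (n: number): string =>
  Array.from({ length: n }, (_, i) => `  m${i}: f32,`).join("\n");

// Matching constructor arguments: 0.0, 1.0, ...
const initValues = (n: number): string =>
  Array.from({ length: n }, (_, i) => `${i}.0`).join(", ");

// Unrolled sum over every member, so nothing gets optimized away.
const sumLoop = (n: number): string =>
  ["var result = 0.0;"]
    .concat(Array.from({ length: n }, (_, i) => `result += s.m${i};`))
    .join("\n  ");
```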

Safari Technology Preview 227

Compile: error above 1023 Pipeline -
“struct cannot have more than 1023 members”
Limit hit: 1023 (WGSL floor)

Chrome 140.0.7339.133

Compile: error above 16383 Pipeline: -
“'struct BigStruct' has 16384 members, maximum is 16383”
Limit hit: 16383 (above WGSL floor)

Chrome is generous here and goes well beyond the minimum provided.

A table of it all

These results fit well within my expectations: Safari, as always, is more strict and spec-oriented, while Chrome goes above the limits on some fronts, though sometimes with a less polished failure mode.

| Limit (WGSL spec) | Spec min | Safari TP 227 | Chrome 140 |
| --- | --- | --- | --- |
| Array constructor elements | 2047 | Errs >2047 | Compiles; pipeline breaks at 2048; UI may stall |
| Workgroup storage bytes (var<workgroup>) | 16384 | Errs in pipeline | Pipeline error; suggests requiredLimits 32768 |
| Function local bytes (function stack) | 8192 | Errs in pipeline | No limit reached; compiles up to array count 65536 |
| Private address space bytes (var<private>) | 8192 | Errs in pipeline | No limit reached; same as above |
| Case selectors in a switch | 1023 | Errs >1023 | Errs >16383 |
| Function parameters | 255 | Errs >255 | Errs >255 |
| Struct members | 1023 | Errs >1023 | Errs >16383 |
| Nested brace-enclosed statements | 127 | Errs >127 | Errs >63 (below spec min 127) |
| Nested composite types | 15 | Errs >15 | Errs >29; fails with a different limit |

Conclusion

I think there are at least two takeaways I can safely draw from poking at WGSL spec limits:

  1. The spec minimums are good guidelines. Safari Technology Preview enforces them very strictly, which makes it great to validate what the portable floor actually is.

  2. Browsers may go beyond the minimums. Chrome lets you stretch several rows (struct members, switch cases, local/private bytes) way past the spec floor, but you’ll sometimes trip different internal limits first (like array-element counts) or see pipeline-time failures rather than neat compile errors.

If you’re considering “engraving” data into the shader source for creative or preprocessing reasons, my advice so far would be to stay within the spec minimums: those are the only values every implementation has to guarantee, and anything beyond them is browser-specific territory.

All of these tests were run with a preprocessor demo player that I am working on for fun. If you are curious, I'll share a follow-up shortly, and ideally a WebGPU demo at Inercia 2025.

There is a follow-up post to this one here.