Running Microsoft FXC in Docker
Microsoft DXC is the new shader compiler stack, but the FXC compiler is still the dominant HLSL compiler for a number of reasons:
- Performance and correctness regressions of DXIL shaders compared to DXBC
- Many cross compilers and custom toolchains still rely on DXBC
- IHV drivers are still being adapted to consume DXIL, which is lower level than DXBC
- DXC is a complex codebase based on LLVM, with many components, and is difficult to build
- DXIL is Direct3D 12 only, which makes it Windows 10 only
Therefore, it is still important to support shader compilation with FXC in some situations.
The performance and correctness regressions are being actively worked on, and in my opinion (based on my own shaders and tests) they are much less of a problem today than they were six months ago. In fact, most reported issues are fixed within just a couple of days (an example). The opposite is also true: some shaders hit massive performance or compile-time cliffs when compiled with FXC compared to DXC, especially when arrays are involved.
Halcyon (SEED’s R&D engine) currently has a mixture of FXC and DXC compiled shaders when running under Direct3D 12, whereas the Vulkan path exclusively uses shaders compiled with DXC.
Given this scene, let's compare the per-pass performance:
Pass | Direct3D 12 | Vulkan | Using DXBC (D3D12) |
---|---|---|---|
Depth Clear | 0.003 ms | 0.003 ms | No |
GBuffer Meshes | 3.126 ms | 3.387 ms | No |
Velocity Vector | 0.035 ms | 0.033 ms | No |
GBuffer Sky | 0.046 ms | 0.048 ms | No |
Reproject Meta | 0.091 ms | 0.089 ms | No |
Temporal Reproject | 0.163 ms | 0.158 ms | No |
DiffuseSh | 0.012 ms | 0.011 ms | No |
Shadow Pass | 1.084 ms | 1.086 ms | No |
Shadow Pass | 1.091 ms | 1.114 ms | No |
Shadow Pass | 1.080 ms | 1.100 ms | No |
Depth Pyramid | 0.041 ms | 0.032 ms | No |
GTAO Pass | 0.284 ms | 0.181 ms | Yes |
GTAO Bilateral | 0.084 ms | 0.083 ms | No |
GTAO Bilateral | 0.085 ms | 0.086 ms | No |
GTAO Temporal | 0.099 ms | 0.173 ms | Yes |
Lighting | 1.081 ms | 3.048 ms | No |
SSR Trace | 0.595 ms | 0.604 ms | No |
IBL Reflection | 0.021 ms | 0.031 ms | Yes |
Reflection Filter | 0.831 ms | 1.239 ms | Yes |
Reflection Filter | 0.486 ms | 0.478 ms | Yes |
Reflection Merge | 0.049 ms | 0.065 ms | No |
Temporal AA | 0.269 ms | 0.219 ms | No |
Velocity Reduce | 0.019 ms | 0.029 ms | No |
Velocity Reduce | 0.004 ms | 0.004 ms | No |
Velocity Dilate | 0.011 ms | 0.004 ms | No |
Motion Blur | 0.111 ms | 0.113 ms | No |
Bloom Extract | 0.013 ms | 0.045 ms | No |
Bloom Downsample | 0.004 ms | 0.008 ms | No |
Bloom Blur | 0.004 ms | 0.004 ms | No |
Exposure Adaption | 0.004 ms | 0.003 ms | No |
Bloom Upsample | 0.005 ms | 0.005 ms | No |
Bloom Upsample | 0.006 ms | 0.004 ms | No |
Bloom Upsample | 0.009 ms | 0.008 ms | No |
Bloom Upsample | 0.022 ms | 0.023 ms | No |
Bloom Apply | 0.039 ms | 0.255 ms | No |
Final Output | 0.041 ms | 0.092 ms | Yes |
Present | 0.018 ms | 0.017 ms | No |
Totals | 11.173 ms | 13.928 ms | 5 / 37 |
This is a bit hand-wavy, but if we assume that the driver backend compilers translate DXIL and SPIR-V into comparable internal IL, we can draw some conclusions from these numbers.
Where DXBC is used but Direct3D 12 is slower than Vulkan, DXIL is likely faster than DXBC for that pass, but correctness issues prevent us from using it.
Where DXBC is used and Direct3D 12 is faster than Vulkan, DXIL is likely slower than DXBC for that pass, which indicates a performance regression.
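The two rules above can be applied mechanically to the DXBC rows of the table. The following is a minimal sketch, not part of Halcyon or any tool: `classify_dxbc_passes` is a hypothetical shell function that reads rows in the table's pipe-separated layout and, as assumed above, treats the Vulkan (SPIR-V) timing as a stand-in for DXIL.

```shell
# Hypothetical helper: reads rows such as "GTAO Pass | 0.284 ms | 0.181 ms | Yes |"
# on stdin and applies the heuristic above to every row that uses DXBC.
# Assumes DXIL and SPIR-V perform comparably, so the Vulkan time stands in for DXIL.
classify_dxbc_passes() {
  awk -F'|' '
    {
      name = $1; dxbc = $4
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the pass name
      gsub(/[ \t]/, "", dxbc)
      if (dxbc != "Yes") next             # skip DXIL rows, header, separator, totals
      d3d12 = $2 + 0; vulkan = $3 + 0     # numeric prefix of " 0.284 ms " is 0.284
      if (d3d12 > vulkan)
        print name ": DXIL likely faster, kept on DXBC for correctness"
      else if (d3d12 < vulkan)
        print name ": DXIL likely slower (performance regression)"
      else
        print name ": no measurable difference"
    }'
}
```

Applied to the rows marked Yes, this reports GTAO Pass and the second Reflection Filter as passes where DXIL is likely faster, and GTAO Temporal, IBL Reflection, the first Reflection Filter and Final Output as likely regressions. It is only as good as the DXIL/SPIR-V assumption behind it.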
The most interesting case is the Lighting pass, which uses DXIL on Direct3D 12, while the Vulkan version is ~3x more expensive (3.048 ms vs. 1.081 ms). In the DXC stack, HLSL to SPIR-V uses the same AST as HLSL to DXIL, which indicates that this performance cliff is introduced in the translation from the AST to SPIR-V.
The performance issue with the Reflection passes is largely related to differences in how `pow(x, 2)` is lowered: FXC emits `x * x`, whereas DXC emits `exp2(log2(x) * 2)`. It's of course easy to work around this app-side, but it's important to track and fix these issues in the compiler itself (e.g. by expanding `pow` with constant integer exponents up to 16 into multiplications). Aside from performance, there are numerical differences that cause corruption when DXIL is used for these passes instead of DXBC.
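One way to check this lowering is to compile a tiny shader that squares its input with `pow` and read FXC's DXBC listing. A minimal sketch, assuming Docker and the gwihlidal/fxc image described later in this post; `write_pow_shader`, `compile_with_fxc` and the file name `pow_test.hlsl` are hypothetical:

```shell
# Hypothetical helpers for checking how FXC lowers pow(x, 2).
write_pow_shader() {
  # Minimal pixel shader that squares an interpolated color with pow().
  cat > "$1" <<'EOF'
float4 main(float4 color : COLOR0) : SV_TARGET
{
    return pow(color, 2.0f);
}
EOF
}

compile_with_fxc() {
  # Bind mount the current directory so the container can read "$1";
  # with no output file given, FXC prints the DXBC listing to stdout.
  docker run --rm -v "$PWD":"$PWD" -w "$PWD" gwihlidal/fxc \
    /T ps_5_0 /E main "$1"
}
```

Running `write_pow_shader pow_test.hlsl && compile_with_fxc pow_test.hlsl` prints the listing; given the behavior described above, the square should show up as a `mul` rather than a `log`/`exp` pair. Compiling the same file with DXC gives the other side of the comparison.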
Overall, DXIL is used for nearly all passes, with good performance and compile times.
One of the components in the DXC compiler stack is dxbc2dxil, which could help with transitioning existing DXBC toolchains over to DXIL. The following diagram, from the DXC project, shows where it fits in the stack:
```text
  HLSL     Other shading langs      DSL           DXBC IL
    +               +                +               +
    |               |                |               |
    v               v                v               |
  Clang           Clang         Other Tools      dxbc2dxil
    +               +                +               +
    |               |                |               |
    v               v                v               |
+---+---------------+----------------+---+           |
|          High level IR (DXIR)          |           |
+-------------------+--------------------+           |
                    |                                |
                    v                                |
                Optimizer <-----+ Linker             |
                    +           ^    +               |
                    |           |    |               |
                    |           |    |               |
+-------------------v-----------+----v---------------v---+
|                  Low level IR (DXIL)                   |
+-------------+----------------------------+-------------+
              |                            |
              v                            v
       Driver Compiler                  Verifier
```
Regarding IHV driver stability, I definitely don't envy the driver engineers, who have had to do a lot of hard work to support DXIL. Previously, they only needed to support the higher-level DXBC specification, which gave them much more freedom in mapping its concepts to their internal IL, whereas DXIL is much lower level and more explicit about flow control, intrinsics, and overall behavior.
This is definitely a controversial topic, but I personally feel that the overall benefits of an open source compiler stack, proper support for features like wave intrinsics, and an actual specification are very advantageous. As one example, the open source nature of DXC has allowed Google to collaborate with Microsoft and add HLSL to SPIR-V support to the same codebase, which makes it much easier to develop and maintain a complex engine that runs on both Vulkan and Direct3D 12 using HLSL as the only source language.
Following my previous posts on shader compilation on Linux and scaling it out in Kubernetes, I looked into running FXC in Docker. One major problem with FXC is that it is only available as a closed source Windows binary, so there is no way to build it for Linux.
Without any source, the only remaining option was to give Wine a shot, and Wine has no problem running fxc.exe correctly.
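On a Linux machine that already has Wine installed, the same approach works without Docker. A minimal sketch, assuming fxc.exe and d3dcompiler_47.dll have been copied from a Windows SDK into one directory; `fxc_wine` and the default `./fxc-bin` path are hypothetical, and the two Wine variables mirror the ones set in the Dockerfile:

```shell
# Hypothetical wrapper for running fxc.exe through a locally installed Wine.
# FXC_DIR must contain fxc.exe and d3dcompiler_47.dll (copied from a Windows SDK).
FXC_DIR="${FXC_DIR:-./fxc-bin}"

fxc_wine() {
  # Same settings as the Docker image: a 64-bit prefix, and the "fixme"
  # debug channel silenced so only the compiler's own output is shown.
  WINEARCH=win64 WINEDEBUG=fixme-all wine "$FXC_DIR/fxc.exe" "$@"
}
```

For example, `fxc_wine /T ps_5_1 /E main simple.hlsl`.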
The Dockerfile installs Wine on Ubuntu 18.04 and copies fxc.exe and d3dcompiler_47.dll into the image:

```dockerfile
FROM ubuntu:18.04
ARG DEBIAN_FRONTEND="noninteractive"
# The cached wine-mono file name must match the version that is downloaded.
RUN dpkg --add-architecture i386 \
    && apt-get update \
    && apt-get install -y \
        software-properties-common \
        winbind \
        cabextract \
        p7zip \
        unzip \
        wget \
        curl \
        zenity \
    && wget -O- https://dl.winehq.org/wine-builds/Release.key | apt-key add - \
    && apt-add-repository https://dl.winehq.org/wine-builds/ubuntu/ \
    && apt-get update \
    && apt-get install -y --install-recommends winehq-stable \
    && mkdir -p /home/wine/.cache/wine \
    && wget https://dl.winehq.org/wine/wine-mono/4.7.3/wine-mono-4.7.3.msi \
        -O /home/wine/.cache/wine/wine-mono-4.7.3.msi \
    && wget https://dl.winehq.org/wine/wine-gecko/2.47/wine_gecko-2.47-x86.msi \
        -O /home/wine/.cache/wine/wine_gecko-2.47-x86.msi \
    && wget https://dl.winehq.org/wine/wine-gecko/2.47/wine_gecko-2.47-x86_64.msi \
        -O /home/wine/.cache/wine/wine_gecko-2.47-x86_64.msi \
    && wget https://raw.githubusercontent.com/Winetricks/winetricks/master/src/winetricks \
        -O /usr/bin/winetricks \
    && chmod +rx /usr/bin/winetricks \
    && mkdir -p /home/wine/.cache/winetricks/win7sp1 \
    && wget https://download.microsoft.com/download/0/A/F/0AFB5316-3062-494A-AB78-7FB0D4461357/windows6.1-KB976932-X86.exe \
        -O /home/wine/.cache/winetricks/win7sp1/windows6.1-KB976932-X86.exe \
    && groupadd -g 1010 wine \
    && useradd -s /bin/bash -u 1010 -g 1010 wine \
    && chown -R wine:wine /home/wine \
    && apt-get autoremove -y \
        software-properties-common \
    && apt-get autoclean \
    && apt-get clean \
    && apt-get autoremove -y
VOLUME /home/wine
ENV WINEARCH=win64
ENV WINEDEBUG=fixme-all
RUN winecfg
WORKDIR /fxc
COPY d3dcompiler_47.dll .
COPY fxc.exe .
ENTRYPOINT ["wine", "fxc"]
```
The above Dockerfile has been published to Docker Hub as gwihlidal/fxc.
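The Dockerfile copies fxc.exe and d3dcompiler_47.dll from the build context rather than downloading them, so both binaries must sit next to it when building locally. A minimal sketch, assuming they were copied from a Windows 10 SDK installation; `build_fxc_image` and the default tag `fxc-local` are hypothetical:

```shell
# Hypothetical helper: build the image from a directory that holds the
# Dockerfile, fxc.exe and d3dcompiler_47.dll. Fails early if a file is
# missing, since the COPY instructions would otherwise abort the build.
build_fxc_image() {
  local dir="${1:-.}" tag="${2:-fxc-local}" f
  for f in Dockerfile fxc.exe d3dcompiler_47.dll; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  docker build -t "$tag" "$dir"
}
```

For example, `build_fxc_image . fxc-local && docker run --rm fxc-local /help`.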
The published image can be invoked with:
```shell
$ docker run --rm gwihlidal/fxc /help
```
The host file system can also be bind-mounted into the container, so that fxc can be used like a regular command-line application on any machine:

```shell
$ docker run --rm -v "$(pwd)":"$(pwd)" -w "$(pwd)" gwihlidal/fxc /T <target> /E <entry-point-name> <input-hlsl-file>
```
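To make the container feel even more like a native tool, the invocation can be wrapped in a shell function. A minimal sketch for bash or zsh; the name `fxc` is arbitrary and not something the image provides:

```shell
# Wrapper so that "fxc <args>" runs the containerized compiler against the
# current directory. Quoting $PWD keeps paths containing spaces intact.
fxc() {
  docker run --rm -v "$PWD":"$PWD" -w "$PWD" gwihlidal/fxc "$@"
}
```

After this, `fxc /T ps_5_1 /E main simple.hlsl` behaves like the command above. Only the current directory is mounted, so input files and includes outside it are not visible inside the container.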
Example output (DXBC):
```text
$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) gwihlidal/fxc /T ps_5_1 /E main simple.hlsl
Microsoft (R) Direct3D Shader Compiler 10.1
Copyright (C) 2013 Microsoft. All rights reserved.
//
// Generated by Microsoft (R) HLSL Shader Compiler 10.1
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Input
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_TARGET                0   xyzw        0   TARGET   float   xyzw
//
ps_5_1
dcl_globalFlags refactoringAllowed
dcl_output o0.xyzw
mov o0.xyzw, l(0,1.000000,0,1.000000)
ret
// Approximately 2 instruction slots used
```
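To compile more than one shader, the same container can be driven from a loop: FXC's `/Fo` option writes the compiled object to a file instead of printing the listing, and `/nologo` suppresses the copyright banner. A minimal sketch, assuming every `.hlsl` file in the directory shares one target profile and entry point (real projects usually need per-file settings); `compile_all` is a hypothetical name:

```shell
# Compile every *.hlsl file in the current directory to a .dxbc object file.
# Assumes all files share one target profile and entry point.
compile_all() {
  local target="${1:-ps_5_1}" entry="${2:-main}" src status=0
  for src in *.hlsl; do
    [ -e "$src" ] || continue  # no matches: the glob stays literal
    docker run --rm -v "$PWD":"$PWD" -w "$PWD" gwihlidal/fxc \
      /nologo /T "$target" /E "$entry" /Fo "${src%.hlsl}.dxbc" "$src" || status=1
  done
  return "$status"
}
```

For example, `compile_all ps_5_1 main`. Starting one container per file keeps the sketch simple, at the cost of paying Docker and Wine start-up time for every shader.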