Parsing Shader Includes

An important aspect of shader compilation is the ability to include arbitrary graphs of shader files. Typically, this is handled by a callback supplied to the shader compiler: the compiler invokes the callback with a relative or absolute file path, and the callback returns the contents of that file, sourced from a file system or virtual file system.
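As a minimal sketch of that callback shape, assuming a hypothetical `IncludeHandler` trait and an in-memory `VirtualFs` (neither is a real compiler API; interfaces such as DXC's `IDxcIncludeHandler` differ in detail), the handler below serves file contents from a map and records every path the compiler requests. Recording requests is how callback-based dependency tracking works, which is why it requires actually running the preprocessor.

```rust
use std::collections::HashMap;

/// Hypothetical include callback interface (not a real compiler API):
/// the compiler asks for a path and receives the file contents, if any.
pub trait IncludeHandler {
    fn load(&mut self, path: &str) -> Option<String>;
}

/// Hypothetical in-memory virtual file system keyed by path.
pub struct VirtualFs {
    files: HashMap<String, String>,
    /// Every path requested so far, in request order (the dependency list).
    pub requested: Vec<String>,
}

impl VirtualFs {
    pub fn new(files: &[(&str, &str)]) -> Self {
        VirtualFs {
            files: files
                .iter()
                .map(|(p, c)| (p.to_string(), c.to_string()))
                .collect(),
            requested: Vec::new(),
        }
    }
}

impl IncludeHandler for VirtualFs {
    fn load(&mut self, path: &str) -> Option<String> {
        // Record the request even when the file is missing, so a later
        // build can notice when that file appears.
        self.requested.push(path.to_string());
        self.files.get(path).cloned()
    }
}
```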

When a single shader entry point is compiled, the shader and all of its include dependencies are evaluated, and the compilation time of that single invocation is typically not a major concern. However, when rebuilding every shader for a full game, the overall compilation time becomes a major concern, and eliminating redundant or unnecessary work is critical.

The common approach to eliminating redundant or unnecessary work is to generate an identity for each shader entry point, consisting of a hash over the file contents, preprocessor definitions, compiler flags, and a hash of the compiler binary itself. If an identity has already been compiled, the cached result can be reused instead of invoking the compiler again.

However, the identity must also represent the include graph referenced by each shader; otherwise, a change to a utility shader file would not cause the shaders that use it to rebuild and pick up the change.
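The following sketch shows one way to fold all of these inputs, including the include graph, into a single identity. It assumes a hypothetical `EntryPoint` struct and uses 64-bit FNV-1a purely so the example has no dependencies; FNV-1a is not collision resistant, and a real build cache should use a cryptographic hash such as SHA-256. Each field is length-prefixed so adjacent fields cannot run together, and ordered maps make the result independent of define order and of the order in which includes were discovered.

```rust
use std::collections::BTreeMap;

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: a dependency-free stand-in for SHA-256 in this sketch only.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hash a length-prefixed field so ("ab", "c") and ("a", "bc") differ.
fn feed(hash: u64, field: &[u8]) -> u64 {
    fnv1a(fnv1a(hash, &(field.len() as u64).to_le_bytes()), field)
}

/// Hypothetical description of one shader entry point.
pub struct EntryPoint<'a> {
    pub source: &'a str,
    /// Sorted by name, so command-line order does not affect the identity.
    pub defines: BTreeMap<&'a str, &'a str>,
    /// Kept in order, since flag order can matter to the compiler.
    pub flags: Vec<&'a str>,
    /// Hash of the compiler binary.
    pub compiler_hash: u64,
    /// Content hash of every file reachable through #include, keyed by path.
    pub includes: BTreeMap<&'a str, u64>,
}

pub fn entry_identity(e: &EntryPoint) -> u64 {
    let mut h = feed(FNV_OFFSET, e.source.as_bytes());
    h = feed(h, &(e.defines.len() as u64).to_le_bytes());
    for (name, value) in &e.defines {
        h = feed(feed(h, name.as_bytes()), value.as_bytes());
    }
    h = feed(h, &(e.flags.len() as u64).to_le_bytes());
    for flag in &e.flags {
        h = feed(h, flag.as_bytes());
    }
    h = feed(h, &e.compiler_hash.to_le_bytes());
    // The include graph, iterated in path order.
    h = feed(h, &(e.includes.len() as u64).to_le_bytes());
    for (path, content_hash) in &e.includes {
        h = feed(feed(h, path.as_bytes()), &content_hash.to_le_bytes());
    }
    h
}
```

Because each included file contributes its own content hash, editing a utility header changes the identity of every entry point that reaches it, which is exactly the rebuild behaviour described above.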

Parsing the Directives

Although shader compilers support callbacks for include directives, it is undesirable to run the compiler in preprocessor mode just to collect dependencies through this callback, because evaluating the full shader can be quite slow. Some build systems run the preprocessor to “flatten” all the includes into a single source file, which removes the need for an include callback during actual compilation and simplifies identity tracking in content-addressable storage (CAS), at the expense of massive text files that cause I/O bottlenecks, high memory usage, and compiler frontend bottlenecks.

A better approach is to parse include directives with a regular expression, which behaves consistently regardless of the platform or compiler used.

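Rust is a fantastic systems language, and its regex crate provides an excellent regular expression engine with SIMD-accelerated literal searching, which I will use in the following code examples. Note that the regex crate is not a PCRE (Perl-compatible regular expression) engine: its syntax is Perl-like, but it deliberately omits features such as look-around and backreferences so that it can guarantee linear-time matching.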

There are multiple ways to write a regular expression that parses includes. One approach is to use multiple capture groups, at the expense of readability.

(?m)(^\s*\#\s*include\s*<([^<>]+)>)|(^\s*\#\s*include\s*"([^"]+)")


Another approach is to use a single expression for both relative and absolute matches, which is easier to understand but loses context about the type of match (it also accepts mismatched delimiters such as "path>).

(?m)^\s*\#\s*include\s*["<]([^">]+)[">]


The parser will likely need to distinguish absolute (angle-bracket) includes from relative (quoted) ones, which means either using capture groups or using two separate expressions.
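Knowing the include type matters because the two forms are usually resolved differently. The sketch below assumes the common C-preprocessor convention: a quoted include is looked up next to the including file first and then in the include search directories, while an angle-bracket include only consults the search directories. The `resolve_include` name and the injected `exists` check (which lets it run against a virtual file system) are illustrative assumptions.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical resolver: maps an include directive to a concrete path.
/// `exists` is injected so this can run against a real or virtual file system.
pub fn resolve_include(
    including_file: &Path,
    include_path: &str,
    relative: bool,
    search_dirs: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    if relative {
        // Quoted include: try the including file's own directory first.
        let dir = including_file.parent().unwrap_or_else(|| Path::new(""));
        let candidate = dir.join(include_path);
        if exists(candidate.as_path()) {
            return Some(candidate);
        }
    }
    // Angle-bracket include, or quoted fallback: the first search
    // directory containing the file wins.
    search_dirs
        .iter()
        .map(|dir| dir.join(include_path))
        .find(|candidate| exists(candidate.as_path()))
}
```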

Relative includes can be parsed with the following:

(?m)^\s*\#\s*include\s*"([^"]+)"


Absolute includes can be parsed with the following:

(?m)^\s*\#\s*include\s*<([^<>]+)>
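If pulling in a regex dependency is undesirable, or simply as a reference for what the two expressions above are meant to accept, a hand-written, line-based equivalent is short. `match_include` and `scan_includes` are hypothetical helpers, not part of the original code; one difference is that they never let a directive span a line break, whereas `\s` in the expressions also matches newlines.

```rust
/// Line-based equivalent of `^\s*#\s*include\s*"([^"]+)"` and
/// `^\s*#\s*include\s*<([^<>]+)>`. Returns (path, relative).
pub fn match_include(line: &str) -> Option<(&str, bool)> {
    // Leading whitespace, '#', optional whitespace, "include", optional whitespace.
    let rest = line.trim_start().strip_prefix('#')?.trim_start();
    let rest = rest.strip_prefix("include")?.trim_start();
    // The opening delimiter decides both the closing one and the include type.
    let (close, relative) = match rest.chars().next()? {
        '"' => ('"', true),
        '<' => ('>', false),
        _ => return None,
    };
    let body = &rest[1..]; // both delimiters are one byte long
    let end = body.find(close)?;
    let path = &body[..end];
    // Mirror `[^"]+` / `[^<>]+`: non-empty, and no '<' inside angle brackets.
    if path.is_empty() || (!relative && path.contains('<')) {
        return None;
    }
    Some((path, relative))
}

/// Collect every include directive in a source file, in source order.
pub fn scan_includes(source: &str) -> Vec<(&str, bool)> {
    source.lines().filter_map(match_include).collect()
}
```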


The Rust code to evaluate these regular expressions:

#[macro_use]
extern crate lazy_static;
extern crate regex;

use regex::Regex;

#[derive(Debug, Clone, PartialEq)]
struct MatchRange {
    start: usize,
    end: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct MatchResult {
    include_path: String,
    range: MatchRange,
    relative_path: bool,
}

fn parse_text_regex(
    results: &mut Vec<MatchResult>,
    input: &str,
    regex: &Regex,
    relative: bool,
) {
    // Each item holds the capture groups of one match: group 0 is the whole
    // directive and group 1 is the path between the delimiters, so there is
    // no need to re-run the expression on the matched substring.
    for caps in regex.captures_iter(input) {
        let directive = caps.get(0).expect("group 0 is always present for a match");
        if let Some(include_path) = caps.get(1) {
            results.push(MatchResult {
                include_path: include_path.as_str().to_owned(),
                range: MatchRange {
                    start: directive.start(),
                    end: directive.end(),
                },
                relative_path: relative,
            });
        }
    }
}

fn parse_text(results: &mut Vec<MatchResult>, input: &str) {
    lazy_static! {
        // Angle-bracket (absolute) includes: #include <path>
        static ref ABSOLUTE_PATH_REGEX: Regex = Regex::new(r#"(?m)^\s*\#\s*include\s*<([^<>]+)>"#)
            .expect("failed to compile absolute include path regex");
    }

    lazy_static! {
        // Quoted (relative) includes: #include "path"
        static ref RELATIVE_PATH_REGEX: Regex = Regex::new(r#"(?m)^\s*\#\s*include\s*"([^"]+)""#)
            .expect("failed to compile relative include path regex");
    }

    parse_text_regex(results, input, &ABSOLUTE_PATH_REGEX, false);
    parse_text_regex(results, input, &RELATIVE_PATH_REGEX, true);
}

NOTE: Usage of lazy_static! is a performance optimization so that the regular expressions are only compiled once, regardless of the number of times that parse_text is called.
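On Rust 1.70 and later, `std::sync::OnceLock` gives the same compile-once behaviour without the lazy_static macro. In this sketch a `Vec<String>` stands in for the compiled `Regex` so the example needs no dependencies, and a hypothetical `INIT_COUNT` counter shows that the initializer runs only once; with the regex crate, the static would be a `OnceLock<Regex>` instead.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the "expensive" initializer runs (illustration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for `Regex::new(...)`: any expensive, immutable value.
fn build_patterns() -> Vec<String> {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    vec![
        r#"(?m)^\s*#\s*include\s*"([^"]+)""#.to_string(),
        r"(?m)^\s*#\s*include\s*<([^<>]+)>".to_string(),
    ]
}

/// Built on first call; every later call returns the same instance.
pub fn patterns() -> &'static [String] {
    static PATTERNS: OnceLock<Vec<String>> = OnceLock::new();
    PATTERNS.get_or_init(build_patterns)
}
```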

The parse_text function can now be used on any shader to parse its include directives. Note that it appends all absolute matches before all relative ones, so sort the results by range.start if source order matters. To fully parse nested includes, this function can be called recursively, or widely in parallel using a producer-consumer queue (depending on how much work is being done). In both cases, it is very important to track which includes have been visited and to break any cycles (i.e. includes that end up referencing themselves in the graph).
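A minimal sketch of the visited-set bookkeeping, written as a breadth-first worklist (the same shape a producer-consumer queue takes when parallelized). The `collect_includes` function and its `includes_of` map are hypothetical; the map stands in for reading each file, running parse_text on it, and resolving the resulting paths.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Walk the include graph from `root`, returning every reachable file
/// exactly once, in breadth-first order.
pub fn collect_includes<'a>(
    root: &'a str,
    includes_of: &HashMap<&'a str, Vec<&'a str>>,
) -> Vec<&'a str> {
    let mut visited: HashSet<&'a str> = HashSet::new();
    let mut order = Vec::new();
    let mut queue = VecDeque::new();
    visited.insert(root);
    queue.push_back(root);
    while let Some(file) = queue.pop_front() {
        order.push(file);
        let children = includes_of.get(file).map(|v| v.as_slice()).unwrap_or(&[]);
        for &child in children {
            // `insert` returns false for files already seen, which both
            // de-duplicates shared headers and breaks cycles.
            if visited.insert(child) {
                queue.push_back(child);
            }
        }
    }
    order
}
```

Marking a file as visited when it is enqueued, rather than when it is processed, guarantees each file is read at most once even when several files include it.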

Generating an Identity

While parsing the includes, it may be desirable to also generate a unique content-addressable identity that represents a shader file at a particular version. My personal preference is to use SHA-256 and to also compute a human-friendly Base58-encoded string of the digest, both of which are easy to do in Rust.
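The base58 crate handles the encoding, but the transformation itself is simple enough to sketch without dependencies: treat the digest as a big-endian number, repeatedly divide it by 58, and write each leading zero byte as '1'. This assumes the Bitcoin alphabet, which I believe is what the base58 crate uses; `to_base58` here is an illustrative function, not the crate's implementation.

```rust
/// Bitcoin-style Base58 alphabet (no 0, O, I or l, to avoid look-alikes).
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Encode bytes as Base58 by repeated division of the big-endian number by 58.
pub fn to_base58(input: &[u8]) -> String {
    // Each leading zero byte is written as the first alphabet character, '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    // Base-58 digits of the remaining number, least significant first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in &input[zeros..] {
        // number = number * 256 + byte, carried through the base-58 digits.
        let mut carry = u32::from(byte);
        for d in digits.iter_mut() {
            carry += u32::from(*d) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
    out
}
```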

Cargo.toml:

[dependencies]
base58 = "0.1.0"
sha2 = "0.7.1"
filebuffer = "0.4.0"

extern crate base58;
extern crate filebuffer;
extern crate sha2;

use base58::ToBase58;
use filebuffer::FileBuffer;
// The Digest trait must be in scope for `input` and `result`.
// (From sha2 0.9 onward these methods are named `update` and `finalize`.)
use sha2::{Digest, Sha256};
use std::io;
use std::path::Path;

#[derive(Debug, Default, Clone)]
pub struct Identity {
    pub raw: Vec<u8>,
    pub txt: String,
}

pub fn compute_data_identity(data: &[u8]) -> Identity {
    // Create a SHA-256 hasher and feed it the input data.
    let mut hasher = Sha256::default();
    hasher.input(data);

    // Consume the hasher to read the digest, then Base58-encode it.
    let output_raw = hasher.result().to_vec();
    let output_txt = output_raw.to_base58();

    Identity {
        raw: output_raw,
        txt: output_txt,
    }
}

pub fn compute_file_identity<P: AsRef<Path>>(path: P) -> io::Result<Identity> {
    // FileBuffer memory-maps the file and dereferences to &[u8],
    // so the contents are hashed without copying them into a Vec.
    let fbuffer = FileBuffer::open(path)?;
    Ok(compute_data_identity(&fbuffer))
}

An upcoming post will cover shader identities in a distributed environment!


© 2018. All rights reserved.