slang::parsing::Lexer class

The Lexer is responsible for taking source text and chopping it up into tokens. Tokens carry along leading "trivia" (whitespace, comments, and the like) so that the original file can be reconstructed programmatically from the token stream.

There are also helper methods on this class that handle token manipulation at the character level.
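
To illustrate why tokens carry leading trivia, here is a minimal standalone sketch. It does not use slang's types: MiniToken, miniLex and reconstruct are hypothetical names invented for this example, and the lexing rules (whitespace-separated words, "//" line comments) are a deliberate simplification. The point is only that concatenating each token's trivia and text reproduces the input exactly.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified token; this is NOT slang's Token type.
struct MiniToken {
    std::string trivia; // leading whitespace and comments
    std::string text;   // raw token text; empty for the end-of-file token
};

// Splits the source into word tokens, attaching any preceding whitespace
// and "//" line comments to the token that follows them.
std::vector<MiniToken> miniLex(std::string_view src) {
    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    std::vector<MiniToken> tokens;
    std::size_t i = 0;
    while (true) {
        MiniToken tok;
        // Collect leading trivia.
        while (i < src.size()) {
            if (isSpace(src[i])) {
                tok.trivia += src[i++];
            } else if (src.compare(i, 2, "//") == 0) {
                while (i < src.size() && src[i] != '\n')
                    tok.trivia += src[i++];
            } else {
                break;
            }
        }
        // Collect the token text up to the next whitespace or comment.
        while (i < src.size() && !isSpace(src[i]) && src.compare(i, 2, "//") != 0)
            tok.text += src[i++];
        tokens.push_back(tok);
        if (tok.text.empty())
            break; // the end-of-file token holds any trailing trivia
    }
    return tokens;
}

// Pieces the original text back together from the token stream.
std::string reconstruct(const std::vector<MiniToken>& tokens) {
    std::string out;
    for (const auto& t : tokens) {
        out += t.trivia;
        out += t.text;
    }
    return out;
}
```

Because trailing whitespace is attached to the final end-of-file token, no character of the input is lost, which is what makes the round trip possible.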

Public static functions

static auto concatenateTokens(BumpAllocator& alloc, Token left, Token right) -> Token
Concatenates two tokens together; used for macro pasting.
static auto stringify(BumpAllocator& alloc, SourceLocation location, std::span<Trivia const> trivia, Token* begin, Token* end) -> Token
static auto commentify(BumpAllocator& alloc, Token* begin, Token* end) -> Trivia
static void splitTokens(BumpAllocator& alloc, Diagnostics& diagnostics, const SourceManager& sourceManager, Token sourceToken, size_t offset, KeywordVersion keywordVersion, SmallVectorBase<Token>& results)

Public functions

auto lex(KeywordVersion keywordVersion = LexerFacts::getDefaultKeywordVersion()) -> Token
auto isNextTokenOnSameLine() -> bool
auto lexEncodedText(ProtectEncoding encoding, uint32_t expectedBytes, bool singleLine) -> Token
Lexes a token that contains encoded text as part of a protected envelope.

Function documentation

static Token slang::parsing::Lexer::stringify(BumpAllocator& alloc, SourceLocation location, std::span<Trivia const> trivia, Token* begin, Token* end)

Converts a range of tokens into a string literal; used for macro stringification. The location and trivia parameters are used in the newly created token. The range of tokens to stringify is given by begin and end.
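
The idea behind stringification can be sketched without slang itself. The example below uses a hypothetical MiniToken struct and a hypothetical stringifyTokens helper; the choices to drop the first token's leading trivia and to escape quotes and backslashes are assumptions made for illustration, not a description of what Lexer::stringify does internally.

```cpp
#include <string>

// Hypothetical, simplified token; this is NOT slang's Token type.
struct MiniToken {
    std::string trivia; // leading whitespace
    std::string text;   // raw token text
};

// Builds the text of a string literal from the half-open range [begin, end).
// Assumptions: trivia between tokens is kept verbatim, the first token's
// leading trivia is dropped, and '"' and '\' in token text are escaped.
std::string stringifyTokens(const MiniToken* begin, const MiniToken* end) {
    std::string out = "\"";
    for (const MiniToken* t = begin; t != end; ++t) {
        if (t != begin)
            out += t->trivia;
        for (char c : t->text) {
            if (c == '"' || c == '\\')
                out += '\\';
            out += c;
        }
    }
    out += '"';
    return out;
}
```

As with the real function, the range is given by a begin pointer and a one-past-the-end pointer, so an empty range (begin == end) simply yields an empty string literal.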

static Trivia slang::parsing::Lexer::commentify(BumpAllocator& alloc, Token* begin, Token* end)

Converts a range of tokens into a block comment; used for macro expansion. The range of tokens to commentify is given by begin and end.

static void slang::parsing::Lexer::splitTokens(BumpAllocator& alloc, Diagnostics& diagnostics, const SourceManager& sourceManager, Token sourceToken, size_t offset, KeywordVersion keywordVersion, SmallVectorBase<Token>& results)

Splits the given token at the specified offset into its raw source text. The trailing portion of the split is lexed into new tokens and appended to results.
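
A rough standalone sketch of the split-and-relex idea follows. The splitAndRelex function and its re-lexing rules (runs of alphanumerics and underscores form one token, any other non-space character is a single-character token) are hypothetical simplifications; the real function works on Token objects and reports problems through Diagnostics.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Re-lexes the part of rawText that starts at offset and appends the
// resulting token texts to results. Existing entries are left untouched.
void splitAndRelex(std::string_view rawText, std::size_t offset,
                   std::vector<std::string>& results) {
    if (offset >= rawText.size())
        return; // nothing after the split point

    auto isWord = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    };

    std::string_view tail = rawText.substr(offset);
    std::size_t i = 0;
    while (i < tail.size()) {
        if (std::isspace(static_cast<unsigned char>(tail[i]))) {
            ++i; // skip whitespace between tokens
            continue;
        }
        std::size_t start = i;
        if (isWord(tail[i])) {
            while (i < tail.size() && isWord(tail[i]))
                ++i;
        } else {
            ++i; // single-character punctuation token
        }
        results.emplace_back(tail.substr(start, i - start));
    }
}
```

Note that, as in the documented signature, the new tokens are appended to a caller-supplied container rather than returned, so several splits can accumulate into the same buffer.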

Token slang::parsing::Lexer::lex(KeywordVersion keywordVersion = LexerFacts::getDefaultKeywordVersion())

Lexes the next token from the source code. This will never return a null pointer; at the end of the buffer, an infinite stream of EndOfFile tokens will be generated.
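
The end-of-buffer behavior is easiest to see in a toy model. The MiniLexer class below is a hypothetical stand-in (whitespace-separated words only), not the real Lexer; it only demonstrates the contract that lex() keeps returning end-of-file tokens once the input is exhausted, so callers can loop until they see one without a separate bounds check.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

enum class MiniKind { Word, EndOfFile };

// Hypothetical, simplified token; this is NOT slang's Token type.
struct MiniToken {
    MiniKind kind;
    std::string text;
};

class MiniLexer {
public:
    explicit MiniLexer(std::string source) : src(std::move(source)) {}

    // Returns the next word, or an EndOfFile token on every call made
    // after the input has been consumed.
    MiniToken lex() {
        while (pos < src.size() && isSpace(src[pos]))
            ++pos;
        if (pos >= src.size())
            return {MiniKind::EndOfFile, ""};
        std::size_t start = pos;
        while (pos < src.size() && !isSpace(src[pos]))
            ++pos;
        return {MiniKind::Word, src.substr(start, pos - start)};
    }

private:
    static bool isSpace(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    std::string src;
    std::size_t pos = 0;
};

// Typical consumer loop: stop at the first EndOfFile token.
std::size_t countWords(MiniLexer& lexer) {
    std::size_t n = 0;
    while (lexer.lex().kind != MiniKind::EndOfFile)
        ++n;
    return n;
}
```

Calling lex() again after countWords returns is harmless in this sketch: it yields another EndOfFile token each time.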

bool slang::parsing::Lexer::isNextTokenOnSameLine()

Looks ahead in the source stream to see if the next token we would lex is on the same line as the previous token we've lexed.
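
A lookahead of this kind can be sketched as a scan that does not consume any input. The nextTokenOnSameLine function below is hypothetical and much simpler than the real method: it ignores comments, and treating end of input as "not on the same line" is an assumption made for this example.

```cpp
#include <cstddef>
#include <string_view>

// Scans forward from pos (the position just past the previously lexed
// token) and reports whether a token starts before the next line break.
// The position is taken by value, so the caller's state is unchanged.
bool nextTokenOnSameLine(std::string_view src, std::size_t pos) {
    for (; pos < src.size(); ++pos) {
        char c = src[pos];
        if (c == '\n' || c == '\r')
            return false; // hit a line break before any token
        if (c != ' ' && c != '\t')
            return true;  // found the start of a token on this line
    }
    return false; // assumption: end of input counts as "no token on this line"
}
```

For the input "a b\nc", a call with pos == 1 (just past "a") finds "b" on the same line, while a call with pos == 3 (just past "b") runs into the newline first.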