slang::parsing::Lexer class

The Lexer is responsible for taking source text and chopping it up into tokens.

Tokens carry leading "trivia", such as whitespace and comments, so that the text of the original file can be reconstructed programmatically.
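
The round-trip property described above can be illustrated with a minimal, self-contained sketch. It does not use slang's own classes: `TriviaToken`, `toyLex`, and `reconstruct` are hypothetical names invented for this illustration, and the toy lexer only knows whitespace, `//` line comments, word-like runs, and single-character punctuation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy type; this is not slang's Token or Trivia class.
struct TriviaToken {
    std::string trivia; // whitespace and comments that precede the token
    std::string text;   // the token's own characters (empty at end of file)
};

static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Splits the source into tokens, attaching leading whitespace and "//" line
// comments to the token that follows them.
std::vector<TriviaToken> toyLex(std::string_view src) {
    std::vector<TriviaToken> tokens;
    std::size_t i = 0;
    for (;;) {
        TriviaToken tok;
        // Collect leading trivia.
        while (i < src.size()) {
            if (isSpace(src[i])) {
                tok.trivia += src[i++];
            } else if (src.substr(i, 2) == "//") {
                while (i < src.size() && src[i] != '\n')
                    tok.trivia += src[i++];
            } else {
                break;
            }
        }
        if (i >= src.size()) {
            tokens.push_back(tok); // end-of-file token keeps any trailing trivia
            break;
        }
        // Token text: a word-like run, or a single punctuation character.
        if (isWordChar(src[i])) {
            while (i < src.size() && isWordChar(src[i]))
                tok.text += src[i++];
        } else {
            tok.text += src[i++];
        }
        tokens.push_back(tok);
    }
    assert(!tokens.empty()); // there is always at least the end-of-file token
    return tokens;
}

// Rebuilds the original text by emitting each token's trivia, then its text.
std::string reconstruct(const std::vector<TriviaToken>& tokens) {
    std::string out;
    for (const TriviaToken& tok : tokens) {
        out += tok.trivia;
        out += tok.text;
    }
    return out;
}
```

Because every input character ends up in either a token's trivia or its text, `reconstruct(toyLex(src))` reproduces `src` exactly in this sketch.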

This class also provides helper methods that manipulate tokens at the character level.

Public static functions

static bool concatenateTokens(BumpAllocator& alloc, SourceManager& sourceManager, const LexerOptions& options, Token left, Token right, SmallVectorBase<Token>& results)
Concatenates two tokens together.
static Token stringify(BumpAllocator& alloc, SourceManager& sourceManager, const LexerOptions& options, Token startToken, std::span<Token> bodyTokens, Token endToken)
Converts a range of tokens into a string literal; used for macro stringification.
static Trivia commentify(BumpAllocator& alloc, SourceManager& sourceManager, const LexerOptions& options, std::span<Token> tokens)
Converts a range of tokens into a block comment; used for macro expansion.
static void splitTokens(BumpAllocator& alloc, Diagnostics& diagnostics, SourceManager& sourceManager, const LexerOptions& options, Token sourceToken, size_t offset, KeywordVersion keywordVersion, SmallVectorBase<Token>& results)
Splits the given token at the specified offset into its raw source text.
static size_t getLocForStringChar(std::string_view rawStr, size_t charIndex, size_t& charLen)
Given a char index into a processed string literal and the raw string representing that literal, returns the offset within the raw string that matches that character.

Public functions

Token lex()
Lexes the next token from the source code.
bool isNextTokenOnSameLine()
Looks ahead in the source stream to see if the next token we would lex is on the same line as the previous token we've lexed.
Token lexEncodedText(ProtectEncoding encoding, uint32_t expectedBytes, bool singleLine, bool legacyProtectedMode)
Lexes a token that contains encoded text as part of a protected envelope.
const SourceLibrary* getLibrary() const
Returns the library with which the lexer's source buffer is associated.

Function documentation

static bool slang::parsing::Lexer::concatenateTokens(BumpAllocator& alloc, SourceManager& sourceManager, const LexerOptions& options, Token left, Token right, SmallVectorBase<Token>& results)

Concatenates two tokens together.

This may result in more than one output token if the right hand token being concatenated ends up splitting and being re-lexed. Returns true if the concatenation succeeded and false otherwise.
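
To show why pasting two tokens can yield more than one output token, here is a minimal sketch of the general idea: join the token texts and lex the result again. It is not slang's implementation. `toyLexAll` and `toyConcatenate` are hypothetical helpers, the operator set is a made-up subset, and the failure condition (an empty result) is an assumption chosen only to give the boolean return value a meaning.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy lexer: word-like runs plus a tiny operator set that is
// matched longest-first.
std::vector<std::string> toyLexAll(std::string_view text) {
    static constexpr std::string_view ops[] = {"++", "+=", "+", "="};
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::size_t start = i;
        const auto isWord = [](char c) {
            return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
        };
        if (isWord(text[i])) {
            while (i < text.size() && isWord(text[i]))
                ++i;
        } else {
            std::size_t len = 1; // unknown characters become one-character tokens
            for (std::string_view op : ops) {
                if (text.substr(i, op.size()) == op) {
                    len = op.size();
                    break;
                }
            }
            i += len;
        }
        out.emplace_back(text.substr(start, i - start));
    }
    return out;
}

// Pastes the two token texts together, re-lexes the result, and appends the
// resulting tokens to `results`. Returns false when nothing was produced
// (an assumed failure condition for this sketch).
bool toyConcatenate(std::string_view left, std::string_view right,
                    std::vector<std::string>& results) {
    std::string pasted(left);
    pasted += right;
    std::vector<std::string> tokens = toyLexAll(pasted);
    assert(!pasted.empty() || tokens.empty());
    if (tokens.empty())
        return false;
    results.insert(results.end(), tokens.begin(), tokens.end());
    return true;
}
```

In this sketch, pasting `foo` and `bar` gives the single token `foobar`, while pasting `+` and `+=` gives the text `++=`, which re-lexes as two tokens.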

static void slang::parsing::Lexer::splitTokens(BumpAllocator& alloc, Diagnostics& diagnostics, SourceManager& sourceManager, const LexerOptions& options, Token sourceToken, size_t offset, KeywordVersion keywordVersion, SmallVectorBase<Token>& results)

Splits the given token at the specified offset into its raw source text.

The trailing portion of the split is lexed into new tokens and appended to results.

static size_t slang::parsing::Lexer::getLocForStringChar(std::string_view rawStr, size_t charIndex, size_t& charLen)

Given a char index into a processed string literal and the raw string representing that literal, returns the offset within the raw string that matches that character.

This essentially backs out the processing of things like escape character codes.

If the given character index ends up pointing at a hex or octal encoded escape character, the charLen parameter will receive the length of that escape sequence. Otherwise it will be set to 1.
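
The mapping can be sketched as follows. This is an illustration of the idea under several assumptions, not the library's code: `toyLocForStringChar` is a hypothetical function, the raw string is assumed to exclude the surrounding quotes, hex escapes are assumed to be `\x` plus one or two hex digits, octal escapes a backslash plus one to three octal digits, and any other backslash pair is assumed to be a simple two-character escape reported at the backslash with a length of 1.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Walks the raw text one processed character at a time and returns the raw
// offset at which processed character `charIndex` begins.
std::size_t toyLocForStringChar(std::string_view raw, std::size_t charIndex,
                                std::size_t& charLen) {
    std::size_t pos = 0;
    for (std::size_t current = 0; pos < raw.size(); ++current) {
        std::size_t len = 1;  // raw characters consumed by this processed char
        bool numeric = false; // true for hex and octal escapes
        if (raw[pos] == '\\' && pos + 1 < raw.size()) {
            const char c = raw[pos + 1];
            len = 2;
            if (c == 'x') {
                // Up to two hex digits after "\x" (assumed limit).
                while (len < 4 && pos + len < raw.size() &&
                       std::isxdigit(static_cast<unsigned char>(raw[pos + len])) != 0)
                    ++len;
                numeric = len > 2;
            } else if (c >= '0' && c <= '7') {
                // Up to three octal digits in total (assumed limit).
                while (len < 4 && pos + len < raw.size() &&
                       raw[pos + len] >= '0' && raw[pos + len] <= '7')
                    ++len;
                numeric = true;
            }
        }
        if (current == charIndex) {
            charLen = numeric ? len : 1;
            return pos;
        }
        pos += len;
        assert(pos <= raw.size());
    }
    // Index past the end: report the end of the raw text (assumed behaviour).
    charLen = 1;
    return raw.size();
}
```

For the raw text `ab\x41c\n\101d`, processed index 2 is the hex escape, so the sketch returns offset 2 with a length of 4. Index 3 is the plain `c` at offset 6 with a length of 1.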

Token slang::parsing::Lexer::lex()

Lexes the next token from the source code.

This will never return a null pointer; at the end of the buffer, an infinite stream of EndOfFile tokens will be generated.
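
The end-of-buffer behaviour suggests a simple driver loop: keep calling lex() until an EndOfFile token appears, knowing that further calls remain safe. The sketch below mimics that contract with a toy lexer. `ToyLexer`, `ToyKind`, `WordToken`, and `collectWords` are hypothetical names, and the toy splits on whitespace only, so it says nothing about how slang tokenizes real source.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

enum class ToyKind { Word, EndOfFile };

// Hypothetical toy token; the text is a view into the lexer's source buffer.
struct WordToken {
    ToyKind kind;
    std::string_view text;
};

class ToyLexer {
public:
    explicit ToyLexer(std::string_view source) : source_(source) {}

    // Returns the next whitespace-delimited word. Once the buffer is
    // exhausted, every further call returns an EndOfFile token.
    WordToken lex() {
        while (pos_ < source_.size() && isSpace(source_[pos_]))
            ++pos_;
        if (pos_ >= source_.size())
            return {ToyKind::EndOfFile, std::string_view{}};
        const std::size_t start = pos_;
        while (pos_ < source_.size() && !isSpace(source_[pos_]))
            ++pos_;
        assert(pos_ > start); // a word token always has at least one character
        return {ToyKind::Word, source_.substr(start, pos_ - start)};
    }

private:
    static bool isSpace(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    std::string_view source_;
    std::size_t pos_ = 0;
};

// Typical driver loop: consume tokens until the first EndOfFile token.
std::vector<std::string_view> collectWords(ToyLexer& lexer) {
    std::vector<std::string_view> words;
    for (WordToken tok = lexer.lex(); tok.kind != ToyKind::EndOfFile; tok = lexer.lex())
        words.push_back(tok.text);
    return words;
}
```

A caller that looks ahead past the end of the input does not need a separate bounds check in this design, because each extra call simply yields another EndOfFile token.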