slang::parsing::Lexer class

The Lexer is responsible for taking source text and chopping it up into tokens.

Tokens carry along leading "trivia", such as whitespace and comments, so that the original text of the file can be pieced back together programmatically.
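To illustrate why attaching trivia to tokens makes the original text recoverable, here is a toy sketch. `ToyToken`, `ToyTrivia`, `toyLex` and `reconstruct` are hypothetical names invented for this example; they are not slang's `Token`/`Trivia` types and only recognize whitespace, `//` line comments and whitespace-delimited words. The final token has empty raw text and stands in for end-of-file so that trailing trivia is not lost.

```cpp
#include <cctype>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, simplified types for illustration only (not slang's classes).
enum class TriviaKind { Whitespace, LineComment };

struct ToyTrivia {
    TriviaKind kind;
    std::string_view text;
};

struct ToyToken {
    std::vector<ToyTrivia> trivia; // leading trivia attached to this token
    std::string_view raw;          // the token's own source text
};

// Splits `src` into words, attaching preceding whitespace and "//" comments to
// the next token as leading trivia. A last token with empty raw text acts as
// end-of-file and carries any trailing trivia.
std::vector<ToyToken> toyLex(std::string_view src) {
    std::vector<ToyToken> tokens;
    size_t i = 0;
    while (true) {
        ToyToken tok;
        // Collect leading trivia.
        while (i < src.size()) {
            size_t start = i;
            if (std::isspace(static_cast<unsigned char>(src[i]))) {
                while (i < src.size() && std::isspace(static_cast<unsigned char>(src[i])))
                    ++i;
                tok.trivia.push_back({TriviaKind::Whitespace, src.substr(start, i - start)});
            } else if (src.substr(i, 2) == "//") {
                while (i < src.size() && src[i] != '\n')
                    ++i; // a line comment runs up to, but not including, the newline
                tok.trivia.push_back({TriviaKind::LineComment, src.substr(start, i - start)});
            } else {
                break;
            }
        }
        // The token itself: everything up to whitespace or a comment start.
        size_t start = i;
        while (i < src.size() && !std::isspace(static_cast<unsigned char>(src[i])) &&
               src.substr(i, 2) != "//")
            ++i;
        tok.raw = src.substr(start, i - start);
        bool eof = tok.raw.empty();
        tokens.push_back(std::move(tok));
        if (eof)
            return tokens;
    }
}

// Rebuilds the original text by emitting each token's trivia, then its raw text.
std::string reconstruct(const std::vector<ToyToken>& tokens) {
    std::string out;
    for (const auto& tok : tokens) {
        for (const auto& t : tok.trivia)
            out += t.text;
        out += tok.raw;
    }
    return out;
}
```

Because every character of the input lands either in some token's trivia or in its raw text, concatenating them in order reproduces the input exactly.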

The class also provides helper methods that manipulate tokens at the character level.

Public static functions

static auto concatenateTokens(BumpAllocator& alloc, SourceManager& sourceManager, Token left, Token right) -> Token
Concatenates two tokens together; used for macro pasting.
static auto stringify(Lexer& parentLexer, Token startToken, std::span<Token> bodyTokens, Token endToken) -> Token
Converts a range of tokens into a string literal; used for macro stringification.
static auto commentify(BumpAllocator& alloc, SourceManager& sourceManager, std::span<Token> tokens) -> Trivia
Converts a range of tokens into a block comment; used for macro expansion.
static void splitTokens(BumpAllocator& alloc, Diagnostics& diagnostics, SourceManager& sourceManager, Token sourceToken, size_t offset, KeywordVersion keywordVersion, SmallVectorBase<Token>& results)
Splits the given token at the specified offset into its raw source text.
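`concatenateTokens` is used for macro pasting. The basic idea behind pasting can be sketched with plain strings: join the raw text of the two operands and accept the result only if it would lex as a single token. This is a hypothetical toy: `toyConcatenate` and `isSingleIdentifier` are illustrative names, not slang functions; only simple identifiers are recognized, and using `std::nullopt` to signal failure is an assumption of the sketch, not how slang reports a failed paste.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: true if `text` is one SystemVerilog simple identifier,
// i.e. a letter or underscore followed by letters, digits, '_' or '$'.
bool isSingleIdentifier(std::string_view text) {
    if (text.empty())
        return false;
    unsigned char first = static_cast<unsigned char>(text[0]);
    if (!std::isalpha(first) && first != '_')
        return false;
    for (char c : text.substr(1)) {
        unsigned char u = static_cast<unsigned char>(c);
        if (!std::isalnum(u) && u != '_' && u != '$')
            return false;
    }
    return true;
}

// Toy token pasting: joins the raw text of two tokens with nothing in between
// and succeeds only if the result re-lexes as a single token (here, a single
// identifier). Returns std::nullopt otherwise.
std::optional<std::string> toyConcatenate(std::string_view left, std::string_view right) {
    std::string joined(left);
    joined += right;
    if (!isSingleIdentifier(joined))
        return std::nullopt;
    return joined;
}
```

For example, pasting `data` and `8` yields the single identifier `data8`, while pasting `a` and `+` fails because `a+` would lex as two tokens.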

Public functions

auto lex() -> Token
Lexes the next token from the source code.
auto isNextTokenOnSameLine() -> bool
Looks ahead in the source stream to see if the next token we would lex is on the same line as the previous token we've lexed.
auto lexEncodedText(ProtectEncoding encoding, uint32_t expectedBytes, bool singleLine, bool legacyProtectedMode) -> Token
Lexes a token that contains encoded text as part of a protected envelope.
auto getLibrary() const -> const SourceLibrary*
Returns the library with which the lexer's source buffer is associated.
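The lookahead performed by `isNextTokenOnSameLine` can be sketched as a scan over trivia that stops at the first newline. `toyNextTokenOnSameLine` is a hypothetical function over a plain string and a position just past the previous token; it is not slang's implementation. It assumes that a `//` comment always pushes the next token onto a later line, that a block comment counts as crossing a line only if it contains a newline, and that reaching the end of the buffer means there is no next token on the line.

```cpp
#include <cstddef>
#include <string_view>

// Returns true if, starting at `pos`, the next non-trivia character is reached
// without crossing a newline. Skips spaces, tabs and single-line block comments.
bool toyNextTokenOnSameLine(std::string_view src, size_t pos) {
    while (pos < src.size()) {
        char c = src[pos];
        if (c == '\n' || c == '\r')
            return false;
        if (c == ' ' || c == '\t') {
            ++pos;
            continue;
        }
        if (src.substr(pos, 2) == "//")
            return false; // a line comment runs to the end of the line
        if (src.substr(pos, 2) == "/*") {
            size_t end = src.find("*/", pos + 2);
            if (end == std::string_view::npos)
                return false; // unterminated comment: treat as no next token
            if (src.substr(pos, end - pos).find('\n') != std::string_view::npos)
                return false; // the comment itself spans a line break
            pos = end + 2;
            continue;
        }
        return true; // start of the next token
    }
    return false; // reached the end of the buffer
}
```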

Function documentation

static void slang::parsing::Lexer::splitTokens(BumpAllocator& alloc, Diagnostics& diagnostics, SourceManager& sourceManager, Token sourceToken, size_t offset, KeywordVersion keywordVersion, SmallVectorBase<Token>& results)

Splits the given token at the specified offset into its raw source text.

The trailing portion of the split is lexed into new tokens and appended to results.
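The "relex the trailing portion and append" step can be pictured with a toy sketch over strings. `toySplitToken` and `toyRelexInto` are hypothetical names; the toy lexer treats runs of letters, digits and `_` as one token and every other non-space character as a single-character token. Whether and how slang keeps the leading portion, and how it handles trivia and keyword versions, is not modeled here.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Toy re-lexer: word runs become one token; any other non-space character
// becomes a single-character token. Tokens are appended to `results`.
void toyRelexInto(std::string_view text, std::vector<std::string>& results) {
    size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        size_t start = i;
        if (std::isalnum(c) || c == '_') {
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
                ++i;
        } else {
            ++i;
        }
        results.emplace_back(text.substr(start, i - start));
    }
}

// Toy split step: everything in `raw` from `offset` onward is re-lexed and
// appended to `results`; entries already in `results` are left untouched.
void toySplitToken(std::string_view raw, size_t offset, std::vector<std::string>& results) {
    assert(offset <= raw.size()); // the split point must lie within the token
    toyRelexInto(raw.substr(offset), results);
}
```

Appending to a caller-supplied vector, rather than returning a new one, mirrors the shape of the documented signature, which takes a `SmallVectorBase<Token>&` output parameter.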

Token slang::parsing::Lexer::lex()

Lexes the next token from the source code.

This will never return a null pointer; at the end of the buffer, an infinite stream of EndOfFile tokens will be generated.
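Because `lex()` keeps producing EndOfFile tokens once the buffer is exhausted, a consumer can loop until it sees EndOfFile without any null checks. The sketch below shows that contract with a hypothetical `ToyLexer` class and `lexAll` helper; neither is part of slang, and the toy only distinguishes word runs, single punctuation characters and end-of-file.

```cpp
#include <cctype>
#include <string_view>
#include <vector>

enum class ToyKind { Word, Punctuation, EndOfFile };

struct ToyTok {
    ToyKind kind;
    std::string_view text;
};

// Toy lexer whose lex() mirrors the documented contract: once the buffer is
// exhausted, every further call returns an EndOfFile token.
class ToyLexer {
public:
    explicit ToyLexer(std::string_view src) : src_(src) {}

    ToyTok lex() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
        if (pos_ >= src_.size())
            return {ToyKind::EndOfFile, {}}; // repeated on every later call
        size_t start = pos_;
        if (isWordChar(src_[pos_])) {
            while (pos_ < src_.size() && isWordChar(src_[pos_]))
                ++pos_;
            return {ToyKind::Word, src_.substr(start, pos_ - start)};
        }
        ++pos_;
        return {ToyKind::Punctuation, src_.substr(start, 1)};
    }

private:
    static bool isWordChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    }

    std::string_view src_;
    size_t pos_ = 0;
};

// Typical driver loop: only a check for EndOfFile is needed.
std::vector<ToyTok> lexAll(ToyLexer& lexer) {
    std::vector<ToyTok> out;
    while (true) {
        ToyTok tok = lexer.lex();
        out.push_back(tok);
        if (tok.kind == ToyKind::EndOfFile)
            break;
    }
    return out;
}
```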