Inherits from NSObject
Declared in GBTokenizer.h
GBTokenizer.m

Overview

Provides common methods for tokenizing input source strings.

Main responsibilities of the class are to split the given source string into tokens and provide simple methods for iterating over the token stream. It builds on the ParseKit framework's PKTokenizer. As different parsers require different tokenizers and setups, the class itself doesn't create a tokenizer, but instead requires the client to provide one. Here's an example of simple usage:

NSString *filename = ...
NSString *input = ...
// The client creates the worker tokenizer; GBTokenizer only wraps it.
PKTokenizer *worker = [PKTokenizer tokenizerWithString:input];
GBTokenizer *tokenizer = [[GBTokenizer alloc] initWithSourceTokenizer:worker filename:filename];
// Log each token, then advance by one until EOF is reached.
while (![tokenizer eof]) {
	NSLog(@"%@", [tokenizer currentToken]);
	[tokenizer consume:1];
}

This example simply iterates over all tokens and prints each one to the log. If you want to parse a block of input with a known start and/or end token, you can use one of the block consuming methods instead. Note that you still need to provide the name of the file, as it is used for creating GBSourceInfo objects for parsed objects!

To make comment parsing simpler, GBTokenizer automatically enables comment reporting on the underlying PKTokenizer. However, to shield higher-level parsers from the complexity of comments, the lookahead and consume methods never report them. Instead, these methods skip all comment tokens but make them accessible through properties: to check whether a comment is associated with the current token, the client sends lastComment. The client can also get the comment just before the last one by sending previousComment; this can be used to get method section comments which aren't associated with any element. If there is no "stand-alone" comment before the last one, previousComment returns nil. GBTokenizer goes even further when dealing with comments: it automatically groups consecutive single-line comments into a single comment group and removes all prefixes and suffixes.

Note: Both comment values persist until a new comment is found! At that time, the previous comment takes the value of the last comment and the new comment is stored as the last comment. This lets us parse complex code (like #ifdef / #elif / #else blocks etc.) without fear of losing any comment information. It does require manually resetting comments whenever a comment is actually attached to an object. Resetting is performed by sending the resetComments message to the receiver.
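
The persistence rule in the note above can be modeled in a few lines. The sketch below is plain C (which also compiles as Objective-C) and is not GBTokenizer's implementation: CommentState, comment_found and comment_reset are hypothetical names standing in for the tokenizer's internal bookkeeping behind lastComment, previousComment and resetComments.

```c
#include <stddef.h>

/* Hypothetical model of the two comment slots; not GBTokenizer's real code. */
typedef struct {
    const char *last;      /* models lastComment */
    const char *previous;  /* models previousComment */
} CommentState;

/* A new comment was found: the old "last" value moves to "previous". */
static void comment_found(CommentState *state, const char *comment) {
    state->previous = state->last;
    state->last = comment;
}

/* Models resetComments: call it once a comment is attached to an object. */
static void comment_reset(CommentState *state) {
    state->last = NULL;
    state->previous = NULL;
}
```

Without the call to comment_reset, both slots would keep their values and the next parsed object would pick up the same comment again.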

Tasks

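Initialization & disposal

  • + tokenizerWithSource:filename: Returns initialized autoreleased instance using the given source PKTokenizer.
  • – initWithSourceTokenizer:filename: Initializes tokenizer with the given source PKTokenizer.

Tokenizing handling

  • – lookahead: Returns the token by looking ahead the given number of tokens from current position.
  • – currentToken Returns the current token.
  • – consume: Consumes the given amount of tokens, starting at the current position.
  • – consumeTo:usingBlock: Enumerates and consumes all tokens starting at current token up until the given end token is detected.
  • – consumeFrom:to:usingBlock: Enumerates and consumes all tokens starting at current token up until the given end token is detected.
  • – eof Specifies whether we're at EOF.

Information handling

  • – sourceInfoForCurrentToken Returns GBSourceInfo for current token and filename.
  • – sourceInfoForToken: Returns GBSourceInfo object describing the given token source information.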

Comments handling

  • – resetComments Resets lastComment and previousComment values.
  •   lastComment Returns the last comment or nil if comment is not available. property
  •   previousComment Returns "stand-alone" comment found immediately before the comment returned from lastComment. property

Properties

lastComment

Returns the last comment or nil if comment is not available.

@property (readonly) GBComment *lastComment

Discussion

The returned [GBComment stringValue] contains the whole last comment string, without prefixes or suffixes. As an optimization, the actual comment string value is prepared on the fly, as you send the message, so it's only handled if needed. As creating the comment string adds some computing overhead, you should cache the returned value if possible.

If there's no comment available for current token, nil is returned.

Declared In

GBTokenizer.h

previousComment

Returns "stand-alone" comment found immediately before the comment returned from lastComment.

@property (readonly) GBComment *previousComment

Discussion

Previous comment is a "stand-alone" comment which is found immediately before lastComment but isn't associated with any language element. These are usually used to provide metadata and other instructions for formatting or grouping of "normal" comments returned with lastComment. The value should be used at the same time as lastComment, as it is automatically cleared on the next consume! If there's no stand-alone comment immediately before the last comment, the value returned is nil.

The returned [GBComment stringValue] contains the whole previous comment string, without prefixes or suffixes. As an optimization, the actual comment string value is prepared on the fly, as you send the message, so it's only handled if needed. As creating the comment string adds some computing overhead, you should cache the returned value if possible.

Declared In

GBTokenizer.h

Class Methods

tokenizerWithSource:filename:

Returns initialized autoreleased instance using the given source PKTokenizer.

+ (id)tokenizerWithSource:(PKTokenizer *)tokenizer filename:(NSString *)filename

Parameters

tokenizer
The underlying (worker) tokenizer to use for actual splitting.
filename
The name of the file without path used for generating source info.

Return Value

Returns initialized instance or nil if failed.

Exceptions

NSException
Thrown if the given tokenizer or filename is nil, or if filename is an empty string.

Declared In

GBTokenizer.h

Instance Methods

consume:

Consumes the given amount of tokens, starting at the current position.

- (void)consume:(NSUInteger)count

Parameters

count
The number of tokens to consume.

Discussion

This effectively "moves" currentToken to the new position. If EOF is reached before consuming the given amount of tokens, consuming stops at the end of the stream and currentToken returns the EOF token. If comment tokens are detected while consuming, they are not counted and the count continues with actual language tokens. However, if there is a comment just before the next current token (i.e. after the last consumed token), the comment data is saved and is available through lastComment. Otherwise last comment data is cleared, even if a comment was detected in between.
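
The counting and comment rules described above can be sketched as a small model. This is plain C (also valid Objective-C) written only from the wording of this paragraph; it is not GBTokenizer's code. Token, Stream and the stream_ functions are hypothetical names, a NULL result stands in for the EOF token, and the grouping of adjacent single-line comments is left out.

```c
#include <stddef.h>

/* Hypothetical token model: text plus a flag marking comment tokens. */
typedef struct {
    const char *text;
    int is_comment;
} Token;

typedef struct {
    const Token *tokens;
    size_t count;
    size_t pos;               /* index of the current token; count means EOF */
    const char *last_comment; /* models lastComment */
} Stream;

/* Skips comments at pos, remembering the one closest to the next token. */
static void stream_skip_comments(Stream *s) {
    while (s->pos < s->count && s->tokens[s->pos].is_comment) {
        s->last_comment = s->tokens[s->pos].text;
        s->pos++;
    }
}

/* Models consume:. Comment tokens never count toward `count`. */
static void stream_consume(Stream *s, size_t count) {
    while (count > 0 && s->pos < s->count) {
        s->pos++;               /* consume one language token */
        count--;
        s->last_comment = NULL; /* forget comments seen in between */
        stream_skip_comments(s);
    }
}

/* Returns the current token text, or NULL to model the EOF token. */
static const char *stream_current(const Stream *s) {
    return s->pos < s->count ? s->tokens[s->pos].text : NULL;
}
```

In this model a comment survives in last_comment only when it sits directly before the new current token, and consuming more tokens than remain simply stops at EOF.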

Declared In

GBTokenizer.h

consumeFrom:to:usingBlock:

Enumerates and consumes all tokens starting at current token up until the given end token is detected.

- (void)consumeFrom:(NSString *)start to:(NSString *)end usingBlock:(void ( ^ ) ( PKToken *token , BOOL *consume , BOOL *stop ))block

Parameters

start
Optional starting token or nil.
end
Ending token.
block
The block to be called for each token.

Discussion

For each token, the given block is called, which gives the client a chance to inspect and handle tokens. If a start token is given and the current token matches it, the token is consumed without reporting it to the block. However, if the token doesn't match, the method returns immediately without doing anything. The end token is also not reported and is automatically consumed after all previous tokens are reported. Also read the consume: documentation to understand how comments are dealt with.
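
The start and end token handling described above can be sketched as follows. This is a simplified model in plain C (also valid Objective-C), not the method's implementation: consume_from_to, TokenHandler and count_token are hypothetical names, tokens are plain strings, comments are ignored, and the block's consume and stop arguments are left out because their behavior is not described here. The model also assumes a non-NULL end token, whereas the real method throws when it is nil.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the block argument; `consume` and `stop` are omitted. */
typedef void (*TokenHandler)(const char *token, void *context);

/* Returns the new current position after consuming from `pos`. */
static size_t consume_from_to(const char *const *tokens, size_t count, size_t pos,
                              const char *start, const char *end,
                              TokenHandler handler, void *context) {
    if (start != NULL) {
        /* A start token that does not match means nothing is consumed. */
        if (pos >= count || strcmp(tokens[pos], start) != 0) return pos;
        pos++; /* matching start token: consumed, not reported */
    }
    while (pos < count && strcmp(tokens[pos], end) != 0) {
        handler(tokens[pos], context); /* report tokens between start and end */
        pos++;
    }
    if (pos < count) pos++; /* end token: consumed, not reported */
    return pos;
}

/* Example handler that counts reported tokens. */
static void count_token(const char *token, void *context) {
    (void)token;
    (*(size_t *)context)++;
}
```

Passing NULL for start in this model corresponds to passing nil for the start token: reporting begins at the current token.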

Exceptions

NSException
Thrown if the given end token is nil.

Declared In

GBTokenizer.h

consumeTo:usingBlock:

Enumerates and consumes all tokens starting at current token up until the given end token is detected.

- (void)consumeTo:(NSString *)end usingBlock:(void ( ^ ) ( PKToken *token , BOOL *consume , BOOL *stop ))block

Parameters

end
Ending token.
block
The block to be called for each token.

Discussion

For each token, the given block is called, which gives the client a chance to inspect and handle tokens. The end token is not reported and is automatically consumed after all previous tokens are reported. Sending this message is equivalent to sending consumeFrom:to:usingBlock: and passing nil for the start token. Also read the consume: documentation to understand how comments are dealt with.

Exceptions

NSException
Thrown if the given end token is nil.

Declared In

GBTokenizer.h

currentToken

Returns the current token.

- (PKToken *)currentToken

Declared In

GBTokenizer.h

eof

Specifies whether we're at EOF.

- (BOOL)eof

Return Value

Returns YES if we're at EOF, NO otherwise.

Declared In

GBTokenizer.h

initWithSourceTokenizer:filename:

Initializes tokenizer with the given source PKTokenizer.

- (id)initWithSourceTokenizer:(PKTokenizer *)tokenizer filename:(NSString *)filename

Parameters

tokenizer
The underlying (worker) tokenizer to use for actual splitting.
filename
The name of the file without path that's the source for tokenizer's input string.

Return Value

Returns initialized instance or nil if failed.

Discussion

This is the designated initializer.

Exceptions

NSException
Thrown if the given tokenizer or filename is nil, or if filename is an empty string.

Declared In

GBTokenizer.h

lookahead:

Returns the token by looking ahead the given number of tokens from current position.

- (PKToken *)lookahead:(NSUInteger)offset

Parameters

offset
The offset from the current position.

Return Value

Returns the token at the given offset or the EOF token if the offset points past EOF.

Discussion

If offset "points" within a valid token, the token is returned, otherwise EOF token is returned. Note that this method automatically skips any comment tokens and only counts actual language tokens.
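
The comment-skipping lookahead can be pictured with a short model. The sketch is plain C (also valid Objective-C) and is not the actual implementation: Token and lookahead_token are hypothetical names, a NULL result stands in for the EOF token, and treating offset 0 as the current token is an assumption.

```c
#include <stddef.h>

/* Hypothetical token model: text plus a flag marking comment tokens. */
typedef struct {
    const char *text;
    int is_comment;
} Token;

/* Returns the token `offset` language tokens after `pos`, or NULL for EOF.
   Assumes `pos` is the index of the current (non-comment) token. */
static const Token *lookahead_token(const Token *tokens, size_t count,
                                    size_t pos, size_t offset) {
    for (size_t i = pos; i < count; i++) {
        if (tokens[i].is_comment) continue; /* comments are never counted */
        if (offset == 0) return &tokens[i];
        offset--;
    }
    return NULL; /* offset points past the end of the stream */
}
```

Unlike consuming, looking ahead in this model leaves the current position untouched.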

Declared In

GBTokenizer.h

resetComments

Resets lastComment and previousComment values.

- (void)resetComments

Discussion

This message should be sent whenever a comment is "attached" to an object. As comments are persistent, failing to reset would lead to using the same comment for next object as well!

Declared In

GBTokenizer.h

sourceInfoForCurrentToken

Returns GBSourceInfo for current token and filename.

- (GBSourceInfo *)sourceInfoForCurrentToken

Return Value

Returns a GBSourceInfo object for the current token.

Discussion

This is equivalent to sending sourceInfoForToken: and passing currentToken as the token parameter.

Exceptions

NSException
Thrown if current token is nil.

Declared In

GBTokenizer.h

sourceInfoForToken:

Returns GBSourceInfo object describing the given token source information.

- (GBSourceInfo *)sourceInfoForToken:(PKToken *)token

Parameters

token
The token for which to get file data.

Return Value

Returns a GBSourceInfo object for the given token.

Discussion

The method converts the given token's offset within the input string to a line number and uses that information, together with the assigned filename, to prepare the token info object.
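
Converting an offset to a line number, as described above, amounts to counting line breaks before the offset. The following is a minimal sketch in plain C (also valid Objective-C), not the method's actual code: line_number_for_offset is a hypothetical name, and 1-based line numbers and "\n" line endings are assumptions.

```c
#include <stddef.h>

/* Returns the 1-based line number of the character at `offset` in `input`.
   Assumes lines are separated by '\n'. */
static size_t line_number_for_offset(const char *input, size_t offset) {
    size_t line = 1;
    for (size_t i = 0; i < offset && input[i] != '\0'; i++) {
        if (input[i] == '\n') line++; /* each newline before the offset starts a new line */
    }
    return line;
}
```

Each call scans from the beginning of the input, so a parser that requests source info for many tokens might prefer to cache line start offsets.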

Exceptions

NSException
Thrown if the given token is nil.

Declared In

GBTokenizer.h