internal/libyaml
This package provides low-level YAML processing functionality through a 3-stage pipeline: Scanner → Parser → Emitter. It implements the libyaml C library functionality in Go.
Directory Overview
The internal/libyaml package implements the core YAML processing stages:
- Scanner - Tokenizes YAML text into tokens
- Parser - Converts tokens into events following YAML grammar rules
- Emitter - Serializes events back into YAML text
File Organization
Main Source Files
- scanner.go - YAML scanner/tokenizer implementation
- parser.go - YAML parser (tokens → events)
- emitter.go - YAML emitter (events → YAML output)
- api.go - Public API for Parser and Emitter types
- yaml.go - Core types and constants (Event, Token, enums)
- reader.go - Input handling and encoding detection
- writer.go - Output handling
- yamlprivate.go - Internal types and helper functions
Test Files
- scanner_test.go - Scanner tests
- parser_test.go - Parser tests
- emitter_test.go - Emitter tests
- api_test.go - API tests
- yaml_test.go - Utility function tests
- reader_test.go - Reader tests
- writer_test.go - Writer tests
- yamlprivate_test.go - Character classification tests
- loader_test.go - Data loader scalar resolution tests
- yamldatatest_test.go - YAML test data loading framework
- yamldatatest_loader.go - YAML test data loader with scalar type resolution (exported for reuse)
Test Data Files (in testdata/)
- scanner.yaml - Scanner test cases
- parser.yaml - Parser test cases
- emitter.yaml - Emitter test cases
- api.yaml - API test cases
- yaml.yaml - Utility function test cases
- reader.yaml - Reader test cases
- writer.yaml - Writer test cases
- yamlprivate.yaml - Character classification test cases
- loader.yaml - Data loader scalar resolution test cases
Processing Pipeline
1. Scanner (scanner.go)
The scanner converts YAML text into tokens.
Input: Raw YAML text (string or []byte) Output: Stream of tokens
Token types include:
SCALAR_TOKEN- Plain, quoted, or block scalar valuesKEY_TOKEN,VALUE_TOKEN- Mapping key/value indicatorsBLOCK_MAPPING_START_TOKEN,FLOW_MAPPING_START_TOKEN- Mapping delimitersBLOCK_SEQUENCE_START_TOKEN,FLOW_SEQUENCE_START_TOKEN- Sequence delimitersANCHOR_TOKEN,ALIAS_TOKEN- Anchor definitions and referencesTAG_TOKEN- Type tagsDOCUMENT_START_TOKEN,DOCUMENT_END_TOKEN- Document boundaries
Responsibilities:
- Character encoding detection (UTF-8, UTF-16LE, UTF-16BE)
- Line break normalization
- Indentation tracking
- Quote and escape sequence handling
2. Parser (parser.go)
The parser converts tokens into events following YAML grammar rules.
Input: Stream of tokens from Scanner Output: Stream of events
Event types include:
STREAM_START_EVENT,STREAM_END_EVENT- Stream boundariesDOCUMENT_START_EVENT,DOCUMENT_END_EVENT- Document boundariesSCALAR_EVENT- Scalar valuesMAPPING_START_EVENT,MAPPING_END_EVENT- Mapping boundariesSEQUENCE_START_EVENT,SEQUENCE_END_EVENT- Sequence boundariesALIAS_EVENT- Anchor references
Responsibilities:
- Implementing YAML grammar and validation
- Managing document directives (%YAML, %TAG)
- Resolving anchors and aliases
- Tracking implicit vs explicit markers
- Style preservation (plain, single-quoted, double-quoted, literal, folded)
3. Emitter (emitter.go)
The emitter converts events back into YAML text.
Input: Stream of events Output: YAML text
Responsibilities:
- Style selection (plain/quoted scalars, block/flow collections)
- Formatting control (canonical mode, indentation, line width)
- Character encoding
- Anchor and tag serialization
- Document marker generation (---, ...)
Configuration options:
Canonical- Emit in canonical YAML formIndent- Indentation width (2-9 spaces)Width- Line width (-1 for unlimited)Unicode- Enable Unicode character outputLineBreak- Line break style (LN, CR, CRLN)
Testing Framework
Test Architecture
The testing framework uses a data-driven approach:
- Test data is stored in YAML files in the
testdata/directory - Test logic is implemented in Go files (
*_test.go) - One-to-one pairing: Each
testdata/foo.yamlhas a correspondingfoo_test.go
Benefits:
- Easy to add new test cases without writing Go code
- Test data is human-readable and self-documenting
- Test logic is reusable across many test cases
- Test data is separated from test code for clarity
- Tests can become a common suite for multiple YAML frameworks
Test Data Files
Each YAML file contains test cases for a specific component:
-
scanner.yaml - Scanner/tokenization tests
- Token sequence verification
- Token property validation (value, style)
- Error detection
-
parser.yaml - Parser/event generation tests
- Event sequence verification
- Event property validation (anchor, tag, value, directives)
- Error detection
-
emitter.yaml - Emitter/serialization tests
- Event-to-YAML conversion
- Configuration options testing
- Roundtrip testing (parse → emit)
- Writer integration
-
api.yaml - API constructor and method tests
- Constructor validation
- Method behavior and state changes
- Panic conditions
- Cleanup verification
-
yaml.yaml - Utility function tests
- Enum String() methods
- Style accessor methods
-
reader.yaml - Reader/input handling tests
- Encoding detection (UTF-8, UTF-16LE, UTF-16BE)
- Buffer management
- Error handling
-
writer.yaml - Writer/output handling tests
- Buffer flushing
- Output handlers (string, io.Writer)
- Error conditions
-
yamlprivate.yaml - Character classification tests
- Character type predicates (isAlpha, isDigit, isHex, etc.)
- Character conversion functions (asDigit, asHex, width)
- Unicode handling
-
loader.yaml - Data loader scalar resolution tests
- Numeric type resolution (integers, floats)
- Boolean and null value handling
- String vs numeric type disambiguation
- Mixed-type collections
Test Framework Implementation
The test framework is implemented in yamldatatest_loader.go and yamldatatest_test.go:
Core functions:
LoadYAML(data []byte) (interface{}, error)- Parses YAML using libyaml parser with scalar type resolution (exported)UnmarshalStruct(target interface{}, data map[string]interface{}) error- Populates structs (exported)LoadTestCases(filename string) ([]TestCase, error)- Loads and parses test YAML filescoerceScalar(value string) interface{}- Resolves scalar strings to appropriate Go types (int, float64, bool, nil, string)
Core types:
TestCasestruct - Umbrella structure containing fields for all test types- Uses
interface{}for flexible field types - Post-processing converts generic fields to specific types
- Uses
Post-processing: After loading, the framework processes test data:
- Converts
Want(interface{}) toWantEvents,WantTokens, orWantSpecsbased on test type - Converts
Want(interface{}) toWantContains(handles both scalar and sequence) - Converts
Checksto field validation specifications
Test Types
Scanner Tests
scan-tokens - Verify token sequence
- scan-tokens:
name: Simple scalar
yaml: |-
hello
want:
- STREAM_START_TOKEN
- SCALAR_TOKEN
- STREAM_END_TOKEN
scan-tokens-detailed - Verify token properties
- scan-tokens-detailed:
name: Single quoted scalar
yaml: |-
'hello world'
want:
- STREAM_START_TOKEN
- SCALAR_TOKEN:
style: SINGLE_QUOTED_SCALAR_STYLE
value: hello world
- STREAM_END_TOKEN
scan-error - Verify error detection
- scan-error:
name: Invalid character
yaml: "\x01"
Parser Tests
parse-events - Verify event sequence
- parse-events:
name: Simple mapping
yaml: |
key: value
want:
- STREAM_START_EVENT
- DOCUMENT_START_EVENT
- MAPPING_START_EVENT
- SCALAR_EVENT
- SCALAR_EVENT
- MAPPING_END_EVENT
- DOCUMENT_END_EVENT
- STREAM_END_EVENT
parse-events-detailed - Verify event properties
- parse-events-detailed:
name: Anchor and alias
yaml: |
- &anchor value
- *anchor
want:
- STREAM_START_EVENT
- DOCUMENT_START_EVENT
- SEQUENCE_START_EVENT
- SCALAR_EVENT:
anchor: anchor
value: value
- ALIAS_EVENT:
anchor: anchor
- SEQUENCE_END_EVENT
- DOCUMENT_END_EVENT
- STREAM_END_EVENT
parse-error - Verify error detection
- parse-error:
name: Error state
yaml: |
key: : invalid
Emitter Tests
emit - Emit events and verify output contains expected strings
- emit:
name: Simple scalar
data:
- STREAM_START_EVENT:
encoding: UTF8_ENCODING
- DOCUMENT_START_EVENT:
implicit: true
- SCALAR_EVENT:
value: hello
implicit: true
style: PLAIN_SCALAR_STYLE
- DOCUMENT_END_EVENT:
implicit: true
- STREAM_END_EVENT
want: hello
emit-config - Emit with configuration
- emit-config:
name: Custom indent
conf:
indent: 4
data:
- STREAM_START_EVENT:
encoding: UTF8_ENCODING
- DOCUMENT_START_EVENT:
implicit: true
- MAPPING_START_EVENT:
implicit: true
style: BLOCK_MAPPING_STYLE
# ... more events
want: key
roundtrip - Parse → emit, verify output
- roundtrip:
name: Roundtrip
yaml: |
key: value
list:
- item1
- item2
want:
- key
- value
- item1
emit-writer - Emit to io.Writer
- emit-writer:
name: Writer
data:
- STREAM_START_EVENT:
encoding: UTF8_ENCODING
# ... more events
want: test
API Tests
api-new - Test constructors
- api-new:
name: New parser
with: NewParser
test:
- nil: [raw-buffer, false]
- cap: [raw-buffer, 512]
- nil: [buffer, false]
- cap: [buffer, 1536]
api-method - Test methods and field state
- api-method:
name: Parser set input string
with: NewParser
byte: true
call: [SetInputString, 'key: value']
test:
- eq: [input, 'key: value']
- eq: [input-pos, 0]
- nil: [read-handler, false]
api-panic - Test methods that should panic
- api-panic:
name: Parser set input string twice
with: NewParser
byte: true
init: [SetInputString, first]
call: [SetInputString, second]
want: must set the input source only once
api-delete - Test cleanup
- api-delete:
name: Parser delete
with: NewParser
byte: true
init: [SetInputString, test]
test:
- len: [input, 0]
- len: [buffer, 0]
api-new-event - Test event constructors
- api-new-event:
name: New stream start event
call: [NewStreamStartEvent, UTF8_ENCODING]
test:
- eq: [Type, STREAM_START_EVENT]
- eq: [encoding, UTF8_ENCODING]
Utility Tests
enum-string - Test String() methods of enums
- enum-string:
name: Scalar style plain
enum: [ScalarStyle, PLAIN_SCALAR_STYLE]
want: Plain
style-accessor - Test style accessor methods
- style-accessor:
name: Event scalar style
test: [ScalarStyle, DOUBLE_QUOTED_SCALAR_STYLE]
Loader Tests
scalar-resolution - Test scalar type resolution
- scalar-resolution:
name: Positive integer
yaml: "42"
want: 42
- scalar-resolution:
name: Negative float
yaml: "-2.5"
want: -2.5
Resolution order:
- Boolean (true, false)
- Null (null keyword only)
- Hexadecimal integer (0x prefix)
- Float (contains .)
- Decimal integer
- String (fallback)
Common Keys in Test YAML Files
Test cases use a type-as-key format where the test type is the map key:
- test-type:
name: Test case name
# ... other fields
Common fields:
- name - Test case name (title case convention)
- yaml - Input YAML string to test
- want - Expected result (format varies by test type)
- For api-panic: string containing expected panic message substring
- For scan-error/parse-error: boolean (defaults to true if omitted; set to false if no error expected)
- For enum-string: string representing expected String() output
- For other types: varies (may be sequence or scalar)
- data - For emitter tests: list of event specifications to emit
- conf - For emitter config tests: emitter configuration options
- with - For API tests: constructor name (NewParser, NewEmitter)
- call - For API tests: method call [MethodName, arg1, arg2, ...]
- init - For API panic tests: setup method call before main method
- byte - For API tests: boolean flag to convert string args to []byte
- test - For API tests: list of field validation checks in format
operator: [field, value]where operator is one of: nil, cap, len, eq, gte, len-gt. - test - For style-accessor tests: array of [Method, STYLE] where Method is the accessor method (e.g., ScalarStyle) and STYLE is the style constant (e.g., DOUBLE_QUOTED_SCALAR_STYLE).
- enum - For enum tests: array of [Type, Value] where Type is the enum type (e.g., ScalarStyle) and Value is the constant (e.g., PLAIN_SCALAR_STYLE)
Note on scalar type resolution: Unquoted scalar values in test data are automatically resolved to appropriate Go types (int, float64, bool, nil) by the LoadYAML function. Quoted scalars remain as strings.
Running Tests
# Run all tests in the package
go test ./internal/libyaml
# Run specific test file
go test ./internal/libyaml -run TestScanner
go test ./internal/libyaml -run TestParser
go test ./internal/libyaml -run TestEmitter
go test ./internal/libyaml -run TestAPI
go test ./internal/libyaml -run TestYAML
go test ./internal/libyaml -run TestLoader
# Run specific test case (using subtest name)
go test ./internal/libyaml -run TestScanner/Block_sequence
go test ./internal/libyaml -run TestParser/Anchor_and_alias
go test ./internal/libyaml -run TestEmitter/Flow_mapping
go test ./internal/libyaml -run TestLoader/Scientific_notation_lowercase_e
# Run with verbose output
go test -v ./internal/libyaml
# Run with coverage
go test -cover ./internal/libyaml