
internal/libyaml

This package provides low-level YAML processing through a three-stage pipeline: Scanner → Parser → Emitter. It is a Go implementation of the functionality of the C libyaml library.

Directory Overview

The internal/libyaml package implements the core YAML processing stages:

  1. Scanner - Tokenizes YAML text into tokens
  2. Parser - Converts tokens into events following YAML grammar rules
  3. Emitter - Serializes events back into YAML text

File Organization

Main Source Files

  • scanner.go - YAML scanner/tokenizer implementation
  • parser.go - YAML parser (tokens → events)
  • emitter.go - YAML emitter (events → YAML output)
  • api.go - Public API for Parser and Emitter types
  • yaml.go - Core types and constants (Event, Token, enums)
  • reader.go - Input handling and encoding detection
  • writer.go - Output handling
  • yamlprivate.go - Internal types and helper functions

Test Files

  • scanner_test.go - Scanner tests
  • parser_test.go - Parser tests
  • emitter_test.go - Emitter tests
  • api_test.go - API tests
  • yaml_test.go - Utility function tests
  • reader_test.go - Reader tests
  • writer_test.go - Writer tests
  • yamlprivate_test.go - Character classification tests
  • loader_test.go - Data loader scalar resolution tests
  • yamldatatest_test.go - YAML test data loading framework
  • yamldatatest_loader.go - YAML test data loader with scalar type resolution (exported for reuse)

Test Data Files (in testdata/)

  • scanner.yaml - Scanner test cases
  • parser.yaml - Parser test cases
  • emitter.yaml - Emitter test cases
  • api.yaml - API test cases
  • yaml.yaml - Utility function test cases
  • reader.yaml - Reader test cases
  • writer.yaml - Writer test cases
  • yamlprivate.yaml - Character classification test cases
  • loader.yaml - Data loader scalar resolution test cases

Processing Pipeline

1. Scanner (scanner.go)

The scanner converts YAML text into tokens.

Input: Raw YAML text (string or []byte)
Output: Stream of tokens

Token types include:

  • SCALAR_TOKEN - Plain, quoted, or block scalar values
  • KEY_TOKEN, VALUE_TOKEN - Mapping key/value indicators
  • BLOCK_MAPPING_START_TOKEN, FLOW_MAPPING_START_TOKEN - Mapping delimiters
  • BLOCK_SEQUENCE_START_TOKEN, FLOW_SEQUENCE_START_TOKEN - Sequence delimiters
  • ANCHOR_TOKEN, ALIAS_TOKEN - Anchor definitions and references
  • TAG_TOKEN - Type tags
  • DOCUMENT_START_TOKEN, DOCUMENT_END_TOKEN - Document boundaries

Responsibilities:

  • Character encoding detection (UTF-8, UTF-16LE, UTF-16BE)
  • Line break normalization
  • Indentation tracking
  • Quote and escape sequence handling
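Encoding detection of this kind is usually driven by a byte order mark (BOM) at the start of the input. The following is a minimal sketch under that assumption, not the package's actual code: the Encoding type, its constants, and detectEncoding are hypothetical names, and treating BOM-less input as UTF-8 is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// Encoding is a hypothetical type; the package's own constants may differ.
type Encoding string

const (
	UTF8    Encoding = "UTF-8"
	UTF16LE Encoding = "UTF-16LE"
	UTF16BE Encoding = "UTF-16BE"
)

// detectEncoding inspects the leading bytes for a byte order mark and
// returns the encoding plus the number of BOM bytes to skip.
// Input without a BOM is assumed to be UTF-8.
func detectEncoding(input []byte) (Encoding, int) {
	switch {
	case bytes.HasPrefix(input, []byte{0xEF, 0xBB, 0xBF}):
		return UTF8, 3
	case bytes.HasPrefix(input, []byte{0xFF, 0xFE}):
		return UTF16LE, 2
	case bytes.HasPrefix(input, []byte{0xFE, 0xFF}):
		return UTF16BE, 2
	default:
		return UTF8, 0
	}
}

func main() {
	enc, skip := detectEncoding([]byte{0xFF, 0xFE, 'a', 0x00})
	fmt.Println(enc, skip)
}
```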

2. Parser (parser.go)

The parser converts tokens into events following YAML grammar rules.

Input: Stream of tokens from Scanner
Output: Stream of events

Event types include:

  • STREAM_START_EVENT, STREAM_END_EVENT - Stream boundaries
  • DOCUMENT_START_EVENT, DOCUMENT_END_EVENT - Document boundaries
  • SCALAR_EVENT - Scalar values
  • MAPPING_START_EVENT, MAPPING_END_EVENT - Mapping boundaries
  • SEQUENCE_START_EVENT, SEQUENCE_END_EVENT - Sequence boundaries
  • ALIAS_EVENT - Anchor references

Responsibilities:

  • Implementing YAML grammar and validation
  • Managing document directives (%YAML, %TAG)
  • Reporting anchors and aliases as event properties (anchor names on node events, ALIAS_EVENT for references)
  • Tracking implicit vs explicit markers
  • Style preservation (plain, single-quoted, double-quoted, literal, folded)

3. Emitter (emitter.go)

The emitter converts events back into YAML text.

Input: Stream of events
Output: YAML text

Responsibilities:

  • Style selection (plain/quoted scalars, block/flow collections)
  • Formatting control (canonical mode, indentation, line width)
  • Character encoding
  • Anchor and tag serialization
  • Document marker generation (---, ...)

Configuration options:

  • Canonical - Emit in canonical YAML form
  • Indent - Indentation width (2-9 spaces)
  • Width - Line width (-1 for unlimited)
  • Unicode - Enable Unicode character output
  • LineBreak - Line break style (LN, CR, CRLN)
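To make the documented ranges concrete, here is a hedged sketch of how such options might be normalized before emitting. EmitterConfig, normalize, and unlimitedWidth are hypothetical names, and the fallback values (indent reset to 2, negative width mapped to a very large number) are assumptions rather than the package's confirmed behavior.

```go
package main

import (
	"fmt"
	"math"
)

// unlimitedWidth is a hypothetical sentinel for "no line width limit".
const unlimitedWidth = math.MaxInt

// EmitterConfig is a hypothetical struct mirroring the options listed above.
type EmitterConfig struct {
	Canonical bool
	Indent    int
	Width     int
	Unicode   bool
}

// normalize clamps the options to the documented ranges.
// Assumption: an indent outside 2-9 falls back to 2.
func normalize(c EmitterConfig) EmitterConfig {
	if c.Indent < 2 || c.Indent > 9 {
		c.Indent = 2
	}
	// A negative width (for example -1) means unlimited.
	if c.Width < 0 {
		c.Width = unlimitedWidth
	}
	return c
}

func main() {
	fmt.Println(normalize(EmitterConfig{Indent: 12, Width: -1}))
	fmt.Println(normalize(EmitterConfig{Indent: 4, Width: 80}))
}
```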

Testing Framework

Test Architecture

The testing framework uses a data-driven approach:

  1. Test data is stored in YAML files in the testdata/ directory
  2. Test logic is implemented in Go files (*_test.go)
  3. One-to-one pairing: Each testdata/foo.yaml has a corresponding foo_test.go

Benefits:

  • Easy to add new test cases without writing Go code
  • Test data is human-readable and self-documenting
  • Test logic is reusable across many test cases
  • Test data is separated from test code for clarity
  • Tests can become a common suite for multiple YAML frameworks

Test Data Files

Each YAML file contains test cases for a specific component:

  • scanner.yaml - Scanner/tokenization tests

    • Token sequence verification
    • Token property validation (value, style)
    • Error detection
  • parser.yaml - Parser/event generation tests

    • Event sequence verification
    • Event property validation (anchor, tag, value, directives)
    • Error detection
  • emitter.yaml - Emitter/serialization tests

    • Event-to-YAML conversion
    • Configuration options testing
    • Roundtrip testing (parse → emit)
    • Writer integration
  • api.yaml - API constructor and method tests

    • Constructor validation
    • Method behavior and state changes
    • Panic conditions
    • Cleanup verification
  • yaml.yaml - Utility function tests

    • Enum String() methods
    • Style accessor methods
  • reader.yaml - Reader/input handling tests

    • Encoding detection (UTF-8, UTF-16LE, UTF-16BE)
    • Buffer management
    • Error handling
  • writer.yaml - Writer/output handling tests

    • Buffer flushing
    • Output handlers (string, io.Writer)
    • Error conditions
  • yamlprivate.yaml - Character classification tests

    • Character type predicates (isAlpha, isDigit, isHex, etc.)
    • Character conversion functions (asDigit, asHex, width)
    • Unicode handling
  • loader.yaml - Data loader scalar resolution tests

    • Numeric type resolution (integers, floats)
    • Boolean and null value handling
    • String vs numeric type disambiguation
    • Mixed-type collections

Test Framework Implementation

The test framework is implemented in yamldatatest_loader.go and yamldatatest_test.go:

Core functions:

  • LoadYAML(data []byte) (interface{}, error) - Parses YAML using libyaml parser with scalar type resolution (exported)
  • UnmarshalStruct(target interface{}, data map[string]interface{}) error - Populates structs (exported)
  • LoadTestCases(filename string) ([]TestCase, error) - Loads and parses test YAML files
  • coerceScalar(value string) interface{} - Resolves scalar strings to appropriate Go types (int, float64, bool, nil, string)

Core types:

  • TestCase struct - Umbrella structure containing fields for all test types
    • Uses interface{} for flexible field types
    • Post-processing converts generic fields to specific types

Post-processing: After loading, the framework processes test data:

  • Converts Want (interface{}) to WantEvents, WantTokens, or WantSpecs based on test type
  • Converts Want (interface{}) to WantContains (handles both scalar and sequence)
  • Converts Checks to field validation specifications
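As an illustration of the scalar-or-sequence handling described above, a conversion from a generic want value to a list of expected substrings might look like the sketch below. The function name toStringSlice is hypothetical, and rejecting non-string items is an assumption; the real framework may be more lenient.

```go
package main

import "fmt"

// toStringSlice converts a loosely typed "want" value into a string slice.
// A single scalar becomes a one-element slice; a sequence is converted
// item by item. Non-string items are rejected (an assumption of this sketch).
func toStringSlice(want interface{}) ([]string, error) {
	switch v := want.(type) {
	case nil:
		return nil, nil
	case string:
		return []string{v}, nil
	case []interface{}:
		out := make([]string, 0, len(v))
		for i, item := range v {
			s, ok := item.(string)
			if !ok {
				return nil, fmt.Errorf("want[%d]: expected string, got %T", i, item)
			}
			out = append(out, s)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("want: expected string or sequence, got %T", want)
	}
}

func main() {
	fmt.Println(toStringSlice("hello"))
	fmt.Println(toStringSlice([]interface{}{"key", "value", "item1"}))
}
```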

Test Types

Scanner Tests

scan-tokens - Verify token sequence

- scan-tokens:
    name: Simple scalar
    yaml: |-
      hello
    want:
    - STREAM_START_TOKEN
    - SCALAR_TOKEN
    - STREAM_END_TOKEN

scan-tokens-detailed - Verify token properties

- scan-tokens-detailed:
    name: Single quoted scalar
    yaml: |-
      'hello world'
    want:
    - STREAM_START_TOKEN
    - SCALAR_TOKEN:
        style: SINGLE_QUOTED_SCALAR_STYLE
        value: hello world
    - STREAM_END_TOKEN

scan-error - Verify error detection

- scan-error:
    name: Invalid character
    yaml: "\x01"

Parser Tests

parse-events - Verify event sequence

- parse-events:
    name: Simple mapping
    yaml: |
      key: value
    want:
    - STREAM_START_EVENT
    - DOCUMENT_START_EVENT
    - MAPPING_START_EVENT
    - SCALAR_EVENT
    - SCALAR_EVENT
    - MAPPING_END_EVENT
    - DOCUMENT_END_EVENT
    - STREAM_END_EVENT

parse-events-detailed - Verify event properties

- parse-events-detailed:
    name: Anchor and alias
    yaml: |
      - &anchor value
      - *anchor
    want:
    - STREAM_START_EVENT
    - DOCUMENT_START_EVENT
    - SEQUENCE_START_EVENT
    - SCALAR_EVENT:
        anchor: anchor
        value: value
    - ALIAS_EVENT:
        anchor: anchor
    - SEQUENCE_END_EVENT
    - DOCUMENT_END_EVENT
    - STREAM_END_EVENT

parse-error - Verify error detection

- parse-error:
    name: Error state
    yaml: |
      key: : invalid

Emitter Tests

emit - Emit events and verify output contains expected strings

- emit:
    name: Simple scalar
    data:
    - STREAM_START_EVENT:
        encoding: UTF8_ENCODING
    - DOCUMENT_START_EVENT:
        implicit: true
    - SCALAR_EVENT:
        value: hello
        implicit: true
        style: PLAIN_SCALAR_STYLE
    - DOCUMENT_END_EVENT:
        implicit: true
    - STREAM_END_EVENT
    want: hello

emit-config - Emit with configuration

- emit-config:
    name: Custom indent
    conf:
      indent: 4
    data:
    - STREAM_START_EVENT:
        encoding: UTF8_ENCODING
    - DOCUMENT_START_EVENT:
        implicit: true
    - MAPPING_START_EVENT:
        implicit: true
        style: BLOCK_MAPPING_STYLE
    # ... more events
    want: key

roundtrip - Parse → emit, verify output

- roundtrip:
    name: Roundtrip
    yaml: |
      key: value
      list:
        - item1
        - item2
    want:
    - key
    - value
    - item1

emit-writer - Emit to io.Writer

- emit-writer:
    name: Writer
    data:
    - STREAM_START_EVENT:
        encoding: UTF8_ENCODING
    # ... more events
    want: test

API Tests

api-new - Test constructors

- api-new:
    name: New parser
    with: NewParser
    test:
    - nil: [raw-buffer, false]
    - cap: [raw-buffer, 512]
    - nil: [buffer, false]
    - cap: [buffer, 1536]

api-method - Test methods and field state

- api-method:
    name: Parser set input string
    with: NewParser
    byte: true
    call: [SetInputString, 'key: value']
    test:
    - eq: [input, 'key: value']
    - eq: [input-pos, 0]
    - nil: [read-handler, false]

api-panic - Test methods that should panic

- api-panic:
    name: Parser set input string twice
    with: NewParser
    byte: true
    init: [SetInputString, first]
    call: [SetInputString, second]
    want: must set the input source only once

api-delete - Test cleanup

- api-delete:
    name: Parser delete
    with: NewParser
    byte: true
    init: [SetInputString, test]
    test:
    - len: [input, 0]
    - len: [buffer, 0]

api-new-event - Test event constructors

- api-new-event:
    name: New stream start event
    call: [NewStreamStartEvent, UTF8_ENCODING]
    test:
    - eq: [Type, STREAM_START_EVENT]
    - eq: [encoding, UTF8_ENCODING]

Utility Tests

enum-string - Test String() methods of enums

- enum-string:
    name: Scalar style plain
    enum: [ScalarStyle, PLAIN_SCALAR_STYLE]
    want: Plain
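For context, a String() method of the kind this test exercises could be sketched as follows. Only the "Plain" result comes from the example above; the numeric values and the other returned strings are assumptions made for illustration.

```go
package main

import "fmt"

// ScalarStyle mirrors the enum named in the test; the values are assumed.
type ScalarStyle int

const (
	PLAIN_SCALAR_STYLE ScalarStyle = iota
	SINGLE_QUOTED_SCALAR_STYLE
	DOUBLE_QUOTED_SCALAR_STYLE
	LITERAL_SCALAR_STYLE
	FOLDED_SCALAR_STYLE
)

// String returns a human-readable name. Only "Plain" is confirmed by the
// test example; the remaining names are hypothetical.
func (s ScalarStyle) String() string {
	switch s {
	case PLAIN_SCALAR_STYLE:
		return "Plain"
	case SINGLE_QUOTED_SCALAR_STYLE:
		return "Single Quoted"
	case DOUBLE_QUOTED_SCALAR_STYLE:
		return "Double Quoted"
	case LITERAL_SCALAR_STYLE:
		return "Literal"
	case FOLDED_SCALAR_STYLE:
		return "Folded"
	default:
		return fmt.Sprintf("ScalarStyle(%d)", int(s))
	}
}

func main() {
	fmt.Println(PLAIN_SCALAR_STYLE)
}
```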

style-accessor - Test style accessor methods

- style-accessor:
    name: Event scalar style
    test: [ScalarStyle, DOUBLE_QUOTED_SCALAR_STYLE]

Loader Tests

scalar-resolution - Test scalar type resolution

- scalar-resolution:
    name: Positive integer
    yaml: "42"
    want: 42

- scalar-resolution:
    name: Negative float
    yaml: "-2.5"
    want: -2.5

Resolution order:

  1. Boolean (true, false)
  2. Null (null keyword only)
  3. Hexadecimal integer (0x prefix)
  4. Float (contains .)
  5. Decimal integer
  6. String (fallback)
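The resolution order above can be sketched as a single function. This is an illustrative reading of the list, not the package's coerceScalar itself: the name resolveScalar and the exact parsing calls are assumptions, and values that fail to parse at one step simply fall through to the next.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveScalar applies the documented order: boolean, null, hexadecimal
// integer, float, decimal integer, then string as the fallback.
func resolveScalar(value string) interface{} {
	switch value {
	case "true":
		return true
	case "false":
		return false
	case "null":
		return nil
	}
	if strings.HasPrefix(value, "0x") {
		if n, err := strconv.ParseInt(value[2:], 16, 64); err == nil {
			return int(n)
		}
	}
	if strings.Contains(value, ".") {
		if f, err := strconv.ParseFloat(value, 64); err == nil {
			return f
		}
	}
	if n, err := strconv.Atoi(value); err == nil {
		return n
	}
	return value
}

func main() {
	for _, s := range []string{"true", "null", "0x1F", "-2.5", "42", "hello"} {
		v := resolveScalar(s)
		fmt.Printf("%q -> %T(%v)\n", s, v, v)
	}
}
```

Checking booleans and null first keeps keyword matching exact, and trying float before decimal integer means a value such as 1.0 stays a float64 rather than failing integer parsing.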

Common Keys in Test YAML Files

Test cases use a type-as-key format where the test type is the map key:

- test-type:
    name: Test case name
    # ... other fields
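Once such a file is loaded into generic Go values, each list entry is a map with exactly one key. A minimal sketch of separating the test type from its fields is shown below; splitTestCase is a hypothetical helper, not part of the framework.

```go
package main

import "fmt"

// splitTestCase takes one list entry in type-as-key form and returns the
// test type (the single map key) together with the nested field mapping.
func splitTestCase(entry map[string]interface{}) (string, map[string]interface{}, error) {
	if len(entry) != 1 {
		return "", nil, fmt.Errorf("expected exactly one test-type key, got %d", len(entry))
	}
	for testType, raw := range entry {
		fields, ok := raw.(map[string]interface{})
		if !ok {
			return "", nil, fmt.Errorf("%s: expected a mapping of fields, got %T", testType, raw)
		}
		return testType, fields, nil
	}
	return "", nil, nil // unreachable: the map has exactly one entry
}

func main() {
	entry := map[string]interface{}{
		"scan-tokens": map[string]interface{}{
			"name": "Simple scalar",
			"yaml": "hello",
		},
	}
	testType, fields, err := splitTestCase(entry)
	fmt.Println(testType, fields["name"], err)
}
```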

Common fields:

  • name - Test case name (title case convention)
  • yaml - Input YAML string to test
  • want - Expected result (format varies by test type)
    • For api-panic: string containing expected panic message substring
    • For scan-error/parse-error: boolean (defaults to true if omitted; set to false if no error expected)
    • For enum-string: string representing expected String() output
    • For other types: varies (may be sequence or scalar)
  • data - For emitter tests: list of event specifications to emit
  • conf - For emitter config tests: emitter configuration options
  • with - For API tests: constructor name (NewParser, NewEmitter)
  • call - For API tests: method call [MethodName, arg1, arg2, ...]
  • init - For API panic tests: setup method call before main method
  • byte - For API tests: boolean flag to convert string args to []byte
  • test - For API tests: list of field validation checks in format operator: [field, value] where operator is one of: nil, cap, len, eq, gte, len-gt.
  • test - For style-accessor tests: array of [Method, STYLE] where Method is the accessor method (e.g., ScalarStyle) and STYLE is the style constant (e.g., DOUBLE_QUOTED_SCALAR_STYLE).
  • enum - For enum tests: array of [Type, Value] where Type is the enum type (e.g., ScalarStyle) and Value is the constant (e.g., PLAIN_SCALAR_STYLE)
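The operator: [field, value] checks could be evaluated along the lines of the sketch below. It is written under assumptions: evalCheck and isNil are hypothetical names, the field value is passed in already looked up, and the meanings of gte (greater than or equal) and len-gt (length greater than) are inferred from the operator names rather than confirmed by the source.

```go
package main

import (
	"fmt"
	"reflect"
)

// isNil reports whether x is nil, including typed nil slices, maps, etc.
func isNil(x interface{}) bool {
	if x == nil {
		return true
	}
	v := reflect.ValueOf(x)
	switch v.Kind() {
	case reflect.Slice, reflect.Map, reflect.Ptr, reflect.Func, reflect.Chan, reflect.Interface:
		return v.IsNil()
	}
	return false
}

// evalCheck applies one operator to a field value and an expected value.
func evalCheck(op string, got, want interface{}) (bool, error) {
	switch op {
	case "eq":
		return reflect.DeepEqual(got, want), nil
	case "nil":
		wantNil, ok := want.(bool)
		if !ok {
			return false, fmt.Errorf("nil: expected bool, got %T", want)
		}
		return isNil(got) == wantNil, nil
	case "gte":
		g, ok1 := got.(int)
		w, ok2 := want.(int)
		if !ok1 || !ok2 {
			return false, fmt.Errorf("gte: expected ints, got %T and %T", got, want)
		}
		return g >= w, nil
	case "len", "cap", "len-gt":
		v := reflect.ValueOf(got)
		w, ok := want.(int)
		if v.Kind() != reflect.Slice || !ok {
			return false, fmt.Errorf("%s: expected slice and int, got %T and %T", op, got, want)
		}
		switch op {
		case "len":
			return v.Len() == w, nil
		case "cap":
			return v.Cap() == w, nil
		default: // len-gt
			return v.Len() > w, nil
		}
	}
	return false, fmt.Errorf("unknown operator %q", op)
}

func main() {
	buf := make([]byte, 0, 512)
	fmt.Println(evalCheck("nil", buf, false))
	fmt.Println(evalCheck("cap", buf, 512))
}
```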

Note on scalar type resolution: Unquoted scalar values in test data are automatically resolved to appropriate Go types (int, float64, bool, nil) by the LoadYAML function. Quoted scalars remain as strings.

Running Tests

# Run all tests in the package
go test ./internal/libyaml

# Run the tests for one component (-run matches test function names, not files)
go test ./internal/libyaml -run TestScanner
go test ./internal/libyaml -run TestParser
go test ./internal/libyaml -run TestEmitter
go test ./internal/libyaml -run TestAPI
go test ./internal/libyaml -run TestYAML
go test ./internal/libyaml -run TestLoader

# Run specific test case (using subtest name)
go test ./internal/libyaml -run TestScanner/Block_sequence
go test ./internal/libyaml -run TestParser/Anchor_and_alias
go test ./internal/libyaml -run TestEmitter/Flow_mapping
go test ./internal/libyaml -run TestLoader/Scientific_notation_lowercase_e

# Run with verbose output
go test -v ./internal/libyaml

# Run with coverage
go test -cover ./internal/libyaml
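The subtest patterns above contain underscores because go test replaces spaces in subtest names with underscores. Assuming each subtest is named after the test case's name field (which the Anchor_and_alias example suggests), a -run pattern could be derived with a small helper like this one; runPattern is a hypothetical function written for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// runPattern builds a -run argument for one test case. Spaces become
// underscores, as go test does for subtest names, and regexp
// metacharacters are escaped because -run takes regular expressions.
func runPattern(testFunc, caseName string) string {
	name := strings.ReplaceAll(caseName, " ", "_")
	return testFunc + "/" + regexp.QuoteMeta(name)
}

func main() {
	fmt.Println(runPattern("TestParser", "Anchor and alias"))
}
```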