# internal/libyaml

This package provides low-level YAML processing functionality through a 3-stage
pipeline: Scanner → Parser → Emitter.
It implements the libyaml C library functionality in Go.

## Directory Overview

The `internal/libyaml` package implements the core YAML processing stages:

1. **Scanner** - Tokenizes YAML text into tokens
2. **Parser** - Converts tokens into events following YAML grammar rules
3. **Emitter** - Serializes events back into YAML text

## File Organization

### Main Source Files

- **scanner.go** - YAML scanner/tokenizer implementation
- **parser.go** - YAML parser (tokens → events)
- **emitter.go** - YAML emitter (events → YAML output)
- **api.go** - Public API for Parser and Emitter types
- **yaml.go** - Core types and constants (Event, Token, enums)
- **reader.go** - Input handling and encoding detection
- **writer.go** - Output handling
- **yamlprivate.go** - Internal types and helper functions

### Test Files

- **scanner_test.go** - Scanner tests
- **parser_test.go** - Parser tests
- **emitter_test.go** - Emitter tests
- **api_test.go** - API tests
- **yaml_test.go** - Utility function tests
- **reader_test.go** - Reader tests
- **writer_test.go** - Writer tests
- **yamlprivate_test.go** - Character classification tests
- **loader_test.go** - Data loader scalar resolution tests
- **yamldatatest_test.go** - YAML test data loading framework
- **yamldatatest_loader.go** - YAML test data loader with scalar type resolution (exported for reuse)

### Test Data Files (in `testdata/`)

- **scanner.yaml** - Scanner test cases
- **parser.yaml** - Parser test cases
- **emitter.yaml** - Emitter test cases
- **api.yaml** - API test cases
- **yaml.yaml** - Utility function test cases
- **reader.yaml** - Reader test cases
- **writer.yaml** - Writer test cases
- **yamlprivate.yaml** - Character classification test cases
- **loader.yaml** - Data loader scalar resolution test cases

## Processing Pipeline

### 1. Scanner (scanner.go)

The scanner converts YAML text into tokens.

**Input**: Raw YAML text (string or []byte)
**Output**: Stream of tokens

**Token types include**:
- `SCALAR_TOKEN` - Plain, quoted, or block scalar values
- `KEY_TOKEN`, `VALUE_TOKEN` - Mapping key/value indicators
- `BLOCK_MAPPING_START_TOKEN`, `FLOW_MAPPING_START_TOKEN` - Mapping delimiters
- `BLOCK_SEQUENCE_START_TOKEN`, `FLOW_SEQUENCE_START_TOKEN` - Sequence delimiters
- `ANCHOR_TOKEN`, `ALIAS_TOKEN` - Anchor definitions and references
- `TAG_TOKEN` - Type tags
- `DOCUMENT_START_TOKEN`, `DOCUMENT_END_TOKEN` - Document boundaries

**Responsibilities**:
- Character encoding detection (UTF-8, UTF-16LE, UTF-16BE)
- Line break normalization
- Indentation tracking
- Quote and escape sequence handling

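As an illustration of the encoding detection responsibility above, the following is a minimal sketch of byte order mark (BOM) sniffing for the three listed encodings. The function name `detectEncoding` and its string return values are hypothetical and are not taken from this package; the real reader may be structured differently.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectEncoding is a hypothetical helper, not part of this package's API.
// It inspects a leading byte order mark and falls back to UTF-8 when no
// BOM is present.
func detectEncoding(input []byte) string {
	switch {
	case bytes.HasPrefix(input, []byte{0xFF, 0xFE}):
		return "UTF-16LE"
	case bytes.HasPrefix(input, []byte{0xFE, 0xFF}):
		return "UTF-16BE"
	case bytes.HasPrefix(input, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8" // explicit UTF-8 BOM
	default:
		return "UTF-8" // no BOM: assume UTF-8
	}
}

func main() {
	fmt.Println(detectEncoding([]byte("key: value")))
	fmt.Println(detectEncoding([]byte{0xFF, 0xFE, 'k', 0}))
	fmt.Println(detectEncoding([]byte{0xFE, 0xFF, 0, 'k'}))
}
```

The sketch only classifies the input; a real reader would also need to skip the BOM and transcode UTF-16 input before scanning.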
### 2. Parser (parser.go)

The parser converts tokens into events following YAML grammar rules.

**Input**: Stream of tokens from Scanner
**Output**: Stream of events

**Event types include**:
- `STREAM_START_EVENT`, `STREAM_END_EVENT` - Stream boundaries
- `DOCUMENT_START_EVENT`, `DOCUMENT_END_EVENT` - Document boundaries
- `SCALAR_EVENT` - Scalar values
- `MAPPING_START_EVENT`, `MAPPING_END_EVENT` - Mapping boundaries
- `SEQUENCE_START_EVENT`, `SEQUENCE_END_EVENT` - Sequence boundaries
- `ALIAS_EVENT` - Anchor references

**Responsibilities**:
- Implementing YAML grammar and validation
- Managing document directives (%YAML, %TAG)
- Resolving anchors and aliases
- Tracking implicit vs explicit markers
- Style preservation (plain, single-quoted, double-quoted, literal, folded)

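The start/end event pairs listed above nest like brackets: every `*_START_EVENT` is eventually closed by the matching `*_END_EVENT`. The sketch below checks that property on a list of event names. It uses plain strings and a hypothetical `wellNested` function purely for illustration; the package itself works with typed event values.

```go
package main

import (
	"fmt"
	"strings"
)

// wellNested is a hypothetical helper. It reports whether every
// *_START_EVENT is closed by the matching *_END_EVENT, with the most
// recently opened construct closed first.
func wellNested(events []string) bool {
	var open []string // stack of constructs that are still open
	for _, ev := range events {
		switch {
		case strings.HasSuffix(ev, "_START_EVENT"):
			open = append(open, strings.TrimSuffix(ev, "_START_EVENT"))
		case strings.HasSuffix(ev, "_END_EVENT"):
			kind := strings.TrimSuffix(ev, "_END_EVENT")
			if len(open) == 0 || open[len(open)-1] != kind {
				return false
			}
			open = open[:len(open)-1]
		}
		// SCALAR_EVENT and ALIAS_EVENT neither open nor close anything.
	}
	return len(open) == 0
}

func main() {
	events := []string{
		"STREAM_START_EVENT", "DOCUMENT_START_EVENT", "MAPPING_START_EVENT",
		"SCALAR_EVENT", "SCALAR_EVENT",
		"MAPPING_END_EVENT", "DOCUMENT_END_EVENT", "STREAM_END_EVENT",
	}
	fmt.Println(wellNested(events))
}
```

The event list in `main` is the sequence a simple `key: value` document produces in the `parse-events` test format.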
### 3. Emitter (emitter.go)

The emitter converts events back into YAML text.

**Input**: Stream of events
**Output**: YAML text

**Responsibilities**:
- Style selection (plain/quoted scalars, block/flow collections)
- Formatting control (canonical mode, indentation, line width)
- Character encoding
- Anchor and tag serialization
- Document marker generation (---, ...)

**Configuration options**:
- `Canonical` - Emit in canonical YAML form
- `Indent` - Indentation width (2-9 spaces)
- `Width` - Line width (-1 for unlimited)
- `Unicode` - Enable Unicode character output
- `LineBreak` - Line break style (LN, CR, CRLN)

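To make the `Indent` and `Width` ranges above concrete, here is a small sketch of how out-of-range values could be normalized. The `emitterConfig` type and `normalize` function are hypothetical. The fallback values (indent 2, width 80) are an assumption modeled on the behavior of the C libyaml emitter, and this Go package may handle them differently.

```go
package main

import (
	"fmt"
	"math"
)

// emitterConfig is a hypothetical stand-in for the Indent and Width options.
type emitterConfig struct {
	Indent int
	Width  int
}

// normalize applies fallback values to out-of-range settings.
// Assumption: the defaults mirror C libyaml and are not confirmed for
// this package.
func normalize(c emitterConfig) emitterConfig {
	if c.Indent < 2 || c.Indent > 9 {
		c.Indent = 2 // outside the documented 2-9 range
	}
	if c.Width >= 0 && c.Width <= c.Indent*2 {
		c.Width = 80 // too narrow to be usable
	}
	if c.Width < 0 {
		c.Width = math.MaxInt // -1 means unlimited
	}
	return c
}

func main() {
	fmt.Println(normalize(emitterConfig{Indent: 4, Width: 120}))
	fmt.Println(normalize(emitterConfig{Indent: 12, Width: 0}))
	fmt.Println(normalize(emitterConfig{Indent: 2, Width: -1}).Width == math.MaxInt)
}
```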
## Testing Framework

### Test Architecture

The testing framework uses a data-driven approach:

1. **Test data** is stored in YAML files in the `testdata/` directory
2. **Test logic** is implemented in Go files (`*_test.go`)
3. **One-to-one pairing**: Each `testdata/foo.yaml` has a corresponding `foo_test.go`

**Benefits**:
- Easy to add new test cases without writing Go code
- Test data is human-readable and self-documenting
- Test logic is reusable across many test cases
- Test data is separated from test code for clarity
- Tests can become a common suite for multiple YAML frameworks

### Test Data Files

Each YAML file contains test cases for a specific component:

- **scanner.yaml** - Scanner/tokenization tests
  - Token sequence verification
  - Token property validation (value, style)
  - Error detection

- **parser.yaml** - Parser/event generation tests
  - Event sequence verification
  - Event property validation (anchor, tag, value, directives)
  - Error detection

- **emitter.yaml** - Emitter/serialization tests
  - Event-to-YAML conversion
  - Configuration options testing
  - Roundtrip testing (parse → emit)
  - Writer integration

- **api.yaml** - API constructor and method tests
  - Constructor validation
  - Method behavior and state changes
  - Panic conditions
  - Cleanup verification

- **yaml.yaml** - Utility function tests
  - Enum String() methods
  - Style accessor methods

- **reader.yaml** - Reader/input handling tests
  - Encoding detection (UTF-8, UTF-16LE, UTF-16BE)
  - Buffer management
  - Error handling

- **writer.yaml** - Writer/output handling tests
  - Buffer flushing
  - Output handlers (string, io.Writer)
  - Error conditions

- **yamlprivate.yaml** - Character classification tests
  - Character type predicates (isAlpha, isDigit, isHex, etc.)
  - Character conversion functions (asDigit, asHex, width)
  - Unicode handling

- **loader.yaml** - Data loader scalar resolution tests
  - Numeric type resolution (integers, floats)
  - Boolean and null value handling
  - String vs numeric type disambiguation
  - Mixed-type collections

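The character classification helpers that `yamlprivate.yaml` exercises are small byte-level functions. The sketch below shows what a few of them might look like. These are simplified, hypothetical versions that take a single byte; the signatures and exact semantics in `yamlprivate.go` may differ.

```go
package main

import "fmt"

// The helpers below are simplified illustrations, not the package's own code.

// isDigit reports whether c is an ASCII decimal digit.
func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// isHex reports whether c is an ASCII hexadecimal digit.
func isHex(c byte) bool {
	return isDigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')
}

// asHex converts a hexadecimal digit to its numeric value.
// The caller is expected to check isHex first.
func asHex(c byte) int {
	switch {
	case c >= 'a' && c <= 'f':
		return int(c-'a') + 10
	case c >= 'A' && c <= 'F':
		return int(c-'A') + 10
	default:
		return int(c - '0')
	}
}

// width returns the length in bytes of the UTF-8 sequence that starts
// with c, or 0 if c cannot start a sequence.
func width(c byte) int {
	switch {
	case c&0x80 == 0x00:
		return 1
	case c&0xE0 == 0xC0:
		return 2
	case c&0xF0 == 0xE0:
		return 3
	case c&0xF8 == 0xF0:
		return 4
	}
	return 0
}

func main() {
	fmt.Println(isHex('f'), asHex('f'))
	fmt.Println(width('a'), width("→"[0]))
}
```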
### Test Framework Implementation

The test framework is implemented in `yamldatatest_loader.go` and `yamldatatest_test.go`:

**Core functions**:
- `LoadYAML(data []byte) (interface{}, error)` - Parses YAML using libyaml parser with scalar type resolution (exported)
- `UnmarshalStruct(target interface{}, data map[string]interface{}) error` - Populates structs (exported)
- `LoadTestCases(filename string) ([]TestCase, error)` - Loads and parses test YAML files
- `coerceScalar(value string) interface{}` - Resolves scalar strings to appropriate Go types (int, float64, bool, nil, string)

**Core types**:
- `TestCase` struct - Umbrella structure containing fields for all test types
  - Uses `interface{}` for flexible field types
  - Post-processing converts generic fields to specific types

**Post-processing**:
After loading, the framework processes test data:
- Converts `Want` (interface{}) to `WantEvents`, `WantTokens`, or `WantSpecs` based on test type
- Converts `Want` (interface{}) to `WantContains` (handles both scalar and sequence)
- Converts `Checks` to field validation specifications

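The `WantContains` conversion described above has to accept either a single scalar or a sequence. A minimal sketch of that normalization, together with the substring check it feeds, could look like this. The helper names `toStrings` and `containsAll` are hypothetical and are not taken from `yamldatatest_loader.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// toStrings is a hypothetical helper. It turns a loosely typed `want`
// value (nil, a scalar, or a sequence) into a list of expected substrings.
func toStrings(want interface{}) []string {
	switch v := want.(type) {
	case nil:
		return nil
	case []interface{}:
		out := make([]string, 0, len(v))
		for _, item := range v {
			out = append(out, fmt.Sprint(item))
		}
		return out
	default:
		return []string{fmt.Sprint(v)}
	}
}

// containsAll reports whether output contains every expected substring.
func containsAll(output string, wants []string) bool {
	for _, w := range wants {
		if !strings.Contains(output, w) {
			return false
		}
	}
	return true
}

func main() {
	output := "key: value\nlist:\n- item1\n- item2\n"
	fmt.Println(containsAll(output, toStrings("key")))
	fmt.Println(containsAll(output, toStrings([]interface{}{"key", "value", "item1"})))
	fmt.Println(containsAll(output, toStrings([]interface{}{"missing"})))
}
```

Using `fmt.Sprint` means that a `want` value already resolved to an int or bool is still compared against the textual output.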
### Test Types

#### Scanner Tests

**scan-tokens** - Verify token sequence

```yaml
- scan-tokens:
    name: Simple scalar
    yaml: |-
      hello
    want:
      - STREAM_START_TOKEN
      - SCALAR_TOKEN
      - STREAM_END_TOKEN
```

**scan-tokens-detailed** - Verify token properties

```yaml
- scan-tokens-detailed:
    name: Single quoted scalar
    yaml: |-
      'hello world'
    want:
      - STREAM_START_TOKEN
      - SCALAR_TOKEN:
          style: SINGLE_QUOTED_SCALAR_STYLE
          value: hello world
      - STREAM_END_TOKEN
```

**scan-error** - Verify error detection

```yaml
- scan-error:
    name: Invalid character
    yaml: "\x01"
```

#### Parser Tests

**parse-events** - Verify event sequence

```yaml
- parse-events:
    name: Simple mapping
    yaml: |
      key: value
    want:
      - STREAM_START_EVENT
      - DOCUMENT_START_EVENT
      - MAPPING_START_EVENT
      - SCALAR_EVENT
      - SCALAR_EVENT
      - MAPPING_END_EVENT
      - DOCUMENT_END_EVENT
      - STREAM_END_EVENT
```

**parse-events-detailed** - Verify event properties

```yaml
- parse-events-detailed:
    name: Anchor and alias
    yaml: |
      - &anchor value
      - *anchor
    want:
      - STREAM_START_EVENT
      - DOCUMENT_START_EVENT
      - SEQUENCE_START_EVENT
      - SCALAR_EVENT:
          anchor: anchor
          value: value
      - ALIAS_EVENT:
          anchor: anchor
      - SEQUENCE_END_EVENT
      - DOCUMENT_END_EVENT
      - STREAM_END_EVENT
```

**parse-error** - Verify error detection

```yaml
- parse-error:
    name: Error state
    yaml: |
      key: : invalid
```

#### Emitter Tests

**emit** - Emit events and verify output contains expected strings

```yaml
- emit:
    name: Simple scalar
    data:
      - STREAM_START_EVENT:
          encoding: UTF8_ENCODING
      - DOCUMENT_START_EVENT:
          implicit: true
      - SCALAR_EVENT:
          value: hello
          implicit: true
          style: PLAIN_SCALAR_STYLE
      - DOCUMENT_END_EVENT:
          implicit: true
      - STREAM_END_EVENT
    want: hello
```

**emit-config** - Emit with configuration

```yaml
- emit-config:
    name: Custom indent
    conf:
      indent: 4
    data:
      - STREAM_START_EVENT:
          encoding: UTF8_ENCODING
      - DOCUMENT_START_EVENT:
          implicit: true
      - MAPPING_START_EVENT:
          implicit: true
          style: BLOCK_MAPPING_STYLE
      # ... more events
    want: key
```

**roundtrip** - Parse → emit, verify output

```yaml
- roundtrip:
    name: Roundtrip
    yaml: |
      key: value
      list:
        - item1
        - item2
    want:
      - key
      - value
      - item1
```

**emit-writer** - Emit to io.Writer

```yaml
- emit-writer:
    name: Writer
    data:
      - STREAM_START_EVENT:
          encoding: UTF8_ENCODING
      # ... more events
    want: test
```

#### API Tests

**api-new** - Test constructors

```yaml
- api-new:
    name: New parser
    with: NewParser
    test:
      - nil: [raw-buffer, false]
      - cap: [raw-buffer, 512]
      - nil: [buffer, false]
      - cap: [buffer, 1536]
```

**api-method** - Test methods and field state

```yaml
- api-method:
    name: Parser set input string
    with: NewParser
    byte: true
    call: [SetInputString, 'key: value']
    test:
      - eq: [input, 'key: value']
      - eq: [input-pos, 0]
      - nil: [read-handler, false]
```

**api-panic** - Test methods that should panic

```yaml
- api-panic:
    name: Parser set input string twice
    with: NewParser
    byte: true
    init: [SetInputString, first]
    call: [SetInputString, second]
    want: must set the input source only once
```

**api-delete** - Test cleanup

```yaml
- api-delete:
    name: Parser delete
    with: NewParser
    byte: true
    init: [SetInputString, test]
    test:
      - len: [input, 0]
      - len: [buffer, 0]
```

**api-new-event** - Test event constructors

```yaml
- api-new-event:
    name: New stream start event
    call: [NewStreamStartEvent, UTF8_ENCODING]
    test:
      - eq: [Type, STREAM_START_EVENT]
      - eq: [encoding, UTF8_ENCODING]
```

#### Utility Tests

**enum-string** - Test String() methods of enums

```yaml
- enum-string:
    name: Scalar style plain
    enum: [ScalarStyle, PLAIN_SCALAR_STYLE]
    want: Plain
```

**style-accessor** - Test style accessor methods

```yaml
- style-accessor:
    name: Event scalar style
    test: [ScalarStyle, DOUBLE_QUOTED_SCALAR_STYLE]
```

#### Loader Tests

**scalar-resolution** - Test scalar type resolution

```yaml
- scalar-resolution:
    name: Positive integer
    yaml: "42"
    want: 42

- scalar-resolution:
    name: Negative float
    yaml: "-2.5"
    want: -2.5
```

**Resolution order**:
1. Boolean (true, false)
2. Null (null keyword only)
3. Hexadecimal integer (0x prefix)
4. Float (contains .)
5. Decimal integer
6. String (fallback)

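The resolution order above can be read as a chain of checks that falls through to the next rule when a parse fails. The following sketch follows that order literally. It is a hypothetical re-implementation named `resolveScalar`, not the package's `coerceScalar`, and it deliberately covers only the six rules listed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveScalar is a hypothetical illustration of the documented order:
// bool, null, hexadecimal integer, float, decimal integer, string.
func resolveScalar(s string) interface{} {
	switch s {
	case "true":
		return true
	case "false":
		return false
	case "null":
		return nil
	}
	if strings.HasPrefix(s, "0x") {
		if n, err := strconv.ParseInt(s[2:], 16, 64); err == nil {
			return int(n)
		}
	}
	if strings.Contains(s, ".") {
		if f, err := strconv.ParseFloat(s, 64); err == nil {
			return f
		}
	}
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return int(n)
	}
	return s // fallback: keep the original string
}

func main() {
	for _, s := range []string{"true", "null", "0x1F", "-2.5", "42", "hello"} {
		v := resolveScalar(s)
		fmt.Printf("%q -> %v (%T)\n", s, v, v)
	}
}
```

A value such as `1.2.3` contains a dot but fails to parse as a float or an integer, so it falls through to the string case.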
### Common Keys in Test YAML Files

Test cases use a **type-as-key** format where the test type is the map key:

```yaml
- test-type:
    name: Test case name
    # ... other fields
```

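Once a test file has been loaded into generic Go values, the type-as-key format means each list entry is a map with exactly one key. A minimal sketch of separating the test type from its fields is shown below. The function `splitTestCase` is hypothetical, and the real loader may organize this step differently.

```go
package main

import "fmt"

// splitTestCase is a hypothetical helper. Each entry in a test file is a
// single-key map: the key names the test type and the value holds the
// fields of that test case.
func splitTestCase(entry map[string]interface{}) (string, map[string]interface{}, error) {
	if len(entry) != 1 {
		return "", nil, fmt.Errorf("expected exactly one test type key, got %d", len(entry))
	}
	for testType, raw := range entry {
		fields, ok := raw.(map[string]interface{})
		if !ok {
			return "", nil, fmt.Errorf("fields of %q are not a mapping", testType)
		}
		return testType, fields, nil
	}
	return "", nil, nil // unreachable: the map has exactly one entry
}

func main() {
	entry := map[string]interface{}{
		"scan-tokens": map[string]interface{}{
			"name": "Simple scalar",
			"yaml": "hello",
		},
	}
	testType, fields, err := splitTestCase(entry)
	fmt.Println(testType, fields["name"], err)
}
```

The returned test type can then select the matching handler, while the fields map is passed on for conversion into a typed test case.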
**Common fields**:
- **name** - Test case name (sentence case in the examples here, e.g. `Simple scalar`)
- **yaml** - Input YAML string to test
- **want** - Expected result (format varies by test type)
  - For api-panic: string containing expected panic message substring
  - For scan-error/parse-error: boolean (defaults to true if omitted; set to false if no error expected)
  - For enum-string: string representing expected String() output
  - For other types: varies (may be sequence or scalar)
- **data** - For emitter tests: list of event specifications to emit
- **conf** - For emitter config tests: emitter configuration options
- **with** - For API tests: constructor name (NewParser, NewEmitter)
- **call** - For API tests: method call [MethodName, arg1, arg2, ...]
- **init** - For API panic tests: setup method call before main method
- **byte** - For API tests: boolean flag to convert string args to []byte
- **test** - For API tests: list of field validation checks in the format `operator: [field, value]`, where operator is one of `nil`, `cap`, `len`, `eq`, `gte`, `len-gt`
- **test** - For style-accessor tests: array of [Method, STYLE] where Method is the accessor method (e.g., ScalarStyle) and STYLE is the style constant (e.g., DOUBLE_QUOTED_SCALAR_STYLE)
- **enum** - For enum tests: array of [Type, Value] where Type is the enum type (e.g., ScalarStyle) and Value is the constant (e.g., PLAIN_SCALAR_STYLE)

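The `operator: [field, value]` checks used by the API tests can be thought of as small predicates over a field's current value. The sketch below evaluates the six operators against an already extracted value. The `check` function and the semantics assumed for `gte` (value is at least the given number) and `len-gt` (length is greater than the given number) are inferred from the operator names and are not confirmed by the package source.

```go
package main

import (
	"fmt"
	"reflect"
)

// check is a hypothetical evaluator for one `operator: [field, value]`
// entry, where got is the field's current value and want is the value
// from the test file.
func check(op string, got, want interface{}) (bool, error) {
	v := reflect.ValueOf(got)
	if op == "nil" {
		isNil := !v.IsValid() // untyped nil
		switch v.Kind() {
		case reflect.Slice, reflect.Map, reflect.Ptr, reflect.Func, reflect.Interface, reflect.Chan:
			isNil = v.IsNil()
		}
		return isNil == want, nil
	}
	if op == "eq" {
		return reflect.DeepEqual(got, want), nil
	}
	n, ok := want.(int)
	if !ok {
		return false, fmt.Errorf("%s expects an integer, got %T", op, want)
	}
	switch op {
	case "gte":
		g, ok := got.(int)
		return ok && g >= n, nil
	case "len", "cap", "len-gt":
		if v.Kind() != reflect.Slice {
			return false, fmt.Errorf("%s needs a slice, got %T", op, got)
		}
		switch op {
		case "len":
			return v.Len() == n, nil
		case "cap":
			return v.Cap() == n, nil
		default: // len-gt
			return v.Len() > n, nil
		}
	}
	return false, fmt.Errorf("unknown operator %q", op)
}

func main() {
	buf := make([]byte, 0, 512)
	fmt.Println(check("nil", buf, false))
	fmt.Println(check("cap", buf, 512))
	fmt.Println(check("len", buf, 0))
	fmt.Println(check("eq", "key: value", "key: value"))
}
```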
**Note on scalar type resolution**: Unquoted scalar values in test data are automatically resolved to appropriate Go types (int, float64, bool, nil) by the `LoadYAML` function. Quoted scalars remain as strings.

### Running Tests

```bash
# Run all tests in the package
go test ./internal/libyaml

# Run the tests for a specific component
go test ./internal/libyaml -run TestScanner
go test ./internal/libyaml -run TestParser
go test ./internal/libyaml -run TestEmitter
go test ./internal/libyaml -run TestAPI
go test ./internal/libyaml -run TestYAML
go test ./internal/libyaml -run TestLoader

# Run a specific test case (using the subtest name)
go test ./internal/libyaml -run TestScanner/Block_sequence
go test ./internal/libyaml -run TestParser/Anchor_and_alias
go test ./internal/libyaml -run TestEmitter/Flow_mapping
go test ./internal/libyaml -run TestLoader/Scientific_notation_lowercase_e

# Run with verbose output
go test -v ./internal/libyaml

# Run with coverage
go test -cover ./internal/libyaml
```
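
The subtest names in the commands above use underscores because Go's `testing` package replaces spaces in subtest names with underscores. Assuming each test case's `name` field is used as the subtest name, the `-run` argument can be derived as in this sketch. The helper `runPattern` is hypothetical and handles only the space-to-underscore rule, not the escaping of other special characters.

```go
package main

import (
	"fmt"
	"strings"
)

// runPattern is a hypothetical helper that builds a `go test -run`
// argument from a test function name and a test case name.
func runPattern(testFunc, caseName string) string {
	return testFunc + "/" + strings.ReplaceAll(caseName, " ", "_")
}

func main() {
	fmt.Println(runPattern("TestParser", "Anchor and alias"))
}
```

Note that `-run` treats each slash-separated element as a regular expression, so names containing regex metacharacters would need quoting.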