import template code

This commit is contained in:
2026-03-24 10:31:30 +02:00
commit b443292720
974 changed files with 487563 additions and 0 deletions
@@ -0,0 +1,533 @@
# internal/libyaml
This package provides low-level YAML processing functionality through a 3-stage
pipeline: Scanner → Parser → Emitter.
It is a Go port of the libyaml C library.
## Directory Overview
The `internal/libyaml` package implements the core YAML processing stages:
1. **Scanner** - Tokenizes YAML text into tokens
2. **Parser** - Converts tokens into events following YAML grammar rules
3. **Emitter** - Serializes events back into YAML text
## File Organization
### Main Source Files
- **scanner.go** - YAML scanner/tokenizer implementation
- **parser.go** - YAML parser (tokens → events)
- **emitter.go** - YAML emitter (events → YAML output)
- **api.go** - Public API for Parser and Emitter types
- **yaml.go** - Core types and constants (Event, Token, enums)
- **reader.go** - Input handling and encoding detection
- **writer.go** - Output handling
- **yamlprivate.go** - Internal types and helper functions
### Test Files
- **scanner_test.go** - Scanner tests
- **parser_test.go** - Parser tests
- **emitter_test.go** - Emitter tests
- **api_test.go** - API tests
- **yaml_test.go** - Utility function tests
- **reader_test.go** - Reader tests
- **writer_test.go** - Writer tests
- **yamlprivate_test.go** - Character classification tests
- **loader_test.go** - Data loader scalar resolution tests
- **yamldatatest_test.go** - YAML test data loading framework
- **yamldatatest_loader.go** - YAML test data loader with scalar type resolution (exported for reuse)
### Test Data Files (in `testdata/`)
- **scanner.yaml** - Scanner test cases
- **parser.yaml** - Parser test cases
- **emitter.yaml** - Emitter test cases
- **api.yaml** - API test cases
- **yaml.yaml** - Utility function test cases
- **reader.yaml** - Reader test cases
- **writer.yaml** - Writer test cases
- **yamlprivate.yaml** - Character classification test cases
- **loader.yaml** - Data loader scalar resolution test cases
## Processing Pipeline
### 1. Scanner (scanner.go)
The scanner converts YAML text into tokens.
**Input**: Raw YAML text (string or []byte)
**Output**: Stream of tokens
**Token types include**:
- `SCALAR_TOKEN` - Plain, quoted, or block scalar values
- `KEY_TOKEN`, `VALUE_TOKEN` - Mapping key/value indicators
- `BLOCK_MAPPING_START_TOKEN`, `FLOW_MAPPING_START_TOKEN` - Mapping delimiters
- `BLOCK_SEQUENCE_START_TOKEN`, `FLOW_SEQUENCE_START_TOKEN` - Sequence delimiters
- `ANCHOR_TOKEN`, `ALIAS_TOKEN` - Anchor definitions and references
- `TAG_TOKEN` - Type tags
- `DOCUMENT_START_TOKEN`, `DOCUMENT_END_TOKEN` - Document boundaries
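Here is a deliberately tiny scanner sketch that makes the token stream concrete. It assumes the input is limited to flat `key: value` lines or a single plain scalar. For `key: value` it yields the same token order libyaml produces, including `BLOCK_END_TOKEN`, which closes a block collection but is not in the list above. `scanFlatMapping` and the `Token` struct are hypothetical. The sketch ignores indentation, quoting, comments, and flow syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical, simplified token: a type name plus an optional scalar value.
type Token struct {
	Type  string
	Value string
}

// scanFlatMapping tokenizes input made only of "key: value" lines
// (no nesting, no quotes, no comments), or a single plain scalar.
func scanFlatMapping(input string) []Token {
	tokens := []Token{{Type: "STREAM_START_TOKEN"}}
	started := false
	for _, line := range strings.Split(input, "\n") {
		if strings.TrimSpace(line) == "" {
			continue // blank lines produce no tokens
		}
		key, value, ok := strings.Cut(line, ": ")
		if !ok {
			// Not a mapping entry: treat the whole line as a plain scalar.
			tokens = append(tokens, Token{Type: "SCALAR_TOKEN", Value: strings.TrimSpace(line)})
			continue
		}
		if !started {
			// The first key at column 0 opens an implicit block mapping.
			tokens = append(tokens, Token{Type: "BLOCK_MAPPING_START_TOKEN"})
			started = true
		}
		tokens = append(tokens,
			Token{Type: "KEY_TOKEN"},
			Token{Type: "SCALAR_TOKEN", Value: strings.TrimSpace(key)},
			Token{Type: "VALUE_TOKEN"},
			Token{Type: "SCALAR_TOKEN", Value: strings.TrimSpace(value)},
		)
	}
	if started {
		// Dedenting back past the mapping closes it.
		tokens = append(tokens, Token{Type: "BLOCK_END_TOKEN"})
	}
	return append(tokens, Token{Type: "STREAM_END_TOKEN"})
}

func main() {
	for _, tok := range scanFlatMapping("key: value\n") {
		fmt.Println(tok.Type, tok.Value)
	}
}
```

With the input `hello`, the same function returns `STREAM_START_TOKEN`, `SCALAR_TOKEN`, `STREAM_END_TOKEN`. That matches the `scan-tokens` example further down.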
**Responsibilities**:
- Character encoding detection (UTF-8, UTF-16LE, UTF-16BE)
- Line break normalization
- Indentation tracking
- Quote and escape sequence handling
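In the libyaml design this package follows, encoding detection inspects the first bytes for a byte order mark and otherwise assumes UTF-8. The sketch below covers only that check. It uses string encoding names in place of the package's `Encoding` constants, and `detectEncoding` is hypothetical. The real reader (`reader.go`) also buffers input and transcodes UTF-16 to UTF-8.

```go
package main

import (
	"bytes"
	"fmt"
)

// Illustrative encoding names; not this package's Encoding constants.
const (
	utf8Encoding    = "UTF-8"
	utf16LEEncoding = "UTF-16LE"
	utf16BEEncoding = "UTF-16BE"
)

// detectEncoding looks for a byte order mark (BOM) at the start of raw.
// It returns the detected encoding and how many BOM bytes to skip.
// Input without a BOM is assumed to be UTF-8.
func detectEncoding(raw []byte) (encoding string, bomLen int) {
	switch {
	case bytes.HasPrefix(raw, []byte{0xFF, 0xFE}):
		return utf16LEEncoding, 2
	case bytes.HasPrefix(raw, []byte{0xFE, 0xFF}):
		return utf16BEEncoding, 2
	case bytes.HasPrefix(raw, []byte{0xEF, 0xBB, 0xBF}):
		return utf8Encoding, 3
	default:
		return utf8Encoding, 0
	}
}

func main() {
	enc, skip := detectEncoding([]byte{0xFF, 0xFE, 'a', 0x00})
	fmt.Println(enc, skip) // UTF-16LE 2
}
```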
### 2. Parser (parser.go)
The parser converts tokens into events following YAML grammar rules.
**Input**: Stream of tokens from Scanner
**Output**: Stream of events
**Event types include**:
- `STREAM_START_EVENT`, `STREAM_END_EVENT` - Stream boundaries
- `DOCUMENT_START_EVENT`, `DOCUMENT_END_EVENT` - Document boundaries
- `SCALAR_EVENT` - Scalar values
- `MAPPING_START_EVENT`, `MAPPING_END_EVENT` - Mapping boundaries
- `SEQUENCE_START_EVENT`, `SEQUENCE_END_EVENT` - Sequence boundaries
- `ALIAS_EVENT` - Anchor references
**Responsibilities**:
- Implementing YAML grammar and validation
- Managing document directives (%YAML, %TAG)
- Reporting anchors and aliases on events (alias resolution is left to later stages)
- Tracking implicit vs explicit markers
- Style preservation (plain, single-quoted, double-quoted, literal, folded)
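For a flat `key: value` input, the parser's job reduces to mapping tokens to events. It also inserts the implicit document boundaries, which never appear as tokens. The sketch below assumes token and event types are plain strings, and `parseFlat` is hypothetical. The real parser is a grammar-driven state machine that also handles sequences, flow collections, directives, anchors, and tags.

```go
package main

import "fmt"

// parseFlat converts the token types for a single implicit document into
// event types. It only understands a flat block mapping or a lone scalar;
// any other token produces an error.
func parseFlat(tokens []string) ([]string, error) {
	var events []string
	inDocument := false
	openDocument := func() {
		if !inDocument {
			// No "---" token was scanned, but the grammar still yields a
			// DOCUMENT-START event before the first node.
			events = append(events, "DOCUMENT_START_EVENT")
			inDocument = true
		}
	}
	for _, tok := range tokens {
		switch tok {
		case "STREAM_START_TOKEN":
			events = append(events, "STREAM_START_EVENT")
		case "BLOCK_MAPPING_START_TOKEN":
			openDocument()
			events = append(events, "MAPPING_START_EVENT")
		case "SCALAR_TOKEN":
			openDocument()
			events = append(events, "SCALAR_EVENT")
		case "KEY_TOKEN", "VALUE_TOKEN":
			// Indicators guide the grammar but emit no event themselves.
		case "BLOCK_END_TOKEN":
			// In this sketch a block end can only close the mapping.
			events = append(events, "MAPPING_END_EVENT")
		case "STREAM_END_TOKEN":
			if inDocument {
				events = append(events, "DOCUMENT_END_EVENT")
			}
			events = append(events, "STREAM_END_EVENT")
		default:
			return nil, fmt.Errorf("unsupported token %s", tok)
		}
	}
	return events, nil
}

func main() {
	tokens := []string{"STREAM_START_TOKEN", "BLOCK_MAPPING_START_TOKEN", "KEY_TOKEN",
		"SCALAR_TOKEN", "VALUE_TOKEN", "SCALAR_TOKEN", "BLOCK_END_TOKEN", "STREAM_END_TOKEN"}
	events, err := parseFlat(tokens)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range events {
		fmt.Println(e)
	}
}
```

The printed sequence is the same one listed in the `parse-events` "Simple mapping" example below.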
### 3. Emitter (emitter.go)
The emitter converts events back into YAML text.
**Input**: Stream of events
**Output**: YAML text
**Responsibilities**:
- Style selection (plain/quoted scalars, block/flow collections)
- Formatting control (canonical mode, indentation, line width)
- Character encoding
- Anchor and tag serialization
- Document marker generation (---, ...)
**Configuration options**:
- `Canonical` - Emit in canonical YAML form
- `Indent` - Indentation width (2-9 spaces)
- `Width` - Line width (-1 for unlimited)
- `Unicode` - Allow unescaped non-ASCII characters in the output
- `LineBreak` - Line break style (LN, CR, CRLN)
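`SetIndent` and `SetWidth` in `api.go` normalize out-of-range values instead of rejecting them. The hypothetical `normalize` function below restates those two rules on a stand-in struct, so they can be checked in isolation. The emitter may apply further adjustments when a stream starts, and this sketch does not model that.

```go
package main

import "fmt"

// emitterConfig is a hypothetical stand-in for the emitter's layout settings.
type emitterConfig struct {
	Indent int // spaces per nesting level
	Width  int // preferred line width; -1 means unlimited
}

// normalize applies the same clamping as Emitter.SetIndent and
// Emitter.SetWidth: an indent outside 2..9 falls back to 2, and any
// negative width is stored as -1 (unlimited).
func normalize(c emitterConfig) emitterConfig {
	if c.Indent < 2 || c.Indent > 9 {
		c.Indent = 2
	}
	if c.Width < 0 {
		c.Width = -1
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", normalize(emitterConfig{Indent: 12, Width: -80})) // {Indent:2 Width:-1}
}
```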
## Testing Framework
### Test Architecture
The testing framework uses a data-driven approach:
1. **Test data** is stored in YAML files in the `testdata/` directory
2. **Test logic** is implemented in Go files (`*_test.go`)
3. **One-to-one pairing**: Each `testdata/foo.yaml` has a corresponding `foo_test.go`
**Benefits**:
- Easy to add new test cases without writing Go code
- Test data is human-readable and self-documenting
- Test logic is reusable across many test cases
- Test data is separated from test code for clarity
- The same test data could serve as a shared suite for other YAML implementations
### Test Data Files
Each YAML file contains test cases for a specific component:
- **scanner.yaml** - Scanner/tokenization tests
  - Token sequence verification
  - Token property validation (value, style)
  - Error detection
- **parser.yaml** - Parser/event generation tests
  - Event sequence verification
  - Event property validation (anchor, tag, value, directives)
  - Error detection
- **emitter.yaml** - Emitter/serialization tests
  - Event-to-YAML conversion
  - Configuration options testing
  - Roundtrip testing (parse → emit)
  - Writer integration
- **api.yaml** - API constructor and method tests
  - Constructor validation
  - Method behavior and state changes
  - Panic conditions
  - Cleanup verification
- **yaml.yaml** - Utility function tests
  - Enum String() methods
  - Style accessor methods
- **reader.yaml** - Reader/input handling tests
  - Encoding detection (UTF-8, UTF-16LE, UTF-16BE)
  - Buffer management
  - Error handling
- **writer.yaml** - Writer/output handling tests
  - Buffer flushing
  - Output handlers (string, io.Writer)
  - Error conditions
- **yamlprivate.yaml** - Character classification tests
  - Character type predicates (isAlpha, isDigit, isHex, etc.)
  - Character conversion functions (asDigit, asHex, width)
  - Unicode handling
- **loader.yaml** - Data loader scalar resolution tests
  - Numeric type resolution (integers, floats)
  - Boolean and null value handling
  - String vs numeric type disambiguation
  - Mixed-type collections
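The character predicates exercised by `yamlprivate.yaml` are small byte-level checks. The sketch below assumes they follow libyaml's definitions, in which "alpha" also covers digits, `_`, and `-`. Its signatures are simplified to a single byte and should be treated as assumptions, since the package's own helpers may take a slice and an index instead. `width` returns the byte length of a UTF-8 sequence from its leading byte.

```go
package main

import "fmt"

// isAlpha reports whether b is "alphanumeric" in the libyaml sense:
// an ASCII letter, an ASCII digit, '_' or '-'.
func isAlpha(b byte) bool {
	return b >= '0' && b <= '9' || b >= 'A' && b <= 'Z' || b >= 'a' && b <= 'z' || b == '_' || b == '-'
}

// isHex reports whether b is an ASCII hexadecimal digit.
func isHex(b byte) bool {
	return b >= '0' && b <= '9' || b >= 'A' && b <= 'F' || b >= 'a' && b <= 'f'
}

// asHex returns the value of a hex digit; callers should check isHex first.
func asHex(b byte) int {
	switch {
	case b >= 'A' && b <= 'F':
		return int(b) - 'A' + 10
	case b >= 'a' && b <= 'f':
		return int(b) - 'a' + 10
	default:
		return int(b) - '0'
	}
}

// width returns the length of a UTF-8 sequence given its leading byte,
// or 0 if b cannot start a sequence (for example, a continuation byte).
func width(b byte) int {
	switch {
	case b&0x80 == 0x00:
		return 1
	case b&0xE0 == 0xC0:
		return 2
	case b&0xF0 == 0xE0:
		return 3
	case b&0xF8 == 0xF0:
		return 4
	default:
		return 0
	}
}

func main() {
	s := "é" // U+00E9 is encoded as 0xC3 0xA9
	fmt.Println(isAlpha('_'), isHex('f'), asHex('f'), width(s[0])) // true true 15 2
}
```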
### Test Framework Implementation
The test framework is implemented in `yamldatatest_loader.go` and `yamldatatest_test.go`:
**Core functions**:
- `LoadYAML(data []byte) (interface{}, error)` - Parses YAML using libyaml parser with scalar type resolution (exported)
- `UnmarshalStruct(target interface{}, data map[string]interface{}) error` - Populates structs (exported)
- `LoadTestCases(filename string) ([]TestCase, error)` - Loads and parses test YAML files
- `coerceScalar(value string) interface{}` - Resolves scalar strings to appropriate Go types (int, float64, bool, nil, string)
**Core types**:
- `TestCase` struct - Umbrella structure containing fields for all test types
  - Uses `interface{}` for flexible field types
  - Post-processing converts generic fields to specific types
**Post-processing**:
After loading, the framework processes test data:
- Converts `Want` (interface{}) to `WantEvents`, `WantTokens`, or `WantSpecs` based on test type
- Converts `Want` (interface{}) to `WantContains` (handles both scalar and sequence)
- Converts `Checks` to field validation specifications
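The `WantContains` conversion and the containment check used by `emit`, `roundtrip`, and `emit-writer` tests can be sketched as follows. `toWantContains` and `missingSubstrings` are hypothetical names. The sketch assumes a decoded sequence arrives as `[]interface{}`. It formats non-string scalars with `fmt.Sprint`, because unquoted values may already have been resolved to numbers.

```go
package main

import (
	"fmt"
	"strings"
)

// toWantContains normalizes a decoded `want` value into expected substrings:
// a scalar becomes a one-element list, and a sequence keeps its order.
func toWantContains(want interface{}) []string {
	switch v := want.(type) {
	case nil:
		return nil
	case string:
		return []string{v}
	case []interface{}:
		out := make([]string, 0, len(v))
		for _, item := range v {
			out = append(out, fmt.Sprint(item))
		}
		return out
	default:
		// e.g. an int produced by scalar resolution
		return []string{fmt.Sprint(v)}
	}
}

// missingSubstrings returns every expected substring absent from output.
func missingSubstrings(output string, want []string) []string {
	var missing []string
	for _, w := range want {
		if !strings.Contains(output, w) {
			missing = append(missing, w)
		}
	}
	return missing
}

func main() {
	want := toWantContains([]interface{}{"key", "value", "item1"})
	out := "key: value\nlist:\n- item1\n- item2\n"
	fmt.Println(missingSubstrings(out, want)) // []
}
```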
### Test Types
#### Scanner Tests
**scan-tokens** - Verify token sequence
```yaml
- scan-tokens:
    name: Simple scalar
    yaml: |-
      hello
    want:
      - STREAM_START_TOKEN
      - SCALAR_TOKEN
      - STREAM_END_TOKEN
```
**scan-tokens-detailed** - Verify token properties
```yaml
- scan-tokens-detailed:
    name: Single quoted scalar
    yaml: |-
      'hello world'
    want:
      - STREAM_START_TOKEN
      - SCALAR_TOKEN:
          style: SINGLE_QUOTED_SCALAR_STYLE
          value: hello world
      - STREAM_END_TOKEN
```
**scan-error** - Verify error detection
```yaml
- scan-error:
    name: Invalid character
    yaml: "\x01"
```
#### Parser Tests
**parse-events** - Verify event sequence
```yaml
- parse-events:
    name: Simple mapping
    yaml: |
      key: value
    want:
      - STREAM_START_EVENT
      - DOCUMENT_START_EVENT
      - MAPPING_START_EVENT
      - SCALAR_EVENT
      - SCALAR_EVENT
      - MAPPING_END_EVENT
      - DOCUMENT_END_EVENT
      - STREAM_END_EVENT
```
**parse-events-detailed** - Verify event properties
```yaml
- parse-events-detailed:
    name: Anchor and alias
    yaml: |
      - &anchor value
      - *anchor
    want:
      - STREAM_START_EVENT
      - DOCUMENT_START_EVENT
      - SEQUENCE_START_EVENT
      - SCALAR_EVENT:
          anchor: anchor
          value: value
      - ALIAS_EVENT:
          anchor: anchor
      - SEQUENCE_END_EVENT
      - DOCUMENT_END_EVENT
      - STREAM_END_EVENT
```
**parse-error** - Verify error detection
```yaml
- parse-error:
    name: Error state
    yaml: |
      key: : invalid
```
#### Emitter Tests
**emit** - Emit events and verify output contains expected strings
```yaml
- emit:
    name: Simple scalar
    data:
      - STREAM_START_EVENT:
          encoding: UTF8_ENCODING
      - DOCUMENT_START_EVENT:
          implicit: true
      - SCALAR_EVENT:
          value: hello
          implicit: true
          style: PLAIN_SCALAR_STYLE
      - DOCUMENT_END_EVENT:
          implicit: true
      - STREAM_END_EVENT
    want: hello
```
**emit-config** - Emit with configuration
```yaml
- emit-config:
    name: Custom indent
    conf:
      indent: 4
    data:
      - STREAM_START_EVENT:
          encoding: UTF8_ENCODING
      - DOCUMENT_START_EVENT:
          implicit: true
      - MAPPING_START_EVENT:
          implicit: true
          style: BLOCK_MAPPING_STYLE
      # ... more events
    want: key
```
**roundtrip** - Parse → emit, verify output
```yaml
- roundtrip:
    name: Roundtrip
    yaml: |
      key: value
      list:
      - item1
      - item2
    want:
      - key
      - value
      - item1
```
**emit-writer** - Emit to io.Writer
```yaml
- emit-writer:
    name: Writer
    data:
      - STREAM_START_EVENT:
          encoding: UTF8_ENCODING
      # ... more events
    want: test
```
#### API Tests
**api-new** - Test constructors
```yaml
- api-new:
    name: New parser
    with: NewParser
    test:
      - nil: [raw-buffer, false]
      - cap: [raw-buffer, 512]
      - nil: [buffer, false]
      - cap: [buffer, 1536]
```
**api-method** - Test methods and field state
```yaml
- api-method:
    name: Parser set input string
    with: NewParser
    byte: true
    call: [SetInputString, 'key: value']
    test:
      - eq: [input, 'key: value']
      - eq: [input-pos, 0]
      - nil: [read-handler, false]
```
**api-panic** - Test methods that should panic
```yaml
- api-panic:
name: Parser set input string twice
with: NewParser
byte: true
init: [SetInputString, first]
call: [SetInputString, second]
want: must set the input source only once
```
**api-delete** - Test cleanup
```yaml
- api-delete:
    name: Parser delete
    with: NewParser
    byte: true
    init: [SetInputString, test]
    test:
      - len: [input, 0]
      - len: [buffer, 0]
```
**api-new-event** - Test event constructors
```yaml
- api-new-event:
    name: New stream start event
    call: [NewStreamStartEvent, UTF8_ENCODING]
    test:
      - eq: [Type, STREAM_START_EVENT]
      - eq: [encoding, UTF8_ENCODING]
```
#### Utility Tests
**enum-string** - Test String() methods of enums
```yaml
- enum-string:
    name: Scalar style plain
    enum: [ScalarStyle, PLAIN_SCALAR_STYLE]
    want: Plain
```
**style-accessor** - Test style accessor methods
```yaml
- style-accessor:
    name: Event scalar style
    test: [ScalarStyle, DOUBLE_QUOTED_SCALAR_STYLE]
```
#### Loader Tests
**scalar-resolution** - Test scalar type resolution
```yaml
- scalar-resolution:
    name: Positive integer
    yaml: "42"
    want: 42
- scalar-resolution:
    name: Negative float
    yaml: "-2.5"
    want: -2.5
```
**Resolution order**:
1. Boolean (true, false)
2. Null (null keyword only)
3. Hexadecimal integer (0x prefix)
4. Float (contains .)
5. Decimal integer
6. String (fallback)
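The resolution order above can be sketched as a chain of checks in which any failed numeric parse falls through to the next rule. `resolveScalar` is a hypothetical re-creation of the documented order, not the package's `coerceScalar`. The loader tests include a scientific-notation case, so the real function likely accepts float forms such as `1e3` that this sketch leaves as strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveScalar applies the documented order: bool, null, hex int,
// float (contains '.'), decimal int, and finally the original string.
func resolveScalar(s string) interface{} {
	// 1. Boolean.
	switch s {
	case "true":
		return true
	case "false":
		return false
	}
	// 2. Null: only the literal keyword, not "~" or an empty value.
	if s == "null" {
		return nil
	}
	// 3. Hexadecimal integer with a 0x prefix.
	if strings.HasPrefix(s, "0x") {
		if n, err := strconv.ParseInt(s[2:], 16, 64); err == nil {
			return int(n)
		}
	}
	// 4. Float: contains '.' and parses as float64.
	if strings.Contains(s, ".") {
		if f, err := strconv.ParseFloat(s, 64); err == nil {
			return f
		}
	}
	// 5. Decimal integer.
	if n, err := strconv.Atoi(s); err == nil {
		return n
	}
	// 6. Fallback: keep the text unchanged.
	return s
}

func main() {
	for _, s := range []string{"42", "-2.5", "0x1F", "true", "null", "hello"} {
		v := resolveScalar(s)
		fmt.Printf("%q -> %T(%v)\n", s, v, v)
	}
}
```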
### Common Keys in Test YAML Files
Test cases use a **type-as-key** format where the test type is the map key:
```yaml
- test-type:
    name: Test case name
    # ... other fields
```
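Decoding the type-as-key format amounts to unwrapping a single-key mapping. The sketch assumes each test case has been decoded into a `map[string]interface{}`, the shape `UnmarshalStruct` accepts. `splitTestCase` is a hypothetical helper, not part of the framework.

```go
package main

import "fmt"

// splitTestCase unpacks one type-as-key entry: a map with exactly one key
// (the test type) whose value is the mapping of test fields.
func splitTestCase(entry map[string]interface{}) (testType string, fields map[string]interface{}, err error) {
	if len(entry) != 1 {
		return "", nil, fmt.Errorf("want exactly one test-type key, got %d", len(entry))
	}
	for k := range entry {
		testType = k // the only key
	}
	fields, ok := entry[testType].(map[string]interface{})
	if !ok {
		return "", nil, fmt.Errorf("fields of %q must be a mapping", testType)
	}
	return testType, fields, nil
}

func main() {
	entry := map[string]interface{}{
		"scan-tokens": map[string]interface{}{"name": "Simple scalar", "yaml": "hello"},
	}
	typ, fields, err := splitTestCase(entry)
	fmt.Println(typ, fields["name"], err) // scan-tokens Simple scalar <nil>
}
```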
**Common fields**:
- **name** - Test case name (sentence case, e.g. `Simple scalar`)
- **yaml** - Input YAML string to test
- **want** - Expected result (format varies by test type)
  - For api-panic: string containing expected panic message substring
  - For scan-error/parse-error: boolean (defaults to true if omitted; set to false if no error expected)
  - For enum-string: string representing expected String() output
  - For other types: varies (may be sequence or scalar)
- **data** - For emitter tests: list of event specifications to emit
- **conf** - For emitter config tests: emitter configuration options
- **with** - For API tests: constructor name (NewParser, NewEmitter)
- **call** - For API tests: method call [MethodName, arg1, arg2, ...]
- **init** - For API panic tests: setup method call before main method
- **byte** - For API tests: boolean flag to convert string args to []byte
- **test** - For API tests: list of field validation checks in format `operator: [field, value]` where operator is one of: nil, cap, len, eq, gte, len-gt.
- **test** - For style-accessor tests: array of [Method, STYLE] where Method is the accessor method (e.g., ScalarStyle) and STYLE is the style constant (e.g., DOUBLE_QUOTED_SCALAR_STYLE).
- **enum** - For enum tests: array of [Type, Value] where Type is the enum type (e.g., ScalarStyle) and Value is the constant (e.g., PLAIN_SCALAR_STYLE)
**Note on scalar type resolution**: Unquoted scalar values in test data are automatically resolved to appropriate Go types (int, float64, bool, nil) by the `LoadYAML` function. Quoted scalars remain as strings.
### Running Tests
```bash
# Run all tests in the package
go test ./internal/libyaml
# Run one component's tests (-run matches top-level test function names)
go test ./internal/libyaml -run TestScanner
go test ./internal/libyaml -run TestParser
go test ./internal/libyaml -run TestEmitter
go test ./internal/libyaml -run TestAPI
go test ./internal/libyaml -run TestYAML
go test ./internal/libyaml -run TestLoader
# Run specific test case (using subtest name)
go test ./internal/libyaml -run TestScanner/Block_sequence
go test ./internal/libyaml -run TestParser/Anchor_and_alias
go test ./internal/libyaml -run TestEmitter/Flow_mapping
go test ./internal/libyaml -run TestLoader/Scientific_notation_lowercase_e
# Run with verbose output
go test -v ./internal/libyaml
# Run with coverage
go test -cover ./internal/libyaml
```
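Subtest names in `-run` patterns use underscores because Go's `testing` package rewrites whitespace in `t.Run` names. The hypothetical helpers below approximate that rewrite. They also build an anchored pattern that matches exactly one case, quoting regexp metacharacters such as parentheses.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// subtestName approximates how the testing package rewrites a t.Run name:
// whitespace becomes '_'. (The real rewrite also escapes non-printable
// runes and appends #01, #02, ... to duplicate names.)
func subtestName(name string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return '_'
		}
		return r
	}, name)
}

// runPattern builds a -run value matching exactly one subtest of top.
// Each '/'-separated element of -run is a separate regular expression.
func runPattern(top, name string) string {
	return fmt.Sprintf("^%s$/^%s$", top, regexp.QuoteMeta(subtestName(name)))
}

func main() {
	fmt.Println(runPattern("TestParser", "Anchor and alias")) // ^TestParser$/^Anchor_and_alias$
}
```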
@@ -0,0 +1,733 @@
// Copyright 2006-2010 Kirill Simonov
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0 AND MIT
// High-level API helpers for parser and emitter initialization and
// configuration.
// Provides convenience functions for token insertion and stream management.
package libyaml
import (
	"io"
)
// insertToken inserts token into the parser's token queue at position pos,
// counted from tokens_head, or appends it when pos is negative. When the
// backing array is full, tokens already consumed before tokens_head are
// discarded first so their space can be reused.
func (parser *Parser) insertToken(pos int, token *Token) {
	// fmt.Println("yaml_insert_token", "pos:", pos, "typ:", token.typ, "head:", parser.tokens_head, "len:", len(parser.tokens))
	// Check if we can move the queue to the beginning of the buffer.
	if parser.tokens_head > 0 && len(parser.tokens) == cap(parser.tokens) {
		if parser.tokens_head != len(parser.tokens) {
			copy(parser.tokens, parser.tokens[parser.tokens_head:])
		}
		parser.tokens = parser.tokens[:len(parser.tokens)-parser.tokens_head]
		parser.tokens_head = 0
	}
	parser.tokens = append(parser.tokens, *token)
	if pos < 0 {
		return
	}
	// Shift the tail right by one and place the token at its position.
	copy(parser.tokens[parser.tokens_head+pos+1:], parser.tokens[parser.tokens_head+pos:])
	parser.tokens[parser.tokens_head+pos] = *token
}
// NewParser creates a new parser object.
func NewParser() Parser {
	return Parser{
		raw_buffer: make([]byte, 0, input_raw_buffer_size),
		buffer:     make([]byte, 0, input_buffer_size),
	}
}
// Delete a parser object.
func (parser *Parser) Delete() {
	*parser = Parser{}
}
// yamlStringReadHandler reads from the in-memory input slice and returns
// io.EOF once every byte has been consumed.
func yamlStringReadHandler(parser *Parser, buffer []byte) (n int, err error) {
	if parser.input_pos == len(parser.input) {
		return 0, io.EOF
	}
	n = copy(buffer, parser.input[parser.input_pos:])
	parser.input_pos += n
	return n, nil
}
// yamlReaderReadHandler reads from parser.input_reader.
func yamlReaderReadHandler(parser *Parser, buffer []byte) (n int, err error) {
	return parser.input_reader.Read(buffer)
}
// SetInputString sets a byte-slice input. It panics if an input source
// has already been set.
func (parser *Parser) SetInputString(input []byte) {
	if parser.read_handler != nil {
		panic("must set the input source only once")
	}
	parser.read_handler = yamlStringReadHandler
	parser.input = input
	parser.input_pos = 0
}
// SetInputReader sets an io.Reader input. It panics if an input source
// has already been set.
func (parser *Parser) SetInputReader(r io.Reader) {
	if parser.read_handler != nil {
		panic("must set the input source only once")
	}
	parser.read_handler = yamlReaderReadHandler
	parser.input_reader = r
}
// SetEncoding sets the source encoding. It panics if the encoding has
// already been set.
func (parser *Parser) SetEncoding(encoding Encoding) {
	if parser.encoding != ANY_ENCODING {
		panic("must set the encoding only once")
	}
	parser.encoding = encoding
}
// GetPendingComments returns the parser's comment queue for CLI access.
func (parser *Parser) GetPendingComments() []Comment {
	return parser.comments
}
// GetCommentsHead returns the current position in the comment queue.
func (parser *Parser) GetCommentsHead() int {
	return parser.comments_head
}
// NewEmitter creates a new emitter object.
func NewEmitter() Emitter {
	return Emitter{
		buffer:     make([]byte, output_buffer_size),
		states:     make([]EmitterState, 0, initial_stack_size),
		events:     make([]Event, 0, initial_queue_size),
		best_width: -1,
	}
}
// Delete an emitter object.
func (emitter *Emitter) Delete() {
	*emitter = Emitter{}
}
// yamlStringWriteHandler appends the emitted text to *emitter.output_buffer.
func yamlStringWriteHandler(emitter *Emitter, buffer []byte) error {
	*emitter.output_buffer = append(*emitter.output_buffer, buffer...)
	return nil
}
// yamlWriterWriteHandler uses emitter.output_writer to write the
// emitted text.
func yamlWriterWriteHandler(emitter *Emitter, buffer []byte) error {
	_, err := emitter.output_writer.Write(buffer)
	return err
}
// SetOutputString sets a byte-slice output: emitted text is appended to
// *output_buffer. It panics if an output target has already been set.
func (emitter *Emitter) SetOutputString(output_buffer *[]byte) {
	if emitter.write_handler != nil {
		panic("must set the output target only once")
	}
	emitter.write_handler = yamlStringWriteHandler
	emitter.output_buffer = output_buffer
}
// SetOutputWriter sets an io.Writer output. It panics if an output target
// has already been set.
func (emitter *Emitter) SetOutputWriter(w io.Writer) {
	if emitter.write_handler != nil {
		panic("must set the output target only once")
	}
	emitter.write_handler = yamlWriterWriteHandler
	emitter.output_writer = w
}
// SetEncoding sets the output encoding. It panics if the encoding has
// already been set.
func (emitter *Emitter) SetEncoding(encoding Encoding) {
	if emitter.encoding != ANY_ENCODING {
		panic("must set the output encoding only once")
	}
	emitter.encoding = encoding
}
// SetCanonical sets the canonical output style.
func (emitter *Emitter) SetCanonical(canonical bool) {
	emitter.canonical = canonical
}
// SetIndent sets the indentation increment. Values outside 2..9 fall
// back to 2.
func (emitter *Emitter) SetIndent(indent int) {
	if indent < 2 || indent > 9 {
		indent = 2
	}
	emitter.BestIndent = indent
}
// SetWidth sets the preferred line width. Any negative value means
// unlimited and is stored as -1.
func (emitter *Emitter) SetWidth(width int) {
	if width < 0 {
		width = -1
	}
	emitter.best_width = width
}
// SetUnicode sets whether unescaped non-ASCII characters are allowed.
func (emitter *Emitter) SetUnicode(unicode bool) {
	emitter.unicode = unicode
}
// SetLineBreak sets the preferred line break character.
func (emitter *Emitter) SetLineBreak(line_break LineBreak) {
	emitter.line_break = line_break
}
///*
// * Destroy a token object.
// */
//
//YAML_DECLARE(void)
//yaml_token_delete(yaml_token_t *token)
//{
// assert(token); // Non-NULL token object expected.
//
// switch (token.type)
// {
// case YAML_TAG_DIRECTIVE_TOKEN:
// yaml_free(token.data.tag_directive.handle);
// yaml_free(token.data.tag_directive.prefix);
// break;
//
// case YAML_ALIAS_TOKEN:
// yaml_free(token.data.alias.value);
// break;
//
// case YAML_ANCHOR_TOKEN:
// yaml_free(token.data.anchor.value);
// break;
//
// case YAML_TAG_TOKEN:
// yaml_free(token.data.tag.handle);
// yaml_free(token.data.tag.suffix);
// break;
//
// case YAML_SCALAR_TOKEN:
// yaml_free(token.data.scalar.value);
// break;
//
// default:
// break;
// }
//
// memset(token, 0, sizeof(yaml_token_t));
//}
//
///*
// * Check if a string is a valid UTF-8 sequence.
// *
// * Check 'reader.c' for more details on UTF-8 encoding.
// */
//
//static int
//yaml_check_utf8(yaml_char_t *start, size_t length)
//{
// yaml_char_t *end = start+length;
// yaml_char_t *pointer = start;
//
// while (pointer < end) {
// unsigned char octet;
// unsigned int width;
// unsigned int value;
// size_t k;
//
// octet = pointer[0];
// width = (octet & 0x80) == 0x00 ? 1 :
// (octet & 0xE0) == 0xC0 ? 2 :
// (octet & 0xF0) == 0xE0 ? 3 :
// (octet & 0xF8) == 0xF0 ? 4 : 0;
// value = (octet & 0x80) == 0x00 ? octet & 0x7F :
// (octet & 0xE0) == 0xC0 ? octet & 0x1F :
// (octet & 0xF0) == 0xE0 ? octet & 0x0F :
// (octet & 0xF8) == 0xF0 ? octet & 0x07 : 0;
// if (!width) return 0;
// if (pointer+width > end) return 0;
// for (k = 1; k < width; k ++) {
// octet = pointer[k];
// if ((octet & 0xC0) != 0x80) return 0;
// value = (value << 6) + (octet & 0x3F);
// }
// if (!((width == 1) ||
// (width == 2 && value >= 0x80) ||
// (width == 3 && value >= 0x800) ||
// (width == 4 && value >= 0x10000))) return 0;
//
// pointer += width;
// }
//
// return 1;
//}
//
// NewStreamStartEvent creates a new STREAM-START event.
func NewStreamStartEvent(encoding Encoding) Event {
	return Event{
		Type:     STREAM_START_EVENT,
		encoding: encoding,
	}
}
// NewStreamEndEvent creates a new STREAM-END event.
func NewStreamEndEvent() Event {
	return Event{
		Type: STREAM_END_EVENT,
	}
}
// NewDocumentStartEvent creates a new DOCUMENT-START event.
func NewDocumentStartEvent(version_directive *VersionDirective, tag_directives []TagDirective, implicit bool) Event {
	return Event{
		Type:             DOCUMENT_START_EVENT,
		versionDirective: version_directive,
		tagDirectives:    tag_directives,
		Implicit:         implicit,
	}
}
// NewDocumentEndEvent creates a new DOCUMENT-END event.
func NewDocumentEndEvent(implicit bool) Event {
	return Event{
		Type:     DOCUMENT_END_EVENT,
		Implicit: implicit,
	}
}
// NewAliasEvent creates a new ALIAS event.
func NewAliasEvent(anchor []byte) Event {
	return Event{
		Type:   ALIAS_EVENT,
		Anchor: anchor,
	}
}
// NewScalarEvent creates a new SCALAR event.
func NewScalarEvent(anchor, tag, value []byte, plain_implicit, quoted_implicit bool, style ScalarStyle) Event {
	return Event{
		Type:            SCALAR_EVENT,
		Anchor:          anchor,
		Tag:             tag,
		Value:           value,
		Implicit:        plain_implicit,
		quoted_implicit: quoted_implicit,
		Style:           Style(style),
	}
}
// NewSequenceStartEvent creates a new SEQUENCE-START event.
func NewSequenceStartEvent(anchor, tag []byte, implicit bool, style SequenceStyle) Event {
	return Event{
		Type:     SEQUENCE_START_EVENT,
		Anchor:   anchor,
		Tag:      tag,
		Implicit: implicit,
		Style:    Style(style),
	}
}
// NewSequenceEndEvent creates a new SEQUENCE-END event.
func NewSequenceEndEvent() Event {
	return Event{
		Type: SEQUENCE_END_EVENT,
	}
}
// NewMappingStartEvent creates a new MAPPING-START event.
func NewMappingStartEvent(anchor, tag []byte, implicit bool, style MappingStyle) Event {
	return Event{
		Type:     MAPPING_START_EVENT,
		Anchor:   anchor,
		Tag:      tag,
		Implicit: implicit,
		Style:    Style(style),
	}
}
// NewMappingEndEvent creates a new MAPPING-END event.
func NewMappingEndEvent() Event {
	return Event{
		Type: MAPPING_END_EVENT,
	}
}
// Delete an event object.
func (e *Event) Delete() {
	*e = Event{}
}
///*
// * Create a document object.
// */
//
//YAML_DECLARE(int)
//yaml_document_initialize(document *yaml_document_t,
// version_directive *yaml_version_directive_t,
// tag_directives_start *yaml_tag_directive_t,
// tag_directives_end *yaml_tag_directive_t,
// start_implicit int, end_implicit int)
//{
// struct {
// error yaml_error_type_t
// } context
// struct {
// start *yaml_node_t
// end *yaml_node_t
// top *yaml_node_t
// } nodes = { NULL, NULL, NULL }
// version_directive_copy *yaml_version_directive_t = NULL
// struct {
// start *yaml_tag_directive_t
// end *yaml_tag_directive_t
// top *yaml_tag_directive_t
// } tag_directives_copy = { NULL, NULL, NULL }
// value yaml_tag_directive_t = { NULL, NULL }
// mark yaml_mark_t = { 0, 0, 0 }
//
// assert(document) // Non-NULL document object is expected.
// assert((tag_directives_start && tag_directives_end) ||
// (tag_directives_start == tag_directives_end))
// // Valid tag directives are expected.
//
// if (!STACK_INIT(&context, nodes, INITIAL_STACK_SIZE)) goto error
//
// if (version_directive) {
// version_directive_copy = yaml_malloc(sizeof(yaml_version_directive_t))
// if (!version_directive_copy) goto error
// version_directive_copy.major = version_directive.major
// version_directive_copy.minor = version_directive.minor
// }
//
// if (tag_directives_start != tag_directives_end) {
// tag_directive *yaml_tag_directive_t
// if (!STACK_INIT(&context, tag_directives_copy, INITIAL_STACK_SIZE))
// goto error
// for (tag_directive = tag_directives_start
// tag_directive != tag_directives_end; tag_directive ++) {
// assert(tag_directive.handle)
// assert(tag_directive.prefix)
// if (!yaml_check_utf8(tag_directive.handle,
// strlen((char *)tag_directive.handle)))
// goto error
// if (!yaml_check_utf8(tag_directive.prefix,
// strlen((char *)tag_directive.prefix)))
// goto error
// value.handle = yaml_strdup(tag_directive.handle)
// value.prefix = yaml_strdup(tag_directive.prefix)
// if (!value.handle || !value.prefix) goto error
// if (!PUSH(&context, tag_directives_copy, value))
// goto error
// value.handle = NULL
// value.prefix = NULL
// }
// }
//
// DOCUMENT_INIT(*document, nodes.start, nodes.end, version_directive_copy,
// tag_directives_copy.start, tag_directives_copy.top,
// start_implicit, end_implicit, mark, mark)
//
// return 1
//
//error:
// STACK_DEL(&context, nodes)
// yaml_free(version_directive_copy)
// while (!STACK_EMPTY(&context, tag_directives_copy)) {
// value yaml_tag_directive_t = POP(&context, tag_directives_copy)
// yaml_free(value.handle)
// yaml_free(value.prefix)
// }
// STACK_DEL(&context, tag_directives_copy)
// yaml_free(value.handle)
// yaml_free(value.prefix)
//
// return 0
//}
//
///*
// * Destroy a document object.
// */
//
//YAML_DECLARE(void)
//yaml_document_delete(document *yaml_document_t)
//{
// struct {
// error yaml_error_type_t
// } context
// tag_directive *yaml_tag_directive_t
//
// context.error = YAML_NO_ERROR // Eliminate a compiler warning.
//
// assert(document) // Non-NULL document object is expected.
//
// while (!STACK_EMPTY(&context, document.nodes)) {
// node yaml_node_t = POP(&context, document.nodes)
// yaml_free(node.tag)
// switch (node.type) {
// case YAML_SCALAR_NODE:
// yaml_free(node.data.scalar.value)
// break
// case YAML_SEQUENCE_NODE:
// STACK_DEL(&context, node.data.sequence.items)
// break
// case YAML_MAPPING_NODE:
// STACK_DEL(&context, node.data.mapping.pairs)
// break
// default:
// assert(0) // Should not happen.
// }
// }
// STACK_DEL(&context, document.nodes)
//
// yaml_free(document.version_directive)
// for (tag_directive = document.tag_directives.start
// tag_directive != document.tag_directives.end
// tag_directive++) {
// yaml_free(tag_directive.handle)
// yaml_free(tag_directive.prefix)
// }
// yaml_free(document.tag_directives.start)
//
// memset(document, 0, sizeof(yaml_document_t))
//}
//
///**
// * Get a document node.
// */
//
//YAML_DECLARE(yaml_node_t *)
//yaml_document_get_node(document *yaml_document_t, index int)
//{
// assert(document) // Non-NULL document object is expected.
//
// if (index > 0 && document.nodes.start + index <= document.nodes.top) {
// return document.nodes.start + index - 1
// }
// return NULL
//}
//
///**
// * Get the root object.
// */
//
//YAML_DECLARE(yaml_node_t *)
//yaml_document_get_root_node(document *yaml_document_t)
//{
// assert(document) // Non-NULL document object is expected.
//
// if (document.nodes.top != document.nodes.start) {
// return document.nodes.start
// }
// return NULL
//}
//
///*
// * Add a scalar node to a document.
// */
//
//YAML_DECLARE(int)
//yaml_document_add_scalar(document *yaml_document_t,
// tag *yaml_char_t, value *yaml_char_t, length int,
// style yaml_scalar_style_t)
//{
// struct {
// error yaml_error_type_t
// } context
// mark yaml_mark_t = { 0, 0, 0 }
// tag_copy *yaml_char_t = NULL
// value_copy *yaml_char_t = NULL
// node yaml_node_t
//
// assert(document) // Non-NULL document object is expected.
// assert(value) // Non-NULL value is expected.
//
// if (!tag) {
// tag = (yaml_char_t *)YAML_DEFAULT_SCALAR_TAG
// }
//
// if (!yaml_check_utf8(tag, strlen((char *)tag))) goto error
// tag_copy = yaml_strdup(tag)
// if (!tag_copy) goto error
//
// if (length < 0) {
// length = strlen((char *)value)
// }
//
// if (!yaml_check_utf8(value, length)) goto error
// value_copy = yaml_malloc(length+1)
// if (!value_copy) goto error
// memcpy(value_copy, value, length)
// value_copy[length] = '\0'
//
// SCALAR_NODE_INIT(node, tag_copy, value_copy, length, style, mark, mark)
// if (!PUSH(&context, document.nodes, node)) goto error
//
// return document.nodes.top - document.nodes.start
//
//error:
// yaml_free(tag_copy)
// yaml_free(value_copy)
//
// return 0
//}
//
///*
// * Add a sequence node to a document.
// */
//
//YAML_DECLARE(int)
//yaml_document_add_sequence(document *yaml_document_t,
// tag *yaml_char_t, style yaml_sequence_style_t)
//{
// struct {
// error yaml_error_type_t
// } context
// mark yaml_mark_t = { 0, 0, 0 }
// tag_copy *yaml_char_t = NULL
// struct {
// start *yaml_node_item_t
// end *yaml_node_item_t
// top *yaml_node_item_t
// } items = { NULL, NULL, NULL }
// node yaml_node_t
//
// assert(document) // Non-NULL document object is expected.
//
// if (!tag) {
// tag = (yaml_char_t *)YAML_DEFAULT_SEQUENCE_TAG
// }
//
// if (!yaml_check_utf8(tag, strlen((char *)tag))) goto error
// tag_copy = yaml_strdup(tag)
// if (!tag_copy) goto error
//
// if (!STACK_INIT(&context, items, INITIAL_STACK_SIZE)) goto error
//
// SEQUENCE_NODE_INIT(node, tag_copy, items.start, items.end,
// style, mark, mark)
// if (!PUSH(&context, document.nodes, node)) goto error
//
// return document.nodes.top - document.nodes.start
//
//error:
// STACK_DEL(&context, items)
// yaml_free(tag_copy)
//
// return 0
//}
//
///*
// * Add a mapping node to a document.
// */
//
//YAML_DECLARE(int)
//yaml_document_add_mapping(document *yaml_document_t,
// tag *yaml_char_t, style yaml_mapping_style_t)
//{
// struct {
// error yaml_error_type_t
// } context
// mark yaml_mark_t = { 0, 0, 0 }
// tag_copy *yaml_char_t = NULL
// struct {
// start *yaml_node_pair_t
// end *yaml_node_pair_t
// top *yaml_node_pair_t
// } pairs = { NULL, NULL, NULL }
// node yaml_node_t
//
// assert(document) // Non-NULL document object is expected.
//
// if (!tag) {
// tag = (yaml_char_t *)YAML_DEFAULT_MAPPING_TAG
// }
//
// if (!yaml_check_utf8(tag, strlen((char *)tag))) goto error
// tag_copy = yaml_strdup(tag)
// if (!tag_copy) goto error
//
// if (!STACK_INIT(&context, pairs, INITIAL_STACK_SIZE)) goto error
//
// MAPPING_NODE_INIT(node, tag_copy, pairs.start, pairs.end,
// style, mark, mark)
// if (!PUSH(&context, document.nodes, node)) goto error
//
// return document.nodes.top - document.nodes.start
//
//error:
// STACK_DEL(&context, pairs)
// yaml_free(tag_copy)
//
// return 0
//}
//
///*
// * Append an item to a sequence node.
// */
//
//YAML_DECLARE(int)
//yaml_document_append_sequence_item(document *yaml_document_t,
// sequence int, item int)
//{
// struct {
// error yaml_error_type_t
// } context
//
// assert(document) // Non-NULL document is required.
// assert(sequence > 0
// && document.nodes.start + sequence <= document.nodes.top)
// // Valid sequence id is required.
// assert(document.nodes.start[sequence-1].type == YAML_SEQUENCE_NODE)
// // A sequence node is required.
// assert(item > 0 && document.nodes.start + item <= document.nodes.top)
// // Valid item id is required.
//
// if (!PUSH(&context,
// document.nodes.start[sequence-1].data.sequence.items, item))
// return 0
//
// return 1
//}
//
///*
// * Append a pair of a key and a value to a mapping node.
// */
//
//YAML_DECLARE(int)
//yaml_document_append_mapping_pair(document *yaml_document_t,
// mapping int, key int, value int)
//{
// struct {
// error yaml_error_type_t
// } context
//
// pair yaml_node_pair_t
//
// assert(document) // Non-NULL document is required.
// assert(mapping > 0
// && document.nodes.start + mapping <= document.nodes.top)
// // Valid mapping id is required.
// assert(document.nodes.start[mapping-1].type == YAML_MAPPING_NODE)
// // A mapping node is required.
// assert(key > 0 && document.nodes.start + key <= document.nodes.top)
// // Valid key id is required.
// assert(value > 0 && document.nodes.start + value <= document.nodes.top)
// // Valid value id is required.
//
// pair.key = key
// pair.value = value
//
// if (!PUSH(&context,
// document.nodes.start[mapping-1].data.mapping.pairs, pair))
// return 0
//
// return 1
//}
//
//
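The commented-out C reference above identifies nodes by 1-based ids into `document.nodes`: each id is checked with `> 0` and `start + id <= top`, and the node is read as `start[id-1]`. Below is a minimal Go sketch of that bookkeeping. The names `document`, `node`, `pair`, and `appendPair` are hypothetical, and the sketch returns errors where the C code uses `assert`.

```go
package main

import (
	"errors"
	"fmt"
)

type nodeKind int

const (
	scalarNode nodeKind = iota
	mappingNode
)

// pair holds node ids, not node values, as yaml_node_pair_t does.
type pair struct{ Key, Value int }

type node struct {
	Kind  nodeKind
	Pairs []pair
}

// document stores nodes in a slice. Ids are 1-based: id n refers to
// nodes[n-1], and 0 is never a valid id.
type document struct{ nodes []node }

// add appends a node and returns its id, like "top - start" in the C code.
func (d *document) add(k nodeKind) int {
	d.nodes = append(d.nodes, node{Kind: k})
	return len(d.nodes)
}

func (d *document) valid(id int) bool { return id > 0 && id <= len(d.nodes) }

// appendPair reports errors where the C code asserts.
func (d *document) appendPair(mapping, key, value int) error {
	if !d.valid(mapping) || d.nodes[mapping-1].Kind != mappingNode {
		return errors.New("mapping id must refer to a mapping node")
	}
	if !d.valid(key) || !d.valid(value) {
		return errors.New("key and value ids must be valid")
	}
	d.nodes[mapping-1].Pairs = append(d.nodes[mapping-1].Pairs, pair{key, value})
	return nil
}

func main() {
	var doc document
	m := doc.add(mappingNode)
	k := doc.add(scalarNode)
	v := doc.add(scalarNode)
	fmt.Println(m, k, v, doc.appendPair(m, k, v)) // 1 2 3 <nil>
	fmt.Println(doc.appendPair(k, k, v) != nil)    // true: k is not a mapping
}
```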
@@ -0,0 +1,362 @@
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// Composer stage: Builds a node tree from a libyaml event stream.
// Handles document structure, anchors, and comment attachment.
package libyaml
import (
"fmt"
"io"
)
// Composer produces a node tree out of a libyaml event stream.
type Composer struct {
Parser Parser
event Event
doc *Node
anchors map[string]*Node
doneInit bool
Textless bool
streamNodes bool // enable stream node emission
returnStream bool // flag to return stream node next
atStreamEnd bool // at stream end
encoding Encoding // stream encoding from STREAM_START
}
// NewComposer creates a new composer from a byte slice.
func NewComposer(b []byte) *Composer {
p := Composer{
Parser: NewParser(),
}
if len(b) == 0 {
b = []byte{'\n'}
}
p.Parser.SetInputString(b)
return &p
}
// NewComposerFromReader creates a new composer from an io.Reader.
func NewComposerFromReader(r io.Reader) *Composer {
p := Composer{
Parser: NewParser(),
}
p.Parser.SetInputReader(r)
return &p
}
func (c *Composer) init() {
if c.doneInit {
return
}
c.anchors = make(map[string]*Node)
// Peek to get the encoding from STREAM_START_EVENT
if c.peek() == STREAM_START_EVENT {
c.encoding = c.event.GetEncoding()
}
c.expect(STREAM_START_EVENT)
c.doneInit = true
// If stream nodes are enabled, prepare to return the first stream node
if c.streamNodes {
c.returnStream = true
}
}
// Destroy releases the buffered event, if any, and the underlying parser.
func (c *Composer) Destroy() {
if c.event.Type != NO_EVENT {
c.event.Delete()
}
c.Parser.Delete()
}
// SetStreamNodes enables or disables stream node emission.
func (c *Composer) SetStreamNodes(enable bool) {
c.streamNodes = enable
}
// expect consumes an event from the event stream and
// checks that it's of the expected type.
func (c *Composer) expect(e EventType) {
if c.event.Type == NO_EVENT {
if err := c.Parser.Parse(&c.event); err != nil {
c.fail(err)
}
}
if c.event.Type == STREAM_END_EVENT {
failf("attempted to go past the end of stream; corrupted value?")
}
if c.event.Type != e {
c.fail(fmt.Errorf("expected %s event but got %s", e, c.event.Type))
}
c.event.Delete()
c.event.Type = NO_EVENT
}
// peek peeks at the next event in the event stream,
// puts the results into c.event and returns the event type.
func (c *Composer) peek() EventType {
if c.event.Type != NO_EVENT {
return c.event.Type
}
// Historically, the underlying API generally returned a positive result
// on success, but in this case returned true in an error scenario. This
// was the source of bugs in the past (issue #666), so the error returned
// by Parse is checked explicitly.
if err := c.Parser.Parse(&c.event); err != nil {
c.fail(err)
}
return c.event.Type
}
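The `peek`/`expect` pair above implements a one-event lookahead buffer. `peek` fills the buffer only when it is empty, and `expect` consumes the buffered event after checking its type. The following is a minimal sketch over a hypothetical slice-backed event source. `EventType`, `composer`, and `next` are illustrative stand-ins, not the libyaml API, and the sketch returns errors where the real code panics.

```go
package main

import "fmt"

// EventType is a hypothetical stand-in for libyaml's event type enum.
type EventType int

const (
	NoEvent EventType = iota
	StreamStart
	Scalar
	StreamEnd
)

// composer holds a one-event lookahead buffer over a slice of events.
type composer struct {
	events  []EventType // remaining events from the "parser"
	current EventType   // buffered event; NoEvent when empty
}

// next pulls one event from the source (stands in for Parser.Parse).
func (c *composer) next() EventType {
	if len(c.events) == 0 {
		return StreamEnd
	}
	e := c.events[0]
	c.events = c.events[1:]
	return e
}

// peek fills the buffer if it is empty and reports the buffered event type
// without consuming it.
func (c *composer) peek() EventType {
	if c.current == NoEvent {
		c.current = c.next()
	}
	return c.current
}

// expect consumes the buffered event, or returns an error on a type mismatch.
func (c *composer) expect(want EventType) error {
	got := c.peek()
	if got != want {
		return fmt.Errorf("expected event %d but got %d", want, got)
	}
	c.current = NoEvent // mark the buffer empty
	return nil
}

func main() {
	c := &composer{events: []EventType{StreamStart, Scalar}}
	fmt.Println(c.peek() == StreamStart) // true: peek does not consume
	fmt.Println(c.expect(StreamStart))   // <nil>
	fmt.Println(c.expect(StreamEnd))     // expected event 3 but got 2
}
```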
func (c *Composer) fail(err error) {
Fail(err)
}
func (c *Composer) anchor(n *Node, anchor []byte) {
if anchor != nil {
n.Anchor = string(anchor)
c.anchors[n.Anchor] = n
}
}
// Parse parses the next YAML node from the event stream.
func (c *Composer) Parse() *Node {
c.init()
// Handle stream nodes if enabled
if c.streamNodes {
// Check for stream end first
if c.peek() == STREAM_END_EVENT {
// If we haven't returned the final stream node yet, return it now
if !c.atStreamEnd {
c.atStreamEnd = true
return c.createStreamNode()
}
// Already returned final stream node
return nil
}
// Check if we should return a stream node before the next document
if c.returnStream {
c.returnStream = false
n := c.createStreamNode()
// Capture directives from upcoming document
c.captureDirectives(n)
return n
}
}
switch c.peek() {
case SCALAR_EVENT:
return c.scalar()
case ALIAS_EVENT:
return c.alias()
case MAPPING_START_EVENT:
return c.mapping()
case SEQUENCE_START_EVENT:
return c.sequence()
case DOCUMENT_START_EVENT:
return c.document()
case STREAM_END_EVENT:
// Happens when attempting to decode an empty buffer (when not using stream nodes).
return nil
case TAIL_COMMENT_EVENT:
panic("internal error: unexpected tail comment event (please report)")
default:
panic("internal error: attempted to parse unknown event (please report): " + c.event.Type.String())
}
}
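With stream nodes enabled, `init` arms `returnStream`, `document` re-arms it after each document, and the `STREAM_END_EVENT` check (guarded by `atStreamEnd`) runs first. Together these produce the interleaving `[Stream, Doc, Stream, ..., Stream]`. The sketch below models only those two flags over a hypothetical count of remaining documents. `iterator` and `collect` are illustrative names.

```go
package main

import "fmt"

// Kind is a hypothetical node kind for this sketch.
type Kind string

const (
	StreamNode   Kind = "stream"
	DocumentNode Kind = "doc"
)

// iterator mimics the returnStream/atStreamEnd flags of Composer.Parse.
type iterator struct {
	docsLeft     int
	returnStream bool
	atStreamEnd  bool
}

func newIterator(docs int) *iterator {
	// init() arms returnStream so the first call yields a stream node.
	return &iterator{docsLeft: docs, returnStream: true}
}

// next returns the next node kind, or "" once the stream is exhausted.
func (it *iterator) next() Kind {
	if it.docsLeft == 0 { // plays the role of peeking STREAM_END_EVENT
		if !it.atStreamEnd {
			it.atStreamEnd = true
			return StreamNode
		}
		return ""
	}
	if it.returnStream {
		it.returnStream = false
		return StreamNode
	}
	it.docsLeft--
	it.returnStream = true // document() re-arms the flag
	return DocumentNode
}

func collect(docs int) []Kind {
	it := newIterator(docs)
	var out []Kind
	for k := it.next(); k != ""; k = it.next() {
		out = append(out, k)
	}
	return out
}

func main() {
	fmt.Println(collect(0)) // [stream]
	fmt.Println(collect(2)) // [stream doc stream doc stream]
}
```

Because the stream-end check runs before the `returnStream` check, the stream node queued after the last document and the final stream node are emitted only once.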
func (c *Composer) node(kind Kind, defaultTag, tag, value string) *Node {
var style Style
if tag != "" && tag != "!" {
// Normalize tag to short form (e.g., tag:yaml.org,2002:str -> !!str)
tag = shortTag(tag)
style = TaggedStyle
} else if defaultTag != "" {
tag = defaultTag
} else if kind == ScalarNode {
// Delegate to resolver to determine tag from value
tag, _ = resolve("", value)
}
n := &Node{
Kind: kind,
Tag: tag,
Value: value,
Style: style,
}
if !c.Textless {
n.Line = c.event.StartMark.Line + 1
n.Column = c.event.StartMark.Column + 1
n.HeadComment = string(c.event.HeadComment)
n.LineComment = string(c.event.LineComment)
n.FootComment = string(c.event.FootComment)
}
return n
}
func (c *Composer) parseChild(parent *Node) *Node {
child := c.Parse()
parent.Content = append(parent.Content, child)
return child
}
func (c *Composer) document() *Node {
n := c.node(DocumentNode, "", "", "")
c.doc = n
c.expect(DOCUMENT_START_EVENT)
c.parseChild(n)
if c.peek() == DOCUMENT_END_EVENT {
n.FootComment = string(c.event.FootComment)
}
c.expect(DOCUMENT_END_EVENT)
// If stream nodes enabled, prepare to return a stream node next
if c.streamNodes {
c.returnStream = true
}
return n
}
func (c *Composer) createStreamNode() *Node {
n := &Node{
Kind: StreamNode,
Encoding: c.encoding,
}
if !c.Textless && c.event.Type != NO_EVENT {
n.Line = c.event.StartMark.Line + 1
n.Column = c.event.StartMark.Column + 1
}
return n
}
// captureDirectives captures version and tag directives from upcoming DOCUMENT_START.
func (c *Composer) captureDirectives(n *Node) {
if c.peek() == DOCUMENT_START_EVENT {
if vd := c.event.GetVersionDirective(); vd != nil {
n.Version = &StreamVersionDirective{
Major: vd.Major(),
Minor: vd.Minor(),
}
}
if tds := c.event.GetTagDirectives(); len(tds) > 0 {
n.TagDirectives = make([]StreamTagDirective, len(tds))
for i, td := range tds {
n.TagDirectives[i] = StreamTagDirective{
Handle: td.GetHandle(),
Prefix: td.GetPrefix(),
}
}
}
}
}
func (c *Composer) alias() *Node {
n := c.node(AliasNode, "", "", string(c.event.Anchor))
n.Alias = c.anchors[n.Value]
if n.Alias == nil {
msg := fmt.Sprintf("unknown anchor '%s' referenced", n.Value)
Fail(&ParserError{
Message: msg,
Mark: Mark{
Line: n.Line,
Column: n.Column,
},
})
}
c.expect(ALIAS_EVENT)
return n
}
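`anchor` records every anchored node in `c.anchors`, overwriting earlier entries with the same name. `alias` looks the name up and fails if it was never defined earlier in the stream. The following is a minimal sketch of that registry with a hypothetical pared-down `node` type. The sketch returns an error rather than calling `Fail`.

```go
package main

import "fmt"

// node is a hypothetical, pared-down tree node for this sketch.
type node struct {
	Value  string
	Anchor string
	Alias  *node
}

// registry mirrors Composer.anchors: the most recent node defined with a
// given anchor name wins.
type registry map[string]*node

func (r registry) define(n *node, anchor string) {
	if anchor != "" {
		n.Anchor = anchor
		r[anchor] = n
	}
}

// resolve builds an alias node, failing for anchors not yet defined.
func (r registry) resolve(name string) (*node, error) {
	target, ok := r[name]
	if !ok {
		return nil, fmt.Errorf("unknown anchor '%s' referenced", name)
	}
	return &node{Value: name, Alias: target}, nil
}

func main() {
	r := registry{}
	r.define(&node{Value: "x"}, "a")
	alias, err := r.resolve("a")
	fmt.Println(alias.Alias.Value, err) // x <nil>
	_, err = r.resolve("missing")
	fmt.Println(err) // unknown anchor 'missing' referenced
}
```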
func (c *Composer) scalar() *Node {
parsedStyle := c.event.ScalarStyle()
var nodeStyle Style
switch {
case parsedStyle&DOUBLE_QUOTED_SCALAR_STYLE != 0:
nodeStyle = DoubleQuotedStyle
case parsedStyle&SINGLE_QUOTED_SCALAR_STYLE != 0:
nodeStyle = SingleQuotedStyle
case parsedStyle&LITERAL_SCALAR_STYLE != 0:
nodeStyle = LiteralStyle
case parsedStyle&FOLDED_SCALAR_STYLE != 0:
nodeStyle = FoldedStyle
}
nodeValue := string(c.event.Value)
nodeTag := string(c.event.Tag)
var defaultTag string
if nodeStyle != 0 {
defaultTag = strTag
}
n := c.node(ScalarNode, defaultTag, nodeTag, nodeValue)
n.Style |= nodeStyle
c.anchor(n, c.event.Anchor)
c.expect(SCALAR_EVENT)
return n
}
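For an untagged scalar, `scalar` passes `strTag` as the default tag whenever the parsed style is quoted, literal, or folded. Only plain scalars reach the resolver. The sketch below shows that decision with hypothetical style flags. `simpleResolve` is a deliberately tiny stand-in that is much narrower than the real `resolve`. Explicit tags, which take precedence in `node`, are left out.

```go
package main

import (
	"fmt"
	"strconv"
)

// scalarStyle flags are hypothetical stand-ins for libyaml's constants.
type scalarStyle uint8

const (
	stylePlain scalarStyle = 1 << iota
	styleSingleQuoted
	styleDoubleQuoted
	styleLiteral
	styleFolded
)

// simpleResolve only recognizes null, booleans and decimal integers.
func simpleResolve(v string) string {
	switch v {
	case "", "~", "null":
		return "!!null"
	case "true", "false":
		return "!!bool"
	}
	if _, err := strconv.Atoi(v); err == nil {
		return "!!int"
	}
	return "!!str"
}

// scalarTag picks the tag for an untagged scalar: any non-plain style
// forces !!str, otherwise the value is resolved implicitly.
func scalarTag(style scalarStyle, value string) string {
	if style&(styleSingleQuoted|styleDoubleQuoted|styleLiteral|styleFolded) != 0 {
		return "!!str"
	}
	return simpleResolve(value)
}

func main() {
	fmt.Println(scalarTag(stylePlain, "42"))        // !!int
	fmt.Println(scalarTag(styleDoubleQuoted, "42")) // !!str
	fmt.Println(scalarTag(stylePlain, "true"))      // !!bool
}
```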
func (c *Composer) sequence() *Node {
n := c.node(SequenceNode, seqTag, string(c.event.Tag), "")
if c.event.SequenceStyle()&FLOW_SEQUENCE_STYLE != 0 {
n.Style |= FlowStyle
}
c.anchor(n, c.event.Anchor)
c.expect(SEQUENCE_START_EVENT)
for c.peek() != SEQUENCE_END_EVENT {
c.parseChild(n)
}
n.LineComment = string(c.event.LineComment)
n.FootComment = string(c.event.FootComment)
c.expect(SEQUENCE_END_EVENT)
return n
}
func (c *Composer) mapping() *Node {
n := c.node(MappingNode, mapTag, string(c.event.Tag), "")
block := true
if c.event.MappingStyle()&FLOW_MAPPING_STYLE != 0 {
block = false
n.Style |= FlowStyle
}
c.anchor(n, c.event.Anchor)
c.expect(MAPPING_START_EVENT)
for c.peek() != MAPPING_END_EVENT {
k := c.parseChild(n)
if block && k.FootComment != "" {
// Must be a foot comment for the prior value when being dedented.
if len(n.Content) > 2 {
n.Content[len(n.Content)-3].FootComment = k.FootComment
k.FootComment = ""
}
}
v := c.parseChild(n)
if k.FootComment == "" && v.FootComment != "" {
k.FootComment = v.FootComment
v.FootComment = ""
}
if c.peek() == TAIL_COMMENT_EVENT {
if k.FootComment == "" {
k.FootComment = string(c.event.FootComment)
}
c.expect(TAIL_COMMENT_EVENT)
}
}
n.LineComment = string(c.event.LineComment)
n.FootComment = string(c.event.FootComment)
if n.Style&FlowStyle == 0 && n.FootComment != "" && len(n.Content) > 1 {
n.Content[len(n.Content)-2].FootComment = n.FootComment
n.FootComment = ""
}
c.expect(MAPPING_END_EVENT)
return n
}
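At the end of `mapping`, a foot comment reported on a block mapping is re-attached to its last key. `Content` stores keys and values alternately, so the last key sits at `len(Content)-2`. The sketch below isolates that step using a hypothetical pared-down `node` with a `Flow` flag in place of `Style&FlowStyle`.

```go
package main

import "fmt"

// node is a hypothetical, pared-down node for this sketch.
type node struct {
	Value       string
	FootComment string
	Flow        bool
	Content     []*node // alternating key, value, key, value, ...
}

// moveFootComment moves a block mapping's foot comment to its last key.
func moveFootComment(m *node) {
	if !m.Flow && m.FootComment != "" && len(m.Content) > 1 {
		m.Content[len(m.Content)-2].FootComment = m.FootComment
		m.FootComment = ""
	}
}

func main() {
	m := &node{
		FootComment: "# trailing",
		Content:     []*node{{Value: "a"}, {Value: "1"}, {Value: "b"}, {Value: "2"}},
	}
	moveFootComment(m)
	fmt.Printf("%q %q\n", m.FootComment, m.Content[2].FootComment) // "" "# trailing"
}
```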
// Fail aborts processing by panicking with err wrapped in a *YAMLError.
func Fail(err error) {
panic(&YAMLError{err})
}
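// failf is like Fail, but builds the error from a format string and
// prefixes the message with "yaml: ".
func failf(format string, args ...any) {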
panic(&YAMLError{fmt.Errorf("yaml: "+format, args...)})
}
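`Fail` and `failf` report errors by panicking with a `*YAMLError`. Entry points such as `Node.Decode` then use `defer handleErr(&err)` to turn that panic back into a returned error. `handleErr` itself is not shown in this file, so the version below is a hypothetical sketch of the usual pattern. It recovers only the marker type and re-panics on anything else, so genuine bugs still crash.

```go
package main

import (
	"errors"
	"fmt"
)

// yamlError mirrors YAMLError: it marks panics that were raised on purpose.
type yamlError struct{ Err error }

// fail mirrors Fail.
func fail(err error) { panic(&yamlError{err}) }

// handleErr is a hypothetical recover helper: it converts a *yamlError
// panic into an ordinary error and re-panics on any other value.
func handleErr(err *error) {
	if v := recover(); v != nil {
		if e, ok := v.(*yamlError); ok {
			*err = e.Err
			return
		}
		panic(v)
	}
}

// decode is a hypothetical entry point that may fail deep inside processing.
func decode(bad bool) (err error) {
	defer handleErr(&err)
	if bad {
		fail(errors.New("yaml: something went wrong"))
	}
	return nil
}

func main() {
	fmt.Println(decode(false)) // <nil>
	fmt.Println(decode(true))  // yaml: something went wrong
}
```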
File diff suppressed because it is too large
@@ -0,0 +1,8 @@
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// Package libyaml contains internal helpers for working with YAML.
//
// It's a reworked version of the original libyaml package from go-yaml v2/v3,
// adapted to work idiomatically in Go.
package libyaml
File diff suppressed because it is too large
@@ -0,0 +1,171 @@
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// Error types for YAML parsing and emitting.
// Provides structured error reporting with line/column information.
package libyaml
import (
"errors"
"fmt"
"strings"
)
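// MarkedYAMLError is an error annotated with the position (Mark) where it
// occurred and, optionally, a context message with its own position.
type MarkedYAMLError struct {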
// optional context
ContextMark Mark
ContextMessage string
Mark Mark
Message string
}
func (e MarkedYAMLError) Error() string {
var builder strings.Builder
builder.WriteString("yaml: ")
if len(e.ContextMessage) > 0 {
fmt.Fprintf(&builder, "%s at %s: ", e.ContextMessage, e.ContextMark)
}
if len(e.ContextMessage) == 0 || e.ContextMark != e.Mark {
fmt.Fprintf(&builder, "%s: ", e.Mark)
}
builder.WriteString(e.Message)
return builder.String()
}
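`MarkedYAMLError.Error` prints the context first. It repeats the problem mark only when there is no context or when the two marks differ. The sketch below reproduces that logic. Its `Mark` type and the `"line L, column C"` format of `Mark.String` are assumptions, because the real `Mark` is defined elsewhere.

```go
package main

import (
	"fmt"
	"strings"
)

// Mark is a hypothetical position type with an assumed String format.
type Mark struct{ Line, Column int }

func (m Mark) String() string { return fmt.Sprintf("line %d, column %d", m.Line, m.Column) }

// markedError mirrors the fields and Error logic of MarkedYAMLError.
type markedError struct {
	ContextMark    Mark
	ContextMessage string
	Mark           Mark
	Message        string
}

func (e markedError) Error() string {
	var b strings.Builder
	b.WriteString("yaml: ")
	if e.ContextMessage != "" {
		fmt.Fprintf(&b, "%s at %s: ", e.ContextMessage, e.ContextMark)
	}
	// Skip the problem mark when it would just repeat the context mark.
	if e.ContextMessage == "" || e.ContextMark != e.Mark {
		fmt.Fprintf(&b, "%s: ", e.Mark)
	}
	b.WriteString(e.Message)
	return b.String()
}

func main() {
	fmt.Println(markedError{Mark: Mark{3, 5}, Message: "did not find expected key"})
	// yaml: line 3, column 5: did not find expected key
	fmt.Println(markedError{
		ContextMessage: "while parsing a block mapping", ContextMark: Mark{1, 1},
		Mark: Mark{3, 5}, Message: "did not find expected key",
	})
	// yaml: while parsing a block mapping at line 1, column 1: line 3, column 5: did not find expected key
}
```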
type ParserError MarkedYAMLError
func (e ParserError) Error() string {
return MarkedYAMLError(e).Error()
}
type ScannerError MarkedYAMLError
func (e ScannerError) Error() string {
return MarkedYAMLError(e).Error()
}
type ReaderError struct {
Offset int
Value int
Err error
}
func (e ReaderError) Error() string {
return fmt.Sprintf("yaml: offset %d: %s", e.Offset, e.Err)
}
func (e ReaderError) Unwrap() error {
return e.Err
}
type EmitterError struct {
Message string
}
func (e EmitterError) Error() string {
return fmt.Sprintf("yaml: %s", e.Message)
}
type WriterError struct {
Err error
}
func (e WriterError) Error() string {
return fmt.Sprintf("yaml: %s", e.Err)
}
func (e WriterError) Unwrap() error {
return e.Err
}
// ConstructError represents a single, non-fatal error that occurred while
// constructing a Go value from a YAML document.
type ConstructError struct {
Err error
Line int
Column int
}
func (e *ConstructError) Error() string {
return fmt.Sprintf("line %d: %s", e.Line, e.Err.Error())
}
func (e *ConstructError) Unwrap() error {
return e.Err
}
// LoadErrors is returned when one or more fields cannot be properly decoded.
type LoadErrors struct {
Errors []*ConstructError
}
func (e *LoadErrors) Error() string {
var b strings.Builder
b.WriteString("yaml: construct errors:")
for _, err := range e.Errors {
b.WriteString("\n ")
b.WriteString(err.Error())
}
return b.String()
}
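// As implements errors.As for Go versions prior to 1.20 that don't support
// the Unwrap() []error interface. It allows [LoadErrors] to match against
// *ConstructError targets by returning the first error in the list. For
// compatibility, it also matches *TypeError targets, which receive the
// messages of all errors in the list.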
func (e *LoadErrors) As(target any) bool {
switch t := target.(type) {
case **ConstructError:
if len(e.Errors) == 0 {
return false
}
*t = e.Errors[0]
return true
case **TypeError:
var msgs []string
for _, err := range e.Errors {
msgs = append(msgs, err.Error())
}
*t = &TypeError{Errors: msgs}
return true
}
return false
}
// Is implements errors.Is for Go versions prior to 1.20 that don't support
// the Unwrap() []error interface. It checks if any wrapped error matches
// the target error.
func (e *LoadErrors) Is(target error) bool {
for _, err := range e.Errors {
if errors.Is(err, target) {
return true
}
}
return false
}
// TypeError is an obsolete error type retained for compatibility.
//
// A TypeError is returned by Unmarshal when one or more fields in
// the YAML document cannot be properly decoded into the requested
// types. When this error is returned, the value is still
// unmarshaled partially.
//
// Deprecated: Use [LoadErrors] instead.
type TypeError struct {
Errors []string
}
func (e *TypeError) Error() string {
return fmt.Sprintf("yaml: unmarshal errors:\n %s", strings.Join(e.Errors, "\n "))
}
// YAMLError is an internal error wrapper type.
type YAMLError struct {
Err error
}
func (e *YAMLError) Error() string {
return e.Err.Error()
}
@@ -0,0 +1,363 @@
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// Node types and constants for YAML tree representation.
// Defines Kind, Style, and Node structure for intermediate YAML representation.
package libyaml
import (
"reflect"
"strings"
"unicode"
"unicode/utf8"
)
// Tag constants for YAML types
const (
nullTag = "!!null"
boolTag = "!!bool"
strTag = "!!str"
intTag = "!!int"
floatTag = "!!float"
timestampTag = "!!timestamp"
seqTag = "!!seq"
mapTag = "!!map"
binaryTag = "!!binary"
mergeTag = "!!merge"
)
const longTagPrefix = "tag:yaml.org,2002:"
var (
longTags = make(map[string]string)
shortTags = make(map[string]string)
)
func init() {
for _, stag := range []string{nullTag, boolTag, strTag, intTag, floatTag, timestampTag, seqTag, mapTag, binaryTag, mergeTag} {
ltag := longTag(stag)
longTags[stag] = ltag
shortTags[ltag] = stag
}
}
func shortTag(tag string) string {
if strings.HasPrefix(tag, longTagPrefix) {
if stag, ok := shortTags[tag]; ok {
return stag
}
return "!!" + tag[len(longTagPrefix):]
}
return tag
}
func longTag(tag string) string {
if strings.HasPrefix(tag, "!!") {
if ltag, ok := longTags[tag]; ok {
return ltag
}
return longTagPrefix + tag[2:]
}
return tag
}
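`init` fills `longTags` and `shortTags` by calling `longTag` before any table entry exists. As a result, the tables only cache the plain prefix swap for the ten standard tags, and the fallback branches apply the same swap to any other tag under `tag:yaml.org,2002:`. The sketch below is a table-free simplification of that round trip, not the package code.

```go
package main

import (
	"fmt"
	"strings"
)

const longTagPrefix = "tag:yaml.org,2002:"

// shortTag turns "tag:yaml.org,2002:x" into "!!x"; other tags pass through.
func shortTag(tag string) string {
	if strings.HasPrefix(tag, longTagPrefix) {
		return "!!" + strings.TrimPrefix(tag, longTagPrefix)
	}
	return tag
}

// longTag is the inverse: "!!x" becomes "tag:yaml.org,2002:x".
func longTag(tag string) string {
	if strings.HasPrefix(tag, "!!") {
		return longTagPrefix + strings.TrimPrefix(tag, "!!")
	}
	return tag
}

func main() {
	fmt.Println(shortTag("tag:yaml.org,2002:str")) // !!str
	fmt.Println(longTag("!!int"))                  // tag:yaml.org,2002:int
	fmt.Println(shortTag("!custom"))               // !custom
}
```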
// Kind represents the type of YAML node
type Kind uint32
const (
DocumentNode Kind = 1 << iota
SequenceNode
MappingNode
ScalarNode
AliasNode
StreamNode
)
// Style represents the formatting style of a YAML node
type Style uint32
const (
TaggedStyle Style = 1 << iota
DoubleQuotedStyle
SingleQuotedStyle
LiteralStyle
FoldedStyle
FlowStyle
)
// StreamVersionDirective represents a YAML %YAML version directive for stream nodes.
type StreamVersionDirective struct {
Major int
Minor int
}
// StreamTagDirective represents a YAML %TAG directive for stream nodes.
type StreamTagDirective struct {
Handle string
Prefix string
}
// Node represents an element in the YAML document hierarchy. While documents
// are typically encoded and decoded into higher level types, such as structs
// and maps, Node is an intermediate representation that allows detailed
// control over the content being decoded or encoded.
//
// It's worth noting that although Node offers access into details such as
// line numbers, columns, and comments, the content when re-encoded will not
// have its original textual representation preserved. An effort is made to
// render the data pleasantly, and to preserve comments near the data they
// describe, though.
//
// Values that make use of the Node type interact with the yaml package in the
// same way any other type would do, by encoding and decoding yaml data
// directly or indirectly into them.
//
// For example:
//
// var person struct {
// Name string
// Address yaml.Node
// }
// err := yaml.Unmarshal(data, &person)
//
// Or by itself:
//
// var person yaml.Node
// err := yaml.Unmarshal(data, &person)
type Node struct {
// Kind defines whether the node is a document, a mapping, a sequence,
// a scalar value, or an alias to another node. The specific data type of
// scalar nodes may be obtained via the ShortTag and LongTag methods.
Kind Kind
// Style allows customizing the appearance of the node in the tree.
Style Style
// Tag holds the YAML tag defining the data type for the value.
// When decoding, this field will always be set to the resolved tag,
// even when it wasn't explicitly provided in the YAML content.
// When encoding, if this field is unset the value type will be
// implied from the node properties, and if it is set, it will only
// be serialized into the representation if TaggedStyle is used or
// the implicit tag diverges from the provided one.
Tag string
// Value holds the unescaped and unquoted representation of the value.
Value string
// Anchor holds the anchor name for this node, which allows aliases to point to it.
Anchor string
// Alias holds the node that this alias points to. Only valid when Kind is AliasNode.
Alias *Node
// Content holds contained nodes for documents, mappings, and sequences.
Content []*Node
// HeadComment holds any comments in the lines preceding the node and
// not separated by an empty line.
HeadComment string
// LineComment holds any comments at the end of the line where the node is.
LineComment string
// FootComment holds any comments following the node and before empty lines.
FootComment string
// Line and Column hold the node position in the decoded YAML text.
// These fields are not respected when encoding the node.
Line int
Column int
// StreamNode-specific fields (only valid when Kind == StreamNode)
// Encoding holds the stream encoding (UTF-8, UTF-16LE, UTF-16BE).
// Only valid for StreamNode.
Encoding Encoding
// Version holds the YAML version directive (%YAML).
// Only valid for StreamNode.
Version *StreamVersionDirective
// TagDirectives holds the %TAG directives.
// Only valid for StreamNode.
TagDirectives []StreamTagDirective
}
// IsZero returns whether the node has all of its fields unset.
func (n *Node) IsZero() bool {
return n.Kind == 0 && n.Style == 0 && n.Tag == "" && n.Value == "" && n.Anchor == "" && n.Alias == nil && n.Content == nil &&
n.HeadComment == "" && n.LineComment == "" && n.FootComment == "" && n.Line == 0 && n.Column == 0 &&
n.Encoding == 0 && n.Version == nil && n.TagDirectives == nil
}
// LongTag returns the long form of the tag that indicates the data type for
// the node. If the Tag field isn't explicitly defined, one will be computed
// based on the node properties.
func (n *Node) LongTag() string {
return longTag(n.ShortTag())
}
// ShortTag returns the short form of the YAML tag that indicates data type for
// the node. If the Tag field isn't explicitly defined, one will be computed
// based on the node properties.
func (n *Node) ShortTag() string {
if n.indicatedString() {
return strTag
}
if n.Tag == "" || n.Tag == "!" {
switch n.Kind {
case MappingNode:
return mapTag
case SequenceNode:
return seqTag
case AliasNode:
if n.Alias != nil {
return n.Alias.ShortTag()
}
case ScalarNode:
return strTag
case 0:
// Special case to make the zero value convenient.
if n.IsZero() {
return nullTag
}
}
return ""
}
return shortTag(n.Tag)
}
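// indicatedString reports whether n is a scalar that must be treated as a
// string: either its tag is !!str, or it is untagged and uses a quoted or
// block scalar style.
func (n *Node) indicatedString() bool {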
return n.Kind == ScalarNode &&
(shortTag(n.Tag) == strTag ||
(n.Tag == "" || n.Tag == "!") && n.Style&(SingleQuotedStyle|DoubleQuotedStyle|LiteralStyle|FoldedStyle) != 0)
}
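`Style` is a bit set, so `indicatedString` tests several flags in a single mask. Note the precedence: `&&` binds tighter than `||`, so an explicit `!!str` tag alone is enough. The sketch below copies the `Style` constants. It also introduces a hypothetical helper that takes the tag already in short form and assumes a scalar node.

```go
package main

import "fmt"

// Style flags copied from the constants in this file.
type Style uint32

const (
	TaggedStyle Style = 1 << iota
	DoubleQuotedStyle
	SingleQuotedStyle
	LiteralStyle
	FoldedStyle
	FlowStyle
)

// stringIndicating groups the styles that force an untagged scalar to be a string.
const stringIndicating = SingleQuotedStyle | DoubleQuotedStyle | LiteralStyle | FoldedStyle

// indicatedString is a simplified, scalar-only version of Node.indicatedString.
func indicatedString(shortTag string, style Style) bool {
	return shortTag == "!!str" ||
		(shortTag == "" || shortTag == "!") && style&stringIndicating != 0
}

func main() {
	fmt.Println(indicatedString("", DoubleQuotedStyle)) // true
	fmt.Println(indicatedString("", FlowStyle))         // false
	fmt.Println(indicatedString("!!int", LiteralStyle)) // false: explicit tag wins
}
```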
// shouldUseLiteralStyle determines if a string should use literal style.
// It returns true if the string contains newlines AND meets additional criteria:
// - is at least 2 characters long
// - contains at least one non-whitespace character
func shouldUseLiteralStyle(s string) bool {
if !strings.Contains(s, "\n") || len(s) < 2 {
return false
}
// Must contain at least one non-whitespace character
for _, r := range s {
if !unicode.IsSpace(r) {
return true
}
}
return false
}
// SetString is a convenience function that sets the node to a string value
// and defines its style in a pleasant way depending on its content.
func (n *Node) SetString(s string) {
n.Kind = ScalarNode
if utf8.ValidString(s) {
n.Value = s
n.Tag = strTag
} else {
n.Value = encodeBase64(s)
n.Tag = binaryTag
}
if shouldUseLiteralStyle(n.Value) {
n.Style = LiteralStyle
}
}
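`SetString` keeps valid UTF-8 as `!!str` and base64-encodes anything else as `!!binary`. It chooses literal style only for multi-line values that contain non-whitespace. The sketch below mirrors that flow with a hypothetical `scalar` result type. It uses `base64.StdEncoding` without line wrapping as an assumption, because the package's `encodeBase64` helper is not shown here.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// scalar is a hypothetical result type for this sketch.
type scalar struct {
	Value   string
	Tag     string
	Literal bool
}

// useLiteral mirrors shouldUseLiteralStyle.
func useLiteral(s string) bool {
	if !strings.Contains(s, "\n") || len(s) < 2 {
		return false
	}
	return strings.IndexFunc(s, func(r rune) bool { return !unicode.IsSpace(r) }) >= 0
}

// setString mirrors the decision flow of Node.SetString.
func setString(s string) scalar {
	out := scalar{Value: s, Tag: "!!str"}
	if !utf8.ValidString(s) {
		out = scalar{Value: base64.StdEncoding.EncodeToString([]byte(s)), Tag: "!!binary"}
	}
	out.Literal = useLiteral(out.Value)
	return out
}

func main() {
	for _, s := range []string{"plain", "line1\nline2", "\n\n", "\xff\xfe"} {
		sc := setString(s)
		fmt.Printf("%q -> %s literal=%v\n", s, sc.Tag, sc.Literal)
	}
}
```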
// Decode decodes the node and stores its data into the value pointed to by v.
//
// See the documentation for Unmarshal for details about the
// conversion of YAML into a Go value.
func (n *Node) Decode(v any) (err error) {
d := NewConstructor(DefaultOptions)
defer handleErr(&err)
out := reflect.ValueOf(v)
if out.Kind() == reflect.Pointer && !out.IsNil() {
out = out.Elem()
}
d.Construct(n, out)
if len(d.TypeErrors) > 0 {
return &LoadErrors{Errors: d.TypeErrors}
}
return nil
}
// Load decodes the node and stores its data into the value pointed to by v,
// applying the given options.
//
// This method is useful when you need to preserve options like WithKnownFields()
// inside custom UnmarshalYAML implementations.
//
// Maps and pointers (to a struct, string, int, etc.) are accepted as v
// values. If an internal pointer within a struct is not initialized,
// the yaml package will initialize it if necessary. The v parameter
// must not be nil.
//
// See the documentation of the package-level Load function for details
// about YAML to Go conversion and tag options.
func (n *Node) Load(v any, opts ...Option) (err error) {
defer handleErr(&err)
o, err := ApplyOptions(opts...)
if err != nil {
return err
}
d := NewConstructor(o)
out := reflect.ValueOf(v)
if out.Kind() == reflect.Pointer && !out.IsNil() {
out = out.Elem()
}
d.Construct(n, out)
if len(d.TypeErrors) > 0 {
return &LoadErrors{Errors: d.TypeErrors}
}
return nil
}
// Encode encodes value v and stores its representation in n.
//
// See the documentation for Marshal for details about the
// conversion of Go values into YAML.
func (n *Node) Encode(v any) (err error) {
defer handleErr(&err)
e := NewRepresenter(noWriter, DefaultOptions)
defer e.Destroy()
e.MarshalDoc("", reflect.ValueOf(v))
e.Finish()
p := NewComposer(e.Out)
p.Textless = true
defer p.Destroy()
doc := p.Parse()
*n = *doc.Content[0]
return nil
}
// Dump encodes value v and stores its representation in n,
// applying the given options.
//
// This method is useful when you need to apply specific encoding options
// while building Node trees programmatically.
//
// See the documentation for Marshal for details about the
// conversion of Go values into YAML.
func (n *Node) Dump(v any, opts ...Option) (err error) {
defer handleErr(&err)
o, err := ApplyOptions(opts...)
if err != nil {
return err
}
e := NewRepresenter(noWriter, o)
defer e.Destroy()
e.MarshalDoc("", reflect.ValueOf(v))
e.Finish()
p := NewComposer(e.Out)
p.Textless = true
defer p.Destroy()
doc := p.Parse()
*n = *doc.Content[0]
return nil
}
@@ -0,0 +1,390 @@
//
// Copyright (c) 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
//
// Options configuration for loading and dumping YAML.
// Provides centralized control for indentation, line width, strictness, and
// more.
package libyaml
import (
"errors"
"fmt"
)
// Options holds configuration for both loading and dumping YAML.
type Options struct {
// Loading options
KnownFields bool // Enforce known fields in structs
SingleDocument bool // Only load first document
UniqueKeys bool // Enforce unique keys in mappings
StreamNodes bool // Enable stream node emission
AllDocuments bool // Load/Dump all documents in multi-document streams
// Dumping options
Indent int // Indentation spaces (2-9)
CompactSeqIndent bool // Whether '- ' counts as indentation
LineWidth int // Preferred line width (-1 for unlimited)
Unicode bool // Allow non-ASCII characters
Canonical bool // Canonical YAML output
LineBreak LineBreak // Line ending style
ExplicitStart bool // Always emit ---
ExplicitEnd bool // Always emit ...
FlowSimpleCollections bool // Use flow style for simple collections
QuotePreference QuoteStyle // Preferred quote style when quoting is required
}
// Option allows configuring YAML loading and dumping operations.
type Option func(*Options) error
// WithIndent sets the number of spaces to use for indentation when
// dumping YAML content.
//
// Valid values are 2-9. Common choices: 2 (compact), 4 (readable).
func WithIndent(indent int) Option {
return func(o *Options) error {
if indent < 2 || indent > 9 {
return errors.New("yaml: indent must be between 2 and 9 spaces")
}
o.Indent = indent
return nil
}
}
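Each `Option` validates its argument only when it is applied, so building an option never fails, and the first invalid option stops the application loop. The sketch below shows this with pared-down `options`, `option`, and `apply` types. The default indent of 4 is an assumption for illustration only, since the real defaults live in `ApplyOptions`.

```go
package main

import (
	"errors"
	"fmt"
)

// options and option are pared-down copies of Options and Option.
type options struct{ Indent int }

type option func(*options) error

func withIndent(indent int) option {
	return func(o *options) error {
		if indent < 2 || indent > 9 {
			return errors.New("yaml: indent must be between 2 and 9 spaces")
		}
		o.Indent = indent
		return nil
	}
}

// apply starts from an assumed default and stops at the first error.
func apply(opts ...option) (*options, error) {
	o := &options{Indent: 4}
	for _, opt := range opts {
		if err := opt(o); err != nil {
			return nil, err
		}
	}
	return o, nil
}

func main() {
	o, err := apply(withIndent(2))
	fmt.Println(o.Indent, err) // 2 <nil>
	_, err = apply(withIndent(12))
	fmt.Println(err) // yaml: indent must be between 2 and 9 spaces
}
```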
// WithCompactSeqIndent configures whether the sequence indicator '- ' is
// considered part of the indentation when dumping YAML content.
//
// If compact is true, '- ' is treated as part of the indentation.
// If compact is false, '- ' is not treated as part of the indentation.
// When called without arguments, defaults to true.
func WithCompactSeqIndent(compact ...bool) Option {
if len(compact) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithCompactSeqIndent accepts at most one argument")
}
}
val := len(compact) == 0 || compact[0]
return func(o *Options) error {
o.CompactSeqIndent = val
return nil
}
}
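`WithCompactSeqIndent` and the other boolean options share one pattern. With no argument they enable the setting, and with one argument they use it as given. With more arguments they return an option that fails when applied, rather than panicking at the call site. The following is a minimal sketch of that pattern. `withCompact` and `options` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

type options struct{ Compact bool }

type option func(*options) error

// withCompact follows the optional-bool pattern used by the With* helpers.
func withCompact(compact ...bool) option {
	if len(compact) > 1 {
		return func(*options) error {
			return errors.New("yaml: withCompact accepts at most one argument")
		}
	}
	val := len(compact) == 0 || compact[0]
	return func(o *options) error {
		o.Compact = val
		return nil
	}
}

func main() {
	for _, opt := range []option{withCompact(), withCompact(false), withCompact(true, false)} {
		var o options
		err := opt(&o)
		fmt.Println(o.Compact, err)
	}
	// true <nil>
	// false <nil>
	// false yaml: withCompact accepts at most one argument
}
```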
// WithKnownFields enables or disables strict field checking during YAML loading.
//
// When enabled, loading will return an error if the YAML input contains fields
// that do not correspond to any fields in the target struct.
// When called without arguments, defaults to true.
func WithKnownFields(knownFields ...bool) Option {
if len(knownFields) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithKnownFields accepts at most one argument")
}
}
val := len(knownFields) == 0 || knownFields[0]
return func(o *Options) error {
o.KnownFields = val
return nil
}
}
// WithSingleDocument configures the Loader to only process the first document
// in a YAML stream. After the first document is loaded, subsequent calls to
// Load will return io.EOF.
//
// When called without arguments, defaults to true.
//
// This is useful when you expect exactly one document and want behavior
// similar to [Unmarshal].
func WithSingleDocument(singleDocument ...bool) Option {
if len(singleDocument) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithSingleDocument accepts at most one argument")
}
}
val := len(singleDocument) == 0 || singleDocument[0]
return func(o *Options) error {
o.SingleDocument = val
return nil
}
}
// WithStreamNodes enables returning stream boundary nodes when loading YAML.
//
// When enabled, Loader.Load returns an interleaved sequence of StreamNode and
// DocumentNode values:
//
// [StreamNode, DocNode, StreamNode, DocNode, ..., StreamNode]
//
// StreamNodes contain metadata about the stream including:
// - Encoding (UTF-8, UTF-16LE, UTF-16BE)
// - YAML version directive (%YAML)
// - Tag directives (%TAG)
// - Position information (Line, Column)
//
// An empty YAML stream returns a single StreamNode.
// When called without arguments, defaults to true.
//
// The default is false.
func WithStreamNodes(enable ...bool) Option {
if len(enable) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithStreamNodes accepts at most one argument")
}
}
val := len(enable) == 0 || enable[0]
return func(o *Options) error {
o.StreamNodes = val
return nil
}
}
// WithAllDocuments enables multi-document mode for Load and Dump operations.
//
// When used with Load, the target must be a pointer to a slice.
// All documents in the YAML stream will be decoded into the slice.
// Zero documents results in an empty slice (no error).
//
// When used with Dump, the input must be a slice.
// Each element will be encoded as a separate YAML document
// with "---" separators.
//
// When called without arguments, defaults to true.
//
// The default is false (single-document mode).
func WithAllDocuments(all ...bool) Option {
if len(all) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithAllDocuments accepts at most one argument")
}
}
val := len(all) == 0 || all[0]
return func(o *Options) error {
o.AllDocuments = val
return nil
}
}
// WithLineWidth sets the preferred line width for YAML output.
//
// When encoding long strings, the encoder will attempt to wrap them at this
// width using literal block style (|). Set to -1 or 0 for unlimited width.
//
// The default is 80 characters.
func WithLineWidth(width int) Option {
return func(o *Options) error {
if width < 0 {
width = -1
}
o.LineWidth = width
return nil
}
}
// WithUnicode controls whether non-ASCII characters are allowed in YAML output.
//
// When true, non-ASCII characters appear as-is (e.g., "café").
// When false, non-ASCII characters are escaped (e.g., "caf\u00e9").
// When called without arguments, defaults to true.
//
// The default is true.
func WithUnicode(unicode ...bool) Option {
if len(unicode) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithUnicode accepts at most one argument")
}
}
val := len(unicode) == 0 || unicode[0]
return func(o *Options) error {
o.Unicode = val
return nil
}
}
// WithUniqueKeys enables or disables duplicate key detection during YAML loading.
//
// When enabled, loading will return an error if the YAML input contains
// duplicate keys in any mapping. This is a security feature that prevents
// key override attacks.
// When called without arguments, defaults to true.
//
// The default is true.
func WithUniqueKeys(uniqueKeys ...bool) Option {
if len(uniqueKeys) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithUniqueKeys accepts at most one argument")
}
}
val := len(uniqueKeys) == 0 || uniqueKeys[0]
return func(o *Options) error {
o.UniqueKeys = val
return nil
}
}
// WithCanonical forces canonical YAML output format.
//
// When enabled, the encoder outputs strictly canonical YAML with explicit
// tags for all values. This produces verbose output primarily useful for
// debugging and YAML spec compliance testing.
// When called without arguments, defaults to true.
//
// The default is false.
func WithCanonical(canonical ...bool) Option {
if len(canonical) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithCanonical accepts at most one argument")
}
}
val := len(canonical) == 0 || canonical[0]
return func(o *Options) error {
o.Canonical = val
return nil
}
}
// WithLineBreak sets the line ending style for YAML output.
//
// Available options:
// - LineBreakLN: Unix-style \n (default)
// - LineBreakCR: Old Mac-style \r
// - LineBreakCRLN: Windows-style \r\n
//
// The default is LineBreakLN.
func WithLineBreak(lineBreak LineBreak) Option {
return func(o *Options) error {
o.LineBreak = lineBreak
return nil
}
}
// WithExplicitStart controls whether document start markers (---) are always emitted.
//
// When true, every document begins with an explicit "---" marker.
// When false (default), the marker is omitted for the first document.
// When called without arguments, defaults to true.
func WithExplicitStart(explicit ...bool) Option {
if len(explicit) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithExplicitStart accepts at most one argument")
}
}
val := len(explicit) == 0 || explicit[0]
return func(o *Options) error {
o.ExplicitStart = val
return nil
}
}
// WithExplicitEnd controls whether document end markers (...) are always emitted.
//
// When true, every document ends with an explicit "..." marker.
// When false (default), the marker is omitted.
// When called without arguments, defaults to true.
func WithExplicitEnd(explicit ...bool) Option {
if len(explicit) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithExplicitEnd accepts at most one argument")
}
}
val := len(explicit) == 0 || explicit[0]
return func(o *Options) error {
o.ExplicitEnd = val
return nil
}
}
// WithFlowSimpleCollections controls whether simple collections use flow style.
//
// When true, sequences and mappings containing only scalar values (no nested
// collections) are rendered in flow style if they fit within the line width,
// for example {name: test, count: 42} or [a, b, c].
// When false (default), all collections use block style.
// When called without arguments, defaults to true.
func WithFlowSimpleCollections(flow ...bool) Option {
if len(flow) > 1 {
return func(o *Options) error {
return errors.New("yaml: WithFlowSimpleCollections accepts at most one argument")
}
}
val := len(flow) == 0 || flow[0]
return func(o *Options) error {
o.FlowSimpleCollections = val
return nil
}
}
// WithQuotePreference sets the preferred quote style for strings that require
// quoting.
//
// This option only affects strings that require quoting per the YAML spec.
// Plain strings that don't need quoting remain unquoted regardless of this
// setting. Quoting is required for:
// - Strings that look like other YAML types (true, false, null, 123, etc.)
// - Strings with leading/trailing whitespace
// - Strings containing special YAML syntax characters
// - Empty strings in certain contexts
//
// Quote styles:
// - QuoteSingle: Use single quotes (v4 default)
// - QuoteDouble: Use double quotes
// - QuoteLegacy: Legacy v2/v3 behavior (mixed quoting)
func WithQuotePreference(style QuoteStyle) Option {
return func(o *Options) error {
switch style {
case QuoteSingle, QuoteDouble, QuoteLegacy:
o.QuotePreference = style
return nil
default:
return fmt.Errorf("invalid QuoteStyle value: %d", style)
}
}
}
// CombineOptions combines multiple options into a single Option.
// This is useful for creating option presets or combining version defaults
// with custom options.
func CombineOptions(opts ...Option) Option {
return func(o *Options) error {
for _, opt := range opts {
if err := opt(o); err != nil {
return err
}
}
return nil
}
}
// ApplyOptions applies the given options to a new options struct.
// Starts with v4 defaults.
func ApplyOptions(opts ...Option) (*Options, error) {
o := &Options{
Canonical: false,
LineBreak: LN_BREAK,
// v4 defaults
Indent: 2,
CompactSeqIndent: true,
LineWidth: 80,
Unicode: true,
UniqueKeys: true,
}
for _, opt := range opts {
if err := opt(o); err != nil {
return nil, err
}
}
return o, nil
}
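Taken together, CombineOptions and ApplyOptions give a simple layering rule. Defaults are set first, then options run left to right. A later option therefore overrides an earlier one, and the first error aborts the whole chain. The sketch below assumes a two-field `Options` struct and hypothetical `WithIndent`/`WithLineWidth` options. Only the combining and applying logic mirrors the code above.

```go
package main

import "fmt"

// Options is a simplified stand-in; fields and defaults are assumptions.
type Options struct {
	Indent    int
	LineWidth int
}

type Option func(*Options) error

// WithIndent is a hypothetical option that validates its argument.
func WithIndent(n int) Option {
	return func(o *Options) error {
		if n < 1 {
			return fmt.Errorf("invalid indent: %d", n)
		}
		o.Indent = n
		return nil
	}
}

// WithLineWidth is a hypothetical option with no validation.
func WithLineWidth(w int) Option {
	return func(o *Options) error {
		o.LineWidth = w
		return nil
	}
}

// CombineOptions folds several options into one, stopping at the first error.
func CombineOptions(opts ...Option) Option {
	return func(o *Options) error {
		for _, opt := range opts {
			if err := opt(o); err != nil {
				return err
			}
		}
		return nil
	}
}

// ApplyOptions starts from defaults and applies options left to right.
func ApplyOptions(opts ...Option) (*Options, error) {
	o := &Options{Indent: 2, LineWidth: 80}
	for _, opt := range opts {
		if err := opt(o); err != nil {
			return nil, err
		}
	}
	return o, nil
}

func main() {
	// A reusable preset, overridden afterwards by a caller-specific option.
	preset := CombineOptions(WithIndent(4), WithLineWidth(-1))
	fmt.Println(ApplyOptions(preset, WithIndent(3)))
	fmt.Println(ApplyOptions(preset, WithIndent(0)))
}
```

Because options are plain functions, a preset built with CombineOptions can be reused and then overridden by passing further options after it.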
// DefaultOptions holds the default options for APIs that don't accept options.
var DefaultOptions = &Options{
Indent: 4,
LineWidth: -1,
Unicode: true,
UniqueKeys: true,
QuotePreference: QuoteLegacy,
}
File diff suppressed because it is too large
+441
@@ -0,0 +1,441 @@
// Copyright 2006-2010 Kirill Simonov
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0 AND MIT
// Input reader with encoding detection and buffering.
// Handles BOM detection, UTF-8/UTF-16 conversion, and provides buffered input
// for the scanner.
package libyaml
import (
"errors"
"fmt"
"io"
)
func formatReaderError(problem string, offset int, value int) error {
return ReaderError{
Offset: offset,
Value: value,
Err: errors.New(problem),
}
}
// Byte order marks.
const (
bom_UTF8 = "\xef\xbb\xbf"
bom_UTF16LE = "\xff\xfe"
bom_UTF16BE = "\xfe\xff"
)
// determineEncoding determines the input stream encoding by checking for a
// byte order mark (BOM). If no BOM is found, UTF-8 is assumed. It returns a
// non-nil error only if filling the raw buffer fails.
func (parser *Parser) determineEncoding() error {
// Ensure that we have enough bytes in the raw buffer.
for !parser.eof && len(parser.raw_buffer)-parser.raw_buffer_pos < 3 {
if err := parser.updateRawBuffer(); err != nil {
return err
}
}
// Determine the encoding.
buf := parser.raw_buffer
pos := parser.raw_buffer_pos
avail := len(buf) - pos
if avail >= 2 && buf[pos] == bom_UTF16LE[0] && buf[pos+1] == bom_UTF16LE[1] {
parser.encoding = UTF16LE_ENCODING
parser.raw_buffer_pos += 2
parser.offset += 2
} else if avail >= 2 && buf[pos] == bom_UTF16BE[0] && buf[pos+1] == bom_UTF16BE[1] {
parser.encoding = UTF16BE_ENCODING
parser.raw_buffer_pos += 2
parser.offset += 2
} else if avail >= 3 && buf[pos] == bom_UTF8[0] && buf[pos+1] == bom_UTF8[1] && buf[pos+2] == bom_UTF8[2] {
parser.encoding = UTF8_ENCODING
parser.raw_buffer_pos += 3
parser.offset += 3
} else {
parser.encoding = UTF8_ENCODING
}
return nil
}
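The BOM check in determineEncoding can be shown on its own. This sketch uses string names in place of the package's encoding constants and works on a plain byte slice instead of the parser's raw buffer. The check order matches the function above: UTF-16LE, then UTF-16BE, then UTF-8, with BOM-less UTF-8 as the fallback.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectEncoding returns an illustrative encoding name and the number of
// BOM bytes to skip, checking BOMs in the same order as determineEncoding.
func detectEncoding(b []byte) (string, int) {
	switch {
	case bytes.HasPrefix(b, []byte("\xff\xfe")):
		return "UTF-16LE", 2
	case bytes.HasPrefix(b, []byte("\xfe\xff")):
		return "UTF-16BE", 2
	case bytes.HasPrefix(b, []byte("\xef\xbb\xbf")):
		return "UTF-8", 3
	default:
		// No BOM: assume UTF-8 and consume nothing.
		return "UTF-8", 0
	}
}

func main() {
	for _, in := range []string{"\xef\xbb\xbfa: 1", "\xff\xfea\x00", "a: 1"} {
		enc, skip := detectEncoding([]byte(in))
		fmt.Println(enc, skip)
	}
}
```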
// updateRawBuffer refills the raw buffer from the read handler after moving
// any unconsumed bytes to the front.
func (parser *Parser) updateRawBuffer() error {
// Return if the raw buffer is full.
if parser.raw_buffer_pos == 0 && len(parser.raw_buffer) == cap(parser.raw_buffer) {
return nil
}
// Return on EOF.
if parser.eof {
return nil
}
// Move the remaining bytes in the raw buffer to the beginning.
if parser.raw_buffer_pos > 0 && parser.raw_buffer_pos < len(parser.raw_buffer) {
copy(parser.raw_buffer, parser.raw_buffer[parser.raw_buffer_pos:])
}
parser.raw_buffer = parser.raw_buffer[:len(parser.raw_buffer)-parser.raw_buffer_pos]
parser.raw_buffer_pos = 0
// Call the read handler to fill the buffer.
size_read, err := parser.read_handler(parser, parser.raw_buffer[len(parser.raw_buffer):cap(parser.raw_buffer)])
parser.raw_buffer = parser.raw_buffer[:len(parser.raw_buffer)+size_read]
if err == io.EOF {
parser.eof = true
} else if err != nil {
return ReaderError{
Offset: parser.offset,
Value: -1,
Err: fmt.Errorf("input error: %w", err),
}
}
return nil
}
// updateBuffer ensures that the buffer contains at least `length` characters,
// decoding more input from the raw buffer as needed. It returns a non-nil
// error if reading or decoding fails.
//
// The length is supposed to be significantly less than the buffer size.
func (parser *Parser) updateBuffer(length int) error {
if parser.read_handler == nil {
panic("read handler must be set")
}
// [Go] This function was changed to guarantee the requested length at EOF.
// The fact we need to do this is pretty awful, but the description above
// implies it should be the case, and there are tests that rely on it.
// If the EOF flag is set and the raw buffer is empty, do nothing.
//
//nolint:staticcheck // there is no problem with this empty branch as it's documentation.
if parser.eof && parser.raw_buffer_pos == len(parser.raw_buffer) {
// [Go] ACTUALLY! Read the documentation of this function above.
// This is just broken. To return true, we need to have the
// given length in the buffer. Not doing that means every single
// check that calls this function to make sure the buffer has a
// given length is Go) panicking; or C) accessing invalid memory.
// return true
}
// Return if the buffer contains enough characters.
if parser.unread >= length {
return nil
}
// Determine the input encoding if it is not known yet.
if parser.encoding == ANY_ENCODING {
if err := parser.determineEncoding(); err != nil {
return err
}
}
// Move the unread characters to the beginning of the buffer.
buffer_len := len(parser.buffer)
if parser.buffer_pos > 0 && parser.buffer_pos < buffer_len {
copy(parser.buffer, parser.buffer[parser.buffer_pos:])
buffer_len -= parser.buffer_pos
parser.buffer_pos = 0
} else if parser.buffer_pos == buffer_len {
buffer_len = 0
parser.buffer_pos = 0
}
// Open the whole buffer for writing, and cut it before returning.
parser.buffer = parser.buffer[:cap(parser.buffer)]
// Fill the buffer until it has enough characters.
first := true
for parser.unread < length {
// Fill the raw buffer if necessary.
if !first || parser.raw_buffer_pos == len(parser.raw_buffer) {
if err := parser.updateRawBuffer(); err != nil {
parser.buffer = parser.buffer[:buffer_len]
return err
}
}
first = false
// Decode the raw buffer.
inner:
for parser.raw_buffer_pos != len(parser.raw_buffer) {
var value rune
var width int
raw_unread := len(parser.raw_buffer) - parser.raw_buffer_pos
// Decode the next character.
switch parser.encoding {
case UTF8_ENCODING:
// Decode a UTF-8 character. Check RFC 3629
// (http://www.ietf.org/rfc/rfc3629.txt) for more details.
//
// The following table (taken from the RFC) is used for
// decoding.
//
// Char. number range | UTF-8 octet sequence
// (hexadecimal) | (binary)
// --------------------+------------------------------------
// 0000 0000-0000 007F | 0xxxxxxx
// 0000 0080-0000 07FF | 110xxxxx 10xxxxxx
// 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
// 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
//
// Additionally, the characters in the range 0xD800-0xDFFF
// are prohibited as they are reserved for use with UTF-16
// surrogate pairs.
// Determine the length of the UTF-8 sequence.
octet := parser.raw_buffer[parser.raw_buffer_pos]
switch {
case octet&0x80 == 0x00:
width = 1
case octet&0xE0 == 0xC0:
width = 2
case octet&0xF0 == 0xE0:
width = 3
case octet&0xF8 == 0xF0:
width = 4
default:
// The leading octet is invalid.
return formatReaderError(
"invalid leading UTF-8 octet",
parser.offset, int(octet))
}
// Check if the raw buffer contains an incomplete character.
if width > raw_unread {
if parser.eof {
return formatReaderError(
"incomplete UTF-8 octet sequence",
parser.offset, -1)
}
break inner
}
// Decode the leading octet.
switch {
case octet&0x80 == 0x00:
value = rune(octet & 0x7F)
case octet&0xE0 == 0xC0:
value = rune(octet & 0x1F)
case octet&0xF0 == 0xE0:
value = rune(octet & 0x0F)
case octet&0xF8 == 0xF0:
value = rune(octet & 0x07)
default:
value = 0
}
// Check and decode the trailing octets.
for k := 1; k < width; k++ {
octet = parser.raw_buffer[parser.raw_buffer_pos+k]
// Check if the octet is valid.
if (octet & 0xC0) != 0x80 {
return formatReaderError(
"invalid trailing UTF-8 octet",
parser.offset+k, int(octet))
}
// Decode the octet.
value = (value << 6) + rune(octet&0x3F)
}
// Check the length of the sequence against the value.
switch {
case width == 1:
case width == 2 && value >= 0x80:
case width == 3 && value >= 0x800:
case width == 4 && value >= 0x10000:
default:
return formatReaderError(
"invalid length of a UTF-8 sequence",
parser.offset, -1)
}
// Check the range of the value.
if value >= 0xD800 && value <= 0xDFFF || value > 0x10FFFF {
return formatReaderError(
"invalid Unicode character",
parser.offset, int(value))
}
case UTF16LE_ENCODING, UTF16BE_ENCODING:
var low, high int
if parser.encoding == UTF16LE_ENCODING {
low, high = 0, 1
} else {
low, high = 1, 0
}
// The UTF-16 encoding is not as simple as one might
// naively think. Check RFC 2781
// (http://www.ietf.org/rfc/rfc2781.txt).
//
// Normally, two subsequent bytes describe a Unicode
// character. However a special technique (called a
// surrogate pair) is used for specifying character
// values larger than 0xFFFF.
//
// A surrogate pair consists of two pseudo-characters:
// high surrogate area (0xD800-0xDBFF)
// low surrogate area (0xDC00-0xDFFF)
//
// The following formulas are used for decoding
// and encoding characters using surrogate pairs:
//
// U = U' + 0x10000 (0x01 00 00 <= U <= 0x10 FF FF)
// U' = yyyyyyyyyyxxxxxxxxxx (0 <= U' <= 0x0F FF FF)
// W1 = 110110yyyyyyyyyy
// W2 = 110111xxxxxxxxxx
//
// where U is the character value, W1 is the high surrogate
// area, W2 is the low surrogate area.
// Check for incomplete UTF-16 character.
if raw_unread < 2 {
if parser.eof {
return formatReaderError(
"incomplete UTF-16 character",
parser.offset, -1)
}
break inner
}
// Get the character.
value = rune(parser.raw_buffer[parser.raw_buffer_pos+low]) +
(rune(parser.raw_buffer[parser.raw_buffer_pos+high]) << 8)
// Check for unexpected low surrogate area.
if value&0xFC00 == 0xDC00 {
return formatReaderError(
"unexpected low surrogate area",
parser.offset, int(value))
}
// Check for a high surrogate area.
if value&0xFC00 == 0xD800 {
width = 4
// Check for incomplete surrogate pair.
if raw_unread < 4 {
if parser.eof {
return formatReaderError(
"incomplete UTF-16 surrogate pair",
parser.offset, -1)
}
break inner
}
// Get the next character.
value2 := rune(parser.raw_buffer[parser.raw_buffer_pos+low+2]) +
(rune(parser.raw_buffer[parser.raw_buffer_pos+high+2]) << 8)
// Check for a low surrogate area.
if value2&0xFC00 != 0xDC00 {
return formatReaderError(
"expected low surrogate area",
parser.offset+2, int(value2))
}
// Generate the value of the surrogate pair.
value = 0x10000 + ((value & 0x3FF) << 10) + (value2 & 0x3FF)
} else {
width = 2
}
default:
panic("impossible")
}
// YAML 1.2 compatible character sets
// Check if the character is in the allowed range:
// For JSON compatibility in quoted scalars, we must allow all
// non-C0 characters. This includes ASCII DEL (0x7F) and the
// C1 control block [#x80-#x9F].
// ref: https://yaml.org/spec/1.2.2/#51-character-set
switch {
// 8 bit set
// Tab (\t)
case value == 0x09:
// Line feed (LF \n)
case value == 0x0A:
// Carriage Return (CR \r)
case value == 0x0D:
// 16 bit set
// Printable ASCII
case value >= 0x20 && value <= 0x7E:
// DEL, C1 control
// incompatible with YAML versions <= 1.1
case value >= 0x7F && value <= 0x9F:
// Basic Multilingual Plane (BMP) up to the surrogate block
case value >= 0xA0 && value <= 0xD7FF:
// Additional Unicode Areas
case value >= 0xE000 && value <= 0xFFFD:
// 32 bit set
case value >= 0x10000 && value <= 0x10FFFF:
default:
return formatReaderError(
"control characters are not allowed",
parser.offset, int(value))
}
// Move the raw pointers.
parser.raw_buffer_pos += width
parser.offset += width
// Finally put the character into the buffer.
if value <= 0x7F {
// 0000 0000-0000 007F . 0xxxxxxx
parser.buffer[buffer_len+0] = byte(value)
buffer_len += 1
} else if value <= 0x7FF {
// 0000 0080-0000 07FF . 110xxxxx 10xxxxxx
parser.buffer[buffer_len+0] = byte(0xC0 + (value >> 6))
parser.buffer[buffer_len+1] = byte(0x80 + (value & 0x3F))
buffer_len += 2
} else if value <= 0xFFFF {
// 0000 0800-0000 FFFF . 1110xxxx 10xxxxxx 10xxxxxx
parser.buffer[buffer_len+0] = byte(0xE0 + (value >> 12))
parser.buffer[buffer_len+1] = byte(0x80 + ((value >> 6) & 0x3F))
parser.buffer[buffer_len+2] = byte(0x80 + (value & 0x3F))
buffer_len += 3
} else {
// 0001 0000-0010 FFFF . 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
parser.buffer[buffer_len+0] = byte(0xF0 + (value >> 18))
parser.buffer[buffer_len+1] = byte(0x80 + ((value >> 12) & 0x3F))
parser.buffer[buffer_len+2] = byte(0x80 + ((value >> 6) & 0x3F))
parser.buffer[buffer_len+3] = byte(0x80 + (value & 0x3F))
buffer_len += 4
}
parser.unread++
}
// On EOF, put NUL into the buffer and return.
if parser.eof {
parser.buffer[buffer_len] = 0
buffer_len++
parser.unread++
break
}
}
// [Go] Read the documentation of this function above. To return true,
// we need to have the given length in the buffer. Not doing that means
// every single check that calls this function to make sure the buffer
// has a given length is Go) panicking; or C) accessing invalid memory.
// This happens here due to the EOF above breaking early.
for buffer_len < length {
parser.buffer[buffer_len] = 0
buffer_len++
}
parser.buffer = parser.buffer[:buffer_len]
return nil
}
+571
@@ -0,0 +1,571 @@
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// Representer stage: Converts Go values to YAML nodes.
// Handles marshaling from Go types to the intermediate node representation.
package libyaml
import (
"encoding"
"fmt"
"io"
"reflect"
"regexp"
"sort"
"strconv"
"strings"
"time"
"unicode"
"unicode/utf8"
)
type keyList []reflect.Value
func (l keyList) Len() int { return len(l) }
func (l keyList) Swap(i, j int) { l[i], l[j] = l[j], l[i] }
func (l keyList) Less(i, j int) bool {
a := l[i]
b := l[j]
ak := a.Kind()
bk := b.Kind()
for (ak == reflect.Interface || ak == reflect.Pointer) && !a.IsNil() {
a = a.Elem()
ak = a.Kind()
}
for (bk == reflect.Interface || bk == reflect.Pointer) && !b.IsNil() {
b = b.Elem()
bk = b.Kind()
}
af, aok := keyFloat(a)
bf, bok := keyFloat(b)
if aok && bok {
if af != bf {
return af < bf
}
if ak != bk {
return ak < bk
}
return numLess(a, b)
}
if ak != reflect.String || bk != reflect.String {
return ak < bk
}
ar, br := []rune(a.String()), []rune(b.String())
digits := false
for i := 0; i < len(ar) && i < len(br); i++ {
if ar[i] == br[i] {
digits = unicode.IsDigit(ar[i])
continue
}
al := unicode.IsLetter(ar[i])
bl := unicode.IsLetter(br[i])
if al && bl {
return ar[i] < br[i]
}
if al || bl {
if digits {
return al
} else {
return bl
}
}
var ai, bi int
var an, bn int64
if ar[i] == '0' || br[i] == '0' {
for j := i - 1; j >= 0 && unicode.IsDigit(ar[j]); j-- {
if ar[j] != '0' {
an = 1
bn = 1
break
}
}
}
for ai = i; ai < len(ar) && unicode.IsDigit(ar[ai]); ai++ {
an = an*10 + int64(ar[ai]-'0')
}
for bi = i; bi < len(br) && unicode.IsDigit(br[bi]); bi++ {
bn = bn*10 + int64(br[bi]-'0')
}
if an != bn {
return an < bn
}
if ai != bi {
return ai < bi
}
return ar[i] < br[i]
}
return len(ar) < len(br)
}
// keyFloat returns a float value for v if it is a number/bool
// and whether it is a number/bool or not.
func keyFloat(v reflect.Value) (f float64, ok bool) {
switch v.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return float64(v.Int()), true
case reflect.Float32, reflect.Float64:
return v.Float(), true
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
return float64(v.Uint()), true
case reflect.Bool:
if v.Bool() {
return 1, true
}
return 0, true
}
return 0, false
}
// numLess returns whether a < b.
// a and b must necessarily have the same kind.
func numLess(a, b reflect.Value) bool {
switch a.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return a.Int() < b.Int()
case reflect.Float32, reflect.Float64:
return a.Float() < b.Float()
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
return a.Uint() < b.Uint()
case reflect.Bool:
return !a.Bool() && b.Bool()
}
panic("not a number")
}
// Sentinel values for newRepresenter parameters.
// These provide clarity at call sites, similar to http.NoBody.
var (
noWriter io.Writer = nil
noVersionDirective *VersionDirective = nil
noTagDirective []TagDirective = nil
)
type Representer struct {
Emitter Emitter
Out []byte
flow bool
Indent int
lineWidth int
doneInit bool
explicitStart bool
explicitEnd bool
flowSimpleCollections bool
quotePreference QuoteStyle
}
// NewRepresenter creates a new YAML representer with the given options.
//
// The writer parameter specifies the output destination for the representer.
// If writer is nil, the representer writes to an internal buffer (the Out field).
func NewRepresenter(writer io.Writer, opts *Options) *Representer {
emitter := NewEmitter()
emitter.CompactSequenceIndent = opts.CompactSeqIndent
emitter.quotePreference = opts.QuotePreference
emitter.SetWidth(opts.LineWidth)
emitter.SetUnicode(opts.Unicode)
emitter.SetCanonical(opts.Canonical)
emitter.SetLineBreak(opts.LineBreak)
r := &Representer{
Emitter: emitter,
Indent: opts.Indent,
lineWidth: opts.LineWidth,
explicitStart: opts.ExplicitStart,
explicitEnd: opts.ExplicitEnd,
flowSimpleCollections: opts.FlowSimpleCollections,
quotePreference: opts.QuotePreference,
}
if writer != nil {
r.Emitter.SetOutputWriter(writer)
} else {
r.Emitter.SetOutputString(&r.Out)
}
return r
}
func (r *Representer) init() {
if r.doneInit {
return
}
if r.Indent == 0 {
r.Indent = 4
}
r.Emitter.BestIndent = r.Indent
r.emit(NewStreamStartEvent(UTF8_ENCODING))
r.doneInit = true
}
func (r *Representer) Finish() {
r.Emitter.OpenEnded = false
r.emit(NewStreamEndEvent())
}
func (r *Representer) Destroy() {
r.Emitter.Delete()
}
func (r *Representer) emit(event Event) {
// This will internally delete the event value.
r.must(r.Emitter.Emit(&event))
}
func (r *Representer) must(err error) {
if err != nil {
msg := err.Error()
if msg == "" {
msg = "unknown problem generating YAML content"
}
failf("%s", msg)
}
}
func (r *Representer) MarshalDoc(tag string, in reflect.Value) {
r.init()
var node *Node
if in.IsValid() {
node, _ = in.Interface().(*Node)
}
if node != nil && node.Kind == DocumentNode {
r.nodev(in)
} else {
// Use !explicitStart for implicit flag (true = implicit/no marker)
r.emit(NewDocumentStartEvent(noVersionDirective, noTagDirective, !r.explicitStart))
r.marshal(tag, in)
// Use !explicitEnd for implicit flag
r.emit(NewDocumentEndEvent(!r.explicitEnd))
}
}
func (r *Representer) marshal(tag string, in reflect.Value) {
tag = shortTag(tag)
if !in.IsValid() || in.Kind() == reflect.Pointer && in.IsNil() {
r.nilv()
return
}
iface := in.Interface()
switch value := iface.(type) {
case *Node:
r.nodev(in)
return
case Node:
if !in.CanAddr() {
n := reflect.New(in.Type()).Elem()
n.Set(in)
in = n
}
r.nodev(in.Addr())
return
case time.Time:
r.timev(tag, in)
return
case *time.Time:
r.timev(tag, in.Elem())
return
case time.Duration:
r.stringv(tag, reflect.ValueOf(value.String()))
return
case Marshaler:
v, err := value.MarshalYAML()
if err != nil {
Fail(err)
}
if v == nil {
r.nilv()
return
}
r.marshal(tag, reflect.ValueOf(v))
return
case encoding.TextMarshaler:
text, err := value.MarshalText()
if err != nil {
Fail(err)
}
in = reflect.ValueOf(string(text))
case nil:
r.nilv()
return
}
switch in.Kind() {
case reflect.Interface:
r.marshal(tag, in.Elem())
case reflect.Map:
r.mapv(tag, in)
case reflect.Pointer:
r.marshal(tag, in.Elem())
case reflect.Struct:
r.structv(tag, in)
case reflect.Slice, reflect.Array:
r.slicev(tag, in)
case reflect.String:
r.stringv(tag, in)
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
r.intv(tag, in)
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
r.uintv(tag, in)
case reflect.Float32, reflect.Float64:
r.floatv(tag, in)
case reflect.Bool:
r.boolv(tag, in)
default:
panic("cannot marshal type: " + in.Type().String())
}
}
func (r *Representer) mapv(tag string, in reflect.Value) {
r.mappingv(tag, func() {
keys := keyList(in.MapKeys())
sort.Sort(keys)
for _, k := range keys {
r.marshal("", k)
r.marshal("", in.MapIndex(k))
}
})
}
func (r *Representer) fieldByIndex(v reflect.Value, index []int) (field reflect.Value) {
for _, num := range index {
for {
if v.Kind() == reflect.Pointer {
if v.IsNil() {
return reflect.Value{}
}
v = v.Elem()
continue
}
break
}
v = v.Field(num)
}
return v
}
func (r *Representer) structv(tag string, in reflect.Value) {
sinfo, err := getStructInfo(in.Type())
if err != nil {
panic(err)
}
r.mappingv(tag, func() {
for _, info := range sinfo.FieldsList {
var value reflect.Value
if info.Inline == nil {
value = in.Field(info.Num)
} else {
value = r.fieldByIndex(in, info.Inline)
if !value.IsValid() {
continue
}
}
if info.OmitEmpty && isZero(value) {
continue
}
r.marshal("", reflect.ValueOf(info.Key))
r.flow = info.Flow
r.marshal("", value)
}
if sinfo.InlineMap >= 0 {
m := in.Field(sinfo.InlineMap)
if m.Len() > 0 {
r.flow = false
keys := keyList(m.MapKeys())
sort.Sort(keys)
for _, k := range keys {
if _, found := sinfo.FieldsMap[k.String()]; found {
panic(fmt.Sprintf("cannot have key %q in inlined map: conflicts with struct field", k.String()))
}
r.marshal("", k)
r.flow = false
r.marshal("", m.MapIndex(k))
}
}
}
})
}
func (r *Representer) mappingv(tag string, f func()) {
implicit := tag == ""
style := BLOCK_MAPPING_STYLE
if r.flow {
r.flow = false
style = FLOW_MAPPING_STYLE
}
r.emit(NewMappingStartEvent(nil, []byte(tag), implicit, style))
f()
r.emit(NewMappingEndEvent())
}
func (r *Representer) slicev(tag string, in reflect.Value) {
implicit := tag == ""
style := BLOCK_SEQUENCE_STYLE
if r.flow {
r.flow = false
style = FLOW_SEQUENCE_STYLE
}
r.emit(NewSequenceStartEvent(nil, []byte(tag), implicit, style))
n := in.Len()
for i := 0; i < n; i++ {
r.marshal("", in.Index(i))
}
r.emit(NewSequenceEndEvent())
}
// isBase60Float returns whether s is in base 60 notation as defined in YAML 1.1.
//
// The base 60 float notation in YAML 1.1 is a terrible idea and is unsupported
// in YAML 1.2 and by this package, but these should be marshaled quoted for
// the time being for compatibility with other parsers.
func isBase60Float(s string) (result bool) {
// Fast path.
if s == "" {
return false
}
c := s[0]
if !(c == '+' || c == '-' || c >= '0' && c <= '9') || strings.IndexByte(s, ':') < 0 {
return false
}
// Do the full match.
return base60float.MatchString(s)
}
// From http://yaml.org/type/float.html, except the regular expression there
// is bogus. In practice parsers do not enforce the "\.[0-9_]*" suffix.
var base60float = regexp.MustCompile(`^[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+(?:\.[0-9_]*)?$`)
// isOldBool returns whether s is bool notation as defined in YAML 1.1.
//
// We continue to force strings that YAML 1.1 would interpret as booleans to be
// rendered as quoted strings so that the marshaled output stays valid for
// YAML 1.1 parsers.
func isOldBool(s string) (result bool) {
switch s {
case "y", "Y", "yes", "Yes", "YES", "on", "On", "ON",
"n", "N", "no", "No", "NO", "off", "Off", "OFF":
return true
default:
return false
}
}
// looksLikeMerge returns true if the given string is the merge indicator "<<".
//
// When encoding a scalar with this exact value, it must be quoted to prevent it
// from being interpreted as a merge indicator during decoding.
func looksLikeMerge(s string) (result bool) {
return s == "<<"
}
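stringv forces quotes on a few spellings that YAML 1.2 treats as plain strings but that YAML 1.1 parsers would read as something else. The sketch below combines the three checks above. `needsYAML11Quoting` is a hypothetical name. In the real code, these checks only apply when tag resolution has already produced `!!str`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var base60float = regexp.MustCompile(`^[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+(?:\.[0-9_]*)?$`)

// isBase60Float uses a cheap prefix/colon check before running the regexp.
func isBase60Float(s string) bool {
	if s == "" {
		return false
	}
	c := s[0]
	if !(c == '+' || c == '-' || c >= '0' && c <= '9') || strings.IndexByte(s, ':') < 0 {
		return false
	}
	return base60float.MatchString(s)
}

// isOldBool matches the YAML 1.1 boolean spellings that YAML 1.2 dropped.
func isOldBool(s string) bool {
	switch s {
	case "y", "Y", "yes", "Yes", "YES", "on", "On", "ON",
		"n", "N", "no", "No", "NO", "off", "Off", "OFF":
		return true
	}
	return false
}

// needsYAML11Quoting combines the extra compatibility checks.
func needsYAML11Quoting(s string) bool {
	return isBase60Float(s) || isOldBool(s) || s == "<<"
}

func main() {
	for _, s := range []string{"1:20", "yes", "<<", "12:60", "hello"} {
		fmt.Println(s, needsYAML11Quoting(s))
	}
}
```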
func (r *Representer) stringv(tag string, in reflect.Value) {
var style ScalarStyle
s := in.String()
canUsePlain := true
switch {
case !utf8.ValidString(s):
if tag == binaryTag {
failf("explicitly tagged !!binary data must be base64-encoded")
}
if tag != "" {
failf("cannot marshal invalid UTF-8 data as %s", shortTag(tag))
}
// It can't be represented directly as YAML so use a binary tag
// and represent it as base64.
tag = binaryTag
s = encodeBase64(s)
case tag == "":
// Check to see if it would resolve to a specific
// tag when represented unquoted. If it doesn't,
// there's no need to quote it.
rtag, _ := resolve("", s)
canUsePlain = rtag == strTag &&
!(isBase60Float(s) ||
isOldBool(s) ||
looksLikeMerge(s))
}
// Note: it's possible for user code to emit invalid YAML
// if they explicitly specify a tag and a string containing
// text that's incompatible with that tag.
switch {
case strings.Contains(s, "\n"):
if r.flow || !shouldUseLiteralStyle(s) {
style = DOUBLE_QUOTED_SCALAR_STYLE
} else {
style = LITERAL_SCALAR_STYLE
}
case canUsePlain:
style = PLAIN_SCALAR_STYLE
default:
style = r.quotePreference.ScalarStyle()
}
r.emitScalar(s, "", tag, style, nil, nil, nil, nil)
}
func (r *Representer) boolv(tag string, in reflect.Value) {
var s string
if in.Bool() {
s = "true"
} else {
s = "false"
}
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) intv(tag string, in reflect.Value) {
s := strconv.FormatInt(in.Int(), 10)
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) uintv(tag string, in reflect.Value) {
s := strconv.FormatUint(in.Uint(), 10)
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) timev(tag string, in reflect.Value) {
t := in.Interface().(time.Time)
s := t.Format(time.RFC3339Nano)
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) floatv(tag string, in reflect.Value) {
// Issue #352: When formatting, use the precision of the underlying value
precision := 64
if in.Kind() == reflect.Float32 {
precision = 32
}
s := strconv.FormatFloat(in.Float(), 'g', -1, precision)
switch s {
case "+Inf":
s = ".inf"
case "-Inf":
s = "-.inf"
case "NaN":
s = ".nan"
}
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) nilv() {
r.emitScalar("null", "", "", PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) emitScalar(
value, anchor, tag string, style ScalarStyle, head, line, foot, tail []byte,
) {
// TODO Kill this function. Replace all initialize calls by their underlying Go literals.
implicit := tag == ""
if !implicit {
tag = longTag(tag)
}
event := NewScalarEvent([]byte(anchor), []byte(tag), []byte(value), implicit, implicit, style)
event.HeadComment = head
event.LineComment = line
event.FootComment = foot
event.TailComment = tail
r.emit(event)
}
func (r *Representer) nodev(in reflect.Value) {
r.node(in.Interface().(*Node), "")
}
+231
@@ -0,0 +1,231 @@
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// Tag resolution for YAML scalars.
// Determines implicit types (int, float, bool, null, timestamp) from untagged
// scalar values.
package libyaml
import (
"encoding/base64"
"math"
"regexp"
"strconv"
"strings"
"time"
)
type resolveMapItem struct {
value any
tag string
}
var (
resolveTable = make([]byte, 256)
resolveMap = make(map[string]resolveMapItem)
)
// negativeZero represents -0.0 for YAML encoding/decoding
// this is needed because Go constants cannot express -0.0
// https://staticcheck.dev/docs/checks/#SA4026
var negativeZero = math.Copysign(0.0, -1.0)
func init() {
t := resolveTable
t[int('+')] = 'S' // Sign
t[int('-')] = 'S'
for _, c := range "0123456789" {
t[int(c)] = 'D' // Digit
}
for _, c := range "yYnNtTfFoO~<" { // < for merge key <<
t[int(c)] = 'M' // In map
}
t[int('.')] = '.' // Float (potentially in map)
resolveMapList := []struct {
v any
tag string
l []string
}{
{true, boolTag, []string{"true", "True", "TRUE"}},
{false, boolTag, []string{"false", "False", "FALSE"}},
{nil, nullTag, []string{"", "~", "null", "Null", "NULL"}},
{math.NaN(), floatTag, []string{".nan", ".NaN", ".NAN"}},
{math.Inf(+1), floatTag, []string{".inf", ".Inf", ".INF"}},
{math.Inf(+1), floatTag, []string{"+.inf", "+.Inf", "+.INF"}},
{math.Inf(-1), floatTag, []string{"-.inf", "-.Inf", "-.INF"}},
{negativeZero, floatTag, []string{"-0", "-0.0"}},
{"<<", mergeTag, []string{"<<"}},
}
m := resolveMap
for _, item := range resolveMapList {
for _, s := range item.l {
m[s] = resolveMapItem{item.v, item.tag}
}
}
}
func resolvableTag(tag string) bool {
switch tag {
case "", strTag, boolTag, intTag, floatTag, nullTag, timestampTag:
return true
}
return false
}
var yamlStyleFloat = regexp.MustCompile(`^[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?$`)
func resolve(tag string, in string) (rtag string, out any) {
tag = shortTag(tag)
if !resolvableTag(tag) {
return tag, in
}
defer func() {
switch tag {
case "", rtag, strTag, binaryTag:
return
case floatTag:
if rtag == intTag {
switch v := out.(type) {
case int64:
rtag = floatTag
out = float64(v)
return
case int:
rtag = floatTag
out = float64(v)
return
}
}
}
failf("cannot construct %s `%s` as a %s", shortTag(rtag), in, shortTag(tag))
}()
// Any data is accepted as a !!str or !!binary.
// Otherwise, the prefix is enough of a hint about what it might be.
hint := byte('N')
if in != "" {
hint = resolveTable[in[0]]
}
if hint != 0 && tag != strTag && tag != binaryTag {
// Handle things we can lookup in a map.
if item, ok := resolveMap[in]; ok {
return item.tag, item.value
}
// Base 60 floats are a bad idea, were dropped in YAML 1.2, and
// are purposefully unsupported here. They're still quoted on
// the way out for compatibility with other parsers, though.
switch hint {
case 'M':
// We've already checked the map above.
case '.':
// Not in the map, so maybe a normal float.
floatv, err := strconv.ParseFloat(in, 64)
if err == nil {
return floatTag, floatv
}
case 'D', 'S':
// Int, float, or timestamp.
// Only try values as a timestamp if the value is unquoted or there's an explicit
// !!timestamp tag.
if tag == "" || tag == timestampTag {
t, ok := parseTimestamp(in)
if ok {
return timestampTag, t
}
}
plain := strings.ReplaceAll(in, "_", "")
intv, err := strconv.ParseInt(plain, 0, 64)
if err == nil {
if intv == int64(int(intv)) {
return intTag, int(intv)
} else {
return intTag, intv
}
}
uintv, err := strconv.ParseUint(plain, 0, 64)
if err == nil {
return intTag, uintv
}
if yamlStyleFloat.MatchString(plain) {
floatv, err := strconv.ParseFloat(plain, 64)
if err == nil {
return floatTag, floatv
}
}
default:
panic("internal error: missing handler for resolver table: " + string(rune(hint)) + " (with " + in + ")")
}
}
return strTag, in
}
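The following is a simplified sketch of how resolve classifies an untagged plain scalar. It uses string tag names such as `!!int` instead of the package's tag constants. It replaces the first-byte hint table with a simple first-byte check and always uses the float regexp, whereas resolve calls ParseFloat directly for a leading `.`. It also leaves out timestamp detection and explicit-tag handling. Treat it as an approximation of resolve, not a drop-in replacement; `resolveTag` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Same pattern as yamlStyleFloat.
var floatPattern = regexp.MustCompile(`^[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?$`)

func resolveTag(s string) string {
	// Exact spellings, as in resolveMap.
	switch s {
	case "", "~", "null", "Null", "NULL":
		return "!!null"
	case "true", "True", "TRUE", "false", "False", "FALSE":
		return "!!bool"
	case ".nan", ".NaN", ".NAN",
		".inf", ".Inf", ".INF", "+.inf", "+.Inf", "+.INF",
		"-.inf", "-.Inf", "-.INF", "-0", "-0.0":
		return "!!float"
	case "<<":
		return "!!merge"
	}
	// Only values starting with a sign, digit, or '.' can be numbers.
	c := s[0] // s is non-empty: "" was handled above
	if !(c == '+' || c == '-' || c == '.' || (c >= '0' && c <= '9')) {
		return "!!str"
	}
	// Underscores are digit separators, e.g. 1_000.
	plain := strings.ReplaceAll(s, "_", "")
	if _, err := strconv.ParseInt(plain, 0, 64); err == nil {
		return "!!int"
	}
	if _, err := strconv.ParseUint(plain, 0, 64); err == nil {
		return "!!int"
	}
	if floatPattern.MatchString(plain) {
		return "!!float"
	}
	return "!!str"
}

func main() {
	for _, s := range []string{"~", "True", "yes", "0x1F", "1_000", "1.5e3", "hello"} {
		fmt.Println(s, resolveTag(s))
	}
}
```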
// encodeBase64 encodes s as base64 that is broken up into multiple lines
// as appropriate for the resulting length.
func encodeBase64(s string) string {
const lineLen = 70
encLen := base64.StdEncoding.EncodedLen(len(s))
lines := encLen/lineLen + 1
buf := make([]byte, encLen*2+lines)
in := buf[0:encLen]
out := buf[encLen:]
base64.StdEncoding.Encode(in, []byte(s))
k := 0
for i := 0; i < len(in); i += lineLen {
j := i + lineLen
if j > len(in) {
j = len(in)
}
k += copy(out[k:], in[i:j])
if lines > 1 {
out[k] = '\n'
k++
}
}
return string(out[:k])
}
// This is a subset of the formats allowed by the regular expression
// defined at http://yaml.org/type/timestamp.html.
var allowedTimestampFormats = []string{
"2006-1-2T15:4:5.999999999Z07:00", // RFC3339Nano with short date fields.
"2006-1-2t15:4:5.999999999Z07:00", // RFC3339Nano with short date fields and lower-case "t".
"2006-1-2 15:4:5.999999999", // space separated with no time zone
"2006-1-2", // date only
// Notable exception: time.Parse cannot handle: "2001-12-14 21:59:43.10 -5"
// from the set of examples.
}
// parseTimestamp parses s as a timestamp string and
// returns the timestamp and reports whether it succeeded.
// Timestamp formats are defined at http://yaml.org/type/timestamp.html
func parseTimestamp(s string) (time.Time, bool) {
// TODO write code to check all the formats supported by
// http://yaml.org/type/timestamp.html instead of using time.Parse.
// Quick check: all date formats start with YYYY-.
i := 0
for ; i < len(s); i++ {
if c := s[i]; c < '0' || c > '9' {
break
}
}
if i != 4 || i == len(s) || s[i] != '-' {
return time.Time{}, false
}
for _, format := range allowedTimestampFormats {
if t, err := time.Parse(format, s); err == nil {
return t, true
}
}
return time.Time{}, false
}
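The timestamp check above can be exercised in isolation. The standalone program below is a sketch that copies the four layouts from allowedTimestampFormats and the four-digits-then-dash prefix check from parseTimestamp into a package main; the lowercase names are local copies for illustration, not the package's API. As the source comment notes, the "2001-12-14 21:59:43.10 -5" example from the YAML timestamp page is rejected, because none of the layouts accepts a bare "-5" zone.

```go
package main

import (
	"fmt"
	"time"
)

// layouts is a local copy of allowedTimestampFormats: RFC 3339 with
// non-padded date/time fields (upper- or lower-case "t"), a
// space-separated form without a zone, and a bare date.
var layouts = []string{
	"2006-1-2T15:4:5.999999999Z07:00",
	"2006-1-2t15:4:5.999999999Z07:00",
	"2006-1-2 15:4:5.999999999",
	"2006-1-2",
}

// parseTimestamp rejects anything that does not start with exactly four
// digits followed by '-', then tries each layout with time.Parse.
func parseTimestamp(s string) (time.Time, bool) {
	i := 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	if i != 4 || i == len(s) || s[i] != '-' {
		return time.Time{}, false
	}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, true
		}
	}
	return time.Time{}, false
}

func main() {
	inputs := []string{
		"2001-12-14",
		"2001-12-14t21:59:43.10-05:00",
		"2001-12-14 21:59:43.10",
		"2001-12-14 21:59:43.10 -5", // short zone offset: not accepted
		"12345",                     // no YYYY- prefix
	}
	for _, s := range inputs {
		_, ok := parseTimestamp(s)
		fmt.Printf("%q accepted=%v\n", s, ok)
	}
}
```

The cheap prefix scan matters because resolve calls parseTimestamp for every unquoted scalar whose first byte hints at a digit, so most plain numbers are rejected before any time.Parse call.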
File diff suppressed because it is too large
+219
@@ -0,0 +1,219 @@
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// Serializer stage: Converts representation tree (Nodes) to event stream.
// Walks the node tree and produces events for the emitter.
package libyaml
import (
"strings"
"unicode/utf8"
)
// node serializes a Node tree into YAML events.
// This is the core of the serializer stage - it walks the tree and produces events.
func (r *Representer) node(node *Node, tail string) {
// Zero nodes behave as nil.
if node.Kind == 0 && node.IsZero() {
r.nilv()
return
}
// If the tag was not explicitly requested, and dropping it won't change the
// implicit tag of the value, don't include it in the presentation.
tag := node.Tag
stag := shortTag(tag)
var forceQuoting bool
if tag != "" && node.Style&TaggedStyle == 0 {
if node.Kind == ScalarNode {
if stag == strTag && node.Style&(SingleQuotedStyle|DoubleQuotedStyle|LiteralStyle|FoldedStyle) != 0 {
tag = ""
} else {
rtag, _ := resolve("", node.Value)
if rtag == stag && stag != mergeTag {
tag = ""
} else if stag == strTag {
tag = ""
forceQuoting = true
}
}
} else {
var rtag string
switch node.Kind {
case MappingNode:
rtag = mapTag
case SequenceNode:
rtag = seqTag
}
if rtag == stag {
tag = ""
}
}
}
switch node.Kind {
case DocumentNode:
event := NewDocumentStartEvent(noVersionDirective, noTagDirective, !r.explicitStart)
event.HeadComment = []byte(node.HeadComment)
r.emit(event)
for _, node := range node.Content {
r.node(node, "")
}
event = NewDocumentEndEvent(!r.explicitEnd)
event.FootComment = []byte(node.FootComment)
r.emit(event)
case SequenceNode:
style := BLOCK_SEQUENCE_STYLE
// Use flow style if explicitly requested or if it's a simple
// collection (scalar-only contents that fit within line width,
// enabled via WithFlowSimpleCollections)
if node.Style&FlowStyle != 0 || r.isSimpleCollection(node) {
style = FLOW_SEQUENCE_STYLE
}
event := NewSequenceStartEvent([]byte(node.Anchor), []byte(longTag(tag)), tag == "", style)
event.HeadComment = []byte(node.HeadComment)
r.emit(event)
for _, node := range node.Content {
r.node(node, "")
}
event = NewSequenceEndEvent()
event.LineComment = []byte(node.LineComment)
event.FootComment = []byte(node.FootComment)
r.emit(event)
case MappingNode:
style := BLOCK_MAPPING_STYLE
// Use flow style if explicitly requested or if it's a simple
// collection (scalar-only contents that fit within line width,
// enabled via WithFlowSimpleCollections)
if node.Style&FlowStyle != 0 || r.isSimpleCollection(node) {
style = FLOW_MAPPING_STYLE
}
event := NewMappingStartEvent([]byte(node.Anchor), []byte(longTag(tag)), tag == "", style)
event.TailComment = []byte(tail)
event.HeadComment = []byte(node.HeadComment)
r.emit(event)
// The tail logic below moves the foot comment of each key to the following key,
// since the value for each key may be a nested structure and the foot comment must
// be emitted only after the entire value has been streamed. The last tail is
// processed with the mapping end event.
var tail string
for i := 0; i+1 < len(node.Content); i += 2 {
k := node.Content[i]
foot := k.FootComment
if foot != "" {
kopy := *k
kopy.FootComment = ""
k = &kopy
}
r.node(k, tail)
tail = foot
v := node.Content[i+1]
r.node(v, "")
}
event = NewMappingEndEvent()
event.TailComment = []byte(tail)
event.LineComment = []byte(node.LineComment)
event.FootComment = []byte(node.FootComment)
r.emit(event)
case AliasNode:
event := NewAliasEvent([]byte(node.Value))
event.HeadComment = []byte(node.HeadComment)
event.LineComment = []byte(node.LineComment)
event.FootComment = []byte(node.FootComment)
r.emit(event)
case ScalarNode:
value := node.Value
if !utf8.ValidString(value) {
if stag == binaryTag {
failf("explicitly tagged !!binary data must be base64-encoded")
}
if stag != "" {
failf("cannot marshal invalid UTF-8 data as %s", stag)
}
// It can't be represented directly as YAML so use a binary tag
// and represent it as base64.
tag = binaryTag
value = encodeBase64(value)
}
style := PLAIN_SCALAR_STYLE
switch {
case node.Style&DoubleQuotedStyle != 0:
style = DOUBLE_QUOTED_SCALAR_STYLE
case node.Style&SingleQuotedStyle != 0:
style = SINGLE_QUOTED_SCALAR_STYLE
case node.Style&LiteralStyle != 0:
style = LITERAL_SCALAR_STYLE
case node.Style&FoldedStyle != 0:
style = FOLDED_SCALAR_STYLE
case strings.Contains(value, "\n"):
style = LITERAL_SCALAR_STYLE
case forceQuoting:
style = r.quotePreference.ScalarStyle()
}
r.emitScalar(value, node.Anchor, tag, style, []byte(node.HeadComment), []byte(node.LineComment), []byte(node.FootComment), []byte(tail))
default:
failf("cannot represent node with unknown kind %d", node.Kind)
}
}
// isSimpleCollection checks if a node contains only scalar values and would
// fit within the line width when rendered in flow style.
func (r *Representer) isSimpleCollection(node *Node) bool {
if !r.flowSimpleCollections {
return false
}
if node.Kind != SequenceNode && node.Kind != MappingNode {
return false
}
// Check all children are scalars
for _, child := range node.Content {
if child.Kind != ScalarNode {
return false
}
}
// Estimate flow style length
estimatedLen := r.estimateFlowLength(node)
width := r.lineWidth
if width <= 0 {
width = 80 // Default width if not set
}
return estimatedLen > 0 && estimatedLen <= width
}
// estimateFlowLength estimates the character length of a node in flow style.
func (r *Representer) estimateFlowLength(node *Node) int {
if node.Kind == SequenceNode {
// [item1, item2, ...] = 2 + sum(len(items)) + 2*(len-1)
length := 2 // []
for i, child := range node.Content {
if i > 0 {
length += 2 // ", "
}
length += len(child.Value)
}
return length
}
if node.Kind == MappingNode {
// {key1: val1, key2: val2} = 2 + sum(key: val) + 2*(pairs-1)
length := 2 // {}
for i := 0; i < len(node.Content); i += 2 {
if i > 0 {
length += 2 // ", "
}
length += len(node.Content[i].Value) + 2 + len(node.Content[i+1].Value) // "key: val"
}
return length
}
return 0
}
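The flow-style width heuristic in isSimpleCollection and estimateFlowLength can be checked by hand. The sketch below uses a hypothetical node type that keeps only Kind, Value and Content; like the original, it counts two characters for the brackets or braces, two for each ", " separator and two for each ": " in a mapping, and it ignores any quoting the emitter may add later. One deliberate difference: the mapping loop uses i+1 < len(Content), so an odd-length Content slice cannot index out of range, whereas the original assumes key/value pairs are always complete.

```go
package main

import "fmt"

// kind and node are hypothetical stand-ins for the package's Node type,
// keeping only the fields the length estimate reads.
type kind int

const (
	scalarKind kind = iota
	sequenceKind
	mappingKind
)

type node struct {
	Kind    kind
	Value   string
	Content []*node
}

func scalar(v string) *node { return &node{Kind: scalarKind, Value: v} }

// estimateFlowLength: "[a, bb]" or "{k: v, k2: v2}" without quoting.
func estimateFlowLength(n *node) int {
	switch n.Kind {
	case sequenceKind:
		length := 2 // "[]"
		for i, child := range n.Content {
			if i > 0 {
				length += 2 // ", "
			}
			length += len(child.Value)
		}
		return length
	case mappingKind:
		length := 2 // "{}"
		for i := 0; i+1 < len(n.Content); i += 2 {
			if i > 0 {
				length += 2 // ", "
			}
			length += len(n.Content[i].Value) + 2 + len(n.Content[i+1].Value) // "key: val"
		}
		return length
	}
	return 0
}

// fitsFlow mirrors isSimpleCollection (minus the option flag): only
// collections of scalars qualify, and width <= 0 falls back to 80.
func fitsFlow(n *node, width int) bool {
	if n.Kind != sequenceKind && n.Kind != mappingKind {
		return false
	}
	for _, c := range n.Content {
		if c.Kind != scalarKind {
			return false
		}
	}
	if width <= 0 {
		width = 80
	}
	l := estimateFlowLength(n)
	return l > 0 && l <= width
}

func main() {
	seq := &node{Kind: sequenceKind, Content: []*node{scalar("a"), scalar("bb"), scalar("c")}}
	m := &node{Kind: mappingKind, Content: []*node{scalar("a"), scalar("1"), scalar("b"), scalar("22")}}
	fmt.Println(estimateFlowLength(seq)) // 10, len("[a, bb, c]")
	fmt.Println(estimateFlowLength(m))   // 13, len("{a: 1, b: 22}")
	fmt.Println(fitsFlow(seq, 8))        // false: 10 > 8
}
```

Because the estimate ignores quotes and escapes, a collection whose scalars later need quoting can come out somewhat wider than the configured line width.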
+31
@@ -0,0 +1,31 @@
// Copyright 2006-2010 Kirill Simonov
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0 AND MIT
// Output writer with buffering.
// Provides write buffering for the emitter stage.
package libyaml
import "fmt"
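// flush writes the buffered output through the write handler and resets the
// buffer. It panics if no write handler is set and wraps any handler error in
// a [WriterError].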
func (emitter *Emitter) flush() error {
if emitter.write_handler == nil {
panic("write handler not set")
}
// Check if the buffer is empty.
if emitter.buffer_pos == 0 {
return nil
}
if err := emitter.write_handler(emitter, emitter.buffer[:emitter.buffer_pos]); err != nil {
return WriterError{
Err: fmt.Errorf("write error: %w", err),
}
}
emitter.buffer_pos = 0
return nil
}
+834
@@ -0,0 +1,834 @@
// Copyright 2006-2010 Kirill Simonov
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0 AND MIT
// Core libyaml types and structures.
// Defines Parser, Emitter, Event, Token, and related constants for YAML
// processing.
package libyaml
import (
"fmt"
"io"
"strings"
)
// VersionDirective holds the YAML version directive data.
type VersionDirective struct {
major int8 // The major version number.
minor int8 // The minor version number.
}
// Major returns the major version number.
func (v *VersionDirective) Major() int { return int(v.major) }
// Minor returns the minor version number.
func (v *VersionDirective) Minor() int { return int(v.minor) }
// TagDirective holds the YAML tag directive data.
type TagDirective struct {
handle []byte // The tag handle.
prefix []byte // The tag prefix.
}
// GetHandle returns the tag handle.
func (t *TagDirective) GetHandle() string { return string(t.handle) }
// GetPrefix returns the tag prefix.
func (t *TagDirective) GetPrefix() string { return string(t.prefix) }
type Encoding int
// The stream encoding.
const (
// Let the parser choose the encoding.
ANY_ENCODING Encoding = iota
UTF8_ENCODING // The default UTF-8 encoding.
UTF16LE_ENCODING // The UTF-16-LE encoding with BOM.
UTF16BE_ENCODING // The UTF-16-BE encoding with BOM.
)
type LineBreak int
// Line break types.
const (
// Let the parser choose the break type.
ANY_BREAK LineBreak = iota
CR_BREAK // Use CR ('\r') for line breaks (classic Mac OS style).
LN_BREAK // Use LN, i.e. line feed ('\n'), for line breaks (Unix style).
CRLN_BREAK // Use CR LN ('\r\n') for line breaks (DOS/Windows style).
)
type QuoteStyle int
// Quote style types for required quoting.
const (
QuoteSingle QuoteStyle = iota // Prefer single quotes when quoting is required.
QuoteDouble // Prefer double quotes when quoting is required.
QuoteLegacy // Legacy behavior: double in representer, single in emitter.
)
// ScalarStyle returns the scalar style for this quote preference in the
// representer/serializer context.
// In this context, both QuoteDouble and QuoteLegacy use double quotes.
func (q QuoteStyle) ScalarStyle() ScalarStyle {
if q == QuoteDouble || q == QuoteLegacy {
return DOUBLE_QUOTED_SCALAR_STYLE
}
return SINGLE_QUOTED_SCALAR_STYLE
}
type ErrorType int
// Error types reported by the parser and emitter.
const (
// No error is produced.
NO_ERROR ErrorType = iota
MEMORY_ERROR // Cannot allocate or reallocate a block of memory.
READER_ERROR // Cannot read or decode the input stream.
SCANNER_ERROR // Cannot scan the input stream.
PARSER_ERROR // Cannot parse the input stream.
COMPOSER_ERROR // Cannot compose a YAML document.
WRITER_ERROR // Cannot write to the output stream.
EMITTER_ERROR // Cannot emit a YAML stream.
)
// Mark holds a position in a YAML stream.
type Mark struct {
Index int // The position index.
Line int // The position line (1-indexed).
Column int // The position column (0-indexed internally, displayed as 1-indexed).
}
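// String formats m as "line L, column C", with C shown 1-based. The column
// is omitted when it is zero, and an unset line yields "<unknown position>".
func (m Mark) String() string {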
var builder strings.Builder
if m.Line == 0 {
return "<unknown position>"
}
fmt.Fprintf(&builder, "line %d", m.Line)
if m.Column != 0 {
fmt.Fprintf(&builder, ", column %d", m.Column+1)
}
return builder.String()
}
// Node Styles
type styleInt int8
type ScalarStyle styleInt
// Scalar styles.
const (
// Let the emitter choose the style.
ANY_SCALAR_STYLE ScalarStyle = 0
PLAIN_SCALAR_STYLE ScalarStyle = 1 << iota // The plain scalar style.
SINGLE_QUOTED_SCALAR_STYLE // The single-quoted scalar style.
DOUBLE_QUOTED_SCALAR_STYLE // The double-quoted scalar style.
LITERAL_SCALAR_STYLE // The literal scalar style.
FOLDED_SCALAR_STYLE // The folded scalar style.
)
// String returns a string representation of a [ScalarStyle].
func (style ScalarStyle) String() string {
switch style {
case PLAIN_SCALAR_STYLE:
return "Plain"
case SINGLE_QUOTED_SCALAR_STYLE:
return "Single"
case DOUBLE_QUOTED_SCALAR_STYLE:
return "Double"
case LITERAL_SCALAR_STYLE:
return "Literal"
case FOLDED_SCALAR_STYLE:
return "Folded"
default:
return ""
}
}
type SequenceStyle styleInt
// Sequence styles.
const (
// Let the emitter choose the style.
ANY_SEQUENCE_STYLE SequenceStyle = iota
BLOCK_SEQUENCE_STYLE // The block sequence style.
FLOW_SEQUENCE_STYLE // The flow sequence style.
)
type MappingStyle styleInt
// Mapping styles.
const (
// Let the emitter choose the style.
ANY_MAPPING_STYLE MappingStyle = iota
BLOCK_MAPPING_STYLE // The block mapping style.
FLOW_MAPPING_STYLE // The flow mapping style.
)
// Tokens
type TokenType int
// Token types.
const (
// An empty token.
NO_TOKEN TokenType = iota
STREAM_START_TOKEN // A STREAM-START token.
STREAM_END_TOKEN // A STREAM-END token.
VERSION_DIRECTIVE_TOKEN // A VERSION-DIRECTIVE token.
TAG_DIRECTIVE_TOKEN // A TAG-DIRECTIVE token.
DOCUMENT_START_TOKEN // A DOCUMENT-START token.
DOCUMENT_END_TOKEN // A DOCUMENT-END token.
BLOCK_SEQUENCE_START_TOKEN // A BLOCK-SEQUENCE-START token.
BLOCK_MAPPING_START_TOKEN // A BLOCK-MAPPING-START token.
BLOCK_END_TOKEN // A BLOCK-END token.
FLOW_SEQUENCE_START_TOKEN // A FLOW-SEQUENCE-START token.
FLOW_SEQUENCE_END_TOKEN // A FLOW-SEQUENCE-END token.
FLOW_MAPPING_START_TOKEN // A FLOW-MAPPING-START token.
FLOW_MAPPING_END_TOKEN // A FLOW-MAPPING-END token.
BLOCK_ENTRY_TOKEN // A BLOCK-ENTRY token.
FLOW_ENTRY_TOKEN // A FLOW-ENTRY token.
KEY_TOKEN // A KEY token.
VALUE_TOKEN // A VALUE token.
ALIAS_TOKEN // An ALIAS token.
ANCHOR_TOKEN // An ANCHOR token.
TAG_TOKEN // A TAG token.
SCALAR_TOKEN // A SCALAR token.
COMMENT_TOKEN // A COMMENT token.
)
func (tt TokenType) String() string {
switch tt {
case NO_TOKEN:
return "NO_TOKEN"
case STREAM_START_TOKEN:
return "STREAM_START_TOKEN"
case STREAM_END_TOKEN:
return "STREAM_END_TOKEN"
case VERSION_DIRECTIVE_TOKEN:
return "VERSION_DIRECTIVE_TOKEN"
case TAG_DIRECTIVE_TOKEN:
return "TAG_DIRECTIVE_TOKEN"
case DOCUMENT_START_TOKEN:
return "DOCUMENT_START_TOKEN"
case DOCUMENT_END_TOKEN:
return "DOCUMENT_END_TOKEN"
case BLOCK_SEQUENCE_START_TOKEN:
return "BLOCK_SEQUENCE_START_TOKEN"
case BLOCK_MAPPING_START_TOKEN:
return "BLOCK_MAPPING_START_TOKEN"
case BLOCK_END_TOKEN:
return "BLOCK_END_TOKEN"
case FLOW_SEQUENCE_START_TOKEN:
return "FLOW_SEQUENCE_START_TOKEN"
case FLOW_SEQUENCE_END_TOKEN:
return "FLOW_SEQUENCE_END_TOKEN"
case FLOW_MAPPING_START_TOKEN:
return "FLOW_MAPPING_START_TOKEN"
case FLOW_MAPPING_END_TOKEN:
return "FLOW_MAPPING_END_TOKEN"
case BLOCK_ENTRY_TOKEN:
return "BLOCK_ENTRY_TOKEN"
case FLOW_ENTRY_TOKEN:
return "FLOW_ENTRY_TOKEN"
case KEY_TOKEN:
return "KEY_TOKEN"
case VALUE_TOKEN:
return "VALUE_TOKEN"
case ALIAS_TOKEN:
return "ALIAS_TOKEN"
case ANCHOR_TOKEN:
return "ANCHOR_TOKEN"
case TAG_TOKEN:
return "TAG_TOKEN"
case SCALAR_TOKEN:
return "SCALAR_TOKEN"
case COMMENT_TOKEN:
return "COMMENT_TOKEN"
}
return "<unknown token>"
}
// Token holds information about a scanned token.
type Token struct {
// The token type.
Type TokenType
// The start/end of the token.
StartMark, EndMark Mark
// The stream encoding (for STREAM_START_TOKEN).
encoding Encoding
// The alias/anchor/scalar Value or tag/tag directive handle
// (for ALIAS_TOKEN, ANCHOR_TOKEN, SCALAR_TOKEN, TAG_TOKEN, TAG_DIRECTIVE_TOKEN).
Value []byte
// The tag suffix (for TAG_TOKEN).
suffix []byte
// The tag directive prefix (for TAG_DIRECTIVE_TOKEN).
prefix []byte
// The scalar Style (for SCALAR_TOKEN).
Style ScalarStyle
// The version directive major/minor (for VERSION_DIRECTIVE_TOKEN).
major, minor int8
}
// Events
type EventType int8
// Event types.
const (
// An empty event.
NO_EVENT EventType = iota
STREAM_START_EVENT // A STREAM-START event.
STREAM_END_EVENT // A STREAM-END event.
DOCUMENT_START_EVENT // A DOCUMENT-START event.
DOCUMENT_END_EVENT // A DOCUMENT-END event.
ALIAS_EVENT // An ALIAS event.
SCALAR_EVENT // A SCALAR event.
SEQUENCE_START_EVENT // A SEQUENCE-START event.
SEQUENCE_END_EVENT // A SEQUENCE-END event.
MAPPING_START_EVENT // A MAPPING-START event.
MAPPING_END_EVENT // A MAPPING-END event.
TAIL_COMMENT_EVENT // A TAIL-COMMENT event.
)
var eventStrings = []string{
NO_EVENT: "none",
STREAM_START_EVENT: "stream start",
STREAM_END_EVENT: "stream end",
DOCUMENT_START_EVENT: "document start",
DOCUMENT_END_EVENT: "document end",
ALIAS_EVENT: "alias",
SCALAR_EVENT: "scalar",
SEQUENCE_START_EVENT: "sequence start",
SEQUENCE_END_EVENT: "sequence end",
MAPPING_START_EVENT: "mapping start",
MAPPING_END_EVENT: "mapping end",
TAIL_COMMENT_EVENT: "tail comment",
}
func (e EventType) String() string {
if e < 0 || int(e) >= len(eventStrings) {
return fmt.Sprintf("unknown event %d", e)
}
return eventStrings[e]
}
// Event holds information about a parsing or emitting event.
type Event struct {
// The event type.
Type EventType
// The start and end of the event.
StartMark, EndMark Mark
// The document encoding (for STREAM_START_EVENT).
encoding Encoding
// The version directive (for DOCUMENT_START_EVENT).
versionDirective *VersionDirective
// The list of tag directives (for DOCUMENT_START_EVENT).
tagDirectives []TagDirective
// The comments
HeadComment []byte
LineComment []byte
FootComment []byte
TailComment []byte
// The Anchor (for SCALAR_EVENT, SEQUENCE_START_EVENT, MAPPING_START_EVENT, ALIAS_EVENT).
Anchor []byte
// The Tag (for SCALAR_EVENT, SEQUENCE_START_EVENT, MAPPING_START_EVENT).
Tag []byte
// The scalar Value (for SCALAR_EVENT).
Value []byte
// Is the document start/end indicator Implicit, or the tag optional?
// (for DOCUMENT_START_EVENT, DOCUMENT_END_EVENT, SEQUENCE_START_EVENT, MAPPING_START_EVENT, SCALAR_EVENT).
Implicit bool
// Is the tag optional for any non-plain style? (for SCALAR_EVENT).
quoted_implicit bool
// The Style (for SCALAR_EVENT, SEQUENCE_START_EVENT, MAPPING_START_EVENT).
Style Style
}
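// ScalarStyle returns the event style as a [ScalarStyle] (for SCALAR_EVENT).
func (e *Event) ScalarStyle() ScalarStyle { return ScalarStyle(e.Style) }
// SequenceStyle returns the event style as a [SequenceStyle] (for SEQUENCE_START_EVENT).
func (e *Event) SequenceStyle() SequenceStyle { return SequenceStyle(e.Style) }
// MappingStyle returns the event style as a [MappingStyle] (for MAPPING_START_EVENT).
func (e *Event) MappingStyle() MappingStyle { return MappingStyle(e.Style) }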
// GetEncoding returns the stream encoding (for STREAM_START_EVENT).
func (e *Event) GetEncoding() Encoding { return e.encoding }
// GetVersionDirective returns the version directive (for DOCUMENT_START_EVENT).
func (e *Event) GetVersionDirective() *VersionDirective { return e.versionDirective }
// GetTagDirectives returns the tag directives (for DOCUMENT_START_EVENT).
func (e *Event) GetTagDirectives() []TagDirective { return e.tagDirectives }
// Nodes
const (
NULL_TAG = "tag:yaml.org,2002:null" // The tag !!null with the only possible value: null.
BOOL_TAG = "tag:yaml.org,2002:bool" // The tag !!bool with the values: true and false.
STR_TAG = "tag:yaml.org,2002:str" // The tag !!str for string values.
INT_TAG = "tag:yaml.org,2002:int" // The tag !!int for integer values.
FLOAT_TAG = "tag:yaml.org,2002:float" // The tag !!float for float values.
TIMESTAMP_TAG = "tag:yaml.org,2002:timestamp" // The tag !!timestamp for date and time values.
SEQ_TAG = "tag:yaml.org,2002:seq" // The tag !!seq is used to denote sequences.
MAP_TAG = "tag:yaml.org,2002:map" // The tag !!map is used to denote mappings.
// Not in original libyaml.
BINARY_TAG = "tag:yaml.org,2002:binary"
MERGE_TAG = "tag:yaml.org,2002:merge"
DEFAULT_SCALAR_TAG = STR_TAG // The default scalar tag is !!str.
DEFAULT_SEQUENCE_TAG = SEQ_TAG // The default sequence tag is !!seq.
DEFAULT_MAPPING_TAG = MAP_TAG // The default mapping tag is !!map.
)
type NodeType int
// Node types.
const (
// An empty node.
NO_NODE NodeType = iota
SCALAR_NODE // A scalar node.
SEQUENCE_NODE // A sequence node.
MAPPING_NODE // A mapping node.
)
// NodeItem represents an element of a sequence node.
type NodeItem int
// NodePair represents an element of a mapping node.
type NodePair struct {
key int // The key of the element.
value int // The value of the element.
}
// parserNode represents a single node in the YAML document tree.
type parserNode struct {
typ NodeType // The node type.
tag []byte // The node tag.
// The node data.
// The scalar parameters (for SCALAR_NODE).
scalar struct {
value []byte // The scalar value.
length int // The length of the scalar value.
style ScalarStyle // The scalar style.
}
// The sequence parameters (for SEQUENCE_NODE).
sequence struct {
items_data []NodeItem // The stack of sequence items.
style SequenceStyle // The sequence style.
}
// The mapping parameters (for MAPPING_NODE).
mapping struct {
pairs_data []NodePair // The stack of mapping pairs (key, value).
pairs_start *NodePair // The beginning of the stack.
pairs_end *NodePair // The end of the stack.
pairs_top *NodePair // The top of the stack.
style MappingStyle // The mapping style.
}
start_mark Mark // The beginning of the node.
end_mark Mark // The end of the node.
}
// Document structure.
type Document struct {
// The document nodes.
nodes []parserNode
// The version directive.
version_directive *VersionDirective
// The list of tag directives.
tag_directives_data []TagDirective
tag_directives_start int // The beginning of the tag directives list.
tag_directives_end int // The end of the tag directives list.
start_implicit int // Is the document start indicator implicit?
end_implicit int // Is the document end indicator implicit?
// The start/end of the document.
start_mark, end_mark Mark
}
// ReadHandler is called when the [Parser] needs to read more bytes from the
// source. The handler should copy at most len(buffer) bytes into buffer and
// return the number of bytes written. It should return io.EOF once the source
// is exhausted; any other non-nil error is reported as a read failure.
type ReadHandler func(parser *Parser, buffer []byte) (n int, err error)
// SimpleKey holds information about a potential simple key.
type SimpleKey struct {
flow_level int // What flow level is the key at?
required bool // Is a simple key required?
token_number int // The number of the token.
mark Mark // The position mark.
}
// ParserState represents the state of the parser.
type ParserState int
const (
PARSE_STREAM_START_STATE ParserState = iota
PARSE_IMPLICIT_DOCUMENT_START_STATE // Expect the beginning of an implicit document.
PARSE_DOCUMENT_START_STATE // Expect DOCUMENT-START.
PARSE_DOCUMENT_CONTENT_STATE // Expect the content of a document.
PARSE_DOCUMENT_END_STATE // Expect DOCUMENT-END.
PARSE_BLOCK_NODE_STATE // Expect a block node.
PARSE_BLOCK_SEQUENCE_FIRST_ENTRY_STATE // Expect the first entry of a block sequence.
PARSE_BLOCK_SEQUENCE_ENTRY_STATE // Expect an entry of a block sequence.
PARSE_INDENTLESS_SEQUENCE_ENTRY_STATE // Expect an entry of an indentless sequence.
PARSE_BLOCK_MAPPING_FIRST_KEY_STATE // Expect the first key of a block mapping.
PARSE_BLOCK_MAPPING_KEY_STATE // Expect a block mapping key.
PARSE_BLOCK_MAPPING_VALUE_STATE // Expect a block mapping value.
PARSE_FLOW_SEQUENCE_FIRST_ENTRY_STATE // Expect the first entry of a flow sequence.
PARSE_FLOW_SEQUENCE_ENTRY_STATE // Expect an entry of a flow sequence.
PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_KEY_STATE // Expect a key of an ordered mapping.
PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_VALUE_STATE // Expect a value of an ordered mapping.
PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_END_STATE // Expect the end of an ordered mapping entry.
PARSE_FLOW_MAPPING_FIRST_KEY_STATE // Expect the first key of a flow mapping.
PARSE_FLOW_MAPPING_KEY_STATE // Expect a key of a flow mapping.
PARSE_FLOW_MAPPING_VALUE_STATE // Expect a value of a flow mapping.
PARSE_FLOW_MAPPING_EMPTY_VALUE_STATE // Expect an empty value of a flow mapping.
PARSE_END_STATE // Expect nothing.
)
func (ps ParserState) String() string {
switch ps {
case PARSE_STREAM_START_STATE:
return "PARSE_STREAM_START_STATE"
case PARSE_IMPLICIT_DOCUMENT_START_STATE:
return "PARSE_IMPLICIT_DOCUMENT_START_STATE"
case PARSE_DOCUMENT_START_STATE:
return "PARSE_DOCUMENT_START_STATE"
case PARSE_DOCUMENT_CONTENT_STATE:
return "PARSE_DOCUMENT_CONTENT_STATE"
case PARSE_DOCUMENT_END_STATE:
return "PARSE_DOCUMENT_END_STATE"
case PARSE_BLOCK_NODE_STATE:
return "PARSE_BLOCK_NODE_STATE"
case PARSE_BLOCK_SEQUENCE_FIRST_ENTRY_STATE:
return "PARSE_BLOCK_SEQUENCE_FIRST_ENTRY_STATE"
case PARSE_BLOCK_SEQUENCE_ENTRY_STATE:
return "PARSE_BLOCK_SEQUENCE_ENTRY_STATE"
case PARSE_INDENTLESS_SEQUENCE_ENTRY_STATE:
return "PARSE_INDENTLESS_SEQUENCE_ENTRY_STATE"
case PARSE_BLOCK_MAPPING_FIRST_KEY_STATE:
return "PARSE_BLOCK_MAPPING_FIRST_KEY_STATE"
case PARSE_BLOCK_MAPPING_KEY_STATE:
return "PARSE_BLOCK_MAPPING_KEY_STATE"
case PARSE_BLOCK_MAPPING_VALUE_STATE:
return "PARSE_BLOCK_MAPPING_VALUE_STATE"
case PARSE_FLOW_SEQUENCE_FIRST_ENTRY_STATE:
return "PARSE_FLOW_SEQUENCE_FIRST_ENTRY_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_KEY_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_KEY_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_VALUE_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_VALUE_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_END_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_END_STATE"
case PARSE_FLOW_MAPPING_FIRST_KEY_STATE:
return "PARSE_FLOW_MAPPING_FIRST_KEY_STATE"
case PARSE_FLOW_MAPPING_KEY_STATE:
return "PARSE_FLOW_MAPPING_KEY_STATE"
case PARSE_FLOW_MAPPING_VALUE_STATE:
return "PARSE_FLOW_MAPPING_VALUE_STATE"
case PARSE_FLOW_MAPPING_EMPTY_VALUE_STATE:
return "PARSE_FLOW_MAPPING_EMPTY_VALUE_STATE"
case PARSE_END_STATE:
return "PARSE_END_STATE"
}
return "<unknown parser state>"
}
// AliasData holds information about aliases.
type AliasData struct {
anchor []byte // The anchor.
index int // The node id.
mark Mark // The anchor mark.
}
// Parser structure holds all information about the current
// state of the parser.
type Parser struct {
lastError error
// Reader stuff
read_handler ReadHandler // Read handler.
input_reader io.Reader // File input data.
input []byte // String input data.
input_pos int
eof bool // EOF flag
buffer []byte // The working buffer.
buffer_pos int // The current position of the buffer.
unread int // The number of unread characters in the buffer.
newlines int // The number of line breaks since last non-break/non-blank character
raw_buffer []byte // The raw buffer.
raw_buffer_pos int // The current position of the raw buffer.
encoding Encoding // The input encoding.
offset int // The offset of the current position (in bytes).
mark Mark // The mark of the current position.
// Comments
HeadComment []byte // The current head comments
LineComment []byte // The current line comments
FootComment []byte // The current foot comments
tail_comment []byte // Foot comment that happens at the end of a block.
stem_comment []byte // Comment in item preceding a nested structure (list inside list item, etc.)
comments []Comment // The folded comments for all parsed tokens
comments_head int
// Scanner stuff
stream_start_produced bool // Have we started to scan the input stream?
stream_end_produced bool // Have we reached the end of the input stream?
flow_level int // The number of unclosed '[' and '{' indicators.
tokens []Token // The tokens queue.
tokens_head int // The head of the tokens queue.
tokens_parsed int // The number of tokens fetched from the queue.
token_available bool // Does the tokens queue contain a token ready for dequeueing?
indent int // The current indentation level.
indents []int // The indentation levels stack.
simple_key_allowed bool // May a simple key occur at the current position?
simple_key_possible bool // Is the current simple key possible?
simple_key SimpleKey // The current simple key.
simple_key_stack []SimpleKey // The stack of simple keys.
// Parser stuff
state ParserState // The current parser state.
states []ParserState // The parser states stack.
marks []Mark // The stack of marks.
tag_directives []TagDirective // The list of TAG directives.
// Representer stuff
aliases []AliasData // The alias data.
document *Document // The currently parsed document.
}
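// Comment holds a comment collected by the scanner, together with the marks
// used to attach it to nearby tokens.
type Comment struct {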
ScanMark Mark // Position where scanning for comments started
TokenMark Mark // Position after which tokens will be associated with this comment
StartMark Mark // Position of '#' comment mark
EndMark Mark // Position where comment terminated
Head []byte
Line []byte
Foot []byte
}
// Emitter Definitions
// WriteHandler is called when the [Emitter] needs to flush the accumulated
// characters to the output. The handler should write all of buffer to the
// output and return a non-nil error if it cannot; flush wraps such an error
// in a [WriterError].
type WriteHandler func(emitter *Emitter, buffer []byte) error
type EmitterState int
// The emitter states.
const (
// Expect STREAM-START.
EMIT_STREAM_START_STATE EmitterState = iota
EMIT_FIRST_DOCUMENT_START_STATE // Expect the first DOCUMENT-START or STREAM-END.
EMIT_DOCUMENT_START_STATE // Expect DOCUMENT-START or STREAM-END.
EMIT_DOCUMENT_CONTENT_STATE // Expect the content of a document.
EMIT_DOCUMENT_END_STATE // Expect DOCUMENT-END.
EMIT_FLOW_SEQUENCE_FIRST_ITEM_STATE // Expect the first item of a flow sequence.
EMIT_FLOW_SEQUENCE_TRAIL_ITEM_STATE // Expect the next item of a flow sequence, with the comma already written out
EMIT_FLOW_SEQUENCE_ITEM_STATE // Expect an item of a flow sequence.
EMIT_FLOW_MAPPING_FIRST_KEY_STATE // Expect the first key of a flow mapping.
EMIT_FLOW_MAPPING_TRAIL_KEY_STATE // Expect the next key of a flow mapping, with the comma already written out
EMIT_FLOW_MAPPING_KEY_STATE // Expect a key of a flow mapping.
EMIT_FLOW_MAPPING_SIMPLE_VALUE_STATE // Expect a value for a simple key of a flow mapping.
EMIT_FLOW_MAPPING_VALUE_STATE // Expect a value of a flow mapping.
EMIT_BLOCK_SEQUENCE_FIRST_ITEM_STATE // Expect the first item of a block sequence.
EMIT_BLOCK_SEQUENCE_ITEM_STATE // Expect an item of a block sequence.
EMIT_BLOCK_MAPPING_FIRST_KEY_STATE // Expect the first key of a block mapping.
EMIT_BLOCK_MAPPING_KEY_STATE // Expect the key of a block mapping.
EMIT_BLOCK_MAPPING_SIMPLE_VALUE_STATE // Expect a value for a simple key of a block mapping.
EMIT_BLOCK_MAPPING_VALUE_STATE // Expect a value of a block mapping.
EMIT_END_STATE // Expect nothing.
)
// Emitter holds all information about the current state of the emitter.
type Emitter struct {
// Writer stuff
write_handler WriteHandler // Write handler.
output_buffer *[]byte // String output data.
output_writer io.Writer // File output data.
buffer []byte // The working buffer.
buffer_pos int // The current position of the buffer.
encoding Encoding // The stream encoding.
// Emitter stuff
canonical bool // Is the output in the canonical style?
BestIndent int // The number of indentation spaces.
best_width int // The preferred width of the output lines.
unicode bool // Allow unescaped non-ASCII characters?
line_break LineBreak // The preferred line break.
quotePreference QuoteStyle // Preferred quote style when quoting is required.
state EmitterState // The current emitter state.
states []EmitterState // The stack of states.
events []Event // The event queue.
events_head int // The head of the event queue.
indents []int // The stack of indentation levels.
tag_directives []TagDirective // The list of tag directives.
indent int // The current indentation level.
CompactSequenceIndent bool // Is '- ' considered part of the indentation for sequence elements?
flow_level int // The current flow level.
root_context bool // Is it the document root context?
sequence_context bool // Is it a sequence context?
mapping_context bool // Is it a mapping context?
simple_key_context bool // Is it a simple mapping key context?
line int // The current line.
column int // The current column.
whitespace bool // Was the last character whitespace?
indention bool // Was the last character an indentation character (' ', '-', '?', ':')?
OpenEnded bool // Is an explicit document end required?
space_above bool // Is there an empty line above?
foot_indent int // The indent used to write the foot comment above, or -1 if none.
// Anchor analysis.
anchor_data struct {
anchor []byte // The anchor value.
alias bool // Is it an alias?
}
// Tag analysis.
tag_data struct {
handle []byte // The tag handle.
suffix []byte // The tag suffix.
}
// Scalar analysis.
scalar_data struct {
value []byte // The scalar value.
multiline bool // Does the scalar contain line breaks?
flow_plain_allowed bool // Can the scalar be expressed in the flow plain style?
block_plain_allowed bool // Can the scalar be expressed in the block plain style?
single_quoted_allowed bool // Can the scalar be expressed in the single quoted style?
block_allowed bool // Can the scalar be expressed in the literal or folded styles?
style ScalarStyle // The output style.
}
// Comments
HeadComment []byte
LineComment []byte
FootComment []byte
TailComment []byte
key_line_comment []byte
// Representer stuff
opened bool // Has the stream already been opened?
closed bool // Has the stream already been closed?
// The information associated with the document nodes.
anchors *struct {
references int // The number of references.
anchor int // The anchor id.
serialized bool // Has the node been emitted?
}
last_anchor_id int // The last assigned anchor id.
document *Document // The currently emitted document.
}
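The `indent` and `indents` fields act as a stack of indentation levels. The following is a minimal sketch of that bookkeeping, assuming libyaml-style behavior: entering a block collection saves the current level and moves `BestIndent` columns deeper, and leaving it restores the saved level. The `indentStack` type and its `push`/`pop` methods are hypothetical, the use of -1 as the level before the root node is an assumption, and the real emitter also distinguishes flow and indentless contexts, which this sketch ignores.

```go
package main

import "fmt"

// indentStack is a hypothetical model of the emitter's indent/indents fields.
type indentStack struct {
	best    int   // spaces per nesting level (cf. BestIndent)
	current int   // current indentation level (cf. indent)
	saved   []int // outer levels to restore later (cf. indents)
}

// push saves the current level and moves one nesting level deeper.
// A negative level means "no enclosing block collection yet", so the
// first level starts at column 0 (an assumption for this sketch).
func (s *indentStack) push() {
	s.saved = append(s.saved, s.current)
	if s.current < 0 {
		s.current = 0
	} else {
		s.current += s.best
	}
}

// pop restores the level that was active before the matching push.
func (s *indentStack) pop() {
	s.current = s.saved[len(s.saved)-1]
	s.saved = s.saved[:len(s.saved)-1]
}

func main() {
	s := &indentStack{best: 2, current: -1}
	s.push()               // root mapping
	s.push()               // nested mapping
	fmt.Println(s.current) // 2
	s.pop()
	fmt.Println(s.current) // 0
	s.pop()
	fmt.Println(s.current) // -1
}
```

Keeping the saved levels on a slice rather than recomputing them means a collection's closing event needs no knowledge of how deep it was.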
@@ -0,0 +1,192 @@
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// YAML test data loading utilities.
// Provides helper functions for loading and processing YAML test data,
// including scalar coercion.
package libyaml
import (
"errors"
"io"
"strconv"
"strings"
)
// coerceScalar converts a plain YAML scalar string to an appropriate Go type.
// Numeric forms must match the whole string: fmt.Sscanf would accept a
// numeric prefix and ignore the rest (for example "1.2.3" would become 1.2).
func coerceScalar(value string) any {
// Try bool and null
switch value {
case "true":
return true
case "false":
return false
case "null":
return nil
}
// Try hex int (0x or 0X prefix) - needed for test data byte arrays
if lower := strings.ToLower(value); strings.HasPrefix(lower, "0x") {
if hexVal, err := strconv.ParseInt(lower[2:], 16, strconv.IntSize); err == nil {
return int(hexVal)
}
}
// Try float for values containing "."
if strings.Contains(value, ".") {
if floatVal, err := strconv.ParseFloat(value, 64); err == nil {
return floatVal
}
}
// Try decimal int - use int64 to handle large values on 32-bit systems
if int64Val, err := strconv.ParseInt(value, 10, 64); err == nil {
// Return as int if it fits, otherwise int64
if int64Val == int64(int(int64Val)) {
return int(int64Val)
}
return int64Val
}
// Default to string
return value
}
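The number-scanning behavior that scalar coercion has to guard against is easy to see directly: `fmt.Sscanf` stops at the first byte that cannot continue the number and does not report unconsumed input, whereas `strconv.ParseInt` rejects anything that is not a complete integer. The helpers `sscanfInt` and `strictInt` below are hypothetical, written only for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// sscanfInt returns what fmt.Sscanf's %d verb extracts from s. Sscanf stops
// at the first byte that cannot continue the number and does not report
// unconsumed input.
func sscanfInt(s string) (int64, error) {
	var n int64
	_, err := fmt.Sscanf(s, "%d", &n)
	return n, err
}

// strictInt accepts s only if the whole string is a base-10 integer.
func strictInt(s string) (int64, error) {
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	for _, in := range []string{"42", "1.5", "10:30", "3 apples"} {
		n, errScan := sscanfInt(in)
		_, errStrict := strictInt(in)
		fmt.Printf("%q: Sscanf=%d ok=%t, ParseInt ok=%t\n", in, n, errScan == nil, errStrict == nil)
	}
}
```

For "1.5" the `%d` verb yields 1 without error, which is why a float has to be recognized before an integer when `Sscanf` is used, and why a full-string parser is the safer choice.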
// LoadYAML parses YAML data using the native libyaml Parser.
// This function is exported so it can be used by other packages for data-driven testing.
// It returns a value of type any, which is typically:
// - map[string]any for YAML mappings
// - []any for YAML sequences
// - a scalar value. Mapping keys are always kept as strings and quoted
// scalars are returned as string; plain scalars are resolved according to
// the following rules:
// - Booleans: "true" and "false" are returned as bool (true/false).
// - Nulls: "null" is returned as nil.
// - Hex integers: values with a "0x" or "0X" prefix are parsed as int.
// - Floats: values containing "." are parsed as float64.
// - Decimal integers: values matching integer format are parsed as int,
// or as int64 if they do not fit in int.
// - All other values are returned as string.
//
// This scalar resolution behavior matches the implementation in coerceScalar.
func LoadYAML(data []byte) (any, error) {
parser := NewParser()
parser.SetInputString(data)
defer parser.Delete()
type stackEntry struct {
container any // map[string]interface{} or []interface{}
key string // for maps: current key waiting for value
}
var stack []stackEntry
var root any
for {
var event Event
if err := parser.Parse(&event); err != nil {
if errors.Is(err, io.EOF) {
break
}
return nil, err
}
switch event.Type {
case STREAM_END_EVENT:
// End of stream, we're done
return root, nil
case STREAM_START_EVENT, DOCUMENT_START_EVENT:
// Structural markers, no action needed
case MAPPING_START_EVENT:
newMap := make(map[string]any)
stack = append(stack, stackEntry{container: newMap})
case MAPPING_END_EVENT:
if len(stack) > 0 {
popped := stack[len(stack)-1]
stack = stack[:len(stack)-1]
// Add completed map to parent or set as root
if len(stack) == 0 {
root = popped.container
} else {
parent := &stack[len(stack)-1]
if m, ok := parent.container.(map[string]any); ok {
m[parent.key] = popped.container
parent.key = "" // Reset key after use
} else if s, ok := parent.container.([]any); ok {
parent.container = append(s, popped.container)
}
}
}
case SEQUENCE_START_EVENT:
newSlice := make([]any, 0)
stack = append(stack, stackEntry{container: newSlice})
case SEQUENCE_END_EVENT:
if len(stack) > 0 {
popped := stack[len(stack)-1]
stack = stack[:len(stack)-1]
// Add completed slice to parent or set as root
if len(stack) == 0 {
root = popped.container
} else {
parent := &stack[len(stack)-1]
if m, ok := parent.container.(map[string]any); ok {
m[parent.key] = popped.container
parent.key = "" // Reset key after use
} else if s, ok := parent.container.([]any); ok {
parent.container = append(s, popped.container)
}
}
}
case SCALAR_EVENT:
value := string(event.Value)
// Only coerce plain (unquoted) scalars
isQuoted := ScalarStyle(event.Style) != PLAIN_SCALAR_STYLE
if len(stack) == 0 {
// Scalar at root level
if isQuoted {
root = value
} else {
root = coerceScalar(value)
}
} else {
parent := &stack[len(stack)-1]
if m, ok := parent.container.(map[string]any); ok {
if parent.key == "" {
// This scalar is a key - keep as string, don't coerce
parent.key = value
} else {
// This scalar is a value
if isQuoted {
m[parent.key] = value
} else {
m[parent.key] = coerceScalar(value)
}
parent.key = ""
}
} else if s, ok := parent.container.([]any); ok {
// Add to sequence
if isQuoted {
parent.container = append(s, value)
} else {
parent.container = append(s, coerceScalar(value))
}
}
}
case DOCUMENT_END_EVENT:
// Document end marker, continue processing
case ALIAS_EVENT, TAIL_COMMENT_EVENT:
// For now, skip aliases and comments (not used in test data)
}
}
return root, nil
}
@@ -0,0 +1,249 @@
// Copyright 2006-2010 Kirill Simonov
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0 AND MIT
// Internal constants and buffer sizes.
// Defines buffer sizes, stack sizes, and other internal configuration
// constants for libyaml.
package libyaml
const (
// The size of the input raw buffer.
input_raw_buffer_size = 512
// The size of the input buffer.
// It should be possible to decode the whole raw buffer.
input_buffer_size = input_raw_buffer_size * 3
// The size of the output buffer.
output_buffer_size = 128
// The size of other stacks and queues.
initial_stack_size = 16
initial_queue_size = 16
initial_string_size = 16
)
// Check if the character at the specified position is an alphabetical
// character, a digit, '_', or '-'.
func isAlpha(b []byte, i int) bool {
return b[i] >= '0' && b[i] <= '9' || b[i] >= 'A' && b[i] <= 'Z' ||
b[i] >= 'a' && b[i] <= 'z' || b[i] == '_' || b[i] == '-'
}
// Check if the character at the specified position is a flow indicator as
// defined by spec production [23] c-flow-indicator ::=
// c-collect-entry | c-sequence-start | c-sequence-end |
// c-mapping-start | c-mapping-end
func isFlowIndicator(b []byte, i int) bool {
return b[i] == '[' || b[i] == ']' ||
b[i] == '{' || b[i] == '}' || b[i] == ','
}
// Check if the character at the specified position is valid for anchor names
// as defined by spec production [102] ns-anchor-char ::= ns-char -
// c-flow-indicator.
// This includes all printable characters except: CR, LF, BOM, space, tab, '[',
// ']', '{', '}', ','.
// We further restrict it to ASCII characters only, which is a subset of the
// spec production but is usually what most people expect.
func isAnchorChar(b []byte, i int) bool {
if isColon(b, i) {
// [Go] we exclude colons from anchor/alias names.
//
// A colon is a valid anchor character according to the YAML 1.2 specification,
// but it can lead to ambiguity.
// https://github.com/yaml/go-yaml/issues/109
//
// Also, it would have been a breaking change to support it, as go.yaml.in/yaml/v3 ignores it.
// Supporting it could lead to unexpected behavior.
return false
}
return isPrintable(b, i) &&
!isLineBreak(b, i) &&
!isBlank(b, i) &&
!isBOM(b, i) &&
!isFlowIndicator(b, i) &&
isASCII(b, i)
}
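Taken together, these checks limit anchor and alias names to the printable ASCII characters from `!` (0x21) to `~` (0x7E), minus the flow indicators `[ ] { } ,` and the colon. The hypothetical `validAnchorName` below applies the same rule to a whole name; it is a standalone sketch, not a function of this package. It also rejects the empty string, since the spec's `ns-anchor-name` requires at least one character.

```go
package main

import (
	"fmt"
	"strings"
)

// validAnchorName is a hypothetical helper mirroring isAnchorChar for every
// byte of name: printable ASCII excluding space, flow indicators, and ':'.
func validAnchorName(name string) bool {
	if name == "" {
		return false // an anchor name needs at least one character
	}
	for i := 0; i < len(name); i++ {
		c := name[i]
		if c < 0x21 || c > 0x7E || strings.IndexByte("[]{},:", c) >= 0 {
			return false
		}
	}
	return true
}

func main() {
	for _, n := range []string{"base", "step-1", "a.b", "key:val", "x]", "café"} {
		fmt.Printf("%-8s %v\n", n, validAnchorName(n))
	}
}
```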
// isColon checks whether the character at the specified position is a colon.
func isColon(b []byte, i int) bool {
return b[i] == ':'
}
// Check if the character at the specified position is valid in a tag URI.
//
// The set of valid characters is:
//
// '0'-'9', 'A'-'Z', 'a'-'z', '_', '-', ';', '/', '?', ':', '@', '&',
// '=', '+', '$', '.', '!', '~', '*', '\'', '(', ')', '%'.
//
// If verbatim is true, flow indicators (',', '[', ']', '{', '}') are also
// allowed.
func isTagURIChar(b []byte, i int, verbatim bool) bool {
c := b[i]
// isAlpha covers: 0-9, A-Z, a-z, _, -
if isAlpha(b, i) {
return true
}
// Check special URI characters
switch c {
case ';', '/', '?', ':', '@', '&', '=', '+', '$', '.', '!', '~', '*', '\'', '(', ')', '%':
return true
case ',', '[', ']', '{', '}':
return verbatim
}
return false
}
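The `verbatim` flag matters for URIs that contain flow indicators, such as `tag:yaml.org,2002:str`: with `verbatim` false the comma is rejected, and with it true the whole URI is accepted. Verbatim tags are written `!<...>` in YAML, which is presumably where the scanner sets the flag. The sketch below restates the character set as a standalone, hypothetical `tagURIOK` helper to show the difference.

```go
package main

import (
	"fmt"
	"strings"
)

// tagURIOK is a hypothetical helper that applies the isTagURIChar rules to
// every byte of uri.
func tagURIOK(uri string, verbatim bool) bool {
	for i := 0; i < len(uri); i++ {
		c := uri[i]
		switch {
		case c >= '0' && c <= '9', c >= 'A' && c <= 'Z', c >= 'a' && c <= 'z', c == '_', c == '-':
			// alphanumerics plus '_' and '-', as covered by isAlpha
		case strings.IndexByte(";/?:@&=+$.!~*'()%", c) >= 0:
			// special URI characters, always allowed
		case strings.IndexByte(",[]{}", c) >= 0:
			// flow indicators are allowed only in verbatim tags
			if !verbatim {
				return false
			}
		default:
			return false
		}
	}
	return true
}

func main() {
	uri := "tag:yaml.org,2002:str"
	fmt.Println(tagURIOK(uri, false)) // false: ',' is a flow indicator
	fmt.Println(tagURIOK(uri, true))  // true
}
```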
// Check if the character at the specified position is a digit.
func isDigit(b []byte, i int) bool {
return b[i] >= '0' && b[i] <= '9'
}
// Get the value of a digit.
func asDigit(b []byte, i int) int {
return int(b[i]) - '0'
}
// Check if the character at the specified position is a hex-digit.
func isHex(b []byte, i int) bool {
return b[i] >= '0' && b[i] <= '9' || b[i] >= 'A' && b[i] <= 'F' ||
b[i] >= 'a' && b[i] <= 'f'
}
// Get the value of a hex-digit.
func asHex(b []byte, i int) int {
bi := b[i]
if bi >= 'A' && bi <= 'F' {
return int(bi) - 'A' + 10
}
if bi >= 'a' && bi <= 'f' {
return int(bi) - 'a' + 10
}
return int(bi) - '0'
}
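`isHex` and `asHex` are the building blocks for numeric escapes in double-quoted scalars, such as `\xE9` or `\u00E9`, whose digits are accumulated most-significant first. Below is a standalone sketch of that accumulation, with a hypothetical `decodeHexDigits` helper and local copies of the two predicates that take a byte instead of a slice and index.

```go
package main

import "fmt"

// isHexByte and asHexByte mirror isHex and asHex for a single byte.
func isHexByte(c byte) bool {
	return c >= '0' && c <= '9' || c >= 'A' && c <= 'F' || c >= 'a' && c <= 'f'
}

func asHexByte(c byte) int {
	switch {
	case c >= 'A' && c <= 'F':
		return int(c) - 'A' + 10
	case c >= 'a' && c <= 'f':
		return int(c) - 'a' + 10
	}
	return int(c) - '0'
}

// decodeHexDigits (hypothetical) turns the digits of an escape such as the
// "00E9" in "\u00E9" into a code point, rejecting non-hex input.
func decodeHexDigits(digits string) (rune, error) {
	var value rune
	for i := 0; i < len(digits); i++ {
		if !isHexByte(digits[i]) {
			return 0, fmt.Errorf("invalid hex digit %q", digits[i])
		}
		value = value<<4 + rune(asHexByte(digits[i])) // shift in one nibble
	}
	return value, nil
}

func main() {
	r, err := decodeHexDigits("00E9")
	fmt.Printf("%c %v\n", r, err) // é <nil>
}
```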
// Check if the character is ASCII.
func isASCII(b []byte, i int) bool {
return b[i] <= 0x7F
}
// Check if the character at the specified position can be printed unescaped.
func isPrintable(b []byte, i int) bool {
return ((b[i] == 0x0A) || // . == #x0A
(b[i] >= 0x20 && b[i] <= 0x7E) || // #x20 <= . <= #x7E
(b[i] == 0xC2 && b[i+1] >= 0xA0) || // #xA0 <= . <= #xD7FF
(b[i] > 0xC2 && b[i] < 0xED) ||
(b[i] == 0xED && b[i+1] < 0xA0) ||
(b[i] == 0xEE) ||
(b[i] == 0xEF && // #xE000 <= . <= #xFFFD
!(b[i+1] == 0xBB && b[i+2] == 0xBF) && // && . != #xFEFF
!(b[i+1] == 0xBF && (b[i+2] == 0xBE || b[i+2] == 0xBF))))
}
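Because the predicate works on raw UTF-8 bytes, it helps to see which encoded characters pass. The program below copies the logic of `isPrintable` and feeds it the UTF-8 encoding of a few characters. As an observation about this code rather than the YAML specification: it has no clause for 4-byte sequences, so characters above U+FFFF are reported as not printable here.

```go
package main

import "fmt"

// isPrintable is copied from the package so this example runs standalone.
func isPrintable(b []byte, i int) bool {
	return ((b[i] == 0x0A) || // . == #x0A
		(b[i] >= 0x20 && b[i] <= 0x7E) || // #x20 <= . <= #x7E
		(b[i] == 0xC2 && b[i+1] >= 0xA0) || // #xA0 <= . <= #xD7FF
		(b[i] > 0xC2 && b[i] < 0xED) ||
		(b[i] == 0xED && b[i+1] < 0xA0) ||
		(b[i] == 0xEE) ||
		(b[i] == 0xEF && // #xE000 <= . <= #xFFFD
			!(b[i+1] == 0xBB && b[i+2] == 0xBF) && // && . != #xFEFF
			!(b[i+1] == 0xBF && (b[i+2] == 0xBE || b[i+2] == 0xBF))))
}

func main() {
	cases := []struct {
		name string
		r    rune
	}{
		{"A", 'A'},
		{"TAB", '\t'},
		{"NEL", '\u0085'},
		{"NBSP", '\u00A0'},
		{"e-acute", '\u00E9'},
		{"BOM", '\uFEFF'},
		{"U+FFFD", '\uFFFD'},
		{"U+1F600", '\U0001F600'},
	}
	for _, c := range cases {
		// Encode the rune as UTF-8 and test its first byte position.
		fmt.Printf("%-8s %v\n", c.name, isPrintable([]byte(string(c.r)), 0))
	}
}
```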
// Check if the character at the specified position is NUL.
func isZeroChar(b []byte, i int) bool {
return b[i] == 0x00
}
// Check if the beginning of the buffer is a BOM. The index i is ignored:
// the check always inspects b[0:3].
func isBOM(b []byte, i int) bool {
return b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF
}
// Check if the character at the specified position is space.
func isSpace(b []byte, i int) bool {
return b[i] == ' '
}
// Check if the character at the specified position is tab.
func isTab(b []byte, i int) bool {
return b[i] == '\t'
}
// Check if the character at the specified position is blank (space or tab).
func isBlank(b []byte, i int) bool {
// return isSpace(b, i) || isTab(b, i)
return b[i] == ' ' || b[i] == '\t'
}
// Check if the character at the specified position is a line break.
func isLineBreak(b []byte, i int) bool {
return (b[i] == '\r' || // CR (#xD)
b[i] == '\n' || // LF (#xA)
b[i] == 0xC2 && b[i+1] == 0x85 || // NEL (#x85)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA8 || // LS (#x2028)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA9) // PS (#x2029)
}
// Check if the characters at the specified position are CR followed by LF.
func isCRLF(b []byte, i int) bool {
return b[i] == '\r' && b[i+1] == '\n'
}
// Check if the character is a line break or NUL.
func isBreakOrZero(b []byte, i int) bool {
// return isLineBreak(b, i) || isZeroChar(b, i)
return (
// isBreak:
b[i] == '\r' || // CR (#xD)
b[i] == '\n' || // LF (#xA)
b[i] == 0xC2 && b[i+1] == 0x85 || // NEL (#x85)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA8 || // LS (#x2028)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA9 || // PS (#x2029)
// isZeroChar:
b[i] == 0)
}
// Check if the character is a line break, space, or NUL.
func isSpaceOrZero(b []byte, i int) bool {
// return isSpace(b, i) || isBreakOrZero(b, i)
return (
// isSpace:
b[i] == ' ' ||
// isBreakOrZero:
b[i] == '\r' || // CR (#xD)
b[i] == '\n' || // LF (#xA)
b[i] == 0xC2 && b[i+1] == 0x85 || // NEL (#x85)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA8 || // LS (#x2028)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA9 || // PS (#x2029)
b[i] == 0)
}
// Check if the character is a line break, space, tab, or NUL.
func isBlankOrZero(b []byte, i int) bool {
// return isBlank(b, i) || isBreakOrZero(b, i)
return (
// isBlank:
b[i] == ' ' || b[i] == '\t' ||
// isBreakOrZero:
b[i] == '\r' || // CR (#xD)
b[i] == '\n' || // LF (#xA)
b[i] == 0xC2 && b[i+1] == 0x85 || // NEL (#x85)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA8 || // LS (#x2028)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA9 || // PS (#x2029)
b[i] == 0)
}
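// Determine the width in bytes of a UTF-8 character from its leading byte.
// Returns 0 for continuation bytes (0x80-0xBF) and for bytes 0xF8-0xFF.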
func width(b byte) int {
// Don't replace these by a switch without first
// confirming that it is being inlined.
if b&0x80 == 0x00 {
return 1
}
if b&0xE0 == 0xC0 {
return 2
}
if b&0xF0 == 0xE0 {
return 3
}
if b&0xF8 == 0xF0 {
return 4
}
return 0
}
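`width` and the line-break predicates are typically used together to walk a UTF-8 buffer one character at a time. The sketch below counts YAML line breaks (CR, LF, NEL, LS, PS), treating CR LF as a single break. It is a standalone illustration with hypothetical helper names (`hasPrefixAt`, `countLineBreaks`), and unlike the package predicates it bounds-checks every look-ahead instead of assuming the caller has buffered enough bytes.

```go
package main

import "fmt"

// width is copied from the package: UTF-8 sequence length from its first byte.
func width(b byte) int {
	if b&0x80 == 0x00 {
		return 1
	}
	if b&0xE0 == 0xC0 {
		return 2
	}
	if b&0xF0 == 0xE0 {
		return 3
	}
	if b&0xF8 == 0xF0 {
		return 4
	}
	return 0
}

// hasPrefixAt reports whether b[i:] starts with seq, without reading past b.
func hasPrefixAt(b []byte, i int, seq ...byte) bool {
	if i+len(seq) > len(b) {
		return false
	}
	for k, c := range seq {
		if b[i+k] != c {
			return false
		}
	}
	return true
}

// countLineBreaks counts CR, LF, NEL, LS and PS breaks, with CR LF counted
// once, advancing by whole UTF-8 characters.
func countLineBreaks(b []byte) int {
	n := 0
	for i := 0; i < len(b); {
		switch {
		case hasPrefixAt(b, i, '\r', '\n'):
			n++
			i += 2
			continue
		case b[i] == '\r', b[i] == '\n',
			hasPrefixAt(b, i, 0xC2, 0x85),       // NEL
			hasPrefixAt(b, i, 0xE2, 0x80, 0xA8), // LS
			hasPrefixAt(b, i, 0xE2, 0x80, 0xA9): // PS
			n++
		}
		w := width(b[i])
		if w == 0 {
			w = 1 // skip a stray continuation byte
		}
		i += w
	}
	return n
}

func main() {
	text := []byte("a\r\nb\u2028c\u0085d\ne")
	fmt.Println(countLineBreaks(text)) // 4
}
```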