updated vendor

2026-06-16 08:02:19 +02:00
parent 2f7f99d3f0
commit 77299d0c64
1283 changed files with 67302 additions and 208958 deletions
+34 -18
@@ -23,7 +23,7 @@ The `internal/libyaml` package implements the core YAML processing stages:
- **yaml.go** - Core types and constants (Event, Token, enums)
- **reader.go** - Input handling and encoding detection
- **writer.go** - Output handling
- **yamlprivate.go** - Internal types and helper functions
- **util.go** - Internal types and helper functions
### Test Files
@@ -34,10 +34,11 @@ The `internal/libyaml` package implements the core YAML processing stages:
- **yaml_test.go** - Utility function tests
- **reader_test.go** - Reader tests
- **writer_test.go** - Writer tests
- **yamlprivate_test.go** - Character classification tests
- **util_test.go** - Character classification tests
- **loader_test.go** - Data loader scalar resolution tests
- **yamldatatest_test.go** - YAML test data loading framework
- **yamldatatest_loader.go** - YAML test data loader with scalar type resolution (exported for reuse)
- **yamldatatest_loader.go** - YAML test data loader with scalar type
resolution (exported for reuse)
### Test Data Files (in `testdata/`)
@@ -48,7 +49,7 @@ The `internal/libyaml` package implements the core YAML processing stages:
- **yaml.yaml** - Utility function test cases
- **reader.yaml** - Reader test cases
- **writer.yaml** - Writer test cases
- **yamlprivate.yaml** - Character classification test cases
- **util.yaml** - Character classification test cases
- **loader.yaml** - Data loader scalar resolution test cases
## Processing Pipeline
@@ -126,7 +127,8 @@ The testing framework uses a data-driven approach:
1. **Test data** is stored in YAML files in the `testdata/` directory
2. **Test logic** is implemented in Go files (`*_test.go`)
3. **One-to-one pairing**: Each `testdata/foo.yaml` has a corresponding `foo_test.go`
3. **One-to-one pairing**: Each `testdata/foo.yaml` has a corresponding
`foo_test.go`
**Benefits**:
- Easy to add new test cases without writing Go code
@@ -175,7 +177,7 @@ Each YAML file contains test cases for a specific component:
- Output handlers (string, io.Writer)
- Error conditions
- **yamlprivate.yaml** - Character classification tests
- **util.yaml** - Character classification tests
- Character type predicates (isAlpha, isDigit, isHex, etc.)
- Character conversion functions (asDigit, asHex, width)
- Unicode handling
@@ -188,13 +190,15 @@ Each YAML file contains test cases for a specific component:
### Test Framework Implementation
The test framework is implemented in `yamldatatest_loader.go` and `yamldatatest_test.go`:
The test framework is implemented in `testdata_test.go`:
**Core functions**:
- `LoadYAML(data []byte) (interface{}, error)` - Parses YAML using libyaml parser with scalar type resolution (exported)
- `UnmarshalStruct(target interface{}, data map[string]interface{}) error` - Populates structs (exported)
- `LoadTestCases(filename string) ([]TestCase, error)` - Loads and parses test YAML files
- `coerceScalar(value string) interface{}` - Resolves scalar strings to appropriate Go types (int, float64, bool, nil, string)
- `LoadAny(data []byte) (interface{}, error)` - Parses YAML using production
loader with scalar type resolution (exported from loader.go)
- `UnmarshalStruct(target interface{}, data map[string]interface{}) error` -
Populates structs (exported)
- `LoadTestCases(filename string) ([]TestCase, error)` - Loads and parses
test YAML files
**Core types**:
- `TestCase` struct - Umbrella structure containing fields for all test types
@@ -203,8 +207,10 @@ The test framework is implemented in `yamldatatest_loader.go` and `yamldatatest_
**Post-processing**:
After loading, the framework processes test data:
- Converts `Want` (interface{}) to `WantEvents`, `WantTokens`, or `WantSpecs` based on test type
- Converts `Want` (interface{}) to `WantContains` (handles both scalar and sequence)
- Converts `Want` (interface{}) to `WantEvents`, `WantTokens`, or `WantSpecs`
based on test type
- Converts `Want` (interface{}) to `WantContains` (handles both scalar and
sequence)
- Converts `Checks` to field validation specifications
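
As a rough illustration of the scalar-or-sequence handling described above, the following sketch normalizes a decoded `want` value into a string slice. `toStringList` is a hypothetical helper name used only for this example, not a function from the package, and the actual conversion in the test framework may differ.

```go
package main

import "fmt"

// toStringList is a hypothetical helper (not part of the package). It
// normalizes a decoded YAML value that may be either a single scalar or a
// sequence into a slice of strings.
func toStringList(want interface{}) []string {
	switch v := want.(type) {
	case nil:
		// No expectation was given.
		return nil
	case []interface{}:
		// A sequence: stringify each element.
		out := make([]string, 0, len(v))
		for _, item := range v {
			out = append(out, fmt.Sprint(item))
		}
		return out
	default:
		// A single scalar: wrap it in a one-element slice.
		return []string{fmt.Sprint(v)}
	}
}

func main() {
	fmt.Println(toStringList("unknown anchor"))
	fmt.Println(toStringList([]interface{}{"line 1", "column 2"}))
}
```

Accepting both shapes lets a test case write a single expected substring without wrapping it in a list.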
### Test Types
@@ -490,7 +496,8 @@ Test cases use a **type-as-key** format where the test type is the map key:
- **yaml** - Input YAML string to test
- **want** - Expected result (format varies by test type)
- For api-panic: string containing expected panic message substring
- For scan-error/parse-error: boolean (defaults to true if omitted; set to false if no error expected)
- For scan-error/parse-error: boolean (defaults to true if omitted; set to
false if no error expected)
- For enum-string: string representing expected String() output
- For other types: varies (may be sequence or scalar)
- **data** - For emitter tests: list of event specifications to emit
@@ -499,11 +506,20 @@ Test cases use a **type-as-key** format where the test type is the map key:
- **call** - For API tests: method call [MethodName, arg1, arg2, ...]
- **init** - For API panic tests: setup method call before main method
- **byte** - For API tests: boolean flag to convert string args to []byte
- **test** - For API tests: list of field validation checks in format `operator: [field, value]` where operator is one of: nil, cap, len, eq, gte, len-gt.
- **test** - For style-accessor tests: array of [Method, STYLE] where Method is the accessor method (e.g., ScalarStyle) and STYLE is the style constant (e.g., DOUBLE_QUOTED_SCALAR_STYLE).
- **enum** - For enum tests: array of [Type, Value] where Type is the enum type (e.g., ScalarStyle) and Value is the constant (e.g., PLAIN_SCALAR_STYLE)
- **test** - For API tests: list of field validation checks in format
`operator: [field, value]` where operator is one of: nil, cap, len, eq, gte,
len-gt.
- **test** - For style-accessor tests: array of [Method, STYLE] where Method
is the accessor method (e.g., ScalarStyle) and STYLE is the style constant
(e.g., DOUBLE_QUOTED_SCALAR_STYLE).
- **enum** - For enum tests: array of [Type, Value] where Type is the enum
type (e.g., ScalarStyle) and Value is the constant (e.g.,
PLAIN_SCALAR_STYLE)
**Note on scalar type resolution**: Unquoted scalar values in test data are automatically resolved to appropriate Go types (int, float64, bool, nil) by the `LoadYAML` function. Quoted scalars remain as strings.
**Note on scalar type resolution**: Unquoted scalar values in test data are
automatically resolved to appropriate Go types (int, float64, bool, nil) by the
`LoadAny` function.
Quoted scalars remain as strings.
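
The resolution rule in the note above can be sketched as follows. This is a simplified illustration under stated assumptions, not the `LoadAny` implementation: `resolveScalar` and its `quoted` parameter are hypothetical, and the set of recognized spellings for null and boolean values is deliberately minimal.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveScalar is a hypothetical helper (not the package's resolver).
// Quoted scalars stay strings; unquoted scalars are tried as nil, bool,
// int, and float64, in that order, before falling back to string.
func resolveScalar(value string, quoted bool) interface{} {
	if quoted {
		return value
	}
	switch value {
	case "", "~", "null":
		return nil
	case "true":
		return true
	case "false":
		return false
	}
	if i, err := strconv.Atoi(value); err == nil {
		return i
	}
	if f, err := strconv.ParseFloat(value, 64); err == nil {
		return f
	}
	return value
}

func main() {
	for _, s := range []string{"42", "3.5", "true", "null", "hello"} {
		fmt.Printf("%q -> %T\n", s, resolveScalar(s, false))
	}
	fmt.Printf("quoted %q -> %T\n", "42", resolveScalar("42", true))
}
```

In test data this is why `want: 42` compares against an integer while `want: "42"` compares against a string.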
### Running Tests
-733
@@ -1,733 +0,0 @@
// Copyright 2006-2010 Kirill Simonov
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0 AND MIT
// High-level API helpers for parser and emitter initialization and
// configuration.
// Provides convenience functions for token insertion and stream management.
package libyaml
import (
"io"
)
func (parser *Parser) insertToken(pos int, token *Token) {
// fmt.Println("yaml_insert_token", "pos:", pos, "typ:", token.typ, "head:", parser.tokens_head, "len:", len(parser.tokens))
// Check if we can move the queue at the beginning of the buffer.
if parser.tokens_head > 0 && len(parser.tokens) == cap(parser.tokens) {
if parser.tokens_head != len(parser.tokens) {
copy(parser.tokens, parser.tokens[parser.tokens_head:])
}
parser.tokens = parser.tokens[:len(parser.tokens)-parser.tokens_head]
parser.tokens_head = 0
}
parser.tokens = append(parser.tokens, *token)
if pos < 0 {
return
}
copy(parser.tokens[parser.tokens_head+pos+1:], parser.tokens[parser.tokens_head+pos:])
parser.tokens[parser.tokens_head+pos] = *token
}
// NewParser creates a new parser object.
func NewParser() Parser {
return Parser{
raw_buffer: make([]byte, 0, input_raw_buffer_size),
buffer: make([]byte, 0, input_buffer_size),
}
}
// Delete a parser object.
func (parser *Parser) Delete() {
*parser = Parser{}
}
// String read handler.
func yamlStringReadHandler(parser *Parser, buffer []byte) (n int, err error) {
if parser.input_pos == len(parser.input) {
return 0, io.EOF
}
n = copy(buffer, parser.input[parser.input_pos:])
parser.input_pos += n
return n, nil
}
// Reader read handler.
func yamlReaderReadHandler(parser *Parser, buffer []byte) (n int, err error) {
return parser.input_reader.Read(buffer)
}
// SetInputString sets a string input.
func (parser *Parser) SetInputString(input []byte) {
if parser.read_handler != nil {
panic("must set the input source only once")
}
parser.read_handler = yamlStringReadHandler
parser.input = input
parser.input_pos = 0
}
// SetInputReader sets an io.Reader as the input source.
func (parser *Parser) SetInputReader(r io.Reader) {
if parser.read_handler != nil {
panic("must set the input source only once")
}
parser.read_handler = yamlReaderReadHandler
parser.input_reader = r
}
// SetEncoding sets the source encoding.
func (parser *Parser) SetEncoding(encoding Encoding) {
if parser.encoding != ANY_ENCODING {
panic("must set the encoding only once")
}
parser.encoding = encoding
}
// GetPendingComments returns the parser's comment queue for CLI access.
func (parser *Parser) GetPendingComments() []Comment {
return parser.comments
}
// GetCommentsHead returns the current position in the comment queue.
func (parser *Parser) GetCommentsHead() int {
return parser.comments_head
}
// NewEmitter creates a new emitter object.
func NewEmitter() Emitter {
return Emitter{
buffer: make([]byte, output_buffer_size),
states: make([]EmitterState, 0, initial_stack_size),
events: make([]Event, 0, initial_queue_size),
best_width: -1,
}
}
// Delete an emitter object.
func (emitter *Emitter) Delete() {
*emitter = Emitter{}
}
// String write handler.
func yamlStringWriteHandler(emitter *Emitter, buffer []byte) error {
*emitter.output_buffer = append(*emitter.output_buffer, buffer...)
return nil
}
// yamlWriterWriteHandler uses emitter.output_writer to write the
// emitted text.
func yamlWriterWriteHandler(emitter *Emitter, buffer []byte) error {
_, err := emitter.output_writer.Write(buffer)
return err
}
// SetOutputString sets a string output.
func (emitter *Emitter) SetOutputString(output_buffer *[]byte) {
if emitter.write_handler != nil {
panic("must set the output target only once")
}
emitter.write_handler = yamlStringWriteHandler
emitter.output_buffer = output_buffer
}
// SetOutputWriter sets an io.Writer as the output target.
func (emitter *Emitter) SetOutputWriter(w io.Writer) {
if emitter.write_handler != nil {
panic("must set the output target only once")
}
emitter.write_handler = yamlWriterWriteHandler
emitter.output_writer = w
}
// SetEncoding sets the output encoding.
func (emitter *Emitter) SetEncoding(encoding Encoding) {
if emitter.encoding != ANY_ENCODING {
panic("must set the output encoding only once")
}
emitter.encoding = encoding
}
// SetCanonical sets the canonical output style.
func (emitter *Emitter) SetCanonical(canonical bool) {
emitter.canonical = canonical
}
// SetIndent sets the indentation increment.
func (emitter *Emitter) SetIndent(indent int) {
if indent < 2 || indent > 9 {
indent = 2
}
emitter.BestIndent = indent
}
// SetWidth sets the preferred line width.
func (emitter *Emitter) SetWidth(width int) {
if width < 0 {
width = -1
}
emitter.best_width = width
}
// SetUnicode sets whether unescaped non-ASCII characters are allowed.
func (emitter *Emitter) SetUnicode(unicode bool) {
emitter.unicode = unicode
}
// SetLineBreak sets the preferred line break character.
func (emitter *Emitter) SetLineBreak(line_break LineBreak) {
emitter.line_break = line_break
}
///*
// * Destroy a token object.
// */
//
//YAML_DECLARE(void)
//yaml_token_delete(yaml_token_t *token)
//{
// assert(token); // Non-NULL token object expected.
//
// switch (token.type)
// {
// case YAML_TAG_DIRECTIVE_TOKEN:
// yaml_free(token.data.tag_directive.handle);
// yaml_free(token.data.tag_directive.prefix);
// break;
//
// case YAML_ALIAS_TOKEN:
// yaml_free(token.data.alias.value);
// break;
//
// case YAML_ANCHOR_TOKEN:
// yaml_free(token.data.anchor.value);
// break;
//
// case YAML_TAG_TOKEN:
// yaml_free(token.data.tag.handle);
// yaml_free(token.data.tag.suffix);
// break;
//
// case YAML_SCALAR_TOKEN:
// yaml_free(token.data.scalar.value);
// break;
//
// default:
// break;
// }
//
// memset(token, 0, sizeof(yaml_token_t));
//}
//
///*
// * Check if a string is a valid UTF-8 sequence.
// *
// * Check 'reader.c' for more details on UTF-8 encoding.
// */
//
//static int
//yaml_check_utf8(yaml_char_t *start, size_t length)
//{
// yaml_char_t *end = start+length;
// yaml_char_t *pointer = start;
//
// while (pointer < end) {
// unsigned char octet;
// unsigned int width;
// unsigned int value;
// size_t k;
//
// octet = pointer[0];
// width = (octet & 0x80) == 0x00 ? 1 :
// (octet & 0xE0) == 0xC0 ? 2 :
// (octet & 0xF0) == 0xE0 ? 3 :
// (octet & 0xF8) == 0xF0 ? 4 : 0;
// value = (octet & 0x80) == 0x00 ? octet & 0x7F :
// (octet & 0xE0) == 0xC0 ? octet & 0x1F :
// (octet & 0xF0) == 0xE0 ? octet & 0x0F :
// (octet & 0xF8) == 0xF0 ? octet & 0x07 : 0;
// if (!width) return 0;
// if (pointer+width > end) return 0;
// for (k = 1; k < width; k ++) {
// octet = pointer[k];
// if ((octet & 0xC0) != 0x80) return 0;
// value = (value << 6) + (octet & 0x3F);
// }
// if (!((width == 1) ||
// (width == 2 && value >= 0x80) ||
// (width == 3 && value >= 0x800) ||
// (width == 4 && value >= 0x10000))) return 0;
//
// pointer += width;
// }
//
// return 1;
//}
//
// NewStreamStartEvent creates a new STREAM-START event.
func NewStreamStartEvent(encoding Encoding) Event {
return Event{
Type: STREAM_START_EVENT,
encoding: encoding,
}
}
// NewStreamEndEvent creates a new STREAM-END event.
func NewStreamEndEvent() Event {
return Event{
Type: STREAM_END_EVENT,
}
}
// NewDocumentStartEvent creates a new DOCUMENT-START event.
func NewDocumentStartEvent(version_directive *VersionDirective, tag_directives []TagDirective, implicit bool) Event {
return Event{
Type: DOCUMENT_START_EVENT,
versionDirective: version_directive,
tagDirectives: tag_directives,
Implicit: implicit,
}
}
// NewDocumentEndEvent creates a new DOCUMENT-END event.
func NewDocumentEndEvent(implicit bool) Event {
return Event{
Type: DOCUMENT_END_EVENT,
Implicit: implicit,
}
}
// NewAliasEvent creates a new ALIAS event.
func NewAliasEvent(anchor []byte) Event {
return Event{
Type: ALIAS_EVENT,
Anchor: anchor,
}
}
// NewScalarEvent creates a new SCALAR event.
func NewScalarEvent(anchor, tag, value []byte, plain_implicit, quoted_implicit bool, style ScalarStyle) Event {
return Event{
Type: SCALAR_EVENT,
Anchor: anchor,
Tag: tag,
Value: value,
Implicit: plain_implicit,
quoted_implicit: quoted_implicit,
Style: Style(style),
}
}
// NewSequenceStartEvent creates a new SEQUENCE-START event.
func NewSequenceStartEvent(anchor, tag []byte, implicit bool, style SequenceStyle) Event {
return Event{
Type: SEQUENCE_START_EVENT,
Anchor: anchor,
Tag: tag,
Implicit: implicit,
Style: Style(style),
}
}
// NewSequenceEndEvent creates a new SEQUENCE-END event.
func NewSequenceEndEvent() Event {
return Event{
Type: SEQUENCE_END_EVENT,
}
}
// NewMappingStartEvent creates a new MAPPING-START event.
func NewMappingStartEvent(anchor, tag []byte, implicit bool, style MappingStyle) Event {
return Event{
Type: MAPPING_START_EVENT,
Anchor: anchor,
Tag: tag,
Implicit: implicit,
Style: Style(style),
}
}
// NewMappingEndEvent creates a new MAPPING-END event.
func NewMappingEndEvent() Event {
return Event{
Type: MAPPING_END_EVENT,
}
}
// Delete an event object.
func (e *Event) Delete() {
*e = Event{}
}
///*
// * Create a document object.
// */
//
//YAML_DECLARE(int)
//yaml_document_initialize(document *yaml_document_t,
// version_directive *yaml_version_directive_t,
// tag_directives_start *yaml_tag_directive_t,
// tag_directives_end *yaml_tag_directive_t,
// start_implicit int, end_implicit int)
//{
// struct {
// error yaml_error_type_t
// } context
// struct {
// start *yaml_node_t
// end *yaml_node_t
// top *yaml_node_t
// } nodes = { NULL, NULL, NULL }
// version_directive_copy *yaml_version_directive_t = NULL
// struct {
// start *yaml_tag_directive_t
// end *yaml_tag_directive_t
// top *yaml_tag_directive_t
// } tag_directives_copy = { NULL, NULL, NULL }
// value yaml_tag_directive_t = { NULL, NULL }
// mark yaml_mark_t = { 0, 0, 0 }
//
// assert(document) // Non-NULL document object is expected.
// assert((tag_directives_start && tag_directives_end) ||
// (tag_directives_start == tag_directives_end))
// // Valid tag directives are expected.
//
// if (!STACK_INIT(&context, nodes, INITIAL_STACK_SIZE)) goto error
//
// if (version_directive) {
// version_directive_copy = yaml_malloc(sizeof(yaml_version_directive_t))
// if (!version_directive_copy) goto error
// version_directive_copy.major = version_directive.major
// version_directive_copy.minor = version_directive.minor
// }
//
// if (tag_directives_start != tag_directives_end) {
// tag_directive *yaml_tag_directive_t
// if (!STACK_INIT(&context, tag_directives_copy, INITIAL_STACK_SIZE))
// goto error
// for (tag_directive = tag_directives_start
// tag_directive != tag_directives_end; tag_directive ++) {
// assert(tag_directive.handle)
// assert(tag_directive.prefix)
// if (!yaml_check_utf8(tag_directive.handle,
// strlen((char *)tag_directive.handle)))
// goto error
// if (!yaml_check_utf8(tag_directive.prefix,
// strlen((char *)tag_directive.prefix)))
// goto error
// value.handle = yaml_strdup(tag_directive.handle)
// value.prefix = yaml_strdup(tag_directive.prefix)
// if (!value.handle || !value.prefix) goto error
// if (!PUSH(&context, tag_directives_copy, value))
// goto error
// value.handle = NULL
// value.prefix = NULL
// }
// }
//
// DOCUMENT_INIT(*document, nodes.start, nodes.end, version_directive_copy,
// tag_directives_copy.start, tag_directives_copy.top,
// start_implicit, end_implicit, mark, mark)
//
// return 1
//
//error:
// STACK_DEL(&context, nodes)
// yaml_free(version_directive_copy)
// while (!STACK_EMPTY(&context, tag_directives_copy)) {
// value yaml_tag_directive_t = POP(&context, tag_directives_copy)
// yaml_free(value.handle)
// yaml_free(value.prefix)
// }
// STACK_DEL(&context, tag_directives_copy)
// yaml_free(value.handle)
// yaml_free(value.prefix)
//
// return 0
//}
//
///*
// * Destroy a document object.
// */
//
//YAML_DECLARE(void)
//yaml_document_delete(document *yaml_document_t)
//{
// struct {
// error yaml_error_type_t
// } context
// tag_directive *yaml_tag_directive_t
//
// context.error = YAML_NO_ERROR // Eliminate a compiler warning.
//
// assert(document) // Non-NULL document object is expected.
//
// while (!STACK_EMPTY(&context, document.nodes)) {
// node yaml_node_t = POP(&context, document.nodes)
// yaml_free(node.tag)
// switch (node.type) {
// case YAML_SCALAR_NODE:
// yaml_free(node.data.scalar.value)
// break
// case YAML_SEQUENCE_NODE:
// STACK_DEL(&context, node.data.sequence.items)
// break
// case YAML_MAPPING_NODE:
// STACK_DEL(&context, node.data.mapping.pairs)
// break
// default:
// assert(0) // Should not happen.
// }
// }
// STACK_DEL(&context, document.nodes)
//
// yaml_free(document.version_directive)
// for (tag_directive = document.tag_directives.start
// tag_directive != document.tag_directives.end
// tag_directive++) {
// yaml_free(tag_directive.handle)
// yaml_free(tag_directive.prefix)
// }
// yaml_free(document.tag_directives.start)
//
// memset(document, 0, sizeof(yaml_document_t))
//}
//
///**
// * Get a document node.
// */
//
//YAML_DECLARE(yaml_node_t *)
//yaml_document_get_node(document *yaml_document_t, index int)
//{
// assert(document) // Non-NULL document object is expected.
//
// if (index > 0 && document.nodes.start + index <= document.nodes.top) {
// return document.nodes.start + index - 1
// }
// return NULL
//}
//
///**
// * Get the root object.
// */
//
//YAML_DECLARE(yaml_node_t *)
//yaml_document_get_root_node(document *yaml_document_t)
//{
// assert(document) // Non-NULL document object is expected.
//
// if (document.nodes.top != document.nodes.start) {
// return document.nodes.start
// }
// return NULL
//}
//
///*
// * Add a scalar node to a document.
// */
//
//YAML_DECLARE(int)
//yaml_document_add_scalar(document *yaml_document_t,
// tag *yaml_char_t, value *yaml_char_t, length int,
// style yaml_scalar_style_t)
//{
// struct {
// error yaml_error_type_t
// } context
// mark yaml_mark_t = { 0, 0, 0 }
// tag_copy *yaml_char_t = NULL
// value_copy *yaml_char_t = NULL
// node yaml_node_t
//
// assert(document) // Non-NULL document object is expected.
// assert(value) // Non-NULL value is expected.
//
// if (!tag) {
// tag = (yaml_char_t *)YAML_DEFAULT_SCALAR_TAG
// }
//
// if (!yaml_check_utf8(tag, strlen((char *)tag))) goto error
// tag_copy = yaml_strdup(tag)
// if (!tag_copy) goto error
//
// if (length < 0) {
// length = strlen((char *)value)
// }
//
// if (!yaml_check_utf8(value, length)) goto error
// value_copy = yaml_malloc(length+1)
// if (!value_copy) goto error
// memcpy(value_copy, value, length)
// value_copy[length] = '\0'
//
// SCALAR_NODE_INIT(node, tag_copy, value_copy, length, style, mark, mark)
// if (!PUSH(&context, document.nodes, node)) goto error
//
// return document.nodes.top - document.nodes.start
//
//error:
// yaml_free(tag_copy)
// yaml_free(value_copy)
//
// return 0
//}
//
///*
// * Add a sequence node to a document.
// */
//
//YAML_DECLARE(int)
//yaml_document_add_sequence(document *yaml_document_t,
// tag *yaml_char_t, style yaml_sequence_style_t)
//{
// struct {
// error yaml_error_type_t
// } context
// mark yaml_mark_t = { 0, 0, 0 }
// tag_copy *yaml_char_t = NULL
// struct {
// start *yaml_node_item_t
// end *yaml_node_item_t
// top *yaml_node_item_t
// } items = { NULL, NULL, NULL }
// node yaml_node_t
//
// assert(document) // Non-NULL document object is expected.
//
// if (!tag) {
// tag = (yaml_char_t *)YAML_DEFAULT_SEQUENCE_TAG
// }
//
// if (!yaml_check_utf8(tag, strlen((char *)tag))) goto error
// tag_copy = yaml_strdup(tag)
// if (!tag_copy) goto error
//
// if (!STACK_INIT(&context, items, INITIAL_STACK_SIZE)) goto error
//
// SEQUENCE_NODE_INIT(node, tag_copy, items.start, items.end,
// style, mark, mark)
// if (!PUSH(&context, document.nodes, node)) goto error
//
// return document.nodes.top - document.nodes.start
//
//error:
// STACK_DEL(&context, items)
// yaml_free(tag_copy)
//
// return 0
//}
//
///*
// * Add a mapping node to a document.
// */
//
//YAML_DECLARE(int)
//yaml_document_add_mapping(document *yaml_document_t,
// tag *yaml_char_t, style yaml_mapping_style_t)
//{
// struct {
// error yaml_error_type_t
// } context
// mark yaml_mark_t = { 0, 0, 0 }
// tag_copy *yaml_char_t = NULL
// struct {
// start *yaml_node_pair_t
// end *yaml_node_pair_t
// top *yaml_node_pair_t
// } pairs = { NULL, NULL, NULL }
// node yaml_node_t
//
// assert(document) // Non-NULL document object is expected.
//
// if (!tag) {
// tag = (yaml_char_t *)YAML_DEFAULT_MAPPING_TAG
// }
//
// if (!yaml_check_utf8(tag, strlen((char *)tag))) goto error
// tag_copy = yaml_strdup(tag)
// if (!tag_copy) goto error
//
// if (!STACK_INIT(&context, pairs, INITIAL_STACK_SIZE)) goto error
//
// MAPPING_NODE_INIT(node, tag_copy, pairs.start, pairs.end,
// style, mark, mark)
// if (!PUSH(&context, document.nodes, node)) goto error
//
// return document.nodes.top - document.nodes.start
//
//error:
// STACK_DEL(&context, pairs)
// yaml_free(tag_copy)
//
// return 0
//}
//
///*
// * Append an item to a sequence node.
// */
//
//YAML_DECLARE(int)
//yaml_document_append_sequence_item(document *yaml_document_t,
// sequence int, item int)
//{
// struct {
// error yaml_error_type_t
// } context
//
// assert(document) // Non-NULL document is required.
// assert(sequence > 0
// && document.nodes.start + sequence <= document.nodes.top)
// // Valid sequence id is required.
// assert(document.nodes.start[sequence-1].type == YAML_SEQUENCE_NODE)
// // A sequence node is required.
// assert(item > 0 && document.nodes.start + item <= document.nodes.top)
// // Valid item id is required.
//
// if (!PUSH(&context,
// document.nodes.start[sequence-1].data.sequence.items, item))
// return 0
//
// return 1
//}
//
///*
// * Append a pair of a key and a value to a mapping node.
// */
//
//YAML_DECLARE(int)
//yaml_document_append_mapping_pair(document *yaml_document_t,
// mapping int, key int, value int)
//{
// struct {
// error yaml_error_type_t
// } context
//
// pair yaml_node_pair_t
//
// assert(document) // Non-NULL document is required.
// assert(mapping > 0
// && document.nodes.start + mapping <= document.nodes.top)
// // Valid mapping id is required.
// assert(document.nodes.start[mapping-1].type == YAML_MAPPING_NODE)
// // A mapping node is required.
// assert(key > 0 && document.nodes.start + key <= document.nodes.top)
// // Valid key id is required.
// assert(value > 0 && document.nodes.start + value <= document.nodes.top)
// // Valid value id is required.
//
// pair.key = key
// pair.value = value
//
// if (!PUSH(&context,
// document.nodes.start[mapping-1].data.mapping.pairs, pair))
// return 0
//
// return 1
//}
//
//
+188 -137
@@ -24,112 +24,48 @@ type Composer struct {
returnStream bool // flag to return stream node next
atStreamEnd bool // at stream end
encoding Encoding // stream encoding from STREAM_START
opts *Options // options for loading
}
// NewComposer creates a new composer from a byte slice.
func NewComposer(b []byte) *Composer {
func NewComposer(b []byte, opts *Options) *Composer {
p := Composer{
Parser: NewParser(),
opts: opts,
}
if len(b) == 0 {
b = []byte{'\n'}
}
p.Parser.SetInputString(b)
if opts != nil {
p.Parser.depthCheck = opts.DepthCheck
}
return &p
}
// NewComposerFromReader creates a new composer from an io.Reader.
func NewComposerFromReader(r io.Reader) *Composer {
// NewComposerFromReader creates a new composer from an [io.Reader].
func NewComposerFromReader(r io.Reader, opts *Options) *Composer {
p := Composer{
Parser: NewParser(),
opts: opts,
}
p.Parser.SetInputReader(r)
if opts != nil {
p.Parser.depthCheck = opts.DepthCheck
}
return &p
}
func (c *Composer) init() {
if c.doneInit {
return
}
c.anchors = make(map[string]*Node)
// Peek to get the encoding from STREAM_START_EVENT
if c.peek() == STREAM_START_EVENT {
c.encoding = c.event.GetEncoding()
}
c.expect(STREAM_START_EVENT)
c.doneInit = true
// If stream nodes are enabled, prepare to return the first stream node
if c.streamNodes {
c.returnStream = true
}
}
func (c *Composer) Destroy() {
if c.event.Type != NO_EVENT {
c.event.Delete()
}
c.Parser.Delete()
}
// SetStreamNodes enables or disables stream node emission.
func (c *Composer) SetStreamNodes(enable bool) {
c.streamNodes = enable
}
// expect consumes an event from the event stream and
// checks that it's of the expected type.
func (c *Composer) expect(e EventType) {
if c.event.Type == NO_EVENT {
if err := c.Parser.Parse(&c.event); err != nil {
c.fail(err)
}
}
if c.event.Type == STREAM_END_EVENT {
failf("attempted to go past the end of stream; corrupted value?")
}
if c.event.Type != e {
c.fail(fmt.Errorf("expected %s event but got %s", e, c.event.Type))
}
c.event.Delete()
c.event.Type = NO_EVENT
}
// peek peeks at the next event in the event stream,
// puts the results into c.event and returns the event type.
func (c *Composer) peek() EventType {
if c.event.Type != NO_EVENT {
return c.event.Type
}
// It's a curious choice by the underlying API to generally return a
// positive result on success, but in this case to return true in an error
// scenario. This was the source of bugs in the past (issue #666).
if err := c.Parser.Parse(&c.event); err != nil {
c.fail(err)
}
return c.event.Type
}
func (c *Composer) fail(err error) {
Fail(err)
}
func (c *Composer) anchor(n *Node, anchor []byte) {
if anchor != nil {
n.Anchor = string(anchor)
c.anchors[n.Anchor] = n
}
}
// Parse parses the next YAML node from the event stream.
func (c *Composer) Parse() *Node {
// Compose composes the next YAML node from the event stream.
func (c *Composer) Compose() *Node {
c.init()
// Handle stream nodes if enabled
if c.streamNodes {
// Check for stream end first
if c.peek() == STREAM_END_EVENT {
// If we haven't returned the final stream node yet, return it now
// If we haven't returned the final stream node yet,
// return it now
if !c.atStreamEnd {
c.atStreamEnd = true
return c.createStreamNode()
@@ -138,7 +74,8 @@ func (c *Composer) Parse() *Node {
return nil
}
// Check if we should return a stream node before the next document
// Check if we should return a stream node before the next
// document
if c.returnStream {
c.returnStream = false
n := c.createStreamNode()
@@ -160,7 +97,8 @@ func (c *Composer) Parse() *Node {
case DOCUMENT_START_EVENT:
return c.document()
case STREAM_END_EVENT:
// Happens when attempting to decode an empty buffer (when not using stream nodes).
// Happens when attempting to decode an empty buffer (when not
// using stream nodes).
return nil
case TAIL_COMMENT_EVENT:
panic("internal error: unexpected tail comment event (please report)")
@@ -169,18 +107,17 @@ func (c *Composer) Parse() *Node {
}
}
func (c *Composer) node(kind Kind, defaultTag, tag, value string) *Node {
// node creates a new node with the given kind, tag, and value, and attaches
// position and comment information from the current event.
func (c *Composer) node(kind Kind, tag, value string) *Node {
var style Style
if tag != "" && tag != "!" {
// Normalize tag to short form (e.g., tag:yaml.org,2002:str -> !!str)
tag = shortTag(tag)
style = TaggedStyle
} else if defaultTag != "" {
tag = defaultTag
} else if kind == ScalarNode {
// Delegate to resolver to determine tag from value
tag, _ = resolve("", value)
}
// Note: Nodes without explicit tags are left with empty tags.
// Tag defaulting happens in a separate stage via Resolver.
n := &Node{
Kind: kind,
Tag: tag,
@@ -188,8 +125,8 @@ func (c *Composer) node(kind Kind, defaultTag, tag, value string) *Node {
Style: style,
}
if !c.Textless {
n.Line = c.event.StartMark.Line + 1
n.Column = c.event.StartMark.Column + 1
n.Line = c.event.StartMark.Line
n.Column = c.event.StartMark.Column
n.HeadComment = string(c.event.HeadComment)
n.LineComment = string(c.event.LineComment)
n.FootComment = string(c.event.FootComment)
@@ -197,14 +134,10 @@ func (c *Composer) node(kind Kind, defaultTag, tag, value string) *Node {
return n
}
func (c *Composer) parseChild(parent *Node) *Node {
child := c.Parse()
parent.Content = append(parent.Content, child)
return child
}
// document composes a document node by parsing its content between
// DOCUMENT_START and DOCUMENT_END events.
func (c *Composer) document() *Node {
n := c.node(DocumentNode, "", "", "")
n := c.node(DocumentNode, "", "")
c.doc = n
c.expect(DOCUMENT_START_EVENT)
c.parseChild(n)
@@ -221,56 +154,35 @@ func (c *Composer) document() *Node {
return n
}
// createStreamNode creates a stream node with encoding information.
func (c *Composer) createStreamNode() *Node {
n := &Node{
Kind: StreamNode,
Encoding: c.encoding,
Kind: StreamNode,
Stream: &Stream{Encoding: c.encoding},
}
if !c.Textless && c.event.Type != NO_EVENT {
n.Line = c.event.StartMark.Line + 1
n.Column = c.event.StartMark.Column + 1
n.Line = c.event.StartMark.Line
n.Column = c.event.StartMark.Column
}
return n
}
// captureDirectives captures version and tag directives from upcoming DOCUMENT_START.
func (c *Composer) captureDirectives(n *Node) {
if c.peek() == DOCUMENT_START_EVENT {
if vd := c.event.GetVersionDirective(); vd != nil {
n.Version = &StreamVersionDirective{
Major: vd.Major(),
Minor: vd.Minor(),
}
}
if tds := c.event.GetTagDirectives(); len(tds) > 0 {
n.TagDirectives = make([]StreamTagDirective, len(tds))
for i, td := range tds {
n.TagDirectives[i] = StreamTagDirective{
Handle: td.GetHandle(),
Prefix: td.GetPrefix(),
}
}
}
}
}
// alias composes an alias node by resolving the referenced anchor.
func (c *Composer) alias() *Node {
n := c.node(AliasNode, "", "", string(c.event.Anchor))
n := c.node(AliasNode, "", string(c.event.Anchor))
n.Alias = c.anchors[n.Value]
if n.Alias == nil {
msg := fmt.Sprintf("unknown anchor '%s' referenced", n.Value)
Fail(&ParserError{
Message: msg,
Mark: Mark{
Line: n.Line,
Column: n.Column,
},
})
Fail(formatComposerError(msg, Mark{
Line: n.Line,
Column: n.Column,
}))
}
c.expect(ALIAS_EVENT)
return n
}
// scalar composes a scalar node with value, tag, and style information.
func (c *Composer) scalar() *Node {
parsedStyle := c.event.ScalarStyle()
var nodeStyle Style
@@ -286,19 +198,17 @@ func (c *Composer) scalar() *Node {
}
nodeValue := string(c.event.Value)
nodeTag := string(c.event.Tag)
var defaultTag string
if nodeStyle != 0 {
defaultTag = strTag
}
n := c.node(ScalarNode, defaultTag, nodeTag, nodeValue)
n := c.node(ScalarNode, nodeTag, nodeValue)
n.Style |= nodeStyle
c.anchor(n, c.event.Anchor)
c.expect(SCALAR_EVENT)
return n
}
// sequence composes a sequence node by parsing elements between
// SEQUENCE_START and SEQUENCE_END events.
func (c *Composer) sequence() *Node {
n := c.node(SequenceNode, seqTag, string(c.event.Tag), "")
n := c.node(SequenceNode, string(c.event.Tag), "")
if c.event.SequenceStyle()&FLOW_SEQUENCE_STYLE != 0 {
n.Style |= FlowStyle
}
@@ -313,8 +223,10 @@ func (c *Composer) sequence() *Node {
return n
}
// mapping composes a mapping node by parsing key-value pairs between
// MAPPING_START and MAPPING_END events, handling foot comments appropriately.
func (c *Composer) mapping() *Node {
n := c.node(MappingNode, mapTag, string(c.event.Tag), "")
n := c.node(MappingNode, string(c.event.Tag), "")
block := true
if c.event.MappingStyle()&FLOW_MAPPING_STYLE != 0 {
block = false
@@ -353,10 +265,149 @@ func (c *Composer) mapping() *Node {
return n
}
// init initializes the composer by setting up the anchor map and consuming
// the STREAM_START event.
func (c *Composer) init() {
if c.doneInit {
return
}
c.anchors = make(map[string]*Node)
// Peek to get the encoding from STREAM_START_EVENT
if c.peek() == STREAM_START_EVENT {
c.encoding = c.event.GetEncoding()
}
c.expect(STREAM_START_EVENT)
c.doneInit = true
// If stream nodes are enabled, prepare to return the first stream node
if c.streamNodes {
c.returnStream = true
}
}
// Destroy cleans up the composer by deleting any pending event and the
// underlying parser.
func (c *Composer) Destroy() {
if c.event.Type != NO_EVENT {
c.event.Delete()
}
c.Parser.Delete()
}
// SetStreamNodes enables or disables stream node emission.
func (c *Composer) SetStreamNodes(enable bool) {
c.streamNodes = enable
}
// expect consumes an event from the event stream and
// checks that it's of the expected type.
func (c *Composer) expect(e EventType) {
if c.event.Type == NO_EVENT {
if err := c.Parser.Parse(&c.event); err != nil {
c.fail(err)
}
}
if c.event.Type == STREAM_END_EVENT {
Fail(formatComposerError(
"attempted to go past the end of stream; corrupted value?",
Mark{Line: c.event.StartMark.Line, Column: c.event.StartMark.Column},
))
}
if c.event.Type != e {
Fail(formatComposerError(
fmt.Sprintf("expected %s event but got %s", e, c.event.Type),
Mark{Line: c.event.StartMark.Line, Column: c.event.StartMark.Column},
))
}
c.event.Delete()
c.event.Type = NO_EVENT
}
// peek peeks at the next event in the event stream,
// puts the results into c.event and returns the event type.
func (c *Composer) peek() EventType {
if c.event.Type != NO_EVENT {
return c.event.Type
}
// It's a curious choice from the underlying API to generally return a
// positive result on success, but in this case return true in an error
// scenario. This was the source of bugs in the past (issue #666).
if err := c.Parser.Parse(&c.event); err != nil {
c.fail(err)
}
return c.event.Type
}
// fail panics with the given error.
func (c *Composer) fail(err error) {
Fail(err)
}
// anchor sets the anchor name on a node and records it in the anchor map.
func (c *Composer) anchor(n *Node, anchor []byte) {
if anchor != nil {
n.Anchor = string(anchor)
c.anchors[n.Anchor] = n
}
}
// parseChild composes the next node and adds it as a child to the parent.
func (c *Composer) parseChild(parent *Node) *Node {
child := c.Compose()
parent.Content = append(parent.Content, child)
return child
}
// captureDirectives captures version and tag directives from upcoming
// DOCUMENT_START.
// The node n must have Stream initialized (as created by createStreamNode).
func (c *Composer) captureDirectives(n *Node) {
if c.peek() == DOCUMENT_START_EVENT {
if vd := c.event.GetVersionDirective(); vd != nil {
n.Stream.Version = &StreamVersionDirective{
Major: vd.Major(),
Minor: vd.Minor(),
}
}
if tds := c.event.GetTagDirectives(); len(tds) > 0 {
n.Stream.TagDirectives = make([]StreamTagDirective, len(tds))
for i, td := range tds {
n.Stream.TagDirectives[i] = StreamTagDirective{
Handle: td.GetHandle(),
Prefix: td.GetPrefix(),
}
}
}
}
}
// Fail panics with a YAMLError wrapping the given error.
func Fail(err error) {
panic(&YAMLError{err})
}
// failf panics with a YAMLError containing a formatted error message.
func failf(format string, args ...any) {
panic(&YAMLError{fmt.Errorf("yaml: "+format, args...)})
}
// formatComposerError creates a LoadError for composer-stage errors.
func formatComposerError(message string, mark Mark) *LoadError {
return &LoadError{
Stage: ComposerStage,
Mark: mark,
Message: message,
}
}
// formatComposerErrorContext creates a LoadError with both context and
// problem information for composer-stage errors.
func formatComposerErrorContext(context string, contextMark Mark, message string, mark Mark) *LoadError {
return &LoadError{
Stage: ComposerStage,
ContextMark: contextMark,
ContextMsg: context,
Mark: mark,
Message: message,
}
}
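Note on the error flow in this file: Fail panics with a YAMLError wrapper, and the handleErr helper (defined in the errors file of this commit) recovers it at the API boundary. The following is a minimal standalone sketch of that pairing, not the vendored code itself. The names yamlError, fail, compose and errUnknownAnchor are hypothetical stand-ins chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// yamlError is a hypothetical stand-in for the YAMLError wrapper in the diff.
type yamlError struct{ err error }

// fail panics with a wrapped error, mirroring what Fail does above.
func fail(err error) {
	panic(&yamlError{err})
}

// handleErr turns a *yamlError panic back into an ordinary returned error.
// Any other panic value is re-raised so that real bugs are not swallowed.
func handleErr(err *error) {
	if v := recover(); v != nil {
		if e, ok := v.(*yamlError); ok {
			*err = e.err
			return
		}
		panic(v)
	}
}

// errUnknownAnchor is an illustrative sentinel error (an assumption).
var errUnknownAnchor = errors.New("unknown anchor referenced")

// compose is a hypothetical entry point: deep helpers call fail, and the
// deferred handleErr converts the panic into the named return value.
func compose(broken bool) (err error) {
	defer handleErr(&err)
	if broken {
		fail(errUnknownAnchor)
	}
	return nil
}

func main() {
	fmt.Println(compose(false)) // prints "<nil>"
	fmt.Println(compose(true))  // prints "unknown anchor referenced"
}
```

The named return value matters here: handleErr can only report the error because the deferred call receives a pointer to err.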
File diff suppressed because it is too large
+152
View File
@@ -0,0 +1,152 @@
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// Desolver stage: Removes inferable tags from YAML nodes.
// This is the inverse of the Resolver - it walks a tagged node tree and
// removes tags that can be inferred during parsing, producing cleaner YAML
// output without unnecessary type annotations.
package libyaml
// Desolver handles tag removal for YAML nodes during serialization.
// It removes tags that would be automatically resolved to the same type
// during parsing, making the output cleaner and more readable.
type Desolver struct {
opts *Options
}
// NewDesolver creates a new Desolver with the given options.
func NewDesolver(opts *Options) *Desolver {
return &Desolver{opts: opts}
}
// Desolve walks the node tree and removes tags that can be inferred.
// This is the inverse of Resolver - it takes a fully-tagged node tree
// (from Representer) and removes unnecessary tags to produce clean output.
//
// For scalar nodes: if the value would resolve to the same tag when parsed,
// the tag is removed. For strings that would resolve differently, the tag is
// removed and quoting style is set to preserve the string type.
//
// For collection nodes (maps/sequences): default tags (!!map, !!seq) are
// removed since they're implied by the structure.
func (d *Desolver) Desolve(n *Node) {
if n == nil {
return
}
switch n.Kind {
case ScalarNode:
d.desolveScalar(n)
case DocumentNode, SequenceNode, MappingNode:
d.desolveCollection(n)
// Recursively desolve children
for _, child := range n.Content {
d.Desolve(child)
}
case AliasNode:
// Alias nodes don't have tags to remove
}
}
// desolveScalar removes tags from scalar nodes when they can be inferred.
func (d *Desolver) desolveScalar(n *Node) {
// If explicitly tagged by user (TaggedStyle), keep it
if n.Style&TaggedStyle != 0 {
return
}
// Empty tag means it's already untagged - nothing to do
if n.Tag == "" {
return
}
stag := shortTag(n.Tag)
// Check if this is a standard scalar tag that we can potentially remove
isStandardTag := false
switch stag {
case nullTag, boolTag, strTag, intTag, floatTag, timestampTag:
isStandardTag = true
case binaryTag:
// Binary scalars are not implicitly resolvable - never remove.
return
case mergeTag:
// Elide the implicit !!merge tag when the value is the canonical
// merge key marker. The TaggedStyle early-return above already
// preserves !!merge when it was explicit in the source.
if n.Value == "<<" {
n.Tag = ""
}
return
default:
// Custom tag - preserve it
return
}
// Only process standard tags from here
if !isStandardTag {
return
}
// What tag would this value resolve to?
rtag, _ := resolve("", n.Value)
// If resolved tag matches current tag, we can elide the tag
if rtag == stag {
// Tag can be inferred - remove it
n.Tag = ""
} else if stag == strTag {
// This is a string type, but would resolve to something else.
// Remove the tag and force quoting to preserve string type.
n.Tag = ""
// If not already quoted, set quote style based on content
if n.Style&(SingleQuotedStyle|DoubleQuotedStyle|LiteralStyle|FoldedStyle) == 0 {
// Determine quote style based on options or default to single quotes
if d.opts != nil {
// Convert ScalarStyle to Style
switch d.opts.QuotePreference.ScalarStyle() {
case DOUBLE_QUOTED_SCALAR_STYLE:
n.Style |= DoubleQuotedStyle
default:
n.Style |= SingleQuotedStyle
}
} else {
n.Style |= SingleQuotedStyle
}
}
} else if stag == floatTag || stag == intTag {
// For numeric type mismatches (like float64(1) → "1" with !!float tag):
// Elide the tag and let YAML resolve naturally.
// Without the tag, "1" resolves as !!int, which may change the type,
// but that's acceptable for cleaner output (and matches old behavior).
n.Tag = ""
}
// For other standard tags with mismatches, keep the tag to preserve type
}
// desolveCollection removes default tags from collection nodes.
func (d *Desolver) desolveCollection(n *Node) {
// If explicitly tagged by user, keep it
if n.Style&TaggedStyle != 0 {
return
}
stag := shortTag(n.Tag)
switch n.Kind {
case MappingNode:
// !!map is the default for mappings - remove it
if stag == mapTag {
n.Tag = ""
}
case SequenceNode:
// !!seq is the default for sequences - remove it
if stag == seqTag {
n.Tag = ""
}
case DocumentNode:
// Documents don't have tags in YAML output
n.Tag = ""
}
// For other tags, keep them - they're explicit type information
}
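Note on desolveScalar: the decision table it implements can be sketched in isolation. The version below is a simplified illustration under stated assumptions: resolveTag is a deliberately tiny stand-in for the package's resolve function (it knows only null, booleans, integers and floats), and desolve returns its decision instead of mutating a Node. Binary, merge, custom tags and TaggedStyle handling are left out.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveTag is a toy resolver (an assumption, not the library's resolve):
// it reports which tag a plain scalar would get when parsed.
func resolveTag(value string) string {
	switch value {
	case "", "~", "null":
		return "!!null"
	case "true", "false":
		return "!!bool"
	}
	if _, err := strconv.ParseInt(value, 10, 64); err == nil {
		return "!!int"
	}
	if _, err := strconv.ParseFloat(value, 64); err == nil {
		return "!!float"
	}
	return "!!str"
}

// desolve returns the tag to emit (empty means "elide it") and whether the
// value must be quoted to stay a string.
func desolve(tag, value string) (outTag string, quote bool) {
	resolved := resolveTag(value)
	switch {
	case resolved == tag:
		// The parser would infer the same tag, so it is redundant.
		return "", false
	case tag == "!!str":
		// A string that looks like another type: drop the tag, force quotes.
		return "", true
	case tag == "!!int" || tag == "!!float":
		// Numeric mismatch: elide the tag and accept the re-resolved type.
		return "", false
	default:
		// Other standard tags keep their tag to preserve the type.
		return tag, false
	}
}

func main() {
	for _, c := range [][2]string{
		{"!!str", "hello"}, {"!!str", "123"}, {"!!float", "1"}, {"!!bool", "yes"},
	} {
		tag, quote := desolve(c[0], c[1])
		fmt.Printf("%s %q -> tag=%q quote=%v\n", c[0], c[1], tag, quote)
	}
}
```

The second case is the interesting one: a string holding "123" loses its tag but gains quotes, so it still round-trips as a string.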
+149
View File
@@ -0,0 +1,149 @@
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// This file contains the Dumper API for writing YAML documents.
//
// Primary functions:
// - Dump: Encode value(s) to YAML (use WithAllDocuments for multi-doc)
// - NewDumper: Create a streaming dumper to io.Writer
package libyaml
import (
"bytes"
"io"
"reflect"
)
// A Dumper writes YAML values to an output stream with configurable options.
// It uses a 3-stage pipeline mirroring the Loader:
// 1. Representer: Go values → Tagged Node tree
// 2. Desolver: Remove inferable tags
// 3. Serializer: Node tree → Events → YAML
type Dumper struct {
representer *Representer
desolver *Desolver
serializer *Serializer
options *Options
}
// NewDumper returns a new Dumper that writes to w with the given options.
//
// The Dumper should be closed after use to flush all data to w.
func NewDumper(w io.Writer, opts ...Option) (*Dumper, error) {
o, err := ApplyOptions(opts...)
if err != nil {
return nil, err
}
return &Dumper{
representer: NewRepresenter(o), // No writer - builds nodes
desolver: NewDesolver(o),
serializer: NewSerializer(w, o), // Writer here - emits YAML
options: o,
}, nil
}
// Dump encodes a value to YAML with the given options.
//
// By default, Dump encodes a single value as a single YAML document.
//
// Use WithAllDocuments() to encode multiple values as a multi-document stream:
//
// docs := []Config{config1, config2, config3}
// yaml.Dump(docs, yaml.WithAllDocuments())
//
// When WithAllDocuments is used, in must be a slice.
// Each element is encoded as a separate YAML document with "---" separators.
//
// See [Marshal] for details about the conversion of Go values to YAML.
func Dump(in any, opts ...Option) (out []byte, err error) {
defer handleErr(&err)
o, err := ApplyOptions(opts...)
if err != nil {
return nil, err
}
var buf bytes.Buffer
d, err := NewDumper(&buf, func(opts *Options) error {
*opts = *o // Copy options
return nil
})
if err != nil {
return nil, err
}
if o.AllDocuments {
// Multi-document mode: in must be a slice
inVal := reflect.ValueOf(in)
if inVal.Kind() != reflect.Slice {
return nil, &DumpError{
Stage: RepresenterStage,
Message: "WithAllDocuments requires a slice input",
}
}
// Dump each element as a separate document
for i := 0; i < inVal.Len(); i++ {
if err := d.Dump(inVal.Index(i).Interface()); err != nil {
return nil, err
}
}
} else {
// Single-document mode
if err := d.Dump(in); err != nil {
return nil, err
}
}
if err := d.Close(); err != nil {
return nil, err
}
return buf.Bytes(), nil
}
// Dump writes the YAML encoding of v to the stream.
//
// If multiple values are dumped to the stream, the second and subsequent
// documents will be preceded with a "---" document separator.
//
// See the documentation for [Marshal] for details about the conversion of Go
// values to YAML.
func (d *Dumper) Dump(v any) (err error) {
defer handleErr(&err)
// Stage 1: Represent - Go values → Tagged Node tree
node := d.representer.Represent("", reflect.ValueOf(v))
// Stage 2: Desolve - Remove inferable tags
d.desolver.Desolve(node)
// Stage 3: Serialize - Node tree → Events → YAML
d.serializer.Serialize(node)
return nil
}
// Close closes the Dumper by writing any remaining data.
// It does not write a stream terminating string "...".
func (d *Dumper) Close() (err error) {
defer handleErr(&err)
d.serializer.Finish()
return nil
}
// SetIndent changes the indentation used when encoding.
// This is used by the legacy Encoder.SetIndent() method.
func (d *Dumper) SetIndent(spaces int) {
if spaces < 0 {
failDumpf(SerializerStage, "cannot indent to a negative number of spaces")
}
// Set on serializer's emitter
d.serializer.Emitter.BestIndent = spaces
}
// SetCompactSeqIndent controls whether '- ' is considered part of the indentation.
// This is used by the legacy Encoder methods.
func (d *Dumper) SetCompactSeqIndent(compact bool) {
d.serializer.Emitter.CompactSequenceIndent = compact
}
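Note on the multi-document branch of Dump: it relies on reflection to walk a slice of arbitrary element type. A minimal standalone sketch of that step follows. dumpEach and its callback are hypothetical names introduced for illustration; the vendored code calls d.Dump on each element and returns a DumpError rather than a plain error.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// dumpEach calls dump once per element of in, which must be a slice.
// It stops at the first error returned by dump.
func dumpEach(in any, dump func(doc any) error) error {
	v := reflect.ValueOf(in)
	if v.Kind() != reflect.Slice {
		return errors.New("multi-document mode requires a slice input")
	}
	for i := 0; i < v.Len(); i++ {
		// Index(i).Interface() hands the element back as an ordinary value.
		if err := dump(v.Index(i).Interface()); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var docs []string
	err := dumpEach([]int{1, 2, 3}, func(doc any) error {
		docs = append(docs, fmt.Sprint(doc))
		return nil
	})
	fmt.Println(docs, err) // prints "[1 2 3] <nil>"

	// A non-slice input is rejected before any document is written.
	fmt.Println(dumpEach(42, func(any) error { return nil }))
}
```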
+586 -298
View File
File diff suppressed because it is too large
+145 -57
View File
@@ -12,111 +12,185 @@ import (
"strings"
)
type MarkedYAMLError struct {
// optional context
ContextMark Mark
ContextMessage string
// Stage identifies the processing stage where an error occurred during YAML
// loading or dumping.
type Stage string
Mark Mark
Message string
const (
// Load stages
ReaderStage Stage = "reader" // Input reading and encoding
ScannerStage Stage = "scanner" // Tokenization
ParserStage Stage = "parser" // Event stream parsing
ComposerStage Stage = "composer" // Node tree construction
ResolverStage Stage = "resolver" // Tag resolution
ConstructorStage Stage = "constructor" // Go value construction
// Dump stages
RepresenterStage Stage = "representer" // Go value to Node tree
SerializerStage Stage = "serializer" // Node tree to events
EmitterStage Stage = "emitter" // Events to YAML bytes
WriterStage Stage = "writer" // Output writing
)
// LoadError represents an error that occurred while loading a YAML document.
//
// It provides detailed location information and identifies the processing
// stage where the error occurred.
type LoadError struct {
Stage Stage // Processing stage where error occurred
Message string // Error description
// Position information
Mark Mark // Primary error position
ContextMark Mark // Optional context position (e.g., start of construct)
ContextMsg string // Optional context message
// Error chaining
err error // Underlying error (for Unwrap support)
}
func (e MarkedYAMLError) Error() string {
var builder strings.Builder
builder.WriteString("yaml: ")
if len(e.ContextMessage) > 0 {
fmt.Fprintf(&builder, "%s at %s: ", e.ContextMessage, e.ContextMark)
// Error returns the error message with stage and position information.
// Format: "go-yaml load error in <stage> at L:C: <message>"
// Or with context: "go-yaml load error in <stage> (<ctx>) at L:C-L:C: <message>"
func (e *LoadError) Error() string {
if len(e.ContextMsg) > 0 {
return fmt.Sprintf("go-yaml load error in %s (%s) at %s: %s",
e.Stage, e.ContextMsg, e.ContextMark.rangeString(e.Mark), e.Message)
}
if len(e.ContextMessage) == 0 || e.ContextMark != e.Mark {
fmt.Fprintf(&builder, "%s: ", e.Mark)
return fmt.Sprintf("go-yaml load error in %s at %s: %s",
e.Stage, e.Mark.shortString(), e.Message)
}
// simpleError returns the error message without the "go-yaml load error in <stage>" prefix.
// Used for formatting errors within LoadErrors collections.
// Format: "line L: <message>" (backwards compatible - no column info)
func (e *LoadError) simpleError() string {
var builder strings.Builder
if len(e.ContextMsg) > 0 {
fmt.Fprintf(&builder, "%s at %s: ", e.ContextMsg, e.ContextMark)
}
if len(e.ContextMsg) == 0 || e.ContextMark != e.Mark {
if e.Mark.Line > 0 {
fmt.Fprintf(&builder, "line %d: ", e.Mark.Line)
} else {
builder.WriteString("<unknown position>: ")
}
}
builder.WriteString(e.Message)
return builder.String()
}
type ParserError MarkedYAMLError
func (e ParserError) Error() string {
return MarkedYAMLError(e).Error()
// Unwrap returns the underlying error.
func (e *LoadError) Unwrap() error {
return e.err
}
type ScannerError MarkedYAMLError
func (e ScannerError) Error() string {
return MarkedYAMLError(e).Error()
// NewLoadError creates a LoadError with an underlying cause.
// The cause is accessible via Unwrap for use with [errors.Is] and [errors.As].
func NewLoadError(stage Stage, message string, mark Mark, cause error) *LoadError {
return &LoadError{
Stage: stage,
Message: message,
Mark: mark,
err: cause,
}
}
type ReaderError struct {
Offset int
Value int
Err error
// DumpError represents an error that occurred while dumping a YAML document.
//
// It identifies the processing stage where the error occurred and provides
// an optional underlying cause via Unwrap.
type DumpError struct {
Stage Stage // Processing stage where error occurred
Message string // Error description
// Error chaining
err error // Underlying error (for Unwrap support)
}
func (e ReaderError) Error() string {
return fmt.Sprintf("yaml: offset %d: %s", e.Offset, e.Err)
// Error returns the error message with stage information.
// Format: "go-yaml dump error in <stage>: <message>"
func (e *DumpError) Error() string {
return fmt.Sprintf("go-yaml dump error in %s: %s", e.Stage, e.Message)
}
func (e ReaderError) Unwrap() error {
return e.Err
// Unwrap returns the underlying error.
func (e *DumpError) Unwrap() error {
return e.err
}
// NewDumpError creates a DumpError with an underlying cause.
// The cause is accessible via Unwrap for use with [errors.Is] and [errors.As].
func NewDumpError(stage Stage, message string, cause error) *DumpError {
return &DumpError{Stage: stage, Message: message, err: cause}
}
// failDump panics with a YAMLError wrapping a DumpError for the given stage.
// If err is exactly a *DumpError it is passed through unchanged to avoid
// double-wrapping (e.g. a user MarshalYAML that returns yaml.NewDumpError).
// Errors that merely wrap a *DumpError are treated as ordinary errors so that
// the outer wrapper's message and context are preserved.
func failDump(stage Stage, err error) {
if de, ok := err.(*DumpError); ok {
panic(&YAMLError{de})
}
panic(&YAMLError{&DumpError{Stage: stage, Message: err.Error(), err: err}})
}
// failDumpf panics with a YAMLError wrapping a formatted DumpError.
func failDumpf(stage Stage, format string, args ...any) {
panic(&YAMLError{&DumpError{Stage: stage, Message: fmt.Sprintf(format, args...)}})
}
// EmitterError represents an error that occurred during emitting.
type EmitterError struct {
Message string
}
// Error returns the error message.
func (e EmitterError) Error() string {
return fmt.Sprintf("yaml: %s", e.Message)
}
// WriterError represents an error that occurred while writing output.
type WriterError struct {
Err error
}
// Error returns the error message.
func (e WriterError) Error() string {
return fmt.Sprintf("yaml: %s", e.Err)
}
// Unwrap returns the underlying error.
func (e WriterError) Unwrap() error {
return e.Err
}
// ConstructError represents a single, non-fatal error that occurred during
// the constructing of a YAML document into a Go value.
type ConstructError struct {
Err error
Line int
Column int
}
func (e *ConstructError) Error() string {
return fmt.Sprintf("line %d: %s", e.Line, e.Err.Error())
}
func (e *ConstructError) Unwrap() error {
return e.Err
}
// LoadErrors is returned when one or more fields cannot be properly decoded.
type LoadErrors struct {
Errors []*ConstructError
Errors []*LoadError
}
// Error returns a formatted error message listing all construct errors.
func (e *LoadErrors) Error() string {
var b strings.Builder
b.WriteString("yaml: construct errors:")
for _, err := range e.Errors {
b.WriteString("\n ")
b.WriteString(err.Error())
b.WriteString("yaml: construct errors: ")
for i, err := range e.Errors {
if i > 0 {
b.WriteString("; ")
}
b.WriteString(err.simpleError())
}
return b.String()
}
// As implements errors.As for Go versions prior to 1.20 that don't support
// As implements [errors.As] for Go versions prior to 1.20 that don't support
// the Unwrap() []error interface. It allows [LoadErrors] to match against
// *ConstructError targets by returning the first error in the list.
// *LoadError or *TypeError targets.
func (e *LoadErrors) As(target any) bool {
switch t := target.(type) {
case **ConstructError:
case **LoadError:
if len(e.Errors) == 0 {
return false
}
@@ -125,7 +199,7 @@ func (e *LoadErrors) As(target any) bool {
case **TypeError:
var msgs []string
for _, err := range e.Errors {
msgs = append(msgs, err.Error())
msgs = append(msgs, err.simpleError())
}
*t = &TypeError{Errors: msgs}
return true
@@ -133,7 +207,7 @@ func (e *LoadErrors) As(target any) bool {
return false
}
// Is implements errors.Is for Go versions prior to 1.20 that don't support
// Is implements [errors.Is] for Go versions prior to 1.20 that don't support
// the Unwrap() []error interface. It checks if any wrapped error matches
// the target error.
func (e *LoadErrors) Is(target error) bool {
@@ -145,7 +219,7 @@ func (e *LoadErrors) Is(target error) bool {
return false
}
// TypeError is an obsolete error type retained for compatibility.
// TypeError is a legacy error type retained for compatibility.
//
// A TypeError is returned by Unmarshal when one or more fields in
// the YAML document cannot be properly decoded into the requested
@@ -157,8 +231,9 @@ type TypeError struct {
Errors []string
}
// Error returns a formatted error message listing all unmarshal errors.
func (e *TypeError) Error() string {
return fmt.Sprintf("yaml: unmarshal errors:\n %s", strings.Join(e.Errors, "\n "))
return fmt.Sprintf("yaml: unmarshal errors: %s", strings.Join(e.Errors, "; "))
}
// YAMLError is an internal error wrapper type.
@@ -166,6 +241,19 @@ type YAMLError struct {
Err error
}
// Error returns the error message.
func (e *YAMLError) Error() string {
return e.Err.Error()
}
// handleErr recovers from panics caused by yaml errors.
// It's used in defer statements to convert YAMLError panics into regular errors.
func handleErr(err *error) {
if v := recover(); v != nil {
if e, ok := v.(*YAMLError); ok {
*err = e.Err
} else {
panic(v)
}
}
}
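Note on error chaining: LoadError and DumpError keep their cause in an unexported field and expose it through Unwrap, which is what lets callers use errors.Is and errors.As. The sketch below shows that mechanism with a reduced, hypothetical loadError type. Its field set and message format are simplified assumptions and do not match the vendored type.

```go
package main

import (
	"errors"
	"fmt"
)

// loadError is a reduced stand-in for LoadError: a stage, a message, a line
// and an unexported cause reachable only through Unwrap.
type loadError struct {
	Stage   string
	Message string
	Line    int
	err     error
}

func (e *loadError) Error() string {
	return fmt.Sprintf("load error in %s at line %d: %s", e.Stage, e.Line, e.Message)
}

// Unwrap exposes the cause to errors.Is and errors.As.
func (e *loadError) Unwrap() error { return e.err }

// errShortRead is an illustrative sentinel cause (an assumption).
var errShortRead = errors.New("short read")

// loadConfig simulates a caller that wraps a stage error once more.
func loadConfig() error {
	cause := &loadError{Stage: "reader", Message: "input truncated", Line: 3, err: errShortRead}
	return fmt.Errorf("config: %w", cause)
}

// stageOf extracts the stage from anywhere in the error chain.
func stageOf(err error) (string, bool) {
	var le *loadError
	if errors.As(err, &le) {
		return le.Stage, true
	}
	return "", false
}

// isShortRead reports whether the chain bottoms out in errShortRead.
func isShortRead(err error) bool {
	return errors.Is(err, errShortRead)
}

func main() {
	err := loadConfig()
	fmt.Println(stageOf(err))     // prints "reader true"
	fmt.Println(isShortRead(err)) // prints "true"
}
```

Keeping the cause unexported means the only supported way to reach it is the standard errors package, which keeps the struct's public fields focused on position and stage.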
+298
View File
@@ -0,0 +1,298 @@
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// This file contains the Loader API for reading YAML documents.
//
// Primary functions:
// - Load: Decode YAML document(s) into a value (use WithAllDocuments for multi-doc)
// - NewLoader: Create a streaming loader from io.Reader
package libyaml
import (
"bytes"
"errors"
"io"
"reflect"
)
// A Loader reads and loads YAML values from an input stream with configurable
// options.
type Loader struct {
composer *Composer
resolver *Resolver
constructor *Constructor
options *Options
docCount int
}
// NewLoader returns a new Loader that reads from r with the given options.
//
// The Loader introduces its own buffering and may read data from r beyond the
// YAML values requested.
func NewLoader(r io.Reader, opts ...Option) (*Loader, error) {
o, err := ApplyOptions(opts...)
if err != nil {
return nil, err
}
c := NewComposerFromReader(r, o)
c.SetStreamNodes(o.StreamNodes)
return &Loader{
composer: c,
resolver: NewResolver(o),
constructor: NewConstructor(o),
options: o,
}, nil
}
// Load loads YAML document(s) with the given options.
//
// By default, Load requires exactly one document in the input.
// If zero documents are found, it returns an error.
// If multiple documents are found, it returns an error.
//
// Use WithAllDocuments() to load all documents into a slice:
//
// var configs []Config
// yaml.Load(multiDocYAML, &configs, yaml.WithAllDocuments())
//
// When WithAllDocuments is used, out must be a pointer to a slice.
// Each document is loaded into the slice element type.
// Zero documents results in an empty slice (no error).
//
// Maps and pointers (to a struct, string, int, etc) are accepted as out
// values. If an internal pointer within a struct is not initialized,
// the yaml package will initialize it if necessary. The out parameter
// must not be nil.
//
// The type of the loaded values should be compatible with the respective
// values in out. If one or more values cannot be loaded due to type
// mismatches, decoding continues partially until the end of the YAML
// content, and a *yaml.LoadErrors is returned with details for all
// missed values.
//
// Struct fields are only loaded if they are exported (have an upper case
// first letter), and are loaded using the field name lowercased as the
// default key. Custom keys may be defined via the "yaml" name in the field
// tag: the content preceding the first comma is used as the key, and the
// following comma-separated options control the loading and dumping behavior.
//
// For example:
//
// type T struct {
// F int `yaml:"a,omitempty"`
// B int
// }
// var t T
// yaml.Load([]byte("a: 1\nb: 2"), &t)
//
// See the documentation of Dump for the format of tags and a list of
// supported tag options.
func Load(in []byte, out any, opts ...Option) error {
o, err := ApplyOptions(opts...)
if err != nil {
return err
}
if o.AllDocuments {
// Multi-document mode: out must be pointer to slice
return loadAll(in, out, o)
}
// Single-document mode: exactly one document required
return loadSingle(in, out, o)
}
// Load reads the next YAML-encoded document from its input and stores it
// in the value pointed to by v.
//
// Returns [io.EOF] when there are no more documents to read.
// If the WithSingleDocument option was set and a document was already read,
// subsequent calls return [io.EOF].
//
// Maps and pointers (to a struct, string, int, etc) are accepted as v
// values. If an internal pointer within a struct is not initialized,
// the yaml package will initialize it if necessary. The v parameter
// must not be nil.
//
// Struct fields are only loaded if they are exported (have an upper case
// first letter), and are loaded using the field name lowercased as the
// default key. Custom keys may be defined via the "yaml" name in the field
// tag: the content preceding the first comma is used as the key, and the
// following comma-separated options control the loading and dumping behavior.
//
// See the documentation of the package-level Load function for more details
// about YAML to Go conversion and tag options.
func (l *Loader) Load(v any) (err error) {
defer handleErr(&err)
if l.options.SingleDocument && l.docCount > 0 {
return io.EOF
}
// Stage 1: Compose - parse events into node tree (unresolved tags)
node := l.composer.Compose() // *Node
if node == nil {
return io.EOF
}
l.docCount++
// Stage 2: Resolve - determine implicit types for untagged scalars
l.resolver.Resolve(node)
// Stage 3: Construct - convert node tree to Go values
out := reflect.ValueOf(v)
if out.Kind() == reflect.Pointer && !out.IsNil() {
out = out.Elem()
}
l.constructor.Construct(node, out)
if len(l.constructor.TypeErrors) > 0 {
typeErrors := l.constructor.TypeErrors
l.constructor.TypeErrors = nil
return &LoadErrors{Errors: typeErrors}
}
return nil
}
// loadAll loads all documents from the input into a slice.
// The out parameter must be a non-nil pointer to a slice.
// Each document is appended to the slice as an element.
func loadAll(in []byte, out any, opts *Options) error {
outVal := reflect.ValueOf(out)
if outVal.Kind() != reflect.Pointer || outVal.IsNil() {
msg := "yaml: WithAllDocuments requires a non-nil pointer to a slice"
return &LoadErrors{Errors: []*LoadError{{
Stage: ConstructorStage,
Message: msg,
err: errors.New(msg),
}}}
}
sliceVal := outVal.Elem()
if sliceVal.Kind() != reflect.Slice {
msg := "yaml: WithAllDocuments requires a pointer to a slice"
return &LoadErrors{Errors: []*LoadError{{
Stage: ConstructorStage,
Message: msg,
err: errors.New(msg),
}}}
}
// Create a new slice (clear existing content)
sliceVal.Set(reflect.MakeSlice(sliceVal.Type(), 0, 0))
l, err := NewLoader(bytes.NewReader(in), func(o *Options) error {
*o = *opts // Copy options
return nil
})
if err != nil {
return err
}
elemType := sliceVal.Type().Elem()
for {
// Create new element of slice's element type
elemPtr := reflect.New(elemType)
err := l.Load(elemPtr.Interface())
if err == io.EOF {
break
}
if err != nil {
return err
}
// Append loaded element to slice
sliceVal.Set(reflect.Append(sliceVal, elemPtr.Elem()))
}
return nil
}
// loadSingle loads exactly one document from the input.
// Returns an error if the input contains zero documents, or more than one
// document unless the FromLegacy option is set for backward compatibility.
func loadSingle(in []byte, out any, opts *Options) error {
l, err := NewLoader(bytes.NewReader(in), func(o *Options) error {
*o = *opts // Copy options
return nil
})
if err != nil {
return err
}
// Load first document
err = l.Load(out)
if err == io.EOF {
msg := "yaml: no documents in stream"
return &LoadErrors{Errors: []*LoadError{{
Stage: ConstructorStage,
Message: msg,
err: errors.New(msg),
}}}
}
if err != nil {
return err
}
// Skip trailing document check for legacy Unmarshal() compatibility
if opts.FromLegacy {
return nil
}
// Check for additional documents
var dummy any
err = l.Load(&dummy)
if err != io.EOF {
if err != nil {
// Some other error occurred
return err
}
// Successfully loaded a second document - this is an error in strict mode
msg := "yaml: expected single document, found multiple"
return &LoadErrors{Errors: []*LoadError{{
Stage: ConstructorStage,
Message: msg,
err: errors.New(msg),
}}}
}
return nil
}
// SetKnownFields enables or disables strict field checking for subsequent Load
// calls.
// This is used by the legacy Decoder.KnownFields() method.
func (l *Loader) SetKnownFields(enable bool) {
l.constructor.KnownFields = enable
}
// ComposeAndResolve composes and resolves the next document from the input
// and returns the node without constructing Go values. This is used by
// Unmarshal() to support the Unmarshaler interface.
func (l *Loader) ComposeAndResolve() *Node {
if l.options.SingleDocument && l.docCount > 0 {
return nil
}
// Stage 1: Compose - parse events into node tree (unresolved tags)
node := l.composer.Compose()
if node == nil {
return nil
}
l.docCount++
// Stage 2: Resolve - determine implicit types for untagged scalars
l.resolver.Resolve(node)
return node
}
// LoadAny parses YAML data into generic Go structures (map[string]any, []any).
//
// Useful for test data loading where the structure is unknown at compile time.
// This is a convenience wrapper around Load with an any target.
func LoadAny(data []byte) (any, error) {
var result any
if err := Load(data, &result); err != nil {
return nil, err
}
return result, nil
}
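Note on loadAll: the core of it is a reflection loop that allocates a fresh element, fills it, and appends until the source reports io.EOF. Here is a minimal standalone sketch of that loop. collectAll and intSource are hypothetical names, and the next callback stands in for Loader.Load; the error values are plain errors rather than LoadErrors.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"reflect"
)

// collectAll resets the slice that out points to, then appends one element
// per successful call to next, stopping cleanly when next returns io.EOF.
func collectAll(out any, next func(elem any) error) error {
	outVal := reflect.ValueOf(out)
	if outVal.Kind() != reflect.Pointer || outVal.IsNil() {
		return errors.New("out must be a non-nil pointer to a slice")
	}
	sliceVal := outVal.Elem()
	if sliceVal.Kind() != reflect.Slice {
		return errors.New("out must point to a slice")
	}
	// Start from an empty slice so earlier content does not leak through.
	sliceVal.Set(reflect.MakeSlice(sliceVal.Type(), 0, 0))
	elemType := sliceVal.Type().Elem()
	for {
		elemPtr := reflect.New(elemType) // pointer to a zero element
		err := next(elemPtr.Interface())
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		sliceVal.Set(reflect.Append(sliceVal, elemPtr.Elem()))
	}
}

// intSource is a fake document source: it yields the given ints in order,
// then io.EOF.
func intSource(values ...int) func(any) error {
	i := 0
	return func(elem any) error {
		if i >= len(values) {
			return io.EOF
		}
		*(elem.(*int)) = values[i]
		i++
		return nil
	}
}

func main() {
	got := []int{99} // existing content is discarded
	err := collectAll(&got, intSource(1, 2, 3))
	fmt.Println(got, err) // prints "[1 2 3] <nil>"
}
```

Because the slice is reset before the loop, zero documents yields an empty slice rather than an error, which matches the behavior documented for WithAllDocuments.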
+132 -26
View File
@@ -28,13 +28,17 @@ const (
mergeTag = "!!merge"
)
// longTagPrefix is the standard YAML tag prefix for core types.
const longTagPrefix = "tag:yaml.org,2002:"
// longTags maps short tags to their long form representations.
// shortTags maps long tags to their short form representations.
var (
longTags = make(map[string]string)
shortTags = make(map[string]string)
)
// init initializes the tag conversion maps.
func init() {
for _, stag := range []string{nullTag, boolTag, strTag, intTag, floatTag, timestampTag, seqTag, mapTag, binaryTag, mergeTag} {
ltag := longTag(stag)
@@ -43,6 +47,7 @@ func init() {
}
}
// shortTag converts a long-form tag to its short form (e.g., "tag:yaml.org,2002:str" to "!!str").
func shortTag(tag string) string {
if strings.HasPrefix(tag, longTagPrefix) {
if stag, ok := shortTags[tag]; ok {
@@ -53,6 +58,7 @@ func shortTag(tag string) string {
return tag
}
// longTag converts a short-form tag to its long form (e.g., "!!str" to "tag:yaml.org,2002:str").
func longTag(tag string) string {
if strings.HasPrefix(tag, "!!") {
if ltag, ok := longTags[tag]; ok {
@@ -66,6 +72,7 @@ func longTag(tag string) string {
// Kind represents the type of YAML node
type Kind uint32
// Kind constants define the different types of YAML nodes.
const (
DocumentNode Kind = 1 << iota
SequenceNode
@@ -78,6 +85,7 @@ const (
// Style represents the formatting style of a YAML node
type Style uint32
// Style constants define different formatting styles for YAML nodes.
const (
TaggedStyle Style = 1 << iota
DoubleQuotedStyle
@@ -99,6 +107,14 @@ type StreamTagDirective struct {
Prefix string
}
// Stream holds stream-level metadata for StreamNode.
// This includes encoding, version directive, and tag directives.
type Stream struct {
Encoding Encoding
Version *StreamVersionDirective
TagDirectives []StreamTagDirective
}
// Node represents an element in the YAML document hierarchy. While documents
// are typically encoded and decoded into higher level types, such as structs
// and maps, Node is an intermediate representation that allows detailed
@@ -171,26 +187,15 @@ type Node struct {
Line int
Column int
// StreamNode-specific fields (only valid when Kind == StreamNode)
// Encoding holds the stream encoding (UTF-8, UTF-16LE, UTF-16BE).
// Only valid for StreamNode.
Encoding Encoding
// Version holds the YAML version directive (%YAML).
// Only valid for StreamNode.
Version *StreamVersionDirective
// TagDirectives holds the %TAG directives.
// Only valid for StreamNode.
TagDirectives []StreamTagDirective
// Stream holds stream metadata (non-nil only when Kind == StreamNode).
Stream *Stream
}
// IsZero returns whether the node has all of its fields unset.
func (n *Node) IsZero() bool {
return n.Kind == 0 && n.Style == 0 && n.Tag == "" && n.Value == "" && n.Anchor == "" && n.Alias == nil && n.Content == nil &&
n.HeadComment == "" && n.LineComment == "" && n.FootComment == "" && n.Line == 0 && n.Column == 0 &&
n.Encoding == 0 && n.Version == nil && n.TagDirectives == nil
n.Stream == nil
}
// LongTag returns the long form of the tag that indicates the data type for
@@ -230,6 +235,7 @@ func (n *Node) ShortTag() string {
return shortTag(n.Tag)
}
// indicatedString returns true if the node's style explicitly indicates a string type.
func (n *Node) indicatedString() bool {
return n.Kind == ScalarNode &&
(shortTag(n.Tag) == strTag ||
@@ -324,14 +330,22 @@ func (n *Node) Load(v any, opts ...Option) (err error) {
// conversion of Go values into YAML.
func (n *Node) Encode(v any) (err error) {
defer handleErr(&err)
e := NewRepresenter(noWriter, DefaultOptions)
defer e.Destroy()
e.MarshalDoc("", reflect.ValueOf(v))
e.Finish()
p := NewComposer(e.Out)
// Use the 3-stage dump pipeline with round-trip to preserve styles
r := NewRepresenter(DefaultOptions)
node := r.Represent("", reflect.ValueOf(v))
d := NewDesolver(DefaultOptions)
d.Desolve(node)
s := NewSerializer(nil, DefaultOptions)
var out []byte
s.Emitter.SetOutputString(&out)
s.Serialize(node)
s.Finish()
// Parse back to get styles
p := NewComposer(out, nil)
p.Textless = true
defer p.Destroy()
doc := p.Parse()
doc := p.Compose()
NewResolver(nil).Resolve(doc)
*n = *doc.Content[0]
return nil
}
@@ -350,14 +364,106 @@ func (n *Node) Dump(v any, opts ...Option) (err error) {
if err != nil {
return err
}
e := NewRepresenter(noWriter, o)
defer e.Destroy()
e.MarshalDoc("", reflect.ValueOf(v))
e.Finish()
p := NewComposer(e.Out)
// Use the 3-stage dump pipeline with round-trip to preserve styles
r := NewRepresenter(o)
node := r.Represent("", reflect.ValueOf(v))
d := NewDesolver(o)
d.Desolve(node)
s := NewSerializer(nil, o)
var out []byte
s.Emitter.SetOutputString(&out)
s.Serialize(node)
s.Finish()
// Parse back to get styles
p := NewComposer(out, nil)
p.Textless = true
defer p.Destroy()
doc := p.Parse()
doc := p.Compose()
NewResolver(nil).Resolve(doc)
*n = *doc.Content[0]
return nil
}
// Marshaler interface may be implemented by types to customize their
// behavior when being marshaled into a YAML document.
type Marshaler interface {
MarshalYAML() (any, error)
}
// Unmarshaler is the interface implemented by types that can unmarshal
// a YAML description of themselves.
type Unmarshaler interface {
UnmarshalYAML(node *Node) error
}
// IsZeroer is used to check whether an object is zero to determine whether
// it should be omitted when marshaling with the ,omitempty flag. One notable
// implementation is [time.Time].
type IsZeroer interface {
IsZero() bool
}
// FromYAMLNode is a new interface that types can implement to customize
// their unmarshaling behavior. It receives a Node directly and modifies
// the receiver in place.
// This is the preferred interface for new code.
//
// Note: This interface is reserved for the v4 API and is not yet fully
// integrated into the current implementation.
type FromYAMLNode interface {
FromYAMLNode(*Node) error
}
// ToYAMLNode is a new interface that types can implement to customize
// their marshaling behavior. It returns a Node directly.
// This is the preferred interface for new code.
//
// Note: This interface is reserved for the v4 API and is not yet fully
// integrated into the current implementation.
type ToYAMLNode interface {
ToYAMLNode() (*Node, error)
}
// isZero reports whether v represents the zero value for its type.
// If v implements the IsZeroer interface, IsZero() is called.
// Otherwise, zero is determined by checking type-specific conditions.
// This is used to determine omitempty behavior when marshaling.
func isZero(v reflect.Value) bool {
kind := v.Kind()
if z, ok := v.Interface().(IsZeroer); ok {
if (kind == reflect.Pointer || kind == reflect.Interface) && v.IsNil() {
return true
}
return z.IsZero()
}
switch kind {
case reflect.String:
return len(v.String()) == 0
case reflect.Interface, reflect.Pointer:
return v.IsNil()
case reflect.Slice:
return v.Len() == 0
case reflect.Map:
return v.Len() == 0
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return v.Int() == 0
case reflect.Float32, reflect.Float64:
return v.Float() == 0
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
return v.Uint() == 0
case reflect.Bool:
return !v.Bool()
case reflect.Struct:
vt := v.Type()
for i := v.NumField() - 1; i >= 0; i-- {
if vt.Field(i).PkgPath != "" {
continue // Private field
}
if !isZero(v.Field(i)) {
return false
}
}
return true
}
return false
}
+63 -4
@@ -1,7 +1,5 @@
//
// Copyright (c) 2025 The go-yaml Project Contributors
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
//
// Options configuration for loading and dumping YAML.
// Provides centralized control for indentation, line width, strictness, and
@@ -34,11 +32,68 @@ type Options struct {
ExplicitEnd bool // Always emit ...
FlowSimpleCollections bool // Use flow style for simple collections
QuotePreference QuoteStyle // Preferred quote style when quoting is required
// Safety limit checks (set by ApplyOptions or WithPlugin(limit.New(...)))
DepthCheck func(depth int, ctx *DepthContext) error
AliasCheck func(aliasCount, constructCount int) error
// Internal options (exported as Go fields, but intended for internal use only)
FromLegacy bool // Indicates legacy Unmarshal()/Decoder path (check Unmarshaler, allow trailing content)
}
// Option allows configuring YAML loading and dumping operations.
type Option func(*Options) error
// DepthKind represents the type of nesting (flow or block).
type DepthKind string
// DepthKindFlow and DepthKindBlock are the possible values of DepthContext.Kind.
const (
DepthKindFlow DepthKind = "flow"
DepthKindBlock DepthKind = "block"
)
// DepthContext holds context about a nesting depth check.
type DepthContext struct {
Kind DepthKind
}
// DefaultDepthCheck is the default depth check function.
// It returns an error when depth exceeds 10000.
func DefaultDepthCheck(depth int, ctx *DepthContext) error {
const maxDepth = 10000
if depth > maxDepth {
return fmt.Errorf("exceeded max depth of %d", maxDepth)
}
return nil
}
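Because Options.DepthCheck is a plain function value, a stricter limit can be expressed as a closure with the same signature. The sketch below is illustrative only: `newDepthCheck` is a hypothetical helper, and the DepthKind and DepthContext types are redeclared locally so the example compiles on its own. How such a function would be wired into Options (directly, or through the limit plugin mentioned in the struct comment) is not shown in this diff.

```go
package main

import "fmt"

// Local copies of the types from the diff, so the sketch is self-contained.
type DepthKind string

const (
	DepthKindFlow  DepthKind = "flow"
	DepthKindBlock DepthKind = "block"
)

type DepthContext struct {
	Kind DepthKind
}

// newDepthCheck (hypothetical) returns a check with the same signature as
// DefaultDepthCheck but a caller-chosen limit. It also reports the nesting
// kind when a context is supplied.
func newDepthCheck(limit int) func(depth int, ctx *DepthContext) error {
	return func(depth int, ctx *DepthContext) error {
		if depth <= limit {
			return nil
		}
		kind := DepthKind("unknown")
		if ctx != nil {
			kind = ctx.Kind
		}
		return fmt.Errorf("exceeded max %s depth of %d", kind, limit)
	}
}

func main() {
	check := newDepthCheck(3)
	fmt.Println(check(3, &DepthContext{Kind: DepthKindBlock})) // at the limit: no error
	fmt.Println(check(4, &DepthContext{Kind: DepthKindFlow}))  // one past the limit: error
}
```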
// DefaultAliasCheck is the default alias check function.
// It uses a ratio-based heuristic to prevent DoS attacks via excessive aliasing.
func DefaultAliasCheck(aliasCount, constructCount int) error {
const (
aliasRatioRangeLow = 400000
aliasRatioRangeHigh = 4000000
aliasRatioRange = float64(aliasRatioRangeHigh - aliasRatioRangeLow)
)
if aliasCount <= 100 || constructCount <= 1000 {
return nil
}
var allowed float64
switch {
case constructCount <= aliasRatioRangeLow:
allowed = 0.99
case constructCount >= aliasRatioRangeHigh:
allowed = 0.10
default:
allowed = 0.99 - 0.89*(float64(constructCount-aliasRatioRangeLow)/aliasRatioRange)
}
if float64(aliasCount)/float64(constructCount) > allowed {
return errors.New("document contains excessive aliasing")
}
return nil
}
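The ratio heuristic in DefaultAliasCheck is easier to follow with the interpolation pulled out on its own. The sketch below reuses the constants from the function above; `allowedAliasRatio` and `checkAliases` are hypothetical names introduced for illustration and are not part of the vendored package.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedAliasRatio returns the maximum alias/construct ratio tolerated for
// a document with the given number of constructed values. The ratio falls
// linearly from 0.99 at 400,000 constructs to 0.10 at 4,000,000.
func allowedAliasRatio(constructCount int) float64 {
	const (
		low  = 400000
		high = 4000000
	)
	switch {
	case constructCount <= low:
		return 0.99
	case constructCount >= high:
		return 0.10
	default:
		return 0.99 - 0.89*(float64(constructCount-low)/float64(high-low))
	}
}

// checkAliases applies the same early exit as DefaultAliasCheck: documents
// with at most 100 aliases or at most 1000 constructs are always accepted.
func checkAliases(aliasCount, constructCount int) error {
	if aliasCount <= 100 || constructCount <= 1000 {
		return nil
	}
	if float64(aliasCount)/float64(constructCount) > allowedAliasRatio(constructCount) {
		return errors.New("document contains excessive aliasing")
	}
	return nil
}

func main() {
	for _, n := range []int{1000, 400000, 2200000, 4000000} {
		fmt.Printf("constructs=%d allowed=%.3f\n", n, allowedAliasRatio(n))
	}
	fmt.Println(checkAliases(50, 2000))         // few aliases: accepted
	fmt.Println(checkAliases(3000000, 4000000)) // 75% aliases in a large document: rejected
}
```

The effect is that small documents may consist almost entirely of aliases, while very large ones are limited to roughly 10% alias-driven construction.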
// WithIndent sets the number of spaces to use for indentation when
// dumping YAML content.
//
@@ -92,7 +147,7 @@ func WithKnownFields(knownFields ...bool) Option {
// WithSingleDocument configures the Loader to only process the first document
// in a YAML stream. After the first document is loaded, subsequent calls to
// Load will return io.EOF.
// Load will return [io.EOF].
//
// When called without arguments, defaults to true.
//
@@ -371,6 +426,10 @@ func ApplyOptions(opts ...Option) (*Options, error) {
LineWidth: 80,
Unicode: true,
UniqueKeys: true,
// Default safety limits
DepthCheck: DefaultDepthCheck,
AliasCheck: DefaultAliasCheck,
}
for _, opt := range opts {
if err := opt(o); err != nil {
+538 -266
@@ -4,7 +4,8 @@
// SPDX-License-Identifier: Apache-2.0 AND MIT
// Parser stage: Transforms token stream into event stream.
// Implements a recursive-descent parser (LL(1)) following the YAML grammar specification.
// Implements a recursive-descent parser (LL(1)) following the YAML grammar
// specification.
//
// The parser implements the following grammar:
//
@@ -52,59 +53,213 @@ import (
"strings"
)
// Peek the next token in the token queue.
func (parser *Parser) peekToken(out **Token) error {
if !parser.token_available {
if err := parser.fetchMoreTokens(); err != nil {
return err
}
}
// ReadHandler is called by the [Parser] when it needs to read more bytes
// from the input source. The handler should fill the provided buffer with
// up to len(buffer) bytes from the input source.
//
// The arguments and results are as follows:
//
// [in] parser The parser object.
// [out] buffer The buffer for reading.
// [out] n The actual number of bytes read from the source.
// [out] err A non-nil error if the read failed.
//
// On success, the handler should return the number of bytes read and a
// nil error. On EOF, the handler should report [io.EOF].
type ReadHandler func(parser *Parser, buffer []byte) (n int, err error)
token := &parser.tokens[parser.tokens_head]
parser.UnfoldComments(token)
*out = token
return nil
// SimpleKey holds information about a potential simple key.
type SimpleKey struct {
flow_level int // What flow level is the key at?
required bool // Is a simple key required?
token_number int // The number of the token.
mark Mark // The position mark.
}
// UnfoldComments walks through the comments queue and joins all
// comments behind the position of the provided token into the respective
// top-level comment slices in the parser.
func (parser *Parser) UnfoldComments(token *Token) {
for parser.comments_head < len(parser.comments) && token.StartMark.Index >= parser.comments[parser.comments_head].TokenMark.Index {
comment := &parser.comments[parser.comments_head]
if len(comment.Head) > 0 {
if token.Type == BLOCK_END_TOKEN {
// No heads on ends, so keep comment.Head for a follow up token.
break
}
if len(parser.HeadComment) > 0 {
parser.HeadComment = append(parser.HeadComment, '\n')
}
parser.HeadComment = append(parser.HeadComment, comment.Head...)
}
if len(comment.Foot) > 0 {
if len(parser.FootComment) > 0 {
parser.FootComment = append(parser.FootComment, '\n')
}
parser.FootComment = append(parser.FootComment, comment.Foot...)
}
if len(comment.Line) > 0 {
if len(parser.LineComment) > 0 {
parser.LineComment = append(parser.LineComment, '\n')
}
parser.LineComment = append(parser.LineComment, comment.Line...)
}
*comment = Comment{}
parser.comments_head++
// ParserState represents the state of the parser.
type ParserState int
// Parser state constants define the different states the parser can be in.
const (
PARSE_STREAM_START_STATE ParserState = iota
PARSE_IMPLICIT_DOCUMENT_START_STATE // Expect the beginning of an implicit document.
PARSE_DOCUMENT_START_STATE // Expect DOCUMENT-START.
PARSE_DOCUMENT_CONTENT_STATE // Expect the content of a document.
PARSE_DOCUMENT_END_STATE // Expect DOCUMENT-END.
PARSE_BLOCK_NODE_STATE // Expect a block node.
PARSE_BLOCK_SEQUENCE_FIRST_ENTRY_STATE // Expect the first entry of a block sequence.
PARSE_BLOCK_SEQUENCE_ENTRY_STATE // Expect an entry of a block sequence.
PARSE_INDENTLESS_SEQUENCE_ENTRY_STATE // Expect an entry of an indentless sequence.
PARSE_BLOCK_MAPPING_FIRST_KEY_STATE // Expect the first key of a block mapping.
PARSE_BLOCK_MAPPING_KEY_STATE // Expect a block mapping key.
PARSE_BLOCK_MAPPING_VALUE_STATE // Expect a block mapping value.
PARSE_FLOW_SEQUENCE_FIRST_ENTRY_STATE // Expect the first entry of a flow sequence.
PARSE_FLOW_SEQUENCE_ENTRY_STATE // Expect an entry of a flow sequence.
PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_KEY_STATE // Expect a key of an ordered mapping.
PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_VALUE_STATE // Expect a value of an ordered mapping.
PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_END_STATE // Expect the end of an ordered mapping entry.
PARSE_FLOW_MAPPING_FIRST_KEY_STATE // Expect the first key of a flow mapping.
PARSE_FLOW_MAPPING_KEY_STATE // Expect a key of a flow mapping.
PARSE_FLOW_MAPPING_VALUE_STATE // Expect a value of a flow mapping.
PARSE_FLOW_MAPPING_EMPTY_VALUE_STATE // Expect an empty value of a flow mapping.
PARSE_END_STATE // Expect nothing.
)
// String returns a string representation of the parser state.
func (ps ParserState) String() string {
switch ps {
case PARSE_STREAM_START_STATE:
return "PARSE_STREAM_START_STATE"
case PARSE_IMPLICIT_DOCUMENT_START_STATE:
return "PARSE_IMPLICIT_DOCUMENT_START_STATE"
case PARSE_DOCUMENT_START_STATE:
return "PARSE_DOCUMENT_START_STATE"
case PARSE_DOCUMENT_CONTENT_STATE:
return "PARSE_DOCUMENT_CONTENT_STATE"
case PARSE_DOCUMENT_END_STATE:
return "PARSE_DOCUMENT_END_STATE"
case PARSE_BLOCK_NODE_STATE:
return "PARSE_BLOCK_NODE_STATE"
case PARSE_BLOCK_SEQUENCE_FIRST_ENTRY_STATE:
return "PARSE_BLOCK_SEQUENCE_FIRST_ENTRY_STATE"
case PARSE_BLOCK_SEQUENCE_ENTRY_STATE:
return "PARSE_BLOCK_SEQUENCE_ENTRY_STATE"
case PARSE_INDENTLESS_SEQUENCE_ENTRY_STATE:
return "PARSE_INDENTLESS_SEQUENCE_ENTRY_STATE"
case PARSE_BLOCK_MAPPING_FIRST_KEY_STATE:
return "PARSE_BLOCK_MAPPING_FIRST_KEY_STATE"
case PARSE_BLOCK_MAPPING_KEY_STATE:
return "PARSE_BLOCK_MAPPING_KEY_STATE"
case PARSE_BLOCK_MAPPING_VALUE_STATE:
return "PARSE_BLOCK_MAPPING_VALUE_STATE"
case PARSE_FLOW_SEQUENCE_FIRST_ENTRY_STATE:
return "PARSE_FLOW_SEQUENCE_FIRST_ENTRY_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_KEY_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_KEY_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_VALUE_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_VALUE_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_END_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_END_STATE"
case PARSE_FLOW_MAPPING_FIRST_KEY_STATE:
return "PARSE_FLOW_MAPPING_FIRST_KEY_STATE"
case PARSE_FLOW_MAPPING_KEY_STATE:
return "PARSE_FLOW_MAPPING_KEY_STATE"
case PARSE_FLOW_MAPPING_VALUE_STATE:
return "PARSE_FLOW_MAPPING_VALUE_STATE"
case PARSE_FLOW_MAPPING_EMPTY_VALUE_STATE:
return "PARSE_FLOW_MAPPING_EMPTY_VALUE_STATE"
case PARSE_END_STATE:
return "PARSE_END_STATE"
}
return "<unknown parser state>"
}
// Remove the next token from the queue (must be called after peek_token).
func (parser *Parser) skipToken() {
parser.token_available = false
parser.tokens_parsed++
parser.stream_end_produced = parser.tokens[parser.tokens_head].Type == STREAM_END_TOKEN
parser.tokens_head++
// AliasData holds information about aliases.
type AliasData struct {
anchor []byte // The anchor.
index int // The node id.
mark Mark // The anchor mark.
}
// Comment holds information about a comment in the YAML stream.
type Comment struct {
ScanMark Mark // Position where scanning for comments started
TokenMark Mark // Position after which tokens will be associated with this comment
StartMark Mark // Position of '#' comment mark
EndMark Mark // Position where comment terminated
Head []byte
Line []byte
Foot []byte
}
// Parser structure holds all information about the current
// state of the parser.
type Parser struct {
lastError error
// Reader stuff
read_handler ReadHandler // Read handler.
input_reader io.Reader // File input data.
input []byte // String input data.
input_pos int
eof bool // EOF flag
buffer []byte // The working buffer.
buffer_pos int // The current position of the buffer.
unread int // The number of unread characters in the buffer.
newlines int // The number of line breaks since last non-break/non-blank character
raw_buffer []byte // The raw buffer.
raw_buffer_pos int // The current position of the raw buffer.
encoding Encoding // The input encoding.
offset int // The offset of the current position (in bytes).
mark Mark // The mark of the current position.
// Comments
HeadComment []byte // The current head comments
LineComment []byte // The current line comments
FootComment []byte // The current foot comments
tail_comment []byte // Foot comment that happens at the end of a block.
stem_comment []byte // Comment in item preceding a nested structure (list inside list item, etc)
comments []Comment // The folded comments for all parsed tokens
comments_head int
skip_comments bool // Skip comment scanning for performance
// Scanner stuff
stream_start_produced bool // Have we started to scan the input stream?
stream_end_produced bool // Have we reached the end of the input stream?
flow_level int // The number of unclosed '[' and '{' indicators.
tokens []Token // The tokens queue.
tokens_head int // The head of the tokens queue.
tokens_parsed int // The number of tokens fetched from the queue.
token_available bool // Does the tokens queue contain a token ready for dequeueing.
indent int // The current indentation level.
indents []int // The indentation levels stack.
simple_key_allowed bool // May a simple key occur at the current position?
simple_key_possible bool // Is the current simple key possible?
simple_key SimpleKey // The current simple key.
simple_key_stack []SimpleKey // The stack of simple keys.
depthCheck func(int, *DepthContext) error // Depth limit check function
// Parser stuff
state ParserState // The current parser state.
states []ParserState // The parser states stack.
marks []Mark // The stack of marks.
tag_directives []TagDirective // The list of TAG directives.
// Representer stuff
aliases []AliasData // The alias data.
}
// NewParser creates a new parser object.
func NewParser() Parser {
return Parser{
raw_buffer: make([]byte, 0, input_raw_buffer_size),
buffer: make([]byte, 0, input_buffer_size),
mark: Mark{Line: 1, Column: 1},
depthCheck: DefaultDepthCheck,
}
}
// Parse gets the next event.
@@ -130,21 +285,74 @@ func (parser *Parser) Parse(event *Event) error {
return nil
}
func formatParserError(problem string, problem_mark Mark) error {
return ParserError{
Mark: problem_mark,
Message: problem,
}
// Delete a parser object.
func (parser *Parser) Delete() {
*parser = Parser{}
}
func formatParserErrorContext(context string, context_mark Mark, problem string, problem_mark Mark) error {
return ParserError{
ContextMark: context_mark,
ContextMessage: context,
Mark: problem_mark,
Message: problem,
// String read handler.
func yamlStringReadHandler(parser *Parser, buffer []byte) (n int, err error) {
if parser.input_pos == len(parser.input) {
return 0, io.EOF
}
n = copy(buffer, parser.input[parser.input_pos:])
parser.input_pos += n
return n, nil
}
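The string read handler above follows the usual Go read contract: copy as much as fits, advance the position, and report io.EOF once the input is exhausted. A minimal standalone sketch of that loop follows. `stringSource` and `readAll` are hypothetical stand-ins for the parser fields and its refill logic, which are not reproduced here.

```go
package main

import (
	"fmt"
	"io"
)

// stringSource holds the same two fields the handler uses on Parser.
type stringSource struct {
	input []byte
	pos   int
}

// read mirrors yamlStringReadHandler: it fills buffer from the remaining
// input and returns io.EOF only when nothing is left.
func (s *stringSource) read(buffer []byte) (n int, err error) {
	if s.pos == len(s.input) {
		return 0, io.EOF
	}
	n = copy(buffer, s.input[s.pos:])
	s.pos += n
	return n, nil
}

// readAll drains the source in chunks of at most chunkSize bytes.
// chunkSize must be positive, otherwise read can never make progress.
func readAll(s *stringSource, chunkSize int) [][]byte {
	var chunks [][]byte
	buffer := make([]byte, chunkSize)
	for {
		n, err := s.read(buffer)
		if err == io.EOF {
			return chunks
		}
		// Copy the chunk, since buffer is reused on the next iteration.
		chunks = append(chunks, append([]byte(nil), buffer[:n]...))
	}
}

func main() {
	src := &stringSource{input: []byte("key: value\n")}
	for _, chunk := range readAll(src, 4) {
		fmt.Printf("%q\n", chunk)
	}
}
```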
// Reader read handler.
func yamlReaderReadHandler(parser *Parser, buffer []byte) (n int, err error) {
return parser.input_reader.Read(buffer)
}
// SetInputString sets a byte slice holding the YAML text as the input source.
func (parser *Parser) SetInputString(input []byte) {
if parser.read_handler != nil {
panic("must set the input source only once")
}
parser.read_handler = yamlStringReadHandler
parser.input = input
parser.input_pos = 0
}
// SetInputReader sets an io.Reader as the input source.
func (parser *Parser) SetInputReader(r io.Reader) {
if parser.read_handler != nil {
panic("must set the input source only once")
}
parser.read_handler = yamlReaderReadHandler
parser.input_reader = r
}
// SetEncoding sets the source encoding.
func (parser *Parser) SetEncoding(encoding Encoding) {
if parser.encoding != ANY_ENCODING {
panic("must set the encoding only once")
}
parser.encoding = encoding
}
// GetPendingComments returns the parser's comment queue for CLI access.
func (parser *Parser) GetPendingComments() []Comment {
return parser.comments
}
// GetCommentsHead returns the current position in the comment queue.
func (parser *Parser) GetCommentsHead() int {
return parser.comments_head
}
// SetSkipComments controls whether comment scanning is skipped.
// When skip is true, the scanner skips comment tokens for better performance.
func (parser *Parser) SetSkipComments(skip bool) {
parser.skip_comments = skip
}
// default_tag_directives defines the standard tag directives (! and !!)
// that are implicitly available in all YAML documents.
var default_tag_directives = []TagDirective{
{[]byte("!"), []byte("!")},
{[]byte("!!"), []byte("tag:yaml.org,2002:")},
}
// State dispatcher.
@@ -221,9 +429,9 @@ func (parser *Parser) stateMachine(event *Event) error {
}
// Parse the production:
// stream ::= STREAM-START implicit_document? explicit_document* STREAM-END
//
// ************
// stream ::= STREAM-START implicit_document? explicit_document* STREAM-END
// ************
func (parser *Parser) parseStreamStart(event *Event) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -244,13 +452,11 @@ func (parser *Parser) parseStreamStart(event *Event) error {
}
// Parse the productions:
// implicit_document ::= block_node DOCUMENT-END*
//
// *
//
// explicit_document ::= DIRECTIVE* DOCUMENT-START block_node? DOCUMENT-END*
//
// *************************
// implicit_document ::= block_node DOCUMENT-END*
// *
// explicit_document ::= DIRECTIVE* DOCUMENT-START block_node? DOCUMENT-END*
// *************************
func (parser *Parser) parseDocumentStart(event *Event, implicit bool) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -258,12 +464,10 @@ func (parser *Parser) parseDocumentStart(event *Event, implicit bool) error {
}
// Parse extra document end indicators.
if !implicit {
for token.Type == DOCUMENT_END_TOKEN {
parser.skipToken()
if err := parser.peekToken(&token); err != nil {
return err
}
for token.Type == DOCUMENT_END_TOKEN {
parser.skipToken()
if err := parser.peekToken(&token); err != nil {
return err
}
}
@@ -280,9 +484,11 @@ func (parser *Parser) parseDocumentStart(event *Event, implicit bool) error {
var head_comment []byte
if len(parser.HeadComment) > 0 {
// [Go] Scan the header comment backwards, and if an empty line is found, break
// the header so the part before the last empty line goes into the
// document header, while the bottom of it goes into a follow up event.
// [Go] Scan the header comment backwards, and if an
// empty line is found, break the header so the part
// before the last empty line goes into the document
// header, while the bottom of it goes into a follow up
// event.
for i := len(parser.HeadComment) - 1; i > 0; i-- {
if parser.HeadComment[i] == '\n' {
if i == len(parser.HeadComment)-1 {
@@ -351,9 +557,9 @@ func (parser *Parser) parseDocumentStart(event *Event, implicit bool) error {
}
// Parse the productions:
// explicit_document ::= DIRECTIVE* DOCUMENT-START block_node? DOCUMENT-END*
//
// ***********
// explicit_document ::= DIRECTIVE* DOCUMENT-START block_node? DOCUMENT-END*
// ***********
func (parser *Parser) parseDocumentContent(event *Event) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -374,11 +580,10 @@ func (parser *Parser) parseDocumentContent(event *Event) error {
}
// Parse the productions:
// implicit_document ::= block_node DOCUMENT-END*
//
// *************
//
// explicit_document ::= DIRECTIVE* DOCUMENT-START block_node? DOCUMENT-END*
// implicit_document ::= block_node DOCUMENT-END*
// *************
// explicit_document ::= DIRECTIVE* DOCUMENT-START block_node? DOCUMENT-END*
func (parser *Parser) parseDocumentEnd(event *Event) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -412,54 +617,113 @@ func (parser *Parser) parseDocumentEnd(event *Event) error {
return nil
}
func (parser *Parser) setEventComments(event *Event) {
event.HeadComment = parser.HeadComment
event.LineComment = parser.LineComment
event.FootComment = parser.FootComment
parser.HeadComment = nil
parser.LineComment = nil
parser.FootComment = nil
parser.tail_comment = nil
parser.stem_comment = nil
// Parse directives.
func (parser *Parser) processDirectives(version_directive_ref **VersionDirective, tag_directives_ref *[]TagDirective) error {
var version_directive *VersionDirective
var tag_directives []TagDirective
var token *Token
if err := parser.peekToken(&token); err != nil {
return err
}
for token.Type == VERSION_DIRECTIVE_TOKEN || token.Type == TAG_DIRECTIVE_TOKEN {
switch token.Type {
case VERSION_DIRECTIVE_TOKEN:
if version_directive != nil {
return formatParserError(
"found duplicate %YAML directive", token.StartMark)
}
if token.major != 1 || token.minor != 1 {
return formatParserError(
"found incompatible YAML document", token.StartMark)
}
version_directive = &VersionDirective{
major: token.major,
minor: token.minor,
}
case TAG_DIRECTIVE_TOKEN:
value := TagDirective{
handle: token.Value,
prefix: token.prefix,
}
if err := parser.appendTagDirective(value, false, token.StartMark); err != nil {
return err
}
tag_directives = append(tag_directives, value)
}
parser.skipToken()
if err := parser.peekToken(&token); err != nil {
return err
}
}
for i := range default_tag_directives {
if err := parser.appendTagDirective(default_tag_directives[i], true, token.StartMark); err != nil {
return err
}
}
if version_directive_ref != nil {
*version_directive_ref = version_directive
}
if tag_directives_ref != nil {
*tag_directives_ref = tag_directives
}
return nil
}
// Append a tag directive to the directives stack.
func (parser *Parser) appendTagDirective(value TagDirective, allow_duplicates bool, mark Mark) error {
for i := range parser.tag_directives {
if bytes.Equal(value.handle, parser.tag_directives[i].handle) {
if allow_duplicates {
return nil
}
return formatParserError("found duplicate %TAG directive", mark)
}
}
// [Go] I suspect the copy is unnecessary. This was likely done
// because there was no way to track ownership of the data.
value_copy := TagDirective{
handle: make([]byte, len(value.handle)),
prefix: make([]byte, len(value.prefix)),
}
copy(value_copy.handle, value.handle)
copy(value_copy.prefix, value.prefix)
parser.tag_directives = append(parser.tag_directives, value_copy)
return nil
}
// Parse the productions:
// block_node_or_indentless_sequence ::=
//
// ALIAS
// *****
// | properties (block_content | indentless_block_sequence)?
// ********** *
// | block_content | indentless_block_sequence
// *
//
// block_node ::= ALIAS
//
// *****
// | properties block_content?
// ********** *
// | block_content
// *
//
// flow_node ::= ALIAS
//
// *****
// | properties flow_content?
// ********** *
// | flow_content
// *
//
// properties ::= TAG ANCHOR? | ANCHOR TAG?
//
// *************************
//
// block_content ::= block_collection | flow_collection | SCALAR
//
// ******
//
// flow_content ::= flow_collection | SCALAR
//
// ******
// block_node_or_indentless_sequence ::=
// ALIAS
// *****
// | properties (block_content | indentless_block_sequence)?
// ********** *
// | block_content | indentless_block_sequence
// *
// block_node ::= ALIAS
// *****
// | properties block_content?
// ********** *
// | block_content
// *
// flow_node ::= ALIAS
// *****
// | properties flow_content?
// ********** *
// | flow_content
// *
// properties ::= TAG ANCHOR? | ANCHOR TAG?
// *************************
// block_content ::= block_collection | flow_collection | SCALAR
// ******
// flow_content ::= flow_collection | SCALAR
// ******
func (parser *Parser) parseNode(event *Event, block, indentless_sequence bool) error {
// defer trace("yaml_parser_parse_node", "block:", block, "indentless_sequence:", indentless_sequence)()
@@ -683,9 +947,9 @@ func (parser *Parser) parseNode(event *Event, block, indentless_sequence bool) e
}
// Parse the productions:
// block_sequence ::= BLOCK-SEQUENCE-START (BLOCK-ENTRY block_node?)* BLOCK-END
//
// ******************** *********** * *********
// block_sequence ::= BLOCK-SEQUENCE-START (BLOCK-ENTRY block_node?)* BLOCK-END
// ******************** *********** * *********
func (parser *Parser) parseBlockSequenceEntry(event *Event, first bool) error {
if first {
var token *Token
@@ -742,9 +1006,9 @@ func (parser *Parser) parseBlockSequenceEntry(event *Event, first bool) error {
}
// Parse the productions:
// indentless_sequence ::= (BLOCK-ENTRY block_node?)+
//
// *********** *
// indentless_sequence ::= (BLOCK-ENTRY block_node?)+
// *********** *
func (parser *Parser) parseIndentlessSequenceEntry(event *Event) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -784,10 +1048,11 @@ func (parser *Parser) parseIndentlessSequenceEntry(event *Event) error {
// Split stem comment from head comment.
//
// When a sequence or map is found under a sequence entry, the former head comment
// is assigned to the underlying sequence or map as a whole, not the individual
// sequence or map entry as would be expected otherwise. To handle this case the
// previous head comment is moved aside as the stem comment.
// When a sequence or map is found under a sequence entry, the former head
// comment is assigned to the underlying sequence or map as a whole, not the
// individual sequence or map entry as would be expected otherwise.
// To handle this case the previous head comment is moved aside as the stem
// comment.
func (parser *Parser) splitStemComment(stem_len int) error {
if stem_len == 0 {
return nil
@@ -813,15 +1078,15 @@ func (parser *Parser) splitStemComment(stem_len int) error {
}
// Parse the productions:
// block_mapping ::= BLOCK-MAPPING_START
//
// *******************
// ((KEY block_node_or_indentless_sequence?)?
// *** *
// (VALUE block_node_or_indentless_sequence?)?)*
// block_mapping ::= BLOCK-MAPPING_START
// *******************
// ((KEY block_node_or_indentless_sequence?)?
// *** *
// (VALUE block_node_or_indentless_sequence?)?)*
//
// BLOCK-END
// *********
// BLOCK-END
// *********
func (parser *Parser) parseBlockMappingKey(event *Event, first bool) error {
if first {
var token *Token
@@ -837,8 +1102,9 @@ func (parser *Parser) parseBlockMappingKey(event *Event, first bool) error {
return err
}
// [Go] A tail comment was left from the prior mapping value processed. Emit an event
// as it needs to be processed with that value and not the following key.
// [Go] A tail comment was left from the prior mapping value processed.
// Emit an event as it needs to be processed with that value and not
// the following key.
if len(parser.tail_comment) > 0 {
*event = Event{
Type: TAIL_COMMENT_EVENT,
@@ -888,13 +1154,14 @@ func (parser *Parser) parseBlockMappingKey(event *Event, first bool) error {
}
// Parse the productions:
// block_mapping ::= BLOCK-MAPPING_START
//
// ((KEY block_node_or_indentless_sequence?)?
// block_mapping ::= BLOCK-MAPPING_START
//
// (VALUE block_node_or_indentless_sequence?)?)*
// ***** *
// BLOCK-END
// ((KEY block_node_or_indentless_sequence?)?
//
// (VALUE block_node_or_indentless_sequence?)?)*
// ***** *
// BLOCK-END
func (parser *Parser) parseBlockMappingValue(event *Event) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -920,19 +1187,17 @@ func (parser *Parser) parseBlockMappingValue(event *Event) error {
}
// Parse the productions:
// flow_sequence ::= FLOW-SEQUENCE-START
//
// *******************
// (flow_sequence_entry FLOW-ENTRY)*
// * **********
// flow_sequence_entry?
// *
// FLOW-SEQUENCE-END
// *****************
//
// flow_sequence_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
//
// *
// flow_sequence ::= FLOW-SEQUENCE-START
// *******************
// (flow_sequence_entry FLOW-ENTRY)*
// * **********
// flow_sequence_entry?
// *
// FLOW-SEQUENCE-END
// *****************
// flow_sequence_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
// *
func (parser *Parser) parseFlowSequenceEntry(event *Event, first bool) error {
if first {
var token *Token
@@ -995,9 +1260,9 @@ func (parser *Parser) parseFlowSequenceEntry(event *Event, first bool) error {
}
// Parse the productions:
// flow_sequence_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
//
// *** *
// flow_sequence_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
// *** *
func (parser *Parser) parseFlowSequenceEntryMappingKey(event *Event) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -1016,9 +1281,9 @@ func (parser *Parser) parseFlowSequenceEntryMappingKey(event *Event) error {
}
// Parse the productions:
// flow_sequence_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
//
// ***** *
// flow_sequence_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
// ***** *
func (parser *Parser) parseFlowSequenceEntryMappingValue(event *Event) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -1040,9 +1305,9 @@ func (parser *Parser) parseFlowSequenceEntryMappingValue(event *Event) error {
}
// Parse the productions:
// flow_sequence_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
//
// *
// flow_sequence_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
// *
func (parser *Parser) parseFlowSequenceEntryMappingEnd(event *Event) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -1058,18 +1323,17 @@ func (parser *Parser) parseFlowSequenceEntryMappingEnd(event *Event) error {
}
// Parse the productions:
// flow_mapping ::= FLOW-MAPPING-START
//
// ******************
// (flow_mapping_entry FLOW-ENTRY)*
// * **********
// flow_mapping_entry?
// ******************
// FLOW-MAPPING-END
// ****************
//
// flow_mapping_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
// - *** *
// flow_mapping ::= FLOW-MAPPING-START
// ******************
// (flow_mapping_entry FLOW-ENTRY)*
// * **********
// flow_mapping_entry?
// ******************
// FLOW-MAPPING-END
// ****************
// flow_mapping_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
// * *** *
func (parser *Parser) parseFlowMappingKey(event *Event, first bool) error {
if first {
var token *Token
@@ -1135,8 +1399,9 @@ func (parser *Parser) parseFlowMappingKey(event *Event, first bool) error {
}
// Parse the productions:
// flow_mapping_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
// - ***** *
//
// flow_mapping_entry ::= flow_node | KEY flow_node? (VALUE flow_node?)?
// * ***** *
func (parser *Parser) parseFlowMappingValue(event *Event, empty bool) error {
var token *Token
if err := parser.peekToken(&token); err != nil {
@@ -1160,6 +1425,96 @@ func (parser *Parser) parseFlowMappingValue(event *Event, empty bool) error {
return parser.processEmptyScalar(event, token.StartMark)
}
// Peek the next token in the token queue.
func (parser *Parser) peekToken(out **Token) error {
if !parser.token_available {
if err := parser.fetchMoreTokens(); err != nil {
return err
}
}
token := &parser.tokens[parser.tokens_head]
parser.UnfoldComments(token)
*out = token
return nil
}
// UnfoldComments walks through the comments queue and joins all
// comments behind the position of the provided token into the respective
// top-level comment slices in the parser.
func (parser *Parser) UnfoldComments(token *Token) {
for parser.comments_head < len(parser.comments) && token.StartMark.Index >= parser.comments[parser.comments_head].TokenMark.Index {
comment := &parser.comments[parser.comments_head]
if len(comment.Head) > 0 {
if token.Type == BLOCK_END_TOKEN {
// No heads on ends, so keep comment.Head for a follow up token.
break
}
if len(parser.HeadComment) > 0 {
parser.HeadComment = append(parser.HeadComment, '\n')
}
parser.HeadComment = append(parser.HeadComment, comment.Head...)
}
if len(comment.Foot) > 0 {
if len(parser.FootComment) > 0 {
parser.FootComment = append(parser.FootComment, '\n')
}
parser.FootComment = append(parser.FootComment, comment.Foot...)
}
if len(comment.Line) > 0 {
if len(parser.LineComment) > 0 {
parser.LineComment = append(parser.LineComment, '\n')
}
parser.LineComment = append(parser.LineComment, comment.Line...)
}
*comment = Comment{}
parser.comments_head++
}
}
// Remove the next token from the queue (must be called after peekToken).
func (parser *Parser) skipToken() {
parser.token_available = false
parser.tokens_parsed++
parser.stream_end_produced = parser.tokens[parser.tokens_head].Type == STREAM_END_TOKEN
parser.tokens_head++
}
// formatParserError creates a LoadError with the given problem message
// and mark position.
func formatParserError(problem string, problemMark Mark) *LoadError {
return &LoadError{
Stage: ParserStage,
Mark: problemMark,
Message: problem,
}
}
// formatParserErrorContext creates a LoadError with both context and
// problem information, each with their own mark positions.
func formatParserErrorContext(context string, contextMark Mark, problem string, problemMark Mark) *LoadError {
return &LoadError{
Stage: ParserStage,
ContextMark: contextMark,
ContextMsg: context,
Mark: problemMark,
Message: problem,
}
}
// setEventComments transfers accumulated comments from the parser to the
// event and clears the parser's comment state.
func (parser *Parser) setEventComments(event *Event) {
event.HeadComment = parser.HeadComment
event.LineComment = parser.LineComment
event.FootComment = parser.FootComment
parser.HeadComment = nil
parser.LineComment = nil
parser.FootComment = nil
parser.tail_comment = nil
parser.stem_comment = nil
}
// Generate an empty scalar event.
func (parser *Parser) processEmptyScalar(event *Event, mark Mark) error {
*event = Event{
@@ -1173,94 +1528,9 @@ func (parser *Parser) processEmptyScalar(event *Event, mark Mark) error {
return nil
}
var default_tag_directives = []TagDirective{
{[]byte("!"), []byte("!")},
{[]byte("!!"), []byte("tag:yaml.org,2002:")},
}
// Parse directives.
func (parser *Parser) processDirectives(version_directive_ref **VersionDirective, tag_directives_ref *[]TagDirective) error {
var version_directive *VersionDirective
var tag_directives []TagDirective
var token *Token
if err := parser.peekToken(&token); err != nil {
return err
}
for token.Type == VERSION_DIRECTIVE_TOKEN || token.Type == TAG_DIRECTIVE_TOKEN {
switch token.Type {
case VERSION_DIRECTIVE_TOKEN:
if version_directive != nil {
return formatParserError(
"found duplicate %YAML directive", token.StartMark)
}
if token.major != 1 || token.minor != 1 {
return formatParserError(
"found incompatible YAML document", token.StartMark)
}
version_directive = &VersionDirective{
major: token.major,
minor: token.minor,
}
case TAG_DIRECTIVE_TOKEN:
value := TagDirective{
handle: token.Value,
prefix: token.prefix,
}
if err := parser.appendTagDirective(value, false, token.StartMark); err != nil {
return err
}
tag_directives = append(tag_directives, value)
}
parser.skipToken()
if err := parser.peekToken(&token); err != nil {
return err
}
}
for i := range default_tag_directives {
if err := parser.appendTagDirective(default_tag_directives[i], true, token.StartMark); err != nil {
return err
}
}
if version_directive_ref != nil {
*version_directive_ref = version_directive
}
if tag_directives_ref != nil {
*tag_directives_ref = tag_directives
}
return nil
}
// Append a tag directive to the directives stack.
func (parser *Parser) appendTagDirective(value TagDirective, allow_duplicates bool, mark Mark) error {
for i := range parser.tag_directives {
if bytes.Equal(value.handle, parser.tag_directives[i].handle) {
if allow_duplicates {
return nil
}
return formatParserError("found duplicate %TAG directive", mark)
}
}
// [Go] I suspect the copy is unnecessary. This was likely done
// because there was no way to track ownership of the data.
value_copy := TagDirective{
handle: make([]byte, len(value.handle)),
prefix: make([]byte, len(value.prefix)),
}
copy(value_copy.handle, value.handle)
copy(value_copy.prefix, value.prefix)
parser.tag_directives = append(parser.tag_directives, value_copy)
return nil
}
// ParserGetEvents parses the YAML input and returns the generated event stream.
func ParserGetEvents(in []byte) (string, error) {
p := NewComposer(in)
p := NewComposer(in, nil)
defer p.Destroy()
var events strings.Builder
var event Event
@@ -1280,6 +1550,8 @@ func ParserGetEvents(in []byte) (string, error) {
return events.String(), nil
}
// formatEvent formats an event as a human-readable string for debugging
// and testing purposes.
func formatEvent(e *Event) string {
var b strings.Builder
switch e.Type {
+107 -102
@@ -15,89 +15,6 @@ import (
"io"
)
func formatReaderError(problem string, offset int, value int) error {
return ReaderError{
Offset: offset,
Value: value,
Err: errors.New(problem),
}
}
// Byte order marks.
const (
bom_UTF8 = "\xef\xbb\xbf"
bom_UTF16LE = "\xff\xfe"
bom_UTF16BE = "\xfe\xff"
)
// Determine the input stream encoding by checking the BOM symbol. If no BOM is
// found, the UTF-8 encoding is assumed. Return 1 on success, 0 on failure.
func (parser *Parser) determineEncoding() error {
// Ensure that we had enough bytes in the raw buffer.
for !parser.eof && len(parser.raw_buffer)-parser.raw_buffer_pos < 3 {
if err := parser.updateRawBuffer(); err != nil {
return err
}
}
// Determine the encoding.
buf := parser.raw_buffer
pos := parser.raw_buffer_pos
avail := len(buf) - pos
if avail >= 2 && buf[pos] == bom_UTF16LE[0] && buf[pos+1] == bom_UTF16LE[1] {
parser.encoding = UTF16LE_ENCODING
parser.raw_buffer_pos += 2
parser.offset += 2
} else if avail >= 2 && buf[pos] == bom_UTF16BE[0] && buf[pos+1] == bom_UTF16BE[1] {
parser.encoding = UTF16BE_ENCODING
parser.raw_buffer_pos += 2
parser.offset += 2
} else if avail >= 3 && buf[pos] == bom_UTF8[0] && buf[pos+1] == bom_UTF8[1] && buf[pos+2] == bom_UTF8[2] {
parser.encoding = UTF8_ENCODING
parser.raw_buffer_pos += 3
parser.offset += 3
} else {
parser.encoding = UTF8_ENCODING
}
return nil
}
// Update the raw buffer.
func (parser *Parser) updateRawBuffer() error {
size_read := 0
// Return if the raw buffer is full.
if parser.raw_buffer_pos == 0 && len(parser.raw_buffer) == cap(parser.raw_buffer) {
return nil
}
// Return on EOF.
if parser.eof {
return nil
}
// Move the remaining bytes in the raw buffer to the beginning.
if parser.raw_buffer_pos > 0 && parser.raw_buffer_pos < len(parser.raw_buffer) {
copy(parser.raw_buffer, parser.raw_buffer[parser.raw_buffer_pos:])
}
parser.raw_buffer = parser.raw_buffer[:len(parser.raw_buffer)-parser.raw_buffer_pos]
parser.raw_buffer_pos = 0
// Call the read handler to fill the buffer.
size_read, err := parser.read_handler(parser, parser.raw_buffer[len(parser.raw_buffer):cap(parser.raw_buffer)])
parser.raw_buffer = parser.raw_buffer[:len(parser.raw_buffer)+size_read]
if err == io.EOF {
parser.eof = true
} else if err != nil {
return ReaderError{
Offset: parser.offset,
Value: -1,
Err: fmt.Errorf("input error: %w", err),
}
}
return nil
}
// Ensure that the buffer contains at least `length` characters.
// Returns nil on success, or an error on failure.
//
@@ -107,9 +24,10 @@ func (parser *Parser) updateBuffer(length int) error {
panic("read handler must be set")
}
// [Go] This function was changed to guarantee the requested length size at EOF.
// The fact we need to do this is pretty awful, but the description above implies
// for that to be the case, and there are tests
// [Go] This function was changed to guarantee the requested length
// size at EOF.
// The fact we need to do this is pretty awful, but the description
// above implies for that to be the case, and there are tests
// If the EOF flag is set and the raw buffer is empty, do nothing.
//
@@ -205,8 +123,8 @@ func (parser *Parser) updateBuffer(length int) error {
default:
// The leading octet is invalid.
return formatReaderError(
"invalid leading UTF-8 octet",
parser.offset, int(octet))
fmt.Sprintf("invalid leading UTF-8 octet (value: %d)", octet),
Mark{Index: parser.offset})
}
// Check if the raw buffer contains an incomplete character.
@@ -214,7 +132,7 @@ func (parser *Parser) updateBuffer(length int) error {
if parser.eof {
return formatReaderError(
"incomplete UTF-8 octet sequence",
parser.offset, -1)
Mark{Index: parser.offset})
}
break inner
}
@@ -240,8 +158,8 @@ func (parser *Parser) updateBuffer(length int) error {
// Check if the octet is valid.
if (octet & 0xC0) != 0x80 {
return formatReaderError(
"invalid trailing UTF-8 octet",
parser.offset+k, int(octet))
fmt.Sprintf("invalid trailing UTF-8 octet (value: %d)", octet),
Mark{Index: parser.offset + k})
}
// Decode the octet.
@@ -257,14 +175,14 @@ func (parser *Parser) updateBuffer(length int) error {
default:
return formatReaderError(
"invalid length of a UTF-8 sequence",
parser.offset, -1)
Mark{Index: parser.offset})
}
// Check the range of the value.
if value >= 0xD800 && value <= 0xDFFF || value > 0x10FFFF {
return formatReaderError(
"invalid Unicode character",
parser.offset, int(value))
fmt.Sprintf("invalid Unicode character (value: %d)", value),
Mark{Index: parser.offset})
}
case UTF16LE_ENCODING, UTF16BE_ENCODING:
@@ -304,7 +222,7 @@ func (parser *Parser) updateBuffer(length int) error {
if parser.eof {
return formatReaderError(
"incomplete UTF-16 character",
parser.offset, -1)
Mark{Index: parser.offset})
}
break inner
}
@@ -316,8 +234,8 @@ func (parser *Parser) updateBuffer(length int) error {
// Check for unexpected low surrogate area.
if value&0xFC00 == 0xDC00 {
return formatReaderError(
"unexpected low surrogate area",
parser.offset, int(value))
fmt.Sprintf("unexpected low surrogate area (value: %d)", value),
Mark{Index: parser.offset})
}
// Check for a high surrogate area.
@@ -329,7 +247,7 @@ func (parser *Parser) updateBuffer(length int) error {
if parser.eof {
return formatReaderError(
"incomplete UTF-16 surrogate pair",
parser.offset, -1)
Mark{Index: parser.offset})
}
break inner
}
@@ -341,8 +259,8 @@ func (parser *Parser) updateBuffer(length int) error {
// Check for a low surrogate area.
if value2&0xFC00 != 0xDC00 {
return formatReaderError(
"expected low surrogate area",
parser.offset+2, int(value2))
fmt.Sprintf("expected low surrogate area (value: %d)", value2),
Mark{Index: parser.offset + 2})
}
// Generate the value of the surrogate pair.
@@ -383,8 +301,8 @@ func (parser *Parser) updateBuffer(length int) error {
case value >= 0x10000 && value <= 0x10FFFF:
default:
return formatReaderError(
"control characters are not allowed",
parser.offset, int(value))
fmt.Sprintf("control characters are not allowed (value: %d)", value),
Mark{Index: parser.offset})
}
// Move the raw pointers.
@@ -439,3 +357,90 @@ func (parser *Parser) updateBuffer(length int) error {
parser.buffer = parser.buffer[:buffer_len]
return nil
}
// Byte order marks for UTF-8, UTF-16LE, and UTF-16BE encodings.
const (
bom_UTF8 = "\xef\xbb\xbf"
bom_UTF16LE = "\xff\xfe"
bom_UTF16BE = "\xfe\xff"
)
// Determine the input stream encoding by checking the BOM symbol.
// If no BOM is found, the UTF-8 encoding is assumed.
// Returns nil on success, or an error if reading the input fails.
func (parser *Parser) determineEncoding() error {
// Ensure that we had enough bytes in the raw buffer.
for !parser.eof && len(parser.raw_buffer)-parser.raw_buffer_pos < 3 {
if err := parser.updateRawBuffer(); err != nil {
return err
}
}
// Determine the encoding.
buf := parser.raw_buffer
pos := parser.raw_buffer_pos
avail := len(buf) - pos
if avail >= 2 && buf[pos] == bom_UTF16LE[0] && buf[pos+1] == bom_UTF16LE[1] {
parser.encoding = UTF16LE_ENCODING
parser.raw_buffer_pos += 2
parser.offset += 2
} else if avail >= 2 && buf[pos] == bom_UTF16BE[0] && buf[pos+1] == bom_UTF16BE[1] {
parser.encoding = UTF16BE_ENCODING
parser.raw_buffer_pos += 2
parser.offset += 2
} else if avail >= 3 && buf[pos] == bom_UTF8[0] && buf[pos+1] == bom_UTF8[1] && buf[pos+2] == bom_UTF8[2] {
parser.encoding = UTF8_ENCODING
parser.raw_buffer_pos += 3
parser.offset += 3
} else {
parser.encoding = UTF8_ENCODING
}
return nil
}
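
The BOM check in determineEncoding can be tried in isolation. The following is a minimal sketch, not the vendored code: `detectEncoding` and its string return values are hypothetical names introduced for illustration, and it works on a plain byte slice instead of the parser's raw buffer.

```go
package main

import (
	"bytes"
	"fmt"
)

// Byte order marks, mirroring the bom_* constants in the diff.
var (
	bomUTF8    = []byte{0xEF, 0xBB, 0xBF}
	bomUTF16LE = []byte{0xFF, 0xFE}
	bomUTF16BE = []byte{0xFE, 0xFF}
)

// detectEncoding is a hypothetical stand-alone helper. It returns the
// encoding name implied by a leading BOM and the number of BOM bytes to
// skip. Without a BOM it assumes UTF-8 and skips nothing.
func detectEncoding(buf []byte) (name string, skip int) {
	switch {
	case bytes.HasPrefix(buf, bomUTF16LE):
		return "UTF-16LE", len(bomUTF16LE)
	case bytes.HasPrefix(buf, bomUTF16BE):
		return "UTF-16BE", len(bomUTF16BE)
	case bytes.HasPrefix(buf, bomUTF8):
		return "UTF-8", len(bomUTF8)
	default:
		return "UTF-8", 0
	}
}

func main() {
	name, skip := detectEncoding([]byte("\xff\xfea\x00"))
	fmt.Println(name, skip)
}
```

The checks run in the same order as in the diff (UTF-16LE, UTF-16BE, then UTF-8), and the skip count corresponds to the amount by which the parser advances raw_buffer_pos and offset.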
// Update the raw buffer.
func (parser *Parser) updateRawBuffer() error {
size_read := 0
// Return if the raw buffer is full.
if parser.raw_buffer_pos == 0 && len(parser.raw_buffer) == cap(parser.raw_buffer) {
return nil
}
// Return on EOF.
if parser.eof {
return nil
}
// Move the remaining bytes in the raw buffer to the beginning.
if parser.raw_buffer_pos > 0 && parser.raw_buffer_pos < len(parser.raw_buffer) {
copy(parser.raw_buffer, parser.raw_buffer[parser.raw_buffer_pos:])
}
parser.raw_buffer = parser.raw_buffer[:len(parser.raw_buffer)-parser.raw_buffer_pos]
parser.raw_buffer_pos = 0
// Call the read handler to fill the buffer.
size_read, err := parser.read_handler(parser, parser.raw_buffer[len(parser.raw_buffer):cap(parser.raw_buffer)])
parser.raw_buffer = parser.raw_buffer[:len(parser.raw_buffer)+size_read]
if err == io.EOF {
parser.eof = true
} else if err != nil {
return &LoadError{
Stage: ReaderStage,
Message: fmt.Sprintf("input error: %v", err),
Mark: Mark{Index: parser.offset},
err: err,
}
}
return nil
}
// formatReaderError creates a LoadError for reader-stage errors.
func formatReaderError(message string, mark Mark) *LoadError {
return &LoadError{
Stage: ReaderStage,
Message: message,
Mark: mark,
err: errors.New(message),
}
}
+395 -384
@@ -3,14 +3,12 @@
// SPDX-License-Identifier: Apache-2.0
// Representer stage: Converts Go values to YAML nodes.
// Handles marshaling from Go types to the intermediate node representation.
// Handles representing from Go types to the intermediate node representation.
package libyaml
import (
"encoding"
"fmt"
"io"
"reflect"
"regexp"
"sort"
@@ -21,10 +19,399 @@ import (
"unicode/utf8"
)
// keyList is a sortable slice of reflect.Values used for sorting map keys
// in a natural order (numeric, then lexicographic).
type keyList []reflect.Value
func (l keyList) Len() int { return len(l) }
// Representer converts Go values to YAML node trees with configurable
// formatting options.
type Representer struct {
flow bool
Indent int
lineWidth int
explicitStart bool
explicitEnd bool
flowSimpleCollections bool
quotePreference QuoteStyle
}
// NewRepresenter creates a new YAML representer with the given options.
func NewRepresenter(opts *Options) *Representer {
return &Representer{
Indent: opts.Indent,
lineWidth: opts.LineWidth,
explicitStart: opts.ExplicitStart,
explicitEnd: opts.ExplicitEnd,
flowSimpleCollections: opts.FlowSimpleCollections,
quotePreference: opts.QuotePreference,
}
}
// Represent converts a Go value to a YAML node tree.
// This is the primary method for the Representer stage in the dump pipeline.
func (r *Representer) Represent(tag string, in reflect.Value) *Node {
var node *Node
if in.IsValid() {
node, _ = in.Interface().(*Node)
}
if node != nil && node.Kind == DocumentNode {
// Already a document node, return as-is
return node
} else {
// Wrap the represented value in a document node
contentNode := r.represent(tag, in)
return &Node{
Kind: DocumentNode,
Content: []*Node{contentNode},
}
}
}
// From http://yaml.org/type/float.html, except the regular expression there
// is bogus. In practice parsers do not enforce the "\.[0-9_]*" suffix.
var base60float = regexp.MustCompile(`^[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+(?:\.[0-9_]*)?$`)
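
To see which strings the base60float pattern accepts, it can be exercised directly. This is a small sketch that copies the expression from the line above; the sample inputs are chosen for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same expression as in the diff: an optional sign, a leading digit run,
// one or more ":NN" groups, and an optional fractional part.
var base60float = regexp.MustCompile(`^[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+(?:\.[0-9_]*)?$`)

func main() {
	samples := []string{"1:30", "190:20:30.15", "-3:05", "1:60", "12", "a:b"}
	for _, s := range samples {
		fmt.Printf("%-16q %v\n", s, base60float.MatchString(s))
	}
}
```

Strings such as "1:30" match and therefore get quoted by stringv for YAML 1.1 compatibility, while "12" (no colon group) and "1:60" (group out of range) do not match.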
// represent is the core conversion method that handles the actual
// type-specific conversion from Go values to YAML nodes.
func (r *Representer) represent(tag string, in reflect.Value) *Node {
tag = shortTag(tag)
if !in.IsValid() || in.Kind() == reflect.Pointer && in.IsNil() {
return r.nilv()
}
iface := in.Interface()
switch value := iface.(type) {
case *Node:
return r.nodev(in)
case Node:
if !in.CanAddr() {
n := reflect.New(in.Type()).Elem()
n.Set(in)
in = n
}
return r.nodev(in.Addr())
case time.Time:
return r.timev(tag, in)
case *time.Time:
return r.timev(tag, in.Elem())
case time.Duration:
return r.stringv(tag, reflect.ValueOf(value.String()))
case Marshaler:
v, err := value.MarshalYAML()
if err != nil {
failDump(RepresenterStage, err)
}
if v == nil {
return r.nilv()
}
return r.represent(tag, reflect.ValueOf(v))
case encoding.TextMarshaler:
text, err := value.MarshalText()
if err != nil {
failDump(RepresenterStage, err)
}
in = reflect.ValueOf(string(text))
case nil:
return r.nilv()
}
switch in.Kind() {
case reflect.Interface:
return r.represent(tag, in.Elem())
case reflect.Map:
return r.mapv(tag, in)
case reflect.Pointer:
return r.represent(tag, in.Elem())
case reflect.Struct:
return r.structv(tag, in)
case reflect.Slice, reflect.Array:
return r.slicev(tag, in)
case reflect.String:
return r.stringv(tag, in)
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return r.intv(tag, in)
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
return r.uintv(tag, in)
case reflect.Float32, reflect.Float64:
return r.floatv(tag, in)
case reflect.Bool:
return r.boolv(tag, in)
default:
failDumpf(RepresenterStage, "cannot represent type: %s", in.Type().String())
return nil // unreachable; failDumpf always panics
}
}
// mapv converts a Go map to a YAML mapping node with sorted keys.
func (r *Representer) mapv(tag string, in reflect.Value) *Node {
if tag == "" {
tag = mapTag
}
var style Style
if r.flow {
r.flow = false
style = FlowStyle
}
keys := keyList(in.MapKeys())
sort.Sort(keys)
content := make([]*Node, 0, len(keys)*2)
for _, k := range keys {
content = append(content, r.represent("", k))
content = append(content, r.represent("", in.MapIndex(k)))
}
return &Node{
Kind: MappingNode,
Tag: tag,
Content: content,
Style: style,
}
}
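
The shape of the node that mapv returns, with keys and values alternating in a flat Content slice, can be sketched without reflection. The `Node` type below is a reduced stand-in, `representStringMap` is a hypothetical helper, the tag strings are illustrative, and plain lexicographic sorting replaces the natural ordering that keyList.Less implements.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a reduced stand-in for the package's node type.
type Node struct {
	Kind    string
	Tag     string
	Value   string
	Content []*Node
}

// representStringMap is a hypothetical helper that builds a mapping node
// from a map[string]string. Keys are sorted so output is deterministic,
// and Content holds key, value, key, value, ... in that order.
func representStringMap(m map[string]string) *Node {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // simplification: the diff uses a natural sort

	content := make([]*Node, 0, len(keys)*2)
	for _, k := range keys {
		content = append(content,
			&Node{Kind: "scalar", Tag: "!!str", Value: k},
			&Node{Kind: "scalar", Tag: "!!str", Value: m[k]},
		)
	}
	return &Node{Kind: "mapping", Tag: "!!map", Content: content}
}

func main() {
	n := representStringMap(map[string]string{"b": "2", "a": "1"})
	for i := 0; i+1 < len(n.Content); i += 2 {
		fmt.Printf("%s: %s\n", n.Content[i].Value, n.Content[i+1].Value)
	}
}
```

Because Go map iteration order is randomized, sorting the keys first is what makes the resulting node tree, and any output emitted from it, stable across runs.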
// structv converts a Go struct to a YAML mapping node, handling field tags,
// omitempty, inline fields, and inline maps.
func (r *Representer) structv(tag string, in reflect.Value) *Node {
sinfo, err := getStructInfo(in.Type())
if err != nil {
failDump(RepresenterStage, err)
}
if tag == "" {
tag = mapTag
}
var style Style
if r.flow {
r.flow = false
style = FlowStyle
}
content := make([]*Node, 0)
for _, info := range sinfo.FieldsList {
var value reflect.Value
if info.Inline == nil {
value = in.Field(info.Num)
} else {
value = r.fieldByIndex(in, info.Inline)
if !value.IsValid() {
continue
}
}
if info.OmitEmpty && isZero(value) {
continue
}
content = append(content, r.represent("", reflect.ValueOf(info.Key)))
r.flow = info.Flow
content = append(content, r.represent("", value))
}
if sinfo.InlineMap >= 0 {
m := in.Field(sinfo.InlineMap)
if m.Len() > 0 {
r.flow = false
keys := keyList(m.MapKeys())
sort.Sort(keys)
for _, k := range keys {
if _, found := sinfo.FieldsMap[k.String()]; found {
failDumpf(RepresenterStage, "cannot have key %q in inlined map: conflicts with struct field", k.String())
}
content = append(content, r.represent("", k))
r.flow = false
content = append(content, r.represent("", m.MapIndex(k)))
}
}
}
return &Node{
Kind: MappingNode,
Tag: tag,
Content: content,
Style: style,
}
}
// slicev converts a Go slice or array to a YAML sequence node.
func (r *Representer) slicev(tag string, in reflect.Value) *Node {
if tag == "" {
tag = seqTag
}
var style Style
if r.flow {
r.flow = false
style = FlowStyle
}
n := in.Len()
content := make([]*Node, n)
for i := 0; i < n; i++ {
content[i] = r.represent("", in.Index(i))
}
return &Node{
Kind: SequenceNode,
Tag: tag,
Content: content,
Style: style,
}
}
// stringv converts a Go string to a YAML scalar node, handling quoting,
// binary data (base64 encoding), and special string values.
func (r *Representer) stringv(tag string, in reflect.Value) *Node {
var style Style
s := in.String()
needsQuoting := false
switch {
case !utf8.ValidString(s):
if tag == binaryTag {
failDumpf(RepresenterStage, "explicitly tagged !!binary data must be base64-encoded")
}
if tag != "" {
failDumpf(RepresenterStage, "cannot represent invalid UTF-8 data as %s", shortTag(tag))
}
// It can't be represented directly as YAML so use a binary tag
// and represent it as base64.
tag = binaryTag
s = encodeBase64(s)
case tag == "":
tag = strTag
// Check if this string needs quoting for compatibility
// even though it would resolve as !!str
needsQuoting = isBase60Float(s) || isOldBool(s) || looksLikeMerge(s)
}
// Set the style based on content
switch {
case strings.Contains(s, "\n"):
if r.flow || !shouldUseLiteralStyle(s) {
style = DoubleQuotedStyle
} else {
style = LiteralStyle
}
case needsQuoting:
// Force quoting for YAML 1.1 compatibility values
style = SingleQuotedStyle
default:
// Plain style by default - Desolver will add quotes if type mismatch
style = 0
}
return &Node{
Kind: ScalarNode,
Tag: tag,
Value: s,
Style: style,
}
}
// boolv converts a Go bool to a YAML scalar node.
func (r *Representer) boolv(tag string, in reflect.Value) *Node {
var s string
if in.Bool() {
s = "true"
} else {
s = "false"
}
if tag == "" {
tag = boolTag
}
return &Node{
Kind: ScalarNode,
Tag: tag,
Value: s,
}
}
// intv converts a Go signed integer to a YAML scalar node.
func (r *Representer) intv(tag string, in reflect.Value) *Node {
s := strconv.FormatInt(in.Int(), 10)
if tag == "" {
tag = intTag
}
return &Node{
Kind: ScalarNode,
Tag: tag,
Value: s,
}
}
// uintv converts a Go unsigned integer to a YAML scalar node.
func (r *Representer) uintv(tag string, in reflect.Value) *Node {
s := strconv.FormatUint(in.Uint(), 10)
if tag == "" {
tag = intTag
}
return &Node{
Kind: ScalarNode,
Tag: tag,
Value: s,
}
}
// timev converts a Go [time.Time] to a YAML scalar node in RFC3339Nano format.
func (r *Representer) timev(tag string, in reflect.Value) *Node {
t := in.Interface().(time.Time)
s := t.Format(time.RFC3339Nano)
if tag == "" {
tag = timestampTag
}
return &Node{
Kind: ScalarNode,
Tag: tag,
Value: s,
}
}
// floatv converts a Go float to a YAML scalar node, handling special values
// like infinity and NaN.
func (r *Representer) floatv(tag string, in reflect.Value) *Node {
// Issue #352: When formatting, use the precision of the underlying value
precision := 64
if in.Kind() == reflect.Float32 {
precision = 32
}
s := strconv.FormatFloat(in.Float(), 'g', -1, precision)
switch s {
case "+Inf":
s = ".inf"
case "-Inf":
s = "-.inf"
case "NaN":
s = ".nan"
}
if tag == "" {
tag = floatTag
}
return &Node{
Kind: ScalarNode,
Tag: tag,
Value: s,
}
}
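
The effect of the precision argument in floatv is easiest to see with a float32 value widened to float64. The sketch below isolates the formatting step; `formatFloat` is a hypothetical helper that takes the bit size directly instead of inspecting a reflect.Value.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// formatFloat is a hypothetical helper mirroring the formatting step of
// floatv: shortest representation for the given bit size, with Go's
// spellings of infinity and NaN mapped to the YAML ones.
func formatFloat(f float64, bits int) string {
	s := strconv.FormatFloat(f, 'g', -1, bits)
	switch s {
	case "+Inf":
		return ".inf"
	case "-Inf":
		return "-.inf"
	case "NaN":
		return ".nan"
	}
	return s
}

func main() {
	f := float64(float32(0.1)) // what reflect.Value.Float returns for a float32
	fmt.Println(formatFloat(f, 32)) // formatted as a float32
	fmt.Println(formatFloat(f, 64)) // formatted as a float64
	fmt.Println(formatFloat(math.Inf(1), 64), formatFloat(math.Inf(-1), 64), formatFloat(math.NaN(), 64))
}
```

Formatting with bit size 32 yields the short form a user would expect for a float32 field, whereas bit size 64 exposes the rounding error introduced by widening.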
// nilv creates a YAML null node.
func (r *Representer) nilv() *Node {
return &Node{
Kind: ScalarNode,
Tag: nullTag,
Value: "null",
}
}
// nodev returns a node value as-is without conversion.
func (r *Representer) nodev(in reflect.Value) *Node {
// Return the node as-is - no conversion needed
return in.Interface().(*Node)
}
// Len returns the number of keys in the list.
func (l keyList) Len() int { return len(l) }
// Swap exchanges the positions of two keys in the list.
func (l keyList) Swap(i, j int) { l[i], l[j] = l[j], l[i] }
// Less implements a natural sort order for map keys: numeric values sort
// numerically, strings sort with natural number ordering, and mixed types
// sort by kind.
func (l keyList) Less(i, j int) bool {
a := l[i]
b := l[j]
@@ -134,198 +521,8 @@ func numLess(a, b reflect.Value) bool {
panic("not a number")
}
// Sentinel values for newRepresenter parameters.
// These provide clarity at call sites, similar to http.NoBody.
var (
noWriter io.Writer = nil
noVersionDirective *VersionDirective = nil
noTagDirective []TagDirective = nil
)
type Representer struct {
Emitter Emitter
Out []byte
flow bool
Indent int
lineWidth int
doneInit bool
explicitStart bool
explicitEnd bool
flowSimpleCollections bool
quotePreference QuoteStyle
}
// NewRepresenter creates a new YAML representer with the given options.
//
// The writer parameter specifies the output destination for the representer.
// If writer is nil, the representer will write to an internal buffer.
func NewRepresenter(writer io.Writer, opts *Options) *Representer {
emitter := NewEmitter()
emitter.CompactSequenceIndent = opts.CompactSeqIndent
emitter.quotePreference = opts.QuotePreference
emitter.SetWidth(opts.LineWidth)
emitter.SetUnicode(opts.Unicode)
emitter.SetCanonical(opts.Canonical)
emitter.SetLineBreak(opts.LineBreak)
r := &Representer{
Emitter: emitter,
Indent: opts.Indent,
lineWidth: opts.LineWidth,
explicitStart: opts.ExplicitStart,
explicitEnd: opts.ExplicitEnd,
flowSimpleCollections: opts.FlowSimpleCollections,
quotePreference: opts.QuotePreference,
}
if writer != nil {
r.Emitter.SetOutputWriter(writer)
} else {
r.Emitter.SetOutputString(&r.Out)
}
return r
}
func (r *Representer) init() {
if r.doneInit {
return
}
if r.Indent == 0 {
r.Indent = 4
}
r.Emitter.BestIndent = r.Indent
r.emit(NewStreamStartEvent(UTF8_ENCODING))
r.doneInit = true
}
func (r *Representer) Finish() {
r.Emitter.OpenEnded = false
r.emit(NewStreamEndEvent())
}
func (r *Representer) Destroy() {
r.Emitter.Delete()
}
func (r *Representer) emit(event Event) {
// This will internally delete the event value.
r.must(r.Emitter.Emit(&event))
}
func (r *Representer) must(err error) {
if err != nil {
msg := err.Error()
if msg == "" {
msg = "unknown problem generating YAML content"
}
failf("%s", msg)
}
}
func (r *Representer) MarshalDoc(tag string, in reflect.Value) {
r.init()
var node *Node
if in.IsValid() {
node, _ = in.Interface().(*Node)
}
if node != nil && node.Kind == DocumentNode {
r.nodev(in)
} else {
// Use !explicitStart for implicit flag (true = implicit/no marker)
r.emit(NewDocumentStartEvent(noVersionDirective, noTagDirective, !r.explicitStart))
r.marshal(tag, in)
// Use !explicitEnd for implicit flag
r.emit(NewDocumentEndEvent(!r.explicitEnd))
}
}
func (r *Representer) marshal(tag string, in reflect.Value) {
tag = shortTag(tag)
if !in.IsValid() || in.Kind() == reflect.Pointer && in.IsNil() {
r.nilv()
return
}
iface := in.Interface()
switch value := iface.(type) {
case *Node:
r.nodev(in)
return
case Node:
if !in.CanAddr() {
n := reflect.New(in.Type()).Elem()
n.Set(in)
in = n
}
r.nodev(in.Addr())
return
case time.Time:
r.timev(tag, in)
return
case *time.Time:
r.timev(tag, in.Elem())
return
case time.Duration:
r.stringv(tag, reflect.ValueOf(value.String()))
return
case Marshaler:
v, err := value.MarshalYAML()
if err != nil {
Fail(err)
}
if v == nil {
r.nilv()
return
}
r.marshal(tag, reflect.ValueOf(v))
return
case encoding.TextMarshaler:
text, err := value.MarshalText()
if err != nil {
Fail(err)
}
in = reflect.ValueOf(string(text))
case nil:
r.nilv()
return
}
switch in.Kind() {
case reflect.Interface:
r.marshal(tag, in.Elem())
case reflect.Map:
r.mapv(tag, in)
case reflect.Pointer:
r.marshal(tag, in.Elem())
case reflect.Struct:
r.structv(tag, in)
case reflect.Slice, reflect.Array:
r.slicev(tag, in)
case reflect.String:
r.stringv(tag, in)
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
r.intv(tag, in)
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
r.uintv(tag, in)
case reflect.Float32, reflect.Float64:
r.floatv(tag, in)
case reflect.Bool:
r.boolv(tag, in)
default:
panic("cannot marshal type: " + in.Type().String())
}
}
func (r *Representer) mapv(tag string, in reflect.Value) {
r.mappingv(tag, func() {
keys := keyList(in.MapKeys())
sort.Sort(keys)
for _, k := range keys {
r.marshal("", k)
r.marshal("", in.MapIndex(k))
}
})
}
// fieldByIndex navigates through struct fields using the given index path,
// dereferencing pointers as needed.
func (r *Representer) fieldByIndex(v reflect.Value, index []int) (field reflect.Value) {
for _, num := range index {
for {
@@ -343,79 +540,10 @@ func (r *Representer) fieldByIndex(v reflect.Value, index []int) (field reflect.
return v
}
func (r *Representer) structv(tag string, in reflect.Value) {
sinfo, err := getStructInfo(in.Type())
if err != nil {
panic(err)
}
r.mappingv(tag, func() {
for _, info := range sinfo.FieldsList {
var value reflect.Value
if info.Inline == nil {
value = in.Field(info.Num)
} else {
value = r.fieldByIndex(in, info.Inline)
if !value.IsValid() {
continue
}
}
if info.OmitEmpty && isZero(value) {
continue
}
r.marshal("", reflect.ValueOf(info.Key))
r.flow = info.Flow
r.marshal("", value)
}
if sinfo.InlineMap >= 0 {
m := in.Field(sinfo.InlineMap)
if m.Len() > 0 {
r.flow = false
keys := keyList(m.MapKeys())
sort.Sort(keys)
for _, k := range keys {
if _, found := sinfo.FieldsMap[k.String()]; found {
panic(fmt.Sprintf("cannot have key %q in inlined map: conflicts with struct field", k.String()))
}
r.marshal("", k)
r.flow = false
r.marshal("", m.MapIndex(k))
}
}
}
})
}
func (r *Representer) mappingv(tag string, f func()) {
implicit := tag == ""
style := BLOCK_MAPPING_STYLE
if r.flow {
r.flow = false
style = FLOW_MAPPING_STYLE
}
r.emit(NewMappingStartEvent(nil, []byte(tag), implicit, style))
f()
r.emit(NewMappingEndEvent())
}
func (r *Representer) slicev(tag string, in reflect.Value) {
implicit := tag == ""
style := BLOCK_SEQUENCE_STYLE
if r.flow {
r.flow = false
style = FLOW_SEQUENCE_STYLE
}
r.emit(NewSequenceStartEvent(nil, []byte(tag), implicit, style))
n := in.Len()
for i := 0; i < n; i++ {
r.marshal("", in.Index(i))
}
r.emit(NewSequenceEndEvent())
}
// isBase60Float returns whether s is in base 60 notation as defined in YAML 1.1.
//
// The base 60 float notation in YAML 1.1 is a terrible idea and is unsupported
// in YAML 1.2 and by this package, but these should be marshaled quoted for
// in YAML 1.2 and by this package, but these should be represented quoted for
// the time being for compatibility with other parsers.
func isBase60Float(s string) (result bool) {
// Fast path.
@@ -430,14 +558,10 @@ func isBase60Float(s string) (result bool) {
return base60float.MatchString(s)
}
// From http://yaml.org/type/float.html, except the regular expression there
// is bogus. In practice parsers do not enforce the "\.[0-9_]*" suffix.
var base60float = regexp.MustCompile(`^[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+(?:\.[0-9_]*)?$`)
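The pattern above can be exercised on its own. The following is a minimal sketch, not the package's code: `needsQuoting` is a hypothetical helper name, and the fast path of `isBase60Float` (elided from this diff) is left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// base60Float copies the pattern shown in the diff for YAML 1.1
// sexagesimal (base 60) numbers.
var base60Float = regexp.MustCompile(`^[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+(?:\.[0-9_]*)?$`)

// needsQuoting is a hypothetical helper: it reports whether a plain string
// would read as a base 60 number under YAML 1.1 and so should be quoted.
func needsQuoting(s string) bool {
	return base60Float.MatchString(s)
}

func main() {
	for _, s := range []string{"1:30", "190:20:30.15", "12:60", "host:8080", "1.5"} {
		fmt.Printf("%-16q %v\n", s, needsQuoting(s))
	}
}
```

Strings such as `1:30` match and would be quoted on output, while `12:60` does not match because each group after a colon is limited to 0 through 59.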
// isOldBool returns whether s is bool notation as defined in YAML 1.1.
//
// We continue to force strings that YAML 1.1 would interpret as booleans to be
// rendered as quoted strings so that the marshaled output is valid for YAML 1.1
// rendered as quoted strings so that the represented output is valid for YAML 1.1
// parsing.
func isOldBool(s string) (result bool) {
switch s {
@@ -456,116 +580,3 @@ func isOldBool(s string) (result bool) {
func looksLikeMerge(s string) (result bool) {
return s == "<<"
}
func (r *Representer) stringv(tag string, in reflect.Value) {
var style ScalarStyle
s := in.String()
canUsePlain := true
switch {
case !utf8.ValidString(s):
if tag == binaryTag {
failf("explicitly tagged !!binary data must be base64-encoded")
}
if tag != "" {
failf("cannot marshal invalid UTF-8 data as %s", shortTag(tag))
}
// It can't be represented directly as YAML so use a binary tag
// and represent it as base64.
tag = binaryTag
s = encodeBase64(s)
case tag == "":
// Check to see if it would resolve to a specific
// tag when represented unquoted. If it doesn't,
// there's no need to quote it.
rtag, _ := resolve("", s)
canUsePlain = rtag == strTag &&
!(isBase60Float(s) ||
isOldBool(s) ||
looksLikeMerge(s))
}
// Note: it's possible for user code to emit invalid YAML
// if they explicitly specify a tag and a string containing
// text that's incompatible with that tag.
switch {
case strings.Contains(s, "\n"):
if r.flow || !shouldUseLiteralStyle(s) {
style = DOUBLE_QUOTED_SCALAR_STYLE
} else {
style = LITERAL_SCALAR_STYLE
}
case canUsePlain:
style = PLAIN_SCALAR_STYLE
default:
style = r.quotePreference.ScalarStyle()
}
r.emitScalar(s, "", tag, style, nil, nil, nil, nil)
}
func (r *Representer) boolv(tag string, in reflect.Value) {
var s string
if in.Bool() {
s = "true"
} else {
s = "false"
}
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) intv(tag string, in reflect.Value) {
s := strconv.FormatInt(in.Int(), 10)
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) uintv(tag string, in reflect.Value) {
s := strconv.FormatUint(in.Uint(), 10)
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) timev(tag string, in reflect.Value) {
t := in.Interface().(time.Time)
s := t.Format(time.RFC3339Nano)
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) floatv(tag string, in reflect.Value) {
// Issue #352: When formatting, use the precision of the underlying value
precision := 64
if in.Kind() == reflect.Float32 {
precision = 32
}
s := strconv.FormatFloat(in.Float(), 'g', -1, precision)
switch s {
case "+Inf":
s = ".inf"
case "-Inf":
s = "-.inf"
case "NaN":
s = ".nan"
}
r.emitScalar(s, "", tag, PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
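The effect of choosing the bit size from the underlying kind can be seen in isolation. This is a sketch under stated assumptions: `formatYAMLFloat` is a hypothetical standalone function, not part of the package, and it only mirrors the formatting and special-value mapping shown in `floatv`.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// formatYAMLFloat is a hypothetical helper that mirrors floatv: it formats
// with the shortest representation for the given bit size and maps Go's
// special values to their YAML spellings.
func formatYAMLFloat(f float64, bitSize int) string {
	s := strconv.FormatFloat(f, 'g', -1, bitSize)
	switch s {
	case "+Inf":
		return ".inf"
	case "-Inf":
		return "-.inf"
	case "NaN":
		return ".nan"
	}
	return s
}

func main() {
	f32 := float32(0.1)
	// With bit size 32 the shortest round-trip form of the float32 is used.
	fmt.Println(formatYAMLFloat(float64(f32), 32))
	// With bit size 64 the widened value needs more digits to round-trip.
	fmt.Println(formatYAMLFloat(float64(f32), 64))
	fmt.Println(formatYAMLFloat(math.Inf(1), 64))
	fmt.Println(formatYAMLFloat(math.Inf(-1), 64))
	fmt.Println(formatYAMLFloat(math.NaN(), 64))
}
```

Formatting a `float32` value with bit size 64 would expose the widening error in the output, which is presumably what the precision switch avoids.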
func (r *Representer) nilv() {
r.emitScalar("null", "", "", PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
}
func (r *Representer) emitScalar(
value, anchor, tag string, style ScalarStyle, head, line, foot, tail []byte,
) {
// TODO Kill this function. Replace all initialize calls by their underlying Go literals.
implicit := tag == ""
if !implicit {
tag = longTag(tag)
}
event := NewScalarEvent([]byte(anchor), []byte(tag), []byte(value), implicit, implicit, style)
event.HeadComment = head
event.LineComment = line
event.FootComment = foot
event.TailComment = tail
r.emit(event)
}
func (r *Representer) nodev(in reflect.Value) {
r.node(in.Interface().(*Node), "")
}
+205 -104
@@ -10,6 +10,7 @@ package libyaml
import (
"encoding/base64"
"fmt"
"math"
"regexp"
"strconv"
@@ -17,11 +18,174 @@ import (
"time"
)
// resolveMapItem holds a resolved value and its YAML tag for exact string
// matches in the resolution table.
type resolveMapItem struct {
value any
tag string
}
// Resolver handles tag resolution for YAML nodes.
type Resolver struct {
opts *Options
}
// NewResolver creates a new Resolver with the given options.
func NewResolver(opts *Options) *Resolver {
return &Resolver{opts: opts}
}
// Resolve walks the node tree and resolves tags for untagged nodes.
// This is called after composition to:
// - Default quoted scalars to !!str
// - Default sequences to !!seq
// - Default mappings to !!map
// - Resolve plain scalars to implicit types (int, float, bool, null, timestamp)
func (r *Resolver) Resolve(n *Node) {
if n == nil {
return
}
switch n.Kind {
case ScalarNode:
if n.Tag == "" {
if n.Style&(SingleQuotedStyle|DoubleQuotedStyle|LiteralStyle|FoldedStyle) != 0 {
// Quoted scalars default to !!str without value resolution
n.Tag = strTag
} else {
// Plain scalars: resolve type from value
n.Tag, _ = resolve("", n.Value)
}
}
case SequenceNode:
if n.Tag == "" {
n.Tag = seqTag
}
for _, child := range n.Content {
r.Resolve(child)
}
case MappingNode:
if n.Tag == "" {
n.Tag = mapTag
}
for _, child := range n.Content {
r.Resolve(child)
}
case DocumentNode:
for _, child := range n.Content {
r.Resolve(child)
}
case AliasNode:
// Alias nodes point to already-resolved nodes
}
}
// resolve determines the YAML tag and Go value for a scalar string.
// It takes a tag hint and the scalar string value, and returns the resolved
// tag and the corresponding Go value (int, float, bool, [time.Time], etc.).
// If the tag is already specified and non-resolvable, it returns the input
// unchanged.
func resolve(tag string, in string) (rtag string, out any) {
tag = shortTag(tag)
if !resolvableTag(tag) {
return tag, in
}
defer func() {
switch tag {
case "", rtag, strTag, binaryTag:
return
case floatTag:
if rtag == intTag {
switch v := out.(type) {
case int64:
rtag = floatTag
out = float64(v)
return
case int:
rtag = floatTag
out = float64(v)
return
}
}
}
Fail(formatResolverError(
fmt.Sprintf("cannot construct %s `%s` as a %s", shortTag(rtag), in, shortTag(tag)),
Mark{},
))
}()
// Any data is accepted as a !!str or !!binary.
// Otherwise, the prefix is enough of a hint about what it might be.
hint := byte('N')
if in != "" {
hint = resolveTable[in[0]]
}
if hint != 0 && tag != strTag && tag != binaryTag {
// Handle things we can lookup in a map.
if item, ok := resolveMap[in]; ok {
return item.tag, item.value
}
// Base 60 floats are a bad idea, were dropped in YAML 1.2, and
// are purposefully unsupported here. They're still quoted on
// the way out for compatibility with other parsers, though.
switch hint {
case 'M':
// We've already checked the map above.
case '.':
// Not in the map, so maybe a normal float.
floatv, err := strconv.ParseFloat(in, 64)
if err == nil {
return floatTag, floatv
}
case 'D', 'S':
// Int, float, or timestamp.
// Only try values as a timestamp if the value is
// unquoted or there's an explicit !!timestamp tag.
if tag == "" || tag == timestampTag {
t, ok := parseTimestamp(in)
if ok {
return timestampTag, t
}
}
plain := strings.ReplaceAll(in, "_", "")
intv, err := strconv.ParseInt(plain, 0, 64)
if err == nil {
if intv == int64(int(intv)) {
return intTag, int(intv)
} else {
return intTag, intv
}
}
uintv, err := strconv.ParseUint(plain, 0, 64)
if err == nil {
return intTag, uintv
}
if yamlStyleFloat.MatchString(plain) {
floatv, err := strconv.ParseFloat(plain, 64)
if err == nil {
return floatTag, floatv
}
}
default:
panic("internal error: missing handler for resolver table: " + string(rune(hint)) + " (with " + in + ")")
}
}
return strTag, in
}
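The integer branch of `resolve` can be illustrated separately. This is a sketch only: `parseYAMLInt` is a hypothetical name, and it skips the `resolveTable` hint check and the timestamp attempt that precede this branch in the real function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseYAMLInt is a hypothetical helper reproducing only the integer branch:
// underscores are stripped, base 0 lets strconv pick the base from the
// prefix, and values too large for int64 fall back to uint64.
func parseYAMLInt(in string) (any, bool) {
	plain := strings.ReplaceAll(in, "_", "")
	if v, err := strconv.ParseInt(plain, 0, 64); err == nil {
		if v == int64(int(v)) {
			return int(v), true
		}
		return v, true
	}
	if v, err := strconv.ParseUint(plain, 0, 64); err == nil {
		return v, true
	}
	return nil, false
}

func main() {
	for _, s := range []string{"1_000", "0x1F", "0o17", "18446744073709551615", "1.5"} {
		v, ok := parseYAMLInt(s)
		fmt.Printf("%-22q %T %v %v\n", s, v, v, ok)
	}
}
```

The result type narrows to `int` when the value fits, so callers see `int`, `int64`, or `uint64` depending on magnitude and platform.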
// resolveTable provides a fast lookup table for initial character-based
// classification during tag resolution.
// resolveMap maps specific scalar strings to their resolved values and tags.
var (
resolveTable = make([]byte, 256)
resolveMap = make(map[string]resolveMapItem)
@@ -32,6 +196,24 @@ var (
// https://staticcheck.dev/docs/checks/#SA4026
var negativeZero = math.Copysign(0.0, -1.0)
// yamlStyleFloat matches floating-point numbers in YAML style (including
// scientific notation and numbers starting with a dot).
var yamlStyleFloat = regexp.MustCompile(`^[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?$`)
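A short sketch shows what gating `strconv.ParseFloat` behind this pattern changes. `parseYAMLFloat` is a hypothetical helper; the rationale given in the comments is an observation about `strconv`, not a statement of the authors' intent.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// yamlStyleFloat copies the pattern shown in the diff.
var yamlStyleFloat = regexp.MustCompile(`^[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?$`)

// parseYAMLFloat is a hypothetical helper: it accepts a string only if it
// matches the YAML-style pattern and then parses it. strconv.ParseFloat by
// itself also accepts spellings such as "infinity" and hex floats like
// "0x1p-2", which the pattern filters out.
func parseYAMLFloat(s string) (float64, bool) {
	if !yamlStyleFloat.MatchString(s) {
		return 0, false
	}
	f, err := strconv.ParseFloat(s, 64)
	return f, err == nil
}

func main() {
	for _, s := range []string{".5", "1.", "1e3", "-1.5E-3", "0x1p-2", "infinity", "1e"} {
		f, ok := parseYAMLFloat(s)
		fmt.Printf("%-12q %v %v\n", s, f, ok)
	}
}
```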
// allowedTimestampFormats lists the timestamp formats supported by the
// resolver.
// This is a subset of the formats allowed by the regular expression
// defined at http://yaml.org/type/timestamp.html.
var allowedTimestampFormats = []string{
"2006-1-2T15:4:5.999999999Z07:00", // RFC3339Nano with short date fields.
"2006-1-2t15:4:5.999999999Z07:00", // RFC3339Nano with short date fields and lower-case "t".
"2006-1-2 15:4:5.999999999", // space separated with no time zone
"2006-1-2", // date only
// Notable exception: time.Parse cannot handle: "2001-12-14 21:59:43.10 -5"
// from the set of examples.
}
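Trying each layout in order with `time.Parse` is one way such a list can be used. This is a sketch only: `parseWithLayouts` is a hypothetical name, and the body of the real `parseTimestamp` is elided from this diff, so it may perform additional checks.

```go
package main

import (
	"fmt"
	"time"
)

// layouts copies the formats listed in allowedTimestampFormats.
var layouts = []string{
	"2006-1-2T15:4:5.999999999Z07:00",
	"2006-1-2t15:4:5.999999999Z07:00",
	"2006-1-2 15:4:5.999999999",
	"2006-1-2",
}

// parseWithLayouts is a hypothetical helper: it returns the first successful
// parse among the layouts, or false if none of them accepts s.
func parseWithLayouts(s string) (time.Time, bool) {
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, true
		}
	}
	return time.Time{}, false
}

func main() {
	for _, s := range []string{
		"2001-12-14",
		"2001-12-14T21:59:43.10-05:00",
		"2001-12-14 21:59:43.10 -5", // the exception noted in the source
	} {
		t, ok := parseWithLayouts(s)
		fmt.Printf("%-30q %v %v\n", s, ok, t)
	}
}
```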
// init initializes the resolveTable with character class mappings for tag resolution.
func init() {
t := resolveTable
t[int('+')] = 'S' // Sign
@@ -68,6 +250,8 @@ func init() {
}
}
// resolvableTag checks if a tag can be automatically resolved from a scalar
// value.
func resolvableTag(tag string) bool {
switch tag {
case "", strTag, boolTag, intTag, floatTag, nullTag, timestampTag:
@@ -76,99 +260,6 @@ func resolvableTag(tag string) bool {
return false
}
var yamlStyleFloat = regexp.MustCompile(`^[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?$`)
func resolve(tag string, in string) (rtag string, out any) {
tag = shortTag(tag)
if !resolvableTag(tag) {
return tag, in
}
defer func() {
switch tag {
case "", rtag, strTag, binaryTag:
return
case floatTag:
if rtag == intTag {
switch v := out.(type) {
case int64:
rtag = floatTag
out = float64(v)
return
case int:
rtag = floatTag
out = float64(v)
return
}
}
}
failf("cannot construct %s `%s` as a %s", shortTag(rtag), in, shortTag(tag))
}()
// Any data is accepted as a !!str or !!binary.
// Otherwise, the prefix is enough of a hint about what it might be.
hint := byte('N')
if in != "" {
hint = resolveTable[in[0]]
}
if hint != 0 && tag != strTag && tag != binaryTag {
// Handle things we can lookup in a map.
if item, ok := resolveMap[in]; ok {
return item.tag, item.value
}
// Base 60 floats are a bad idea, were dropped in YAML 1.2, and
// are purposefully unsupported here. They're still quoted on
// the way out for compatibility with other parsers, though.
switch hint {
case 'M':
// We've already checked the map above.
case '.':
// Not in the map, so maybe a normal float.
floatv, err := strconv.ParseFloat(in, 64)
if err == nil {
return floatTag, floatv
}
case 'D', 'S':
// Int, float, or timestamp.
// Only try values as a timestamp if the value is unquoted or there's an explicit
// !!timestamp tag.
if tag == "" || tag == timestampTag {
t, ok := parseTimestamp(in)
if ok {
return timestampTag, t
}
}
plain := strings.ReplaceAll(in, "_", "")
intv, err := strconv.ParseInt(plain, 0, 64)
if err == nil {
if intv == int64(int(intv)) {
return intTag, int(intv)
} else {
return intTag, intv
}
}
uintv, err := strconv.ParseUint(plain, 0, 64)
if err == nil {
return intTag, uintv
}
if yamlStyleFloat.MatchString(plain) {
floatv, err := strconv.ParseFloat(plain, 64)
if err == nil {
return floatTag, floatv
}
}
default:
panic("internal error: missing handler for resolver table: " + string(rune(hint)) + " (with " + in + ")")
}
}
return strTag, in
}
// encodeBase64 encodes s as base64 that is broken up into multiple lines
// as appropriate for the resulting length.
func encodeBase64(s string) string {
@@ -194,17 +285,6 @@ func encodeBase64(s string) string {
return string(out[:k])
}
// This is a subset of the formats allowed by the regular expression
// defined at http://yaml.org/type/timestamp.html.
var allowedTimestampFormats = []string{
"2006-1-2T15:4:5.999999999Z07:00", // RFC3339Nano with short date fields.
"2006-1-2t15:4:5.999999999Z07:00", // RFC3339Nano with short date fields and lower-case "t".
"2006-1-2 15:4:5.999999999", // space separated with no time zone
"2006-1-2", // date only
// Notable exception: time.Parse cannot handle: "2001-12-14 21:59:43.10 -5"
// from the set of examples.
}
// parseTimestamp parses s as a timestamp string and
// returns the timestamp and reports whether it succeeded.
// Timestamp formats are defined at http://yaml.org/type/timestamp.html
@@ -229,3 +309,24 @@ func parseTimestamp(s string) (time.Time, bool) {
}
return time.Time{}, false
}
// formatResolverError creates a LoadError for resolver-stage errors.
func formatResolverError(message string, mark Mark) *LoadError {
return &LoadError{
Stage: ResolverStage,
Mark: mark,
Message: message,
}
}
// formatResolverErrorContext creates a LoadError with both context and
// problem information for resolver-stage errors.
func formatResolverErrorContext(context string, contextMark Mark, message string, mark Mark) *LoadError {
return &LoadError{
Stage: ResolverStage,
ContextMark: contextMark,
ContextMsg: context,
Mark: mark,
Message: message,
}
}
+1772 -1470
File diff suppressed because it is too large
+262 -59
View File
@@ -8,99 +8,158 @@
package libyaml
import (
"errors"
"fmt"
"io"
"strings"
"unicode/utf8"
)
// Serializer handles serialization of YAML nodes to event stream.
type Serializer struct {
Emitter Emitter
out []byte
lineWidth int
explicitStart bool
explicitEnd bool
flowSimpleCollections bool
quotePreference QuoteStyle
doneInit bool
}
// NewSerializer creates a new Serializer with the given options.
func NewSerializer(w io.Writer, opts *Options) *Serializer {
emitter := NewEmitter()
emitter.CompactSequenceIndent = opts.CompactSeqIndent
emitter.quotePreference = opts.QuotePreference
emitter.SetWidth(opts.LineWidth)
emitter.SetUnicode(opts.Unicode)
emitter.SetCanonical(opts.Canonical)
emitter.SetLineBreak(opts.LineBreak)
// Set indentation (defaults to 2 if not specified)
indent := opts.Indent
if indent == 0 {
indent = 2
}
emitter.BestIndent = indent
if w != nil {
emitter.SetOutputWriter(w)
}
return &Serializer{
Emitter: emitter,
lineWidth: opts.LineWidth,
explicitStart: opts.ExplicitStart,
explicitEnd: opts.ExplicitEnd,
flowSimpleCollections: opts.FlowSimpleCollections,
quotePreference: opts.QuotePreference,
}
}
// Serialize walks a Node tree and emits events to produce YAML output.
// This is the primary method for the Serializer stage.
func (s *Serializer) Serialize(node *Node) {
s.init()
s.node(node, "")
}
// Sentinel values for event creation.
// These provide clarity at call sites, similar to http.NoBody.
var (
noVersionDirective *VersionDirective = nil
noTagDirective []TagDirective = nil
)
// init initializes the serializer by emitting a STREAM_START event.
func (s *Serializer) init() {
if s.doneInit {
return
}
s.emit(NewStreamStartEvent(UTF8_ENCODING))
s.doneInit = true
}
// Finish completes serialization by emitting a STREAM_END event.
func (s *Serializer) Finish() {
s.Emitter.OpenEnded = false
s.emit(NewStreamEndEvent())
}
// node serializes a Node tree into YAML events.
// This is the core of the serializer stage - it walks the tree and produces events.
func (r *Representer) node(node *Node, tail string) {
// This is the core of the serializer stage - it walks the tree and produces
// events.
func (s *Serializer) node(node *Node, tail string) {
// Zero nodes behave as nil.
if node.Kind == 0 && node.IsZero() {
r.nilv()
s.emitScalar("null", "", "", PLAIN_SCALAR_STYLE, nil, nil, nil, nil)
return
}
// If the tag was not explicitly requested, and dropping it won't change the
// implicit tag of the value, don't include it in the presentation.
// Tags have been processed by Desolver:
// - Empty tag = can be inferred or style handles it
// - Non-empty tag = emit explicitly
// Style has also been set by Desolver for quoting needs
tag := node.Tag
stag := shortTag(tag)
var forceQuoting bool
if tag != "" && node.Style&TaggedStyle == 0 {
if node.Kind == ScalarNode {
if stag == strTag && node.Style&(SingleQuotedStyle|DoubleQuotedStyle|LiteralStyle|FoldedStyle) != 0 {
tag = ""
} else {
rtag, _ := resolve("", node.Value)
if rtag == stag && stag != mergeTag {
tag = ""
} else if stag == strTag {
tag = ""
forceQuoting = true
}
}
} else {
var rtag string
switch node.Kind {
case MappingNode:
rtag = mapTag
case SequenceNode:
rtag = seqTag
}
if rtag == stag {
tag = ""
}
if tag == "" && node.Kind == ScalarNode {
// Empty tag with quoting style means the string type needs to
// be preserved
if node.Style&(SingleQuotedStyle|DoubleQuotedStyle|LiteralStyle|FoldedStyle) != 0 {
forceQuoting = true
}
}
switch node.Kind {
case DocumentNode:
event := NewDocumentStartEvent(noVersionDirective, noTagDirective, !r.explicitStart)
event := NewDocumentStartEvent(noVersionDirective, noTagDirective, !s.explicitStart)
event.HeadComment = []byte(node.HeadComment)
r.emit(event)
s.emit(event)
for _, node := range node.Content {
r.node(node, "")
s.node(node, "")
}
event = NewDocumentEndEvent(!r.explicitEnd)
event = NewDocumentEndEvent(!s.explicitEnd)
event.FootComment = []byte(node.FootComment)
r.emit(event)
s.emit(event)
case SequenceNode:
style := BLOCK_SEQUENCE_STYLE
// Use flow style if explicitly requested or if it's a simple
// collection (scalar-only contents that fit within line width,
// enabled via WithFlowSimpleCollections)
if node.Style&FlowStyle != 0 || r.isSimpleCollection(node) {
if node.Style&FlowStyle != 0 || s.isSimpleCollection(node) {
style = FLOW_SEQUENCE_STYLE
}
event := NewSequenceStartEvent([]byte(node.Anchor), []byte(longTag(tag)), tag == "", style)
event.HeadComment = []byte(node.HeadComment)
r.emit(event)
s.emit(event)
for _, node := range node.Content {
r.node(node, "")
s.node(node, "")
}
event = NewSequenceEndEvent()
event.LineComment = []byte(node.LineComment)
event.FootComment = []byte(node.FootComment)
r.emit(event)
s.emit(event)
case MappingNode:
style := BLOCK_MAPPING_STYLE
// Use flow style if explicitly requested or if it's a simple
// collection (scalar-only contents that fit within line width,
// enabled via WithFlowSimpleCollections)
if node.Style&FlowStyle != 0 || r.isSimpleCollection(node) {
if node.Style&FlowStyle != 0 || s.isSimpleCollection(node) {
style = FLOW_MAPPING_STYLE
}
event := NewMappingStartEvent([]byte(node.Anchor), []byte(longTag(tag)), tag == "", style)
event.TailComment = []byte(tail)
event.HeadComment = []byte(node.HeadComment)
r.emit(event)
s.emit(event)
// The tail logic below moves the foot comment of prior keys to the following key,
// since the value for each key may be a nested structure and the foot needs to be
// processed only after the entirety of the value is streamed. The last tail is processed
// with the mapping end event.
// The tail logic below moves the foot comment of prior keys to
// the following key, since the value for each key may be a
// nested structure and the foot needs to be processed only after the
// entirety of the value is streamed. The last tail is
// processed with the mapping end event.
var tail string
for i := 0; i+1 < len(node.Content); i += 2 {
k := node.Content[i]
@@ -110,34 +169,35 @@ func (r *Representer) node(node *Node, tail string) {
kopy.FootComment = ""
k = &kopy
}
r.node(k, tail)
s.node(k, tail)
tail = foot
v := node.Content[i+1]
r.node(v, "")
s.node(v, "")
}
event = NewMappingEndEvent()
event.TailComment = []byte(tail)
event.LineComment = []byte(node.LineComment)
event.FootComment = []byte(node.FootComment)
r.emit(event)
s.emit(event)
case AliasNode:
event := NewAliasEvent([]byte(node.Value))
event.HeadComment = []byte(node.HeadComment)
event.LineComment = []byte(node.LineComment)
event.FootComment = []byte(node.FootComment)
r.emit(event)
s.emit(event)
case ScalarNode:
value := node.Value
if !utf8.ValidString(value) {
stag := shortTag(tag)
if stag == binaryTag {
failf("explicitly tagged !!binary data must be base64-encoded")
failDumpf(SerializerStage, "explicitly tagged !!binary data must be base64-encoded")
}
if stag != "" {
failf("cannot marshal invalid UTF-8 data as %s", stag)
failDumpf(SerializerStage, "cannot marshal invalid UTF-8 data as %s", stag)
}
// It can't be represented directly as YAML so use a binary tag
// and represent it as base64.
@@ -158,19 +218,67 @@ func (r *Representer) node(node *Node, tail string) {
case strings.Contains(value, "\n"):
style = LITERAL_SCALAR_STYLE
case forceQuoting:
style = r.quotePreference.ScalarStyle()
style = s.quotePreference.ScalarStyle()
}
r.emitScalar(value, node.Anchor, tag, style, []byte(node.HeadComment), []byte(node.LineComment), []byte(node.FootComment), []byte(tail))
s.emitScalar(value, node.Anchor, tag, style, []byte(node.HeadComment), []byte(node.LineComment), []byte(node.FootComment), []byte(tail))
default:
failf("cannot represent node with unknown kind %d", node.Kind)
failDumpf(SerializerStage, "cannot represent node with unknown kind %d", node.Kind)
}
}
// emit sends an event to the underlying emitter.
func (s *Serializer) emit(event Event) {
s.must(s.Emitter.Emit(&event))
}
// must panics if the given error is non-nil, routing to the appropriate stage.
func (s *Serializer) must(err error) {
if err == nil {
return
}
var ee EmitterError
if errors.As(err, &ee) {
failDumpf(EmitterStage, "%s", ee.Message)
}
var we WriterError
if errors.As(err, &we) {
// Unwrap to get the original I/O error, stripping the
// "write error: " prefix that WriterError adds internally.
cause := we.Err
if unwrapped := errors.Unwrap(we.Err); unwrapped != nil {
cause = unwrapped
}
failDump(WriterStage, cause)
}
msg := err.Error()
if msg == "" {
msg = fmt.Sprintf("unknown problem generating YAML content with %T", err)
}
failDumpf(SerializerStage, "%s", msg)
}
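The routing in `must` relies on `errors.As` finding a typed error anywhere in the wrap chain. The sketch below uses stand-in types: `emitterError`, `writerError`, `classify`, and `errDiskFull` are hypothetical names, since the package's `EmitterError` and `WriterError` definitions are not shown in this diff. It returns a label instead of panicking.

```go
package main

import (
	"errors"
	"fmt"
)

// emitterError and writerError are hypothetical stand-ins for the package's
// error types.
type emitterError struct{ Message string }

func (e emitterError) Error() string { return "emitter error: " + e.Message }

type writerError struct{ Err error }

func (e writerError) Error() string { return "write error: " + e.Err.Error() }
func (e writerError) Unwrap() error { return e.Err }

// errDiskFull is a hypothetical underlying I/O failure.
var errDiskFull = errors.New("disk full")

// classify mirrors the order of checks in must: emitter errors first, then
// writer errors, and anything else is attributed to the serializer.
func classify(err error) string {
	var ee emitterError
	if errors.As(err, &ee) {
		return "emitter: " + ee.Message
	}
	var we writerError
	if errors.As(err, &we) {
		return "writer: " + we.Err.Error()
	}
	return "serializer: " + err.Error()
}

func main() {
	fmt.Println(classify(fmt.Errorf("wrapped: %w", emitterError{Message: "bad indent"})))
	fmt.Println(classify(writerError{Err: errDiskFull}))
	fmt.Println(classify(errDiskFull))
}
```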
// emitScalar emits a scalar event with the given value, anchor, tag, style,
// and associated comments.
func (s *Serializer) emitScalar(
value, anchor, tag string, style ScalarStyle, head, line, foot, tail []byte,
) {
implicit := tag == ""
if !implicit {
tag = longTag(tag)
}
event := NewScalarEvent([]byte(anchor), []byte(tag), []byte(value), implicit, implicit, style)
event.HeadComment = head
event.LineComment = line
event.FootComment = foot
event.TailComment = tail
s.emit(event)
}
// isSimpleCollection checks if a node contains only scalar values and would
// fit within the line width when rendered in flow style.
func (r *Representer) isSimpleCollection(node *Node) bool {
if !r.flowSimpleCollections {
func (s *Serializer) isSimpleCollection(node *Node) bool {
if !s.flowSimpleCollections {
return false
}
if node.Kind != SequenceNode && node.Kind != MappingNode {
@@ -183,8 +291,8 @@ func (r *Representer) isSimpleCollection(node *Node) bool {
}
}
// Estimate flow style length
estimatedLen := r.estimateFlowLength(node)
width := r.lineWidth
estimatedLen := s.estimateFlowLength(node)
width := s.lineWidth
if width <= 0 {
width = 80 // Default width if not set
}
@@ -192,7 +300,7 @@ func (r *Representer) isSimpleCollection(node *Node) bool {
}
// estimateFlowLength estimates the character length of a node in flow style.
func (r *Representer) estimateFlowLength(node *Node) int {
func (s *Serializer) estimateFlowLength(node *Node) int {
if node.Kind == SequenceNode {
// [item1, item2, ...] = 2 + sum(len(items)) + 2*(len-1)
length := 2 // []
@@ -217,3 +325,98 @@ func (r *Representer) estimateFlowLength(node *Node) int {
}
return 0
}
// NewStreamStartEvent creates a new STREAM-START event.
func NewStreamStartEvent(encoding Encoding) Event {
return Event{
Type: STREAM_START_EVENT,
encoding: encoding,
}
}
// NewStreamEndEvent creates a new STREAM-END event.
func NewStreamEndEvent() Event {
return Event{
Type: STREAM_END_EVENT,
}
}
// NewDocumentStartEvent creates a new DOCUMENT-START event.
func NewDocumentStartEvent(version_directive *VersionDirective, tag_directives []TagDirective, implicit bool) Event {
return Event{
Type: DOCUMENT_START_EVENT,
versionDirective: version_directive,
tagDirectives: tag_directives,
Implicit: implicit,
}
}
// NewDocumentEndEvent creates a new DOCUMENT-END event.
func NewDocumentEndEvent(implicit bool) Event {
return Event{
Type: DOCUMENT_END_EVENT,
Implicit: implicit,
}
}
// NewAliasEvent creates a new ALIAS event.
func NewAliasEvent(anchor []byte) Event {
return Event{
Type: ALIAS_EVENT,
Anchor: anchor,
}
}
// NewScalarEvent creates a new SCALAR event.
func NewScalarEvent(anchor, tag, value []byte, plain_implicit, quoted_implicit bool, style ScalarStyle) Event {
return Event{
Type: SCALAR_EVENT,
Anchor: anchor,
Tag: tag,
Value: value,
Implicit: plain_implicit,
quoted_implicit: quoted_implicit,
Style: Style(style),
}
}
// NewSequenceStartEvent creates a new SEQUENCE-START event.
func NewSequenceStartEvent(anchor, tag []byte, implicit bool, style SequenceStyle) Event {
return Event{
Type: SEQUENCE_START_EVENT,
Anchor: anchor,
Tag: tag,
Implicit: implicit,
Style: Style(style),
}
}
// NewSequenceEndEvent creates a new SEQUENCE-END event.
func NewSequenceEndEvent() Event {
return Event{
Type: SEQUENCE_END_EVENT,
}
}
// NewMappingStartEvent creates a new MAPPING-START event.
func NewMappingStartEvent(anchor, tag []byte, implicit bool, style MappingStyle) Event {
return Event{
Type: MAPPING_START_EVENT,
Anchor: anchor,
Tag: tag,
Implicit: implicit,
Style: Style(style),
}
}
// NewMappingEndEvent creates a new MAPPING-END event.
func NewMappingEndEvent() Event {
return Event{
Type: MAPPING_END_EVENT,
}
}
// Delete an event object.
func (e *Event) Delete() {
*e = Event{}
}
+250
@@ -0,0 +1,250 @@
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// Struct metadata extraction for YAML marshaling/unmarshaling.
//
// This file analyzes Go struct types to build mappings between YAML keys and
// struct fields. It parses struct tags like `yaml:"name,omitempty,flow,inline"`
// and caches the results for efficient repeated access.
//
// Used by:
// - Constructor: maps YAML keys to struct fields when unmarshaling
// - Representer: maps struct fields to YAML keys when marshaling
//
// Key types:
// - structInfo: cached metadata about a struct type
// - fieldInfo: metadata about a single struct field
// - getStructInfo(): analyzes a struct type and returns cached metadata
package libyaml
import (
"errors"
"fmt"
"reflect"
"strings"
"sync"
)
// structInfo holds cached information about a struct's YAML-relevant fields.
type structInfo struct {
FieldsMap map[string]fieldInfo
FieldsList []fieldInfo
// InlineMap is the number of the field in the struct that
// contains an ,inline map, or -1 if there's none.
InlineMap int
// InlineConstructors holds indexes to inlined fields that
// contain constructor values.
InlineConstructors [][]int
}
// fieldInfo holds information about a single struct field.
type fieldInfo struct {
Key string
Num int
OmitEmpty bool
Flow bool
// Id holds the unique field identifier, so we can cheaply
// check for field duplicates without maintaining an extra map.
Id int
// Inline holds the field index if the field is part of an inlined struct.
Inline []int
}
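The tag format named in the file header (`yaml:"name,omitempty,flow,inline"`) can be split into a key and flags as follows. This is a sketch under stated assumptions: `tagOptions`, `parseTag`, and `sample` are hypothetical names, and the sketch ignores the `-` tag, untagged fields, and inline validation.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// tagOptions is a hypothetical, reduced counterpart of fieldInfo.
type tagOptions struct {
	Key       string
	OmitEmpty bool
	Flow      bool
	Inline    bool
}

// parseTag splits a yaml struct tag value into its key and flags and
// rejects flags it does not know about.
func parseTag(tag string) (tagOptions, error) {
	fields := strings.Split(tag, ",")
	opts := tagOptions{Key: fields[0]}
	for _, flag := range fields[1:] {
		switch flag {
		case "omitempty":
			opts.OmitEmpty = true
		case "flow":
			opts.Flow = true
		case "inline":
			opts.Inline = true
		default:
			return tagOptions{}, fmt.Errorf("unsupported flag %q in tag %q", flag, tag)
		}
	}
	return opts, nil
}

// sample is a hypothetical struct used only for the demonstration.
type sample struct {
	Name  string `yaml:"name,omitempty"`
	Ports []int  `yaml:"ports,flow"`
}

func main() {
	t := reflect.TypeOf(sample{})
	for i := 0; i < t.NumField(); i++ {
		opts, err := parseTag(t.Field(i).Tag.Get("yaml"))
		fmt.Printf("%s: %+v %v\n", t.Field(i).Name, opts, err)
	}
}
```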
// structMap caches struct reflection information.
// fieldMapMutex protects access to structMap.
// constructorType holds the [reflect.Type] for the constructor interface.
var (
structMap = make(map[reflect.Type]*structInfo)
fieldMapMutex sync.RWMutex
constructorType reflect.Type
)
// constructor interface is defined here to detect types that implement
// UnmarshalYAML during struct reflection.
type constructor interface {
UnmarshalYAML(value *Node) error
}
// init initializes the constructorType variable with the [reflect.Type] of constructor interface.
func init() {
var v constructor
constructorType = reflect.ValueOf(&v).Elem().Type()
}
// hasConstructYAMLMethod checks if a type has an UnmarshalYAML method
// that takes a *Node from an allowlisted v3 yaml package. This detects
// v3 backward-compatible Unmarshaler implementations whose Node type
// can't be checked via interface assertion from this package.
func hasConstructYAMLMethod(t reflect.Type) bool {
method, found := t.MethodByName("UnmarshalYAML")
if !found {
return false
}
// Check signature: func(*T) UnmarshalYAML(*Node) error
mtype := method.Type
if mtype.NumIn() != 2 || mtype.NumOut() != 1 {
return false
}
// First param is receiver (already checked by MethodByName)
// Second param should be a pointer to a Node-like struct
paramType := mtype.In(1)
if paramType.Kind() != reflect.Ptr {
return false
}
elemType := paramType.Elem()
if elemType.Kind() != reflect.Struct || elemType.Name() != "Node" || !isYAMLNodePkg(elemType.PkgPath()) {
return false
}
// Return type should be error
retType := mtype.Out(0)
if retType.Kind() != reflect.Interface || retType.Name() != "error" {
return false
}
return true
}
func isYAMLNodePkg(pkg string) bool {
switch pkg {
case "gopkg.in/yaml.v3", "go.yaml.in/yaml/v3":
return true
}
return false
}
// getStructInfo returns cached information about a struct type's fields.
// It parses struct tags and builds a map of field names to field info.
func getStructInfo(st reflect.Type) (*structInfo, error) {
fieldMapMutex.RLock()
sinfo, found := structMap[st]
fieldMapMutex.RUnlock()
if found {
return sinfo, nil
}
n := st.NumField()
fieldsMap := make(map[string]fieldInfo)
fieldsList := make([]fieldInfo, 0, n)
inlineMap := -1
inlineConstructors := [][]int(nil)
for i := 0; i != n; i++ {
field := st.Field(i)
if field.PkgPath != "" && !field.Anonymous {
continue // Private field
}
info := fieldInfo{Num: i}
tag := field.Tag.Get("yaml")
if tag == "" && !strings.Contains(string(field.Tag), ":") {
tag = string(field.Tag)
}
if tag == "-" {
continue
}
inline := false
fields := strings.Split(tag, ",")
if len(fields) > 1 {
for _, flag := range fields[1:] {
switch flag {
case "omitempty":
info.OmitEmpty = true
case "flow":
info.Flow = true
case "inline":
inline = true
default:
return nil, fmt.Errorf("unsupported flag %q in tag %q of type %s", flag, tag, st)
}
}
tag = fields[0]
}
if inline {
switch field.Type.Kind() {
case reflect.Map:
if inlineMap >= 0 {
return nil, errors.New("multiple ,inline maps in struct " + st.String())
}
if field.Type.Key() != reflect.TypeOf("") {
return nil, errors.New("option ,inline needs a map with string keys in struct " + st.String())
}
inlineMap = info.Num
case reflect.Struct, reflect.Pointer:
ftype := field.Type
for ftype.Kind() == reflect.Pointer {
ftype = ftype.Elem()
}
if ftype.Kind() != reflect.Struct {
return nil, errors.New("option ,inline may only be used on a struct or map field")
}
// Check for both libyaml.constructor and yaml.Unmarshaler (by method name)
if reflect.PointerTo(ftype).Implements(constructorType) || hasConstructYAMLMethod(reflect.PointerTo(ftype)) {
inlineConstructors = append(inlineConstructors, []int{i})
} else {
sinfo, err := getStructInfo(ftype)
if err != nil {
return nil, err
}
for _, index := range sinfo.InlineConstructors {
inlineConstructors = append(inlineConstructors, append([]int{i}, index...))
}
for _, finfo := range sinfo.FieldsList {
if _, found := fieldsMap[finfo.Key]; found {
msg := "duplicated key '" + finfo.Key + "' in struct " + st.String()
return nil, errors.New(msg)
}
if finfo.Inline == nil {
finfo.Inline = []int{i, finfo.Num}
} else {
finfo.Inline = append([]int{i}, finfo.Inline...)
}
finfo.Id = len(fieldsList)
fieldsMap[finfo.Key] = finfo
fieldsList = append(fieldsList, finfo)
}
}
default:
return nil, errors.New("option ,inline may only be used on a struct or map field")
}
continue
}
if tag != "" {
info.Key = tag
} else {
info.Key = strings.ToLower(field.Name)
}
if _, found = fieldsMap[info.Key]; found {
msg := "duplicated key '" + info.Key + "' in struct " + st.String()
return nil, errors.New(msg)
}
info.Id = len(fieldsList)
fieldsList = append(fieldsList, info)
fieldsMap[info.Key] = info
}
sinfo = &structInfo{
FieldsMap: fieldsMap,
FieldsList: fieldsList,
InlineMap: inlineMap,
InlineConstructors: inlineConstructors,
}
fieldMapMutex.Lock()
structMap[st] = sinfo
fieldMapMutex.Unlock()
return sinfo, nil
}
+66 -424
@@ -11,7 +11,6 @@ package libyaml
import (
"fmt"
"io"
"strings"
)
@@ -39,6 +38,7 @@ func (t *TagDirective) GetHandle() string { return string(t.handle) }
// GetPrefix returns the tag prefix.
func (t *TagDirective) GetPrefix() string { return string(t.prefix) }
// Encoding represents the character encoding of a YAML stream.
type Encoding int
// The stream encoding.
@@ -51,6 +51,7 @@ const (
UTF16BE_ENCODING // The UTF-16-BE encoding with BOM.
)
// LineBreak represents the line break style used in YAML output.
type LineBreak int
// Line break types.
@@ -63,6 +64,7 @@ const (
CRLN_BREAK // Use CR LF for line breaks (DOS style).
)
// QuoteStyle represents the preferred quote style for scalar values.
type QuoteStyle int
// Quote style types for required quoting.
@@ -82,6 +84,7 @@ func (q QuoteStyle) ScalarStyle() ScalarStyle {
return SINGLE_QUOTED_SCALAR_STYLE
}
// ErrorType represents the category of error that occurred during processing.
type ErrorType int
// Many bad things could happen with the parser and emitter.
@@ -101,10 +104,11 @@ const (
// Mark holds the pointer position.
type Mark struct {
Index int // The position index.
Line int // The position line (1-indexed).
Column int // The position column (0-indexed internally, displayed as 1-indexed).
Line int // The position line (1-indexed; 0 means unknown).
Column int // The position column (1-indexed; 0 means unknown).
}
// String returns a human-readable string representation of the position mark.
func (m Mark) String() string {
var builder strings.Builder
if m.Line == 0 {
@@ -112,17 +116,60 @@ func (m Mark) String() string {
}
fmt.Fprintf(&builder, "line %d", m.Line)
if m.Column != 0 {
fmt.Fprintf(&builder, ", column %d", m.Column+1)
if m.Column > 0 {
fmt.Fprintf(&builder, ", column %d", m.Column)
}
return builder.String()
}
// shortString returns a compact position string.
// Returns "<unknown position>" when Line is 0 (position not known).
// When Column is 0 (unknown), it is omitted from output ("L{line}");
// otherwise it is displayed as "L{line}.C{col}".
func (m Mark) shortString() string {
if m.Line == 0 {
return "<unknown position>"
}
if m.Column > 0 {
return fmt.Sprintf("L%d.C%d", m.Line, m.Column)
}
return fmt.Sprintf("L%d", m.Line)
}
// rangeString formats a position range from start mark m to end mark.
// Both marks use shortString for their individual display.
// When marks are on the same line:
// - Both Column==0: just "L2" (unknown columns, no range shown)
// - Both Column>0: "L2.C6-C7" (compact column range)
// - Mixed columns: "L1.C4-L1" (full start with line-only end)
//
// When marks are on different lines: "L1.C8-L2.C3"
func (m Mark) rangeString(end Mark) string {
start := m.shortString()
if m.Line == end.Line {
if m.Column == 0 && end.Column == 0 {
// Same line, unknown columns: just "L2"
return start
}
if m.Column > 0 && end.Column > 0 {
if m.Column == end.Column {
// Same position: just "L2.C6"
return start
}
// Same line with columns: "L2.C6-C7"
return fmt.Sprintf("%s-C%d", start, end.Column)
}
}
return fmt.Sprintf("%s-%s", start, end.shortString())
}
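The formatting rules described for shortString and rangeString can be exercised with a standalone copy. The sketch below reproduces the two methods on a hypothetical `mark` type that has only Line and Column fields; it is meant to show the documented output shapes, not to replace the package code.

```go
package main

import "fmt"

// mark is a hypothetical, reduced copy of Mark (1-indexed; 0 means unknown).
type mark struct {
	Line   int
	Column int
}

// shortString mirrors the compact position format: "L{line}" or "L{line}.C{col}".
func (m mark) shortString() string {
	if m.Line == 0 {
		return "<unknown position>"
	}
	if m.Column > 0 {
		return fmt.Sprintf("L%d.C%d", m.Line, m.Column)
	}
	return fmt.Sprintf("L%d", m.Line)
}

// rangeString mirrors the range format from start mark m to end.
func (m mark) rangeString(end mark) string {
	start := m.shortString()
	if m.Line == end.Line {
		if m.Column == 0 && end.Column == 0 {
			return start
		}
		if m.Column > 0 && end.Column > 0 {
			if m.Column == end.Column {
				return start
			}
			return fmt.Sprintf("%s-C%d", start, end.Column)
		}
	}
	return fmt.Sprintf("%s-%s", start, end.shortString())
}

func main() {
	fmt.Println(mark{2, 0}.rangeString(mark{2, 0})) // L2
	fmt.Println(mark{2, 6}.rangeString(mark{2, 6})) // L2.C6
	fmt.Println(mark{2, 6}.rangeString(mark{2, 7})) // L2.C6-C7
	fmt.Println(mark{1, 4}.rangeString(mark{1, 0})) // L1.C4-L1
	fmt.Println(mark{1, 8}.rangeString(mark{2, 3})) // L1.C8-L2.C3
}
```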
// Node Styles
// styleInt is the underlying type for style constants.
type styleInt int8
// ScalarStyle represents the formatting style of a scalar value.
type ScalarStyle styleInt
// Scalar styles.
@@ -155,6 +202,7 @@ func (style ScalarStyle) String() string {
}
}
// SequenceStyle represents the formatting style of a sequence node.
type SequenceStyle styleInt
// Sequence styles.
@@ -166,6 +214,7 @@ const (
FLOW_SEQUENCE_STYLE // The flow sequence style.
)
// MappingStyle represents the formatting style of a mapping node.
type MappingStyle styleInt
// Mapping styles.
@@ -179,6 +228,7 @@ const (
// Tokens
// TokenType represents the type of a scanned token.
type TokenType int
// Token types.
@@ -215,6 +265,7 @@ const (
COMMENT_TOKEN // A COMMENT token.
)
// String returns a string representation of the token type.
func (tt TokenType) String() string {
switch tt {
case NO_TOKEN:
@@ -297,6 +348,7 @@ type Token struct {
// Events
// EventType represents the type of a parsing or emitting event.
type EventType int8
// Event types.
@@ -317,6 +369,7 @@ const (
TAIL_COMMENT_EVENT
)
// eventStrings maps EventType constants to their string representations.
var eventStrings = []string{
NO_EVENT: "none",
STREAM_START_EVENT: "stream start",
@@ -332,6 +385,7 @@ var eventStrings = []string{
TAIL_COMMENT_EVENT: "tail comment",
}
// String returns a string representation of the event type.
func (e EventType) String() string {
if e < 0 || int(e) >= len(eventStrings) {
return fmt.Sprintf("unknown event %d", e)
@@ -382,9 +436,14 @@ type Event struct {
Style Style
}
func (e *Event) ScalarStyle() ScalarStyle { return ScalarStyle(e.Style) }
// ScalarStyle returns the style of a scalar event.
func (e *Event) ScalarStyle() ScalarStyle { return ScalarStyle(e.Style) }
// SequenceStyle returns the style of a sequence event.
func (e *Event) SequenceStyle() SequenceStyle { return SequenceStyle(e.Style) }
func (e *Event) MappingStyle() MappingStyle { return MappingStyle(e.Style) }
// MappingStyle returns the style of a mapping event.
func (e *Event) MappingStyle() MappingStyle { return MappingStyle(e.Style) }
// GetEncoding returns the stream encoding (for STREAM_START_EVENT).
func (e *Event) GetEncoding() Encoding { return e.encoding }
@@ -396,7 +455,6 @@ func (e *Event) GetVersionDirective() *VersionDirective { return e.versionDirect
func (e *Event) GetTagDirectives() []TagDirective { return e.tagDirectives }
// Nodes
const (
NULL_TAG = "tag:yaml.org,2002:null" // The tag !!null with the only possible value: null.
BOOL_TAG = "tag:yaml.org,2002:bool" // The tag !!bool with the values: true and false.
@@ -416,419 +474,3 @@ const (
DEFAULT_SEQUENCE_TAG = SEQ_TAG // The default sequence tag is !!seq.
DEFAULT_MAPPING_TAG = MAP_TAG // The default mapping tag is !!map.
)
type NodeType int
// Node types.
const (
// An empty node.
NO_NODE NodeType = iota
SCALAR_NODE // A scalar node.
SEQUENCE_NODE // A sequence node.
MAPPING_NODE // A mapping node.
)
// NodeItem represents an element of a sequence node.
type NodeItem int
// NodePair represents an element of a mapping node.
type NodePair struct {
key int // The key of the element.
value int // The value of the element.
}
// parserNode represents a single node in the YAML document tree.
type parserNode struct {
typ NodeType // The node type.
tag []byte // The node tag.
// The node data.
// The scalar parameters (for SCALAR_NODE).
scalar struct {
value []byte // The scalar value.
length int // The length of the scalar value.
style ScalarStyle // The scalar style.
}
// The sequence parameters (for SEQUENCE_NODE).
sequence struct {
items_data []NodeItem // The stack of sequence items.
style SequenceStyle // The sequence style.
}
// The mapping parameters (for MAPPING_NODE).
mapping struct {
pairs_data []NodePair // The stack of mapping pairs (key, value).
pairs_start *NodePair // The beginning of the stack.
pairs_end *NodePair // The end of the stack.
pairs_top *NodePair // The top of the stack.
style MappingStyle // The mapping style.
}
start_mark Mark // The beginning of the node.
end_mark Mark // The end of the node.
}
// Document structure.
type Document struct {
// The document nodes.
nodes []parserNode
// The version directive.
version_directive *VersionDirective
// The list of tag directives.
tag_directives_data []TagDirective
tag_directives_start int // The beginning of the tag directives list.
tag_directives_end int // The end of the tag directives list.
start_implicit int // Is the document start indicator implicit?
end_implicit int // Is the document end indicator implicit?
// The start/end of the document.
start_mark, end_mark Mark
}
// ReadHandler is called when the [Parser] needs to read more bytes from the
// source. The handler should write at most len(buffer) bytes into buffer and
// return the number of bytes written as n.
//
// On failure, the handler should return a non-nil error. At the end of the
// input, the handler should report n == 0.
type ReadHandler func(parser *Parser, buffer []byte) (n int, err error)
// SimpleKey holds information about a potential simple key.
type SimpleKey struct {
flow_level int // What flow level is the key at?
required bool // Is a simple key required?
token_number int // The number of the token.
mark Mark // The position mark.
}
// ParserState represents the state of the parser.
type ParserState int
const (
PARSE_STREAM_START_STATE ParserState = iota
PARSE_IMPLICIT_DOCUMENT_START_STATE // Expect the beginning of an implicit document.
PARSE_DOCUMENT_START_STATE // Expect DOCUMENT-START.
PARSE_DOCUMENT_CONTENT_STATE // Expect the content of a document.
PARSE_DOCUMENT_END_STATE // Expect DOCUMENT-END.
PARSE_BLOCK_NODE_STATE // Expect a block node.
PARSE_BLOCK_SEQUENCE_FIRST_ENTRY_STATE // Expect the first entry of a block sequence.
PARSE_BLOCK_SEQUENCE_ENTRY_STATE // Expect an entry of a block sequence.
PARSE_INDENTLESS_SEQUENCE_ENTRY_STATE // Expect an entry of an indentless sequence.
PARSE_BLOCK_MAPPING_FIRST_KEY_STATE // Expect the first key of a block mapping.
PARSE_BLOCK_MAPPING_KEY_STATE // Expect a block mapping key.
PARSE_BLOCK_MAPPING_VALUE_STATE // Expect a block mapping value.
PARSE_FLOW_SEQUENCE_FIRST_ENTRY_STATE // Expect the first entry of a flow sequence.
PARSE_FLOW_SEQUENCE_ENTRY_STATE // Expect an entry of a flow sequence.
PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_KEY_STATE // Expect a key of an ordered mapping.
PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_VALUE_STATE // Expect a value of an ordered mapping.
PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_END_STATE // Expect the end of an ordered mapping entry.
PARSE_FLOW_MAPPING_FIRST_KEY_STATE // Expect the first key of a flow mapping.
PARSE_FLOW_MAPPING_KEY_STATE // Expect a key of a flow mapping.
PARSE_FLOW_MAPPING_VALUE_STATE // Expect a value of a flow mapping.
PARSE_FLOW_MAPPING_EMPTY_VALUE_STATE // Expect an empty value of a flow mapping.
PARSE_END_STATE // Expect nothing.
)
// String returns a string representation of the parser state.
func (ps ParserState) String() string {
switch ps {
case PARSE_STREAM_START_STATE:
return "PARSE_STREAM_START_STATE"
case PARSE_IMPLICIT_DOCUMENT_START_STATE:
return "PARSE_IMPLICIT_DOCUMENT_START_STATE"
case PARSE_DOCUMENT_START_STATE:
return "PARSE_DOCUMENT_START_STATE"
case PARSE_DOCUMENT_CONTENT_STATE:
return "PARSE_DOCUMENT_CONTENT_STATE"
case PARSE_DOCUMENT_END_STATE:
return "PARSE_DOCUMENT_END_STATE"
case PARSE_BLOCK_NODE_STATE:
return "PARSE_BLOCK_NODE_STATE"
case PARSE_BLOCK_SEQUENCE_FIRST_ENTRY_STATE:
return "PARSE_BLOCK_SEQUENCE_FIRST_ENTRY_STATE"
case PARSE_BLOCK_SEQUENCE_ENTRY_STATE:
return "PARSE_BLOCK_SEQUENCE_ENTRY_STATE"
case PARSE_INDENTLESS_SEQUENCE_ENTRY_STATE:
return "PARSE_INDENTLESS_SEQUENCE_ENTRY_STATE"
case PARSE_BLOCK_MAPPING_FIRST_KEY_STATE:
return "PARSE_BLOCK_MAPPING_FIRST_KEY_STATE"
case PARSE_BLOCK_MAPPING_KEY_STATE:
return "PARSE_BLOCK_MAPPING_KEY_STATE"
case PARSE_BLOCK_MAPPING_VALUE_STATE:
return "PARSE_BLOCK_MAPPING_VALUE_STATE"
case PARSE_FLOW_SEQUENCE_FIRST_ENTRY_STATE:
return "PARSE_FLOW_SEQUENCE_FIRST_ENTRY_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_KEY_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_KEY_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_VALUE_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_VALUE_STATE"
case PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_END_STATE:
return "PARSE_FLOW_SEQUENCE_ENTRY_MAPPING_END_STATE"
case PARSE_FLOW_MAPPING_FIRST_KEY_STATE:
return "PARSE_FLOW_MAPPING_FIRST_KEY_STATE"
case PARSE_FLOW_MAPPING_KEY_STATE:
return "PARSE_FLOW_MAPPING_KEY_STATE"
case PARSE_FLOW_MAPPING_VALUE_STATE:
return "PARSE_FLOW_MAPPING_VALUE_STATE"
case PARSE_FLOW_MAPPING_EMPTY_VALUE_STATE:
return "PARSE_FLOW_MAPPING_EMPTY_VALUE_STATE"
case PARSE_END_STATE:
return "PARSE_END_STATE"
}
return "<unknown parser state>"
}
// AliasData holds information about aliases.
type AliasData struct {
anchor []byte // The anchor.
index int // The node id.
mark Mark // The anchor mark.
}
// Parser structure holds all information about the current
// state of the parser.
type Parser struct {
lastError error
// Reader stuff
read_handler ReadHandler // Read handler.
input_reader io.Reader // File input data.
input []byte // String input data.
input_pos int
eof bool // EOF flag
buffer []byte // The working buffer.
buffer_pos int // The current position of the buffer.
unread int // The number of unread characters in the buffer.
newlines int // The number of line breaks since last non-break/non-blank character
raw_buffer []byte // The raw buffer.
raw_buffer_pos int // The current position of the buffer.
encoding Encoding // The input encoding.
offset int // The offset of the current position (in bytes).
mark Mark // The mark of the current position.
// Comments
HeadComment []byte // The current head comments
LineComment []byte // The current line comments
FootComment []byte // The current foot comments
tail_comment []byte // Foot comment that happens at the end of a block.
stem_comment []byte // Comment in item preceding a nested structure (list inside list item, etc)
comments []Comment // The folded comments for all parsed tokens
comments_head int
// Scanner stuff
stream_start_produced bool // Have we started to scan the input stream?
stream_end_produced bool // Have we reached the end of the input stream?
flow_level int // The number of unclosed '[' and '{' indicators.
tokens []Token // The tokens queue.
tokens_head int // The head of the tokens queue.
tokens_parsed int // The number of tokens fetched from the queue.
token_available bool // Does the tokens queue contain a token ready for dequeueing.
indent int // The current indentation level.
indents []int // The indentation levels stack.
simple_key_allowed bool // May a simple key occur at the current position?
simple_key_possible bool // Is the current simple key possible?
simple_key SimpleKey // The current simple key.
simple_key_stack []SimpleKey // The stack of simple keys.
// Parser stuff
state ParserState // The current parser state.
states []ParserState // The parser states stack.
marks []Mark // The stack of marks.
tag_directives []TagDirective // The list of TAG directives.
// Representer stuff
aliases []AliasData // The alias data.
document *Document // The currently parsed document.
}
// Comment holds the text and position marks of a scanned comment.
type Comment struct {
ScanMark Mark // Position where scanning for comments started
TokenMark Mark // Position after which tokens will be associated with this comment
StartMark Mark // Position of '#' comment mark
EndMark Mark // Position where comment terminated
Head []byte
Line []byte
Foot []byte
}
// Emitter Definitions
// WriteHandler is called when the [Emitter] needs to flush the accumulated
// characters to the output. The handler should write all bytes of buffer to
// the output.
//
// On success, the handler should return nil. If the handler failed, it
// should return a non-nil error.
type WriteHandler func(emitter *Emitter, buffer []byte) error
type EmitterState int
// The emitter states.
const (
// Expect STREAM-START.
EMIT_STREAM_START_STATE EmitterState = iota
EMIT_FIRST_DOCUMENT_START_STATE // Expect the first DOCUMENT-START or STREAM-END.
EMIT_DOCUMENT_START_STATE // Expect DOCUMENT-START or STREAM-END.
EMIT_DOCUMENT_CONTENT_STATE // Expect the content of a document.
EMIT_DOCUMENT_END_STATE // Expect DOCUMENT-END.
EMIT_FLOW_SEQUENCE_FIRST_ITEM_STATE // Expect the first item of a flow sequence.
EMIT_FLOW_SEQUENCE_TRAIL_ITEM_STATE // Expect the next item of a flow sequence, with the comma already written out
EMIT_FLOW_SEQUENCE_ITEM_STATE // Expect an item of a flow sequence.
EMIT_FLOW_MAPPING_FIRST_KEY_STATE // Expect the first key of a flow mapping.
EMIT_FLOW_MAPPING_TRAIL_KEY_STATE // Expect the next key of a flow mapping, with the comma already written out
EMIT_FLOW_MAPPING_KEY_STATE // Expect a key of a flow mapping.
EMIT_FLOW_MAPPING_SIMPLE_VALUE_STATE // Expect a value for a simple key of a flow mapping.
EMIT_FLOW_MAPPING_VALUE_STATE // Expect a value of a flow mapping.
EMIT_BLOCK_SEQUENCE_FIRST_ITEM_STATE // Expect the first item of a block sequence.
EMIT_BLOCK_SEQUENCE_ITEM_STATE // Expect an item of a block sequence.
EMIT_BLOCK_MAPPING_FIRST_KEY_STATE // Expect the first key of a block mapping.
EMIT_BLOCK_MAPPING_KEY_STATE // Expect the key of a block mapping.
EMIT_BLOCK_MAPPING_SIMPLE_VALUE_STATE // Expect a value for a simple key of a block mapping.
EMIT_BLOCK_MAPPING_VALUE_STATE // Expect a value of a block mapping.
EMIT_END_STATE // Expect nothing.
)
// Emitter holds all information about the current state of the emitter.
type Emitter struct {
// Writer stuff
write_handler WriteHandler // Write handler.
output_buffer *[]byte // String output data.
output_writer io.Writer // File output data.
buffer []byte // The working buffer.
buffer_pos int // The current position of the buffer.
encoding Encoding // The stream encoding.
// Emitter stuff
canonical bool // If the output is in the canonical style?
BestIndent int // The number of indentation spaces.
best_width int // The preferred width of the output lines.
unicode bool // Allow unescaped non-ASCII characters?
line_break LineBreak // The preferred line break.
quotePreference QuoteStyle // Preferred quote style when quoting is required.
state EmitterState // The current emitter state.
states []EmitterState // The stack of states.
events []Event // The event queue.
events_head int // The head of the event queue.
indents []int // The stack of indentation levels.
tag_directives []TagDirective // The list of tag directives.
indent int // The current indentation level.
CompactSequenceIndent bool // Is '- ' considered part of the indentation for sequence elements?
flow_level int // The current flow level.
root_context bool // Is it the document root context?
sequence_context bool // Is it a sequence context?
mapping_context bool // Is it a mapping context?
simple_key_context bool // Is it a simple mapping key context?
line int // The current line.
column int // The current column.
whitespace bool // If the last character was a whitespace?
indention bool // If the last character was an indentation character (' ', '-', '?', ':')?
OpenEnded bool // If an explicit document end is required?
space_above bool // Is there an empty line above?
foot_indent int // The indent used to write the foot comment above, or -1 if none.
// Anchor analysis.
anchor_data struct {
anchor []byte // The anchor value.
alias bool // Is it an alias?
}
// Tag analysis.
tag_data struct {
handle []byte // The tag handle.
suffix []byte // The tag suffix.
}
// Scalar analysis.
scalar_data struct {
value []byte // The scalar value.
multiline bool // Does the scalar contain line breaks?
flow_plain_allowed bool // Can the scalar be expressed in the flow plain style?
block_plain_allowed bool // Can the scalar be expressed in the block plain style?
single_quoted_allowed bool // Can the scalar be expressed in the single quoted style?
block_allowed bool // Can the scalar be expressed in the literal or folded styles?
style ScalarStyle // The output style.
}
// Comments
HeadComment []byte
LineComment []byte
FootComment []byte
TailComment []byte
key_line_comment []byte
// Representer stuff
opened bool // If the stream was already opened?
closed bool // If the stream was already closed?
// The information associated with the document nodes.
anchors *struct {
references int // The number of references.
anchor int // The anchor id.
serialized bool // If the node has been emitted?
}
last_anchor_id int // The last assigned anchor id.
document *Document // The currently emitted document.
}
-192
@@ -1,192 +0,0 @@
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0
// YAML test data loading utilities.
// Provides helper functions for loading and processing YAML test data,
// including scalar coercion.
package libyaml
import (
"errors"
"fmt"
"io"
"strings"
)
// coerceScalar converts a YAML scalar string to an appropriate Go type
func coerceScalar(value string) any {
// Try bool and null
switch value {
case "true":
return true
case "false":
return false
case "null":
return nil
}
// Try hex int (0x or 0X prefix) - needed for test data byte arrays
var intVal int
if _, err := fmt.Sscanf(strings.ToLower(value), "0x%x", &intVal); err == nil {
return intVal
}
// Try float (must check before int because %d will parse "1.5" as "1")
if strings.Contains(value, ".") {
var floatVal float64
if _, err := fmt.Sscanf(value, "%f", &floatVal); err == nil {
return floatVal
}
}
// Try decimal int - use int64 to handle large values on 32-bit systems
var int64Val int64
if _, err := fmt.Sscanf(value, "%d", &int64Val); err == nil {
// Return as int if it fits, otherwise int64
if int64Val == int64(int(int64Val)) {
return int(int64Val)
}
return int64Val
}
// Default to string
return value
}
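For comparison, the same resolution order can be written with strconv instead of fmt.Sscanf. This is a hypothetical variant (`coerceStrict` is not part of the package) and it behaves differently on purpose: strconv rejects input with trailing characters, whereas fmt.Sscanf, as far as I can tell, stops at the first non-matching character without reporting an error, so a value such as "12abc" would be read as 12 by the Sscanf version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// coerceStrict is a hypothetical stricter variant of coerceScalar.
// It keeps the same order of checks: bool/null, hex int, float, decimal int,
// and finally string.
func coerceStrict(value string) any {
	switch value {
	case "true":
		return true
	case "false":
		return false
	case "null":
		return nil
	}

	// Hex int with a 0x or 0X prefix.
	lower := strings.ToLower(value)
	if strings.HasPrefix(lower, "0x") {
		if n, err := strconv.ParseInt(lower[2:], 16, 64); err == nil {
			return int(n)
		}
	}

	// Float: only attempted when the value contains a ".".
	if strings.Contains(value, ".") {
		if f, err := strconv.ParseFloat(value, 64); err == nil {
			return f
		}
	}

	// Decimal int: return int when it fits, otherwise int64.
	if n, err := strconv.ParseInt(value, 10, 64); err == nil {
		if n == int64(int(n)) {
			return int(n)
		}
		return n
	}

	return value
}

func main() {
	for _, s := range []string{"true", "null", "0x1F", "1.5", "42", "12abc", "hello"} {
		v := coerceStrict(s)
		fmt.Printf("%q -> %T %v\n", s, v, v)
	}
}
```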
// LoadYAML parses YAML data using the native libyaml Parser.
// This function is exported so it can be used by other packages for data-driven testing.
// It returns a generic interface{} which is typically:
// - map[string]interface{} for YAML mappings
// - []interface{} for YAML sequences
// - scalar values, resolved according to the following rules:
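// - Booleans: "true" and "false" are returned as bool (true/false).
// - Nulls: "null" is returned as nil.
// - Hex integers: values with a 0x or 0X prefix are parsed as int.
// - Floats: values containing "." are parsed as float64.
// - Decimal integers: values matching integer format are parsed as int,
// or as int64 when the value does not fit in int.
// - Quoted scalars and mapping keys are never coerced and stay strings.
// - All other values are returned as string.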
//
// This scalar resolution behavior matches the implementation in coerceScalar.
func LoadYAML(data []byte) (any, error) {
parser := NewParser()
parser.SetInputString(data)
defer parser.Delete()
type stackEntry struct {
container any // map[string]interface{} or []interface{}
key string // for maps: current key waiting for value
}
var stack []stackEntry
var root any
for {
var event Event
if err := parser.Parse(&event); err != nil {
if errors.Is(err, io.EOF) {
break
}
return nil, err
}
switch event.Type {
case STREAM_END_EVENT:
// End of stream, we're done
return root, nil
case STREAM_START_EVENT, DOCUMENT_START_EVENT:
// Structural markers, no action needed
case MAPPING_START_EVENT:
newMap := make(map[string]any)
stack = append(stack, stackEntry{container: newMap})
case MAPPING_END_EVENT:
if len(stack) > 0 {
popped := stack[len(stack)-1]
stack = stack[:len(stack)-1]
// Add completed map to parent or set as root
if len(stack) == 0 {
root = popped.container
} else {
parent := &stack[len(stack)-1]
if m, ok := parent.container.(map[string]any); ok {
m[parent.key] = popped.container
parent.key = "" // Reset key after use
} else if s, ok := parent.container.([]any); ok {
parent.container = append(s, popped.container)
}
}
}
case SEQUENCE_START_EVENT:
newSlice := make([]any, 0)
stack = append(stack, stackEntry{container: newSlice})
case SEQUENCE_END_EVENT:
if len(stack) > 0 {
popped := stack[len(stack)-1]
stack = stack[:len(stack)-1]
// Add completed slice to parent or set as root
if len(stack) == 0 {
root = popped.container
} else {
parent := &stack[len(stack)-1]
if m, ok := parent.container.(map[string]any); ok {
m[parent.key] = popped.container
parent.key = "" // Reset key after use
} else if s, ok := parent.container.([]any); ok {
parent.container = append(s, popped.container)
}
}
}
case SCALAR_EVENT:
value := string(event.Value)
// Only coerce plain (unquoted) scalars
isQuoted := ScalarStyle(event.Style) != PLAIN_SCALAR_STYLE
if len(stack) == 0 {
// Scalar at root level
if isQuoted {
root = value
} else {
root = coerceScalar(value)
}
} else {
parent := &stack[len(stack)-1]
if m, ok := parent.container.(map[string]any); ok {
if parent.key == "" {
// This scalar is a key - keep as string, don't coerce
parent.key = value
} else {
// This scalar is a value
if isQuoted {
m[parent.key] = value
} else {
m[parent.key] = coerceScalar(value)
}
parent.key = ""
}
} else if s, ok := parent.container.([]any); ok {
// Add to sequence
if isQuoted {
parent.container = append(s, value)
} else {
parent.container = append(s, coerceScalar(value))
}
}
}
case DOCUMENT_END_EVENT:
// Document end marker, continue processing
case ALIAS_EVENT, TAIL_COMMENT_EVENT:
// For now, skip aliases and comments (not used in test data)
}
}
return root, nil
}
-249
@@ -1,249 +0,0 @@
// Copyright 2006-2010 Kirill Simonov
// Copyright 2011-2019 Canonical Ltd
// Copyright 2025 The go-yaml Project Contributors
// SPDX-License-Identifier: Apache-2.0 AND MIT
// Internal constants and buffer sizes.
// Defines buffer sizes, stack sizes, and other internal configuration
// constants for libyaml.
package libyaml
const (
// The size of the input raw buffer.
input_raw_buffer_size = 512
// The size of the input buffer.
// It should be possible to decode the whole raw buffer.
input_buffer_size = input_raw_buffer_size * 3
// The size of the output buffer.
output_buffer_size = 128
// The size of other stacks and queues.
initial_stack_size = 16
initial_queue_size = 16
initial_string_size = 16
)
// Check if the character at the specified position is an alphabetical
// character, a digit, '_', or '-'.
func isAlpha(b []byte, i int) bool {
return b[i] >= '0' && b[i] <= '9' || b[i] >= 'A' && b[i] <= 'Z' ||
b[i] >= 'a' && b[i] <= 'z' || b[i] == '_' || b[i] == '-'
}
// Check if the character at the specified position is a flow indicator as
// defined by spec production [23] c-flow-indicator ::=
// c-collect-entry | c-sequence-start | c-sequence-end |
// c-mapping-start | c-mapping-end
func isFlowIndicator(b []byte, i int) bool {
return b[i] == '[' || b[i] == ']' ||
b[i] == '{' || b[i] == '}' || b[i] == ','
}
// Check if the character at the specified position is valid for anchor names
// as defined by spec production [102] ns-anchor-char ::= ns-char -
// c-flow-indicator.
// This includes all printable characters except: CR, LF, BOM, space, tab, '[',
// ']', '{', '}', ','.
// We further limit it to ascii chars only, which is a subset of the spec
// production but is usually what most people expect.
func isAnchorChar(b []byte, i int) bool {
if isColon(b, i) {
// [Go] we exclude colons from anchor/alias names.
//
// A colon is a valid anchor character according to the YAML 1.2 specification,
// but it can lead to ambiguity.
// https://github.com/yaml/go-yaml/issues/109
//
// Also, it would have been a breaking change to support it, as go.yaml.in/yaml/v3 ignores it.
// Supporting it could lead to unexpected behavior.
return false
}
return isPrintable(b, i) &&
!isLineBreak(b, i) &&
!isBlank(b, i) &&
!isBOM(b, i) &&
!isFlowIndicator(b, i) &&
isASCII(b, i)
}
// isColon checks whether the character at the specified position is a colon.
func isColon(b []byte, i int) bool {
return b[i] == ':'
}
// Check if the character at the specified position is valid in a tag URI.
//
// The set of valid characters is:
//
// '0'-'9', 'A'-'Z', 'a'-'z', '_', '-', ';', '/', '?', ':', '@', '&',
// '=', '+', '$', '.', '!', '~', '*', '\'', '(', ')', '%'.
//
// If verbatim is true, flow indicators (',', '[', ']', '{', '}') are also
// allowed.
func isTagURIChar(b []byte, i int, verbatim bool) bool {
c := b[i]
// isAlpha covers: 0-9, A-Z, a-z, _, -
if isAlpha(b, i) {
return true
}
// Check special URI characters
switch c {
case ';', '/', '?', ':', '@', '&', '=', '+', '$', '.', '!', '~', '*', '\'', '(', ')', '%':
return true
case ',', '[', ']', '{', '}':
return verbatim
}
return false
}
// Check if the character at the specified position is a digit.
func isDigit(b []byte, i int) bool {
return b[i] >= '0' && b[i] <= '9'
}
// Get the value of a digit.
func asDigit(b []byte, i int) int {
return int(b[i]) - '0'
}
// Check if the character at the specified position is a hex-digit.
func isHex(b []byte, i int) bool {
return b[i] >= '0' && b[i] <= '9' || b[i] >= 'A' && b[i] <= 'F' ||
b[i] >= 'a' && b[i] <= 'f'
}
// Get the value of a hex-digit.
func asHex(b []byte, i int) int {
bi := b[i]
if bi >= 'A' && bi <= 'F' {
return int(bi) - 'A' + 10
}
if bi >= 'a' && bi <= 'f' {
return int(bi) - 'a' + 10
}
return int(bi) - '0'
}
// Check if the character is ASCII.
func isASCII(b []byte, i int) bool {
return b[i] <= 0x7F
}
// Check if the character at the specified position can be printed unescaped.
func isPrintable(b []byte, i int) bool {
return ((b[i] == 0x0A) || // . == #x0A
(b[i] >= 0x20 && b[i] <= 0x7E) || // #x20 <= . <= #x7E
(b[i] == 0xC2 && b[i+1] >= 0xA0) || // #xA0 <= . <= #xD7FF
(b[i] > 0xC2 && b[i] < 0xED) ||
(b[i] == 0xED && b[i+1] < 0xA0) ||
(b[i] == 0xEE) ||
(b[i] == 0xEF && // #xE000 <= . <= #xFFFD
!(b[i+1] == 0xBB && b[i+2] == 0xBF) && // && . != #xFEFF
!(b[i+1] == 0xBF && (b[i+2] == 0xBE || b[i+2] == 0xBF))))
}
// Check if the character at the specified position is NUL.
func isZeroChar(b []byte, i int) bool {
return b[i] == 0x00
}
// Check if the character at the specified position is a BOM (UTF-8 encoded
// #xFEFF).
func isBOM(b []byte, i int) bool {
return b[i] == 0xEF && b[i+1] == 0xBB && b[i+2] == 0xBF
}
// Check if the character at the specified position is space.
func isSpace(b []byte, i int) bool {
return b[i] == ' '
}
// Check if the character at the specified position is tab.
func isTab(b []byte, i int) bool {
return b[i] == '\t'
}
// Check if the character at the specified position is blank (space or tab).
func isBlank(b []byte, i int) bool {
// return isSpace(b, i) || isTab(b, i)
return b[i] == ' ' || b[i] == '\t'
}
// Check if the character at the specified position is a line break.
func isLineBreak(b []byte, i int) bool {
return (b[i] == '\r' || // CR (#xD)
b[i] == '\n' || // LF (#xA)
b[i] == 0xC2 && b[i+1] == 0x85 || // NEL (#x85)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA8 || // LS (#x2028)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA9) // PS (#x2029)
}
// Check if the characters at the specified position are a CR LF pair.
func isCRLF(b []byte, i int) bool {
return b[i] == '\r' && b[i+1] == '\n'
}
// Check if the character is a line break or NUL.
func isBreakOrZero(b []byte, i int) bool {
// return isLineBreak(b, i) || isZeroChar(b, i)
return (
// isLineBreak:
b[i] == '\r' || // CR (#xD)
b[i] == '\n' || // LF (#xA)
b[i] == 0xC2 && b[i+1] == 0x85 || // NEL (#x85)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA8 || // LS (#x2028)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA9 || // PS (#x2029)
// isZeroChar:
b[i] == 0)
}
// Check if the character is a line break, space, or NUL.
func isSpaceOrZero(b []byte, i int) bool {
// return isSpace(b, i) || isBreakOrZero(b, i)
return (
// isSpace:
b[i] == ' ' ||
// isBreakOrZero:
b[i] == '\r' || // CR (#xD)
b[i] == '\n' || // LF (#xA)
b[i] == 0xC2 && b[i+1] == 0x85 || // NEL (#x85)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA8 || // LS (#x2028)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA9 || // PS (#x2029)
b[i] == 0)
}
// Check if the character is a line break, space, tab, or NUL.
func isBlankOrZero(b []byte, i int) bool {
// return isBlank(b, i) || isBreakOrZero(b, i)
return (
// isBlank:
b[i] == ' ' || b[i] == '\t' ||
// isBreakOrZero:
b[i] == '\r' || // CR (#xD)
b[i] == '\n' || // LF (#xA)
b[i] == 0xC2 && b[i+1] == 0x85 || // NEL (#x85)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA8 || // LS (#x2028)
b[i] == 0xE2 && b[i+1] == 0x80 && b[i+2] == 0xA9 || // PS (#x2029)
b[i] == 0)
}
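// Determine the width in bytes of the UTF-8 character that starts with the
// lead byte b. Returns 0 if b is not a valid lead byte (for example, a
// continuation byte).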
func width(b byte) int {
// Don't replace these by a switch without first
// confirming that it is being inlined.
if b&0x80 == 0x00 {
return 1
}
if b&0xE0 == 0xC0 {
return 2
}
if b&0xF0 == 0xE0 {
return 3
}
if b&0xF8 == 0xF0 {
return 4
}
return 0
}
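
/*
Usage sketch (illustrative only, not part of the upstream file). Because width
reports how many bytes a character occupies, a caller can step through a UTF-8
buffer one character at a time. The helper name countChars is hypothetical;
width is copied in unchanged so the sketch compiles on its own.

```go
package main

import "fmt"

// Copied from this file: byte length of the UTF-8 character whose lead byte
// is b, or 0 if b is not a valid lead byte.
func width(b byte) int {
	if b&0x80 == 0x00 {
		return 1
	}
	if b&0xE0 == 0xC0 {
		return 2
	}
	if b&0xF0 == 0xE0 {
		return 3
	}
	if b&0xF8 == 0xF0 {
		return 4
	}
	return 0
}

// countChars (hypothetical) counts the characters in b by jumping from lead
// byte to lead byte. It returns -1 if it meets an invalid lead byte or a
// sequence that runs past the end of the buffer. Continuation bytes are not
// validated; this only demonstrates stepping by width.
func countChars(b []byte) int {
	n := 0
	for i := 0; i < len(b); {
		w := width(b[i])
		if w == 0 || i+w > len(b) {
			return -1
		}
		i += w
		n++
	}
	return n
}

func main() {
	// One 1-byte, one 2-byte and one 3-byte character.
	fmt.Println(countChars([]byte("a\u00e9\u20ac"))) // prints 3
}
```

Treating a zero width as an error matters here: advancing by 0 would otherwise
loop forever on a stray continuation byte.
*/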