How To Parse Deeply Nested Yaml Data Structures In Python
Solution 1:
Your title makes it look like you are a bit confused about what is
going on, or at least about terminology: although "YAML data
structure" might be construed as shorthand for "Python data structure
loaded from a YAML document", you do not further parse that data
structure. Any parsing is done as part of the loading of the YAML
document and parsing is completely finished even before yaml.load()
returns. As a result of that loading you have a data structure in Python, and
you "just" need to look up a key in a nested Python data structure by
recursively walking that data structure.
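To make that concrete, here is a minimal sketch of what you are left with after loading. The literal dict is an assumption: it stands in for what a YAML loader would return for a small document made only of mappings and string scalars, and the key names are sample data. No YAML library is involved, which is the point: once loading is done, only ordinary Python indexing remains.

```python
# Stand-in for the value a YAML loader returns for a document made of
# nested mappings with string scalars (hypothetical sample data).
loaded = {
    "all": {
        "children": {
            "site": {
                "hosts": {"bar.": {"nfs": "lars.local.domain"}},
            },
        },
    },
}

# The result is built from plain Python types, nothing YAML-specific.
print(type(loaded).__name__)

# If the full path is known, ordinary indexing is enough.
nfs = loaded["all"]["children"]["site"]["hosts"]["bar."]["nfs"]
print(nfs)
```

A recursive walk is only needed when you do not know where in the structure the key sits.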
Your YAML example is somewhat uninteresting, as it represents only a tiny subset of real YAML: it consists only of mappings, mapping keys that are scalars, and (plain) scalars that are strings.
To walk over that data structure a simplified version of the recursive function @aaaaaa presented will do:
import yaml

yaml_str = """\
all:
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain
"""

data = yaml.safe_load(yaml_str)

def find(key, dictionary):
    # everything is a dict
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find(key, v)

for x in find("nfs", data):
    print(x)
which prints the expected:
lars.local.domain
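I have simplified the function find because the list handling in the version in the snippet is incorrect: it passes every list item on to find as if it were a dict, so a list item that is a string or another list raises an AttributeError.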
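That failure mode can be reproduced in a few lines. The sketch below is a Python 3 rendering of the snippet's approach; the name find_with_lists and the sample data are made up for illustration, and a plain dict stands in for loaded YAML.

```python
def find_with_lists(key, dictionary):
    # Python 3 rendering of the snippet's approach: every list item is
    # passed on as if it were a dict.
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find_with_lists(key, v)
        elif isinstance(v, list):
            for d in v:
                yield from find_with_lists(key, d)


# Hypothetical data: a list whose items are strings, not dicts.
sample = {"hosts": ["alpha", "beta"], "nfs": "lars.local.domain"}

try:
    results = list(find_with_lists("nfs", sample))
except AttributeError as exc:
    # Raised when .items() is called on the string "alpha".
    results = None
    error = str(exc)
    print("failed:", error)
```

The search never reaches the nfs key, because the exception is raised while the hosts list is being walked.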
Although the kinds of scalars used do not affect the recursive lookup, you probably want a more generic solution that can handle YAML with (nested) sequences, tagged nodes and complex mapping keys as well.
Assuming your input file to be the slightly more complex input.yaml:

all:
  {a: x}: !xyz
  - [k, l, 0943]
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain

You can use ruamel.yaml (disclaimer: I am the author of that package) to do:
from pathlib import Path
import ruamel.yaml

in_file = Path('input.yaml')

yaml = ruamel.yaml.YAML()
data = yaml.load(in_file)

def lookup(sk, d, path=[]):
    # look up the values for key sk; yields (path to the value, value) tuples
    # (the default list is never mutated: path + [k] always builds a new list)
    if isinstance(d, dict):
        for k, v in d.items():
            if k == sk:
                yield (path + [k], v)
            yield from lookup(sk, v, path + [k])
    elif isinstance(d, list):
        for item in d:
            yield from lookup(sk, item, path + [item])

for path, value in lookup("nfs", data):
    print(path, '->', value)
which gives:
['all', 'children', 'allnetxsites', 'children', 'netxsites', 'hosts', 'bar.', 'nfs'] -> lars.local.domain
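One thing to be aware of: for list items, lookup puts the item itself into the path, so the path cannot always be replayed against the data. If you want paths that can be followed back to the value, a variant along these lines might help. This is only a sketch and not part of ruamel.yaml: lookup_indexed, get_by_path and the sample data are hypothetical, and plain dicts and lists stand in for a loaded document.

```python
def lookup_indexed(sk, d, path=()):
    # Variant of the lookup idea: list positions are recorded as
    # integer indices so the path can be replayed later.
    if isinstance(d, dict):
        for k, v in d.items():
            if k == sk:
                yield path + (k,), v
            yield from lookup_indexed(sk, v, path + (k,))
    elif isinstance(d, list):
        for i, item in enumerate(d):
            yield from lookup_indexed(sk, item, path + (i,))


def get_by_path(d, path):
    # Follow keys and indices one step at a time.
    for step in path:
        d = d[step]
    return d


# Hypothetical data, standing in for a loaded YAML document.
data = {
    "all": [
        {"name": "a", "nfs": "one"},
        {"name": "b", "sub": {"nfs": "two"}},
    ]
}

hits = list(lookup_indexed("nfs", data))
for path, value in hits:
    print(path, "->", value)
    # Each path leads back to the value it was reported with.
    assert get_by_path(data, path) == value
```

Tuples are used for the path here so that the default argument is immutable; that is a design choice of this sketch, not something the original function requires.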
As PyYAML only parses a subset of YAML 1.1 and loads even less of
that, it cannot handle the valid YAML in input.yaml.
The abovementioned snippet, the one @aaaaa is using, will break on the loaded YAML because of the (directly) nested sequences/lists.
Solution 2:
Maybe this snippet would provide you some help:

def find(key, dictionary):
    # Python 3: use .items() (the Python 2 spelling was .iteritems())
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find(key, v)
        elif isinstance(v, list):
            # note: assumes every list item is a dict
            for d in v:
                yield from find(key, d)

Then your code is equivalent to:

find('nfs', data_loaded)
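Keep in mind that find is a generator function, so the call on its own only creates a generator and does not search anything yet. A small sketch of how the results might be consumed follows; it uses a dict-only version of find so that it is self-contained, and the contents of data_loaded are hypothetical.

```python
def find(key, dictionary):
    # Dict-only version of the recursive search, enough for this demo.
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find(key, v)


# Hypothetical stand-in for data_loaded.
data_loaded = {"all": {"hosts": {"bar.": {"nfs": "lars.local.domain"}}}}

gen = find("nfs", data_loaded)      # nothing has been searched yet
all_matches = list(gen)             # every match, in traversal order

# First match only, with None as the fallback when the key is absent.
first = next(find("nfs", data_loaded), None)
missing = next(find("no-such-key", data_loaded), None)

print(all_matches, first, missing)
```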