
How To Parse Deeply Nested Yaml Data Structures In Python

We have a YAML file which looks somewhat like the following: all: children: allnetxsites: children: netxsites: hosts: bar.:

Solution 1:

Your title suggests some confusion about what is going on, or at least about terminology: although "YAML data structure" might be construed as shorthand for "Python data structure loaded from a YAML document", you do not parse that data structure any further. All parsing happens while the YAML document is loaded and is completely finished before yaml.load() returns. The result of that loading is a data structure in Python, and you "just" need to look up a key in that nested data structure by walking it recursively.
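To make the distinction concrete, here is a minimal sketch of what you are left with after loading. The dict literal is written out by hand (an assumption: it mirrors what yaml.safe_load() would return for the document in the question), so the example needs no YAML library at all.

```python
# Hand-written stand-in for the loaded document (assumption: this is the
# structure yaml.safe_load() produces for the YAML in the question).
data = {
    "all": {
        "children": {
            "allnetxsites": {
                "children": {
                    "netxsites": {
                        "hosts": {
                            "bar.": {
                                "ansible_ssh_host": "bart.j",
                                "domain": "bart.local.domain",
                                "nfs": "lars.local.domain",
                            }
                        }
                    }
                }
            }
        }
    }
}

# No YAML-specific API is involved any more: these are ordinary dict lookups.
sites = data["all"]["children"]["allnetxsites"]["children"]["netxsites"]
host = sites["hosts"]["bar."]

print(type(data).__name__)  # dict
print(host["nfs"])          # lars.local.domain
```

Spelling out the full chain of keys only works when you know the exact path in advance, which is why a recursive walk is the more practical approach.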


Your YAML example is somewhat uninteresting, as it represents only a tiny subset of real YAML: it consists solely of mappings, mapping keys that are scalars, and (plain) scalars that are strings.

To walk over that data structure a simplified version of the recursive function @aaaaaa presented will do:

import sys
import yaml

yaml_str = """\
all:
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain
"""

data = yaml.safe_load(yaml_str)

def find(key, dictionary):
    # everything is a dict
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            for result in find(key, v):
                yield result

for x in find("nfs", data):
    print(x)

which prints the expected:

lars.local.domain

I have simplified the function find because the list handling in the version in the snippet is incorrect.
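A small sketch of why list handling matters. The function names and the sample data below are made up for illustration. The second function passes every list element on as if it were a dict, which is the pattern being criticized; it fails as soon as a list holds scalars.

```python
def find_dicts_only(key, dictionary):
    # Same logic as the simplified find(): only descends into dict values.
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find_dicts_only(key, v)


def find_naive_lists(key, dictionary):
    # Illustrative variant: each list element is treated as a dict
    # without checking its type first.
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find_naive_lists(key, v)
        elif isinstance(v, list):
            for d in v:
                yield from find_naive_lists(key, d)


# Hypothetical data: a list of plain strings next to the key we want.
sample = {"hosts": ["a.local", "b.local"], "nfs": "lars.local.domain"}

print(list(find_dicts_only("nfs", sample)))  # ['lars.local.domain']

try:
    list(find_naive_lists("nfs", sample))
except AttributeError as exc:
    # Strings have no .items(), so the walk aborts before reaching "nfs".
    print("list branch failed:", exc)
```

The dict-only version silently skips lists, which is fine for the YAML in the question but would miss keys nested inside sequences.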

Although the kinds of scalars used do not affect the recursive lookup, you probably want a more generic solution that can handle YAML with (nested) sequences, tagged nodes and complex mapping keys as well.

Assuming your input file to be the slightly more complex input.yaml:

all:
  {a: x}: !xyz
  - [k, l, 0943]
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain

You can use ruamel.yaml (disclaimer: I am the author of that package) to do:

import sys
from pathlib import Path
import ruamel.yaml

in_file = Path('input.yaml')

yaml = ruamel.yaml.YAML()
data = yaml.load(in_file)

def lookup(sk, d, path=[]):
   # look up the values for key sk; yields tuples of (path to the value, value)
   if isinstance(d, dict):
       for k, v in d.items():
           if k == sk:
               yield (path + [k], v)
           for res in lookup(sk, v, path + [k]):
               yield res
   elif isinstance(d, list):
       for item in d:
           for res in lookup(sk, item, path + [item]):
               yield res

for path, value in lookup("nfs", data):
    print(path, '->', value)

which gives:

['all', 'children', 'allnetxsites', 'children', 'netxsites', 'hosts', 'bar.', 'nfs'] -> lars.local.domain
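Having the path is useful because you can follow it to get back to the value, or to the mapping that contains it. A minimal sketch follows; resolve is a hypothetical helper, and the plain dict is a hand-written stand-in for the loaded data (an assumption: the mappings ruamel.yaml returns support the same subscripting). It only works for paths that run through mappings, since the list branch of lookup appends the item itself rather than an index.

```python
from functools import reduce
from operator import getitem

# Hand-written stand-in for the loaded document, reduced to the one
# branch that matters here.
data = {
    "all": {
        "children": {
            "allnetxsites": {
                "children": {
                    "netxsites": {
                        "hosts": {
                            "bar.": {"nfs": "lars.local.domain"},
                        }
                    }
                }
            }
        }
    }
}

path = ["all", "children", "allnetxsites", "children",
        "netxsites", "hosts", "bar.", "nfs"]


def resolve(root, path):
    # Equivalent to root[path[0]][path[1]]...[path[-1]]
    return reduce(getitem, path, root)


print(resolve(data, path))  # lars.local.domain

# Dropping the last element gives the mapping that holds the key,
# which is what you need if you want to update the value in place.
parent = resolve(data, path[:-1])
parent[path[-1]] = "new.local.domain"
print(resolve(data, path))  # new.local.domain
```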

As PyYAML only parses a subset of YAML 1.1 and loads even less of that, it cannot handle the valid YAML in input.yaml.

The abovementioned snippet, the one @aaaaa is using, will break on the loaded YAML because of the (directly) nested sequences/lists.

Solution 2:

Maybe this snippet will be of some help:

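def find(key, dictionary):
    # items() works on Python 3; the original used the Python 2-only iteritems()
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            for result in find(key, v):
                yield result
        elif isinstance(v, list):
            # assumes every list element is itself a dict
            for d in v:
                for result in find(key, d):
                    yield result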

Then your code is equivalent to

find('nfs', data_loaded)
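Note that find is a generator function, so the call above only returns a generator object; nothing is searched until you iterate over it. A short sketch of consuming it, with a hand-written data_loaded standing in for your loaded file (an assumption, reduced to a single branch):

```python
def find(key, dictionary):
    # Python 3 version of the snippet, dict branch only for brevity.
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find(key, v)


# Hypothetical stand-in for the data loaded from the YAML file.
data_loaded = {"all": {"hosts": {"bar.": {"nfs": "lars.local.domain"}}}}

# Collect every match into a list ...
all_matches = list(find("nfs", data_loaded))
print(all_matches)  # ['lars.local.domain']

# ... or take only the first match, with a default if the key is absent.
first = next(find("nfs", data_loaded), None)
missing = next(find("no_such_key", data_loaded), None)
print(first)    # lars.local.domain
print(missing)  # None
```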
