[comment {-*- tcl -*-}]
[manpage_begin struct::tree n 2.1.1]
[keywords breadth-first]
[keywords depth-first]
[keywords in-order]
[keywords node]
[keywords post-order]
[keywords pre-order]
[keywords serialization]
[keywords tree]
[copyright {2002-2004,2012 Andreas Kupries <andreas_kupries@users.sourceforge.net>}]
[moddesc   {Tcl Data Structures}]
[titledesc {Create and manipulate tree objects}]
[category  {Data structures}]
[require Tcl 8.2]
[require struct::tree [opt 2.1.1]]
[require struct::list [opt 1.5]]
[description]
[para]

A tree is a collection of named elements, called nodes, one of which is
distinguished as a root, along with a relation ("parenthood") that
places a hierarchical structure on the nodes. (Data Structures and
Algorithms; Aho, Hopcroft and Ullman; Addison-Wesley, 1987).  In
addition to maintaining the node relationships, this tree
implementation allows any number of keyed values to be associated with
each node.

[para]

The element names can be arbitrary strings.

[para][comment {This comparison (C) 2007 Lars Bergstrom, Bug 1687902}]

A tree is thus similar to an array, but with three important
differences:

[list_begin enumerated]
[enum] Trees are accessed through an object command, whereas arrays are
accessed as variables. (This means trees cannot be local to a procedure.)

[enum] Trees have a hierarchical structure, whereas an array is just an
unordered collection.

[enum] Each node of a tree has a separate collection of attributes and
values. This is like an array where every value is a dictionary.

[list_end]

[para]

[emph Note:] The major version of the package [package struct] has
been changed to version 2.0, due to backward incompatible changes in
the API of this module. Please read the section

[sectref {Changes for 2.0}] for a full list of all changes,
incompatible and otherwise.

[para]

[section API]
[subsection {Tree CLASS API}]

The main commands of the package are:

[list_begin definitions]

[call [cmd ::struct::tree] [opt [arg treeName]] \
	[opt "[const =]|[const :=]|[const as]|[const deserialize] [arg source]"]]

The command creates a new tree object with an associated global Tcl
command whose name is [arg treeName]. This command may be used to
invoke various operations on the tree.

It has the following general form:

[list_begin definitions]
[call [cmd treeName] [method option] [opt [arg "arg arg ..."]]]

[arg Option] and the [arg arg]s determine the exact behavior of the
command.

[list_end]
[para]

If [arg treeName] is not specified a unique name will be generated by
the package itself. If a [arg source] is specified the new tree will
be initialized to it. For the operators [const =], [const :=], and
[const as] [arg source] is interpreted as the name of another tree
object, and the assignment operator [method =] will be executed. For
[const deserialize] the [arg source] is a serialized tree object and
[method deserialize] will be executed.

[para]

In other words
[para]
[example {
    ::struct::tree mytree = b
}]
[para]
is equivalent to
[para]
[example {
    ::struct::tree mytree
    mytree = b
}]
[para]
and
[para]
[example {
    ::struct::tree mytree deserialize $b
}]
[para]
is equivalent to
[para]
[example {
    ::struct::tree mytree
    mytree deserialize $b
}]

[call [cmd ::struct::tree::prune]]

This command is provided outside of the tree methods, as it is not a
tree method per se. It however interacts tightly with the method
[method walk]. When used in the walk script it causes the traversal to
ignore the children of the node we are currently at.

This command cannot be used with the traversal modes which look at
children before their parent, i.e. [const post] and [const in]. The
only applicable orders of traversal are [const pre] and
[const both]. An error is thrown if the command and chosen order of
traversal do not fit.

[list_end]
[para]

[subsection {Tree OBJECT API}]

Two general observations beforehand:

[list_begin enumerated]
[enum]

The root node of the tree can be used in most places where a node is
asked for. The default name of the rootnode is "root", but this can be
changed with the method [method rename] (see below). Whatever the
current name for the root node of the tree is, it can be retrieved by
calling the method [method rootname].

[enum]

The method [method insert] is the only way to create new nodes, and
they are automatically added to a parent. A tree object cannot have
nodes without a parent, save the root node.

[list_end]
[para]

And now the methods supported by tree objects created by this package:

[list_begin definitions]

[call [arg treeName] [method =] [arg sourcetree]]

This is the assignment operator for tree objects. It copies the tree
contained in the tree object [arg sourcetree] over the tree data in
[arg treeName]. The old contents of [arg treeName] are deleted by this
operation.

[para]

This operation is in effect equivalent to
[para]
[example_begin]
    [arg treeName] [method deserialize] [lb][arg sourcetree] [method serialize][rb]
[example_end]

[call [arg treeName] [method -->] [arg desttree]]

This is the reverse assignment operator for tree objects. It copies the tree
contained in the tree object [arg treeName] over the tree data in the object
[arg desttree]. The old contents of [arg desttree] are deleted by this
operation.

[para]

This operation is in effect equivalent to
[para]
[example_begin]
    [arg desttree] [method deserialize] [lb][arg treeName] [method serialize][rb]
[example_end]

[call [arg treeName] [method ancestors] [arg node]]

This method extends the method [method parent] and returns a list
containing all ancestor nodes to the specified [arg node]. The
immediate ancestor, in other words, parent node, is the first element
in that list, its parent the second element, and so on until the root
node is reached, making it the last element of the returned list.

[call [arg treeName] [method append] [arg node] [arg key] [arg value]]

Appends a [arg value] to one of the keyed values associated with an
node. Returns the new value given to the attribute [arg key].

[call [arg treeName] [method attr] [arg key]]
[call [arg treeName] [method attr] [arg key] [option -nodes] [arg list]]
[call [arg treeName] [method attr] [arg key] [option -glob] [arg globpattern]]
[call [arg treeName] [method attr] [arg key] [option -regexp] [arg repattern]]

This method retrieves the value of the attribute named [arg key], for
all nodes in the tree (matching the restriction specified via one of
the possible options) and having the specified attribute.

[para]

The result is a dictionary mapping from node names to the value of
attribute [arg key] at that node.

Nodes not having the attribute [arg key], or not passing a
specified restriction, are not listed in the result.

[para]

The possible restrictions are:

[list_begin options]
[opt_def -nodes]

The value is a list of nodes. Only the nodes mentioned in this list
are searched for the attribute.

[opt_def -glob]

The value is a glob pattern. Only the nodes in the tree whose names
match this pattern are searched for the attribute.

[opt_def -regexp]

The value is a regular expression. Only the nodes in the tree whose
names match this pattern are searched for the attribute.

[list_end]
[para]

[call [arg treeName] [method children] [opt [option -all]] [arg node] [opt "[const filter] [arg cmdprefix]"]]

Return a list of the children of [arg node].

If the option [option -all] is specified, then not only the direct
children, but their children, and so on are returned in the result.

If a filter command is specified only those nodes are listed in the
final result which pass the test. The command in [arg cmdprefix] is
called with two arguments, the name of the tree object, and the name
of the node in question. It is executed in the context of the caller
and has to return a boolean value. Nodes for which the command returns
[const false] are removed from the result list before it is returned
to the caller.

[para]
Some examples:
[para]
[example {
    mytree insert root end 0 ; mytree set 0 volume 30
    mytree insert root end 1
    mytree insert root end 2
    mytree insert 0    end 3
    mytree insert 0    end 4
    mytree insert 4    end 5 ; mytree set 5 volume 50
    mytree insert 4    end 6

    proc vol {t n} {
	$t keyexists $n volume
    }
    proc vgt40 {t n} {
	if {![$t keyexists $n volume]} {return 0}
	expr {[$t get $n volume] > 40}
    }

    tclsh> lsort [mytree children -all root filter vol]
    0 5

    tclsh> lsort [mytree children -all root filter vgt40]
    5

    tclsh> lsort [mytree children root filter vol]
    0

    tclsh> puts ([lsort [mytree children root filter vgt40]])
    ()
}]

[call [arg treeName] [method cut] [arg node]]

Removes the node specified by [arg node] from the tree, but not its
children.  The children of [arg node] are made children of the parent
of the [arg node], at the index at which [arg node] was located.

[call [arg treeName] [method delete] [arg node] [opt "[arg node] ..."]]

Removes the specified nodes from the tree.  All of the nodes' children
will be removed as well to prevent orphaned nodes.

[call [arg treeName] [method depth] [arg node]]

Return the number of steps from node [arg node] to the root node.

[call [arg treeName] [method descendants] [arg node] [opt "[const filter] [arg cmdprefix]"]]

This method extends the method [method children] and returns a list
containing all nodes descending from [arg node], and passing the
filter, if such was specified.

[para]

This is actually the same as
"[arg treeName] [method children] [option -all]".
[method descendants] should be prefered, and "children -all"
will be deprecated sometime in the future.

[call [arg treeName] [method deserialize] [arg serialization]]

This is the complement to [method serialize]. It replaces tree data in
[arg treeName] with the tree described by the [arg serialization]
value. The old contents of [arg treeName] are deleted by this
operation.

[call [arg treeName] [method destroy]]

Destroy the tree, including its storage space and associated command.

[call [arg treeName] [method exists] [arg node]]

Returns true if the specified node exists in the tree.

[call [arg treeName] [method get] [arg node] [arg key]]

Returns the value associated with the key [arg key] for the node
[arg node].

[call [arg treeName] [method getall] [arg node] [opt [arg pattern]]]

Returns a dictionary (suitable for use with [lb][cmd {array set}][rb])
containing the attribute data for the [arg node].

If the glob [arg pattern] is specified only the attributes whose names
match the pattern will be part of the dictionary.

[call [arg treeName] [method keys] [arg node] [opt [arg pattern]]]

Returns a list of keys for the [arg node].

If the [arg pattern] is specified only the attributes whose names
match the pattern will be part of the returned list. The pattern is a
[cmd glob] pattern.

[call [arg treeName] [method keyexists] [arg node] [arg key]]

Return true if the specified [arg key] exists for the [arg node].

[call [arg treeName] [method index] [arg node]]

Returns the index of [arg node] in its parent's list of children.  For
example, if a node has [term nodeFoo], [term nodeBar], and

[term nodeBaz] as children, in that order, the index of

[term nodeBar] is 1.

[call [arg treeName] [method insert] [arg parent] [arg index] [opt "[arg child] [opt "[arg child] ..."]"]]

Insert one or more nodes into the tree as children of the node

[arg parent]. The nodes will be added in the order they are given. If
[arg parent] is [const root], it refers to the root of the tree. The
new nodes will be added to the [arg parent] node's child list at the
index given by [arg index]. The [arg index] can be [const end] in
which case the new nodes will be added after the current last child.
Indices of the form "end-[var n]" are accepted as well.

[para]

If any of the specified children already exist in [arg treeName],
those nodes will be moved from their original location to the new
location indicated by this command.

[para]

If no [arg child] is specified, a single node will be added, and a
name will be generated for the new node. The generated name is of the
form [emph node][var x], where [var x] is a number. If names are
specified they must neither contain whitespace nor colons (":").

[para]

The return result from this command is a list of nodes added.

[call [arg treeName] [method isleaf] [arg node]]

Returns true if [arg node] is a leaf of the tree (if [arg node] has no
children), false otherwise.

[call [arg treeName] [method lappend] [arg node] [arg key] [arg value]]

Appends a [arg value] (as a list) to one of the keyed values
associated with an [arg node]. Returns the new value given to the
attribute [arg key].

[call [arg treeName] [method leaves]]

Return a list containing all leaf nodes known to the tree.

[call [arg treeName] [method move] [arg parent] [arg index] [arg node] [opt "[arg node] ..."]]

Make the specified nodes children of [arg parent], inserting them into
the parent's child list at the index given by [arg index]. Note that
the command will take all nodes out of the tree before inserting them
under the new parent, and that it determines the position to place
them into after the removal, before the re-insertion. This behaviour
is important when it comes to moving one or more nodes to a different
index without changing their parent node.

[call [arg treeName] [method next] [arg node] ]

Return the right sibling of [arg node], or the empty string if

[arg node] was the last child of its parent.

[call [arg treeName] [method numchildren] [arg node]]

Return the number of immediate children of [arg node].

[call [arg treeName] [method nodes]]

Return a list containing all nodes known to the tree.

[call [arg treeName] [method parent] [arg node]]

Return the parent of [arg node].

[call [arg treeName] [method previous] [arg node] ]

Return the left sibling of [arg node], or the empty string if

[arg node] was the first child of its parent.

[call [arg treeName] [method rename] [arg node] [arg newname]]

Renames the node [arg node] to [arg newname]. An error is thrown if
either the node does not exist, or a node with name [arg newname] does
exist. The result of the command is the new name of the node.

[call [arg treeName] [method rootname]]

Returns the name of the root node of the tree.

[call [arg treeName] [method serialize] [opt [arg node]]]

This method serializes the sub-tree starting at [arg node]. In other
words it returns a tcl [emph value] completely describing the tree
starting at [arg node].

This allows, for example, the transfer of tree objects (or parts
thereof) over arbitrary channels, persistence, etc.

This method is also the basis for both the copy constructor and
the assignment operator.

[para]

The result of this method has to be semantically identical over all
implementations of the tree interface. This is what will enable us to
copy tree data between different implementations of the same
interface.

[para]

The result is a list containing containing a multiple of three
elements. It is like a serialized array except that there are two
values following each key. They are the names of the nodes in the
serialized tree. The two values are a reference to the parent node and
the attribute data, in this order.

[para]

The reference to the parent node is the empty string for the root node
of the tree. For all other nodes it is the index of the parent node in
the list. This means that they are integers, greater than or equal to
zero, less than the length of the list, and multiples of three.

The order of the nodes in the list is important insofar as it is used
to reconstruct the lists of children for each node. The children of a
node have to be listed in the serialization in the same order as they
are listed in their parent in the tree.

[para]

The attribute data of a node is a dictionary, i.e. a list of even
length containing a serialized array. For a node without attribute
data the dictionary is the empty list.

[para]

[emph Note:] While the current implementation returns the root node as
the first element of the list, followed by its children and their
children in a depth-first traversal this is not necessarily true for
other implementations.

The only information a reader of the serialized data can rely on for
the structure of the tree is that the root node is signaled by the
empty string for the parent reference, that all other nodes refer to
their parent through the index in the list, and that children occur in
the same order as in their parent.

[para]
[example {
 A possible serialization for the tree structure

             +- d
       +- a -+
 root -+- b  +- e
       +- c
 is

 {root {} {} a 0 {} d 3 {} e 3 {} b 0 {} c 0 {}}

 The above assumes that none of the nodes have attributes.
}]

[call [arg treeName] [method set] [arg node] [arg key] [opt [arg value]]]

Set or get one of the keyed values associated with a node. A node may
have any number of keyed values associated with it.  If [arg value] is
not specified, this command returns the current value assigned to the
key; if [arg value] is specified, this command assigns that value to
the key, and returns it.

[call [arg treeName] [method size] [opt [arg node]]]

Return a count of the number of descendants of the node [arg node]; if
no node is specified, [const root] is assumed.

[call [arg treeName] [method splice] [arg parent] [arg from] [opt [arg to]] [opt [arg child]]]

Insert a node named [arg child] into the tree as a child of the node
[arg parent]. If [arg parent] is [const root], it refers to the root
of the tree. The new node will be added to the parent node's child
list at the index given by [arg from].  The children of [arg parent]
which are in the range of the indices [arg from] and [arg to] are made
children of [arg child].  If the value of [arg to] is not specified it
defaults to [const end].  If no name is given for [arg child], a name
will be generated for the new node.  The generated name is of the form
[emph node][var x], where [var x] is a number.  The return result
from this command is the name of the new node.

[para]

The arguments [arg from] and [arg to] are regular list indices, i.e.
the form "end-[var n]" is accepted as well.

[call [arg treeName] [method swap] [arg node1] [arg node2]]

Swap the position of [arg node1] and [arg node2] in the tree.

[call [arg treeName] [method unset] [arg node] [arg key]]

Removes a keyed value from the node [arg node]. The method will do
nothing if the [arg key] does not exist.

[call [arg treeName] [method walk] [arg node] [opt "[option -order] [arg order]"] [opt "[option -type] [arg type]"] [arg loopvar] [arg script]]

Perform a breadth-first or depth-first walk of the tree starting at
the node [arg node].  The type of walk, breadth-first or depth-first,
is determined by the value of [arg type]; [const bfs] indicates
breadth-first, [const dfs] indicates depth-first.  Depth-first is the
default. The order of the walk, pre-, post-, both- or in-order is
determined by the value of [arg order]; [const pre] indicates
pre-order, [const post] indicates post-order, [const both] indicates
both-order and [const in] indicates in-order. Pre-order is the
default.

[para]

Pre-order walking means that a parent node is visited before any of
its children.  For example, a breadth-first search starting from the
root will visit the root, followed by all of the root's children,
followed by all of the root's grandchildren. Post-order walking means
that a parent node is visited after any of its children. Both-order
walking means that a parent node is visited before [emph and] after
any of its children. In-order walking means that a parent node is
visited after its first child and before the second. This is a
generalization of in-order walking for binary trees and will do the
right thing if a binary tree is walked. The combination of a breadth-first
walk with in-order is illegal.

[para]

As the walk progresses, the [arg script] will be evaluated at each
node. The evaluation takes place in the context of the caller of the
method.

Regarding loop variables, these are listed in [arg loopvar]. If one
only one variable is specified it will be set to the id of the
node. When two variables are specified, i.e. [arg loopvar] is a true
list, then the first variable will be set to the action performed at
the node, and the other to the id of the node itself.

All loop variables are created in the context of the caller.

[para]

There are three possible actions: [const enter], [const leave],
or [const visit].  [const enter] actions occur during pre-order
walks; [const leave] actions occur during post-order walks;

[const visit] actions occur during in-order walks.  In a both-order
walk, the command will be evaluated twice for each node; the action is
[const enter] for the first evaluation, and [const leave] for the
second.

[para]

[emph Note]: The [const enter] action for a node is always performed
before the walker will look at the children of that node. This means
that changes made by the [arg script] to the children of the node
will immediately influence the walker and the steps it will take.

[para]

Any other manipulation, for example of nodes higher in the tree (i.e
already visited), or upon leaving will have undefined results. They
may succeed, error out, silently compute the wrong result, or anything
in between.

[para]

At last a small table showing the relationship between the various
options and the possible actions.

[para]
[example {
 order       type    actions         notes
 -----       ----    -----           -----
 pre         dfs     enter           parent before children
 post        dfs     leave           parent after children
 in          dfs     visit           parent between first and second child.
 both        dfs     enter, leave    parent before and after children
 -----       ----    -----           -----
 pre         bfs     enter           parent before children
 post        bfs     leave           parent after children
 in          bfs             -- illegal --
 both        bfs     enter, leave    parent before and after children
 -----       ----    -----           -----
}]

[para]

Note the command [cmd ::struct::tree::prune]. This command can be used
in the walk script to force the command to ignore the children of the
node we are currently at. It will throw an error if the order of
traversal is either [const post] or [const in] as these modes visit
the children before their parent, making pruning non-sensical.

[call [arg treeName] [method walkproc] [arg node] [opt "[option -order] [arg order]"] [opt "[option -type] [arg type]"] [arg cmdprefix]]

This method is like method [method walk] in all essentials, except the
interface to the user code. This method invokes a command prefix with
three additional arguments (tree, node, and action), instead of
evaluating a script and passing the node via a loop variable.

[list_end]

[subsection {Changes for 2.0}]

The following noteworthy changes have occurred:

[list_begin enumerated]
[enum]

The API for accessing attributes and their values has been
simplified.

[para]

All functionality regarding the default attribute "data" has been
removed. This default attribute does not exist anymore. All accesses
to attributes have to specify the name of the attribute in
question. This backward [emph incompatible] change allowed us to
simplify the signature of all methods handling attributes.

[para]

Especially the flag [option -key] is not required anymore, even more,
its use is now forbidden. Please read the documentation for the
methods [method set], [method get], [method getall], [method unset],
[method append], [method lappend], [method keyexists]

and [method keys] for a description of the new API's.

[enum]

The methods [method keys] and [method getall] now take an optional
pattern argument and will return only attribute data for keys matching
this pattern.

[enum]

Nodes can now be renamed. See the documentation for the method
[method rename].

[enum]

The structure has been extended with API's for the serialization and
deserialization of tree objects, and a number of operations based on
them (tree assignment, copy construction).

[para]

Please read the documentation for the methods [method serialize],
[method deserialize], [method =], and [method -->], and the
documentation on the construction of tree objects.

[para]

Beyond the copying of whole tree objects these new API's also enable
the transfer of tree objects over arbitrary channels and for easy
persistence.

[enum]

The walker API has been streamlined and made more similar to the
command [cmd foreach]. In detail:

[list_begin itemized]

[item]

The superfluous option [option -command] has been removed.

[item]

Ditto for the place holders. Instead of the placeholders two loop
variables have to be specified to contain node and action information.

[item]

The old command argument has been documented as a script now, which it
was in the past too.

[item]

The fact that [const enter] actions are called before the walker looks
at the children of a node has been documented now. In other words it
is now officially allowed to manipulate the list of children for a
node under [emph these] circumstances. It has been made clear that
changes under any other circumstances will have undefined results,
from silently computing the wrong result to erroring out.

[list_end]

[enum]

A new method, [method attr], was added allowing the query and
retrieval of attribute data without regard to the node relationship.

[enum]

The method [method children] has been extended with the ability to
select from the children of the node based on an arbitrary filtering
criterium. Another extension is the ability to look not only at the
immediate children of the node, but the whole tree below it.

[list_end]

[section EXAMPLES]

The following example demonstrates the creation of new nodes:

[example {
    mytree insert root end 0   ; # Create node 0, as child of the root
    mytree insert root end 1 2 ; # Ditto nodes 1 & 2
    mytree insert 0    end 3   ; # Now create node 3 as child of node 0
    mytree insert 0    end     ; # Create another child of 0, with a
    #                              generated name. The name is returned
    #                              as the result of the command.
}]

[vset CATEGORY {struct :: tree}]
[include ../common-text/feedback.inc]
[manpage_end]