How to Use Pygplib Module¶
This document provides how-to instructions for a Python library for constructing, manipulating, and encoding graph properties expressible with first-order logic of graphs. We mean by graphs graphs with no directed edge, no multiple edge, no isolated edge, at most one isolated vertex, and no loop. See First-Order Logic of Graphs.
Parsing First-Order Formula¶
The pygplib mostly supports the format of untyped first-order formulas in
the TPTP
format
for automated theorem proving.
The format is detailed in Format of First-Order Formula.
The expression of a first-order formula can be parsed and
constructed to a formula object by Fog.read().
from pygplib import Fog
f = Fog.read("(~ edg(x1,x2)) & (~ edg(x1,x3)) & (~ edg(x2,x3))")
A formula object can be converted back to string by op.to_str().
from pygplib import op
assert op.to_str(f) == "(((~ edg(x1, x2)) & (~ edg(x1, x3))) & (~ edg(x2, x3)))"
ff = Fog.read(op.to_str(f))
assert f == ff
Clearly the resulted string is logically equivalent to the one just read, and
the assertion holds in this case, however it should be noted that
a formula object simply represents its syntax
and f == ff does not mean asserting the logical equivalence.
The above assertion holds simply because the precedence of & is
interpreted as being left associative.
Hence, the following formula fff, explicitly specified
as being right associative, is determined to be a different formula from
f (and ff).
fff = Fog.read("((~ edg(x1, x2)) & ((~ edg(x1, x3)) & (~ edg(x2, x3))))")
assert f != fff
Note: when the class variable partitioning_order of Prop class and that
of Fog class are set True, binary operations for objects of these classes
with equal priority are associated in such a way that objects are recursively
partitioned into the left and the right halves.
Fog.partitioning_order = False
f = Fog.read("T & T & T & T")
assert op.to_str(f) == "(((T & T) & T) & T)"
Fog.partitioning_order = True
f = Fog.read("T & T & T & T")
assert op.to_str(f) == "((T & T) & (T & T))"
Fog.partitioning_order = False
Name and Index of Symbol¶
As soon as an expression is parsed, all variable symbols as well as constant symbols appearing in the formula will be registered to NameManager class and unique indices are assigned.
from pygplib import NameMgr
v = NameMgr.lookup_index("x1")
name = NameMgr.lookup_name(v)
assert "x1" == name
Reset NameManager, if necessary, by NameMgr.clear(), which deletes all
registered names and indices.
NameMgr.clear()
assert not NameMgr.has_index("x1")
v = NameMgr.lookup_index("x1") # new index is issued for the first time.
assert NameMgr.has_index("x1")
assert NameMgr.has_name(v)
NameMgr.clear()
assert not NameMgr.has_name(v)
assert not NameMgr.has_index("x1")
The name of a variable symbol should begin with a lowercase letter, followed by zero or more lower case letters, digits, or underscore, while the name of a constant symbol should begin with an uppercase letter, followed by zero or more uppercase letters, digits, or underscore.
NameMgr.clear()
v = NameMgr.lookup_index("x1")
w = NameMgr.lookup_index("V1")
assert NameMgr.is_variable(v)
assert NameMgr.is_constant(w)
Constructing First-Order Formula¶
An arbitrary well-formed formula can be constructed with built-in operators.
Basic Operators¶
v = NameMgr.lookup_index("x")
w = NameMgr.lookup_index("y")
f = Fog.edg(v,w)
assert op.to_str(f) == "edg(x, y)"
g = Fog.neg(Fog.eq(v,w))
assert op.to_str(g) == "(~ x = y)"
h = Fog.implies(f,g)
assert op.to_str(h) == "(edg(x, y) -> (~ x = y))"
Fog.true_const()returns the true constant,T.Fog.false_const()returns the false constantF.Fog.neg(f)returns the negation off.Fog.land(f,g)returns the AND offandg.Fog.lor(f,g)returns the OR offandg.Fog.implies(f,g)returns the implication fromftog.Fog.iff(f,g)returns the equivalence betweenfandg.Fog.forall(f,v)returns the formula in which all free occurrences of the name ofvare universally quantified.Fog.exists(f,v)returns the formula in which all free occurrences of the name ofvare existentially quantified.Fog.eq(v,w)returns the formula written asx=yin string, wherexandyare the names ofvandw.Fog.edg(v,w)returns the formula written asedg(x,y)in string, wherexandyare the names ofvandw.
Utility Functions and Advanced Operations¶
Some utility functions and advanced operations are listed.
to_str(f)returns the string representation of formula objectf.print_formula(f, stream=out, fmt=type)prints out formula object in stream (stdout if not given) in human-readable format (fmt=“str”) or DOT format (fmt=“dot”).reduce_formula(f)returns the reduced result obtained fromfby applying equivalent transformations.get_free_vars(f)returns a tuple of the indices of all free variables off.get_free_vars_and_consts(f)returns a tuple of not only the indices of free variables but also the indices of constants.perform_boolean_encoding(f, st)returns a propositional formula that is equivalent to first-order formulafin graph structurest. Note: since this method performs quantifier elimination, it would take much time and space if a formula contains quantifiers and a graph is large.
The formula written as edg(x, x) is unsatisfiable as we assume that a graph has no loop.
The formula is evaluated to false by op.reduce_formula() just like
eval() does in arithmetic expression.
f = Fog.read("edg(x, x)")
assert f != Fog.false_const()
assert op.reduce_formula(f) == Fog.false_const()
However, op.reduce_formula() only performs a few equivalent transformations
and the resulted formula not always becomes irreducible, as demonstrated
in the following code block.
f = Fog.read("edg(x, y) -> (~ x = y)")
for v in op.get_free_vars(f):
f = Fog.forall(f,v)
assert op.to_str(f) == "(! [y] : (! [x] : (edg(x, y) -> (~ x = y))))"
assert op.reduce_formula(f) != Fog.true_const()
# f is logically equivalent to true, but it is not reduced upto true.
A formula can be printed out in DOT format, allowing us to visualize.
from pygplib import Fog
from pygplib import op
f = Fog.read("! [w] : (w=x1 | w=x2 | w=x3 | edg(w,x1) | edg(w,x2) | edg(w,x3))")
with open("f.dot","w") as out:
op.print_formula(f,stream=out,fmt="dot")
$ dot -Tpng f.dot -o f.png
The above image depicts the data structure of a first-order formula. The
whole formula consists of objects of Fog class with the root node f.
Creating Graph Structure¶
In order to encode a first-order formula, it is necessary to create a
graph structure. A graph structure is an
object of GrSt class, which manages domain of discourse as well as
the interpretation of relation symbols over it.
Moreover GrSt class manages the encoding and decoding
between first-order variables and CNF variables.
Currently there are different ways for the initialization of GrSt objects,
depending on the types of domain encoding: edge encoding,
clique encoding, direct encoding, and log encoding.
The first example is edge-encoding. As commented, each vertex is assigned a binary code (a row vector) of the matrix, which is a vertex-edge incidence matrix.
from pygplib import GrSt
from pygplib import NameMgr
# V1 --- V2
# \ /
# \ /
# V3
# / \
# / \
# V4---V5
vertex_list = [1,2,3,4,5]
edge_list = [(1,2),(1,3),(2,3),(3,4),(3,5),(4,5)]
#
# V1 |1 1 0 0 0 0|
# V2 |1 0 1 0 0 0|
# V3 |0 1 1 1 1 0|
# V4 |0 0 0 1 0 1|
# V5 |0 0 0 0 1 1|
st = GrSt(vertex_list, edge_list, encoding="edge", prefix="V")
assert NameMgr.lookup_name(st.vertex_to_object(vertex_list[1])) == "V2"
As above, vertex_to_object() converts a vertex into a constant symbol
index. When a graph structure is initialized, the constant symbols for all vertices are
registered to NameMgr so that the name of a constant symbol begins with a given prefix,
followed by a vertex index.
If a prefix is not given, default prefix is V.
The second example is clique-encoding.
The following matrix is a vertex-clique incidence matrix, where
the collection of cliques, designated by column vectors, is
a separating edge clique cover.
In general, clique-encoding has size less than or equal
to the edge-encoding.
The program for computing a separating edge clique cover is developed by the
author of pygplib, but it is based on
heuristic algorithms by Conte et al .
Although the program ECC8 developed in Java
by Conte is publicly available, it is not used in pygplib to make it
self-contained and pure-Python module.
#
# V1 |1 0 1 0|
# V2 |1 0 0 0|
# V3 |1 1 1 1|
# V4 |0 1 0 0|
# V5 |0 1 0 1|
st = GrSt(vertex_list, edge_list, encoding="clique", prefix="V")
The third example is direct-encoding (or one-hot encoding).
Given the following structure, a first-order variables is assigned vertex,
say V2, if and only if it has the code of high value at the
corresponding bit 01000.
#
# V1 |1 0 0 0 0|
# V2 |0 1 0 0 0|
# V3 |0 0 1 0 0|
# V4 |0 0 0 1 0|
# V5 |0 0 0 0 1|
st = GrSt(vertex_list, edge_list, encoding="direct", prefix="V")
The forth example is log-encoding. Each vertex is assigned with the binary representation of its index.
#
# V1 |0 0 0|
# V2 |1 0 0|
# V3 |0 1 0|
# V4 |1 1 0|
# V5 |0 0 1|
st = GrSt(vertex_list, edge_list, encoding="log", prefix="V")
Note: Interpretation of Atoms¶
The following formulas are evaluated to true regardless of terms
x, y, and graph structures.
~ edg(x, x)edg(x, y) -> edg(y, x)x = xx = y -> y = xx = y & y = z -> x = z
Encoding First-Order Formula¶
Let us now describe how first-order formulas can be encoded into CNFs.
In the following code block, a graph structure is created.
A first-order formula is parsed and constructed.
It is then converted to a list of propositional formulas
Prop [g, ] + li, with which Cnf object mgr is created.
from pygplib import Fog, op, GrSt, Cnf, Prop
vertex_list = [1,2,3,4,5]
edge_list = [(1,2),(1,3),(2,3),(3,4),(3,5),(4,5)]
st = GrSt(vertex_list, edge_list, encoding="clique", prefix="V")
f = Fog.read("(~ edg(x1,x2)) & (~ edg(x1,x3)) & (~ edg(x2,x3))")
g = op.perform_boolean_encoding(f, st)
li = [st.compute_domain_constraint(v) \
for v in op.get_free_vars(f)]
mgr = Cnf( [g, ] + li , st=st)
# If the 2nd argument is given, the extra information of
# encoding will be added to the header of DIMACS CNF file.
We will describe this code block in more details in the following sections
in terms of the Boolean encoding part (the computation of
g and li) and the CNF encoding part.
Boolean Encoding¶
We will describe why we consider not only g but also li in the
previous code block. Remember that a first-order variable runs over
vertices (valid binary codes), in other words, a variable never runs
outside domain. To impose this (called domain
constraints) on first-order variables, we need to add li, a list of
propositional formula objects, one for each first-order variable.
Let us visualize a domain constraint for a better understanding.
with open("t1.dot","w") as out:
op.print_formula(li[0],stream=out,fmt="dot")
$ dot -Tpng t1.dot -o t1.png
The above image depicts the domain constraint for x3 in clique-encoding,
where x3@1, x3@2, x3@3, and x3@4 represents propositional variables
representing the binary code of x3.
The li consists of the constraints for x1, x2, and x3, while
g represents the propositional formula for f but excluding domain constraints.
In summary, the propositional formula encoded from f amounts to the
conjunction of g, li[0], li[1], and li[2].
Note that domain constraints in one encoding are logically equivalent but different as formulas from those in another encoding. The same applies to the propositional formulas encoded from atomic first-order formulas.
CNF Encoding¶
When a Cnf object is initialized,
the list of propositional formulas given as input is converted to CNF.
A Cnf object is able to generate CNF in DIMACS CNF format.
It should be noted that
the indices of CNF variables in the output are changed so that there is no
missing index.
A Cnf object manages the index mapping between
internal CNF variables (those in propositional formulas in [g, ] + li)
and external CNF variables (those in the output DIMACS CNF).
The easiest way to solve CNF would be to use
pysat, a toolkit for SAT-based prototyping in Python .
The pygplib in itself does not provide any functionality of
solving formulas, and is independent of pysat module.
Please see Examples of Usage.
In the following code block, the CNF manager mgr generates a CNF to a file in
DIMACS CNF format, which provides an alternative way to solve formulas with external solvers that conforms to
the DIMACS CNF requirements .
with open("f.cnf","w") as out:
mgr.write(stream=out)
If an external solver is used, the comment lines of a DIMACS CNF file might be useful to decode a satisfying assignment.
$ cat f.cnf
(The first part omitted)
c dom V1: 2 4
c dom V2: 2
c dom V3: 1 2 3 4
c dom V4: 1 3
c dom V5: 1
c enc 2 x1@1
c enc 4 x2@1
c enc 7 x1@2
c enc 9 x2@2
c enc 13 x1@3
c enc 15 x2@3
c enc 19 x1@4
(The remaining part omitted)
Each line beginning with c dom shows the code of each vertex, and
each line beginning with c enc shows the mapping between external CNF
variable indices and internal CNF variable names:
c enc <dimacs_cnf_variable_index> <name_of_first_order_variable>@<bit>
, where the name of an internal CNF variable is the concatenation of the
corresponding first-order variable and bit position.
For instance, the above header means that a first-order variable, say x1,
is encoded in such a way that the first-bit x1@1 is represented by
DIMACS CNF variable 2, the second bit x1@2 by 4, and so on;
The assignment x1@1=0, x1@2=1, x1@3=0, and x1@4=1 means that
x1 is assigned vertex V1 (because the code of V1 is (2,4)).
The CNF computation is done by Tseitin transformation. There is a one-to-one correspondence between satisfying assignments of CNF variables and those of first-order variables.