Assembly Language
Extensions of Visual
Prolog
For Data Mining of Huge Data
by George
A. Stathis
(ENB Ltd)
PRESENTATION
at the "ALC Visual Prolog
Programming Conference"
in Faro / Portugal, 25 April 2006
INTRODUCTION
One important reason that
discourages the use of Prolog (and Visual Prolog) in industry is the
inherent
difficulty of traditional recursive
Prolog programming techniques to deal with huge
data adequately. Most Prolog textbooks are prolific with quite
elegant
list-processing predicates that are adequate for lists of a few hundred
(or a
few thousand) elements. When the number of list-elements exceeds a
certain
limit, most of these ‘elegant’ predicates (even using the best “tail
recursion
optimization”) tend to cause crashes.
Mainstream languages (such as Visual Basic or C++/C) tend to have
simple ways
of avoiding such problems (the use of iteration instead of recursion,
etc.)
that reinforce certain common prejudices against the use of Prolog,
while
underestimating the advantages and the high-level expressiveness of
Prolog
(beyond doubt in academic circles).
In this paper we present an
overview of very specific methodologies that were used in real-world
problems,
producing quite adequate -or even impressive-
results by the combined use of Visual
Prolog and Assembly Language.
Combining Visual Prolog
with Assembly (or with any other
programming language) is not at all difficult to implement, from the
point of
view of linking. Any predicates
implemented in another programming language and linked to Visual Prolog
appear
and behave exactly as if they were
implemented in Visual
Prolog.
Visual Prolog has been
designed with serious optimizations in mind: Whenever possible, it
tries to
avoid unnecessary computations (that other Prolog compilers perform all
the
time) if these computations are not required for the operation of each
specific
(e.g. deterministic) predicate. All linked “foreign language code”
works
seamlessly inside native Prolog code. As a result, the linking of
Assembly
language subroutines “as if they are Prolog predicates” is efficient
from the
outset, and these foreign predicates run as fast as they would in any
other
(“mainstream”) developer environment.
PREDICATE
LIBRARIES
At the time of writing
(March 2006) there are more than 1100 Visual Prolog predicates that
have been
implemented in Assembler, as well as more than 150 predicates
implemented as
‘C’ glue-functions. In addition, about 20 new predicates are added to
these
libraries every month (depending on the practical needs of programming
projects).
There are families of such predicates dealing with lists, strings and
characters, “binaries”, CSV Excel-files (texts delimited by
separators),
time-series, dates, tree-like data structures and other objects. All
these
predicates originated from real needs in
real software applications. The efficiency and speed of these
applications
destroys the myth of Prolog being “just an academic language”, while
outperforming commercial applications of similar nature developed using
mainstream languages by a significant factor, typically of an order of
magnitude - and quite often more.
WORK ORGANIZATION
This work has been
organized on the basis of the fact that Assembly Language code is not easy to modify or debug unless it is
properly commented, and it is not
useable at all unless the comments include instructions
for calling each routine externally as a
Prolog predicate.
CONVERSIONS OF
STRINGS TO LISTS
Such conversions are
typically necessary for 95% of current Prolog projects. The growing
need for
such predicates led to about a dozen of them implemented in Assembly.
Their
full global definitions are listed in
Appendix A: This library includes
customized predicates implementing many such conversions, from strings
to lists
-and vice versa- in a variety of possible ways.
When dealing with large
texts, our most basic need is to
copy them inside main memory for further processing. Unfortunately, it
has been
verified that -after certain specific list or text size limits- almost any Prolog predicate that uses recursion
is useless for simple tasks, like
converting texts to lists!
This situation can
(of course) be improved: If we don’t
want to use Assembly language, it is certainly possible to create
optimized
Visual Prolog predicates that minimize the use of the stack, so that
texts of somewhat larger sizes (than before) can
be converted, at somewhat improved
speeds.
Visual Prolog includes
implementations of (non-ISO) predicates for string-handling that can
access the
memory of strings (or lists) directly.
Examples of code using these built-in predicates are already
given inside Visual Prolog’s documentation for
advanced users, so they won’t be repeated here.
The important thing, however, is that even such optimized coding,
impossible to
do in most other Prolog compilers
(which lack those extra predicates) and possible only in
Visual Prolog, inevitably and very quickly hits against
some new limits of speed and memory,
that can be shown to be (this time) unsurpassable.
One of our most
highly optimized conversion
predicates is ‘strx2list’: It can
convert strings (typically large texts,
which can be of several megabytes
size) to Visual Prolog string-lists.
Here is a table of measured execution times, for the
optimized predicate “strx2list”, in
all its possible variants and memory
methods.
|
Size |
List-Size |
(A) |
(B) |
(C) |
|
(Mb): |
elements |
stack alloc. |
Heap alloc. |
Pre- alloc. |
|
1 |
100,000 |
0.01 |
<0.01 |
<0.01 |
|
2 |
200,000 |
0.02 |
0.01 |
<0.01 |
|
3 |
300,000 |
0.03 |
0.01 |
0.02 |
|
4 |
400,000 |
0.03 |
0.03 |
0.02 |
|
5 |
500,000 |
Crash! |
0.03 |
0.02 |
|
6 |
600,000 |
- |
0.05 |
0.03 |
|
7 |
700,000 |
- |
0.06 |
0.03 |
|
8 |
800,000 |
- |
0.06 |
0.03 |
|
… |
… |
… |
… |
… |
|
10 |
1,000,000 |
- |
0.09 |
0.05 |
|
20 |
2,000,000 |
- |
0.17 |
0.09 |
|
30 |
3,000,000 |
- |
0.26 |
0.14 |
|
40 |
4,000,000 |
- |
0.36 |
0.18 |
|
50 |
5,000,000 |
- |
0.47 |
0.23 |
|
60 |
6,000,000 |
- |
0.55 |
0.28 |
|
70 |
7,000,000 |
- |
0.66 |
0.36 |
|
80 |
8,000,000 |
- |
0.75 |
0.39 |
|
90 |
9,000,000 |
- |
0.84 |
0.44 |
|
100 |
10,000,000 |
- |
0.88 |
0.48 |
|
110 |
11,000,000 |
- |
1.00 |
0.50 |
|
120 |
12,000,000 |
- |
1.02 |
0.52 |
|
130 |
13,000,000 |
- |
1.17 |
0.61 |
|
140 |
14,000,000 |
- |
1.29 |
0.64 |
|
150 |
15,000,000 |
- |
1.36 |
0.72 |
|
160 |
16,000,000 |
- |
1.45 |
0.74 |
|
170 |
17,000,000 |
- |
1.56 |
0.80 |
|
180 |
18,000,000 |
- |
2.31 |
1.48 |
|
190 |
19,000,000 |
- |
3.30 |
2.12 |
|
200 |
20,000,000 |
- |
8.06 |
5.08 |
|
… |
…… |
…. |
… |
… |
|
250 |
25,000,000 |
- |
20.45 |
19.58 |
These
timings show an apparently impressive speed:
-It is evidently
possible to convert a 100Mb text of 10 million lines to a list (of 10
million
elements) in much less than 1 second!
The
timings (in seconds) of columns A, B, C refer to literally
millions or tens of millions of
elements! Alas, it
was discovered that (after a few hundred
thousand elements) most high-level language code does
not work (Prolog, as well as any language).
Optimised Prolog code executes at about one order of
magnitude (i.e. 10 times) slower than optimised
Assembly for the
restricted small range of text and list sizes where it does
work.
Now,
although these results are correct (even
at higher data-sizes than those shown in the table) at a turning point
of about
250 Mb of text (and 25 million elements) the execution time of the
ASM-predicates increases steeply and unpredictably,
due to RAM limitations of the specific
computer used (P4, 512 Mb).
Experienced
users of Visual Prolog may have noticed that columns A,
B, C (of the previous timing-table)
carry
annotations of the memory allocation
method used in each case. And the three possible ways of allocating
memory
are
A) Visual Prolog’s stack, B)
Visual Prolog’s heap, and C) no new allocation
(i.e. memory must be already allocated, before
the call).
All
these three memory allocation methods
are possible, using the predicate “strx2list”:
It was implemented in two variants,
one with two and one with three
input-arguments.
PREDICATES
strx2list :
(string8 STR, char sepc)
->
slist
procedure(i,i)
language c
strx2list : (string8 STR, char
sepc,
binary Buffer)
è
slist
procedure(i,i,i)
language
c
Our
main concern is producing an output-list which is correct and adequate for fast further processing.
The inputs are recycled, in a very destructive
“procedural” way -that can
make a Prolog purist turn his head
away in dismay:
The
first variant strx2list/2 allocates memory
before returning the
output-list, but… only as much memory as
strictly necessary. In order to operate well under “extreme
conditions”, it
does not allocate any memory for the
strings inside the list: Instead -like the 2nd predicate- it re-uses the input string in “recycled
form” by chopping it up into pieces, while inserting pointers
to these pieces inside the list, rather than creating new
strings and copying the old.
If
you are somewhat familiar with Assembly language, you might see what is
going
on in detail by looking into Appendix B,
the source code of “strx2list”: The assembly
language main
loop checks out every byte of the input string for presence of a
“separator-character” (arg. 2) while also
planting into this position a zero-byte
(signifier of the “end of a string” in ‘C’ and Visual Prolog) and then
inserts
as a list-element only a pointer to
the start of each chopped part of the
string (rather than a new string). Consequently, we could say that the
input-text is “recycled”, to play the
part of the output-strings (in the list) as
well.
This
is a raw memory allocation strategy
that works very well to economise memory,
when processing huge texts and lists in
RAM.
As
regards the internal structure of a “list” in Visual Prolog, suffice it
to say
that it consists of 12-byte structures, where the first 4 bytes are
indicators
of “element-type” (“normal” or end elements), the next 4 bytes are
(pointers
to) each element itself, and the last 4 bytes are pointers to the “next list-element”. So, instead of
allocating small pieces of memory N times (for N list-members), memory
can be
allocated only once, for an entire
list, if the number of list-elements is known:
For this reason, the first predicate strx2list/2
has two internal loops: (1) One for
finding the number of lines in the text (which is also the number of
list-elements) and (2) a second loop that uses
this number to create each list-element.
Before the second loop begins, a single
call of an external memory allocation predicate is enough to
reserve memory
for the entire list (either in the
stack, or in the heap).
The
second variant of strx2list/3 was created for those rare cases when we
need to
call this predicate several times, inside another loop (external to
it). In
this case, it is not really worth
allocating memory during every call.
Instead, memory is allocated prior to
the outer loop only once, to create a
binary buffer which is used by strx2list/3 as a third (bound) argument,
a
Visual Prolog “binary”, pre-allocated
with 12*(N+1) bytes, where N is the pre-calculated number of
list-elements. So, the second variant, strx2list/3,
uses no memory allocation at all.
Instead of this, it re-cycles all the
memory of its own inputs. The reason we use a “binary”
is that Visual Prolog binaries are equipped with an indicator
of size embedded in a memory location before their
pointer. This
size-indicator can be used later to release
the heap-memory -if in the heap- precisely as required, without
side-effects that can arise from intermediate processing, e.g.
zero-bytes
inside a string making it appear shorter -confusingly for de-allocation-
etc.
Of course, there are other
similar conversion predicates in this family that don’t
behave as stingily
towards memory, as does “strx2list”:
Predicates “str2list” (with no “x”
before the 2) and “conclist/2” are
routinely used in most ordinary text-to-list
/ list-to-text conversions.
In Appendix A, a full list of Global Predicate
specifications is given for all
list-conversion predicates in these libraries, as well as many other list-handling ASM
predicates. Some of these
predicates are non-deterministic:
There are interesting non-deterministic
versions of
list-membership and other list-operations in this library that cannot be discussed here, due to
limitations of space and time.
As
a final note -about the predicate strx2list-
that may be of general value, if you examine the source code of strx2list, in Appendix B,
you may be surprised to realize that
it doesn’t call literally any
external memory allocation predicates. Instead of
this, it uses function-pointers to an external
allocation predicate, included in the library’s Global
data segment as “_allocfunc”. This was
done for flexibility; so that the actual
allocation predicate may be
changed, at any time. E.g. the test-program, which produced the results
for the
previous timing-table, used different
assignments for “_allocfunc” for each column of the results:
Column
A was generated while using (as the actual
contents of the location
“_allocfunc”) Visual Prolog’s stack-allocation predicate
“_MEM_AllocGStack”,
and column B was created by using
another -similar- predicate, which allocates memory in the heap,
instead.
The
results (shown in the timing table) should not
surprise experienced Visual Prolog users:
·
Heap memory is much more suitable -than
the stack- for
storing huge data!
In
conclusion, programming techniques such as these may seem “dirty” to
“Prolog
Purists” (since they are “procedural as hell”) but many years of Visual
Prolog
programming dictate the necessity of
using such “dirty methods” when we are in desperate
need to process huge amounts of data
in RAM, rather than in hard disks.
Of
course, today’s hard disk databases (with SQL, etc) are more efficient
than
ever before. However, there are important
advantages in keeping as much as possible of our data in RAM, when
we want
to do (e.g.) exhaustive fuzzy
string-matching, cross-examining each list-element versus all other
list-elements, etc.
NO HARD-CODED
EXTERNAL
CALLS
Avoiding
hard-coded external function-calls was
found to be good programming practice.
Today
all Assembly language predicates in
these libraries use pointers to external
calls instead of literal external
predicates. So, if (e.g.) a newer version of Visual Prolog (or a new
‘C’
compiler, used for mixed language programming) is installed, it is
still
possible to use the same libraries,
with minor changes in very few pointer-definition
files where the names of some external
predicates are stored (and are compiled
into pointers immediately).
Alternatively,
it is also possible to use initial calls of certain “configuration
predicates”,
to fill the locations of external call pointers with addresses of
external
predicates at run-time.
So,
instead of _IO_WrStr (a built-in
Visual Prolog/‘C’ predicate that writes strings to the screen or to
current
output, defined in the file “pdcrunt.h”) many ASM predicates use a call
to the function pointer “_wr_str”,
located in the Data Segment. This makes it possible to do much
more than just writing on the
screen:
E.g.
if the contents of location _wr_str
are changed at run-time to contain
the address of hp_wrstr, a new ASM
predicate that writes strings to “virtual
devices” inside the heap, it
becomes possible to accumulate text
that may be the result of some processing, writing it in
small lumps inside the heap,
and in the end to dump it -all at once- to a file, or to the
screen, after certain conditions are met.
Nevertheless,
the default-values of all these memory
locations are pre-filled with Visual
Prolog predicates, and are rarely
expected to change: Location “_wr_ch” contains Visual Prolog’s
“_IO_WrCh”,
location “_wr_int” contains Visual Prolog’s “_IO_WrIntg”, and so on. If
any of
these need changing, there are appropriate ASM-predicates to do the
job.
For
example, in order to change at run-time the
current string-writing predicate, there is a configuration
predicate, “set_wr_str/1”
(in Visual Prolog 5.* syntax):
GLOBAL
DOMAINS
STR_i:
determ (STRING)
–(i)
language c
GLOBAL
PREDICATES
set_wr_str( STR_i )
–(i)
language c
Calling
“set_wr_str(foo)” where “foo” is any
predicate of domain “STR_i”, results in “foo” becoming the current
string-writing predicate globally called by every ASM
predicate (that does any calls to any external
string-writing).
Now,
“foo” is not “obliged” to do any literal
string-writing; Instead of this,
it could very happily be asked to do other
things, such as calling “assert” (so
that -instead of writing- it asserts facts
in a Prolog database, and so on).
Another
common example (part of which was discussed earlier, with “strx2list”)
is this: To change at run-time the current global memory
allocation predicate used (by “strx2list”, as well
as others), there
is a global configuration predicate,
“set_allocfunc”:
GLOBAL
DOMAINS
allocbin_func
= determ
BINARY (UNSIGNED size)
-(i)
language c
GLOBAL
PREDICATES
set_allocfunc(allocbin_func)
-(i)
language c
The
possibility of changing the currently
used memory-allocation predicate
(globally) by a program on the fly is
also useful when calling these predicates inside a “main program” that
is in a
language other than Visual Prolog.
SPECIALISED
WRITING
PREDICATES
The
ASM-predicate write_slist, which
writes a list of strings with separators,
belongs to a large family of similar writing predicates that can have
their
output redirected anywhere by
changing the function pointers called
by them for writing. The default-values of the pointers (as explained
before) are
set to Visual Prolog writing predicates, but can be changed to any other predicates at any time.
The
family where “write_slist” belongs also
includes predicates that perform sophisticated writing
scripts: Some of them can write the contents of Visual Prolog
binaries in ways understandable by humans; others may write
integer-lists,
strings and binaries in ways that are needed by specific
software applications, e.g. to create CSV Excel-files or
HTML files.
All
these can be redirected for writing
to the heap (in “virtual strings”), or for doing other (often
unexpected) tasks
irrelevant to writing:
Any predicate of similar type to
“_IO_WrStr” may be used as a “virtual writing predicate”, by inserting
its
address in location “_wr_str”.
In
Appendix C there is a complete listing of all
these “special writing predicates”, with some brief
comments about their operation.
Appendix
C also includes another big family
of ASM predicates, which are
often useful for experimentations and software development, but are not really so important for most final
applications: Random generation predicates.
CSV-FILE HANDLING
PREDICATES
In
Appendix D there are Predicate
definitions and descriptions of an Assembly language library to process
CSV-
(Excel) files, in many possible ways. Some
of these ASM predicates analyse CSV files to extract parameters from
them (e.g.
their number of data-fields); others write them out in various ways;
others
select, delete or insert columns, rows, or cells, etc.
However,
due to space and time limitations this
CSV-file handling library is not
included in the main part of this
presentation.
USING THE HEAP
FOR
“STREAMING”
A
common need is to suspend actual writing (to the screen, or to a file),
writing
everything to a virtual device or “stream”, which can be flushed.
So, another family of predicates was created for this;
writing strings, numbers and characters to “virtual strings” inside the
heap.
First,
the initialisation-predicate “heap_init”
should be called, with a single argument: the number of virtual heap strings required.
This
call allocates heap-memory for the array
of pointers (and all the “position indicators”) needed by the
“virtual
strings”, but not for the strings themselves.
To allocate heap memory for
each individual virtual string,
predicate heap_create/1 must be
called, once for each “virtual
string”: It has only one argument, the required
size of every string; It returns (in register EAX) the “integer-handle”
for each “virtual string”. This “handle” can then
be re-used for “virtual writing”, i.e. adding content to the specific
“virtual
string” in ways that are similar to writing on the screen, or to a
file.
If
we don’t need many such “virtual
strings”, or if we want to use one of them most
of the time, a “current default string” can be defined by predicate
“hpstream/1”, with a single argument:
-This virtual string’s “handle”.
Some
Assembly language predicates in this “heap-string library” emulate
precisely
Visual Prolog’s _IO_WrStr, _IO_WrCh, IO_WrIntg,
etc, but (this time) writing to “virtual strings”, rather than
to the
screen or to a file. Their “target” is the “current virtual string”
(which can
be changed by “hpstream/1”). These
are predicates that can be inserted as “clones” of standard
writing predicates, in the locations invoked by “write_slist”
(and many others).
See
Appendix E
for a listing of all the global predicate
definitions in this specialised heap-writing and “streaming”
library.
ENHANCED SEARCH PREDICATES
1)
Start from a
pre-specified position (rather than
always at the
beginning)
2)
Return the “rest” (after
the search, as a pointer; not through
memory allocation)
3)
Search backwards (used in parsers)
4)
Search
non-deterministically (a Prolog–specific
feature, useful for Data Mining)
5)
Search
case-insensitively, and with Generalized
Case-insensitivity based on Character
Table techniques
6)
Do fuzzy searches and
inexact pattern- matching Searches (in many ways),
7)
Search optimally, using the Boyer-Moore
algorithm (and
other types of optimized search)
8) Do everything for both Strings and Binaries (providing all possible variants of all these predicates, together with all their features; as a result, e.g. there are 15 different variants of “nd_search” alone, whi