Assembly Language Extensions of Visual Prolog
For Data Mining of Huge Data


by George A. Stathis (ENB Ltd)



PRESENTATION

at the "ALC Visual Prolog Programming Conference"
  in Faro / Portugal, 25 April 2006



ABSTRACT

 

INTRODUCTION


One important reason that discourages the use of Prolog (and Visual Prolog) in industry is the inherent difficulty of traditional recursive Prolog programming techniques to deal with huge data adequately. Most Prolog textbooks are prolific with quite elegant list-processing predicates that are adequate for lists of a few hundred (or a few thousand) elements. When the number of list-elements exceeds a certain limit, most of these ‘elegant’ predicates (even using the best “tail recursion optimization”) tend to cause crashes. Mainstream languages (such as Visual Basic or C++/C) tend to have simple ways of avoiding such problems (the use of iteration instead of recursion, etc.) that reinforce certain common prejudices against the use of Prolog, while underestimating the advantages and the high-level expressiveness of Prolog (beyond doubt in academic circles).

 

In this paper we present an overview of very specific methodologies that were used in real-world problems, producing quite adequate -or even impressive- results by the combined use of Visual Prolog and Assembly Language.

 

Combining Visual Prolog with Assembly (or with any other programming language) is not at all difficult to implement, from the point of view of linking. Any predicates implemented in another programming language and linked to Visual Prolog appear and behave exactly as if they were implemented in Visual Prolog.

Visual Prolog has been designed with serious optimizations in mind: Whenever possible, it tries to avoid unnecessary computations (that other Prolog compilers perform all the time) if these computations are not required for the operation of each specific (e.g. deterministic) predicate. All linked “foreign language code” works seamlessly inside native Prolog code. As a result, the linking of Assembly language subroutines “as if they are Prolog predicates” is efficient from the outset, and these foreign predicates run as fast as they would in any other (“mainstream”) developer environment.

 

PREDICATE LIBRARIES

At the time of writing (March 2006) there are more than 1100 Visual Prolog predicates that have been implemented in Assembler, as well as more than 150 predicates implemented as ‘C’ glue-functions. In addition, about 20 new predicates are added to these libraries every month (depending on the practical needs of programming projects). There are families of such predicates dealing with lists, strings and characters, “binaries”, CSV Excel-files (texts delimited by separators), time-series, dates, tree-like data structures and other objects. All these predicates originated from real needs in real software applications. The efficiency and speed of these applications destroys the myth of Prolog being “just an academic language”, while outperforming commercial applications of similar nature developed using mainstream languages by a significant factor, typically of an order of magnitude - and quite often more.

 

WORK ORGANIZATION

This work has been organized on the basis of the fact that Assembly Language code is not easy to modify or debug unless it is properly commented, and it is not useable at all unless the comments include instructions for calling each routine externally as a Prolog predicate.

 

CONVERSIONS OF STRINGS TO LISTS

 

Such conversions are typically necessary for 95% of current Prolog projects. The growing need for such predicates led to about a dozen of them implemented in Assembly. Their full global definitions are listed in Appendix A: This library includes customized predicates implementing many such conversions, from strings to lists -and vice versa- in a variety of possible ways.

 

When dealing with large texts, our most basic need is to copy them inside main memory for further processing. Unfortunately, it has been verified that -after certain specific list or text size limits- almost any Prolog predicate that uses recursion is useless for simple tasks, like converting texts to lists!

 

This situation can (of course) be improved: If we don’t want to use Assembly language, it is certainly possible to create optimized Visual Prolog predicates that minimize the use of the stack, so that texts of somewhat larger sizes (than before) can be converted, at somewhat improved speeds.

 

Visual Prolog includes implementations of (non-ISO) predicates for string-handling that can access the memory of strings (or lists) directly. Examples of code using these built-in predicates are already given inside Visual Prolog’s documentation for advanced users, so they won’t be repeated here. The important thing, however, is that even such optimized coding, impossible to do in most other Prolog compilers (which lack those extra predicates) and possible only in Visual Prolog, inevitably and very quickly hits against some new limits of speed and memory, that can be shown to be (this time) unsurpassable.

 

One of our most highly optimized conversion predicates is ‘strx2list’: It can convert strings (typically large texts, which can be of several megabytes size) to Visual Prolog string-lists. Here is a table of measured execution times, for the optimized predicate “strx2list”, in all its possible variants and memory methods.

 

Size

List-Size

(A)

(B)

(C)

(Mb):

elements

stack

alloc.

Heap

alloc.

Pre-

alloc.

1

100,000

0.01

<0.01

<0.01

2

200,000

0.02

0.01

<0.01

3

300,000

0.03

0.01

0.02

4

400,000

0.03

0.03

0.02

5

500,000

Crash!

0.03

0.02

6

600,000

-

0.05

0.03

7

700,000

-

0.06

0.03

8

800,000

-

0.06

0.03

10

1,000,000

-

0.09

0.05

20

2,000,000

-

0.17

0.09

30

3,000,000

-

0.26

0.14

40

4,000,000

-

0.36

0.18

50

5,000,000

-

0.47

0.23

60

6,000,000

-

0.55

0.28

70

7,000,000

-

0.66

0.36

80

8,000,000

-

0.75

0.39

90

9,000,000

-

0.84

0.44

100

10,000,000

-

0.88

0.48

110

11,000,000

-

1.00

0.50

120

12,000,000

-

1.02

0.52

130

13,000,000

-

1.17

0.61

140

14,000,000

-

1.29

0.64

150

15,000,000

-

1.36

0.72

160

16,000,000

-

1.45

0.74

170

17,000,000

-

1.56

0.80

180

18,000,000

-

2.31

1.48

190

19,000,000

-

3.30

2.12

200

20,000,000

-

8.06

5.08

……

….

250

25,000,000

-

20.45

19.58

 

These timings show an apparently impressive speed: -It is evidently possible to convert a 100Mb text of 10 million lines to a list (of 10 million elements) in much less than 1 second!

 

The timings (in seconds) of columns A, B, C refer to literally millions or tens of millions of elements! Alas, it was discovered that (after a few hundred thousand elements) most high-level language code does not work (Prolog, as well as any language). Optimised Prolog code executes at about one order of magnitude (i.e. 10 times) slower than optimised Assembly for the restricted small range of text and list sizes where it does work.

Now, although these results are correct (even at higher data-sizes than those shown in the table) at a turning point of about 250 Mb of text (and 25 million elements) the execution time of the ASM-predicates increases steeply and unpredictably, due to RAM limitations of the specific computer used (P4, 512 Mb).

 

Experienced users of Visual Prolog may have noticed that columns A, B, C (of the previous timing-table) carry annotations of the memory allocation method used in each case. And the three possible ways of allocating memory are

A) Visual Prolog’s stack, B) Visual Prolog’s heap, and C) no new allocation (i.e. memory must be already allocated, before the call).

All these three memory allocation methods are possible, using the predicate “strx2list”: It was implemented in two variants, one with two and one with three input-arguments.

 

PREDICATES

 strx2list : (string8 STR, char sepc)

-> slist

procedure(i,i) language c

 strx2list : (string8 STR, char sepc,

              binary Buffer)

è      slist procedure(i,i,i)

language c

 

Our main concern is producing an output-list which is correct and adequate for fast further processing. The inputs are recycled, in a very destructive “procedural” way -that can make a Prolog purist turn his head away in dismay:

 

The first variant strx2list/2 allocates memory before returning the output-list, but… only as much memory as strictly necessary. In order to operate well under “extreme conditions”, it does not allocate any memory for the strings inside the list: Instead -like the 2nd predicate- it re-uses the input string in “recycled form” by chopping it up into pieces, while inserting pointers to these pieces inside the list, rather than creating new strings and copying the old.

 

If you are somewhat familiar with Assembly language, you might see what is going on in detail by looking into Appendix B, the source code of “strx2list”: The assembly language main loop checks out every byte of the input string for presence of a “separator-character” (arg. 2) while also planting into this position a zero-byte (signifier of the “end of a string” in ‘C’ and Visual Prolog) and then inserts as a list-element only a pointer to the start of each chopped part of the string (rather than a new string). Consequently, we could say that the input-text is “recycled”, to play the part of the output-strings (in the list) as well.

 

This is a raw memory allocation strategy that works very well to economise memory, when processing huge texts and lists in RAM.

 

As regards the internal structure of a “list” in Visual Prolog, suffice it to say that it consists of 12-byte structures, where the first 4 bytes are indicators of “element-type” (“normal” or end elements), the next 4 bytes are (pointers to) each element itself, and the last 4 bytes are pointers to the “next list-element”. So, instead of allocating small pieces of memory N times (for N list-members), memory can be allocated only once, for an entire list, if the number of list-elements is known: For this reason, the first predicate strx2list/2 has two internal loops: (1) One for finding the number of lines in the text (which is also the number of list-elements) and (2) a second loop that uses this number to create each list-element. Before the second loop begins, a single call of an external memory allocation predicate is enough to reserve memory for the entire list (either in the stack, or in the heap).

 

The second variant of strx2list/3 was created for those rare cases when we need to call this predicate several times, inside another loop (external to it). In this case, it is not really worth allocating memory during every call. Instead, memory is allocated prior to the outer loop only once, to create a binary buffer which is used by strx2list/3 as a third (bound) argument, a Visual Prolog “binary”, pre-allocated with 12*(N+1) bytes, where N is the pre-calculated number of list-elements. So, the second variant, strx2list/3, uses no memory allocation at all. Instead of this, it re-cycles all the memory of its own inputs. The reason we use a “binary” is that Visual Prolog binaries are equipped with an indicator of size embedded in a memory location before their pointer. This size-indicator can be used later to release the heap-memory -if in the heap- precisely as required, without side-effects that can arise from intermediate processing, e.g. zero-bytes inside a string making it appear shorter -confusingly for de-allocation- etc.

 

Of course, there are other similar conversion predicates in this family that don’t behave as stingily towards memory, as does “strx2list”: Predicates “str2list” (with no “x” before the 2) and “conclist/2” are routinely used in most ordinary text-to-list /  list-to-text conversions.

 

In Appendix A, a full list of Global Predicate specifications is given for all list-conversion predicates in these libraries, as well as many other list-handling ASM predicates. Some of these predicates are non-deterministic: There are interesting non-deterministic versions of list-membership and other list-operations in this library that cannot be discussed here, due to limitations of space and time.

 

As a final note -about the predicate strx2list- that may be of general value, if you examine the source code of strx2list, in Appendix B, you may be surprised to realize that it doesn’t call literally any external memory allocation predicates. Instead of this, it uses function-pointers to an external allocation predicate, included in the library’s Global data segment as “_allocfunc”. This was done for flexibility; so that the actual allocation predicate may be changed, at any time. E.g. the test-program, which produced the results for the previous timing-table, used different assignments for “_allocfunc” for each column of the results:

 

Column A was generated while using (as the actual contents of the location “_allocfunc”) Visual Prolog’s stack-allocation predicate “_MEM_AllocGStack”, and column B was created by using another -similar- predicate, which allocates memory in the heap, instead.

The results (shown in the timing table) should not surprise experienced Visual Prolog users:

·         Heap memory is much more suitable -than the stack- for storing huge data!

 

In conclusion, programming techniques such as these may seem “dirty” to “Prolog Purists” (since they are “procedural as hell”) but many years of Visual Prolog programming dictate the necessity of using such “dirty methods” when we are in desperate need to process huge amounts of data in RAM, rather than in hard disks.

 

Of course, today’s hard disk databases (with SQL, etc) are more efficient than ever before. However, there are important advantages in keeping as much as possible of our data in RAM, when we want to do (e.g.) exhaustive fuzzy string-matching, cross-examining each list-element versus all other list-elements, etc.

 

 

NO HARD-CODED EXTERNAL CALLS

 

Avoiding hard-coded external function-calls was found to be good programming practice.

 

Today all Assembly language predicates in these libraries use pointers to external calls instead of literal external predicates. So, if (e.g.) a newer version of Visual Prolog (or a new ‘C’ compiler, used for mixed language programming) is installed, it is still possible to use the same libraries, with minor changes in very few pointer-definition files where the names of some external predicates are stored (and are compiled into pointers immediately).

 

Alternatively, it is also possible to use initial calls of certain “configuration predicates”, to fill the locations of external call pointers with addresses of external predicates at run-time.

 

So, instead of _IO_WrStr (a built-in Visual Prolog/‘C’ predicate that writes strings to the screen or to current output, defined in the file “pdcrunt.h”) many ASM predicates use a call to the function pointer “_wr_str”, located in the Data Segment. This makes it possible to do much more than just writing on the screen:

E.g. if the contents of location _wr_str are changed at run-time to contain the address of hp_wrstr, a new ASM predicate that writes strings to “virtual devices” inside the heap, it becomes possible to accumulate text that may be the result of some processing, writing it in small lumps inside the heap, and in the end to dump it -all at once- to a file, or to the screen, after certain conditions are met.

 

Nevertheless, the default-values of all these memory locations are pre-filled with Visual Prolog predicates, and are rarely expected to change: Location “_wr_ch” contains Visual Prolog’s “_IO_WrCh”, location “_wr_int” contains Visual Prolog’s “_IO_WrIntg”, and so on. If any of these need changing, there are appropriate ASM-predicates to do the job.

 

For example, in order to change at run-time the current string-writing predicate, there is a configuration predicate, “set_wr_str/1” (in Visual Prolog 5.* syntax):

 

GLOBAL DOMAINS

  STR_i: determ (STRING)

–(i) language c

 

GLOBAL PREDICATES

   set_wr_str( STR_i )

       –(i) language c

 

Calling “set_wr_str(foo)” where “foo” is any predicate of domain “STR_i”, results in “foo” becoming the current string-writing predicate globally called by every ASM predicate (that does any calls to any external string-writing).

 

Now, “foo” is not “obliged” to do any literal string-writing; Instead of this, it could very happily be asked to do other things, such as calling “assert” (so that -instead of writing- it asserts facts in a Prolog database, and so on).

 

Another common example (part of which was discussed earlier, with “strx2list”) is this: To change at run-time the current global memory allocation predicate used (by “strx2list”, as well as others), there is a global configuration predicate, “set_allocfunc”:

 

GLOBAL DOMAINS

allocbin_func

  = determ BINARY (UNSIGNED size)

        -(i) language c

GLOBAL PREDICATES

  set_allocfunc(allocbin_func)

       -(i) language c

 

The possibility of changing the currently used memory-allocation predicate (globally) by a program on the fly is also useful when calling these predicates inside a “main program” that is in a language other than Visual Prolog.

 

SPECIALISED WRITING PREDICATES

 

The ASM-predicate write_slist, which writes a list of strings with separators, belongs to a large family of similar writing predicates that can have their output redirected anywhere by changing the function pointers called by them for writing. The default-values of the pointers (as explained before) are set to Visual Prolog writing predicates, but can be changed to any other predicates at any time.

 

The family where “write_slist” belongs also includes predicates that perform sophisticated writing scripts: Some of them can write the contents of Visual Prolog binaries in ways understandable by humans; others may write integer-lists, strings and binaries in ways that are needed by specific software applications, e.g. to create CSV Excel-files or HTML files.

 

All these can be redirected for writing to the heap (in “virtual strings”), or for doing other (often unexpected) tasks irrelevant to writing:

Any predicate of similar type to “_IO_WrStr” may be used as a “virtual writing predicate”, by inserting its address in location “_wr_str”.

 

In Appendix C there is a complete listing of all these “special writing predicates”, with some brief comments about their operation.

Appendix C also includes another big family of ASM predicates, which are often useful for experimentations and software development, but are not really so important for most final applications: Random generation predicates.

 

CSV-FILE HANDLING PREDICATES

 

In Appendix D there are Predicate definitions and descriptions of an Assembly language library to process CSV- (Excel) files, in many possible ways. Some of these ASM predicates analyse CSV files to extract parameters from them (e.g. their number of data-fields); others write them out in various ways; others select, delete or insert columns, rows, or cells, etc.

 

However, due to space and time limitations this CSV-file handling library is not included in the main part of this presentation.

 

 

USING THE HEAP FOR “STREAMING”

 

A common need is to suspend actual writing (to the screen, or to a file), writing everything to a virtual device or “stream”, which can be flushed. So, another family of predicates was created for this; writing strings, numbers and characters to “virtual strings” inside the heap.

 

First, the initialisation-predicate “heap_init” should be called, with a single argument: the number of virtual heap strings required.

 

This call allocates heap-memory for the array of pointers (and all the “position indicators”) needed by the “virtual strings”, but not for the strings themselves. To allocate heap memory for each individual virtual string, predicate heap_create/1 must be called, once for each “virtual string”: It has only one argument, the required size of every string; It returns (in register EAX) the “integer-handle” for each “virtual string”. This “handle” can then be re-used for “virtual writing”, i.e. adding content to the specific “virtual string” in ways that are similar to writing on the screen, or to a file.

 

If we don’t need many such “virtual strings”, or if we want to use one of them most of the time, a “current default string” can be defined by predicate “hpstream/1”, with a single argument: -This virtual string’s “handle”.

 

Some Assembly language predicates in this “heap-string library” emulate precisely Visual Prolog’s _IO_WrStr, _IO_WrCh, IO_WrIntg,  etc, but (this time) writing to “virtual strings”, rather than to the screen or to a file. Their “target” is the “current virtual string” (which can be changed by “hpstream/1”). These are predicates that can be inserted as “clones” of standard writing predicates, in the locations invoked by “write_slist” (and many others).

 

See Appendix E for a listing of all the global predicate definitions in this specialised heap-writing and “streaming” library.

 

 

ENHANCED SEARCH PREDICATES

 

 

An entirely new family of predicates for searching (inside strings, as well as inside binaries) was implemented in Pure Assembly language. This family of predicates contains many enhancements, including fuzzy search, non-deterministic search, etc), in order to:

 

1)      Start from a pre-specified position (rather than always at the beginning)

2)      Return the “rest” (after the search, as a pointer; not through memory allocation)

3)     Search backwards (used in parsers)

4)      Search non-deterministically (a Prolog–specific feature, useful for Data Mining)

5)     Search case-insensitively, and with Generalized Case-insensitivity  based on Character Table techniques

6)      Do fuzzy searches and inexact pattern- matching Searches (in many ways),

7)      Search optimally, using the Boyer-Moore algorithm (and other types of optimized search)

8)      Do everything for both Strings and Binaries (providing all possible variants of  all these predicates, together with all their features; as a result, e.g. there are 15 different variants of  nd_search” alone, whi