Manipulating Strings with ObjectPAL

Contributed by Al Breveleri
22 January 2002

© 2002 Al Breveleri

Previous Section: Part 3: Parsing Grammatically

4. Replacing Parts

One of the most common string operations is to replace every instance of a specified substring with a substitute substring. Here are several approaches to this delightful practice.

4.1. Working In a String Variable

This algorithm is the fastest, but it must not be used if the string represented by 'XXX' is a substring of that represented by 'YYY'. In that case every replacement reintroduces a match, so the loop never terminates and the string grows until the program crashes.

Listing 10: Replacing all occurrences of the constant string 'XXX' with the constant string 'YYY'

proc REPL_XXX_YYY( const asSUBJ string ) string
var
psTEMP, psBGN, psEND string
endvar
psTEMP = asSUBJ
while psTEMP'advMatch("^(..)XXX(..)$",psBGN,psEND)
psTEMP = psBGN + "YYY" + psEND
endwhile
return psTEMP
endproc
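
The termination condition can be checked outside Paradox. What follows is a minimal Python model of the loop in Listing 10, not ObjectPAL. It assumes that the advMatch() split can be represented by `str.partition`, which cuts at the leftmost match. The names `replace_loop` and `max_passes` are hypothetical; the pass limit exists only so that the failure case can be observed instead of running forever.

```python
def replace_loop(subject: str, old: str, new: str, max_passes: int = 1000) -> str:
    """Model of Listing 10: re-scan the whole string after every replacement."""
    text = subject
    for _ in range(max_passes):
        # Split at the leftmost match (assumed stand-in for the advMatch pattern).
        before, found, after = text.partition(old)
        if not found:
            return text  # no match left, so the loop ends normally
        text = before + new + after
    # Only reached when each replacement keeps producing a new match.
    raise RuntimeError("replacement text keeps reintroducing the match")
```

With `old="XXX"` and `new="YYY"` the model finishes after one pass per match. With `new="YYYXXXZZZ"` every pass leaves an 'XXX' behind, so only the pass limit stops it.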
This algorithm works correctly even if 'X' is a substring of 'YYY', but the match string can be only one character long.

Listing 11: Replacing all occurrences of the constant string 'X' with the constant string 'YYY', where 'X' is one character

proc REPL_X_YYY( const asSUBJ string ) string
var
prTOK array [] string
psTEMP string
II longint
endvar
breakApart(asSUBJ+"X",prTOK,"X")
psTEMP = prTOK[1]
for II from 2 to size(prTOK)
psTEMP = psTEMP + "YYY" + prTOK[II]
endfor
return psTEMP
endproc
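
The reason this approach tolerates a replacement that contains the match character is that the replacement text is only ever appended and never scanned again. A minimal Python model of the same cut-and-rejoin idea is sketched below. It is not ObjectPAL, `replace_char` is a hypothetical name, and `str.split` is assumed to play the role of breakApart().

```python
def replace_char(subject: str, old_char: str, new: str) -> str:
    """Model of Listing 11: cut at every separator, then rejoin with the new text."""
    if len(old_char) != 1:
        raise ValueError("the match string must be exactly one character")
    pieces = subject.split(old_char)  # the text between the separators
    result = pieces[0]
    for piece in pieces[1:]:
        # The replacement is appended and never searched again,
        # so it may safely contain the match character itself.
        result = result + new + piece
    return result
```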
The fastest way to replace all matches in the general case is with a recursive procedure.

Listing 12: Replacing all occurrences of the constant string 'XXX' with the constant string 'YYYXXXZZZ' for the general case, where 'XXX' is more than one character and 'XXX' is a substring of 'YYYXXXZZZ'

proc REPL_XXX_YYYXXXZZZ( const asSUBJ string ) string
var
psBGN, psEND string
endvar
if asSUBJ'advMatch("^(..)XXX(..)$",psBGN,psEND) then
return REPL_XXX_YYYXXXZZZ(psBGN) + "YYYXXXZZZ" + REPL_XXX_YYYXXXZZZ(psEND)
else
return asSUBJ
endif
endproc
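Only the general case can safely accept arbitrary strings as match and replacement arguments. One caveat applies: asOLD is spliced directly into the advMatch() pattern, so any pattern operators it contains (such as '..', '^', or '$') are interpreted as operators rather than matched literally.

Listing 13: Replacing all occurrences of <asOLD> with <asNEW> in the general case, where <asOLD> might be more than one character and <asOLD> might be a substring of <asNEW>

proc REPL_STR_1( const asSUBJ string, const asOLD string, const asNEW string ) string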
var
psBGN, psEND string
endvar
if asSUBJ'advMatch("^(..)"+asOLD+"(..)$", psBGN,psEND) then
return REPL_STR_1(psBGN,asOLD,asNEW) + asNEW + REPL_STR_1(psEND,asOLD,asNEW)
else
return asSUBJ
endif
endproc
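
The recursive scheme of Listing 13 is modelled below in Python, not ObjectPAL, as a minimal sketch. The name `replace_recursive` is hypothetical, and `str.partition` is assumed to stand in for the advMatch() split by cutting at the leftmost literal match. The replacement text is placed between the two recursive results and is never searched, which is why it may contain the match string.

```python
def replace_recursive(subject: str, old: str, new: str) -> str:
    """Model of Listing 13: split at one match, then recurse into both halves."""
    if not old:
        raise ValueError("the match string must not be empty")
    # Literal search, leftmost match; 'found' is empty when there is no match.
    before, found, after = subject.partition(old)
    if not found:
        return subject
    # 'new' sits between the recursive results and is never re-scanned.
    return replace_recursive(before, old, new) + new + replace_recursive(after, old, new)
```

In this model the recursion depth grows with the number of matches, so a subject with very many matches could exhaust Python's call stack.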
Both execution time and stack space can be saved by not passing the unvarying 'asOLD' and 'asNEW' parameters to each recursive call.

Listing 14: Slightly faster version of above

4.2. Working In a TextStream File

Replacement in a textstream file can only be effected by copying the file, except in the very special case (not discussed here) where the match string and the replacement string are exactly the same length.

Listing 15: General case substring replacer for textstream (copies gtsSRC to gtsDST)

; assuming gtsSRC has been opened globally
; as the input textstream and gtsDST has been created
; globally as the output textstream
proc REPL_FILE( const asOLD string, const asNEW string )
var
psBFFR string
piANCHOR, piFNDBGN, piFNDEND longint
piREMAINING longint
endvar
piFNDEND = 1 ; start from beginning of input file
while true
piANCHOR = piFNDEND
piFNDBGN = piANCHOR
if not gtsSRC'advMatch(piFNDBGN,piFNDEND,asOLD) then
; If a match is found, advMatch sets piFNDBGN to
; point to the first char of the match and piFNDEND
; to point to the first char after it. If a match
; is not found, the next two statements set piFNDBGN
; and piFNDEND to point to the first char after the
; end of the file.
piFNDBGN = size(gtsSRC)+1
piFNDEND = piFNDBGN
endif
; Now, piANCHOR, piFNDBGN, and piFNDEND can be
; compared to determine what was found, and what
; action should be taken:
; case                piFNDBGN      piFNDEND      action
; ------------------  ------------  ------------  ----------------------
; no text, no match   = piANCHOR    = piFNDBGN    quit (end of file)
; no text but match   = piANCHOR    end of match  process match
; text but no match   end of text   = piFNDBGN    process unmatched text
; text and match      end of text   end of match  process text and match
; ------------------  ------------  ------------  ----------------------
; quit if piFNDBGN=piANCHOR and piFNDEND=piFNDBGN
if piFNDEND=piANCHOR then quitloop endif
if piFNDBGN<>piANCHOR then
; unmatched text found
; copy the unmatched text
gtsSRC'setPosition(piANCHOR)
piREMAINING = piFNDBGN-piANCHOR
while piREMAINING>0
gtsSRC'readChars(psBFFR, int(min(piREMAINING,32767)))
gtsDST'writeString(psBFFR)
piREMAINING = piREMAINING-32767
endwhile
endif
if piFNDEND<>piFNDBGN then ; match found
; replace the matched text
gtsDST'writeString(asNEW)
endif
endwhile
endproc
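
The copy-and-replace idea can also be tried outside Paradox. The sketch below is Python, not ObjectPAL, and it swaps in a different search technique. Python file objects have no counterpart to the textstream advMatch(), so the search runs over a sliding in-memory buffer. The name `replace_stream` and the `chunk_size` parameter are hypothetical; the default of 32767 merely echoes the read size used in Listing 15.

```python
import io


def replace_stream(src, dst, old: str, new: str, chunk_size: int = 32767) -> None:
    """Copy text from src to dst, replacing every occurrence of old with new."""
    if not old:
        raise ValueError("the match string must not be empty")
    keep = len(old) - 1  # longest tail that could be the start of a split match
    buf = ""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        pos = 0
        while True:
            hit = buf.find(old, pos)
            if hit < 0:
                break
            dst.write(buf[pos:hit])  # unmatched text before the match
            dst.write(new)           # replacement instead of the match
            pos = hit + len(old)
        # Hold back a short tail in case a match straddles two chunks.
        safe_end = max(pos, len(buf) - keep)
        dst.write(buf[pos:safe_end])
        buf = buf[safe_end:]
    dst.write(buf)  # whatever is left is too short to contain a match
```

A quick way to exercise it is with in-memory streams, for example `replace_stream(io.StringIO("aXXXb"), out, "XXX", "YYY")` where `out` is an `io.StringIO()`.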
5. Building Long Strings

5.1. Building Into a String Variable

Listing 16: Concatenating the contents of a string array gasTOKENS[]

; assuming gasTOKENS is a string array to be
; concatenated together
proc CONCAT_TOKENS_1() string
var psRESULT string II longint endvar
psRESULT = blank()
for II from 1 to size(gasTOKENS)
psRESULT = psRESULT + gasTOKENS[II]
endfor
return psRESULT
endproc
The ObjectPAL RTL string package apparently creates or keeps a pool of 4KB string buffers for operations such as substring extraction and string concatenation. Operations involving a string longer than 4KB involve one or more extra trips to the Windows global dynamic memory manager, which IS WRITTEN IN 16 BIT BASIC and is not very efficient. The actual limit seems to be 4095 chars, probably to allow for an empty string.

This program concatenates smaller strings into a temporary buffer, and concatenates with the longer string only when the next buffer concatenation would have been over 4095 chars anyway.

Listing 17: This is faster in the general case, where the ultimate result may be longer than 4095 chars

; assuming gasTOKENS is a string array to be
; concatenated together
proc CONCAT_TOKENS_2() string
var
psRESULT, psBUFFER string
piBFFRSIZE, piITEMSIZE, II longint
endvar
psRESULT = blank()
psBUFFER = blank()
piBFFRSIZE = 0
for II from 1 to size(gasTOKENS)
piITEMSIZE = size(gasTOKENS[II])
if (piBFFRSIZE+piITEMSIZE)>4095 then
if piITEMSIZE>4095 then
psRESULT = psRESULT + psBUFFER + gasTOKENS[II]
psBUFFER = blank()
piBFFRSIZE = 0
else
psRESULT = psRESULT + psBUFFER
psBUFFER = gasTOKENS[II]
piBFFRSIZE = piITEMSIZE
endif
else
psBUFFER = psBUFFER + gasTOKENS[II]
piBFFRSIZE = piBFFRSIZE+piITEMSIZE
endif
endfor
return psRESULT + psBUFFER
endproc
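
The buffer bookkeeping in Listing 17 is easier to follow with the branches laid out side by side. The following is a minimal Python model, not ObjectPAL. The name `concat_buffered` and the `limit` parameter are hypothetical. It shows only that the two-level scheme preserves order and content; the speed benefit described above is specific to the ObjectPAL runtime, and in Python itself `"".join(tokens)` would be the usual choice.

```python
def concat_buffered(tokens, limit: int = 4095) -> str:
    """Model of Listing 17: collect small pieces in a short buffer first."""
    result = ""   # the long string, touched as rarely as possible
    buffer = ""   # short staging string, kept at or below 'limit' characters
    for token in tokens:
        if len(buffer) + len(token) > limit:
            if len(token) > limit:
                # The token alone is too long to buffer: flush everything at once.
                result = result + buffer + token
                buffer = ""
            else:
                # Flush the buffer and start a new one with this token.
                result = result + buffer
                buffer = token
        else:
            buffer = buffer + token
    return result + buffer
```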
Listing 18: Similar to above but adds a specified separator string

; assuming gasTOKENS is a string array to be
; concatenated together
proc MENDTOGETHER( const asSEP string ) string
var
psRESULT, psBUFFER string
piBFFRSIZE, piSEPSIZE, piITEMSIZE, II longint
endvar
psRESULT = blank()
psBUFFER = blank()
piBFFRSIZE = 0
piSEPSIZE = size(asSEP)
if size(gasTOKENS)>0 then
psBUFFER = gasTOKENS[1]
piBFFRSIZE = size(psBUFFER) ; keep the buffer size in step with its contents
endif
for II from 2 to size(gasTOKENS)
piITEMSIZE = size(gasTOKENS[II])
if (piBFFRSIZE+piSEPSIZE+piITEMSIZE)>4095 then
if (piSEPSIZE+piITEMSIZE)>4095 then
if piITEMSIZE>4095 then
psRESULT = psRESULT+psBUFFER+asSEP+gasTOKENS[II]
psBUFFER = blank()
piBFFRSIZE = 0
else
psRESULT = psRESULT + psBUFFER + asSEP
psBUFFER = gasTOKENS[II]
piBFFRSIZE = piITEMSIZE
endif
else
psRESULT = psRESULT + psBUFFER
psBUFFER = asSEP + gasTOKENS[II]
piBFFRSIZE = piSEPSIZE+piITEMSIZE
endif
else
psBUFFER = psBUFFER + asSEP + gasTOKENS[II]
piBFFRSIZE = piBFFRSIZE+piSEPSIZE+piITEMSIZE
endif
endfor
return psRESULT + psBUFFER
endproc
5.2. Building Into a Textstream File

If your processor speed, RAM size, and disk size are roughly in balance, or if you have increased your file buffer pool size, it is faster to build a long string in a text file using textstream than it is to build it in memory using a string variable. If you are constructing a long string that will end up in a file anyway, try to design your program to create the file directly. This is the usual case when emitting dynamic web pages.

The syntax for writeString() allows you to cite an array in its argument list. This writes out all the elements in the array, but each one is written on a separate line. To fully control the concatenation of the elements, you must iterate over the array.

Listing 19: Concatenating the contents of a string array gasTOKENS[] directly into a text file

; assuming gasTOKENS is a string array to be
; concatenated together, and assuming gtsDST has been opened
; globally as the output textstream
proc CONCAT_TOKENS_WRITE()
var II longint endvar
for II from 1 to size(gasTOKENS)
gtsDST'writeString( gasTOKENS[II] )
endfor
endproc
Use this code if you want to add a separator character between the elements.

Listing 20: Similar to above but adds a specified separator string

; assuming gasTOKENS is a string array to be
; concatenated together, and assuming gtsDST has been opened
; globally as the output textstream
proc MENDTOGETHER_WRITE( const asSEP string )
var II longint endvar
; nothing to write if the array is empty
if size(gasTOKENS)=0 then return endif
gtsDST'writeString( gasTOKENS[1] )
for II from 2 to size(gasTOKENS)
gtsDST'writeString( asSEP, gasTOKENS[II] )
endfor
endproc
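
The separator logic of Listing 20 can be modelled in a few lines. This is a minimal Python sketch, not ObjectPAL. The name `write_joined` is hypothetical, and any object with a `write()` method is assumed to stand in for the output textstream.

```python
import io


def write_joined(dst, tokens, sep: str) -> None:
    """Model of Listing 20: write the tokens straight to the output stream."""
    first = True
    for token in tokens:
        if not first:
            dst.write(sep)  # separator goes between items, never before the first
        dst.write(token)
        first = False
```

Tracking the first item with a flag, rather than writing element 1 before the loop, means an empty array simply writes nothing.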





