Email Address Validation

Contributed by Rick Kelly
30 April 2004

Rick provides us with this article and sample code for the challenging task of validating email addresses.

© 2004 Rick Kelly www.crooit.com

Preface

The example OPAL (Paradox® 9) code presented in this article is available as a download. After downloading it into the folder of your choice, make that folder :WORK: and run the included script for a demonstration.

Introduction

An interesting aspect of email addresses is that there is no established set of rules for validating them. In developing an OPAL-based validator, there is a balancing act between how tight or how open the validation process will be. The approach taken here is not the only methodology that could be applied, but the process outlined does have a solid technical foundation in the published RFC standards for the SMTP and POP3 mail protocols.

Note that the validation checks syntax only. A logical follow-up would be to connect to the domain's mail server (found through its MX record in DNS) for additional validity checking. Such a connection takes time, so the developer needs to weigh the requirements and trade-offs before deciding whether to deploy it.

Email Address Syntax

The basic email address syntax structure is:

<account@domain>

The bounding <> pair is optional, does not affect the general syntax, and is stripped out if found. A leading account precedes the domain and is separated from it by an @ character. Both an account and a domain are required for a valid email address.

Account

After review of RFC 822 (http://www.faqs.org/rfcs/rfc822.html), it seems that the account portion can potentially contain a wide range of character values. Although there are some rules covering its syntax, our validator will only ensure that some account is present and that it is terminated by the last @ character found. This means that the account itself could contain @, and the separation logic has to take that into account.
The first step for our validator is to separate the account and domain portions at the last @ separator found. The OPAL methods searchEx() and substr() will be the main agents here, with breakApart() used later to split the domain. The first obstacle is locating the last @ character. Since searchEx() scans from left to right, finding the last occurrence seems to call for some sort of repetitive loop. Rather than loop, we reverse the entire email address so that a single searchEx() call locates the correct @ character. In effect, after the reversal we are searching from right to left, which is exactly what we want in this case. Along the way, we will check for missing or empty account and/or domain segments.

A generic string reversal procedure might look like:

Proc cmReverseString(var stInput String) String
;
; This function takes an input string and returns it reversed
;
var
   stOutput String
   liIndex  LongInt
   liSize   LongInt
endVar
stOutput = blank()
liSize = stInput.sizeEx()
switch
   case liSize > 0 :
      ;
      ; Walk the input from its last character to its first,
      ; appending each character to the output
      ;
      for liIndex from liSize to 1 step -1
         stOutput = stOutput + stInput.substr(liIndex,1)
      endFor
endSwitch
return stOutput
endProc
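OPAL code cannot be run outside Paradox, so here is a minimal Python sketch of the same reverse-and-search trick for readers who want to experiment with it. The function name last_at_position is hypothetical. Positions are 1-based to match searchEx() and substr(), and 0 means "not found". The sketch checks itself against Python's str.rfind(), which searches from the right directly.

```python
def last_at_position(address: str) -> int:
    """Return the 1-based position of the last '@' in address, or 0 if absent.

    Mirrors the OPAL approach: reverse the string, find the first '@',
    then convert that position back to the original string.
    """
    reversed_address = address[::-1]        # same effect as cmReverseString()
    index = reversed_address.find("@")      # 0-based, -1 when missing
    if index == -1:
        return 0
    reversed_position = index + 1           # 1-based, like searchEx()
    return len(address) - reversed_position + 1


# Cross-check against Python's built-in right-to-left search.
for sample in ["myname@myisp.net", "odd@name@example.com", "no-at-sign"]:
    assert last_at_position(sample) == sample.rfind("@") + 1
    print(sample, last_at_position(sample))
```

In Python, rfind() makes the reversal unnecessary. The trick earns its keep in OPAL, where searchEx() only scans from left to right.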
One additional feature of the validator is that it returns error codes of LongInt type, which can be used to pinpoint the problem and build custom error messages.

Now that we have a string reversal procedure, the extraction and separation of the account and domain looks like:

Proc cmSeparateAccountAndDomain(
   var stEmailAddress String,
   var stEmailAccount String,
   var stEmailDomain String,
   var liError LongInt) Logical
;
; Given an email address, separate and return the
; account and domain portions.
;
; The leading account is separated from the domain portion
; by the rightmost @ character.
;
; liError is set to 0 on success, or to one of:
;   1 = no address, 2 = no @ found,
;   3 = account missing, 4 = domain missing
;
var
   loReturn   Logical
   stAny      String
   liPosition LongInt
endVar
loReturn = False
liError = 0
stEmailAccount = blank()
stEmailDomain = blank()
;
; Strip leading and trailing white space
;
stEmailAddress = stEmailAddress.rTrim()
stEmailAddress = stEmailAddress.lTrim()
switch
   ;
   ; Missing email address
   ;
   case stEmailAddress.isBlank() = True or
        stEmailAddress = "<>" :
      liError = 1
   ;
   ; No @ account/domain separator found
   ;
   case stEmailAddress.searchEx("@") = 0 :
      liError = 2
   otherwise :
      ;
      ; The address may be enclosed by < and >; if so, they are
      ; removed along with any leading or trailing white space
      ; inside them.
      ;
      switch
         case stEmailAddress.substr(1,1) = "<" and
              stEmailAddress.substr(stEmailAddress.sizeEx(),1) = ">" :
            stEmailAddress = stEmailAddress.substr(2,stEmailAddress.sizeEx() - 2)
            stEmailAddress = stEmailAddress.rTrim()
            stEmailAddress = stEmailAddress.lTrim()
      endSwitch
      ;
      ; To determine the position of the last @ character, we
      ; reverse the string, locate the first @ character and
      ; map that position back onto the original string.
      ;
      stAny = cmReverseString(stEmailAddress)
      liPosition = stAny.searchEx("@")
      liPosition = stAny.sizeEx() - liPosition + 1
      switch
         ;
         ; If the @ character is at the beginning or end of the
         ; address, the account or domain is missing
         ;
         case liPosition = 1 :
            liError = 3
         case liPosition = stAny.sizeEx() :
            liError = 4
         otherwise :
            stEmailAccount = stEmailAddress.substr(1,liPosition - 1)
            stEmailDomain = stEmailAddress.substr(liPosition + 1,
                            stEmailAddress.sizeEx() - liPosition)
            loReturn = True
      endSwitch
endSwitch
return loReturn
endProc
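The same separation logic can be sketched in Python. It uses the error codes 1 to 4 from the OPAL procedure and returns them alongside the two parts instead of through var parameters. The name separate_account_and_domain is hypothetical. Python's str.strip() removes all white space characters, which may not match OPAL's rTrim()/lTrim() exactly.

```python
def separate_account_and_domain(address: str) -> tuple[str, str, int]:
    """Split address into (account, domain, error) at the rightmost '@'.

    Error codes follow the article: 0 = ok, 1 = no address,
    2 = no '@', 3 = account missing, 4 = domain missing.
    """
    address = address.strip()                 # rTrim() + lTrim()
    if address == "" or address == "<>":
        return "", "", 1
    if "@" not in address:
        return "", "", 2
    # Remove an enclosing <...> pair and any white space it exposed.
    if address.startswith("<") and address.endswith(">"):
        address = address[1:-1].strip()
    position = address.rfind("@")             # 0-based index of the last '@'
    if position == 0:
        return "", "", 3                      # '@' is the first character
    if position == len(address) - 1:
        return "", "", 4                      # '@' is the last character
    return address[:position], address[position + 1:], 0


print(separate_account_and_domain("  <odd@name@example.com>  "))
```

Because the split happens at the last @, an account such as odd@name survives intact, which is the behavior the Account section calls for.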
Domain

At this point, we have separated the account and domain address components and verified that both are present. Any leading or trailing white space and/or enclosing <> characters have been removed.

The domain is where most of the validation work is performed. The domain consists of one or more sub-domains separated by single dots (.), each of which must contain one or more of the allowable characters 0-9, a-z and dash (-). The rightmost sub-domain found is considered to be the top level domain (TLD).

Domain Syntax:

subdomain.subdomain.TLD

Each "subdomain" shown above is optional; only the TLD is required. The Internet Corporation for Assigned Names and Numbers (ICANN) maintains a list of TLDs at: http://www.icann.org/tlds/

To maximize flexibility, our validator uses a standard Windows profile (*.ini) file to store the TLDs to validate against. The file can be customized and/or maintained separately for each application.

[TLDS]
LastTLDId=257
1=ac
2=ad
...
257=biz

The TLDs are loaded into an Array of type String using the following procedure.

Type
   arString = Array[] String
endType
Proc cmRetrieveTLDValidators(var arTLD arString,
                             stTLDFileName String)
;
; Parse the TLD.ini profile file for valid TLDs
; to validate against
;
var
   liTotalTLD LongInt
   liTLDID    LongInt
   stSection  String
   stTLD      String
endVar
arTLD.empty()
stSection = "TLDS"
try
   liTotalTLD = longInt(cmReadINI(stTLDFileName,stSection,"LastTLDId"))
onFail
   ;
   ; Missing or non-numeric LastTLDId: treat as no TLDs
   ;
   liTotalTLD = 0
   errorClear()
endTry
switch
   case liTotalTLD > 0 :
      for liTLDID from 1 to liTotalTLD
         stTLD = lower(cmReadINI(stTLDFileName,stSection,strval(liTLDID)))
         switch
            case stTLD.isBlank() = False :
               arTLD.addLast(stTLD)
         endSwitch
      endFor
endSwitch
endProc
Proc cmReadINI(stINIFile String,stSection String,stKey String) String
;
; Thin wrapper around the built-in profile reader
;
return readProfileString(stINIFile, stSection, stKey)
endProc
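Outside Paradox, the same TLD file can be read with Python's standard configparser module. This is a sketch under a few assumptions. The name retrieve_tld_validators is hypothetical. The function parses INI text passed in as a string; for a file on disk, ConfigParser.read(path) would be used instead. A missing or non-numeric LastTLDId produces an empty list, mirroring the onFail branch of cmRetrieveTLDValidators.

```python
import configparser


def retrieve_tld_validators(ini_text: str) -> list[str]:
    """Return the lower-cased TLDs listed in a [TLDS] section.

    LastTLDId holds the highest key number and keys 1..LastTLDId hold
    one TLD each. Missing or blank entries are skipped.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    try:
        # fallback covers both a missing section and a missing key
        total = int(parser.get("TLDS", "LastTLDId", fallback="0"))
    except ValueError:
        total = 0                              # non-numeric LastTLDId
    tlds = []
    for tld_id in range(1, total + 1):
        tld = parser.get("TLDS", str(tld_id), fallback="").strip().lower()
        if tld:
            tlds.append(tld)
    return tlds


sample_ini = """
[TLDS]
LastTLDId=4
1=ac
2=AD
4=biz
"""
print(retrieve_tld_validators(sample_ini))
```

Keying entries by number, as the article's file does, lets an entry be blanked out without renumbering the rest; the loop simply skips it.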
Each sub-domain and the TLD are also scanned to confirm that they contain only the characters 0-9, a-z and dash (-).

Proc cmDomainValidCharacters(var stDomain String) Logical
;
; Check that a domain or sub domain contains only the
; characters a-z, 0-9 or -
;
var
   loReturn Logical
   liIndex  LongInt
   stChar   String
endVar
loReturn = True
for liIndex from 1 to stDomain.sizeEx()
   stChar = stDomain.substr(liIndex,1)
   switch
      case (stChar >= "a" and stChar <= "z") or
           (stChar >= "0" and stChar <= "9") or
           stChar = "-" :
         ; Allowed character, keep scanning
      otherwise :
         ; Invalid character found, no need to scan further
         loReturn = False
         quitLoop
   endSwitch
endFor
return loReturn
endProc
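In Python the same character check collapses to a single expression. This minimal sketch assumes the label has already been lower-cased, as cmDomainValidation does before calling cmDomainValidCharacters, so upper-case letters are rejected here. The names are hypothetical.

```python
import string

# Characters the article allows in a sub-domain or TLD (after lower-casing).
ALLOWED_DOMAIN_CHARACTERS = frozenset(string.ascii_lowercase + string.digits + "-")


def domain_valid_characters(label: str) -> bool:
    """Return True if label uses only a-z, 0-9 and '-'."""
    # all() stops at the first bad character, like the early exit in the OPAL loop.
    return all(char in ALLOWED_DOMAIN_CHARACTERS for char in label)


print(domain_valid_characters("my-isp2"), domain_valid_characters("my_isp"))
```

An explicit character set is used rather than str.isalnum(), because isalnum() also accepts non-ASCII letters and digits.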
Our validator also supports validation both with a TLD required and with TLD validation optional. For example, myname@mydomain is a valid email address when TLD validation is optional. This flexibility is useful when internal email networks assume and append a TLD. The loSubDomainsRequired parameter selects the mode. When it is True, the domain must contain at least one sub-domain followed by a TLD from the list. When it is False, a single-label domain such as mydomain is also accepted.

Our domain validation procedure looks like this:

Proc cmDomainValidation(
   var stEmailDomain String,
   var arSubDomains arString,
   var stTopLevelDomain String,
   var arTLD arString,
   var liError LongInt,
   loSubDomainsRequired Logical) Logical
;
; Validate the domain portion of an email address
;
; Rules are:
;
; 1. The domain is split into one or more sub-domains
; 2. Sub domains are always separated by a period ('.')
; 3. Sub domain valid character set is a-z, 0-9 and -
; 4. Each sub domain must be at least one character in size
; 5. The last sub domain might be a top level domain name
;
; liError is set to 0 on success, or to one of:
;   5 = empty sub domain, 6 = invalid character,
;   7 = sub domain missing, 8 = invalid top level domain
;
var
   loReturn       Logical
   stDomain       String
   arDomains      arString
   liIndex        LongInt
   stAny          String
   liTotalDomains LongInt
endVar
loReturn = True
liError = 0
arSubDomains.empty()
stTopLevelDomain = blank()
;
; Check if the domain ends with a period
;
switch
   case stEmailDomain.substr(stEmailDomain.sizeEx(),1) = "." :
      loReturn = False
      liError = 8
   otherwise :
      ;
      ; Put the raw domain address portion in lower case to
      ; allow easier validation
      ;
      stDomain = stEmailDomain.lower()
      stDomain.breakApart(arDomains,".")
      liTotalDomains = arDomains.size()
      for liIndex from 1 to liTotalDomains
         stAny = arDomains[liIndex]
         switch
            ;
            ; If an empty string is found, there must have
            ; been two or more consecutive periods in the
            ; domain address portion
            ;
            case stAny.isBlank() = True :
               loReturn = False
               liError = 5
               quitLoop
            ;
            ; Validate domain character set values
            ;
            case cmDomainValidCharacters(stAny) = False :
               loReturn = False
               liError = 6
               quitLoop
            ;
            ; If this is the last possible sub domain, treat it as
            ; a possible top level domain
            ;
            case liIndex = liTotalDomains :
               stTopLevelDomain = stAny
            ;
            ; Save sub domain
            ;
            otherwise :
               arSubDomains.addLast(stAny)
         endSwitch
      endFor
      ;
      ; Validate against TLD list
      ;
      switch
         case loReturn = False :
            ; Error already recorded above
         case arSubDomains.size() = 0 and loSubDomainsRequired = True :
            loReturn = False
            liError = 7
         case arSubDomains.size() = 0 :
            ;
            ; Single-label domain (e.g. mydomain) and sub domains
            ; are not required: accept it without a TLD check
            ;
            loReturn = True
         otherwise :
            loReturn = arTLD.contains(stTopLevelDomain)
            liError = iif(loReturn = True,0,8)
      endSwitch
endSwitch
return loReturn
endProc
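The rules listed in cmDomainValidation's comments can be sketched in Python as follows. The function domain_validation and its sub_domains_required flag are hypothetical names standing in for the OPAL procedure and loSubDomainsRequired; the error codes 5 to 8 follow the article. One interpretation is made explicit: when sub-domains are not required, a single-label domain such as mydomain is accepted without a TLD check, which is the TLD-optional behavior the text describes. Python's str.split() stands in for breakApart(); it produces an empty string for each pair of consecutive periods.

```python
import string

ALLOWED = frozenset(string.ascii_lowercase + string.digits + "-")


def domain_validation(domain: str, tlds: list[str],
                      sub_domains_required: bool = True) -> tuple[list[str], str, int]:
    """Validate the domain part; return (sub_domains, top_level_domain, error).

    Error codes follow the article: 0 = ok, 5 = empty sub-domain,
    6 = bad character, 7 = sub-domain missing, 8 = invalid TLD.
    """
    if domain.endswith("."):
        return [], "", 8
    labels = domain.lower().split(".")        # '..' yields an empty label
    for label in labels:
        if label == "":
            return [], "", 5
        if not all(char in ALLOWED for char in label):
            return [], "", 6
    sub_domains, top_level_domain = labels[:-1], labels[-1]
    if not sub_domains:
        if sub_domains_required:
            return [], top_level_domain, 7
        # Single-label domain such as "mydomain": accepted without a TLD check.
        return [], top_level_domain, 0
    if top_level_domain not in tlds:
        return sub_domains, top_level_domain, 8
    return sub_domains, top_level_domain, 0


print(domain_validation("Mail.MyISP.NET", ["net", "com", "biz"]))
```

Lower-casing once before splitting, as the OPAL code does, keeps both the character check and the TLD comparison case-insensitive.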
Now that all the individual validation procedures are defined, here is an example of how they might be used. First, a method that turns an error code into a readable description:
method ValidationErrorDescription(var liError LongInt) String
;
; Return validation error description
;
var
   stErrorMessage String
endVar
switch
   case liError = 1 :
      stErrorMessage = "No email address was found."
   case liError = 2 or liError = 4 :
      stErrorMessage = "Email address domain is missing."
   case liError = 3 :
      stErrorMessage = "Email address account is missing."
   case liError = 5 :
      stErrorMessage = "Email address domain cannot be blank."
   case liError = 6 :
      stErrorMessage = "Email address domain can only use a-z, 0-9 and -."
   case liError = 7 :
      stErrorMessage = "Email address sub domain is missing."
   case liError = 8 :
      stErrorMessage = "Email address top level domain is invalid."
   otherwise :
      stErrorMessage = "Unknown ("
                     + strval(liError)
                     + ") email address validation error"
endSwitch
return stErrorMessage
endMethod
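In Python, the switch in ValidationErrorDescription maps naturally onto a dictionary lookup with a default. A minimal sketch, with hypothetical names, reusing the article's message texts:

```python
# Messages keyed by the error codes returned by the validation procedures.
ERROR_DESCRIPTIONS = {
    1: "No email address was found.",
    2: "Email address domain is missing.",
    3: "Email address account is missing.",
    4: "Email address domain is missing.",
    5: "Email address domain cannot be blank.",
    6: "Email address domain can only use a-z, 0-9 and -.",
    7: "Email address sub domain is missing.",
    8: "Email address top level domain is invalid.",
}


def validation_error_description(error: int) -> str:
    """Return the message for an error code, with a fallback for unknown codes."""
    return ERROR_DESCRIPTIONS.get(
        error, f"Unknown ({error}) email address validation error")


print(validation_error_description(3))
print(validation_error_description(42))
```

Keeping the messages in one table makes it easy to localize them or add new codes without touching the lookup logic.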
Putting everything together:
var
   stEmailAddress   String
   stEmailAccount   String
   stEmailDomain    String
   arSubDomains     arString
   stTopLevelDomain String
   arTLD            arString
   liError          LongInt
   loValid          Logical
endVar
cmRetrieveTLDValidators(arTLD,":WORK:TLD.ini")
stEmailAddress = "myname@myisp.net"
;
; Split the address first; only validate the domain if that succeeds
;
loValid = cmSeparateAccountAndDomain(stEmailAddress,
                                     stEmailAccount,
                                     stEmailDomain,
                                     liError)
if loValid = True then
   loValid = cmDomainValidation(stEmailDomain,
                                arSubDomains,
                                stTopLevelDomain,
                                arTLD,
                                liError,
                                True)
endIf
switch
   case loValid = False :
      msgStop("Email address validation failed",
              stEmailAddress +
              "\n\n" +
              ValidationErrorDescription(liError))
   otherwise :
      msgInfo(stEmailAddress,"Email address format validated successfully")
endSwitch
Conclusion

We now have methods that provide basic syntax validation of email addresses. Use the parts that work for you, and if you add to or improve on what is shown here, share it with the rest of us. From my Paradox toolbox to yours!

Rick Kelly