
1/26/2006 8:43:06 AM    reg expression
I'm trying to write a script that checks each line in a text file for
non-alphabetic characters. I know it can be done using regular
expressions. Can anyone provide some examples of how this can be done?
I have looked through some, but they all seem so complicated and I
haven't done much scripting in this area.

Thanks in advance

1/26/2006 3:30:38 PM    Re: reg expression
Try this:  
  
Option Explicit

Dim objRegEx, strLine, objMatches, strMatch

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
' Match any character that is not a letter
objRegEx.Pattern = "[^a-zA-Z]"
strLine = "1%3a4*D8fg9"

Set objMatches = objRegEx.Execute(strLine)
If objMatches.Count > 0 Then
    For Each strMatch In objMatches
        WScript.Echo strMatch.Value
    Next
End If

Set objMatches = Nothing
Set objRegEx = Nothing
  
Here is a summary of the special characters used in patterns:

*             Matches the previous character zero or more times
+             Matches the previous character one or more times
?             Matches the previous character zero or one times
.             Matches any single character except the newline
^             Matches the start of the input
$             Matches the end of the input
x|y           Matches either x or y
(pattern)     Matches pattern and remembers the match
{number}      Matches exactly number times
{number,}     Matches number, or more, times (note comma)
{num1,num2}   Matches at least num1 and at most num2 times
[abc]         Matches any character listed between the [ ]
[^abc]        Matches all characters except those listed between the [ ]
[a-e]         Matches any character in the specified range (a,b,c,d,e)
[^K-Q]        Matches all characters except those in the specified range
\             Signifies that the next character is special or a literal
\b            Matches only on a word boundary
\B            Matches only where there is no word boundary
\d            Matches a digit
\D            Matches a non-digit
\f            Matches a form feed character
\n            Matches a new line
\r            Matches a carriage return
\s            Matches any whitespace character (space, tab, new line, etc.)
\S            Matches any non-whitespace character
\t            Matches a tab
\v            Matches a vertical tab
\w            Matches A to Z, a to z, 0 to 9, and _
\W            Matches characters other than A to Z, a to z, 0 to 9, and _
\number       Backreference: matches the same text as captured group number
\octal        Matches the character with the given octal code
\xhex         Matches the character with the given hexadecimal code (x is required)
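A few of the entries above can be tried out with a small helper. This is only a sketch: the function name PatternMatches and the sample strings are made up for illustration, and it assumes the script is run with cscript or wscript.

```vbscript
Option Explicit

' Returns True when strText contains a match for strPattern (case-sensitive).
' PatternMatches is a hypothetical helper name, not part of VBScript.
Function PatternMatches(strPattern, strText)
    Dim objRegEx
    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.Pattern = strPattern
    PatternMatches = objRegEx.Test(strText)
End Function

WScript.Echo PatternMatches("^\d+$", "2006")       ' digits only, start to end
WScript.Echo PatternMatches("^\d+$", "20a06")      ' the letter prevents a match
WScript.Echo PatternMatches("^[a-e]{3}$", "bad")   ' exactly three characters from a-e
WScript.Echo PatternMatches("(\w)\1", "letter")    ' \1 repeats group 1, here "tt"
WScript.Echo PatternMatches("\x41", "A")           ' \x41 is the character with hex code 41
```

Test only reports whether a match exists; use Execute, as in the script above, when the matched text itself is needed.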

1/26/2006 5:48:07 PM    Re: reg expression
"Star" <momo2804@gmail.com> wrote in message  
  
news:1138293786.951167.150060@g49g2000cwa.googlegroups.com...  
  
Do you want to simply identify whether there are any non-alphabetic
characters in the line, return only the non-alphabetic characters, or
discard the non-alphabetic characters?

Below are some small regular expression function examples that
accomplish each of these.
  
Just identify if line contains non-alphabetic characters:  
  
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim sLine

sLine = "a5bc%(#5gcv=_+"
MsgBox IsAlpha(sLine)

sLine = "Alphabetic"
MsgBox IsAlpha(sLine)

Function IsAlpha(sLine)
    'Returns True if all chars are alphabetic, otherwise False
    Dim oRegEx
    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True
    oRegEx.Pattern = "[^a-zA-Z]"
    IsAlpha = Not oRegEx.Test(sLine)
End Function
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
Strip all non-alphabetic characters from the string:  
  
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim sLine

sLine = "a5bc%(#5gcv=_+"
MsgBox Only_Alpha(sLine)

Function Only_Alpha(sLine)
    'Strips all non-alphabetic characters
    Dim oRegEx
    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True
    oRegEx.Pattern = "[^a-zA-Z]"
    Only_Alpha = oRegEx.Replace(sLine, Empty)
End Function
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
Strip all alphabetic characters from the string:  
  
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim sLine

sLine = "a5bc%(#5gcv=_+"
MsgBox DiscardAlpha(sLine)

Function DiscardAlpha(sLine)
    'Strips all alphabetic characters
    Dim oRegEx
    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True
    oRegEx.Pattern = "[a-zA-Z]"
    DiscardAlpha = oRegEx.Replace(sLine, Empty)
End Function
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
I hope one of these answers your request.

1/26/2006 7:08:12 PM    Re: reg expression
James,

You're spot on. What I actually want is to identify the non-alphanumeric
characters in the lines, because we're expecting the lines to contain
some foreign-language words (e.g. Japanese/Korean) and we want to list
them in a text file.

Thanks

1/26/2006 7:13:46 PM    Re: reg expression
FYI - The VBScript engine has "insider knowledge" about the RegExp component,
so the CreateObject call can be simplified to...

Set oRegEx = New RegExp
  
--  
  
Michael Harris  
  
Microsoft MVP Scripting

1/26/2006 9:54:47 PM    Re: reg expression
"Star" <momo2804@gmail.com> wrote in message  
  
news:1138331292.712076.61680@g44g2000cwa.googlegroups.com...  
  
In your original message, you mentioned "all non-alphabetic characters".
In your last message you stated "non-alphanumeric characters". Please
consider that the code I posted for identifying and removing non-alphabetic
characters does just that. If there are numbers, spaces, punctuation, etc.,
the result of 'IsAlpha' will be False. Do you want to expand this to include
numbers, spaces and normal punctuation?

I do not have any Unicode text files on hand with Japanese or Korean text.
If you would like some additional help with this, please attach a sample
text file to a reply. I have no actual experience with foreign
characters, but I would assume they would be in Unicode and have an 'AscW'
value above 255.
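That assumption can be checked without a regular expression by walking the string with AscW. The sketch below is illustrative only; HasCharAbove255 is a made-up name. One detail to be aware of: AscW returns a signed 16-bit value, so characters from &H8000 upward come back negative and have to be shifted into the 0 to 65535 range before comparing.

```vbscript
Option Explicit

' Returns True if any character in strText has a character code above 255.
' HasCharAbove255 is a hypothetical helper name used for illustration.
Function HasCharAbove255(strText)
    Dim i, lngCode
    HasCharAbove255 = False
    For i = 1 To Len(strText)
        lngCode = AscW(Mid(strText, i, 1))
        ' AscW is signed: map negative results back into 0..65535
        If lngCode < 0 Then lngCode = lngCode + 65536
        If lngCode > 255 Then
            HasCharAbove255 = True
            Exit Function
        End If
    Next
End Function

WScript.Echo HasCharAbove255("data\pathII\BUDGET")
' ChrW builds the test characters so the script file itself can stay ANSI
WScript.Echo HasCharAbove255("data\" & ChrW(&H76F8) & ChrW(&H6A5F))
```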

1/26/2006 11:41:32 PM    Re: reg expression
Actually, what I need is for the script to read a text file of directory
paths and list those paths that contain non-alphabetic characters, since
I believe the test will detect foreign characters (Chinese/Japanese) as
non-alphabetic. Here is a sample from the text file:

data\
data\
data\testpath
data\testpath
data\testpath
data\pathII\BUDGET
data\pathII\BUDGET\CAPBUD00\PURCHASE\相機fuction問題
data\pathII\BUDGET\CAPBUD02\AREAMGT\仲爭個 DK-21M D200功
data\pathII\BUDGET\CAPBUD02\MACAU OFFICE
data\pathII\BUDGET\CAPBUD02\仲爭個 DK-21M D200 功
data\pathII\BUDGET\CAPBUD02\PM&S
data\pathII\BUDGET\FY02\蔡屋圍
data\pathII\BUDGET\FY02\MACAU
data\pathII\BUDGET\FY02\PM&S辣椒小提示光區定義重申

Really appreciate your generous help.

1/27/2006 7:06:46 AM    Re: reg expression
Thanks a lot, Alex. I'll give it a try and see.

Cheers

1/27/2006 12:54:27 PM    Re: reg expression
Star wrote:

SayHUC "data\"
SayHUC "data\testpath"
SayHUC "data\pathII\BUDGET"
SayHUC "data\pathII\BUDGET\CAPBUD00\PURCHASE\相機fuction問題"
SayHUC "data\pathII\BUDGET\CAPBUD02\AREAMGT\仲爭個 DK-21M D200 功"
SayHUC "data\pathII\BUDGET\CAPBUD02\MACAU OFFICE"

Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    'matches any UTF-16 character above the ANSI range (0-255)
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function

Sub SayHUC(str)
    MsgBox "String: " & str & vbCr _
        & "HasUnicode: " & CStr(HasUnicode(str))
End Sub

Regards,
Alex

2/2/2006 8:51:55 AM    Re: reg expression
Alex,

I tried the above script by creating a MsgBox and typing in some values
to test, such as 仲爭個. However, it doesn't seem to work. Am I doing
something wrong?

Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    'matches any UTF-16 character above the ANSI range (0-255)
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function

Sub SayHUC(str)
    MsgBox "String: " & str & vbCr _
        & "HasUnicode: " & CStr(HasUnicode(str))
End Sub

2/7/2006 1:21:29 PM    Re: reg expression
Star wrote:

If you save the script, be sure to save it as Unicode, not ANSI (or
ASCII or UTF-8). That requires an editor that lets you choose the
character set; Notepad on XP has such an option.

After all, this is only a script for testing. If you want to check
whether the file paths in your text file contain Unicode characters
(whose character code is > 255), simply read the file list line by line.

Regards,
Alex
  
'# This one works for me, if filelist contains unicode-chars > 255
Option Explicit

Dim fs, reader
Const strTextFile = "C:\filelist.txt"

Set fs = CreateObject("Scripting.FileSystemObject")
Set reader = fs.OpenTextFile(strTextFile, 1, False, -1) '1 = for reading, -1 = as Unicode

While Not reader.AtEndOfStream
    sayHasUniCode reader.ReadLine
Wend
reader.Close

Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    'matches any UTF-16 character
    'bigger than the ANSI range (0-255)
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function

Sub sayHasUniCode(str)
    MsgBox "String: " & str & vbCr & "HasUnicode: " _
        & CStr(HasUnicode(str))
End Sub
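Since the original goal was to list the affected paths in a text file rather than show them one by one, the same check could write its hits to an output file. This is a rough sketch under a few assumptions: the function name ListUnicodePaths and the output file are made up, both files are treated as Unicode (UTF-16), and the pattern uses the range \u0100-\uFFFF so that characters above U+9999, such as Korean Hangul, are also caught.

```vbscript
Option Explicit

Const ForReading = 1, ForWriting = 2
Const TristateTrue = -1   ' open or create the file as Unicode (UTF-16)

' True if strText contains at least one character above the ANSI range.
Function HasUnicode(strText)
    Dim re
    Set re = New RegExp
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(strText)
End Function

' Copies every line of strInFile that contains such a character to
' strOutFile and returns the number of lines written.
' ListUnicodePaths is a hypothetical name used for illustration.
Function ListUnicodePaths(strInFile, strOutFile)
    Dim fs, reader, writer, strLine, lngCount
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set reader = fs.OpenTextFile(strInFile, ForReading, False, TristateTrue)
    Set writer = fs.OpenTextFile(strOutFile, ForWriting, True, TristateTrue)
    lngCount = 0
    Do Until reader.AtEndOfStream
        strLine = reader.ReadLine
        If HasUnicode(strLine) Then
            writer.WriteLine strLine
            lngCount = lngCount + 1
        End If
    Loop
    reader.Close
    writer.Close
    ListUnicodePaths = lngCount
End Function
```

It could be called as ListUnicodePaths "C:\filelist.txt", "C:\unicodepaths.txt", where the second path is only an example name.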