1/26/2006 8:43:06 AM reg expression |
I'm trying to write a script that checks each line of a text file for
non-alphabetic characters. I know it can be done with regular
expressions.
Can anyone provide some examples of how to do this? I have looked
through a few, but they all seem complicated and I haven't done much
scripting in this area.
Thanks in advance.
|
|
|
|
|
1/26/2006 3:30:38 PM Re: reg expression |
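Try this:
Option Explicit
Dim objRegEx, strLine, objMatches, strMatch
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True    ' find every match, not just the first
' Inside [ ] a comma or a second ^ is taken literally, so the two
' ranges are simply written side by side.
objRegEx.Pattern = "[^a-zA-Z]"
strLine = "1%3a4*D8fg9"
Set objMatches = objRegEx.Execute(strLine)
If objMatches.Count > 0 Then
    ' Echo each non-alphabetic character that was found
    For Each strMatch In objMatches
        WScript.Echo strMatch.Value
    Next
End If
Set objMatches = Nothing
Set objRegEx = Nothing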
Here is the meaning of the various pattern characters:
*             Matches the previous character zero or more times
+             Matches the previous character one or more times
?             Matches the previous character zero or one times
.             Matches any single character except the newline
^             Matches the start of the input
$             Matches the end of the input
x|y           Matches either x or y
(pattern)     Matches pattern and remembers the match
{number}      Matches exactly number times
{number,}     Matches number, or more, times (note comma)
{num1,num2}   Matches at least num1 and at most num2 times (no space)
[abc]         Matches any one character listed between the [ ]
[^abc]        Matches any one character except those listed between the [ ]
[a-e]         Matches any one character in the specified range (a,b,c,d,e)
[^K-Q]        Matches any one character outside the specified range
\             Signifies that the next character is special or a literal
\b            Matches only on a word boundary
\B            Matches only where there is no word boundary
\d            Matches only on a digit
\D            Matches only on a non-digit
\f            Matches only on a form feed character
\n            Matches only on a new line
\r            Matches only on a carriage return
\s            Matches any whitespace character (space, tab, new line, etc.)
\S            Matches any non-whitespace character
\t            Matches only on a tab
\v            Matches only on a vertical tab
\w            Matches only on A to Z, a to z, 0 to 9, and _
\W            Matches characters other than A to Z, a to z, 0 to 9, and _
\number       Back-reference: matches the text captured by group number
\octal        Matches the character with the given octal code
\xhex         Matches the character with the given hexadecimal code (x is required)
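A few of these entries are easier to judge in action. The following is only a rough sketch, not part of the original reply; the test strings are made up for illustration. It tries a bounded quantifier, the ^ and $ anchors, and a \1 back-reference.

```vbscript
Option Explicit
Dim re
Set re = New RegExp

' \d with a {min,max} quantifier: a run of two to four digits
re.Pattern = "\d{2,4}"
WScript.Echo CStr(re.Test("abc 123"))     ' expected: True  ("123")
WScript.Echo CStr(re.Test("abc 1 2"))     ' expected: False (no two digits in a row)

' ^ and $ tie the pattern to the whole input
re.Pattern = "^[a-zA-Z]+$"
WScript.Echo CStr(re.Test("Alphabetic"))  ' expected: True
WScript.Echo CStr(re.Test("a5bc"))        ' expected: False

' \1 refers back to whatever the first ( ) group captured
re.Pattern = "(\w)\1"
WScript.Echo CStr(re.Test("letter"))      ' expected: True  ("tt")
WScript.Echo CStr(re.Test("word"))        ' expected: False
```

Run it with cscript so each result is printed on its own line rather than in a message box.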
"Star" <momo2804@gmail.com> wrote in message
news:1138293786.951167.150060@g49g2000cwa.googlegroups.com...
|
|
|
1/26/2006 5:48:07 PM Re: reg expression |
"Star" <momo2804@gmail.com> wrote in message
news:1138293786.951167.150060@g49g2000cwa.googlegroups.com...
Do you want to simply identify whether the line contains any
non-alphabetic characters, return only the non-alphabetic characters, or
discard the non-alphabetic characters?
Below are some small regular expression function examples that
accomplish each of these.
Just identify whether the line contains non-alphabetic characters:
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim sLine
sLine = "a5bc%(#5gcv=_+"
MsgBox IsAlpha(sLine)    ' displays False
sLine = "Alphabetic"
MsgBox IsAlpha(sLine)    ' displays True
Function IsAlpha(sLine)
    'Returns True if all chars are alphabetic, otherwise False
    Dim oRegEx
    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True
    oRegEx.Pattern = "[^a-zA-Z]"
    'Test is True when a non-alphabetic char is found, so invert it
    IsAlpha = Not oRegEx.Test(sLine)
End Function
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Strip all non-alphabetic characters from the string:
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim sLine
sLine = "a5bc%(#5gcv=_+"
MsgBox Only_Alpha(sLine)
Function Only_Alpha(sLine)
    'Strips all non-alphabetic characters
    Dim oRegEx
    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True    'replace every match, not just the first
    oRegEx.Pattern = "[^a-zA-Z]"
    Only_Alpha = oRegEx.Replace(sLine, "")
End Function
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Strip all alphabetic characters from the string:
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim sLine
sLine = "a5bc%(#5gcv=_+"
MsgBox DiscardAlpha(sLine)
Function DiscardAlpha(sLine)
    'Strips all alphabetic characters
    Dim oRegEx
    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True    'replace every match, not just the first
    oRegEx.Pattern = "[a-zA-Z]"
    DiscardAlpha = oRegEx.Replace(sLine, "")
End Function
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I hope one of these answers your question.
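Since the original question was about checking every line of a text file, here is a minimal sketch of how the same pattern might be applied line by line. The path C:\lines.txt is a hypothetical placeholder, and the file is assumed to be plain ANSI text.

```vbscript
Option Explicit
Const ForReading = 1
Const INPUT_FILE = "C:\lines.txt"   ' hypothetical path, adjust as needed

Dim fso, reader, re, lineText, lineNo
Set fso = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "[^a-zA-Z]"            ' any single non-alphabetic character

Set reader = fso.OpenTextFile(INPUT_FILE, ForReading)
lineNo = 0
Do Until reader.AtEndOfStream
    lineText = reader.ReadLine
    lineNo = lineNo + 1
    ' Report only the lines that contain a non-alphabetic character
    If re.Test(lineText) Then
        WScript.Echo "Line " & lineNo & ": " & lineText
    End If
Loop
reader.Close
```

Note that spaces, digits and punctuation all count as non-alphabetic here, so most real-world lines will be reported unless the character class is widened.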
|
|
|
1/26/2006 7:08:12 PM Re: reg expression |
James,
You're spot on. What I actually want is to identify the non-alphanumeric
characters in the lines. This is because we're expecting the lines to
contain some foreign-language words (e.g. Japanese or Korean), and we
want to list them in a text file.
Thanks
|
|
|
1/26/2006 7:13:46 PM Re: reg expression |
FYI - The VBScript engine has "insider knowledge" about the RegExp
component, so the CreateObject call can be simplified to...
Set oRegEx = New RegExp
--
Michael Harris
Microsoft MVP Scripting
|
|
|
1/26/2006 9:54:47 PM Re: reg expression |
"Star" <momo2804@gmail.com> wrote in message
news:1138331292.712076.61680@g44g2000cwa.googlegroups.com...
In your original message, you mentioned "all non-alphabetic characters".
In your last message you wrote "non-alphanumeric characters". Please
note that the code I posted identifies and removes non-alphabetic
characters only: if the line contains numbers, spaces, punctuation, etc.,
'IsAlpha' returns False. Do you want to expand this to allow numbers,
spaces and normal punctuation?
I do not have any Unicode text files on hand with Japanese or Korean text.
If you would like some additional help with this, please attach a sample
text file to a reply. I have no actual experience with foreign
characters, but I would assume they are stored as Unicode and have an
'AscW' value above 255.
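A minimal sketch of that AscW idea, assuming the text has already been read in as Unicode. HasCharAbove255 is a hypothetical helper name, not something from the posts above. One detail to watch: AscW returns a signed 16-bit value, so characters from U+8000 upward come back negative and have to be adjusted before the comparison.

```vbscript
Option Explicit

' Hypothetical helper: True if any character has a code point above 255
Function HasCharAbove255(sText)
    Dim i, code
    HasCharAbove255 = False
    For i = 1 To Len(sText)
        code = AscW(Mid(sText, i, 1))
        ' AscW is signed 16-bit: shift negative results into 0..65535
        If code < 0 Then code = code + 65536
        If code > 255 Then
            HasCharAbove255 = True
            Exit Function
        End If
    Next
End Function

' ChrW builds the test character, so the script file itself can stay ANSI
MsgBox HasCharAbove255("data\testpath")          ' displays False
MsgBox HasCharAbove255("data\" & ChrW(&H76F8))   ' displays True
```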
|
|
|
1/26/2006 11:41:32 PM Re: reg expression |
Actually, what I need is for the script to read a text file of directory
paths and list those paths that contain non-alphabetic characters, since
I believe the test will detect foreign characters (Chinese/Japanese) as
non-alphabetic. Here is a sample from the text file:
data\
data\
data\testpath
data\testpath
data\testpath
data\pathII\BUDGET
data\pathII\BUDGET\CAPBUD00\PURCHASE\相機fuction問題
data\pathII\BUDGET\CAPBUD02\AREAMGT\仲爭個 DK-21M D200功
data\pathII\BUDGET\CAPBUD02\MACAU OFFICE
data\pathII\BUDGET\CAPBUD02\仲爭個 DK-21M D200 功
data\pathII\BUDGET\CAPBUD02\PM&S
data\pathII\BUDGET\FY02\蔡屋圍
data\pathII\BUDGET\FY02\MACAU
data\pathII\BUDGET\FY02\PM&S辣椒小提示光區定義重申
I really appreciate your generous help.
|
|
|
1/27/2006 7:06:46 AM Re: reg expression |
Thanks a lot, Alex. I'll give it a try and see.
Cheers
|
|
|
1/27/2006 12:54:27 PM Re: reg expression |
Star wrote:
SayHUC "data\"
SayHUC "data\testpath"
SayHUC "data\pathII\BUDGET"
SayHUC "data\pathII\BUDGET\CAPBUD00\PURCHASE\相機fuction問題"
SayHUC "data\pathII\BUDGET\CAPBUD02\AREAMGT\仲爭個 DK-21M D200 功"
SayHUC "data\pathII\BUDGET\CAPBUD02\MACAU OFFICE"
Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    'matches any UTF-16 (aka Unicode) char above the ANSI range (0-255);
    'the upper bound is \uFFFF so that Korean Hangul is covered as well
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function
Sub SayHUC(str)
    MsgBox "String: " & str & vbCr _
        & "HasUnicode: " & CStr(HasUnicode(str))
End Sub
Best regards,
Alex
|
|
|
2/2/2006 8:51:55 AM Re: reg expression |
Alex,
I tried the above script by creating a MsgBox and typing in some values
to test, such as 仲爭個.
However, it doesn't seem to work. Am I doing something wrong?
Function HasUnicode(str)
    Set re = New RegExp
    'matches any UTF-16 (aka Unicode) char above the ANSI range (0-255)
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function
Sub SayHUC(str)
    MsgBox "String: " & str & vbCr _
        & "HasUnicode: " & CStr(HasUnicode(str))
End Sub
|
|
|
2/7/2006 1:21:29 PM Re: reg expression |
Star wrote:
When you save the script, be sure to save it as Unicode, not ANSI (or
ASCII or UTF-8). This requires an editor that lets you choose the
character width/character set when saving; Notepad for XP has such an
option.
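If saving the script as Unicode is awkward, one possible workaround is to build the test strings with ChrW, so the script file can stay plain ANSI. This is only a sketch: the code points U+76F8 and U+6A5F are picked as sample CJK characters, and the \uFFFF upper bound on the range is an assumption meant to cover the whole 16-bit range.

```vbscript
Option Explicit

Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    ' any character above the ANSI range (0-255)
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function

Dim sPlain, sMixed
sPlain = "data\pathII\BUDGET"
' ChrW turns a numeric code point into the character at run time
sMixed = "data\pathII\BUDGET\" & ChrW(&H76F8) & ChrW(&H6A5F) & "fuction"

MsgBox "Plain: " & CStr(HasUnicode(sPlain)) & vbCr & _
       "Mixed: " & CStr(HasUnicode(sMixed))
```

The message box should report False for the plain path and True for the mixed one, whatever encoding the .vbs file was saved in.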
After all, this is only a test script. If you want to check whether the
file paths in your text file contain Unicode characters (those with a
character code > 255), simply read the file list line by line.
Best regards,
Alex
'# This one works for me, if filelist contains Unicode chars > 255
Option Explicit
Dim fs, reader
Const strTextFile = "C:\filelist.txt"
Set fs = CreateObject("Scripting.FileSystemObject")
'1 = ForReading, False = do not create, -1 = open as Unicode
Set reader = fs.OpenTextFile(strTextFile, 1, False, -1)
While Not reader.AtEndOfStream
    sayHasUniCode reader.ReadLine
Wend
reader.Close
Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    'matches all UTF-16 (aka Unicode) characters
    'bigger than the ANSI range (0-255)
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function
Sub sayHasUniCode(str)
    MsgBox "String: " & str & vbCr & "HasUnicode: " _
        & CStr(HasUnicode(str))
End Sub
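Since the goal mentioned earlier in the thread was to list the affected paths in a text file, here is a hedged variation that writes the matching lines out instead of showing a message box for each one. The output path C:\unicode_paths.txt is a hypothetical name, and the input file is assumed to be saved as Unicode, as in the script above.

```vbscript
Option Explicit
Const ForReading = 1
Const TristateTrue = -1                    ' open the file as Unicode
Const IN_FILE = "C:\filelist.txt"
Const OUT_FILE = "C:\unicode_paths.txt"    ' hypothetical output path

Dim fso, reader, writer, re, pathLine
Set fso = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "[\u0100-\uFFFF]"             ' any character above 255

Set reader = fso.OpenTextFile(IN_FILE, ForReading, False, TristateTrue)
' CreateTextFile(name, overwrite, unicode): the report is Unicode too
Set writer = fso.CreateTextFile(OUT_FILE, True, True)

Do Until reader.AtEndOfStream
    pathLine = reader.ReadLine
    ' Keep only the paths that contain a character above 255
    If re.Test(pathLine) Then writer.WriteLine pathLine
Loop

writer.Close
reader.Close
```

Writing the report as Unicode keeps the Chinese or Japanese characters intact; an ANSI output file would most likely raise an error or lose them.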
|
|