Regular Expression

 Varchar fields can be a drag if you do not have a constraint or any other rule what can be inserted and what not. But even with this, one must not omitt to such restrictions.

Imagine a typical Customer table with column Name.  Should I create restriction not allowing end user to insert signs such as hyphen, comma, semicolon or any other delimiter. People having two names might be agitated and might think "what a lousy application". Well we do not want that. :)

But for DBA this might be a nightmare when you want to return all the users with first name "Ana"; one might write endless combinations of all extra characters to include as many as possible (preferably all) users named Ana.

Imagine following situation.


--Create table
create table TBL_Customer
(CustID int
,name varchar(30)
,surname varchar(50)

--Populate table
insert into TBL_Customer
select 1, 'ana, ','doe' union all
select 2, 'ana-maria','doe smith' union all
select 3, 'mara-ana','johnson' union all
select 4, 'manara','jackson' union all
select 5, 'thomas','doe' union all
select 6, 'Maannaara','doubled' union all
select 7, 'Inna','justbe' union all
select 8, 'ana','smithson' union all
select 9, 'ana.mick','blacksmith' union all
select 10, 'ana; mick','smithscon' union all
select 11, ';ana,maria ','strichpunkt' union all
select 12, ' tom ','withspaces' union all
select 13, 'anko','justatebanana' union all
select 14, ',','zweig' 


I believe this is not far from reality and i'm 100% sure one might find even more "varieties" of all the garbage that comes into a single table.

Now, let us retrieve all the users named "Ana":

--solution #1
select * from TBL_Customer
    name not like 'ana'
and name not like 'ana-%'
and name not like 'ana %'
and name not like 'ana,%'
and name not like '%-ana'
and name not like '% ana'
and name not like 'ana.%'
and name not like 'ana;%'
and name not like ';ana%'

and this list might go on and on if one find more and more varieties of all the typos. To revert the thinking how to get or - in this case exclude - all the users named Ana. Instead of slow and hardcoding elimination, one can include all users called 'Ana' and exclude all that have anything following using a simple case of regular expression:

--solution #2
select * from TBL_Customer
   name like '%[a-zA-Z]ana[a-zA-Z]%'
or name not like '%ana%'


this would return same results, as well including all the names that also consist of word 'ana' ( manara). query costs against the server are lower in comparison with first solution, retrieval time is faster and execution plan is more accurate. Since this is just a sample case, working on a large-scale database, results should be even better with second solution.


No Comments