This blog will no longer be updated.

New content is available on my new blog

updating column with a random value - Piotr Rodak

updating column with a random value

I have been working recently on load testing of our ETL. One of the factors that determines amount of transfered data is a lookup table that is joined to the source. I populated this table with aproppriate values, and then realized that I also need some values in second column, and they were null. Let's say the table looks like that:

create table t101

(

a int,

b char(10)

)

go

insert t101(a, b)

select 1, ''

union all select 2, ''

union all select 3, ''

union all select 4, ''

union all select 5, ''

union all select 6, ''

union all select 7, ''

union all select 8, ''

go

Luckily I had another table that contained the values I needed. The problem was only, there were much less rows in the second table than in the first one. There is also no relation between both tables.

 

create table t102

(

someval char(10)

)

 

go

insert t102(someval)

select 'val 01'

union all select 'val 02'

union all select 'val 03'

union all select 'val 04'

union all select 'val 05'

union all select 'val 06'

union all select 'val 07'

union all select 'val 08'

union all select 'val 09'

go

Happily I wrote a simple update:

update t101 set b = (select top 1 someval from t102 order by newid())

I ran the query and then selected from the t101:

select * from t101

And the results are..

a    b
1    val 06   
2    val 06   
3    val 06   
4    val 06   
5    val 06   
6    val 06   
7    val 06   
8    val 06     

I was surprised, to say the least. Since I had something other to do, I left this as it was, letting my 'background threads' do the thinking. Of course, I figured out what was going on when I was walking home, wading in snow slush and struggling with wind.

The problem with above subquery is that it is not correlated. This means, its execution doesn't depend on the row the query produces output for. So the optimizer chooses to execute the query only once and apply the result to every row of the outer query. It is a random result, but only one for the whole set.

Ok, so what to do then? No, don't use cursor.

declare c cursor for select a from t101
declare @a int
open c
fetch next from c into @a
while @@fetch_status = 0
begin
        update t101 set b = (select top 1 someval from t102 order by newid())
        where a = @a
        fetch next from c into @a
end
close c
deallocate c

Seriously, don't use cursor.

You have to find a way to correlate the query even if there is no relationship between the data. I thought about two ways, but I am sure there may be more.

First option is to return value which would be a result of concatenation of value from table 2 and value from current row of table 1.:

update t101 set b = (select top 1 someval + case when b is null then '' else '' end from t102 order by newid())

As you see, I don't add effectively anything to someval. This query works, but is kind of too elaborate and complicated to write. So there is another option for you - add a filter that will always return true.

update t101 set b = (select top 1 someval from t102 where a = a order by newid())

 

As you see, as long as a value is not null, the subquery will return a result. And it will be a new result for each processed row.

I attach the code so you can play with this idea a bit more, maybe you will find simpler ways of generating random data than these above - if you do, please let me know.

 Testing update random value.zip

 

 

 

 

Published 13 January 2010 21:54 by Piotr Rodak

Comments

# updating column with a random value - Piotr Rodak Mobile

Pingback from  updating column with a random value - Piotr Rodak Mobile