updating column with a random value
I have been working recently on load testing of our ETL. One of the factors that determines amount of transfered data is a lookup table that is joined to the source. I populated this table with aproppriate values, and then realized that I also need some values in second column, and they were null. Let's say the table looks like that:
create
table
t101
(
a
int,
b
char(10)
)
go
insert
t101(a,
b)
select
1,
''
union
all
select
2,
''
union
all
select
3,
''
union
all
select
4,
''
union
all
select
5,
''
union
all
select
6,
''
union
all
select
7,
''
union
all
select
8,
''
go
Luckily I had another table that contained the values I needed. The problem was only, there were much less rows in the second table than in the first one. There is also no relation between both tables.
create
table
t102
(
someval
char(10)
)
go
insert
t102(someval)
select
'val
01'
union
all
select
'val
02'
union
all
select
'val
03'
union
all
select
'val
04'
union
all
select
'val
05'
union
all
select
'val
06'
union
all
select
'val
07'
union
all
select
'val
08'
union
all
select
'val
09'
go
Happily I wrote a simple update:
update
t101 set
b =
(select
top
1 someval from
t102 order
by
newid())
I ran the query and then selected from the t101:
select
*
from
t101
And the results are..
a b
1 val 06
2 val 06
3 val 06
4 val 06
5 val 06
6 val 06
7 val 06
8 val 06
I was surprised, to say the least. Since I had something other to do, I left this as it was, letting my 'background threads' do the thinking. Of course, I figured out what was going on when I was walking home, wading in snow slush and struggling with wind.
The problem with above subquery is that it is not correlated. This means, its execution doesn't depend on the row the query produces output for. So the optimizer chooses to execute the query only once and apply the result to every row of the outer query. It is a random result, but only one for the whole set.
Ok, so what to do then? No, don't use cursor.
declare c cursor for select a from t101
declare @a int
open c
fetch next from c into @a
while @@fetch_status = 0
begin
update t101 set b = (select top 1 someval from t102 order by newid())
where a = @a
fetch next from c into @a
end
close c
deallocate c
Seriously, don't use cursor.
You have to find a way to correlate the query even if there is no relationship between the data. I thought about two ways, but I am sure there may be more.
First option is to return value which would be a result of concatenation of value from table 2 and value from current row of table 1.:
update
t101 set
b =
(select
top
1 someval +
case
when
b is
null
then
''
else
''
end
from
t102 order
by
newid())
As you see, I don't add effectively anything to someval. This query works, but is kind of too elaborate and complicated to write. So there is another option for you - add a filter that will always return true.
update
t101 set
b =
(select
top
1 someval from
t102 where
a =
a order
by
newid())
As you see, as long as a value is not null, the subquery will return a result. And it will be a new result for each processed row.
I attach the code so you can play with this idea a bit more, maybe you will find simpler ways of generating random data than these above - if you do, please let me know.
Testing update random value.zip