How to create a
multi-column table with adjacent words
insert into corpusBig(ID,textID,w1,w2,w3,w4,w5,w6,w7,w8,w9)
select
a.ID,a.textID,a.wordID,b.wordID,c.wordID,d.wordID,e.wordID,f.wordID,g.wordID,h.wordID,i.wordID
from corpus as a, corpus as b, corpus as c, corpus as d, corpus as e, corpus as
f, corpus as g, corpus as h, corpus as i where
a.ID between 1 and 1000000 and
a.ID+1 = b.ID and
a.ID+2 = c.ID and
a.ID+3 = d.ID and
a.ID+4 = e.ID and
a.ID+5 = f.ID and
a.ID+6 = g.ID and
a.ID+7 = h.ID and
a.ID+8 = i.ID
order by a.ID asc
You don't need to use the line in red above, but if you have a clustered
index on the ID column, and if you take the corpus in 1 million word groups
(1-1,000,000, then 1,000,001-2,000,000 etc) it will really speed things up. On a
fairly fast machine, you should be able to create a 440 million word, 9 column/word
table in 3-4 hours.
To make it even faster, you may want to iteratively copy the million or so
rows into a temp table, copy from there into [corpusBig], then delete from the
temp table, and do the next million rows, etc.
|