Load Multiple (over 100,000) Files


The requirement is to load 100,000+ CSV files via SQL*Loader. The files are located in a Unix directory.

I initially attempted to load the files through a single controlfile. It appeared to work, but the row counts just weren't matching. So I devised a way to:

Dynamically create a controlfile for each of the 100,000 CSV files

To achieve this, I first created the bulk of the controlfile, which remains static throughout the process. This is what the static controlfile, saved as static.ctl, looks like:

APPEND INTO TABLE <SCHEMA>.<TABLE_NAME>
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
(
 <FIELD_1>,
 <FIELD_2>,
 <FIELD_3>
)

I then looped through the directory listing and, for each <FILENAME>.CSV found, created a controlfile named <FILENAME>.CSV.ctl:

# For each CSV, build a controlfile: LOAD DATA header, INFILE line, then the static body
for f in *.CSV; do
  echo "LOAD DATA"   >  "$f.ctl"
  echo "INFILE '$f'" >> "$f.ctl"
  cat static.ctl     >> "$f.ctl"
done

The end result is a controlfile for each CSV file that looks like:

LOAD DATA
INFILE '<filename>.CSV'
APPEND INTO TABLE <SCHEMA>.<TABLE_NAME>
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
(
 <FIELD_1>,
 <FIELD_2>,
 <FIELD_3>
)

So now each of the 100,000 CSV files has a corresponding .ctl file.
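As a quick sanity check before going further, the number of generated controlfiles should match the number of CSV files. A minimal sketch (piping ls through grep avoids expanding 100,000 names on the command line):

# Count CSVs and generated controlfiles; the two numbers should match
ls | grep -c '\.CSV$'
ls | grep -c '\.ctl$'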

Now, instead of looping through the directory listing and invoking sqlldr directly, I loop through it again and generate a script containing the 100,000 sqlldr commands. This gives me flexibility in how the load is run: I can schedule it, split it and run the pieces at different times, or review exactly what will be executed before running anything.

To achieve this, I run:

# Append one sqlldr command per CSV file to the load script
for f in *.CSV; do
  echo "sqlldr <username>/<password>@<DATABASE> control=$f.ctl log=$f.log errors=20000" >> load_data.ksh
done
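Because load_data.ksh is just a flat list of sqlldr commands, splitting it into batches is straightforward. A minimal sketch, assuming a batch size of 10,000 commands (the size and the load_batch_ prefix are arbitrary choices):

# Break the script into batches of 10,000 commands: load_batch_aa, load_batch_ab, ...
split -l 10000 load_data.ksh load_batch_

# Run a single batch, capturing its output
ksh load_batch_aa > load_batch_aa.out 2>&1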


The shell file load_data.ksh is now ready to run.
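One way to run it unattended and then reconcile the row counts afterwards is sketched below. This assumes the usual "Rows successfully loaded." wording in the SQL*Loader logs; with 100,000 log files the *.log glob may also need to be fed through find instead of expanded directly.

# Run the full load in the background, capturing output
nohup ksh load_data.ksh > load_data.out 2>&1 &

# Once finished, total the rows loaded across all log files
awk '/Rows successfully loaded/ {total += $1} END {print total}' *.log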