Load Multiple (over 100,000) Files
The requirement is to load 100,000+ CSV files via SQL*Loader. The files are located in a Unix directory.
I initially attempted to load the files through a single controlfile, and it appeared to work, but the row counts didn't match the source files. So I devised a way to:
Dynamically create a control file for each of the 100,000 files
To achieve this, I first created the bulk of my controlfile, which remains static throughout the process. This is the static controlfile, saved as static.ctl:
APPEND INTO TABLE <SCHEMA>.<TABLE_NAME> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ( <FIELD_1>, <FIELD_2>, <FIELD_3> )
I then did a simple loop through the directory listing and, for each <FILENAME>.CSV listed, created a controlfile named <FILENAME>.CSV.ctl:
for f in *.CSV; do echo "LOAD DATA" > "$f.ctl"; echo "INFILE '$f'" >> "$f.ctl"; cat static.ctl >> "$f.ctl"; done
The end-result is a controlfile for each CSV file that looks like:
LOAD DATA
INFILE '<filename>.CSV'
APPEND INTO TABLE <SCHEMA>.<TABLE_NAME> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ( <FIELD_1>, <FIELD_2>, <FIELD_3> )
So, now, for the 100,000 CSV files, I have a corresponding CTL file.
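Given the earlier row-count mismatch, a quick check at this point is cheap insurance. This is a sketch of my own (not part of the original process): it simply compares the number of CSV files against the number of generated .ctl files.

```shell
#!/bin/sh
# Sanity check (an addition, not part of the original process):
# confirm every CSV now has a matching control file before building
# the sqlldr command script.
csv_count=$(ls -1 *.CSV 2>/dev/null | wc -l)
ctl_count=$(ls -1 *.CSV.ctl 2>/dev/null | wc -l)
echo "CSV files: $csv_count  control files: $ctl_count"
if [ "$csv_count" -ne "$ctl_count" ]; then
    echo "Mismatch: some control files are missing" >&2
    exit 1
fi
```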
Now, instead of looping through the directory listing and invoking sqlldr directly, I loop through and build a script containing the 100,000 sqlldr commands. This gives me flexibility: I can schedule it, split it and run the pieces at different times, or review exactly what will run before running it.
To achieve this I have done:
for f in *.CSV; do echo "sqlldr <username>/<password>@<DATABASE> control=$f.ctl log=$f.log errors=20000" >> load_data.ksh; done
The shell file load_data.ksh is now ready to run.
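One way to exploit that flexibility is sketched below, using the standard split utility. The chunk size of 1,000 and the load_chunk_ prefix are my own arbitrary choices, not from the original process.

```shell
#!/bin/sh
# Sketch: break load_data.ksh (one sqlldr command per line) into
# chunks of 1,000 commands each, producing load_chunk_aa,
# load_chunk_ab, and so on. Chunk size is arbitrary.
split -l 1000 load_data.ksh load_chunk_
chmod +x load_chunk_*
# Each chunk can now be run, scheduled, or reviewed independently, e.g.:
#   ./load_chunk_aa
```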