Wednesday, April 11, 2012

Extract from SalesForce, Load to Oracle

I did it this way for 2 reasons.
1)These are the tools I know how to use.
2)There are things I am not allowed to do at work due to access rights. 

This is not intended to be a perfect document, it's just intended to be helpful.
No doubt, you will find problems which I did not encounter. If you do, please leave notes in the comments to help the next guy out.

NOTE: When I mean something literally I use the left-bracket and right-bracket
       character like this:
      [that's literally what "I" mean]

This is the results of dozens of failed attempts due to SQLPLUS idiosyncrasies of dealing with 
multi-line text. If you don't have multi-line text, many of the following edits won't be applicable. 

The process I use is basically
  1) extract data from Salesforce into a file
  2) Use Excel to format salesforce data into Oracle INSERT statements
  3) edit all the things out of the file which SQLPLUS will find objectionable
   and put in a few additional script commands like COMMIT
  4) run SQLPLUS 

This document assumes that the columns you extract from SalesForce will match exactly with the Oracle
DB table you want to load. 

Software you will need.
  1) The Salesforce Data Loader program
  2) TextPad (get it off the web. It's the only text editor that won't screw up your files.)
  3) Excel 2007 or later (maybe earlier versions would work too)
  4) An Oracle database.

1) Use Salesforce Data Loader to create the extract .csv file (hence forth called 'the extract file').
   If you don't know how  to do this, Google it. There is much better documentation on this than
   I could ever provide. 

2) At this point, I copy all of the column headers in the .cvs file and use them to create an Oracle table. 
   I define all the columns as VARCHAR2(4000) just for ease but you can use whatever matches the data 
   if you want.

3) Use Excel to open the extract file, from step 1, and do SAVE AS sf_extract_edited.xlxs. In other words
   immediately save it back under a new name so that you don't overwrite the extract file in step 1.
   It does not really matter what you name it but it is a good idea to include the word "edited" so that
   you know that this is an edited version.

   3.1) Rename the first worksheet tab (at the bottom) to [raw_data]. 
   3.2) Do a FIND on [#NAME] using the following options ... 
        3.2.1) ctrl-f
        3.2.2) click on [Options] button on the FIND-dialog pop-up window.
        3.2.3) for [Look in:] dropdown, choose [Values]  <--IMPORTANT !
        3.2.4) make sure the entire sheet will be searched and click on Find Next
        3.2.5) if the find gets a hit, note that in for formula field you will
               probably see a value that starts with [=]
        3.2.6) on the formula field, insert a ['] before the [=] (put a single quote before the equal sign)
               this will prevent excel from interpreting the equal sign as a formula indicator. 
        3.2.7) repeat this process until no more hits on the FIND.
   3.3) Insert an empty column to the left of column A (this empty column will become the new column A)
        I'm doing this so that the cells line up with the cells of the next sheet which we are going to create

   3.4) create a new "Sheet", name the new sheet 'inserts'
   3.5) Create an insert statement that matches the table you will be loading then 
        position cursor at cell A2 and paste in the first part of your INSERT statement, for example:


 3.6) select cell B2 and enter the formula  [=CONCATENATE("q'~",raw_data!B2,"~',")] without
      the outer brackets, on the formula line
    3.6.1) If any text has more than 4000 characters, modify the above formula for that column to be ..
           [=CONCATENATE("q'~",  MID( raw_data!N11409,1,3200),"~',")]  
            excel and sqlplus count chars differently hence 3200 for a safe margin.

 3.7) drag the cell out to the right to cover all of the columns that have data in the raw_data sheet
 3.8) edit the right-most cell's formula so that your INSERT statement will end with [);] and not [,]
      3.8.1) i.e. change this: [=CONCATENATE("q'~",raw_data!CQ2,"~',")]
                      to this: [=CONCATENATE("q'~",raw_data!CQ2,"~');")]
 3.8) select all cells of column 2 and drag down to cover all rows in the raw_data sheet
 3.9) save your work
 4.0) open a new (empty) file using TextPad. You can get TextPad for free trial off the internet.
 4.1) copy-paste the entire excel content of the "inserts" tab to the empty TextPad window.
 4.2) insert the following at the top of the textpad window:
       set sqlblanklines on
       set define off
       spool opp.log
 4.3) insert SPOOL OFF at the bottom
 4.4) save and quit out of EXCEL, we won't need it any more after this.
 4.5) Save the TextPad file now, just for safety. Several edits will now need to be made to the TextPad file. 
   4.5.1) n/a
   4.5.2) Replace all ["INSERT] with [INSERT]  i.e. remove the quote mark
   4.5.3) Replace all [VALUES ("] with [VALUES (] i.e. remove the quote mark
   4.5.4) This one is a little tricky and requires using a regular expression for the Replace.
          The intent is that no text line should start with a pound-sign so we are inserting a blank.
          On the Replace dialog box, put a check mark in the Regular expression option, then do
          Replace all [^#] with [ #]
        Now un-check Regular express option.
   4.5.6) Replace all ["q'~] with [q'~]
   4.5.7) Replace all [~',"] with [~',] 
   4.5.8) Replace all [);"]  with [);]

   4.5.9) Next we will change all lines that end with [;], except for ends with [');],  to end with [:]
          This is to pick up cases such as the following multi-line text example:

                 q'~A Multi-line text;
                    where a text line ends with semi-colon
                    will cause SQLPLUS to think the line has ended and you will get
                    ORA-01756: quoted string not properly terminated~'

          1) first replace all [');] with [````]. So they won't be picked up by our next replace.

          2) then replace all [; *$] with [:] using regular expression
                  This says, If the line ends with semi-colon, or semi-colon and a bunch of spaces,
                  replace it with a colon. If it is not acceptable for you to make slight changes
                  like this, then you will have to figure something else out.
                  WARNING: If you already coded COMMIT; it is now COMMIT:, go fix it. 

          3) change all  [````] back to [');]

          4) Replace all [^/$] with [ ] using regular expression
                 SQLPLUS does not like a text line that only contains a slash.

          5) Replace all [^\.$] with [ ] using regular expression
                 SQLPLUS does not like a text line that only contains a dot.
   4.5.9) optional - remove all tab characters 
          1) check the Hex box and replace [09] with []
   4.5.10) Save the file and close TextPad
 5.0) Get into SQLPLUS and run the INSERTs you just created.

One last thing, after loading the table,  carefully check any columns that are supposed to have SalesForce dates. Excel sometimes turns these into Julian dates. If you end up with excel julian dates in your oracle tables, you can translate by doing:

SELECT   to_date(to_char(2415019+closedate),'J') as closedate

where closeddate is the Excel julian date.


No comments: