Importing Data Overview
Imagine the following:
You've finished your evaluation of the xTuple applications and have decided to use PostBooks® for your ERP needs. How do you migrate your legacy data from your old accounting package to the PostBooks® database?
Or perhaps you are in a different situation:
You've been using the Manufacturing Edition (formerly OpenMFG) for several years. Your boss has decided you need to open a web store. How do you integrate that web system with your xTuple ERP database?
The general problem is the same in both situations: you need to import data into an xTuple database. In the case of implementing PostBooks®, the data come from some other database system, while for the web store you receive new orders as text files via email, FTP, or some other file transfer mechanism. How do you get these data into your database so the xTuple application can use them?
The first main section below uses the web store as an example to show the different ways you can import data into your xTuple database and the relative merits and problems with each. It is not a tutorial on the various technologies involved but rather is intended to introduce you to your options and to help you choose how to import your data for use by xTuple ERP.
The second major section goes through the steps you should follow to successfully migrate your data from another ERP or accounting package into the xTuple ERP database. It is general in nature, rather than specific to any particular other piece of software.
Data Migration Strategies and Technologies
Sometimes the format of the data helps choose the technology you will use to process those data. For our web store, we'll say that our eCommerce provider gives us an option of formats and we can switch between them at any time. Here is a sample order in several of the possible formats:
Labeled format with multiple items per order and summarized totals
first: John
last: Smith
address1: 123 Main Street
address2: Apt 4
city: Norfolk
state: VA
zip: 23510
country: United States
ordernum: 56789
item: LAMP
qty: 5
itemcost: 37.50
taxrate: 0.05
item: PHONE
qty: 1
itemcost: 15.50
taxrate: 0.05
tax: 10.15
total: 213.15
Comma-separated format with one line per item and duplicated data for multiple line items
John, Smith, 123 Main Street, Apt 4, Norfolk, VA, 23510, United States, 56789, LAMP, 5, 37.50, 0.05
John, Smith, 123 Main Street, Apt 4, Norfolk, VA, 23510, United States, 56789, PHONE, 1, 15.50, 0.05
XML format with separate elements for multiple line items
<?xml version="1.0"?>
<!DOCTYPE orders SYSTEM "http://my.ecommerce.biz/orders.dtd">
<orders>
  <order number="56789">
    <first>John</first>
    <last>Smith</last>
    <address>
      <street1>123 Main Street</street1>
      <street2>Apt 4</street2>
      <city>Norfolk</city>
      <state>VA</state>
      <zip>23510</zip>
      <country>United States</country>
    </address>
    <itemlist>
      <item number="LAMP">
        <qty>5</qty>
        <itemcost>37.50</itemcost>
        <taxrate>0.05</taxrate>
      </item>
      <item number="PHONE">
        <qty>1</qty>
        <itemcost>15.50</itemcost>
        <taxrate>0.05</taxrate>
      </item>
    </itemlist>
    <tax>10.15</tax>
    <total>213.15</total>
  </order>
</orders>
In all three of these examples, the data are the same although they are arranged differently. The comma-separated example differs only in that there are no values given for total tax and the total cost of the order because there is nowhere to put this information — there is no line summarizing the order as a whole. For purposes of illustration we'll assume that John Smith is a new customer, so we'll have to add his name and address, as well as the details of his order.
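Whichever format you choose, the first step of any import is parsing it into a usable structure. As a purely illustrative sketch (Python, with the field names taken from the labeled sample above), the labeled format could be parsed like this:

```python
# Parse the labeled key/value order format from the sample above into
# a nested structure: order-level fields plus a list of line items.
# Purely illustrative -- a real feed would need error handling.

sample = """first: John
last: Smith
address1: 123 Main Street
address2: Apt 4
city: Norfolk
state: VA
zip: 23510
country: United States
ordernum: 56789
item: LAMP
qty: 5
itemcost: 37.50
taxrate: 0.05
item: PHONE
qty: 1
itemcost: 15.50
taxrate: 0.05
tax: 10.15
total: 213.15"""

def parse_labeled(text):
    order = {"items": []}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "item":                    # each "item:" starts a new line item
            order["items"].append({"item": value})
        elif order["items"] and key in ("qty", "itemcost", "taxrate"):
            order["items"][-1][key] = value  # belongs to the current line item
        else:
            order[key] = value               # order-level field
    return order

order = parse_labeled(sample)
print(order["ordernum"], len(order["items"]))
```

The comma-separated and XML formats would need different parsers, but all three yield the same structure in the end.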
Import Directly Into Tables
The database schema for the xTuple ERP applications has separate tables for customer information, addresses, sales orders, and sales order line items. To import this simple order, a mere 19 pieces of information, we'll need to create entries in at least five different tables. To complicate matters, these tables are related to each other by foreign keys. Here's what you have to do:
- Convert the data from one of the formats sent by the web store to database statements.
- Execute those database statements.
You have two main options on which database statements to build: you could create SQL insert statements or you could create calls to stored procedures. If you do the latter then you have to write those stored procedures yourself. Here are the insert commands necessary to import the data for the sample order above. Transforming the data from any of the three format samples to these insert statements is left as an exercise for the reader.
BEGIN;
INSERT INTO addr (addr_line1, addr_line2, addr_city, addr_state,
                  addr_postalcode, addr_country)
VALUES ('123 Main Street', 'Apt 4', 'Norfolk', 'VA', '23510', 'United States');

INSERT INTO cntct (cntct_addr_id, cntct_first_name, cntct_last_name)
VALUES (CURRVAL('addr_addr_id_seq'), 'John', 'Smith');

INSERT INTO custinfo (cust_number, cust_name, cust_cntct_id, cust_corrcntct_id,
                      cust_custtype_id, cust_salesrep_id, cust_shipform_id,
                      cust_terms_id, cust_active, cust_backorder,
                      cust_partialship, cust_discntprcnt, cust_balmethod,
                      cust_ffshipto, cust_blanketpos, cust_shipchrg_id,
                      cust_creditstatus, cust_ffbillto, cust_usespos,
                      cust_emaildelivery, cust_autoupdatestatus,
                      cust_autoholdorders, cust_soemaildelivery)
VALUES ('JSMITH', 'John Smith',
        CURRVAL('cntct_cntct_id_seq'), CURRVAL('cntct_cntct_id_seq'),
        fetchMetricValue('DefaultCustType'), fetchMetricValue('DefaultSalesRep'),
        fetchMetricValue('DefaultShipFormId'), fetchMetricValue('DefaultTerms'),
        true, fetchMetricBool('DefaultBackOrders'),
        fetchMetricBool('DefaultPartialShipments'), 0,
        fetchMetricText('DefaultBalanceMethod'),
        fetchMetricBool('DefaultFreeFormShiptos'), false, 1, 'G', false, false,
        false, true, true, false);

INSERT INTO cohead (cohead_number, cohead_cust_id, cohead_orderdate,
                    cohead_salesrep_id, cohead_shipform_id, cohead_terms_id,
                    cohead_shipto_id, cohead_freight)
SELECT 56789, cust_id, 'today', cust_salesrep_id, cust_shipform_id,
       cust_terms_id, -1, 0
  FROM custinfo
 WHERE cust_id = CURRVAL('cust_cust_id_seq');

INSERT INTO coitem (coitem_cohead_id, coitem_linenumber, coitem_itemsite_id,
                    coitem_status, coitem_scheddate, coitem_qtyord,
                    coitem_price, coitem_custprice, coitem_qtyshipped,
                    coitem_imported, coitem_qty_uom_id, coitem_qty_invuomratio,
                    coitem_price_uom_id, coitem_price_invuomratio)
SELECT CURRVAL('cohead_cohead_id_seq'), 1, itemsite_id, 'O', 'today', 5,
       37.50, 37.50, 0, true, uom_id, itemUOMratioByType(item_id, 'Selling'),
       uom_id, itemUOMratioByType(item_id, 'Selling')
  FROM item, itemsite, usrpref, uom
 WHERE item_number = 'LAMP'
   AND item_id = itemsite_item_id
   AND itemsite_warehous_id = usrpref_value::INTEGER
   AND usrpref_username = CURRENT_USER
   AND usrpref_name = 'PreferredWarehouse'
   AND uom_name = itemUOMbyType(item_id, 'Selling');

INSERT INTO coitem (coitem_cohead_id, coitem_linenumber, coitem_itemsite_id,
                    coitem_status, coitem_scheddate, coitem_qtyord,
                    coitem_price, coitem_custprice, coitem_qtyshipped,
                    coitem_imported, coitem_qty_uom_id, coitem_qty_invuomratio,
                    coitem_price_uom_id, coitem_price_invuomratio)
SELECT CURRVAL('cohead_cohead_id_seq'), 2, itemsite_id, 'O', 'today', 1,
       15.50, 15.50, 0, true, uom_id, itemUOMratioByType(item_id, 'Selling'),
       uom_id, itemUOMratioByType(item_id, 'Selling')
  FROM item, itemsite, usrpref, uom
 WHERE item_number = 'PHONE'
   AND item_id = itemsite_item_id
   AND itemsite_warehous_id = usrpref_value::INTEGER
   AND usrpref_username = CURRENT_USER
   AND usrpref_name = 'PreferredWarehouse'
   AND uom_name = itemUOMbyType(item_id, 'Selling');
COMMIT;
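As a hint toward that exercise, here is a sketch (Python, illustrative only) that generates the first two of these statements, the addr and cntct inserts, from one line of the comma-separated sample. The remaining statements would follow the same pattern, and a real script should use parameter binding rather than the naive quoting shown here:

```python
import csv, io

# One line of the comma-separated sample from the web store.
csv_line = ("John, Smith, 123 Main Street, Apt 4, Norfolk, VA, 23510, "
            "United States, 56789, LAMP, 5, 37.50, 0.05")

row = next(csv.reader(io.StringIO(csv_line), skipinitialspace=True))
first, last, line1, line2, city, state, zipcode, country = row[:8]

def q(value):
    # naive SQL string quoting for the sketch; use parameter binding in real code
    return "'" + value.replace("'", "''") + "'"

stmts = [
    "INSERT INTO addr (addr_line1, addr_line2, addr_city, addr_state, "
    "addr_postalcode, addr_country) VALUES ({}, {}, {}, {}, {}, {});".format(
        q(line1), q(line2), q(city), q(state), q(zipcode), q(country)),
    "INSERT INTO cntct (cntct_addr_id, cntct_first_name, cntct_last_name) "
    "VALUES (CURRVAL('addr_addr_id_seq'), {}, {});".format(q(first), q(last)),
]
print("\n".join(stmts))
```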
Note that the further along you go in the process, the more tables you must use to find the data to properly insert. In addition there are lots of magic values and special calls to get default values.
The insert into the addr table is fairly simple, drawing data directly from the import. Inserting the contact information into the cntct table is more complex: the cntct record requires an internal identifying value for the associated address, which is not directly available. You have to grab the last-used sequence number for the addr table's addr_id column with the currval function.
The details about individual customers are stored in the custinfo table. Note the following:
- A customer number was generated from the customer name by some unspecified means.
- There are many calls to the fetchMetricValue, fetchMetricBool, and fetchMetricText functions to look up configured defaults.
- Some values are hard-coded: false and true for several columns, 'G' for the customer's current credit status, and 1 for the cust_shipchrg_id.
- You probably didn't notice that no commission is inserted by this command, even though a default sales representative is chosen.
You have to increment the line number when creating the line items, as well as handle the unit of measure conversions and select the data from an odd mix of apparently unrelated tables.
As you go further down through the inserts into the sales order table, cohead, and the sales order line items, coitem, the processing gets more and more obscure, with the interesting data obscured by specifying default values and calls to stored procedures you might not know exist.
As the application grows more complex over time, with new features constantly added, you would have to update the program that converts the input, whichever format you selected to start with, to keep pace.
As an alternative to directly transforming the data from your web store into this series of insert statements, you could write a short series of stored procedures:
CREATE OR REPLACE FUNCTION insertContact(first TEXT, last TEXT, line1 TEXT,
                                         line2 TEXT, city TEXT, state TEXT,
                                         country TEXT, zip TEXT)
RETURNS INTEGER AS '
BEGIN
  -- insert into the address and contact tables as above
  RETURN CURRVAL(''cntct_cntct_id_seq'');
END;' LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION insertCustomer(custname TEXT, contactid INTEGER)
RETURNS INTEGER AS '
BEGIN
  -- insert a customer record, hiding the default values here inside this
  -- stored procedure
  RETURN CURRVAL(''cust_cust_id_seq'');
END;' LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION insertOrder(ordernumber TEXT, custid INTEGER)
RETURNS INTEGER AS '
BEGIN
  -- select what you need from the customer record and use that information
  -- to create a sales order record, along with whatever other defaults you need
  RETURN CURRVAL(''cohead_cohead_id_seq'');
END;' LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION insertOrderLine(orderid INTEGER, itemnumber TEXT,
                                           qty NUMERIC, price NUMERIC)
RETURNS INTEGER AS '
BEGIN
  -- select what you need, auto-increment the line number, and
  -- create the line item
  RETURN CURRVAL(''coitem_coitem_id_seq'');
END;' LANGUAGE 'plpgsql';
Hold on a sec—this is starting to look an awful lot like programming and you're not a programmer! OK, maybe you are, but isn't there an easier way?
Import into API Views
The database for xTuple ERP has a collection of views that can make this data import a lot easier. As the functionality of the application grows, the views grow with it. In case you didn't know, a view is something that looks like a database table and can be programmed to act like a table, but isn't really a table. It's actually a set of rules that describe what should happen when someone selects data from the view or tries to modify data through the view.
These views have the following advantages:
- The rules on the view can update some auxiliary tables in the same statement as the update on the main table.
- You don't have to worry about the serial column values and foreign key maintenance as much.
You can learn more about these by reading the ApiWhitePaper. For our purposes here, let's just see how to create our order using the API Views instead of the mess in the previous section:
BEGIN;
INSERT INTO api.customer (customer_number, customer_name,
                          billing_contact_first, billing_contact_last,
                          billing_contact_address1, billing_contact_address2,
                          billing_contact_city, billing_contact_state,
                          billing_contact_country, billing_contact_postalcode,
                          correspond_contact_first, correspond_contact_last,
                          correspond_contact_address1, correspond_contact_address2,
                          correspond_contact_city, correspond_contact_state,
                          correspond_contact_country, correspond_contact_postalcode,
                          customer_type)
VALUES ('JSMITH', 'John Smith',
        'John', 'Smith', '123 Main Street', 'Apt 4', 'Norfolk', 'VA',
        'United States', '23510',
        'John', 'Smith', '123 Main Street', 'Apt 4', 'Norfolk', 'VA',
        'United States', '23510',
        'NORMAL');

INSERT INTO api.salesorder (order_number, customer_number)
VALUES (56789, 'JSMITH');

INSERT INTO api.salesline (order_number, item_number, qty_ordered, net_unit_price)
VALUES (56789, 'LAMP', 5, 37.50);

INSERT INTO api.salesline (order_number, item_number, qty_ordered, net_unit_price)
VALUES (56789, 'PHONE', 1, 15.50);
COMMIT;
While there is still the problem of figuring out what the customer number should be, notice that the insert is pretty simple. All you have to do is create a new customer and supply some basic information about that customer. Everything else is done for you behind the scenes: figuring out what the defaults should be, creating the address and contact records, and linking them all together.
Creating the sales order and individual sales lines are just as simple. You specify the data you know and let the views fill in the rest. And notice that you don't even have to give a line number—it's done for you!
If you now start up the xTuple ERP application and look at the created records, you'll find that everything looks OK. Perhaps some of the details aren't what you would like, but the application will allow you to ship the order and process the invoice. You can always fix the details within the application, and add more fields to these straightforward insert statements to get the right data next time. The views give you a wide range of control while still ensuring that you have consistent data.
Why would anyone ever use the tables directly? It's a matter of control and volume of data. If you have a lot of data to import and you understand the relationships between those data, both in the source database and the destination xTuple ERP database, you might find it makes sense to work directly with the core tables. However, most import tasks are better handled using the API views.
As an alternative to transforming the data to import from the format sent by the web store to SQL insert statements, it might be easier to have the web store send the data as comma-separated values (CSV), make minimal changes to the data, and import them with the CSV Import utility in the xTuple suite. You can find the manual for this utility in the csvimp_docs_examples.zip file on the SourceForge downloads page for PostBooks®.
The first time you do this you will have to create a CSV Atlas to map the various columns of the input to columns in the appropriate tables or API views. Remember the point made above about the customer number, though—you have to generate a customer number—and you'll have to generate a customer name, too. To do this, edit the CSV file and add the customer number and customer name to the end of each line.
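This editing step can be scripted. Here is a minimal sketch (Python; the customer-number rule shown, first initial plus last name in upper case, is just one possible convention, and uniqueness checking is omitted):

```python
import csv, io

# Two lines of the comma-separated order file from the web store.
raw = """John, Smith, 123 Main Street, Apt 4, Norfolk, VA, 23510, United States, 56789, LAMP, 5, 37.50, 0.05
John, Smith, 123 Main Street, Apt 4, Norfolk, VA, 23510, United States, 56789, PHONE, 1, 15.50, 0.05"""

out = io.StringIO()
writer = csv.writer(out)
for row in csv.reader(io.StringIO(raw), skipinitialspace=True):
    first, last = row[0], row[1]
    number = (first[0] + last).upper()   # e.g. JSMITH; any rule yielding unique numbers will do
    name = first + " " + last            # e.g. John Smith
    writer.writerow(row + [number, name])

edited = out.getvalue()                  # note: the writer normalizes spacing after commas
print(edited)
```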
Open the CSV Import application and use it to open the CSV file from the web store. You will see something like this:
Each column is numbered. You'll need these numbers when you create the CSV Atlas next.
Now select Edit from the Map menu and add three maps, one for the api.customer view, one for api.salesorder, and one for api.salesline.
The window shows the name of the table or view into which the data will be inserted.
Here you see the map between fields in the salesline view and columns in the CSV file.
Everything up to this point has been setup. Now it's time to actually run the import.
To actually import the data, click on the CSV Tool window and select Start... from the Import menu. Since we're creating a new customer, we'll have to import into all three views. Make sure that the customer map is the current map shown in the Select Map window and click the OK button.
Repeat this sequence for the salesorder map and the salesline map. It might be useful to open the Log window to look for errors (select View Log... from the Import menu). In this case we expect to see 2 errors:
- There should be one error reporting a duplicate insert on the customer view because there are two lines in the input and we try to create a customer from each line.
- There should be another error reporting a duplicate insert on the sales order view, essentially for the same reason.
After the initial configuration, importing should be fairly easy. You should just follow these steps:
- Download the comma-separated order file from the web store.
- Edit the file to add customer number and customer name fields to the end of each line.
- Open the CSV Import application.
- Inside the CSV Import application, open the atlas, the edited CSV file, and the log window.
- If all of the customers in the file are new, import the customers.
- Import the sales orders.
- Import the sales lines.
- Review the log window for errors and correct them when necessary.
XML, the Extensible Markup Language, is a World Wide Web Consortium standard for structuring, transferring, and processing data on the Web. Newer versions of HTML, such as XHTML, are specific instances of XML. Many eCommerce sites use XML files to communicate with their users. For example, Yahoo! uses XML to send orders to the companies that have Yahoo! storefronts; in turn these companies send XML files back to Yahoo! to describe their catalogs.
The xTuple applications support importing XML files. The complexity comes from having to deal with many XML formats for incoming data, some of which may be structured very differently from xTuple's database schema.
If you can convert incoming XML files to a format that matches xTuple's expectations, you're in good shape. This is where the API views and a basic knowledge of DTDs and XSLT come in handy. What? You don't know what a DTD is? XSLT looks like a collision of fingers on a keyboard?
DTD stands for Document Type Definition. It's the part of the XML standard that describes what a valid document of a given type looks like. This exists so both the creator and recipient of a document know what to expect, and so software can check that a document is complete. For example, it would be difficult to communicate sales order information without both parties knowing the sales order number, so a DTD for a sales order document would make the tag for the sales order number mandatory.
Yahoo! has a DTD which describes the sales orders it collects from Yahoo! stores and sends to its users. This is the OrderList2.dtd and is available directly from Yahoo!.
XSLT stands for XSL Transformations, and XSL stands for Extensible Stylesheet Language. XSLT lets you transform XML files from one form to another. As an example of how you can import XML files into an xTuple database, we have written an XSLT stylesheet to convert a Yahoo! order list to a file that xTuple ERP can then import directly. See below for a description of the core XML file format for xTuple ERP.
The conversion of the web store XML shown above into XML that can be handled by the xTuple ERP is pretty simple. It's conceptually very similar to creating the CSV Atlas. The advantage is that you can run an import in a single step from within the xTuple ERP application instead of having to start CSV Import and run the import multiple times. The cost is that you, or someone you know, has to know enough XSLT to write the transformation stylesheet.
How hard is that? You can start by copying the yahoo_to_xtupleapi.xsl file, which is available for download from the xChange. Strip out the parts you don't need and change a few key words. Here's an example of the type of change you would have to make: The stylesheet which converts between Yahoo and xTuple documents expects the top-level element to be an Order element with an id attribute that contains an account number and order number combined, while the example XML for our hypothetical web store has an order element with a number attribute.
A portion of the XSLT stylesheet for importing from a Yahoo! store
...
<xsl:template match="Order">
  ...
  <salesorder>
    <order_number>
      <xsl:value-of select="substring-after(@id, concat(/OrderList/@StoreAccountName, '-'))"/>
    </order_number>
  ...
</xsl:template>
...
These are just a few short lines from the middle of a fairly long file (about 730 lines at this writing). This excerpt is fairly easy to understand: after doing some preliminary work processing the Order element, it creates a salesorder element, strips off a prefix (StoreAccountName-) from the original Order's id attribute, and sets the text of the order_number element to the result.
The same XSLT changed for importing from the example web store
...
<xsl:template match="order">
  ...
  <salesorder>
    <order_number>
      <xsl:value-of select="@number"/>
    </order_number>
  ...
</xsl:template>
...
The structure of the example web store's XML is slightly simpler than a real Yahoo! store, so less work is required to get the order number from our example. In the Yahoo! transformation, the order number has to be extracted out of the id attribute (the @id in the Yahoo! XSLT stands for "the id attribute of the current element (Order)") by stripping off the store number, which itself is embedded in an attribute of the OrderList element. In our example the order number is simply an attribute of the current element order.
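For comparison, the same structural mapping can be sketched outside XSLT in a general-purpose language. This illustrative Python version uses only the standard library; the output element names are assumptions modeled loosely on the API views used earlier, not an exact rendition of the xTuple import format:

```python
import xml.etree.ElementTree as ET

# The example web store's order XML, abbreviated to the fields we map.
source = """<orders>
  <order number="56789">
    <itemlist>
      <item number="LAMP"><qty>5</qty><itemcost>37.50</itemcost></item>
      <item number="PHONE"><qty>1</qty><itemcost>15.50</itemcost></item>
    </itemlist>
  </order>
</orders>"""

root = ET.fromstring(source)
out = ET.Element("xtupleimport")   # illustrative root element for the output
for order in root.findall("order"):
    so = ET.SubElement(out, "api.salesorder")
    ET.SubElement(so, "order_number").text = order.get("number")
    for item in order.findall("itemlist/item"):
        line = ET.SubElement(out, "api.salesline")
        ET.SubElement(line, "order_number").text = order.get("number")
        ET.SubElement(line, "item_number").text = item.get("number")
        ET.SubElement(line, "qty_ordered").text = item.findtext("qty")
        ET.SubElement(line, "net_unit_price").text = item.findtext("itemcost")

print(ET.tostring(out, encoding="unicode"))
```

The XSLT approach has the advantage that the transformation runs inside the xTuple import machinery itself; a standalone script like this would have to be run as a separate step.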
Modifying XSLT is not always easy but it has a pretty big pay-off. Conceptually it is like setting up the CSV Atlas but in practice it is very different. However, once you have an XSL stylesheet to transform from your web store's format to the xTuple API format and have configured xTuple ERP to use this stylesheet, the import is very easy to perform.
To configure the application to use your stylesheet, use the Configure Data Import and Export screen in System > Master Information. Click the New button and fill in the details that describe when to use your stylesheet:
- Map Name
- Put a meaningful name here, one that will distinguish this mapping from any others you might use.
- Document Type
- The value to put here should be the first word in the <!DOCTYPE> directive in the XML files you will get from your eCommerce provider.
- System Identifier
- The value of the System Identifier also comes from the eCommerce provider's XML file. It should be the quoted string that follows the word SYSTEM in the sample file above.
- Import XSLT File Name
- Type the name of your XSLT stylesheet here. The best thing to do is put the simple filename here and use the Default XSLT File Directories fields on the Configure Import and Export window to name the path through the directory tree to find this file.
Then use the Configure Import and Export window to name the default directory to scan for files to import.
The steps for importing the data once everything is configured are simple:
- Every time you receive an XML orders file from your eCommerce provider, copy the file to your default import directory.
- Open the Import XML window in the xTuple ERP application under System > System Utilities.
- Select the line(s) in the display for the new file(s).
- Click the Import Selected button.
The number of API views is growing faster than we can maintain a DTD for import, so for now the description of the XML file used for importing data is informal. The basic idea is that the file is a list of elements named after views or tables, with a child element for each column in the view or table for which you want to supply data.
Here is a simple example, annotated as we go. Don't actually load this example! It's only here to illustrate what you can do.

The file starts with a declaration identifying it as an XML file conforming to the version 1.0 definition of XML, followed by a directive identifying it as an "xtupleimport" document:

<?xml version="1.0"?>
<!DOCTYPE xtupleimport >

Next comes the root element of the document. Each of its child elements names a table or view in the database. The first child element in this example inserts a row into the budgetentry view in the api schema. The name of this budget entry is Mail and its G/L account number is 01-05-1099-01. The budget entry is for the period starting January 5, 2010:

<period_start> 2010-01-05 </period_start>

If your data source gives you dates that do not align with your xTuple ERP accounting periods, you can select the period_start from the period table:

<period_start> SELECT period_start FROM public.period WHERE '2010-01-05' BETWEEN period_start AND period_end; </period_start>

We're allocating $410.25 for this budget item. If you get a database error that mentions a type mismatch, something like the database server cannot convert TEXT to NUMERIC, add the quote attribute to the element for that column. The closing tag ends this budget entry.

The next child element inserts a row into the carrier table in the public schema; this is a simple core table. Setting the ignore attribute to true tells the import to process the rest of the file even if there is a database error while importing this carrier record. The XML import also accepts a schema attribute instead of embedding the schema in the element tag name:

<carrier schema="public" ignore="true">

The value attribute (slated for support in a future release at this writing) sets the carrier_name column of this new record to FedEx:

<carrier_name value="FedEx" />

This demonstrates an alternative, more compact way to set the value of a column; the method you use is a matter of preference. Both methods accept [NULL] to insert an SQL NULL value; if a blank value is given (value="") then the value in the database will be set to an empty string.

<carrier_comments> Federal Express use 123456-78 for overnight and 123456-90 for 2-day </carrier_comments>

This sets the comment on the new carrier record. The closing tag ends the carrier record. Note that no carrier_id was set even though the carrier table has a carrier_id column. This column has a default value defined in the database, so we don't have to worry about it. Finally, the closing tag of the root element ends the document.
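Files of this general shape are also easy to generate programmatically. Here is a minimal sketch (Python, illustrative; it emits the child-element style shown above and omits the XML and DOCTYPE declarations):

```python
import xml.etree.ElementTree as ET

def make_import_doc(rows):
    """rows: list of (table_or_view_name, {column: value}) pairs."""
    root = ET.Element("xtupleimport")
    for name, columns in rows:
        row = ET.SubElement(root, name)
        for column, value in columns.items():
            ET.SubElement(row, column).text = value
    return ET.tostring(root, encoding="unicode")

# Build a document inserting one carrier record, as in the example above.
doc = make_import_doc([
    ("carrier", {"carrier_name": "FedEx",
                 "carrier_comments": "use 123456-78 for overnight"}),
])
print(doc)
```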
Importing Into xTuple ERP (under construction)
This section covers the steps necessary to import data from another application into a fresh xTuple ERP database. You should understand the previous section on strategies and technologies before continuing.
Create A Database
Start with the empty database.
Study The Schema
Use both public and api schemas.
There is a diagram in xtupledocs/trunk/devguide/postbooks_schema.dia, which must be opened using the program Dia. (Still to be written: how to find and install Dia, including Fink and an X server on the Mac.)
Help improve xTuple ERP and ease the burden for you and others by adding views if you can.
Database structural guide:
- codes tables
- orders distinct from line items
- currency manipulation
- general ledger
Clean The Legacy Data
This would be useful even if you don't wind up switching.
Export Legacy Data
CSV would be best for now although XML would work for those data controlled by API views.
It'll probably be easier to massage the data at export time than at import time, but this depends on how the data are stored, the capabilities of your legacy software, and the other tools at your disposal.
Import Legacy Data
Sequence is critical.
Start simple at first with basic codes tables. This both eases you into the process and provides the necessary setup for importing the more complex data.