Once you've broken the backbone of a scrape by writing the Parser, you're going to want to put the results somewhere.
The existing Stores are:
ConsoleStore - Print the data to System.out
NullStore - Throw the data away
FileStore - Store the data in a file
JdbcStore - Store data via an INSERT SQL statement
CallableJdbcStore - Store data via a stored procedure
Let's go over each and discuss the configuration options.
Xxx.store=Console
Simple to configure, simple to use. Hook it up and your Results will be dumped on screen. Useful in debugging.
Xxx.store=Null
You also get this store when you provide no configuration. It is also of use in debugging, especially if the output you want to see is from your Parser or Fetcher and you don't want the spam of your results trying to go somewhere.
NullStore is also useful if you only care that the server had a dynamic page invoked, and not about the actual result from the page.
Xxx.store=File
Xxx.path=/tmp/foo/
Xxx.saveAs=somefile.foo
The first store with additional options, i.e. where to save the file. Currently this store only writes the {0,0} field from your result, i.e. the first field in the first row. This may seem silly, but FileStore was created to store scraped images rather than data. Improving this is a TODO.
Xxx.store=Jdbc
Xxx.DS=FooDS
# then either
Xxx.sql=INSERT INTO Foo (col1, col2) VALUES(?,?)
# or
Xxx.table=Foo
Storing data in a database is hard to get away from when scraping. JdbcStore makes it easy.
The first configuration option is the DataSource. This should be obtained via JNDI as a javax.sql.DataSource object; for most of us this means knowing how to configure a DataSource in Simple-JNDI.
Then you can either specify the INSERT statement to use, in java.sql.PreparedStatement notation, or, for the exceptionally lazy, simply specify the table to INSERT into. The latter results in a generic statement of the form INSERT INTO Foo VALUES(?, ?), where the number of question marks inside VALUES depends on the length of the row in the results.
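To make the table-only option concrete, here is a minimal sketch of how such a generic statement could be assembled from the row length. It is not the JdbcStore source; the class and method names (GenericInsertSketch, buildInsert) are made up for illustration.

```java
// Illustrative sketch only; this is not the actual JdbcStore code.
final class GenericInsertSketch {

    // Builds "INSERT INTO <table> VALUES(?, ?, ...)" with one placeholder per column.
    static String buildInsert(String table, int columnCount) {
        if (columnCount < 1) {
            throw new IllegalArgumentException("a row needs at least one column");
        }
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" VALUES(");
        for (int i = 0; i < columnCount; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append('?');
        }
        return sql.append(')').toString();
    }

    public static void main(String[] args) {
        Object[] row = { "widget", Integer.valueOf(42) }; // a two-column row
        System.out.println(buildInsert("Foo", row.length));
        // prints: INSERT INTO Foo VALUES(?, ?)
    }
}
```

Because no column names are listed, a statement of this kind only works when the row supplies a value for every column of the table, in the table's column order. If that is not the case, specify the full INSERT via the sql option instead.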
Currently JdbcStore does not do anything special with your data, so if you want to insert a Date into the database, you'll need to place the correct java.sql type in your results and not java.util.Date.
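As a rough illustration of that conversion, the sketch below turns a java.util.Date into the two most common java.sql equivalents before it would be placed in a result row. The helper class and method names are hypothetical, and which java.sql type is "correct" depends on the column you are inserting into.

```java
import java.sql.Timestamp;
import java.util.Date;

// Illustrative helper, not part of the scraping engine.
final class SqlDateSketch {

    // For DATE columns: keeps the same instant, wrapped in the JDBC date type.
    static java.sql.Date toSqlDate(Date date) {
        return new java.sql.Date(date.getTime());
    }

    // For TIMESTAMP columns: keeps date and time of day.
    static Timestamp toSqlTimestamp(Date date) {
        return new Timestamp(date.getTime());
    }

    public static void main(String[] args) {
        Date scrapedAt = new Date();
        // These are the objects you would put in the row instead of scrapedAt itself.
        Object[] row = { "some value", toSqlTimestamp(scrapedAt) };
        System.out.println(row[1].getClass().getName());
        // prints: java.sql.Timestamp
    }
}
```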
Xxx.store=CallableJdbc
Xxx.DS=FooDS
Xxx.sql=call stored_proc_example(?,?)
# a side-effect of inheritance means this is accepted, but trying to use it will cause problems.
Xxx.table=Foo
CallableJdbcStore is an example of an extension to JdbcStore which uses java.sql.CallableStatement instead of java.sql.PreparedStatement. With the exception of the table option, all of the JdbcStore information applies.
As storing is a service provided by OSCube, creating your own Store involves implementing the two methods in the org.osjava.oscube.service.store.Store interface.
public void store(Result result, Config cfg, Session session) throws StoringException;
public boolean exists(Header header, Config cfg, Session session) throws StoringException;
For the moment, it is recommended that the exists method merely return false. It's designed for scrapers that don't want to insert repeated data, but is currently not very well tested as a concept.
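A minimal sketch of a custom Store following the two signatures above might look like this. To keep the example compilable on its own, the OSCube types (Store, Result, Header, Config, Session, StoringException) are replaced by empty stand-ins; they are assumptions for illustration only, and in a real project you would delete them and import the real types. CountingStore itself is a hypothetical name.

```java
// Sketch only. The nested stand-in types exist so that this compiles by itself;
// in a real project, remove them and import the OSCube types instead.
final class CountingStoreSketch {

    // --- hypothetical stand-ins for the OSCube types ---
    static class Result {}
    static class Header {}
    static class Config {}
    static class Session {}
    static class StoringException extends Exception {
        StoringException(String message) { super(message); }
    }
    interface Store {
        void store(Result result, Config cfg, Session session) throws StoringException;
        boolean exists(Header header, Config cfg, Session session) throws StoringException;
    }

    // --- the custom Store (hypothetical): it only counts what it is given ---
    static class CountingStore implements Store {
        private int stored;

        public void store(Result result, Config cfg, Session session) throws StoringException {
            if (result == null) {
                throw new StoringException("no result to store");
            }
            stored++; // a real Store would write the result somewhere here
        }

        public boolean exists(Header header, Config cfg, Session session) {
            return false; // as recommended: never claim the data is already stored
        }

        int getStoredCount() {
            return stored;
        }
    }

    public static void main(String[] args) throws StoringException {
        CountingStore store = new CountingStore();
        store.store(new Result(), new Config(), new Session());
        System.out.println(store.getStoredCount());
        // prints: 1
    }
}
```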
You can also extend existing Stores, such as org.osjava.oscube.service.store.JdbcStore, to avoid having to do all the JDBC work again.