Storing results

Once you've broken the backbone of a scrape by writing the Parser, you're going to want to put the results somewhere.

The existing Stores are:

ConsoleStore - Print the data to System.out
NullStore - Throw the data away
FileStore - Store the data in a file
JdbcStore - Store data via an INSERT sql statement
CallableJdbcStore - Store data via a stored procedure

Let's go over each and discuss the configuration options.

ConsoleStore

Xxx.store=Console

Simple to configure, simple to use. Hook it up and your Results will be dumped to the screen. Useful when debugging.

NullStore

Xxx.store=Null

You also get this store when you provide no configuration. It is useful in debugging too, especially when the output you want to see comes from your Parser or Fetcher and you don't want it drowned out by your results being written somewhere.

NullStore is also useful if you only care that a dynamic page on the server was invoked, and not about the actual result from the page.

FileStore

Xxx.store=File
Xxx.path=/tmp/foo/
Xxx.saveAs=somefile.foo

This is the first store with additional options, i.e. where to save the file. Currently this store only writes the {0,0} field from your result, i.e. the first field in the first row. This may seem silly, but FileStore was created to store scraped images rather than data. Improving this is a TODO.

JdbcStore

Xxx.store=Jdbc
Xxx.DS=FooDS
# then either
Xxx.sql=INSERT INTO Foo (col1, col2) VALUES(?,?)
# or
Xxx.table=Foo

Storing data in a database is hard to get away from when scraping. JdbcStore makes it easy.

The first configuration option is the DataSource. It should be obtained via JNDI as a javax.sql.DataSource object; for most of us this means knowing how to configure a DataSource in Simple-JNDI.

Then you can either specify the INSERT statement to use, in java.sql.PreparedStatement notation, or, for the exceptionally lazy, simply specify the table to INSERT into. The latter results in a generic statement of the form INSERT INTO Foo VALUES(?, ?), where the number of question marks inside VALUES depends on the length of the row in the results.
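The table-only shortcut can be pictured with a minimal sketch like the one below. It is not JdbcStore's actual source; the class GenericInsertSketch and the method buildInsert are hypothetical names used only to show how one placeholder per column could be generated from the row length.

```java
import java.util.Collections;

// Hypothetical helper for illustration; not taken from JdbcStore itself.
class GenericInsertSketch {

    // Builds "INSERT INTO <table> VALUES(?, ?, ...)" with one "?" per column in the row.
    static String buildInsert(String table, int columnCount) {
        if (columnCount < 1) {
            throw new IllegalArgumentException("a row needs at least one column");
        }
        // nCopies gives a list of "?" entries; join puts ", " between them.
        String placeholders = String.join(", ", Collections.nCopies(columnCount, "?"));
        return "INSERT INTO " + table + " VALUES(" + placeholders + ")";
    }
}
```

Calling buildInsert("Foo", 2) yields INSERT INTO Foo VALUES(?, ?). Because such a statement names no columns, the order of the fields in each result row has to match the column order of the table. The table name is concatenated into the SQL rather than bound as a parameter, so in a sketch like this it should only ever come from trusted configuration.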

Currently JdbcStore does not do anything special with your data, so if you want to insert a date into the database, you'll need to place the correct java.sql type (such as java.sql.Date or java.sql.Timestamp) in your results and not a java.util.Date.
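One way to handle this is to convert dates in your Parser before they go into the results. The sketch below uses only standard JDK classes; the class SqlDateSketch and its method names are hypothetical, and which of the two types you want depends on the column you are inserting into.

```java
import java.util.Date;

// Hypothetical helper for illustration; call it from your Parser before building results.
class SqlDateSketch {

    // For columns that hold both a date and a time of day.
    static java.sql.Timestamp toSqlTimestamp(Date date) {
        return new java.sql.Timestamp(date.getTime());
    }

    // For DATE columns, where only the calendar day is of interest.
    static java.sql.Date toSqlDate(Date date) {
        return new java.sql.Date(date.getTime());
    }
}
```

Both conversions carry over the same millisecond value as the original java.util.Date; how much of it is kept is then up to the JDBC driver and the column type.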

CallableJdbcStore

Xxx.store=CallableJdbc
Xxx.DS=FooDS
Xxx.sql=call stored_proc_example(?,?)
# a side effect of inheritance means this is accepted, but trying to use it will cause problems.
Xxx.table=Foo

CallableJdbcStore is an example of an extension to JdbcStore which uses java.sql.CallableStatement instead of java.sql.PreparedStatement. With the exception of the table option, all of the JdbcStore information applies.

Implementing your own Store

As storing is a service provided by OSCube, creating your own Store involves implementing the two methods in the org.osjava.oscube.service.store.Store interface.

public void store(Result result, Config cfg, Session session) throws StoringException;
public boolean exists(Header header, Config cfg, Session session) throws StoringException;

For the moment, it is recommended that the exists method simply return false. It is designed for scrapers that don't want to insert repeated data, but as a concept it is currently not very well tested.
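As a rough illustration, a custom Store might look like the sketch below. To keep it compilable on its own, it declares empty stand-ins for Result, Header, Config, Session, StoringException and Store inside a wrapper class; these stand-ins are assumptions and say nothing about the real OSCube types beyond the two method signatures shown above. CountingStore is a hypothetical example that only counts what it is handed.

```java
// Wrapper so the sketch is self-contained. In a real scraper, drop the stand-ins
// and import the OSCube types of the same names instead.
class StoreSketch {

    // Empty stand-ins (assumptions): the real types have their own members.
    interface Result {}
    interface Header {}
    interface Config {}
    interface Session {}

    static class StoringException extends Exception {
        StoringException(String message) { super(message); }
    }

    // Mirrors the two method signatures of the Store interface.
    interface Store {
        void store(Result result, Config cfg, Session session) throws StoringException;
        boolean exists(Header header, Config cfg, Session session) throws StoringException;
    }

    // Hypothetical Store that only counts how many results it has been given.
    static class CountingStore implements Store {
        private int stored = 0;

        public void store(Result result, Config cfg, Session session) throws StoringException {
            if (result == null) {
                throw new StoringException("nothing to store");
            }
            stored++;
        }

        // Follows the recommendation to simply return false for now.
        public boolean exists(Header header, Config cfg, Session session) throws StoringException {
            return false;
        }

        int getStored() {
            return stored;
        }
    }
}
```

A real implementation would read its own options from the Config and write the Result's rows to wherever the data should end up, wrapping any failure in a StoringException.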

You can also extend existing Stores, such as org.osjava.oscube.service.store.JdbcStore, to avoid having to do all the JDBC work again.