开发者

file (not in memory) based JDBC driver for CSV files

开发者 https://www.devze.com 2023-04-09 05:31 出处:网络
Is there a open source file based (NOT in-memory based) JDBC driver for CSV files? My CSV are dynamically generated from the UI according to the user selections and each user will have a different CSV

Is there a open source file based (NOT in-memory based) JDBC driver for CSV files? My CSV are dynamically generated from the UI according to the user selections and each user will have a different CSV file. I'm doing this to reduce database hits, since the information is contained in the CSV file. I only need to perform SELECT operations.

HSQLDB allows for indexed searches if we specify an index, but I won't be able to provide an unique column th开发者_开发百科at can be used as an index, hence it does SQL operations in memory.


Edit:

I've tried CSVJDBC but that doesn't support simple operations like order by and group by. It is still unclear whether it reads from file or loads into memory.

I've tried xlSQL, but that again relies on HSQLDB and only works with Excel and not CSV. Plus its not in development or support anymore.

H2, but that only reads CSV. Doesn't support SQL.


You can solve this problem using the H2 database.

The following groovy script demonstrates:

  1. Loading data into the database
  2. Running a "GROUP BY" and "ORDER BY" sql query

Note: H2 supports in-memory databases, so you have the choice of persisting the data or not.

// Create the database
def sql = Sql.newInstance("jdbc:h2:db/csv", "user", "pass", "org.h2.Driver")

// Load CSV file
sql.execute("CREATE TABLE data (id INT PRIMARY KEY, message VARCHAR(255), score INT) AS SELECT * FROM CSVREAD('data.csv')")

// Print results 
def result = sql.firstRow("SELECT message, score, count(*) FROM data GROUP BY message, score ORDER BY score")

assert result[0] == "hello world"
assert result[1] == 0
assert result[2] == 5

// Cleanup
sql.close()

Sample CSV data:

0,hello world,0
1,hello world,1
2,hello world,0
3,hello world,1
4,hello world,0
5,hello world,1
6,hello world,0
7,hello world,1
8,hello world,0
9,hello world,1
10,hello world,0


If you check the sourceforge project csvjdbc please report your expierences. the documentation says it is useful for importing CSV files.

Project page


This was discussed on Superuser https://superuser.com/questions/7169/querying-a-csv-file.

You can use the Text Tables feature of hsqldb: http://hsqldb.org/doc/2.0/guide/texttables-chapt.html

csvsql/gcsvsql are also possible solutions (but there is no JDBC driver, you will have to run a command line program for your query).

sqlite is another solution but you have to import the CSV file into a database before you can query it.

Alternatively, there is commercial software such as http://www.csv-jdbc.com/ which will do what you want.


To do anything with a file you have to load it into memory at some point. What you could do is just open the file and read it line by line, discarding the previous line as you read in a new one. Only downside to this approach is its linearity. Have you thought about using something like memcache on a server where you use Key-Value stores in memory you can query instead of dumping to a CSV file?


You can use either specialized JDBC driver, like CsvJdbc (http://csvjdbc.sourceforge.net) or you may chose to configure a database engine such as mySQL to treat your CSV as a table and then manipulate your CSV through standard JDBC driver.

The trade-off here - available SQL features vs performance.

  • Direct access to CSV via CsvJdbc (or similar) will allow you very quick operations on big data volumes, but without capabilities to sort or group records using SQL commands ;
  • mySQL CSV engine can provide rich set of SQL features, but with the cost of performance.

So if the size of your table is relatively small - go with mySQL. However if you need to process big files (> 100Mb) without need for grouping or sorting - go with CsvJdbc. If you need both - handle very bif files and be able to manipulate them using SQL, then optimal course of action - to load the CSV into normal database table (e.g. mySQL) first and then handle the data as usual SQL table.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号