Troubles reading csv files, maybe due to their weird formatting...
m.ardito Feb 26, 2016 5:47 AMHi,
I'm trying to read data from a csv file, generated by a snmp data collector
I've looked
- this quickstart teiid-quickstarts/dynamicvdb-datafederation at master · teiid/teiid-quickstarts · GitHub
- this wiki guide Text to Table with Teiid
I tried to mimic those examples, but in those examples the target text file is always like
SYMBOL,PRICE
RHT,30.00
BA,42.75
MON,78.75
PNRA,84.97
SY,24.30
and this is quite different from the file I get form the snmp collector... i.e. like
#Contatore totale
#Format Version:4.2.3.0
#Data/Ora esportazione: 2015/10/05 11:13:17
#Codice Risultato,Informazioni contatore più aggiornate,Indirizzo,Indirizzo nodo,Modello,Nome,Commento,Contatore totale
<ReturnCode>,<Latest Comm. Time>,<Address>,<MacAddress>,<Model Name>,<Device Name>,<Comment>,<Total Counter>
0,2015/10/05 11:12:58,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-39E733],[RICOH Aficio SP 3510SF V2.01 / ],48173
0,2015/10/05 11:12:58,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-539A78],[RICOH Aficio SP 3510SF V2.07 / ],24594
0,2015/10/05 11:12:57,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-539AF0],[RICOH Aficio SP 3510SF V2.07 / ],34818
0,2015/10/05 11:12:57,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-539B00],[RICOH Aficio SP 3510SF V2.07 / ],5735
0,2015/10/05 11:12:58,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-53B156],[RICOH Aficio SP 3510SF V2.07 / ],76948
After several failures, I tried several things:
- removing the first 4 rows,
- removing the "<" and ">" form the header row
- removing spaces in headers field names
...it seems nothing works...
The last try was after converting the csv file to
ReturnCode,LatestCommTime,Address,MacAddress,ModelName,DeviceName,Comment,TotalCounter
0,2015/09/17 16:16:35,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-39E733],[RICOH Aficio SP 3510SF V2.01 / ],47652
0,2015/09/17 16:16:35,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-539A78],[RICOH Aficio SP 3510SF V2.07 / ],23202
0,2015/09/17 16:16:35,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-539AF0],[RICOH Aficio SP 3510SF V2.07 / ],34355
0,2015/09/17 16:16:33,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-539B00],[RICOH Aficio SP 3510SF V2.07 / ],4342
0,2015/09/17 16:16:33,[192.168.x.y],[00-00-00-00-00-00],[Aficio SP 3510SF],[3510SF-53B156],[RICOH Aficio SP 3510SF V2.07 / ],75335
after loading the vdb
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<vdb name="Contatori" version="1">
<description>Contatori lan</description>
<property name="UseConnectorMetadata" value="true" />
<model name="ContatoriLan">
<source name="text-connector" translator-name="file" connection-jndi-name="java:/contatori"/>
</model>
<model name="Dati" type="VIRTUAL">
<metadata type="DDL"><![CDATA[
CREATE VIEW Stampanti (
updated date,
address string
)
AS
SELECT x.LatestCommTime, x.Address
FROM (EXEC ContatoriLan.getTextFiles('*.CSV')) AS f,
TEXTTABLE(f.file COLUMNS LatestCommTime date, Address string HEADER) AS x;
]]>
</metadata>
</model>
</vdb>
I get
11:31:23,323 INFO [org.teiid.RUNTIME.VDBLifeCycleListener] (MSC service thread 1-3) TEIID40120 VDB Contatori.1 will be removed from the repository
11:31:23,326 INFO [org.teiid.RUNTIME.VDBLifeCycleListener] (MSC service thread 1-3) TEIID40119 VDB Contatori.1 removed from the repository
11:31:23,329 INFO [org.teiid.SECURITY] (MSC service thread 1-3) TEIID40009 Admin [null] is terminating this session: 3byYHBqd0NyU.
11:31:23,332 INFO [org.teiid.RUNTIME] (MSC service thread 1-3) TEIID50026 VDB "Contatori.1[ContatoriLan{text-connector=text-connector, file, java:/contatori}, Dati{}]" undeployed.
11:31:23,338 INFO [org.teiid.RUNTIME.VDBLifeCycleListener] (MSC service thread 1-3) TEIID40118 VDB Contatori.1 added to the repository - is reloading false
11:31:23,341 INFO [org.teiid.RUNTIME] (MSC service thread 1-3) TEIID50029 VDB Contatori.1 model "ContatoriLan" metadata is currently being loaded. Start Time: 26/02/16 11.31
11:31:23,345 INFO [org.teiid.RUNTIME] (teiid-async-threads - 4) TEIID50030 VDB Contatori.1 model "ContatoriLan" metadata loaded. End Time: 26/02/16 11.31
11:31:23,345 INFO [org.teiid.RUNTIME] (MSC service thread 1-3) TEIID50029 VDB Contatori.1 model "Dati" metadata is currently being loaded. Start Time: 26/02/16 11.31
11:31:23,352 INFO [org.teiid.RUNTIME] (teiid-async-threads - 3) TEIID50030 VDB Contatori.1 model "Dati" metadata loaded. End Time: 26/02/16 11.31
11:31:23,356 INFO [org.teiid.RUNTIME.VDBLifeCycleListener] (teiid-async-threads - 3) TEIID40003 VDB Contatori.1 is set to ACTIVE
From squirrel, connecting to the vdb looks fine, but as soon as I query anything like "select * from Stampanti" I get
11:34:40,311 WARN [org.teiid.PROCESSOR] (Worker10_QueryProcessorQueue237) DG5IdTWJSfjH TEIID30020 Processing exception for request DG5IdTWJSfjH.11 'TEIID30176 Could not convert value for column LatestCommTime in the row ending on text line 2 in file:/mnt/common/TEST/20150917161732_Lan_T.CSV.'. Originally TeiidProcessingException Date.java:140. Enable more detailed logging to see the entire stacktrace.
then I tried changing the "date" to "timestamp" in the vdb like
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<vdb name="Contatori" version="1">
<description>Contatori lan</description>
<property name="UseConnectorMetadata" value="true" />
<model name="ContatoriLan">
<source name="text-connector" translator-name="file" connection-jndi-name="java:/contatori"/>
</model>
<model name="Dati" type="VIRTUAL">
<metadata type="DDL"><![CDATA[
CREATE VIEW Stampanti (
updated timestamp,
address string
)
AS
SELECT x.LatestCommTime, x.Address
FROM (EXEC ContatoriLan.getTextFiles('*.CSV')) AS f,
TEXTTABLE(f.file COLUMNS LatestCommTime timestamp, Address string HEADER) AS x;
]]>
</metadata>
</model>
</vdb>
and now I get
11:37:49,892 WARN [org.teiid.PROCESSOR] (Worker10_QueryProcessorQueue262) xIOTB4045PN0 TEIID30020 Processing exception for request xIOTB4045PN0.6 'TEIID30176 Could not convert value for column LatestCommTime in the row ending on text line 2 in file:/mnt/common/TEST/20150917161732_Lan_T.CSV.'. Originally TeiidProcessingException 'Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]' Timestamp.java:235. Enable more detailed logging to see the entire stacktrace.
... I can't randomly try everything that comes to mind, but also I can't preprocess every csv in order to be read in this way...
It is simpler to load that in a mysql database, and then query it (which works perfectly, btw..), althought I was trying to avoid this intermediate step, from the start.
What is the best approach to use those .csv, files, if possible without "preprocessing" them?
is it possible to add some paramenter or function to
- specify which are the header and data start rows (like the excel connector allows)
- use even those weird header column name names (like <headername>) in some way?
what you suggest?
I could find so many differently formatted .csv files, I need to learn all tips & tricks of teiid-fu to be able to use them, or find other ways...
Thanks,
Marco