Can't run reconcile-csv.jar file #9

Open
santanu4 opened this issue Aug 25, 2014 · 19 comments

@santanu4

I am a newbie in data reconciliation with OpenRefine and still at the learning stage. I am using Windows XP SP3 and have the Java Runtime Environment installed on my machine. My question is: where do I run this command?
java -jar reconcile-csv-1.0.1-SNAPSHOT-standalone.jar <column with id's>

I have tried running it in the Windows command prompt, trying different permutations and combinations to see if any would run. I have also tried double-clicking the reconcile-csv.jar file to run it. It threw a Java Virtual Machine error and didn't proceed. How do I make it run?

@mihi-tr
Contributor

mihi-tr commented Sep 9, 2014

Sorry for the late reply (extensive offline vacations here). The command prompt is the right place to run it. Don't forget to pass the right parameters to reconcile-csv. What is the result of your command prompt experiment?
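Before launching, it can help to rule out the three blockers that come up most in this thread: Java not being found, a wrong jar path, and a wrong CSV path. The following is a hypothetical Python helper, not part of reconcile-csv, and the file names in it are placeholders. Relative paths are resolved against the folder the command is run from, which is why it matters where you run the command.

```python
import shutil
from pathlib import Path


def preflight(jar_path: str, csv_path: str) -> list[str]:
    """Return human-readable problems; an empty list means ready to launch."""
    problems = []
    # `java` must be on the PATH for `java -jar ...` to work in the command prompt.
    if shutil.which("java") is None:
        problems.append("java was not found on the PATH")
    # Relative paths are resolved against the current working directory.
    if not Path(jar_path).is_file():
        problems.append(f"jar not found: {Path(jar_path).resolve()}")
    if not Path(csv_path).is_file():
        problems.append(f"CSV not found: {Path(csv_path).resolve()}")
    return problems


if __name__ == "__main__":
    # Hypothetical file names; substitute your own.
    for problem in preflight("reconcile-csv-0.1.2.jar", "mydata.csv"):
        print(problem)
```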

@meriteadrupal

Hello!

My apologies in advance for my newbie-ness. The reconcile-csv instructions breeze over things that are likely obvious to seasoned data wranglers, but they aren't (yet) obvious to me. Like the member above, I am also having trouble starting the reconcile-csv server using the .jar file.

A few questions:

I am on a MacBook and entering commands in Terminal. Do I need to navigate to the same directory as the .jar file before running the command that launches the server?

Where should I put the .jar file so that it works? Should I put it in a "Reconcile-CSV" folder along with the other reconciliation services in the Extensions folder? (I'm referring to the directory you see when you click the "Browse workspace directory" link at the bottom of the OpenRefine home page while the app is running.)

[Screenshot: where to put reconcile-csv]

Where should I put the .csv file to which it refers? Does it belong in the same directory as the .jar file?

Do I need to navigate to that folder in Terminal to execute the command to start?

When I enter the command into Terminal to start the server (java -jar reconcile-csv-0.1.1.jar <CSV-File> <Search Column> <ID Column>), how should the CSV file name be entered? I have a file called InventorsList.csv. Does it need the ".csv" suffix? Do the values need quotes around them? I've tried with and without quotes, and with and without .csv, and still get the following error:

[Screenshot: error message]

Last, more of a big-picture question so that I can wrap my brain around this: how exactly does this work?

As an example:
I have a spreadsheet of new patent inventor records (Spreadsheet A) that contains a column with each inventor's full name. I have an export of contacts from my database (Spreadsheet B). I want to figure out which inventors we already have in our database as Contacts. I then need to append Contact IDs, etc., to those records to link the inventor records to the contact records in my database (two different objects). Our identifier is people's full names, the closest thing we have to a unique identifier.

My plan is to bring Spreadsheet A (the inventors) into OpenRefine and reconcile its Name column against Spreadsheet B. My understanding is that Spreadsheet B would be the .csv file I list in the .jar command. In this scenario, the "Search Column" would be the "Name" column in Spreadsheet B and the "ID column" would be the "Name" column in the project created from Spreadsheet A. Am I understanding this correctly? In that example, would I enter the following to start the server:

java -jar reconcile-csv-0.1.1.jar <Spreadsheet_B.csv>

I welcome your insights and corrections. Again, my apologies if any of this seems obvious. I'm relatively technical, but still coming up to speed on command prompt tasks.

Many thanks!
Krista

@mihi-tr
Contributor

mihi-tr commented Sep 27, 2014

@meriteadrupal,

Thanks for the long description of what you're trying to do.

To run it, Java needs to know exactly where the .jar file you are trying to run is. So you can either run

java -jar reconcile-csv-0.1.1.jar if the jar is in the same folder, or

java -jar /path/to/reconcile-csv.jar if the jar is in a different folder. The same applies to the CSV file.

Don't put the pointy brackets in - run e.g.

java -jar reconcile-csv.jar spreadsheetB.csv "name" "Contact ID"
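To see what reconcile-csv actually receives: the shell removes the quotes, so the program gets three arguments after the jar (CSV file, search column, ID column), and "Contact ID" stays a single argument because it is quoted. Python's shlex module splits a string using POSIX shell rules (as in the macOS Terminal), so it can be used to preview this; Windows cmd.exe quoting differs in some details.

```python
import shlex

# The example command from the reply above.
command = 'java -jar reconcile-csv.jar spreadsheetB.csv "name" "Contact ID"'

argv = shlex.split(command)
# Everything after the jar path is handed to reconcile-csv itself.
program_args = argv[3:]

print(program_args)       # ['spreadsheetB.csv', 'name', 'Contact ID']
print(len(program_args))  # 3
```

Note also that in a shell, < and > are redirection operators, which is another reason to leave the pointy brackets out.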

reconcile-csv just works with the spreadsheet you give it and exposes a reconciliation API to Refine; the reconciliation itself happens in Refine.

This means the CSV you give it is your database export: the "Search Column" would be the name and the "ID column" would be the column with the database ID.
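In practice, Refine sends batches of cell values to the service over HTTP and receives scored candidates back. The sketch below assumes reconcile-csv follows the standard OpenRefine Reconciliation API conventions: a POST with a form field named queries containing JSON such as {"q0": {"query": "..."}}, answered with JSON keyed the same way, each entry holding a result list of candidates with id, name and score. The service URL is the default mentioned in this thread; whether reconcile-csv honours limit is an assumption.

```python
import json
import urllib.parse
import urllib.request

SERVICE_URL = "http://localhost:8000/reconcile"  # default address used in this thread


def build_queries(values, limit=3):
    """Encode cell values the way a reconciliation batch is usually shaped."""
    queries = {f"q{i}": {"query": v, "limit": limit} for i, v in enumerate(values)}
    return urllib.parse.urlencode({"queries": json.dumps(queries)}).encode("utf-8")


def reconcile(values, url=SERVICE_URL):
    """POST one batch and return {value: [(id, name, score), ...]}."""
    # Passing data= makes urlopen send a POST request.
    with urllib.request.urlopen(url, data=build_queries(values), timeout=10) as resp:
        answer = json.load(resp)
    out = {}
    for i, v in enumerate(values):
        candidates = answer.get(f"q{i}", {}).get("result", [])
        out[v] = [(c.get("id"), c.get("name"), c.get("score")) for c in candidates]
    return out
```

With the service running, reconcile(["Jane Doe"]) (a hypothetical name) should return candidates whose id values come from the ID column of the CSV passed on the command line.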

Hope that helps.

@santanu4
Author

santanu4 commented Oct 2, 2014

Hello Mr. Bauer,
I have been able to run reconcile-csv and add it to OpenRefine. However, I still have a few questions. Reconcile-csv compares two CSV datasets. Should the file I load inside OpenRefine be the same as the file I named in the command prompt when running the reconcile-csv jar?
I got slightly lost as to how reconcile-csv references another file from inside OpenRefine to do the fuzzy matching between the two files. I tried googling it, but there is very little documentation online, and what there is isn't very explanatory.

Thanking you,
Yours sincerely,
Santanu Chatterjee


@mihi-tr
Contributor

mihi-tr commented Oct 3, 2014

Hi there,

You will generally open one file (the one with the unique IDs) in reconcile-csv and the other (the one where you want to introduce the IDs) in Refine.
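A rough mental model, under stated assumptions: reconcile-csv holds the ID file in memory, and for each cell value Refine sends, it scores the search-column value of every row and returns the best candidates with their IDs. The stack trace further down in this thread shows functions named bigrams and dice in a fuzzy_string library, which suggests a Dice coefficient over character bigrams. Below is a Python approximation of that idea, not reconcile-csv's actual code; the lowercasing, the set-based bigrams and the sample rows are assumptions.

```python
def bigrams(s: str) -> set[str]:
    """Adjacent character pairs of a lowercased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:  # both strings shorter than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def best_matches(query, rows, search_col, id_col, limit=3):
    """Score every row of the ID file against one cell value sent by Refine."""
    scored = [(dice(query, row[search_col]), row[id_col], row[search_col]) for row in rows]
    return sorted(scored, reverse=True)[:limit]


# Hypothetical database export (the file you would pass to reconcile-csv).
contacts = [
    {"name": "Jane Doe", "Contact ID": "C-001"},
    {"name": "John Smith", "Contact ID": "C-002"},
]
print(best_matches("Jane Do", contacts, "name", "Contact ID", limit=1))
```

The file opened in Refine never reaches reconcile-csv as a whole; only the values of the column being reconciled are sent, one query per cell.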

Hope that helps.

@intellerati

Hello Michael,

I got the .jar file running, thanks to your clarifications. Thank you so much. So we've made progress. However, I am getting a new error that I hope you can help me resolve:

Krista-Bradfords-MacBook-Pro-2:openrefine kristabradford$ java -jar reconcile-csv-0.1.1.jar inventorslist.csv “name” “Contact ID”
Exception in thread "main" clojure.lang.ArityException: Wrong number of args (4) passed to: core$-main
at clojure.lang.AFn.throwArity(AFn.java:437)
at clojure.lang.AFn.invoke(AFn.java:51)
at clojure.lang.AFn.applyToHelper(AFn.java:172)
at clojure.lang.AFn.applyTo(AFn.java:151)
at reconcile_csv.core.main(Unknown Source)

While troubleshooting, I noticed the instruction to add the reconciliation service and thought that might have something to do with the issue above. So I went to Start Reconciling -> Add Standard Service and entered "http://localhost:8000/reconcile" (without the quotes). It returned the error:

Error contacting recon service: timeout : timeout - http://localhost:8000/reconcile

I look forward to your thoughts.

Thank you!
Krista

@mihi-tr
Contributor

mihi-tr commented Oct 4, 2014

Hi there,

This looks weird. Are the double quotes the same as you posted? If so, try replacing them with ordinary double quotes (or single quotes).
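Why the curly quotes matter: a shell only treats the straight characters " and ' as quotes. Typographic quotes (often substituted automatically by word processors and some email clients) are ordinary characters, so “Contact ID” is split at the space. Previewing the split with Python's shlex module (POSIX rules, as in macOS Terminal) shows four arguments reaching reconcile-csv, which is consistent with the "Wrong number of args (4)" error. The straighten helper is a hypothetical convenience, not part of reconcile-csv.

```python
import shlex

# Typographic (curly) quotes mapped to their plain ASCII equivalents.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def straighten(command: str) -> str:
    """Replace curly quotes so the shell recognises them as quoting characters."""
    return command.translate(CURLY_TO_STRAIGHT)


pasted = "java -jar reconcile-csv-0.1.1.jar inventorslist.csv \u201cname\u201d \u201cContact ID\u201d"

broken = shlex.split(pasted)[3:]            # arguments after the jar, as typed
fixed = shlex.split(straighten(pasted))[3:]  # arguments after straightening the quotes

print(len(broken), broken)  # 4 ['inventorslist.csv', '“name”', '“Contact', 'ID”']
print(len(fixed), fixed)    # 3 ['inventorslist.csv', 'name', 'Contact ID']
```

The same applies to curly single quotes (‘ ’). When a quoted value contains no space, the argument count comes out right, but the quote marks stay part of the value, so a column name would be passed with the curly characters included.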

@intellerati

OK. The following produces no error. So I think we're making progress . . .

java -jar reconcile-csv-0.1.1.jar inventorslist.csv ‘name’ ‘Contact ID’

Just the cursor is returned, which I presume means it is working. At this point, I'm unclear on what to do next.

Your instructions say:

Then add http://localhost:8000/reconcile as a reconciliation service to refine. You can add more columns through the reconcile-interface in Refine.

Then use
cell.recon.match.id

to get the ID from the match.

What I did:

I went to the Name column I want to reconcile in OpenRefine. In the dropdown I selected Start Reconciling -> Add Standard Service and entered "http://localhost:8000/reconcile" (without the quotes). It doesn't seem to save or show up in the list of services. It returned the error:

Error contacting recon service: timeout : timeout - http://localhost:8000/reconcile

Am I doing something wrong?

@mihi-tr
Contributor

mihi-tr commented Oct 4, 2014

The cursor shouldn't be returned from the call; the program should stay running. This is strange.

Data Diva | skype: mihi_tr | @mihi_tr
Open Knowledge | School of Data
http://okfn.org | http://schoolofdata.org
GPG/PGP key: http://tentacleriot.eu/mihi.asc
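A "timeout" when adding the service usually means nothing answered on that port. A quick, hypothetical way to check whether anything is listening on localhost:8000 while the reconcile-csv window is left open is a plain TCP connection attempt. This only tests that the port accepts connections, not that the program behind it is reconcile-csv.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on port 8000; try adding the service in Refine.")
    else:
        print("Nothing on port 8000: start reconcile-csv and keep that window open.")
```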

@intellerati

I tried again, and I misspoke. The prompt ending in $ was not returned, just the rectangular block cursor. So I presume it loaded. I still get the timeout error, FWIW.

@intellerati

So I have managed to get reconcile-csv to run in Java (yes!). I went into OpenRefine and selected "Start Reconciling" on the column "linkedin" (I am using the URL as a unique identifier). I went to Add Standard Service and entered http://localhost:8000/reconcile.

It took me to this screen. It showed "working", but I never got beyond that point.

[Screenshot: reconcile working]

Checking Terminal, I saw that it had thrown an exception:

ExecutionException: java.lang.NullPointerException (See text below.)

Also, as I reviewed the instructions, there seem to be two sets of them, and I don't understand the difference. One set involves the reconcile-csv-0.1.0-SNAPSHOT-standalone.jar file and the other the reconcile-csv-0.1.1.jar. If it isn't an imposition, could you fill me in on the difference? (I presume the first "stands alone", but it isn't entirely obvious to me what that means, and I don't see it explained in the documentation.)

Also, I've seen instructions pointing to http://localhost:8000/reconcile and others pointing to just http://localhost:8000/. I've tried both; http://localhost:8000/reconcile is the one that gets me to the "working" screen.

(I hope you find my fumblings a wee bit entertaining . . .)

Krista-Bradfords-MacBook-Pro-2:openrefine kristabradford$ java -jar reconcile-csv-0.1.1.jar LinkedInCheck-standardizedURLs.csv ‘LinkedInStandardized’ ‘linkedin’
Starting CSV Reconciliation service
Point refine to http://localhost:8000 as reconciliation service
2014-10-09 15:57:48.550:INFO:oejs.Server:jetty-7.x.y-SNAPSHOT
2014-10-09 15:57:48.579:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:8000
2014-10-09 15:58:08.619:WARN:oejs.AbstractHttpConnection:/reconcile
java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at clojure.core$deref_future.invoke(core.clj:2108)
at clojure.core$future_call$reify__6267.deref(core.clj:6308)
at clojure.core$deref.invoke(core.clj:2128)
at clojure.core$map$fn__4207.invoke(core.clj:2487)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:67)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$zipmap.invoke(core.clj:2713)
at reconcile_csv.core$reconcile_params.invoke(core.clj:131)
at reconcile_csv.core$reconcile.invoke(core.clj:141)
at reconcile_csv.core$fn__66.invoke(core.clj:212)
at compojure.core$make_route$fn__528.invoke(core.clj:94)
at compojure.core$if_route$fn__516.invoke(core.clj:40)
at compojure.core$if_method$fn__509.invoke(core.clj:25)
at compojure.core$routing$fn__534.invoke(core.clj:107)
at clojure.core$some.invoke(core.clj:2443)
at compojure.core$routing.doInvoke(core.clj:107)
at clojure.lang.RestFn.applyTo(RestFn.java:139)
at clojure.core$apply.invoke(core.clj:619)
at compojure.core$routes$fn__538.invoke(core.clj:112)
at ring.middleware.keyword_params$wrap_keyword_params$fn__1335.invoke(keyword_params.clj:32)
at ring.middleware.nested_params$wrap_nested_params$fn__1377.invoke(nested_params.clj:70)
at ring.middleware.params$wrap_params$fn__199.invoke(params.clj:58)
at ring.adapter.jetty$proxy_handler$fn__75.invoke(jetty.clj:18)
at ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$0.handle(Unknown Source)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:363)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:483)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:931)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:992)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:695)
Caused by:
java.lang.NullPointerException
at fuzzy_string.core$bigrams.invoke(core.clj:9)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:617)
at clojure.core$memoize$fn__5049.doInvoke(core.clj:5735)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at fuzzy_string.core$dice.invoke(core.clj:26)
at reconcile_csv.core$score$fuzzy_match__31.invoke(core.clj:78)
at clojure.core$map$fn__4207.invoke(core.clj:2487)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core.protocols$seq_reduce.invoke(protocols.clj:26)
at clojure.core.protocols$fn__6026.invoke(protocols.clj:53)
at clojure.core.protocols$fn__5979$G__5974__5992.invoke(protocols.clj:13)
at clojure.core$reduce.invoke(core.clj:6175)
at reconcile_csv.core$score.invoke(core.clj:80)
at clojure.lang.AFn.applyToHelper(AFn.java:163)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:619)
at clojure.core$partial$fn__4190.doInvoke(core.clj:2396)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$map$fn__4207.invoke(core.clj:2487)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$sort.invoke(core.clj:2752)
at clojure.core$sort_by.invoke(core.clj:2769)
at clojure.core$sort_by.invoke(core.clj:2767)
at reconcile_csv.core$scores.invoke(core.clj:112)
at reconcile_csv.core$reconcile_param.invoke(core.clj:124)
at clojure.core$pmap$fn__6275$fn__6276.invoke(core.clj:6354)
at clojure.core$binding_conveyor_fn$fn__4107.invoke(core.clj:1836)
at clojure.lang.AFn.call(AFn.java:18)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)

My apologies in advance for the multiple posts. My hope is that there will be information in our discussion that will help other users.

Regards,
Krista

@mihi-tr
Contributor

mihi-tr commented Oct 22, 2014

Krista,

First, the two different names are simply two different versions of reconcile-csv.

Beyond that, it's hard to say what is causing the error you are encountering. Something within the data seems to be making it barf.

Could you share your .csv file? Did you make sure there are no empty cells?
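Empty cells are a plausible culprit: the NullPointerException above is raised inside fuzzy_string.core$bigrams, i.e. while computing bigrams of a value that is apparently missing. The sketch below is a hypothetical pre-check, assuming a comma-separated UTF-8 file with a header row. It confirms that the search and ID column names exist exactly as typed (a name that doesn't match the header would presumably also produce missing values) and lists rows where either column is blank.

```python
import csv
import io


def check_csv(f, search_col: str, id_col: str) -> list[str]:
    """Report problems that could trip up reconcile-csv; f is an open text file."""
    reader = csv.DictReader(f)
    headers = reader.fieldnames or []
    missing = [c for c in (search_col, id_col) if c not in headers]
    if missing:
        return [f"column(s) not in header {headers}: {missing}"]
    problems = []
    # Data rows are counted from 1, not including the header row.
    for n, row in enumerate(reader, start=1):
        for col in (search_col, id_col):
            # Short rows yield None for absent columns; treat that as empty too.
            if not (row.get(col) or "").strip():
                problems.append(f"data row {n}: empty {col!r}")
    return problems


# Hypothetical sample; in practice use open("spreadsheetB.csv", newline="", encoding="utf-8").
sample = io.StringIO("name,Contact ID\nJane Doe,C-001\n,C-002\nJohn Smith,\n")
for p in check_csv(sample, "name", "Contact ID"):
    print(p)
```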

@sjg-transparency
Copy link

I'm very much in the same position as the original poster - I have very little experience of doing anything in command prompt so I'm probably doing something very obviously wrong, but I have absolutely no clue what.

I've downloaded the file and stored it in FolderX. The data I want to use as my reconciliation file (i.e., the one with the IDs) is in FolderY. I've tried entering the command below into the command prompt, but it keeps saying "Unable to access jarfile P:\FolderX\reconcile-csv-0.1.2.jar"

java -jar "P:\FolderX\reconcile-csv-0.1.2.jar" P\FolderY\HouseOfLords-Donations-Reconciliation.csv "Name" "PersonID"

Am I missing something or is my computer being buggy? If I can get this going ASAP it'll save me a load of time with what I'm trying to do, so any swift assistance on this would be greatly appreciated.

@mihi-tr
Contributor

mihi-tr commented Aug 11, 2015

Interesting issue. The command line seems OK, assuming your path is correct. Can you run dir P:\FolderX\reconcile-csv-0.1.2.jar to check? Also, I don't think the double quotes should be needed...

@sjg-transparency

Think I'm getting somewhere. It now seems to be having problems finding the file I want to reconcile from. You might have to spell things out very simply for me if I'm doing something wrong because I have no idea how this language works.

Thanks for your assistance

C:\Users\Steve>java -jar "P:_Shared files exc DSP\PROJECTS\Lobbying\Data\reconcile-csv-0.1.2.jar" "P:_Shared files exc DSP\PROJECTS\Lobbying\Data\HouseOfLords-Donations-Reconciliation.csv" Name PersonID
Exception in thread "main" java.io.FileNotFoundException: P:_Shared files exc DSP\PROJECTS\Lobbying\Data\HouseOfLords-Donations-Reconciliation.csv (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at clojure.java.io$fn__8638.invoke(io.clj:233)
at clojure.java.io$fn__8577$G__8542__8584.invoke(io.clj:73)
at clojure.java.io$fn__8650.invoke(io.clj:262)
at clojure.java.io$fn__8577$G__8542__8584.invoke(io.clj:73)
at clojure.java.io$fn__8612.invoke(io.clj:169)
at clojure.java.io$fn__8551$G__8546__8558.invoke(io.clj:73)
at clojure.java.io$reader.doInvoke(io.clj:106)
at clojure.lang.RestFn.invoke(RestFn.java:410)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.RestFn.applyTo(RestFn.java:132)
at clojure.core$apply.invoke(core.clj:619)
at clojure.core$slurp.doInvoke(core.clj:6278)
at clojure.lang.RestFn.invoke(RestFn.java:410)
at reconcile_csv.core$main$fn__2676.invoke(core.clj:238)
at clojure.lang.Atom.swap(Atom.java:51)
at clojure.core$swap_BANG_.invoke(core.clj:2161)
at reconcile_csv.core$_main.invoke(core.clj:238)
at clojure.lang.AFn.applyToHelper(AFn.java:167)
at clojure.lang.AFn.applyTo(AFn.java:151)
at reconcile_csv.core.main(Unknown Source)

@mihi-tr
Contributor

mihi-tr commented Aug 12, 2015

It could be your folder names with a ton of spaces in them; I'm not sure it handles those well. Could you move your files to the same folder? Or type cd "THEFOLDERWITHYOURFILE" and just run:
java -jar "P:_Shared files exc DSP\PROJECTS\Lobbying\Data\reconcile-csv-0.1.2.jar" thefile.csv name personID
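Spaces in paths only cause trouble when a command line has to be split into arguments. One way to sidestep quoting entirely is to pass each argument separately, for example with Python's subprocess module, and to start the program in the folder that holds the CSV so that a bare file name works (the equivalent of the cd suggestion above). The folder and file names below are hypothetical placeholders, and java must be on the PATH.

```python
import subprocess
from pathlib import Path


def reconcile_csv_command(jar: Path, csv_name: str, search_col: str, id_col: str) -> list[str]:
    """One list element per argument, so spaces inside paths need no quoting."""
    return ["java", "-jar", str(jar), csv_name, search_col, id_col]


def start_service(data_dir: Path, jar: Path, csv_name: str, search_col: str, id_col: str):
    """Launch reconcile-csv with data_dir as the working directory (like `cd` first)."""
    cmd = reconcile_csv_command(jar, csv_name, search_col, id_col)
    # Popen returns immediately; the service keeps running until the process is stopped.
    return subprocess.Popen(cmd, cwd=data_dir)


# Hypothetical usage:
# data = Path(r"P:\My Shared Files\Data")
# proc = start_service(data, data / "reconcile-csv-0.1.2.jar", "thefile.csv", "Name", "PersonID")
```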

@sjg-transparency

Morning Michael,

Sorry for the very late reply. In the end I had to do this the long hard way, but I’ll try your solution the next time I need to reconcile some data.

Thanks for your time
Steve


@tfmorris

tfmorris commented Oct 1, 2015

@abubelinha

I had similar issues, and in the end it was a problem of a malformed CSV.
Solution: I loaded my text file in OpenRefine and used "export as CSV", as described here:
https://groups.google.com/forum/#!msg/openrefine/jgzicV-Bj9g/HhHu37x1AwAJ

After that, I could use it as an OpenRefine reconciliation service with reconcile-csv like this:

C:\currentpath> java -Xmx1g -jar C:\path-to-executable\reconcile-csv-0.1.2.jar C:\path-to-csv\myfile.csv "name" "id"
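The -Xmx1g flag in that command is a standard JVM option that raises the maximum heap size to 1 GB, which can help with larger files. If you would rather not round-trip through OpenRefine, a similar clean-up can be sketched in Python: read a delimited text file and write it back as a standard comma-separated CSV, quoting fields only where needed. This assumes the source is tab-separated with one header row (change delimiter otherwise); it is an alternative to the export described above, not what OpenRefine does internally.

```python
import csv
import io


def tsv_to_csv(src, dst, delimiter="\t"):
    """Copy rows from a delimited text file to a standard comma-separated CSV."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    count = 0
    for row in reader:
        # Trim stray whitespace around each value; blank cells stay blank.
        writer.writerow([cell.strip() for cell in row])
        count += 1
    return count


# Hypothetical in-memory example; for real files open both src and dst with
# newline="" and encoding="utf-8".
src = io.StringIO("name\tid\nDoe, Jane\t1\n")
dst = io.StringIO()
tsv_to_csv(src, dst)
print(dst.getvalue())
```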
