
Picture challenge


Gowator

OK the challenge is...

How do I go about automatically finding similarities between pictures/scanned documents?

Some sort of checksum or whatever, but I have, say, millions of documents and I know some are duplicated.

The idea is to give each one a 'score' of some sort that describes it, and then find similar ones, which could be the same or very close.

Edited by Gowator
Link to comment
Share on other sites

so... you are searching for a cataloging tool for your documents?... hard to come by. for pictures you could do something with kimdaba i guess (never really used it) and for documents you need to add some keywords somehow. i always searched for a good text-cataloging tool but never found one.

i will let you know, once i find some answers.

Link to comment
Share on other sites

Not quite. Well, I am, but this is before that...

 

Say I have loads of CD images. I can make an MD5 sum of each, and if two of them have the same MD5 I can be practically certain they are the same CD (the odds of a false match are 1 in a lot).
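For reference, the exact-duplicate case described above can be sketched in a few lines of Python. This is only an illustration: the function names and the demo file names are made up, and a real run would read the files from disk in chunks rather than hold them in memory.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def group_exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group names whose contents hash to the same MD5 digest.

    `files` maps a (hypothetical) file name to its raw bytes.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[md5_of_bytes(data)].append(name)
    # Keep only digests shared by two or more files.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    demo = {
        "cd_a.iso": b"same image bytes",
        "cd_b.iso": b"same image bytes",
        "cd_c.iso": b"same image bytes, one byte off",
    }
    print(group_exact_duplicates(demo))  # [['cd_a.iso', 'cd_b.iso']]
```

Note that a single changed byte gives a completely different digest, which is why this only finds identical copies and not near-duplicates.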

 

What I want is a way to find pictures which are similar but not quite the same. For instance, two scans of the same document where one is offset by, say, 1cm, or one has a torn page whereas an earlier copy didn't.

Link to comment
Share on other sites

uuh... jesus... what you are asking here is a very "intelligent" software. i can tell you that even at our newspaper, the image databases we use don't have anything as powerful as what you are asking for.

...

i think you gotta code that by yourself, mate. :P

Link to comment
Share on other sites

What may be better is not a direct comparison, but instead creating several simple "similarity measures", such as average color, or something more sophisticated such as the dominant components of an FFT of the image. Each image then becomes a point in a graphical space where similar photos sit closer together.

 

It's been done for UK Parliament voting records by the guys at The Public Whip. The beauty of it is that you can do this without necessarily knowing what the axes actually mean.

 

In an ideal world, you would put tiny thumbnails where the images were, and have zoom in/out and pan capabilities on the applet, and then have right-click menus to delete, move or group images together.

 

Of course, actually doing this is a big job!
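A very rough sketch of the "similarity measures" idea, just to show the shape of it. Everything here is an assumption for illustration: images are plain lists of grayscale rows, the measures are overall mean brightness plus the mean of each cell in a coarse grid (a crude stand-in for the low-frequency FFT components, not an actual FFT), and closeness is plain Euclidean distance. The function names are made up.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # grayscale pixels, rows of equal length


def features(img: Image, grid: int = 2) -> List[float]:
    """Mean brightness, then the mean of each cell in a grid x grid split.

    Assumes the image is at least `grid` pixels wide and tall.
    """
    h, w = len(img), len(img[0])
    feats = [sum(sum(row) for row in img) / (h * w)]
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell))
    return feats


def distance(a: Image, b: Image) -> float:
    """Euclidean distance between the feature vectors of two images."""
    fa, fb = features(a), features(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


if __name__ == "__main__":
    base = [[0, 0, 255, 255] for _ in range(4)]
    damaged = [row[:] for row in base]
    damaged[0][0] = 255                      # one "torn" pixel
    inverted = [[255 - p for p in row] for row in base]
    print(distance(base, damaged) < distance(base, inverted))
```

With millions of documents you would compute the feature vector once per image and store it, then only compare vectors, not pixels. Coarse measures like these tolerate small damage, but a large offset between scans would still need something smarter.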

Link to comment
Share on other sites

if those pictures are mostly scans of typed text documents, you might want to look for OCR (Optical Character Recognition) software to turn them into text (generally designed to work with a scanner but it might still work with existing pictures) and compare that.

Again, this probably wouldn't work if you have, say, thousands of customer agreements where only the customer name and date change, because the documents would still be 98% identical.
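To make the text-comparison step concrete, here is a small sketch using Python's standard `difflib`. It assumes the OCR has already been done by some other tool and you just have the text; the function names and the sample agreements are invented for the example.

```python
import difflib
from typing import List


def text_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 for two OCR'd texts."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def differing_words(a: str, b: str) -> List[str]:
    """Words that appear in only one of the two texts, sorted."""
    return sorted(set(a.split()) ^ set(b.split()))


if __name__ == "__main__":
    doc1 = ("Agreement between Acme Ltd and John Smith dated 2004-01-05. "
            "The customer agrees to the standard terms.")
    doc2 = ("Agreement between Acme Ltd and Mary Jones dated 2004-03-17. "
            "The customer agrees to the standard terms.")
    print(round(text_similarity(doc1, doc2), 2))
    print(differing_words(doc1, doc2))
```

The two sample agreements score as very similar even though they are different documents, which is exactly the problem mentioned above. Looking at which words differ (names, dates) is one possible way to tell a true duplicate from a template reused for another customer.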

Sounds like a fun challenge, let us know if you find a solution, even a partial one.

Link to comment
Share on other sites
