Gowator, posted September 20, 2004 (edited)
OK, the challenge is... how to go about automatically finding similarities between pictures/scanned documents. Some sort of checksum or whatever, but I have, say, millions of documents and I know some are duplicated. The idea is to give each one a 'score' of some sort which describes it, and then find similar ones which could be the same or very close.
arctic, posted September 20, 2004
So... you are searching for a cataloging tool for your documents? Hard to come by. For pictures you could do something with kimdaba, I guess (never really used it), and for documents you need to add some keywords somehow. I always searched for a good text-cataloging tool but never found one. I will let you know once I find some answers.
Gowator, posted September 20, 2004
Not quite. Well, I am, but this is before that... Say I have loads of CD images. I can make an MD5 sum of each, and if two of them have the same MD5 I can be all but certain they are the same CD (the odds of a false match are 1 in a lot). What I want is a way to find pictures which are similar but not quite the same, for instance two scans of the same document where one is offset by, say, 1 cm, or one has a torn page whereas an earlier copy didn't.
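For reference, the exact-duplicate step described above might look roughly like the sketch below. It assumes the images are ordinary files on disk; the function names `md5_of_file` and `group_exact_duplicates` are made up for illustration and only Python's standard `hashlib` is used.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(paths):
    """Map each digest shared by two or more files to the list of those files."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of_file(path)].append(Path(path))
    # Keep only digests that occur more than once: those are the duplicates.
    return {digest: files for digest, files in groups.items() if len(files) > 1}
```

This only catches byte-for-byte copies. A scan that is offset by 1 cm produces a completely different digest, which is why a separate similarity score is needed for the near-duplicates.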
arctic, posted September 20, 2004
Uuh... jesus... what you are asking for here is very "intelligent" software. I can tell you that even at our newspaper, the image databases we use don't have anything as powerful as what you are asking for. I think you gotta code that yourself, mate. :P
sellis, posted September 20, 2004
What may be better is not a direct comparison, but instead creating several simple "similarity measures", such as average color, or something more sophisticated such as the primary components of an FFT of the image. It would then be possible to create a graphical space where similar photos are closer together. It's been done for UK Parliament voting records by the guys at The Public Whip. The beauty of it is that you can do this without necessarily knowing what the axes actually mean. In an ideal world, you would put tiny thumbnails where the images were, have zoom in/out and pan capabilities on the applet, and then have right-click menus to delete, move or group images together. Of course, actually doing this is a big job!
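A minimal sketch of the "several simple measures" idea, under stated assumptions: each image is already loaded as a grayscale grid (a list of rows of 0-255 integers), the measure is the average brightness of each cell in a coarse 4x4 grid, and closeness is plain Euclidean distance. The names `block_means` and `distance` are hypothetical, and the FFT variant mentioned above is not shown.

```python
import math


def block_means(image, grid=4):
    """Average brightness of each cell in a grid x grid partition of the image.

    Assumes the image is at least `grid` pixels wide and tall, so no cell is empty.
    """
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(cell) / len(cell))
    return features


def distance(features_a, features_b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(features_a, features_b)))


# Toy 8x8 "scans": a dark vertical stripe, the same stripe shifted one pixel
# to the right, and an inverted page that looks nothing like the first two.
page = [[0 if 2 <= x < 4 else 255 for x in range(8)] for _ in range(8)]
shifted = [[0 if 3 <= x < 5 else 255 for x in range(8)] for _ in range(8)]
inverted = [[255 - value for value in row] for row in page]

near = distance(block_means(page), block_means(shifted))
far = distance(block_means(page), block_means(inverted))
print(near < far)  # the shifted copy sits closer to the original
```

Each image becomes a point in a 16-dimensional space, so near-duplicates can be found by looking for points that sit close together. A coarser grid tolerates larger offsets but also blurs real differences, so the grid size would need tuning on real scans.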
papaschtroumpf, posted September 20, 2004
If those pictures are mostly scans of typed text documents, you might want to look for OCR (Optical Character Recognition) software to turn them into text (it is generally designed to work with a scanner, but it might still work with existing pictures) and compare that. Then again, this probably wouldn't work if you have, say, thousands of customer agreements where only the customer name and date change, because the documents would still be 98% identical. Sounds like a fun challenge. Let us know if you find a solution, even a partial one.
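The comparison half of that idea can be sketched with Python's standard `difflib`, assuming the OCR step has already produced plain text for each scan (the OCR itself is not shown). The helper `text_similarity` and the sample agreement wording are invented for illustration.

```python
from difflib import SequenceMatcher


def text_similarity(text_a, text_b):
    """Return a 0..1 similarity ratio between two texts, compared word by word."""
    # Lower-casing and splitting on whitespace hides trivial OCR layout noise.
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


# Hypothetical OCR output for two customer agreements built from one template.
template = (
    "This agreement is made between Acme Ltd and {name} on {date} "
    "for the supply of widgets under the standard terms"
)
agreement_a = template.format(name="Smith", date="2004-09-20")
agreement_b = template.format(name="Jones", date="2004-09-21")

score = text_similarity(agreement_a, agreement_b)
print(score > 0.8)  # different customers, yet the texts score as near-identical
```

The example also shows the caveat raised above: two genuinely different agreements score very high because only the name and date differ, so a threshold on this ratio alone cannot tell a rescan from a new document based on the same template.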