How to write an intranet Windows file share crawler
From wiki.eser.org
- goal: automatically find all Windows file shares on a company intranet
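One possible first step toward this goal is to find hosts that accept connections on TCP port 445 (the port used for SMB directly over TCP) and only then pass those hosts to an SMB client for share listing. The sketch below uses only the Java standard library. The class name `SmbHostScanner`, the methods `hostsInCidr` and `hasSmbPort`, the example range `192.168.1.0/24`, and the 200 ms timeout are all illustrative assumptions, not taken from any tool listed on this page.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerates an IPv4 range and probes each host for SMB.
final class SmbHostScanner {

    /** Expands an IPv4 CIDR block such as "192.168.1.0/30" into host addresses. */
    static List<String> hostsInCidr(String cidr) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected a.b.c.d/prefix: " + cidr);
        }
        String[] octets = parts[0].split("\\.");
        int prefix = Integer.parseInt(parts[1]);
        if (octets.length != 4 || prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Invalid CIDR block: " + cidr);
        }
        long base = 0;
        for (String octet : octets) {
            int value = Integer.parseInt(octet);
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("Invalid octet in: " + cidr);
            }
            base = (base << 8) | value;
        }
        long size = 1L << (32 - prefix);
        long network = base & ~(size - 1);
        // For blocks with more than two addresses, skip network and broadcast.
        long first = size > 2 ? network + 1 : network;
        long last = size > 2 ? network + size - 2 : network + size - 1;
        List<String> hosts = new ArrayList<>();
        for (long a = first; a <= last; a++) {
            hosts.add(((a >> 24) & 255) + "." + ((a >> 16) & 255) + "."
                    + ((a >> 8) & 255) + "." + (a & 255));
        }
        return hosts;
    }

    /** Returns true if a TCP connection to port 445 succeeds within the timeout. */
    static boolean hasSmbPort(String host, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, 445), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // closed, filtered, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // Assumed example range; pass a real intranet range as the first argument.
        String cidr = args.length > 0 ? args[0] : "192.168.1.0/24";
        for (String host : hostsInCidr(cidr)) {
            if (hasSmbPort(host, 200)) {
                System.out.println("smb://" + host + "/");
            }
        }
    }
}
```

An open port only suggests that an SMB server is present. Listing the actual shares on each host still needs an SMB client library such as JCIFS, and a sequential scan like this one is slow on large ranges.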
SMB and CIFS clients
- JCIFS -- Open Source client library that implements the CIFS/SMB networking protocol in 100% Java.
implementations
- PunkSearch SMB/FTP indexing/searching engine with web interface -- written in Java using JCIFS and Lucene
- open-source
- ShareHound SMB/FTP indexer and crawler -- written in Java using JCIFS and Lucene
- open-source
TODO
- use the ESER algorithm to text-mine intranet documents