Reaching Decentralised Search Engine
------------------------------------

# just remarks

YaCy
----

- 17+ years old, but still somewhat immature and hardware-inefficient
- sadly enough, the best P2P search engine we have
- P2P: distributed index (DHT) + distributed search
- own crawler
- written in Java -> poor performance
-> [homepage] https://yacy.net
-> [documentation-standalone] https://eldar.cz/yacydoc/
-> [github] https://github.com/yacy/yacy_search_server
-> [okyacy fork] https://github.com/okybaca/yacy_search_server
-> [forum] https://community.searchlab.eu/
-> [latest info] https://eldar.cz/news/source/@comp/@yacy/
- the author is focused on another project, a cloud variant without P2P features -> lack of support
- P2P remote search is unusable for me
- disable DHT in, enable DHT out (as a broadcast & backup of the crawled content)
- needs a lot of RAM
- I had to buy the maximum RAM the server could hold
- to speed it up, use an external Solr
- Solr needs some more RAM, otherwise it crashes regularly
- FreeBSD: edit /usr/local/etc/solr.in.sh and set SOLR_HEAP="2048m"
--> [article, czech] Decentralizovaný vyhledávač YaCy: indexujte a vyhledávejte si po svém ("Decentralised search engine YaCy: index and search your own way")
--> well... search: full-text media search, available from time to time

bugs
----

- bad article date detection
- https://github.com/Webhose/article-date-extractor/ could be a solution?
- or experiment with dates_in_content_dts?

xapian+recoll
-------------

- perfect for a local library of e.g. books
- searching in books is sometimes even better than searching the web
- books usually don't contain spam ;-)
- optional web interface, as a standalone Python script or an Apache module
- able to OCR using Tesseract: https://www.lesbonscomptes.com/recoll/usermanual/webhelp/docs/RCL.INSTALL.CONFIG.RECOLLCONF.OCR.html

SearX
-----

- metasearch
- mature & usable
- superseded by SearXNG
- lots of engine modules (plugins) for various search engines (by nature, they change often; if a particular one doesn't work, check GitHub for the latest version)
- if you combine DuckDuckGo, Startpage, Bing & Yahoo (and/or local ones such as Seznam or Yandex), you don't need Google any more
- privacy: an interface between you and the search engines
- on the other hand, it sends all your requests to all of them
- a trade-off: public instance = huge load // private instance = less privacy
- able to bring together Recoll and YaCy in a single search page -> a good interface to connect 'em all
- lots of dependencies; in effect it acts like a browser, pulling results out of the search engines
- lots of public instances
- relatively fast (depends on the number of search engines enabled)
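SearX's core job is merging the ranked result lists of several engines into one page. A minimal Python sketch of that idea, assuming each engine simply returns an ordered list of URLs. The URL normalisation and the scoring rule (number of engines that found a page divided by its position, summed) are simplifying assumptions for illustration, not SearX's actual code, and the helper names `normalize_url` and `merge_results` are hypothetical.

```python
"""Toy metasearch merger: a simplified sketch, not SearX's actual code."""
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Crude key so the same page returned by two engines is merged.

    Assumption: scheme, case of the host, a leading 'www.' and a
    trailing slash do not matter; the fragment is dropped.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def merge_results(results_by_engine: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Merge per-engine ranked URL lists into one ranked list.

    Assumed scoring: for each page, (number of engines that returned it)
    divided by its 1-based position, summed over those engines. Pages
    ranked high by many engines therefore come first.
    Returns (first URL seen for the page, score), best first.
    """
    positions: dict[str, list[int]] = defaultdict(list)
    first_url: dict[str, str] = {}
    for urls in results_by_engine.values():
        seen_here: set[str] = set()
        for pos, url in enumerate(urls, start=1):
            key = normalize_url(url)
            if key in seen_here:  # one engine listed the same page twice
                continue
            seen_here.add(key)
            positions[key].append(pos)
            first_url.setdefault(key, url)

    scored = []
    for key, pos_list in positions.items():
        engines_found = len(pos_list)  # one entry per engine
        score = sum(engines_found / p for p in pos_list)
        scored.append((first_url[key], score))
    # Highest score first; ties broken by URL for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    results = {
        "duckduckgo": ["https://yacy.net/", "https://example.org/a"],
        "bing": ["https://www.yacy.net", "https://example.org/b"],
        "startpage": ["https://example.org/b", "https://yacy.net"],
    }
    for url, score in merge_results(results):
        print(f"{score:.2f}  {url}")
    # 7.50  https://yacy.net/
    # 3.00  https://example.org/b
    # 0.50  https://example.org/a
```

A page that three engines put near the top beats one that a single engine ranked second, which is the intuition behind combining several engines instead of relying on one. The price is the one noted above: every query goes to every enabled engine.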