Reaching Decentralised Search Engine
------------------------------------
# just remarks



YaCy
----
- 17+ years old, but still somewhat immature & hardware-inefficient
  - sadly, still the best P2P search engine we have
- P2P: distributed index (DHT) + distributed search
- own crawler
- written in Java -> poor performance
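To illustrate what "distributed index (DHT)" means: in a word-partitioned DHT, each word of the inverted index is owned by the peer(s) that sit next to the word's hash on a hash ring, so a search for that word only has to ask those few peers. The sketch below is a generic consistent-hashing illustration, not YaCy's actual algorithm (YaCy has its own word hashes and partitioning scheme); the peer names, the SHA-1 ring and the `redundancy` parameter are hypothetical choices for the example.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on a 64-bit hash ring (first 8 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class WordDHT:
    """Toy word-partitioned DHT: each word's postings live on `redundancy` peers."""

    def __init__(self, peers, redundancy=2):
        peers = list(peers)
        self.redundancy = min(redundancy, len(peers))
        # Peers sorted by their position on the ring
        self.ring = sorted((ring_position(p), p) for p in peers)
        self.positions = [pos for pos, _ in self.ring]
        # Each peer holds only its slice of the inverted index: word -> set of URLs
        self.index = {p: {} for p in peers}

    def owners(self, word):
        """Peers responsible for a word: the first `redundancy` peers clockwise of its hash."""
        start = bisect.bisect_left(self.positions, ring_position(word))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.redundancy)]

    def add_document(self, url, text):
        """'DHT out': send each word's posting to the peers that own the word."""
        for word in set(text.lower().split()):
            for peer in self.owners(word):
                self.index[peer].setdefault(word, set()).add(url)

    def search(self, word):
        """Distributed search: ask only the owning peers and merge their answers."""
        word = word.lower()
        hits = set()
        for peer in self.owners(word):
            hits |= self.index[peer].get(word, set())
        return hits


if __name__ == "__main__":
    dht = WordDHT(["peer-a", "peer-b", "peer-c", "peer-d"], redundancy=2)
    dht.add_document("https://example.org/1", "decentralised search engine")
    dht.add_document("https://example.org/2", "search in books")
    print(sorted(dht.search("search")))  # ['https://example.org/1', 'https://example.org/2']
```

Redundancy above 1 is what keeps a word findable when one of its peers goes offline; a real P2P engine also has to cope with peers joining and leaving, which this toy ignores.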

-> [homepage] https://yacy.net
-> [documentation-standalone] https://eldar.cz/yacydoc/
-> [github] https://github.com/yacy/yacy_search_server
-> [okyacy fork] https://github.com/okybaca/yacy_search_server
-> [forum] https://community.searchlab.eu/
-> [latest info] https://eldar.cz/news/source/@comp/@yacy/

-- the author is concentrating on another project, a cloud variant without
   P2P features -> lack of support

- P2P remote search is unusable for me, so I disable DHT-in and enable DHT-out
  as a broadcast & backup of the crawled content
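The DHT-in/DHT-out switches can also be flipped in YaCy's config file instead of the admin web interface. A minimal sketch, assuming the settings live in `DATA/SETTINGS/yacy.conf` as plain `key=value` lines and that the keys are `allowDistributeIndex` (DHT out) and `allowReceiveIndex` (DHT in), as I remember them from YaCy's `defaults/yacy.init`; both the path and the key names are assumptions to verify there first, and YaCy should be stopped while the file is edited.

```python
from pathlib import Path

# ASSUMPTION: key names as found in YaCy's defaults/yacy.init; verify before use.
DHT_SETTINGS = {
    "allowDistributeIndex": "true",  # DHT out: push our index to other peers
    "allowReceiveIndex": "false",    # DHT in: refuse index entries from other peers
}


def set_properties(text: str, settings: dict) -> str:
    """Set key=value pairs in properties-style text; append keys that are missing.

    Simplified: handles plain key=value lines and '#' comments only.
    """
    lines = text.splitlines()
    found = set()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key in settings:
            lines[i] = f"{key}={settings[key]}"
            found.add(key)
    lines.extend(f"{k}={v}" for k, v in settings.items() if k not in found)
    return "\n".join(lines) + "\n"


def configure_dht(conf_path: str = "DATA/SETTINGS/yacy.conf") -> None:
    """Rewrite the (assumed) YaCy config in place; run only while YaCy is stopped."""
    path = Path(conf_path)
    text = path.read_text(encoding="utf-8")
    path.write_text(set_properties(text, DHT_SETTINGS), encoding="utf-8")
```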

- needs a lot of RAM
- had to buy the maximum RAM the server was capable of holding

- to speed it up, use an external Solr
  - Solr needs more RAM, otherwise it crashes regularly;
    on FreeBSD, edit /usr/local/etc/solr.in.sh:
    SOLR_HEAP="2048m"
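To notice when the external Solr has crashed again (e.g. from a cron job), a quick liveness check is to ask it for the document count. A minimal sketch using Solr's standard `/select` handler with `rows=0` and a JSON response; the port (Solr's default 8983) and the core name `collection1` (where YaCy is assumed to keep its documents) are assumptions to adjust for your setup.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTIONS: Solr on its default port 8983, YaCy's documents in core "collection1".
SOLR_URL = "http://localhost:8983/solr/collection1"


def select_url(base: str, query: str = "*:*") -> str:
    """Build a Solr /select URL that asks only for the hit count (rows=0)."""
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base}/select?{params}"


def parse_num_found(body: str) -> int:
    """Extract response.numFound from a Solr JSON response body."""
    return int(json.loads(body)["response"]["numFound"])


def document_count(base: str = SOLR_URL, timeout: float = 10.0) -> int:
    """Return the number of documents in the core; raises if Solr is unreachable."""
    with urllib.request.urlopen(select_url(base), timeout=timeout) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

Calling `document_count()` and alerting on an exception (or on a count that suddenly drops) is enough to catch the regular crashes mentioned above.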


--> [article, Czech] Decentralizovaný vyhledávač YaCy: indexujte a vyhledávejte si po svém
    ("The decentralised search engine YaCy: index and search your own way")

--> well... search - full-text media search, available from time to time

bugs
----
- bad article date detection
  - https://github.com/Webhose/article-date-extractor/ could be a solution?
  - or experiment with the dates_in_content_dts field?
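As a starting point for such experiments: one common heuristic for article dates is to trust explicit publication metadata (Open Graph `article:published_time`, `<meta name="date">`, schema.org `datePublished`, `<time datetime>`) before guessing from dates in the body text. The sketch below uses only the standard library; it is not the code of article-date-extractor nor what YaCy does, and the list of meta keys is an assumption that covers common cases only.

```python
from datetime import datetime
from html.parser import HTMLParser

# Meta keys that commonly carry a publication date (heuristic, not exhaustive)
META_KEYS = {"article:published_time", "date", "pubdate", "dc.date",
             "dcterms.created", "datepublished"}


class DateMetaParser(HTMLParser):
    """Collect candidate publication dates from <meta> and <time> tags."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta":
            key = (a.get("property") or a.get("name") or a.get("itemprop") or "").lower()
            if key in META_KEYS:
                self.candidates.append(a.get("content", ""))
        elif tag == "time" and a.get("datetime"):
            self.candidates.append(a["datetime"])


def extract_published_date(html: str):
    """Return the first candidate that parses as ISO 8601, else None."""
    parser = DateMetaParser()
    parser.feed(html)
    for value in parser.candidates:
        try:
            # fromisoformat() before Python 3.11 rejects a trailing "Z"
            return datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
        except ValueError:
            continue
    return None
```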



Xapian + Recoll
---------------
- perfect for a local library of e.g. books
- searching in books is sometimes even better than searching the web - books
  usually don't contain spam ;-)
- optional web interface as a standalone Python script or an Apache module
- able to OCR using Tesseract:
  https://www.lesbonscomptes.com/recoll/usermanual/webhelp/docs/RCL.INSTALL.CONFIG.RECOLLCONF.OCR.html



SearX
-----
- metasearch
- mature & usable
- superseded by SearXNG
- lots of plugins for various search engines (by their nature the plugins
  change often; if a particular one doesn't work, check GitHub for the latest
  version)
- if you combine DuckDuckGo, Startpage, Bing & Yahoo (and/or local ones such as
  Seznam or Yandex), you don't need Google any more
- privacy - SearX acts as an interface between you and the search engines; on
  the other hand, it sends all your requests to all of them
  - trade-off: public instance = huge load // private instance = less privacy
    (all queries come from your server's IP)
- able to bring Recoll and YaCy together in a single search page
  -> a good interface to connect 'em all
- lots of dependencies; in fact it scrapes the search engines' result pages
  much like a browser would
- lots of public instances
- relatively fast (depends on the number of search engines enabled)
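To illustrate the metasearch idea: the query goes to several engines, and their result lists are merged, deduplicated by URL and re-ranked, so a link that several engines place near the top wins. The sketch below scores each occurrence at 1-based position p as weight / p; it is similar in spirit to, but simpler than, SearX's own ranking, and the engine names, URLs and weights are hypothetical example data.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Crude dedup key: drop scheme and fragment, lower-case host, strip trailing '/'."""
    parts = urlsplit(url)
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}{query}"


def merge_results(results_by_engine, weights=None):
    """Merge ranked URL lists from several engines into one ranked list.

    Returns (url, engines that returned it, score) tuples, best first.
    """
    weights = weights or {}
    scores = defaultdict(float)
    first_url = {}
    engines = defaultdict(set)
    for engine, urls in results_by_engine.items():
        w = weights.get(engine, 1.0)
        for position, url in enumerate(urls, start=1):
            key = normalise(url)
            first_url.setdefault(key, url)  # keep the first spelling we saw
            scores[key] += w / position
            engines[key].add(engine)
    ranked = sorted(scores, key=lambda k: scores[k], reverse=True)
    return [(first_url[k], sorted(engines[k]), round(scores[k], 3)) for k in ranked]
```

Per-engine weights are the natural knob for a private instance, e.g. to trust a local YaCy or Recoll more than a commercial engine.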




EOF

Comments requested
