2018 Scrapy Environment Enhance(2)Proxy to Tor Network

Follow These Posts to Set Up the Proxy

https://gist.github.com/DusanMadar/8d11026b7ce0bce6a67f7dd87b999f6b

https://stackoverflow.com/questions/45009940/scrapy-with-privoxy-and-tor-how-to-renew-ip

Install Tor and Verify

> apt update

> apt install tor

Install Client Tool

> apt install netcat

Set Up Tor

> echo "ControlPort 9051" >> /etc/tor/torrc

> echo HashedControlPassword $(tor --hash-password "password" | tail -n 1) >> /etc/tor/torrc

Start Tor

> service tor start

Exception:

/etc/init.d/tor: line 140: ulimit: open files: cannot modify limit: Operation not permitted

Solution:

?

Verify the Control Port

> echo -e 'AUTHENTICATE "password"' | nc 127.0.0.1 9051

Check the Public IP

> apt install curl

My Local IP

> curl http://icanhazip.com/

xxx.xxx.244.5

My Proxy IP

> torify curl http://icanhazip.com/

185.56.80.242

Change the IP

> echo -e 'AUTHENTICATE "password"\r\nsignal NEWNYM\r\nQUIT' | nc 127.0.0.1 9051

> torify curl http://icanhazip.com/

51.15.86.162
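
The same control-port exchange can be done from Python without netcat. The following is a minimal sketch using only the standard library; the function names are made up for illustration, and it assumes Tor answers each accepted command with a line starting with status code 250 and closes the connection after QUIT.

```python
import socket


def build_newnym_commands(password):
    """Return the raw bytes for AUTHENTICATE, SIGNAL NEWNYM and QUIT."""
    # The password is sent as a quoted string, so escape backslashes and quotes.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    lines = ['AUTHENTICATE "%s"' % escaped, "SIGNAL NEWNYM", "QUIT"]
    return ("\r\n".join(lines) + "\r\n").encode("utf-8")


def all_replies_ok(reply_text):
    """True when every non-empty reply line starts with status code 250."""
    lines = [line for line in reply_text.splitlines() if line.strip()]
    return bool(lines) and all(line.startswith("250") for line in lines)


def renew_identity(password, host="127.0.0.1", port=9051, timeout=10.0):
    """Send the three commands to the control port and check the replies."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_newnym_commands(password))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the peer closed the connection (expected after QUIT)
                break
            chunks.append(data)
    return all_replies_ok(b"".join(chunks).decode("utf-8", "replace"))
```

Calling renew_identity("password") should return True when the password matches the one hashed into /etc/tor/torrc, and False when any command is rejected.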

Change and Check IP in Python

> pip install stem

> python

Python 3.5.2 (default, Nov 23 2017, 16:37:01)

[GCC 5.4.0 20160609] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>>

>>> from stem import Signal

>>> from stem.control import Controller

>>> with Controller.from_port(port=9051) as controller:

...     controller.authenticate(password='password')

...     controller.signal(Signal.NEWNYM)



>>> exit()

> torify curl http://icanhazip.com/

142.4.211.161

Install Privoxy and Check

> apt install privoxy

Configure to connect to Tor

> echo "forward-socks5t / 127.0.0.1:9050 ." >> /etc/privoxy/config

Start the Service

> service privoxy start

> curl -x 127.0.0.1:8118 http://icanhazip.com/

142.4.211.161
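
The curl check above can also be written with the Python standard library. This is a small sketch, not part of the original setup: normalize_proxy and fetch_via_proxy are hypothetical helper names, and the default address assumes Privoxy listens on 127.0.0.1:8118 as configured above.

```python
import urllib.request


def normalize_proxy(address, scheme="http"):
    """Add a scheme to a bare host:port proxy address if it has none."""
    if "://" in address:
        return address
    return "%s://%s" % (scheme, address)


def fetch_via_proxy(url, proxy="127.0.0.1:8118", timeout=30.0):
    """Fetch a URL through the HTTP proxy and return the stripped body text."""
    proxy_url = normalize_proxy(proxy)
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

With Tor and Privoxy running, fetch_via_proxy("http://icanhazip.com/") should return the same kind of exit address that the curl command prints.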

Check Everything in Python 3

> pip install requests

> python

Python 3.5.2 (default, Nov 23 2017, 16:37:01)

[GCC 5.4.0 20160609] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>>

>>> import requests

>>> from stem import Signal

>>> from stem.control import Controller

>>> response = requests.get('http://icanhazip.com/', proxies={'http': '127.0.0.1:8118'})

>>> response.text.strip()

'142.4.211.161'

>>> with Controller.from_port(port=9051) as controller:

...     controller.authenticate(password='password')

...     controller.signal(Signal.NEWNYM)



>>> response = requests.get('http://icanhazip.com/', proxies={'http': '127.0.0.1:8118'})

>>> response.text.strip()

'95.128.43.164'
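
A NEWNYM signal asks Tor for fresh circuits, but it does not guarantee a different exit address, and Tor rate-limits how often the signal takes effect. One way to cope is to retry until the reported address changes. The sketch below wraps the session above in that loop; rotate_until_changed, tor_ip and renew_tor are illustrative names, and the 10-second delay and 5 attempts are assumed values, not tuned ones.

```python
import time


def rotate_until_changed(get_ip, renew, max_attempts=5, delay=10.0, sleep=time.sleep):
    """Call renew() until get_ip() reports an address different from the first one."""
    old_ip = get_ip()
    for _ in range(max_attempts):
        renew()
        sleep(delay)  # give Tor time to build new circuits
        new_ip = get_ip()
        if new_ip != old_ip:
            return new_ip
    raise RuntimeError("exit IP did not change after %d attempts" % max_attempts)


def tor_ip():
    """Ask icanhazip.com for the public address as seen through Privoxy."""
    import requests  # third-party, installed with pip above
    response = requests.get(
        "http://icanhazip.com/",
        proxies={"http": "http://127.0.0.1:8118"},
        timeout=30,
    )
    return response.text.strip()


def renew_tor(password="password"):
    """Send NEWNYM through the control port, as in the session above."""
    from stem import Signal  # third-party, installed with pip above
    from stem.control import Controller
    with Controller.from_port(port=9051) as controller:
        controller.authenticate(password=password)
        controller.signal(Signal.NEWNYM)
```

Usage would be new_ip = rotate_until_changed(tor_ip, renew_tor). Passing the two steps in as callables keeps the retry loop independent of requests and stem, so it can be exercised without a running Tor.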

The same proxy also works inside the Scrapy framework, for example in a downloader middleware that renders pages with headless Chrome:

from scrapy.http import HtmlResponse
from selenium import webdriver


class ChromeHeadlessMiddleware(object):

    def process_request(self, request, spider):
        # Work around "access denied" pages that detect headless Chrome:
        # https://intoli.com/blog/making-chrome-headless-undetectable/
        options = webdriver.ChromeOptions()
        options.add_argument('headless')
        options.add_argument('no-sandbox')
        options.add_argument('window-size=800x600')
        options.add_argument('user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36')
        # Send browser traffic through Privoxy, which forwards it to Tor
        options.add_argument('--proxy-server=127.0.0.1:8118')
        # Older Selenium releases spell this keyword chrome_options
        browser = webdriver.Chrome(options=options)
        try:
            browser.switch_to.window(browser.window_handles[0])
            browser.get(request.url)
            body = browser.page_source
            return HtmlResponse(browser.current_url, body=body, encoding='utf-8', request=request)
        finally:
            # Quit the browser so each request does not leave a Chrome process behind
            browser.quit()
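
For spiders that do not need a browser, ordinary Scrapy requests can be sent through Privoxy by setting the proxy key in request.meta, which Scrapy's built-in HTTP proxy support reads. The sketch below is an assumption-laden example rather than the setup used above: TorProxyMiddleware, the myproject.middlewares path and the priority 350 are all hypothetical.

```python
# Privoxy address from the setup above
PRIVOXY_URL = "http://127.0.0.1:8118"


class TorProxyMiddleware:
    """Downloader middleware that routes requests through Privoxy and Tor."""

    def __init__(self, proxy_url=PRIVOXY_URL):
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Keep a proxy that was already chosen for this request, if any.
        request.meta.setdefault("proxy", self.proxy_url)
        # Returning None lets Scrapy continue processing the request normally.
        return None


# In settings.py (hypothetical project name and priority):
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.TorProxyMiddleware": 350,
}
```

Using setdefault means a spider can still override the proxy for a single request by setting meta['proxy'] itself.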

References:

http://sillycat.iteye.com/blog/2418229

http://neuralfoundry.com/scrapy-in-a-container-docker-development-environment/

https://github.com/dataisbeautiful/scrapy-development-docker

https://github.com/scrapy-plugins/scrapy-splash

https://www.jianshu.com/p/4052926bc12c

https://www.cnblogs.com/jclian91/p/8590617.html

IP Proxy Setting

https://free-proxy-list.net/

https://github.com/cnu/scrapy-random-useragent

https://github.com/aivarsk/scrapy-proxies

https://gist.github.com/seagatesoft/e7de4e3878035726731d

https://stackoverflow.com/questions/28852057/change-ip-address-dynamically

http://danielphil.github.io/raspberrypi/http/proxy/2015/04/01/raspberry-pi-http-proxy.html

https://docs.proxymesh.com/article/4-python-proxy-configuration

https://gist.github.com/DusanMadar/8d11026b7ce0bce6a67f7dd87b999f6b