I wrote a program that sends a large number of requests to a site.
At the moment the whole thing runs as PHP (curl_multi) on a shared web host, and I'm currently testing whether it's faster in Python.
A shared web host is also quite annoying when modules are missing and you have no root access or the like; on top of that, mine doesn't support Node.js.
I thought about using Node.js to increase the speed even further (though that's mostly out of curiosity).
I could run the whole thing on my laptop, or alternatively I've thought about buying a Raspberry Pi.
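For the curl_multi-vs-Python comparison mentioned above, a minimal sketch of the equivalent pattern in Python (many requests in flight at once via a thread pool). The URLs are hypothetical, and the actual HTTP call is replaced by a sleep so the concurrency effect is visible without network access; in a real script `fetch` would call something like `urllib.request.urlopen` or a third-party HTTP client.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a real HTTP request (e.g. urllib.request.urlopen);
    # sleeps to simulate ~100 ms of network latency.
    time.sleep(0.1)
    return url, 200

# Hypothetical target URLs.
urls = [f"https://example.com/page{i}" for i in range(40)]

start = time.monotonic()
# Roughly what curl_multi gives you in PHP: many requests running
# concurrently instead of one after the other.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.monotonic() - start

# 40 requests * 0.1 s would take ~4 s sequentially; with 20 workers
# it finishes in roughly 0.2 s.
print(f"{len(results)} requests in {elapsed:.2f}s")
```

The same idea scales to whatever worker count the host and the target site tolerate; the speedup comes from overlapping network waits, not from raw CPU.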
Problems:
The laptop would have to stay on all the time because the requests run continuously (at certain times of day), but I don't want it running around the clock.
My internet connection here is okay, but there are occasional outages, and then nothing would run.
With my current host I pay 2 euros per month with no PHP limits; Python and Perl are only available via CGI.
I also run a website / web app, which I would leave with the host either way.
Another alternative would be a V-Server with hourly billing (at most 2 euros per month). But that would be only for my scripts, not for my websites, because I don't really know much about server administration (and the 7-euro-per-month managed servers aren't worth it to me).
Free is always difficult. Theoretically, something like this could run on an EC2 micro instance or an EC2 Spot instance, though with the latter there's no guarantee, since availability and price have to line up.
Are you fetching a lot of data very quickly from a single website? Be careful not to get reported, because that can be interpreted as an attack.
In addition, you may not be allowed to use the data obtained this way, let alone publish it.
As for the crawling itself: PHP is thoroughly unsuitable for this! With Python, 10,000 requests per second are easily achievable, but as I said, NOT against a single server, rather distributed over thousands of them.
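The kind of request rate mentioned above comes from asynchronous I/O, not from faster code per request. A minimal sketch of the pattern with Python's asyncio, assuming hypothetical target hosts: the real HTTP call (e.g. `session.get(url)` with the third-party aiohttp library) is replaced by a sleep, and a semaphore caps how many requests are in flight at once.

```python
import asyncio

async def fetch(url, sem):
    # Stand-in for a real async HTTP request; sleeps to simulate
    # network latency. The semaphore limits concurrent requests.
    async with sem:
        await asyncio.sleep(0.05)
        return url, 200

async def main():
    # Hypothetical targets spread over many hosts, matching the point
    # above: the load is distributed, not aimed at one server.
    urls = [f"https://host{i % 100}.example.com/item{i}" for i in range(2000)]
    sem = asyncio.Semaphore(500)  # cap on in-flight requests
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

results = asyncio.run(main())
print(len(results), "requests completed")
```

With 2,000 simulated requests of 50 ms each and 500 allowed in flight, the whole batch completes in a fraction of a second; the limiting factors in practice are bandwidth, target-server limits, and politeness.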
The requests go to a huge provider that automatically distributes them across many servers. Publication happens according to the fair-use terms set by the rights holder. The requested URLs are so simple that no data is disseminated that wasn't intended for this purpose.