Prevent Web Scraping

6/25/2021 12:48:42 PM
PHPRunner General questions
wypman author

I have developed a website that allows users to register with an email address; access is only granted once the email address has been validated. This part works fine.

I have discovered that a user has been extracting data, I presume with some kind of script or web scraping program that increments the ID in the URL. Is there a way of preventing this? I thought about limiting the number of views within a certain time period, but I'm not sure how to do it, or whether it is the best way.

Does anyone know how to prevent this?

Dalkeith 6/25/2021

First off, congratulations on becoming successful enough to attract a web scraper...

https://blog.hartleybrody.com/prevent-scrapers/

Changing the URL is probably quite easy, but rate limiting by IP address also sounds good. That said, a lot of people have semi-dynamic IPs now, and I don't know whether this is an internal or external system, so it might end up restricting too many people.

wypman author 6/25/2021

I was thinking along the lines of limiting the number of views per day, i.e. implementing a fair use policy. The user doing the scraping is using three accounts and three IP addresses, and they have to be logged in to view the data. If I limit the number of views and set up an alert when the limit is reached, the question is how?

david22585 6/26/2021

I'm not sure whether you can access the CAPTCHA from the PHPRunner events, but if you can, I would log the timestamp of each user's last view on the View page. Then set up a counter that calculates the time since the last view, and if it's less than, say, 2 minutes, add 1 to the count. When the count gets to 3, make the page forward to an add/edit page where they must enter a CAPTCHA to reset the counter. Or keep the counter going, but reset it to 1 if more than, say, 5 minutes have passed since the last record was accessed.
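The counter logic described above could look roughly like the sketch below. It is written in Python only to show the logic; in PHPRunner the same steps would live in an event on the View page, with the state stored per user in a table or session. The names `ViewState`, `record_view` and `captcha_passed` are made up for this illustration, and the 2 minute, 5 minute and 3 view values are simply the example figures from the post.

```python
from dataclasses import dataclass
from typing import Optional

FAST_VIEW_SECONDS = 120    # views closer together than this count as "fast"
RESET_AFTER_SECONDS = 300  # a gap longer than this resets the counter to 1
CAPTCHA_THRESHOLD = 3      # counter value that triggers the CAPTCHA page


@dataclass
class ViewState:
    """Per-user state (hypothetical; would be a table row or session data)."""
    last_view: Optional[float] = None  # timestamp of the previous view
    count: int = 0                     # current counter value


def record_view(state: ViewState, now: float) -> bool:
    """Update the counter for one view; return True if a CAPTCHA is due."""
    if state.last_view is None:
        state.count = 1                # first view ever
    else:
        gap = now - state.last_view
        if gap > RESET_AFTER_SECONDS:
            state.count = 1            # long pause: start counting again
        elif gap < FAST_VIEW_SECONDS:
            state.count += 1           # another fast view
        # gaps between the two limits leave the counter unchanged
    state.last_view = now
    return state.count >= CAPTCHA_THRESHOLD


def captcha_passed(state: ViewState) -> None:
    """Reset the counter once the user has solved the CAPTCHA."""
    state.count = 0


state = ViewState()
for t in (0, 30, 60):                  # three views within one minute
    needs_captcha = record_view(state, t)
print(needs_captcha)
```

When `record_view` returns True, the page would redirect to the CAPTCHA form instead of showing the record, and `captcha_passed` would be called once the form is submitted correctly.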

AlphaBase 6/26/2021

by incrementing the ID in the URL. Is there a way of preventing this?

So they are seeing records they are not supposed to? That sounds like it might be a web security issue. You can restrict access to your data or pages by individual user or by group. Maybe that would be the way to go?

admin 6/28/2021

As a first step, make sure that users are not able to see data they are not supposed to see.

If they are allowed to see all pages and you just want to make things more difficult for them - there are some things that can be done, but not much really. If you are talking about the View page, you can change the URL so it is no longer sequential. You can use a GUID-like field as a primary key, or you can use the technique that the Invoice template uses, where the View page URL carries a special parameter named hash.

... invoices_view.php?hash=b460b1ca54ece8479469225d5b89b6dd
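As a rough illustration of the non-sequential URL idea, the sketch below gives each record a random 32-character hex token and looks records up by that token instead of by a numeric ID. This is not how the Invoice template computes its hash (that is not shown here); it is just one way to get an unguessable value. The names `add_record`, `find_record` and the in-memory `records_by_token` dictionary are made up for the example; in a real project the token would be a column in the table. Python is used only to show the logic.

```python
import secrets

# Hypothetical in-memory stand-in for a table with a "hash" column.
records_by_token = {}


def new_record_token() -> str:
    """Return a random 32-character hex string (128 bits of randomness)."""
    return secrets.token_hex(16)


def add_record(data: dict) -> str:
    """Store a record under a fresh random token and return the token."""
    token = new_record_token()
    records_by_token[token] = data
    return token


def find_record(token: str):
    """Look a record up by its token; unknown tokens return None."""
    return records_by_token.get(token)


token = add_record({"invoice_no": 1001})
print("view URL parameter: hash=" + token)
print(find_record(token))
```

Because the tokens are random, adding 1 to a value seen in the URL does not lead to the next record. A logged-in user who is given the links can still follow them, so this only stops ID guessing.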

This will work to a certain extent, but limiting the number of pages users can see over a period of time is a better option. Also make sure you explain the website rules in the terms and conditions of your service.
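A per-account daily limit with an alert could be sketched as follows. Counting per logged-in account rather than per IP address fits this case, since the data is only visible after login. The class name `DailyQuota`, the limit of 200 views and the `alerts` list are assumptions made for the example; in practice the counts would sit in a database table and the alert would be an email to the admin. Python is used only to show the logic.

```python
from collections import defaultdict
from datetime import date

DAILY_VIEW_LIMIT = 200  # hypothetical fair-use limit per account per day


class DailyQuota:
    """Counts record views per (account, day) and notes when the limit is hit."""

    def __init__(self, limit: int = DAILY_VIEW_LIMIT):
        self.limit = limit
        self.counts = defaultdict(int)  # (user_id, day) -> views so far
        self.alerts = []                # stand-in for emailing the admin

    def allow_view(self, user_id: str, today: date) -> bool:
        """Return True and count the view, or False once the limit is used up."""
        key = (user_id, today)
        if self.counts[key] >= self.limit:
            return False                # over the limit: block the page
        self.counts[key] += 1
        if self.counts[key] == self.limit:
            # Raise the alert once, at the moment the limit is reached.
            self.alerts.append(f"{user_id} reached {self.limit} views on {today}")
        return True


quota = DailyQuota(limit=3)
day = date(2021, 6, 28)
results = [quota.allow_view("user42", day) for _ in range(4)]
print(results)
print(quota.alerts)
```

The check would run before the View page is displayed; when it returns False, the page can show a fair-use message instead of the record.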