Use cURL from your Pwnbox (not the target machine) to obtain the source code of the "" website and filter all unique paths of that domain. Submit the number of these paths as the answer

Bearing in mind it was a while ago and my memory sucks, lol

For the query string, in this case no: items with different query strings are counted as different paths. (I guess it depends on what your end game is, as index.php?page=home and index.php?page=about aren't the same, even though they run from the same backend script.) Although the word "path" does make me think of a directory… anyhow…

I don’t believe there were any anchor/fragment links in the output, so there was no need to filter on that. But yes, you are right, usually you should stop at the # as well.
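If you did want to count items with different query strings or fragments as one path, a small sketch of that idea (the sample URLs below are made up for illustration; only `sed` and `sort` from the standard toolset are used):

```shell
# Strip everything from the first '?' or '#' onward, then deduplicate,
# so index.php?page=home and index.php?page=about collapse to one entry.
printf '%s\n' \
  'https://example.com/index.php?page=home' \
  'https://example.com/index.php?page=about' \
  'https://example.com/docs#intro' \
| sed 's/[?#].*//' | sort -u
```

This prints two lines (`https://example.com/docs` and `https://example.com/index.php`) instead of three.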

I tried to use this command but it gives me this error : “Out-File: Access to the path ‘/root/htb.txt’ is denied.”

For some reason the original command didn’t work but I used similar techniques to get the right answer.
curl | tr " " "\n" | grep "" | tr "'" '"' | cut -d'"' -f2 | sort -u | wc -l

This gives the answer 35, which includes an extra entry at the beginning that isn't a link, so minus 1 is 34, which was the answer.

This question kept me up for days trying to find the right solution on my own, but I couldn't come up with anything. Instead of wasting more time, I tried the commands given here on the HTB forum, and eventually one of them worked. So I'll jot down some of the commands that might help, even though not all of them produce the correct outcome:

curl | grep -o 'href="[^"]*' | sort | wc -l

For the above it has given the output 38, which in a way is better than getting 53 and 20.
|'^_^| yeah, I've got 53 and 20 too.

Another approach is:

curl | grep -Po "https?://[^'\"?#]*" | sort -u | wc -l

In this command, grep's -P flag enables Perl-compatible regular expressions (PCRE), a more powerful pattern syntax for searching through text, HTML files, etc. I've got the answer as '33', which is closer than the previous one.
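For a sense of what -P buys you over plain -E, a small sketch (requires GNU grep built with PCRE support, which Pwnbox has; the sample text is made up):

```shell
# PCRE supports shorthand classes like \d that ERE (-E) does not.
echo 'port=8080 host=web' | grep -oP '\d+'

# The same -oP idea applied to pulling a URL out of an HTML attribute.
echo '<a href="https://example.com/x">' | grep -oP 'https?://[^"]*'
```

The first command prints `8080`; the second prints `https://example.com/x`.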

I've tried other approaches too, but this one seems to do the job, as mentioned by you guys:

curl | grep -Po "[^'\"]*" | sort -u | wc -l

For this it shows the correct output in the terminal, and the submission is done.

If there is any better way to do this, please do inform.

This is a super difficult one.

The commands in here work, but it’s a shame you have to come here to cheat.

They should've taught more filtering methods, or at least redirected you to much more useful info that would help you complete this task more easily. This is one of the things that is quite annoying about HTB: they don't seem to nudge beginners in the right direction.

To Whom It May Concern:

It seems like the website cannot be accessed anymore. I did an nslookup, found the IP, and ran a quick nmap, which says the host seems to be down. That would explain why none of my curl commands go through.

As a backup I tried wget --spider --recursive
and I also tried wget --spider --recursive

Can someone confirm? I would really not want to lose a point.

It looks up and running to me; I can connect using wget --spider and can use all the curl commands in here.
If it's more about the point than the assignment, you can find the answer in this forum.

I want to go through the process myself as well and see the messages being returned. Any thoughts on why mine isn't connecting and times out?

Not quite sure; I'm quite new as well, only on like my 4th module at the moment. Hope someone more experienced can help you with this.
Maybe if it keeps going wrong you could take this to Discord; it seems to be a lot more active and a lot quicker for getting answers.


curl "" | tr " " "\n" | grep -oP 'https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)' | grep inlane | sort -u | wc -l

Note: the non-capturing group (?:…) is a PCRE feature, so this needs grep -P rather than -E.

tr " " "\n" → replaces all spaces with new lines
grep -o 'regex' → extracts all URLs (found this regex on the internet)
grep inlane → we want only the URLs of inlanefreight
sort -u → removes the duplicates
wc -l → counts the total number of output lines

Hope this helps…


curl "https://" | tr " " "\n" | cut -d "'" -f 2 | cut -d '"' -f 2 | grep "https://" | sort -u | wc -l
This command also outputs the correct answer. It can be broken down like this:

  1. curl "https://" will download the source code.
  2. tr " " "\n" will replace every space with a newline. This step makes sure all the unique paths end up on separate lines.
  3. cut -d "'" -f 2 | cut -d '"' -f 2 | grep "https://" will search for all the lines that contain the pattern "https://", using ' and " as delimiters and keeping the field after each delimiter. After this step, we already have all the paths of the domain. However, some of the paths are duplicated, so it would output the wrong answer if we just used wc -l.
  4. As a result we need to remove all the duplicated lines using sort -u. The result will then be correct, which is 34.
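The reason for the two cut passes in step 3 is that HTML attributes may be quoted with either single or double quotes, and cut passes a line through unchanged when its delimiter is absent. A small sketch (the sample tags are made up):

```shell
# Single-quoted attribute: the first cut extracts the URL,
# the second cut is a no-op because there is no double quote.
echo "<a href='https://example.com/one'>" | cut -d "'" -f2 | cut -d '"' -f2

# Double-quoted attribute: the first cut is a no-op,
# the second cut extracts the URL.
echo '<a href="https://example.com/two">' | cut -d "'" -f2 | cut -d '"' -f2
```

The first pipeline prints `https://example.com/one`, the second `https://example.com/two`.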

Sorry, as a new user I'm not allowed to put more than two links in my reply, so I had to put a space in the domain links.

Personally I think the key here is to read the source code and work out the pattern of how the unique paths are written; then we can find the solution. Many thanks to everyone who posted a solution before me, as you all helped me understand the logic of solving the problem. My reply is just trying to explain in more detail.

I also have a question here. Can anyone tell me the difference between running sort -u and sort | uniq?
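For reference, a small demo of the difference (made-up input): for plain deduplication they give identical output, but uniq only collapses *adjacent* duplicates, so it must follow sort; on the other hand uniq has options that sort -u lacks, such as -c for counting occurrences.

```shell
# Both produce the same deduplicated, sorted list: a, b
printf 'b\na\nb\na\n' | sort -u
printf 'b\na\nb\na\n' | sort | uniq

# uniq on unsorted input collapses nothing (no duplicates are adjacent)
printf 'b\na\nb\na\n' | uniq

# uniq -c counts occurrences of each line, which sort -u cannot do
printf 'b\na\nb\na\n' | sort | uniq -c
```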