Use cURL from your Pwnbox (not the target machine) to obtain the source code of the "https://www.inlanefreight.com" website and filter all unique paths of that domain. Submit the number of these paths as the answer

Bearing in mind it was a while ago and my memory sucks, lol

For the query string, in this case no, as the items with different query strings are counted as different paths (I guess it depends on what your end game is, as index.php?page=home and index.php?page=about aren’t the same - even though they run from the same backend script) - although the word path does make me think of directory… anyhows…

I don’t believe there were any anchor/fragment links in the output, so there was no need to filter on that. But yes, you are right, usually you should stop at the # as well.
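If you ever do want to collapse those, a minimal sketch (assuming the URLs have already been extracted one per line into a file, here called urls.txt) would be to chop off everything from the first ? or # before deduplicating:

cat urls.txt | cut -d'?' -f1 | cut -d'#' -f1 | sort -u | wc -l

cut passes a line through untouched when the delimiter isn't present, so lines without a query string or fragment are unaffected.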

I tried to use this command but it gives me this error : “Out-File: Access to the path ‘/root/htb.txt’ is denied.”

For some reason the original command didn’t work but I used similar techniques to get the right answer.
curl https://www.inlanefreight.com | tr " " "\n" | grep "www.inlanefreight.com" | tr "'" '"' | cut -d'"' -f2 | sort -u | wc -l

This gives the answer 35, which includes an extra entry at the beginning that isn't a link, so minus 1 is 34, which was the answer.
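If you'd rather not do the minus 1 by hand, one option (just a sketch of the same pipeline with an extra filter, untested against the current page) is to keep only the fields that actually start with the scheme:

curl -s https://www.inlanefreight.com | tr " " "\n" | grep "www.inlanefreight.com" | tr "'" '"' | cut -d'"' -f2 | grep "^https://www.inlanefreight.com" | sort -u | wc -l

That should drop the stray non-link entry so wc -l lands on the answer directly.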

This question kept me up for days. I tried to find the right solution on my own but couldn't come up with anything, so instead of wasting more time I tried the commands given here in the HTB forum, and eventually one of them worked. I'll jot down some of the commands that might help, even though not all of them produce the correct outcome:

curl https://www.inlanefreight.com | grep -o 'href="[^"]*' | sort | wc -l

For the above, the output was 38, which in a way is better than getting 53 or 20.
|'^_^| , yeah, I got 53 and 20 too.
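One tweak that helps with that count (a sketch, assuming the links all sit in double-quoted href attributes): strip the href=" prefix with \K, keep only the inlanefreight links, and deduplicate:

curl -s https://www.inlanefreight.com | grep -oP 'href="\K[^"]*' | grep inlanefreight.com | sort -u | wc -l

The \K is a PCRE trick that throws away everything matched before it, so only the URL itself gets printed.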

Another approach is:

curl https://www.inlanefreight.com | grep -Po "https?://www.inlanefreight.com/[^'\"?#]*" | sort -u | wc -l

In this command, grep's -P flag enables Perl-compatible regular expressions (PCRE), a more powerful way to search for specific patterns in text and HTML. This gave me 33, which is closer than the previous one.
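For a toy example of what that pattern actually pulls out of a single line of HTML (made-up input, not from the real page):

echo '<a href="https://www.inlanefreight.com/news?p=1">News</a>' | grep -Po "https?://www.inlanefreight.com/[^'\"?#]*"

prints https://www.inlanefreight.com/news, because the character class stops the match at the ? (and would likewise stop at a quote or a #).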

I've tried other approaches too, but this one seems to do the job, as mentioned by you guys:

curl https://www.inlanefreight.com | grep -Po "https://www.inlanefreight.com/[^'\"]*" | sort -u | wc -l

This one shows the correct output in the terminal, and the submission went through.
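If you want to sanity-check it before submitting, you can drop the wc -l and eyeball the list first (same command, just printing the matches instead of counting them; -s only hides curl's progress output):

curl -s https://www.inlanefreight.com | grep -Po "https://www.inlanefreight.com/[^'\"]*" | sort -u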

If there is a better way to do this, please do let me know.


This is a super difficult one.

The commands in here work, but it’s a shame you have to come here to cheat.

They should've taught more filtering methods, or at least pointed you to more useful info that would help you complete this task more easily. This is one of the things that is quite annoying about HTB - they don't seem to nudge beginners in the right direction.


To Whomever it Concerns:

It seems like the website cannot be accessed anymore. I did an nslookup, found the IP, and ran a quick nmap, which says the host seems to be down. That would explain why none of my curl commands go through.

As I backup I tried wget --spider --recursive https://www.inlanefreight.com
and I tried wget --spider --recursive www.inlanefreight.com

Can someone confirm? I would really not want to lose a point.


It looks up and running to me; I can connect using --spider and can use all the curl commands in here.
If it's more about the point than the assignment, you can find the answer in this forum.

I want to go through the process myself as well and see the messages being returned. Any thoughts why mine isn’t connecting and times out?

Not quite sure. I am quite new as well, only on about my 4th module at the moment, so hopefully someone more experienced can help you with this.
If it keeps going wrong, maybe take it to Discord; it seems to be a lot more active and a lot quicker for getting answers.
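One more thing that might be worth trying (just a thought, I haven't hit your exact issue): nmap reporting the host as down often just means its ping probes are being dropped, so skipping host discovery with -Pn and probing the web ports directly can tell you whether it's really unreachable:

nmap -Pn -p 80,443 www.inlanefreight.com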


curl "https://www.inlanefreight.com" | tr " " "\n" | grep -oE 'https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)' | grep inlane | sort -u | wc -l

tr " " “\n” → will replace all whitespaces to new lines
grep -E ‘regex’ → will extract all urls (found this regex on internet)
grep inlane → we want only that urls of inlanefreight
sort -u → will remove the duplicates
wc -l → to count total number of outputs
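To see what the tr step is doing on its own, you can run a tiny made-up line through it:

echo '<a href="https://www.inlanefreight.com/offices">Offices</a> <a href="/careers">Careers</a>' | tr " " "\n"

Every space becomes a newline, so each chunk (and therefore each candidate URL) ends up on its own line for the greps that follow.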

Hope this helps…


curl "https://www.inlanefreight.com" | tr " " "\n" | cut -d "'" -f 2 | cut -d '"' -f 2 | grep "https://www.inlanefreight.com" | sort -u | wc -l
This command also outputs the correct answer. It can be broken down like this:

  1. curl "https://www.inlanefreight.com" downloads the source code.
  2. tr " " "\n" replaces every space with a new line. This step makes sure each path ends up on its own line.
  3. cut -d "'" -f 2 | cut -d '"' -f 2 | grep "https://www.inlanefreight.com" uses ' and " as delimiters to keep only the field between the quotes, then keeps the lines that contain https://www.inlanefreight.com. After this step we already have all the paths of the domain, but some of them are duplicated, so it would give the wrong answer if we just ran wc -l here.
  4. As a result, we need to remove the duplicated lines with sort -u, and then the result is correct, which is 34.

Personally, I think the key here is to read the source code and work out the pattern of how the unique paths are written; then we can find the solution. Many thanks to everyone who posted a solution before me, as you all helped me understand the logic of solving the problem. My reply is just trying to explain it in more detail.
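By the way, if the cut step above looks like magic, pushing one made-up line through it shows what it keeps:

echo 'href="https://www.inlanefreight.com/contact"' | cut -d '"' -f 2

prints https://www.inlanefreight.com/contact, because field 1 is everything before the first double quote and field 2 is everything between the first and second ones.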

I also have a question here. Can anyone tell me the difference between running sort -u and running sort | uniq?
Thanks


Not sure why, but I keep getting "permission denied".

try this ==> curl -s https://www.inlanefreight.com | tr '"' '\n' | tr "'" "\n" | grep "www.inlanefreight.com/" | sort -u | wc -l

I was able to solve this question using regex:

curl -s 'https://www.inlanefreight.com' | grep -Po "https://www\.inlanefreight\.com.*?(?='|\")" | sort | uniq | wc -l 

Regex explained:

  • .*? will only match the characters up to the first quote it encounters.
  • (?='|\") is a lookahead that requires the next character to be a single or double quote, but excludes the quote itself from the result.

I can’t even reach this site that is listed in the task using either my Kali Box connected to the Academy VPN or through the Pwnbox like it mentions in the actual question.

Has anyone else had this issue?


I had the same issue:

"curl: (28) Failed to connect to www.inlanefreight.com port 443 after 130037 ms: Couldn’t connect to server "

idk why we can’t reach it. Kinda silly bc now I don’t know if I’m even using the right commands or not.

After looking around I found an answer: 34


Sadly, I did have to come find this thread to check whether it was me messing up the command. I tried several of the commands other people suggested and still had no luck. Finally tried it outside of the VPN, and it worked fine. So it seems like anyone getting "curl: (6) Could not resolve host: www.inlanefreight.com" isn't dealing with a them issue, but a VPN issue.


My solution was:

cat site.txt | grep -o "https://www.inlanefreight.com[^\"']*" | sort | uniq | wc -l

Here site.txt is the output of curl. Then I use grep to show only the strings that contain the domain followed by any number of characters until a " or a ' is reached, since the " or ' basically means the path has ended. Then I sort it so I can use uniq and count the lines. I know it uses regex, but personally I think it is much easier this way.
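In case it helps anyone following along, the two steps together look roughly like this (a sketch; I'm assuming the page was saved with a plain curl, and site.txt is just what I named the file):

curl -s https://www.inlanefreight.com > site.txt
cat site.txt | grep -o "https://www.inlanefreight.com[^\"']*" | sort | uniq | wc -l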

I did something I think is easy, but the long way:

curl https://www.inlanefreight.com > abc.txt
cat abc.txt | tr " " "\n" | grep -i "https://www.inlanefreight.com/" > abc2.txt
cat abc2.txt | tr ">" "\n" | tr '"' "'" | grep -i "same link" | sort -u | wc -l

Even though I could have done it in a shorter way, the first time I did it like this: replaced spaces with \n new lines, then > with \n, then the double quotes with single quotes, and got the answer.
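The shorter version is just the same steps stitched into one pipeline, roughly like this (a sketch; I haven't re-run it to confirm the count):

curl -s https://www.inlanefreight.com | tr " " "\n" | grep -i "https://www.inlanefreight.com/" | tr ">" "\n" | tr '"' "'" | grep -i "https://www.inlanefreight.com/" | sort -u | wc -l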

I used a very repetitive combination of tr ' ' '\n' and grep https://www.inlanefreight.com, like this one:

curl -s https://www.inlanefreight.com | tr ' ' '\n' | grep https://www.inlanefreight.com | tr '"' '\n' | grep https://www.inlanefreight.com | tr "'" '\n' | grep https://www.inlanefreight.com

It does look unholy, but as I was trying each iteration, it helped me get comfortable with both commands (tr and grep).

After that I only needed to sort the result with sort -u (to avoid duplicates), and then use wc -l.
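So the full one-liner ends up looking like this (same pipeline as above, just with the count added on the end):

curl -s https://www.inlanefreight.com | tr ' ' '\n' | grep https://www.inlanefreight.com | tr '"' '\n' | grep https://www.inlanefreight.com | tr "'" '\n' | grep https://www.inlanefreight.com | sort -u | wc -l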