I’m running into the same issue. Have you made any progress, or could someone please drop a clue or a breadcrumb or something? I’ve been stuck on this for a while now.
Thanks for the help, it sucks that they only accept 1 answer.
Btw, I have tried a lot of answers for this question too: “Use cURL from your Pwnbox (not the target machine) to obtain the source code of the “https://www.inlanefreight.com” website and filter all unique paths of that domain. Submit the number of these paths as the answer.” Anyone got a clue?
Yes. Start with cURL. Try that command alone and you get the source code of the website. Now pipe it to grep. With grep, we find all of the lines that include a path of the domain (https://www.inlanefreight.com/*). Then you just have to find a way to count only the unique outputs of grep.
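Something along those lines, as a rough sketch of the line-based approach described above (the grep pattern and what exactly counts as a “path” are up to you, so the number this prints is not guaranteed to be the accepted answer):
curl -s https://www.inlanefreight.com | grep "www.inlanefreight.com" | sort | uniq | wc -l
The -s just keeps curl’s progress meter out of the way.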
I still don’t know what they want. Do they want just the URLs from the href lines, with everything before them cut away, or just the lines themselves? This is really confusing because I don’t get what the question is asking… I’ve noticed this with a lot of questions in Academy: they use other words, like index instead of inode, etc. It’s very hard to know what they actually want…
@jinbu, I’m stuck at this stupid question… I’ve tried everything. I even scraped each and every link on the website (I know web development and web scraping). After getting really frustrated, I started putting in numbers (from 1 up to 75) as the answer, but none got accepted… Can somebody please give me an answer, so I can just move on to the next exercises (I have a decent understanding of HTML, so I won’t be losing anything if I don’t solve this one…)?
Note: I used my own PC instead of Pwnbox, because I had to use Jupyter and take my time coding the web scraper. Pwnbox works a little slowly for me, probably because of my internet connection…
Hey guys, I managed to do it by:
using grep to get the lines with http
using vi and regexp pattern replacement to remove everything before and after the URL
then using sort, uniq and wc to get the right count (roughly like the sketch below)
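Chained together non-interactively, those three steps look roughly like this (using sed here in place of the vi substitution; it’s only a sketch and keeps just one URL per matching line):
curl -s https://www.inlanefreight.com | grep "www.inlanefreight.com" | sed -E 's|.*(https://www\.inlanefreight\.com[^"]*).*|\1|' | sort | uniq | wc -l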
Some tips: use grep with the -o option to get only the matching part instead of the whole line. They are asking you for all https paths with the inlanefreight.com domain. Beware of paths with URL parameters (?). Take a look at regex to match the required strings.
PS: the solution is under 75.
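For example, a sketch that puts those tips together (whether you keep or strip the ?parameters changes the count, so experiment with both):
curl -s https://www.inlanefreight.com | grep -Eo "https://www.inlanefreight.com[^\"']*" | cut -d '?' -f1 | sort -u | wc -l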
At a guess, based on the previous comments, I’d say you are getting every reference, not just the ones you want. For example, if there is a link to a JavaScript file hosted on a CDN, your grep will catch that.
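A quick way to check whether that is happening is to invert the domain filter and look at what falls out; anything listed here is something a broader grep would have counted:
curl -s https://www.inlanefreight.com | grep -o 'https://[^"]*' | grep -v 'www.inlanefreight.com' | sort -u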
Ok, so I got the number of unique URLs. I don’t think my approach was as clean as it could be, and I don’t want to leave a spoiler, but I’d like to tell someone how I got the answer and see if they could show me a better way. I used grep and could see four extra lines that I knew needed to be filtered, so I just subtracted them.
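If those four lines share anything recognisable, grep -v can drop them instead of subtracting by hand. The patterns below (‘cdn’ and ‘\.js’) are just hypothetical stand-ins for whatever your extra lines actually contain:
curl -s https://www.inlanefreight.com | grep 'inlanefreight.com' | grep -v -e 'cdn' -e '\.js' | sort -u | wc -l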
What I did: first I checked the web page using curl, then I looked for something the links had in common and noticed they all contained " //www.inlanefreight.com ", so I grepped that out into a temp file. Then I opened that file, looked for more common features of the links, and found they all contained “li”, so I grepped the lines with “li” and saved them to a new file, temp1. Then I manually removed a few unwanted results (lines starting with a or ul) and just ran wc -l on temp1, and I got what I wanted.
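Spelled out, that workflow is roughly the following (the “li” filter and the final exclusions are tied to how this particular page happens to be written, so treat it as a sketch rather than a recipe):
curl -s https://www.inlanefreight.com | grep '//www.inlanefreight.com' > temp
grep 'li' temp > temp1
grep -vE '^[[:space:]]*<(a|ul)' temp1 | wc -l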
| sed 's/https/$/g' - to replace every “https” with a $ sign
| cut -d '$' -f2 - to take the 2nd field after the $ delimiter
| cut -d ' ' -f1 - to take the 1st field, up to the first space
| cut -f1 - to take the 1st field, up to the first tab
| sort - sort so uniq works
| uniq
Then just count the unique paths manually (I got only 30 if it must contain DOMAIN_NAME).
Or you can use the command below to filter all paths (not only the domain):
curl DOMAIN_NAME | grep 'https' | sed 's/https/$/g' | cut -d '$' -f2 | cut -d ' ' -f1 | cut -f1 | sort | uniq
You can add " | wc -l " to this to count the number of lines. However, the answer is not correct =(
Q. Use cURL from your Pwnbox (not the target machine) to obtain the source code of the “https://www.inlanefreight.com” website and filter all unique paths of that domain. Submit the number of these paths as the answer.
Step 1
curl <given link> > test.txt
Step 2
cat test.txt | tr " " "\n" | cut -d "'" -f2 | cut -d '"' -f2 | grep "<given link>" > data.txt
Step 3
Open data.txt in Sublime Text and remove the duplicates:
Click "Edit" > "Sort Lines" to sort the lines by value.
Then click "Edit" > "Permute Lines" > "Unique" to remove the duplicate values.
Save the file.
Step 4
cat data.txt | wc -l
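The Sublime Text part (steps 3 and 4) can also be done in the shell itself; assuming data.txt already holds one URL per line, this should give the same count:
sort -u data.txt | wc -l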