Question about NPM on Linux Fundamentals module of Academy

I’m running into the same issue. Have you made any progress, or could someone please drop a clue or a breadcrumb? I’ve been stuck on this for a while now.

-Thanks

@hackerookie said:

just “http-server -p 8080” !!!

Thanks for the help, it sucks that they only accept one answer.
Btw I have tried a lot of answers for this question too: “Use cURL from your Pwnbox (not the target machine) to obtain the source code of the “https://www.inlanefreight.com” website and filter all unique paths of that domain. Submit the number of these paths as the answer.” Anyone got a clue?

@ttnoob said:

Btw I have tried a lot of answers for this question too: “Use cURL from your Pwnbox (not the target machine) to obtain the source code of the “https://www.inlanefreight.com” website and filter all unique paths of that domain. Submit the number of these paths as the answer.” Anyone got a clue?

Yes. Start with cURL. Try that command alone and you get the source code of the website. Now pipe it to grep. With grep, find all of the lines that include a path of the domain (https://www.inlanefreight.com/*). Then you just have to find a way to count only the unique outputs of grep.
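A minimal sketch of that pipeline, run against a small inline sample instead of the live site (in practice the input would come from `curl -s https://www.inlanefreight.com`; the sample HTML and the exact grep pattern are my assumptions, not the real page):

```shell
# Sample page source standing in for the curl output (assumption).
html='<a href="https://www.inlanefreight.com/about">About</a>
<a href="https://www.inlanefreight.com/blog">Blog</a>
<a href="https://www.inlanefreight.com/about">About again</a>
<script src="https://cdn.example.com/lib.js"></script>'

# grep -o keeps only the matched URL, the domain-anchored pattern skips
# the CDN reference, sort -u collapses the duplicate, wc -l counts.
printf '%s\n' "$html" \
  | grep -o 'https://www\.inlanefreight\.com/[^"]*' \
  | sort -u \
  | wc -l
# → 2
```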

@CyberPatDall

I still don’t know what they want. Do they just want the href lines, i.e. the URLs, where you have to cut away all the crap beforehand, or just the lines themselves? This is really confusing because I don’t get the question right… I noted this with a lot of questions in Academy: they use other words, like index instead of inode, etc. It’s very hard to know what they actually want…

Thanks for a tip

@WHLSW said:

I still dont know what they want.

They want all the unique URLs related to the domain inlanefreight.com (excluding https://www.inlanefreight.com itself), so find a way to filter out the irrelevant hrefs. gl :smiley:

@jinbu , I’m stuck at this stupid question… I’ve tried everything. I even scraped each and every link on the website (I know web development and web scraping). After getting really frustrated, I started putting in numbers (from 1 up to 75) as the answer, but none got accepted… Can somebody please give me an answer so I can just move on to the next exercises (I have a decent understanding of HTML, so I won’t be losing anything if I don’t solve this one)…

Note: I used my own PC instead of Pwnbox, because I had to use Jupyter and had to take my time coding the web scraper. Pwnbox works a little slow for me, probably because of my internet connection…

Hey guys, I managed to do it by:
using grep to get the lines with http,
using vi and regexp pattern replacement to remove everything before and after the URL,
then using sort, uniq and wc to get the right output.

Some tips: use grep with the -o option to get only the matching part instead of the whole line. They are asking you for all https paths with the inlanefreight.com domain. Beware of paths with URL parameters (?). Have a look at regex to match the required strings.
PS: the solution is under 75.
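A quick illustration of the -o tip on a made-up input line (the href values are assumptions, not taken from the real page):

```shell
line='<a href="https://www.inlanefreight.com/news?id=1">x</a> <a href="https://www.inlanefreight.com/news">y</a>'

# Without -o, grep would print the whole line once; with -o each match
# is printed on its own line, and the [^"]* part keeps query strings
# (?id=1) attached, so the two hrefs stay distinct.
printf '%s\n' "$line" | grep -o 'https://www\.inlanefreight\.com/[^"]*'
# → https://www.inlanefreight.com/news?id=1
# → https://www.inlanefreight.com/news
```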

curl https://www.inlanefreight.com/ | grep -0 "http" | wc -l gave me 64
curl https://www.inlanefreight.com/ | grep -0 "https" | wc -l gave me 62

Both are under 75. I’m getting wrong answers.

Can anyone tell me what I’m doing wrong?

@MountainBoy said:

Can anyone tell me what I’m doing wrong?

At a guess, based on the previous comments, I’d say you are getting every reference, not just the ones you want. For example, if there is a link to a JavaScript file hosted on a CDN, your grep will catch that.

The suggestion here might help: Question about NPM on Linux Fundamentals module of Academy — Hack The Box :: Forums

It looks like you need to find a way to narrow it down to only http/https links you want, not all of them.

Also, what is -0 on grep? I don’t think I’ve seen it before and I can’t find it in the manpages.
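To make the "every reference" point concrete, here is the difference on a two-line sample (both lines are made-up stand-ins for the real page source):

```shell
refs='<script src="https://cdn.example.com/jquery.js"></script>
<a href="https://www.inlanefreight.com/contact">Contact</a>'

# grep -c counts matching lines: a bare "https" pattern also catches the
# CDN reference, while anchoring on the domain does not.
printf '%s\n' "$refs" | grep -c 'https'                # → 2
printf '%s\n' "$refs" | grep -c 'inlanefreight\.com'   # → 1
```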

OK, so I got the number of unique URLs. I don’t think it was as clean an approach as it could be. I don’t want to leave a spoiler; I’d like to tell someone how I got the answer and see if they could show me a better way. I used grep and could see four extra lines that I knew needed to be filtered, so I just subtracted them.

This is a weird challenge, and even after getting it, I still believe this challenge needs to be reworked.

Great concept, love the work HTB is doing.

Message me for how to

What I just did: first I checked the web page using curl, then I looked at the unique features of the links and found that they had “//www.inlanefreight.com” in common, so I grepped that out into a temp file. After that I opened that file, checked for some more unique features of the links, and found that they all had the word “li” in common, so I grepped the lines with “li” and saved that to a new file, temp1. Then I manually removed some unwanted results (like lines starting with a or ul), ran wc -l on temp1, and got what I wanted.

This is probably gonna get taken down, but it’s 34

I couldn’t get it “properly”, but I was off by 1. I have no idea why.

This is really quite a poorly designed challenge imo.

After a ton of regex I finally got the right answer with the following grep command:

grep -Po "(?<=[\"\'])https:\/\/www\.inlanefreight\.com\/.*?(?=[\"\'])"

For more details about the regex, you can paste this

/(?<=[\"\'])https:\/\/www\.inlanefreight\.com\/.*(?=[\"\'])/gmU

into the website https://regex101.com/ to get a better understanding
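Here is that regex in action on a sample href line (the input line is an assumption; GNU grep is assumed as well, since -P needs PCRE support):

```shell
sample='<a href="https://www.inlanefreight.com/offices/">Offices</a>'

# The lookbehind (?<=["']) and lookahead (?=["']) anchor the match to
# the quotes around the href value without including them in the output,
# and the lazy .*? stops at the first closing quote.
printf '%s\n' "$sample" \
  | grep -Po "(?<=[\"'])https:\/\/www\.inlanefreight\.com\/.*?(?=[\"'])"
# → https://www.inlanefreight.com/offices/
```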

curl DOMAIN_NAME | grep 'DOMAIN_NAME' | sed 's/https/$/g' | cut -d '$' -f2 | cut -d ' ' -f1 | cut -f1 | sort | uniq

  1. curl DOMAIN_NAME - html code
  2. | grep 'DOMAIN_NAME' - filter out all lines containing www.inlanefreight.com
  3. | sed 's/https/$/g' - replace all "https" with a $ sign
  4. | cut -d '$' -f2 - cut the 2nd field after the $ delimiter
  5. | cut -d ' ' -f1 - cut the 1st field up to a space
  6. | cut -f1 - cut the 1st field up to "\t"
  7. | sort - sort for uniq
  8. | uniq

Then just count the "uniq" paths manually (I got only 30 if it must contain DOMAIN_NAME).

OR you can use the command below to filter all paths (not only the domain):
curl DOMAIN_NAME | grep 'https' | sed 's/https/$/g' | cut -d '$' -f2 | cut -d ' ' -f1 | cut -f1 | sort | uniq

You can add " | wc -l " to this to count the number of lines. However the answer is not correct =(
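Walking one sample line through the middle of that pipeline (the input line is an assumption, and the final cut uses a quote delimiter instead of the space/tab cuts of steps 5-6, which is a slight variation):

```shell
line='<li><a href="https://www.inlanefreight.com/careers">Careers</a></li>'

# sed turns "https" into "$" so cut has a one-character delimiter;
# cut -d '$' -f2 keeps everything after it; cut -d '"' -f1 trims at the
# closing quote of the href attribute.
printf '%s\n' "$line" | sed 's/https/$/g' | cut -d '$' -f2 | cut -d '"' -f1
# → ://www.inlanefreight.com/careers
```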

Q. Use cURL from your Pwnbox (not the target machine) to obtain the source code of the “https://www.inlanefreight.com” website and filter all unique paths of that domain. Submit the number of these paths as the answer.
step 1
curl https://put given link > test.txt

step 2
cat test.txt | tr " " "\n" | cut -d "'" -f2 | cut -d '"' -f2 | grep "put given link" > data.txt
step 3
open the data.txt file in Sublime Text and delete duplicates

Now click “Edit” > “Sort Lines” to sort the lines by value.
Now click “Edit” > “Permute Lines” > “Unique” to remove duplicate values.
Save the file.
step 4
cat data.txt | wc -l

ans = 34
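The Sublime Text sort-and-dedup steps can also be done directly in the shell with sort -u (the three URLs below are made-up stand-ins for the contents of data.txt):

```shell
urls='https://www.inlanefreight.com/a
https://www.inlanefreight.com/b
https://www.inlanefreight.com/a'

# sort -u sorts and removes duplicates in one step; wc -l counts what
# is left, replacing the manual editor workflow.
printf '%s\n' "$urls" | sort -u | wc -l
# → 2
```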
