
For a project I'm working on I need to get a list of all URLs in a certain folder of a domain, or better yet all URLs matching a regular expression.

I want to do this using bash so as to avoid installing any programs that I won't ever use again, but if there is a solution using programs I might already have, such as Firefox, please go ahead and tell me.

Thank you for your time.


1 Answer

I figured out how to manage this in my case; much of it should be the same for anyone else, and you should be able to adapt this process to work with any URL.

  1. Change to a new directory
    First we should change to a new directory to avoid files getting lost or being kept after we need them.
    mkdir ~/Desktop/dev
    cd ~/Desktop/dev
  2. Get URLs with wget
    Next we use the wget command to find all URLs for files and folders under the domain. For me the command was:
    wget -o ./urls.txt --spider -r --reject="index.html" --no-verbose --no-parent https://example.com/folder/
    Just replace the URL in the above command (the one shown is a placeholder) and it should create a text file (urls.txt) full of URLs and a bunch of other noise.
  3. Remove the folder left by wget
    wget will have left behind a folder named after the domain of your input URL. There is no important information in this folder, so go ahead and remove it with the rm command or through your file manager.
  4. Build a regex to extract the actual URLs
    This is the hard part. I recommend opening urls.txt in a text editor that allows find-with-regex, and opening regexr in your browser; now you have to build a regex that matches the URLs you want. Once you find a regex that matches the URLs, run the command:
    grep -o -E "(https.*\/([0-9](\.[0-9])+)\/(mono\/)?Godot_v\2[-_]stable[_-](mono_)?((win)?(x11[\._])?(osx\.?)?)((32)?(64)?)?((\.exe)?(\.fat)?)\.zip)" ./urls.txt >> urls\ filtered.txt
    This will copy all lines matching the regex to a text file (urls filtered.txt). Replace the regex (the bit in the quotes) with your regex.

After all that you should be left with a text file of all the URLs that you need.
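To see the shape of the whole process without hitting a real site, here is a minimal sketch of steps 2–4. The log lines and the `.zip` pattern are made-up stand-ins: substitute your own domain and regex, and expect wget's actual log format to differ in the details.

```shell
# Minimal sketch of the process, using a simulated wget log so it runs anywhere.
# example.com and the .zip regex are placeholders -- substitute your own.
mkdir -p /tmp/dev
cd /tmp/dev

# Stand-in for the urls.txt that `wget -o ./urls.txt --spider -r ...` would write.
cat > urls.txt <<'EOF'
2024-01-01 00:00:00 URL:https://example.com/files/report.zip 200 OK
2024-01-01 00:00:01 URL:https://example.com/files/notes.txt 200 OK
EOF

# Step 4: keep only URLs matching the pattern (here: anything ending in .zip).
grep -o -E 'https[^ ]*\.zip' urls.txt > "urls filtered.txt"
cat "urls filtered.txt"
```

Running this leaves "urls filtered.txt" containing only the matching URL, which is the same filtering step the grep command above performs on the real wget output.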
