think tank forum

ttf development » Handling URL's

dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
Should we support BBcode style tags "[ url = http:// domain.com ] domain.com [ /url ]" and "[ url ] domain.com [ /url ]"? (without spaces of course).

Should we make long links brief via auto-shortening?

http:// domain.com/products/technology/articles/review/index.php?id=19191919911919#19321

to a link like: http:// domain.com/products/...321

If so, how many characters should be kept from the beginning and from the end?

Should anything in a post that starts with www. and ends in .net/.com/.org be recognized as a link?

Lastly, should linking also happen automatically for very short, common forms ending in .com/.net/.org/.gov? For example, projectdp.com would turn into a link with the name "projectdp.com" in the post.

In the last two cases it may break some things. But that's only if people are typing posts with strange syntax to begin with.

Input!?
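
To make the auto-shortening question concrete, here is a minimal sketch in Python (ttf itself is PHP, so this is only an illustration, not the forum's code). The function name shorten_url and the defaults max_len=60, head=27 and tail=3 are assumptions; head and tail were picked only so the output reproduces the example in this post.

```python
def shorten_url(url: str, max_len: int = 60, head: int = 27, tail: int = 3) -> str:
    """Return a display label for url, abbreviating the middle when it is long.

    max_len, head and tail are illustrative defaults, not agreed values.
    Only the visible text is shortened; the href should keep the full URL.
    """
    if len(url) <= max_len:
        return url
    return url[:head] + "..." + url[-tail:]


long_url = ("http://domain.com/products/technology/articles/review/"
            "index.php?id=19191919911919#19321")
print(shorten_url(long_url))  # http://domain.com/products/...321
```

Keeping head and tail as parameters leaves the "what length from the beginning and the end" question open to tuning.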
lucas's avatar
17 years ago
link
lucas
i ❤ demo
dp and i were discussing this the other day. there are two ways that i could go:

1. do not do any type of automatic post formatting. this way, what the author intended is _always_ what is displayed. pure and simple. perhaps even use a mono spaced font.

2. make ttf ultra-productive and intuitive. this includes url linking and such. mis-links would be handled on a case by case basis, hopefully by making the regexp more robust, and not by merely editing the post to make it "compatible with ttf."
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
I support auto-shortening just because it looks better. but I also don't really like the idea of not showing some of the URL. maybe only shorten it so it doesn't expand beyond one line of text? or more than half of a line? I guess that's hard to measure with a not monospaced font though.

speaking of which, I don't like the idea of posts being in a monospaced font, they're not quite as pleasant to read as the current font. if someone wants to preserve formatting I think it should be done via the pre or tt HTML tags, or some ttf-specific equivalent.

I like the idea of creating links out of text like thehomerow.net (I guess we're doing some self-promotion in this thread :P) a lot. this would make it a lot nicer to put links to sites midsentence without having to do [url=blahblah] or anything.
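
A rough Python sketch of linking bare domains like the one above, assuming a fixed list of endings (com, net, org, gov). The names BARE_DOMAIN and link_bare_domains are made up for illustration, and the lookbehind is just one guess at how to avoid touching email addresses and URLs that are already pasted in full.

```python
import re

# One or more dot-separated labels followed by a known ending.
# The lookbehind rejects matches that start inside a word, an email
# address, or an already-pasted URL (preceded by "/", "." or "-").
BARE_DOMAIN = re.compile(
    r"(?<![\w@./-])(?:[a-z0-9-]+\.)+(?:com|net|org|gov)\b",
    re.IGNORECASE,
)


def link_bare_domains(text: str) -> str:
    """Wrap bare domains in anchor tags, assuming http:// as the scheme."""
    return BARE_DOMAIN.sub(
        lambda m: '<a href="http://{0}">{0}</a>'.format(m.group(0)),
        text,
    )


print(link_bare_domains("see thehomerow.net today"))
```

This is where "it may break some things" shows up: any text that merely looks like a domain gets linked, so the list of endings and the lookbehind would need tuning against real posts.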
phi_'s avatar
17 years ago
link
phi_
... and let the Earth be silent after ye.
Yeah. What if you had a [mono] tag for monospaced sections of a post.

And, I'm in favor of auto-shortening. But the BB [url=....][/url] tags have never set well with me. I'd prefer to have limited XHTML than BB.
lucas's avatar
17 years ago
link
lucas
i ❤ demo
i like our html "pre" tag for mono-spaced formatting, but we need to change it so that nl2br() is _not_ run on "pre" text.

this
is
what
happens
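
One way to keep line-break conversion away from "pre" text, sketched in Python rather than ttf's PHP. The function nl2br_outside_pre is hypothetical, and unlike PHP's nl2br() it normalizes every line ending to "\n" after the inserted tag; it also assumes plain, non-nested pre tags without attributes.

```python
import re

# Capturing group so that re.split keeps the <pre> blocks in the result.
PRE_BLOCK = re.compile(r"(<pre>.*?</pre>)", re.DOTALL | re.IGNORECASE)


def nl2br_outside_pre(text: str) -> str:
    """Insert <br /> at line breaks, except inside <pre>...</pre> blocks."""
    parts = PRE_BLOCK.split(text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indexes are the captured <pre> blocks: leave untouched.
            out.append(part)
        else:
            out.append(re.sub(r"\r\n|\r|\n", "<br />\n", part))
    return "".join(out)


print(nl2br_outside_pre("a\nb<pre>x\ny</pre>c\nd"))
```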
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
> And, I'm in favor of auto-shortening. But the BB [url=....][/url] tags have never set well with me. I'd prefer to have limited XHTML than BB.

Technically we do only use XHTML. b, i, u, pre tags. Should we just go ahead and add an anchor < a > tag?

> I guess we're doing some self-promotion in this thread :P

Don't worry, got ya covered!
http://www.thinktankforums.com/thread.php?thread_id=558
dp> http://www.thehomerow.net/~brain/
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
>2. make ttf ultra-productive and intuitive. this includes url linking and such. mis-links would be handled on a case by case basis, hopefully by making the regexp more robust, and not by merely editing the post to make it "compatible with ttf."

What's the baseline for intuition in url linking? I need some input from everyone.

In the case that you want to make a url a link with a specific name do you intuitively type

textile style: "google" => "google" //as a link
markdown style: [google](http://google.com/) => "google"
bb style: [url=http://google.com/]google[/url] => "google"

We can do any of those but we'd need to agree upon how intuitive each is. The simplest one in my opinion for named links would be something like:

google|http://google.com/ => "google"

so I'd just match "|http://" and take everything before it, back to the previous space, as the link name. Any other character instead of | would work too, of course. How does that sound for named links? Seems like the easiest way.

Also, if by robust you mean accepting several different input styles, how robust should it be? Should it be what people 'intuitively' expect from BBcode, or intuitive in the HTML a href anchor style?
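
As a sketch of the name|http:// idea (Python for illustration only; NAMED_LINK and link_named are invented names, and the input is assumed to be raw, not yet HTML-escaped text):

```python
import re
from html import escape

# Name = run of non-space characters before "|", URL = run of
# non-space characters starting with http:// or https://.
NAMED_LINK = re.compile(r"(\S+)\|(https?://\S+)")


def link_named(text: str) -> str:
    """Turn name|http://... into an anchor whose text is the name."""
    def repl(match):
        name, url = match.group(1), match.group(2)
        return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(name))
    return NAMED_LINK.sub(repl, text)


print(link_named("see google|http://google.com/ now"))
```

Because both halves are "everything up to a space", this form cannot handle spaces in either the name or the URL.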
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
I definitely want at least <a href=""></a> to work. I'd rather type this than learn some strange new markup. but if the new markup is good enough maybe I'll get used to it :).

the google|http://google.com seems pretty nice. then comes the question of whether or not the http should be required.

I like the idea of supporting multiple formats, like bbcode, textile, etc. but this would be very cumbersome and might not be worth the extra code it requires.
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
well it would be for longer ones like this|http://www.thinktankforums.com/thread.php?thread_id=558 for example.
sriehl's avatar
17 years ago
link
sriehl
surreal
What about links with a space in the url? Will you make people convert all those spaces to %20?
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
Space in the url? Or do you mean in the actual name of the link? That could probably be done with quotes: "google bsd!"|http://www.google.com/bsd. Good, new and different ways are always appreciated!

And no, how is typing %20 intuitive? Does it look like I'm going to make people conform in a certain way? It should just be a clear posting method that is quick and works. The third and fourth methods in the first post cover the really simple cases, and they seem to have at least two votes, asemi's and mine. Regular pasting of links will be supported as usual. The only open question is how to deal with named URLs.

snap, I'm tired. Goodnight!
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
So far, I've gotten the following methods working on my testbed version (for all examples omit space that breaks link):

1. site.com/net/org : short
2. www.site.com/net/org : short v2
3. http:// anysubdomain.site.com/the_restOFline_until/break.html?id=9081 : long-pasted
4. [ url = http://pasted.site.com ]site![ /url ] : bb style named
5. < a href = "http://site.com" >site!< / a > : html input named

Personally I favor 1, 3, and 5. 4 takes fewer keystrokes, and 2 makes accidental links less likely, though 1 is nicer overall in my opinion.

in progress:

6. http:// domain.com/products/...321 : auto-shortened links.
7. lr!http://www.wingedleopard.net/lucas/ => "lr" as link : name-paste

6 is sorta proving to be difficult, but once I get a few more things done I think I'll have it. I haven't started the regular expression for 7, but it looks pretty simple: I'll just reuse what matches the long-pasted style (3) plus a unique delimiter character (probably "!", "|" or something). Suggest a unique!
 
17 years ago
link
dbrown
ok, quick question. in #1,2,3,6,7: does the delimiter character effectively remove it from ttf posting?

additionally i'd like to throw my support behind #4 or #5, although i prefer 5.
lucas's avatar
17 years ago
link
lucas
i ❤ demo
so i take it that styles 1, 2, and 3 do not support spaces; do 4 and 5 support spaces? they should.

i advocate keeping ttf bbcode-free and removing style 4 from trunk.

why is style 6 difficult? look at the old code you deleted in r22 (r21 lines 110-1).

style 7 will be badass. is that hoodwink style? and what do you mean, "a unique"? something that urls don't use?
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
I didn't want the current output function making a link out of all that, but it snuck some in on me. I only put spaces in the examples, which is what I meant by omitting spaces when interpreting the example. 4 and 5 do support spaces.

Sample unique "!" in the following:

lr!http://www.wingedleopard.net/lucas/ => "lr" as a link

or a different unique ":" like this?
dp => "dp" as a link
lucas's avatar
17 years ago
link
lucas
i ❤ demo
i understood what you meant when you said "omit space that breaks link."

i'm considering URLs with spaces in them, like "larz.com/long dumb file name.php"

i'm assuming that 1,2,3 won't succeed at linking that correctly, but they will probably link the first part up to the space, namely "larz.com/long"

however, 4 and 5 should be able to link the entire url correctly. do they?

so it's okay if the "unique" isn't really unique? because you used a colon and a colon naturally occurs in many urls. that's what i was asking about.

what about doing the hoodwink style?
"name of link"

at least this way, you can have spaces in your linked text. but that still kinda sucks because it doesn't support spaces in the url.

what about wikipedia style?

[http://domain.tld/long dumb file.name | name of link]
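
A sketch of how the bracketed form could handle spaces on both sides, in Python for illustration. WIKI_LINK and link_wiki_style are hypothetical names, and percent-encoding the spaces in the href is an assumption about what the output should look like.

```python
import re
from html import escape
from urllib.parse import quote

# [URL | name]: the URL runs up to the "|", the name up to the "]".
WIKI_LINK = re.compile(r"\[(https?://[^|\]]+?)\s*\|\s*([^\]]+?)\s*\]")


def link_wiki_style(text: str) -> str:
    """Convert [url | name] into an anchor, encoding spaces in the URL."""
    def repl(match):
        # Keep URL delimiters and existing %XX escapes; encode the rest.
        url = quote(match.group(1), safe=":/?&=#%")
        return '<a href="{}">{}</a>'.format(
            escape(url, quote=True), escape(match.group(2))
        )
    return WIKI_LINK.sub(repl, text)


print(link_wiki_style("[http://domain.tld/long dumb file.name | name of link]"))
```

The brackets give the pattern an unambiguous start and end, which is what the bare and name|url styles lack.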
lucas's avatar
17 years ago
link
lucas
i ❤ demo
i really like wikipedia style or markdown style. both allow for spaces in both the url and in the linked text.

so here's the ultimate question:
do we support as many methods as possible?
or do we choose the best one and only support it?

we still can take php-markdown and modify it to suit our needs if we would like.
lucas's avatar
17 years ago
link
lucas
i ❤ demo
look at http://www.wingedleopard.net/ , it's so great!
lucas's avatar
17 years ago
link
lucas
i ❤ demo
hey, dp is right, it does link commas

http://dumb.punbb/code.html , it works terribly
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
So it does. Now how do I fix it? I need a way to match all word characters, including periods, question marks, equals signs, etc., except that the last character must not be a punctuation mark. I'm having difficulty because if I use the $ anchor (end of the text being matched) the pattern still picks up the trailing punctuation. For example, in the link larz posted, "code.html,", the regular expression matches everything up to the end of the "word", which is basically everything until the space. I want to exclude that last character from the match. But if I do something like \.$ (match a literal period at the end) it only links up to "code": it excludes the last period it found, not the last character in the word.

I'm trying to find a workaround but I really don't know what to do. I guess I need to read up on nongreedy matches that will do exactly what I want, which will make the regular expression more complex, all because people do something stupid like put punctuation on the end of their link.

I'm heading to BMT tomorrow so I'm really hustlin' to get stuff done here in socal before I leave. I just failed the smog test for the second time, after making what I thought were the proper fixes, now it's failing worse. Also I was not able to get a malaria prescription at my school so now I'll have to get it when I'm in BMT for a higher price.

I'll work on it when I get a chance, but for now it's working at least as good as the GPL version for pasted links.
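
For the trailing-punctuation problem described in this post, one possible approach is a non-greedy match followed by a lookahead. This is a Python sketch under assumptions, not the regexp that ended up in ttf: the punctuation set .,;:!? and the names URL and find_urls are illustrative.

```python
import re

# \S+? grows one character at a time until the lookahead succeeds:
# optional trailing punctuation, then whitespace or end of text.
# The punctuation itself is never part of the match.
URL = re.compile(r"https?://\S+?(?=[.,;:!?]*(?:\s|$))")


def find_urls(text: str) -> list:
    """Return pasted URLs without any punctuation stuck to their end."""
    return URL.findall(text)


print(find_urls("http://dumb.punbb/code.html, it works terribly"))
# ['http://dumb.punbb/code.html']
```

Periods and question marks inside the URL survive because the lookahead only succeeds when the punctuation is followed by whitespace or the end of the text.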
lucas's avatar
17 years ago
link
lucas
i ❤ demo
> I'll work on it when I get a chance, but for now it's working at least as good as the GPL version for pasted links.

sweet.. i'll take a look and possibly push trunk HEAD to branches/live.

thanks for the work, dp. we need to maintain a credits/contributions file in trunk.
lucas's avatar
17 years ago
link
lucas
i ❤ demo
we just committed working url pregexps as r42.

then we pushed r42 to branches/live!

welcome to r43 (live!).
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
we now have:

1. shorts: thinktankforums.com or wingedleopard.net
2. subdomains: icarus.projectdp.com or www.thehomerow.net
3. long pastes: http://www.thinktankforums.com/thread.php?thread_id=1
4. named pastes:
a.
lr => lr (as a link)
b.
winged leopard => winged leopard (as a link)
5. auto-shorten: if link > 60 char!
6. < pre > tags are doing as well as they may ever do.

these links should not all work on live ttf at the time of this writing.

bbcode is no longer.
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
 HHH  HHH   HHH HH HH HH HH HH HHHH
HH HH HH H HH   HH HH HH HH HH HH  
HHHHH HHHH HH   HHHHH HH HH HH HHH 
HH HH HH HH HHH HH HH HH  HHH  HHHH
HHHHHHHHHHHHHHHHHHHH  HHHH      HHH
HHHHHHHHHHHHHHHHHH    HH   HHHH  HH
HHHHHHHHHHHHHHHH   HHHHHHHHHHHH   H
HHHHHHHHHHHHHH   HHHHHHHHHHHHHH  HH
HHHHHHHHHHHH    HHHHHHHHHHHHH   HHH
HHHHHHHHHHHHHH   HHHHHHHHHHHHHH  HH
HHHHHHHHHHHHHHHH   HHHHHHHHHHHH   H
HHHHHHHHHHHHHHHHHH    HH   HHHH  HH
HHHHHHHHHHHHHHHHHHHH  HHHH      HHH
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
the home row
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
it'll be all set up once the new live is merged :)
lucas's avatar
17 years ago
link
lucas
i ❤ demo
and there it is!
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
nice!
lucas's avatar
17 years ago
link
lucas
i ❤ demo
         tttt               tttt             ffffffffffffffff  
      ttt:::t            ttt:::t            f::::::::::::::::f 
      t:::::t            t:::::t           f::::::::::::::::::f
      t:::::t            t:::::t           f::::::fffffff:::::f
ttttttt:::::tttttttttttttt:::::ttttttt     f:::::f       ffffff
t:::::::::::::::::tt:::::::::::::::::t     f:::::f             
t:::::::::::::::::tt:::::::::::::::::t    f:::::::ffffff       
tttttt:::::::tttttttttttt:::::::tttttt    f::::::::::::f       
      t:::::t            t:::::t          f::::::::::::f       
      t:::::t            t:::::t          f:::::::ffffff       
      t:::::t            t:::::t           f:::::f             
      t:::::t    tttttt  t:::::t    tttttt f:::::f             
      t::::::tttt:::::t  t::::::tttt:::::tf:::::::f            
      tt::::::::::::::t  tt::::::::::::::tf:::::::f            
        tt:::::::::::tt    tt:::::::::::ttf:::::::f            
          ttttttttttt        ttttttttttt  fffffffff            



                                  ___     
                                 /\__\    
      ___           ___         /:/ _/_   
     /\__\         /\__\       /:/ /\__\  
    /:/  /        /:/  /      /:/ /:/  /  
   /:/__/        /:/__/      /:/_/:/  /   
  /::\  \       /::\  \      \:\/:/  /    
 /:/\:\  \     /:/\:\  \      \::/__/     
 \/__\:\  \    \/__\:\  \      \:\  \     
      \:\__\        \:\__\      \:\__\    
       \/__/         \/__/       \/__/    



 __    __       ___  
/\ \__/\ \__  /'___\ 
\ \ ,_\ \ ,_\/\ \__/ 
 \ \ \/\ \ \/\ \ ,__\
  \ \ \_\ \ \_\ \ \_/
   \ \__\\ \__\\ \_\ 
    \/__/ \/__/ \/_/ 


http://www.network-science.de/ascii/
lucas's avatar
17 years ago
link
lucas
i ❤ demo
so why didn't it work right there? does the regexp not allow for "\r\n" between the pre tags?
lucas's avatar
17 years ago
link
lucas
i ❤ demo
err, of course not. maybe it doesn't allow for multiple line breaks like "\r\n\r\n\r\n"

i don't know
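
One common cause of this kind of failure, offered only as a guess at what was happening here: in most regex flavors "." does not match a newline unless a dot-all option is set (the s modifier in PCRE, re.DOTALL in Python). A small Python demonstration, with pre_bodies as a made-up helper:

```python
import re


def pre_bodies(text: str, dotall: bool) -> list:
    """Return the contents of <pre> blocks, with or without dot-all mode."""
    flags = re.DOTALL if dotall else 0
    return re.findall(r"<pre>(.*?)</pre>", text, flags)


sample = "<pre>line one\r\n\r\n\r\nline two</pre>"
print(pre_bodies(sample, dotall=False))  # [] because "." stops at "\n"
print(pre_bodies(sample, dotall=True))
```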
lucas's avatar
17 years ago
link
lucas
i ❤ demo
ref: http://www.faqs.org/rfcs/rfc1738
dannyp's avatar
17 years ago
link
dannyp
dʎuuɐp
still working on those pre tags, it's a beast. I think I have some working theories that I will work on!
lucas's avatar
17 years ago
link
lucas
i ❤ demo
haha

good luck
lucas's avatar
17 years ago
link
lucas
i ❤ demo
http://imgs.xkcd.com/comics/regular_expressions.png
lucas's avatar
17 years ago
link
lucas
i ❤ demo
i got "pre" tags working in trunk. here's the code:

http://code.google.com/p/thinktankforums/issu … il?id=6#c9

i'll merge live to trunk r64 sometime soon. there's lots of revisions between live HEAD and trunk HEAD now, though.
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
just testing some stuff

"http://thehomerow.net/~brain/":"testy testy"
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
http://thehomerow.net/~brain/ :"and again"
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
http://thehomerow.net/~brain/ :'hmm hmm'
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
'http://thehomerow.net/~brain/':'this isnt working'
lucas's avatar
17 years ago
link
lucas
i ❤ demo
hi
lucas's avatar
17 years ago
link
lucas
i ❤ demo
you're going backwards

it's:

' blah blah ' : <URL>
lucas's avatar
17 years ago
link
lucas
i ❤ demo
' blah blah' : [URL]

nothing works anymore since ttf went utf-8 :(

i don't know what to do
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
ah, woops

hello
asemisldkfj's avatar
17 years ago
link
asemisldkfj
the law is no protection
what doesn't work? or did you fix it before I could see?
lucas's avatar
17 years ago
link
lucas
i ❤ demo
the ellipses are question marks for me

does
  pre
    still
      work?
lucas's avatar
16 years ago
link
lucas
i ❤ demo

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.


http://tools.ietf.org/html/rfc1738

this should be kept in mind when making the regexps for auto-linking.
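
A sketch of building the character class straight from that list, in Python for illustration. The reserved set ;/?:@=& is taken from RFC 1738; adding "%" (escapes) and "#" (fragments) is an assumption on top of the quoted sentence, and the names are invented.

```python
import re
import string

SPECIALS = "$-_.+!*'(),"   # special characters from the quoted passage
RESERVED = ";/?:@=&"       # reserved characters listed in RFC 1738
EXTRAS = "%#"              # assumption: percent escapes and fragments

ALLOWED = string.ascii_letters + string.digits + SPECIALS + RESERVED + EXTRAS
URL = re.compile(r"https?://[" + re.escape(ALLOWED) + r"]+")


def extract_urls(text: str) -> list:
    """Find runs of RFC-1738-style URL characters after http(s)://."""
    return URL.findall(text)


print(extract_urls('go to http://a.com/x_y?id=1&b=2 "quoted"'))
```

Note that ".", ",", "!" and ")" are all legal here, so this class alone does nothing to decide where a pasted link should end.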
dannyp's avatar
16 years ago
link
dannyp
dʎuuɐp
yeah, i looked at the rfc beforehand.

the main problem comes with intelligently ending a link, especially if there are multiple odd characters.
lucas's avatar
16 years ago
link
lucas
i ❤ demo
yeah, like with "." , ")" , "-" , and "!" .
lucas's avatar
16 years ago
link
lucas
i ❤ demo
http://code.google.com/p/thinktankforums/issues/detail?id=34
dannyp's avatar
16 years ago
link
dannyp
dʎuuɐp
Hi everybody, will you people help me find links broken by the linking script? I'm going to try to debug the regular expression.

I think it has to do with the way it identifies the end of the link.

If you've posted a link recently and it broke, paste it here please!!
asemisldkfj's avatar
16 years ago
link
asemisldkfj
the law is no protection
will this still be relevant if we use textile? I think it will but I just want to make sure.
dannyp's avatar
16 years ago
link
dannyp
dʎuuɐp
I think so. I do know that what we call 'named url' in the code exists in textile.

e.g.
'this is a link':http[SPACE]://www.site.com
-- on ttf is like this on textile:
"this is a link":http[SPACE]://www.site.com

(the only difference is ' versus ")
(note: in both cases remove [SPACE] to replicate normal ttf link behavior)

But I don't think textile gives the ability to use what we call 'full url' links in the code. This would be when you just paste a link into a post and do no other formatting -- which most people primarily use on this forum anyway.
 
16 years ago
r1, link
dbrown
@dp & asemisldkfj: after talking to lucas it sounded like we're still leaving in the auto-linking that exists in ttf as textile doesn't support it. it'll just be the named url that'll be removed in favor of textile.
dannyp's avatar
16 years ago
link
dannyp
dʎuuɐp
no help locating bad links?

If you've EVER posted a link and it broke, paste it here please!!
lucas's avatar
16 years ago
link
lucas
i ❤ demo
http://www.thinktankforums.com/thread.php?thread_id=1830
Carpetsmoker's avatar
12 years ago
link
Carpetsmoker
Martin
This fails:
'HEMA' : http://en.wikipedia.org/wiki/HEMA_(store )
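
A possible way to handle URLs like this one, sketched in Python: match greedily, then trim trailing characters, dropping a closing parenthesis only when it has no matching opener inside the URL. The function name trim_trailing and the punctuation set are assumptions; this is not how ttf's linking script works.

```python
def trim_trailing(url: str) -> str:
    """Strip trailing punctuation, keeping a ')' that closes a '(' in the URL."""
    while url:
        last = url[-1]
        if last in ".,;:!?":
            url = url[:-1]
        elif last == ")" and url.count("(") < url.count(")"):
            # More closers than openers: the ")" belongs to the sentence.
            url = url[:-1]
        else:
            break
    return url


print(trim_trailing("http://en.wikipedia.org/wiki/HEMA_(store)"))
print(trim_trailing("http://en.wikipedia.org/wiki/HEMA_(store))."))
```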