![]() Registered Member ![]()
|
Lots of valid points made. I stick to my original assessment, from a theoretical standpoint. However, $USER generally doesn't care about theory; he wants to achieve a goal as easily as possible. So how about looking at it from the user's perspective after pasting?
'ex-\nample' preserves the most information but is the least legible and the most likely to need editing. 'example' drops the most information but is probably what the user expects in most cases. 'ex-ample' might be a good compromise: it drops the line break, which is probably dispensable most of the time, and it is still legible after pasting even if the hyphen is obsolete, and thus easy to edit. So after rethinking this from a more practical perspective, preserving the line break is likely the worst option. Is there really any scenario in which $USER would want to paste with the line break? Not making a choice just because it might be wrong in a few cases is questionable (if at all possible: it is still a choice). As for the other two, I like TheBlackCat's approach of an alternate action. Which one should be the default? The more frequent one, which would likely be 'example', dropping the hyphen. That way editing would be needed in the fewest cases. However, even for a "verbatim copy", dropping the line break might be called for. |
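To make the three candidates concrete, here is a minimal Python sketch of what the clipboard could contain for a word split across two lines. The function name `paste_variants` and the mode names are made up for illustration and have nothing to do with Okular's actual code.

```python
def paste_variants(line_a, line_b):
    """Return the three candidate clipboard contents for a word that ends
    line_a with a hyphen and continues on line_b (hypothetical helper)."""
    if not line_a.endswith("-"):
        raise ValueError("line_a is expected to end with a hyphen")
    return {
        "verbatim": line_a + "\n" + line_b,   # keep hyphen and line break
        "keep_hyphen": line_a + line_b,       # drop only the line break
        "join": line_a[:-1] + line_b,         # drop hyphen and line break
    }


variants = paste_variants("ex-", "ample")
for mode, text in variants.items():
    print(mode, repr(text))
```

Which of the three ends up as the default is exactly the open question of the poll; the sketch only shows that all three are trivial to produce once the split has been detected.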
![]() Registered Member ![]()
|
There are use cases for copying with line breaks preserved, e.g. if copying (source) code. I've voted for removing both the hyphen and the line break, but I too think having different modes of copying would be a good idea.
In the bad old times of DOS (and still in e.g. SAP's old line-based editor) there was a copying mode called “block copy”. It would let you copy a block of text, selecting whatever was in the rectangle between the two corners you had marked. For instance, from the following text:
I would/am able to put into the clipboard the following:
simply by selecting from the third position on line two to a point somewhere between the 44th and 50th position of the last line. Indeed, the clipboard contains trailing spaces after the shorter lines. I'm not sure if this is a possible solution in Okular, but in my mind this would be the ideal additional copy mode, one which preserves everything as it was in the block you copied, line by line and character by character. Sure, there are some challenges for Okular that don't exist in SAP's old line-based editor, such as non-proportional fonts, but I'm sure there is nothing that can't be solved with a best effort that is more than acceptable. |
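The block copy described above can be sketched in a few lines of Python, assuming plain text in a fixed-width layout. The sample text, the function name `block_copy`, and the 0-based, end-exclusive column convention are all assumptions for illustration; a PDF viewer would first have to map glyph positions to columns, which this sketch ignores.

```python
def block_copy(lines, top, left, bottom, right):
    """Copy the rectangle spanning rows top..bottom (inclusive) and
    columns left..right (right exclusive), all 0-based.
    Lines shorter than the rectangle are padded with trailing spaces."""
    width = right - left
    return [line[left:right].ljust(width) for line in lines[top:bottom + 1]]


# Hypothetical sample text, not the example from the original post.
text = [
    "alpha beta gamma",
    "one two",
    "red green blue x",
]
block = block_copy(text, 0, 6, 2, 12)
for row in block:
    print(repr(row))
```

The padding reproduces the trailing spaces after the shorter lines that the post mentions.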
![]() Registered Member ![]()
|
I voted for option 3 ("example"). But I have to agree with those who argued that this makes no sense, if line breaks are retained. Thus, my vote should be understood as "please replace line breaks by simple spaces and merge hyphened (is that the correct term?) words". Let me quote Hans's call for more succinct survey questions:
|
![]() Registered Member ![]()
|
Valid point, thx. Yet unlikely in combination with hyphenation, which is mainly related to natural language. |
![]() Registered Member ![]()
|
And how will a "-" be determined to be hyphenation, and not a mathematical minus, or a "-" with some other punctuation function, or something else? With an exact copy (with "-" and end of line) it is easy for the user to predict the result, see where the hyphens plus line ends are, and correct them if needed. Removing "-" may introduce changes that the user does not want and, worse, does not notice (likely if the user is copying a large amount of text and does not know about this behaviour). PMC |
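One way to see the difficulty is to write down a naive guess and watch where it fails. The heuristic below is purely an assumption for illustration (not anything Okular does): it treats a trailing "-" as hyphenation only if it directly follows a letter and the next line starts with a lowercase letter.

```python
def looks_like_hyphenation(line, next_line):
    """Naive, hypothetical heuristic: a trailing '-' counts as a
    line-break hyphen if it follows a letter and the next line
    starts with a lowercase letter."""
    if len(line) < 2 or not line.endswith("-"):
        return False
    before = line[-2]
    after = next_line[:1]
    return before.isalpha() and after.isalpha() and after.islower()


print(looks_like_hyphenation("bro-", "ken"))     # hyphenated word
print(looks_like_hyphenation("x = a -", "b"))    # minus after a space
print(looks_like_hyphenation("5-", "3"))         # minus between digits
print(looks_like_hyphenation("well-", "known"))  # compound word
```

The last call is the problematic one: the compound "well-known" is classified as hyphenation, so joining would yield "wellknown". Text-level guessing cannot fully settle the question raised in the post.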
![]() Registered Member ![]()
|
Yes, bro-\nken will be converted to broken\n or broken\s (\s for space) under the 3rd option. I think broken\s is better for the user, as most line breaks are actually forced by the software and not intended by the user. So, when copying text from a document, it should be better to keep it on the same line. That will need less editing by the user.
Last edited by mamun on Wed Feb 08, 2012 1:22 pm, edited 2 times in total.
|
![]() Registered Member ![]()
|
I voted for the third option, because the word needs to be correct when copied. The source is already hyphenated, but the target might not be. So give the correct word and let the target hyphenate it again as the situation requires. If copying follows exactly how the source displays the text, it is impossible to paste it correctly into other targets, and errors result.
Take a typical problem with HTML forums, where a URL is shortened for display, as in http://verylong....url.org. It works as a link in the browser, but it is actually impossible to copy in the usual ways. Instead the user needs to right-click it and select "Copy URL", and sometimes even that does not work, because the URL is copied as it looks, not as it works. So after pasting it into a text editor, the URL bar or notes, the result is "verylong....url" instead of the correct "verylongaddressurl" kind. What the user has to do is first open the link, then copy from the address bar, and finally paste it at the target position. The correct thing would be for the URL to be right in the first place. Text is data, and it needs to be universally flexible, so that we can take that data and, if needed, convert it to the correct form (hyphenating if needed) on the target side, not store it as it appears in the source. Think about copying data from a PDF and pasting it into kwrite or even konsole or any web form. It is better that the data is exact from the beginning, instead of being a version modified by the PDF. And the user should not need to go through the data and fix the "corrections" that were made by the PDF formatting. |
![]() Registered Member ![]()
|
Your question is justified and is the reason why I resorted to words such as "unlikely" and "mainly". I dare to claim that the average user (as opposed to the tech-savvy one) is not aware that the line break would show up as character(s). Even if he were, does he expect '\n' (*nix), "\r\n" (win), or '\r' (mac)? How would he tell from the doc's appearance alone? Leaving it there may introduce an unexpected need to edit and, worse, should he expect it to be removed, it may go unnoticed. You will always contradict some people's expectations. The goal should be to minimize their number, as I understand this poll aims to do. My guess is that keeping the line break is not what most people expect. |
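The three line-break flavours mentioned above can at least be folded into one before any further decision is made. A minimal Python sketch, assuming the copied text is already available as a string (the function name is hypothetical):

```python
def normalize_newlines(text):
    """Map '\r\n' (win) and a lone '\r' (old mac) to '\n' (*nix).
    The two-character sequence must be replaced first, otherwise
    each '\r\n' would turn into two line breaks."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "a\r\nb\rc\nd"
print(repr(normalize_newlines(sample)))
```

This does not answer whether the line break should be kept at all; it only removes the platform question from the picture.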
![]() Registered Member ![]()
|
How do you tell a new line in a text editor? Usually it is self-evident due to the presence of a new line! For example, if the original is "aaaaaa bbbb cccc- ddddd eeeee ddddd eeeee" and I copy & paste it to a text editor, what is the best option to get:
1) "aaaaaa bbbb cccc- ddddd eeeee ffffff ggggg"
2) "aaaaaa bbbb cccc-ddddd eeeee ffffff ggggg"
3) "aaaaaa bbbb ccccddddd eeeee ffffff ggggg"
What if the new lines are important to the text semantics (e.g. poetry) and should not be removed? What if the "-" is a hyphen in a compound word and should not be removed? What if the "-" is not a hyphen at all and should not be removed? Both 2 and 3 will be wrong in far too many cases, whereas 1 will at least be a verbatim copy of the original with easily predictable behaviour, and if the original was semantically right, the copy will also be semantically right.
This poll has one important flaw: it particularizes, and since removing the hyphen would be the best answer in that particular example, most people vote for it. Now try the above example (think of it as an unknown language). Can you really support the case that 2 or 3 is the best approach? PMC |
![]() Registered Member ![]()
|
It depends. In flow text I don't, but rely on dynamic word wrap. I disagree. If your major use case were flow text, there would hardly be any explicit line break. So why should it be encountered in the copy? I am guessing, but I feel these are all rare cases. As such they should be dealt with not as the rule but as the exception. (As an aside, when DTPing natural language, afaik hyphenation is deprecated with ragged margin.) Ok, you guess differently. So should a statistical analysis be carried out? For verbatim copy, a separate "verbatim copy" action has been proposed. Is it? What about the different flavours of line break? What about $USER not expecting a line break at all due to (accidental) exclusive use of flow text? Is that so? What about that "previously large" (ex-ample) homonym? Constructed, granted, but you get the point. Actually, although all of your objections are valid, they are special cases, too. Let me restress my point: as you cannot accommodate everyone, and certainly not every time (edit: and there is no clear-cut right or wrong), a sensible approach would be to make the most common case the default but deal with other cases graciously. The task is to determine the most common case and define 'graciously'. Sorry for the long post. I'll retire from this discussion for now. |
![]() Registered Member ![]()
|
I voted for the first, since I think the third may break some text where the hyphen is actually needed in the resulting word.
However, if there is a "verbatim copy" mode, as TheBlackCat suggested, the third option should be the default. |
![]() Registered Member ![]()
|
I voted for option 3 (The contents of the clipboard should be "This is an example"), but I can see the merit in both of the other options.
Is it possible to add a pop-up menu similar to Dolphin when you drag a file into a directory (copy, link, move, cancel), or similar to the digital clock plasmoid when you right-click and choose "Copy to Clipboard"? |
![]() Registered Member ![]()
|
I voted for option 3 - but I would prefer it to be configurable.
Can you please make this an option for searching (Ctrl+F) too? It has happened to me in the past that I searched for a specific keyword in a text but didn't find it because of the hyphen. That's a really big problem if you search through large documents which you can't read completely, because you can miss important information. |
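The search problem described above can be illustrated with a small Python sketch. It assumes the extracted text contains a literal "-" followed by a line break at each split, and it is only an illustration of the idea, not a description of how Okular implements it; the function name is made up.

```python
import re


def find_ignoring_hyphenation(text, keyword):
    """Return the index of the first match of keyword in text, allowing an
    optional '-' plus line break between any two characters of the keyword.
    Returns -1 if there is no match (hypothetical helper)."""
    pattern = r"(?:-\n)?".join(re.escape(ch) for ch in keyword)
    match = re.search(pattern, text)
    return match.start() if match else -1


page = "an impor-\ntant note"
print(find_ignoring_hyphenation(page, "important"))
print(page.find("important"))
```

A plain substring search misses the keyword, while the tolerant pattern finds it at the position where the split word starts.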
![]() Administrator ![]()
|
Already done: http://tsdgeos.blogspot.com/2012/02/oku ... earch.html
![]() Registered Member ![]()
|
For PDF one can distinguish between hyphens that are necessary and hyphens that are added for formatting reasons. Here is a paragraph from the reference manual:
Hyphenation. Among the artifacts introduced by text layout is the hyphen marking the incidental division of a word at the end of a line. In Tagged PDF, such an incidental word division shall be represented by a soft hyphen character, which the Unicode mapping algorithm (see “Unicode Mapping in Tagged PDF” in 14.8.2.4, “Extraction of Character Properties”) translates to the Unicode value U+00AD. (This character is distinct from an ordinary hard hyphen, whose Unicode value is U+002D.) The producer of a Tagged PDF document shall distinguish explicitly between soft and hard hyphens so that the consumer does not have to guess which type a given character represents.
So Okular could at least remove soft hyphens. |
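Assuming the extracted text really does carry U+00AD for incidental word divisions, as the quoted paragraph requires of Tagged PDF producers, removing them is straightforward. The sketch below is an illustration in Python with a made-up function name; whether a given PDF actually contains soft hyphens depends on its producer.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, incidental word division
HARD_HYPHEN = "\u002d"  # U+002D, ordinary hyphen, left untouched


def strip_soft_hyphens(text):
    """Remove every soft hyphen together with a directly following
    line break; hard hyphens and other line breaks are kept."""
    return re.sub(SOFT_HYPHEN + r"\n?", "", text)


extracted = "ex" + SOFT_HYPHEN + "\nample and well" + HARD_HYPHEN + "known"
print(repr(strip_soft_hyphens(extracted)))
```

Because only U+00AD is touched, compound words and minus signs written with U+002D survive unchanged, which sidesteps the guessing problem for documents that are tagged correctly.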