
Okular: Include hyphen in copied text or not?

What should happen when you select the text from the first post and copy it?

Poll ended at Tue Feb 14, 2012 10:20 pm

The contents of the clipboard should be "This is an ex-\nample"
12%
The contents of the clipboard should be "This is an ex-ample"
3%
The contents of the clipboard should be "This is an example"
85%

Total votes : 114


zaphod.b
Registered Member
Posts
5
Karma
0
Lots of valid points made. I stick to my original assessment - from a theoretical standpoint. However $USER doesn't care about theory in general, he wants to achieve a goal as easily as possible. So how about looking at it from the user's perspective after pasting?

'ex-\nample' preserves the most information but is the least legible and the most likely to need editing.
'example' drops the most information but is probably what the user expects in most cases.
'ex-ample' might be a good compromise: it drops the line break, which is probably dispensable most of the time. It is still legible after pasting even if the hyphen is superfluous, and thus easily editable.

So after re-thinking from a more practical perspective, preserving the line break is likely the worst option. Is there really any scenario in which $USER might want to paste with the line break? Not making a choice just because one might be wrong in a few cases is questionable (if at all possible - it's still a choice).

As for the other two, I like TheBlackCat's approach of an alternate action. Which to make the default? The more frequent one, which would likely be 'example', dropping the hyphen. That way editing would be needed in the fewest cases. However, even for a "verbatim copy", dropping the line break might be called for.
kjetil_kilhavn
Registered Member
Posts
3
Karma
0
OS
There are use cases for copying with line breaks preserved, e.g. if copying (source) code. I've voted for removing both the hyphen and the line break, but I too think having different modes of copying would be a good idea.

In the bad old times of DOS (and still in e.g. SAP's old line-based editor) there was a copying mode called “block copy”. It would let you copy a block of text, selecting whatever was in the rectangle between the two corners you had marked. For instance, from the following text:
Code:
*** Disable this code by turning it into comments
* BREAK-POINT.                                    "Stop here!
* CALL FUNCTION MODULE 'SOME_FUNCTION_MODULE'
*      EXPORTING i_parameter = le_value
*      IMPORTING e_value1    = le_choice
*                e_cancelled = le_cancelled
*      EXCEPTIONS bad_choice = 1.


I would be (and in fact am) able to put the following into the clipboard:
Code:
BREAK-POINT.
CALL FUNCTION MODULE 'SOME_FUNCTION_MODULE'
     EXPORTING i_parameter = le_value
     IMPORTING e_value1    = le_choice
               e_cancelled = le_cancelled
     EXCEPTIONS bad_choice = 1.


simply by selecting from the third position on line two to a point somewhere between the 44th and 50th position of the last line. Indeed, the clipboard contains trailing spaces after the shorter lines.

I'm not sure if this is a possible solution in Okular, but in my mind this would be the ideal additional copy mode, one which preserves everything as it was in the block you copied, line by line and character by character. Sure, there are some challenges for Okular that don't exist in SAP's old line-based editor, such as proportional fonts - but I'm sure there is nothing that can't be solved with a best effort that is more than acceptable.
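
The block copy described above can be sketched in a few lines of Python. This is only an illustration that assumes the text is already available as a list of monospaced lines; the function name `block_copy` and its parameters are made up for this example and have nothing to do with Okular's or SAP's actual code.

```python
def block_copy(lines, top, left, bottom, right):
    """Return the rectangle of text between two corners.

    Rows and columns are 1-based and inclusive, matching the wording
    "third position on line two". Lines shorter than the right edge are
    padded with spaces, so the result keeps trailing blanks.
    """
    width = right - left + 1
    block = []
    for line in lines[top - 1:bottom]:
        piece = line[left - 1:right]
        block.append(piece.ljust(width))
    return block


# Hypothetical input: the comment is placed at column 51, outside the block.
text = [
    "*** Disable this code by turning it into comments",
    "* BREAK-POINT." + " " * 36 + '"Stop here!',
    "* CALL FUNCTION MODULE 'SOME_FUNCTION_MODULE'",
]
copied = block_copy(text, top=2, left=3, bottom=3, right=45)
```

Here `copied` holds two strings of equal width: the first is `BREAK-POINT.` followed by trailing spaces, the second is the `CALL FUNCTION MODULE` line without its leading `* `. In a PDF viewer the hard part would be mapping glyph positions to columns, which this sketch does not attempt.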
mutlu
Registered Member
Posts
75
Karma
0
OS
I voted for option 3 ("example"). But I have to agree with those who argued that this makes no sense, if line breaks are retained. Thus, my vote should be understood as "please replace line breaks by simple spaces and merge hyphened (is that the correct term?) words". Let me quote Hans's call for more succinct survey questions:
Hans wrote:I feel that this is more a question of whether to preserve line breaks or not.

Line breaks - include hyphen (option 1).
No line breaks - remove hyphen (option 3).
zaphod.b
Registered Member
Posts
5
Karma
0
kjetil_kilhavn wrote:There are use cases for copying with line breaks preserved, e.g. if copying (source) code.

Valid point, thx. Yet unlikely in combination with hyphenation, which is mainly related to natural language.
pedromc
Registered Member
Posts
11
Karma
0
OS
zaphod.b wrote:
kjetil_kilhavn wrote:There are use cases for copying with line breaks preserved, e.g. if copying (source) code.

Valid point, thx. Yet unlikely in combination with hyphenation, which is mainly related to natural language.


And how will a "-" be determined to be hyphenation, and not a mathematical minus, or a "-" with some other punctuation function, or something else?

Making an exact copy (with "-" and end of line) is easy for the user to predict the result, see where the hyphens plus end of lines are, and correct if needed.

Removing "-" may introduce unwanted changes that the user does not want, and worse, does not notice (likely if the user is copying a large amount of text and does not know about this behaviour).

PMC
mamun
Registered Member
Posts
1
Karma
0
cptG wrote:Given the fact that Okular respects line breaks when copying text I think it makes sense to preserve the line break in this case, too.
Everything else would require the program to know more than it really can.
What happens when option 3 is implemented?
Will this:
This is a sentence with a bro-\nken word
be turned into:
This is a sentence with a broken\n word
?


Yes, for the third option bro-\nken would be converted to broken\n or broken\s (\s for space). I think broken\s is better for the user, as most line breaks are forced by the software and not intended by the user. So when copying text from a document, it is better to keep it on one line. That will need less editing by the user.
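
A rough Python sketch of this conversion, under the assumption that a hyphen directly before a line break and between two word characters is hyphenation (which, as noted elsewhere in the thread, is only a guess). The function name `join_lines` and the `drop_hyphen` flag are invented for illustration and are not Okular's behaviour.

```python
import re


def join_lines(text, drop_hyphen=True):
    """Merge hard-wrapped lines into one flowing line.

    A "-" followed by a line break, with word characters on both sides,
    is treated as hyphenation. This is a heuristic: it cannot tell a
    compound-word hyphen or a minus sign from a layout hyphen.
    """
    replacement = "" if drop_hyphen else "-"
    # "bro-\nken" -> "broken" (or "bro-ken" when the hyphen is kept)
    text = re.sub(r"(?<=\w)-\n(?=\w)", replacement, text)
    # Every remaining line break becomes a single space.
    return text.replace("\n", " ")


merged = join_lines("This is a sentence with a bro-\nken word")
kept = join_lines("This is an ex-\nample", drop_hyphen=False)
```

With `drop_hyphen=True` this corresponds to poll option 3 with line breaks turned into spaces; with `drop_hyphen=False` it corresponds to option 2.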

Last edited by mamun on Wed Feb 08, 2012 1:22 pm, edited 2 times in total.
Fri13
Registered Member
Posts
397
Karma
4
OS
I voted for the third option, because the word needs to be correct when copied. The source is hyphenated, but the target might not be. So deliver the correct word and let the target hyphenate it again as the situation requires. If the copy follows exactly how the source displays the text, it becomes impossible to paste it correctly into other targets, and errors result.

An example is the typical problem with HTML forums where a URL is shortened to something like http://verylong....url.org. It works as a link in the browser, but it is actually impossible to copy in the usual way. Instead the user needs to right-click it and select "Copy URL", and sometimes even that does not work because the URL is copied as it looks, not as it works.

So after pasting it into a text editor, the URL bar, or notes, the result is "verylong....url" instead of the correct "verylongaddressurl". The user then has to open the link first, copy the address from the address bar, and finally paste it at the target position. The correct thing would be for the URL to be fixed in the first place.

Text is data, and it needs to be universally flexible so that we can take that data and, if needed, convert it to the correct form (hyphenating where needed) on the target side, rather than storing it exactly as it appears in the source.

Think about copying data from a PDF and pasting it into KWrite, Konsole, or any web form. It is better for the data to be exact from the beginning than to be a version modified by the PDF. The user should not need to go through the data and undo the changes made by the PDF formatting.
zaphod.b
Registered Member
Posts
5
Karma
0
pedromc wrote:And how will a "-" be determined to be hyphenation, and not a mathematical minus, or a "-" with some other punctuation function, or something else?
Your question is justified and the reason why I resorted to words such as "unlikely" and "mainly".
pedromc wrote:Making an exact copy (with "-" and end of line) is easy for the user to predict the result, see where the hyphens plus end of lines are, and correct if needed.
I dare to claim the average user (as opposed to the tech-savvy) is not aware the line break would show up as character(s). Even if he was - does he expect '\n' (*nix), "\r\n" (win), or '\r' (mac)? How would he tell from the doc's appearance only?
pedromc wrote:Removing "-" may introduce unwanted changes that the user does not want, and worse, does not notice (likely if the user is copying a large amount of text and does not know about this behaviour).
Leaving it there may introduce an unexpected need to edit, and worse, should he expect it to be removed, unnoticed.

You will always contradict some people's expectations. The goal should be to minimize the number, as I understand this poll does. My guess is that keeping the line break is not what most people expect.
pedromc
Registered Member
Posts
11
Karma
0
OS
zaphod.b wrote:
pedromc wrote:Making an exact copy (with "-" and end of line) is easy for the user to predict the result, see where the hyphens plus end of lines are, and correct if needed.

I dare to claim the average user (as opposed to the tech-savvy) is not aware the line break would show up as character(s). Even if he was - does he expect '\n' (*nix), "\r\n" (win), or '\r' (mac)? How would he tell from the doc's appearance only?

How do you tell a new line in a text editor? Usually it is self evident due to the presence of a new line!

For example, if the original is

"aaaaaa bbbb cccc-
ddddd eeeee ffffff
ggggg"

and I copy & paste it to a text editor, which is the best result to get:

1)
"aaaaaa bbbb cccc-
ddddd eeeee ffffff
ggggg"

2)
"aaaaaa bbbb cccc-ddddd eeeee ffffff ggggg"

3)
"aaaaaa bbbb ccccddddd eeeee ffffff ggggg"

What if the new lines are important to the text semantics (e.g. poetry) and should not be removed?
What if the "-" is a hyphen in a compound word and should not be removed?
What if the "-" is not a hyphen at all and should not be removed?

Both 2 and 3 will be wrong in far too many cases, and 1 will at least be a verbatim of the original, easily predictable behaviour, and if the original was semantically right the copy will also be semantically right.

zaphod.b wrote:
pedromc wrote:Removing "-" may introduce unwanted changes that the user does not want, and worse, does not notice (likely if the user is copying a large amount of text and does not know about this behaviour).
Leaving it there may introduce an unexpected need to edit, and worse, should he expect it to be removed, unnoticed.
You will always contradict some people's expectations. The goal should be to minimize the number, as I understand this poll does. My guess is that keeping the line break is not what most people expect.

This poll has one important flaw. It particularizes, and since removing the hyphen would be the best answer in that particular example, most people vote for it.

Now, try with the above example (think of it as an unknown language). Can you really support the case that 2 or 3 is the best approach?

PMC
zaphod.b
Registered Member
Posts
5
Karma
0
pedromc wrote:How do you tell a new line in a text editor?
It depends. In flow text I don't, but rely on dynamic word wrap.
pedromc wrote:Usually it is self evident due to the presence of a new line!
I disagree. If your major use case were flow text, there would hardly be any explicit line break. So why should it be encountered in the copy?
pedromc wrote:What if the new lines are important to the text semantics (e.g. poetry) and should not be removed?
What if the "-" is a hyphen in a compound word and should not be removed?
What if the "-" is not a hyphen at all and should not be removed?
I am guessing, but I feel these are all rare cases. As such they should be dealt with not as rule but as exception.
(As an aside, when DTPing natural language, afaik hyphenation is deprecated with ragged margin.)
pedromc wrote:Both 2 and 3 will be wrong in far too many cases
Ok you guess differently. So should a statistical analysis be carried out?
pedromc wrote:1 will at least be a verbatim of the original
For verbatim copy, a separate "verbatim copy" action has been proposed.
pedromc wrote:easily predictable behaviour
Is it? What about the different flavours of line break? What about $USER not expecting line break at all due to (accidental) exclusive use of flow text?
pedromc wrote:if the original was semantically right the copy will also be semantically right.
Is that so? What about that "previously large" (ex-ample) homonym? Constructed, granted, but you get the point.
pedromc wrote:This poll has one important flaw. It particularizes, and since removing the hyphen would be the best answer in that particular example, most people vote for it.
Actually, although all of your objections are valid, they are special cases, too.

Let me restress my point: As you cannot accommodate everyone, and certainly not every time (edit: and there is no clear-cut right or wrong), a sensible approach would be to make the most common case the default but deal with other cases gracefully. The task is to determine the most common case and to define 'gracefully'.

Sorry for the long post. I'll retire from this discussion for now.
arichardson
Registered Member
Posts
3
Karma
0
OS
I voted for the first, since I think the third may break some text where the hyphen is actually needed in the resulting word.

However, if there is a "verbatim copy" mode, as TheBlackCat suggested, the third option should be the default.
tomsneddon
Registered Member
Posts
1
Karma
0
OS
I voted for option 3 (The contents of the clipboard should be "This is an example"), but I can see the merit in both of the other options.

Is it possible to add a pop-up menu similar to Dolphin when you drag a file into a directory (copy, link, move, cancel), or similar to the digital clock plasmoid when you right-click and choose "Copy to Clipboard"?
JSowieso
Registered Member
Posts
2
Karma
0
OS
I voted for option 3 - but I would prefer it to be configurable.

Can you please make this an option for searching (ctrl-f) too? It happened for me in the past that I searched for a specific keyword in a text, but didn't find it because of the hyphen. That's really a big problem if you search through large documents which you can't read completely and miss some important information.
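
To illustrate the kind of search meant here, a minimal Python sketch that lets a keyword match even when the word is split by a hyphen and a line break. The function name `find_across_hyphenation` is hypothetical, and this is not how Okular implements its search; it only shows the idea.

```python
import re


def find_across_hyphenation(keyword, text):
    """Return start offsets where keyword occurs, ignoring "-\\n" splits."""
    if not keyword:
        return []
    # Allow an optional hyphen + line break between any two characters.
    pattern = r"(?:-\n)?".join(re.escape(ch) for ch in keyword)
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


hits = find_across_hyphenation("example", "This is an ex-\nample")
```

A plain substring search for "example" would find nothing in that text, while the pattern above reports the position where "ex-" begins.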
Hans
Administrator
Posts
3304
Karma
24
OS
JSowieso wrote:Can you please make this an option for searching (ctrl-f) too? It happened for me in the past that I searched for a specific keyword in a text, but didn't find it because of the hyphen. That's really a big problem if you search through large documents which you can't read completely and miss some important information.


Already done: http://tsdgeos.blogspot.com/2012/02/oku ... earch.html


Ponto
Registered Member
Posts
1
Karma
0
OS
For PDF one can distinguish between hyphens that are necessary and hyphens that are added for formatting reasons. Here is a paragraph from the reference manual:

Hyphenation. Among the artifacts introduced by text layout is the hyphen marking the incidental division of a word at the end of a line. In Tagged PDF, such an incidental word division shall be represented by a soft hyphen character, which the Unicode mapping algorithm (see “Unicode Mapping in Tagged PDF” in 14.8.2.4, “Extraction of Character Properties”) translates to the Unicode value U+00AD. (This character is distinct from an ordinary hard hyphen, whose Unicode value is U+002D.) The producer of a Tagged PDF document shall distinguish explicitly between soft and hard hyphens so that the consumer does not have to guess which type a given character represents.


So Okular could at least remove soft hyphens.
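
As a small sketch of what that could look like, assuming the extracted text already contains U+00AD for incidental word divisions as the quoted paragraph requires of Tagged PDF. The function name `strip_soft_hyphens` is invented for this example, and real PDFs may well not be tagged this way.

```python
SOFT_HYPHEN = "\u00ad"  # soft hyphen; the ordinary hard hyphen is U+002D


def strip_soft_hyphens(text):
    """Rejoin words divided by a soft hyphen, leaving hard hyphens alone."""
    # A soft hyphen at a line end marks an incidental division: drop both.
    text = text.replace(SOFT_HYPHEN + "\n", "")
    # Remove any soft hyphens that remain elsewhere.
    return text.replace(SOFT_HYPHEN, "")


cleaned = strip_soft_hyphens("This is an ex\u00ad\nample of a well-\nknown case")
```

In `cleaned`, "example" is rejoined, while the hard hyphen and line break in "well-\nknown" are left untouched, so no guessing about the hyphen's role is needed.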

