The Discussions and Opinions forum is a place for open discussion regarding everything related to KDE, within the boundaries of the KDE Code of Conduct. If you have a question or need a solution for a KDE problem, please post in the appropriate forum instead.

Okular: Include hyphen in copied text or not?

What should happen when you select the text from the first post and copy it?

Poll ended at Tue Feb 14, 2012 10:20 pm

The contents of the clipboard should be "This is an ex-\nample"
12%
The contents of the clipboard should be "This is an ex-ample"
3%
The contents of the clipboard should be "This is an example"
85%

Total votes : 114


TSDgeos
Moderator
Posts
36
Karma
-1
Imagine the situation in which you have a word spanning two lines

This is an ex-
ample

and you copy it. What should happen when you select the text from the first post and copy it?

Vote in the poll above!

Note: \n in the first option stands for a line break.
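For reference, the three poll options can be written out as a small Python sketch. It is only an illustration of the choices, not Okular code; the function name `copy_variants` is made up for this example, and it assumes the first line ends with the hyphen.

```python
def copy_variants(first_line: str, second_line: str) -> dict[str, str]:
    """Return the three clipboard candidates from the poll for a word
    split across two lines. Assumes first_line ends with the hyphen."""
    stem = first_line.removesuffix("-")  # text without the trailing hyphen
    return {
        "option 1": first_line + "\n" + second_line,  # keep hyphen and line break
        "option 2": first_line + second_line,         # keep hyphen, drop line break
        "option 3": stem + second_line,               # drop hyphen and line break
    }


variants = copy_variants("This is an ex-", "ample")
for name, text in variants.items():
    print(name, repr(text))
```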
zaphod.b
Registered Member
Posts
5
Karma
0
Any tool would be hard put to decide whether hyphenation took place mid-word because of a line break, or because the word is a composite that would be hyphenated even without a line break.

As this decision is hard to take, both 'example' and 'ex-ample' are possibly unjustified guesses. Therefore imho the copy should be made "as-is", that is, as 'ex-\nample'.

A more intricate approach would be to make it an option. ("If in doubt, let the user decide.")

jm2c
Kubuntiac
Registered Member
Posts
769
Karma
2
OS
Ditto what zaphod said.


Krita - All the cool kids are painting with it!
purple-bobby
Registered Member
Posts
6
Karma
0
OS
It depends on which selection mode you have.

If you are doing text selection, then you want text to paste into something else, so I would think you do not want the hyphen and newline.

If you are doing eps/image/object selection, then you probably want it as close as on screen.

What about soft-hyphens? The ones which only appear when the word is broken (in English).

I am a big fan of Okular, but not a big fan of the selection mode pull down.

I guess ideally you would copy a full-fat version of the selection, then have the ability to paste it in different representations. I would like to be able to paste unformatted (Unicode) text.
acidrums4
Registered Member
Posts
29
Karma
0
OS
What about making an option for that? But I think most of the time it should be the third option.
pedromc
Registered Member
Posts
8
Karma
0
OS
The "-" may be a hyphen indicating a word break at the end of a line, may be a hyphen joining the parts of a composite word, or may not even be a hyphen (e.g. a minus sign). Thus the third option, removing it, is completely wrong in too many situations and should not be the choice taken.

The second choice, "This is an ex-ample", is obviously wrong. "ex-ample" seriously? ;)

That leaves the first choice.

PMC
karthikp
Registered Member
Posts
106
Karma
0
OS
I think the second option is closer to what I expect. It's close enough to an as-is copy, but acknowledges that there was a break in the text, so it can be corrected further. So, I voted for it.

However, since Okular should be used by lots of people, most of whom might not think like me at all, the "expected" behavior should be that the program intelligently hides away hyphens when it makes sense to. So, in that light, the third option should be the correct one to implement.


karthikp, proud to be a member of KDE forums since 2008.
Butcher
Registered Member
Posts
6
Karma
0
OS
I hope you're kidding.

The only realistic answer to this is "example". Why would one want to preserve the hyphen?

I mean, I'm sure you want to make your users' lives simpler with your program. Just make it easy to use. Preserving a hyphen just makes no sense at all; it is over-complicating the problem.

You're giving your users a choice that shouldn't even exist.
jwagoner
Registered Member
Posts
1
Karma
0
OS
Would it be possible to use spell-check to differentiate between compound and wrapped words? It seems like that would give a better hit rate than either approach as a default. E.g. if you remove the hyphen from ex-ample you get a dictionary word so remove it, but removing the hyphen from over-complicating does not result in a word so leave it in.
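The dictionary idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an existing Okular feature: `WORDS` is a tiny stand-in for a real spell-check dictionary, and `join_wrapped_word` is a hypothetical helper name. The result depends entirely on which words the list contains.

```python
# Tiny stand-in word list; a real tool would query a spell-check dictionary.
WORDS = {"example", "over", "complicating", "broken"}


def join_wrapped_word(left: str, right: str, words: set[str] = WORDS) -> str:
    """Join 'left-' + 'right' found at a line-end hyphenation.

    If the joined form is a known word, treat the hyphen as a wrapping
    artifact and drop it; otherwise keep the hyphen as part of a compound.
    """
    joined = left + right
    if joined.lower() in words:
        return joined
    return left + "-" + right


print(join_wrapped_word("ex", "ample"))            # example
print(join_wrapped_word("over", "complicating"))   # over-complicating
```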
karthikp
Registered Member
Posts
106
Karma
0
OS
I wasn't kidding. Preserving a hyphen means I get to check whether the hyphen in the PDF was supposed to be a hyphen (-), a minus (-), an en dash (–) or an em dash (—). Not everyone uses LaTeX, and an author may have represented all of these with a plain hyphen. I'd prefer to have the final say.

However, I acknowledge that the average user would much prefer the third option. So, while my vote is for the second, I fully expect the third one to be the default behavior.
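The look-alike characters mentioned here are distinct Unicode code points, which is why a copied "-" cannot always be interpreted without looking at it. The Python sketch below simply lists them using the standard `unicodedata` module; which of these a given PDF actually contains depends on how it was produced, and the helper name `describe` is made up for this example.

```python
import unicodedata

# Characters that can all look like a short horizontal stroke on the page.
LOOKALIKES = ["\u002d", "\u2010", "\u00ad", "\u2212", "\u2013", "\u2014"]


def describe(ch: str) -> str:
    """Return the code point and Unicode name, e.g. 'U+002D HYPHEN-MINUS'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in LOOKALIKES:
    print(describe(ch))
```

Note that U+00AD (soft hyphen) is the character that is only rendered when a word is actually broken at that point.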


karthikp
Registered Member
Posts
106
Karma
0
OS
jwagoner wrote: Would it be possible to use spell-check to differentiate between compound and wrapped words? It seems like that would give a better hit rate than either approach as a default. E.g. if you remove the hyphen from ex-ample you get a dictionary word so remove it, but removing the hyphen from over-complicating does not result in a word so leave it in.


I was going to suggest that, then realized the example slips through (ex-ample is a perfectly valid, if meaningless, compound word).


cptG
Registered Member
Posts
3
Karma
0
OS
Given the fact that Okular respects line breaks when copying text I think it makes sense to preserve the line break in this case, too.
Everything else would require the program to know more than it really can.
What happens when option 3 is implemented?
Will this:
This is a sentence with a bro-\nken word
be turned into:
This is a sentence with a broken\n word
?
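To make the question concrete, here is a minimal Python sketch of one possible behaviour, chosen purely as an assumption for illustration and not what Okular does: the second half of the word is pulled up to the first line and the line break is moved to just after the joined word. The function name `dehyphenate_keep_breaks` is hypothetical, and the sketch blindly treats every line-end hyphen as a wrapping hyphen.

```python
import re

# A hyphen at the end of a line, the word fragment that starts the next
# line, and an optional single space after that fragment.
_WRAP = re.compile(r"-\n(\S+)( ?)")


def dehyphenate_keep_breaks(text: str) -> str:
    """Join words split as 'bro-\\nken' while keeping one line break."""

    def repl(match: re.Match) -> str:
        fragment, space = match.group(1), match.group(2)
        # Only insert a break if more text followed the fragment on its line;
        # otherwise the line break that already follows it is left in place.
        return fragment + ("\n" if space else "")

    return _WRAP.sub(repl, text)


result = dehyphenate_keep_breaks("This is a sentence with a bro-\nken word")
print(repr(result))
```

With this policy the space in "broken\n word" disappears and the second line starts directly with "word"; other policies, such as dropping the line break entirely, are equally conceivable.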
pedromc
Registered Member
Posts
8
Karma
0
OS
Using a dictionary check to determine if the "-" can be removed, as suggested by some, is full of problems. From what language is the word? Is the word even in the dictionary? In some languages, xxx-yyy and xxxyyy may both be valid but semantically distinct. What then?

The copy & paste should be as predictable and exact as possible. Removing the "-" from "xxx-\nyyy" if "xxxyyy" is in some dictionary is a bad idea.

I much prefer having to do some editing after the copy & paste to remove some hyphens, than having to check that the copy & paste did not mangle the text in some unpredictable way because the source had some "-" in unlucky places.

PMC
Hans
Administrator
Posts
3124
Karma
20
OS
I feel that this is more a question of whether to preserve line breaks or not.

Line breaks - include hyphen (option 1).
No line breaks - remove hyphen (option 3).

purple-bobby's suggestion might be a good compromise, i.e., preserve line breaks for normal selection and no line breaks for text selection. If I had to choose between the two I would prefer the first option, which is simpler and more predictable.

Slightly off topic:

How does the PDF viewer know if there's a line break or not? I tried two PDF files in Adobe Reader, one created with pdflatex and the other exported from PowerPoint (I guess). For the former, line breaks were preserved when copying a selection; for the latter, they were not copied. Okular included line breaks in both cases.


TheBlackCat
Registered Member
Posts
2945
Karma
8
OS
I would think that by default it should remove the hyphen and line breaks when copying.

A config option is not good because it may vary even within a single document, depending on exactly what you want to do with the text.

I think the best solution would be for the edit menu to have an additional "copy verbatim" (tentative name) action that does a copy with all the line breaks and hyphens still in (as well as without any other future smart character or text changes). The "copy" toolbar button could copy when clicked, but a long press on it would bring up a menu that lets you do the copy verbatim as well.

Alternatively, you could do it the other way around, and have copy preserve these things and have a "smart copy" in the edit menu, but I think most times you copy something like this it is to put it in a different document, so maintaining the line breaks and hyphens is counterproductive.


Man is the lowest-cost, 150-pound, nonlinear, all-purpose computer system which can be mass-produced by unskilled labor.
-NASA in 1965

 