Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Support / help / discussion forum for twitter bot
Locked
bitcoin
Posts: 924
Joined: Tue Jul 04, 2017 1:25 am

Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Post by bitcoin »

In follow up of thread viewtopic.php?f=30&t=5027&start=10#p30405

Code:

Added image to tweet: FireShot Capture 22 - Altcoin Exchange - Built By Traders, For T_ - https___www.altcoinexchange.com_.png
PreTokens: #RT! ICO platform GO TO t.co/9buSj7cx0h<br>Early access now available! Test this great exchange out!! #altcoinexchange


#Synereo #TYC #LEO #PCN #INCNT
PostTokens: #RT! ICO platform GO TO t.co/9buSj7cx0h<br>Early access now available! Test this great exchange out!! #altcoinexchange


#Synereo #TYC #LEO #PCN #INCNT
Too long: 149
Snipped to: (137): #RT! ICO platform GO TO t.co/9buSj7cx0h
Early access now available! Test this great exchange out!! #altcoinexchange


#Synereo #TYC #LEO
Posting tweet: #RT! ICO platform GO TO t.co/9buSj7cx0h
Early access now available! Test this great exchange out!! #altcoinexchange


#Synereo #TYC #LEO
* ERROR: verify action: Forbidden - The request is understood, but it has been refused or access is not allowed. An accompanying error message will explain why. This code is used when requests are being denied due to update limits.
* ERROR Info: Status is over 140 characters.
* FAILED: TweetPostId: 
I can't tell which special characters are involved here... The tweet only has a "#", a "<br>" and an "!". Fuzzy, no? :(

Nothing else in the log files either... Besides what's already given above. LOGS: 88597
bitcoin
Posts: 924
Joined: Tue Jul 04, 2017 1:25 am

Re: Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Post by bitcoin »

MARTIN YOU CAN SKIP THIS REPLY

Code:

Posting: 1 images
Post: C:\Users\Bad Robot\OneDrive\TwitterDub\DATA\CIF\WhitePaper Promo Mostly\Images\cryptoimprovementfund.io screen capture 2017-09-17_01-45-15.png
Process image: C:\Users\Bad Robot\OneDrive\TwitterDub\DATA\CIF\WhitePaper Promo Mostly\Images\cryptoimprovementfund.io screen capture 2017-09-17_01-45-15.png
Preprocess image: cryptoimprovementfund.io screen capture 2017-09-17_01-45-15.png
Add image to tweet: cryptoimprovementfund.io screen capture 2017-09-17_01-45-15.png using tmp path: C:\Users\Bad Robot\AppData\Local\Temp\oKowCwf54.png
Added image to tweet: cryptoimprovementfund.io screen capture 2017-09-17_01-45-15.png
PreTokens: CIF Ltd.'s Bus.Edu #Business Program: using #cryptocurrency as an #investment tool? $CIF makes it possible! Follow @CIF_Team


#business #blockchain
PostTokens: CIF Ltd.'s Bus.Edu #Business Program: using #cryptocurrency as an #investment tool? $CIF makes it possible! Follow @CIF_Team


#business #blockchain
Too long: 149
Snipped to: (137): CIF Ltd.'s Bus.Edu #Business Program: using #cryptocurrency as an #investment tool? $CIF makes it possible! Follow @CIF_Team


#business
Posting tweet: CIF Ltd.'s Bus.Edu #Business Program: using #cryptocurrency as an #investment tool? $CIF makes it possible! Follow @CIF_Team


#business
* ERROR: verify action: Forbidden - The request is understood, but it has been refused or access is not allowed. An accompanying error message will explain why. This code is used when requests are being denied due to update limits.
* ERROR Info: Status is over 140 characters.

BUT... this one I can explain :) Twitter converts "Bus.Edu" into a URL, so the tweet ends up longer than its visible text... Kinda makes sense now that I type it here :)
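
To illustrate the "Bus.Edu" effect, here is a minimal Python sketch of a length estimate that charges every link-like token a fixed amount. It assumes Twitter charged 23 characters per link at the time; `SAMPLE_TLDS`, `URL_RE` and `estimated_length` are made-up names for this example, the TLD list is a tiny illustrative one, and none of this is TwitterDub's or Twitter's actual code. It also ignores Unicode normalization.

```python
import re

TCO_LENGTH = 23  # assumed per-link cost, not taken from the logs above
SAMPLE_TLDS = ("com", "edu", "io", "gl", "co")  # illustrative only, far from complete

# Optional scheme, then host.tld, then an optional path.
URL_RE = re.compile(
    r"(?:https?://)?[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.(?:%s)\b(?:/\S*)?"
    % "|".join(SAMPLE_TLDS),
    re.IGNORECASE,
)

def estimated_length(text):
    """Count codepoints, but charge TCO_LENGTH for each link-like token."""
    without_urls, url_count = URL_RE.subn("", text)
    return len(without_urls) + url_count * TCO_LENGTH

print(len("see Bus.Edu now"))               # 15 visible characters
print(estimated_length("see Bus.Edu now"))  # 31, because "Bus.Edu" is charged as a link
```

So a 7-character "Bus.Edu" can cost 16 characters more than it looks like, which would be enough to push a tweet snipped to 137 over the limit.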
martin@rootjazz
Site Admin
Posts: 34634
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Post by martin@rootjazz »

MARTIN YOU CAN SKIP THIS REPLY

:)
bitcoin
Posts: 924
Joined: Tue Jul 04, 2017 1:25 am

Re: Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Post by bitcoin »

martin@rootjazz wrote: Thu Oct 12, 2017 8:28 pm
MARTIN YOU CAN SKIP THIS REPLY

:)
... but this reply remains: viewtopic.php?p=30715#p30627 :)
bitcoin
Posts: 924
Joined: Tue Jul 04, 2017 1:25 am

Re: Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Post by bitcoin »

Martin, I found this article that might be of interest...
https://developer.twitter.com/en/docs/b ... characters

Counting Characters

General Concepts

The “café” issue mentioned above raises the question of how you count the characters in the Tweet string “café”. To the human eye the length is clearly four characters. Depending on how the data is represented this could be either five or six UTF-8 bytes. Twitter does not want to penalize a user for the fact we use UTF-8 or for the fact that the API client in question used the longer representation. Therefore, Twitter does count “café” as four characters no matter which representation is sent.

Nearly all user input methods automatically convert the longer combining mark version into the composed version, but the Twitter API cannot count on that. Even if we ignore that, the byte length of the “é” character is two bytes rather than the one you would expect. Below there is some more specific information on how to get that information out of Ruby/Rails, but for now I’ll cover the general concepts that should be available in any language.

The Unicode Standard covers much more than a listing of characters with numbers associated. Unicode does provide such a list of “codepoints” (more info: http://www.unicode.org/charts/), which is the U+XXXX notation you sometimes see. The Unicode Standard also provides several different ways to encode those codepoints (UTF-8 and UTF-16 are examples, but there are others). The Unicode Standard also provides some detailed information on how to deal with character issues such as Sorting, Regular Expressions and, of importance to this issue, Normalization.


Combining Diacritical Marks - A Prelude to Normalization

So, back in the café, the issue of multiple byte sequences having the same on-screen representation was breezed right by. There is an entire section of the Unicode tables devoted to the “Combining Diacritical Marks” (see that Unicode “block” here). These are not stand-alone characters but instead the additional “diacritical marks” used in addition to other base characters in many languages. For example the ¨ over the ü, common to German; or the ˜ over the ñ in Spanish. There are a great many combinations needed to cover all languages in the world so Unicode provides some simple building blocks, the Combining Diacritical Marks.

For the most common characters (like é, ü and company) there is also a character just for the combination. The reasons for that are mostly historical but since they exist it’s something we’ll always need to be aware of. This historical oddity is the exact reason for the two “café” representations. If you look back at the representations you’ll see one uses 0x65 0xCC 0x81, where 0x65 is simply the letter “e” and 0xCC 0x81 is the Combining Diacritical Mark for ´. Since there are multiple ways to represent the same thing using Unicode, the Unicode Standard provides information on how to normalize the multiple different representations.
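
The two "café" byte sequences described above can be reproduced with a few lines of Python. This is only an illustrative sketch; `utf8_hex` is a helper name made up for this example.

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex, e.g. '0xC3 0xA9'."""
    return " ".join(f"0x{b:02X}" for b in text.encode("utf-8"))

composed = "caf\u00e9"      # é as the single codepoint U+00E9
decomposed = "cafe\u0301"   # plain e followed by U+0301 COMBINING ACUTE ACCENT

print(utf8_hex(composed[3:]))    # 0xC3 0xA9
print(utf8_hex(decomposed[3:]))  # 0x65 0xCC 0x81
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 5 6
```

Both strings look identical on screen, but they are five and six UTF-8 bytes long respectively.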


Unicode Normalization
The Unicode Standard provides information on several different kinds of normalization, Canonical and Compatibility. There is a full description of the different options in the Unicode Standard Annex #15, the report on normalization. The normalization report is 32 pages and covers the issue in great detail. Reproducing the entire report here would be of very little use so instead we’ll focus on what normalization Twitter is using.

Twitter counts the length of a Tweet using the Normalization Form C (NFC) version of the text. This type of normalization favors the use of a fully combined character (0xC3 0xA9 from the café example) over the long-form version (0x65 0xCC 0x81). Twitter also counts the number of codepoints in the text rather than UTF-8 bytes. The 0xC3 0xA9 from the café example is one codepoint (U+00E9) that is encoded as two bytes in UTF-8, whereas 0x65 0xCC 0x81 is two codepoints encoded as three bytes.
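
A minimal Python sketch of the counting rule quoted above: normalize to NFC, then count codepoints. `nfc_length` is a name invented for this example, and the sketch covers only normalization, not links or any other rule Twitter applies.

```python
import unicodedata

def nfc_length(text):
    """Count codepoints after Normalization Form C (NFC)."""
    return len(unicodedata.normalize("NFC", text))

decomposed = "cafe\u0301"  # e + combining acute accent

print(len(decomposed))         # 5 codepoints before normalization
print(nfc_length(decomposed))  # 4 once e + U+0301 is composed into U+00E9
```

A client that counts the raw string could therefore see a longer tweet than the NFC count described in the quoted article.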
bitcoin
Posts: 924
Joined: Tue Jul 04, 2017 1:25 am

Re: Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Post by bitcoin »

Found another example (from: viewtopic.php?f=30&t=5218&p=31097#p31097)

Code:

Posting: 1 images
Post: C:\Users\Bad Robot\OneDrive\TwitterDub\DATA\CIF\WhitePaper Promo Mostly\Images\infographic-blockchain-tech-financial-markets-accenture.png
Process image: C:\Users\Bad Robot\OneDrive\TwitterDub\DATA\CIF\WhitePaper Promo Mostly\Images\infographic-blockchain-tech-financial-markets-accenture.png
Preprocess image: infographic-blockchain-tech-financial-markets-accenture.png
Add image to tweet: infographic-blockchain-tech-financial-markets-accenture.png using tmp path: C:\Users\Bad Robot\AppData\Local\Temp\NKiAoRBgt2SBv0a.png
Added image to tweet: infographic-blockchain-tech-financial-markets-accenture.png
PreTokens: Right now #businesses see no incentive to adopt #cryptocurrency by hedging risk and charging exorbitant fees<br><br>$CIF ((bitly-rand)https://goo.gl/qJs4y6)
PostTokens: Right now #businesses see no incentive to adopt #cryptocurrency by hedging risk and charging exorbitant fees<br><br>$CIF http://btc.みんな/2ywZIdm
Too long: 160
Snipped to: (137): Right now #businesses see no incentive to adopt #cryptocurrency by hedging risk and charging exorbitant fees

$CIF http://btc.みんな/2ywZIdm
* ERROR: TwitterErrorLog: The process cannot access the file 'C:\Users\Bad Robot\AppData\Roaming\rootjazz\Twitterdub\saved_data\reduced_length_tweets.txt' because it is being used by another process.
* FAILED: TweetPostId: 175655983 to ICO_PCA
Pausing for: 17029
Why does it say the tweet is too long? Is that because of the <br> tags? I can't spot any other differences...

When I checked, the tweet indeed wasn't posted... https://twitter.com/search?f=tweets&q=% ... e&src=typd
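
Since the link in this tweet has a Unicode host (btc.みんな), here is a small Python sketch that converts such a host to its ASCII (punycode) form using the standard library's `idna` codec. `to_ascii_url` is a hypothetical helper for this example, not something TwitterDub is known to do, and whether the Unicode host is what triggers the length error is not established in this thread.

```python
from urllib.parse import urlsplit, urlunsplit

def to_ascii_url(url):
    """Return url with its hostname converted to the ASCII (IDNA) form.

    Sketch only: any port or user info in the URL is dropped.
    """
    parts = urlsplit(url)
    ascii_host = parts.hostname.encode("idna").decode("ascii")
    return urlunsplit((parts.scheme, ascii_host, parts.path, parts.query, parts.fragment))

print(to_ascii_url("http://btc.みんな/2ywZIdm"))
```

Comparing the lengths of the Unicode and ASCII forms of the same link is one way to see whether a character count might differ between two pieces of software.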
martin@rootjazz
Site Admin
Posts: 34634
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Post by martin@rootjazz »

Just answered this. If you want unicode links then all bets are off.
bitcoin
Posts: 924
Joined: Tue Jul 04, 2017 1:25 am

Re: Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Post by bitcoin »

martin@rootjazz wrote: Sat Oct 21, 2017 2:07 pm Just answered this. If you want unicode links then all bets are off.
Do the math: TD has no problem with the URL - it's correctly parsed. The problem exists somewhere else :/
martin@rootjazz
Site Admin
Posts: 34634
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Special Characters or Tweets that get a too long error from Twitter despite that on twitter.com the tweet would work

Post by martin@rootjazz »

keep to the other thread

<locked>