aimode.news
Microsoft MAI series AI training data disclosed, at odds with its “commercially licensed data only” claim

IT House, June 6: Technology outlet The Decoder reported yesterday (June 5) that Microsoft's recently released MAI series AI models were trained in part on unlicensed open web data, which is inconsistent with the company's earlier claim that it used only “enterprise-grade, clean, commercially licensed data”.

As IT House previously reported, Microsoft promoted the MAI series models as “trained from scratch on clean data only, without distillation data from third-party models”.

However, according to the officially published MAI technical paper, the models do not rely solely on commercially licensed data; they also draw in part on open web data, including Common Crawl. This clearly departs from Microsoft's earlier public emphasis on “enterprise-grade, clean, commercially licensed data”.

As described in the paper, Microsoft uses a mix of “publicly available data” and “licensed human-generated data”, covering both licensed corpora and content published on the internet.

For collecting web data, Microsoft says it uses its own crawler, which complies with the Robots Exclusion Protocol (robots.txt) as well as related meta tags and HTML controls.

However, the outlet pointed out that the point of contention is that content which is not explicitly blocked is treated as available for crawling by default, which in practice shifts most of the responsibility for protecting content onto site owners. The logic resembles “an unlocked door counts as consent to enter”.
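The opt-out default at the heart of this dispute can be illustrated with a minimal sketch using Python's standard `urllib.robotparser`. This is not Microsoft's crawler, whose implementation the paper does not describe at this level; the crawler name `ExampleBot`, the URL, and the helper `can_crawl` are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string so the example needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical site that explicitly blocks one crawler.
BLOCKING_ROBOTS = """User-agent: ExampleBot
Disallow: /
"""

# Hypothetical site that publishes no rules at all.
EMPTY_ROBOTS = ""

URL = "https://example.com/article"

# An explicit rule is required to keep the crawler out.
print(can_crawl(BLOCKING_ROBOTS, "ExampleBot", URL))  # False

# With no rules, the protocol's default is to allow crawling.
print(can_crawl(EMPTY_ROBOTS, "ExampleBot", URL))  # True
```

Under the protocol, a site that says nothing is treated as having allowed access, which is why the burden of opting out falls on the site owner.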
